Skip to content

feat: add roaring-based position bitmap#595

Open
wgtmac wants to merge 3 commits intoapache:mainfrom
wgtmac:bitmap
Open

feat: add roaring-based position bitmap#595
wgtmac wants to merge 3 commits intoapache:mainfrom
wgtmac:bitmap

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Mar 19, 2026

No description provided.

@wgtmac wgtmac force-pushed the bitmap branch 2 times, most recently from 11497ae to 72b274a Compare March 20, 2026 15:20
@wgtmac wgtmac marked this pull request as ready for review March 20, 2026 15:20
[[nodiscard]] Status AddRange(int64_t pos_start, int64_t pos_end);

/// \brief Checks if a position is set in the bitmap.
/// \return Result<bool> or InvalidArgument error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid argument is for an out of range value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't added OutOfRange error kind and so does Arrow C++. This is consistent with the Java impl below:

  private static void validatePosition(long pos) {
    Preconditions.checkArgument(
        pos >= 0 && pos <= MAX_POSITION,
        "Bitmap supports positions that are >= 0 and <= %s: %s",
        MAX_POSITION,
        pos);
  }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was asking more about what types of errors could be thrown, InvalidArgument seems fine (or even just having this be infallible and require requestors to provide a value > 1 and false for anything more then the current bitmap).

}
bitmap.ForEach([&](int64_t pos) {
ASSERT_TRUE(positions.count(pos) > 0) << "Unexpected position: " << pos;
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also assert empty vs is_empty?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you intend to add a check here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this because ASSERT_EQ(bitmap.Cardinality(), positions.size()); above has already checked it.

src/iceberg/deletes/roaring_position_bitmap.cc:196:25 [modernize-use-integer-sign-comparison]
Check warning: src/iceberg/deletes/roaring_position_bitmap.cc:196:25 [modernize-use-integer-sign-comparison]
comparison between 'signed' and 'unsigned' integers
@wgtmac
Copy link
Member Author

wgtmac commented Mar 24, 2026

Thanks @emkornfield for your detailed review! I think I've addressed all your comments. Please let me know what you think.

/// significant 4 bytes. For each key, a 32-bit Roaring bitmap is
/// maintained to store positions for that key.
///
/// \note The Puffin deletion-vector-v1 wrapping (length prefix, magic

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// \note The Puffin deletion-vector-v1 wrapping (length prefix, magic
/// \note This class is used to represent deletion vectors. The Puffin puffin reader/write handle adding the additional required framing (length prefix, magic, magic bytes, CRC-32) for `deletion-vector-v1` persistence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants