Conversation
9b822d0 to
4c4ffff
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Note: This review was generated by Gemini.
src/iceberg/table_scan.cc
Outdated
| for (const auto& snapshot : append_snapshots) { | ||
| SnapshotCache snapshot_cache(snapshot.get()); | ||
| ICEBERG_ASSIGN_OR_RAISE(auto manifests, snapshot_cache.DataManifests(io_)); | ||
| std::ranges::copy_if(manifests, std::back_inserter(data_manifests), |
There was a problem hiding this comment.
In Java's BaseIncrementalAppendScan.java, manifests from append snapshots are collected into a Set<ManifestFile> to prevent duplicate processing if multiple snapshots reference the same manifest. The C++ implementation pushes directly into std::vector<ManifestFile> data_manifests, which will cause duplicate processing if a manifest is retained across multiple append snapshots since ManifestGroup::Make does not deduplicate them internally.
Suggestion: Deduplicate data_manifests (e.g., by tracking seen manifest_paths using a std::unordered_set<std::string>) before passing them to ManifestGroup::Make.
src/iceberg/table_scan.h
Outdated
|
|
||
| /// \brief Returns true if this scan is a current lineage scan, which means it does not | ||
| /// specify from/to snapshot IDs. | ||
| bool IsScanCurrentLineage() const; |
There was a problem hiding this comment.
Can we just move these three new functions to the anonymous namespace in the table_scan.cc?
There was a problem hiding this comment.
I made these functions members because I don't want to keep writing context.xxx inside the method bodies
src/iceberg/table_scan.h
Outdated
| Result<std::vector<std::shared_ptr<ScanTaskType>>> | ||
| IncrementalScan<ScanTaskType>::PlanFiles() const { | ||
| if (context_.IsScanCurrentLineage()) { | ||
| ICEBERG_ASSIGN_OR_RAISE(auto current_snapshot, metadata_->Snapshot()); |
There was a problem hiding this comment.
Do we need to consider if branch has been specified?
There was a problem hiding this comment.
If current_snapshot is nullptr, it indicates that the current table is empty, so we can simply return an empty task list. No need care about the branch.
d1bacdf to
5c41a78
Compare
5c41a78 to
3f97b02
Compare
…nstantiate in header
3f97b02 to
8e2ad99
Compare
No description provided.