[Append Scan] Introduce IncrementalAppendScan class (without integration tests) #2234
smaheshwar-pltr wants to merge 4 commits into apache:main
append_snapshot_ids: Set[int] = {snapshot.snapshot_id for snapshot in append_snapshots}

manifests = {

    limit=limit,
)

def incremental_append_scan(

Optional ID of the "from" snapshot, to start the incremental scan from, exclusively. This can be set
on the IncrementalAppendScan object returned, but ultimately must not be None.

return current_schema.select(*self.selected_fields, case_sensitive=self.case_sensitive)

def plan_files(self) -> Iterable[FileScanTask]:
    from_snapshot_id, to_snapshot_id = self._validate_and_resolve_snapshots()

).plan_files(
    manifests=list(manifests),
    manifest_entry_filter=lambda manifest_entry: manifest_entry.snapshot_id in append_snapshot_ids
    and manifest_entry.status == ManifestEntryStatus.ADDED,

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

Note: Contains changes from
- AbstractTableScan with default methods #2230
- __eq__ and __hash__ methods to ManifestFile #2233

Smaller diff from those changes: smaheshwar-pltr#5.
Rationale for this change
Split out from the incremental append scan work (see #2031 (comment)). Unlike Spark, PyIceberg does not yet support incremental reads of data appended between snapshots.
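
To make the idea of reading only the data appended between two snapshots concrete, here is a minimal, self-contained sketch. It does not use PyIceberg: `Snapshot`, `ManifestEntry`, `Operation`, `EntryStatus`, `snapshots_in_range` and `appended_files` are hypothetical stand-ins invented for illustration. It assumes the range is "from" exclusive and "to" inclusive, and it mirrors the filter visible in the diff (entry snapshot ID is in the set of append snapshot IDs, and entry status is ADDED).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Operation(Enum):  # hypothetical stand-in for a snapshot's operation
    APPEND = "append"
    OVERWRITE = "overwrite"


class EntryStatus(Enum):  # hypothetical stand-in for a manifest entry status
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    operation: Operation


@dataclass(frozen=True)
class ManifestEntry:
    snapshot_id: int  # snapshot that wrote this entry
    status: EntryStatus
    file_path: str


def snapshots_in_range(
    snapshots: Dict[int, Snapshot], from_id_exclusive: int, to_id_inclusive: int
) -> List[Snapshot]:
    """Walk parent pointers from `to` back to, but not including, `from`."""
    result: List[Snapshot] = []
    current = snapshots.get(to_id_inclusive)
    while current is not None and current.snapshot_id != from_id_exclusive:
        result.append(current)
        parent_id = current.parent_id
        current = snapshots.get(parent_id) if parent_id is not None else None
    if current is None:
        raise ValueError("'from' snapshot is not an ancestor of 'to' snapshot")
    return result


def appended_files(
    snapshots: Dict[int, Snapshot],
    entries: List[ManifestEntry],
    from_id_exclusive: int,
    to_id_inclusive: int,
) -> List[str]:
    """Return paths of files added by append snapshots within the range."""
    append_snapshot_ids = {
        s.snapshot_id
        for s in snapshots_in_range(snapshots, from_id_exclusive, to_id_inclusive)
        if s.operation is Operation.APPEND
    }
    return [
        e.file_path
        for e in entries
        if e.snapshot_id in append_snapshot_ids and e.status is EntryStatus.ADDED
    ]


SNAPSHOTS = {
    1: Snapshot(1, None, Operation.APPEND),
    2: Snapshot(2, 1, Operation.APPEND),
    3: Snapshot(3, 2, Operation.OVERWRITE),
    4: Snapshot(4, 3, Operation.APPEND),
}
ENTRIES = [
    ManifestEntry(1, EntryStatus.ADDED, "a.parquet"),
    ManifestEntry(2, EntryStatus.ADDED, "b.parquet"),
    ManifestEntry(3, EntryStatus.ADDED, "c.parquet"),    # written by an overwrite
    ManifestEntry(3, EntryStatus.DELETED, "a.parquet"),
    ManifestEntry(2, EntryStatus.EXISTING, "b.parquet"),  # carried over, not new
    ManifestEntry(4, EntryStatus.ADDED, "d.parquet"),
]

print(appended_files(SNAPSHOTS, ENTRIES, 1, 4))  # ['b.parquet', 'd.parquet']
```

In this toy model the status check keeps carried-over (EXISTING) entries from being counted twice, and the operation check skips files written by non-append snapshots. Whether the real scan skips or rejects such snapshots is not established by this sketch.
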
This PR adds the IncrementalAppendScan class and the API for constructing it on pyiceberg.Table.
Are these changes tested?
Integration tests are separated into a different PR - #2235, to keep this one small.
Are there any user-facing changes?
Ignoring the other PRs, there's a new scan class and method on Table.