Bigtable: Implement query sharding by generalizing ReadRows resume request builder. #3103
Merged
garrettjonesgoogle merged 10 commits into googleapis:master on Aug 14, 2018
This extends the work done in #2986 to allow MapReduce-style frameworks like Beam to split queries into shards and execute them in parallel. The mechanism for chopping off part of a query in a resume request is very similar to splitting a query into multiple shards. The main difference is how many splits are used: a resume request cuts the query at a single point, while sharding cuts it at several.
The common functionality is extracted to an internal RowSetUtil class that does all of the heavy lifting. The class is used both by ReadRowsResumptionStrategy for computing the resume request and by the newly introduced Query#shard method.

Also expose the ability to get a Query's bounding range. The combination of Query#shard and Query#getBound is needed to implement a Beam source.
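
To illustrate the idea that resuming and sharding are the same operation with a different number of split points, here is a minimal, self-contained sketch. It is not the actual RowSetUtil code: the `RangeShardSketch` class, its `Range` type, and the `shard` and `resume` helpers are hypothetical names, keys are plain strings rather than byte strings, and ranges are simplified to half-open intervals (real resumption also has to exclude the last row already read).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of range splitting; not the actual RowSetUtil implementation. */
final class RangeShardSketch {

  /** Half-open key range [start, end); null means unbounded on that side. */
  static final class Range {
    final String start;
    final String end;

    Range(String start, String end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + (start == null ? "-inf" : start) + ", " + (end == null ? "+inf" : end) + ")";
    }
  }

  /** Splits the range at every split point that lies strictly inside it. */
  static List<Range> shard(Range range, SortedSet<String> splitPoints) {
    List<Range> shards = new ArrayList<>();
    String cursor = range.start;
    for (String point : splitPoints) { // iterated in ascending key order
      boolean afterCursor = cursor == null || point.compareTo(cursor) > 0;
      boolean beforeEnd = range.end == null || point.compareTo(range.end) < 0;
      if (afterCursor && beforeEnd) {
        shards.add(new Range(cursor, point));
        cursor = point;
      }
    }
    shards.add(new Range(cursor, range.end)); // remainder after the last split
    return shards;
  }

  /**
   * Resuming expressed as a one-point split: keep only the part at or after resumeKey.
   * Assumes resumeKey lies strictly inside the range.
   */
  static Range resume(Range range, String resumeKey) {
    SortedSet<String> single = new TreeSet<>();
    single.add(resumeKey);
    List<Range> parts = shard(range, single);
    return parts.get(parts.size() - 1);
  }

  public static void main(String[] args) {
    Range full = new Range("a", "z");
    SortedSet<String> points = new TreeSet<>();
    points.add("g");
    points.add("p");
    System.out.println(shard(full, points)); // prints [[a, g), [g, p), [p, z)]
    System.out.println(resume(full, "m"));   // prints [m, z)
  }
}
```

A Beam-style source could, under the same assumptions, use the bounding range to estimate the extent of a query and then hand each shard to a separate worker.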