Phase 1: Make streaming pipeline chunk-emitting#583
Draft
aram356 wants to merge 21 commits intospecs/streaming-response-optimizationfrom
Draft
Phase 1: Make streaming pipeline chunk-emitting#583aram356 wants to merge 21 commits intospecs/streaming-response-optimizationfrom
aram356 wants to merge 21 commits intospecs/streaming-response-optimizationfrom
Conversation
…ation test - Add debug-level logging to process_chunks showing total bytes read and written per pipeline invocation - Add brotli-to-brotli round-trip test to cover the into_inner() finalization path - Add test proving HtmlWithPostProcessing accumulates output when post-processors are registered while streaming path passes through
- Group std imports together (cell, io, rc) before external crates - Document supported compression combinations on PipelineConfig - Remove dead-weight byte counters from process_chunks hot loop - Fix stale comment referencing removed process_through_compression - Fix brotli finalization: use drop(encoder) instead of into_inner() to make the intent clear (CompressorWriter writes trailer on drop) - Document reset() as no-op on HtmlRewriterAdapter (single-use) - Add brotli round-trip test covering into_inner finalization path - Add gzip HTML rewriter pipeline test (compressed round-trip with lol_html, not just StreamingReplacer) - Add HtmlWithPostProcessing accumulation vs streaming behavior test
- Add Eq derive to Compression enum (all unit variants, trivially correct) - Brotli finalization: use into_inner() instead of drop() to skip redundant flush and make finalization explicit - Document process_chunks flush semantics: callers must still call encoder-specific finalization after this method returns - Warn when HtmlRewriterAdapter receives data after finalization (rewriter already consumed, data would be silently lost) - Make HtmlWithPostProcessing::reset() a true no-op matching its doc (clearing auxiliary state without resetting rewriter is inconsistent) - Document extra copying overhead on post-processor path vs streaming - Assert output content in reset-then-finalize test (was discarded) - Relax per-chunk emission test to not depend on lol_html internal buffering behavior — assert concatenated correctness + at least one intermediate chunk emitted
- Add test that feeds multiple chunks through HtmlWithPostProcessing with an active post-processor (should_process returns true, mutates HTML). Verifies the post-processor receives the complete accumulated document and its mutations appear in the output. - Make flush semantics per-codec explicit in process_chunks doc: flate2 needs finish() after flush, brotli is finalized by flush
lol_html fragments text nodes across chunk boundaries when fed incrementally. This breaks script rewriters (NextJS __NEXT_DATA__, GTM) that expect complete text content — a split domain like "google" + "tagmanager.com" would silently miss the rewrite. Add dual-mode HtmlRewriterAdapter: - new(): streaming mode, emits output per chunk (no script rewriters) - new_buffered(): accumulates input, feeds lol_html in one write() call on is_last (script rewriters registered) create_html_processor selects the mode based on whether script_rewriters is non-empty. This preserves the old behavior (single-pass processing) when rewriters need it, while enabling streaming when they don't. Add regression test proving lol_html does fragment text across chunk boundaries, confirming the issue is real.
lol_html fragments text nodes across input chunk boundaries. Script rewriters (NextJS __NEXT_DATA__, GTM) expect complete text content and would silently miss rewrites on split fragments. Add dual-mode HtmlRewriterAdapter: - new(): streaming, emits output per chunk (no script rewriters) - new_buffered(): accumulates input, single write() on is_last create_html_processor selects mode based on script_rewriters. This preserves correctness while enabling streaming for configs without script rewriters. Phase 3 will make rewriters fragment-safe. Add regression test proving lol_html does fragment text nodes.
Add section to spec explaining the lol_html text fragmentation issue, the dual-mode HtmlRewriterAdapter workaround (Phase 1), and the planned fix to make script rewriters fragment-safe (Phase 3, #584).
- Add post-finalization warning to buffered path (was only in streaming) - Add buffered_adapter_prevents_text_fragmentation test proving new_buffered() delivers complete text to lol_html handlers - Fix spec: html_processor.rs is changed (selects adapter mode), and script_rewriters do require buffered mode (not "unaffected")
5 tasks
15 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Make the internal streaming pipeline truly chunk-emitting so processors emit output incrementally instead of buffering entire documents. This is the foundation for Phase 2 (streaming responses to the client via
StreamingBody) but ships independently with immediate memory and correctness benefits.Closes #568, closes #569, closes #570, closes #571, closes #572.
Part of epic #563.
What changed
Encoder finalization fix (
process_through_compression):W(by value) to&mut Wso callers retain ownershipflush()+drop(encoder)with explicitencoder.finish()/encoder.into_inner()All compression paths now chunk-based:
process_gzip_to_gzip: replacedread_to_end()withprocess_through_compressionpatterndecompress_and_process(used by*_to_nonepaths): replacedread_to_end()with chunk read loopUnified
process_chunksmethod:process_uncompressed,process_through_compression, anddecompress_and_processinto a singleprocess_chunksmethodHtmlRewriterAdapter incremental streaming:
Rc<RefCell<Vec<u8>>>shared output sinkprocess_chunk()now emits output on every call, not justis_lastreset()is a no-op —Settingsconsumed by constructor)HtmlWithPostProcessing adaptation:
is_lastchunk when post-processors are registeredFiles changed
streaming_processor.rshtml_processor.rsis_lastVerification
cargo test --workspace— 748 passed, 0 failedcargo clippy --workspace --all-targets --all-features -- -D warnings— cleancargo fmt --all -- --check— cleannpx vitest run— 282 passedcargo build --release --target wasm32-wasip1— successTest plan