
Phase 1: Make streaming pipeline chunk-emitting #583

Draft
aram356 wants to merge 21 commits into specs/streaming-response-optimization from feature/streaming-pipeline-phase1

Conversation


aram356 commented Mar 26, 2026

Summary

Make the internal streaming pipeline truly chunk-emitting so processors emit output incrementally instead of buffering entire documents. This is the foundation for Phase 2 (streaming responses to the client via StreamingBody) but ships independently with immediate memory and correctness benefits.

Closes #568, closes #569, closes #570, closes #571, closes #572.
Part of epic #563.

What changed

Encoder finalization fix (process_through_compression):

  • Changed signature from W (by value) to &mut W so callers retain ownership
  • Replaced flush() + drop(encoder) with explicit encoder.finish() / encoder.into_inner()
  • Finalization errors now propagate instead of being silently swallowed
  • Fixes pre-existing correctness issue for deflate/brotli, prevents regression when gzip moves to this path
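
The ownership and error-propagation change can be sketched with a mock encoder (the real types here would be flate2's GzEncoder and brotli's CompressorWriter, which are not shown in this PR; `MockEncoder` and its `<trailer>` bytes are purely illustrative):

```rust
use std::io::{self, Write};

// Hypothetical stand-in for a compression encoder such as flate2's
// GzEncoder. Real codecs write a real trailer; "<trailer>" is a mock.
struct MockEncoder<W: Write> {
    inner: W,
}

impl<W: Write> MockEncoder<W> {
    fn new(inner: W) -> Self {
        Self { inner }
    }

    fn write_chunk(&mut self, chunk: &[u8]) -> io::Result<()> {
        self.inner.write_all(chunk)
    }

    // Explicit finalization: writes the trailer and returns the inner
    // writer, so any I/O error surfaces to the caller. Dropping the
    // encoder instead would swallow that error silently.
    fn finish(mut self) -> io::Result<W> {
        self.inner.write_all(b"<trailer>")?;
        Ok(self.inner)
    }
}

// Taking `&mut W` (not `W` by value) lets the caller keep ownership
// of the output writer after this pipeline stage completes.
fn process_through_compression<W: Write>(output: &mut W, data: &[u8]) -> io::Result<()> {
    let mut encoder = MockEncoder::new(&mut *output);
    encoder.write_chunk(data)?;
    // Propagate finalization errors instead of relying on Drop.
    encoder.finish()?;
    Ok(())
}
```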

All compression paths now chunk-based:

  • process_gzip_to_gzip: replaced read_to_end() with process_through_compression pattern
  • decompress_and_process (used by *_to_none paths): replaced read_to_end() with chunk read loop
  • All 6 compression paths now follow the same structure: read chunk → process → write → repeat

Unified process_chunks method:

  • Extracted shared chunk read/process/write loop from process_uncompressed, process_through_compression, and decompress_and_process into a single process_chunks method
  • Reduces 3 near-identical implementations to 1
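
A minimal sketch of the shared loop, assuming the processor can be expressed as a closure over `(chunk, is_last)` (the real `process_chunks` signature is not shown in this PR):

```rust
use std::io::{Read, Write};

// Shared read chunk -> process -> write loop. A zero-byte read
// signals end of input, which is forwarded as `is_last = true`.
fn process_chunks<R, W, F>(reader: &mut R, writer: &mut W, mut process: F) -> std::io::Result<()>
where
    R: Read,
    W: Write,
    F: FnMut(&[u8], bool) -> Vec<u8>,
{
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        let is_last = n == 0;
        let out = process(&buf[..n], is_last);
        writer.write_all(&out)?;
        if is_last {
            break;
        }
    }
    // Per the PR's documented flush semantics, callers must still run
    // encoder-specific finalization (e.g. finish()) after this returns.
    writer.flush()
}
```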

HtmlRewriterAdapter incremental streaming:

  • lol_html rewriter created eagerly in constructor with Rc<RefCell<Vec<u8>>> shared output sink
  • process_chunk() now emits output on every call, not just is_last
  • Adapter is single-use (reset() is a no-op — Settings consumed by constructor)
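
The shared-sink pattern can be sketched without lol_html itself (names here are hypothetical; in the real adapter the boxed closure is the output sink handed to `lol_html::HtmlRewriter`, which holds the other `Rc` clone):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// The rewriter's output callback appends into a shared buffer; after
// each write the adapter drains that buffer, so output is emitted on
// every call instead of only on the final chunk.
struct SinkAdapter {
    sink: Rc<RefCell<Vec<u8>>>,
    // Stand-in for the eagerly-constructed rewriter whose output
    // closure captures the other Rc clone.
    emit: Box<dyn FnMut(&[u8])>,
}

impl SinkAdapter {
    fn new() -> Self {
        let sink = Rc::new(RefCell::new(Vec::new()));
        let sink_clone = Rc::clone(&sink);
        let emit = Box::new(move |bytes: &[u8]| {
            sink_clone.borrow_mut().extend_from_slice(bytes);
        });
        Self { sink, emit }
    }

    // Emits whatever the rewriter produced so far on every call.
    fn process_chunk(&mut self, chunk: &[u8]) -> Vec<u8> {
        (self.emit)(chunk); // real code: rewriter.write(chunk)
        std::mem::take(&mut *self.sink.borrow_mut())
    }
}
```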

HtmlWithPostProcessing adaptation:

  • Added output accumulation buffer for the is_last chunk when post-processors are registered
  • Intermediate chunks pass through directly; final chunk triggers post-processing on accumulated output
  • When no post-processors are registered (the streaming path), it's a pure passthrough
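
The two paths can be sketched as follows (the `PostProcessor` trait shape is assumed for illustration; the real interface is not shown in this PR):

```rust
// Hypothetical stand-in for the PR's post-processor interface.
trait PostProcessor {
    fn run(&self, html: &mut Vec<u8>);
}

struct HtmlWithPostProcessing {
    post_processors: Vec<Box<dyn PostProcessor>>,
    accumulated: Vec<u8>,
}

impl HtmlWithPostProcessing {
    fn process_chunk(&mut self, chunk: &[u8], is_last: bool) -> Vec<u8> {
        if self.post_processors.is_empty() {
            // Streaming path: pure passthrough.
            return chunk.to_vec();
        }
        // Post-processor path: accumulate until the final chunk, at
        // the cost of an extra copy of the document.
        self.accumulated.extend_from_slice(chunk);
        if !is_last {
            return Vec::new();
        }
        let mut full = std::mem::take(&mut self.accumulated);
        for p in &self.post_processors {
            p.run(&mut full);
        }
        full
    }
}
```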

Files changed

File                    Lines      What
streaming_processor.rs  +380 -329  All pipeline changes above
html_processor.rs       +29 -1     Post-processor accumulation for is_last

Verification

  • cargo test --workspace — 748 passed, 0 failed
  • cargo clippy --workspace --all-targets --all-features -- -D warnings — clean
  • cargo fmt --all -- --check — clean
  • npx vitest run — 282 passed
  • cargo build --release --target wasm32-wasip1 — success

Test plan

  • Existing streaming_processor tests updated for incremental output
  • New deflate round-trip test
  • New gzip round-trip test (chunk-based)
  • New gzip-to-none test
  • HtmlRewriterAdapter per-chunk emission test
  • Full pipeline integration test with HTML rewriting
  • WASM build succeeds

aram356 added 17 commits March 26, 2026 09:08
…ation test

- Add debug-level logging to process_chunks showing total bytes
  read and written per pipeline invocation
- Add brotli-to-brotli round-trip test to cover the into_inner()
  finalization path
- Add test proving HtmlWithPostProcessing accumulates output when
  post-processors are registered while streaming path passes through
- Group std imports together (cell, io, rc) before external crates
- Document supported compression combinations on PipelineConfig
- Remove dead-weight byte counters from process_chunks hot loop
- Fix stale comment referencing removed process_through_compression
- Fix brotli finalization: use drop(encoder) instead of into_inner()
  to make the intent clear (CompressorWriter writes trailer on drop)
- Document reset() as no-op on HtmlRewriterAdapter (single-use)
- Add brotli round-trip test covering into_inner finalization path
- Add gzip HTML rewriter pipeline test (compressed round-trip with
  lol_html, not just StreamingReplacer)
- Add HtmlWithPostProcessing accumulation vs streaming behavior test
- Add Eq derive to Compression enum (all unit variants, trivially correct)
- Brotli finalization: use into_inner() instead of drop() to skip
  redundant flush and make finalization explicit
- Document process_chunks flush semantics: callers must still call
  encoder-specific finalization after this method returns
- Warn when HtmlRewriterAdapter receives data after finalization
  (rewriter already consumed, data would be silently lost)
- Make HtmlWithPostProcessing::reset() a true no-op matching its doc
  (clearing auxiliary state without resetting rewriter is inconsistent)
- Document extra copying overhead on post-processor path vs streaming
- Assert output content in reset-then-finalize test (was discarded)
- Relax per-chunk emission test to not depend on lol_html internal
  buffering behavior — assert concatenated correctness + at least one
  intermediate chunk emitted
- Add test that feeds multiple chunks through HtmlWithPostProcessing
  with an active post-processor (should_process returns true, mutates
  HTML). Verifies the post-processor receives the complete accumulated
  document and its mutations appear in the output.
- Make flush semantics per-codec explicit in process_chunks doc:
  flate2 needs finish() after flush, brotli is finalized by flush
aram356 self-assigned this Mar 26, 2026
aram356 added 4 commits March 26, 2026 11:46
lol_html fragments text nodes across chunk boundaries when fed
incrementally. This breaks script rewriters (NextJS __NEXT_DATA__,
GTM) that expect complete text content — a split domain like
"google" + "tagmanager.com" would silently miss the rewrite.

Add dual-mode HtmlRewriterAdapter:
- new(): streaming mode, emits output per chunk (no script rewriters)
- new_buffered(): accumulates input, feeds lol_html in one write()
  call on is_last (script rewriters registered)

create_html_processor selects the mode based on whether
script_rewriters is non-empty. This preserves the old behavior
(single-pass processing) when rewriters need it, while enabling
streaming when they don't.

Add regression test proving lol_html does fragment text across
chunk boundaries, confirming the issue is real.
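
The failure mode described above can be reduced to a std-only mock (no lol_html; `proxy.example/gtm` and the fragment split are illustrative): a per-fragment replacer only fires when the whole needle sits inside one text fragment, which is exactly what a script rewriter sees when lol_html fragments a text node.

```rust
// Naive per-fragment rewriter, mirroring a script rewriter that only
// sees one lol_html text chunk at a time.
fn rewrite_fragment(fragment: &str) -> String {
    fragment.replace("googletagmanager.com", "proxy.example/gtm")
}

// Streaming mode: each fragment is rewritten independently, so a
// needle split across the boundary is silently missed.
fn run_streaming(fragments: &[&str]) -> String {
    fragments.iter().map(|f| rewrite_fragment(f)).collect()
}

// Buffered mode: concatenate first, rewrite once over complete text.
fn run_buffered(fragments: &[&str]) -> String {
    rewrite_fragment(&fragments.concat())
}
```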
lol_html fragments text nodes across input chunk boundaries. Script
rewriters (NextJS __NEXT_DATA__, GTM) expect complete text content
and would silently miss rewrites on split fragments.

Add dual-mode HtmlRewriterAdapter:
- new(): streaming, emits output per chunk (no script rewriters)
- new_buffered(): accumulates input, single write() on is_last

create_html_processor selects mode based on script_rewriters. This
preserves correctness while enabling streaming for configs without
script rewriters. Phase 3 will make rewriters fragment-safe.

Add regression test proving lol_html does fragment text nodes.
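
The mode selection described in this commit can be sketched as follows (the enum shape and constructor names are assumptions based on the commit message, not the actual code):

```rust
// Hypothetical shapes for the two adapter modes.
enum HtmlRewriterAdapter {
    // new(): streaming, emits output per chunk.
    Streaming,
    // new_buffered(): accumulates input, single write() on is_last.
    Buffered { input: Vec<u8> },
}

fn create_html_processor(script_rewriters: &[&str]) -> HtmlRewriterAdapter {
    if script_rewriters.is_empty() {
        // No script rewriters: safe to stream per chunk.
        HtmlRewriterAdapter::Streaming
    } else {
        // Script rewriters need complete text nodes: buffer input and
        // feed lol_html in one write() on the final chunk.
        HtmlRewriterAdapter::Buffered { input: Vec::new() }
    }
}
```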
Add section to spec explaining the lol_html text fragmentation issue,
the dual-mode HtmlRewriterAdapter workaround (Phase 1), and the
planned fix to make script rewriters fragment-safe (Phase 3, #584).
- Add post-finalization warning to buffered path (was only in streaming)
- Add buffered_adapter_prevents_text_fragmentation test proving
  new_buffered() delivers complete text to lol_html handlers
- Fix spec: html_processor.rs is changed (selects adapter mode), and
  script_rewriters do require buffered mode (not "unaffected")