feat: fuzz test framework with Kafka + SQLite streaming validation (#13)
Randomized DML generator exercising CDC edge cases: 7 table types (scalar, wide, LOB, partitioned, no-PK, max-length strings, interval), mixed transaction patterns (immediate, batched, rollback, savepoint, large txn), bulk inserts, and NULL-heavy operations. A `soak-test.sh --fuzz` flag runs the PL/SQL workload on both RAC nodes concurrently instead of bash-driven DML. Known issue: `compare-debezium.py` uses positional matching, which fails under concurrent RAC DML ordering differences; fuzz mode needs content-based matching. Also, 19 extra LOB INSERT events in the OLR output need investigation (possible phantom-undo variant).
Replace positional comparison with group-by-`(txId, table, op)` plus greedy best-match within groups. This handles the RAC cross-node ordering differences that caused false failures. Also exclude FUZZ_STATS (bookkeeping) and FUZZ_LOB (known phantom-transaction bugs olr#26 and the olr#10 UPDATE variant) from comparison. Tested: 26,397 records matched across 6 tables in the fuzz soak test; the single-instance twin test (1,502 records) still passes.
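The group-by plus greedy matching described above can be sketched as follows. The record fields (`txId`, `table`, `op`, `after`) mirror the parsed Debezium payload, and the function names are illustrative rather than the actual `compare-debezium.py` API:

```python
from collections import defaultdict

def group_key(rec):
    # Group by transaction, table, and operation so cross-node
    # ordering differences within a group no longer matter.
    return (rec.get('txId'), rec.get('table'), rec.get('op'))

def match_groups(lm_records, olr_records):
    """Greedy best-match inside each (txId, table, op) group."""
    groups = defaultdict(lambda: ([], []))
    for r in lm_records:
        groups[group_key(r)][0].append(r)
    for r in olr_records:
        groups[group_key(r)][1].append(r)

    matched, unmatched = [], []
    for key, (lm_side, olr_side) in groups.items():
        remaining = list(olr_side)
        for lm_rec in lm_side:
            # Greedy: take the first remaining OLR record whose row
            # payload equals the LogMiner one.
            hit = next((o for o in remaining
                        if o.get('after') == lm_rec.get('after')), None)
            if hit is not None:
                remaining.remove(hit)
                matched.append((lm_rec, hit))
            else:
                unmatched.append(('lm', key, lm_rec))
        unmatched.extend(('olr', key, o) for o in remaining)
    return matched, unmatched
```

Because matching happens inside a group, two nodes emitting the same transaction's events in different orders still pair up cleanly.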
Streaming Kafka + SQLite architecture replacing the HTTP receiver + JSONL file comparison: `event_id`-based set comparison, continuous validation, scalable to arbitrary run duration.
- PL/SQL fuzz workload with a globally unique event_id per CDC event
- Kafka broker (KRaft, single node) replaces the HTTP receiver
- Python consumer: Kafka → SQLite (event_id as PK, raw JSON)
- Python validator: sorted event_id walk, continuous comparison
- LOB known issues (olr#26, olr#10) classified separately
- fuzz-test.sh orchestrator for RAC

Current status: framework operational, surfacing real OLR issues:
- 151 non-LOB phantom events across all table types (~1% rate)
- 28 known LOB issues
- Non-LOB investigation needed in OLR C++ (deferCommittedTransactions)
Sync plan with actual implementation: global monotonic event_id (not per-table prefix), SEED event_id for seed data, current findings (151 non-LOB phantoms), phase status tracking.
- Remove soak-test.sh (replaced by fuzz-test.sh)
- CLAUDE.md: never create PRs/issues against upstream (bersler/OpenLogReplicator)
- olr-config.json: bump max-mb from 1024 to 2048 for fuzz workloads
- Validator uses per-node min(lm, olr) frontier instead of global min.
Prevents false positives from pipeline drain timing differences.
- event_id format changed to N{node}_{seq:08d} (global monotonic).
Sorts chronologically, enables correct watermark advancement.
- Seed data uses event_id='SEED' (skipped by consumer).
- DELETE no longer pre-updates event_id (avoids extra UPDATE event).
- Reduced false positives from 146 to 17 (all genuine OLR phantom
committed transactions on non-LOB tables).
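The event_id scheme above can be illustrated in Python (the real generator lives in the PL/SQL workload; the function names here are hypothetical):

```python
def make_event_id(node: int, seq: int) -> str:
    # Zero-padded sequence keeps lexicographic order equal to numeric
    # order, so a plain string sort of event_ids is chronological per
    # node (and groups by node across nodes).
    return f"N{node}_{seq:08d}"

SEED_EVENT_ID = "SEED"  # seed rows carry this marker and are skipped

def should_validate(event_id: str) -> bool:
    # The consumer drops seed-data events before they reach SQLite.
    return event_id != SEED_EVENT_ID
```

Without the padding, `"N1_9"` sorts after `"N1_10"` as a string, which would break watermark advancement; the fixed-width `{seq:08d}` avoids that.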
- Restructure fuzz-test.sh into up/run/status/validate/logs/down actions
- No auto-cleanup: containers stay up for investigation
- Single Kafka topic per adapter via RegexRouter (preserves event order)
- Simplify validator: global frontier instead of per-node
- Consumer subscribes to fixed topic names (lm-events, olr-events)
- Validator runs only after consumer drains (via 'validate' profile)
- Per-node frontier restored: min(lm, olr) per N{x} prefix
- Single Kafka topic via RegexRouter (not ByLogicalTableRouter)
- fuzz-test.sh actions: up/run/status/validate/logs/down
Result: 26,867 events validated, 0 mismatches, 39 known LOB issues.
Non-LOB accuracy: 100%.
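The per-node `min(lm, olr)` frontier can be sketched as follows, assuming event_ids in the `N{node}_{seq:08d}` format; this is an illustrative reduction, not the validator's actual code:

```python
from collections import defaultdict

def node_of(event_id: str) -> str:
    # "N1_00000042" -> "N1"
    return event_id.split('_', 1)[0]

def per_node_frontiers(lm_ids, olr_ids):
    """Validation frontier per node: min(max id seen by LogMiner,
    max id seen by OLR). Events at or below the frontier are safe
    to compare; anything above may still be draining through one
    of the two pipelines, so comparing it would be a false positive."""
    lm_max, olr_max = defaultdict(str), defaultdict(str)
    for eid in lm_ids:
        lm_max[node_of(eid)] = max(lm_max[node_of(eid)], eid)
    for eid in olr_ids:
        olr_max[node_of(eid)] = max(olr_max[node_of(eid)], eid)
    # Only nodes seen on both sides have a meaningful frontier.
    return {n: min(lm_max[n], olr_max[n])
            for n in lm_max.keys() & olr_max.keys()}
```

Keeping the frontier per node is what prevents one RAC node's drain lag from starving or falsely failing the other.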
📝 Walkthrough

Adds a streaming Oracle RAC fuzz-testing framework: a PL/SQL workload emits event-tagged DML, two Debezium adapters (LogMiner and OLR) publish to Kafka, a Python consumer persists Debezium JSON into SQLite, and a Python validator reconciles events by event_id; orchestration, configs, Docker, and helper scripts added; legacy soak test removed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Admin
    participant RAC as "Oracle RAC\n(FUZZ_WKL)"
    participant DLM as "Debezium\n(LogMiner)"
    participant DOLR as "Debezium\n(OLR)"
    participant KF as "Kafka Broker"
    participant KC as "Kafka Consumer\n(Python)"
    participant SQLITE as "SQLite DB"
    participant VAL as "Validator\n(Python)"
    Admin->>RAC: Invoke FUZZ_WKL.run()\n(generate event_id)
    RAC->>DLM: Produce redo logs
    RAC->>DOLR: Produce redo logs
    DLM->>KF: Publish lm-events
    DOLR->>KF: Publish olr-events
    KC->>KF: Subscribe lm-events & olr-events
    KC->>KC: Extract event_id, seq\nbatch and insert into SQLite
    KC->>SQLITE: INSERT batch (lm_events / olr_events)
    SQLITE-->>KC: ACK writes
    VAL->>SQLITE: Poll lm_events & olr_events
    VAL->>VAL: Compute per-node frontiers\nmerge LOB fragments, match groups
    VAL->>Admin: Emit PASS/FAIL and diagnostics
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (1 warning)
Actionable comments posted: 12
🧹 Nitpick comments (2)
CLAUDE.md (1)
1-7: Prefer AGENTS.md as the canonical policy source to avoid drift. Given `AGENTS.md` is already referenced, consider keeping the full GitHub policy in `AGENTS.md` and leaving only a short pointer here. This reduces duplicate governance text across files. Based on learnings: maintain AGENTS.md as a single source of truth for agent architecture and design decisions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` around lines 1 - 7, Replace the duplicated "GitHub Policy" text in CLAUDE.md with a short pointer to AGENTS.md and move the full policy text into AGENTS.md so AGENTS.md becomes the canonical source; specifically, in CLAUDE.md replace the "## GitHub Policy" section with a single-line reference like "See AGENTS.md for GitHub policy" and ensure the full policy content is present and up-to-date under a "GitHub Policy" heading in AGENTS.md.

tests/dbz-twin/rac/docker-compose-fuzz.yaml (1)
15-17: Make the one-partition assumption explicit. Kafka only guarantees total order within a partition, and total order for a topic requires that the topic have a single partition. Auto-created topics currently inherit the broker default partition count, so the ordering guarantee this harness relies on is implicit rather than enforced here. (kafka.apache.org)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/docker-compose-fuzz.yaml` around lines 15 - 17, The compose file relies on single-partition topics but doesn't enforce that; set the broker default partition count or explicitly create topics with one partition so ordering is guaranteed. Add KAFKA_NUM_PARTITIONS: 1 (to make auto-created topics single-partition) or add a KAFKA_CREATE_TOPICS entry that declares the specific topic(s) with partitions=1; reference the existing KAFKA_AUTO_CREATE_TOPICS_ENABLE setting and update it together with KAFKA_NUM_PARTITIONS or KAFKA_CREATE_TOPICS to ensure the one-partition assumption is explicit and enforced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/dbz-twin/compare-debezium.py`:
- Around line 29-31: The EXCLUDED_TABLES set currently drops 'FUZZ_LOB' events
at parse time which hides them from the offline comparator; remove 'FUZZ_LOB'
from EXCLUDED_TABLES so the parser retains those events, and instead
implement/downgrade handling in the comparator to mark diffs from table FUZZ_LOB
as a known LOB issue (e.g., emit/report them as "lob_known") so offline
validation matches the RAC consumer behavior; also apply the same change where
EXCLUDED_TABLES is defined/used later (the occurrences around lines 73-75) to
ensure consistency.
In `@tests/dbz-twin/rac/docker-compose-fuzz.yaml`:
- Around line 51-66: The consumer service is using the base image
python:3.12-slim and installs kafka-python-ng at runtime from PyPI (causing
non-reproducible tests and network dependency); replace this by building a
reproducible image that pins dependencies: add a requirements.txt (including
kafka-python-ng==<pinned-version>), add a Dockerfile that copies
kafka-consumer.py and requirements.txt and runs pip install --no-cache-dir -r
requirements.txt, and update the docker-compose service "consumer" to use the
built image name instead of python:3.12-slim (keep container_name, network_mode,
depends_on, command, environment, and volumes as needed but remove any runtime
pip install logic in the script or compose).
In `@tests/dbz-twin/rac/FUZZ-TEST.md`:
- Around line 97-102: Clarify and make consistent whether the `fuzz-test.sh
down` action deletes the `fuzz-data` volume: update the documentation and/or the
script so both the command table and the SQLite inspection snippet match.
Specifically, decide if `fuzz-test.sh down` should preserve or remove
`fuzz-data` (the `down` entry in the table), then either (A) change the script
to keep `fuzz-data` and update the `./fuzz-test.sh down` description to “stop
containers but preserve fuzz-data” and keep the SQLite inspection snippet as-is,
or (B) leave/remove `fuzz-data` in `./fuzz-test.sh down` and update the SQLite
section and investigation snippet to explain that the DB is deleted and show how
to export/back up before running `down`. Ensure references to `fuzz-test.sh
down`, `fuzz-data`, and the SQLite inspection example are updated so the doc is
internally consistent.
In `@tests/dbz-twin/rac/fuzz-test.sh`:
- Around line 240-244: The test harness currently swallows failures by using
"wait $pid1 || true" and "wait $pid2 || true" and by grepping for 'FUZZ_DONE:'
without failing; change this so failures propagate: remove the "|| true" for
wait $pid1 and wait $pid2 (so non-zero exit codes bubble up) and after grepping
for 'FUZZ_DONE:' on "$work_dir/fuzz_out1.log" and "$work_dir/fuzz_out2.log"
explicitly check the grep result and exit non-zero (fail the script) if either
node did not emit FUZZ_DONE; ensure the final "Workload complete" and subsequent
validation only run when both waits and both FUZZ_DONE checks succeed.
- Line 32: The script currently enables errexit via the "set -euo pipefail" line
which causes the validator failure to abort the script before the existing
"exit_code=$?" assignment and the subsequent OLR error scan and PASS/FAIL banner
run; remove the -e flag (change to "set -uo pipefail") so failures do not abort
immediately, or alternatively preserve errexit but run the validator invocation
with a safe wrapper (e.g., "validator ... || true") so you can capture its exit
code into exit_code=$? and still execute the OLR error scan and PASS/FAIL banner
logic.
In `@tests/dbz-twin/rac/kafka-consumer.py`:
- Around line 161-163: The lm_seq and olr_seq maps currently retain an entry per
event_id forever, causing unbounded memory growth; change these to a bounded
cache or explicitly evict entries when an event is complete: replace the plain
dict usage for lm_seq and olr_seq with a capped LRU/TTL structure (e.g., an
OrderedDict or cachetools LRU/TTL cache) or add explicit deletion of
lm_seq[event_id] and olr_seq[event_id] when you detect the final
fragment/sequence for that event (the same fix must be applied to the other
per-event maps used later in the file). Ensure all places that increment or read
lm_seq/olr_seq perform eviction on completion or miss handling so entries do not
accumulate indefinitely.
In `@tests/dbz-twin/rac/perf/fuzz-workload.sql`:
- Around line 391-410: The update/delete procs (e.g., do_update_scalar)
currently derive v_max_id from the global g_next_id stream and pick v_target
using g_node_id/g_next_id which picks IDs that may belong to other tables or be
nonexistent; change the selection logic to choose targets from the actual target
table (olr_test.FUZZ_SCALAR) instead of g_next_id: query the table’s current id
range or pick an existing id (e.g., SELECT MAX(id)/MIN(id) or sample a row) and
use that for v_target so updates/deletes hit real rows; apply the same change to
the other affected procedures referenced (the other UPDATE/DELETE procs around
the given ranges).
- Around line 504-544: do_random_op currently only issues INSERTs for FUZZ_WIDE,
FUZZ_PART, FUZZ_NOPK, FUZZ_MAXSTR, and FUZZ_INTERVAL, so those schemas never
receive UPDATE/DELETE workload; update do_random_op to apply the same
op-selection logic used for FUZZ_SCALAR (use v_op_dice to branch into
do_insert_*, do_update_*, do_delete_*, and bulk variants where applicable) for
the branches that now call only do_insert_wide, do_insert_part, do_insert_nopk,
do_insert_maxstr, and do_insert_interval so each of those table-specific
functions (do_insert_wide, do_update_wide, do_delete_wide, etc.) get exercised
and non-insert CDC coverage is produced.
In `@tests/dbz-twin/rac/validator.py`:
- Line 116: The validator is using a single global cursor_event_id as the
lower-bound for both RAC node queries, which causes one node to be starved once
the cursor advances to the other node's event_ids; change the logic to maintain
a separate frontier/cursor per node (e.g., cursor_event_id_by_node or similar
keyed by node id) and use that per-node value when constructing the "WHERE
event_id > ?" predicate for each node query and when advancing frontiers after
validation; update all places referencing cursor_event_id (including the query
builders and the advancement at the lexicographic frontier) to read/update the
per-node cursor instead.
- Around line 246-251: The current comparison only calls
compare_values(lm_after, olr_after, lm_table) and ignores the parsed
before-images (lm_before, olr_before), so DELETEs and UPDATE before-images are
never validated; update the logic after merge_lob_records to also call
compare_values for the before-images (compare_values(lm_before, olr_before,
lm_table)) and merge/accumulate those diffs into the existing diffs result
(preserving whatever structure compare_values returns), and ensure you still
handle empty dicts (skip or treat as expected) so DELETE events validate the
deleted-row payload via lm_before vs olr_before while UPDATE validates both
before and after images.
- Around line 275-277: Remove the undefined completion guard that uses lm_max
and olr_max: the idle-time break should not reference lm_max or olr_max (they
are never initialized), so delete the "and frontier >= max(lm_max, olr_max)"
portion and let the idle-time condition rely on last_new_events and IDLE_TIMEOUT
(using frontier and cursor_event_id as before); update the branch that checks
time.time() - last_new_events > IDLE_TIMEOUT to simply break without referencing
lm_max/olr_max, keeping use of frontier, cursor_event_id, last_new_events, and
IDLE_TIMEOUT intact.
In `@tests/design/FUZZ-TEST-PLAN.md`:
- Around line 11-24: Update the fuzz test plan to match the landed
implementation: replace the topic patterns "logminer.*" and "olr.*" and any
mention of regex subscription with the actual topic names "lm-events" and
"olr-events" and note that consumers/configs use explicit topic subscriptions;
remove or update the blocking note about the "~1% non-LOB phantom" investigation
and instead state the PR summary reports 0 non-LOB mismatches; ensure all other
occurrences that still reference the old patterns (e.g., sections describing the
PL/SQL fuzz workload, Kafka Consumer (Python), Validator (Python)) are updated
to the new topics/config and reflect current validation status.
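Several of the validator comments above revolve around comparing both row images. A minimal illustration of before-plus-after comparison, using hypothetical record shapes (`before`/`after` column dicts) rather than the validator's real `compare_values` signature:

```python
def diff_cols(lm_cols, olr_cols, section):
    """Column-level diffs for one image ('before' or 'after')."""
    diffs = []
    for col in set(lm_cols) | set(olr_cols):
        va, vb = lm_cols.get(col), olr_cols.get(col)
        if va != vb:
            diffs.append((section, col, va, vb))
    return diffs

def diff_event(lm_rec, olr_rec):
    # UPDATE validates both images; DELETE carries only a before-image
    # and INSERT only an after-image -- the empty dicts make the
    # missing image a no-op rather than a spurious mismatch.
    diffs = diff_cols(lm_rec.get('after') or {}, olr_rec.get('after') or {}, 'after')
    diffs += diff_cols(lm_rec.get('before') or {}, olr_rec.get('before') or {}, 'before')
    return diffs
```

Comparing only the after-image, as the review notes, silently skips the deleted-row payload of every DELETE.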
---
Nitpick comments:
In `@CLAUDE.md`:
- Around line 1-7: Replace the duplicated "GitHub Policy" text in CLAUDE.md with
a short pointer to AGENTS.md and move the full policy text into AGENTS.md so
AGENTS.md becomes the canonical source; specifically, in CLAUDE.md replace the
"## GitHub Policy" section with a single-line reference like "See AGENTS.md for
GitHub policy" and ensure the full policy content is present and up-to-date
under a "GitHub Policy" heading in AGENTS.md.
In `@tests/dbz-twin/rac/docker-compose-fuzz.yaml`:
- Around line 15-17: The compose file relies on single-partition topics but
doesn't enforce that; set the broker default partition count or explicitly
create topics with one partition so ordering is guaranteed. Add
KAFKA_NUM_PARTITIONS: 1 (to make auto-created topics single-partition) or add a
KAFKA_CREATE_TOPICS entry that declares the specific topic(s) with partitions=1;
reference the existing KAFKA_AUTO_CREATE_TOPICS_ENABLE setting and update it
together with KAFKA_NUM_PARTITIONS or KAFKA_CREATE_TOPICS to ensure the
one-partition assumption is explicit and enforced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d6665f60-356f-4fea-b666-d36280efcf7d
📒 Files selected for processing (13)
- CLAUDE.md
- tests/dbz-twin/compare-debezium.py
- tests/dbz-twin/rac/FUZZ-TEST.md
- tests/dbz-twin/rac/config/application-logminer-kafka.properties
- tests/dbz-twin/rac/config/application-olr-kafka.properties
- tests/dbz-twin/rac/config/olr-config.json
- tests/dbz-twin/rac/docker-compose-fuzz.yaml
- tests/dbz-twin/rac/fuzz-test.sh
- tests/dbz-twin/rac/kafka-consumer.py
- tests/dbz-twin/rac/perf/fuzz-workload.sql
- tests/dbz-twin/rac/soak-test.sh
- tests/dbz-twin/rac/validator.py
- tests/design/FUZZ-TEST-PLAN.md
💤 Files with no reviewable changes (1)
- tests/dbz-twin/rac/soak-test.sh
- fuzz-test.sh: capture validator exit code without set -e abort
- validator.py: per-node watermark cursors instead of global cursor
- validator.py: remove undefined lm_max/olr_max completion guard
- validator.py: compare before-images for UPDATE/DELETE events
- compare-debezium.py: make FUZZ_LOB exclusion a --exclude-tables CLI flag
- docker-compose-fuzz.yaml: build consumer image with pinned deps
- fuzz-test.sh: propagate RAC workload failures instead of || true
- kafka-consumer.py: periodic trimming of per-event seq maps
- fuzz-workload.sql: per-table ID tracking for UPDATE/DELETE targeting
- fuzz-workload.sql: add UPDATE/DELETE for WIDE, PART, MAXSTR, INTERVAL
- FUZZ-TEST.md: clarify down deletes fuzz-data volume
- FUZZ-TEST-PLAN.md: update topics, status, watermark description
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/dbz-twin/compare-debezium.py (1)
150-157: ⚠️ Potential issue | 🟠 Major — Only merge LOB fragments from the same transaction.

`_can_merge_lob()` still ignores `txId`, so two normal back-to-back updates on the same row can be folded into one logical event. After that, line 156 keeps the first record's `txId`, so the later transaction disappears before the new `(txId, table, op)` matcher ever sees it.

🐛 Fix sketch

```diff
 def _can_merge_lob(prev, curr):
     """Check if curr is a LOB-split continuation of prev."""
     if prev['table'] != curr['table']:
         return False
+    if prev.get('txId', '') != curr.get('txId', ''):
+        return False
     if curr['op'] != 'UPDATE':
         return False
     if prev['op'] not in ('INSERT', 'UPDATE'):
         return False
     return _same_row(prev.get('after', {}), curr.get('after', {}))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/compare-debezium.py` around lines 150 - 157, The merge logic is folding updates across transactions because _can_merge_lob ignores txId; only LOB fragments from the same transaction should merge. Modify _can_merge_lob to check prev.get('txId') == curr.get('txId') (and treat missing values consistently) and return False when txIds differ, so _do_merge (which currently copies prev['txId']) only merges events from the same transaction; no other changes to _do_merge required.
♻️ Duplicate comments (2)
tests/dbz-twin/rac/fuzz-test.sh (1)
244-251: ⚠️ Potential issue | 🟠 Major — Treat missing `FUZZ_DONE` output as a failed workload.

The wait-status fix helps, but `rc == 0` still doesn't prove the package actually ran to completion. If a node exits cleanly before emitting `FUZZ_DONE`, this harness will keep going and validate a truncated stream.

🐛 Fix sketch

```diff
-echo "  Node 1: $(grep 'FUZZ_DONE:' "$work_dir/fuzz_out1.log" || echo 'no output')"
-echo "  Node 2: $(grep 'FUZZ_DONE:' "$work_dir/fuzz_out2.log" || echo 'no output')"
+local done1 done2
+done1=$(grep 'FUZZ_DONE:' "$work_dir/fuzz_out1.log" || true)
+done2=$(grep 'FUZZ_DONE:' "$work_dir/fuzz_out2.log" || true)
+echo "  Node 1: ${done1:-no output}"
+echo "  Node 2: ${done2:-no output}"
-if [[ $rc1 -ne 0 || $rc2 -ne 0 ]]; then
+if [[ $rc1 -ne 0 || $rc2 -ne 0 || -z "$done1" || -z "$done2" ]]; then
   echo "ERROR: fuzz workload failed on one or more RAC nodes (rc1=$rc1, rc2=$rc2)" >&2
   echo "  Check logs: $work_dir/fuzz_out1.log, $work_dir/fuzz_out2.log" >&2
   exit 1
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/fuzz-test.sh` around lines 244 - 251, The script currently only checks rc1/rc2 but doesn't verify the logs contain the completion marker; update the end of the harness (around the block that prints fuzz_out1.log/fuzz_out2.log and checks rc1/rc2) to explicitly grep each "$work_dir/fuzz_out1.log" and "$work_dir/fuzz_out2.log" for the string "FUZZ_DONE:" and treat a missing match as a failure (set a non-zero flag or rcX and print an error including which node missed FUZZ_DONE), then fail the run (exit 1) if either grep fails or the existing rc1/rc2 are non-zero. Ensure you reference and update the variables rc1 and rc2 and the checks that echo the log locations so both exit status and presence of "FUZZ_DONE:" are enforced.

tests/dbz-twin/rac/kafka-consumer.py (1)
221-229: ⚠️ Potential issue | 🟠 Major — The trim pass still leaks every multi-fragment event.

Lines 225-229 only evict entries with `v <= 1`. Any `event_id` that ever reaches `seq == 2` or higher stays in `lm_seq`/`olr_seq` forever, so long LOB-heavy runs are still O(total split events) in memory.
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/kafka-consumer.py` around lines 221 - 229, The trim pass is deleting the wrong entries: it currently removes entries with v <= 1 which wrongly keeps any event that reached seq >= 2 forever; update the deletion predicate in the loop over lm_seq and olr_seq so we only evict entries with seq == 0 (i.e., delete where v <= 0 or v == 0) and retain entries with seq > 0 (active LOB splits). Locate the for seq_map in (lm_seq, olr_seq) block and change the to_delete comprehension to target v <= 0 (or v == 0) instead of v <= 1.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/dbz-twin/rac/Dockerfile.consumer`:
- Around line 1-2: The image currently installs dependencies as root (see RUN
pip install --no-cache-dir kafka-python-ng==2.2.3) and runs as root by default;
update Dockerfile.consumer to create an unprivileged user/group (e.g.,
addgroup/adduser or groupadd/useradd), create the writable mount directory used
by the consumer (make a mkdir for the shared SQLite path), chown that directory
to the new user, perform package installation during build as root, and then
switch to the unprivileged user with USER <username> at the end so the container
runs without root privileges.
In `@tests/dbz-twin/rac/validator.py`:
- Around line 84-99: The compare_values function currently always ignores
Debezium "unavailable" markers, which hides mismatches for after images; change
compare_values(lm_after, olr_after, table) to accept an explicit flag (e.g.,
skip_unavailable or is_before) defaulting to False, use that flag in the loop so
you only continue on is_unavailable(va) or is_unavailable(vb) when
skip_unavailable/is_before is True, and update the callers that compare "before"
images to pass True (leave callers for "after" images using the default False)
so unavailable markers are only suppressed for before-image comparisons.
- Around line 141-168: The idle-timeout branch currently breaks out before
including one-sided tail events because node_frontiers was built with
min(lm_max, olr_max); instead of breaking when time.time() - last_new_events >
IDLE_TIMEOUT, expand the frontier to include any remaining tail events by
recomputing per-node frontiers as max(lm_node_max, olr_node_max) (or otherwise
unioning lm and olr tails) so final validation uses the widened frontiers;
update the code that references node_frontiers, cursor_by_node, last_new_events
and IDLE_TIMEOUT to recompute and use the max-based frontiers before doing the
"Final validation pass" rather than simply breaking.
---
Outside diff comments:
In `@tests/dbz-twin/compare-debezium.py`:
- Around line 150-157: The merge logic is folding updates across transactions
because _can_merge_lob ignores txId; only LOB fragments from the same
transaction should merge. Modify _can_merge_lob to check prev.get('txId') ==
curr.get('txId') (and treat missing values consistently) and return False when
txIds differ, so _do_merge (which currently copies prev['txId']) only merges
events from the same transaction; no other changes to _do_merge required.
---
Duplicate comments:
In `@tests/dbz-twin/rac/fuzz-test.sh`:
- Around line 244-251: The script currently only checks rc1/rc2 but doesn't
verify the logs contain the completion marker; update the end of the harness
(around the block that prints fuzz_out1.log/fuzz_out2.log and checks rc1/rc2) to
explicitly grep each "$work_dir/fuzz_out1.log" and "$work_dir/fuzz_out2.log" for
the string "FUZZ_DONE:" and treat a missing match as a failure (set a non-zero
flag or rcX and print an error including which node missed FUZZ_DONE), then fail
the run (exit 1) if either grep fails or the existing rc1/rc2 are non-zero.
Ensure you reference and update the variables rc1 and rc2 and the checks that
echo the log locations so both exit status and presence of "FUZZ_DONE:" are
enforced.
In `@tests/dbz-twin/rac/kafka-consumer.py`:
- Around line 221-229: The trim pass is deleting the wrong entries: it currently
removes entries with v <= 1 which wrongly keeps any event that reached seq >= 2
forever; update the deletion predicate in the loop over lm_seq and olr_seq so we
only evict entries with seq == 0 (i.e., delete where v <= 0 or v == 0) and
retain entries with seq > 0 (active LOB splits). Locate the for seq_map in
(lm_seq, olr_seq) block and change the to_delete comprehension to target v <= 0
(or v == 0) instead of v <= 1.
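One way to bound the per-event seq maps flagged above is an LRU-capped structure. This is a sketch built on the stdlib `OrderedDict` with an assumed cap, not the consumer's actual fix:

```python
from collections import OrderedDict

class BoundedSeqMap:
    """LRU-capped map of event_id -> fragment count, so long runs
    don't accumulate one entry per split event forever."""

    def __init__(self, max_entries: int = 10000):
        self.max_entries = max_entries
        self._d = OrderedDict()

    def bump(self, event_id: str) -> int:
        # Increment the fragment count and mark the id most recently used.
        self._d[event_id] = self._d.get(event_id, 0) + 1
        self._d.move_to_end(event_id)
        # Evict least-recently-used entries once over the cap; an
        # evicted id that reappears simply restarts its count.
        while len(self._d) > self.max_entries:
            self._d.popitem(last=False)
        return self._d[event_id]
```

A TTL cache (e.g. `cachetools.TTLCache`) would work equally well; the essential property is that memory is bounded regardless of run duration.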
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7d035d73-7466-4000-94a9-2c856d37f111
📒 Files selected for processing (9)
- tests/dbz-twin/compare-debezium.py
- tests/dbz-twin/rac/Dockerfile.consumer
- tests/dbz-twin/rac/FUZZ-TEST.md
- tests/dbz-twin/rac/docker-compose-fuzz.yaml
- tests/dbz-twin/rac/fuzz-test.sh
- tests/dbz-twin/rac/kafka-consumer.py
- tests/dbz-twin/rac/perf/fuzz-workload.sql
- tests/dbz-twin/rac/validator.py
- tests/design/FUZZ-TEST-PLAN.md
✅ Files skipped from review due to trivial changes (2)
- tests/dbz-twin/rac/docker-compose-fuzz.yaml
- tests/dbz-twin/rac/FUZZ-TEST.md
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/dbz-twin/rac/perf/fuzz-workload.sql
- Dockerfile.consumer: run as non-root user
- validator.py: only skip LOB unavailable markers in before images
- validator.py: widen frontier on idle timeout to detect missing events
🧹 Nitpick comments (2)
tests/dbz-twin/rac/validator.py (2)
84-84: Unused parameter `table` can be removed.

The `table` parameter is declared but never used in the function body. Consider removing it to avoid confusion.

♻️ Suggested fix

```diff
-def compare_values(lm_cols, olr_cols, table, section='after'):
+def compare_values(lm_cols, olr_cols, section='after'):
```

Also update the call sites at lines 278-279:

```diff
-    diffs = compare_values(lm_after, olr_after, lm_table, 'after')
-    diffs.extend(compare_values(lm_before, olr_before, lm_table, 'before'))
+    diffs = compare_values(lm_after, olr_after, 'after')
+    diffs.extend(compare_values(lm_before, olr_before, 'before'))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/validator.py` at line 84, Remove the unused parameter from compare_values by deleting the `table` argument in the function signature for compare_values(lm_cols, olr_cols, section='after') and update every call site that currently passes a table argument to stop passing it (adjust calls that reference compare_values to only provide lm_cols, olr_cols, and optionally section). Ensure you update imports/uses in this module (and any tests) so the function invocation matches the new signature to avoid TypeErrors.
108-108: Remove extraneous `f` prefix from f-strings without placeholders.

Static analysis correctly identifies f-strings without placeholders on lines 108, 313, and 327.

♻️ Suggested fix

```diff
-    print(f"Validator starting", flush=True)
+    print("Validator starting", flush=True)
-    print(f"  Fuzz Test Validation Summary", flush=True)
+    print("  Fuzz Test Validation Summary", flush=True)
-    print(f"\n  RESULT: PASS", flush=True)
+    print("\n  RESULT: PASS", flush=True)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/validator.py` at line 108, There are f-strings used without any interpolation (no placeholders) — remove the unnecessary leading "f" from those string literals; specifically update the print call that logs "Validator starting" (print(f"Validator starting", flush=True)) and the two other occurrences flagged by static analysis (the f-strings at the other print/log sites containing no { } placeholders) so they become plain string literals (e.g., print("Validator starting", flush=True)); ensure no other occurrences of f-strings without placeholders remain in the same module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/dbz-twin/rac/validator.py`:
- Line 84: Remove the unused parameter from compare_values by deleting the
`table` argument in the function signature for compare_values(lm_cols, olr_cols,
section='after') and update every call site that currently passes a table
argument to stop passing it (adjust calls that reference compare_values to only
provide lm_cols, olr_cols, and optionally section). Ensure you update
imports/uses in this module (and any tests) so the function invocation matches
the new signature to avoid TypeErrors.
- Line 108: There are f-strings used without any interpolation (no placeholders)
— remove the unnecessary leading "f" from those string literals; specifically
update the print call that logs "Validator starting" (print(f"Validator
starting", flush=True)) and the two other occurrences flagged by static analysis
(the f-strings at the other print/log sites containing no { } placeholders) so
they become plain string literals (e.g., print("Validator starting",
flush=True)); ensure no other occurrences of f-strings without placeholders
remain in the same module.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e200631f-e77e-41e6-a677-6e53a5ecdaf7
📒 Files selected for processing (2)
- tests/dbz-twin/rac/Dockerfile.consumer
- tests/dbz-twin/rac/validator.py
✅ Files skipped from review due to trivial changes (1)
- tests/dbz-twin/rac/Dockerfile.consumer
Summary
Streaming fuzz test framework for validating OLR data accuracy under randomized workloads on Oracle RAC.
Architecture: Oracle RAC → PL/SQL fuzz workload → {LogMiner, OLR} → Debezium → Kafka → Python consumer → SQLite → Python validator
- Globally unique `event_id` per CDC event
- SQLite store (`event_id` as PK, raw JSON)

Changes:
- `fuzz-test.sh` with subcommands (up/run/status/validate/logs/down)
- `kafka-consumer.py`, `validator.py`, `docker-compose-fuzz.yaml`
- `fuzz-workload.sql` with event_id on all tables
- `FUZZ-TEST.md` documentation
- `soak-test.sh` removed (replaced by fuzz-test.sh)
- `compare-debezium.py` with content-based matching for RAC
- `CLAUDE.md` with GitHub policy
- `olr-config.json` max-mb 1024 → 2048

Test Results (5-minute run)
Test plan
Summary by CodeRabbit
Tests
Documentation
Chores