test: add Debezium OLR vs LogMiner performance and validation framework #9
Conversation
Swingbench-based performance test comparing Debezium with the OLR adapter vs the LogMiner adapter on Oracle RAC. Measures throughput and latency under sustained OLTP load.

Key results (16GB RAC VM, Debug build):
- LogMiner: stable at ~300 TPS, diverges at 600 TPS
- OLR: stable up to 1700 TPS (VM-limited, ceiling not reached)
- OLR latency 11x lower under sustained load

Includes:
- perf/run.sh: automated test orchestrator
- perf/docker-compose.yaml: Debezium services for perf testing
- perf/RESULTS.md: detailed benchmark results
- Enhanced debezium-receiver.py: /metrics endpoint with throughput and latency percentiles (p50/p95/p99)
- PL/SQL DML generator (alternative to Swingbench)
- Java 21 in mise.toml (required by Swingbench)
Add step 3.4 to DEPLOY.md to disable firewalld. The VM runs on a host-only libvirt network where the firewall is unnecessary and blocks Prometheus scraping of node_exporter.
- Dockerfile.swingbench: containerized Swingbench load generator
- validator.py: tails LogMiner + OLR JSONL files, matches events by content in real time, stops Swingbench on mismatch via the Docker socket
- docker-compose.yaml: full stack with receiver, dbz-logminer, dbz-olr, swingbench, validator, prometheus
- VALIDATION-PLAN.md: architecture and design decisions

Designed for long-running (hours/days) continuous validation. On mismatch, Swingbench is stopped immediately to preserve redo logs and event history for offline replay.
📝 Walkthrough

Adds an end-to-end Debezium Oracle RAC performance benchmarking suite (OLR vs LogMiner) plus CMAN deployment steps and a Java 21 tooling pin. Introduces orchestration scripts, container configs, test data generators, a validator, receiver metrics, and test documentation/configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Swingbench as Swingbench (workload)
    participant Oracle as Oracle RAC
    participant CMAN as CMAN/SCAN
    participant OLR as Debezium OLR Adapter
    participant LogMiner as Debezium LogMiner Adapter
    participant Receiver as HTTP Receiver (/metrics & /status)
    participant Validator as Validator (comparator)
    Swingbench->>Oracle: execute DML (inserts/updates/deletes)
    Oracle->>CMAN: redo stream / SCAN discovery
    CMAN->>OLR: provide redo stream (OLR reads)
    CMAN->>LogMiner: provide redo stream (LogMiner reads)
    OLR->>Receiver: POST events (http://.../olr)
    LogMiner->>Receiver: POST events (http://.../logminer)
    Receiver->>Validator: write olr.jsonl / logminer.jsonl
    Validator->>Receiver: tail files, match events, report mismatch/metrics
    Receiver->>Prometheus: expose /metrics (throughput, latencies)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 11
🧹 Nitpick comments (9)
oracle-rac/DEPLOY.md (2)
168-196: Consider adding checksum verification for the node_exporter download.

The installation downloads the binary without verifying its integrity. For production deployments, add checksum validation to prevent supply-chain attacks. Additionally, node_exporter v1.10.2 is available; consider updating from v1.9.0.
🔐 Enhanced installation with checksum verification
```diff
 cd /tmp
+# Download checksums file
+curl -sLO https://github.com/prometheus/node_exporter/releases/download/v1.9.0/sha256sums.txt
 curl -sLO https://github.com/prometheus/node_exporter/releases/download/v1.9.0/node_exporter-1.9.0.linux-amd64.tar.gz
+# Verify checksum
+sha256sum -c --ignore-missing sha256sums.txt
 tar xzf node_exporter-1.9.0.linux-amd64.tar.gz
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@oracle-rac/DEPLOY.md` around lines 168 - 196, The curl download and tar/extract steps for node_exporter (node_exporter-1.9.0.linux-amd64.tar.gz and subsequent tar xzf/ cp operations) lack integrity checks; modify the install sequence to download the release checksum (or .sha256 file) alongside the tarball, verify the tarball with sha256sum (or gpg/asc signature) and abort on mismatch before running tar xzf, and update the download URL and filenames to the newer v1.10.2 release if you choose to bump versions; ensure any temporary checksum files are removed after a successful verify and that the systemctl install steps (creating /etc/systemd/system/node_exporter.service and enabling the service) only run if verification passes.
198-224: Pin cAdvisor to a specific version instead of :latest.

Using :latest can lead to unexpected behavior and version drift. Pin to a specific stable version for reproducibility and predictable deployments.

📌 Pinned version example

```diff
-  gcr.io/cadvisor/cadvisor:latest
+  gcr.io/cadvisor/cadvisor:v0.56.2
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@oracle-rac/DEPLOY.md` around lines 198 - 224, Replace the image tag in the podman run command so it does not use gcr.io/cadvisor/cadvisor:latest; update the image reference in the shown podman run invocation (the line containing gcr.io/cadvisor/cadvisor:latest) to a specific stable cAdvisor tag (for example gcr.io/cadvisor/cadvisor:v0.47.0 or whichever current stable release you’ve validated) so deployments are reproducible and avoid unintended version drift.

tests/sql/environments/rac/debezium/perf/dml-generator.sql (1)
54-59: Comment is misleading — DELETE uses same ID range as UPDATE.

Line 55 says "target an old row" but line 56 uses the identical formula as line 47 for UPDATE. Both target rows uniformly from the same range [v_start_id, v_next_id).

📝 Fix comment or adjust logic

Either update the comment to match the actual behavior:
```diff
         ELSE
-            -- DELETE (10%) — target an old row
+            -- DELETE (10%) — target a random existing row
             v_target_id := v_start_id + MOD(ABS(DBMS_RANDOM.RANDOM), GREATEST(v_next_id - v_start_id, 1));
```

Or, if you truly want to target older rows preferentially, bias the selection differently.
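If biasing toward older rows is the intent, here is a minimal sketch of one approach in Python, mirroring the PL/SQL id-range arithmetic; the function name and the 20% fraction are illustrative assumptions, not part of the generator:

```python
import random

def biased_old_id(start_id, next_id, old_fraction=0.2):
    """Pick a target id skewed toward old rows: sample uniformly from the
    oldest `old_fraction` of the live range [start_id, next_id)."""
    span = max(next_id - start_id, 1)
    # Shrink the sampling window to the oldest slice of the range
    upper = max(int(span * old_fraction), 1)
    return start_id + random.randrange(upper)
```

The same idea translates directly to PL/SQL by shrinking the modulus in the existing MOD(...) expression.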
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/perf/dml-generator.sql` around lines 54 - 59, The comment for the DELETE branch is misleading: it says "target an old row" but DELETE computes v_target_id using the same uniform formula as the UPDATE branch (v_target_id := v_start_id + MOD(ABS(DBMS_RANDOM.RANDOM), GREATEST(v_next_id - v_start_id, 1))) and thus samples the same range [v_start_id, v_next_id). Either update the comment to describe the actual uniform selection behavior for the DELETE that deletes from olr_test.PERF_BENCH, or change the selection logic for v_target_id in the DELETE branch to bias older rows (for example by sampling a smaller upper bound or applying a non-uniform distribution) while keeping the DELETE statement and v_delete_cnt increment intact.

tests/sql/scripts/debezium-receiver.py (1)
37-49: Unbounded memory growth for long-running tests.

The latencies_ms and timestamps lists grow indefinitely. For the hour/day-long validation runs mentioned in the PR objectives, this could consume significant memory (e.g., 1000 eps × 3600 s = 3.6M entries per hour per adapter).

Consider either:
- Keeping only recent entries (e.g., last 60s) for throughput calculation and computing latency percentiles incrementally using a streaming algorithm (t-digest, reservoir sampling).
- Periodically pruning old entries that are no longer needed for the 10s window.
📝 Minimal fix: prune old timestamps
```diff
 def compute_metrics(channel):
     """Compute throughput and latency stats for a channel. Caller holds lock."""
     m = metrics[channel]
     count = state[f'{channel}_count']
     now_ms = time.time() * 1000
+
+    # Prune timestamps older than 60s (only needed for 10s window)
+    cutoff_60s = now_ms - 60000
+    m['timestamps'] = [t for t in m['timestamps'] if t >= cutoff_60s]
+
     result = {
```

Note: This doesn't address latencies, which would require a more sophisticated approach if full percentile accuracy is needed over the entire run.
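For the latencies side, one hedged sketch of bounded storage is to keep (timestamp, latency) pairs in a single deque and prune by age on every insert; the class name and the 60 s retention are assumptions for illustration, and percentiles computed this way cover only the retained window, not the whole run:

```python
import time
from collections import deque

RETENTION_MS = 60_000  # keep only the last 60s of samples

class RollingSamples:
    def __init__(self):
        self.samples = deque()  # (arrival_ms, latency_ms), oldest first

    def record(self, latency_ms, now_ms=None):
        now_ms = time.time() * 1000 if now_ms is None else now_ms
        self.samples.append((now_ms, latency_ms))
        cutoff = now_ms - RETENTION_MS
        # Arrivals are appended in time order, so pruning from the left
        # costs O(number of expired samples) per insert
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def latencies(self):
        return [lat for _, lat in self.samples]
```

A streaming summary such as t-digest would be needed if whole-run percentiles are required, as the comment above notes.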
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sql/scripts/debezium-receiver.py` around lines 37 - 49, The metrics dict's lists (metrics['...']['timestamps'] and metrics['...']['latencies_ms']) grow without bound; modify the code that appends to these lists so it also prunes entries older than the rolling window (e.g., 10s) on each insert: compute a cutoff using the arrival timestamp (or time.time()*1000 for ms), then filter each adapter's timestamps list to keep only values >= cutoff and trim latencies_ms in the same index range (or maintain parallel timestamp->latency pairs and prune by timestamp) so throughput and percentile calculations only use recent data; update any places that read metrics['logminer'] or metrics['olr'] to expect pruned lists.

tests/sql/environments/rac/debezium/perf/validator.py (2)
110-110: Remove extraneous f prefix from string without placeholders.

```diff
-    print(f'Validator starting', flush=True)
+    print('Validator starting', flush=True)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/perf/validator.py` at line 110, The print statement uses an unnecessary f-string: replace the call to print(f'Validator starting', flush=True) with a plain string print('Validator starting', flush=True) to remove the extraneous f prefix in the validator startup message (the print invocation in validator.py).
77-97: Stopping swingbench via raw HTTP over the Unix socket is functional but fragile.

The implementation works but has some robustness considerations:

- Catching broad Exception (line 96) is acceptable here for defensive error handling since this is a cleanup action.
- The hardcoded container name perf-swingbench matches docker-compose.yaml line 41.

Consider adding a timeout to the socket operations to prevent hanging if Docker is unresponsive.
📝 Add socket timeout
```diff
     try:
         sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+        sock.settimeout(5.0)  # 5 second timeout
         sock.connect('/var/run/docker.sock')
```
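The timeout-plus-context-manager shape can be sketched as a standalone helper (the function name is hypothetical; the API version and endpoint match the existing raw-HTTP approach):

```python
import socket

def stop_container(name, sock_path='/var/run/docker.sock', timeout_s=5.0):
    """Stop a container via the Docker Engine HTTP API over a Unix socket.

    settimeout() bounds connect/send/recv so an unresponsive daemon cannot
    hang the validator; the with-block guarantees the socket is closed.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        sock.connect(sock_path)
        request = (
            f'POST /v1.40/containers/{name}/stop HTTP/1.1\r\n'
            'Host: localhost\r\n'
            'Content-Length: 0\r\n'
            '\r\n'
        )
        sock.sendall(request.encode())
        return sock.recv(4096).decode()
```

On timeout, socket.timeout (a subclass of OSError) propagates to the caller, where the existing warning path can handle it.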
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/perf/validator.py` around lines 77 - 97, stop_swingbench currently opens a Unix socket without any timeout which can hang if Docker is unresponsive; after creating the socket in stop_swingbench (the sock = socket.socket(...) line) call sock.settimeout(<seconds>) (e.g. 5s) before connect/send/recv, and ensure the socket is closed in a finally block (or use the socket as a context manager) so resources are released; optionally catch socket.timeout specifically to print a clear warning and then fall back to the existing broad exception handling.

tests/sql/environments/rac/debezium/perf/docker-compose.yaml (1)
13-37: Inconsistent restart policy between adapters.
dbz-olr has restart: unless-stopped (line 29) while dbz-logminer has no restart policy. For consistency in the benchmark environment, consider applying the same policy to both adapters or documenting why they differ.

📝 Add restart policy to dbz-logminer
```diff
   dbz-logminer:
     image: quay.io/debezium/server:3.5.0.Beta1
     container_name: dbz-logminer
     network_mode: host
+    restart: unless-stopped
     depends_on:
       receiver:
         condition: service_started
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/perf/docker-compose.yaml` around lines 13 - 37, The docker-compose services are inconsistent: add the same restart policy to dbz-logminer to match dbz-olr (or explicitly document why they differ); specifically, in the dbz-logminer service block (service name "dbz-logminer") insert the line restart: unless-stopped at the same indentation level as network_mode and container_name so both adapters use the same restart behavior, or alternatively add a comment in the compose file explaining the intentional difference.tests/debezium/PERF-TEST-PLAN.md (1)
20-33: Add language specifier to fenced code block.

The architecture diagram should have a language specifier (e.g., text or plaintext) to satisfy markdown linting rules.

📝 Suggested fix

````diff
-```
+```text
 Oracle RAC (2 nodes)
 └── PL/SQL DML generator (DBMS_SCHEDULER jobs on both nodes)
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/debezium/PERF-TEST-PLAN.md` around lines 20 - 33, The fenced code block starting with ``` and containing "Oracle RAC (2 nodes)" in PERF-TEST-PLAN.md lacks a language specifier; update the opening fence to include a plain-text language token (e.g., change ``` to ```text or ```plaintext) so markdown linting passes, keeping the block content unchanged and ensuring only the opening fence is modified.

tests/sql/environments/rac/debezium/perf/VALIDATION-PLAN.md (1)
11-21: Add language specifier to fenced code block.

The architecture diagram should have a language specifier for consistency with markdown linting rules.
📝 Suggested fix
````diff
-```
+```text
 Oracle RAC (VM)
 └── OLR container (reads redo → TCP:5000)
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/perf/VALIDATION-PLAN.md` around lines 11 - 21, Add a language specifier to the fenced code block containing the architecture diagram: change the opening fence from ``` to ```text for the block that begins with "Oracle RAC (VM)" in VALIDATION-PLAN.md so the diagram is marked as plain text and satisfies markdown linting rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@oracle-rac/DEPLOY.md`:
- Around line 158-166: Replace the "Disable firewall" guidance in DEPLOY.md with
a targeted-firewall approach: instruct users to keep firewalld enabled and add
persistent rules for the required ports instead of stopping/disabling it —
specifically mention opening 9100/tcp (node_exporter), 9101/tcp (cAdvisor), and
1521/tcp (Oracle/CMAN) using firewall-cmd --permanent --add-port=... and then
firewall-cmd --reload; update the section header and example commands so the doc
recommends allowing specific ports rather than disabling firewalld entirely.
In
`@tests/sql/environments/rac/debezium/perf/config/application-logminer.properties`:
- Around line 12-13: Replace the hardcoded IP in the LogMiner Debezium config by
using the SCAN hostname to match the OLR setup: change the property
debezium.source.database.hostname from
192.168.122.130 to the SCAN name (racnodepc1-scan) and keep
debezium.source.database.port=1521; if the IP is intentionally the CMAN proxy
endpoint, add a short inline comment in the same file explaining why the
LogMiner adapter uses the IP while OLR uses SCAN so reviewers understand the
difference.
In `@tests/sql/environments/rac/debezium/perf/config/prometheus.yml`:
- Around line 1-10: The prometheus.yml currently hardcodes the VM IP
(192.168.122.130) for the 'node-exporter' and 'cadvisor' scrape_configs; update
generation or the static file so the target uses a VM_IP placeholder or actual
runtime value: if prometheus.yml is produced by perf/run.sh, modify that script
to substitute ${VM_IP} into the targets for job_name 'node-exporter' and
'cadvisor' (and add an instance label like 'oracle-rac-vm' to static_configs);
if the file is static, replace the literal IP with a placeholder ${VM_IP} and
add labels to static_configs to make the instance explicit.
In `@tests/sql/environments/rac/debezium/perf/dml-generator.sql`:
- Around line 67-73: The elapsed_ms calculation in the DBMS_OUTPUT.PUT_LINE uses
EXTRACT(SECOND FROM (SYSTIMESTAMP - v_start_ts)) which only returns 0–59 and
thus undercounts runs >60s; update the expression that builds ' elapsed_ms=' (in
the DBMS_OUTPUT.PUT_LINE call that references v_start_ts, v_start_ts variable
and v_start_ts-based interval) to compute total elapsed seconds by summing
EXTRACT(DAY), EXTRACT(HOUR), EXTRACT(MINUTE) and EXTRACT(SECOND) from
(SYSTIMESTAMP - v_start_ts) and multiplying by 1000 (or use an equivalent
full-interval arithmetic approach that yields total milliseconds) so elapsed_ms
reports the true total milliseconds for durations >60s.
In `@tests/sql/environments/rac/debezium/perf/run.sh`:
- Around line 37-39: The _poll_metrics function currently swallows failures by
returning '{}' on curl failure; change it to fail loudly instead — remove the
"|| echo '{}'" fallback so that a non-zero curl exit propagates, or explicitly
detect curl's non-zero exit and log the error and exit non-zero; update the same
pattern at the other occurrence that fetches "$RECEIVER_URL/metrics" (the lines
noted in the comment) so receiver metrics unavailability causes the script to
fail rather than produce a silent empty metrics object.
- Around line 11-12: Add a trap so the existing cleanup routine always runs on
failure or interrupt: after the set -euo pipefail line register the cleanup
function used later (the block at lines ~223-227 labeled cleanup) with something
like trap 'cleanup' INT TERM EXIT (or trap cleanup EXIT); this ensures the
cleanup logic for local compose services and the remote OLR container executes
on any error or signal.
- Line 148: The script currently masks Swingbench failures by using "wait
$SB_PID || true"; change this so failures are propagated instead of suppressed —
either remove the "|| true" so the script will exit non‑zero if the Swingbench
background process ($SB_PID) fails, or explicitly check the wait exit status
(capture $? after wait $SB_PID) and call exit with that non‑zero status; update
the wait handling in run.sh around $SB_PID accordingly.
In `@tests/sql/environments/rac/debezium/perf/setup.sql`:
- Around line 7-18: The PERF_BENCH table is created in schema olr_test but
adapters and generators target SOE; fix by making schema names consistent:
either move/CREATE PERF_BENCH in schema SOE (update CREATE TABLE to use
SOE.PERF_BENCH and its PK/ALTER statements) or update adapter configs and
generator to reference olr_test (change debezium property
debezium.source.schema.include.list, olr-config.json "owner", and any references
in application-olr.properties and dml-generator.sql to OLR_TEST/olr_test).
Ensure the schema identifier (olr_test vs SOE) is used consistently across
PERF_BENCH, dml-generator.sql, application-logminer.properties,
application-olr.properties, and olr-config.json so CDC captures the changes.
In `@tests/sql/environments/rac/debezium/perf/validator.py`:
- Around line 148-153: The pending dicts olr_pending and lm_pending currently
store a single (now, line) per event key, causing later duplicate keys to
overwrite earlier ones; change both olr_pending and lm_pending to map each key
to a list (or collections.deque) of entries and update the matching logic in the
blocks that reference olr_pending and lm_pending (the if key in olr_pending ...
del olr_pending[key] ... matched += 1 branch and the symmetric branch around
lines 167-172) to pop a FIFO entry from the list when matching, delete the key
when the list becomes empty, and append new (now, line) tuples to the list when
no match is found so duplicates are tracked and matched correctly.
- Around line 106-108: The argparse flag for stop-on-fail is misconfigured:
parser.add_argument('--stop-on-fail', action='store_true', default=True, ...)
makes the option always True; change the argument so users can disable it—either
switch to an inverse flag (e.g., keep '--stop-on-fail' default=False with
action='store_true' or add '--no-stop-on-fail' using action='store_false') or
use argparse.BooleanOptionalAction (requires Python 3.9+) to support both
--stop-on-fail and --no-stop-on-fail; update the parser.add_argument call and
ensure code that reads args.stop_on_fail continues to work.
In `@tests/sql/scripts/debezium-receiver.py`:
- Around line 165-171: The throughput calculation skips the case len(recent) ==
1 causing a 0 value; in the block around cutoff/recent where throughput_10s_eps
is set, handle the single-event edge case by assigning a throughput of 1 event
per 10 seconds (0.1 eps) or computing 1.0 / 10.0 using now_ms and recent[0] to
derive a 10s window; update the logic that sets result['throughput_10s_eps'] so
it covers len(recent) == 1 (using m['timestamps'], cutoff, recent, and
result['throughput_10s_eps'] identifiers).
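The elapsed-time pitfall flagged above for dml-generator.sql (EXTRACT(SECOND ...) returning only the seconds component) has a direct analogue in Python, shown here purely to illustrate why per-component extraction undercounts; the mapping to Oracle interval semantics is an assumption of the sketch:

```python
from datetime import timedelta

# A 2m05s run: the seconds *component* is 5, but the total elapsed time
# is 125000 ms. Reporting only the component undercounts runs over 60s.
delta = timedelta(minutes=2, seconds=5)

seconds_component = delta.seconds % 60        # analogous to EXTRACT(SECOND ...)
total_ms = int(delta.total_seconds() * 1000)  # what elapsed_ms should report
```

In the PL/SQL fix, the equivalent is summing the DAY/HOUR/MINUTE/SECOND extractions into total seconds before multiplying by 1000.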
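The pending-map change described above for validator.py can be sketched as follows (the function names are illustrative; the real validator keys events by content):

```python
from collections import defaultdict, deque

def make_pending():
    """Map event-key -> FIFO of (arrival, raw_line) entries, so duplicate
    keys queue up instead of overwriting each other."""
    return defaultdict(deque)

def add_pending(pending, key, entry):
    pending[key].append(entry)

def pop_match(pending, key):
    """Pop the oldest pending entry for key, or return None.

    Empty queues are deleted so the map does not accumulate dead keys."""
    q = pending.get(key)
    if not q:
        return None
    entry = q.popleft()
    if not q:
        del pending[key]
    return entry
```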
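And the --stop-on-fail fix can look like the following sketch, assuming Python 3.9+ for argparse.BooleanOptionalAction (which generates both --stop-on-fail and --no-stop-on-fail):

```python
import argparse

parser = argparse.ArgumentParser()
# Default stays True, but users can now actually turn the behavior off.
parser.add_argument('--stop-on-fail', action=argparse.BooleanOptionalAction,
                    default=True,
                    help='stop swingbench on first mismatch')

args = parser.parse_args([])                    # stop_on_fail is True
off = parser.parse_args(['--no-stop-on-fail'])  # stop_on_fail is False
```

Code that reads args.stop_on_fail continues to work unchanged.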
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 133dbe7c-25ec-4443-a14c-1bb54ec3dbf0
📒 Files selected for processing (17)
- mise.toml
- oracle-rac/DEPLOY.md
- tests/debezium/PERF-TEST-PLAN.md
- tests/sql/environments/rac/debezium/config/olr-config.json
- tests/sql/environments/rac/debezium/perf/Dockerfile.swingbench
- tests/sql/environments/rac/debezium/perf/RESULTS.md
- tests/sql/environments/rac/debezium/perf/VALIDATION-PLAN.md
- tests/sql/environments/rac/debezium/perf/config/application-logminer.properties
- tests/sql/environments/rac/debezium/perf/config/application-olr.properties
- tests/sql/environments/rac/debezium/perf/config/prometheus.yml
- tests/sql/environments/rac/debezium/perf/dml-generator.sql
- tests/sql/environments/rac/debezium/perf/docker-compose.yaml
- tests/sql/environments/rac/debezium/perf/olr-config.json
- tests/sql/environments/rac/debezium/perf/run.sh
- tests/sql/environments/rac/debezium/perf/setup.sql
- tests/sql/environments/rac/debezium/perf/validator.py
- tests/sql/scripts/debezium-receiver.py
```sql
BEGIN EXECUTE IMMEDIATE 'DROP TABLE olr_test.PERF_BENCH PURGE'; EXCEPTION WHEN OTHERS THEN IF SQLCODE != -942 THEN RAISE; END IF; END;
/

CREATE TABLE olr_test.PERF_BENCH (
    id NUMBER,
    val VARCHAR2(200),
    node_id NUMBER(1),
    batch_num NUMBER,
    created TIMESTAMP DEFAULT SYSTIMESTAMP,
    CONSTRAINT perf_bench_pk PRIMARY KEY (id, node_id)
);

ALTER TABLE olr_test.PERF_BENCH ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Verify which schema names are used across the performance test configs
echo "=== Schema references in adapter configs ==="
rg -n "SOE|OLR_TEST|olr_test" tests/sql/environments/rac/debezium/perf/ --type json --type properties
```

Repository: rophy/OpenLogReplicator

Length of output: 163
🏁 Script executed:
```bash
#!/bin/bash
# Find the config files mentioned in the review
echo "=== Directory structure ==="
find tests/sql/environments/rac/debezium/perf/ -type f | head -20
echo ""
echo "=== Search for application-logminer.properties ==="
find tests/ -name "*logminer*" -o -name "*.properties" | grep -E "(logminer|properties)" | head -20
echo ""
echo "=== Search for olr-config.json ==="
find tests/ -name "olr-config.json" -o -name "*olr*.json" | head -20
```

Repository: rophy/OpenLogReplicator

Length of output: 1497
🏁 Script executed:
```bash
#!/bin/bash
# Search for schema references in the perf directory without type restrictions
echo "=== All schema references ==="
rg -n "SOE|OLR_TEST|olr_test" tests/sql/environments/rac/debezium/perf/
```

Repository: rophy/OpenLogReplicator

Length of output: 2163
🏁 Script executed:
```bash
#!/bin/bash
# Read the config files to verify schema references
echo "=== application-logminer.properties (around line 19) ==="
cat -n tests/sql/environments/rac/debezium/perf/config/application-logminer.properties | sed -n '15,25p'
echo ""
echo "=== olr-config.json (around line 34) ==="
cat -n tests/sql/environments/rac/debezium/perf/olr-config.json | sed -n '30,40p'
echo ""
echo "=== Full setup.sql (lines 7-18 for context) ==="
cat -n tests/sql/environments/rac/debezium/perf/setup.sql | sed -n '7,18p'
```

Repository: rophy/OpenLogReplicator

Length of output: 1592
Critical: Table schema mismatch with adapter configurations.
The table is created in the olr_test schema (lines 7, 10, 18), but both adapter configurations filter on the SOE schema:
- application-logminer.properties line 19: debezium.source.schema.include.list=SOE
- olr-config.json line 34: "owner": "SOE"
This mismatch means neither adapter will capture changes to olr_test.PERF_BENCH. Additionally, dml-generator.sql performs INSERT, UPDATE, and DELETE operations on olr_test.PERF_BENCH, so these changes will not be replicated.
Fix options
Option 1: Change table to SOE schema
```diff
-BEGIN EXECUTE IMMEDIATE 'DROP TABLE olr_test.PERF_BENCH PURGE'; EXCEPTION WHEN OTHERS THEN IF SQLCODE != -942 THEN RAISE; END IF; END;
+BEGIN EXECUTE IMMEDIATE 'DROP TABLE SOE.PERF_BENCH PURGE'; EXCEPTION WHEN OTHERS THEN IF SQLCODE != -942 THEN RAISE; END IF; END;
 /
-CREATE TABLE olr_test.PERF_BENCH (
+CREATE TABLE SOE.PERF_BENCH (
     id NUMBER,
     val VARCHAR2(200),
     node_id NUMBER(1),
     batch_num NUMBER,
     created TIMESTAMP DEFAULT SYSTIMESTAMP,
     CONSTRAINT perf_bench_pk PRIMARY KEY (id, node_id)
 );
-ALTER TABLE olr_test.PERF_BENCH ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
+ALTER TABLE SOE.PERF_BENCH ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```

Option 2: Update adapter configs to use olr_test schema
Update application-logminer.properties, application-olr.properties, and olr-config.json to filter OLR_TEST instead of SOE.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sql
BEGIN EXECUTE IMMEDIATE 'DROP TABLE SOE.PERF_BENCH PURGE'; EXCEPTION WHEN OTHERS THEN IF SQLCODE != -942 THEN RAISE; END IF; END;
/
CREATE TABLE SOE.PERF_BENCH (
    id NUMBER,
    val VARCHAR2(200),
    node_id NUMBER(1),
    batch_num NUMBER,
    created TIMESTAMP DEFAULT SYSTIMESTAMP,
    CONSTRAINT perf_bench_pk PRIMARY KEY (id, node_id)
);
ALTER TABLE SOE.PERF_BENCH ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/sql/environments/rac/debezium/perf/setup.sql` around lines 7 - 18, The
PERF_BENCH table is created in schema olr_test but adapters and generators
target SOE; fix by making schema names consistent: either move/CREATE PERF_BENCH
in schema SOE (update CREATE TABLE to use SOE.PERF_BENCH and its PK/ALTER
statements) or update adapter configs and generator to reference olr_test
(change debezium property debezium.source.schema.include.list, olr-config.json
"owner", and any references in application-olr.properties and dml-generator.sql
to OLR_TEST/olr_test). Ensure the schema identifier (olr_test vs SOE) is used
consistently across PERF_BENCH, dml-generator.sql,
application-logminer.properties, application-olr.properties, and olr-config.json
so CDC captures the changes.
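Whichever option is chosen, the consistency requirement can be guarded with a small check. The following is a hypothetical sketch (not part of the framework) that scans config file contents for schema-qualified PERF_BENCH references and rejects a mix:

```python
import re

# Hypothetical consistency check: every config that references PERF_BENCH
# must qualify it with the same schema (SOE or OLR_TEST, but not a mix).
def schemas_referenced(text):
    """Return the set of schema names (upper-cased) qualifying PERF_BENCH."""
    return {m.upper() for m in re.findall(r'(\w+)\.PERF_BENCH', text, re.IGNORECASE)}

def check_consistent(config_texts):
    """config_texts: {filename: contents}. Returns the single schema name,
    or None if nothing references PERF_BENCH; raises ValueError on a mismatch."""
    found = {name: schemas_referenced(text)
             for name, text in config_texts.items()
             if schemas_referenced(text)}
    all_schemas = set().union(*found.values()) if found else set()
    if len(all_schemas) > 1:
        raise ValueError(f'Inconsistent schemas across configs: {found}')
    return next(iter(all_schemas), None)
```

Run against the contents of setup.sql, dml-generator.sql, the two adapter property files, and olr-config.json, this would fail fast instead of letting CDC silently capture nothing.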
- validator.py: use lists for pending events to handle duplicate keys; fix --stop-on-fail flag (was not disableable)
- debezium-receiver.py: fix 10s window throughput calculation (use the full window duration, not the first-to-last event span)
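The throughput fix can be illustrated with a hypothetical helper (not the receiver's actual code): events are counted over the fixed trailing window and divided by the full window duration, so a short burst of events does not inflate the reported rate.

```python
def throughput(event_times, window_end, window_secs=10.0):
    """Events per second over a fixed trailing window.

    Dividing by the first-to-last event span (the old bug) overstates
    throughput when events arrive in a burst; dividing by the full
    window duration does not.
    """
    window_start = window_end - window_secs
    recent = [t for t in event_times if window_start < t <= window_end]
    return len(recent) / window_secs
```

For example, three events clustered within two seconds of a ten-second window yield 0.3 events/s, where a span-based calculation would report 1.5 events/s.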
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/sql/environments/rac/debezium/perf/validator.py (2)
80-97: Narrow exception handling in `stop_swingbench()`. Catching `Exception` here can hide unexpected programming errors; catch expected I/O-related failures instead.

🔧 Proposed fix

```diff
 def stop_swingbench():
     """Stop the swingbench container via Docker socket."""
     import socket
     try:
-        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
-        sock.connect('/var/run/docker.sock')
-        request = (
-            'POST /v1.40/containers/perf-swingbench/stop HTTP/1.1\r\n'
-            'Host: localhost\r\n'
-            'Content-Length: 0\r\n'
-            '\r\n'
-        )
-        sock.sendall(request.encode())
-        response = sock.recv(4096).decode()
-        sock.close()
+        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
+            sock.connect('/var/run/docker.sock')
+            request = (
+                'POST /v1.40/containers/perf-swingbench/stop HTTP/1.1\r\n'
+                'Host: localhost\r\n'
+                'Content-Length: 0\r\n'
+                '\r\n'
+            )
+            sock.sendall(request.encode())
+            response = sock.recv(4096).decode()
         if '204' in response or '304' in response:
             print('  Swingbench stopped', flush=True)
         else:
             print(f'  WARNING: Unexpected response: {response[:100]}', flush=True)
-    except Exception as e:
+    except OSError as e:
         print(f'  WARNING: Failed to stop swingbench: {e}', flush=True)
```
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/perf/validator.py` around lines 80 - 97, The stop_swingbench() block currently catches a broad Exception which can mask programming errors; replace the broad except with explicit I/O/network related exceptions (e.g., OSError, socket.error, ConnectionError, BrokenPipeError, TimeoutError) to only handle expected socket failures, keep the same logging behavior using the existing sock/connect/request variables and the same print message when catching the error, and ensure the exception tuple is used in the except clause so only those errors are swallowed while other exceptions propagate.
149-177: Use `deque` for FIFO pending queues under sustained load. `list.pop(0)` is O(n); with high duplicate-key traffic this can become a hotspot.

🔧 Proposed fix

```diff
-from collections import defaultdict
+from collections import deque
@@
-    lm_pending = {}   # key -> [(timestamp, event_json), ...]
-    olr_pending = {}  # key -> [(timestamp, event_json), ...]
+    lm_pending = {}   # key -> deque([(timestamp, event_json), ...])
+    olr_pending = {}  # key -> deque([(timestamp, event_json), ...])
@@
             if key in olr_pending and olr_pending[key]:
                 # Match found — OLR already has this event
-                olr_pending[key].pop(0)
+                olr_pending[key].popleft()
                 if not olr_pending[key]:
                     del olr_pending[key]
                 matched += 1
             else:
-                lm_pending.setdefault(key, []).append((now, line))
+                lm_pending.setdefault(key, deque()).append((now, line))
@@
             if key in lm_pending and lm_pending[key]:
                 # Match found — LogMiner already has this event
-                lm_pending[key].pop(0)
+                lm_pending[key].popleft()
                 if not lm_pending[key]:
                     del lm_pending[key]
                 matched += 1
             else:
-                olr_pending.setdefault(key, []).append((now, line))
+                olr_pending.setdefault(key, deque()).append((now, line))
```
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/perf/validator.py` around lines 149 - 177, Replace list-based FIFO queues in lm_pending and olr_pending with collections.deque to avoid O(n) list.pop(0); import deque and when adding use setdefault(key, deque()).append((now, line)) and when consuming replace pop(0) with popleft(), keeping the existing emptiness check (if not lm_pending[key]: del lm_pending[key]) and other logic unchanged; update all occurrences in the matching logic (the blocks that handle lm_lines and olr_lines and call event_key) to use deque operations.
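The proposed structure can be exercised standalone. Below is a minimal sketch of the deque-backed pending map (simplified, with hypothetical helper names; not the validator's full matching loop):

```python
from collections import deque

# Pending events keyed by content; each key holds a FIFO of (timestamp, event).
pending = {}

def add_event(key, ts, event):
    pending.setdefault(key, deque()).append((ts, event))

def match_event(key):
    """Consume the oldest pending event for key; return it, or None."""
    q = pending.get(key)
    if not q:
        return None
    item = q.popleft()       # O(1); list.pop(0) would be O(n)
    if not q:
        del pending[key]     # keep the dict small under sustained load
    return item
```

Duplicate keys are handled naturally: events queue up in arrival order and are consumed oldest-first, which is exactly the FIFO semantics the review comment asks to preserve.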
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: aa59bc10-6b3a-4e5c-bb42-3e4ead601b8c
📒 Files selected for processing (2)
- tests/sql/environments/rac/debezium/perf/validator.py
- tests/sql/scripts/debezium-receiver.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/sql/scripts/debezium-receiver.py
```python
parser.add_argument('--match-window', type=int, default=DEFAULT_MATCH_WINDOW,
                    help=f'Seconds to wait for matching event (default: {DEFAULT_MATCH_WINDOW})')
parser.add_argument('--no-stop-on-fail', dest='stop_on_fail', action='store_false', default=True,
                    help='Do not stop swingbench on mismatch')
args = parser.parse_args()
```
Validate --match-window as strictly positive.
--match-window currently accepts 0 or negative values, which can cause immediate false mismatch failures.
🔧 Proposed fix

```diff
 parser.add_argument('--match-window', type=int, default=DEFAULT_MATCH_WINDOW,
                     help=f'Seconds to wait for matching event (default: {DEFAULT_MATCH_WINDOW})')
 parser.add_argument('--no-stop-on-fail', dest='stop_on_fail', action='store_false', default=True,
                     help='Do not stop swingbench on mismatch')
 args = parser.parse_args()
+if args.match_window <= 0:
+    parser.error('--match-window must be > 0')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/sql/environments/rac/debezium/perf/validator.py` around lines 104 -
108, The --match-window CLI option (added via parser.add_argument with dest
'match_window' and default DEFAULT_MATCH_WINDOW) must be validated to ensure
it's strictly positive; update the argument parsing flow after args =
parser.parse_args() to check args.match_window > 0 and raise a parser.error or
exit with a clear message if not. Specifically, reference the parser instance
and the argument name 'match-window' (or args.match_window) and enforce the
constraint so zero or negative values are rejected before the validator logic
runs.
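As an alternative to the post-parse check, the constraint can live in an argparse `type=` callback, which rejects bad values before any validator logic runs. A sketch, assuming an illustrative 60-second default (the framework's actual DEFAULT_MATCH_WINDOW may differ):

```python
import argparse

def positive_int(value):
    """argparse type= callback that rejects zero and negative values."""
    n = int(value)
    if n <= 0:
        raise argparse.ArgumentTypeError(f'{value} is not a positive integer')
    return n

parser = argparse.ArgumentParser()
parser.add_argument('--match-window', type=positive_int, default=60,
                    help='Seconds to wait for matching event')
```

With this approach argparse itself reports the error (`argument --match-window: 0 is not a positive integer`) and exits, so no separate validation step is needed after `parse_args()`.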
```python
                    help='Do not stop swingbench on mismatch')
args = parser.parse_args()

print(f'Validator starting', flush=True)
```
Remove the extraneous f-string prefix.
This string has no placeholders and will trigger Ruff F541.
🔧 Proposed fix
```diff
-print(f'Validator starting', flush=True)
+print('Validator starting', flush=True)
```
+ print('Validator starting', flush=True)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f'Validator starting', flush=True) | |
| print('Validator starting', flush=True) |
🧰 Tools
🪛 Ruff (0.15.6)
[error] 110-110: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/sql/environments/rac/debezium/perf/validator.py` at line 110, The print
statement in validator.py uses an unnecessary f-string: change the call to
remove the f-prefix so it becomes a normal string literal in the print(...) call
(locate the print(f'Validator starting', flush=True) invocation and replace it
with print('Validator starting', flush=True)).
Summary
Performance benchmarking and continuous data validation framework for comparing
Debezium with OLR adapter vs LogMiner adapter on Oracle RAC.
Performance benchmark
- `debezium-receiver.py` with `/metrics` endpoint (throughput + latency percentiles)

Continuous data validation

- `validator.py`: tails LogMiner + OLR JSONL files, matches events by content in real-time

Infrastructure

- connects via the SCAN listener (racnodepc1-scan:1521) instead of a direct node IP

Test plan
Summary by CodeRabbit
New Features
Documentation
Chores
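The latency percentiles (p50/p95/p99) exposed by the receiver's `/metrics` endpoint can be computed with a simple nearest-rank scheme. This sketch assumes that convention; the receiver may use a different interpolation method:

```python
import math

def percentile(sorted_vals, p):
    """Nearest-rank percentile over a pre-sorted list (one common convention)."""
    if not sorted_vals:
        return None
    # index of the smallest value covering at least p% of the samples
    k = max(0, math.ceil(p / 100.0 * len(sorted_vals)) - 1)
    return sorted_vals[k]

def latency_summary(latencies_ms):
    """p50/p95/p99 over a batch of latency samples (milliseconds)."""
    vals = sorted(latencies_ms)
    return {p: percentile(vals, p) for p in (50, 95, 99)}
```

Sorting once and indexing per percentile keeps the per-scrape cost at O(n log n), which is negligible at the event rates reported in the benchmark.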