test: add checkpoint/restart fault tolerance tests #7
Conversation
RAC version: kill/restart OLR with DML on both nodes, compare LogMiner vs OLR via Debezium twin-test. Verifies checkpoint resume, data accuracy, no duplicates, no ASAN errors. Single-instance version: same logic against local Docker Oracle. Also adds restart: unless-stopped for Debezium OLR adapter (crashes when OLR is killed, needs auto-restart) and olr-checkpoint volume for single-instance test.
- Update VM IP from 192.168.122.248 to 192.168.122.130
- Add checkpoint interval (10s, 1MB) for fault tolerance testing
- Remove `flags: 4` (ADAPTIVE_SCHEMA) — causes upstream bug ERROR 50022 duplicate SYS.CCOL$ on createSchema()
VM was rebuilt at 192.168.122.130 (was 192.168.122.248). Update up.sh and .env to match.
📝 Walkthrough
Adds new single-instance and RAC end-to-end checkpoint/restart integration tests, a VM environment detection helper, and Docker/OLR configuration changes to exercise OLR kill/restart cycles, checkpoint SCN verification, sentinel synchronization, and output comparison.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Tester
    participant VM as RAC VM / SSH
    participant Oracle as Oracle DB (RAC)
    participant Receiver as Debezium Receiver
    participant LogMiner as Debezium LogMiner
    participant OLR as OpenLogReplicator
    participant Docker as Container Engine
    Tester->>VM: source vm-env.sh (detect VM_HOST, _SSH_OPTS)
    Tester->>Receiver: GET /health (verify reachable)
    Tester->>Oracle: sqlplus (prepare schema, supplemental logging)
    Tester->>Docker: reset/create olr-checkpoint volume
    Tester->>Docker: start LogMiner & OLR services
    Docker->>LogMiner: start streaming
    LogMiner->>Receiver: produce events
    Tester->>OLR: wait for readiness
    loop KILL_COUNT cycles
        Tester->>Oracle: perform DML while OLR running
        Tester->>Oracle: force log switch
        OLR->>Tester: write checkpoint (capture pre-kill SCN)
        Tester->>Docker: kill/remove OLR container
        Tester->>Oracle: DML/DDL while OLR down
        Tester->>Docker: start OLR (restart unless-stopped)
        OLR->>Tester: become ready, restore checkpoint (capture post-restart SCN)
        Tester->>Tester: verify post-SCN >= pre-SCN
    end
    Tester->>Oracle: insert DEBEZIUM_SENTINEL
    loop until timeout
        Receiver->>Tester: /status (check logminer_sentinel, olr_sentinel)
    end
    Tester->>Tester: compare outputs (logminer.jsonl vs olr.jsonl)
    Tester->>Docker: fetch OLR logs (scan ASAN/duplicates/errors)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/debezium/docker-compose.yaml (1)
35-47: ⚠️ Potential issue | 🟠 Major
Mount `olr-checkpoint` into the `olr` service to persist checkpoint data across restarts.
The `olr` service creates `/olr-data/checkpoint` in its entrypoint, but this directory lives under an ephemeral tmpfs mount. The `olr-checkpoint` named volume is declared but never mounted into the service. A container restart will discard checkpoint files, preventing the restart/resume test from validating durability.
Suggested fix
```diff
   volumes:
     - ./config/olr-config.json:/config/olr-config.json:ro
     - oracle-data:/opt/oracle/oradata:ro
+    - olr-checkpoint:/olr-data/checkpoint
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/debezium/docker-compose.yaml` around lines 35 - 47, The olr service creates /olr-data/checkpoint in its entrypoint but that path is on a tmpfs and the declared named volume olr-checkpoint is never mounted; update the olr service spec (service name "olr", its entrypoint/command and volumes block) to mount the named volume into /olr-data/checkpoint (e.g. add a volumes entry like olr-checkpoint:/olr-data/checkpoint) and ensure the tmpfs entry does not override that path (remove or adjust the tmpfs for /olr-data or exclude /olr-data/checkpoint) so checkpoint files persist across restarts; also confirm the named volume olr-checkpoint is declared in the compose file.
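For reference, a sketch of how the mounted `olr` service block could look once the fix is applied. The service, volume names, and the three mount lines come from the review above; everything else (surrounding keys, layout) is illustrative, not the repository's actual file:

```yaml
services:
  olr:
    volumes:
      - ./config/olr-config.json:/config/olr-config.json:ro
      - oracle-data:/opt/oracle/oradata:ro
      # Named volume survives `docker rm` + recreate, unlike the tmpfs path
      - olr-checkpoint:/olr-data/checkpoint

volumes:
  oracle-data:
  olr-checkpoint:   # must be declared here as well as mounted above
```

Note the two halves: declaring the volume under top-level `volumes:` allocates it, but only the per-service mount entry actually places it at `/olr-data/checkpoint`.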
🧹 Nitpick comments (6)
tests/sql/environments/rac/debezium/checkpoint-restart-test.sh (3)
70-84: ShellCheck SC2155: Declare and assign separately.
Per static analysis, line 73 combines declaration and assignment.
Suggested fix
```diff
 _exec_user() {
     local sql_file="$1"
     local node="${2:-$RAC_NODE1}" sid="${3:-$ORACLE_SID1}" conn="${4:-$DB_CONN1}"
-    local remote="/tmp/$(basename "$sql_file")"
+    local remote
+    remote="/tmp/$(basename "$sql_file")"
     _vm_copy_in "$sql_file" "$remote" "$node"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/checkpoint-restart-test.sh` around lines 70 - 84, The ShellCheck warning SC2155 in _exec_user is caused by declaring and assigning a variable in one statement; change the pattern to declare the variable first (local output) and then assign to it on the next line (output=$(_vm_sqlplus "$node" "$sid" "$conn" "$remote")), ensuring you reference the _exec_user function and the _vm_sqlplus call so the declaration and assignment are separated.
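The SC2155 warnings in this review are not purely stylistic. A minimal runnable sketch of the hazard: with `local var=$(cmd)`, `$?` reports the exit status of the `local` builtin (always 0), hiding the command's failure. Function names here are illustrative:

```shell
masked() {
    local out=$(false)   # SC2155: exit status of `false` is swallowed by `local`
    echo "$?"
}
separate() {
    local out
    out=$(false)         # declaration split out: $? now reflects the substitution
    echo "$?"
}
masked     # prints 0
separate   # prints 1
```

This is why the suggested fixes below split `local remote` from its assignment even when the assigned command "cannot fail" today.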
63-68: ShellCheck SC2155: Declare and assign separately.
Per static analysis, line 65 combines declaration and assignment, which can mask return values.
Suggested fix
```diff
 _exec_sysdba() {
     local sql_file="$1"
-    local remote="/tmp/$(basename "$sql_file")"
+    local remote
+    remote="/tmp/$(basename "$sql_file")"
     _vm_copy_in "$sql_file" "$remote" "$RAC_NODE1"
     _vm_sqlplus "$RAC_NODE1" "$ORACLE_SID1" "/ as sysdba" "$remote"
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/checkpoint-restart-test.sh` around lines 63 - 68, The function _exec_sysdba uses combined declaration+assignment which triggers ShellCheck SC2155; to fix, split declarations and assignments in _exec_sysdba: declare local variables first (e.g., local sql_file; local remote) then assign sql_file="$1" and remote="/tmp/$(basename "$sql_file")" on separate lines, leaving the following calls to _vm_copy_in and _vm_sqlplus unchanged.
158-183: ShellCheck SC2155 and redundant condition.
Line 175 has the same SC2155 issue. Additionally, similar to the single-instance version, lines 170-173 are redundant: they are already covered by lines 166-169.
Suggested fix
```diff
 _wait_for_checkpoint() {
     echo "  Waiting for OLR checkpoint (up to 30s)..."
     local prev_scn
     prev_scn=$(_read_checkpoint_scn)
     for i in $(seq 1 15); do
         sleep 2
         local cur_scn
         cur_scn=$(_read_checkpoint_scn)
         if [[ "$cur_scn" != "0" && "$cur_scn" != "$prev_scn" ]]; then
             echo "  Checkpoint written: scn=$cur_scn"
             return 0
         fi
-        if [[ "$cur_scn" != "0" && "$prev_scn" == "0" ]]; then
-            echo "  Checkpoint written: scn=$cur_scn"
-            return 0
-        fi
     done
-    local final=$(_read_checkpoint)
+    local final
+    final=$(_read_checkpoint)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sql/environments/rac/debezium/checkpoint-restart-test.sh` around lines 158 - 183, In _wait_for_checkpoint, fix ShellCheck SC2155 by declaring locals separately (e.g., use "local prev_scn" then "prev_scn=$(_read_checkpoint_scn)" and likewise "local cur_scn" then assign, and "local final" then assign from _read_checkpoint) and remove the redundant second conditional block that repeats the same check (the if that checks "$cur_scn" != "0" && "$prev_scn" == "0"); instead keep a single conditional that detects a non-zero cur_scn different from prev_scn (if [[ "$cur_scn" != "0" && "$cur_scn" != "$prev_scn" ]] ...) to report the checkpoint.

tests/debezium/checkpoint-restart-test.sh (3)
20-21: Unused `ORACLE_PORT` variable.
`ORACLE_PORT` is defined but never used. `DB_CONN` hardcodes port `1521` regardless of the `ORACLE_PORT` value.
Suggested fix
Either remove the unused variable or use it in `DB_CONN`:

```diff
-ORACLE_PORT="${ORACLE_PORT:-1522}"
-DB_CONN="olr_test/olr_test@//localhost:1521/XEPDB1"
+ORACLE_PORT="${ORACLE_PORT:-1521}"
+DB_CONN="olr_test/olr_test@//localhost:${ORACLE_PORT}/XEPDB1"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/debezium/checkpoint-restart-test.sh` around lines 20 - 21, The script defines ORACLE_PORT but never uses it and DB_CONN hardcodes port 1521; update DB_CONN to reference the ORACLE_PORT variable (e.g., use the ORACLE_PORT shell variable in the connection string) so the port is configurable via ORACLE_PORT, or remove the ORACLE_PORT definition if you intend to always use 1521; modify the DB_CONN assignment that currently contains "DB_CONN" to interpolate ORACLE_PORT instead of the hardcoded 1521 (or delete ORACLE_PORT if unused).
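The suggested substitution relies on ordinary shell default expansion; a self-contained sketch of the pattern, using the same values as the review (the `unset` is only there to make the default branch deterministic):

```shell
unset ORACLE_PORT

# ${VAR:-default} expands to $VAR if set and non-empty, else to the default
ORACLE_PORT="${ORACLE_PORT:-1521}"
DB_CONN="olr_test/olr_test@//localhost:${ORACLE_PORT}/XEPDB1"
echo "$DB_CONN"   # olr_test/olr_test@//localhost:1521/XEPDB1
```

Running the script with `ORACLE_PORT=1522 ./script.sh` would then override the port everywhere the connection string is used, which is the whole point of keeping the variable.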
475-481: Broad grep pattern may cause false positives.
The pattern `"duplicate\|ERROR 50022"` could match unrelated log messages containing "duplicate". The RAC version uses a more specific pattern, `"ERROR 50022\|duplicate SYS\."`, which targets the actual bug.
Suggested fix for consistency with RAC version
```diff
-if grep -q "duplicate\|ERROR 50022" "$WORK_DIR/olr.log"; then
+if grep -q "ERROR 50022\|duplicate SYS\." "$WORK_DIR/olr.log"; then
     echo "  FAIL: OLR duplicate/schema errors detected (Bug 1)"
-    grep "duplicate\|ERROR 50022" "$WORK_DIR/olr.log" | head -5
+    grep "ERROR 50022\|duplicate SYS\." "$WORK_DIR/olr.log" | head -5
     OLR_ERROR_RESULT=1
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/debezium/checkpoint-restart-test.sh` around lines 475 - 481, The grep pattern in the OLR error check is too broad and may yield false positives; update the pattern used in the conditional that greps "$WORK_DIR/olr.log" (the line that currently checks "duplicate\|ERROR 50022") to the more specific pattern used in RAC (e.g., "ERROR 50022\|duplicate SYS\.") so it only matches the real bug, and ensure the subsequent grep that prints matches (the line piping to head -5) uses the same narrowed pattern; no other logic changes needed—just replace the pattern in both grep invocations that set/check OLR_ERROR_RESULT.
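A quick way to see the difference between the two patterns, using a made-up pair of log lines (the benign first line is hypothetical; the second mimics the Bug 1 signature quoted in this PR):

```shell
log=$'skipping duplicate subscription request\nERROR 50022: duplicate SYS.CCOL$ on createSchema()'

# Broad pattern also catches the benign line
printf '%s\n' "$log" | grep -c "duplicate\|ERROR 50022"          # 2 — false positive

# Narrowed RAC pattern only catches the real bug signature
printf '%s\n' "$log" | grep -c "ERROR 50022\|duplicate SYS\."    # 1
```

Note the escaped dot in `duplicate SYS\.`: without it, the `.` would match any character, slightly widening the pattern again.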
141-148: Redundant condition in `_wait_for_checkpoint`.
The second condition (lines 145-148) is already covered by the first (lines 141-143). When `prev_scn == "0"` and `cur_scn != "0"`, the first check's `cur_scn != prev_scn` is also true.
Suggested simplification
```diff
 if [[ "$cur_scn" != "0" && "$cur_scn" != "$prev_scn" ]]; then
     echo "  Checkpoint written: scn=$cur_scn"
     return 0
 fi
-if [[ "$cur_scn" != "0" && "$prev_scn" == "0" ]]; then
-    echo "  Checkpoint written: scn=$cur_scn"
-    return 0
-fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/debezium/checkpoint-restart-test.sh` around lines 141 - 148, In _wait_for_checkpoint remove the redundant second if block that checks 'if [[ "$cur_scn" != "0" && "$prev_scn" == "0" ]]' because its case is already handled by the first condition 'if [[ "$cur_scn" != "0" && "$cur_scn" != "$prev_scn" ]]' — simply delete the second block (echo + return) so only the first check remains to detect a written checkpoint.
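The subsumption can be checked directly: with `prev_scn="0"` and any non-zero `cur_scn`, the first condition already fires, so the second branch never adds coverage:

```shell
prev_scn="0"
cur_scn="12345"   # illustrative SCN value

# This is exactly the case the removed branch was written for,
# yet the surviving first check already handles it:
if [[ "$cur_scn" != "0" && "$cur_scn" != "$prev_scn" ]]; then
    echo "first check fires"
fi
```

Any `cur_scn` that is not `"0"` is by definition not equal to a `prev_scn` of `"0"`, so `cur_scn != prev_scn` is implied.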
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/debezium/checkpoint-restart-test.sh`:
- Around line 495-503: The script currently ignores CHECKPOINT_VERIFIED when
computing the final exit code; update the logic to treat checkpoint verification
as a failure when CHECKPOINT_VERIFIED is 0 by introducing or using a
CHKPT_RESULT and including it in both the pass/fail conditional and the final
exit calculation: set CHKPT_RESULT=0/1 based on CHECKPOINT_VERIFIED (or convert
CHECKPOINT_VERIFIED to a numeric result), change the if condition to require
CHKPT_RESULT (e.g., include CHKPT_RESULT in the && list with COMPARE_RESULT,
ASAN_RESULT, OLR_ERROR_RESULT), and add CHKPT_RESULT to the exit sum so a failed
checkpoint produces a non‑zero exit code.
- Around line 114-119: The _read_checkpoint_scn function can emit multiple
concatenated outputs because the initial `docker volume inspect
"$CHECKPOINT_VOL" || echo "0"` echoes "0" but does not stop subsequent commands;
change the control flow so a failing volume inspect short-circuits the function
(for example, after testing CHECKPOINT_VOL with docker volume inspect,
immediately echo "0" and return) and only run the docker run / python pipeline
when the volume exists; reference the _read_checkpoint_scn function and the
CHECKPOINT_VOL variable to locate and update the conditional flow so the
function emits exactly one value.
In `@tests/sql/environments/rac/up.sh`:
- Around line 46-47: up.sh currently discovers the real RAC IP and sets VM_HOST
but none of the entrypoints (setup.sh, checkpoint-restart-test.sh, run.sh,
soak-test.sh) source up.sh, so they fall back to the hardcoded VM_HOST; fix by
moving the discovery logic into a shared sourced helper (e.g., a new
scripts/env.sh) or by having up.sh write the discovered value to a persistent
env file (e.g., .env) and then update all entrypoints (setup.sh,
checkpoint-restart-test.sh, run.sh, soak-test.sh) to source that helper/.env
before they evaluate their fallback VM_HOST="${VM_HOST:-192.168.122.248}" so the
discovered VM_HOST is available and exported for all scripts.
- Around line 9-15: The current ACTUAL_IP assignment from `virsh domifaddr
oracle-rac-vm` can yield multiple IPv4 lines causing a multi-line ACTUAL_IP and
breaking later SSH/mismatch checks; update the probe in up.sh to filter for
IPv4, select exactly one address (e.g., the first matching IPv4) and extract
only the IP portion so ACTUAL_IP is always a single IPv4 string; target the
`ACTUAL_IP` assignment and the `virsh domifaddr` pipeline in up.sh (referenced
symbol: ACTUAL_IP, command: virsh domifaddr oracle-rac-vm) and add a step like
"pick first match" (head -n1 or equivalent) after the awk/cut extraction to
ensure a single-line result.
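The exit-code prompt above (folding `CHECKPOINT_VERIFIED` into the final result) can be sketched in a few lines. Variable names come from the review; the arithmetic ternary is one possible way to derive `CHKPT_RESULT`, not necessarily how the script does it:

```shell
COMPARE_RESULT=0; ASAN_RESULT=0; OLR_ERROR_RESULT=0
CHECKPOINT_VERIFIED=0                                  # checkpoint was NOT verified

# Map the verification flag (1 = verified) onto a result code (0 = pass)
CHKPT_RESULT=$(( CHECKPOINT_VERIFIED ? 0 : 1 ))

# Any single failing check now makes the sum, and thus the exit code, non-zero
exit_code=$(( COMPARE_RESULT + ASAN_RESULT + OLR_ERROR_RESULT + CHKPT_RESULT ))
echo "$exit_code"   # 1
```

With the previous logic, this scenario (all other checks green, checkpoint unverified) would have exited 0 and silently passed CI.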
---
Outside diff comments:
In `@tests/debezium/docker-compose.yaml`:
- Around line 35-47: The olr service creates /olr-data/checkpoint in its
entrypoint but that path is on a tmpfs and the declared named volume
olr-checkpoint is never mounted; update the olr service spec (service name
"olr", its entrypoint/command and volumes block) to mount the named volume into
/olr-data/checkpoint (e.g. add a volumes entry like
olr-checkpoint:/olr-data/checkpoint) and ensure the tmpfs entry does not
override that path (remove or adjust the tmpfs for /olr-data or exclude
/olr-data/checkpoint) so checkpoint files persist across restarts; also confirm
the named volume olr-checkpoint is declared in the compose file.
---
Nitpick comments:
In `@tests/debezium/checkpoint-restart-test.sh`:
- Around line 20-21: The script defines ORACLE_PORT but never uses it and
DB_CONN hardcodes port 1521; update DB_CONN to reference the ORACLE_PORT
variable (e.g., use the ORACLE_PORT shell variable in the connection string) so
the port is configurable via ORACLE_PORT, or remove the ORACLE_PORT definition
if you intend to always use 1521; modify the DB_CONN assignment that currently
contains "DB_CONN" to interpolate ORACLE_PORT instead of the hardcoded 1521 (or
delete ORACLE_PORT if unused).
- Around line 475-481: The grep pattern in the OLR error check is too broad and
may yield false positives; update the pattern used in the conditional that greps
"$WORK_DIR/olr.log" (the line that currently checks "duplicate\|ERROR 50022") to
the more specific pattern used in RAC (e.g., "ERROR 50022\|duplicate SYS\.") so
it only matches the real bug, and ensure the subsequent grep that prints matches
(the line piping to head -5) uses the same narrowed pattern; no other logic
changes needed—just replace the pattern in both grep invocations that set/check
OLR_ERROR_RESULT.
- Around line 141-148: In _wait_for_checkpoint remove the redundant second if
block that checks 'if [[ "$cur_scn" != "0" && "$prev_scn" == "0" ]]' because its
case is already handled by the first condition 'if [[ "$cur_scn" != "0" &&
"$cur_scn" != "$prev_scn" ]]' — simply delete the second block (echo + return)
so only the first check remains to detect a written checkpoint.
In `@tests/sql/environments/rac/debezium/checkpoint-restart-test.sh`:
- Around line 70-84: The ShellCheck warning SC2155 in _exec_user is caused by
declaring and assigning a variable in one statement; change the pattern to
declare the variable first (local output) and then assign to it on the next line
(output=$(_vm_sqlplus "$node" "$sid" "$conn" "$remote")), ensuring you reference
the _exec_user function and the _vm_sqlplus call so the declaration and
assignment are separated.
- Around line 63-68: The function _exec_sysdba uses combined
declaration+assignment which triggers ShellCheck SC2155; to fix, split
declarations and assignments in _exec_sysdba: declare local variables first
(e.g., local sql_file; local remote) then assign sql_file="$1" and
remote="/tmp/$(basename "$sql_file")" on separate lines, leaving the following
calls to _vm_copy_in and _vm_sqlplus unchanged.
- Around line 158-183: In _wait_for_checkpoint, fix ShellCheck SC2155 by
declaring locals separately (e.g., use "local prev_scn" then
"prev_scn=$(_read_checkpoint_scn)" and likewise "local cur_scn" then assign, and
"local final" then assign from _read_checkpoint) and remove the redundant second
conditional block that repeats the same check (the if that checks "$cur_scn" !=
"0" && "$prev_scn" == "0"); instead keep a single conditional that detects a
non-zero cur_scn different from prev_scn (if [[ "$cur_scn" != "0" && "$cur_scn"
!= "$prev_scn" ]] ...) to report the checkpoint.
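The `_read_checkpoint_scn` short-circuit described in the inline comments above can be sketched with a stub standing in for `docker volume inspect` (the stub, the placeholder SCN, and `volume_exists` are all illustrative, not the script's real code):

```shell
CHECKPOINT_VOL="olr-checkpoint"
volume_exists() { false; }          # simulate a missing Docker volume

_read_checkpoint_scn() {
    if ! volume_exists "$CHECKPOINT_VOL"; then
        echo "0"
        return 0                    # short-circuit: exactly one value emitted
    fi
    echo "12345"                    # placeholder for reading the real checkpoint file
}

_read_checkpoint_scn                # prints a single "0", never "0" plus more output
```

The bug the review describes is the `inspect ... || echo "0"` form, where the `echo` fires but execution falls through to the read pipeline, so callers receive two concatenated values.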
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b386ce39-065d-4eea-b506-174082f77624
📒 Files selected for processing (10)
- tests/debezium/checkpoint-restart-test.sh
- tests/debezium/docker-compose.yaml
- tests/sql/environments/rac/.env
- tests/sql/environments/rac/debezium/checkpoint-restart-test.sh
- tests/sql/environments/rac/debezium/config/application-logminer.properties
- tests/sql/environments/rac/debezium/config/application-olr.properties
- tests/sql/environments/rac/debezium/config/olr-config.json
- tests/sql/environments/rac/debezium/docker-compose.yaml
- tests/sql/environments/rac/up.sh
- tests/sql/scripts/drivers/rac.sh
tests/sql/environments/rac/up.sh
```bash
# Auto-detect VM IP from virsh
ACTUAL_IP=$(virsh domifaddr oracle-rac-vm 2>/dev/null | awk '/ipv4/{print $4}' | cut -d/ -f1)
if [[ -z "$ACTUAL_IP" ]]; then
  echo "ERROR: Cannot detect RAC VM IP. Is oracle-rac-vm running?" >&2
  echo "  Start with: virsh start oracle-rac-vm" >&2
  exit 1
fi
```
🧩 Analysis chain
🏁 Script executed:
```bash
# First, let's examine the actual file mentioned
if [[ -f "tests/sql/environments/rac/up.sh" ]]; then
  echo "=== File exists, checking lines around 9-15 ==="
  sed -n '1,20p' tests/sql/environments/rac/up.sh
else
  echo "=== File not found, listing directory structure ==="
  find . -name "up.sh" -type f | head -20
fi
```

Repository: rophy/OpenLogReplicator
Length of output: 757
🏁 Script executed:
```bash
# Search for related files that use virsh commands
rg "virsh" --type sh -A 2 -B 2 | head -50
```

Repository: rophy/OpenLogReplicator
Length of output: 711
🌐 Web query:
virsh domifaddr output format multiple IPv4 addresses documentation
💡 Result:
The virsh domifaddr command displays network interface addresses for a libvirt domain in a tabular format with columns: Name, MAC address, Protocol, and Address. Each row represents one IP address (IPv4 or IPv6) associated with an interface.

Example output without --full:

```
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet1      52:54:00:b9:58:64    ipv4         192.168.130.156/24
```

For interfaces with multiple IPv4/IPv6 addresses, without --full, subsequent addresses for the same interface show "-" for Name and MAC address:

```
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 lo         00:00:00:00:00:00    ipv4         127.0.0.1/8
 -          -                    ipv6         ::1/128
 enp1s0     52:54:00:6b:3b:e4    ipv4         192.168.122.248/24
```

With --full, the interface Name and MAC address are repeated for every address on that interface. The Address field shows IP/prefix like 192.168.122.248/24 for IPv4. Protocol is "ipv4" or "ipv6". Data source (--source lease|agent|arp) affects what addresses are shown; agent shows the guest view (multiple IPs common), lease shows libvirt DHCP leases (usually one IPv4 per interface). Official documentation: libvirt.org/manpages/virsh.html#domifaddr confirms the tabular format and --full behavior for multiple addresses.
Citations:
- 1: https://www.libvirt.org/manpages/virsh.html
- 2: https://manpages.org/virsh
- 3: https://manpages.ubuntu.com/manpages/xenial/man1/virsh.1.html
- 4: https://www.adresse.data.gouv.fr
- 5: https://man.archlinux.org/man/extra/libvirt/virsh.1.en
- 6: https://stackoverflow.com/questions/63639017/parse-virsh-output
- 7: https://serverfault.com/questions/1065646/can-ssh-into-ubuntu-20-04-virtual-machine-but-domifaddr-shows-no-ip
- 8: https://serverfault.com/questions/101982/get-list-of-dhcp-clients-with-kvmlibvirt
Make the virsh probe select exactly one IPv4.
virsh domifaddr can output multiple IPv4 rows for an interface. In that case, ACTUAL_IP becomes a multi-line string and later SSH or mismatch checks will fail or target the wrong host.
Suggested fix
```diff
-# Auto-detect VM IP from virsh
-ACTUAL_IP=$(virsh domifaddr oracle-rac-vm 2>/dev/null | awk '/ipv4/{print $4}' | cut -d/ -f1)
+# Auto-detect VM IP from virsh
+mapfile -t ACTUAL_IPS < <(virsh domifaddr oracle-rac-vm 2>/dev/null | awk '/ipv4/{print $4}' | cut -d/ -f1)
+ACTUAL_IP="${ACTUAL_IPS[0]:-}"
+if [[ ${#ACTUAL_IPS[@]} -gt 1 ]]; then
+  echo "ERROR: Multiple IPv4 addresses detected for oracle-rac-vm: ${ACTUAL_IPS[*]}" >&2
+  exit 1
+fi
 if [[ -z "$ACTUAL_IP" ]]; then
   echo "ERROR: Cannot detect RAC VM IP. Is oracle-rac-vm running?" >&2
   echo "  Start with: virsh start oracle-rac-vm" >&2
```

📝 Committable suggestion
echo " Start with: virsh start oracle-rac-vm" >&2📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
# Auto-detect VM IP from virsh
mapfile -t ACTUAL_IPS < <(virsh domifaddr oracle-rac-vm 2>/dev/null | awk '/ipv4/{print $4}' | cut -d/ -f1)
ACTUAL_IP="${ACTUAL_IPS[0]:-}"
if [[ ${#ACTUAL_IPS[@]} -gt 1 ]]; then
  echo "ERROR: Multiple IPv4 addresses detected for oracle-rac-vm: ${ACTUAL_IPS[*]}" >&2
  exit 1
fi
if [[ -z "$ACTUAL_IP" ]]; then
  echo "ERROR: Cannot detect RAC VM IP. Is oracle-rac-vm running?" >&2
  echo "  Start with: virsh start oracle-rac-vm" >&2
  exit 1
fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/sql/environments/rac/up.sh` around lines 9 - 15, The current ACTUAL_IP
assignment from `virsh domifaddr oracle-rac-vm` can yield multiple IPv4 lines
causing a multi-line ACTUAL_IP and breaking later SSH/mismatch checks; update
the probe in up.sh to filter for IPv4, select exactly one address (e.g., the
first matching IPv4) and extract only the IP portion so ACTUAL_IP is always a
single IPv4 string; target the `ACTUAL_IP` assignment and the `virsh domifaddr`
pipeline in up.sh (referenced symbol: ACTUAL_IP, command: virsh domifaddr
oracle-rac-vm) and add a step like "pick first match" (head -n1 or equivalent)
after the awk/cut extraction to ensure a single-line result.
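The single-result requirement can be exercised against canned `virsh domifaddr` output (the sample below is adapted from the libvirt format quoted earlier; the interface names and MAC are illustrative). The naive pipeline yields two addresses, while filtering to the libvirt subnet and taking the first match yields exactly one:

```shell
sample=' Name       MAC address          Protocol     Address
-------------------------------------------------------------
 lo         00:00:00:00:00:00    ipv4         127.0.0.1/8
 enp1s0     52:54:00:6b:3b:e4    ipv4         192.168.122.130/24'

# Naive extraction: two lines (127.0.0.1 and 192.168.122.130)
printf '%s\n' "$sample" | awk '/ipv4/{print $4}' | cut -d/ -f1

# Hardened extraction: filter to the 192.168.122.* libvirt subnet, take the first match
ACTUAL_IP=$(printf '%s\n' "$sample" | awk '/ipv4/{print $4}' | cut -d/ -f1 \
            | grep '^192\.168\.122\.' | head -n1)
echo "$ACTUAL_IP"   # 192.168.122.130
```

A multi-line `ACTUAL_IP` is exactly what breaks the later SSH and mismatch checks, since it ends up embedded mid-command.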
tests/sql/environments/rac/up.sh
```bash
export VM_HOST="$ACTUAL_IP"
echo "RAC VM IP: $ACTUAL_IP"
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
rg -n --type sh 'VM_HOST="\$\{VM_HOST:-192\.168\.122\.[0-9]+\}"' tests/sql/environments/rac tests/sql/scripts/drivers
```

Repository: rophy/OpenLogReplicator
Length of output: 504
🏁 Script executed:
```bash
cat -n tests/sql/environments/rac/up.sh | head -60
```

Repository: rophy/OpenLogReplicator
Length of output: 2286
🏁 Script executed:
```bash
# Find how up.sh is invoked in the codebase
rg -n "up\.sh" tests/sql/environments/rac/ tests/sql/scripts/drivers/ -A 2 -B 2 | head -100
```

Repository: rophy/OpenLogReplicator
Length of output: 2243
🏁 Script executed:
```bash
# Check if up.sh is sourced or executed as subprocess
rg -n "source.*up\.sh|\..*up\.sh|\./up\.sh|bash.*up\.sh" tests/sql/environments/rac/ tests/sql/scripts/drivers/
```

Repository: rophy/OpenLogReplicator
Length of output: 407
🏁 Script executed:
```bash
# Check if any entrypoint sources or calls up.sh
rg -n "source.*up\.sh|\..*up\.sh|\./up\.sh" tests/sql/environments/rac/debezium/
```

Repository: rophy/OpenLogReplicator
Length of output: 407
🏁 Script executed:
```bash
# Examine checkpoint-restart-test.sh to understand VM_HOST usage
sed -n '1,50p' tests/sql/environments/rac/debezium/checkpoint-restart-test.sh
```

Repository: rophy/OpenLogReplicator
Length of output: 1907
🏁 Script executed:
```bash
# Check setup.sh
sed -n '1,50p' tests/sql/environments/rac/debezium/setup.sh
```

Repository: rophy/OpenLogReplicator
Length of output: 1668
🏁 Script executed:
```bash
# Check run.sh
sed -n '1,50p' tests/sql/environments/rac/debezium/run.sh
```

Repository: rophy/OpenLogReplicator
Length of output: 1903
Move VM_HOST discovery from up.sh into shared sourced helper or generated env file.
up.sh correctly discovers the actual RAC VM IP via virsh (line 46), but none of the test entrypoints (setup.sh, checkpoint-restart-test.sh, run.sh, soak-test.sh) source or invoke up.sh. Each entrypoint independently has its own hardcoded fallback: VM_HOST="${VM_HOST:-192.168.122.248}". Running any entrypoint directly will therefore ignore the discovered IP and use the stale fallback, defeating the discovery logic entirely.
Either:
- Create a shared helper that all entrypoints source to discover and share the actual IP, or
- Have `up.sh` write the discovered IP to a persistent env file (e.g., `.env`) that all entrypoints source before setting their fallback.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/sql/environments/rac/up.sh` around lines 46 - 47, up.sh currently
discovers the real RAC IP and sets VM_HOST but none of the entrypoints
(setup.sh, checkpoint-restart-test.sh, run.sh, soak-test.sh) source up.sh, so
they fall back to the hardcoded VM_HOST; fix by moving the discovery logic into
a shared sourced helper (e.g., a new scripts/env.sh) or by having up.sh write
the discovered value to a persistent env file (e.g., .env) and then update all
entrypoints (setup.sh, checkpoint-restart-test.sh, run.sh, soak-test.sh) to
source that helper/.env before they evaluate their fallback
VM_HOST="${VM_HOST:-192.168.122.248}" so the discovered VM_HOST is available and
exported for all scripts.
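The `.env` option above can be sketched end-to-end with a temp directory standing in for the repo layout (file contents, the IP, and the paths are illustrative). The key ordering is: source the env file first, apply the hardcoded fallback second, so the discovered value wins:

```shell
env_dir=$(mktemp -d)

# What up.sh would write after discovery
echo 'VM_HOST=192.168.122.130' > "$env_dir/.env"

# Entrypoint pattern: source .env if present, *then* fall back
unset VM_HOST
[ -f "$env_dir/.env" ] && . "$env_dir/.env"
VM_HOST="${VM_HOST:-192.168.122.248}"

echo "$VM_HOST"   # 192.168.122.130 — the discovered IP, not the stale fallback
```

An explicitly exported `VM_HOST` in the caller's environment would still take precedence over the `.env` value only if the entrypoints guard the sourcing accordingly; that refinement is left to the actual fix.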
- Extract VM IP discovery into shared vm-env.sh sourced by all RAC scripts (fixes major: standalone scripts bypassed up.sh discovery)
- Fix _read_checkpoint_scn double output when volume doesn't exist
- Add CHKPT_RESULT to single-instance exit code (consistency with RAC)
- Use head -1 for virsh IP detection (handles multiple IPs)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/debezium/checkpoint-restart-test.sh`:
- Around line 200-205: The _kill_olr() function currently removes the OLR
container (OLR_CONTAINER) without saving its logs, losing logs from earlier
restart cycles; modify _kill_olr to capture docker logs before removal and write
them to a uniquely named file (include the current cycle index or timestamp and
the output of _read_checkpoint in the filename) under a persistent test
artifacts dir (e.g., TEST_ARTIFACTS or /tmp/olr-logs) so stage 6's checks
against dbz-olr logs can examine logs from all KILL_COUNT runs; ensure the code
uses docker logs "$OLR_CONTAINER" > <file> prior to docker rm and retains
filenames that make it easy to correlate with cycle number/checkpoint.
- Around line 160-166: The docker inspect format currently concatenates network
names (in the command that sets net from ORACLE_CONTAINER), producing invalid
values like "net1net2"; update the format string used in the docker inspect call
inside the tests/debezium/checkpoint-restart-test.sh script so each network name
is printed on its own line (for example by using a template that emits a newline
for each $k or by using println $k), then keep the existing head -1 to select
the first network; refer to the variables/commands net, ORACLE_CONTAINER, and
the docker inspect invocation and ensure the docker run that uses
OLR_CONTAINER/--network reads a single valid network name.
In `@tests/sql/environments/rac/debezium/checkpoint-restart-test.sh`:
- Around line 573-575: The code currently prints a warning when len(unique_ids)
< expected_total allowing the test to pass with missing CHKPT_TEST inserts;
change this to fail the test by replacing the print warning with a non-zero exit
or assertion (e.g., raise SystemExit/exit(1) or assert len(unique_ids) >=
expected_total) that includes a clear message with len(unique_ids) and
expected_total so missing IDs cause test failure.
- Around line 181-186: Define a persistent OLR_LOG path
(OLR_LOG="$WORK_DIR/olr.log") right after WORK_DIR is created, then modify the
_kill_olr() function to capture and append the container logs to $OLR_LOG before
removing the container (i.e., run podman logs $OLR_CONTAINER >> "$OLR_LOG" or
equivalent prior to podman rm), and update the Stage 7 log collection step to
append (>>) instead of overwrite (>) so earlier restart-cycle errors (ASAN/ERROR
50022) are preserved; references: WORK_DIR, OLR_LOG, _kill_olr(), and the Stage
7 log collection redirection.
In `@tests/sql/environments/rac/vm-env.sh`:
- Around line 15-18: The VM auto-detection picks the first IPv4 from `virsh
domifaddr` which can be a non-libvirt NIC; update the `VM_HOST` assignment to
filter returned addresses to the libvirt subnet `192.168.122.*` before taking
the first match so it matches the validation later. Locate the `VM_HOST=$(virsh
domifaddr oracle-rac-vm ... | awk ... | cut ... | head -1)` line and change the
pipeline to only select addresses in the `192.168.122.` range (e.g., via
awk/grep filtering) so `VM_HOST` is set to the libvirt IPv4 if present; keep the
existing fallback behavior if no libvirt address is found.
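The overwrite-vs-append point in the log-collection comments above can be demonstrated without containers; the log lines below are made up, but the redirection behavior is the whole bug:

```shell
olr_log=$(mktemp)

echo "cycle 1: ERROR 50022 duplicate SYS.CCOL\$" >  "$olr_log"
echo "cycle 2: clean shutdown"                   >  "$olr_log"   # '>' overwrites
lost=$(grep -c "ERROR 50022" "$olr_log" || true)
echo "after overwrite: $lost match(es)"   # 0 — cycle 1's error is gone

echo "cycle 1: ERROR 50022 duplicate SYS.CCOL\$" >  "$olr_log"
echo "cycle 2: clean shutdown"                   >> "$olr_log"   # '>>' appends
kept=$(grep -c "ERROR 50022" "$olr_log")
echo "after append: $kept match(es)"      # 1 — preserved for the final scan
```

The same reasoning motivates dumping `docker logs` into a per-cycle file before `docker rm`: once the container is removed, its logs are unrecoverable regardless of redirection mode.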
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e89a23b7-14f0-46ce-9354-b82339d39ddd
📒 Files selected for processing (4)
- tests/debezium/checkpoint-restart-test.sh
- tests/sql/environments/rac/debezium/checkpoint-restart-test.sh
- tests/sql/environments/rac/up.sh
- tests/sql/environments/rac/vm-env.sh
✅ Files skipped from review due to trivial changes (1)
- tests/sql/environments/rac/up.sh
```bash
# Get Oracle container's network
local net
net=$(docker inspect "$ORACLE_CONTAINER" --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -1)

docker run -d --name "$OLR_CONTAINER" \
    --network "$net" \
    --network-alias olr \
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, check if the file exists and examine the context around lines 160-166
find . -type f -name "checkpoint-restart-test.sh" | head -5
```

🏁 Script executed:

```shell
# Also look for docker-related documentation or examples in the repo
find . -type f \( -name "*.sh" -o -name "*.md" \) -exec grep -l "docker inspect" {} \; | head -10
```

🏁 Script executed:

```shell
# Read the specific file around the mentioned lines
if [ -f tests/debezium/checkpoint-restart-test.sh ]; then
  sed -n '155,170p' tests/debezium/checkpoint-restart-test.sh
fi
```
🌐 Web query:
docker inspect format go template multiple networks concatenation
💡 Result:
To handle multiple networks (the `.NetworkSettings.Networks` map) with `docker inspect` Go templates, use the `range` action to iterate over the map keys.

List network names, one per line:

```
docker inspect container_name --format '{{range $k, $v := .NetworkSettings.Networks}}{{println $k}}{{end}}'
```

Example output: `bridge` and `br01`, one per line. A StackOverflow answer confirms this works for listing networks [2]; the Docker docs show similar `range` and `println` usage [1].

To concatenate names with a custom separator (e.g., a comma), collect the keys into a slice and join them (Docker's template implementation supports `slice`, `append`, and `join` [1]):

```
docker inspect container_name --format '{{$nets := slice}}{{range $k, $v := .NetworkSettings.Networks}}{{$nets = append $nets $k}}{{end}}{{join $nets ","}}'
```

A simpler alternative without template variables:

```
docker inspect container_name --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}},{{end}}' | sed 's/,$//'
```

Concatenate IP addresses, one per network [7]:

```
docker inspect container_name --format='{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}'
```

Full JSON of the networks map:

```
docker inspect container_name --format '{{json .NetworkSettings.Networks}}'
```

These work as of Docker 27.x.
Citations:
- 1: https://docs.docker.com/engine/cli/formatting/
- 2: https://stackoverflow.com/questions/63921780/how-to-list-networks-a-docker-container-is-attached-to-using-format-template
- 3: https://docs.docker.com/reference/cli/docker/inspect/
🌐 Web query:
go template printf %s\n docker inspect format
💡 Result:
In Docker's `docker inspect` command, the `--format` flag uses Go text/templates. `printf` is a template alias for `fmt.Sprintf`: `printf "%s\n"` formats its argument per the format string, emitting the string value followed by a newline. This is commonly used to output a field value cleanly with a trailing newline, avoiding the extra spacing that other functions like `println` can introduce.

Example from community usage (e.g., reconstructing `docker run` commands):

```
docker inspect --format '{{printf "%q" .Name}}' container
```

This quotes the container name safely. Shell escaping is needed: single-quote the template and keep the inner double quotes:

```
docker inspect --format '{{printf "%s\n" .Name}}' container_id
```

This prints the container name followed by a newline. The official Docker docs confirm `printf` availability in templates, and the Go docs specify its behavior.
Citations:
- 1: https://docs.docker.com/engine/cli/formatting/
- 2: https://docs.docker.com/reference/cli/docker/inspect/
- 3: https://pkg.go.dev/text/template@go1.19
- 4: https://docs.docker.com/engine/cli/formatting
- 5: https://stackoverflow.com/questions/70894814/how-do-you-left-align-a-printf-on-docker-format
🏁 Script executed:

```shell
# Search for similar docker inspect patterns in the codebase
rg 'docker inspect.*format.*NetworkSettings' -A 2 -B 2
```

🏁 Script executed:

```shell
# Also check if there are examples of properly handling multiple networks
rg 'NetworkSettings|--network' tests/ -A 2 -B 2 | head -50
```
🌐 Web query:
docker inspect go template println vs printf newline function
💡 Result:
In `docker inspect` Go templates (used with `--format`), both `println` and `printf` are built-in functions from Go's text/template package. `println` is an alias for `fmt.Sprintln`: it formats its arguments with default formatting (spaces between non-string args), appends a newline, and outputs the result as a string. `printf` is an alias for `fmt.Sprintf`: it formats its arguments according to a format string (e.g., `%s`, `%d`) and does NOT append a newline automatically.

Key difference for newlines: use `println` to add a newline automatically, or `printf "%s\n"` to add one manually. Docker adds extra template functions such as `json`, `join`, `lower`, `upper`, `split`, `title`, `pad`, `truncate`, and `table` (`println` is listed explicitly in the docs); there is no evidence of Sprig library integration.

Examples from the Docker docs:

```
docker inspect --format='{{range .NetworkSettings.Networks}}{{println .IPAddress}}{{end}}' container
docker inspect --format='{{printf "%s\n" .Name}}' container
```

The first prints each IP on a new line; the second prints the container name followed by a newline.
Citations:
- 1: https://docs.docker.com/engine/cli/formatting
- 2: https://docs.docker.com/reference/cli/docker/inspect
- 3: https://pkg.go.dev/text/template@go1.24.2
- 4: https://go.dev/src/text/template/funcs.go
Output one network name per line before head -1.
The current template {{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}} concatenates all network names without separators. If the Oracle container is on multiple networks, $net becomes an invalid concatenated string like "net1net2", causing docker run --network "$net" to fail.
Suggested fix

```diff
- net=$(docker inspect "$ORACLE_CONTAINER" --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -1)
+ net=$(docker inspect "$ORACLE_CONTAINER" --format '{{range $k,$v := .NetworkSettings.Networks}}{{printf "%s\n" $k}}{{end}}' | head -1)
```

Alternatively, `{{println $k}}` works as a simpler approach that automatically appends a newline.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sh
# Get Oracle container's network
local net
net=$(docker inspect "$ORACLE_CONTAINER" --format '{{range $k,$v := .NetworkSettings.Networks}}{{printf "%s\n" $k}}{{end}}' | head -1)
docker run -d --name "$OLR_CONTAINER" \
    --network "$net" \
    --network-alias olr \
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/debezium/checkpoint-restart-test.sh` around lines 160 - 166, The docker
inspect format currently concatenates network names (in the command that sets
net from ORACLE_CONTAINER), producing invalid values like "net1net2"; update the
format string used in the docker inspect call inside the
tests/debezium/checkpoint-restart-test.sh script so each network name is printed
on its own line (for example by using a template that emits a newline for each
$k or by using println $k), then keep the existing head -1 to select the first
network; refer to the variables/commands net, ORACLE_CONTAINER, and the docker
inspect invocation and ensure the docker run that uses OLR_CONTAINER/--network
reads a single valid network name.
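The failure mode is easy to reproduce without Docker, since it comes down to whether the template body emits a separator before `head -1` runs; this standalone sketch simulates the two template outputs with `printf` (the network names `net1`/`net2` are made up):

```shell
#!/usr/bin/env bash
# Simulates the Go-template output for a container on two networks:
# the original body emits no separator, so the names fuse into one token;
# the fixed body emits one name per line, so `head -1` yields a valid name.
set -euo pipefail

# {{range ...}}{{$k}}{{end}}               -> names fused into one token
broken=$(printf '%s' net1 net2 | head -1)

# {{range ...}}{{printf "%s\n" $k}}{{end}} -> one name per line
fixed=$(printf '%s\n' net1 net2 | head -1)

echo "broken=$broken fixed=$fixed"
```

`--network "net1net2"` would fail at container start, while `--network "net1"` is a usable (if arbitrary) choice of the first attached network.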
- Preserve OLR logs before container removal on each kill cycle, scan all cycle logs in final error check (Critical + Major)
- Turn missing IDs into test failure, not warning (Major)
- Filter virsh output for 192.168.122.* subnet (Minor)
Actionable comments posted: 3
♻️ Duplicate comments (1)
tests/debezium/checkpoint-restart-test.sh (1)
160-166: ⚠️ Potential issue | 🟠 Major

Print one network name per line before `head -1`.

`docker inspect ... '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}'` still concatenates every attached network into one token. If `dbz-oracle` is on more than one network, `net` becomes an invalid value like `net1net2`, and the `docker run --network "$net"` here fails before the test starts.

♻️ Proposed fix

```diff
- net=$(docker inspect "$ORACLE_CONTAINER" --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -1)
+ net=$(docker inspect "$ORACLE_CONTAINER" --format '{{range $k,$v := .NetworkSettings.Networks}}{{printf "%s\n" $k}}{{end}}' | head -1)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/debezium/checkpoint-restart-test.sh` around lines 160-166: the code builds the variable `net` by concatenating all network names into one token, causing `docker run --network "$net"` to fail; update the `docker inspect` invocation that sets `net` (used with `ORACLE_CONTAINER` and later in the `docker run` that creates `OLR_CONTAINER`) to produce one network name per line (e.g., emit newline-separated network names), then keep the `head -1` usage to select the first network, ensuring `net` contains a single valid network name rather than a concatenation of multiple names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/debezium/checkpoint-restart-test.sh`:
- Around line 369-376: The current check compares PRE_KILL_SCN to
POST_RESTART_SCN read from the checkpoint file immediately after _launch_olr,
which can be unchanged and give a false positive; instead, after calling
_launch_olr wait for OLR to report its actual startup SCN (e.g., poll OLR’s
status/metrics endpoint or read the startup SCN from OLR logs) and compare that
reported STARTUP_SCN to PRE_KILL_SCN. Replace the immediate _read_checkpoint_scn
usage for POST_RESTART_SCN with a function that queries the running OLR instance
for the SCN it actually opened (or waits until the checkpoint file is updated by
OLR), and use that reported value in the comparison to ensure a real resume
occurred.
In `@tests/sql/environments/rac/debezium/checkpoint-restart-test.sh`:
- Around line 376-382: The current check uses POST_RESTART_SCN from
_read_checkpoint_scn which reads the persisted file (unchanged until OLR writes
a new checkpoint) so it can falsely pass; instead capture the SCN OLR actually
reported/used at startup (e.g., the value _start_olr prints or a helper that
queries the running OLR process) and use that startup SCN for the comparison
against PRE_KILL_SCN; replace the POST_RESTART_SCN = $(_read_checkpoint_scn)
usage with retrieving the OLR-reported startup SCN after calling _start_olr (or
add a _get_olr_startup_scn helper that parses _start_olr output/logs) and then
compare that startup SCN to PRE_KILL_SCN before incrementing
CHECKPOINT_VERIFIED.
- Around line 55-56: The SSH command in _vm_copy_in() currently chains "podman
cp ...; rm -f ..." which masks podman cp failures because rm returns 0; change
the SSH command so the podman cp exit code is preserved (either by using a
logical AND so rm runs only on success, or by capturing podman cp's exit status
into a variable, running rm, then exiting with that saved code). Apply this same
fix to the identical patterns in tests/sql/scripts/drivers/rac.sh,
tests/sql/environments/rac/debezium/run.sh, and
tests/sql/environments/rac/debezium/soak-test.sh so podman cp failures are
propagated instead of being masked.
---
Duplicate comments:
In `@tests/debezium/checkpoint-restart-test.sh`:
- Around line 160-166: The code builds the variable net by concatenating all
network names into one token, causing docker run --network "$net" to fail;
update the docker inspect invocation that sets net (used with ORACLE_CONTAINER
and later in the docker run that creates OLR_CONTAINER) to produce one network
name per line (e.g., emit newline-separated network names) and then keep the
head -1 usage to select the first network, ensuring net contains a single valid
network name rather than a concatenation of multiple names.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9a3ad45f-9c35-4ca9-890c-368be54dddfe
📒 Files selected for processing (3)
- tests/debezium/checkpoint-restart-test.sh
- tests/sql/environments/rac/debezium/checkpoint-restart-test.sh
- tests/sql/environments/rac/vm-env.sh
✅ Files skipped from review due to trivial changes (1)
- tests/sql/environments/rac/vm-env.sh
```sh
# Verify checkpoint resume
POST_RESTART_SCN=$(_read_checkpoint_scn)
if [[ "$PRE_KILL_SCN" != "0" && "$POST_RESTART_SCN" != "0" ]]; then
  echo "  Checkpoint resume: pre-kill=$PRE_KILL_SCN, post-restart=$POST_RESTART_SCN"
  if [[ "$POST_RESTART_SCN" -ge "$PRE_KILL_SCN" ]]; then
    echo "  PASS: resumed from checkpoint"
    CHECKPOINT_VERIFIED=$(( CHECKPOINT_VERIFIED + 1 ))
  fi
```
This checkpoint check can pass without proving a real resume.
POST_RESTART_SCN is read immediately after _launch_olr, before OLR has written any new state. In the normal case that just rereads the same checkpoint JSON that produced PRE_KILL_SCN, so POST_RESTART_SCN >= PRE_KILL_SCN can succeed even if the restarted process actually reopened from an earlier SCN. Please validate against the startup SCN OLR really uses/reports, not the unchanged checkpoint file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/debezium/checkpoint-restart-test.sh` around lines 369 - 376, The
current check compares PRE_KILL_SCN to POST_RESTART_SCN read from the checkpoint
file immediately after _launch_olr, which can be unchanged and give a false
positive; instead, after calling _launch_olr wait for OLR to report its actual
startup SCN (e.g., poll OLR’s status/metrics endpoint or read the startup SCN
from OLR logs) and compare that reported STARTUP_SCN to PRE_KILL_SCN. Replace
the immediate _read_checkpoint_scn usage for POST_RESTART_SCN with a function
that queries the running OLR instance for the SCN it actually opened (or waits
until the checkpoint file is updated by OLR), and use that reported value in the
comparison to ensure a real resume occurred.
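One of the suggested remedies, waiting until OLR actually rewrites the checkpoint before comparing, can be sketched standalone. The restarted OLR is simulated by a background job that atomically replaces the file; `_read_checkpoint_scn` and `CHECKPOINT` are illustrative names, not the script's real implementation:

```shell
#!/usr/bin/env bash
# Sketch: avoid the false positive by polling until the checkpoint file
# content changes, instead of reading it immediately after restart.
set -euo pipefail

CHECKPOINT=$(mktemp)
echo '{"scn": 1000}' > "$CHECKPOINT"   # state left behind by the killed OLR
PRE_KILL_SCN=1000

_read_checkpoint_scn() { grep -o '[0-9]\+' "$CHECKPOINT"; }

# Simulated restarted OLR: writes a new checkpoint ~1s later (atomic mv)
( sleep 1
  echo '{"scn": 1042}' > "$CHECKPOINT.tmp"
  mv "$CHECKPOINT.tmp" "$CHECKPOINT" ) &

# Bounded poll: only accept a value once the file has actually advanced
POST_RESTART_SCN=$PRE_KILL_SCN
for _ in $(seq 1 20); do
  POST_RESTART_SCN=$(_read_checkpoint_scn)
  [ "$POST_RESTART_SCN" -gt "$PRE_KILL_SCN" ] && break
  sleep 0.2
done
wait

echo "resume verified at SCN $POST_RESTART_SCN"
```

Only after the poll succeeds is `POST_RESTART_SCN >= PRE_KILL_SCN` evidence of a real resume rather than a reread of the stale file.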
```sh
scp $_SSH_OPTS "$local_path" "${VM_USER}@${VM_HOST}:${staging}"
ssh $_SSH_OPTS "${VM_USER}@${VM_HOST}" "podman cp ${staging} ${node}:${container_path}; rm -f ${staging}"
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Read the file to see the actual code and context
cat -n tests/sql/environments/rac/debezium/checkpoint-restart-test.sh | sed -n '50,65p'
```

🏁 Script executed:

```shell
# Also check if there are other similar patterns in the codebase
rg 'ssh.*\$_SSH_OPTS.*".*&&|;.*rm -f' --multiline -A 2 -B 2
```
🏁 Script executed:

```shell
# Verify the exact SSH exit-status behavior documentation/semantics
cat > /tmp/test_ssh_semantics.sh << 'EOF'
# This tests the claim that SSH returns only the last command's exit status
# when multiple commands are joined by semicolon

# Test 1: What does a local shell do with cmd1; cmd2?
echo "=== Local shell test: false; true ==="
bash -c "false; true"
echo "Exit status: $?"
echo ""
echo "=== Local shell test: true; false ==="
bash -c "true; false"
echo "Exit status: $?"
echo ""
echo "=== Demonstrating the problem ==="
echo "Local behavior with semicolon:"
bash -c "echo 'cmd1 fails' >&2; exit 1; echo 'cleanup'; exit 0"
echo "Exit status after cmd1 fails, cmd2 succeeds: $?"
bash -c "exit 1; exit 0"
echo "Exit status of (exit 1; exit 0): $?"
EOF
chmod +x /tmp/test_ssh_semantics.sh
/tmp/test_ssh_semantics.sh
```
Fix the exit-status masking in _vm_copy_in() that hides failed podman cp operations.
With podman cp ${staging} ${node}:${container_path}; rm -f ${staging}, a failed copy returns a non-zero exit code, but since rm -f unconditionally runs next and succeeds (returning 0), the overall SSH exit status is 0. This masks the copy failure, and the script proceeds with a missing or stale file, causing failures downstream that are harder to debug.
The same pattern exists in tests/sql/scripts/drivers/rac.sh, tests/sql/environments/rac/debezium/run.sh, and tests/sql/environments/rac/debezium/soak-test.sh and should be fixed consistently.
Proposed fix
```diff
- ssh $_SSH_OPTS "${VM_USER}@${VM_HOST}" "podman cp ${staging} ${node}:${container_path}; rm -f ${staging}"
+ ssh $_SSH_OPTS "${VM_USER}@${VM_HOST}" "status=0; podman cp ${staging} ${node}:${container_path} || status=\$?; rm -f ${staging}; exit \$status"
```

📝 Committable suggestion
```sh
scp $_SSH_OPTS "$local_path" "${VM_USER}@${VM_HOST}:${staging}"
ssh $_SSH_OPTS "${VM_USER}@${VM_HOST}" "status=0; podman cp ${staging} ${node}:${container_path} || status=\$?; rm -f ${staging}; exit \$status"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/sql/environments/rac/debezium/checkpoint-restart-test.sh` around lines
55 - 56, The SSH command in _vm_copy_in() currently chains "podman cp ...; rm -f
..." which masks podman cp failures because rm returns 0; change the SSH command
so the podman cp exit code is preserved (either by using a logical AND so rm
runs only on success, or by capturing podman cp's exit status into a variable,
running rm, then exiting with that saved code). Apply this same fix to the
identical patterns in tests/sql/scripts/drivers/rac.sh,
tests/sql/environments/rac/debezium/run.sh, and
tests/sql/environments/rac/debezium/soak-test.sh so podman cp failures are
propagated instead of being masked.
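The masking and the fix can both be demonstrated locally; `false` stands in for a failed `podman cp`, and `sh -c` plays the role of the remote shell that SSH would run:

```shell
#!/usr/bin/env bash
# Demonstrates why "cmd; rm" hides cmd's failure, and how capturing the
# status first preserves it across the cleanup step.
set -uo pipefail   # no -e: we want to inspect the failing statuses

tmp=$(mktemp)
# Masked: the failed copy's non-zero status is replaced by rm's 0
sh -c "false; rm -f $tmp"; masked=$?

tmp=$(mktemp)
# Preserved: capture the copy status, clean up, then exit with it
sh -c "status=0; false || status=\$?; rm -f $tmp; exit \$status"; preserved=$?

echo "masked=$masked preserved=$preserved"
```

The same capture pattern, sent as the remote command string over `ssh`, makes the SSH exit status reflect the `podman cp` result instead of the `rm`.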
```sh
# Verify it resumed from checkpoint, not from start SCN
POST_RESTART_SCN=$(_read_checkpoint_scn)
if [[ "$PRE_KILL_SCN" != "0" && "$POST_RESTART_SCN" != "0" ]]; then
  echo "  Checkpoint resume: pre-kill scn=$PRE_KILL_SCN, post-restart scn=$POST_RESTART_SCN"
  if [[ "$POST_RESTART_SCN" -ge "$PRE_KILL_SCN" ]]; then
    echo "  PASS: OLR resumed from checkpoint (not start SCN)"
    CHECKPOINT_VERIFIED=$(( CHECKPOINT_VERIFIED + 1 ))
```
This resume check is reading persisted state, not the SCN OLR reopened from.
POST_RESTART_SCN comes straight from the checkpoint file immediately after _start_olr. Until OLR writes a new checkpoint, that file still contains the pre-kill SCN, so this branch can increment CHECKPOINT_VERIFIED even when the restarted process actually resumed from some earlier position. Please key this assertion off the startup SCN OLR reports/uses, not the unchanged checkpoint file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/sql/environments/rac/debezium/checkpoint-restart-test.sh` around lines
376 - 382, The current check uses POST_RESTART_SCN from _read_checkpoint_scn
which reads the persisted file (unchanged until OLR writes a new checkpoint) so
it can falsely pass; instead capture the SCN OLR actually reported/used at
startup (e.g., the value _start_olr prints or a helper that queries the running
OLR process) and use that startup SCN for the comparison against PRE_KILL_SCN;
replace the POST_RESTART_SCN = $(_read_checkpoint_scn) usage with retrieving the
OLR-reported startup SCN after calling _start_olr (or add a _get_olr_startup_scn
helper that parses _start_olr output/logs) and then compare that startup SCN to
PRE_KILL_SCN before incrementing CHECKPOINT_VERIFIED.
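A sketch of the suggested `_get_olr_startup_scn` helper, keyed off the restarted process's own log instead of the stale checkpoint file. The log line format shown here is an assumption for illustration, not OLR's actual format, and the helper name comes from the review prompt:

```shell
#!/usr/bin/env bash
# Sketch: parse the SCN the restarted OLR reports at startup from its log.
set -euo pipefail

OLR_LOG=$(mktemp)
cat > "$OLR_LOG" <<'EOF'
INFO  startup
INFO  resuming replication from scn: 7734021
EOF

_get_olr_startup_scn() {
  # Take the last reported startup SCN; format of the log line is assumed
  grep -o 'from scn: [0-9]*' "$OLR_LOG" | awk '{print $3}' | tail -1
}

PRE_KILL_SCN=7734000
STARTUP_SCN=$(_get_olr_startup_scn)
STARTUP_SCN=${STARTUP_SCN:-0}   # default 0 if nothing logged yet

if [ "$STARTUP_SCN" != "0" ] && [ "$STARTUP_SCN" -ge "$PRE_KILL_SCN" ]; then
  echo "PASS: OLR reopened at $STARTUP_SCN (>= pre-kill $PRE_KILL_SCN)"
fi
```

Comparing this reported startup SCN (rather than the unchanged checkpoint file) against `PRE_KILL_SCN` is what makes incrementing `CHECKPOINT_VERIFIED` meaningful.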
- Extract VM IP discovery into shared vm-env.sh sourced by all RAC scripts (fixes major: standalone scripts bypassed up.sh discovery)
- Fix _read_checkpoint_scn double output when volume doesn't exist
- Add CHKPT_RESULT to single-instance exit code (consistency with RAC)
- Use head -1 for virsh IP detection (handles multiple IPs)
Summary
- Removed `flags: 4` (upstream ADAPTIVE_SCHEMA bug), added checkpoint intervals
- `up.sh` auto-detects VM IP via `virsh domifaddr` and fails loudly if config files have stale IPs

Test plan
Summary by CodeRabbit
Tests
Chores