
refactor: migrate chain-index from QMDB to SQLite #10

Merged
tac0turtle merged 4 commits into main from swarm/worker-1 on Feb 21, 2026

Conversation


@tac0turtle tac0turtle commented Feb 21, 2026

Summary

  • Replace PersistentChainIndex's QMDB KV backend with SQLite (WAL mode, proper relational schema)
  • PersistentChainIndex is no longer generic over S: Storage — concrete struct with Mutex<Connection>
  • Proper relational tables (blocks, transactions, receipts, logs, metadata) with indexes replace JSON-serialized KV pairs
  • Chunking workaround (TX_HASH_CHUNK_SIZE, LOG_CHUNK_SIZE) eliminated — SQLite handles large values natively
  • All 3 construction sites updated (node/lib.rs x2, evd/main.rs x1)
  • ChainIndex trait interface unchanged — all consumers compile without modification
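
The relational layout and WAL setup summarized above can be sketched with Python's stdlib sqlite3 (the actual implementation is Rust/rusqlite). Only the table names and the two logs indexes are taken from this PR; the column definitions are illustrative assumptions, and the receipts/metadata tables are omitted for brevity:

```python
import sqlite3

# Minimal sketch of a relational chain-index schema. Columns are illustrative,
# not the PR's actual schema; receipts and metadata tables are omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA journal_mode=WAL;  -- for on-disk DBs: readers don't block the writer
    CREATE TABLE IF NOT EXISTS blocks (
        number INTEGER PRIMARY KEY,
        hash   BLOB NOT NULL
    );
    CREATE TABLE IF NOT EXISTS transactions (
        hash         BLOB PRIMARY KEY,
        block_number INTEGER NOT NULL,
        tx_index     INTEGER NOT NULL,
        FOREIGN KEY (block_number) REFERENCES blocks(number)
    );
    CREATE INDEX IF NOT EXISTS idx_tx_block ON transactions(block_number);
    CREATE TABLE IF NOT EXISTS logs (
        block_number     INTEGER NOT NULL,
        transaction_hash BLOB NOT NULL,
        address          BLOB NOT NULL,
        data             BLOB NOT NULL
    );
    CREATE INDEX IF NOT EXISTS idx_logs_block ON logs(block_number);
    CREATE INDEX IF NOT EXISTS idx_logs_address ON logs(address);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # → ['blocks', 'logs', 'transactions']
```

Because each transaction and log is its own row, values of any size fit naturally, which is what makes the old chunking constants unnecessary.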

Files changed

  • Cargo.toml — added rusqlite to workspace deps
  • crates/rpc/chain-index/Cargo.toml — added rusqlite (bundled), removed evolve_storage
  • crates/rpc/chain-index/src/index.rs — full rewrite of PersistentChainIndex
  • crates/rpc/chain-index/src/error.rs — added Sqlite error variant
  • crates/app/node/src/lib.rs — updated construction + type annotations
  • bin/evd/src/main.rs — updated construction

Test plan

  • All 13 chain-index unit + proptest tests pass
  • just quality passes (fmt + clippy zero warnings)
  • just check passes workspace-wide
  • Verify block storage + retrieval round-trip through SQLite
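
The round-trip item above can be sketched as follows, with Python's stdlib sqlite3 standing in for the Rust implementation and a deliberately minimal, illustrative blocks table:

```python
import sqlite3

# Hypothetical store + retrieve round-trip through SQLite; the real test
# exercises PersistentChainIndex, not this simplified table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks (number INTEGER PRIMARY KEY, hash BLOB NOT NULL)")
block_hash = bytes.fromhex("ab" * 32)  # a 32-byte hash, illustrative
conn.execute("INSERT INTO blocks (number, hash) VALUES (?, ?)", (7, block_hash))
conn.commit()
row = conn.execute("SELECT number, hash FROM blocks WHERE number = ?", (7,)).fetchone()
print(row == (7, block_hash))  # → True: stored bytes come back unchanged
```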

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Chain index is now persisted to a SQLite database file, making indexed chain data durable across restarts and available when RPC or block indexing is enabled.
  • Chores

    • Added SQLite-related dependencies and updated startup to initialize optional file-backed chain indexing when configured.
  • Tests

    • Added recorded test seeds to reproduce and re-run previous proptest failures.

Replace PersistentChainIndex's QMDB KV storage with a SQLite database.
Key changes:

- Remove evolve_storage dependency from chain-index crate; add rusqlite
  with bundled feature to workspace and chain-index Cargo.toml
- Rewrite PersistentChainIndex in src/index.rs:
  - No longer generic over S: Storage; owns a Mutex<Connection>
  - Constructor: PersistentChainIndex::new(db_path) for on-disk
  - Constructor: PersistentChainIndex::in_memory() for tests
  - Schema: proper relational tables (blocks, transactions, receipts,
    logs, metadata) with indexes; WAL mode for concurrency
  - store_block uses a single SQLite transaction for atomicity
  - Read methods use SQL queries instead of KV lookups
  - Re-indexing same block number correctly deletes old rows first
- Eliminate TX_HASH_CHUNK_SIZE / LOG_CHUNK_SIZE chunking workaround
- Add Sqlite(String) variant + From<rusqlite::Error> to error.rs
- Update tests to use PersistentChainIndex::in_memory() instead of
  MockStorage; add tempfile dev-dependency for persistence test
- Update node/src/lib.rs (2 sites): derive chain-index.sqlite path from
  data_dir; change type annotations from PersistentChainIndex<S> to
  PersistentChainIndex
- Update bin/evd/src/main.rs (1 site): same pattern
- All 13 chain-index tests pass; just quality passes (fmt + clippy)
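
The atomic re-index behavior described above (delete the old rows for a block number, then insert the new ones, inside a single transaction) can be sketched like this; Python's stdlib sqlite3 stands in for rusqlite and the table shape is illustrative:

```python
import sqlite3

# Sketch of re-indexing the same block number atomically: old child rows are
# deleted and new ones inserted in one transaction, so readers never observe
# a half-updated block. (Illustrative; the real code uses rusqlite.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (hash BLOB PRIMARY KEY, block_number INTEGER NOT NULL);
    CREATE INDEX idx_tx_block ON transactions(block_number);
""")

def store_block(number: int, tx_hashes: list[bytes]) -> None:
    with conn:  # BEGIN ... COMMIT, or ROLLBACK if anything raises
        conn.execute("DELETE FROM transactions WHERE block_number = ?", (number,))
        conn.executemany(
            "INSERT INTO transactions (hash, block_number) VALUES (?, ?)",
            [(h, number) for h in tx_hashes],
        )

store_block(5, [b"\x01" * 32, b"\x02" * 32])
store_block(5, [b"\x03" * 32])  # re-index: the two old rows are replaced
count = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE block_number = 5").fetchone()[0]
print(count)  # → 1
```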

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes


This pull request fundamentally re-architects the PersistentChainIndex by replacing its underlying data storage mechanism. Previously relying on a generic Key-Value store, it now leverages SQLite to manage blockchain data. This change introduces a structured relational database schema, eliminating the need for data chunking and simplifying data access patterns, while maintaining the existing public API for seamless integration with other components.

Highlights

  • Storage Backend Migration: The PersistentChainIndex has been migrated from using a QMDB Key-Value backend to a SQLite database, providing a proper relational schema for chain data.
  • Generic Type Removal: The PersistentChainIndex is no longer generic over a storage type (S: Storage), now directly using a concrete Mutex<Connection> for SQLite access.
  • Relational Schema Implementation: Chain data (blocks, transactions, receipts, logs, metadata) is now stored in dedicated relational tables with appropriate indexes, replacing the previous JSON-serialized KV pairs.
  • Chunking Logic Elimination: Previous workarounds for large values, such as TX_HASH_CHUNK_SIZE and LOG_CHUNK_SIZE for chunking data, have been removed as SQLite natively handles larger data sizes.
  • Construction Site Updates: All three instances where PersistentChainIndex is constructed (two in node/lib.rs and one in evd/main.rs) have been updated to reflect the new SQLite-based constructor.
  • API Compatibility: The public ChainIndex trait interface remains unchanged, ensuring that all existing consumers of the chain index compile without modifications.


Changelog
  • Cargo.toml
    • Added rusqlite as a workspace dependency with the bundled feature.
  • bin/evd/src/main.rs
    • Updated the instantiation of PersistentChainIndex to use a SQLite database path instead of the generic storage interface.
  • crates/app/node/src/lib.rs
    • Modified run_dev_node_with_rpc and run_dev_node_with_rpc_and_mempool_eth functions to define a SQLite database path for the chain index.
    • Updated PersistentChainIndex instantiation to use the new SQLite constructor and removed the generic type parameter S from DevConsensus.
  • crates/rpc/chain-index/Cargo.toml
    • Removed the evolve_storage dependency.
    • Added rusqlite as a dependency, inheriting from the workspace definition.
    • Added tempfile as a dev-dependency for testing purposes.
  • crates/rpc/chain-index/proptest-regressions/index.txt
    • Added a new file to store proptest regression seeds, ensuring past failure cases are re-tested.
  • crates/rpc/chain-index/src/error.rs
    • Added a new Sqlite error variant to ChainIndexError.
    • Implemented From<rusqlite::Error> for ChainIndexError to simplify error handling.
  • crates/rpc/chain-index/src/index.rs
    • Completely refactored PersistentChainIndex to use rusqlite for database interactions.
    • Removed the evolve_storage dependency and related key generation logic.
    • Changed PersistentChainIndex to be a concrete struct with a Mutex<Connection> instead of being generic over a storage trait.
    • Introduced new and in_memory constructors for PersistentChainIndex that initialize a SQLite connection.
    • Added an init_schema method to create and configure SQLite tables (blocks, transactions, receipts, logs, metadata) with appropriate indexes and foreign key constraints.
    • Rewrote all data access methods (get_block, get_transaction, store_block, etc.) to perform SQL queries and operations.
    • Removed TX_HASH_CHUNK_SIZE and LOG_CHUNK_SIZE constants, as data chunking is no longer necessary with SQLite.
    • Updated test utilities and proptest integration to use in-memory SQLite instances and tempfile for persistence tests, removing the MockStorage.
Activity
  • The pull request description was generated with Claude Code.


coderabbitai bot commented Feb 21, 2026

📝 Walkthrough

Walkthrough

Replaces the generic Storage-backed chain index with a SQLite-backed PersistentChainIndex, adds rusqlite (v0.38) and related dependencies, updates initialization paths to use a file-based DB (data_dir/chain-index.sqlite) and makes the chain index optional where not required. New SQLite error mapping added.

Changes

  • Workspace dependency (Cargo.toml): add workspace dependency rusqlite = { version = "0.38", features = ["bundled"] }.
  • Chain-index crate manifest (crates/rpc/chain-index/Cargo.toml): remove evolve_storage.workspace = true; add rusqlite.workspace = true, r2d2 = "0.8", and r2d2_sqlite = "0.32"; add tempfile = "3" to dev-dependencies.
  • Core index implementation (crates/rpc/chain-index/src/index.rs): major refactor replacing the generic Storage-backed PersistentChainIndex<S> with a SQLite-backed PersistentChainIndex using a writer Mutex<Connection> and a read pool; new constructors new(db_path) and in_memory(); schema initialization, SQL-backed get/store methods, a cache, and connection helpers; legacy key-space removed.
  • Error mapping (crates/rpc/chain-index/src/error.rs): add a Sqlite(String) variant to ChainIndexError and a From<rusqlite::Error> conversion.
  • Node / binary integration (bin/evd/src/main.rs, crates/app/node/src/lib.rs): build the data_dir/chain-index.sqlite path and construct PersistentChainIndex::new(path); the chain index becomes Option<Arc<PersistentChainIndex>> and is created only when RPC or indexing is enabled; callbacks guard on the optional index.
  • Tests / regressions (crates/rpc/chain-index/proptest-regressions/index.txt): add a proptest regression seeds file to capture reproducible failure cases.
  • Other: various API surface updates, including added initialize() and cache() accessors, removal of storage constants and the keys module, and tests adapted to the new in_memory and DB-backed behavior.

Sequence Diagram

sequenceDiagram
    participant App as Application / Node
    participant RPC as RPC Server
    participant Chain as PersistentChainIndex
    participant DB as SQLite DB

    App->>Chain: new(db_path) / in_memory()
    activate Chain
    Chain->>DB: open writer conn & setup read pool
    Chain->>DB: init_schema() (if empty)
    Chain->>Chain: build cache from metadata
    deactivate Chain

    App->>RPC: start (requires chain index)
    RPC->>Chain: request state / get_block(...)
    activate Chain
    Chain->>Chain: check cache
    alt cache miss
        Chain->>DB: SELECT blocks/txs/receipts/logs
        DB-->>Chain: rows
        Chain->>Chain: map rows -> Stored structs
    end
    Chain-->>RPC: response
    deactivate Chain

    App->>Chain: store_block(block) [on execution]
    activate Chain
    Chain->>DB: BEGIN
    Chain->>DB: INSERT/UPDATE blocks, transactions, receipts, logs
    Chain->>DB: COMMIT
    Chain->>Chain: update cache
    deactivate Chain

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 60.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change (migrating the chain-index backend from QMDB to SQLite), which is the primary focus of all modified files and the PR objectives.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The migration of chain-index from QMDB to SQLite is a significant improvement, providing a proper relational schema and eliminating the need for manual value chunking. The new implementation is cleaner and leverages SQLite's native capabilities for indexing and WAL mode. I have identified a few missing indexes that could lead to performance issues as the chain grows, and I recommend using standard transactions instead of unchecked_transaction for better safety and error reporting.

FOREIGN KEY (block_number) REFERENCES blocks(number)
);
CREATE INDEX IF NOT EXISTS idx_logs_block ON logs(block_number);
CREATE INDEX IF NOT EXISTS idx_logs_address ON logs(address);

high

The logs table is missing an index on transaction_hash. This will cause load_logs_for_tx (used by get_receipt) to perform a full table scan, which will become increasingly slow as the database grows.

Suggested change
 CREATE INDEX IF NOT EXISTS idx_logs_address ON logs(address);
+CREATE INDEX IF NOT EXISTS idx_logs_tx ON logs(transaction_hash);
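
The effect of the suggested index can be seen directly in SQLite's EXPLAIN QUERY PLAN output: the transaction_hash lookup switches from a full-table scan to an index search. This is a stdlib-sqlite3 sketch with a simplified, illustrative logs table:

```python
import sqlite3

# Demonstrate the plan change the review comment is about: without an index
# on transaction_hash the lookup is a SCAN; with idx_logs_tx it is a SEARCH.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (block_number INTEGER, transaction_hash BLOB, address BLOB)")

def plan(sql: str) -> str:
    # Concatenate the 'detail' column of each EXPLAIN QUERY PLAN row.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM logs WHERE transaction_hash = x'00'"
before = plan(query)  # no index yet: plan contains "SCAN logs"
conn.execute("CREATE INDEX idx_logs_tx ON logs(transaction_hash)")
after = plan(query)   # now the plan searches using idx_logs_tx
print(before)
print(after)
```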

status INTEGER NOT NULL,
tx_type INTEGER NOT NULL,
FOREIGN KEY (block_number) REFERENCES blocks(number)
);

medium

The receipts table is missing an index on block_number. This will cause the DELETE FROM receipts WHERE block_number = ? operation in store_block to perform a full table scan.

Suggested change
 FOREIGN KEY (block_number) REFERENCES blocks(number)
 );
+CREATE INDEX IF NOT EXISTS idx_receipts_block ON receipts(block_number);

})?,
});
let conn = self.connection.lock().unwrap();
let tx = conn.unchecked_transaction()?;

medium

Using unchecked_transaction() is generally discouraged unless you have a specific performance reason to skip the autocommit check. Since the connection is already protected by a Mutex, using the standard transaction() method is safer and provides better error reporting if a transaction is unexpectedly already active.

Suggested change
-let conn = self.connection.lock().unwrap();
-let tx = conn.unchecked_transaction()?;
+let mut conn = self.connection.lock().unwrap();
+let tx = conn.transaction()?;

Comment on lines +526 to +527
for receipt in &receipts {
ops.push(evolve_storage::Operation::Set {
key: keys::tx_receipt_key(&receipt.transaction_hash),
value: serde_json::to_vec(receipt)?,
});
}

// Store logs by block in chunked format to avoid oversize values.
if !all_logs.is_empty() {
let log_chunks: Vec<Vec<StoredLog>> = all_logs
.chunks(LOG_CHUNK_SIZE)
.map(|c| c.to_vec())
.collect();
ops.push(evolve_storage::Operation::Set {
key: keys::logs_by_block_chunk_count_key(block_number),
value: (log_chunks.len() as u32).to_be_bytes().to_vec(),
});
for (idx, chunk) in log_chunks.iter().enumerate() {
ops.push(evolve_storage::Operation::Set {
key: keys::logs_by_block_chunk_key(block_number, idx as u32),
value: serde_json::to_vec(chunk)?,
});
insert_receipt(&tx, receipt)?;

medium

For consistency with how transactions are stored, receipts should also use the loop index as the authoritative transaction_index. Currently, insert_transaction uses the loop index while insert_receipt uses the field from the struct, which could lead to inconsistencies if the input vectors are not perfectly aligned or if the struct field is incorrect.

        // Insert receipts and logs
        for (idx, receipt) in receipts.iter().enumerate() {
            insert_receipt(&tx, receipt, idx as i64)?;

data: RwLock<HashMap<Vec<u8>, Vec<u8>>>,
commit_count: RwLock<u64>,
}
fn insert_receipt(tx: &rusqlite::Transaction<'_>, receipt: &StoredReceipt) -> ChainIndexResult<()> {

medium

Update insert_receipt to accept an explicit transaction_index to ensure consistency with the transactions table.

fn insert_receipt(
    tx: &rusqlite::Transaction<'_>,
    receipt: &StoredReceipt,
    array_index: i64,
) -> ChainIndexResult<()> {

VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
params![
receipt.transaction_hash.as_slice(),
receipt.transaction_index as i64,

medium

Use the provided array_index instead of the struct field.

            receipt.transaction_hash.as_slice(),
            array_index,


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/evd/src/main.rs (1)

17-18: ⚠️ Potential issue | 🟡 Minor

Stale architecture diagram — still references "QMDB Storage" for the chain index.

The ASCII diagram shows the chain index backed by "QMDB Storage", but it now uses a SQLite file (chain-index.sqlite). QMDB is only the STF storage.

📝 Suggested update
-//!  │  │  QMDB  │          │
-//!  │  │Storage │          │
+//!  │  │ SQLite │          │
+//!  │  │ Index  │          │
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/evd/src/main.rs` around lines 17 - 18, The ASCII architecture diagram
comment in main.rs still labels the chain index as "QMDB Storage"; update that
diagram to reflect the current design by replacing the "QMDB Storage" label with
"SQLite" or "chain-index.sqlite" and ensure QMDB is only referenced as the STF
storage; locate the ASCII block containing the "Client", "RPC Server", and "QMDB
Storage" labels in main.rs and change the label text to accurately state "SQLite
(chain-index.sqlite)" or similar.
🧹 Nitpick comments (5)
crates/rpc/chain-index/src/index.rs (4)

481-483: unchecked_transaction — add a safety comment explaining the choice.

Without context, this looks like a shortcut. The reason is that MutexGuard<Connection> only gives &Connection, making the normal transaction() (which requires &mut Connection) inaccessible; unchecked_transaction is the correct workaround and is safe here because conn and tx share the same stack frame.

📝 Suggested comment
-        let tx = conn.unchecked_transaction()?;
+        // `unchecked_transaction` is required here because `conn` is a `MutexGuard<Connection>`
+        // which only provides `&Connection`. The safety invariant (transaction does not outlive
+        // the connection) is upheld because both `conn` and `tx` live in this stack frame.
+        let tx = conn.unchecked_transaction()?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/rpc/chain-index/src/index.rs` around lines 481 - 483, Add a brief
safety comment above the unchecked_transaction call explaining why
unchecked_transaction is used instead of transaction(): note that conn is a
MutexGuard<Connection> (from self.connection.lock().unwrap()) which provides
only &Connection so transaction() (which requires &mut Connection) is
inaccessible, and that unchecked_transaction is safe here because conn and tx
are tied to the same stack frame and the MutexGuard prevents concurrent access;
reference the symbols conn, tx, self.connection.lock(), MutexGuard<Connection>,
transaction(), and unchecked_transaction in the comment for clarity.

104-181: Missing index on receipts.block_number — full-table scan during re-indexing.

DELETE FROM receipts WHERE block_number = ? (called in store_block on every re-indexed block) has no index to cover the block_number predicate. For large receipt tables this degrades to a full scan. All other child tables (transactions, logs) have idx_tx_block / idx_logs_block covering this exact pattern.

⚡ Add the missing index
             CREATE TABLE IF NOT EXISTS receipts (
                 ...
+                FOREIGN KEY (block_number) REFERENCES blocks(number)
             );
+            CREATE INDEX IF NOT EXISTS idx_receipts_block ON receipts(block_number);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/rpc/chain-index/src/index.rs` around lines 104 - 181, The receipts
table lacks an index on block_number which causes DELETE FROM receipts WHERE
block_number = ? (called in store_block) to do full-table scans; update
init_schema to add a CREATE INDEX IF NOT EXISTS idx_receipts_block ON
receipts(block_number) inside the SQL batch that creates the receipts table so
the delete in store_block is covered by an index.

104-105: lock().unwrap() propagates mutex-poison panics everywhere.

std::sync::Mutex poisons on thread panic. Any panic while the lock is held (e.g., from a B256::from_slice panic identified above) renders every subsequent lock().unwrap() call a panic too, making the chain index permanently unusable without a node restart.

parking_lot::Mutex is already a direct dependency and never poisons — replacing the one import is the simplest fix.

♻️ Swap to parking_lot::Mutex
-use std::sync::Mutex;
+use parking_lot::Mutex;

parking_lot::Mutex::lock() returns the guard directly (no Result), so all call sites simply become:

-let conn = self.connection.lock().unwrap();
+let conn = self.connection.lock();

Also applies to: 184-184, 314-314, 351-351, 366-366, 386-386, 406-406, 428-428, 452-452, 481-481

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/rpc/chain-index/src/index.rs` around lines 104 - 105, The code
currently uses std::sync::Mutex and calls self.connection.lock().unwrap() (e.g.,
in init_schema and other methods), which will poison on thread panic; replace
the std::sync::Mutex type with parking_lot::Mutex and update the import so the
field (connection) uses parking_lot::Mutex, then remove the .unwrap() calls and
call .lock() directly (parking_lot::Mutex::lock returns the guard), and apply
the same swap for all other occurrences listed (lines referencing
lock().unwrap() such as in init_schema and the occurrences around the other
reported locations). Ensure signatures/types for the connection field and any
MutexGuard usages are adjusted to parking_lot equivalents.

365-379: get_block_transactions is the only read method that never consults the cache.

All other hot-path readers (get_block, get_transaction, get_receipt) return cached results; get_block_transactions always acquires the mutex and hits SQLite, even for blocks that were just stored and are fully in-memory. Consider checking the block cache (or a dedicated tx-hashes cache entry) before falling through to the DB.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/rpc/chain-index/src/index.rs` around lines 365 - 379,
get_block_transactions always queries SQLite instead of checking the in-memory
cache; update the get_block_transactions function to first consult the existing
block cache (or a dedicated transaction-hashes cache) for the given block number
and return the cached Vec<B256> if present, only acquiring the DB mutex and
running the SQL when the cache miss occurs. Locate get_block_transactions and
reuse the same cache lookup logic used by get_block/get_transaction/get_receipt
(e.g., check the cache map or call the internal cache getter), clone or
otherwise return an owned Vec<B256> from the cached entry, and only fall back to
the current conn.prepare/query_map path on cache miss, ensuring thread-safety
consistent with the other reader methods.
Cargo.toml (1)

95-95: Operational note: bundled feature increases compile time and binary size.

bundled causes an embedded, up-to-date SQLite to be compiled from source, which avoids system-library drift; the rusqlite docs call it "the right choice for most programs that control their own SQLite databases", while also noting it's not ideal for generic libraries. For this application binary that's the correct call, but CI/CD cold-build times will increase noticeably since it compiles the full SQLite C library.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Cargo.toml` at line 95, The dependency line enables rusqlite with the
"bundled" feature which compiles SQLite from source and increases CI cold-build
time and binary size; either remove the "bundled" feature from the rusqlite
dependency in Cargo.toml to use the system SQLite (rusqlite = "0.32") or make
"bundled" an optional feature (add a [features] section, e.g. feature name like
"bundled-sqlite" that enables rusqlite = { version = "0.32", features =
["bundled"] }) and update CI/workflows to enable that feature only where you
need the bundled build; refer to the rusqlite dependency entry and the
Cargo.toml [features] section to implement the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/evd/src/main.rs`:
- Around line 219-228: The PersistentChainIndex::new(...) call and chain_index
initialization should be performed only when the chain index is needed (i.e.,
when config.rpc.enabled or enable_block_indexing is true) to avoid creating the
SQLite file unnecessarily; move the construction and the
chain_index.initialize() call inside the guard that checks config.rpc.enabled ||
enable_block_indexing, make chain_index an Option<Arc<PersistentChainIndex>> (or
otherwise conditionally create it) so downstream code accesses it via if let
Some(chain_index) = chain_index { ... }, and remove the unconditional creation
at the top so the DB file is only created when RPC or block indexing is enabled.

In `@Cargo.toml`:
- Line 95: The Cargo.toml currently pins rusqlite = "0.32" which jumps several
breaking releases; either keep the existing 0.32.x pin or upgrade to 0.38.0
intentionally and then update code to accommodate breaking changes: audit all
uses of rusqlite::Connection::{execute, prepare} to split trailing/multiple
statements, adjust any code relying on u64/usize ToSql/FromSql (enable the
fallible_uint feature if you want automatic impls), remove or replace any use of
the removed release_memory feature, and update hook registration sites to pass
owned Connection where required; also add or adjust Cargo feature flags (e.g.,
fallible_uint) in the rusqlite dependency specification as needed.

In `@crates/rpc/chain-index/src/index.rs`:
- Around line 197-298: The row mappers (row_to_stored_block,
row_to_stored_transaction, row_to_stored_receipt) call B256::from_slice and
Address::from_slice unconditionally and will panic on wrong-length blobs; fix by
validating byte vector lengths before calling those constructors (e.g., ensure
32 bytes for B256 fields and 20 bytes for Address fields), and if a length is
invalid return a rusqlite::Result::Err with a descriptive rusqlite::Error
(propagating a recoverable error instead of panicking) so invalid DB rows are
handled safely.

---

Outside diff comments:
In `@bin/evd/src/main.rs`:
- Around line 17-18: The ASCII architecture diagram comment in main.rs still
labels the chain index as "QMDB Storage"; update that diagram to reflect the
current design by replacing the "QMDB Storage" label with "SQLite" or
"chain-index.sqlite" and ensure QMDB is only referenced as the STF storage;
locate the ASCII block containing the "Client", "RPC Server", and "QMDB Storage"
labels in main.rs and change the label text to accurately state "SQLite
(chain-index.sqlite)" or similar.

---

Nitpick comments:
In `@Cargo.toml`:
- Line 95: The dependency line enables rusqlite with the "bundled" feature which
compiles SQLite from source and increases CI cold-build time and binary size;
either remove the "bundled" feature from the rusqlite dependency in Cargo.toml
to use the system SQLite (rusqlite = "0.32") or make "bundled" an optional
feature (add a [features] section, e.g. feature name like "bundled-sqlite" that
enables rusqlite = { version = "0.32", features = ["bundled"] }) and update
CI/workflows to enable that feature only where you need the bundled build; refer
to the rusqlite dependency entry and the Cargo.toml [features] section to
implement the change.

In `@crates/rpc/chain-index/src/index.rs`:
- Around line 481-483: Add a brief safety comment above the
unchecked_transaction call explaining why unchecked_transaction is used instead
of transaction(): note that conn is a MutexGuard<Connection> (from
self.connection.lock().unwrap()) which provides only &Connection so
transaction() (which requires &mut Connection) is inaccessible, and that
unchecked_transaction is safe here because conn and tx are tied to the same
stack frame and the MutexGuard prevents concurrent access; reference the symbols
conn, tx, self.connection.lock(), MutexGuard<Connection>, transaction(), and
unchecked_transaction in the comment for clarity.
- Around line 104-181: The receipts table lacks an index on block_number which
causes DELETE FROM receipts WHERE block_number = ? (called in store_block) to do
full-table scans; update init_schema to add a CREATE INDEX IF NOT EXISTS
idx_receipts_block ON receipts(block_number) inside the SQL batch that creates
the receipts table so the delete in store_block is covered by an index.
- Around line 104-105: The code currently uses std::sync::Mutex and calls
self.connection.lock().unwrap() (e.g., in init_schema and other methods), which
will be poisoned if a thread panics while holding the lock; replace
std::sync::Mutex with parking_lot::Mutex and update the import so the field
(connection) uses parking_lot::Mutex, then remove the .unwrap() calls and call
.lock() directly (parking_lot::Mutex::lock returns the guard infallibly), and
apply the same swap at every other lock().unwrap() site reported (such as in
init_schema and the other listed locations). Ensure the connection field's type
and any MutexGuard usages are adjusted to their parking_lot equivalents.
- Around line 365-379: get_block_transactions always queries SQLite instead of
checking the in-memory cache; update the get_block_transactions function to
first consult the existing block cache (or a dedicated transaction-hashes cache)
for the given block number and return the cached Vec<B256> if present, only
acquiring the DB mutex and running the SQL when the cache miss occurs. Locate
get_block_transactions and reuse the same cache lookup logic used by
get_block/get_transaction/get_receipt (e.g., check the cache map or call the
internal cache getter), clone or otherwise return an owned Vec<B256> from the
cached entry, and only fall back to the current conn.prepare/query_map path on
cache miss, ensuring thread-safety consistent with the other reader methods.
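The missing-index finding above can be checked empirically. This sketch uses Python's stdlib sqlite3 (rather than rusqlite) against a simplified stand-in for the receipts table, showing the query plan for the DELETE flipping from a full-table scan to an index search once idx_receipts_block exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the receipts table described in the review.
conn.execute(
    "CREATE TABLE receipts ("
    " transaction_hash BLOB PRIMARY KEY,"
    " block_number INTEGER NOT NULL,"
    " data BLOB)"
)

def delete_plan():
    rows = conn.execute(
        "EXPLAIN QUERY PLAN DELETE FROM receipts WHERE block_number = ?", (7,)
    ).fetchall()
    return " ".join(r[-1] for r in rows)  # last column holds the plan detail

before = delete_plan()  # full-table scan: no usable index yet
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_receipts_block ON receipts(block_number)"
)
after = delete_plan()   # now a SEARCH using idx_receipts_block

print(before)
print(after)
```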
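A minimal sketch of the cache-then-database read path suggested for get_block_transactions, again using Python's stdlib sqlite3 with a plain dict standing in for the crate's cache type (all names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " hash TEXT, block_number INTEGER, transaction_index INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("0xaa", 1, 0), ("0xbb", 1, 1), ("0xcc", 2, 0)],
)

tx_hash_cache = {}  # block_number -> list of tx hashes (stand-in for the block cache)

def get_block_transactions(number):
    # Consult the cache first and return an owned copy on a hit.
    cached = tx_hash_cache.get(number)
    if cached is not None:
        return list(cached)
    # Cache miss: fall back to the ordered SQL query.
    rows = conn.execute(
        "SELECT hash FROM transactions WHERE block_number = ?"
        " ORDER BY transaction_index",
        (number,),
    ).fetchall()
    hashes = [h for (h,) in rows]
    tx_hash_cache[number] = hashes
    return list(hashes)

print(get_block_transactions(1))  # DB hit, result then cached
print(get_block_transactions(1))  # served from the cache
```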

tac0turtle and others added 3 commits February 21, 2026 12:20
- Add missing indexes: idx_logs_tx on logs(transaction_hash),
  idx_receipts_block on receipts(block_number)
- Use transaction() instead of unchecked_transaction() for safer
  transaction handling
- Pass array_index to insert_receipt for consistent transaction_index
  with insert_transaction
- Add length-validated b256_from_row/address_from_row helpers to
  prevent panics on corrupt BLOB data
- Conditionally create chain index only when RPC or block indexing
  is enabled
- Upgrade rusqlite from 0.32 to 0.38

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The single Mutex serialized all reads behind the write lock, defeating
SQLite WAL mode's concurrent reader capability. Split into:
- r2d2 read pool (4 connections): concurrent RPC queries
- Dedicated writer Mutex<Connection>: serialized store_block

In-memory tests use SQLite shared-cache URI so reader pool and writer
see the same data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
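The shared-cache arrangement used for the in-memory tests can be reproduced with Python's stdlib sqlite3: two connections opened against the same `file:...?mode=memory&cache=shared` URI see one database, so a "writer" connection and a "reader" connection observe the same rows. This is a sketch of the mechanism, not the crate's actual pool code:

```python
import sqlite3

URI = "file:chain_index_test?mode=memory&cache=shared"

# Stand-ins for the dedicated writer connection and one read-pool connection.
writer = sqlite3.connect(URI, uri=True)
reader = sqlite3.connect(URI, uri=True)

writer.execute("CREATE TABLE blocks (number INTEGER PRIMARY KEY, hash BLOB)")
writer.execute("INSERT INTO blocks VALUES (?, ?)", (1, b"\x11" * 32))
writer.commit()

# The reader sees the writer's committed data because both share one cache.
row = reader.execute("SELECT hash FROM blocks WHERE number = 1").fetchone()
print(row[0] == b"\x11" * 32)
```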
@tac0turtle tac0turtle merged commit f5d6c5d into main Feb 21, 2026
4 of 6 checks passed
@tac0turtle tac0turtle deleted the swarm/worker-1 branch February 21, 2026 11:40

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/rpc/chain-index/src/index.rs (1)

584-611: ⚠️ Potential issue | 🟡 Minor

Cache and DB can disagree on transaction_index when the input field differs from array position.

insert_transaction and insert_receipt correctly store array_index as transaction_index in the DB (the intended authoritative value), but self.cache.insert_block_with_data(block, transactions, receipts) stores the original structs, which carry whatever transaction_index was set on them before the call.

A caller that hits the cache (warm) observes transaction_index = original field value; after a cache eviction (cold) the DB-loaded struct contains transaction_index = array position. If build_index_data ever produces transactions/receipts where those two values differ, get_transaction(..).transaction_index becomes time-dependent.

get_transaction_location always queries the DB and is immune, but get_transaction / get_receipt are not.

Consider normalizing the transaction_index field in the structs before inserting them into the cache:

♻️ Proposed fix
+        // Normalise transaction_index to array position before caching, matching what
+        // the DB persists via insert_transaction / insert_receipt.
+        let transactions: Vec<StoredTransaction> = transactions
+            .into_iter()
+            .enumerate()
+            .map(|(i, mut tx)| { tx.transaction_index = i as u32; tx })
+            .collect();
+        let receipts: Vec<StoredReceipt> = receipts
+            .into_iter()
+            .enumerate()
+            .map(|(i, mut r)| { r.transaction_index = i as u32; r })
+            .collect();
+
         tx.commit()?;
         self.cache.insert_block_with_data(block, transactions, receipts);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/rpc/chain-index/src/index.rs` around lines 584 - 611, The cache is
being populated with the original transaction/receipt structs (via
self.cache.insert_block_with_data(block, transactions, receipts)) which can
contain a different transaction_index than the authoritative array index stored
in the DB by insert_transaction/insert_receipt, causing cache/DB discrepancies
for get_transaction/get_receipt; fix by normalizing the transaction_index on the
structs before caching: when you enumerate transactions and receipts (the same
idx used for insert_transaction/insert_receipt), produce new/updated transaction
and receipt objects with transaction_index set to idx as i64 and pass those
normalized vectors to self.cache.insert_block_with_data instead of the original
transactions/receipts so cached data matches DB.
🧹 Nitpick comments (3)
bin/evd/src/main.rs (2)

319-343: build_index_data is always called even when indexing is disabled.

BlockBuilder::new()...build() and build_index_data(...) run unconditionally on every committed block, even when chain_index_for_callback is None (neither RPC nor indexing enabled) or callback_indexing_enabled is false. The computed data is then discarded by the if let Some / if callback_indexing_enabled guards. This wastes CPU on every block execution.

♻️ Proposed fix — guard block data construction
-            let block = BlockBuilder::<TxContext>::new()
-                .number(info.height)
-                .timestamp(info.timestamp)
-                .transactions(info.transactions)
-                .build();
-
-            let (stored_block, stored_txs, stored_receipts) =
-                build_index_data(&block, &info.block_result, &metadata);
-
-            if let Some(ref chain_index) = chain_index_for_callback {
-                if callback_indexing_enabled {
-                    if let Err(e) =
-                        chain_index.store_block(stored_block, stored_txs, stored_receipts)
-                    {
-                        tracing::warn!("Failed to index block {}: {:?}", info.height, e);
-                    } else {
-                        tracing::debug!(
-                            "Indexed block {} (hash={}, state_root={})",
-                            info.height,
-                            block_hash,
-                            state_root
-                        );
-                    }
-                }
-            }
+            if let Some(ref chain_index) = chain_index_for_callback {
+                if callback_indexing_enabled {
+                    let block = BlockBuilder::<TxContext>::new()
+                        .number(info.height)
+                        .timestamp(info.timestamp)
+                        .transactions(info.transactions)
+                        .build();
+                    let (stored_block, stored_txs, stored_receipts) =
+                        build_index_data(&block, &info.block_result, &metadata);
+                    if let Err(e) =
+                        chain_index.store_block(stored_block, stored_txs, stored_receipts)
+                    {
+                        tracing::warn!("Failed to index block {}: {:?}", info.height, e);
+                    } else {
+                        tracing::debug!(
+                            "Indexed block {} (hash={}, state_root={})",
+                            info.height,
+                            block_hash,
+                            state_root
+                        );
+                    }
+                }
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/evd/src/main.rs` around lines 319 - 343, The block construction and
indexing data creation are done unconditionally; to avoid wasted CPU, only build
the Block (via BlockBuilder::<TxContext>::new()...build()) and call
build_index_data(...) when indexing is actually enabled: i.e., when
chain_index_for_callback.is_some() and callback_indexing_enabled is true. Move
the BlockBuilder/Block creation and the subsequent let (stored_block,
stored_txs, stored_receipts) = build_index_data(...) inside that same guard,
then call chain_index.store_block(...) and keep the existing tracing::warn /
tracing::debug paths inside that branch so you don’t reference variables (block,
stored_*) outside their scope. Ensure block_hash and state_root used in the
debug log are available inside the guarded block or recomputed there.

10-21: Module diagram omits the new SQLite chain-index component.

The ASCII-art architecture diagram still only shows the QMDB storage component. Now that a SQLite-backed chain index is a separate subsystem, the diagram is incomplete for new contributors trying to understand the node layout.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/evd/src/main.rs` around lines 10 - 21, The ASCII module diagram in the
top-level comment (the large doc-comment block containing the
evd/External/Client/QMDB diagram) omits the new SQLite chain-index subsystem;
update that comment to add a distinct "SQLite chain-index" box (label it
clearly, e.g., "SQLite Chain-Index") alongside or near "QMDB Storage" and show
its JSON-RPC/gRPC connections to evd (and/or RPC Server) so the diagram reflects
the new subsystem; ensure the new label uses the same ASCII-art style and
positioning as existing components so it reads consistently for new
contributors.
crates/rpc/chain-index/src/index.rs (1)

1306-1322: prop_duplicate_block_storage doesn't actually test hash replacement.

Both block1 and block2 are produced by make_test_stored_block(block_number), which sets hash: B256::repeat_byte(number as u8) — identical for the same block_number. So block2_hash == block1.hash, and prop_assert_eq!(retrieved.hash, block2_hash) is trivially satisfied regardless of whether the second write replaced the first. The useful part of the test is the transaction count check; the hash check is a no-op.

Consider mutating block2.hash to something distinct (e.g. B256::repeat_byte(0xFFu8 ^ number as u8)) to actually exercise the INSERT OR REPLACE INTO blocks overwrite path.
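The overwrite behaviour the test is meant to exercise is plain INSERT OR REPLACE semantics, shown here against a minimal blocks table with Python's stdlib sqlite3 (a sketch; the crate's real schema differs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks (number INTEGER PRIMARY KEY, hash BLOB)")

first = bytes([0x05]) * 32          # like B256::repeat_byte(number as u8)
second = bytes([0xFF ^ 0x05]) * 32  # a deliberately distinct second hash

conn.execute("INSERT OR REPLACE INTO blocks VALUES (?, ?)", (5, first))
conn.execute("INSERT OR REPLACE INTO blocks VALUES (?, ?)", (5, second))

# The second write replaced the first: same primary key, new hash, one row.
rows = conn.execute("SELECT hash FROM blocks WHERE number = 5").fetchall()
print(len(rows), rows[0][0] == second)
```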

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/rpc/chain-index/src/index.rs` around lines 1306 - 1322, The test
prop_duplicate_block_storage never verifies replacement because
generate_block_data / make_test_stored_block produce identical hashes for the
same block_number; mutate block2.hash to a distinct value before calling
index.store_block to exercise the "replace" path (e.g. set block2.hash to a
different B256 like B256::repeat_byte(0xFFu8 ^ (block_number as u8)) or any
other deterministic alternative), then call index.store_block(block2,
txs2.clone(), receipts2) and assert
index.get_block(block_number).unwrap().unwrap().hash equals that new value and
that index.get_block_transactions still reflects txs2.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@crates/rpc/chain-index/src/index.rs`:
- Around line 584-611: The cache is being populated with the original
transaction/receipt structs (via self.cache.insert_block_with_data(block,
transactions, receipts)) which can contain a different transaction_index than
the authoritative array index stored in the DB by
insert_transaction/insert_receipt, causing cache/DB discrepancies for
get_transaction/get_receipt; fix by normalizing the transaction_index on the
structs before caching: when you enumerate transactions and receipts (the same
idx used for insert_transaction/insert_receipt), produce new/updated transaction
and receipt objects with transaction_index set to idx as i64 and pass those
normalized vectors to self.cache.insert_block_with_data instead of the original
transactions/receipts so cached data matches DB.

---

Duplicate comments:
In `@bin/evd/src/main.rs`:
- Around line 219-233: This change correctly makes chain index creation
conditional (see the chain_index block, PersistentChainIndex::new and
index.initialize usage) so no code changes are required; approve and merge, and
remove the duplicate review metadata/comment tag ([duplicate_comment]) from the
PR comment to avoid clutter.

---

Nitpick comments:
In `@bin/evd/src/main.rs`:
- Around line 319-343: The block construction and indexing data creation are
done unconditionally; to avoid wasted CPU, only build the Block (via
BlockBuilder::<TxContext>::new()...build()) and call build_index_data(...) when
indexing is actually enabled: i.e., when chain_index_for_callback.is_some() and
callback_indexing_enabled is true. Move the BlockBuilder/Block creation and the
subsequent let (stored_block, stored_txs, stored_receipts) =
build_index_data(...) inside that same guard, then call
chain_index.store_block(...) and keep the existing tracing::warn /
tracing::debug paths inside that branch so you don’t reference variables (block,
stored_*) outside their scope. Ensure block_hash and state_root used in the
debug log are available inside the guarded block or recomputed there.
- Around line 10-21: The ASCII module diagram in the top-level comment (the
large doc-comment block containing the evd/External/Client/QMDB diagram) omits
the new SQLite chain-index subsystem; update that comment to add a distinct
"SQLite chain-index" box (label it clearly, e.g., "SQLite Chain-Index")
alongside or near "QMDB Storage" and show its JSON-RPC/gRPC connections to evd
(and/or RPC Server) so the diagram reflects the new subsystem; ensure the new
label uses the same ASCII-art style and positioning as existing components so it
reads consistently for new contributors.

In `@crates/rpc/chain-index/src/index.rs`:
- Around line 1306-1322: The test prop_duplicate_block_storage never verifies
replacement because generate_block_data / make_test_stored_block produce
identical hashes for the same block_number; mutate block2.hash to a distinct
value before calling index.store_block to exercise the "replace" path (e.g. set
block2.hash to a different B256 like B256::repeat_byte(0xFFu8 ^ (block_number as
u8)) or any other deterministic alternative), then call
index.store_block(block2, txs2.clone(), receipts2) and assert
index.get_block(block_number).unwrap().unwrap().hash equals that new value and
that index.get_block_transactions still reflects txs2.
