Skip to content

fix: correct storage.update() call signature in _update_active_counts()#280

Merged
MaojiaSheng merged 3 commits intovolcengine:mainfrom
ponsde:fix/active-count-update-signature
Feb 25, 2026
Merged

fix: correct storage.update() call signature in _update_active_counts()#280
MaojiaSheng merged 3 commits intovolcengine:mainfrom
ponsde:fix/active-count-update-signature

Conversation

@ponsde
Copy link
Contributor

@ponsde ponsde commented Feb 25, 2026

Problem

Session._update_active_counts() has three bugs that leave active_count permanently stuck at 0.

Bug 1: wrong keyword argument names

# Broken call in session.py:
storage.update(
    collection="context",
    filter={"uri": usage.uri},            # ← not a valid kwarg
    update={"$inc": {"active_count": 1}}, # ← not a valid kwarg
)

# Actual signature of VikingVectorIndexBackend.update:
async def update(self, collection: str, id: str, data: Dict[str, Any]) -> bool:

Python raises TypeError: update() got an unexpected keyword argument 'filter' on every call. The exception is silently swallowed by except Exception, so active_count is never written.

Bug 2: unsupported $inc operator

storage.update() is merge-and-upsert ({**existing, **data}). It does not support MongoDB-style increment operators. The value must be a plain integer.

Bug 3: fetch_by_uri returns subtree matches on path fields

Even after fixing bugs 1 and 2, using fetch_by_uri() to locate the record is unreliable. The local VikingDB engine applies hierarchical (subtree) matching to path-type fields, so filtering uri = 'viking://resources/doc' also matches child records like 'viking://resources/doc/chunk1'. When more than one record matches, fetch_by_uri raises ValueError("Duplicate records found") and returns Noneactive_count still goes unupdated.

Verified empirically:

filter(must, uri="viking://resources/doc")
→ returns ["viking://resources/doc", "viking://resources/doc/chunk1"]   # subtree match

Fix

Compute record_id = md5(uri) directly — the same formula used in collection_schemas.py for every context record — then use storage.get(ids=[record_id]) for a precise single-record lookup unaffected by path-field matching semantics:

import hashlib

record_id = hashlib.md5(usage.uri.encode("utf-8")).hexdigest()
records = run_async(storage.get(collection="context", ids=[record_id]))
if not records:
    continue
current_count = records[0].get("active_count") or 0
run_async(storage.update(
    collection="context",
    id=record_id,
    data={"active_count": current_count + 1},  # plain int, not $inc
))

Verification

All three bugs independently verified (3/3 runs each):

  • Bug 1 (mock test): TypeError: update() got an unexpected keyword argument 'filter'
  • Bug 2 (source read): update() implementation confirmed as merge, no $inc support
  • Bug 3 (empirical test): path filter on must op returns subtrees, not exact matches

End-to-end test result before fix (3/3 runs):

commit.active_count_updated = 0   # never incremented

Changes

  • openviking/session/session.py: add import hashlib; replace broken kwargs + $inc + fetch_by_uri with storage.get(ids=[md5(uri)]) lookup and correct update(id, data) call
  • tests/session/test_session_commit.py: add regression test using storage.get() for exact id lookup; docstring covers all three bugs

_update_active_counts() was calling storage.update() with MongoDB-style
keyword arguments (filter= and update=) that do not match the actual method
signature:

  async def update(self, collection: str, id: str, data: Dict[str, Any])

This caused a TypeError on every session commit, silently swallowed by the
except block, leaving active_count permanently at 0.

Fix: use fetch_by_uri() to retrieve the record and its id, then call
update(collection, id, data) with the correct positional semantics.

Also adds a regression test that verifies active_count actually increments
in storage after a commit with usage records.
@CLAassistant
Copy link

CLAassistant commented Feb 25, 2026

CLA assistant check
All committers have signed the CLA.

@MaojiaSheng MaojiaSheng requested a review from zhoujh01 February 25, 2026 09:44
- client._service doesn't exist on AsyncOpenViking; correct path is
  client._client.service.vikingdb_manager (using public .service property)
- Expand regression test docstring to note both bugs fixed:
  1. wrong kwargs (filter=/update=) → TypeError
  2. $inc operator not supported by storage.update()
…active_counts

fetch_by_uri() filters on the 'uri' path-type field. The local VikingDB
engine applies hierarchical (subtree) matching to path fields, so querying
uri='viking://resources/doc' also returns child records such as
'viking://resources/doc/chunk1'. When more than one record matches,
fetch_by_uri raises ValueError('Duplicate records found') and returns None,
leaving active_count un-updated even after the TypeError fix.

Fix: compute record_id = md5(uri) directly (the same formula used in
collection_schemas.py) and call storage.get(ids=[record_id]), which is an
exact ID lookup and is unaffected by path-field matching semantics.

Also updates the regression test to use storage.get() for the same reason.
@MaojiaSheng MaojiaSheng merged commit 653a07e into volcengine:main Feb 25, 2026
1 check passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Feb 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants