Skip to content

fix(storage): idempotent rm/mv operations with vector index sync#284

Merged
qin-ctx merged 3 commits intovolcengine:mainfrom
SeanZ:fix/idempotent-rm-vector-index-sync
Feb 25, 2026
Merged

fix(storage): idempotent rm/mv operations with vector index sync#284
qin-ctx merged 3 commits intovolcengine:mainfrom
SeanZ:fix/idempotent-rm-vector-index-sync

Conversation

@SeanZ
Copy link
Contributor

@SeanZ SeanZ commented Feb 25, 2026

Description

Make file deletion (rm) and move (mv) operations idempotent, ensuring vector index consistency. When deleting/moving a non-existent file, the operation succeeds instead of failing, and any orphan vector index records are cleaned up.

This follows industry standards:

  • Linux rm -f - ignores nonexistent files
  • AWS S3 DeleteObject - returns success for non-existent keys
  • Kubernetes DELETE API - idempotent by design

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • AGFS Server (Go): Fixed HTTP status code for non-existent files

    • Changed fmt.Errorf("no such file...") to filesystem.NewNotFoundError() so mapErrorToStatus() returns 404 instead of 500
    • Affected plugins: localfs, memfs, queuefs, serverinfofs
  • AGFS SDK (Python): Added force parameter to rm() method

    • Defaults to force=True (like rm -f)
    • 404 responses return success {"message": "deleted"} instead of raising exception
    • Introduced AGFSHTTPError with status_code attribute for better error handling
  • VikingFS: Idempotent rm() and mv() with vector index cleanup

    • rm(): Deleting non-existent file succeeds and cleans any orphan index records
    • mv(): When source not found (404), cleans all related URIs from vector index

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

N/A

Additional Notes

Before (500 error for non-existent file):

Server returned 500 Internal Server Error with message "no such file or directory"

VikingFS.rm(uri)
    │
    ▼
AGFS SDK: rm(path)
    │
    ▼
HTTP DELETE /files?path=xxx
    │
    ▼
localfs.Remove(path)
    │
    ▼
return fmt.Errorf("no such file or directory: %s", path)
    │
    ▼
Handler: status := mapErrorToStatus(err)
    │
    ▼
errors.Is(err, filesystem.ErrNotFound)  ──► false (普通 error 没有 Is() 方法)
    │
    ▼
return http.StatusInternalServerError (500)
    │
    ▼
AGFS SDK: raise AGFSClientError("no such file...")
    │
    ▼
VikingFS: ❌ Exception 抛出,向量索引未清理

After (graceful success):

Deleting non-existent file succeeds silently and returns {"message": "deleted"}

VikingFS.rm(uri)
    │
    ▼
AGFS SDK: rm(path, force=True)
    │
    ▼
HTTP DELETE /files?path=xxx
    │
    ▼
localfs.Remove(path)
    │
    ▼
return filesystem.NewNotFoundError("remove", path)
    │
    ▼
Handler: status := mapErrorToStatus(err)
    │
    ▼
errors.Is(err, filesystem.ErrNotFound)  ──► true (NotFoundError 实现了 Is() 方法)
    │
    ▼
return http.StatusNotFound (404)
    │
    ▼
AGFS SDK: force=True && status_code=404  ──► return {"message": "deleted"}
    │
    ▼
VikingFS: _delete_from_vector_store()  ──► 清理孤儿索引
    │
    ▼
✅ Success

Why Idempotent Delete Matters

In distributed systems and concurrent environments, idempotent operations are crucial for:

  1. Retry Safety - Retrying failed requests won't cause errors
  2. Race Condition Handling - Multiple delete requests for same file all succeed
  3. Index Consistency - Orphan vector index records get cleaned even if file already deleted

@SeanZ SeanZ force-pushed the fix/idempotent-rm-vector-index-sync branch 2 times, most recently from 1810419 to ab9a903 Compare February 25, 2026 10:40
- AGFS Server (Go): Use filesystem.NewNotFoundError() instead of
  fmt.Errorf() so mapErrorToStatus() returns 404 instead of 500
  - Affected plugins: localfs, memfs, queuefs, serverinfofs

- AGFS SDK (Python): rm() defaults to force=True (like rm -f)
  - 404 returns success instead of raising exception
  - Use AGFSHTTPError with status_code for HTTP errors

- VikingFS: rm() and mv() are now idempotent
  - Deleting non-existent file succeeds and cleans orphan index
  - mv() cleans all orphan indices when source not found (404)

This follows industry standards (Linux rm -f, AWS S3 DeleteObject)
where delete operations are idempotent by default.
@SeanZ SeanZ force-pushed the fix/idempotent-rm-vector-index-sync branch from ab9a903 to d4577b9 Compare February 25, 2026 11:07
Extend NewNotFoundError usage to Read, ReadDir, Open, and Grep
operations for consistent HTTP 404 responses across all plugins.
@qin-ctx qin-ctx self-assigned this Feb 25, 2026
When server returns JSON body with error message, the old code would
raise AGFSClientError (without status_code) instead of AGFSHTTPError.
This broke the force=True logic in rm() which checks status_code == 404.

Now all HTTP errors use AGFSHTTPError consistently, preserving the
status_code for callers that need to handle specific HTTP status codes.

Root cause: Original upstream code had this design flaw - JSON parsing
success path would lose status_code. This was not exposed before because
no code depended on status_code until we added force=True for rm().
if err != nil {
if os.IsNotExist(err) {
return nil, fmt.Errorf("no such file: %s", path)
return nil, filesystem.NewNotFoundError("grep", path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里应该用 openstream ?

@qin-ctx qin-ctx merged commit 5e44a0e into volcengine:main Feb 25, 2026
21 of 22 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Feb 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants