LCORE-1617 Centralize Llama Stack Vector Store #1473
JslYoon wants to merge 2 commits into lightspeed-core:main from
Conversation
Signed-off-by: Lucas <lyoon@redhat.com>
Walkthrough

Adds a new FastAPI vector store API (CRUD for vector stores, file upload/attachment, file management), new request/response models, RBAC actions, router registration, and comprehensive unit tests; handlers validate configuration, authorize requests, call an async Llama Stack client, and map errors to HTTP responses.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client (HTTP)
    participant Auth as Auth Dependency
    participant Config as Configuration Check
    participant API as Vector Stores API (router)
    participant Llama as AsyncLlamaStackClient
    participant Store as Vector Store Service
    Client->>API: POST /v1/vector-stores (body, file)
    API->>Auth: authorize(Action.*)
    Auth-->>API: auth tuple
    API->>Config: check_configuration_loaded(configuration)
    Config-->>API: ok
    API->>Llama: get_client()
    Llama-->>API: client
    API->>Llama: client.vector_stores.create / client.files.create / client.vector_stores.add_file(...)
    alt add_file lock error
        Llama-->>API: LockError
        API->>Llama: retry (exponential backoff)
    end
    Llama-->>API: success / error
    API-->>Client: HTTP response (mapped model or error)
```
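The `alt add_file lock error` branch in the diagram retries with exponential backoff. A minimal sketch of that retry loop, assuming a hypothetical `LockError` exception and an injectable coroutine factory (none of these names come from the PR itself):

```python
import asyncio
import random


class LockError(Exception):
    """Hypothetical stand-in for the lock error shown in the diagram."""


async def add_file_with_retry(add_file, max_attempts: int = 4, base_delay: float = 0.1):
    """Call add_file() and retry on LockError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await add_file()
        except LockError:
            if attempt == max_attempts - 1:
                raise
            # Backoff doubles each attempt (0.1s, 0.2s, 0.4s, ...) plus jitter.
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, 0.05))
```

The jitter term is a common refinement to avoid synchronized retries; the diagram only specifies "exponential backoff".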
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 129-144: Remove the ad-hoc debugging call to await
client.models.list() and any plain print() statements in vector store request
handling (including the print at the start of the create flow and the other
print occurrences noted) and either delete them or convert them to structured
logger.debug() calls; additionally, change the logger.info that currently logs
the full request payload (the logger.info referencing "Creating vector store -
body_dict: %s, extra_body: %s") to either logger.debug or redact
sensitive/session/document fields before logging so no full request body is
emitted at info level; locate the code by the body.model_dump(exclude_none=True)
usage and the extra_body construction to apply these changes.
- Around line 366-369: Update both delete route decorators (e.g.,
`@router.delete`("/vector-stores/{vector_store_id}") and the other delete route
for entries) to include status_code=status.HTTP_204_NO_CONTENT so the actual
response matches the OpenAPI responses={"204": ...}; also fix the imports by
adding status and importing Depends directly from fastapi (e.g., from fastapi
import Depends, status) instead of importing Depends from fastapi.params and
ensure any existing references to status.HTTP_204_NO_CONTENT resolve.
- Around line 65-73: file_responses currently advertises a 503 via
ServiceUnavailableResponse for client-side upload validation failures; replace
that with a 400 Bad Request mapping and ensure BadRequestError is translated to
an HTTP 400 response. Update the file_responses dict to include a
BadRequestResponse/OpenAPI 400 entry instead of ServiceUnavailableResponse, and
in the handler where BadRequestError is caught (reference BadRequestError and
any exception handling code around the upload endpoint) raise or return FastAPI
HTTPException(status_code=400) so the OpenAPI and runtime behavior consistently
reflect a 400 Bad Request for malformed uploads.
In `@src/models/requests.py`:
- Around line 1009-1050: The VectorStoreUpdateRequest model currently allows an
empty payload because all fields are optional; add a Pydantic `@model_validator`
(e.g., on VectorStoreUpdateRequest) that runs after parsing and checks that at
least one of name, expires_at, or metadata is not None (and treat an empty
metadata dict as “empty”); if all are absent/empty, raise a ValueError so
requests return 422. Use `@field_validator` only if you need per-field
normalization (e.g., converting empty metadata to None) and reference the class
name VectorStoreUpdateRequest and the field names name, expires_at, metadata
when implementing the validators.
- Around line 1053-1097: The VectorStoreFileCreateRequest model does not enforce
the documented limits on the attributes field (max 16 pairs, keys max 64 chars,
string values max 512 chars, values may also be bool/number), so add validation:
implement a `@field_validator`("attributes", mode="before" or "after") to ensure
the field is a dict when present, has at most 16 entries, that each key is a
string <=64 chars, and each value is either a string (<=512 chars), bool, or
number; then implement a `@model_validator` to enforce any cross-field invariants
if needed and raise clear ValueError messages on violations. Reference the
VectorStoreFileCreateRequest class and the attributes field when adding these
Pydantic validators; keep model_config as-is (extra: forbid).
In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 33-52: The test fakes lack the optional fields the real handler
expects; update the VectorStore __init__ to accept metadata: Optional[dict]=None
and attributes: Optional[dict]=None (defaulting to None) and assign them to
self.metadata and self.attributes, and likewise add those optional
parameters/attributes to the other mock class in the same file (the file handler
fake around lines 78-90) so create_vector_store() and file handlers can read
metadata/attributes without error.
- Around line 169-172: The assertions are inside the pytest.raises context and
never run after the awaited create_vector_store raises; move the assertions so
they execute after the with block. Specifically, in
tests/unit/app/endpoints/test_vector_stores.py surrounding the call to
create_vector_store(request=request, auth=auth, body=body), keep the "with
pytest.raises(HTTPException) as e:" only wrapping the await, then after exiting
that block assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
and assert e.value.detail["response"] == "Configuration is not loaded" (and make
the same change for the other occurrence around lines 224-227).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b8a0a1c8-d00b-4162-a0c2-b9dac912b21d
📒 Files selected for processing (6)
- src/app/endpoints/vector_stores.py
- src/app/routers.py
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
- tests/unit/app/endpoints/test_vector_stores.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Use complete type annotations for all class attributes; avoid using `Any`
Follow Google Python docstring conventions for all modules, classes, and functions
Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed
Files:
- src/app/routers.py
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
- src/app/endpoints/vector_stores.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
- src/app/routers.py
- src/app/endpoints/vector_stores.py
src/models/config.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/config.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic configuration models
Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
Pydantic configuration models should extend `ConfigurationBase`
Files:
- src/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Use `typing_extensions.Self` for model validators in type annotations
Pydantic data models should extend `BaseModel`
Include `Attributes:` section in Pydantic model docstrings
Files:
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use `pytest.mark.asyncio` marker for async unit tests
Files:
- tests/unit/app/endpoints/test_vector_stores.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
- tests/unit/app/endpoints/test_vector_stores.py
🧠 Learnings (3)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/vector_stores.py
🪛 GitHub Actions: Unit tests
src/app/endpoints/vector_stores.py
[error] 159-159: Test_create_vector_store_success failed: AttributeError in create_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'
[error] 214-214: Test_list_vector_stores_success failed: AttributeError in list_vector_stores when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'
[error] 275-275: Test_get_vector_store_success failed: AttributeError in get_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'
[error] 345-345: Test_update_vector_store_success failed: AttributeError in update_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'
[error] 590-590: Test_add_file_to_vector_store_success failed: AttributeError in add_file_to_vector_store when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'
[error] 661-661: Test_list_vector_store_files_success failed: AttributeError in list_vector_store_files when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'
[error] 736-736: Test_get_vector_store_file_success failed: AttributeError in get_vector_store_file when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'
```python
file_responses: dict[int | str, dict[str, Any]] = {
    200: FileResponse.openapi_response(),
    401: UnauthorizedResponse.openapi_response(
        examples=["missing header", "missing token"]
    ),
    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
    503: ServiceUnavailableResponse.openapi_response(),
}
```
🧩 Analysis chain
🌐 Web query:
For llama-stack-client version 0.5.2, what HTTP status or error semantics does BadRequestError represent?
💡 Result:
In llama-stack-client v0.5.2, BadRequestError represents an HTTP 400 (Bad Request) response—i.e., it’s the APIStatusError subclass raised when the server returns status code 400, typically indicating the request was invalid (bad parameters/body). [1]
Source: [1] (pypi.org)
🏁 Scripts executed (repository: lightspeed-core/lightspeed-stack):
- `wc -l src/app/endpoints/vector_stores.py` (length of output: 114)
- `sed -n '50,85p' src/app/endpoints/vector_stores.py` (length of output: 1639)
- `sed -n '480,500p' src/app/endpoints/vector_stores.py` (length of output: 1134)
- `sed -n '1,100p' src/app/endpoints/vector_stores.py | head -100` (length of output: 3780)
- `sed -n '480,510p' src/app/endpoints/vector_stores.py` (length of output: 1409)
- `rg -n "BadRequestError" src/app/endpoints/vector_stores.py -A 3` (length of output: 1870)
- `rg -n "ServiceUnavailableResponse" src/app/endpoints/vector_stores.py | head -20` (length of output: 1417)
Use HTTP 400 (Bad Request) for BadRequestError, not 503.
At lines 489-492, BadRequestError (which represents HTTP 400 per llama-stack-client v0.5.2) is incorrectly mapped to ServiceUnavailableResponse (503). The file_responses dict advertises status codes 200, 401, 403, 500, and 503—no 4xx response for malformed uploads. Clients will interpret 503 as a transient backend outage instead of a client error requiring request correction.
Per the coding guidelines, use FastAPI HTTPException with appropriate status codes. Return 400 for validation failures and update file_responses to advertise this status.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 65 - 73, file_responses
currently advertises a 503 via ServiceUnavailableResponse for client-side upload
validation failures; replace that with a 400 Bad Request mapping and ensure
BadRequestError is translated to an HTTP 400 response. Update the file_responses
dict to include a BadRequestResponse/OpenAPI 400 entry instead of
ServiceUnavailableResponse, and in the handler where BadRequestError is caught
(reference BadRequestError and any exception handling code around the upload
endpoint) raise or return FastAPI HTTPException(status_code=400) so the OpenAPI
and runtime behavior consistently reflect a 400 Bad Request for malformed
uploads.
```python
# Extract provider_id for extra_body (not a direct client parameter)
body_dict = body.model_dump(exclude_none=True)
print("client.models.list() reaches here", await client.models.list())
extra_body = {}
if "provider_id" in body_dict:
    extra_body["provider_id"] = body_dict.pop("provider_id")
if "embedding_model" in body_dict:
    extra_body["embedding_model"] = body_dict.pop("embedding_model")
if "embedding_dimension" in body_dict:
    extra_body["embedding_dimension"] = body_dict.pop("embedding_dimension")

logger.info(
    "Creating vector store - body_dict: %s, extra_body: %s",
    body_dict,
    extra_body,
)
```
Remove the debugging probe from the request path.
Line 131 adds an unrelated backend call before every vector-store create, and the print() calls on Lines 465 and 725-730 bypass structured logging. On top of that, Lines 140-144 log the full request payload at info, which can include session/document metadata.
🛠️ Proposed fix

```diff
-    print("client.models.list() reaches here", await client.models.list())
     extra_body = {}
@@
-    logger.info(
-        "Creating vector store - body_dict: %s, extra_body: %s",
-        body_dict,
-        extra_body,
-    )
+    logger.debug(
+        "Creating vector store '%s' with provider '%s'",
+        body.name,
+        extra_body.get("provider_id"),
+    )
@@
-    print("filename reaches here", filename)
@@
-    print("file_id", file_id)
     vs_file = await client.vector_stores.files.retrieve(
         vector_store_id=vector_store_id,
         file_id=file_id,
     )
-    print("vs_file reaches here", vs_file)
```

As per coding guidelines, use logger.debug() for detailed diagnostic information and logger.info() for general information about program execution.
Also applies to: 465-465, 725-730
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 129 - 144, Remove the ad-hoc
debugging call to await client.models.list() and any plain print() statements in
vector store request handling (including the print at the start of the create
flow and the other print occurrences noted) and either delete them or convert
them to structured logger.debug() calls; additionally, change the logger.info
that currently logs the full request payload (the logger.info referencing
"Creating vector store - body_dict: %s, extra_body: %s") to either logger.debug
or redact sensitive/session/document fields before logging so no full request
body is emitted at info level; locate the code by the
body.model_dump(exclude_none=True) usage and the extra_body construction to
apply these changes.
```python
@router.delete(
    "/vector-stores/{vector_store_id}",
    responses={"204": {"description": "Vector store deleted"}},
)
```
🧩 Analysis chain
🌐 Web query:
FastAPI official docs: if a @router.delete handler omits status_code, what default success status code is returned, and does responses={204: ...} affect the actual runtime status code?
💡 Result:
- Default success status code: If you omit `status_code` on a `@router.delete(...)` (or any path operation), FastAPI returns HTTP 200 OK by default. [1]
- Does `responses={204: ...}` change the runtime status code? No. The `responses` parameter documents additional responses in the OpenAPI schema; it does not make FastAPI return 204 at runtime. To actually return 204, you must set `status_code=204` (or return a `Response`/`JSONResponse` with `status_code=204`). FastAPI explicitly notes that for "additional responses" you must return a `Response` directly with your status code. [2]
Sources
[1] FastAPI docs — Response Status Code https://fastapi.tiangolo.com/tutorial/response-status-code/
[2] FastAPI docs — Additional Responses in OpenAPI https://fastapi.tiangolo.com/advanced/additional-responses/
🏁 Scripts executed (repository: lightspeed-core/lightspeed-stack):
- `wc -l src/app/endpoints/vector_stores.py` (length of output: 114)
- `head -30 src/app/endpoints/vector_stores.py` (length of output: 992)
- `sed -n '360,375p' src/app/endpoints/vector_stores.py` (length of output: 617)
- `sed -n '755,775p' src/app/endpoints/vector_stores.py` (length of output: 791)
Declare the 204 status code on the delete routes and fix imports.
Both delete routes at lines 366–369 and 764–767 specify responses={"204": ...} for OpenAPI documentation, but omit status_code=status.HTTP_204_NO_CONTENT. Without the explicit status code, FastAPI returns HTTP 200 by default—creating a mismatch between documented and actual behavior.
Additionally, the import statement violates coding guidelines: status is missing, and Depends should be imported directly from fastapi rather than fastapi.params.
Proposed fix

```diff
-from fastapi import APIRouter, File, HTTPException, Request, UploadFile
-from fastapi.params import Depends
+from fastapi import APIRouter, Depends, File, HTTPException, Request, UploadFile, status
@@
 @router.delete(
     "/vector-stores/{vector_store_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "Vector store deleted"}},
 )
@@
 @router.delete(
     "/vector-stores/{vector_store_id}/files/{file_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "File deleted from vector store"}},
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 366 - 369, Update both
delete route decorators (e.g.,
`@router.delete`("/vector-stores/{vector_store_id}") and the other delete route
for entries) to include status_code=status.HTTP_204_NO_CONTENT so the actual
response matches the OpenAPI responses={"204": ...}; also fix the imports by
adding status and importing Depends directly from fastapi (e.g., from fastapi
import Depends, status) instead of importing Depends from fastapi.params and
ensure any existing references to status.HTTP_204_NO_CONTENT resolve.
```python
class VectorStoreUpdateRequest(BaseModel):
    """Model representing a request to update a vector store.

    Attributes:
        name: New name for the vector store.
        expires_at: Optional expiration timestamp.
        metadata: Optional metadata dictionary for storing session information.
    """

    name: Optional[str] = Field(
        None,
        description="New name for the vector store",
        examples=["updated_vector_store"],
        min_length=1,
        max_length=256,
    )

    expires_at: Optional[int] = Field(
        None,
        description="Unix timestamp when the vector store should expire",
        examples=[1735689600],
        gt=0,
    )

    metadata: Optional[dict[str, Any]] = Field(
        None,
        description="Metadata dictionary for storing session information",
        examples=[{"user_id": "user123", "session_id": "sess456"}],
    )

    model_config = {
        "extra": "forbid",
        "json_schema_extra": {
            "examples": [
                {
                    "name": "updated_vector_store",
                    "expires_at": 1735689600,
                    "metadata": {"user_id": "user123"},
                },
            ]
        },
    }
```
Reject empty vector-store update payloads.
VectorStoreUpdateRequest currently accepts {} because every field is optional. That means update_vector_store() can forward a no-op update upstream instead of failing fast with a 422.
🛠️ Proposed fix

```diff
 class VectorStoreUpdateRequest(BaseModel):
 @@
     model_config = {
         "extra": "forbid",
         "json_schema_extra": {
             "examples": [
                 {
                     "name": "updated_vector_store",
                     "expires_at": 1735689600,
                     "metadata": {"user_id": "user123"},
                 },
             ]
         },
     }
+
+    @model_validator(mode="after")
+    def validate_non_empty_update(self) -> Self:
+        """Ensure at least one updatable field is provided."""
+        if (
+            self.name is None
+            and self.expires_at is None
+            and self.metadata is None
+        ):
+            raise ValueError(
+                "At least one of 'name', 'expires_at', or 'metadata' must be provided"
+            )
+        return self
```

As per coding guidelines, src/models/**/*.py should use @field_validator and @model_validator for custom validation in Pydantic models.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/models/requests.py` around lines 1009 - 1050, The
VectorStoreUpdateRequest model currently allows an empty payload because all
fields are optional; add a Pydantic `@model_validator` (e.g., on
VectorStoreUpdateRequest) that runs after parsing and checks that at least one
of name, expires_at, or metadata is not None (and treat an empty metadata dict
as “empty”); if all are absent/empty, raise a ValueError so requests return 422.
Use `@field_validator` only if you need per-field normalization (e.g., converting
empty metadata to None) and reference the class name VectorStoreUpdateRequest
and the field names name, expires_at, metadata when implementing the validators.
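A standalone version of the validator the prompt describes, including its empty-metadata-dict case; the model is trimmed to the three fields the validator inspects, so it is a sketch rather than the PR's full class:

```python
from typing import Any, Optional

from pydantic import BaseModel, model_validator
from typing_extensions import Self


class VectorStoreUpdateRequest(BaseModel):
    """Trimmed sketch: only the fields the validator inspects."""

    name: Optional[str] = None
    expires_at: Optional[int] = None
    metadata: Optional[dict[str, Any]] = None

    model_config = {"extra": "forbid"}

    @model_validator(mode="after")
    def validate_non_empty_update(self) -> Self:
        """Reject no-op updates; an empty metadata dict counts as empty."""
        if self.name is None and self.expires_at is None and not self.metadata:
            raise ValueError(
                "At least one of 'name', 'expires_at', or 'metadata' must be provided"
            )
        return self
```

When the validator raises `ValueError`, Pydantic wraps it in a `ValidationError`, which FastAPI surfaces as the desired 422.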
```python
class VectorStoreFileCreateRequest(BaseModel):
    """Model representing a request to add a file to a vector store.

    Attributes:
        file_id: ID of the file to add to the vector store.
        attributes: Optional metadata key-value pairs (max 16 pairs).
        chunking_strategy: Optional chunking strategy configuration.
    """

    file_id: str = Field(
        ...,
        description="ID of the file to add to the vector store",
        examples=["file-abc123"],
        min_length=1,
    )

    attributes: Optional[dict[str, str | float | bool]] = Field(
        None,
        description=(
            "Set of up to 16 key-value pairs for storing additional information. "
            "Keys: strings (max 64 chars). Values: strings (max 512 chars), booleans, or numbers."
        ),
        examples=[
            {"created_at": "2026-04-04T15:20:00Z", "updated_at": "2026-04-04T15:20:00Z"}
        ],
    )

    chunking_strategy: Optional[dict[str, Any]] = Field(
        None,
        description="Chunking strategy configuration for this file",
        examples=[{"type": "fixed", "chunk_size": 512, "chunk_overlap": 50}],
    )

    model_config = {
        "extra": "forbid",
        "json_schema_extra": {
            "examples": [
                {
                    "file_id": "file-abc123",
                    "attributes": {"created_at": "2026-04-04T15:20:00Z"},
                    "chunking_strategy": {"type": "fixed", "chunk_size": 512},
                },
            ]
        },
    }
```
Enforce the documented attributes limits in the request model.
The field description says this payload is limited to 16 pairs with max key/value lengths, but none of that is validated here. Invalid requests will make it to the backend layer and fail later than they should.
🛠️ Proposed fix

```diff
 class VectorStoreFileCreateRequest(BaseModel):
 @@
     model_config = {
         "extra": "forbid",
         "json_schema_extra": {
             "examples": [
                 {
                     "file_id": "file-abc123",
                     "attributes": {"created_at": "2026-04-04T15:20:00Z"},
                     "chunking_strategy": {"type": "fixed", "chunk_size": 512},
                 },
             ]
         },
     }
+
+    @field_validator("attributes")
+    @classmethod
+    def validate_attributes(
+        cls, value: Optional[dict[str, str | float | bool]]
+    ) -> Optional[dict[str, str | float | bool]]:
+        """Validate attribute count and key/value sizes."""
+        if value is None:
+            return value
+        if len(value) > 16:
+            raise ValueError("A maximum of 16 attributes is allowed")
+        for key, attr_value in value.items():
+            if len(key) > 64:
+                raise ValueError(f"Attribute key '{key}' exceeds 64 characters")
+            if isinstance(attr_value, str) and len(attr_value) > 512:
+                raise ValueError(
+                    f"Attribute value for '{key}' exceeds 512 characters"
+                )
+        return value
```

As per coding guidelines, src/models/**/*.py should use @field_validator and @model_validator for custom validation in Pydantic models.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/models/requests.py` around lines 1053 - 1097, The
VectorStoreFileCreateRequest model does not enforce the documented limits on the
attributes field (max 16 pairs, keys max 64 chars, string values max 512 chars,
values may also be bool/number), so add validation: implement a
`@field_validator`("attributes", mode="before" or "after") to ensure the field is
a dict when present, has at most 16 entries, that each key is a string <=64
chars, and each value is either a string (<=512 chars), bool, or number; then
implement a `@model_validator` to enforce any cross-field invariants if needed and
raise clear ValueError messages on violations. Reference the
VectorStoreFileCreateRequest class and the attributes field when adding these
Pydantic validators; keep model_config as-is (extra: forbid).
```python
with pytest.raises(HTTPException) as e:
    await create_vector_store(request=request, auth=auth, body=body)
    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
```
Move the assertions out of the pytest.raises body.
Once the awaited call raises on Line 170 or Line 225, the remaining assertions inside the with block are skipped. These tests currently verify only that some HTTPException was raised.
🛠️ Proposed fix

```diff
     with pytest.raises(HTTPException) as e:
         await create_vector_store(request=request, auth=auth, body=body)
-        assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
@@
     with pytest.raises(HTTPException) as e:
         await create_vector_store(request=request, auth=auth, body=body)
-        assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
-        assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore
+    assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
+    assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore
```

Also applies to: 224-227
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/app/endpoints/test_vector_stores.py` around lines 169 - 172, The
assertions are inside the pytest.raises context and never run after the awaited
create_vector_store raises; move the assertions so they execute after the with
block. Specifically, in tests/unit/app/endpoints/test_vector_stores.py
surrounding the call to create_vector_store(request=request, auth=auth,
body=body), keep the "with pytest.raises(HTTPException) as e:" only wrapping the
await, then after exiting that block assert e.value.status_code ==
status.HTTP_500_INTERNAL_SERVER_ERROR and assert e.value.detail["response"] ==
"Configuration is not loaded" (and make the same change for the other occurrence
around lines 224-227).
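The requested pattern can be demonstrated with a minimal, self-contained pytest sketch; FakeHTTPException stands in for fastapi.HTTPException here, and the handler body is illustrative only:

```python
import pytest


class FakeHTTPException(Exception):
    """Stand-in for fastapi.HTTPException in this sketch."""

    def __init__(self, status_code: int, detail: dict):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def create_vector_store():
    """Simulates the endpoint failing because configuration is not loaded."""
    raise FakeHTTPException(500, {"response": "Configuration is not loaded"})


def test_create_vector_store_requires_configuration():
    # Only the raising call goes inside the context manager; the
    # assertions after the block always execute.
    with pytest.raises(FakeHTTPException) as e:
        create_vector_store()
    assert e.value.status_code == 500
    assert e.value.detail["response"] == "Configuration is not loaded"
```

If the assertions sat inside the `with` block, they would be skipped as soon as the call raised, which is exactly the bug the reviewer flagged.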
Signed-off-by: Lucas <lyoon@redhat.com>
Actionable comments posted: 4
♻️ Duplicate comments (4)
tests/unit/app/endpoints/test_vector_stores.py (1)
173-176: ⚠️ Potential issue | 🟠 Major: Move the assertions outside pytest.raises.
Once the awaited call raises, the remaining lines inside the context manager never run. These tests currently verify only that some HTTPException was raised.
Suggested fix
 with pytest.raises(HTTPException) as e:
     await create_vector_store(request=request, auth=auth, body=body)
-    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
@@
 with pytest.raises(HTTPException) as e:
     await create_vector_store(request=request, auth=auth, body=body)
-    assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
-    assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore
+assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
+assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore

Also applies to: 228-231
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_vector_stores.py` around lines 173 - 176, The assertions are inside the pytest.raises context so they never run after await create_vector_store(...) raises; move the two assert lines out of the with pytest.raises(HTTPException) as e: block so you capture the exception into e and then assert e.value.status_code and e.value.detail["response"] after the context. Apply the same change for the second occurrence around lines referencing the same pattern (the other test block at 228-231) and keep references to create_vector_store, pytest.raises, e.value.status_code and e.value.detail["response"] when updating the tests.
src/app/endpoints/vector_stores.py (3)
139-143: ⚠️ Potential issue | 🟠 Major: Stop logging the full create payload at info.
body_dict and extra_body can carry notebook/session metadata. Emitting the whole request at info level is noisy and leaks more context than this endpoint needs. Log stable identifiers only, and keep payload-level diagnostics at debug.
Suggested fix
- logger.info(
-     "Creating vector store - body_dict: %s, extra_body: %s",
-     body_dict,
-     extra_body,
- )
+ logger.debug(
+     "Creating vector store '%s' with provider '%s'",
+     body.name,
+     extra_body.get("provider_id"),
+ )
As per coding guidelines, use logger.debug() for detailed diagnostic information.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/vector_stores.py` around lines 139 - 143, Change the noisy info-level logging that prints the full payload to log only stable identifiers at info and move payload diagnostics to debug: replace the current logger.info call that references body_dict and extra_body with an info log that emits only stable identifiers (e.g., request id, user_id, notebook_id or session_id if available) and use logger.debug to record the full body_dict/extra_body for detailed diagnostics; update references in the surrounding function (the logger usage in this module, e.g., the create vector store endpoint in vector_stores.py) so sensitive/session metadata is not logged at info level.
65-73: ⚠️ Potential issue | 🟠 Major: Return 400 for upload bad requests, not 503.
BadRequestError here is a client-side upload failure, but the route currently advertises and returns ServiceUnavailableResponse. That makes malformed uploads look like transient backend outages and encourages retries instead of request fixes. The matching bad-request unit test will need to flip with this change.
Based on learnings, use FastAPI HTTPException with appropriate status codes for API endpoint error handling.

Also applies to: 487-490
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/vector_stores.py` around lines 65 - 73, The route currently advertises and returns ServiceUnavailableResponse (503) for client-side BadRequestError; change the response mapping to return a 400 BadRequestResponse and raise FastAPI HTTPException(status_code=400) when catching BadRequestError so uploads are treated as client errors; update file_responses to include BadRequestResponse.openapi_response() instead of ServiceUnavailableResponse.openapi_response() and adjust the error handling in the same handler (and the similar handler referenced around the code marked 487-490) so unit tests expect 400 instead of 503.
9-10: ⚠️ Potential issue | 🟠 Major: Declare 204 on both delete routes and fix the FastAPI imports.
responses={"204": ...} only changes the OpenAPI schema. Without status_code=status.HTTP_204_NO_CONTENT, both handlers still return 200 at runtime. The import also bypasses the repo's required fastapi import form for Depends/status.
Suggested fix
-from fastapi import APIRouter, File, HTTPException, Request, UploadFile
-from fastapi.params import Depends
+from fastapi import APIRouter, Depends, File, HTTPException, Request, UploadFile, status
@@
 @router.delete(
     "/vector-stores/{vector_store_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "Vector store deleted"}},
 )
@@
 @router.delete(
     "/vector-stores/{vector_store_id}/files/{file_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "File deleted from vector store"}},
 )
As per coding guidelines, use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies.

Also applies to: 365-368, 760-763
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/vector_stores.py` around lines 9 - 10, The FastAPI import and delete handlers are incorrect: change the import to use the repo style "from fastapi import APIRouter, HTTPException, Request, status, Depends" (remove the separate fastapi.params import) and update both delete route handlers that currently only set responses={"204": ...} to also set status_code=status.HTTP_204_NO_CONTENT so they return 204 at runtime; locate the routes in src/app/endpoints/vector_stores.py that declare a 204 response (the two DELETE handlers) and add status_code=status.HTTP_204_NO_CONTENT to their decorator arguments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 86-94: The list_vector_store_files() handler currently only maps
APIConnectionError and generic Exception to responses; update
vector_store_files_list_responses to include a 404 entry (use the appropriate
NotFound/BadRequest response helper used elsewhere) and add a specific except
BadRequestError branch in list_vector_store_files() that returns the 404
response, mirroring how get_vector_store() handles BadRequestError; also update
the function docstring to mention the 404/BadRequest response.
- Around line 450-468: The code double-buffers uploaded files by calling await
file.read() then wrapping into BytesIO(content); change to pass the payload
directly to the Llama client as a tuple (filename, content) or (filename,
file.file) to avoid copying (use client.files.create(file=(filename, content),
purpose="assistants") or client.files.create(file=(filename, file.file),
purpose="assistants")), remove the BytesIO usage (file_bytes and its .name
assignment), and update logging to use the filename and size without forcing a
second copy; also add a pre-read size check using the request's Content-Length
header (and raise HTTPException when over the allowed limit) so you validate
large uploads before materializing them into memory.
In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 184-205: Add a test case that verifies the adapter moves
provider_id, embedding_model, and embedding_dimension from the main request into
extra_body when calling the client: use the existing test setup
(get_test_request, get_test_auth, VectorStoreCreateRequest) but include
provider_id, embedding_model, and embedding_dimension in the body; patch
AsyncLlamaStackClientHolder.get_client to return mock_client and assert
mock_client.vector_stores.create was called with extra_body containing those
three keys (and that the top-level body forwarded to create does not contain
them) while still returning a VectorStore("vs_123","test_store") and asserting
the response fields as in the current test.
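The behavior that test should pin down can be sketched as a small helper; the function name and field list here follow the prompt but are otherwise hypothetical:

```python
# Backend-only fields that must travel in extra_body, not the main payload.
BACKEND_FIELDS = ("provider_id", "embedding_model", "embedding_dimension")


def split_extra_body(body: dict) -> tuple[dict, dict]:
    """Return (main_body, extra_body), moving backend-only fields out of body."""
    main_body = {k: v for k, v in body.items() if k not in BACKEND_FIELDS}
    extra_body = {k: body[k] for k in BACKEND_FIELDS if body.get(k) is not None}
    return main_body, extra_body
```

The test would then assert that mock_client.vector_stores.create received exactly the extra_body half, and that none of the three keys leaked into the forwarded top-level body.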
---
Duplicate comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 139-143: Change the noisy info-level logging that prints the full
payload to log only stable identifiers at info and move payload diagnostics to
debug: replace the current logger.info call that references body_dict and
extra_body with an info log that emits only stable identifiers (e.g., request
id, user_id, notebook_id or session_id if available) and use logger.debug to
record the full body_dict/extra_body for detailed diagnostics; update references
in the surrounding function (the logger usage in this module, e.g., the create
vector store endpoint in vector_stores.py) so sensitive/session metadata is not
logged at info level.
- Around line 65-73: The route currently advertises and returns
ServiceUnavailableResponse (503) for client-side BadRequestError; change the
response mapping to return a 400 BadRequestResponse and raise FastAPI
HTTPException(status_code=400) when catching BadRequestError so uploads are
treated as client errors; update file_responses to include
BadRequestResponse.openapi_response() instead of
ServiceUnavailableResponse.openapi_response() and adjust the error handling in
the same handler (and the similar handler referenced around the code marked
487-490) so unit tests expect 400 instead of 503.
- Around line 9-10: The FastAPI import and delete handlers are incorrect: change
the import to use the repo style "from fastapi import APIRouter, HTTPException,
Request, status, Depends" (remove the separate fastapi.params import) and update
both delete route handlers that currently only set responses={"204": ...} to
also set status_code=status.HTTP_204_NO_CONTENT so they return 204 at runtime;
locate the routes in src/app/endpoints/vector_stores.py that declare a 204
response (the two DELETE handlers) and add
status_code=status.HTTP_204_NO_CONTENT to their decorator arguments.
In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 173-176: The assertions are inside the pytest.raises context so
they never run after await create_vector_store(...) raises; move the two assert
lines out of the with pytest.raises(HTTPException) as e: block so you capture
the exception into e and then assert e.value.status_code and
e.value.detail["response"] after the context. Apply the same change for the
second occurrence around lines referencing the same pattern (the other test
block at 228-231) and keep references to create_vector_store, pytest.raises,
e.value.status_code and e.value.detail["response"] when updating the tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eea96eaa-fe50-41d6-b442-dd1b7c8bd69a
📒 Files selected for processing (3)
- src/app/endpoints/vector_stores.py
- tests/unit/app/endpoints/test_vector_stores.py
- tests/unit/app/test_routers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests
Files:
- tests/unit/app/test_routers.py
- tests/unit/app/endpoints/test_vector_stores.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use pytest-mock for AsyncMock objects in unit tests
Files:
- tests/unit/app/test_routers.py
- tests/unit/app/endpoints/test_vector_stores.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed
Files:
src/app/endpoints/vector_stores.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/vector_stores.py
🧠 Learnings (3)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies
Applied to files:
src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/vector_stores.py
vector_store_files_list_responses: dict[int | str, dict[str, Any]] = {
    200: VectorStoreFilesListResponse.openapi_response(),
    401: UnauthorizedResponse.openapi_response(
        examples=["missing header", "missing token"]
    ),
    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
    503: ServiceUnavailableResponse.openapi_response(),
}
🧩 Analysis chain
🏁 Script executed:
# Check if file exists and get an overview
ls -la src/app/endpoints/vector_stores.py
# Get file size to determine reading strategy
wc -l src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 199
🏁 Script executed:
# Read the specific section around lines 86-94
sed -n '80,100p' src/app/endpoints/vector_stores.py
# Read around lines 638-643 and 650-683
sed -n '630,690p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 3121
🏁 Script executed:
# Search for list_vector_store_files function definition
rg -n "def list_vector_store_files" src/app/endpoints/vector_stores.py
# Search for BadRequestError handling in this file
rg -n "BadRequestError" src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 443
🏁 Script executed:
# Check what exceptions are imported
head -50 src/app/endpoints/vector_stores.py | grep -E "^from|^import"
Repository: lightspeed-core/lightspeed-stack
Length of output: 752
🏁 Script executed:
# Search for sibling handlers to understand the pattern
rg -n "NotFoundResponse" src/app/endpoints/vector_stores.py
# Check other endpoint files for similar patterns
rg -l "vector_stores\|BadRequestError" src/app/endpoints/ | head -5
Repository: lightspeed-core/lightspeed-stack
Length of output: 599
🏁 Script executed:
# Read the full list_vector_store_files function (lines 623+)
sed -n '623,680p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 2210
🏁 Script executed:
# Check one of the other handlers that correctly handles BadRequestError to see the pattern
sed -n '270,290p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 1057
🏁 Script executed:
# Check if there are other functions in the file that also list/get vector stores
rg -n "def (get_|list_|retrieve_).*vector_store" src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 217
🏁 Script executed:
# Check lines 638-643 and what those are about
sed -n '638,650p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 398
🏁 Script executed:
# Check lines 650-683
sed -n '650,690p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 1657
🏁 Script executed:
# Look at one more sibling handler to confirm the pattern is consistent
sed -n '234,260p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 886
🏁 Script executed:
# Check what the openapi_response declarations look like to verify they exist
rg -A2 "class NotFoundResponse" src/app/endpoints/ || rg -A2 "NotFoundResponse" src/models/responses.py | head -20
Repository: lightspeed-core/lightspeed-stack
Length of output: 287
🏁 Script executed:
# Get the full context of list_vector_store_files including decorators and responses
sed -n '610,625p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 553
🏁 Script executed:
# Verify the vector_store_files_list_responses dict definition location
sed -n '86,94p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 521
🏁 Script executed:
# Check if BadRequestError is actually raised by the Llama Stack client for missing vector stores
rg -n "vector_stores.files.list" src/app/endpoints/vector_stores.py -A2 -B2
Repository: lightspeed-core/lightspeed-stack
Length of output: 269
Add BadRequestError handling to list_vector_store_files() for 404 responses.
When an invalid vector-store ID is passed, the backend raises BadRequestError, but this function only catches APIConnectionError and generic Exception. This causes the error to be returned as 500 instead of 404, inconsistent with sibling handlers like get_vector_store() which properly map BadRequestError to 404.
Update the vector_store_files_list_responses dict to include the 404 code, add a BadRequestError handler in the exception chain, and document the 404 response in the docstring.
Suggested fix
vector_store_files_list_responses: dict[int | str, dict[str, Any]] = {
200: VectorStoreFilesListResponse.openapi_response(),
401: UnauthorizedResponse.openapi_response(
examples=["missing header", "missing token"]
),
403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
+ 404: NotFoundResponse.openapi_response(examples=["vector_store"]),
500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
503: ServiceUnavailableResponse.openapi_response(),
}
@@
Raises:
HTTPException:
- 401: Authentication failed
- 403: Authorization failed
+ - 404: Vector store not found
- 500: Lightspeed Stack configuration not loaded
- 503: Unable to connect to Llama Stack
@@
except APIConnectionError as e:
logger.error("Unable to connect to Llama Stack: %s", e)
response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e))
raise HTTPException(**response.model_dump()) from e
+ except BadRequestError as e:
+ logger.error("Vector store not found: %s", e)
+ response = NotFoundResponse(
+ resource="vector_store", resource_id=vector_store_id
+ )
+ raise HTTPException(**response.model_dump()) from e
 except Exception as e:
     logger.error("Unable to list vector store files: %s", e)
📝 Committable suggestion
logger.error("Unable to list vector store files: %s", e)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
vector_store_files_list_responses: dict[int | str, dict[str, Any]] = {
    200: VectorStoreFilesListResponse.openapi_response(),
    401: UnauthorizedResponse.openapi_response(
        examples=["missing header", "missing token"]
    ),
    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
    404: NotFoundResponse.openapi_response(examples=["vector_store"]),
    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
    503: ServiceUnavailableResponse.openapi_response(),
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 86 - 94, The
list_vector_store_files() handler currently only maps APIConnectionError and
generic Exception to responses; update vector_store_files_list_responses to
include a 404 entry (use the appropriate NotFound/BadRequest response helper
used elsewhere) and add a specific except BadRequestError branch in
list_vector_store_files() that returns the 404 response, mirroring how
get_vector_store() handles BadRequestError; also update the function docstring
to mention the 404/BadRequest response.
# Read file content
content = await file.read()
filename = file.filename or "uploaded_file"

# Add .txt extension if no extension present
# (since parsed PDFs/URLs are sent as plain text)
if not os.path.splitext(filename)[1]:
    filename = f"(unknown).txt"

logger.info(
    "Uploading file - filename: %s, size: %d bytes",
    filename,
    len(content),
)

# Convert to BytesIO for Llama Stack client
# The client expects bytes, io.IOBase, PathLike, or a tuple
file_bytes = BytesIO(content)
file_bytes.name = filename  # Set the filename attribute
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the full function context around lines 450-468
cat -n src/app/endpoints/vector_stores.py | sed -n '420,480p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 2378
🏁 Script executed:
# Check if there's a size limit check or any validation before reading
cat -n src/app/endpoints/vector_stores.py | sed -n '410,475p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 2563
🏁 Script executed:
# Look for any size limit constants or config
rg -n "size|SIZE|limit|LIMIT" src/app/endpoints/vector_stores.py | head -20
Repository: lightspeed-core/lightspeed-stack
Length of output: 140
🏁 Script executed:
# Check if UploadFile.file is used anywhere else in the codebase
rg -n "UploadFile" src/ -A 5 -B 2
Repository: lightspeed-core/lightspeed-stack
Length of output: 1217
🏁 Script executed:
# Check if there are other file upload endpoints in the codebase to see patterns
find src -name "*.py" -type f | xargs grep -l "UploadFile" | head -5
Repository: lightspeed-core/lightspeed-stack
Length of output: 110
🏁 Script executed:
# Look for how file.file is used or if it's mentioned anywhere
rg -n "file\.file" src/ --type py
Repository: lightspeed-core/lightspeed-stack
Length of output: 278
🏁 Script executed:
# Check the imports and FastAPI version to understand UploadFile better
cat src/app/endpoints/vector_stores.py | head -30
Repository: lightspeed-core/lightspeed-stack
Length of output: 992
🏁 Script executed:
# Check the Llama Stack client usage - what does files.create() expect?
rg -n "client.files.create\|files\.create" src/ -B 2 -A 3 --type py
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check if there's any documentation or type hints about what file parameter accepts
rg -n "files\.create" src/ -B 5 -A 5
Repository: lightspeed-core/lightspeed-stack
Length of output: 1763
🏁 Script executed:
# Search for any Python documentation or comments about UploadFile usage
rg -n "SpooledTemporaryFile\|\.file\|file object" src/ --type py | head -10
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check if there's a requirements.txt to see llama-stack-client version
cat requirements.txt | grep -i llama
Repository: lightspeed-core/lightspeed-stack
Length of output: 124
🏁 Script executed:
# Look for requirements files with different names
find . -maxdepth 2 -name "*.txt" -o -name "pyproject.toml" -o -name "setup.py" | grep -E "req|setup|pyproject"
Repository: lightspeed-core/lightspeed-stack
Length of output: 239
🏁 Script executed:
# Check for pyproject.toml or setup.py with dependencies
ls -la | grep -E "requirements|setup|pyproject"
Repository: lightspeed-core/lightspeed-stack
Length of output: 539
🏁 Script executed:
# Look for any configuration about file size limits
rg -n "max.*size\|MAX.*SIZE\|file.*limit\|upload.*limit" src/ --type py -i | head -20
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check pyproject.toml for llama-stack-client version
cat pyproject.toml | grep -A 5 -B 5 "llama"
Repository: lightspeed-core/lightspeed-stack
Length of output: 813
🏁 Script executed:
# Check if the UploadFile's underlying file object is accessible
# FastAPI's UploadFile has a .file property that is SpooledTemporaryFile
# Let's see if there are any patterns in the code
rg -n "SpooledTemporaryFile\|BinaryIO\|AsyncBinaryIO" src/ --type py
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check the full function signature and see what happens with file after the read
cat -n src/app/endpoints/vector_stores.py | sed -n '417,490p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 3117
🏁 Script executed:
# Look for the error handling to see what status code is returned for large files
cat -n src/app/endpoints/vector_stores.py | sed -n '490,510p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 982
🏁 Script executed:
# Search for any MAX_FILE_SIZE or similar constants
rg -n "MAX\|LIMIT\|SIZE" src/ --type py | grep -i "file\|upload\|size" | head -15
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🌐 Web query:
llama-stack-client files.create() file parameter accept bytes BytesIO SpooledTemporaryFile
💡 Result:
client.files.create(file=...) accepts FileTypes, where FileContent = IO[bytes] | bytes | PathLike[...]. So yes: bytes, io.BytesIO, and tempfile.SpooledTemporaryFile (opened in binary mode) are valid inputs. [1] [2]
What you typically want in practice is to also provide a filename (and optionally a MIME type), because that’s how the client/server can infer file type (especially for BytesIO, which has no .name). The supported “rich” forms are tuples like: (filename, contents) or (filename, contents, content_type) (and optionally headers). [1]
Examples:
from io import BytesIO
from llama_stack_client import LlamaStackClient

client = LlamaStackClient()
bio = BytesIO(b"%PDF-...")  # or any bytes
client.files.create(
    file=("doc.pdf", bio, "application/pdf"),
    purpose="assistants",
)

import tempfile
from llama_stack_client import LlamaStackClient

client = LlamaStackClient()
with tempfile.SpooledTemporaryFile(mode="w+b") as f:
    f.write(b"hello")
    f.seek(0)
    client.files.create(
        file=("hello.txt", f, "text/plain"),
        purpose="assistants",
    )

If you already have raw bytes and don't care about name/type, you can pass them directly:
client.files.create(file=b"hello", purpose="assistants")
But supplying a filename is safer for type detection. [1]
Sources:
[1] FileTypes / FileContent definitions in _types.py (union includes IO[bytes], bytes, tuple variants) (github.com)
[2] FileCreateParams.file: Required[FileTypes] (github.com)
[3] README “File uploads” (bytes, PathLike, or (filename, contents, media type) tuple) (github.com)
Citations:
- 1: https://github.com/llamastack/llama-stack-client-python/blob/main/src/llama_stack_client/_types.py
- 2: https://github.com/llamastack/llama-stack-client-python/blob/main/src/llama_stack_client/types/file_create_params.py
- 3: https://github.com/meta-llama/llama-stack-client-python
Avoid double-buffering the file into memory.
await file.read() loads the full payload into bytes, and BytesIO(content) copies it again. For large files this doubles memory consumption. Instead, pass the bytes directly using a tuple with the filename: await client.files.create(file=(filename, content), purpose="assistants"). Alternatively, pass the underlying spooled file object directly: await client.files.create(file=(filename, file.file), purpose="assistants").
Additionally, the docstring claims to raise HTTPException for large files, but there is no size validation before reading. Add a check using the Content-Length header before materializing the payload, or configure a file size limit upstream.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 450 - 468, The code
double-buffers uploaded files by calling await file.read() then wrapping into
BytesIO(content); change to pass the payload directly to the Llama client as a
tuple (filename, content) or (filename, file.file) to avoid copying (use
client.files.create(file=(filename, content), purpose="assistants") or
client.files.create(file=(filename, file.file), purpose="assistants")), remove
the BytesIO usage (file_bytes and its .name assignment), and update logging to
use the filename and size without forcing a second copy; also add a pre-read
size check using the request's Content-Length header (and raise HTTPException
when over the allowed limit) so you validate large uploads before materializing
them into memory.
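One way to apply this suggestion can be sketched without the FastAPI dependency. The helper name `prepare_upload`, the `MAX_UPLOAD_BYTES` limit, and the `ValueError` standing in for `HTTPException(status_code=413)` are all illustrative assumptions, not the reviewed code:

```python
from io import BytesIO

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # illustrative limit, not from the PR

def prepare_upload(filename, fileobj, headers):
    """Check Content-Length before reading anything, then hand the file
    object through as a (filename, fileobj) tuple -- no BytesIO copy."""
    length = headers.get("content-length")
    if length is not None and int(length) > MAX_UPLOAD_BYTES:
        # In the endpoint this would raise HTTPException(status_code=413, ...)
        raise ValueError("payload too large")
    return (filename, fileobj)

payload = prepare_upload("hello.txt", BytesIO(b"hello"), {"content-length": "5"})
```

In the actual endpoint the tuple would be passed straight to `client.files.create(file=payload, purpose="assistants")`, so the payload is streamed from the spooled file rather than materialized twice.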
```python
    Raises:
        HTTPException:
            - 401: Authentication failed
            - 403: Authorization failed
            - 404: Vector store or file not found
            - 500: Lightspeed Stack configuration not loaded
            - 503: Unable to connect to Llama Stack
```
Don’t hard-code the 404 to body.file_id.
This path is documented as “vector store or file not found”, but the BadRequestError branch always returns a not-found payload for body.file_id. If vector_store_id is the missing resource, callers get a misleading error body. Inspect the backend error before choosing the resource, or return a neutral 404 payload.
Also applies to: 600-605
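One way to pick the right resource can be sketched as a small pure function; the helper name and the substring heuristic are assumptions for illustration, not the reviewed code:

```python
def classify_not_found(error_message, vector_store_id, file_id):
    """Choose which resource the 404 body should name, based on the
    backend error text; fall back to a neutral payload when unclear."""
    msg = error_message.lower()
    if vector_store_id.lower() in msg:
        return {"resource": "vector_store", "id": vector_store_id}
    if file_id.lower() in msg:
        return {"resource": "file", "id": file_id}
    return {"resource": "unknown", "id": None}

detail = classify_not_found("Vector store vs_123 not found", "vs_123", "file_9")
```

The handler could then build its 404 payload from `detail` instead of always blaming `body.file_id`.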
```python
config_dict = get_test_config()
cfg = AppConfig()
cfg.init_from_dict(config_dict)

mock_client = mocker.AsyncMock()
mock_client.vector_stores.create.return_value = VectorStore("vs_123", "test_store")
mock_lsc = mocker.patch(
    "app.endpoints.vector_stores.AsyncLlamaStackClientHolder.get_client"
)
mock_lsc.return_value = mock_client
mocker.patch("app.endpoints.vector_stores.configuration", cfg)

request = get_test_request()
auth = get_test_auth()
body = VectorStoreCreateRequest(name="test_store")

response = await create_vector_store(request=request, auth=auth, body=body)
assert response is not None
assert response.id == "vs_123"
assert response.name == "test_store"
assert response.status == "active"
```
🧹 Nitpick | 🔵 Trivial
Add a test for the extra_body translation.
This success case only sends name, so it never verifies the new adapter logic that strips provider_id, embedding_model, and embedding_dimension out of the request body and forwards them via extra_body. That is the core wrapper behavior this PR is adding.
Suggested test shape

```diff
-    body = VectorStoreCreateRequest(name="test_store")
+    body = VectorStoreCreateRequest(
+        name="test_store",
+        provider_id="rhdh-docs",
+        embedding_model="text-embedding-ada-002",
+        embedding_dimension=1536,
+    )
     response = await create_vector_store(request=request, auth=auth, body=body)
+    mock_client.vector_stores.create.assert_awaited_once_with(
+        name="test_store",
+        extra_body={
+            "provider_id": "rhdh-docs",
+            "embedding_model": "text-embedding-ada-002",
+            "embedding_dimension": 1536,
+        },
+    )
     assert response is not None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 config_dict = get_test_config()
 cfg = AppConfig()
 cfg.init_from_dict(config_dict)
 mock_client = mocker.AsyncMock()
 mock_client.vector_stores.create.return_value = VectorStore("vs_123", "test_store")
 mock_lsc = mocker.patch(
     "app.endpoints.vector_stores.AsyncLlamaStackClientHolder.get_client"
 )
 mock_lsc.return_value = mock_client
 mocker.patch("app.endpoints.vector_stores.configuration", cfg)
 request = get_test_request()
 auth = get_test_auth()
-body = VectorStoreCreateRequest(name="test_store")
+body = VectorStoreCreateRequest(
+    name="test_store",
+    provider_id="rhdh-docs",
+    embedding_model="text-embedding-ada-002",
+    embedding_dimension=1536,
+)
 response = await create_vector_store(request=request, auth=auth, body=body)
+mock_client.vector_stores.create.assert_awaited_once_with(
+    name="test_store",
+    extra_body={
+        "provider_id": "rhdh-docs",
+        "embedding_model": "text-embedding-ada-002",
+        "embedding_dimension": 1536,
+    },
+)
 assert response is not None
 assert response.id == "vs_123"
 assert response.name == "test_store"
 assert response.status == "active"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/app/endpoints/test_vector_stores.py` around lines 184 - 205, Add a
test case that verifies the adapter moves provider_id, embedding_model, and
embedding_dimension from the main request into extra_body when calling the
client: use the existing test setup (get_test_request, get_test_auth,
VectorStoreCreateRequest) but include provider_id, embedding_model, and
embedding_dimension in the body; patch AsyncLlamaStackClientHolder.get_client to
return mock_client and assert mock_client.vector_stores.create was called with
extra_body containing those three keys (and that the top-level body forwarded to
create does not contain them) while still returning a
VectorStore("vs_123","test_store") and asserting the response fields as in the
current test.
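The adapter behavior this test would pin down can be sketched as a small pure function; the function name and key list are illustrative, not necessarily how the endpoint implements it:

```python
# Keys that Llama Stack's vector_stores.create() accepts only via extra_body.
EXTRA_BODY_KEYS = ("provider_id", "embedding_model", "embedding_dimension")

def split_extra_body(body):
    """Separate the extra_body keys from the top-level create() kwargs;
    unset (None) values are dropped rather than forwarded."""
    extra = {k: body[k] for k in EXTRA_BODY_KEYS if body.get(k) is not None}
    top = {k: v for k, v in body.items() if k not in EXTRA_BODY_KEYS}
    return top, extra

top, extra = split_extra_body({
    "name": "test_store",
    "provider_id": "rhdh-docs",
    "embedding_model": "text-embedding-ada-002",
    "embedding_dimension": 1536,
})
```

A test asserting on `mock_client.vector_stores.create` call arguments would then verify that `top` reaches the client as keyword arguments and `extra` arrives as `extra_body`.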
Description
The proposed architectural update aims to decouple the Red Hat Developer Hub (RHDH) plugin from its current tight dependency on the Llama Stack by centralizing all vector store operations within lightspeed-core. Currently, the RHDH plugin directly manages sessions and documents through multiple Llama Stack API calls, making it difficult to switch to alternative vector stores like ChromaDB or Pinecone without significant refactoring. By implementing a proxy/wrapper pattern, lightspeed-core will host domain-focused REST endpoints and handle all business logic, while the RHDH plugin is reduced to a thin HTTP client. This transition ensures a single source of truth for notebook logic, improves maintainability, and allows for backend infrastructure changes without impacting the frontend plugin.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
API / Schemas
Tests