LCORE-1617 Centralize Llama Stack Vector Store #1473
JslYoon wants to merge 2 commits into lightspeed-core:main from
Conversation
Signed-off-by: Lucas <lyoon@redhat.com>
Walkthrough

Adds a new FastAPI vector store API (CRUD for vector stores, file upload/attachment, file management), new request/response models, RBAC actions, router registration, and comprehensive unit tests; handlers validate configuration, authorize requests, call an async Llama Stack client, and map errors to HTTP responses.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client (HTTP)
    participant Auth as Auth Dependency
    participant Config as Configuration Check
    participant API as Vector Stores API (router)
    participant Llama as AsyncLlamaStackClient
    participant Store as Vector Store Service
    Client->>API: POST /v1/vector-stores (body, file)
    API->>Auth: authorize(Action.*)
    Auth-->>API: auth tuple
    API->>Config: check_configuration_loaded(configuration)
    Config-->>API: ok
    API->>Llama: get_client()
    Llama-->>API: client
    API->>Llama: client.vector_stores.create / client.files.create / client.vector_stores.add_file(...)
    alt add_file lock error
        Llama-->>API: LockError
        API->>Llama: retry (exponential backoff)
    end
    Llama-->>API: success / error
    API-->>Client: HTTP response (mapped model or error)
```
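The `alt add_file lock error` branch in the diagram retries with exponential backoff. A minimal sketch of that retry loop, assuming a hypothetical `LockError` exception and an injectable coroutine factory (none of these names come from the PR itself):

```python
import asyncio
import random


class LockError(Exception):
    """Hypothetical stand-in for the lock error shown in the diagram."""


async def add_file_with_retry(add_file, max_attempts: int = 4, base_delay: float = 0.1):
    """Call add_file() and retry on LockError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await add_file()
        except LockError:
            if attempt == max_attempts - 1:
                raise
            # Backoff doubles each attempt (0.1s, 0.2s, 0.4s, ...) plus jitter.
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, 0.05))
```

The jitter term is a common refinement to avoid synchronized retries; the diagram only specifies "exponential backoff".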
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 129-144: Remove the ad-hoc debugging call to await
client.models.list() and any plain print() statements in vector store request
handling (including the print at the start of the create flow and the other
print occurrences noted) and either delete them or convert them to structured
logger.debug() calls; additionally, change the logger.info that currently logs
the full request payload (the logger.info referencing "Creating vector store -
body_dict: %s, extra_body: %s") to either logger.debug or redact
sensitive/session/document fields before logging so no full request body is
emitted at info level; locate the code by the body.model_dump(exclude_none=True)
usage and the extra_body construction to apply these changes.
- Around line 366-369: Update both delete route decorators (e.g.,
`@router.delete`("/vector-stores/{vector_store_id}") and the other delete route
for entries) to include status_code=status.HTTP_204_NO_CONTENT so the actual
response matches the OpenAPI responses={"204": ...}; also fix the imports by
adding status and importing Depends directly from fastapi (e.g., from fastapi
import Depends, status) instead of importing Depends from fastapi.params and
ensure any existing references to status.HTTP_204_NO_CONTENT resolve.
- Around line 65-73: file_responses currently advertises a 503 via
ServiceUnavailableResponse for client-side upload validation failures; replace
that with a 400 Bad Request mapping and ensure BadRequestError is translated to
an HTTP 400 response. Update the file_responses dict to include a
BadRequestResponse/OpenAPI 400 entry instead of ServiceUnavailableResponse, and
in the handler where BadRequestError is caught (reference BadRequestError and
any exception handling code around the upload endpoint) raise or return FastAPI
HTTPException(status_code=400) so the OpenAPI and runtime behavior consistently
reflect a 400 Bad Request for malformed uploads.
In `@src/models/requests.py`:
- Around line 1009-1050: The VectorStoreUpdateRequest model currently allows an
empty payload because all fields are optional; add a Pydantic `@model_validator`
(e.g., on VectorStoreUpdateRequest) that runs after parsing and checks that at
least one of name, expires_at, or metadata is not None (and treat an empty
metadata dict as “empty”); if all are absent/empty, raise a ValueError so
requests return 422. Use `@field_validator` only if you need per-field
normalization (e.g., converting empty metadata to None) and reference the class
name VectorStoreUpdateRequest and the field names name, expires_at, metadata
when implementing the validators.
- Around line 1053-1097: The VectorStoreFileCreateRequest model does not enforce
the documented limits on the attributes field (max 16 pairs, keys max 64 chars,
string values max 512 chars, values may also be bool/number), so add validation:
implement a `@field_validator`("attributes", mode="before" or "after") to ensure
the field is a dict when present, has at most 16 entries, that each key is a
string <=64 chars, and each value is either a string (<=512 chars), bool, or
number; then implement a `@model_validator` to enforce any cross-field invariants
if needed and raise clear ValueError messages on violations. Reference the
VectorStoreFileCreateRequest class and the attributes field when adding these
Pydantic validators; keep model_config as-is (extra: forbid).
In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 33-52: The test fakes lack the optional fields the real handler
expects; update the VectorStore __init__ to accept metadata: Optional[dict]=None
and attributes: Optional[dict]=None (defaulting to None) and assign them to
self.metadata and self.attributes, and likewise add those optional
parameters/attributes to the other mock class in the same file (the file handler
fake around lines 78-90) so create_vector_store() and file handlers can read
metadata/attributes without error.
- Around line 169-172: The assertions are inside the pytest.raises context and
never run after the awaited create_vector_store raises; move the assertions so
they execute after the with block. Specifically, in
tests/unit/app/endpoints/test_vector_stores.py surrounding the call to
create_vector_store(request=request, auth=auth, body=body), keep the "with
pytest.raises(HTTPException) as e:" only wrapping the await, then after exiting
that block assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
and assert e.value.detail["response"] == "Configuration is not loaded" (and make
the same change for the other occurrence around lines 224-227).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b8a0a1c8-d00b-4162-a0c2-b9dac912b21d
📒 Files selected for processing (6)
- src/app/endpoints/vector_stores.py
- src/app/routers.py
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
- tests/unit/app/endpoints/test_vector_stores.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Use complete type annotations for all class attributes; avoid using `Any`
Follow Google Python docstring conventions for all modules, classes, and functions
Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed
Files:
- src/app/routers.py
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
- src/app/endpoints/vector_stores.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
- src/app/routers.py
- src/app/endpoints/vector_stores.py
src/models/config.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/config.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic configuration models
Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
Pydantic configuration models should extend `ConfigurationBase`
Files:
- src/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Use `typing_extensions.Self` for model validators in type annotations
Pydantic data models should extend `BaseModel`
Include `Attributes:` section in Pydantic model docstrings
Files:
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use `pytest.mark.asyncio` marker for async unit tests
Files:
- tests/unit/app/endpoints/test_vector_stores.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
- tests/unit/app/endpoints/test_vector_stores.py
🧠 Learnings (3)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/vector_stores.py
🪛 GitHub Actions: Unit tests
src/app/endpoints/vector_stores.py
[error] 159-159: Test_create_vector_store_success failed: AttributeError in create_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'
[error] 214-214: Test_list_vector_stores_success failed: AttributeError in list_vector_stores when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'
[error] 275-275: Test_get_vector_store_success failed: AttributeError in get_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'
[error] 345-345: Test_update_vector_store_success failed: AttributeError in update_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'
[error] 590-590: Test_add_file_to_vector_store_success failed: AttributeError in add_file_to_vector_store when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'
[error] 661-661: Test_list_vector_store_files_success failed: AttributeError in list_vector_store_files when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'
[error] 736-736: Test_get_vector_store_file_success failed: AttributeError in get_vector_store_file when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'
```python
file_responses: dict[int | str, dict[str, Any]] = {
    200: FileResponse.openapi_response(),
    401: UnauthorizedResponse.openapi_response(
        examples=["missing header", "missing token"]
    ),
    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
    503: ServiceUnavailableResponse.openapi_response(),
}
```
🧩 Analysis chain
🌐 Web query:
For llama-stack-client version 0.5.2, what HTTP status or error semantics does BadRequestError represent?
💡 Result:
In llama-stack-client v0.5.2, BadRequestError represents an HTTP 400 (Bad Request) response—i.e., it’s the APIStatusError subclass raised when the server returns status code 400, typically indicating the request was invalid (bad parameters/body). [1]
Source: [1] (pypi.org)
🏁 Scripts executed (repository: lightspeed-core/lightspeed-stack):
- `wc -l src/app/endpoints/vector_stores.py` (length of output: 114)
- `sed -n '50,85p' src/app/endpoints/vector_stores.py` (length of output: 1639)
- `sed -n '480,500p' src/app/endpoints/vector_stores.py` (length of output: 1134)
- `sed -n '1,100p' src/app/endpoints/vector_stores.py | head -100` (length of output: 3780)
- `sed -n '480,510p' src/app/endpoints/vector_stores.py` (length of output: 1409)
- `rg -n "BadRequestError" src/app/endpoints/vector_stores.py -A 3` (length of output: 1870)
- `rg -n "ServiceUnavailableResponse" src/app/endpoints/vector_stores.py | head -20` (length of output: 1417)
Use HTTP 400 (Bad Request) for BadRequestError, not 503.
At lines 489-492, BadRequestError (which represents HTTP 400 per llama-stack-client v0.5.2) is incorrectly mapped to ServiceUnavailableResponse (503). The file_responses dict advertises status codes 200, 401, 403, 500, and 503—no 4xx response for malformed uploads. Clients will interpret 503 as a transient backend outage instead of a client error requiring request correction.
Per the coding guidelines, use FastAPI HTTPException with appropriate status codes. Return 400 for validation failures and update file_responses to advertise this status.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 65 - 73, file_responses
currently advertises a 503 via ServiceUnavailableResponse for client-side upload
validation failures; replace that with a 400 Bad Request mapping and ensure
BadRequestError is translated to an HTTP 400 response. Update the file_responses
dict to include a BadRequestResponse/OpenAPI 400 entry instead of
ServiceUnavailableResponse, and in the handler where BadRequestError is caught
(reference BadRequestError and any exception handling code around the upload
endpoint) raise or return FastAPI HTTPException(status_code=400) so the OpenAPI
and runtime behavior consistently reflect a 400 Bad Request for malformed
uploads.
```python
# Extract provider_id for extra_body (not a direct client parameter)
body_dict = body.model_dump(exclude_none=True)
print("client.models.list() reaches here", await client.models.list())
extra_body = {}
if "provider_id" in body_dict:
    extra_body["provider_id"] = body_dict.pop("provider_id")
if "embedding_model" in body_dict:
    extra_body["embedding_model"] = body_dict.pop("embedding_model")
if "embedding_dimension" in body_dict:
    extra_body["embedding_dimension"] = body_dict.pop("embedding_dimension")

logger.info(
    "Creating vector store - body_dict: %s, extra_body: %s",
    body_dict,
    extra_body,
)
```
Remove the debugging probe from the request path.
Line 131 adds an unrelated backend call before every vector-store create, and the print() calls on Lines 465 and 725-730 bypass structured logging. On top of that, Lines 140-144 log the full request payload at info, which can include session/document metadata.
🛠️ Proposed fix

```diff
-    print("client.models.list() reaches here", await client.models.list())
     extra_body = {}
@@
-    logger.info(
-        "Creating vector store - body_dict: %s, extra_body: %s",
-        body_dict,
-        extra_body,
-    )
+    logger.debug(
+        "Creating vector store '%s' with provider '%s'",
+        body.name,
+        extra_body.get("provider_id"),
+    )
@@
-    print("filename reaches here", filename)
@@
-    print("file_id", file_id)
     vs_file = await client.vector_stores.files.retrieve(
         vector_store_id=vector_store_id,
         file_id=file_id,
     )
-    print("vs_file reaches here", vs_file)
```

As per coding guidelines, use logger.debug() for detailed diagnostic information and logger.info() for general information about program execution.
Also applies to: 465-465, 725-730
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 129 - 144, Remove the ad-hoc
debugging call to await client.models.list() and any plain print() statements in
vector store request handling (including the print at the start of the create
flow and the other print occurrences noted) and either delete them or convert
them to structured logger.debug() calls; additionally, change the logger.info
that currently logs the full request payload (the logger.info referencing
"Creating vector store - body_dict: %s, extra_body: %s") to either logger.debug
or redact sensitive/session/document fields before logging so no full request
body is emitted at info level; locate the code by the
body.model_dump(exclude_none=True) usage and the extra_body construction to
apply these changes.
```python
@router.delete(
    "/vector-stores/{vector_store_id}",
    responses={"204": {"description": "Vector store deleted"}},
)
```
🧩 Analysis chain
🌐 Web query:
FastAPI official docs: if a @router.delete handler omits status_code, what default success status code is returned, and does responses={204: ...} affect the actual runtime status code?
💡 Result:
- Default success status code: If you omit `status_code` on a `@router.delete(...)` (or any path operation), FastAPI returns HTTP 200 OK by default. [1]
- Does `responses={204: ...}` change the runtime status code? No. The `responses` parameter documents additional responses in the OpenAPI schema; it does not make FastAPI return 204 at runtime. To actually return 204, you must set `status_code=204` (or return a `Response`/`JSONResponse` with `status_code=204`). FastAPI explicitly notes that for "additional responses" you must return a `Response` directly with your status code. [2]
Sources
[1] FastAPI docs — Response Status Code https://fastapi.tiangolo.com/tutorial/response-status-code/
[2] FastAPI docs — Additional Responses in OpenAPI https://fastapi.tiangolo.com/advanced/additional-responses/
🏁 Scripts executed (repository: lightspeed-core/lightspeed-stack):
- `wc -l src/app/endpoints/vector_stores.py` (length of output: 114)
- `head -30 src/app/endpoints/vector_stores.py` (length of output: 992)
- `sed -n '360,375p' src/app/endpoints/vector_stores.py` (length of output: 617)
- `sed -n '755,775p' src/app/endpoints/vector_stores.py` (length of output: 791)
Declare the 204 status code on the delete routes and fix imports.
Both delete routes at lines 366–369 and 764–767 specify responses={"204": ...} for OpenAPI documentation, but omit status_code=status.HTTP_204_NO_CONTENT. Without the explicit status code, FastAPI returns HTTP 200 by default—creating a mismatch between documented and actual behavior.
Additionally, the import statement violates coding guidelines: status is missing, and Depends should be imported directly from fastapi rather than fastapi.params.
Proposed fix

```diff
-from fastapi import APIRouter, File, HTTPException, Request, UploadFile
-from fastapi.params import Depends
+from fastapi import APIRouter, Depends, File, HTTPException, Request, UploadFile, status
@@
 @router.delete(
     "/vector-stores/{vector_store_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "Vector store deleted"}},
 )
@@
 @router.delete(
     "/vector-stores/{vector_store_id}/files/{file_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "File deleted from vector store"}},
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 366 - 369, Update both
delete route decorators (e.g.,
`@router.delete`("/vector-stores/{vector_store_id}") and the other delete route
for entries) to include status_code=status.HTTP_204_NO_CONTENT so the actual
response matches the OpenAPI responses={"204": ...}; also fix the imports by
adding status and importing Depends directly from fastapi (e.g., from fastapi
import Depends, status) instead of importing Depends from fastapi.params and
ensure any existing references to status.HTTP_204_NO_CONTENT resolve.
```python
class VectorStoreUpdateRequest(BaseModel):
    """Model representing a request to update a vector store.

    Attributes:
        name: New name for the vector store.
        expires_at: Optional expiration timestamp.
        metadata: Optional metadata dictionary for storing session information.
    """

    name: Optional[str] = Field(
        None,
        description="New name for the vector store",
        examples=["updated_vector_store"],
        min_length=1,
        max_length=256,
    )

    expires_at: Optional[int] = Field(
        None,
        description="Unix timestamp when the vector store should expire",
        examples=[1735689600],
        gt=0,
    )

    metadata: Optional[dict[str, Any]] = Field(
        None,
        description="Metadata dictionary for storing session information",
        examples=[{"user_id": "user123", "session_id": "sess456"}],
    )

    model_config = {
        "extra": "forbid",
        "json_schema_extra": {
            "examples": [
                {
                    "name": "updated_vector_store",
                    "expires_at": 1735689600,
                    "metadata": {"user_id": "user123"},
                },
            ]
        },
    }
```
Reject empty vector-store update payloads.
VectorStoreUpdateRequest currently accepts {} because every field is optional. That means update_vector_store() can forward a no-op update upstream instead of failing fast with a 422.
🛠️ Proposed fix

```diff
 class VectorStoreUpdateRequest(BaseModel):
 @@
     model_config = {
         "extra": "forbid",
         "json_schema_extra": {
             "examples": [
                 {
                     "name": "updated_vector_store",
                     "expires_at": 1735689600,
                     "metadata": {"user_id": "user123"},
                 },
             ]
         },
     }
+
+    @model_validator(mode="after")
+    def validate_non_empty_update(self) -> Self:
+        """Ensure at least one updatable field is provided."""
+        if (
+            self.name is None
+            and self.expires_at is None
+            and self.metadata is None
+        ):
+            raise ValueError(
+                "At least one of 'name', 'expires_at', or 'metadata' must be provided"
+            )
+        return self
```

As per coding guidelines, src/models/**/*.py should use @field_validator and @model_validator for custom validation in Pydantic models.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/models/requests.py` around lines 1009 - 1050, The
VectorStoreUpdateRequest model currently allows an empty payload because all
fields are optional; add a Pydantic `@model_validator` (e.g., on
VectorStoreUpdateRequest) that runs after parsing and checks that at least one
of name, expires_at, or metadata is not None (and treat an empty metadata dict
as “empty”); if all are absent/empty, raise a ValueError so requests return 422.
Use `@field_validator` only if you need per-field normalization (e.g., converting
empty metadata to None) and reference the class name VectorStoreUpdateRequest
and the field names name, expires_at, metadata when implementing the validators.
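A standalone version of the validator the prompt describes, including its empty-metadata-dict case; the model is trimmed to the three fields the validator inspects, so it is a sketch rather than the PR's full class:

```python
from typing import Any, Optional

from pydantic import BaseModel, model_validator
from typing_extensions import Self


class VectorStoreUpdateRequest(BaseModel):
    """Trimmed sketch: only the fields the validator inspects."""

    name: Optional[str] = None
    expires_at: Optional[int] = None
    metadata: Optional[dict[str, Any]] = None

    model_config = {"extra": "forbid"}

    @model_validator(mode="after")
    def validate_non_empty_update(self) -> Self:
        """Reject no-op updates; an empty metadata dict counts as empty."""
        if self.name is None and self.expires_at is None and not self.metadata:
            raise ValueError(
                "At least one of 'name', 'expires_at', or 'metadata' must be provided"
            )
        return self
```

When the validator raises `ValueError`, Pydantic wraps it in a `ValidationError`, which FastAPI surfaces as the desired 422.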
```python
class VectorStoreFileCreateRequest(BaseModel):
    """Model representing a request to add a file to a vector store.

    Attributes:
        file_id: ID of the file to add to the vector store.
        attributes: Optional metadata key-value pairs (max 16 pairs).
        chunking_strategy: Optional chunking strategy configuration.
    """

    file_id: str = Field(
        ...,
        description="ID of the file to add to the vector store",
        examples=["file-abc123"],
        min_length=1,
    )

    attributes: Optional[dict[str, str | float | bool]] = Field(
        None,
        description=(
            "Set of up to 16 key-value pairs for storing additional information. "
            "Keys: strings (max 64 chars). Values: strings (max 512 chars), booleans, or numbers."
        ),
        examples=[
            {"created_at": "2026-04-04T15:20:00Z", "updated_at": "2026-04-04T15:20:00Z"}
        ],
    )

    chunking_strategy: Optional[dict[str, Any]] = Field(
        None,
        description="Chunking strategy configuration for this file",
        examples=[{"type": "fixed", "chunk_size": 512, "chunk_overlap": 50}],
    )

    model_config = {
        "extra": "forbid",
        "json_schema_extra": {
            "examples": [
                {
                    "file_id": "file-abc123",
                    "attributes": {"created_at": "2026-04-04T15:20:00Z"},
                    "chunking_strategy": {"type": "fixed", "chunk_size": 512},
                },
            ]
        },
    }
```
Enforce the documented attributes limits in the request model.
The field description says this payload is limited to 16 pairs with max key/value lengths, but none of that is validated here. Invalid requests will make it to the backend layer and fail later than they should.
🛠️ Proposed fix

```diff
 class VectorStoreFileCreateRequest(BaseModel):
 @@
     model_config = {
         "extra": "forbid",
         "json_schema_extra": {
             "examples": [
                 {
                     "file_id": "file-abc123",
                     "attributes": {"created_at": "2026-04-04T15:20:00Z"},
                     "chunking_strategy": {"type": "fixed", "chunk_size": 512},
                 },
             ]
         },
     }
+
+    @field_validator("attributes")
+    @classmethod
+    def validate_attributes(
+        cls, value: Optional[dict[str, str | float | bool]]
+    ) -> Optional[dict[str, str | float | bool]]:
+        """Validate attribute count and key/value sizes."""
+        if value is None:
+            return value
+        if len(value) > 16:
+            raise ValueError("A maximum of 16 attributes is allowed")
+        for key, attr_value in value.items():
+            if len(key) > 64:
+                raise ValueError(f"Attribute key '{key}' exceeds 64 characters")
+            if isinstance(attr_value, str) and len(attr_value) > 512:
+                raise ValueError(
+                    f"Attribute value for '{key}' exceeds 512 characters"
+                )
+        return value
```

As per coding guidelines, src/models/**/*.py should use @field_validator and @model_validator for custom validation in Pydantic models.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/models/requests.py` around lines 1053 - 1097, The
VectorStoreFileCreateRequest model does not enforce the documented limits on the
attributes field (max 16 pairs, keys max 64 chars, string values max 512 chars,
values may also be bool/number), so add validation: implement a
`@field_validator`("attributes", mode="before" or "after") to ensure the field is
a dict when present, has at most 16 entries, that each key is a string <=64
chars, and each value is either a string (<=512 chars), bool, or number; then
implement a `@model_validator` to enforce any cross-field invariants if needed and
raise clear ValueError messages on violations. Reference the
VectorStoreFileCreateRequest class and the attributes field when adding these
Pydantic validators; keep model_config as-is (extra: forbid).
```python
with pytest.raises(HTTPException) as e:
    await create_vector_store(request=request, auth=auth, body=body)
    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
```
Move the assertions out of the pytest.raises body.
Once the awaited call raises on Line 170 or Line 225, the remaining assertions inside the with block are skipped. These tests currently verify only that some HTTPException was raised.
🛠️ Proposed fix

```diff
     with pytest.raises(HTTPException) as e:
         await create_vector_store(request=request, auth=auth, body=body)
-        assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
@@
     with pytest.raises(HTTPException) as e:
         await create_vector_store(request=request, auth=auth, body=body)
-        assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
-        assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore
+    assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
+    assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore
```

Also applies to: 224-227
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/app/endpoints/test_vector_stores.py` around lines 169 - 172, The
assertions are inside the pytest.raises context and never run after the awaited
create_vector_store raises; move the assertions so they execute after the with
block. Specifically, in tests/unit/app/endpoints/test_vector_stores.py
surrounding the call to create_vector_store(request=request, auth=auth,
body=body), keep the "with pytest.raises(HTTPException) as e:" only wrapping the
await, then after exiting that block assert e.value.status_code ==
status.HTTP_500_INTERNAL_SERVER_ERROR and assert e.value.detail["response"] ==
"Configuration is not loaded" (and make the same change for the other occurrence
around lines 224-227).
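The requested pattern can be demonstrated with a minimal, self-contained pytest sketch; FakeHTTPException stands in for fastapi.HTTPException here, and the handler body is illustrative only:

```python
import pytest


class FakeHTTPException(Exception):
    """Stand-in for fastapi.HTTPException in this sketch."""

    def __init__(self, status_code: int, detail: dict):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def create_vector_store():
    """Simulates the endpoint failing because configuration is not loaded."""
    raise FakeHTTPException(500, {"response": "Configuration is not loaded"})


def test_create_vector_store_requires_configuration():
    # Only the raising call goes inside the context manager; the
    # assertions after the block always execute.
    with pytest.raises(FakeHTTPException) as e:
        create_vector_store()
    assert e.value.status_code == 500
    assert e.value.detail["response"] == "Configuration is not loaded"
```

If the assertions sat inside the `with` block, they would be skipped as soon as the call raised, which is exactly the bug the reviewer flagged.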
Signed-off-by: Lucas <lyoon@redhat.com>
Actionable comments posted: 4
♻️ Duplicate comments (4)
tests/unit/app/endpoints/test_vector_stores.py (1)
173-176: ⚠️ Potential issue | 🟠 Major: Move the assertions outside pytest.raises.
Once the awaited call raises, the remaining lines inside the context manager never run. These tests currently verify only that some HTTPException was raised.
Suggested fix
 with pytest.raises(HTTPException) as e:
     await create_vector_store(request=request, auth=auth, body=body)
-    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
@@
 with pytest.raises(HTTPException) as e:
     await create_vector_store(request=request, auth=auth, body=body)
-    assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
-    assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore
+assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
+assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore

Also applies to: 228-231
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_vector_stores.py` around lines 173 - 176, The assertions are inside the pytest.raises context so they never run after await create_vector_store(...) raises; move the two assert lines out of the with pytest.raises(HTTPException) as e: block so you capture the exception into e and then assert e.value.status_code and e.value.detail["response"] after the context. Apply the same change for the second occurrence around lines referencing the same pattern (the other test block at 228-231) and keep references to create_vector_store, pytest.raises, e.value.status_code and e.value.detail["response"] when updating the tests.
src/app/endpoints/vector_stores.py (3)
139-143: ⚠️ Potential issue | 🟠 Major: Stop logging the full create payload at info.
body_dict and extra_body can carry notebook/session metadata. Emitting the whole request at info level is noisy and leaks more context than this endpoint needs. Log stable identifiers only, and keep payload-level diagnostics at debug.
Suggested fix
- logger.info(
-     "Creating vector store - body_dict: %s, extra_body: %s",
-     body_dict,
-     extra_body,
- )
+ logger.debug(
+     "Creating vector store '%s' with provider '%s'",
+     body.name,
+     extra_body.get("provider_id"),
+ )
As per coding guidelines, use logger.debug() for detailed diagnostic information.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/vector_stores.py` around lines 139 - 143, Change the noisy info-level logging that prints the full payload to log only stable identifiers at info and move payload diagnostics to debug: replace the current logger.info call that references body_dict and extra_body with an info log that emits only stable identifiers (e.g., request id, user_id, notebook_id or session_id if available) and use logger.debug to record the full body_dict/extra_body for detailed diagnostics; update references in the surrounding function (the logger usage in this module, e.g., the create vector store endpoint in vector_stores.py) so sensitive/session metadata is not logged at info level.
65-73: ⚠️ Potential issue | 🟠 Major: Return 400 for upload bad requests, not 503.
BadRequestError here is a client-side upload failure, but the route currently advertises and returns ServiceUnavailableResponse. That makes malformed uploads look like transient backend outages and encourages retries instead of request fixes. The matching bad-request unit test will need to flip with this change.
Based on learnings, use FastAPI HTTPException with appropriate status codes for API endpoint error handling.

Also applies to: 487-490
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/vector_stores.py` around lines 65 - 73, The route currently advertises and returns ServiceUnavailableResponse (503) for client-side BadRequestError; change the response mapping to return a 400 BadRequestResponse and raise FastAPI HTTPException(status_code=400) when catching BadRequestError so uploads are treated as client errors; update file_responses to include BadRequestResponse.openapi_response() instead of ServiceUnavailableResponse.openapi_response() and adjust the error handling in the same handler (and the similar handler referenced around the code marked 487-490) so unit tests expect 400 instead of 503.
9-10: ⚠️ Potential issue | 🟠 Major: Declare 204 on both delete routes and fix the FastAPI imports.
responses={"204": ...} only changes the OpenAPI schema. Without status_code=status.HTTP_204_NO_CONTENT, both handlers still return 200 at runtime. The import also bypasses the repo's required fastapi import form for Depends/status.
Suggested fix
-from fastapi import APIRouter, File, HTTPException, Request, UploadFile
-from fastapi.params import Depends
+from fastapi import APIRouter, Depends, File, HTTPException, Request, UploadFile, status
@@
 @router.delete(
     "/vector-stores/{vector_store_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "Vector store deleted"}},
 )
@@
 @router.delete(
     "/vector-stores/{vector_store_id}/files/{file_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "File deleted from vector store"}},
 )
As per coding guidelines, use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies.

Also applies to: 365-368, 760-763
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/vector_stores.py` around lines 9 - 10, The FastAPI import and delete handlers are incorrect: change the import to use the repo style "from fastapi import APIRouter, HTTPException, Request, status, Depends" (remove the separate fastapi.params import) and update both delete route handlers that currently only set responses={"204": ...} to also set status_code=status.HTTP_204_NO_CONTENT so they return 204 at runtime; locate the routes in src/app/endpoints/vector_stores.py that declare a 204 response (the two DELETE handlers) and add status_code=status.HTTP_204_NO_CONTENT to their decorator arguments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 86-94: The list_vector_store_files() handler currently only maps
APIConnectionError and generic Exception to responses; update
vector_store_files_list_responses to include a 404 entry (use the appropriate
NotFound/BadRequest response helper used elsewhere) and add a specific except
BadRequestError branch in list_vector_store_files() that returns the 404
response, mirroring how get_vector_store() handles BadRequestError; also update
the function docstring to mention the 404/BadRequest response.
- Around line 450-468: The code double-buffers uploaded files by calling await
file.read() then wrapping into BytesIO(content); change to pass the payload
directly to the Llama client as a tuple (filename, content) or (filename,
file.file) to avoid copying (use client.files.create(file=(filename, content),
purpose="assistants") or client.files.create(file=(filename, file.file),
purpose="assistants")), remove the BytesIO usage (file_bytes and its .name
assignment), and update logging to use the filename and size without forcing a
second copy; also add a pre-read size check using the request's Content-Length
header (and raise HTTPException when over the allowed limit) so you validate
large uploads before materializing them into memory.
In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 184-205: Add a test case that verifies the adapter moves
provider_id, embedding_model, and embedding_dimension from the main request into
extra_body when calling the client: use the existing test setup
(get_test_request, get_test_auth, VectorStoreCreateRequest) but include
provider_id, embedding_model, and embedding_dimension in the body; patch
AsyncLlamaStackClientHolder.get_client to return mock_client and assert
mock_client.vector_stores.create was called with extra_body containing those
three keys (and that the top-level body forwarded to create does not contain
them) while still returning a VectorStore("vs_123","test_store") and asserting
the response fields as in the current test.
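The behavior that test should pin down can be sketched as a small helper; the function name and field list here follow the prompt but are otherwise hypothetical:

```python
# Backend-only fields that must travel in extra_body, not the main payload.
BACKEND_FIELDS = ("provider_id", "embedding_model", "embedding_dimension")


def split_extra_body(body: dict) -> tuple[dict, dict]:
    """Return (main_body, extra_body), moving backend-only fields out of body."""
    main_body = {k: v for k, v in body.items() if k not in BACKEND_FIELDS}
    extra_body = {k: body[k] for k in BACKEND_FIELDS if body.get(k) is not None}
    return main_body, extra_body
```

The test would then assert that mock_client.vector_stores.create received exactly the extra_body half, and that none of the three keys leaked into the forwarded top-level body.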
---
Duplicate comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 139-143: Change the noisy info-level logging that prints the full
payload to log only stable identifiers at info and move payload diagnostics to
debug: replace the current logger.info call that references body_dict and
extra_body with an info log that emits only stable identifiers (e.g., request
id, user_id, notebook_id or session_id if available) and use logger.debug to
record the full body_dict/extra_body for detailed diagnostics; update references
in the surrounding function (the logger usage in this module, e.g., the create
vector store endpoint in vector_stores.py) so sensitive/session metadata is not
logged at info level.
- Around line 65-73: The route currently advertises and returns
ServiceUnavailableResponse (503) for client-side BadRequestError; change the
response mapping to return a 400 BadRequestResponse and raise FastAPI
HTTPException(status_code=400) when catching BadRequestError so uploads are
treated as client errors; update file_responses to include
BadRequestResponse.openapi_response() instead of
ServiceUnavailableResponse.openapi_response() and adjust the error handling in
the same handler (and the similar handler referenced around the code marked
487-490) so unit tests expect 400 instead of 503.
- Around line 9-10: The FastAPI import and delete handlers are incorrect: change
the import to use the repo style "from fastapi import APIRouter, HTTPException,
Request, status, Depends" (remove the separate fastapi.params import) and update
both delete route handlers that currently only set responses={"204": ...} to
also set status_code=status.HTTP_204_NO_CONTENT so they return 204 at runtime;
locate the routes in src/app/endpoints/vector_stores.py that declare a 204
response (the two DELETE handlers) and add
status_code=status.HTTP_204_NO_CONTENT to their decorator arguments.
In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 173-176: The assertions are inside the pytest.raises context so
they never run after await create_vector_store(...) raises; move the two assert
lines out of the with pytest.raises(HTTPException) as e: block so you capture
the exception into e and then assert e.value.status_code and
e.value.detail["response"] after the context. Apply the same change for the
second occurrence around lines referencing the same pattern (the other test
block at 228-231) and keep references to create_vector_store, pytest.raises,
e.value.status_code and e.value.detail["response"] when updating the tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eea96eaa-fe50-41d6-b442-dd1b7c8bd69a
📒 Files selected for processing (3)
- src/app/endpoints/vector_stores.py
- tests/unit/app/endpoints/test_vector_stores.py
- tests/unit/app/test_routers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests
Files:
- tests/unit/app/test_routers.py
- tests/unit/app/endpoints/test_vector_stores.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use pytest-mock for AsyncMock objects in unit tests
Files:
- tests/unit/app/test_routers.py
- tests/unit/app/endpoints/test_vector_stores.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed
Files:
src/app/endpoints/vector_stores.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/vector_stores.py
🧠 Learnings (3)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies
Applied to files:
src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/vector_stores.py
vector_store_files_list_responses: dict[int | str, dict[str, Any]] = {
    200: VectorStoreFilesListResponse.openapi_response(),
    401: UnauthorizedResponse.openapi_response(
        examples=["missing header", "missing token"]
    ),
    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
    503: ServiceUnavailableResponse.openapi_response(),
}
🧩 Analysis chain
🏁 Script executed:
# Check if file exists and get an overview
ls -la src/app/endpoints/vector_stores.py
# Get file size to determine reading strategy
wc -l src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 199
🏁 Script executed:
# Read the specific section around lines 86-94
sed -n '80,100p' src/app/endpoints/vector_stores.py
# Read around lines 638-643 and 650-683
sed -n '630,690p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 3121
🏁 Script executed:
# Search for list_vector_store_files function definition
rg -n "def list_vector_store_files" src/app/endpoints/vector_stores.py
# Search for BadRequestError handling in this file
rg -n "BadRequestError" src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 443
🏁 Script executed:
# Check what exceptions are imported
head -50 src/app/endpoints/vector_stores.py | grep -E "^from|^import"
Repository: lightspeed-core/lightspeed-stack
Length of output: 752
🏁 Script executed:
# Search for sibling handlers to understand the pattern
rg -n "NotFoundResponse" src/app/endpoints/vector_stores.py
# Check other endpoint files for similar patterns
rg -l "vector_stores\|BadRequestError" src/app/endpoints/ | head -5
Repository: lightspeed-core/lightspeed-stack
Length of output: 599
🏁 Script executed:
# Read the full list_vector_store_files function (lines 623+)
sed -n '623,680p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 2210
🏁 Script executed:
# Check one of the other handlers that correctly handles BadRequestError to see the pattern
sed -n '270,290p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 1057
🏁 Script executed:
# Check if there are other functions in the file that also list/get vector stores
rg -n "def (get_|list_|retrieve_).*vector_store" src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 217
🏁 Script executed:
# Check lines 638-643 and what those are about
sed -n '638,650p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 398
🏁 Script executed:
# Check lines 650-683
sed -n '650,690p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 1657
🏁 Script executed:
# Look at one more sibling handler to confirm the pattern is consistent
sed -n '234,260p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 886
🏁 Script executed:
# Check what the openapi_response declarations look like to verify they exist
rg -A2 "class NotFoundResponse" src/app/endpoints/ || rg -A2 "NotFoundResponse" src/models/responses.py | head -20
Repository: lightspeed-core/lightspeed-stack
Length of output: 287
🏁 Script executed:
# Get the full context of list_vector_store_files including decorators and responses
sed -n '610,625p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 553
🏁 Script executed:
# Verify the vector_store_files_list_responses dict definition location
sed -n '86,94p' src/app/endpoints/vector_stores.py
Repository: lightspeed-core/lightspeed-stack
Length of output: 521
🏁 Script executed:
# Check if BadRequestError is actually raised by the Llama Stack client for missing vector stores
rg -n "vector_stores.files.list" src/app/endpoints/vector_stores.py -A2 -B2
Repository: lightspeed-core/lightspeed-stack
Length of output: 269
Add BadRequestError handling to list_vector_store_files() for 404 responses.
When an invalid vector-store ID is passed, the backend raises BadRequestError, but this function only catches APIConnectionError and generic Exception. This causes the error to be returned as 500 instead of 404, inconsistent with sibling handlers like get_vector_store() which properly map BadRequestError to 404.
Update the vector_store_files_list_responses dict to include the 404 code, add a BadRequestError handler in the exception chain, and document the 404 response in the docstring.
Suggested fix
vector_store_files_list_responses: dict[int | str, dict[str, Any]] = {
200: VectorStoreFilesListResponse.openapi_response(),
401: UnauthorizedResponse.openapi_response(
examples=["missing header", "missing token"]
),
403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
+ 404: NotFoundResponse.openapi_response(examples=["vector_store"]),
500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
503: ServiceUnavailableResponse.openapi_response(),
}
@@
Raises:
HTTPException:
- 401: Authentication failed
- 403: Authorization failed
+ - 404: Vector store not found
- 500: Lightspeed Stack configuration not loaded
- 503: Unable to connect to Llama Stack
@@
except APIConnectionError as e:
logger.error("Unable to connect to Llama Stack: %s", e)
response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e))
raise HTTPException(**response.model_dump()) from e
+ except BadRequestError as e:
+ logger.error("Vector store not found: %s", e)
+ response = NotFoundResponse(
+ resource="vector_store", resource_id=vector_store_id
+ )
+ raise HTTPException(**response.model_dump()) from e
 except Exception as e:
     logger.error("Unable to list vector store files: %s", e)
📝 Committable suggestion
logger.error("Unable to list vector store files: %s", e)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
vector_store_files_list_responses: dict[int | str, dict[str, Any]] = {
    200: VectorStoreFilesListResponse.openapi_response(),
    401: UnauthorizedResponse.openapi_response(
        examples=["missing header", "missing token"]
    ),
    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
    404: NotFoundResponse.openapi_response(examples=["vector_store"]),
    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
    503: ServiceUnavailableResponse.openapi_response(),
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 86 - 94, The
list_vector_store_files() handler currently only maps APIConnectionError and
generic Exception to responses; update vector_store_files_list_responses to
include a 404 entry (use the appropriate NotFound/BadRequest response helper
used elsewhere) and add a specific except BadRequestError branch in
list_vector_store_files() that returns the 404 response, mirroring how
get_vector_store() handles BadRequestError; also update the function docstring
to mention the 404/BadRequest response.
# Read file content
content = await file.read()
filename = file.filename or "uploaded_file"

# Add .txt extension if no extension present
# (since parsed PDFs/URLs are sent as plain text)
if not os.path.splitext(filename)[1]:
    filename = f"(unknown).txt"

logger.info(
    "Uploading file - filename: %s, size: %d bytes",
    filename,
    len(content),
)

# Convert to BytesIO for Llama Stack client
# The client expects bytes, io.IOBase, PathLike, or a tuple
file_bytes = BytesIO(content)
file_bytes.name = filename  # Set the filename attribute
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the full function context around lines 450-468
cat -n src/app/endpoints/vector_stores.py | sed -n '420,480p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 2378
🏁 Script executed:
# Check if there's a size limit check or any validation before reading
cat -n src/app/endpoints/vector_stores.py | sed -n '410,475p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 2563
🏁 Script executed:
# Look for any size limit constants or config
rg -n "size|SIZE|limit|LIMIT" src/app/endpoints/vector_stores.py | head -20
Repository: lightspeed-core/lightspeed-stack
Length of output: 140
🏁 Script executed:
# Check if UploadFile.file is used anywhere else in the codebase
rg -n "UploadFile" src/ -A 5 -B 2
Repository: lightspeed-core/lightspeed-stack
Length of output: 1217
🏁 Script executed:
# Check if there are other file upload endpoints in the codebase to see patterns
find src -name "*.py" -type f | xargs grep -l "UploadFile" | head -5
Repository: lightspeed-core/lightspeed-stack
Length of output: 110
🏁 Script executed:
# Look for how file.file is used or if it's mentioned anywhere
rg -n "file\.file" src/ --type py
Repository: lightspeed-core/lightspeed-stack
Length of output: 278
🏁 Script executed:
# Check the imports and FastAPI version to understand UploadFile better
cat src/app/endpoints/vector_stores.py | head -30
Repository: lightspeed-core/lightspeed-stack
Length of output: 992
🏁 Script executed:
# Check the Llama Stack client usage - what does files.create() expect?
rg -n "client.files.create\|files\.create" src/ -B 2 -A 3 --type py
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check if there's any documentation or type hints about what file parameter accepts
rg -n "files\.create" src/ -B 5 -A 5
Repository: lightspeed-core/lightspeed-stack
Length of output: 1763
🏁 Script executed:
# Search for any Python documentation or comments about UploadFile usage
rg -n "SpooledTemporaryFile\|\.file\|file object" src/ --type py | head -10
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check if there's a requirements.txt to see llama-stack-client version
cat requirements.txt | grep -i llama
Repository: lightspeed-core/lightspeed-stack
Length of output: 124
🏁 Script executed:
# Look for requirements files with different names
find . -maxdepth 2 -name "*.txt" -o -name "pyproject.toml" -o -name "setup.py" | grep -E "req|setup|pyproject"
Repository: lightspeed-core/lightspeed-stack
Length of output: 239
🏁 Script executed:
# Check for pyproject.toml or setup.py with dependencies
ls -la | grep -E "requirements|setup|pyproject"
Repository: lightspeed-core/lightspeed-stack
Length of output: 539
🏁 Script executed:
# Look for any configuration about file size limits
rg -n "max.*size\|MAX.*SIZE\|file.*limit\|upload.*limit" src/ --type py -i | head -20
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check pyproject.toml for llama-stack-client version
cat pyproject.toml | grep -A 5 -B 5 "llama"
Repository: lightspeed-core/lightspeed-stack
Length of output: 813
🏁 Script executed:
# Check if the UploadFile's underlying file object is accessible
# FastAPI's UploadFile has a .file property that is SpooledTemporaryFile
# Let's see if there are any patterns in the code
rg -n "SpooledTemporaryFile\|BinaryIO\|AsyncBinaryIO" src/ --type py
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check the full function signature and see what happens with file after the read
cat -n src/app/endpoints/vector_stores.py | sed -n '417,490p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 3117
🏁 Script executed:
# Look for the error handling to see what status code is returned for large files
cat -n src/app/endpoints/vector_stores.py | sed -n '490,510p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 982
🏁 Script executed:
# Search for any MAX_FILE_SIZE or similar constants
rg -n "MAX\|LIMIT\|SIZE" src/ --type py | grep -i "file\|upload\|size" | head -15
Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🌐 Web query:
llama-stack-client files.create() file parameter accept bytes BytesIO SpooledTemporaryFile
💡 Result:
client.files.create(file=...) accepts FileTypes, where FileContent = IO[bytes] | bytes | PathLike[...]. So yes: bytes, io.BytesIO, and tempfile.SpooledTemporaryFile (opened in binary mode) are valid inputs. [1] [2]
What you typically want in practice is to also provide a filename (and optionally a MIME type), because that’s how the client/server can infer file type (especially for BytesIO, which has no .name). The supported “rich” forms are tuples like: (filename, contents) or (filename, contents, content_type) (and optionally headers). [1]
Examples:
from io import BytesIO
from llama_stack_client import LlamaStackClient

client = LlamaStackClient()
bio = BytesIO(b"%PDF-...")  # or any bytes
client.files.create(
    file=("doc.pdf", bio, "application/pdf"),
    purpose="assistants",
)

import tempfile
from llama_stack_client import LlamaStackClient

client = LlamaStackClient()
with tempfile.SpooledTemporaryFile(mode="w+b") as f:
    f.write(b"hello")
    f.seek(0)
    client.files.create(
        file=("hello.txt", f, "text/plain"),
        purpose="assistants",
    )

If you already have raw bytes and don't care about name/type, you can pass them directly:
client.files.create(file=b"hello", purpose="assistants")
But supplying a filename is safer for type detection. [1]
Sources:
[1] FileTypes / FileContent definitions in _types.py (union includes IO[bytes], bytes, tuple variants) (github.com)
[2] FileCreateParams.file: Required[FileTypes] (github.com)
[3] README “File uploads” (bytes, PathLike, or (filename, contents, media type) tuple) (github.com)
Citations:
- 1: https://github.com/llamastack/llama-stack-client-python/blob/main/src/llama_stack_client/_types.py
- 2: https://github.com/llamastack/llama-stack-client-python/blob/main/src/llama_stack_client/types/file_create_params.py
- 3: https://github.com/meta-llama/llama-stack-client-python
Avoid double-buffering the file into memory.
await file.read() loads the full payload into bytes, and BytesIO(content) copies it again. For large files this doubles memory consumption. Instead, pass the bytes directly using a tuple with the filename: await client.files.create(file=(filename, content), purpose="assistants"). Alternatively, pass the underlying spooled file object directly: await client.files.create(file=(filename, file.file), purpose="assistants").
Additionally, the docstring claims to raise HTTPException for large files, but there is no size validation before reading. Add a check using the Content-Length header before materializing the payload, or configure a file size limit upstream.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/vector_stores.py` around lines 450 - 468, The code
double-buffers uploaded files by calling await file.read() then wrapping into
BytesIO(content); change to pass the payload directly to the Llama client as a
tuple (filename, content) or (filename, file.file) to avoid copying (use
client.files.create(file=(filename, content), purpose="assistants") or
client.files.create(file=(filename, file.file), purpose="assistants")), remove
the BytesIO usage (file_bytes and its .name assignment), and update logging to
use the filename and size without forcing a second copy; also add a pre-read
size check using the request's Content-Length header (and raise HTTPException
when over the allowed limit) so you validate large uploads before materializing
them into memory.
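One way to apply this suggestion can be sketched without the FastAPI dependency. The helper name `prepare_upload`, the `MAX_UPLOAD_BYTES` limit, and the `ValueError` standing in for `HTTPException(status_code=413)` are all illustrative assumptions, not the reviewed code:

```python
from io import BytesIO

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # illustrative limit, not from the PR

def prepare_upload(filename, fileobj, headers):
    """Check Content-Length before reading anything, then hand the file
    object through as a (filename, fileobj) tuple -- no BytesIO copy."""
    length = headers.get("content-length")
    if length is not None and int(length) > MAX_UPLOAD_BYTES:
        # In the endpoint this would raise HTTPException(status_code=413, ...)
        raise ValueError("payload too large")
    return (filename, fileobj)

payload = prepare_upload("hello.txt", BytesIO(b"hello"), {"content-length": "5"})
```

In the actual endpoint the tuple would be passed straight to `client.files.create(file=payload, purpose="assistants")`, so the payload is streamed from the spooled file rather than materialized twice.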
```python
    Raises:
        HTTPException:
            - 401: Authentication failed
            - 403: Authorization failed
            - 404: Vector store or file not found
            - 500: Lightspeed Stack configuration not loaded
            - 503: Unable to connect to Llama Stack
```
Don’t hard-code the 404 to body.file_id.
This path is documented as “vector store or file not found”, but the BadRequestError branch always returns a not-found payload for body.file_id. If vector_store_id is the missing resource, callers get a misleading error body. Inspect the backend error before choosing the resource, or return a neutral 404 payload.
Also applies to: 600-605
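One way to pick the right resource can be sketched as a small pure function; the helper name and the substring heuristic are assumptions for illustration, not the reviewed code:

```python
def classify_not_found(error_message, vector_store_id, file_id):
    """Choose which resource the 404 body should name, based on the
    backend error text; fall back to a neutral payload when unclear."""
    msg = error_message.lower()
    if vector_store_id.lower() in msg:
        return {"resource": "vector_store", "id": vector_store_id}
    if file_id.lower() in msg:
        return {"resource": "file", "id": file_id}
    return {"resource": "unknown", "id": None}

detail = classify_not_found("Vector store vs_123 not found", "vs_123", "file_9")
```

The handler could then build its 404 payload from `detail` instead of always blaming `body.file_id`.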
```python
config_dict = get_test_config()
cfg = AppConfig()
cfg.init_from_dict(config_dict)

mock_client = mocker.AsyncMock()
mock_client.vector_stores.create.return_value = VectorStore("vs_123", "test_store")
mock_lsc = mocker.patch(
    "app.endpoints.vector_stores.AsyncLlamaStackClientHolder.get_client"
)
mock_lsc.return_value = mock_client
mocker.patch("app.endpoints.vector_stores.configuration", cfg)

request = get_test_request()
auth = get_test_auth()
body = VectorStoreCreateRequest(name="test_store")

response = await create_vector_store(request=request, auth=auth, body=body)
assert response is not None
assert response.id == "vs_123"
assert response.name == "test_store"
assert response.status == "active"
```
🧹 Nitpick | 🔵 Trivial
Add a test for the extra_body translation.
This success case only sends name, so it never verifies the new adapter logic that strips provider_id, embedding_model, and embedding_dimension out of the request body and forwards them via extra_body. That is the core wrapper behavior this PR is adding.
Suggested test shape

```diff
-    body = VectorStoreCreateRequest(name="test_store")
+    body = VectorStoreCreateRequest(
+        name="test_store",
+        provider_id="rhdh-docs",
+        embedding_model="text-embedding-ada-002",
+        embedding_dimension=1536,
+    )
     response = await create_vector_store(request=request, auth=auth, body=body)
+    mock_client.vector_stores.create.assert_awaited_once_with(
+        name="test_store",
+        extra_body={
+            "provider_id": "rhdh-docs",
+            "embedding_model": "text-embedding-ada-002",
+            "embedding_dimension": 1536,
+        },
+    )
     assert response is not None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 config_dict = get_test_config()
 cfg = AppConfig()
 cfg.init_from_dict(config_dict)
 mock_client = mocker.AsyncMock()
 mock_client.vector_stores.create.return_value = VectorStore("vs_123", "test_store")
 mock_lsc = mocker.patch(
     "app.endpoints.vector_stores.AsyncLlamaStackClientHolder.get_client"
 )
 mock_lsc.return_value = mock_client
 mocker.patch("app.endpoints.vector_stores.configuration", cfg)
 request = get_test_request()
 auth = get_test_auth()
-body = VectorStoreCreateRequest(name="test_store")
+body = VectorStoreCreateRequest(
+    name="test_store",
+    provider_id="rhdh-docs",
+    embedding_model="text-embedding-ada-002",
+    embedding_dimension=1536,
+)
 response = await create_vector_store(request=request, auth=auth, body=body)
+mock_client.vector_stores.create.assert_awaited_once_with(
+    name="test_store",
+    extra_body={
+        "provider_id": "rhdh-docs",
+        "embedding_model": "text-embedding-ada-002",
+        "embedding_dimension": 1536,
+    },
+)
 assert response is not None
 assert response.id == "vs_123"
 assert response.name == "test_store"
 assert response.status == "active"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/app/endpoints/test_vector_stores.py` around lines 184 - 205, Add a
test case that verifies the adapter moves provider_id, embedding_model, and
embedding_dimension from the main request into extra_body when calling the
client: use the existing test setup (get_test_request, get_test_auth,
VectorStoreCreateRequest) but include provider_id, embedding_model, and
embedding_dimension in the body; patch AsyncLlamaStackClientHolder.get_client to
return mock_client and assert mock_client.vector_stores.create was called with
extra_body containing those three keys (and that the top-level body forwarded to
create does not contain them) while still returning a
VectorStore("vs_123","test_store") and asserting the response fields as in the
current test.
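The adapter behavior this test would pin down can be sketched as a small pure function; the function name and key list are illustrative, not necessarily how the endpoint implements it:

```python
# Keys that Llama Stack's vector_stores.create() accepts only via extra_body.
EXTRA_BODY_KEYS = ("provider_id", "embedding_model", "embedding_dimension")

def split_extra_body(body):
    """Separate the extra_body keys from the top-level create() kwargs;
    unset (None) values are dropped rather than forwarded."""
    extra = {k: body[k] for k in EXTRA_BODY_KEYS if body.get(k) is not None}
    top = {k: v for k, v in body.items() if k not in EXTRA_BODY_KEYS}
    return top, extra

top, extra = split_extra_body({
    "name": "test_store",
    "provider_id": "rhdh-docs",
    "embedding_model": "text-embedding-ada-002",
    "embedding_dimension": 1536,
})
```

A test asserting on `mock_client.vector_stores.create` call arguments would then verify that `top` reaches the client as keyword arguments and `extra` arrives as `extra_body`.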
Description
The proposed architectural update aims to decouple the Red Hat Developer Hub (RHDH) plugin from its current tight dependency on the Llama Stack by centralizing all vector store operations within lightspeed-core. Currently, the RHDH plugin directly manages sessions and documents through multiple Llama Stack API calls, making it difficult to switch to alternative vector stores like ChromaDB or Pinecone without significant refactoring. By implementing a proxy/wrapper pattern, lightspeed-core will host domain-focused REST endpoints and handle all business logic, while the RHDH plugin is reduced to a thin HTTP client. This transition ensures a single source of truth for notebook logic, improves maintainability, and allows for backend infrastructure changes without impacting the frontend plugin.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
API / Schemas
Tests