feat(inference): multi-route proxy with alias-based model routing #618
cosmicnet wants to merge 3 commits into NVIDIA:main
Conversation
Add pattern detection, provider profile, and validation probe for Ollama's native /api/chat, /api/tags, and /api/show endpoints.

Proxy changes (l7/inference.rs):
- POST /api/chat -> ollama_chat protocol
- GET /api/tags -> ollama_model_discovery protocol
- POST /api/show -> ollama_model_discovery protocol

Provider profile (openshell-core/inference.rs):
- New 'ollama' provider type with default endpoint http://host.openshell.internal:11434
- Supports ollama_chat, ollama_model_discovery, and OpenAI-compatible protocols (openai_chat_completions, openai_completions, model_discovery)
- Credential lookup via OLLAMA_API_KEY, base URL via OLLAMA_BASE_URL

Validation (backend.rs):
- Ollama validation probe sends minimal /api/chat request with stream:false

Tests: 4 new tests for pattern detection (ollama chat, tags, show, and GET /api/chat rejection).

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
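The "minimal /api/chat request with stream:false" used by the validation probe would look roughly like the following JSON body (the model name here is illustrative; the probe presumably uses whatever model is configured for the route):

```json
{
  "model": "llama3",
  "messages": [{ "role": "user", "content": "ping" }],
  "stream": false
}
```

Setting `stream` to `false` makes Ollama return a single JSON response instead of a stream of chunks, which keeps the probe simple to validate.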
- Proto: add InferenceModelEntry message with alias/provider/model fields; add repeated models field to ClusterInferenceConfig, Set/Get request/response
- Server: add upsert_multi_model_route() for storing multiple model entries under a single route slot; update resolve_route_by_name() to expand multi-model configs into per-alias ResolvedRoute entries
- Router: add select_route() with alias-first, protocol-fallback strategy; add model_hint parameter to proxy_with_candidates() variants
- Sandbox proxy: extract model field from JSON body as routing hint
- Tests: 7 new tests covering select_route, multi-model resolution, and bundle expansion; all 291 existing tests continue to pass

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
All contributors have signed the DCO ✍️ ✅
I have read the DCO document and I hereby sign the DCO. |
Pull request overview
Adds multi-route inference proxying so sandboxes can route inference.local requests to multiple LLM backends by using a model alias in the request body.
Changes:
- Extends the inference proto + gateway storage to support multiple (alias, provider_name, model_id) entries per route.
- Adds alias-first route selection in the router and passes a model_hint extracted from sandbox request bodies.
- Expands sandbox L7 inference patterns and adds an Ollama provider profile + endpoint validation probe.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| proto/inference.proto | Adds InferenceModelEntry and models fields for multi-model inference config. |
| crates/openshell-server/src/inference.rs | Implements multi-model upsert + resolves each alias into separate ResolvedRoute entries. |
| crates/openshell-sandbox/src/proxy.rs | Extracts model from JSON body and forwards it as model_hint to the router. |
| crates/openshell-sandbox/src/l7/inference.rs | Adds Codex + Ollama native API patterns and tests. |
| crates/openshell-router/src/lib.rs | Adds select_route() and extends proxy APIs to accept model_hint. |
| crates/openshell-router/src/backend.rs | Adds Ollama validation probe and changes backend URL construction behavior. |
| crates/openshell-router/tests/backend_integration.rs | Updates tests for new proxy function signatures and /v1 endpoint expectations. |
| crates/openshell-core/src/inference.rs | Adds OLLAMA_PROFILE (protocols/base URL/config keys). |
| crates/openshell-cli/src/run.rs | Adds gateway_inference_set_multi() to send multi-model configs. |
| crates/openshell-cli/src/main.rs | Adds --model-alias ALIAS=PROVIDER/MODEL CLI flag and dispatch. |
| architecture/inference-routing.md | Documents alias-based route selection, new patterns, and multi-model route behavior. |
```rust
// Validate aliases are unique.
let mut seen_aliases = std::collections::HashSet::new();
for entry in models {
    if entry.alias.trim().is_empty() {
        return Err(Status::invalid_argument(
            "each model entry must have a non-empty alias",
        ));
```
upsert_multi_model_route allows aliases that can collide with reserved route identifiers (e.g. "sandbox-system"), but downstream the sandbox partitions routes into user vs system caches by comparing ResolvedRoute.name == "sandbox-system". A user-configured alias collision would misclassify routes (and potentially let user config influence system inference). Reject reserved names (case-insensitive, trimmed) such as "sandbox-system" (and likely "inference.local"/empty) during validation, or add a separate alias field instead of reusing name.
Fixed: Added RESERVED_ALIASES validation rejecting "sandbox-system" and "inference.local" (case-insensitive, trimmed) in upsert_multi_model_route(). Also blocked --model-alias + --system at the CLI level. Test added.
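The reserved-alias check described in the reply above could look like this — a hypothetical sketch of the validation added to upsert_multi_model_route(), not the PR's exact code:

```rust
/// Route names reserved for system use; user-configured aliases must not
/// collide with them, or the sandbox would misclassify routes when it
/// partitions by ResolvedRoute.name.
const RESERVED_ALIASES: &[&str] = &["sandbox-system", "inference.local"];

/// Reject an alias that is empty or matches a reserved name
/// (trimmed, ASCII-case-insensitive).
fn validate_alias(alias: &str) -> Result<(), String> {
    let normalized = alias.trim().to_ascii_lowercase();
    if normalized.is_empty() {
        return Err("each model entry must have a non-empty alias".into());
    }
    if RESERVED_ALIASES.contains(&normalized.as_str()) {
        return Err(format!("alias '{}' is reserved", alias.trim()));
    }
    Ok(())
}
```

Normalizing once and comparing against the lowercase constant list means "Sandbox-System" and "  INFERENCE.LOCAL " are both rejected.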
```rust
// Multi-model path: each model entry becomes a separate ResolvedRoute with name = alias.
if !config.models.is_empty() {
    let mut results = Vec::with_capacity(config.models.len());
    for entry in &config.models {
        let provider = store
            .get_message_by_name::<Provider>(&entry.provider_name)
            .await
            .map_err(|e| Status::internal(format!("fetch provider failed: {e}")))?
            .ok_or_else(|| {
                Status::failed_precondition(format!(
                    "configured provider '{}' was not found",
                    entry.provider_name
                ))
            })?;
        let resolved = resolve_provider_route(&provider)?;
        results.push(ResolvedRoute {
            name: entry.alias.clone(),
            base_url: resolved.route.endpoint,
            model_id: entry.model_id.clone(),
```
In the multi-model bundle resolution, ResolvedRoute.name is set to the alias (entry.alias). The sandbox currently uses ResolvedRoute.name to partition routes into user vs system caches (routes named exactly "sandbox-system" go to the system cache). With this change, a multi-model system route will be mis-partitioned into the user cache (and system cache will be empty). Consider adding an explicit route_group/route_name field to ResolvedRoute, or encoding both route name and alias in a stable way and updating the sandbox partition logic accordingly.
Addressed by blocking --model-alias combined with --system at CLI level (conflicts_with_all), so multi-model bundles can never target the system route. Combined with the reserved alias rejection above, this prevents both paths to the issue.
crates/openshell-router/src/lib.rs
Outdated
```rust
if let Some(r) = candidates
    .iter()
    .find(|r| r.name == hint && r.protocols.iter().any(|p| p == protocol))
```
Alias routing currently does an exact, case-sensitive match (r.name == hint), but the gateway validates alias uniqueness case-insensitively (to_ascii_lowercase()) and the CLI does not normalize. This inconsistency makes aliases like "My-GPT" behave unpredictably for agents. Normalize both the stored route name and the model_hint (trim + ASCII lowercase) before comparison, or document/enforce a single canonical form at config time.
Suggested change:

```rust
let normalized_hint = hint.trim().to_ascii_lowercase();
if let Some(r) = candidates
    .iter()
    .find(|r| {
        r.name.trim().to_ascii_lowercase() == normalized_hint
            && r.protocols.iter().any(|p| p == protocol)
    })
```
Fixed. select_route() now normalises both sides with trim().to_ascii_lowercase(). Test added.
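The full alias-first, protocol-fallback strategy can be sketched as follows; the simplified `Route` struct stands in for the PR's ResolvedRoute, so this is illustrative rather than the actual implementation:

```rust
struct Route {
    name: String,
    protocols: Vec<String>,
}

/// Pick a route for a request: prefer an exact (normalized) alias match
/// that also speaks the request's protocol, then fall back to the first
/// candidate that speaks the protocol at all.
fn select_route<'a>(
    candidates: &'a [Route],
    model_hint: Option<&str>,
    protocol: &str,
) -> Option<&'a Route> {
    // 1. Alias-first: trimmed, ASCII-lowercased comparison on both sides.
    if let Some(hint) = model_hint {
        let normalized = hint.trim().to_ascii_lowercase();
        if let Some(r) = candidates.iter().find(|r| {
            r.name.trim().to_ascii_lowercase() == normalized
                && r.protocols.iter().any(|p| p == protocol)
        }) {
            return Some(r);
        }
    }
    // 2. Protocol fallback: unknown or missing hints still get routed.
    candidates
        .iter()
        .find(|r| r.protocols.iter().any(|p| p == protocol))
}
```

The fallback step is what keeps existing single-route setups working: a request whose `model` value is a real model ID rather than an alias simply falls through to protocol matching.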
```rust
// The inference proxy always intercepts paths starting with /v1/
// (the inference.local convention). Strip this prefix so the
// upstream URL uses only the remainder appended to the endpoint.
// Endpoints that already contain /v1 (e.g. api.openai.com/v1) keep
// it; endpoints without /v1 (e.g. chatgpt.com/backend-api) get
// just the path suffix (e.g. /codex/responses).
if path == "/v1" || path.starts_with("/v1/") {
```
build_backend_url() now strips the /v1 prefix from all /v1/... request paths regardless of whether the configured endpoint already includes /v1. This is a behavior change that will break providers/endpoints configured without a /v1 suffix but that expect versioned paths (e.g. https://api.anthropic.com + /v1/messages would become /messages). Consider restoring the prior “dedupe only when endpoint already ends with /v1” logic, or add protocol-aware/path-aware handling (e.g. strip only for Codex/non-versioned backends) and/or validate/normalize configured base URLs up front.
Suggested change:

```rust
// If the upstream endpoint already ends with /v1 (e.g. api.openai.com/v1),
// and the incoming path is /v1 or /v1/..., strip the duplicate /v1
// from the path. Otherwise, preserve the path as-is so versioned
// paths (e.g. /v1/messages) still reach unversioned endpoints
// (e.g. https://api.anthropic.com) correctly.
if base.ends_with("/v1") && (path == "/v1" || path.starts_with("/v1/")) {
```
Restored dedup-only logic as suggested. /v1 is stripped from the path only when the endpoint already ends with /v1. All built-in provider profiles include /v1 in default_base_url, so this handles every real case correctly. Tests updated.
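The restored dedupe-only behavior can be sketched as a small URL-joining helper; this is a simplified illustration under assumed string-based URLs, not the PR's actual build_backend_url():

```rust
/// Join a configured base URL and a request path, stripping the duplicate
/// /v1 only when the endpoint already ends with /v1.
fn build_backend_url(base: &str, path: &str) -> String {
    let base = base.trim_end_matches('/');
    if base.ends_with("/v1") && (path == "/v1" || path.starts_with("/v1/")) {
        // api.openai.com/v1 + /v1/chat/completions -> .../v1/chat/completions
        format!("{}{}", base, &path[3..])
    } else {
        // api.anthropic.com + /v1/messages stays versioned as-is.
        format!("{}{}", base, path)
    }
}
```

This keeps both cases from the review correct: endpoints configured with a /v1 suffix are not doubled, and unversioned endpoints still receive versioned paths untouched.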
- Add --model-alias flag to 'inference set' for multi-model config (e.g. --model-alias gpt=openai/gpt-4 --model-alias claude=anthropic/claude-sonnet-4-20250514)
- Add gateway_inference_set_multi() handler in run.rs
- Update inference get/print to display multi-model entries
- Import InferenceModelEntry proto type in CLI
- Fix build_backend_url to always strip /v1 prefix for codex paths
- Add /v1/codex/* inference pattern for openai_responses protocol
- Fix backend tests to use /v1 endpoint suffix

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
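Parsing the repeatable `ALIAS=PROVIDER/MODEL` value splits on the first `=` and then the first `/`, so model IDs containing further slashes survive intact. A hypothetical standalone parser (the function name and error strings are illustrative, not the CLI's actual code):

```rust
/// Parse an `ALIAS=PROVIDER/MODEL` argument into (alias, provider, model).
fn parse_model_alias(arg: &str) -> Result<(String, String, String), String> {
    let (alias, rest) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected ALIAS=PROVIDER/MODEL, got '{arg}'"))?;
    let (provider, model) = rest
        .split_once('/')
        .ok_or_else(|| format!("expected PROVIDER/MODEL after '=', got '{rest}'"))?;
    if alias.trim().is_empty() || provider.is_empty() || model.is_empty() {
        return Err(format!("empty component in '{arg}'"));
    }
    Ok((alias.trim().to_string(), provider.to_string(), model.to_string()))
}
```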
Force-pushed af1748b to ab71175.
Summary
Adds multi-route inference proxy support, allowing sandboxed agents to reach multiple LLM providers (OpenAI, Anthropic, NVIDIA, Ollama) through a single inference.local endpoint. Agents select a backend by setting the model field to an alias name. Also adds Ollama native API support and Codex URL pattern matching.

Related Issue
Closes #203
Changes
- Add InferenceModelEntry message (alias, provider_name, model_id); add models repeated field to set/get request/response messages
- upsert_multi_model_route() validates and stores multiple alias→provider mappings; resolves each entry into a separate ResolvedRoute at bundle time
- select_route() implements alias-first, protocol-fallback selection; proxy_with_candidates/proxy_with_candidates_streaming accept optional model_hint
- Extract model field from request body as model_hint for route selection
- Add /v1/codex/*, /api/chat, /api/tags, /api/show inference patterns
- build_backend_url() always strips /v1 prefix to support both versioned and non-versioned endpoints (e.g. Codex)
- Add OLLAMA_PROFILE provider profile with native + OpenAI-compat protocols
- Add --model-alias ALIAS=PROVIDER/MODEL flag (repeatable, conflicts with --provider/--model)
- Update inference-routing.md with all new sections

Testing
- mise run pre-commit passes

Checklist