feat(inference): multi-route proxy with alias-based model routing#618

Open
cosmicnet wants to merge 3 commits into NVIDIA:main from cosmicnet:203-multi-route-inference/lh

Conversation


@cosmicnet cosmicnet commented Mar 25, 2026

Summary

Adds multi-route inference proxy support, allowing sandboxed agents to reach multiple LLM providers (OpenAI, Anthropic, NVIDIA, Ollama) through a single inference.local endpoint. Agents select a backend by setting the model field to an alias name. Also adds Ollama native API support and Codex URL pattern matching.

Related Issue

Closes #203

Changes

  • Proto: Add InferenceModelEntry message (alias, provider_name, model_id); add models repeated field to set/get request/response messages
  • Server: upsert_multi_model_route() validates and stores multiple alias→provider mappings; resolves each entry into a separate ResolvedRoute at bundle time
  • Router: select_route() implements alias-first, protocol-fallback selection; proxy_with_candidates/proxy_with_candidates_streaming accept optional model_hint
  • Sandbox proxy: Extracts model field from request body as model_hint for route selection
  • Sandbox L7: Add /v1/codex/*, /api/chat, /api/tags, /api/show inference patterns
  • Backend: build_backend_url() always strips /v1 prefix to support both versioned and non-versioned endpoints (e.g. Codex)
  • Core: Add OLLAMA_PROFILE provider profile with native + OpenAI-compat protocols
  • CLI: --model-alias ALIAS=PROVIDER/MODEL flag (repeatable, conflicts with --provider/--model)
  • Architecture docs: Updated inference-routing.md with all new sections
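
The alias-first, protocol-fallback selection described above can be sketched roughly as follows. This is a simplified illustration with hypothetical types (the real `ResolvedRoute` and `select_route()` live in `crates/openshell-router/src/lib.rs` and carry more fields); it assumes aliases are compared after trimming and ASCII-lowercasing, per the review discussion below.

```rust
// Hypothetical, simplified stand-in for ResolvedRoute.
struct Route {
    name: String,           // route name or model alias
    protocols: Vec<String>, // protocols this backend speaks
}

/// Alias-first selection: if the request body carried a model hint that
/// matches a candidate's name (case-insensitively, trimmed) and that
/// candidate speaks the requested protocol, use it. Otherwise fall back
/// to the first candidate that speaks the protocol.
fn select_route<'a>(
    candidates: &'a [Route],
    protocol: &str,
    model_hint: Option<&str>,
) -> Option<&'a Route> {
    if let Some(hint) = model_hint {
        let hint = hint.trim().to_ascii_lowercase();
        if let Some(r) = candidates.iter().find(|r| {
            r.name.trim().to_ascii_lowercase() == hint
                && r.protocols.iter().any(|p| p == protocol)
        }) {
            return Some(r);
        }
    }
    // Protocol fallback: first route supporting the requested protocol.
    candidates
        .iter()
        .find(|r| r.protocols.iter().any(|p| p == protocol))
}
```

An unknown alias thus degrades gracefully to protocol-based selection rather than failing the request outright.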

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Add pattern detection, provider profile, and validation probe for
Ollama's native /api/chat, /api/tags, and /api/show endpoints.

Proxy changes (l7/inference.rs):
- POST /api/chat -> ollama_chat protocol
- GET /api/tags -> ollama_model_discovery protocol
- POST /api/show -> ollama_model_discovery protocol

Provider profile (openshell-core/inference.rs):
- New 'ollama' provider type with default endpoint
  http://host.openshell.internal:11434
- Supports ollama_chat, ollama_model_discovery, and OpenAI-compatible
  protocols (openai_chat_completions, openai_completions, model_discovery)
- Credential lookup via OLLAMA_API_KEY, base URL via OLLAMA_BASE_URL

Validation (backend.rs):
- Ollama validation probe sends minimal /api/chat request with stream:false

Tests: 4 new tests for pattern detection (ollama chat, tags, show,
and GET /api/chat rejection).

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
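
The method-and-path pattern table this commit describes can be sketched as a small classifier. `classify_ollama` is a hypothetical helper name for illustration; the real matcher lives in crates/openshell-sandbox/src/l7/inference.rs.

```rust
/// Map an incoming request's method and path to an Ollama-native protocol,
/// per the pattern table in the commit message above. Unlisted combinations
/// (e.g. GET /api/chat) are rejected by returning None.
fn classify_ollama(method: &str, path: &str) -> Option<&'static str> {
    match (method, path) {
        ("POST", "/api/chat") => Some("ollama_chat"),
        ("GET", "/api/tags") => Some("ollama_model_discovery"),
        ("POST", "/api/show") => Some("ollama_model_discovery"),
        _ => None,
    }
}
```
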
- Proto: add InferenceModelEntry message with alias/provider/model fields;
  add repeated models field to ClusterInferenceConfig, Set/Get request/response
- Server: add upsert_multi_model_route() for storing multiple model entries
  under a single route slot; update resolve_route_by_name() to expand
  multi-model configs into per-alias ResolvedRoute entries
- Router: add select_route() with alias-first, protocol-fallback strategy;
  add model_hint parameter to proxy_with_candidates() variants
- Sandbox proxy: extract model field from JSON body as routing hint
- Tests: 7 new tests covering select_route, multi-model resolution, and
  bundle expansion; all 291 existing tests continue to pass

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
@cosmicnet cosmicnet requested a review from a team as a code owner March 25, 2026 23:52
Copilot AI review requested due to automatic review settings March 25, 2026 23:52

github-actions bot commented Mar 25, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@cosmicnet (Author)

I have read the DCO document and I hereby sign the DCO.

Copilot AI left a comment

Pull request overview

Adds multi-route inference proxying so sandboxes can route inference.local requests to multiple LLM backends by using a model alias in the request body.

Changes:

  • Extends the inference proto + gateway storage to support multiple (alias, provider_name, model_id) entries per route.
  • Adds alias-first route selection in the router and passes a model_hint extracted from sandbox request bodies.
  • Expands sandbox L7 inference patterns and adds an Ollama provider profile + endpoint validation probe.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Summary per file:

  • proto/inference.proto: Adds InferenceModelEntry and models fields for multi-model inference config.
  • crates/openshell-server/src/inference.rs: Implements multi-model upsert and resolves each alias into a separate ResolvedRoute entry.
  • crates/openshell-sandbox/src/proxy.rs: Extracts model from the JSON body and forwards it as model_hint to the router.
  • crates/openshell-sandbox/src/l7/inference.rs: Adds Codex and Ollama native API patterns and tests.
  • crates/openshell-router/src/lib.rs: Adds select_route() and extends the proxy APIs to accept model_hint.
  • crates/openshell-router/src/backend.rs: Adds an Ollama validation probe and changes backend URL construction behavior.
  • crates/openshell-router/tests/backend_integration.rs: Updates tests for the new proxy function signatures and /v1 endpoint expectations.
  • crates/openshell-core/src/inference.rs: Adds OLLAMA_PROFILE (protocols, base URL, config keys).
  • crates/openshell-cli/src/run.rs: Adds gateway_inference_set_multi() to send multi-model configs.
  • crates/openshell-cli/src/main.rs: Adds the --model-alias ALIAS=PROVIDER/MODEL CLI flag and dispatch.
  • architecture/inference-routing.md: Documents alias-based route selection, new patterns, and multi-model route behavior.


Comment on lines +252 to +258
// Validate aliases are unique.
let mut seen_aliases = std::collections::HashSet::new();
for entry in models {
if entry.alias.trim().is_empty() {
return Err(Status::invalid_argument(
"each model entry must have a non-empty alias",
));

Copilot AI Mar 25, 2026

upsert_multi_model_route allows aliases that can collide with reserved route identifiers (e.g. "sandbox-system"), but downstream the sandbox partitions routes into user vs system caches by comparing ResolvedRoute.name == "sandbox-system". A user-configured alias collision would misclassify routes (and potentially let user config influence system inference). Reject reserved names (case-insensitive, trimmed) such as "sandbox-system" (and likely "inference.local"/empty) during validation, or add a separate alias field instead of reusing name.

@cosmicnet (Author)

Fixed: Added RESERVED_ALIASES validation rejecting "sandbox-system" and "inference.local" (case-insensitive, trimmed) in upsert_multi_model_route(). Also blocked --model-alias + --system at the CLI level. Test added.
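
A sketch of the validation described in this fix, under the assumption that the check is trim-then-lowercase as stated. `validate_alias` is a hypothetical standalone helper for illustration; in the PR the check sits inside upsert_multi_model_route().

```rust
/// Route identifiers that user-configured aliases must not shadow.
const RESERVED_ALIASES: &[&str] = &["sandbox-system", "inference.local"];

/// Reject empty aliases and aliases that collide (case-insensitively,
/// after trimming) with reserved route identifiers.
fn validate_alias(alias: &str) -> Result<(), String> {
    let normalized = alias.trim().to_ascii_lowercase();
    if normalized.is_empty() {
        return Err("each model entry must have a non-empty alias".to_string());
    }
    if RESERVED_ALIASES.contains(&normalized.as_str()) {
        return Err(format!("alias '{alias}' is reserved"));
    }
    Ok(())
}
```
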

Comment on lines +555 to +573
// Multi-model path: each model entry becomes a separate ResolvedRoute with name = alias.
if !config.models.is_empty() {
let mut results = Vec::with_capacity(config.models.len());
for entry in &config.models {
let provider = store
.get_message_by_name::<Provider>(&entry.provider_name)
.await
.map_err(|e| Status::internal(format!("fetch provider failed: {e}")))?
.ok_or_else(|| {
Status::failed_precondition(format!(
"configured provider '{}' was not found",
entry.provider_name
))
})?;
let resolved = resolve_provider_route(&provider)?;
results.push(ResolvedRoute {
name: entry.alias.clone(),
base_url: resolved.route.endpoint,
model_id: entry.model_id.clone(),

Copilot AI Mar 25, 2026

In the multi-model bundle resolution, ResolvedRoute.name is set to the alias (entry.alias). The sandbox currently uses ResolvedRoute.name to partition routes into user vs system caches (routes named exactly "sandbox-system" go to the system cache). With this change, a multi-model system route will be mis-partitioned into the user cache (and system cache will be empty). Consider adding an explicit route_group/route_name field to ResolvedRoute, or encoding both route name and alias in a stable way and updating the sandbox partition logic accordingly.

@cosmicnet (Author)

Addressed by blocking --model-alias combined with --system at CLI level (conflicts_with_all), so multi-model bundles can never target the system route. Combined with the reserved alias rejection above, this prevents both paths to the issue.

Comment on lines +50 to +52
if let Some(r) = candidates
.iter()
.find(|r| r.name == hint && r.protocols.iter().any(|p| p == protocol))

Copilot AI Mar 25, 2026

Alias routing currently does an exact, case-sensitive match (r.name == hint), but the gateway validates alias uniqueness case-insensitively (to_ascii_lowercase()) and the CLI does not normalize. This inconsistency makes aliases like "My-GPT" behave unpredictably for agents. Normalize both the stored route name and the model_hint (trim + ASCII lowercase) before comparison, or document/enforce a single canonical form at config time.

Suggested change
if let Some(r) = candidates
.iter()
.find(|r| r.name == hint && r.protocols.iter().any(|p| p == protocol))
let normalized_hint = hint.trim().to_ascii_lowercase();
if let Some(r) = candidates
.iter()
.find(|r| {
r.name.trim().to_ascii_lowercase() == normalized_hint
&& r.protocols.iter().any(|p| p == protocol)
})

@cosmicnet (Author)

Fixed. select_route() now normalises both sides with trim().to_ascii_lowercase(). Test added.

Comment on lines +438 to +444
// The inference proxy always intercepts paths starting with /v1/
// (the inference.local convention). Strip this prefix so the
// upstream URL uses only the remainder appended to the endpoint.
// Endpoints that already contain /v1 (e.g. api.openai.com/v1) keep
// it; endpoints without /v1 (e.g. chatgpt.com/backend-api) get
// just the path suffix (e.g. /codex/responses).
if path == "/v1" || path.starts_with("/v1/") {

Copilot AI Mar 25, 2026

build_backend_url() now strips the /v1 prefix from all /v1/... request paths regardless of whether the configured endpoint already includes /v1. This is a behavior change that will break providers/endpoints configured without a /v1 suffix but that expect versioned paths (e.g. https://api.anthropic.com + /v1/messages would become /messages). Consider restoring the prior “dedupe only when endpoint already ends with /v1” logic, or add protocol-aware/path-aware handling (e.g. strip only for Codex/non-versioned backends) and/or validate/normalize configured base URLs up front.

Suggested change
// The inference proxy always intercepts paths starting with /v1/
// (the inference.local convention). Strip this prefix so the
// upstream URL uses only the remainder appended to the endpoint.
// Endpoints that already contain /v1 (e.g. api.openai.com/v1) keep
// it; endpoints without /v1 (e.g. chatgpt.com/backend-api) get
// just the path suffix (e.g. /codex/responses).
if path == "/v1" || path.starts_with("/v1/") {
// If the upstream endpoint already ends with /v1 (e.g. api.openai.com/v1),
// and the incoming path is /v1 or /v1/..., strip the duplicate /v1
// from the path. Otherwise, preserve the path as-is so versioned
// paths (e.g. /v1/messages) still reach unversioned endpoints
// (e.g. https://api.anthropic.com) correctly.
if base.ends_with("/v1") && (path == "/v1" || path.starts_with("/v1/")) {

@cosmicnet (Author)

Restored dedup-only logic as suggested. /v1 is stripped from the path only when the endpoint already ends with /v1. All built-in provider profiles include /v1 in default_base_url, so this handles every real case correctly. Tests updated.
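
The restored dedupe-only behavior might look roughly like this. It is a sketch inferred from the reply above, not the PR's actual code (the real logic lives in crates/openshell-router/src/backend.rs and handles more cases); it strips /v1 from the request path only when the configured endpoint already ends with /v1.

```rust
/// Join a configured base endpoint and an intercepted request path,
/// deduplicating /v1 only when the endpoint already ends with it.
fn build_backend_url(base: &str, path: &str) -> String {
    let base = base.trim_end_matches('/');
    let suffix = if base.ends_with("/v1") {
        // Strip a duplicate /v1 prefix (exact segment match only).
        path.strip_prefix("/v1")
            .filter(|s| s.is_empty() || s.starts_with('/'))
            .unwrap_or(path)
    } else {
        // Unversioned endpoint: keep the path as-is so versioned
        // paths like /v1/messages still reach it intact.
        path
    };
    format!("{base}{suffix}")
}
```
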

- Add --model-alias flag to 'inference set' for multi-model config
  (e.g. --model-alias gpt=openai/gpt-4 --model-alias claude=anthropic/claude-sonnet-4-20250514)
- Add gateway_inference_set_multi() handler in run.rs
- Update inference get/print to display multi-model entries
- Import InferenceModelEntry proto type in CLI
- Fix build_backend_url to always strip /v1 prefix for codex paths
- Add /v1/codex/* inference pattern for openai_responses protocol
- Fix backend tests to use /v1 endpoint suffix

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
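
Parsing the repeatable ALIAS=PROVIDER/MODEL flag value could be sketched as below. `parse_model_alias` is a hypothetical helper name; the actual flag handling lives in crates/openshell-cli/src/main.rs.

```rust
/// Split an ALIAS=PROVIDER/MODEL argument (e.g. "gpt=openai/gpt-4")
/// into its three components, rejecting malformed or empty parts.
/// Only the first '/' separates provider from model, so model IDs
/// containing '/' would need escaping (not handled in this sketch).
fn parse_model_alias(arg: &str) -> Result<(String, String, String), String> {
    let (alias, rest) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected ALIAS=PROVIDER/MODEL, got '{arg}'"))?;
    let (provider, model) = rest
        .split_once('/')
        .ok_or_else(|| format!("expected PROVIDER/MODEL after '=', got '{rest}'"))?;
    if alias.is_empty() || provider.is_empty() || model.is_empty() {
        return Err("alias, provider, and model must all be non-empty".to_string());
    }
    Ok((alias.to_string(), provider.to_string(), model.to_string()))
}
```
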
@cosmicnet cosmicnet force-pushed the 203-multi-route-inference/lh branch from af1748b to ab71175 Compare March 26, 2026 00:36

Development

Successfully merging this pull request may close these issues.

feat: support multiple models with alias-based routing in inference

2 participants