feat(inference): multi-route proxy with alias-based model routing#618

Open
cosmicnet wants to merge 3 commits into NVIDIA:main from cosmicnet:203-multi-route-inference/lh

Conversation


@cosmicnet cosmicnet commented Mar 25, 2026

Summary

Adds multi-route inference proxy support, allowing sandboxed agents to reach multiple LLM providers (OpenAI, Anthropic, NVIDIA, Ollama) through a single inference.local endpoint. Agents select a backend by setting the model field to an alias name. Also adds Ollama native API support and Codex URL pattern matching.

Related Issue

Closes #203

Changes

  • Proto: Add InferenceModelEntry message (alias, provider_name, model_id); add models repeated field to set/get request/response messages
  • Server: upsert_multi_model_route() validates and stores multiple alias→provider mappings; resolves each entry into a separate ResolvedRoute at bundle time
  • Router: select_route() implements alias-first, protocol-fallback selection; proxy_with_candidates/proxy_with_candidates_streaming accept optional model_hint
  • Sandbox proxy: Extracts model field from request body as model_hint for route selection
  • Sandbox L7: Add /v1/codex/*, /api/chat, /api/tags, /api/show inference patterns
  • Backend: build_backend_url() always strips /v1 prefix to support both versioned and non-versioned endpoints (e.g. Codex)
  • Core: Add OLLAMA_PROFILE provider profile with native + OpenAI-compat protocols
  • CLI: --model-alias ALIAS=PROVIDER/MODEL flag (repeatable, conflicts with --provider/--model)
  • Architecture docs: Updated inference-routing.md with all new sections
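
The alias-first, protocol-fallback selection described above can be sketched roughly as follows. This is a simplified illustration with hypothetical types (the real `ResolvedRoute` and `select_route()` live in `crates/openshell-router/src/lib.rs` and carry more fields); it assumes aliases are compared after trimming and ASCII-lowercasing, per the review discussion below.

```rust
// Hypothetical, simplified stand-in for ResolvedRoute.
struct Route {
    name: String,           // route name or model alias
    protocols: Vec<String>, // protocols this backend speaks
}

/// Alias-first selection: if the request body carried a model hint that
/// matches a candidate's name (case-insensitively, trimmed) and that
/// candidate speaks the requested protocol, use it. Otherwise fall back
/// to the first candidate that speaks the protocol.
fn select_route<'a>(
    candidates: &'a [Route],
    protocol: &str,
    model_hint: Option<&str>,
) -> Option<&'a Route> {
    if let Some(hint) = model_hint {
        let hint = hint.trim().to_ascii_lowercase();
        if let Some(r) = candidates.iter().find(|r| {
            r.name.trim().to_ascii_lowercase() == hint
                && r.protocols.iter().any(|p| p == protocol)
        }) {
            return Some(r);
        }
    }
    // Protocol fallback: first route supporting the requested protocol.
    candidates
        .iter()
        .find(|r| r.protocols.iter().any(|p| p == protocol))
}
```

An unknown alias thus degrades gracefully to protocol-based selection rather than failing the request outright.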

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Add pattern detection, provider profile, and validation probe for
Ollama's native /api/chat, /api/tags, and /api/show endpoints.

Proxy changes (l7/inference.rs):
- POST /api/chat -> ollama_chat protocol
- GET /api/tags -> ollama_model_discovery protocol
- POST /api/show -> ollama_model_discovery protocol

Provider profile (openshell-core/inference.rs):
- New 'ollama' provider type with default endpoint
  http://host.openshell.internal:11434
- Supports ollama_chat, ollama_model_discovery, and OpenAI-compatible
  protocols (openai_chat_completions, openai_completions, model_discovery)
- Credential lookup via OLLAMA_API_KEY, base URL via OLLAMA_BASE_URL

Validation (backend.rs):
- Ollama validation probe sends minimal /api/chat request with stream:false

Tests: 4 new tests for pattern detection (ollama chat, tags, show,
and GET /api/chat rejection).

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
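
The method-and-path pattern table this commit describes can be sketched as a small classifier. `classify_ollama` is a hypothetical helper name for illustration; the real matcher lives in crates/openshell-sandbox/src/l7/inference.rs.

```rust
/// Map an incoming request's method and path to an Ollama-native protocol,
/// per the pattern table in the commit message above. Unlisted combinations
/// (e.g. GET /api/chat) are rejected by returning None.
fn classify_ollama(method: &str, path: &str) -> Option<&'static str> {
    match (method, path) {
        ("POST", "/api/chat") => Some("ollama_chat"),
        ("GET", "/api/tags") => Some("ollama_model_discovery"),
        ("POST", "/api/show") => Some("ollama_model_discovery"),
        _ => None,
    }
}
```
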
- Proto: add InferenceModelEntry message with alias/provider/model fields;
  add repeated models field to ClusterInferenceConfig, Set/Get request/response
- Server: add upsert_multi_model_route() for storing multiple model entries
  under a single route slot; update resolve_route_by_name() to expand
  multi-model configs into per-alias ResolvedRoute entries
- Router: add select_route() with alias-first, protocol-fallback strategy;
  add model_hint parameter to proxy_with_candidates() variants
- Sandbox proxy: extract model field from JSON body as routing hint
- Tests: 7 new tests covering select_route, multi-model resolution, and
  bundle expansion; all 291 existing tests continue to pass

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
@cosmicnet cosmicnet requested a review from a team as a code owner March 25, 2026 23:52
Copilot AI review requested due to automatic review settings March 25, 2026 23:52

github-actions bot commented Mar 25, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@cosmicnet (Author)

I have read the DCO document and I hereby sign the DCO.

Copilot AI left a comment

Pull request overview

Adds multi-route inference proxying so sandboxes can route inference.local requests to multiple LLM backends by using a model alias in the request body.

Changes:

  • Extends the inference proto + gateway storage to support multiple (alias, provider_name, model_id) entries per route.
  • Adds alias-first route selection in the router and passes a model_hint extracted from sandbox request bodies.
  • Expands sandbox L7 inference patterns and adds an Ollama provider profile + endpoint validation probe.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Summary per file:

  • proto/inference.proto: Adds InferenceModelEntry and models fields for multi-model inference config.
  • crates/openshell-server/src/inference.rs: Implements multi-model upsert and resolves each alias into a separate ResolvedRoute entry.
  • crates/openshell-sandbox/src/proxy.rs: Extracts model from the JSON body and forwards it as model_hint to the router.
  • crates/openshell-sandbox/src/l7/inference.rs: Adds Codex and Ollama native API patterns and tests.
  • crates/openshell-router/src/lib.rs: Adds select_route() and extends the proxy APIs to accept model_hint.
  • crates/openshell-router/src/backend.rs: Adds an Ollama validation probe and changes backend URL construction behavior.
  • crates/openshell-router/tests/backend_integration.rs: Updates tests for the new proxy function signatures and /v1 endpoint expectations.
  • crates/openshell-core/src/inference.rs: Adds OLLAMA_PROFILE (protocols, base URL, config keys).
  • crates/openshell-cli/src/run.rs: Adds gateway_inference_set_multi() to send multi-model configs.
  • crates/openshell-cli/src/main.rs: Adds the --model-alias ALIAS=PROVIDER/MODEL CLI flag and dispatch.
  • architecture/inference-routing.md: Documents alias-based route selection, new patterns, and multi-model route behavior.


Comment on lines +252 to +258
// Validate aliases are unique.
let mut seen_aliases = std::collections::HashSet::new();
for entry in models {
if entry.alias.trim().is_empty() {
return Err(Status::invalid_argument(
"each model entry must have a non-empty alias",
));

Copilot AI Mar 25, 2026

upsert_multi_model_route allows aliases that can collide with reserved route identifiers (e.g. "sandbox-system"), but downstream the sandbox partitions routes into user vs system caches by comparing ResolvedRoute.name == "sandbox-system". A user-configured alias collision would misclassify routes (and potentially let user config influence system inference). Reject reserved names (case-insensitive, trimmed) such as "sandbox-system" (and likely "inference.local"/empty) during validation, or add a separate alias field instead of reusing name.

@cosmicnet (Author)

Fixed: Added RESERVED_ALIASES validation rejecting "sandbox-system" and "inference.local" (case-insensitive, trimmed) in upsert_multi_model_route(). Also blocked --model-alias + --system at the CLI level. Test added.
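
A sketch of the validation described in this fix, under the assumption that the check is trim-then-lowercase as stated. `validate_alias` is a hypothetical standalone helper for illustration; in the PR the check sits inside upsert_multi_model_route().

```rust
/// Route identifiers that user-configured aliases must not shadow.
const RESERVED_ALIASES: &[&str] = &["sandbox-system", "inference.local"];

/// Reject empty aliases and aliases that collide (case-insensitively,
/// after trimming) with reserved route identifiers.
fn validate_alias(alias: &str) -> Result<(), String> {
    let normalized = alias.trim().to_ascii_lowercase();
    if normalized.is_empty() {
        return Err("each model entry must have a non-empty alias".to_string());
    }
    if RESERVED_ALIASES.contains(&normalized.as_str()) {
        return Err(format!("alias '{alias}' is reserved"));
    }
    Ok(())
}
```
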

Comment on lines +555 to +573
// Multi-model path: each model entry becomes a separate ResolvedRoute with name = alias.
if !config.models.is_empty() {
let mut results = Vec::with_capacity(config.models.len());
for entry in &config.models {
let provider = store
.get_message_by_name::<Provider>(&entry.provider_name)
.await
.map_err(|e| Status::internal(format!("fetch provider failed: {e}")))?
.ok_or_else(|| {
Status::failed_precondition(format!(
"configured provider '{}' was not found",
entry.provider_name
))
})?;
let resolved = resolve_provider_route(&provider)?;
results.push(ResolvedRoute {
name: entry.alias.clone(),
base_url: resolved.route.endpoint,
model_id: entry.model_id.clone(),

Copilot AI Mar 25, 2026

In the multi-model bundle resolution, ResolvedRoute.name is set to the alias (entry.alias). The sandbox currently uses ResolvedRoute.name to partition routes into user vs system caches (routes named exactly "sandbox-system" go to the system cache). With this change, a multi-model system route will be mis-partitioned into the user cache (and system cache will be empty). Consider adding an explicit route_group/route_name field to ResolvedRoute, or encoding both route name and alias in a stable way and updating the sandbox partition logic accordingly.

@cosmicnet (Author)

Addressed by blocking --model-alias combined with --system at CLI level (conflicts_with_all), so multi-model bundles can never target the system route. Combined with the reserved alias rejection above, this prevents both paths to the issue.

Comment on lines +50 to +52
if let Some(r) = candidates
.iter()
.find(|r| r.name == hint && r.protocols.iter().any(|p| p == protocol))

Copilot AI Mar 25, 2026

Alias routing currently does an exact, case-sensitive match (r.name == hint), but the gateway validates alias uniqueness case-insensitively (to_ascii_lowercase()) and the CLI does not normalize. This inconsistency makes aliases like "My-GPT" behave unpredictably for agents. Normalize both the stored route name and the model_hint (trim + ASCII lowercase) before comparison, or document/enforce a single canonical form at config time.

Suggested change
if let Some(r) = candidates
.iter()
.find(|r| r.name == hint && r.protocols.iter().any(|p| p == protocol))
let normalized_hint = hint.trim().to_ascii_lowercase();
if let Some(r) = candidates
.iter()
.find(|r| {
r.name.trim().to_ascii_lowercase() == normalized_hint
&& r.protocols.iter().any(|p| p == protocol)
})

@cosmicnet (Author)

Fixed. select_route() now normalises both sides with trim().to_ascii_lowercase(). Test added.

Comment on lines +438 to +444
// The inference proxy always intercepts paths starting with /v1/
// (the inference.local convention). Strip this prefix so the
// upstream URL uses only the remainder appended to the endpoint.
// Endpoints that already contain /v1 (e.g. api.openai.com/v1) keep
// it; endpoints without /v1 (e.g. chatgpt.com/backend-api) get
// just the path suffix (e.g. /codex/responses).
if path == "/v1" || path.starts_with("/v1/") {

Copilot AI Mar 25, 2026

build_backend_url() now strips the /v1 prefix from all /v1/... request paths regardless of whether the configured endpoint already includes /v1. This is a behavior change that will break providers/endpoints configured without a /v1 suffix but that expect versioned paths (e.g. https://api.anthropic.com + /v1/messages would become /messages). Consider restoring the prior “dedupe only when endpoint already ends with /v1” logic, or add protocol-aware/path-aware handling (e.g. strip only for Codex/non-versioned backends) and/or validate/normalize configured base URLs up front.

Suggested change
// The inference proxy always intercepts paths starting with /v1/
// (the inference.local convention). Strip this prefix so the
// upstream URL uses only the remainder appended to the endpoint.
// Endpoints that already contain /v1 (e.g. api.openai.com/v1) keep
// it; endpoints without /v1 (e.g. chatgpt.com/backend-api) get
// just the path suffix (e.g. /codex/responses).
if path == "/v1" || path.starts_with("/v1/") {
// If the upstream endpoint already ends with /v1 (e.g. api.openai.com/v1),
// and the incoming path is /v1 or /v1/..., strip the duplicate /v1
// from the path. Otherwise, preserve the path as-is so versioned
// paths (e.g. /v1/messages) still reach unversioned endpoints
// (e.g. https://api.anthropic.com) correctly.
if base.ends_with("/v1") && (path == "/v1" || path.starts_with("/v1/")) {

@cosmicnet (Author)

Restored dedup-only logic as suggested. /v1 is stripped from the path only when the endpoint already ends with /v1. All built-in provider profiles include /v1 in default_base_url, so this handles every real case correctly. Tests updated.
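
The restored dedupe-only behavior might look roughly like this. It is a sketch inferred from the reply above, not the PR's actual code (the real logic lives in crates/openshell-router/src/backend.rs and handles more cases); it strips /v1 from the request path only when the configured endpoint already ends with /v1.

```rust
/// Join a configured base endpoint and an intercepted request path,
/// deduplicating /v1 only when the endpoint already ends with it.
fn build_backend_url(base: &str, path: &str) -> String {
    let base = base.trim_end_matches('/');
    let suffix = if base.ends_with("/v1") {
        // Strip a duplicate /v1 prefix (exact segment match only).
        path.strip_prefix("/v1")
            .filter(|s| s.is_empty() || s.starts_with('/'))
            .unwrap_or(path)
    } else {
        // Unversioned endpoint: keep the path as-is so versioned
        // paths like /v1/messages still reach it intact.
        path
    };
    format!("{base}{suffix}")
}
```
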

- Add --model-alias flag to 'inference set' for multi-model config
  (e.g. --model-alias gpt=openai/gpt-4 --model-alias claude=anthropic/claude-sonnet-4-20250514)
- Add gateway_inference_set_multi() handler in run.rs
- Update inference get/print to display multi-model entries
- Import InferenceModelEntry proto type in CLI
- Fix build_backend_url to always strip /v1 prefix for codex paths
- Add /v1/codex/* inference pattern for openai_responses protocol
- Fix backend tests to use /v1 endpoint suffix

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
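
Parsing the repeatable ALIAS=PROVIDER/MODEL flag value could be sketched as below. `parse_model_alias` is a hypothetical helper name; the actual flag handling lives in crates/openshell-cli/src/main.rs.

```rust
/// Split an ALIAS=PROVIDER/MODEL argument (e.g. "gpt=openai/gpt-4")
/// into its three components, rejecting malformed or empty parts.
/// Only the first '/' separates provider from model, so model IDs
/// containing '/' would need escaping (not handled in this sketch).
fn parse_model_alias(arg: &str) -> Result<(String, String, String), String> {
    let (alias, rest) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected ALIAS=PROVIDER/MODEL, got '{arg}'"))?;
    let (provider, model) = rest
        .split_once('/')
        .ok_or_else(|| format!("expected PROVIDER/MODEL after '=', got '{rest}'"))?;
    if alias.is_empty() || provider.is_empty() || model.is_empty() {
        return Err("alias, provider, and model must all be non-empty".to_string());
    }
    Ok((alias.to_string(), provider.to_string(), model.to_string()))
}
```
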
@cosmicnet cosmicnet force-pushed the 203-multi-route-inference/lh branch from af1748b to ab71175 Compare March 26, 2026 00:36

Development

Successfully merging this pull request may close these issues.

feat: support multiple models with alias-based routing in inference

2 participants