
follow-up(inference): cluster-only guardrails, anthropic auth, v1/models, and streaming #37

@pimlock

Description


Inference Interception Follow-ups

Status: Draft
Date: 2026-02-23
Context: Follow-ups from MR !38 (feat(inference): inference interception and routing).

Confirmed Constraints

  1. Inference routing is currently supported only in cluster mode.
  2. Local sandbox mode (--policy-rules/--policy-data) is not expected to support inference routing at this time.

Gaps Observed

1) Missing cluster-only guardrail in local mode

Current behavior can accept CONNECT for inspect_for_inference and then fail later if inference runtime prerequisites are missing.

Desired behavior:

  • If sandbox is not running with cluster prerequisites (sandbox_id + gateway endpoint + TLS state), fail fast and clearly.
  • Do not emit 200 Connection Established for requests that cannot be serviced.
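A minimal sketch of the proxy-side check, assuming a hypothetical `ClusterContext` holding the prerequisites named above (`sandbox_id`, gateway endpoint, TLS state); the actual runtime types will differ:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterContext:
    # Hypothetical container for the cluster prerequisites listed above.
    sandbox_id: Optional[str] = None
    gateway_endpoint: Optional[str] = None
    tls_ready: bool = False


def connect_response(ctx: ClusterContext, wants_inference: bool) -> bytes:
    """Decide the CONNECT response before tunnelling any bytes."""
    if wants_inference and not (ctx.sandbox_id and ctx.gateway_endpoint and ctx.tls_ready):
        # Fail fast: never emit an optimistic 200 for a request we cannot service.
        return (b"HTTP/1.1 502 Bad Gateway\r\n"
                b"X-Inference-Error: cluster prerequisites missing\r\n\r\n")
    return b"HTTP/1.1 200 Connection Established\r\n\r\n"
```

The key property is that the 200 is only written after all prerequisites are verified, so clients see a deterministic error instead of a tunnel that dies mid-request.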

2) Incomplete Anthropic authentication handling

The current routing path is OpenAI-Bearer-centric and does not implement full Anthropic-compatible API-key handling.

Desired behavior:

  • Preserve credential isolation for Anthropic flows just like OpenAI flows.
  • Ensure route credentials (not sandbox/client credentials) are used for Anthropic upstream calls.
  • Support Anthropic header semantics end-to-end (x-api-key, anthropic-version, and related required headers).

3) api_patterns policy field is not wired

InferencePolicy.api_patterns exists in proto but interception currently uses built-in defaults only.

Desired behavior:

  • If policy defines api_patterns, use them.
  • If absent/empty, fall back to built-in defaults.

4) Missing GET /v1/models support

Current interception focuses on completion/messages-style write endpoints and does not provide route-aware model-listing behavior.

Desired behavior:

  • Support GET /v1/models for intercepted OpenAI-compatible flows.
  • Return a deterministic response strategy:
    • proxy to compatible backend route, or
    • synthesize a route-aware response when backend behavior is unsuitable.
  • Ensure policy and route compatibility filtering applies to model listing just like inference generation endpoints.

5) Missing streaming support

The current proxy model buffers request/response bodies in full, which is insufficient for streaming APIs.

Desired behavior:

  • Support streaming for OpenAI and Anthropic-compatible APIs.
  • Preserve chunk boundaries and event framing (text/event-stream / SSE semantics where applicable).
  • Propagate cancellation/connection-close behavior correctly across sandbox proxy, gRPC transport, and backend.

Next Steps

P0: Enforce cluster-only inference explicitly

  1. Add startup validation in sandbox runtime:
  • If inference routing is configured but cluster prerequisites are absent, fail sandbox startup with a clear error.
  2. Add proxy-side defensive handling:
  • For inspect_for_inference, verify prerequisites before returning 200 Connection Established.
  • Return a deterministic error response when inference is unavailable.
  3. Add tests:
  • Unit/integration test for local mode + inference policy => explicit failure.
  • Regression test ensuring no optimistic 200 is sent before prerequisite validation.
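The startup-validation step could look like the following sketch. The function name and parameters are illustrative, not the runtime's real API; the point is only that the check runs at startup, not on the first request:

```python
def validate_inference_config(cluster_mode: bool, inference_routes: list) -> None:
    """Fail sandbox startup when inference is configured outside cluster mode.

    Raising here surfaces the misconfiguration immediately, instead of
    letting the sandbox boot and then fail on the first intercepted request.
    """
    if inference_routes and not cluster_mode:
        raise ValueError(
            "inference routing requires cluster mode; "
            "local sandbox mode (--policy-rules/--policy-data) does not support it"
        )
```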

P0: Full Anthropic API key support

  1. Define protocol-aware auth rewrite behavior in router backend:
  • openai_*: Authorization: Bearer <route.api_key>.
  • anthropic_messages: set x-api-key: <route.api_key> and preserve/validate anthropic-version behavior.
  2. Strip inbound client credentials for Anthropic at interception boundary:
  • Remove authorization and x-api-key from forwarded headers.
  • Keep non-sensitive headers that are required for request compatibility.
  3. Add tests:
  • Router integration test that verifies x-api-key rewrite for Anthropic routes.
  • Regression test proving client-supplied Anthropic credentials are not forwarded upstream.
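The rewrite-and-strip steps above can be sketched as one function. The protocol names mirror the ones used in this issue (`openai_*`, `anthropic_messages`); the default `anthropic-version` value is a placeholder assumption, not a project decision:

```python
def rewrite_auth_headers(headers: dict, protocol: str, route_api_key: str) -> dict:
    """Strip client credentials, then inject the route credential per protocol."""
    # Drop inbound credentials case-insensitively so they never reach upstream.
    out = {k: v for k, v in headers.items()
           if k.lower() not in ("authorization", "x-api-key")}
    if protocol.startswith("openai_"):
        out["Authorization"] = f"Bearer {route_api_key}"
    elif protocol == "anthropic_messages":
        out["x-api-key"] = route_api_key
        # Anthropic requires anthropic-version; keep the client's value if
        # present, otherwise pin a default (placeholder value here).
        out.setdefault("anthropic-version", "2023-06-01")
    return out
```

Doing the strip unconditionally, before the protocol branch, is what guarantees the regression property above: client-supplied credentials can never leak upstream even for an unrecognized protocol.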

P1: Add GET /v1/models support

  1. Extend inference API pattern matching to classify models-list requests.
  2. Implement route-aware handling in gateway/router for model-list requests.
  3. Add tests:
  • e2e test for intercepted GET /v1/models.
  • compatibility test across multiple allowed routes.
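A sketch of the classification plus the synthesized-response fallback. The `owned_by` value and response shape follow the OpenAI-compatible list format; the helper names are hypothetical:

```python
import re

MODELS_LIST = re.compile(r"^/v1/models/?$")


def classify(method: str, path: str) -> str:
    """Classify a request as a models-list call or something else."""
    if method == "GET" and MODELS_LIST.match(path):
        return "models_list"
    return "other"


def synthesize_models_response(allowed_models):
    """Build a route-aware OpenAI-compatible /v1/models body.

    allowed_models is assumed to already be filtered by policy and
    route compatibility, matching the requirement above.
    """
    return {
        "object": "list",
        "data": [{"id": m, "object": "model", "owned_by": "route"}
                 for m in allowed_models],
    }
```

Synthesizing from the route configuration keeps the listing deterministic even when the backend's own /v1/models would expose models the policy does not allow.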

P1: Add streaming support

  1. Define streaming transport contract across proxy <-> gateway (gRPC streaming or framed chunk transport).
  2. Implement protocol-aware streaming passthrough in router backend.
  3. Ensure cancellation propagation and timeout behavior are explicit.
  4. Add tests:
  • e2e streaming chat completion test.
  • disconnection/cancellation regression test.
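The chunk-boundary requirement from gap 5 can be illustrated with a small re-framing generator: transport chunks may split an SSE event anywhere, so the passthrough must buffer partial data and emit only complete events delimited by the blank line:

```python
def iter_sse_events(chunks):
    """Re-frame an arbitrary byte-chunk stream into complete SSE events.

    Buffers partial data and yields only on the blank-line event
    delimiter, so transport chunk boundaries never split an event.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        while b"\n\n" in buf:
            event, buf = buf.split(b"\n\n", 1)
            yield event + b"\n\n"
```

Because it is a generator, closing it (e.g. when the client disconnects) stops pulling from the upstream iterator, which is one simple way to model the cancellation propagation called for above.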

P1: Wire policy-driven API pattern configuration

  1. Map sandbox.policy.inference.api_patterns into sandbox inference interception context.
  2. Add validation for malformed patterns.
  3. Add tests for:
  • custom pattern match,
  • default fallback behavior,
  • invalid pattern rejection.
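The three behaviors above (custom match, default fallback, invalid-pattern rejection) fit in one loader. The default patterns below are illustrative stand-ins for the built-in defaults, not the actual values:

```python
import re

# Illustrative stand-ins for the built-in default patterns.
DEFAULT_PATTERNS = [r"^/v1/chat/completions$", r"^/v1/messages$"]


def compile_patterns(policy_patterns):
    """Use policy-supplied api_patterns when present, else built-in defaults.

    Malformed regexes are rejected at load time with a clear error,
    rather than surfacing as per-request match failures.
    """
    raw = policy_patterns or DEFAULT_PATTERNS
    compiled = []
    for p in raw:
        try:
            compiled.append(re.compile(p))
        except re.error as exc:
            raise ValueError(f"invalid api_pattern {p!r}: {exc}") from exc
    return compiled
```

Note that `policy_patterns or DEFAULT_PATTERNS` treats both `None` and an empty list as "absent", matching the absent/empty fallback rule in gap 3.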

Definition of Done

  • Local mode with inference configuration fails fast with a clear, actionable error.
  • Anthropic requests route successfully using route-managed credentials only.
  • GET /v1/models works through interception with policy/route-aware behavior.
  • Streaming inference requests are supported end-to-end (including cancellation).
  • api_patterns works when configured and defaults remain backward compatible.
  • Unit/integration coverage added for each gap above.

Originally by @pimlock on 2026-02-22T22:10:54.126-08:00
