Multi-model routing: triage with cheap models, review with frontier #26

@haasonsaas

Problem

DiffScope uses a single model for the entire review. All three competitors route different tasks to different models:

Task Greptile CodeRabbit Qodo
Triage/classification Cheap model model_weak
Summarization GPT-5-nano / GPT-4o-mini Light model model_weak
Embeddings text-embedding-3-small Unknown Qodo Embed-1
Review Claude Sonnet 4 GPT-4/o1/Claude Primary model
Verification Separate agent
Self-reflection model_reasoning
Complex tasks Claude Opus 4
Lightweight Claude Haiku 4.5

Using frontier models for summarization and triage wastes tokens and money. Using cheap models for review misses bugs.

Proposed Solution

Model Hierarchy

models:
  primary: claude-sonnet-4-6        # review, verification
  weak: claude-haiku-4-5            # triage, summarization, NL translation
  reasoning: claude-opus-4-6        # complex analysis, self-reflection
  embedding: text-embedding-3-small  # RAG indexing
  fallback:
    - claude-sonnet-4-6
    - gpt-4o

Task Routing

| Task | Model | Rationale |
|---|---|---|
| File triage (NEEDS_REVIEW vs cosmetic) | weak | Simple classification |
| Per-file summarization | weak | Cheap, high-volume |
| NL translation of code chunks | weak | Indexing task |
| Embedding generation | embedding | Specialized |
| Code review | primary | Quality matters |
| Verification pass | primary | Accuracy matters |
| Complex cross-file analysis | reasoning | Needs deep reasoning |
| Commit message generation | weak | Simple task |
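The routing table above could be encoded as a plain `match`, so that adding a task without assigning it a role fails to compile. This is a minimal sketch: the `Task` enum and its variant names are hypothetical (one per table row) and are not taken from the DiffScope codebase.

```rust
/// Roles from the proposed model hierarchy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelRole {
    Primary,
    Weak,
    Reasoning,
    Embedding,
}

/// Hypothetical task enum: one illustrative variant per routing-table row.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Task {
    FileTriage,
    FileSummarization,
    NlTranslation,
    EmbeddingGeneration,
    CodeReview,
    VerificationPass,
    CrossFileAnalysis,
    CommitMessage,
}

impl Task {
    /// Maps each task to the role listed for it in the routing table.
    /// The match is exhaustive, so a new `Task` variant must be routed
    /// explicitly before the code compiles.
    pub fn role(self) -> ModelRole {
        match self {
            Task::FileTriage
            | Task::FileSummarization
            | Task::NlTranslation
            | Task::CommitMessage => ModelRole::Weak,
            Task::EmbeddingGeneration => ModelRole::Embedding,
            Task::CodeReview | Task::VerificationPass => ModelRole::Primary,
            Task::CrossFileAnalysis => ModelRole::Reasoning,
        }
    }
}
```

A caller would then resolve a model in two steps, for example `config.model_for_role(Task::FileTriage.role())`, keeping the task-to-role policy separate from the role-to-model configuration.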

Implementation

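/// The role a model plays in the review pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelRole {
    Primary,
    Weak,
    Reasoning,
    Embedding,
}

impl Config {
    /// Returns the model configured for `role`, falling back to the primary
    /// model when no role-specific model is set.
    fn model_for_role(&self, role: ModelRole) -> &ModelConfig {
        match role {
            ModelRole::Primary => &self.primary_model,
            ModelRole::Weak => self.weak_model.as_ref().unwrap_or(&self.primary_model),
            ModelRole::Reasoning => self.reasoning_model.as_ref().unwrap_or(&self.primary_model),
            // Caveat: a chat model generally cannot serve embedding requests,
            // so this fallback only keeps the lookup total. Callers that need
            // embeddings should check that `embedding_model` is configured.
            ModelRole::Embedding => self.embedding_model.as_ref().unwrap_or(&self.primary_model),
        }
    }
}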
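To show how the role lookup behaves when only some roles are configured, here is a self-contained sketch. The field layout of `Config` is inferred from the snippet above, and `ModelConfig` with a single `name` field is an assumption made for illustration; the real types may carry more settings (provider, endpoint, token limits). `example_config` is a hypothetical helper.

```rust
/// Assumed minimal shape of a model entry (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelConfig {
    pub name: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelRole {
    Primary,
    Weak,
    Reasoning,
    Embedding,
}

/// Assumed shape of `Config`: a required primary model plus optional
/// role-specific overrides.
pub struct Config {
    pub primary_model: ModelConfig,
    pub weak_model: Option<ModelConfig>,
    pub reasoning_model: Option<ModelConfig>,
    pub embedding_model: Option<ModelConfig>,
}

impl Config {
    /// Same lookup as in the proposal: use the override if present,
    /// otherwise fall back to the primary model.
    pub fn model_for_role(&self, role: ModelRole) -> &ModelConfig {
        match role {
            ModelRole::Primary => &self.primary_model,
            ModelRole::Weak => self.weak_model.as_ref().unwrap_or(&self.primary_model),
            ModelRole::Reasoning => self.reasoning_model.as_ref().unwrap_or(&self.primary_model),
            ModelRole::Embedding => self.embedding_model.as_ref().unwrap_or(&self.primary_model),
        }
    }
}

/// Hypothetical helper: a config where only `primary` and `weak` are set.
pub fn example_config() -> Config {
    Config {
        primary_model: ModelConfig { name: "claude-sonnet-4-6".to_string() },
        weak_model: Some(ModelConfig { name: "claude-haiku-4-5".to_string() }),
        reasoning_model: None,
        embedding_model: None,
    }
}
```

With this config, `model_for_role(ModelRole::Weak)` resolves to the Haiku entry, while `ModelRole::Reasoning` falls back to the primary model. Existing single-model setups therefore keep working without any new configuration.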

Fallback Chain

Like Qodo: call the primary model first and, if the call returns an error, try each model in the fallback list in order until one succeeds. Simple and robust.
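The fallback chain could look roughly like the sketch below. The function name `call_with_fallback` and the closure-based `call` parameter are assumptions made to keep the example independent of any particular LLM client; it does not reproduce Qodo's implementation.

```rust
/// Tries `primary` first, then each entry of `fallbacks` in order, and
/// returns the first successful result. If every attempt fails, returns
/// the collected `(model name, error)` pairs so the caller can report them.
pub fn call_with_fallback<T, E, F>(
    primary: &str,
    fallbacks: &[&str],
    mut call: F,
) -> Result<T, Vec<(String, E)>>
where
    F: FnMut(&str) -> Result<T, E>,
{
    let mut errors = Vec::new();
    // Walk the primary model followed by the fallback list, in order.
    for model in std::iter::once(primary).chain(fallbacks.iter().copied()) {
        match call(model) {
            Ok(value) => return Ok(value),
            Err(err) => errors.push((model.to_string(), err)),
        }
    }
    Err(errors)
}
```

One design note: this sketch retries on every error. A real implementation would probably distinguish transient failures (rate limits, timeouts) from permanent ones (invalid request), since falling back cannot fix the latter. If the fallback list repeats the primary model, this version calls it a second time, which acts as a single retry.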

Expected Impact

  • Cost reduction: 60-80% for summarization/triage tasks
  • Speed: Cheap models respond 3-5x faster
  • Quality: Frontier models focused where they matter (review + verification)
  • Greptile reports 75% lower inference costs despite 3x more context tokens, primarily through model routing + prompt caching
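As a rough sanity check on the cost estimate, the saving from moving a task to the weak model depends only on the price ratio, assuming the token counts stay the same. The sketch below uses hypothetical price units, not real vendor pricing; `savings_fraction` is an illustrative name.

```rust
/// Fraction of spend saved on a task by moving it from the primary model
/// to the weak model, assuming identical token counts and per-token pricing.
/// Prices may be in any unit as long as both use the same one.
pub fn savings_fraction(primary_price: f64, weak_price: f64) -> f64 {
    1.0 - weak_price / primary_price
}
```

Under these assumptions, a 60-80% reduction corresponds to a weak model priced at 20-40% of the primary model. For example, `savings_fraction(5.0, 1.0)` is about 0.8 and `savings_fraction(5.0, 2.0)` is about 0.6.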

Priority

Medium — cost optimization + quality improvement. Becomes critical at scale.
