Multi-model routing: triage with cheap models, review with frontier #26
Open
Labels
area: review-pipeline · enhancement · priority: medium
Description
Problem
DiffScope uses a single model for the entire review. All three competitors route different tasks to different models:
| Task | Greptile | CodeRabbit | Qodo |
|---|---|---|---|
| Triage/classification | — | Cheap model | model_weak |
| Summarization | GPT-5-nano / GPT-4o-mini | Light model | model_weak |
| Embeddings | text-embedding-3-small | Unknown | Qodo Embed-1 |
| Review | Claude Sonnet 4 | GPT-4/o1/Claude | Primary model |
| Verification | — | Separate agent | — |
| Self-reflection | — | — | model_reasoning |
| Complex tasks | Claude Opus 4 | — | — |
| Lightweight | Claude Haiku 4.5 | — | — |
Using frontier models for summarization and triage wastes tokens and money. Using cheap models for review misses bugs.
Proposed Solution
Model Hierarchy
```yaml
models:
  primary: claude-sonnet-4-6          # review, verification
  weak: claude-haiku-4-5              # triage, summarization, NL translation
  reasoning: claude-opus-4-6          # complex analysis, self-reflection
  embedding: text-embedding-3-small   # RAG indexing
  fallback:
    - claude-sonnet-4-6
    - gpt-4o
```
Task Routing
| Task | Model | Rationale |
|---|---|---|
| File triage (NEEDS_REVIEW vs cosmetic) | weak | Simple classification |
| Per-file summarization | weak | Cheap, high-volume |
| NL translation of code chunks | weak | Indexing task |
| Embedding generation | embedding | Specialized |
| Code review | primary | Quality matters |
| Verification pass | primary | Accuracy matters |
| Complex cross-file analysis | reasoning | Needs deep reasoning |
| Commit message generation | weak | Simple task |
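The routing table above can be collapsed into a single dispatch function. This is a hedged sketch: the `Task` variant names are illustrative placeholders (DiffScope's real task types may differ), and `ModelRole` mirrors the enum from the implementation sketch in this issue.

```rust
/// Roles from the model hierarchy; mirrors the `ModelRole` enum below.
#[derive(Debug, PartialEq)]
enum ModelRole {
    Primary,
    Weak,
    Reasoning,
    Embedding,
}

/// Illustrative task names; not DiffScope's actual type names.
enum Task {
    FileTriage,
    FileSummarization,
    NlTranslation,
    EmbeddingGeneration,
    CodeReview,
    Verification,
    CrossFileAnalysis,
    CommitMessage,
}

/// One place encodes the whole routing table: cheap classification and
/// indexing tasks go to the weak model, quality-critical passes go to
/// the primary model, deep cross-file analysis goes to reasoning.
fn role_for_task(task: &Task) -> ModelRole {
    match task {
        Task::FileTriage
        | Task::FileSummarization
        | Task::NlTranslation
        | Task::CommitMessage => ModelRole::Weak,
        Task::EmbeddingGeneration => ModelRole::Embedding,
        Task::CodeReview | Task::Verification => ModelRole::Primary,
        Task::CrossFileAnalysis => ModelRole::Reasoning,
    }
}

fn main() {
    assert_eq!(role_for_task(&Task::FileTriage), ModelRole::Weak);
    assert_eq!(role_for_task(&Task::CodeReview), ModelRole::Primary);
    assert_eq!(role_for_task(&Task::CrossFileAnalysis), ModelRole::Reasoning);
    println!("routing table ok");
}
```

Keeping the task-to-role mapping in one exhaustive `match` means the compiler flags any new task variant that has no assigned model.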
Implementation
```rust
// Sketch of the supporting config types (fields are illustrative):
struct ModelConfig {
    name: String,
    // endpoint, API key, per-model limits, etc.
}

struct Config {
    primary_model: ModelConfig,
    weak_model: Option<ModelConfig>,
    reasoning_model: Option<ModelConfig>,
    embedding_model: Option<ModelConfig>,
}

enum ModelRole {
    Primary,
    Weak,
    Reasoning,
    Embedding,
}

impl Config {
    /// Resolve a role to its configured model, degrading gracefully to
    /// the primary model when an optional role is not configured.
    fn model_for_role(&self, role: ModelRole) -> &ModelConfig {
        match role {
            ModelRole::Primary => &self.primary_model,
            ModelRole::Weak => self.weak_model.as_ref().unwrap_or(&self.primary_model),
            ModelRole::Reasoning => self.reasoning_model.as_ref().unwrap_or(&self.primary_model),
            ModelRole::Embedding => self.embedding_model.as_ref().unwrap_or(&self.primary_model),
        }
    }
}
```
Fallback Chain
Like Qodo: try primary, catch error, try next in fallback list. Simple and robust.
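That loop can be sketched in a few lines. This is a minimal, hedged example: `call_model` is a stand-in for the real completion call (here it simulates the primary being rate-limited so the fallback fires), and the function names are hypothetical.

```rust
/// Stand-in for the real model API call; simulates the primary model
/// failing so the chain falls through to the next entry.
fn call_model(model: &str, prompt: &str) -> Result<String, String> {
    if model == "claude-sonnet-4-6" {
        Err(format!("{model}: rate limited"))
    } else {
        Ok(format!("[{model}] reviewed: {prompt}"))
    }
}

/// Qodo-style fallback: try each model in order, return the first
/// success, surface the last error if every model fails.
fn complete_with_fallback(models: &[&str], prompt: &str) -> Result<String, String> {
    let mut last_err = String::from("no models configured");
    for model in models {
        match call_model(model, prompt) {
            Ok(out) => return Ok(out),
            Err(e) => last_err = e, // record and try the next model
        }
    }
    Err(last_err)
}

fn main() {
    let chain = ["claude-sonnet-4-6", "gpt-4o"];
    let result = complete_with_fallback(&chain, "diff hunk").unwrap();
    assert!(result.starts_with("[gpt-4o]"));
    println!("{result}");
}
```

A sequential chain like this stays robust with no extra state; retries with backoff per model could be layered inside the loop later without changing the interface.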
Expected Impact
- Cost reduction: 60-80% for summarization/triage tasks
- Speed: Cheap models respond 3-5x faster
- Quality: Frontier models focused where they matter (review + verification)
- Greptile reports 75% lower inference costs despite 3x more context tokens, primarily through model routing + prompt caching
Priority
Medium — cost optimization + quality improvement. Becomes critical at scale.