opencode-model-router 1.1.1 → 1.1.2
- package/README.md +238 -231
- package/package.json +1 -1
- package/src/index.ts +34 -7
package/README.md
CHANGED
|
@@ -1,195 +1,238 @@
|
|
|
1
1
|
# opencode-model-router
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
> **Use the cheapest model that can do the job. Automatically.**
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
An [OpenCode](https://opencode.ai) plugin that routes every task to the right-priced AI tier. Instead of running everything on your most expensive model, the orchestrator delegates exploration to a fast/cheap model, implementation to a balanced model, and architecture only to the powerful (expensive) one — automatically, on every message.
|
|
6
6
|
|
|
7
|
-
The
|
|
7
|
+
## The problem
|
|
8
8
|
|
|
9
|
-
|
|
10
|
-
|------|---------------------|------|---------|
|
|
11
|
-
| `@fast` | Claude Haiku 4.5 | 1x | Exploration, search, file reads, grep |
|
|
12
|
-
| `@medium` | Claude Sonnet 4.5 | 5x | Implementation, refactoring, tests, bug fixes |
|
|
13
|
-
| `@heavy` | Claude Opus 4.6 | 20x | Architecture, complex debugging, security review |
|
|
9
|
+
Vibe coding is expensive because most AI coding tools default to one model for everything. That model is usually the most capable available — and you pay for that capability even when the task is `grep for a function name`.
|
|
14
10
|
|
|
15
|
-
|
|
11
|
+
A typical coding session breaks down roughly like this:
|
|
16
12
|
|
|
17
|
-
|
|
13
|
+
| Task type | % of session | Example |
|
|
14
|
+
|-----------|-------------|---------|
|
|
15
|
+
| Exploration / search | ~40% | Find where X is defined, read a file, check git log |
|
|
16
|
+
| Implementation | ~45% | Write a function, fix a bug, add a test |
|
|
17
|
+
| Architecture / deep debug | ~15% | Design a new module, debug after 2+ failures |
|
|
18
18
|
|
|
19
|
-
|
|
19
|
+
If you're running Opus (20x cost) for all of it, you're overpaying on most tasks: by about **4x** on implementation (20x vs. 5x) and **20x** on exploration (20x vs. 1x).
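
To make the overpayment concrete, here is a rough sketch that weights the 1x/5x/20x cost ratios by the session breakdown in the table above. The `SessionShare` type and the function name are illustrative and not part of the plugin, and the sketch assumes each share of the session consumes a proportional share of tokens.

```typescript
// Illustrative only: these names are not part of opencode-model-router.
type SessionShare = { task: string; share: number; routedRatio: number };

// Shares come from the breakdown table; ratios from the default 1x/5x/20x tiers.
const session: SessionShare[] = [
  { task: "exploration", share: 0.4, routedRatio: 1 },
  { task: "implementation", share: 0.45, routedRatio: 5 },
  { task: "architecture", share: 0.15, routedRatio: 20 },
];

const OPUS_RATIO = 20;

// Average cost per unit of work when every task goes to its matching tier.
function blendedRatio(shares: SessionShare[]): number {
  return shares.reduce((sum, s) => sum + s.share * s.routedRatio, 0);
}

const routed = blendedRatio(session);      // 0.4*1 + 0.45*5 + 0.15*20
const overpayFactor = OPUS_RATIO / routed; // all-Opus cost relative to routed cost
```

Under these assumptions the routed session averages about 5.65x per unit of work, so an all-Opus session costs roughly 3.5x as much overall, even though individual exploration tasks are overpaid by 20x.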
|
|
20
20
|
|
|
21
|
-
The
|
|
21
|
+
## The solution
|
|
22
22
|
|
|
23
|
-
-
|
|
24
|
-
- Preset breakdown (default `tiers.json`): `anthropic` ~209, `openai` ~206
|
|
25
|
-
- Estimation method: `prompt_characters / 4` (rough heuristic)
|
|
23
|
+
opencode-model-router injects a **delegation protocol** into the system prompt that teaches the orchestrator to:
|
|
26
24
|
|
|
27
|
-
|
|
25
|
+
1. **Match task to tier** using a configurable task taxonomy
|
|
26
|
+
2. **Split composite tasks** — explore first with a cheap model, then implement with a mid-tier model
|
|
27
|
+
3. **Skip delegation overhead** for trivial tasks (1-2 tool calls)
|
|
28
|
+
4. **Never over-qualify** — use the cheapest tier that can reliably handle the task
|
|
29
|
+
5. **Fallback** across providers when one fails
|
|
28
30
|
|
|
29
|
-
|
|
31
|
+
All of this adds ~210 tokens of system prompt overhead per message.
|
|
30
32
|
|
|
31
|
-
|
|
33
|
+
## Cost simulation
|
|
32
34
|
|
|
33
|
-
|
|
35
|
+
**Scenario: 50-message coding session with 30 delegated tasks**
|
|
36
|
+
|
|
37
|
+
Task distribution: 18 exploration (60%), 10 implementation (33%), 2 architecture (7%)
|
|
38
|
+
|
|
39
|
+
### Without model router (all-Opus)
|
|
40
|
+
|
|
41
|
+
| Task | Count | Tier | Cost ratio | Total |
|
|
42
|
+
|------|-------|------|-----------|-------|
|
|
43
|
+
| Exploration | 18 | Opus | 20x | 360x |
|
|
44
|
+
| Implementation | 10 | Opus | 20x | 200x |
|
|
45
|
+
| Architecture | 2 | Opus | 20x | 40x |
|
|
46
|
+
| **Total** | **30** | | | **600x** |
|
|
47
|
+
|
|
48
|
+
### With model router (normal mode, Sonnet orchestrator)
|
|
49
|
+
|
|
50
|
+
| Task | Count | Tier | Cost ratio | Total |
|
|
51
|
+
|------|-------|------|-----------|-------|
|
|
52
|
+
| Exploration (delegated) | 10 | @fast | 1x | 10x |
|
|
53
|
+
| Exploration (direct, trivial) | 8 | self | 0x | 0x |
|
|
54
|
+
| Implementation | 10 | @medium | 5x | 50x |
|
|
55
|
+
| Architecture | 2 | @heavy | 20x | 40x |
|
|
56
|
+
| **Total** | **30** | | | **100x** |
|
|
57
|
+
|
|
58
|
+
### With model router (budget mode, Sonnet orchestrator)
|
|
59
|
+
|
|
60
|
+
| Task | Count | Tier | Cost ratio | Total |
|
|
61
|
+
|------|-------|------|-----------|-------|
|
|
62
|
+
| Exploration | 18 | @fast | 1x | 18x |
|
|
63
|
+
| Implementation (simple) | 7 | @fast | 1x | 7x |
|
|
64
|
+
| Implementation (complex) | 3 | @medium | 5x | 15x |
|
|
65
|
+
| Architecture | 2 | @medium | 5x | 10x |
|
|
66
|
+
| **Total** | **30** | | | **50x** |
|
|
67
|
+
|
|
68
|
+
### Summary
|
|
69
|
+
|
|
70
|
+
| Setup | Session cost | vs all-Opus |
|
|
71
|
+
|-------|-------------|-------------|
|
|
72
|
+
| All-Opus (no router) | 600x | baseline |
|
|
73
|
+
| Sonnet orchestrator + router (normal) | 100x | **−83%** |
|
|
74
|
+
| Sonnet orchestrator + router (budget) | 50x | **−92%** |
|
|
75
|
+
|
|
76
|
+
> Cost ratios are relative units. Actual savings depend on your provider pricing and model selection.
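
The totals in the tables above can be checked with a few lines of arithmetic. This is a sketch that only re-adds the README's own figures; the `Row` type and helper names are made up for the example.

```typescript
// Illustrative sketch: reproduces the README's cost tables; not plugin code.
type Row = { task: string; count: number; ratio: number };

const allOpus: Row[] = [
  { task: "exploration", count: 18, ratio: 20 },
  { task: "implementation", count: 10, ratio: 20 },
  { task: "architecture", count: 2, ratio: 20 },
];

const normalMode: Row[] = [
  { task: "exploration (delegated)", count: 10, ratio: 1 },
  { task: "exploration (direct, trivial)", count: 8, ratio: 0 },
  { task: "implementation", count: 10, ratio: 5 },
  { task: "architecture", count: 2, ratio: 20 },
];

const budgetMode: Row[] = [
  { task: "exploration", count: 18, ratio: 1 },
  { task: "implementation (simple)", count: 7, ratio: 1 },
  { task: "implementation (complex)", count: 3, ratio: 5 },
  { task: "architecture", count: 2, ratio: 5 },
];

// Sum of count x cost ratio over all rows.
function sessionCost(rows: Row[]): number {
  return rows.reduce((sum, r) => sum + r.count * r.ratio, 0);
}

// Savings relative to the baseline, rounded to a whole percent.
function savingsPercent(cost: number, baseline: number): number {
  return Math.round((1 - cost / baseline) * 100);
}

const baseline = sessionCost(allOpus);
const normalCost = sessionCost(normalMode);
const budgetCost = sessionCost(budgetMode);
```

Note that the tables count direct (trivial) tasks as 0x, so the orchestrator's own token use is not included in any of these totals.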
|
|
77
|
+
|
|
78
|
+
## How it works
|
|
79
|
+
|
|
80
|
+
On every message, the plugin injects ~210 tokens into the system prompt:
|
|
34
81
|
|
|
35
|
-
```json
|
|
36
|
-
{
|
|
37
|
-
"plugin": [
|
|
38
|
-
"opencode-model-router@latest"
|
|
39
|
-
]
|
|
40
|
-
}
|
|
41
82
|
```
|
|
83
|
+
## Model Delegation Protocol
|
|
84
|
+
Preset: anthropic. Tiers: @fast=claude-haiku-4-5(1x) @medium=claude-sonnet-4-5/max(5x) @heavy=claude-opus-4-6/max(20x). mode:normal
|
|
85
|
+
R: @fast→search/grep/read/git-info/ls/lookup-docs/types/count/exists-check/rename @medium→impl-feature/refactor/write-tests/bugfix(≤2)/edit-logic/code-review/build-fix/create-file/db-migrate/api-endpoint/config-update @heavy→arch-design/debug(≥3fail)/sec-audit/perf-opt/migrate-strategy/multi-system-integration/tradeoff-analysis/rca
|
|
86
|
+
Multi-phase: split explore(@fast)→execute(@medium). Cheapest-first.
|
|
87
|
+
1.[tier:X]→delegate X 2.plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy 3.default:impl→@medium | readonly→@fast 4.orchestrate=self,delegate=exec 5.trivial(≤2tools)→direct,skip-delegate 6.self∈opus→never→@heavy,do-it-yourself 7.consult route-guide↑ 8.min(cost,adequate-tier)
|
|
88
|
+
Err→retry-alt-tier→fail→direct. Chain: anthropic→openai→google→github-copilot
|
|
89
|
+
Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").
|
|
90
|
+
Keep orchestration and final synthesis in the primary agent.
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
The orchestrator reads this once per message and applies it to every decision in that turn.
|
|
94
|
+
|
|
95
|
+
### Multi-phase decomposition (key differentiator)
|
|
96
|
+
|
|
97
|
+
The most impactful optimization. A composite task like:
|
|
98
|
+
|
|
99
|
+
> "Find how the auth middleware works and refactor it to use JWT."
|
|
100
|
+
|
|
101
|
+
Without the router → the whole task runs on `@medium` (5x for all ~8K tokens)
|
|
42
102
|
|
|
43
|
-
|
|
103
|
+
With router → split:
|
|
104
|
+
- **@fast (1x)**: grep, read 4-5 files, trace call chain (~4K tokens)
|
|
105
|
+
- **@medium (5x)**: rewrite auth module (~4K tokens)
|
|
44
106
|
|
|
107
|
+
**Result: ~36% cost reduction on composite tasks**, which represent ~60-70% of real coding work.
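
As a back-of-the-envelope check, the split can be costed using only the token counts and ratios quoted above. The helper below is a sketch for this example, not plugin code.

```typescript
// Illustrative arithmetic only; token counts and ratios are the README's example figures.
const FAST_RATIO = 1;
const MEDIUM_RATIO = 5;

// Cost in "ratio units x thousand tokens".
function phaseCost(kTokens: number, ratio: number): number {
  return kTokens * ratio;
}

// Whole task on @medium: 8K tokens at 5x.
const unsplit = phaseCost(8, MEDIUM_RATIO);
// Explore on @fast (4K at 1x), then implement on @medium (4K at 5x).
const split = phaseCost(4, FAST_RATIO) + phaseCost(4, MEDIUM_RATIO);
// Fraction of cost saved by splitting.
const reduction = 1 - split / unsplit;
```

This naive calculation gives a 40% reduction (24 vs. 40 units). The ~36% quoted above is slightly lower, which would be consistent with some hand-off overhead between phases, but the README does not say how the figure was derived.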
|
|
108
|
+
|
|
109
|
+
## Why not just use another orchestrator?
|
|
110
|
+
|
|
111
|
+
| Feature | model-router | Claude native | oh-my-opencode | GSD | ralph-loop |
|
|
112
|
+
|---------|:---:|:---:|:---:|:---:|:---:|
|
|
113
|
+
| Multi-tier cost routing | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
114
|
+
| Configurable task taxonomy | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
115
|
+
| Budget / quality modes | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
116
|
+
| Multi-phase decomposition | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
117
|
+
| Cross-provider fallback | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
118
|
+
| Cost ratio awareness | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
119
|
+
| Plan annotation with tiers | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
120
|
+
| ~210 token overhead | ✅ | — | ❌ | ❌ | ❌ |
|
|
121
|
+
|
|
122
|
+
**Claude native**: single model for everything, no cost routing. If you're using claude.ai or OpenCode without plugins, you're paying the same price for `grep` as for architecture design.
|
|
123
|
+
|
|
124
|
+
**oh-my-opencode**: focused on workflow personality and prompt style, not cost optimization. No tier routing, no task taxonomy.
|
|
125
|
+
|
|
126
|
+
**GSD (Get Shit Done)**: prioritizes execution speed and low deliberation overhead. Excellent at pushing through tasks fast, but uses one model — no cost differentiation between search and architecture.
|
|
127
|
+
|
|
128
|
+
**ralph-loop**: iterative feedback-loop orchestrator. Excellent at self-correction and quality verification. No tier routing — every loop iteration runs on the same model regardless of task complexity.
|
|
129
|
+
|
|
130
|
+
**The core difference**: the others optimize for *how* the agent works (style, speed, quality loops). model-router optimizes for *what it costs* — with zero compromise on quality, because you can always put Opus in the heavy tier.
|
|
131
|
+
|
|
132
|
+
## Recommended setup
|
|
133
|
+
|
|
134
|
+
**Orchestrator**: use `claude-sonnet-4-5` (or equivalent mid-tier) as your primary/default model. Not Opus.
|
|
135
|
+
|
|
136
|
+
Why: the orchestrator runs on every message, including trivial ones. Sonnet can read the delegation protocol and make routing decisions just as well as Opus. You reserve Opus for when it's genuinely needed — via `@heavy` delegation.
|
|
137
|
+
|
|
138
|
+
In your `opencode.json`:
|
|
45
139
|
```json
|
|
46
140
|
{
|
|
47
|
-
"
|
|
48
|
-
|
|
49
|
-
]
|
|
141
|
+
"model": "anthropic/claude-sonnet-4-5",
|
|
142
|
+
"autoshare": false
|
|
50
143
|
}
|
|
51
144
|
```
|
|
52
145
|
|
|
53
|
-
|
|
146
|
+
Then install and configure model-router to handle the rest.
|
|
54
147
|
|
|
55
|
-
|
|
148
|
+
## Installation
|
|
56
149
|
|
|
150
|
+
### From npm (recommended)
|
|
57
151
|
```bash
|
|
58
|
-
|
|
59
|
-
|
|
152
|
+
# Global install; omit -g to install into the current project instead
|
|
153
|
+
npm install -g opencode-model-router
|
|
60
154
|
```
|
|
61
155
|
|
|
62
|
-
|
|
63
|
-
|
|
156
|
+
Add to `~/.config/opencode/opencode.json`:
|
|
64
157
|
```json
|
|
65
158
|
{
|
|
66
|
-
"plugin":
|
|
67
|
-
"
|
|
68
|
-
|
|
159
|
+
"plugin": {
|
|
160
|
+
"opencode-model-router": {
|
|
161
|
+
"type": "npm",
|
|
162
|
+
"package": "opencode-model-router"
|
|
163
|
+
}
|
|
164
|
+
}
|
|
69
165
|
}
|
|
70
166
|
```
|
|
71
167
|
|
|
72
|
-
###
|
|
73
|
-
|
|
74
|
-
Clone wherever you want:
|
|
75
|
-
|
|
168
|
+
### Local clone
|
|
76
169
|
```bash
|
|
77
|
-
git clone https://github.com/
|
|
170
|
+
git clone https://github.com/your-username/opencode-model-router
|
|
171
|
+
cd opencode-model-router
|
|
172
|
+
npm install
|
|
78
173
|
```
|
|
79
174
|
|
|
80
|
-
|
|
81
|
-
|
|
175
|
+
In `~/.config/opencode/opencode.json`:
|
|
82
176
|
```json
|
|
83
177
|
{
|
|
84
|
-
"plugin":
|
|
85
|
-
"
|
|
86
|
-
|
|
178
|
+
"plugin": {
|
|
179
|
+
"opencode-model-router": {
|
|
180
|
+
"type": "local",
|
|
181
|
+
"path": "/absolute/path/to/opencode-model-router"
|
|
182
|
+
}
|
|
183
|
+
}
|
|
87
184
|
}
|
|
88
185
|
```
|
|
89
186
|
|
|
90
|
-
Restart OpenCode after adding the plugin.
|
|
91
|
-
|
|
92
187
|
## Configuration
|
|
93
188
|
|
|
94
|
-
All configuration lives in `tiers.json` at the plugin root.
|
|
189
|
+
All configuration lives in `tiers.json` at the plugin root.
|
|
95
190
|
|
|
96
191
|
### Presets
|
|
97
192
|
|
|
98
|
-
The plugin ships with four presets:
|
|
193
|
+
The plugin ships with four presets (switch with `/preset <name>`):
|
|
99
194
|
|
|
100
195
|
**anthropic** (default):
|
|
101
|
-
| Tier | Model | Cost
|
|
102
|
-
|
|
103
|
-
| fast | `anthropic/claude-haiku-4-5` | 1x |
|
|
104
|
-
| medium | `anthropic/claude-sonnet-4-5` | 5x |
|
|
105
|
-
| heavy | `anthropic/claude-opus-4-6` | 20x |
|
|
196
|
+
| Tier | Model | Cost ratio |
|
|
197
|
+
|------|-------|-----------|
|
|
198
|
+
| @fast | `anthropic/claude-haiku-4-5` | 1x |
|
|
199
|
+
| @medium | `anthropic/claude-sonnet-4-5` (max) | 5x |
|
|
200
|
+
| @heavy | `anthropic/claude-opus-4-6` (max) | 20x |
|
|
106
201
|
|
|
107
202
|
**openai**:
|
|
108
|
-
| Tier | Model | Cost
|
|
109
|
-
|
|
110
|
-
| fast | `openai/gpt-5.3-codex-spark` | 1x |
|
|
111
|
-
| medium | `openai/gpt-5.3-codex` | 5x |
|
|
112
|
-
| heavy | `openai/gpt-5.3-codex` | 20x |
|
|
203
|
+
| Tier | Model | Cost ratio |
|
|
204
|
+
|------|-------|-----------|
|
|
205
|
+
| @fast | `openai/gpt-5.3-codex-spark` | 1x |
|
|
206
|
+
| @medium | `openai/gpt-5.3-codex` | 5x |
|
|
207
|
+
| @heavy | `openai/gpt-5.3-codex` (xhigh) | 20x |
|
|
113
208
|
|
|
114
209
|
**github-copilot**:
|
|
115
|
-
| Tier | Model | Cost
|
|
116
|
-
|
|
117
|
-
| fast | `github-copilot/claude-haiku-4-5` | 1x |
|
|
118
|
-
| medium | `github-copilot/claude-sonnet-4-5` | 5x |
|
|
119
|
-
| heavy | `github-copilot/claude-opus-4-6` | 20x |
|
|
210
|
+
| Tier | Model | Cost ratio |
|
|
211
|
+
|------|-------|-----------|
|
|
212
|
+
| @fast | `github-copilot/claude-haiku-4-5` | 1x |
|
|
213
|
+
| @medium | `github-copilot/claude-sonnet-4-5` | 5x |
|
|
214
|
+
| @heavy | `github-copilot/claude-opus-4-6` (thinking) | 20x |
|
|
120
215
|
|
|
121
216
|
**google**:
|
|
122
|
-
| Tier | Model | Cost
|
|
123
|
-
|
|
124
|
-
| fast | `google/gemini-2.5-flash` | 1x |
|
|
125
|
-
| medium | `google/gemini-2.5-pro` | 5x |
|
|
126
|
-
| heavy | `google/gemini-3-pro-preview` | 20x |
|
|
127
|
-
|
|
128
|
-
Switch presets with the `/preset` command:
|
|
129
|
-
|
|
130
|
-
```
|
|
131
|
-
/preset openai
|
|
132
|
-
```
|
|
133
|
-
|
|
134
|
-
### Creating custom presets
|
|
135
|
-
|
|
136
|
-
Add a new preset to the `presets` object in `tiers.json`:
|
|
137
|
-
|
|
138
|
-
```json
|
|
139
|
-
{
|
|
140
|
-
"presets": {
|
|
141
|
-
"my-preset": {
|
|
142
|
-
"fast": {
|
|
143
|
-
"model": "provider/model-name",
|
|
144
|
-
"costRatio": 1,
|
|
145
|
-
"description": "What this tier does",
|
|
146
|
-
"steps": 30,
|
|
147
|
-
"prompt": "System prompt for the subagent",
|
|
148
|
-
"whenToUse": ["Use case 1", "Use case 2"]
|
|
149
|
-
},
|
|
150
|
-
"medium": { "costRatio": 5, "..." : "..." },
|
|
151
|
-
"heavy": { "costRatio": 20, "..." : "..." }
|
|
152
|
-
}
|
|
153
|
-
}
|
|
154
|
-
}
|
|
155
|
-
```
|
|
156
|
-
|
|
157
|
-
Each tier supports these fields:
|
|
158
|
-
|
|
159
|
-
| Field | Type | Description |
|
|
160
|
-
|-------|------|-------------|
|
|
161
|
-
| `model` | string | Full model ID (`provider/model-name`) |
|
|
162
|
-
| `variant` | string | Optional variant (e.g., `"max"` for extended thinking) |
|
|
163
|
-
| `costRatio` | number | Relative cost multiplier (e.g., 1 for cheapest, 20 for most expensive). Injected into the system prompt so the agent considers cost when delegating. |
|
|
164
|
-
| `thinking` | object | Anthropic thinking config: `{ "budgetTokens": 10000 }` |
|
|
165
|
-
| `reasoning` | object | OpenAI reasoning config: `{ "effort": "high", "summary": "detailed" }` |
|
|
166
|
-
| `description` | string | Human-readable description shown in `/tiers` |
|
|
167
|
-
| `steps` | number | Max agent turns (default: varies by tier) |
|
|
168
|
-
| `prompt` | string | System prompt for the subagent |
|
|
169
|
-
| `color` | string | Optional display color |
|
|
170
|
-
| `whenToUse` | string[] | List of use cases (shown in delegation protocol) |
|
|
217
|
+
| Tier | Model | Cost ratio |
|
|
218
|
+
|------|-------|-----------|
|
|
219
|
+
| @fast | `google/gemini-2.5-flash` | 1x |
|
|
220
|
+
| @medium | `google/gemini-2.5-pro` | 5x |
|
|
221
|
+
| @heavy | `google/gemini-3-pro-preview` | 20x |
|
|
171
222
|
|
|
172
223
|
### Routing modes
|
|
173
224
|
|
|
174
|
-
|
|
225
|
+
Switch with `/budget <mode>`. Mode is persisted across restarts.
|
|
175
226
|
|
|
176
|
-
| Mode | Default
|
|
227
|
+
| Mode | Default tier | Behavior |
|
|
177
228
|
|------|-------------|----------|
|
|
178
|
-
| `normal` |
|
|
179
|
-
| `budget` |
|
|
180
|
-
| `quality` |
|
|
181
|
-
|
|
182
|
-
When a mode has `overrideRules`, those replace the global `rules` array in the system prompt. This lets each mode have fundamentally different delegation behavior.
|
|
183
|
-
|
|
184
|
-
Configure modes in `tiers.json`:
|
|
229
|
+
| `normal` | @medium | Balanced — routes by task complexity |
|
|
230
|
+
| `budget` | @fast | Aggressive savings — defaults cheap, escalates only when necessary |
|
|
231
|
+
| `quality` | @medium | Quality-first — liberal use of @medium/@heavy |
|
|
185
232
|
|
|
186
233
|
```json
|
|
187
234
|
{
|
|
188
235
|
"modes": {
|
|
189
|
-
"normal": {
|
|
190
|
-
"defaultTier": "medium",
|
|
191
|
-
"description": "Balanced quality and cost"
|
|
192
|
-
},
|
|
193
236
|
"budget": {
|
|
194
237
|
"defaultTier": "fast",
|
|
195
238
|
"description": "Aggressive cost savings",
|
|
@@ -198,87 +241,77 @@ Configure modes in `tiers.json`:
|
|
|
198
241
|
"Use @medium ONLY for: multi-file edits, complex refactors, test suites",
|
|
199
242
|
"Use @heavy ONLY when explicitly requested or after 2+ failed @medium attempts"
|
|
200
243
|
]
|
|
201
|
-
},
|
|
202
|
-
"quality": {
|
|
203
|
-
"defaultTier": "medium",
|
|
204
|
-
"description": "Quality-first",
|
|
205
|
-
"overrideRules": [
|
|
206
|
-
"Default to @medium for all tasks including exploration",
|
|
207
|
-
"Use @heavy for architecture, debugging, security, or multi-file coordination",
|
|
208
|
-
"Use @fast only for trivial single-tool operations"
|
|
209
|
-
]
|
|
210
244
|
}
|
|
211
245
|
}
|
|
212
246
|
}
|
|
213
247
|
```
|
|
214
248
|
|
|
215
|
-
|
|
249
|
+
### Task taxonomy (`taskPatterns`)
|
|
216
250
|
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
The `taskPatterns` object maps common coding task descriptions to tiers. This is injected into the system prompt as a routing guide so the agent can quickly look up which tier to use:
|
|
251
|
+
Keyword routing guide injected into the system prompt. Customize to match your workflow:
|
|
220
252
|
|
|
221
253
|
```json
|
|
222
254
|
{
|
|
223
255
|
"taskPatterns": {
|
|
224
|
-
"fast": [
|
|
225
|
-
|
|
226
|
-
|
|
227
|
-
"Check git status, log, diff, or blame"
|
|
228
|
-
],
|
|
229
|
-
"medium": [
|
|
230
|
-
"Implement a new feature, function, or component",
|
|
231
|
-
"Refactor or restructure existing code",
|
|
232
|
-
"Write or update tests",
|
|
233
|
-
"Fix a bug (first or second attempt)"
|
|
234
|
-
],
|
|
235
|
-
"heavy": [
|
|
236
|
-
"Design system or module architecture from scratch",
|
|
237
|
-
"Debug a problem after 2+ failed attempts",
|
|
238
|
-
"Security audit or vulnerability review"
|
|
239
|
-
]
|
|
256
|
+
"fast": ["search/grep/read", "git-info/ls", "lookup-docs/types", "count/exists-check/rename"],
|
|
257
|
+
"medium": ["impl-feature/refactor", "write-tests/bugfix(≤2)", "build-fix/create-file"],
|
|
258
|
+
"heavy": ["arch-design/debug(≥3fail)", "sec-audit/perf-opt", "migrate-strategy/rca"]
|
|
240
259
|
}
|
|
241
260
|
}
|
|
242
261
|
```
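
For reference, this is roughly how the `taskPatterns` object becomes the single `R:` line in the injected prompt. The sketch mirrors `buildTaskTaxonomy` from `src/index.ts`, but `TaskPatterns` and `compactTaxonomy` are simplified stand-ins rather than the plugin's actual types.

```typescript
// Stand-alone sketch mirroring buildTaskTaxonomy from src/index.ts.
// TaskPatterns is a simplified stand-in for the plugin's RouterConfig field.
type TaskPatterns = Record<string, string[]>;

const taskPatterns: TaskPatterns = {
  fast: ["search/grep/read", "git-info/ls", "lookup-docs/types", "count/exists-check/rename"],
  medium: ["impl-feature/refactor", "write-tests/bugfix(≤2)", "build-fix/create-file"],
  heavy: ["arch-design/debug(≥3fail)", "sec-audit/perf-opt", "migrate-strategy/rca"],
};

// Joins each tier's patterns with "/" and prefixes the whole guide with "R:".
function compactTaxonomy(patterns: TaskPatterns): string {
  if (Object.keys(patterns).length === 0) return "";
  const parts = ["R:"];
  for (const [tier, list] of Object.entries(patterns)) {
    if (list.length > 0) parts.push(`@${tier}→${list.join("/")}`);
  }
  return parts.join(" ");
}

const routeGuide = compactTaxonomy(taskPatterns);
```

Because entries within a tier are joined with `/`, grouping keywords inside one string or spreading them over several array items produces the same prompt text.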
|
|
243
262
|
|
|
244
|
-
Customize these patterns to match your workflow. The agent uses them as heuristics, not hard rules.
|
|
245
|
-
|
|
246
263
|
### Cost ratios
|
|
247
264
|
|
|
248
|
-
|
|
265
|
+
Set `costRatio` on each tier to reflect your real provider pricing. These are injected into the system prompt so the orchestrator makes cost-aware decisions:
|
|
249
266
|
|
|
250
|
-
```
|
|
251
|
-
|
|
252
|
-
|
|
267
|
+
```json
|
|
268
|
+
{
|
|
269
|
+
"fast": { "costRatio": 1 },
|
|
270
|
+
"medium": { "costRatio": 5 },
|
|
271
|
+
"heavy": { "costRatio": 20 }
|
|
272
|
+
}
|
|
253
273
|
```
|
|
254
274
|
|
|
255
|
-
Adjust
|
|
275
|
+
Adjust to actual prices. Exact values don't matter — directional signals are enough.
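
One possible way to derive ratios from a price list is to normalize against the cheapest tier. The prices below are hypothetical placeholders chosen for the example, and `toCostRatios` is not a plugin function.

```typescript
// Hypothetical prices (arbitrary currency units per million tokens), for illustration only.
// Replace them with your provider's current pricing.
const pricePerMTok: Record<string, number> = {
  fast: 2,
  medium: 10,
  heavy: 40,
};

// Normalize so the cheapest tier is 1, rounding to the nearest integer.
function toCostRatios(prices: Record<string, number>): Record<string, { costRatio: number }> {
  const cheapest = Math.min(...Object.values(prices));
  const out: Record<string, { costRatio: number }> = {};
  for (const [tier, price] of Object.entries(prices)) {
    out[tier] = { costRatio: Math.round(price / cheapest) };
  }
  return out;
}

const ratios = toCostRatios(pricePerMTok);
```

Rounding is deliberate here: the ratios only need to signal relative cost to the orchestrator, so whole numbers are enough.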
|
|
256
276
|
|
|
257
277
|
### Rules
|
|
258
278
|
|
|
259
|
-
The `rules` array
|
|
279
|
+
The `rules` array is injected verbatim (in compact form) into the system prompt. Default ruleset:
|
|
260
280
|
|
|
261
281
|
```json
|
|
262
282
|
{
|
|
263
283
|
"rules": [
|
|
264
|
-
"
|
|
265
|
-
"
|
|
266
|
-
"
|
|
267
|
-
"
|
|
268
|
-
"
|
|
269
|
-
"
|
|
270
|
-
"
|
|
271
|
-
"
|
|
272
|
-
"Consider cost ratios when choosing tiers -- always use the cheapest tier that can reliably handle the task"
|
|
284
|
+
"[tier:X]→delegate X",
|
|
285
|
+
"plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy",
|
|
286
|
+
"default:impl→@medium | readonly→@fast",
|
|
287
|
+
"orchestrate=self,delegate=exec",
|
|
288
|
+
"trivial(≤2tools)→direct,skip-delegate",
|
|
289
|
+
"self∈opus→never→@heavy,do-it-yourself",
|
|
290
|
+
"consult route-guide↑",
|
|
291
|
+
"min(cost,adequate-tier)"
|
|
273
292
|
]
|
|
274
293
|
}
|
|
275
294
|
```
|
|
276
295
|
|
|
277
|
-
|
|
296
|
+
Rules in `modes[x].overrideRules` replace this array entirely for that mode.
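
The replacement behaviour can be sketched as follows. The selection expression and numbering follow `buildDelegationProtocol` in `src/index.ts`; the `Mode` and `Config` types are simplified assumptions, and the rule strings are shortened samples.

```typescript
// Simplified stand-ins for the plugin's config types (assumed shape).
type Mode = { defaultTier: string; overrideRules?: string[] };
type Config = { rules: string[]; modes: Record<string, Mode>; activeMode?: string };

function effectiveRulesLine(cfg: Config): string {
  const mode = cfg.activeMode ? cfg.modes[cfg.activeMode] : undefined;
  // A non-empty overrideRules array replaces the global rules; nothing is merged.
  const rules = mode?.overrideRules?.length ? mode.overrideRules : cfg.rules;
  // Rules are numbered and joined into one compact line.
  return rules.map((r, i) => `${i + 1}.${r}`).join(" ");
}

const sampleConfig: Config = {
  rules: ["[tier:X]→delegate X", "min(cost,adequate-tier)"],
  modes: {
    normal: { defaultTier: "medium" },
    budget: {
      defaultTier: "fast",
      overrideRules: ["Use @heavy ONLY when explicitly requested or after 2+ failed @medium attempts"],
    },
  },
};

const normalLine = effectiveRulesLine({ ...sampleConfig, activeMode: "normal" });
const budgetLine = effectiveRulesLine({ ...sampleConfig, activeMode: "budget" });
```

In this sample, `normal` falls back to the global rules while `budget` emits only its own rule, so any global rule you still want in an overriding mode has to be repeated in that mode's `overrideRules`.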
|
|
297
|
+
|
|
298
|
+
### Tier fields reference
|
|
299
|
+
|
|
300
|
+
| Field | Type | Description |
|
|
301
|
+
|-------|------|-------------|
|
|
302
|
+
| `model` | string | Full model ID (`provider/model-name`) |
|
|
303
|
+
| `variant` | string | Optional variant (`"max"`, `"xhigh"`, `"thinking"`) |
|
|
304
|
+
| `costRatio` | number | Relative cost (1 = cheapest). Shown in prompt. |
|
|
305
|
+
| `thinking` | object | Anthropic thinking: `{ "budgetTokens": 10000 }` |
|
|
306
|
+
| `reasoning` | object | OpenAI reasoning: `{ "effort": "high", "summary": "detailed" }` |
|
|
307
|
+
| `description` | string | Shown in `/tiers` output |
|
|
308
|
+
| `steps` | number | Max agent turns |
|
|
309
|
+
| `prompt` | string | Subagent system prompt |
|
|
310
|
+
| `whenToUse` | string[] | Use cases (shown in `/tiers`, not in system prompt) |
|
|
278
311
|
|
|
279
312
|
### Fallback
|
|
280
313
|
|
|
281
|
-
|
|
314
|
+
Defines provider fallback order when a delegated task fails:
|
|
282
315
|
|
|
283
316
|
```json
|
|
284
317
|
{
|
|
@@ -291,81 +324,55 @@ The `fallback` section defines which presets to try when a provider fails:
|
|
|
291
324
|
}
|
|
292
325
|
```
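
To illustrate what "fallback order" means in practice, here is a small sketch of walking the chain. As far as the README describes, the plugin only injects the chain into the prompt and the orchestrator performs the retry, so `remainingFallbacks` is a hypothetical helper and the plain string array is an assumed representation, not the `tiers.json` schema.

```typescript
// Hypothetical helper for illustration; not part of the plugin.
// The order matches the chain shown in the injected protocol.
const chain = ["anthropic", "openai", "google", "github-copilot"];

// Returns the presets to try after `failed`, in order.
// An unknown preset yields the full chain; the last preset yields an empty list.
function remainingFallbacks(order: string[], failed: string): string[] {
  const idx = order.indexOf(failed);
  return idx === -1 ? [...order] : order.slice(idx + 1);
}

const afterAnthropic = remainingFallbacks(chain, "anthropic");
```

When the list comes back empty, the injected protocol's last resort applies: the orchestrator handles the task directly.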
|
|
293
326
|
|
|
294
|
-
When a delegated task fails with a provider/model/rate-limit error, the agent is instructed to retry with the next preset in the fallback chain.
|
|
295
|
-
|
|
296
327
|
## Commands
|
|
297
328
|
|
|
298
329
|
| Command | Description |
|
|
299
330
|
|---------|-------------|
|
|
300
|
-
| `/tiers` | Show active tier configuration and
|
|
331
|
+
| `/tiers` | Show active tier configuration, models, and rules |
|
|
301
332
|
| `/preset` | List available presets |
|
|
302
|
-
| `/preset <name>` | Switch
|
|
303
|
-
| `/budget` | Show available
|
|
304
|
-
| `/budget <mode>` | Switch routing mode (`normal`, `budget`,
|
|
333
|
+
| `/preset <name>` | Switch preset (e.g., `/preset openai`) |
|
|
334
|
+
| `/budget` | Show available modes and which is active |
|
|
335
|
+
| `/budget <mode>` | Switch routing mode (`normal`, `budget`, `quality`) |
|
|
305
336
|
| `/annotate-plan [path]` | Annotate a plan file with `[tier:X]` tags for each step |
|
|
306
337
|
|
|
307
338
|
## Plan annotation
|
|
308
339
|
|
|
309
|
-
The `/annotate-plan` command reads
|
|
340
|
+
For complex tasks, you can write a plan file and annotate each step with the correct tier. The `/annotate-plan` command reads the plan and adds `[tier:fast]`, `[tier:medium]`, or `[tier:heavy]` tags to each step based on the task taxonomy.
|
|
310
341
|
|
|
311
|
-
|
|
312
|
-
```markdown
|
|
313
|
-
## Steps
|
|
314
|
-
1. Search the codebase for all authentication handlers
|
|
315
|
-
2. Implement the new OAuth2 flow
|
|
316
|
-
3. Review the auth architecture for security vulnerabilities
|
|
317
|
-
```
|
|
342
|
+
The orchestrator then reads these tags and delegates accordingly — removing ambiguity from routing decisions on long, multi-step tasks.
|
|
318
343
|
|
|
319
|
-
|
|
344
|
+
Example plan (before annotation):
|
|
320
345
|
```markdown
|
|
321
|
-
|
|
322
|
-
|
|
323
|
-
|
|
324
|
-
|
|
325
|
-
```
|
|
326
|
-
|
|
327
|
-
When the agent executes an annotated plan, it delegates each step to the appropriate subagent automatically.
|
|
328
|
-
|
|
329
|
-
## Provider setup
|
|
330
|
-
|
|
331
|
-
The models referenced in your preset must be configured in your `opencode.json` providers. For example, to use the default `anthropic` preset:
|
|
332
|
-
|
|
333
|
-
```json
|
|
334
|
-
{
|
|
335
|
-
"provider": {
|
|
336
|
-
"anthropic": {
|
|
337
|
-
"models": {
|
|
338
|
-
"claude-haiku-4-5": {},
|
|
339
|
-
"claude-sonnet-4-5": {},
|
|
340
|
-
"claude-opus-4-6": { "thinking": true }
|
|
341
|
-
}
|
|
342
|
-
}
|
|
343
|
-
}
|
|
344
|
-
}
|
|
346
|
+
1. Find all API endpoints in the codebase
|
|
347
|
+
2. Add rate limiting middleware to each endpoint
|
|
348
|
+
3. Write integration tests for rate limiting
|
|
349
|
+
4. Design a token bucket algorithm for advanced rate limiting
|
|
345
350
|
```
|
|
346
351
|
|
|
347
|
-
|
|
348
|
-
|
|
352
|
+
After `/annotate-plan`:
|
|
353
|
+
```markdown
|
|
354
|
+
1. [tier:fast] Find all API endpoints in the codebase
|
|
355
|
+
2. [tier:medium] Add rate limiting middleware to each endpoint
|
|
356
|
+
3. [tier:medium] Write integration tests for rate limiting
|
|
357
|
+
4. [tier:heavy] Design a token bucket algorithm for advanced rate limiting
|
|
349
358
|
```
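
If you want to inspect or validate an annotated plan outside OpenCode, the tags are easy to parse. This is a minimal sketch assuming numbered steps in the format shown above; `parsePlan` and its types are illustrative, since the plugin leaves the interpretation of the tags to the orchestrator.

```typescript
// Illustrative parser; not part of the plugin.
type Tier = "fast" | "medium" | "heavy";
type PlanStep = { index: number; tier: Tier | null; text: string };

// Matches "N. [tier:X] text", where the tag is optional.
const STEP_PATTERN = /^(\d+)\.\s*(?:\[tier:(fast|medium|heavy)\]\s*)?(.*)$/;

function parsePlan(markdown: string): PlanStep[] {
  const steps: PlanStep[] = [];
  for (const line of markdown.split("\n")) {
    const match = STEP_PATTERN.exec(line.trim());
    if (!match) continue; // skip headings, blank lines, and prose
    steps.push({
      index: Number(match[1]),
      tier: (match[2] as Tier | undefined) ?? null, // null when the step is not annotated
      text: match[3] ?? "",
    });
  }
  return steps;
}

const annotatedPlan = [
  "1. [tier:fast] Find all API endpoints in the codebase",
  "2. [tier:medium] Add rate limiting middleware to each endpoint",
  "3. [tier:medium] Write integration tests for rate limiting",
  "4. [tier:heavy] Design a token bucket algorithm for advanced rate limiting",
].join("\n");

const planSteps = parsePlan(annotatedPlan);
```

Steps without a tag come back with `tier: null`, which makes it straightforward to spot lines that `/annotate-plan` did not cover.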
|
|
350
|
-
You: Implement the user settings page based on PLAN.md
|
|
351
359
|
|
|
352
|
-
|
|
353
|
-
-> Task(@fast): "Find all existing settings-related components and their patterns"
|
|
354
|
-
<- @fast returns: Found SettingsLayout at src/components/..., uses React Hook Form...
|
|
360
|
+
## Token overhead
|
|
355
361
|
|
|
356
|
-
|
|
357
|
-
-> Task(@medium): "Implement UserSettingsForm component following the patterns from..."
|
|
358
|
-
<- @medium returns: Created src/components/UserSettingsForm.tsx, added tests...
|
|
362
|
+
The system prompt injection is ~210 tokens per message — roughly the same as v1.0 (before cost-aware features were added). Dense notation keeps overhead flat while adding full routing intelligence.
|
|
359
363
|
|
|
360
|
-
|
|
361
|
-
|
|
362
|
-
|
|
364
|
+
| Version | Tokens | Features |
|
|
365
|
+
|---------|--------|----------|
|
|
366
|
+
| v1.0.7 | ~208 | Basic tier routing |
|
|
367
|
+
| v1.1.0 | ~870 | All features, verbose format |
|
|
368
|
+
| v1.1.1+ | ~210 | All features, compressed format |
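
The 1.1.1 README described these counts as estimates based on `prompt_characters / 4`. A sketch of that heuristic is below; rounding up is an arbitrary choice made here, and real tokenizers will give different numbers, particularly for the protocol's symbols (→, ≤, ∈).

```typescript
// Rough heuristic from the 1.1.1 README: tokens ≈ characters / 4.
// Rounding up is an assumption of this sketch, not something the source specifies.
function estimateTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

// Example input: the multi-phase hint line from the injected protocol.
const hintLine = "Multi-phase: split explore(@fast)→execute(@medium). Cheapest-first.";
const hintTokens = estimateTokens(hintLine);
```

Treat the result as an order-of-magnitude figure for comparing prompt variants, not as a billing number.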
|
|
363
369
|
|
|
364
370
|
## Requirements
|
|
365
371
|
|
|
366
|
-
- OpenCode v1.0
|
|
367
|
-
-
|
|
372
|
+
- [OpenCode](https://opencode.ai) v1.0 or later
|
|
373
|
+
- Node.js 18+
|
|
374
|
+
- Provider API keys configured in OpenCode
|
|
368
375
|
|
|
369
376
|
## License
|
|
370
377
|
|
|
371
|
-
|
|
378
|
+
GPL-3.0
|
package/package.json
CHANGED
package/src/index.ts
CHANGED
|
@@ -328,11 +328,39 @@ function buildFallbackInstructions(cfg: RouterConfig): string {
|
|
|
328
328
|
|
|
329
329
|
function buildTaskTaxonomy(cfg: RouterConfig): string {
|
|
330
330
|
if (!cfg.taskPatterns || Object.keys(cfg.taskPatterns).length === 0) return "";
|
|
331
|
+
const lines = ["R:"];
|
|
332
|
+
for (const [tier, patterns] of Object.entries(cfg.taskPatterns)) {
|
|
333
|
+
if (Array.isArray(patterns) && patterns.length > 0) {
|
|
334
|
+
lines.push(`@${tier}→${patterns.join("/")}`);
|
|
335
|
+
}
|
|
336
|
+
}
|
|
337
|
+
return lines.join(" ");
|
|
338
|
+
}
|
|
331
339
|
|
|
332
|
-
|
|
333
|
-
|
|
334
|
-
|
|
335
|
-
|
|
340
|
+
/**
|
|
341
|
+
* Injects a multi-phase decomposition hint into the delegation protocol.
|
|
342
|
+
* Teaches the orchestrator to split composite tasks (explore + implement)
|
|
343
|
+
* so the cheap @fast tier handles exploration and @medium handles execution.
|
|
344
|
+
* Skipped whenever the active mode defines overrideRules (such as budget/quality), which carry their own guidance.
|
|
345
|
+
*/
|
|
346
|
+
function buildDecomposeHint(cfg: RouterConfig): string {
|
|
347
|
+
const mode = getActiveMode(cfg);
|
|
348
|
+
// Budget and quality modes handle this via overrideRules — skip to avoid conflicts
|
|
349
|
+
if (mode?.overrideRules?.length) return "";
|
|
350
|
+
|
|
351
|
+
const tiers = getActiveTiers(cfg);
|
|
352
|
+
const entries = Object.entries(tiers);
|
|
353
|
+
if (entries.length < 2) return "";
|
|
354
|
+
|
|
355
|
+
// Sort by costRatio ascending to find cheapest (explore) and next (execute) tiers
|
|
356
|
+
const sorted = [...entries].sort(
|
|
357
|
+
([, a], [, b]) => (a.costRatio ?? 1) - (b.costRatio ?? 1)
|
|
358
|
+
);
|
|
359
|
+
const cheapest = sorted[0]?.[0];
|
|
360
|
+
const mid = sorted[1]?.[0];
|
|
361
|
+
if (!cheapest || !mid) return "";
|
|
362
|
+
|
|
363
|
+
return `Multi-phase: split explore(@${cheapest})→execute(@${mid}). Cheapest-first.`;
|
|
336
364
|
}
|
|
337
365
|
|
|
338
366
|
// ---------------------------------------------------------------------------
|
|
@@ -352,14 +380,12 @@ function buildDelegationProtocol(cfg: RouterConfig): string {
|
|
|
352
380
|
})
|
|
353
381
|
.join(" ");
|
|
354
382
|
|
|
355
|
-
// Compact mode
|
|
356
383
|
const mode = getActiveMode(cfg);
|
|
357
384
|
const modeSuffix = cfg.activeMode ? ` mode:${cfg.activeMode}` : "";
|
|
358
385
|
|
|
359
|
-
// Compact task routing guide
|
|
360
386
|
const taxonomy = buildTaskTaxonomy(cfg);
|
|
387
|
+
const decompose = buildDecomposeHint(cfg);
|
|
361
388
|
|
|
362
|
-
// Compact rules
|
|
363
389
|
const effectiveRules = mode?.overrideRules?.length ? mode.overrideRules : cfg.rules;
|
|
364
390
|
const rulesLine = effectiveRules.map((r, i) => `${i + 1}.${r}`).join(" ");
|
|
365
391
|
|
|
@@ -369,6 +395,7 @@ function buildDelegationProtocol(cfg: RouterConfig): string {
|
|
|
369
395
|
`## Model Delegation Protocol`,
|
|
370
396
|
`Preset: ${cfg.activePreset}. Tiers: ${tierLine}.${modeSuffix}`,
|
|
371
397
|
...(taxonomy ? [taxonomy] : []),
|
|
398
|
+
...(decompose ? [decompose] : []),
|
|
372
399
|
rulesLine,
|
|
373
400
|
...(fallback ? [fallback] : []),
|
|
374
401
|
`Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").`,
|
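
To see what the new `buildDecomposeHint` contributes to the prompt, here is a stand-alone sketch of its tier-selection logic. `TierInfo` and `decomposeHint` are simplified stand-ins: the real function takes a `RouterConfig` and also returns an empty string when the active mode has `overrideRules`, which is omitted here.

```typescript
// Stand-alone sketch of the tier-selection logic in buildDecomposeHint.
// TierInfo is a simplified stand-in for the plugin's tier config type.
type TierInfo = { model: string; costRatio?: number };

function decomposeHint(tiers: Record<string, TierInfo>): string {
  const entries = Object.entries(tiers);
  if (entries.length < 2) return "";
  // Sort ascending by cost; a missing costRatio is treated as 1, as in the source.
  const sorted = [...entries].sort(([, a], [, b]) => (a.costRatio ?? 1) - (b.costRatio ?? 1));
  const cheapest = sorted[0]?.[0];
  const mid = sorted[1]?.[0];
  if (!cheapest || !mid) return "";
  return `Multi-phase: split explore(@${cheapest})→execute(@${mid}). Cheapest-first.`;
}

// Declared out of order on purpose: the hint depends on costRatio, not on key order.
const anthropicTiers: Record<string, TierInfo> = {
  heavy: { model: "anthropic/claude-opus-4-6", costRatio: 20 },
  fast: { model: "anthropic/claude-haiku-4-5", costRatio: 1 },
  medium: { model: "anthropic/claude-sonnet-4-5", costRatio: 5 },
};

const decomposeLine = decomposeHint(anthropicTiers);
```

One consequence of the `?? 1` default: tiers that omit `costRatio` tie with the cheapest tier, and because `Array.prototype.sort` is stable, ties keep their order from the config. Setting `costRatio` on every tier keeps the explore and execute roles predictable.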