opencode-model-router 1.1.0 → 1.1.2
- package/README.md +276 -158
- package/package.json +1 -1
- package/src/index.ts +50 -60
- package/tiers.json +21 -55
package/README.md
CHANGED
|
@@ -1,260 +1,378 @@
|
|
|
1
1
|
# opencode-model-router
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
> **Use the cheapest model that can do the job. Automatically.**
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
An [OpenCode](https://opencode.ai) plugin that routes every task to the right-priced AI tier. Instead of running everything on your most expensive model, the orchestrator delegates exploration to a fast/cheap model, implementation to a balanced model, and architecture only to the powerful (expensive) one — automatically, on every message.
|
|
6
6
|
|
|
7
|
-
The
|
|
7
|
+
## The problem
|
|
8
8
|
|
|
9
|
-
|
|
10
|
-
|------|---------------------|---------|
|
|
11
|
-
| `@fast` | Claude Haiku 4.5 | Exploration, search, file reads, grep |
|
|
12
|
-
| `@medium` | Claude Sonnet 4.5 | Implementation, refactoring, tests, bug fixes |
|
|
13
|
-
| `@heavy` | Claude Opus 4.6 | Architecture, complex debugging, security review |
|
|
9
|
+
Vibe coding is expensive because most AI coding tools default to one model for everything. That model is usually the most capable available — and you pay for that capability even when the task is `grep for a function name`.
|
|
14
10
|
|
|
15
|
-
|
|
11
|
+
A typical coding session breaks down roughly like this:
|
|
16
12
|
|
|
17
|
-
|
|
13
|
+
| Task type | % of session | Example |
|
|
14
|
+
|-----------|-------------|---------|
|
|
15
|
+
| Exploration / search | ~40% | Find where X is defined, read a file, check git log |
|
|
16
|
+
| Implementation | ~45% | Write a function, fix a bug, add a test |
|
|
17
|
+
| Architecture / deep debug | ~15% | Design a new module, debug after 2+ failures |
|
|
18
18
|
|
|
19
|
-
|
|
19
|
+
If you're running Opus (20x cost) for all of it, you're paying roughly 4x what a balanced model (5x) would cost for implementation and 20x what a fast model (1x) would cost for exploration.
|
|
20
20
|
|
|
21
|
-
The
|
|
21
|
+
## The solution
|
|
22
22
|
|
|
23
|
-
-
|
|
24
|
-
- Preset breakdown (default `tiers.json`): `anthropic` ~209, `openai` ~206
|
|
25
|
-
- Estimation method: `prompt_characters / 4` (rough heuristic)
|
|
23
|
+
opencode-model-router injects a **delegation protocol** into the system prompt that teaches the orchestrator to:
|
|
26
24
|
|
|
27
|
-
|
|
25
|
+
1. **Match task to tier** using a configurable task taxonomy
|
|
26
|
+
2. **Split composite tasks** — explore first with a cheap model, then implement with a mid-tier model
|
|
27
|
+
3. **Skip delegation overhead** for trivial tasks (1-2 tool calls)
|
|
28
|
+
4. **Never over-qualify** — use the cheapest tier that can reliably handle the task
|
|
29
|
+
5. **Fall back** across providers when one fails
|
|
28
30
|
|
|
29
|
-
|
|
31
|
+
All of this adds ~210 tokens of system prompt overhead per message.
|
|
30
32
|
|
|
31
|
-
|
|
33
|
+
## Cost simulation
|
|
32
34
|
|
|
33
|
-
|
|
35
|
+
**Scenario: 50-message coding session with 30 delegated tasks**
|
|
36
|
+
|
|
37
|
+
Task distribution: 18 exploration (60%), 10 implementation (33%), 2 architecture (7%)
|
|
38
|
+
|
|
39
|
+
### Without model router (all-Opus)
|
|
40
|
+
|
|
41
|
+
| Task | Count | Tier | Cost ratio | Total |
|
|
42
|
+
|------|-------|------|-----------|-------|
|
|
43
|
+
| Exploration | 18 | Opus | 20x | 360x |
|
|
44
|
+
| Implementation | 10 | Opus | 20x | 200x |
|
|
45
|
+
| Architecture | 2 | Opus | 20x | 40x |
|
|
46
|
+
| **Total** | **30** | | | **600x** |
|
|
47
|
+
|
|
48
|
+
### With model router (normal mode, Sonnet orchestrator)
|
|
49
|
+
|
|
50
|
+
| Task | Count | Tier | Cost ratio | Total |
|
|
51
|
+
|------|-------|------|-----------|-------|
|
|
52
|
+
| Exploration (delegated) | 10 | @fast | 1x | 10x |
|
|
53
|
+
| Exploration (direct, trivial) | 8 | self | 0x | 0x |
|
|
54
|
+
| Implementation | 10 | @medium | 5x | 50x |
|
|
55
|
+
| Architecture | 2 | @heavy | 20x | 40x |
|
|
56
|
+
| **Total** | **30** | | | **100x** |
|
|
57
|
+
|
|
58
|
+
### With model router (budget mode, Sonnet orchestrator)
|
|
59
|
+
|
|
60
|
+
| Task | Count | Tier | Cost ratio | Total |
|
|
61
|
+
|------|-------|------|-----------|-------|
|
|
62
|
+
| Exploration | 18 | @fast | 1x | 18x |
|
|
63
|
+
| Implementation (simple) | 7 | @fast | 1x | 7x |
|
|
64
|
+
| Implementation (complex) | 3 | @medium | 5x | 15x |
|
|
65
|
+
| Architecture | 2 | @medium | 5x | 10x |
|
|
66
|
+
| **Total** | **30** | | | **50x** |
|
|
67
|
+
|
|
68
|
+
### Summary
|
|
69
|
+
|
|
70
|
+
| Setup | Session cost | vs all-Opus |
|
|
71
|
+
|-------|-------------|-------------|
|
|
72
|
+
| All-Opus (no router) | 600x | baseline |
|
|
73
|
+
| Sonnet orchestrator + router (normal) | 100x | **−83%** |
|
|
74
|
+
| Sonnet orchestrator + router (budget) | 50x | **−92%** |
|
|
75
|
+
|
|
76
|
+
> Cost ratios are relative units. Actual savings depend on your provider pricing and model selection.
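The totals in the tables above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the 1x/5x/20x ratios from the default preset; the `TaskGroup` type and the helper names are hypothetical and not part of the plugin.

```typescript
// Relative cost units per delegated task, taken from the README tables.
type Tier = "fast" | "medium" | "heavy" | "self";

const COST_RATIO: Record<Tier, number> = {
  fast: 1,
  medium: 5,
  heavy: 20,
  self: 0, // handled directly by the orchestrator; the tables count this as 0x
};

// Hypothetical shape for one row of a cost table.
interface TaskGroup {
  label: string;
  count: number;
  tier: Tier;
}

// Session cost in relative units: sum of count × cost ratio.
function sessionCost(groups: TaskGroup[]): number {
  return groups.reduce((sum, g) => sum + g.count * COST_RATIO[g.tier], 0);
}

// Savings relative to a baseline, rounded to a whole percentage.
function savingsPercent(cost: number, baseline: number): number {
  return Math.round((1 - cost / baseline) * 100);
}

const allOpus: TaskGroup[] = [
  { label: "Exploration", count: 18, tier: "heavy" },
  { label: "Implementation", count: 10, tier: "heavy" },
  { label: "Architecture", count: 2, tier: "heavy" },
];

const normalMode: TaskGroup[] = [
  { label: "Exploration (delegated)", count: 10, tier: "fast" },
  { label: "Exploration (direct, trivial)", count: 8, tier: "self" },
  { label: "Implementation", count: 10, tier: "medium" },
  { label: "Architecture", count: 2, tier: "heavy" },
];

const budgetMode: TaskGroup[] = [
  { label: "Exploration", count: 18, tier: "fast" },
  { label: "Implementation (simple)", count: 7, tier: "fast" },
  { label: "Implementation (complex)", count: 3, tier: "medium" },
  { label: "Architecture", count: 2, tier: "medium" },
];

const baseline = sessionCost(allOpus);   // 600
const normal = sessionCost(normalMode);  // 100
const budget = sessionCost(budgetMode);  // 50
const normalSavings = savingsPercent(normal, baseline); // 83
const budgetSavings = savingsPercent(budget, baseline); // 92
```

Swapping in your own provider's ratios in `COST_RATIO` shows how sensitive the savings are to the price gap between tiers.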
|
|
77
|
+
|
|
78
|
+
## How it works
|
|
79
|
+
|
|
80
|
+
On every message, the plugin injects ~210 tokens into the system prompt:
|
|
34
81
|
|
|
35
|
-
```json
|
|
36
|
-
{
|
|
37
|
-
"plugin": [
|
|
38
|
-
"opencode-model-router@latest"
|
|
39
|
-
]
|
|
40
|
-
}
|
|
41
82
|
```
|
|
83
|
+
## Model Delegation Protocol
|
|
84
|
+
Preset: anthropic. Tiers: @fast=claude-haiku-4-5(1x) @medium=claude-sonnet-4-5/max(5x) @heavy=claude-opus-4-6/max(20x). mode:normal
|
|
85
|
+
R: @fast→search/grep/read/git-info/ls/lookup-docs/types/count/exists-check/rename @medium→impl-feature/refactor/write-tests/bugfix(≤2)/edit-logic/code-review/build-fix/create-file/db-migrate/api-endpoint/config-update @heavy→arch-design/debug(≥3fail)/sec-audit/perf-opt/migrate-strategy/multi-system-integration/tradeoff-analysis/rca
|
|
86
|
+
Multi-phase: split explore(@fast)→execute(@medium). Cheapest-first.
|
|
87
|
+
1.[tier:X]→delegate X 2.plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy 3.default:impl→@medium | readonly→@fast 4.orchestrate=self,delegate=exec 5.trivial(≤2tools)→direct,skip-delegate 6.self∈opus→never→@heavy,do-it-yourself 7.consult route-guide↑ 8.min(cost,adequate-tier)
|
|
88
|
+
Err→retry-alt-tier→fail→direct. Chain: anthropic→openai→google→github-copilot
|
|
89
|
+
Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").
|
|
90
|
+
Keep orchestration and final synthesis in the primary agent.
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
The orchestrator reads this once per message and applies it to every decision in that turn.
|
|
94
|
+
|
|
95
|
+
### Multi-phase decomposition (key differentiator)
|
|
96
|
+
|
|
97
|
+
The most impactful optimization. A composite task like:
|
|
98
|
+
|
|
99
|
+
> "Find how the auth middleware works and refactor it to use JWT."
|
|
100
|
+
|
|
101
|
+
Without multi-phase decomposition → routed entirely to `@medium` (5x for all ~8K tokens)
|
|
42
102
|
|
|
43
|
-
|
|
103
|
+
With router → split:
|
|
104
|
+
- **@fast (1x)**: grep, read 4-5 files, trace call chain (~4K tokens)
|
|
105
|
+
- **@medium (5x)**: rewrite auth module (~4K tokens)
|
|
44
106
|
|
|
107
|
+
**Result: ~36% cost reduction on composite tasks**, which represent ~60-70% of real coding work.
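The comparison above can be expressed as cost-weighted tokens per phase. This is a rough sketch under stated assumptions: the `Phase` type and both helper names are hypothetical, and it models token cost only. It ignores any delegation overhead, so it is an upper bound and need not match the README's ~36% figure.

```typescript
// One phase of a composite task: which tier runs it and how many tokens it uses.
interface Phase {
  tier: string;
  costRatio: number; // relative cost per token (1 = cheapest)
  tokens: number;
}

// Cost-weighted tokens: sum of tokens × cost ratio over all phases.
function weightedCost(phases: Phase[]): number {
  return phases.reduce((sum, p) => sum + p.tokens * p.costRatio, 0);
}

// Fractional reduction of `after` relative to `before` (0.25 means 25% cheaper).
function reduction(before: number, after: number): number {
  return (before - after) / before;
}

// Whole task on @medium: 8K tokens at 5x.
const unsplit: Phase[] = [{ tier: "medium", costRatio: 5, tokens: 8000 }];

// Split: exploration on @fast (1x), rewrite on @medium (5x), 4K tokens each.
const split: Phase[] = [
  { tier: "fast", costRatio: 1, tokens: 4000 },
  { tier: "medium", costRatio: 5, tokens: 4000 },
];

const unsplitCost = weightedCost(unsplit); // 8000 × 5
const splitCost = weightedCost(split);     // 4000 × 1 + 4000 × 5
const saved = reduction(unsplitCost, splitCost);
```

Adding a third phase entry for handoff or summary tokens is one way to model overhead if you want the estimate to be more conservative.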
|
|
108
|
+
|
|
109
|
+
## Why not just use another orchestrator?
|
|
110
|
+
|
|
111
|
+
| Feature | model-router | Claude native | oh-my-opencode | GSD | ralph-loop |
|
|
112
|
+
|---------|:---:|:---:|:---:|:---:|:---:|
|
|
113
|
+
| Multi-tier cost routing | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
114
|
+
| Configurable task taxonomy | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
115
|
+
| Budget / quality modes | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
116
|
+
| Multi-phase decomposition | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
117
|
+
| Cross-provider fallback | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
118
|
+
| Cost ratio awareness | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
119
|
+
| Plan annotation with tiers | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
120
|
+
| ~210 token overhead | ✅ | — | ❌ | ❌ | ❌ |
|
|
121
|
+
|
|
122
|
+
**Claude native**: single model for everything, no cost routing. If you're using claude.ai or OpenCode without plugins, you're paying the same price for `grep` as for architecture design.
|
|
123
|
+
|
|
124
|
+
**oh-my-opencode**: focused on workflow personality and prompt style, not cost optimization. No tier routing, no task taxonomy.
|
|
125
|
+
|
|
126
|
+
**GSD (Get Shit Done)**: prioritizes execution speed and low deliberation overhead. Excellent at pushing through tasks fast, but uses one model — no cost differentiation between search and architecture.
|
|
127
|
+
|
|
128
|
+
**ralph-loop**: iterative feedback-loop orchestrator. Excellent at self-correction and quality verification. No tier routing — every loop iteration runs on the same model regardless of task complexity.
|
|
129
|
+
|
|
130
|
+
**The core difference**: the others optimize for *how* the agent works (style, speed, quality loops). model-router optimizes for *what it costs* — with zero compromise on quality, because you can always put Opus in the heavy tier.
|
|
131
|
+
|
|
132
|
+
## Recommended setup
|
|
133
|
+
|
|
134
|
+
**Orchestrator**: use `claude-sonnet-4-5` (or equivalent mid-tier) as your primary/default model. Not Opus.
|
|
135
|
+
|
|
136
|
+
Why: the orchestrator runs on every message, including trivial ones. Sonnet can read the delegation protocol and make routing decisions just as well as Opus. You reserve Opus for when it's genuinely needed — via `@heavy` delegation.
|
|
137
|
+
|
|
138
|
+
In your `opencode.json`:
|
|
45
139
|
```json
|
|
46
140
|
{
|
|
47
|
-
"
|
|
48
|
-
|
|
49
|
-
]
|
|
141
|
+
"model": "anthropic/claude-sonnet-4-5",
|
|
142
|
+
"autoshare": false
|
|
50
143
|
}
|
|
51
144
|
```
|
|
52
145
|
|
|
53
|
-
|
|
146
|
+
Then install and configure model-router to handle the rest.
|
|
54
147
|
|
|
55
|
-
|
|
148
|
+
## Installation
|
|
56
149
|
|
|
150
|
+
### From npm (recommended)
|
|
57
151
|
```bash
|
|
58
|
-
|
|
59
|
-
|
|
152
|
+
# Install globally (drop -g to install into the current project instead)
|
|
153
|
+
npm install -g opencode-model-router
|
|
60
154
|
```
|
|
61
155
|
|
|
62
|
-
|
|
63
|
-
|
|
156
|
+
Add to `~/.config/opencode/opencode.json`:
|
|
64
157
|
```json
|
|
65
158
|
{
|
|
66
|
-
"plugin":
|
|
67
|
-
"
|
|
68
|
-
|
|
159
|
+
"plugin": {
|
|
160
|
+
"opencode-model-router": {
|
|
161
|
+
"type": "npm",
|
|
162
|
+
"package": "opencode-model-router"
|
|
163
|
+
}
|
|
164
|
+
}
|
|
69
165
|
}
|
|
70
166
|
```
|
|
71
167
|
|
|
72
|
-
###
|
|
73
|
-
|
|
74
|
-
Clone wherever you want:
|
|
75
|
-
|
|
168
|
+
### Local clone
|
|
76
169
|
```bash
|
|
77
|
-
git clone https://github.com/
|
|
170
|
+
git clone https://github.com/your-username/opencode-model-router
|
|
171
|
+
cd opencode-model-router
|
|
172
|
+
npm install
|
|
78
173
|
```
|
|
79
174
|
|
|
80
|
-
|
|
81
|
-
|
|
175
|
+
In `~/.config/opencode/opencode.json`:
|
|
82
176
|
```json
|
|
83
177
|
{
|
|
84
|
-
"plugin":
|
|
85
|
-
"
|
|
86
|
-
|
|
178
|
+
"plugin": {
|
|
179
|
+
"opencode-model-router": {
|
|
180
|
+
"type": "local",
|
|
181
|
+
"path": "/absolute/path/to/opencode-model-router"
|
|
182
|
+
}
|
|
183
|
+
}
|
|
87
184
|
}
|
|
88
185
|
```
|
|
89
186
|
|
|
90
|
-
Restart OpenCode after adding the plugin.
|
|
91
|
-
|
|
92
187
|
## Configuration
|
|
93
188
|
|
|
94
|
-
All configuration lives in `tiers.json` at the plugin root.
|
|
189
|
+
All configuration lives in `tiers.json` at the plugin root.
|
|
95
190
|
|
|
96
191
|
### Presets
|
|
97
192
|
|
|
98
|
-
The plugin ships with four presets:
|
|
193
|
+
The plugin ships with four presets (switch with `/preset <name>`):
|
|
99
194
|
|
|
100
195
|
**anthropic** (default):
|
|
101
|
-
| Tier | Model |
|
|
102
|
-
|
|
103
|
-
| fast | `anthropic/claude-haiku-4-5` |
|
|
104
|
-
| medium | `anthropic/claude-sonnet-4-5`
|
|
105
|
-
| heavy | `anthropic/claude-opus-4-6`
|
|
196
|
+
| Tier | Model | Cost ratio |
|
|
197
|
+
|------|-------|-----------|
|
|
198
|
+
| @fast | `anthropic/claude-haiku-4-5` | 1x |
|
|
199
|
+
| @medium | `anthropic/claude-sonnet-4-5` (max) | 5x |
|
|
200
|
+
| @heavy | `anthropic/claude-opus-4-6` (max) | 20x |
|
|
106
201
|
|
|
107
202
|
**openai**:
|
|
108
|
-
| Tier | Model |
|
|
109
|
-
|
|
110
|
-
| fast | `openai/gpt-5.3-codex-spark` |
|
|
111
|
-
| medium | `openai/gpt-5.3-codex` |
|
|
112
|
-
| heavy | `openai/gpt-5.3-codex` |
|
|
203
|
+
| Tier | Model | Cost ratio |
|
|
204
|
+
|------|-------|-----------|
|
|
205
|
+
| @fast | `openai/gpt-5.3-codex-spark` | 1x |
|
|
206
|
+
| @medium | `openai/gpt-5.3-codex` | 5x |
|
|
207
|
+
| @heavy | `openai/gpt-5.3-codex` (xhigh) | 20x |
|
|
113
208
|
|
|
114
209
|
**github-copilot**:
|
|
115
|
-
| Tier | Model |
|
|
116
|
-
|
|
117
|
-
| fast | `github-copilot/claude-haiku-4-5` |
|
|
118
|
-
| medium | `github-copilot/claude-sonnet-4-5` |
|
|
119
|
-
| heavy | `github-copilot/claude-opus-4-6` |
|
|
210
|
+
| Tier | Model | Cost ratio |
|
|
211
|
+
|------|-------|-----------|
|
|
212
|
+
| @fast | `github-copilot/claude-haiku-4-5` | 1x |
|
|
213
|
+
| @medium | `github-copilot/claude-sonnet-4-5` | 5x |
|
|
214
|
+
| @heavy | `github-copilot/claude-opus-4-6` (thinking) | 20x |
|
|
120
215
|
|
|
121
216
|
**google**:
|
|
122
|
-
| Tier | Model |
|
|
123
|
-
|
|
124
|
-
| fast | `google/gemini-2.5-flash` |
|
|
125
|
-
| medium | `google/gemini-2.5-pro` |
|
|
126
|
-
| heavy | `google/gemini-3-pro-preview` |
|
|
217
|
+
| Tier | Model | Cost ratio |
|
|
218
|
+
|------|-------|-----------|
|
|
219
|
+
| @fast | `google/gemini-2.5-flash` | 1x |
|
|
220
|
+
| @medium | `google/gemini-2.5-pro` | 5x |
|
|
221
|
+
| @heavy | `google/gemini-3-pro-preview` | 20x |
|
|
127
222
|
|
|
128
|
-
|
|
223
|
+
### Routing modes
|
|
129
224
|
|
|
130
|
-
|
|
131
|
-
|
|
225
|
+
Switch with `/budget <mode>`. Mode is persisted across restarts.
|
|
226
|
+
|
|
227
|
+
| Mode | Default tier | Behavior |
|
|
228
|
+
|------|-------------|----------|
|
|
229
|
+
| `normal` | @medium | Balanced — routes by task complexity |
|
|
230
|
+
| `budget` | @fast | Aggressive savings — defaults cheap, escalates only when necessary |
|
|
231
|
+
| `quality` | @medium | Quality-first — liberal use of @medium/@heavy |
|
|
232
|
+
|
|
233
|
+
```json
|
|
234
|
+
{
|
|
235
|
+
"modes": {
|
|
236
|
+
"budget": {
|
|
237
|
+
"defaultTier": "fast",
|
|
238
|
+
"description": "Aggressive cost savings",
|
|
239
|
+
"overrideRules": [
|
|
240
|
+
"Default ALL tasks to @fast unless they clearly require code edits",
|
|
241
|
+
"Use @medium ONLY for: multi-file edits, complex refactors, test suites",
|
|
242
|
+
"Use @heavy ONLY when explicitly requested or after 2+ failed @medium attempts"
|
|
243
|
+
]
|
|
244
|
+
}
|
|
245
|
+
}
|
|
246
|
+
}
|
|
132
247
|
```
|
|
133
248
|
|
|
134
|
-
###
|
|
249
|
+
### Task taxonomy (`taskPatterns`)
|
|
135
250
|
|
|
136
|
-
|
|
251
|
+
Keyword routing guide injected into the system prompt. Customize to match your workflow:
|
|
137
252
|
|
|
138
253
|
```json
|
|
139
254
|
{
|
|
140
|
-
"
|
|
141
|
-
"
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
"description": "What this tier does",
|
|
145
|
-
"steps": 30,
|
|
146
|
-
"prompt": "System prompt for the subagent",
|
|
147
|
-
"whenToUse": ["Use case 1", "Use case 2"]
|
|
148
|
-
},
|
|
149
|
-
"medium": { ... },
|
|
150
|
-
"heavy": { ... }
|
|
151
|
-
}
|
|
255
|
+
"taskPatterns": {
|
|
256
|
+
"fast": ["search/grep/read", "git-info/ls", "lookup-docs/types", "count/exists-check/rename"],
|
|
257
|
+
"medium": ["impl-feature/refactor", "write-tests/bugfix(≤2)", "build-fix/create-file"],
|
|
258
|
+
"heavy": ["arch-design/debug(≥3fail)", "sec-audit/perf-opt", "migrate-strategy/rca"]
|
|
152
259
|
}
|
|
153
260
|
}
|
|
154
261
|
```
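To see how a `taskPatterns` object turns into the compact `R:` line in the system prompt, here is a self-contained sketch that mirrors the `buildTaskTaxonomy` logic shown in the `src/index.ts` part of this diff. The standalone function name and the `TaskPatterns` type are assumptions for illustration; the real function takes the full router config.

```typescript
// Hypothetical standalone type: tier name → list of pattern keywords.
type TaskPatterns = Record<string, string[]>;

// Render the compact routing guide: "R: @tier→a/b/c @tier2→d/e".
function renderTaxonomy(taskPatterns: TaskPatterns): string {
  const tierNames = Object.keys(taskPatterns);
  if (tierNames.length === 0) return "";

  const parts: string[] = ["R:"];
  for (const tier of tierNames) {
    const patterns = taskPatterns[tier];
    // Tiers with no patterns are skipped rather than rendered empty.
    if (Array.isArray(patterns) && patterns.length > 0) {
      parts.push(`@${tier}→${patterns.join("/")}`);
    }
  }
  return parts.join(" ");
}

const example = renderTaxonomy({
  fast: ["search", "grep", "read"],
  medium: ["impl-feature", "refactor"],
  heavy: [],
});
// example === "R: @fast→search/grep/read @medium→impl-feature/refactor"
```

Because patterns are joined with `/`, a pattern that already contains a slash (such as `lookup-docs/types`) is indistinguishable from two separate patterns in the rendered prompt.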
|
|
155
262
|
|
|
156
|
-
|
|
263
|
+
### Cost ratios
|
|
157
264
|
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
265
|
+
Set `costRatio` on each tier to reflect your real provider pricing. These are injected into the system prompt so the orchestrator makes cost-aware decisions:
|
|
266
|
+
|
|
267
|
+
```json
|
|
268
|
+
{
|
|
269
|
+
"fast": { "costRatio": 1 },
|
|
270
|
+
"medium": { "costRatio": 5 },
|
|
271
|
+
"heavy": { "costRatio": 20 }
|
|
272
|
+
}
|
|
273
|
+
```
|
|
274
|
+
|
|
275
|
+
Adjust to actual prices. Exact values don't matter — directional signals are enough.
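The way `costRatio` surfaces in the prompt can be sketched as follows. This mirrors the tier-summary logic in the `src/index.ts` part of this diff (`@name=model/variant(costRatio)`); the `TierConfig` interface here is a trimmed, hypothetical version containing only the fields the summary uses.

```typescript
// Trimmed, hypothetical tier shape: only the fields used by the summary line.
interface TierConfig {
  model: string;      // full model ID, "provider/model-name"
  variant?: string;   // e.g. "max", "xhigh", "thinking"
  costRatio?: number; // relative cost, 1 = cheapest
}

// Render "@name=model/variant(Nx)" for each tier, separated by spaces.
function renderTierLine(tiers: Record<string, TierConfig>): string {
  const parts: string[] = [];
  for (const name of Object.keys(tiers)) {
    const t = tiers[name];
    if (!t) continue;
    // Drop the provider prefix: "anthropic/claude-haiku-4-5" → "claude-haiku-4-5".
    const short = t.model.split("/").pop() || t.model;
    const variant = t.variant ? `/${t.variant}` : "";
    const cost = t.costRatio != null ? `(${t.costRatio}x)` : "";
    parts.push(`@${name}=${short}${variant}${cost}`);
  }
  return parts.join(" ");
}

const tierLine = renderTierLine({
  fast: { model: "anthropic/claude-haiku-4-5", costRatio: 1 },
  medium: { model: "anthropic/claude-sonnet-4-5", variant: "max", costRatio: 5 },
  heavy: { model: "anthropic/claude-opus-4-6", variant: "max", costRatio: 20 },
});
// tierLine === "@fast=claude-haiku-4-5(1x) @medium=claude-sonnet-4-5/max(5x) @heavy=claude-opus-4-6/max(20x)"
```

A tier without `costRatio` simply renders without the parenthesized suffix, so the orchestrator gets no cost signal for it.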
|
|
169
276
|
|
|
170
277
|
### Rules
|
|
171
278
|
|
|
172
|
-
The `rules` array
|
|
279
|
+
The `rules` array is injected verbatim (in compact form) into the system prompt. Default ruleset:
|
|
173
280
|
|
|
174
281
|
```json
|
|
175
282
|
{
|
|
176
283
|
"rules": [
|
|
177
|
-
"
|
|
178
|
-
"
|
|
179
|
-
"
|
|
180
|
-
"
|
|
181
|
-
"
|
|
182
|
-
"
|
|
284
|
+
"[tier:X]→delegate X",
|
|
285
|
+
"plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy",
|
|
286
|
+
"default:impl→@medium | readonly→@fast",
|
|
287
|
+
"orchestrate=self,delegate=exec",
|
|
288
|
+
"trivial(≤2tools)→direct,skip-delegate",
|
|
289
|
+
"self∈opus→never→@heavy,do-it-yourself",
|
|
290
|
+
"consult route-guide↑",
|
|
291
|
+
"min(cost,adequate-tier)"
|
|
183
292
|
]
|
|
184
293
|
}
|
|
185
294
|
```
|
|
186
295
|
|
|
296
|
+
Rules in `modes[x].overrideRules` replace this array entirely for that mode.
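The override behavior can be sketched in a few lines. This follows the `effectiveRules` and `rulesLine` expressions in the `src/index.ts` part of this diff; the `ModeConfig` interface and the two function names are hypothetical, standalone stand-ins.

```typescript
// Hypothetical, trimmed mode shape.
interface ModeConfig {
  defaultTier: string;
  description?: string;
  overrideRules?: string[];
}

// A mode with a non-empty overrideRules array replaces the global rules entirely.
function effectiveRules(globalRules: string[], mode?: ModeConfig): string[] {
  return mode?.overrideRules?.length ? mode.overrideRules : globalRules;
}

// Compact numbering used in the prompt: "1.rule 2.rule 3.rule".
function renderRules(rules: string[]): string {
  return rules.map((r, i) => `${i + 1}.${r}`).join(" ");
}

const globalRules = ["[tier:X]→delegate X", "min(cost,adequate-tier)"];
const normal: ModeConfig = { defaultTier: "medium" };
const budget: ModeConfig = {
  defaultTier: "fast",
  overrideRules: ["default→@fast unless edits/complex-reasoning needed"],
};

const normalLine = renderRules(effectiveRules(globalRules, normal));
const budgetLine = renderRules(effectiveRules(globalRules, budget));
// normalLine === "1.[tier:X]→delegate X 2.min(cost,adequate-tier)"
// budgetLine === "1.default→@fast unless edits/complex-reasoning needed"
```

Since the replacement is all-or-nothing, a custom mode that should keep some global rules has to repeat them in its own `overrideRules`.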
|
|
297
|
+
|
|
298
|
+
### Tier fields reference
|
|
299
|
+
|
|
300
|
+
| Field | Type | Description |
|
|
301
|
+
|-------|------|-------------|
|
|
302
|
+
| `model` | string | Full model ID (`provider/model-name`) |
|
|
303
|
+
| `variant` | string | Optional variant (`"max"`, `"xhigh"`, `"thinking"`) |
|
|
304
|
+
| `costRatio` | number | Relative cost (1 = cheapest). Shown in prompt. |
|
|
305
|
+
| `thinking` | object | Anthropic thinking: `{ "budgetTokens": 10000 }` |
|
|
306
|
+
| `reasoning` | object | OpenAI reasoning: `{ "effort": "high", "summary": "detailed" }` |
|
|
307
|
+
| `description` | string | Shown in `/tiers` output |
|
|
308
|
+
| `steps` | number | Max agent turns |
|
|
309
|
+
| `prompt` | string | Subagent system prompt |
|
|
310
|
+
| `whenToUse` | string[] | Use cases (shown in `/tiers`, not in system prompt) |
|
|
311
|
+
|
|
312
|
+
### Fallback
|
|
313
|
+
|
|
314
|
+
Defines provider fallback order when a delegated task fails:
|
|
315
|
+
|
|
316
|
+
```json
|
|
317
|
+
{
|
|
318
|
+
"fallback": {
|
|
319
|
+
"global": {
|
|
320
|
+
"anthropic": ["openai", "google", "github-copilot"],
|
|
321
|
+
"openai": ["anthropic", "google", "github-copilot"]
|
|
322
|
+
}
|
|
323
|
+
}
|
|
324
|
+
}
|
|
325
|
+
```
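How this map becomes the single `Err→…` line in the prompt can be sketched as below. The logic follows `buildFallbackInstructions` in the `src/index.ts` part of this diff, but the standalone signature (a map, the active preset name, and a list of known preset names) is an assumption made to keep the example self-contained.

```typescript
// Hypothetical standalone type: provider → ordered list of presets to try next.
type FallbackMap = Record<string, string[]>;

// Build the compact fallback line, skipping the active preset and unknown presets.
function renderFallback(
  map: FallbackMap,
  activePreset: string,
  knownPresets: string[],
): string {
  const chains: string[] = [];
  for (const provider of Object.keys(map)) {
    const order = map[provider];
    if (!Array.isArray(order)) continue;
    const valid = order.filter(
      (p) => p !== activePreset && knownPresets.indexOf(p) !== -1,
    );
    if (valid.length > 0) {
      chains.push(`${provider}→${valid.join("→")}`);
    }
  }
  if (chains.length === 0) return "";
  return `Err→retry-alt-tier→fail→direct. Chain: ${chains.join(" | ")}`;
}

const fallbackLine = renderFallback(
  {
    anthropic: ["openai", "google", "github-copilot"],
    openai: ["anthropic", "google", "github-copilot"],
  },
  "anthropic",
  ["anthropic", "openai", "google", "github-copilot"],
);
// The active preset ("anthropic") is dropped from the openai chain.
```

Presets named in the map but missing from the configuration are silently filtered out, so a typo in a provider name shortens the chain rather than raising an error.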
|
|
326
|
+
|
|
187
327
|
## Commands
|
|
188
328
|
|
|
189
329
|
| Command | Description |
|
|
190
330
|
|---------|-------------|
|
|
191
|
-
| `/tiers` | Show active tier configuration and
|
|
331
|
+
| `/tiers` | Show active tier configuration, models, and rules |
|
|
192
332
|
| `/preset` | List available presets |
|
|
193
|
-
| `/preset <name>` | Switch
|
|
333
|
+
| `/preset <name>` | Switch preset (e.g., `/preset openai`) |
|
|
334
|
+
| `/budget` | Show available modes and which is active |
|
|
335
|
+
| `/budget <mode>` | Switch routing mode (`normal`, `budget`, `quality`) |
|
|
194
336
|
| `/annotate-plan [path]` | Annotate a plan file with `[tier:X]` tags for each step |
|
|
195
337
|
|
|
196
338
|
## Plan annotation
|
|
197
339
|
|
|
198
|
-
The `/annotate-plan` command reads
|
|
340
|
+
For complex tasks, you can write a plan file and annotate each step with the correct tier. The `/annotate-plan` command reads the plan and adds `[tier:fast]`, `[tier:medium]`, or `[tier:heavy]` tags to each step based on the task taxonomy.
|
|
199
341
|
|
|
200
|
-
|
|
201
|
-
```markdown
|
|
202
|
-
## Steps
|
|
203
|
-
1. Search the codebase for all authentication handlers
|
|
204
|
-
2. Implement the new OAuth2 flow
|
|
205
|
-
3. Review the auth architecture for security vulnerabilities
|
|
206
|
-
```
|
|
342
|
+
The orchestrator then reads these tags and delegates accordingly — removing ambiguity from routing decisions on long, multi-step tasks.
|
|
207
343
|
|
|
208
|
-
|
|
344
|
+
Example plan (before annotation):
|
|
209
345
|
```markdown
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
```
|
|
215
|
-
|
|
216
|
-
When the agent executes an annotated plan, it delegates each step to the appropriate subagent automatically.
|
|
217
|
-
|
|
218
|
-
## Provider setup
|
|
219
|
-
|
|
220
|
-
The models referenced in your preset must be configured in your `opencode.json` providers. For example, to use the default `anthropic` preset:
|
|
221
|
-
|
|
222
|
-
```json
|
|
223
|
-
{
|
|
224
|
-
"provider": {
|
|
225
|
-
"anthropic": {
|
|
226
|
-
"models": {
|
|
227
|
-
"claude-haiku-4-5": {},
|
|
228
|
-
"claude-sonnet-4-5": {},
|
|
229
|
-
"claude-opus-4-6": { "thinking": true }
|
|
230
|
-
}
|
|
231
|
-
}
|
|
232
|
-
}
|
|
233
|
-
}
|
|
346
|
+
1. Find all API endpoints in the codebase
|
|
347
|
+
2. Add rate limiting middleware to each endpoint
|
|
348
|
+
3. Write integration tests for rate limiting
|
|
349
|
+
4. Design a token bucket algorithm for advanced rate limiting
|
|
234
350
|
```
|
|
235
351
|
|
|
236
|
-
|
|
237
|
-
|
|
352
|
+
After `/annotate-plan`:
|
|
353
|
+
```markdown
|
|
354
|
+
1. [tier:fast] Find all API endpoints in the codebase
|
|
355
|
+
2. [tier:medium] Add rate limiting middleware to each endpoint
|
|
356
|
+
3. [tier:medium] Write integration tests for rate limiting
|
|
357
|
+
4. [tier:heavy] Design a token bucket algorithm for advanced rate limiting
|
|
238
358
|
```
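For tooling that wants to read an annotated plan outside the orchestrator, the `[tier:X]` tags are easy to parse. The sketch below is hypothetical and not part of the plugin: the `PlanStep` type, the regular expression, and `parsePlan` are all assumptions based only on the numbered-list format shown above.

```typescript
type PlanTier = "fast" | "medium" | "heavy";

// Hypothetical parsed representation of one numbered plan step.
interface PlanStep {
  index: number;
  tier: PlanTier | null; // null when the step has no [tier:X] tag yet
  text: string;
}

// Matches "N. [tier:X] text" and also untagged "N. text".
const STEP_RE = /^\s*(\d+)\.\s*(?:\[tier:(fast|medium|heavy)\]\s*)?(.*)$/;

function parsePlan(markdown: string): PlanStep[] {
  const steps: PlanStep[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const m = STEP_RE.exec(line);
    if (!m) continue; // not a numbered step
    steps.push({
      index: Number(m[1]),
      tier: (m[2] as PlanTier | undefined) ?? null,
      text: (m[3] ?? "").trim(),
    });
  }
  return steps;
}

const parsed = parsePlan(
  [
    "1. [tier:fast] Find all API endpoints in the codebase",
    "2. [tier:medium] Add rate limiting middleware to each endpoint",
    "3. Write integration tests for rate limiting",
  ].join("\n"),
);
// parsed[0].tier is "fast"; parsed[2].tier is null because the step is untagged.
```

Steps that come back with `tier: null` are the ones still left to the orchestrator's own judgment.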
|
|
239
|
-
You: Implement the user settings page based on PLAN.md
|
|
240
359
|
|
|
241
|
-
|
|
242
|
-
-> Task(@fast): "Find all existing settings-related components and their patterns"
|
|
243
|
-
<- @fast returns: Found SettingsLayout at src/components/..., uses React Hook Form...
|
|
360
|
+
## Token overhead
|
|
244
361
|
|
|
245
|
-
|
|
246
|
-
-> Task(@medium): "Implement UserSettingsForm component following the patterns from..."
|
|
247
|
-
<- @medium returns: Created src/components/UserSettingsForm.tsx, added tests...
|
|
362
|
+
The system prompt injection is ~210 tokens per message, roughly the same as v1.0.7 (~208 tokens, before cost-aware features were added). Dense notation keeps overhead flat while adding full routing intelligence.
|
|
248
363
|
|
|
249
|
-
|
|
250
|
-
|
|
251
|
-
|
|
364
|
+
| Version | Tokens | Features |
|
|
365
|
+
|---------|--------|----------|
|
|
366
|
+
| v1.0.7 | ~208 | Basic tier routing |
|
|
367
|
+
| v1.1.0 | ~870 | All features, verbose format |
|
|
368
|
+
| v1.1.1+ | ~210 | All features, compressed format |
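The 1.1.0 README (among the lines removed in this diff) gave the estimation method as `prompt_characters / 4`, a rough heuristic. A minimal sketch of that estimate follows. The function names are hypothetical, rounding up is an assumption, and a real tokenizer will give different counts.

```typescript
// Rough token estimate: characters divided by four, rounded up (rounding is an assumption).
function estimateTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

// Total injected overhead for a session: per-message tokens × number of messages.
function sessionOverhead(tokensPerMessage: number, messages: number): number {
  return tokensPerMessage * messages;
}

// For the 50-message session used in the cost simulation:
const compressed = sessionOverhead(210, 50); // v1.1.1+ format
const verbose = sessionOverhead(870, 50);    // v1.1.0 format
```

Running `estimateTokens` over the string your own configuration produces is a quick way to check whether custom rules or patterns have inflated the per-message overhead.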
|
|
252
369
|
|
|
253
370
|
## Requirements
|
|
254
371
|
|
|
255
|
-
- OpenCode v1.0
|
|
256
|
-
-
|
|
372
|
+
- [OpenCode](https://opencode.ai) v1.0 or later
|
|
373
|
+
- Node.js 18+
|
|
374
|
+
- Provider API keys configured in OpenCode
|
|
257
375
|
|
|
258
376
|
## License
|
|
259
377
|
|
|
260
|
-
|
|
378
|
+
GPL-3.0
|
package/package.json
CHANGED
package/src/index.ts
CHANGED
|
@@ -310,23 +310,16 @@ function buildFallbackInstructions(cfg: RouterConfig): string {
|
|
|
310
310
|
const map = presetMap && Object.keys(presetMap).length > 0 ? presetMap : fb.global;
|
|
311
311
|
if (!map) return "";
|
|
312
312
|
|
|
313
|
-
const
|
|
313
|
+
const chains = Object.entries(map).flatMap(([provider, presetOrder]) => {
|
|
314
314
|
if (!Array.isArray(presetOrder)) return [];
|
|
315
|
-
const
|
|
316
|
-
(
|
|
315
|
+
const valid = presetOrder.filter(
|
|
316
|
+
(p) => p !== cfg.activePreset && Boolean(cfg.presets[p]),
|
|
317
317
|
);
|
|
318
|
-
return
|
|
318
|
+
return valid.length > 0 ? [`${provider}→${valid.join("→")}`] : [];
|
|
319
319
|
});
|
|
320
320
|
|
|
321
|
-
if (
|
|
322
|
-
|
|
323
|
-
return [
|
|
324
|
-
"Fallback on delegated task errors:",
|
|
325
|
-
"1. If Task(...) returns provider/model/rate-limit/timeout/auth errors, retry once with a different tier suited to the same task.",
|
|
326
|
-
"2. If retry also fails, stop delegating that task and complete it directly in the primary agent.",
|
|
327
|
-
"3. Use the failing model prefix and this preset fallback order for next-run recovery (`/preset <name>` + restart):",
|
|
328
|
-
...providerLines,
|
|
329
|
-
].join("\n");
|
|
321
|
+
if (chains.length === 0) return "";
|
|
322
|
+
return `Err→retry-alt-tier→fail→direct. Chain: ${chains.join(" | ")}`;
|
|
330
323
|
}
|
|
331
324
|
|
|
332
325
|
// ---------------------------------------------------------------------------
|
|
@@ -335,25 +328,39 @@ function buildFallbackInstructions(cfg: RouterConfig): string {
|
|
|
335
328
|
|
|
336
329
|
function buildTaskTaxonomy(cfg: RouterConfig): string {
|
|
337
330
|
if (!cfg.taskPatterns || Object.keys(cfg.taskPatterns).length === 0) return "";
|
|
338
|
-
|
|
339
|
-
const lines = ["Coding task routing guide:"];
|
|
331
|
+
const lines = ["R:"];
|
|
340
332
|
for (const [tier, patterns] of Object.entries(cfg.taskPatterns)) {
|
|
341
333
|
if (Array.isArray(patterns) && patterns.length > 0) {
|
|
342
|
-
lines.push(
|
|
334
|
+
lines.push(`@${tier}→${patterns.join("/")}`);
|
|
343
335
|
}
|
|
344
336
|
}
|
|
345
|
-
return lines.join("
|
|
337
|
+
return lines.join(" ");
|
|
346
338
|
}
|
|
347
339
|
|
|
348
|
-
|
|
349
|
-
|
|
350
|
-
|
|
351
|
-
|
|
352
|
-
|
|
353
|
-
|
|
340
|
+
/**
|
|
341
|
+
* Injects a multi-phase decomposition hint into the delegation protocol.
|
|
342
|
+
* Teaches the orchestrator to split composite tasks (explore + implement)
|
|
343
|
+
* so the cheap @fast tier handles exploration and @medium handles execution.
|
|
344
|
+
* Only active when the current mode defines no overrideRules (normal mode by default); budget/quality modes supply their own rules.
|
|
345
|
+
*/
|
|
346
|
+
function buildDecomposeHint(cfg: RouterConfig): string {
|
|
347
|
+
const mode = getActiveMode(cfg);
|
|
348
|
+
// Budget and quality modes handle this via overrideRules — skip to avoid conflicts
|
|
349
|
+
if (mode?.overrideRules?.length) return "";
|
|
354
350
|
|
|
355
|
-
|
|
356
|
-
|
|
351
|
+
const tiers = getActiveTiers(cfg);
|
|
352
|
+
const entries = Object.entries(tiers);
|
|
353
|
+
if (entries.length < 2) return "";
|
|
354
|
+
|
|
355
|
+
// Sort by costRatio ascending to find cheapest (explore) and next (execute) tiers
|
|
356
|
+
const sorted = [...entries].sort(
|
|
357
|
+
([, a], [, b]) => (a.costRatio ?? 1) - (b.costRatio ?? 1)
|
|
358
|
+
);
|
|
359
|
+
const cheapest = sorted[0]?.[0];
|
|
360
|
+
const mid = sorted[1]?.[0];
|
|
361
|
+
if (!cheapest || !mid) return "";
|
|
362
|
+
|
|
363
|
+
return `Multi-phase: split explore(@${cheapest})→execute(@${mid}). Cheapest-first.`;
|
|
357
364
|
}
|
|
358
365
|
|
|
359
366
|
// ---------------------------------------------------------------------------
|
|
@@ -363,51 +370,34 @@ function buildCostAwareness(cfg: RouterConfig): string {
|
|
|
363
370
|
function buildDelegationProtocol(cfg: RouterConfig): string {
|
|
364
371
|
const tiers = getActiveTiers(cfg);
|
|
365
372
|
|
|
366
|
-
|
|
373
|
+
// Compact tier summary: @name=model/variant(costRatio)
|
|
374
|
+
const tierLine = Object.entries(tiers)
|
|
367
375
|
.map(([name, t]) => {
|
|
368
|
-
const
|
|
369
|
-
const
|
|
370
|
-
|
|
376
|
+
const short = t.model.split("/").pop() ?? t.model;
|
|
377
|
+
const v = t.variant ? `/${t.variant}` : "";
|
|
378
|
+
const c = t.costRatio != null ? `(${t.costRatio}x)` : "";
|
|
379
|
+
return `@${name}=${short}${v}${c}`;
|
|
371
380
|
})
|
|
372
|
-
.join("
|
|
381
|
+
.join(" ");
|
|
373
382
|
|
|
374
|
-
|
|
375
|
-
const
|
|
376
|
-
.map(([name, t]) => {
|
|
377
|
-
const uses = t.whenToUse.length > 0 ? t.whenToUse.join(", ") : t.description;
|
|
378
|
-
return `- @${name}: ${uses}`;
|
|
379
|
-
})
|
|
380
|
-
.join("\n");
|
|
383
|
+
const mode = getActiveMode(cfg);
|
|
384
|
+
const modeSuffix = cfg.activeMode ? ` mode:${cfg.activeMode}` : "";
|
|
381
385
|
|
|
382
|
-
// Task taxonomy from config
|
|
383
386
|
const taxonomy = buildTaskTaxonomy(cfg);
|
|
387
|
+
const decompose = buildDecomposeHint(cfg);
|
|
384
388
|
|
|
385
|
-
// Cost awareness
|
|
386
|
-
const costLine = buildCostAwareness(cfg);
|
|
387
|
-
|
|
388
|
-
// Mode-aware rules: if active mode has overrideRules, use those; otherwise use global rules
|
|
389
|
-
const mode = getActiveMode(cfg);
|
|
390
389
|
const effectiveRules = mode?.overrideRules?.length ? mode.overrideRules : cfg.rules;
|
|
391
|
-
const
|
|
392
|
-
.map((rule, i) => `${i + 1}. ${rule}`)
|
|
393
|
-
.join("\n");
|
|
390
|
+
const rulesLine = effectiveRules.map((r, i) => `${i + 1}.${r}`).join(" ");
|
|
394
391
|
|
|
395
|
-
const
|
|
392
|
+
const fallback = buildFallbackInstructions(cfg);
|
|
396
393
|
|
|
397
394
|
return [
|
|
398
|
-
|
|
399
|
-
`Preset: ${cfg.activePreset}. Tiers: ${
|
|
400
|
-
|
|
401
|
-
|
|
402
|
-
|
|
403
|
-
...(
|
|
404
|
-
...(costLine ? ["", costLine] : []),
|
|
405
|
-
...(mode ? [`\nActive mode: ${cfg.activeMode} (${mode.description})`] : []),
|
|
406
|
-
"",
|
|
407
|
-
"Apply to every user message (plan and ad-hoc):",
|
|
408
|
-
numberedRules,
|
|
409
|
-
...(fallbackInstructions ? ["", fallbackInstructions] : []),
|
|
410
|
-
"",
|
|
395
|
+
`## Model Delegation Protocol`,
|
|
396
|
+
`Preset: ${cfg.activePreset}. Tiers: ${tierLine}.${modeSuffix}`,
|
|
397
|
+
...(taxonomy ? [taxonomy] : []),
|
|
398
|
+
...(decompose ? [decompose] : []),
|
|
399
|
+
rulesLine,
|
|
400
|
+
...(fallback ? [fallback] : []),
|
|
411
401
|
`Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").`,
|
|
412
402
|
"Keep orchestration and final synthesis in the primary agent.",
|
|
413
403
|
].join("\n");
|
package/tiers.json
CHANGED
|
@@ -174,39 +174,9 @@
|
|
|
174
174
|
}
|
|
175
175
|
},
|
|
176
176
|
"taskPatterns": {
|
|
177
|
-
"fast": [
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
"Read or display specific files or sections",
|
|
181
|
-
"Check git status, log, diff, or blame",
|
|
182
|
-
"Lookup documentation, API signatures, or type definitions",
|
|
183
|
-
"Count occurrences, lines, or matches",
|
|
184
|
-
"Check if a file, function, or class exists",
|
|
185
|
-
"Simple rename or string replacement across files"
|
|
186
|
-
],
|
|
187
|
-
"medium": [
|
|
188
|
-
"Implement a new feature, function, or component",
|
|
189
|
-
"Refactor or restructure existing code",
|
|
190
|
-
"Write or update tests",
|
|
191
|
-
"Fix a bug (first or second attempt)",
|
|
192
|
-
"Modify or update existing code logic",
|
|
193
|
-
"Code review with suggested changes",
|
|
194
|
-
"Run build/lint/test and fix resulting errors",
|
|
195
|
-
"Create a new file from a template or pattern",
|
|
196
|
-
"Database migration or schema changes",
|
|
197
|
-
"API endpoint implementation",
|
|
198
|
-
"Configuration or dependency updates"
|
|
199
|
-
],
|
|
200
|
-
"heavy": [
|
|
201
|
-
"Design system or module architecture from scratch",
|
|
202
|
-
"Debug a problem after 2+ failed attempts",
|
|
203
|
-
"Security audit or vulnerability review",
|
|
204
|
-
"Performance profiling and optimization",
|
|
205
|
-
"Migration strategy (framework, language, infrastructure)",
|
|
206
|
-
"Complex multi-system integration design",
|
|
207
|
-
"Evaluate tradeoffs between competing approaches",
|
|
208
|
-
"Root cause analysis of complex or elusive failures"
|
|
209
|
-
]
|
|
177
|
+
"fast": ["search", "grep", "read", "git-info", "ls", "lookup-docs/types", "count", "exists-check", "rename"],
|
|
178
|
+
"medium": ["impl-feature", "refactor", "write-tests", "bugfix(≤2)", "edit-logic", "code-review", "build-fix", "create-file", "db-migrate", "api-endpoint", "config-update"],
|
|
179
|
+
"heavy": ["arch-design", "debug(≥3fail)", "sec-audit", "perf-opt", "migrate-strategy", "multi-system-integration", "tradeoff-analysis", "rca"]
|
|
210
180
|
},
|
|
211
181
|
"modes": {
|
|
212
182
|
"normal": {
|
|
@@ -217,22 +187,22 @@
|
|
|
217
187
|
"defaultTier": "fast",
|
|
218
188
|
"description": "Aggressive cost savings — defaults to cheapest tier, escalates only when needed",
|
|
219
189
|
"overrideRules": [
|
|
220
|
-
"
|
|
221
|
-
"
|
|
222
|
-
"
|
|
223
|
-
"
|
|
224
|
-
"
|
|
225
|
-
"
|
|
190
|
+
"default→@fast unless edits/complex-reasoning needed",
|
|
191
|
+
"@medium ONLY: multi-file-edit/refactor/test-suite/build-fix",
|
|
192
|
+
"@heavy ONLY: user-requested OR ≥2 @medium failures",
|
|
193
|
+
"trivial(grep/read/glob)→direct,no-delegate",
|
|
194
|
+
"batch related searches→single @fast",
|
|
195
|
+
"uncertain @fast vs @medium→@fast,escalate on fail"
|
|
226
196
|
]
|
|
227
197
|
},
|
|
228
198
|
"quality": {
|
|
229
199
|
"defaultTier": "medium",
|
|
230
200
|
"description": "Quality-first — uses stronger models more liberally for better results",
|
|
231
201
|
"overrideRules": [
|
|
232
|
-
"
|
|
233
|
-
"
|
|
234
|
-
"
|
|
235
|
-
"
|
|
202
|
+
"default→@medium incl exploration when deep-context matters",
|
|
203
|
+
"@heavy: arch/debug/security/multi-file-coord",
|
|
204
|
+
"@fast ONLY: trivial single-tool ops (1 grep/1 read)",
|
|
205
|
+
"prefer thoroughness over speed"
|
|
236
206
|
]
|
|
237
207
|
}
|
|
238
208
|
},
|
|
@@ -245,18 +215,14 @@
|
|
|
245
215
|
}
|
|
246
216
|
},
|
|
247
217
|
"rules": [
|
|
248
|
-
"
|
|
249
|
-
"
|
|
250
|
-
"
|
|
251
|
-
"
|
|
252
|
-
"
|
|
253
|
-
"
|
|
254
|
-
"
|
|
255
|
-
"
|
|
256
|
-
"Never delegate to @heavy if you are already running on an opus-class model - do it yourself",
|
|
257
|
-
"If a task takes 1-2 tool calls, execute directly — delegation overhead is not worth the cost",
|
|
258
|
-
"Consult the task routing guide below to match task type to the correct tier",
|
|
259
|
-
"Consider cost ratios when choosing tiers — always use the cheapest tier that can reliably handle the task"
|
|
218
|
+
"[tier:X]→delegate X",
|
|
219
|
+
"plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy",
|
|
220
|
+
"default:impl→@medium | readonly→@fast",
|
|
221
|
+
"orchestrate=self,delegate=exec",
|
|
222
|
+
"trivial(≤2tools)→direct,skip-delegate",
|
|
223
|
+
"self∈opus→never→@heavy,do-it-yourself",
|
|
224
|
+
"consult route-guide↑",
|
|
225
|
+
"min(cost,adequate-tier)"
|
|
260
226
|
],
|
|
261
227
|
"defaultTier": "medium"
|
|
262
228
|
}
|