opencode-model-router 1.1.1 → 1.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +287 -231
  2. package/package.json +1 -1
  3. package/src/index.ts +34 -7
package/README.md CHANGED
@@ -1,195 +1,287 @@
1
1
  # opencode-model-router
2
2
 
3
- An [OpenCode](https://opencode.ai) plugin that automatically routes tasks to tiered subagents based on complexity. Instead of running everything on your most expensive model, the orchestrator delegates exploration to a fast model, implementation to a balanced model, and architecture/security to the most capable model.
3
+ > **Use the cheapest model that can do the job. Automatically.**
4
4
 
5
- ## How it works
5
+ An [OpenCode](https://opencode.ai) plugin that routes every coding task to the right-priced AI tier — automatically, on every message, with ~210 tokens of overhead.
6
6
 
7
- The plugin injects a **delegation protocol** into the system prompt that teaches the primary agent to route work:
7
+ ## Why it's different
8
8
 
9
- | Tier | Default (Anthropic) | Cost | Purpose |
10
- |------|---------------------|------|---------|
11
- | `@fast` | Claude Haiku 4.5 | 1x | Exploration, search, file reads, grep |
12
- | `@medium` | Claude Sonnet 4.5 | 5x | Implementation, refactoring, tests, bug fixes |
13
- | `@heavy` | Claude Opus 4.6 | 20x | Architecture, complex debugging, security review |
9
+ Most AI coding tools give you one model for everything. You pay Opus prices to run `grep`. opencode-model-router changes that with a stack of interlocking ideas:
14
10
 
15
- The agent automatically delegates via the Task tool when it recognizes the task complexity, or when plan steps are annotated with `[tier:fast]`, `[tier:medium]`, or `[tier:heavy]` tags.
11
+ **Use a mid-tier model as orchestrator.**
12
+ The orchestrator runs on *every* message. Put Sonnet there, not Opus. Sonnet reads a routing protocol and delegates just as well as Opus — at 4x lower cost. Reserve Opus for when it genuinely matters.
16
13
 
17
- This applies both to plan-driven execution and direct ad-hoc requests. For every new user message, the orchestrator performs an intent gate, splits multi-task requests into atomic units, and routes each unit to `@fast`, `@medium`, or `@heavy`.
14
+ **Inject a compressed, LLM-optimized routing protocol.**
15
+ Instead of verbose instructions, the plugin injects ~210 tokens of dense, machine-readable notation the orchestrator understands perfectly. Same routing intelligence as 870 tokens of prose — 75% smaller. Every message, every session.
18
16
 
19
- ### Token overhead disclaimer
17
+ **Match task to tier using a configurable taxonomy.**
18
+ A keyword routing guide (`@fast→search/grep/read`, `@medium→impl/refactor/test`, `@heavy→arch/debug/security`) tells the orchestrator exactly which tier fits each task type. Fully customizable. No ambiguity.
20
19
 
21
- The injected protocol is compact, but it still adds tokens on every iteration.
20
+ **Split composite tasks: explore cheap, execute smart.**
21
+ "Find how auth works and refactor it" shouldn't cost @medium for the whole thing. The multi-phase decomposition rule splits it: @fast reads the files (1x cost), @medium does the rewrite (5x cost). ~36% savings on composite tasks, which are ~65% of real coding sessions.
22
22
 
23
- - Estimated average injection: ~208 tokens per iteration
24
- - Preset breakdown (default `tiers.json`): `anthropic` ~209, `openai` ~206
25
- - Estimation method: `prompt_characters / 4` (rough heuristic)
23
+ **Skip delegation overhead for trivial work.**
24
+ Single grep? One file read? The orchestrator executes directly — zero delegation cost, zero added latency.
26
25
 
27
- Real token usage varies by tokenizer/model and any custom changes you make to `tiers.json`.
26
+ **Three routing modes for different budgets.**
27
+ `/budget normal` (balanced), `/budget budget` (aggressive savings, defaults everything to @fast), `/budget quality` (liberal use of stronger models). Mode persists across restarts.
28
28
 
29
- ## Installation
29
+ **Cost ratios in the prompt.**
30
+ Every tier carries its `costRatio` (fast=1x, medium=5x, heavy=20x) injected into the system prompt. The orchestrator sees the price before deciding. It picks the cheapest tier that can reliably handle the task.
30
31
 
31
- ### Option A: npm package (recommended)
32
+ **Orchestrator-awareness.**
33
+ If the orchestrator is already running on Opus, the rule `self∈opus→never→@heavy` fires — it does the heavy work itself rather than delegating to another Opus instance.
32
34
 
33
- Add the plugin package in your `opencode.json`:
35
+ **Multi-provider support with automatic fallback.**
36
+ Four presets out of the box: Anthropic, OpenAI, GitHub Copilot, Google. Switch with `/preset`. If a provider fails, the fallback chain tries the next one automatically.
37
+
38
+ **Plan annotation for long tasks.**
39
+ `/annotate-plan` reads a markdown plan and tags each step with `[tier:fast]`, `[tier:medium]`, or `[tier:heavy]` — removing all routing ambiguity from multi-step workflows.
40
+
41
+ **Fully configurable.**
42
+ Tiers, models, cost ratios, rules, task patterns, routing modes, fallback chains — all in `tiers.json`. No code changes needed.
43
+
44
+ ## The problem
45
+
46
+ Vibe coding is expensive because most AI coding tools default to one model for everything. That model is usually the most capable available — and you pay for that capability even when the task is `grep for a function name`.
47
+
48
+ A typical coding session breaks down roughly like this:
49
+
50
+ | Task type | % of session | Example |
51
+ |-----------|-------------|---------|
52
+ | Exploration / search | ~40% | Find where X is defined, read a file, check git log |
53
+ | Implementation | ~45% | Write a function, fix a bug, add a test |
54
+ | Architecture / deep debug | ~15% | Design a new module, debug after 2+ failures |
55
+
56
+ If you're running Opus (20x cost) for all of it, you're overpaying by **3-10x** on most tasks.
57
+
58
+ ## The solution
59
+
60
+ opencode-model-router injects a **delegation protocol** into the system prompt that teaches the orchestrator to:
61
+
62
+ 1. **Match task to tier** using a configurable task taxonomy
63
+ 2. **Split composite tasks** — explore first with a cheap model, then implement with a mid-tier model
64
+ 3. **Skip delegation overhead** for trivial tasks (1-2 tool calls)
65
+ 4. **Never over-qualify** — use the cheapest tier that can reliably handle the task
66
+ 5. **Fallback** across providers when one fails
67
+
68
+ All of this adds ~210 tokens of system prompt overhead per message.
69
+
70
+ ## Cost simulation
71
+
72
+ **Scenario: 50-message coding session with 30 delegated tasks**
73
+
74
+ Task distribution: 18 exploration (60%), 10 implementation (33%), 2 architecture (7%)
75
+
76
+ ### Without model router (all-Opus)
77
+
78
+ | Task | Count | Tier | Cost ratio | Total |
79
+ |------|-------|------|-----------|-------|
80
+ | Exploration | 18 | Opus | 20x | 360x |
81
+ | Implementation | 10 | Opus | 20x | 200x |
82
+ | Architecture | 2 | Opus | 20x | 40x |
83
+ | **Total** | **30** | | | **600x** |
84
+
85
+ ### With model router (normal mode, Sonnet orchestrator)
86
+
87
+ | Task | Count | Tier | Cost ratio | Total |
88
+ |------|-------|------|-----------|-------|
89
+ | Exploration (delegated) | 10 | @fast | 1x | 10x |
90
+ | Exploration (direct, trivial) | 8 | self | 0x | 0x |
91
+ | Implementation | 10 | @medium | 5x | 50x |
92
+ | Architecture | 2 | @heavy | 20x | 40x |
93
+ | **Total** | **30** | | | **100x** |
94
+
95
+ ### With model router (budget mode, Sonnet orchestrator)
96
+
97
+ | Task | Count | Tier | Cost ratio | Total |
98
+ |------|-------|------|-----------|-------|
99
+ | Exploration | 18 | @fast | 1x | 18x |
100
+ | Implementation (simple) | 7 | @fast | 1x | 7x |
101
+ | Implementation (complex) | 3 | @medium | 5x | 15x |
102
+ | Architecture | 2 | @medium | 5x | 10x |
103
+ | **Total** | **30** | | | **50x** |
104
+
105
+ ### Summary
106
+
107
+ | Setup | Session cost | vs all-Opus |
108
+ |-------|-------------|-------------|
109
+ | All-Opus (no router) | 600x | baseline |
110
+ | Sonnet orchestrator + router (normal) | 100x | **−83%** |
111
+ | Sonnet orchestrator + router (budget) | 50x | **−92%** |
112
+
113
+ > Cost ratios are relative units. Actual savings depend on your provider pricing and model selection.
114
+
115
+ ## How it works
116
+
117
+ On every message, the plugin injects ~210 tokens into the system prompt. The notation is intentionally dense and compressed — it's **optimized for LLM comprehension, not human readability**. An agent reads it as a precise routing grammar; a human might squint at it. That's by design: verbose prose would cost 4x more tokens per message with no routing benefit.
118
+
119
+ What the orchestrator sees (Anthropic preset, normal mode):
34
120
 
35
- ```json
36
- {
37
- "plugin": [
38
- "opencode-model-router@latest"
39
- ]
40
- }
41
121
  ```
122
+ ## Model Delegation Protocol
123
+ Preset: anthropic. Tiers: @fast=claude-haiku-4-5(1x) @medium=claude-sonnet-4-5/max(5x) @heavy=claude-opus-4-6/max(20x). mode:normal
124
+ R: @fast→search/grep/read/git-info/ls/lookup-docs/types/count/exists-check/rename @medium→impl-feature/refactor/write-tests/bugfix(≤2)/edit-logic/code-review/build-fix/create-file/db-migrate/api-endpoint/config-update @heavy→arch-design/debug(≥3fail)/sec-audit/perf-opt/migrate-strategy/multi-system-integration/tradeoff-analysis/rca
125
+ Multi-phase: split explore(@fast)→execute(@medium). Cheapest-first.
126
+ 1.[tier:X]→delegate X 2.plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy 3.default:impl→@medium | readonly→@fast 4.orchestrate=self,delegate=exec 5.trivial(≤2tools)→direct,skip-delegate 6.self∈opus→never→@heavy,do-it-yourself 7.consult route-guide↑ 8.min(cost,adequate-tier)
127
+ Err→retry-alt-tier→fail→direct. Chain: anthropic→openai→google→github-copilot
128
+ Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").
129
+ Keep orchestration and final synthesis in the primary agent.
130
+ ```
131
+
132
+ **What each line means (for humans):**
133
+
134
+ | Line | What it encodes |
135
+ |------|----------------|
136
+ | `Tiers: @fast=...(1x) @medium=...(5x) @heavy=...(20x)` | Model + cost ratio per tier, all in one compact line |
137
+ | `R: @fast→search/grep/... @medium→impl/...` | Full task taxonomy — keyword triggers for each tier |
138
+ | `Multi-phase: split explore(@fast)→execute(@medium)` | Composite task decomposition rule |
139
+ | `1.[tier:X]→... 5.trivial(≤2tools)→direct... 6.self∈opus→...` | Numbered routing rules in abbreviated form |
140
+ | `Err→retry-alt-tier→fail→direct. Chain: anthropic→...` | Fallback strategy in one line |
141
+
142
+ The orchestrator reads this once per message and applies it to every tool call and delegation decision in that turn.
143
+
144
+ ### Multi-phase decomposition (key differentiator)
145
+
146
+ The most impactful optimization. A composite task like:
147
+
148
+ > "Find how the auth middleware works and refactor it to use JWT."
149
+
150
+ Without router → routed entirely to `@medium` (5x for all ~8K tokens)
151
+
152
+ With router → split:
153
+ - **@fast (1x)**: grep, read 4-5 files, trace call chain (~4K tokens)
154
+ - **@medium (5x)**: rewrite auth module (~4K tokens)
155
+
156
+ **Result: ~36% cost reduction on composite tasks**, which represent ~60-70% of real coding work.
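+ 
+ In practice, the orchestrator's side of that split looks roughly like this (an illustrative sketch; the prompts and returned summaries are made up):
+ 
+ ```
+ -> Task(subagent_type="fast", prompt="Trace how the auth middleware is wired: grep usages, read the relevant files, summarize the call chain")
+ <- @fast returns a summary of the middleware and its call sites
+ -> Task(subagent_type="medium", prompt="Refactor the auth middleware to use JWT, following the call chain summarized above")
+ <- @medium returns the rewritten module and updated tests
+ ```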
42
157
 
43
- If you prefer always getting the latest release, use:
158
+ ## Why not just use another orchestrator?
44
159
 
160
+ | Feature | model-router | Claude native | oh-my-opencode | GSD | ralph-loop |
161
+ |---------|:---:|:---:|:---:|:---:|:---:|
162
+ | Multi-tier cost routing | ✅ | ❌ | ❌ | ❌ | ❌ |
163
+ | Configurable task taxonomy | ✅ | ❌ | ❌ | ❌ | ❌ |
164
+ | Budget / quality modes | ✅ | ❌ | ❌ | ❌ | ❌ |
165
+ | Multi-phase decomposition | ✅ | ❌ | ❌ | ❌ | ❌ |
166
+ | Cross-provider fallback | ✅ | ❌ | ❌ | ❌ | ❌ |
167
+ | Cost ratio awareness | ✅ | ❌ | ❌ | ❌ | ❌ |
168
+ | Plan annotation with tiers | ✅ | ❌ | ❌ | ❌ | ❌ |
169
+ | ~210 token overhead | ✅ | — | ❌ | ❌ | ❌ |
170
+
171
+ **Claude native**: single model for everything, no cost routing. If you're using claude.ai or OpenCode without plugins, you're paying the same price for `grep` as for architecture design.
172
+
173
+ **oh-my-opencode**: focused on workflow personality and prompt style, not cost optimization. No tier routing, no task taxonomy.
174
+
175
+ **GSD (Get Shit Done)**: prioritizes execution speed and low deliberation overhead. Excellent at pushing through tasks fast, but uses one model — no cost differentiation between search and architecture.
176
+
177
+ **ralph-loop**: iterative feedback-loop orchestrator. Excellent at self-correction and quality verification. No tier routing — every loop iteration runs on the same model regardless of task complexity.
178
+
179
+ **The core difference**: the others optimize for *how* the agent works (style, speed, quality loops). model-router optimizes for *what it costs* — with zero compromise on quality, because you can always put Opus in the heavy tier.
180
+
181
+ ## Recommended setup
182
+
183
+ **Orchestrator**: use `claude-sonnet-4-5` (or equivalent mid-tier) as your primary/default model. Not Opus.
184
+
185
+ Why: the orchestrator runs on every message, including trivial ones. Sonnet can read the delegation protocol and make routing decisions just as well as Opus. You reserve Opus for when it's genuinely needed — via `@heavy` delegation.
186
+
187
+ In your `opencode.json`:
45
188
  ```json
46
189
  {
47
- "plugin": [
48
- "opencode-model-router"
49
- ]
190
+ "model": "anthropic/claude-sonnet-4-5",
191
+ "autoshare": false
50
192
  }
51
193
  ```
52
194
 
53
- ### Option B: Local plugin clone
195
+ Then install and configure model-router to handle the rest.
54
196
 
55
- Clone directly into your OpenCode plugins directory:
197
+ ## Installation
56
198
 
199
+ ### From npm (recommended)
57
200
  ```bash
58
- cd ~/.config/opencode/plugin
59
- git clone https://github.com/marco-jardim/opencode-model-router.git
201
+ # In your opencode project or globally
202
+ npm install -g opencode-model-router
60
203
  ```
61
204
 
62
- Then add it to your `opencode.json`:
63
-
205
+ Add to `~/.config/opencode/opencode.json`:
64
206
  ```json
65
207
  {
66
- "plugin": [
67
- "./plugin/opencode-model-router"
68
- ]
208
+ "plugin": {
209
+ "opencode-model-router": {
210
+ "type": "npm",
211
+ "package": "opencode-model-router"
212
+ }
213
+ }
69
214
  }
70
215
  ```
71
216
 
72
- ### Option C: Reference from anywhere
73
-
74
- Clone wherever you want:
75
-
217
+ ### Local clone
76
218
  ```bash
77
- git clone https://github.com/marco-jardim/opencode-model-router.git /path/to/opencode-model-router
219
+ git clone https://github.com/marco-jardim/opencode-model-router.git
220
+ cd opencode-model-router
221
+ npm install
78
222
  ```
79
223
 
80
- Then reference the absolute path in `opencode.json`:
81
-
224
+ In `~/.config/opencode/opencode.json`:
82
225
  ```json
83
226
  {
84
- "plugin": [
85
- "/path/to/opencode-model-router"
86
- ]
227
+ "plugin": {
228
+ "opencode-model-router": {
229
+ "type": "local",
230
+ "path": "/absolute/path/to/opencode-model-router"
231
+ }
232
+ }
87
233
  }
88
234
  ```
89
235
 
90
- Restart OpenCode after adding the plugin.
91
-
92
236
  ## Configuration
93
237
 
94
- All configuration lives in `tiers.json` at the plugin root. Edit it to match your available models and providers.
238
+ All configuration lives in `tiers.json` at the plugin root.
95
239
 
96
240
  ### Presets
97
241
 
98
- The plugin ships with four presets:
242
+ The plugin ships with four presets (switch with `/preset <name>`):
99
243
 
100
244
  **anthropic** (default):
101
- | Tier | Model | Cost | Notes |
102
- |------|-------|------|-------|
103
- | fast | `anthropic/claude-haiku-4-5` | 1x | Cheapest, fastest |
104
- | medium | `anthropic/claude-sonnet-4-5` | 5x | Extended thinking (variant: max) |
105
- | heavy | `anthropic/claude-opus-4-6` | 20x | Extended thinking (variant: max) |
245
+ | Tier | Model | Cost ratio |
246
+ |------|-------|-----------|
247
+ | @fast | `anthropic/claude-haiku-4-5` | 1x |
248
+ | @medium | `anthropic/claude-sonnet-4-5` (max) | 5x |
249
+ | @heavy | `anthropic/claude-opus-4-6` (max) | 20x |
106
250
 
107
251
  **openai**:
108
- | Tier | Model | Cost | Notes |
109
- |------|-------|------|-------|
110
- | fast | `openai/gpt-5.3-codex-spark` | 1x | Cheapest, fastest |
111
- | medium | `openai/gpt-5.3-codex` | 5x | Default settings (no variant/reasoning override) |
112
- | heavy | `openai/gpt-5.3-codex` | 20x | Variant: `xhigh` |
252
+ | Tier | Model | Cost ratio |
253
+ |------|-------|-----------|
254
+ | @fast | `openai/gpt-5.3-codex-spark` | 1x |
255
+ | @medium | `openai/gpt-5.3-codex` | 5x |
256
+ | @heavy | `openai/gpt-5.3-codex` (xhigh) | 20x |
113
257
 
114
258
  **github-copilot**:
115
- | Tier | Model | Cost | Notes |
116
- |------|-------|------|-------|
117
- | fast | `github-copilot/claude-haiku-4-5` | 1x | Cheapest, fastest |
118
- | medium | `github-copilot/claude-sonnet-4-5` | 5x | Balanced coding model |
119
- | heavy | `github-copilot/claude-opus-4-6` | 20x | Variant: `thinking` |
259
+ | Tier | Model | Cost ratio |
260
+ |------|-------|-----------|
261
+ | @fast | `github-copilot/claude-haiku-4-5` | 1x |
262
+ | @medium | `github-copilot/claude-sonnet-4-5` | 5x |
263
+ | @heavy | `github-copilot/claude-opus-4-6` (thinking) | 20x |
120
264
 
121
265
  **google**:
122
- | Tier | Model | Cost | Notes |
123
- |------|-------|------|-------|
124
- | fast | `google/gemini-2.5-flash` | 1x | Cheapest, fastest |
125
- | medium | `google/gemini-2.5-pro` | 5x | Balanced coding model |
126
- | heavy | `google/gemini-3-pro-preview` | 20x | Strongest reasoning in default set |
127
-
128
- Switch presets with the `/preset` command:
129
-
130
- ```
131
- /preset openai
132
- ```
133
-
134
- ### Creating custom presets
135
-
136
- Add a new preset to the `presets` object in `tiers.json`:
137
-
138
- ```json
139
- {
140
- "presets": {
141
- "my-preset": {
142
- "fast": {
143
- "model": "provider/model-name",
144
- "costRatio": 1,
145
- "description": "What this tier does",
146
- "steps": 30,
147
- "prompt": "System prompt for the subagent",
148
- "whenToUse": ["Use case 1", "Use case 2"]
149
- },
150
- "medium": { "costRatio": 5, "..." : "..." },
151
- "heavy": { "costRatio": 20, "..." : "..." }
152
- }
153
- }
154
- }
155
- ```
156
-
157
- Each tier supports these fields:
158
-
159
- | Field | Type | Description |
160
- |-------|------|-------------|
161
- | `model` | string | Full model ID (`provider/model-name`) |
162
- | `variant` | string | Optional variant (e.g., `"max"` for extended thinking) |
163
- | `costRatio` | number | Relative cost multiplier (e.g., 1 for cheapest, 20 for most expensive). Injected into the system prompt so the agent considers cost when delegating. |
164
- | `thinking` | object | Anthropic thinking config: `{ "budgetTokens": 10000 }` |
165
- | `reasoning` | object | OpenAI reasoning config: `{ "effort": "high", "summary": "detailed" }` |
166
- | `description` | string | Human-readable description shown in `/tiers` |
167
- | `steps` | number | Max agent turns (default: varies by tier) |
168
- | `prompt` | string | System prompt for the subagent |
169
- | `color` | string | Optional display color |
170
- | `whenToUse` | string[] | List of use cases (shown in delegation protocol) |
266
+ | Tier | Model | Cost ratio |
267
+ |------|-------|-----------|
268
+ | @fast | `google/gemini-2.5-flash` | 1x |
269
+ | @medium | `google/gemini-2.5-pro` | 5x |
270
+ | @heavy | `google/gemini-3-pro-preview` | 20x |
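+ 
+ ### Creating custom presets
+ 
+ Add your own preset to the `presets` object in `tiers.json`. A minimal sketch; the preset name, models, and prompts below are placeholders (see the tier fields reference further down for every supported field):
+ 
+ ```json
+ {
+   "presets": {
+     "my-preset": {
+       "fast": {
+         "model": "provider/model-name",
+         "costRatio": 1,
+         "description": "What this tier does",
+         "steps": 30,
+         "prompt": "System prompt for the subagent",
+         "whenToUse": ["Use case 1", "Use case 2"]
+       },
+       "medium": { "costRatio": 5, "...": "..." },
+       "heavy": { "costRatio": 20, "...": "..." }
+     }
+   }
+ }
+ ```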
171
271
 
172
272
  ### Routing modes
173
273
 
174
- The plugin supports three routing modes that control how aggressively the agent delegates to cheaper tiers. Switch modes with the `/budget` command:
274
+ Switch with `/budget <mode>`. Mode is persisted across restarts.
175
275
 
176
- | Mode | Default Tier | Behavior |
276
+ | Mode | Default tier | Behavior |
177
277
  |------|-------------|----------|
178
- | `normal` | `@medium` | Balanced quality and cost delegates based on task complexity |
179
- | `budget` | `@fast` | Aggressive cost savings — defaults to cheapest tier, escalates only when needed |
180
- | `quality` | `@medium` | Quality-first — uses stronger models more liberally for better results |
181
-
182
- When a mode has `overrideRules`, those replace the global `rules` array in the system prompt. This lets each mode have fundamentally different delegation behavior.
183
-
184
- Configure modes in `tiers.json`:
278
+ | `normal` | @medium | Balanced — routes by task complexity |
279
+ | `budget` | @fast | Aggressive savings — defaults cheap, escalates only when necessary |
280
+ | `quality` | @medium | Quality-first — liberal use of @medium/@heavy |
185
281
 
186
282
  ```json
187
283
  {
188
284
  "modes": {
189
- "normal": {
190
- "defaultTier": "medium",
191
- "description": "Balanced quality and cost"
192
- },
193
285
  "budget": {
194
286
  "defaultTier": "fast",
195
287
  "description": "Aggressive cost savings",
@@ -198,87 +290,77 @@ Configure modes in `tiers.json`:
198
290
  "Use @medium ONLY for: multi-file edits, complex refactors, test suites",
199
291
  "Use @heavy ONLY when explicitly requested or after 2+ failed @medium attempts"
200
292
  ]
201
- },
202
- "quality": {
203
- "defaultTier": "medium",
204
- "description": "Quality-first",
205
- "overrideRules": [
206
- "Default to @medium for all tasks including exploration",
207
- "Use @heavy for architecture, debugging, security, or multi-file coordination",
208
- "Use @fast only for trivial single-tool operations"
209
- ]
210
293
  }
211
294
  }
212
295
  }
213
296
  ```
214
297
 
215
- The active mode is persisted in `~/.config/opencode/opencode-model-router.state.json` and survives restarts.
216
-
217
- ### Task taxonomy
298
+ ### Task taxonomy (`taskPatterns`)
218
299
 
219
- The `taskPatterns` object maps common coding task descriptions to tiers. This is injected into the system prompt as a routing guide so the agent can quickly look up which tier to use:
300
+ Keyword routing guide injected into the system prompt. Customize to match your workflow:
220
301
 
221
302
  ```json
222
303
  {
223
304
  "taskPatterns": {
224
- "fast": [
225
- "Find, search, locate, or grep files and code patterns",
226
- "Read or display specific files or sections",
227
- "Check git status, log, diff, or blame"
228
- ],
229
- "medium": [
230
- "Implement a new feature, function, or component",
231
- "Refactor or restructure existing code",
232
- "Write or update tests",
233
- "Fix a bug (first or second attempt)"
234
- ],
235
- "heavy": [
236
- "Design system or module architecture from scratch",
237
- "Debug a problem after 2+ failed attempts",
238
- "Security audit or vulnerability review"
239
- ]
305
+ "fast": ["search/grep/read", "git-info/ls", "lookup-docs/types", "count/exists-check/rename"],
306
+ "medium": ["impl-feature/refactor", "write-tests/bugfix(≤2)", "build-fix/create-file"],
307
+ "heavy": ["arch-design/debug(≥3fail)", "sec-audit/perf-opt", "migrate-strategy/rca"]
240
308
  }
241
309
  }
242
310
  ```
243
311
 
244
- Customize these patterns to match your workflow. The agent uses them as heuristics, not hard rules.
245
-
246
312
  ### Cost ratios
247
313
 
248
- Each tier's `costRatio` is injected into the system prompt so the agent is aware of relative costs:
314
+ Set `costRatio` on each tier to reflect your real provider pricing. These are injected into the system prompt so the orchestrator makes cost-aware decisions:
249
315
 
250
- ```
251
- Cost ratios: @fast=1x, @medium=5x, @heavy=20x.
252
- Always use the cheapest tier that can reliably handle the task.
316
+ ```json
317
+ {
318
+ "fast": { "costRatio": 1 },
319
+ "medium": { "costRatio": 5 },
320
+ "heavy": { "costRatio": 20 }
321
+ }
253
322
  ```
254
323
 
255
- Adjust `costRatio` values in each tier to reflect your actual provider pricing. The ratios don't need to be exact they're directional signals for the agent.
324
+ Adjust to actual prices. Exact values don't matter — directional signals are enough.
256
325
 
257
326
  ### Rules
258
327
 
259
- The `rules` array in `tiers.json` controls when delegation happens. These are injected into the system prompt verbatim:
328
+ The `rules` array is injected into the system prompt verbatim; the defaults are already written in compact notation. Default ruleset:
260
329
 
261
330
  ```json
262
331
  {
263
332
  "rules": [
264
- "When a plan step contains [tier:fast], [tier:medium], or [tier:heavy], delegate to that agent",
265
- "Default to @medium for implementation tasks you could delegate",
266
- "Use @fast for any read-only exploration or research task",
267
- "Keep orchestration (planning, decisions, verification) for yourself -- delegate execution",
268
- "For trivial tasks (single grep, single file read), execute directly without delegation",
269
- "Never delegate to @heavy if you are already running on an opus-class model -- do it yourself",
270
- "If a task takes 1-2 tool calls, execute directly -- delegation overhead is not worth the cost",
271
- "Consult the task routing guide below to match task type to the correct tier",
272
- "Consider cost ratios when choosing tiers -- always use the cheapest tier that can reliably handle the task"
333
+ "[tier:X]delegate X",
334
+ "plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy",
335
+ "default:impl→@medium | readonly→@fast",
336
+ "orchestrate=self,delegate=exec",
337
+ "trivial(≤2tools)→direct,skip-delegate",
338
+ "self∈opus→never→@heavy,do-it-yourself",
339
+ "consult route-guide↑",
340
+ "min(cost,adequate-tier)"
273
341
  ]
274
342
  }
275
343
  ```
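+ 
+ For reference, each compact rule corresponds to one of the verbose v1.1.0 rules. For example, `trivial(≤2tools)→direct,skip-delegate` encodes "if a task takes 1-2 tool calls, execute directly; delegation overhead is not worth the cost", and `self∈opus→never→@heavy,do-it-yourself` encodes "never delegate to @heavy if you are already running on an opus-class model".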
276
344
 
277
- When a routing mode has `overrideRules`, those replace this array entirely for that mode.
345
+ Rules in `modes[x].overrideRules` replace this array entirely for that mode.
346
+
347
+ ### Tier fields reference
348
+
349
+ | Field | Type | Description |
350
+ |-------|------|-------------|
351
+ | `model` | string | Full model ID (`provider/model-name`) |
352
+ | `variant` | string | Optional variant (`"max"`, `"xhigh"`, `"thinking"`) |
353
+ | `costRatio` | number | Relative cost (1 = cheapest). Shown in prompt. |
354
+ | `thinking` | object | Anthropic thinking: `{ "budgetTokens": 10000 }` |
355
+ | `reasoning` | object | OpenAI reasoning: `{ "effort": "high", "summary": "detailed" }` |
356
+ | `description` | string | Shown in `/tiers` output |
357
+ | `steps` | number | Max agent turns |
358
+ | `prompt` | string | Subagent system prompt |
359
+ | `whenToUse` | string[] | Use cases (shown in `/tiers`, not in system prompt) |
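+ 
+ For example, a heavy tier using the Anthropic-specific fields might look like this (values are illustrative):
+ 
+ ```json
+ {
+   "heavy": {
+     "model": "anthropic/claude-opus-4-6",
+     "variant": "max",
+     "costRatio": 20,
+     "thinking": { "budgetTokens": 10000 },
+     "description": "Architecture, complex debugging, security review"
+   }
+ }
+ ```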
278
360
 
279
361
  ### Fallback
280
362
 
281
- The `fallback` section defines which presets to try when a provider fails:
363
+ Defines provider fallback order when a delegated task fails:
282
364
 
283
365
  ```json
284
366
  {
@@ -291,81 +373,55 @@ The `fallback` section defines which presets to try when a provider fails:
291
373
  }
292
374
  ```
293
375
 
294
- When a delegated task fails with a provider/model/rate-limit error, the agent is instructed to retry with the next preset in the fallback chain.
295
-
296
376
  ## Commands
297
377
 
298
378
  | Command | Description |
299
379
  |---------|-------------|
300
- | `/tiers` | Show active tier configuration and delegation rules |
380
+ | `/tiers` | Show active tier configuration, models, and rules |
301
381
  | `/preset` | List available presets |
302
- | `/preset <name>` | Switch to a different preset |
303
- | `/budget` | Show available routing modes and which is active |
304
- | `/budget <mode>` | Switch routing mode (`normal`, `budget`, or `quality`) |
382
+ | `/preset <name>` | Switch preset (e.g., `/preset openai`) |
383
+ | `/budget` | Show available modes and which is active |
384
+ | `/budget <mode>` | Switch routing mode (`normal`, `budget`, `quality`) |
305
385
  | `/annotate-plan [path]` | Annotate a plan file with `[tier:X]` tags for each step |
306
386
 
307
387
  ## Plan annotation
308
388
 
309
- The `/annotate-plan` command reads a plan file (defaults to `PLAN.md`) and adds tier tags to each step based on complexity:
389
+ For complex tasks, you can write a plan file and annotate each step with the correct tier. The `/annotate-plan` command reads the plan and adds `[tier:fast]`, `[tier:medium]`, or `[tier:heavy]` tags to each step based on the task taxonomy.
310
390
 
311
- **Before:**
312
- ```markdown
313
- ## Steps
314
- 1. Search the codebase for all authentication handlers
315
- 2. Implement the new OAuth2 flow
316
- 3. Review the auth architecture for security vulnerabilities
317
- ```
391
+ The orchestrator then reads these tags and delegates accordingly — removing ambiguity from routing decisions on long, multi-step tasks.
318
392
 
319
- **After:**
393
+ Example plan (before annotation):
320
394
  ```markdown
321
- ## Steps
322
- 1. [tier:fast] Search the codebase for all authentication handlers
323
- 2. [tier:medium] Implement the new OAuth2 flow
324
- 3. [tier:heavy] Review the auth architecture for security vulnerabilities
325
- ```
326
-
327
- When the agent executes an annotated plan, it delegates each step to the appropriate subagent automatically.
328
-
329
- ## Provider setup
330
-
331
- The models referenced in your preset must be configured in your `opencode.json` providers. For example, to use the default `anthropic` preset:
332
-
333
- ```json
334
- {
335
- "provider": {
336
- "anthropic": {
337
- "models": {
338
- "claude-haiku-4-5": {},
339
- "claude-sonnet-4-5": {},
340
- "claude-opus-4-6": { "thinking": true }
341
- }
342
- }
343
- }
344
- }
395
+ 1. Find all API endpoints in the codebase
396
+ 2. Add rate limiting middleware to each endpoint
397
+ 3. Write integration tests for rate limiting
398
+ 4. Design a token bucket algorithm for advanced rate limiting
345
399
  ```
346
400
 
347
- ## How delegation looks in practice
348
-
401
+ After `/annotate-plan`:
402
+ ```markdown
403
+ 1. [tier:fast] Find all API endpoints in the codebase
404
+ 2. [tier:medium] Add rate limiting middleware to each endpoint
405
+ 3. [tier:medium] Write integration tests for rate limiting
406
+ 4. [tier:heavy] Design a token bucket algorithm for advanced rate limiting
349
407
  ```
350
- You: Implement the user settings page based on PLAN.md
351
408
 
352
- Agent (Opus): I'll execute the plan. Step 1 is tagged [tier:fast] -- delegating exploration.
353
- -> Task(@fast): "Find all existing settings-related components and their patterns"
354
- <- @fast returns: Found SettingsLayout at src/components/..., uses React Hook Form...
409
+ ## Token overhead
355
410
 
356
- Agent (Opus): Step 2 is [tier:medium] -- delegating implementation.
357
- -> Task(@medium): "Implement UserSettingsForm component following the patterns from..."
358
- <- @medium returns: Created src/components/UserSettingsForm.tsx, added tests...
411
+ The system prompt injection is ~210 tokens per message — roughly the same as v1.0 (before cost-aware features were added). Dense notation keeps overhead flat while adding full routing intelligence.
359
412
 
360
- Agent (Opus): Step 3 is [tier:heavy] -- this is an architecture review and I'm already on Opus, so I'll handle it myself.
361
- [Reviews the implementation directly]
362
- ```
413
+ | Version | Tokens | Features |
414
+ |---------|--------|----------|
415
+ | v1.0.7 | ~208 | Basic tier routing |
416
+ | v1.1.0 | ~870 | All features, verbose format |
417
+ | v1.1.1+ | ~210 | All features, compressed format |
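+ 
+ Token counts are rough estimates (about prompt characters / 4); real usage varies by tokenizer, model, and any custom changes to `tiers.json`.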
363
418
 
364
419
  ## Requirements
365
420
 
366
- - OpenCode v1.0+ with plugin support
367
- - Models configured in your `opencode.json` providers matching your preset
421
+ - [OpenCode](https://opencode.ai) v1.0 or later
422
+ - Node.js 18+
423
+ - Provider API keys configured in OpenCode
368
424
 
369
425
  ## License
370
426
 
371
- [GPL-3.0](LICENSE)
427
+ GPL-3.0
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "opencode-model-router",
3
- "version": "1.1.1",
3
+ "version": "1.1.3",
4
4
  "description": "OpenCode plugin that routes tasks to tiered subagents (fast/medium/heavy) based on complexity",
5
5
  "type": "module",
6
6
  "main": "./src/index.ts",
package/src/index.ts CHANGED
@@ -328,11 +328,39 @@ function buildFallbackInstructions(cfg: RouterConfig): string {
328
328
 
329
329
  function buildTaskTaxonomy(cfg: RouterConfig): string {
330
330
  if (!cfg.taskPatterns || Object.keys(cfg.taskPatterns).length === 0) return "";
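+ // Builds a compact taxonomy line, e.g. "R: @fast→search/grep/read @medium→impl-feature/refactor ..."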
331
+ const lines = ["R:"];
332
+ for (const [tier, patterns] of Object.entries(cfg.taskPatterns)) {
333
+ if (Array.isArray(patterns) && patterns.length > 0) {
334
+ lines.push(`@${tier}→${patterns.join("/")}`);
335
+ }
336
+ }
337
+ return lines.join(" ");
338
+ }
331
339
 
332
- return Object.entries(cfg.taskPatterns)
333
- .filter(([_, p]) => Array.isArray(p) && p.length > 0)
334
- .map(([tier, patterns]) => `@${tier}→${(patterns as string[]).join("/")}`)
335
- .join("\n");
340
+ /**
341
+ * Injects a multi-phase decomposition hint into the delegation protocol.
342
+ * Teaches the orchestrator to split composite tasks (explore + implement)
343
+ * so the cheap @fast tier handles exploration and @medium handles execution.
344
+ * Only active in normal mode — budget/quality modes have their own override rules.
345
+ */
346
+ function buildDecomposeHint(cfg: RouterConfig): string {
347
+ const mode = getActiveMode(cfg);
348
+ // Budget and quality modes handle this via overrideRules — skip to avoid conflicts
349
+ if (mode?.overrideRules?.length) return "";
350
+
351
+ const tiers = getActiveTiers(cfg);
352
+ const entries = Object.entries(tiers);
353
+ if (entries.length < 2) return "";
354
+
355
+ // Sort by costRatio ascending to find cheapest (explore) and next (execute) tiers
356
+ const sorted = [...entries].sort(
357
+ ([, a], [, b]) => (a.costRatio ?? 1) - (b.costRatio ?? 1)
358
+ );
359
+ const cheapest = sorted[0]?.[0];
360
+ const mid = sorted[1]?.[0];
361
+ if (!cheapest || !mid) return "";
362
+
363
+ return `Multi-phase: split explore(@${cheapest})→execute(@${mid}). Cheapest-first.`;
336
364
  }
337
365
 
338
366
  // ---------------------------------------------------------------------------
@@ -352,14 +380,12 @@ function buildDelegationProtocol(cfg: RouterConfig): string {
352
380
  })
353
381
  .join(" ");
354
382
 
355
- // Compact mode
356
383
  const mode = getActiveMode(cfg);
357
384
  const modeSuffix = cfg.activeMode ? ` mode:${cfg.activeMode}` : "";
358
385
 
359
- // Compact task routing guide
360
386
  const taxonomy = buildTaskTaxonomy(cfg);
387
+ const decompose = buildDecomposeHint(cfg);
361
388
 
362
- // Compact rules
363
389
  const effectiveRules = mode?.overrideRules?.length ? mode.overrideRules : cfg.rules;
364
390
  const rulesLine = effectiveRules.map((r, i) => `${i + 1}.${r}`).join(" ");
365
391
 
@@ -369,6 +395,7 @@ function buildDelegationProtocol(cfg: RouterConfig): string {
369
395
  `## Model Delegation Protocol`,
370
396
  `Preset: ${cfg.activePreset}. Tiers: ${tierLine}.${modeSuffix}`,
371
397
  ...(taxonomy ? [taxonomy] : []),
398
+ ...(decompose ? [decompose] : []),
372
399
  rulesLine,
373
400
  ...(fallback ? [fallback] : []),
374
401
  `Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").`,