opencode-model-router 1.1.0 → 1.1.2

Files changed (4)
  1. package/README.md +276 -158
  2. package/package.json +1 -1
  3. package/src/index.ts +50 -60
  4. package/tiers.json +21 -55
package/README.md CHANGED
@@ -1,260 +1,378 @@
1
1
  # opencode-model-router
2
2
 
3
- An [OpenCode](https://opencode.ai) plugin that automatically routes tasks to tiered subagents based on complexity. Instead of running everything on your most expensive model, the orchestrator delegates exploration to a fast model, implementation to a balanced model, and architecture/security to the most capable model.
3
+ > **Use the cheapest model that can do the job. Automatically.**
4
4
 
5
- ## How it works
5
+ An [OpenCode](https://opencode.ai) plugin that routes every task to the right-priced AI tier. Instead of running everything on your most expensive model, the orchestrator delegates exploration to a fast/cheap model, implementation to a balanced model, and architecture only to the powerful (expensive) one — automatically, on every message.
6
6
 
7
- The plugin injects a **delegation protocol** into the system prompt that teaches the primary agent to route work:
7
+ ## The problem
8
8
 
9
- | Tier | Default (Anthropic) | Purpose |
10
- |------|---------------------|---------|
11
- | `@fast` | Claude Haiku 4.5 | Exploration, search, file reads, grep |
12
- | `@medium` | Claude Sonnet 4.5 | Implementation, refactoring, tests, bug fixes |
13
- | `@heavy` | Claude Opus 4.6 | Architecture, complex debugging, security review |
9
+ Vibe coding is expensive because most AI coding tools default to one model for everything. That model is usually the most capable available — and you pay for that capability even when the task is `grep for a function name`.
14
10
 
15
- The agent automatically delegates via the Task tool when it recognizes the task complexity, or when plan steps are annotated with `[tier:fast]`, `[tier:medium]`, or `[tier:heavy]` tags.
11
+ A typical coding session breaks down roughly like this:
16
12
 
17
- This applies both to plan-driven execution and direct ad-hoc requests. For every new user message, the orchestrator performs an intent gate, splits multi-task requests into atomic units, and routes each unit to `@fast`, `@medium`, or `@heavy`.
13
+ | Task type | % of session | Example |
14
+ |-----------|-------------|---------|
15
+ | Exploration / search | ~40% | Find where X is defined, read a file, check git log |
16
+ | Implementation | ~45% | Write a function, fix a bug, add a test |
17
+ | Architecture / deep debug | ~15% | Design a new module, debug after 2+ failures |
18
18
 
19
- ### Token overhead disclaimer
19
+ If you're running Opus (20x cost) for all of it, you're overpaying on roughly 85% of the session: about 4x on implementation (20x vs. 5x) and 20x on exploration (20x vs. 1x).
20
20
 
21
- The injected protocol is compact, but it still adds tokens on every iteration.
21
+ ## The solution
22
22
 
23
- - Estimated average injection: ~208 tokens per iteration
24
- - Preset breakdown (default `tiers.json`): `anthropic` ~209, `openai` ~206
25
- - Estimation method: `prompt_characters / 4` (rough heuristic)
23
+ opencode-model-router injects a **delegation protocol** into the system prompt that teaches the orchestrator to:
26
24
 
27
- Real token usage varies by tokenizer/model and any custom changes you make to `tiers.json`.
25
+ 1. **Match task to tier** using a configurable task taxonomy
26
+ 2. **Split composite tasks** — explore first with a cheap model, then implement with a mid-tier model
27
+ 3. **Skip delegation overhead** for trivial tasks (1-2 tool calls)
28
+ 4. **Never over-qualify** — use the cheapest tier that can reliably handle the task
29
+ 5. **Fall back** across providers when one fails
28
30
 
29
- ## Installation
31
+ All of this adds ~210 tokens of system prompt overhead per message.
30
32
 
31
- ### Option A: npm package (recommended)
33
+ ## Cost simulation
32
34
 
33
- Add the plugin package in your `opencode.json`:
35
+ **Scenario: 50-message coding session with 30 delegated tasks**
36
+
37
+ Task distribution: 18 exploration (60%), 10 implementation (33%), 2 architecture (7%)
38
+
39
+ ### Without model router (all-Opus)
40
+
41
+ | Task | Count | Tier | Cost ratio | Total |
42
+ |------|-------|------|-----------|-------|
43
+ | Exploration | 18 | Opus | 20x | 360x |
44
+ | Implementation | 10 | Opus | 20x | 200x |
45
+ | Architecture | 2 | Opus | 20x | 40x |
46
+ | **Total** | **30** | | | **600x** |
47
+
48
+ ### With model router (normal mode, Sonnet orchestrator)
49
+
50
+ | Task | Count | Tier | Cost ratio | Total |
51
+ |------|-------|------|-----------|-------|
52
+ | Exploration (delegated) | 10 | @fast | 1x | 10x |
53
+ | Exploration (direct, trivial) | 8 | self | 0x | 0x |
54
+ | Implementation | 10 | @medium | 5x | 50x |
55
+ | Architecture | 2 | @heavy | 20x | 40x |
56
+ | **Total** | **30** | | | **100x** |
57
+
58
+ ### With model router (budget mode, Sonnet orchestrator)
59
+
60
+ | Task | Count | Tier | Cost ratio | Total |
61
+ |------|-------|------|-----------|-------|
62
+ | Exploration | 18 | @fast | 1x | 18x |
63
+ | Implementation (simple) | 7 | @fast | 1x | 7x |
64
+ | Implementation (complex) | 3 | @medium | 5x | 15x |
65
+ | Architecture | 2 | @medium | 5x | 10x |
66
+ | **Total** | **30** | | | **50x** |
67
+
68
+ ### Summary
69
+
70
+ | Setup | Session cost | vs all-Opus |
71
+ |-------|-------------|-------------|
72
+ | All-Opus (no router) | 600x | baseline |
73
+ | Sonnet orchestrator + router (normal) | 100x | **−83%** |
74
+ | Sonnet orchestrator + router (budget) | 50x | **−92%** |
75
+
76
+ > Cost ratios are relative units. Actual savings depend on your provider pricing and model selection.
77
+
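The totals in the tables above are count × cost ratio, summed per row. A minimal sketch that reproduces them, assuming the relative ratios used in the simulation (the `Row` type and both helper names are hypothetical and not part of the plugin):

```typescript
// Hypothetical helpers for checking the cost simulation; not plugin code.
type Row = { task: string; count: number; costRatio: number };

// Session cost in relative units: sum of count * costRatio over all rows.
function sessionCost(rows: Row[]): number {
  return rows.reduce((sum, r) => sum + r.count * r.costRatio, 0);
}

// Savings versus a baseline, rounded to a whole percent.
function savingsPercent(baseline: number, cost: number): number {
  return Math.round((1 - cost / baseline) * 100);
}

const allOpus: Row[] = [
  { task: "exploration", count: 18, costRatio: 20 },
  { task: "implementation", count: 10, costRatio: 20 },
  { task: "architecture", count: 2, costRatio: 20 },
];

const normalMode: Row[] = [
  { task: "exploration (delegated)", count: 10, costRatio: 1 },
  { task: "exploration (direct, trivial)", count: 8, costRatio: 0 },
  { task: "implementation", count: 10, costRatio: 5 },
  { task: "architecture", count: 2, costRatio: 20 },
];

const budgetMode: Row[] = [
  { task: "exploration", count: 18, costRatio: 1 },
  { task: "implementation (simple)", count: 7, costRatio: 1 },
  { task: "implementation (complex)", count: 3, costRatio: 5 },
  { task: "architecture", count: 2, costRatio: 5 },
];

const baseline = sessionCost(allOpus); // 600
console.log(sessionCost(normalMode), savingsPercent(baseline, sessionCost(normalMode))); // 100 83
console.log(sessionCost(budgetMode), savingsPercent(baseline, sessionCost(budgetMode))); // 50 92
```

Swapping in your own provider's ratios and task mix gives a quick estimate before you commit to a mode.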
78
+ ## How it works
79
+
80
+ On every message, the plugin injects ~210 tokens into the system prompt:
34
81
 
35
- ```json
36
- {
37
- "plugin": [
38
- "opencode-model-router@latest"
39
- ]
40
- }
41
82
  ```
83
+ ## Model Delegation Protocol
84
+ Preset: anthropic. Tiers: @fast=claude-haiku-4-5(1x) @medium=claude-sonnet-4-5/max(5x) @heavy=claude-opus-4-6/max(20x). mode:normal
85
+ R: @fast→search/grep/read/git-info/ls/lookup-docs/types/count/exists-check/rename @medium→impl-feature/refactor/write-tests/bugfix(≤2)/edit-logic/code-review/build-fix/create-file/db-migrate/api-endpoint/config-update @heavy→arch-design/debug(≥3fail)/sec-audit/perf-opt/migrate-strategy/multi-system-integration/tradeoff-analysis/rca
86
+ Multi-phase: split explore(@fast)→execute(@medium). Cheapest-first.
87
+ 1.[tier:X]→delegate X 2.plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy 3.default:impl→@medium | readonly→@fast 4.orchestrate=self,delegate=exec 5.trivial(≤2tools)→direct,skip-delegate 6.self∈opus→never→@heavy,do-it-yourself 7.consult route-guide↑ 8.min(cost,adequate-tier)
88
+ Err→retry-alt-tier→fail→direct. Chain: anthropic→openai→google→github-copilot
89
+ Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").
90
+ Keep orchestration and final synthesis in the primary agent.
91
+ ```
92
+
93
+ The orchestrator reads this once per message and applies it to every decision in that turn.
94
+
95
+ ### Multi-phase decomposition (key differentiator)
96
+
97
+ The most impactful optimization. A composite task like:
98
+
99
+ > "Find how the auth middleware works and refactor it to use JWT."
100
+
101
+ Without router → routed entirely to `@medium` (5x for all ~8K tokens)
42
102
 
43
- If you prefer always getting the latest release, use:
103
+ With router split:
104
+ - **@fast (1x)**: grep, read 4-5 files, trace call chain (~4K tokens)
105
+ - **@medium (5x)**: rewrite auth module (~4K tokens)
44
106
 
107
+ **Result: ~36% cost reduction on composite tasks**, which represent ~60-70% of real coding work.
108
+
109
+ ## Why not just use another orchestrator?
110
+
111
+ | Feature | model-router | Claude native | oh-my-opencode | GSD | ralph-loop |
112
+ |---------|:---:|:---:|:---:|:---:|:---:|
113
+ | Multi-tier cost routing | ✅ | ❌ | ❌ | ❌ | ❌ |
114
+ | Configurable task taxonomy | ✅ | ❌ | ❌ | ❌ | ❌ |
115
+ | Budget / quality modes | ✅ | ❌ | ❌ | ❌ | ❌ |
116
+ | Multi-phase decomposition | ✅ | ❌ | ❌ | ❌ | ❌ |
117
+ | Cross-provider fallback | ✅ | ❌ | ❌ | ❌ | ❌ |
118
+ | Cost ratio awareness | ✅ | ❌ | ❌ | ❌ | ❌ |
119
+ | Plan annotation with tiers | ✅ | ❌ | ❌ | ❌ | ❌ |
120
+ | ~210 token overhead | ✅ | — | ❌ | ❌ | ❌ |
121
+
122
+ **Claude native**: single model for everything, no cost routing. If you're using claude.ai or OpenCode without plugins, you're paying the same price for `grep` as for architecture design.
123
+
124
+ **oh-my-opencode**: focused on workflow personality and prompt style, not cost optimization. No tier routing, no task taxonomy.
125
+
126
+ **GSD (Get Shit Done)**: prioritizes execution speed and low deliberation overhead. Excellent at pushing through tasks fast, but uses one model — no cost differentiation between search and architecture.
127
+
128
+ **ralph-loop**: iterative feedback-loop orchestrator. Excellent at self-correction and quality verification. No tier routing — every loop iteration runs on the same model regardless of task complexity.
129
+
130
+ **The core difference**: the others optimize for *how* the agent works (style, speed, quality loops). model-router optimizes for *what it costs* — with zero compromise on quality, because you can always put Opus in the heavy tier.
131
+
132
+ ## Recommended setup
133
+
134
+ **Orchestrator**: use `claude-sonnet-4-5` (or equivalent mid-tier) as your primary/default model. Not Opus.
135
+
136
+ Why: the orchestrator runs on every message, including trivial ones. Sonnet can read the delegation protocol and make routing decisions just as well as Opus. You reserve Opus for when it's genuinely needed — via `@heavy` delegation.
137
+
138
+ In your `opencode.json`:
45
139
  ```json
46
140
  {
47
- "plugin": [
48
- "opencode-model-router"
49
- ]
141
+ "model": "anthropic/claude-sonnet-4-5",
142
+ "autoshare": false
50
143
  }
51
144
  ```
52
145
 
53
- ### Option B: Local plugin clone
146
+ Then install and configure model-router to handle the rest.
54
147
 
55
- Clone directly into your OpenCode plugins directory:
148
+ ## Installation
56
149
 
150
+ ### From npm (recommended)
57
151
  ```bash
58
- cd ~/.config/opencode/plugin
59
- git clone https://github.com/marco-jardim/opencode-model-router.git
152
+ # In your opencode project or globally
153
+ npm install -g opencode-model-router
60
154
  ```
61
155
 
62
- Then add it to your `opencode.json`:
63
-
156
+ Add to `~/.config/opencode/opencode.json`:
64
157
  ```json
65
158
  {
66
- "plugin": [
67
- "./plugin/opencode-model-router"
68
- ]
159
+ "plugin": {
160
+ "opencode-model-router": {
161
+ "type": "npm",
162
+ "package": "opencode-model-router"
163
+ }
164
+ }
69
165
  }
70
166
  ```
71
167
 
72
- ### Option C: Reference from anywhere
73
-
74
- Clone wherever you want:
75
-
168
+ ### Local clone
76
169
  ```bash
77
- git clone https://github.com/marco-jardim/opencode-model-router.git /path/to/opencode-model-router
170
+ git clone https://github.com/your-username/opencode-model-router
171
+ cd opencode-model-router
172
+ npm install
78
173
  ```
79
174
 
80
- Then reference the absolute path in `opencode.json`:
81
-
175
+ In `~/.config/opencode/opencode.json`:
82
176
  ```json
83
177
  {
84
- "plugin": [
85
- "/path/to/opencode-model-router"
86
- ]
178
+ "plugin": {
179
+ "opencode-model-router": {
180
+ "type": "local",
181
+ "path": "/absolute/path/to/opencode-model-router"
182
+ }
183
+ }
87
184
  }
88
185
  ```
89
186
 
90
- Restart OpenCode after adding the plugin.
91
-
92
187
  ## Configuration
93
188
 
94
- All configuration lives in `tiers.json` at the plugin root. Edit it to match your available models and providers.
189
+ All configuration lives in `tiers.json` at the plugin root.
95
190
 
96
191
  ### Presets
97
192
 
98
- The plugin ships with four presets:
193
+ The plugin ships with four presets (switch with `/preset <name>`):
99
194
 
100
195
  **anthropic** (default):
101
- | Tier | Model | Notes |
102
- |------|-------|-------|
103
- | fast | `anthropic/claude-haiku-4-5` | Cheapest, fastest |
104
- | medium | `anthropic/claude-sonnet-4-5` | Extended thinking (variant: max) |
105
- | heavy | `anthropic/claude-opus-4-6` | Extended thinking (variant: max) |
196
+ | Tier | Model | Cost ratio |
197
+ |------|-------|-----------|
198
+ | @fast | `anthropic/claude-haiku-4-5` | 1x |
199
+ | @medium | `anthropic/claude-sonnet-4-5` (max) | 5x |
200
+ | @heavy | `anthropic/claude-opus-4-6` (max) | 20x |
106
201
 
107
202
  **openai**:
108
- | Tier | Model | Notes |
109
- |------|-------|-------|
110
- | fast | `openai/gpt-5.3-codex-spark` | Cheapest, fastest |
111
- | medium | `openai/gpt-5.3-codex` | Default settings (no variant/reasoning override) |
112
- | heavy | `openai/gpt-5.3-codex` | Variant: `xhigh` |
203
+ | Tier | Model | Cost ratio |
204
+ |------|-------|-----------|
205
+ | @fast | `openai/gpt-5.3-codex-spark` | 1x |
206
+ | @medium | `openai/gpt-5.3-codex` | 5x |
207
+ | @heavy | `openai/gpt-5.3-codex` (xhigh) | 20x |
113
208
 
114
209
  **github-copilot**:
115
- | Tier | Model | Notes |
116
- |------|-------|-------|
117
- | fast | `github-copilot/claude-haiku-4-5` | Cheapest, fastest |
118
- | medium | `github-copilot/claude-sonnet-4-5` | Balanced coding model |
119
- | heavy | `github-copilot/claude-opus-4-6` | Variant: `thinking` |
210
+ | Tier | Model | Cost ratio |
211
+ |------|-------|-----------|
212
+ | @fast | `github-copilot/claude-haiku-4-5` | 1x |
213
+ | @medium | `github-copilot/claude-sonnet-4-5` | 5x |
214
+ | @heavy | `github-copilot/claude-opus-4-6` (thinking) | 20x |
120
215
 
121
216
  **google**:
122
- | Tier | Model | Notes |
123
- |------|-------|-------|
124
- | fast | `google/gemini-2.5-flash` | Cheapest, fastest |
125
- | medium | `google/gemini-2.5-pro` | Balanced coding model |
126
- | heavy | `google/gemini-3-pro-preview` | Strongest reasoning in default set |
217
+ | Tier | Model | Cost ratio |
218
+ |------|-------|-----------|
219
+ | @fast | `google/gemini-2.5-flash` | 1x |
220
+ | @medium | `google/gemini-2.5-pro` | 5x |
221
+ | @heavy | `google/gemini-3-pro-preview` | 20x |
127
222
 
128
- Switch presets with the `/preset` command:
223
+ ### Routing modes
129
224
 
130
- ```
131
- /preset openai
225
+ Switch with `/budget <mode>`. Mode is persisted across restarts.
226
+
227
+ | Mode | Default tier | Behavior |
228
+ |------|-------------|----------|
229
+ | `normal` | @medium | Balanced — routes by task complexity |
230
+ | `budget` | @fast | Aggressive savings — defaults cheap, escalates only when necessary |
231
+ | `quality` | @medium | Quality-first — liberal use of @medium/@heavy |
232
+
233
+ ```json
234
+ {
235
+ "modes": {
236
+ "budget": {
237
+ "defaultTier": "fast",
238
+ "description": "Aggressive cost savings",
239
+ "overrideRules": [
240
+ "Default ALL tasks to @fast unless they clearly require code edits",
241
+ "Use @medium ONLY for: multi-file edits, complex refactors, test suites",
242
+ "Use @heavy ONLY when explicitly requested or after 2+ failed @medium attempts"
243
+ ]
244
+ }
245
+ }
246
+ }
132
247
  ```
133
248
 
134
- ### Creating custom presets
249
+ ### Task taxonomy (`taskPatterns`)
135
250
 
136
- Add a new preset to the `presets` object in `tiers.json`:
251
+ Keyword routing guide injected into the system prompt. Customize to match your workflow:
137
252
 
138
253
  ```json
139
254
  {
140
- "presets": {
141
- "my-preset": {
142
- "fast": {
143
- "model": "provider/model-name",
144
- "description": "What this tier does",
145
- "steps": 30,
146
- "prompt": "System prompt for the subagent",
147
- "whenToUse": ["Use case 1", "Use case 2"]
148
- },
149
- "medium": { ... },
150
- "heavy": { ... }
151
- }
255
+ "taskPatterns": {
256
+ "fast": ["search/grep/read", "git-info/ls", "lookup-docs/types", "count/exists-check/rename"],
257
+ "medium": ["impl-feature/refactor", "write-tests/bugfix(≤2)", "build-fix/create-file"],
258
+ "heavy": ["arch-design/debug(≥3fail)", "sec-audit/perf-opt", "migrate-strategy/rca"]
152
259
  }
153
260
  }
154
261
  ```
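For orientation, here is a sketch of how a `taskPatterns` object could be flattened into the compact `R:` line of the injected protocol. It follows the `buildTaskTaxonomy` change in `src/index.ts` from this release; `renderTaxonomy` and `TaskPatterns` are illustrative names, and the empty-result handling is a simplification.

```typescript
// Illustrative sketch modelled on buildTaskTaxonomy; names are hypothetical.
type TaskPatterns = Record<string, string[]>;

function renderTaxonomy(taskPatterns: TaskPatterns): string {
  const parts: string[] = ["R:"];
  for (const [tier, patterns] of Object.entries(taskPatterns)) {
    // Tiers with no patterns are skipped entirely.
    if (patterns.length > 0) {
      parts.push(`@${tier}→${patterns.join("/")}`);
    }
  }
  // Simplification: return an empty string when no tier contributed anything.
  return parts.length > 1 ? parts.join(" ") : "";
}

console.log(renderTaxonomy({ fast: ["search", "grep"], medium: ["refactor"], heavy: [] }));
// prints "R: @fast→search/grep @medium→refactor"
```

Because patterns are joined with `/`, an entry that itself contains a slash (such as `lookup-docs/types`) reads as two keywords in the prompt, which appears to be intentional in the shipped `tiers.json`.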
155
262
 
156
- Each tier supports these fields:
263
+ ### Cost ratios
157
264
 
158
- | Field | Type | Description |
159
- |-------|------|-------------|
160
- | `model` | string | Full model ID (`provider/model-name`) |
161
- | `variant` | string | Optional variant (e.g., `"max"` for extended thinking) |
162
- | `thinking` | object | Anthropic thinking config: `{ "budgetTokens": 10000 }` |
163
- | `reasoning` | object | OpenAI reasoning config: `{ "effort": "high", "summary": "detailed" }` |
164
- | `description` | string | Human-readable description shown in `/tiers` |
165
- | `steps` | number | Max agent turns (default: varies by tier) |
166
- | `prompt` | string | System prompt for the subagent |
167
- | `color` | string | Optional display color |
168
- | `whenToUse` | string[] | List of use cases (shown in delegation protocol) |
265
+ Set `costRatio` on each tier to reflect your real provider pricing. These are injected into the system prompt so the orchestrator makes cost-aware decisions:
266
+
267
+ ```json
268
+ {
269
+ "fast": { "costRatio": 1 },
270
+ "medium": { "costRatio": 5 },
271
+ "heavy": { "costRatio": 20 }
272
+ }
273
+ ```
274
+
275
+ Adjust to actual prices. Exact values don't matter; directional signals are enough.
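A sketch of how `costRatio` could surface in the prompt's tier summary, following the `@name=model/variant(costRatio)` format built in `buildDelegationProtocol` in this release. The `Tier` type is trimmed to the three fields used here and `renderTierLine` is a hypothetical name.

```typescript
// Illustrative sketch; the Tier type is reduced to the fields used here.
type Tier = { model: string; variant?: string; costRatio?: number };

function renderTierLine(tiers: Record<string, Tier>): string {
  return Object.entries(tiers)
    .map(([name, t]) => {
      // Drop the provider prefix: "anthropic/claude-haiku-4-5" -> "claude-haiku-4-5".
      const short = t.model.split("/").pop() ?? t.model;
      const variant = t.variant ? `/${t.variant}` : "";
      const cost = t.costRatio != null ? `(${t.costRatio}x)` : "";
      return `@${name}=${short}${variant}${cost}`;
    })
    .join(" ");
}

console.log(
  renderTierLine({
    fast: { model: "anthropic/claude-haiku-4-5", costRatio: 1 },
    medium: { model: "anthropic/claude-sonnet-4-5", variant: "max", costRatio: 5 },
    heavy: { model: "anthropic/claude-opus-4-6", variant: "max", costRatio: 20 },
  }),
);
// prints "@fast=claude-haiku-4-5(1x) @medium=claude-sonnet-4-5/max(5x) @heavy=claude-opus-4-6/max(20x)"
```

A tier without `costRatio` simply renders without the parenthesised suffix, so the orchestrator gets no cost signal for it.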
169
276
 
170
277
  ### Rules
171
278
 
172
- The `rules` array in `tiers.json` controls when delegation happens. These are injected into the system prompt verbatim:
279
+ Each entry in the `rules` array is numbered and joined onto a single line of the system prompt. Default ruleset:
173
280
 
174
281
  ```json
175
282
  {
176
283
  "rules": [
177
- "When a plan step contains [tier:fast], [tier:medium], or [tier:heavy], delegate to that agent",
178
- "Default to @medium for implementation tasks you could delegate",
179
- "Use @fast for any read-only exploration or research task",
180
- "Keep orchestration (planning, decisions, verification) for yourself -- delegate execution",
181
- "For trivial tasks (single grep, single file read), execute directly without delegation",
182
- "Never delegate to @heavy if you are already running on an opus-class model -- do it yourself"
284
+ "[tier:X]→delegate X",
285
+ "plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy",
286
+ "default:impl→@medium | readonly→@fast",
287
+ "orchestrate=self,delegate=exec",
288
+ "trivial(≤2tools)→direct,skip-delegate",
289
+ "self∈opus→never→@heavy,do-it-yourself",
290
+ "consult route-guide↑",
291
+ "min(cost,adequate-tier)"
183
292
  ]
184
293
  }
185
294
  ```
186
295
 
296
+ Rules in `modes[x].overrideRules` replace this array entirely for that mode.
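The override behaviour can be sketched as follows, based on the `effectiveRules` logic in `src/index.ts`: a mode with a non-empty `overrideRules` array wins outright, otherwise the global `rules` apply. The `Mode` type and the function name are illustrative.

```typescript
// Illustrative sketch of rule selection and compact numbering.
type Mode = { defaultTier: string; description: string; overrideRules?: string[] };

function effectiveRulesLine(rules: string[], mode?: Mode): string {
  const override = mode?.overrideRules ?? [];
  // Override rules replace the global rules entirely; they are not merged.
  const effective = override.length > 0 ? override : rules;
  return effective.map((rule, i) => `${i + 1}.${rule}`).join(" ");
}

const globalRules = ["[tier:X]→delegate X", "default:impl→@medium | readonly→@fast"];
const budget: Mode = {
  defaultTier: "fast",
  description: "Aggressive cost savings",
  overrideRules: ["Default ALL tasks to @fast unless they clearly require code edits"],
};

console.log(effectiveRulesLine(globalRules));
// prints "1.[tier:X]→delegate X 2.default:impl→@medium | readonly→@fast"
console.log(effectiveRulesLine(globalRules, budget));
// prints "1.Default ALL tasks to @fast unless they clearly require code edits"
```

One consequence worth noting: a mode that should keep a global rule has to repeat it in its own `overrideRules`.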
297
+
298
+ ### Tier fields reference
299
+
300
+ | Field | Type | Description |
301
+ |-------|------|-------------|
302
+ | `model` | string | Full model ID (`provider/model-name`) |
303
+ | `variant` | string | Optional variant (`"max"`, `"xhigh"`, `"thinking"`) |
304
+ | `costRatio` | number | Relative cost (1 = cheapest). Shown in prompt. |
305
+ | `thinking` | object | Anthropic thinking: `{ "budgetTokens": 10000 }` |
306
+ | `reasoning` | object | OpenAI reasoning: `{ "effort": "high", "summary": "detailed" }` |
307
+ | `description` | string | Shown in `/tiers` output |
308
+ | `steps` | number | Max agent turns |
309
+ | `prompt` | string | Subagent system prompt |
310
+ | `whenToUse` | string[] | Use cases (shown in `/tiers`, not in system prompt) |
311
+
312
+ ### Fallback
313
+
314
+ Defines provider fallback order when a delegated task fails:
315
+
316
+ ```json
317
+ {
318
+ "fallback": {
319
+ "global": {
320
+ "anthropic": ["openai", "google", "github-copilot"],
321
+ "openai": ["anthropic", "google", "github-copilot"]
322
+ }
323
+ }
324
+ }
325
+ ```
326
+
187
327
  ## Commands
188
328
 
189
329
  | Command | Description |
190
330
  |---------|-------------|
191
- | `/tiers` | Show active tier configuration and delegation rules |
331
+ | `/tiers` | Show active tier configuration, models, and rules |
192
332
  | `/preset` | List available presets |
193
- | `/preset <name>` | Switch to a different preset |
333
+ | `/preset <name>` | Switch preset (e.g., `/preset openai`) |
334
+ | `/budget` | Show available modes and which is active |
335
+ | `/budget <mode>` | Switch routing mode (`normal`, `budget`, `quality`) |
194
336
  | `/annotate-plan [path]` | Annotate a plan file with `[tier:X]` tags for each step |
195
337
 
196
338
  ## Plan annotation
197
339
 
198
- The `/annotate-plan` command reads a plan file (defaults to `PLAN.md`) and adds tier tags to each step based on complexity:
340
+ For complex tasks, you can write a plan file and annotate each step with the correct tier. The `/annotate-plan` command reads the plan and adds `[tier:fast]`, `[tier:medium]`, or `[tier:heavy]` tags to each step based on the task taxonomy.
199
341
 
200
- **Before:**
201
- ```markdown
202
- ## Steps
203
- 1. Search the codebase for all authentication handlers
204
- 2. Implement the new OAuth2 flow
205
- 3. Review the auth architecture for security vulnerabilities
206
- ```
342
+ The orchestrator then reads these tags and delegates accordingly — removing ambiguity from routing decisions on long, multi-step tasks.
207
343
 
208
- **After:**
344
+ Example plan (before annotation):
209
345
  ```markdown
210
- ## Steps
211
- 1. [tier:fast] Search the codebase for all authentication handlers
212
- 2. [tier:medium] Implement the new OAuth2 flow
213
- 3. [tier:heavy] Review the auth architecture for security vulnerabilities
214
- ```
215
-
216
- When the agent executes an annotated plan, it delegates each step to the appropriate subagent automatically.
217
-
218
- ## Provider setup
219
-
220
- The models referenced in your preset must be configured in your `opencode.json` providers. For example, to use the default `anthropic` preset:
221
-
222
- ```json
223
- {
224
- "provider": {
225
- "anthropic": {
226
- "models": {
227
- "claude-haiku-4-5": {},
228
- "claude-sonnet-4-5": {},
229
- "claude-opus-4-6": { "thinking": true }
230
- }
231
- }
232
- }
233
- }
346
+ 1. Find all API endpoints in the codebase
347
+ 2. Add rate limiting middleware to each endpoint
348
+ 3. Write integration tests for rate limiting
349
+ 4. Design a token bucket algorithm for advanced rate limiting
234
350
  ```
235
351
 
236
- ## How delegation looks in practice
237
-
352
+ After `/annotate-plan`:
353
+ ```markdown
354
+ 1. [tier:fast] Find all API endpoints in the codebase
355
+ 2. [tier:medium] Add rate limiting middleware to each endpoint
356
+ 3. [tier:medium] Write integration tests for rate limiting
357
+ 4. [tier:heavy] Design a token bucket algorithm for advanced rate limiting
238
358
  ```
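To illustrate what an annotated plan gives the orchestrator, here is a small sketch that reads `[tier:X]` tags out of numbered plan steps. It is not how the plugin or `/annotate-plan` is implemented (routing happens in the model via the injected prompt); `parsePlan`, `PlanStep`, and the regular expression are hypothetical.

```typescript
// Hypothetical parser for annotated plan steps; for illustration only.
type TierName = "fast" | "medium" | "heavy";
type PlanStep = { tier: TierName | null; text: string };

function parsePlan(markdown: string): PlanStep[] {
  const steps: PlanStep[] = [];
  for (const line of markdown.split("\n")) {
    // Matches "N. [tier:X] text" or, for unannotated steps, "N. text".
    const m = line.match(/^\s*\d+\.\s+(?:\[tier:(fast|medium|heavy)\]\s+)?(.+)$/);
    if (!m) continue;
    const tier = (m[1] as TierName | undefined) ?? null;
    steps.push({ tier, text: (m[2] ?? "").trim() });
  }
  return steps;
}

const plan = [
  "1. [tier:fast] Find all API endpoints in the codebase",
  "2. [tier:medium] Add rate limiting middleware to each endpoint",
  "3. [tier:medium] Write integration tests for rate limiting",
  "4. [tier:heavy] Design a token bucket algorithm for advanced rate limiting",
].join("\n");

for (const step of parsePlan(plan)) {
  console.log(`${step.tier ?? "unrouted"}: ${step.text}`);
}
```

Steps without a recognised tag come back with `tier: null`, which corresponds to the case where the orchestrator has to fall back on the task taxonomy.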
239
- You: Implement the user settings page based on PLAN.md
240
359
 
241
- Agent (Opus): I'll execute the plan. Step 1 is tagged [tier:fast] -- delegating exploration.
242
- -> Task(@fast): "Find all existing settings-related components and their patterns"
243
- <- @fast returns: Found SettingsLayout at src/components/..., uses React Hook Form...
360
+ ## Token overhead
244
361
 
245
- Agent (Opus): Step 2 is [tier:medium] -- delegating implementation.
246
- -> Task(@medium): "Implement UserSettingsForm component following the patterns from..."
247
- <- @medium returns: Created src/components/UserSettingsForm.tsx, added tests...
362
+ The system prompt injection is ~210 tokens per message, roughly the same as v1.0.7 (before cost-aware features were added). Dense notation keeps overhead flat while adding full routing intelligence.
248
363
 
249
- Agent (Opus): Step 3 is [tier:heavy] -- this is an architecture review and I'm already on Opus, so I'll handle it myself.
250
- [Reviews the implementation directly]
251
- ```
364
+ | Version | Tokens | Features |
365
+ |---------|--------|----------|
366
+ | v1.0.7 | ~208 | Basic tier routing |
367
+ | v1.1.0 | ~870 | All features, verbose format |
368
+ | v1.1.1+ | ~210 | All features, compressed format |
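The 1.1.0 README described its token estimate as `prompt_characters / 4`, a rough heuristic. A sketch of that check, assuming the same heuristic still applies to the figures above (`estimateTokens` is a hypothetical helper, and real counts vary by tokenizer):

```typescript
// Rough heuristic only: about 4 characters per token. Real tokenizers will differ.
function estimateTokens(prompt: string): number {
  // String length counts UTF-16 code units, so "→" counts as one character.
  return Math.ceil(prompt.length / 4);
}

const sample = "Multi-phase: split explore(@fast)→execute(@medium). Cheapest-first.";
console.log(estimateTokens(sample));
```

Running this over the full injected protocol after editing `tiers.json` gives a quick sense of whether custom rules or patterns have pushed the overhead well past the ~210 mark.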
252
369
 
253
370
  ## Requirements
254
371
 
255
- - OpenCode v1.0+ with plugin support
256
- - Models configured in your `opencode.json` providers matching your preset
372
+ - [OpenCode](https://opencode.ai) v1.0 or later
373
+ - Node.js 18+
374
+ - Provider API keys configured in OpenCode
257
375
 
258
376
  ## License
259
377
 
260
- [GPL-3.0](LICENSE)
378
+ GPL-3.0
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "opencode-model-router",
3
- "version": "1.1.0",
3
+ "version": "1.1.2",
4
4
  "description": "OpenCode plugin that routes tasks to tiered subagents (fast/medium/heavy) based on complexity",
5
5
  "type": "module",
6
6
  "main": "./src/index.ts",
package/src/index.ts CHANGED
@@ -310,23 +310,16 @@ function buildFallbackInstructions(cfg: RouterConfig): string {
310
310
  const map = presetMap && Object.keys(presetMap).length > 0 ? presetMap : fb.global;
311
311
  if (!map) return "";
312
312
 
313
- const providerLines = Object.entries(map).flatMap(([provider, presetOrder]) => {
313
+ const chains = Object.entries(map).flatMap(([provider, presetOrder]) => {
314
314
  if (!Array.isArray(presetOrder)) return [];
315
- const validOrder = presetOrder.filter(
316
- (preset) => preset !== cfg.activePreset && Boolean(cfg.presets[preset]),
315
+ const valid = presetOrder.filter(
316
+ (p) => p !== cfg.activePreset && Boolean(cfg.presets[p]),
317
317
  );
318
- return validOrder.length > 0 ? [`- ${provider}: ${validOrder.join(" -> ")}`] : [];
318
+ return valid.length > 0 ? [`${provider}→${valid.join("→")}`] : [];
319
319
  });
320
320
 
321
- if (providerLines.length === 0) return "";
322
-
323
- return [
324
- "Fallback on delegated task errors:",
325
- "1. If Task(...) returns provider/model/rate-limit/timeout/auth errors, retry once with a different tier suited to the same task.",
326
- "2. If retry also fails, stop delegating that task and complete it directly in the primary agent.",
327
- "3. Use the failing model prefix and this preset fallback order for next-run recovery (`/preset <name>` + restart):",
328
- ...providerLines,
329
- ].join("\n");
321
+ if (chains.length === 0) return "";
322
+ return `Err→retry-alt-tier→fail→direct. Chain: ${chains.join(" | ")}`;
330
323
  }
331
324
 
332
325
  // ---------------------------------------------------------------------------
@@ -335,25 +328,39 @@ function buildFallbackInstructions(cfg: RouterConfig): string {
335
328
 
336
329
  function buildTaskTaxonomy(cfg: RouterConfig): string {
337
330
  if (!cfg.taskPatterns || Object.keys(cfg.taskPatterns).length === 0) return "";
338
-
339
- const lines = ["Coding task routing guide:"];
331
+ const lines = ["R:"];
340
332
  for (const [tier, patterns] of Object.entries(cfg.taskPatterns)) {
341
333
  if (Array.isArray(patterns) && patterns.length > 0) {
342
- lines.push(`- @${tier}: ${patterns.join(", ")}`);
334
+ lines.push(`@${tier}→${patterns.join("/")}`);
343
335
  }
344
336
  }
345
- return lines.join("\n");
337
+ return lines.join(" ");
346
338
  }
347
339
 
348
- function buildCostAwareness(cfg: RouterConfig): string {
349
- const tiers = getActiveTiers(cfg);
350
- const costs = Object.entries(tiers)
351
- .filter(([_, t]) => t.costRatio != null)
352
- .map(([name, t]) => `@${name}=${t.costRatio}x`)
353
- .join(", ");
340
+ /**
341
+ * Injects a multi-phase decomposition hint into the delegation protocol.
342
+ * Teaches the orchestrator to split composite tasks (explore + implement)
343
+ * so the cheap @fast tier handles exploration and @medium handles execution.
344
+ * Only active in normal mode — budget/quality modes have their own override rules.
345
+ */
346
+ function buildDecomposeHint(cfg: RouterConfig): string {
347
+ const mode = getActiveMode(cfg);
348
+ // Budget and quality modes handle this via overrideRules — skip to avoid conflicts
349
+ if (mode?.overrideRules?.length) return "";
354
350
 
355
- if (!costs) return "";
356
- return `Cost ratios: ${costs}. Always use the cheapest tier that can reliably handle the task.`;
351
+ const tiers = getActiveTiers(cfg);
352
+ const entries = Object.entries(tiers);
353
+ if (entries.length < 2) return "";
354
+
355
+ // Sort by costRatio ascending to find cheapest (explore) and next (execute) tiers
356
+ const sorted = [...entries].sort(
357
+ ([, a], [, b]) => (a.costRatio ?? 1) - (b.costRatio ?? 1)
358
+ );
359
+ const cheapest = sorted[0]?.[0];
360
+ const mid = sorted[1]?.[0];
361
+ if (!cheapest || !mid) return "";
362
+
363
+ return `Multi-phase: split explore(@${cheapest})→execute(@${mid}). Cheapest-first.`;
357
364
  }
358
365
 
359
366
  // ---------------------------------------------------------------------------
@@ -363,51 +370,34 @@ function buildCostAwareness(cfg: RouterConfig): string {
363
370
  function buildDelegationProtocol(cfg: RouterConfig): string {
364
371
  const tiers = getActiveTiers(cfg);
365
372
 
366
- const tierSummary = Object.entries(tiers)
373
+ // Compact tier summary: @name=model/variant(costRatio)
374
+ const tierLine = Object.entries(tiers)
367
375
  .map(([name, t]) => {
368
- const shortModel = t.model.split("/").pop() ?? t.model;
369
- const variant = t.variant ? ` (${t.variant})` : "";
370
- return `@${name}=${shortModel}${variant}`;
376
+ const short = t.model.split("/").pop() ?? t.model;
377
+ const v = t.variant ? `/${t.variant}` : "";
378
+ const c = t.costRatio != null ? `(${t.costRatio}x)` : "";
379
+ return `@${name}=${short}${v}${c}`;
371
380
  })
372
- .join(" | ");
381
+ .join(" ");
373
382
 
374
- // Build per-tier whenToUse descriptions so the agent knows when to pick each tier
375
- const tierDescriptions = Object.entries(tiers)
376
- .map(([name, t]) => {
377
- const uses = t.whenToUse.length > 0 ? t.whenToUse.join(", ") : t.description;
378
- return `- @${name}: ${uses}`;
379
- })
380
- .join("\n");
383
+ const mode = getActiveMode(cfg);
384
+ const modeSuffix = cfg.activeMode ? ` mode:${cfg.activeMode}` : "";
381
385
 
382
- // Task taxonomy from config
383
386
  const taxonomy = buildTaskTaxonomy(cfg);
387
+ const decompose = buildDecomposeHint(cfg);
384
388
 
385
- // Cost awareness
386
- const costLine = buildCostAwareness(cfg);
387
-
388
- // Mode-aware rules: if active mode has overrideRules, use those; otherwise use global rules
389
- const mode = getActiveMode(cfg);
390
389
  const effectiveRules = mode?.overrideRules?.length ? mode.overrideRules : cfg.rules;
391
- const numberedRules = effectiveRules
392
- .map((rule, i) => `${i + 1}. ${rule}`)
393
- .join("\n");
390
+ const rulesLine = effectiveRules.map((r, i) => `${i + 1}.${r}`).join(" ");
394
391
 
395
- const fallbackInstructions = buildFallbackInstructions(cfg);
392
+ const fallback = buildFallbackInstructions(cfg);
396
393
 
397
394
  return [
398
- "## Model Delegation Protocol",
399
- `Preset: ${cfg.activePreset}. Tiers: ${tierSummary}.`,
400
- "",
401
- "Tier capabilities:",
402
- tierDescriptions,
403
- ...(taxonomy ? ["", taxonomy] : []),
404
- ...(costLine ? ["", costLine] : []),
405
- ...(mode ? [`\nActive mode: ${cfg.activeMode} (${mode.description})`] : []),
406
- "",
407
- "Apply to every user message (plan and ad-hoc):",
408
- numberedRules,
409
- ...(fallbackInstructions ? ["", fallbackInstructions] : []),
410
- "",
395
+ `## Model Delegation Protocol`,
396
+ `Preset: ${cfg.activePreset}. Tiers: ${tierLine}.${modeSuffix}`,
397
+ ...(taxonomy ? [taxonomy] : []),
398
+ ...(decompose ? [decompose] : []),
399
+ rulesLine,
400
+ ...(fallback ? [fallback] : []),
411
401
  `Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").`,
412
402
  "Keep orchestration and final synthesis in the primary agent.",
413
403
  ].join("\n");
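Note on the change above: in 1.1.2 the numbered rules are no longer emitted one per line but joined into a single space-separated line, and an active mode's `overrideRules` still take precedence over the global `rules`. The following is a minimal sketch of that selection and joining step, not the package's actual source. The `RouterConfig` and `ModeConfig` shapes, the body of `getActiveMode`, the `buildRulesLine` wrapper, and the sample config values are assumptions made for illustration; only the two expressions inside `buildRulesLine` are taken from the diff.

```typescript
// Hypothetical, simplified config shapes; the real types in src/index.ts
// are not visible in this diff.
interface ModeConfig {
  description: string;
  overrideRules?: string[];
}

interface RouterConfig {
  activeMode?: string;
  modes: Record<string, ModeConfig>;
  rules: string[];
}

// Assumed behaviour of getActiveMode: look up the active mode by name, if any.
function getActiveMode(cfg: RouterConfig): ModeConfig | undefined {
  return cfg.activeMode ? cfg.modes[cfg.activeMode] : undefined;
}

// Hypothetical wrapper around the two expressions shown in the 1.1.2 diff:
// non-empty mode override rules win over the global rules, and the result
// is numbered and joined into one space-separated line.
function buildRulesLine(cfg: RouterConfig): string {
  const mode = getActiveMode(cfg);
  const effectiveRules = mode?.overrideRules?.length ? mode.overrideRules : cfg.rules;
  return effectiveRules.map((r, i) => `${i + 1}.${r}`).join(" ");
}

// Illustrative config only; the descriptions and rule subsets are sample values.
const cfg: RouterConfig = {
  activeMode: "quality",
  modes: {
    normal: { description: "Balanced" },
    quality: {
      description: "Quality-first",
      overrideRules: ["default→@medium", "prefer thoroughness over speed"],
    },
  },
  rules: ["orchestrate=self,delegate=exec"],
};

console.log(buildRulesLine(cfg));
// prints: 1.default→@medium 2.prefer thoroughness over speed
```

With `activeMode` set to a mode that has no `overrideRules` (here `normal`), the same function falls back to the global `rules` array.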
package/tiers.json CHANGED
@@ -174,39 +174,9 @@
174
174
  }
175
175
  },
176
176
  "taskPatterns": {
177
- "fast": [
178
- "Find, search, locate, or grep files and code patterns",
179
- "List or show directory structure and file contents",
180
- "Read or display specific files or sections",
181
- "Check git status, log, diff, or blame",
182
- "Lookup documentation, API signatures, or type definitions",
183
- "Count occurrences, lines, or matches",
184
- "Check if a file, function, or class exists",
185
- "Simple rename or string replacement across files"
186
- ],
187
- "medium": [
188
- "Implement a new feature, function, or component",
189
- "Refactor or restructure existing code",
190
- "Write or update tests",
191
- "Fix a bug (first or second attempt)",
192
- "Modify or update existing code logic",
193
- "Code review with suggested changes",
194
- "Run build/lint/test and fix resulting errors",
195
- "Create a new file from a template or pattern",
196
- "Database migration or schema changes",
197
- "API endpoint implementation",
198
- "Configuration or dependency updates"
199
- ],
200
- "heavy": [
201
- "Design system or module architecture from scratch",
202
- "Debug a problem after 2+ failed attempts",
203
- "Security audit or vulnerability review",
204
- "Performance profiling and optimization",
205
- "Migration strategy (framework, language, infrastructure)",
206
- "Complex multi-system integration design",
207
- "Evaluate tradeoffs between competing approaches",
208
- "Root cause analysis of complex or elusive failures"
209
- ]
177
+ "fast": ["search", "grep", "read", "git-info", "ls", "lookup-docs/types", "count", "exists-check", "rename"],
178
+ "medium": ["impl-feature", "refactor", "write-tests", "bugfix(≤2)", "edit-logic", "code-review", "build-fix", "create-file", "db-migrate", "api-endpoint", "config-update"],
179
+ "heavy": ["arch-design", "debug(≥3fail)", "sec-audit", "perf-opt", "migrate-strategy", "multi-system-integration", "tradeoff-analysis", "rca"]
210
180
  },
211
181
  "modes": {
212
182
  "normal": {
@@ -217,22 +187,22 @@
217
187
  "defaultTier": "fast",
218
188
  "description": "Aggressive cost savings — defaults to cheapest tier, escalates only when needed",
219
189
  "overrideRules": [
220
- "Default ALL tasks to @fast unless they clearly require code edits or complex reasoning",
221
- "Use @medium ONLY for: multi-file edits, complex refactors, test suites, or build-fix cycles",
222
- "Use @heavy ONLY when explicitly requested by user or after 2+ failed @medium attempts",
223
- "Prefer executing simple tasks directly (grep, read, glob) over delegating — zero delegation overhead",
224
- "Batch multiple related searches into a single @fast delegation instead of multiple calls",
225
- "When uncertain between @fast and @medium, choose @fast — escalate only on failure"
190
+ "default→@fast unless edits/complex-reasoning needed",
191
+ "@medium ONLY: multi-file-edit/refactor/test-suite/build-fix",
192
+ "@heavy ONLY: user-requested OR 2 @medium failures",
193
+ "trivial(grep/read/glob)→direct,no-delegate",
194
+ "batch related searches→single @fast",
195
+ "uncertain @fast vs @medium→@fast,escalate on fail"
226
196
  ]
227
197
  },
228
198
  "quality": {
229
199
  "defaultTier": "medium",
230
200
  "description": "Quality-first — uses stronger models more liberally for better results",
231
201
  "overrideRules": [
232
- "Default to @medium for all tasks including exploration when deep context understanding matters",
233
- "Use @heavy for any task involving architecture, debugging, security, or multi-file coordination",
234
- "Use @fast only for trivial single-tool operations (one grep, one file read)",
235
- "Prefer thoroughness over speed — better to over-qualify a task than under-qualify it"
202
+ "default→@medium incl exploration when deep-context matters",
203
+ "@heavy: arch/debug/security/multi-file-coord",
204
+ "@fast ONLY: trivial single-tool ops (1 grep/1 read)",
205
+ "prefer thoroughness over speed"
236
206
  ]
237
207
  }
238
208
  },
@@ -245,18 +215,14 @@
245
215
  }
246
216
  },
247
217
  "rules": [
248
- "When a plan step contains [tier:fast], [tier:medium], or [tier:heavy], delegate to that agent",
249
- "When a plan says 'use a fast/cheap model' -> delegate to @fast",
250
- "When a plan says 'use a medium/balanced model' -> delegate to @medium",
251
- "When a plan says 'use a heavy/powerful model' -> delegate to @heavy",
252
- "Default to @medium for implementation tasks you could delegate",
253
- "Use @fast for any read-only exploration or research task",
254
- "Keep orchestration (planning, decisions, verification) for yourself - delegate execution",
255
- "For trivial tasks (single grep, single file read), execute directly without delegation",
256
- "Never delegate to @heavy if you are already running on an opus-class model - do it yourself",
257
- "If a task takes 1-2 tool calls, execute directly — delegation overhead is not worth the cost",
258
- "Consult the task routing guide below to match task type to the correct tier",
259
- "Consider cost ratios when choosing tiers — always use the cheapest tier that can reliably handle the task"
218
+ "[tier:X]→delegate X",
219
+ "plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy",
220
+ "default:impl→@medium | readonly→@fast",
221
+ "orchestrate=self,delegate=exec",
222
+ "trivial(≤2tools)→direct,skip-delegate",
223
+ "self∈opus→never→@heavy,do-it-yourself",
224
+ "consult route-guide↑",
225
+ "min(cost,adequate-tier)"
260
226
  ],
261
227
  "defaultTier": "medium"
262
228
  }
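Note on the change above: `taskPatterns` moves from full sentences to short tags per tier, which keeps the injected prompt small. As a rough illustration of how such tags could be consumed, the sketch below renders one line per tier and does a reverse lookup from a tag to its tier. Both `renderTaxonomy` and `tierForTag` are hypothetical helpers written for this note. They are not the plugin's `buildTaskTaxonomy`, whose implementation is not shown in this diff, and the `@tier→a/b/c` output format is an assumption.

```typescript
type Tier = "fast" | "medium" | "heavy";

// A subset of the compact tags from tiers.json 1.1.2.
const taskPatterns: Record<Tier, string[]> = {
  fast: ["search", "grep", "read"],
  medium: ["impl-feature", "refactor", "write-tests"],
  heavy: ["arch-design", "sec-audit", "rca"],
};

// Hypothetical renderer: one "@tier→tag/tag/tag" line per tier.
// The output format is an assumption, not the plugin's actual prompt text.
function renderTaxonomy(patterns: Record<Tier, string[]>): string {
  return (Object.keys(patterns) as Tier[])
    .map((tier) => `@${tier}→${patterns[tier].join("/")}`)
    .join("\n");
}

// Hypothetical reverse lookup: which tier lists this tag, if any.
function tierForTag(patterns: Record<Tier, string[]>, tag: string): Tier | undefined {
  return (Object.keys(patterns) as Tier[]).find((tier) => patterns[tier].includes(tag));
}

console.log(renderTaxonomy(taskPatterns));
console.log(tierForTag(taskPatterns, "sec-audit")); // prints: heavy
```

A tag that appears in no tier yields `undefined`, at which point a caller could fall back to the configured `defaultTier`.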