npm - opencode-model-router - Versions diffs - 1.1.0 → 1.1.1 - Mend

opencode-model-router 1.1.0 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md CHANGED

@@ -6,11 +6,11 @@ An [OpenCode](https://opencode.ai) plugin that automatically routes tasks to tie
 The plugin injects a **delegation protocol** into the system prompt that teaches the primary agent to route work:
-| Tier | Default (Anthropic) | Purpose |
-|------|---------------------|---------|
-| `@fast` | Claude Haiku 4.5 | Exploration, search, file reads, grep |
-| `@medium` | Claude Sonnet 4.5 | Implementation, refactoring, tests, bug fixes |
-| `@heavy` | Claude Opus 4.6 | Architecture, complex debugging, security review |
+| Tier | Default (Anthropic) | Cost | Purpose |
+|------|---------------------|------|---------|
+| `@fast` | Claude Haiku 4.5 | 1x | Exploration, search, file reads, grep |
+| `@medium` | Claude Sonnet 4.5 | 5x | Implementation, refactoring, tests, bug fixes |
+| `@heavy` | Claude Opus 4.6 | 20x | Architecture, complex debugging, security review |
 The agent automatically delegates via the Task tool when it recognizes the task complexity, or when plan steps are annotated with `[tier:fast]`, `[tier:medium]`, or `[tier:heavy]` tags.
@@ -98,32 +98,32 @@ All configuration lives in `tiers.json` at the plugin root. Edit it to match you
 The plugin ships with four presets:
 **anthropic** (default):
-| Tier | Model | Notes |
-|------|-------|-------|
-| fast | `anthropic/claude-haiku-4-5` | Cheapest, fastest |
-| medium | `anthropic/claude-sonnet-4-5` | Extended thinking (variant: max) |
-| heavy | `anthropic/claude-opus-4-6` | Extended thinking (variant: max) |
+| Tier | Model | Cost | Notes |
+|------|-------|------|-------|
+| fast | `anthropic/claude-haiku-4-5` | 1x | Cheapest, fastest |
+| medium | `anthropic/claude-sonnet-4-5` | 5x | Extended thinking (variant: max) |
+| heavy | `anthropic/claude-opus-4-6` | 20x | Extended thinking (variant: max) |
 **openai**:
-| Tier | Model | Notes |
-|------|-------|-------|
-| fast | `openai/gpt-5.3-codex-spark` | Cheapest, fastest |
-| medium | `openai/gpt-5.3-codex` | Default settings (no variant/reasoning override) |
-| heavy | `openai/gpt-5.3-codex` | Variant: `xhigh` |
+| Tier | Model | Cost | Notes |
+|------|-------|------|-------|
+| fast | `openai/gpt-5.3-codex-spark` | 1x | Cheapest, fastest |
+| medium | `openai/gpt-5.3-codex` | 5x | Default settings (no variant/reasoning override) |
+| heavy | `openai/gpt-5.3-codex` | 20x | Variant: `xhigh` |
 **github-copilot**:
-| Tier | Model | Notes |
-|------|-------|-------|
-| fast | `github-copilot/claude-haiku-4-5` | Cheapest, fastest |
-| medium | `github-copilot/claude-sonnet-4-5` | Balanced coding model |
-| heavy | `github-copilot/claude-opus-4-6` | Variant: `thinking` |
+| Tier | Model | Cost | Notes |
+|------|-------|------|-------|
+| fast | `github-copilot/claude-haiku-4-5` | 1x | Cheapest, fastest |
+| medium | `github-copilot/claude-sonnet-4-5` | 5x | Balanced coding model |
+| heavy | `github-copilot/claude-opus-4-6` | 20x | Variant: `thinking` |
 **google**:
-| Tier | Model | Notes |
-|------|-------|-------|
-| fast | `google/gemini-2.5-flash` | Cheapest, fastest |
-| medium | `google/gemini-2.5-pro` | Balanced coding model |
-| heavy | `google/gemini-3-pro-preview` | Strongest reasoning in default set |
+| Tier | Model | Cost | Notes |
+|------|-------|------|-------|
+| fast | `google/gemini-2.5-flash` | 1x | Cheapest, fastest |
+| medium | `google/gemini-2.5-pro` | 5x | Balanced coding model |
+| heavy | `google/gemini-3-pro-preview` | 20x | Strongest reasoning in default set |
 Switch presets with the `/preset` command:
@@ -141,13 +141,14 @@ Add a new preset to the `presets` object in `tiers.json`:
     "my-preset": {
       "fast": {
         "model": "provider/model-name",
+        "costRatio": 1,
         "description": "What this tier does",
         "steps": 30,
         "prompt": "System prompt for the subagent",
         "whenToUse": ["Use case 1", "Use case 2"]
       },
-      "medium": { ... },
-      "heavy": { ... }
+      "medium": { "costRatio": 5, "..." : "..." },
+      "heavy": { "costRatio": 20, "..." : "..." }
     }
   }
 }
@@ -159,6 +160,7 @@ Each tier supports these fields:
 |-------|------|-------------|
 | `model` | string | Full model ID (`provider/model-name`) |
 | `variant` | string | Optional variant (e.g., `"max"` for extended thinking) |
+| `costRatio` | number | Relative cost multiplier (e.g., 1 for cheapest, 20 for most expensive). Injected into the system prompt so the agent considers cost when delegating. |
 | `thinking` | object | Anthropic thinking config: `{ "budgetTokens": 10000 }` |
 | `reasoning` | object | OpenAI reasoning config: `{ "effort": "high", "summary": "detailed" }` |
 | `description` | string | Human-readable description shown in `/tiers` |
@@ -167,6 +169,91 @@ Each tier supports these fields:
 | `color` | string | Optional display color |
 | `whenToUse` | string[] | List of use cases (shown in delegation protocol) |
+### Routing modes
+The plugin supports three routing modes that control how aggressively the agent delegates to cheaper tiers. Switch modes with the `/budget` command:
+| Mode | Default Tier | Behavior |
+|------|-------------|----------|
+| `normal` | `@medium` | Balanced quality and cost — delegates based on task complexity |
+| `budget` | `@fast` | Aggressive cost savings — defaults to cheapest tier, escalates only when needed |
+| `quality` | `@medium` | Quality-first — uses stronger models more liberally for better results |
+When a mode has `overrideRules`, those replace the global `rules` array in the system prompt. This lets each mode have fundamentally different delegation behavior.
+Configure modes in `tiers.json`:
+```json
+{
+  "modes": {
+    "normal": {
+      "defaultTier": "medium",
+      "description": "Balanced quality and cost"
+    },
+    "budget": {
+      "defaultTier": "fast",
+      "description": "Aggressive cost savings",
+      "overrideRules": [
+        "Default ALL tasks to @fast unless they clearly require code edits",
+        "Use @medium ONLY for: multi-file edits, complex refactors, test suites",
+        "Use @heavy ONLY when explicitly requested or after 2+ failed @medium attempts"
+      ]
+    },
+    "quality": {
+      "defaultTier": "medium",
+      "description": "Quality-first",
+      "overrideRules": [
+        "Default to @medium for all tasks including exploration",
+        "Use @heavy for architecture, debugging, security, or multi-file coordination",
+        "Use @fast only for trivial single-tool operations"
+      ]
+    }
+  }
+}
+```
+The active mode is persisted in `~/.config/opencode/opencode-model-router.state.json` and survives restarts.
+### Task taxonomy
+The `taskPatterns` object maps common coding task descriptions to tiers. This is injected into the system prompt as a routing guide so the agent can quickly look up which tier to use:
+```json
+{
+  "taskPatterns": {
+    "fast": [
+      "Find, search, locate, or grep files and code patterns",
+      "Read or display specific files or sections",
+      "Check git status, log, diff, or blame"
+    ],
+    "medium": [
+      "Implement a new feature, function, or component",
+      "Refactor or restructure existing code",
+      "Write or update tests",
+      "Fix a bug (first or second attempt)"
+    ],
+    "heavy": [
+      "Design system or module architecture from scratch",
+      "Debug a problem after 2+ failed attempts",
+      "Security audit or vulnerability review"
+    ]
+  }
+}
+```
+Customize these patterns to match your workflow. The agent uses them as heuristics, not hard rules.
+### Cost ratios
+Each tier's `costRatio` is injected into the system prompt so the agent is aware of relative costs:
+```
+Cost ratios: @fast=1x, @medium=5x, @heavy=20x.
+Always use the cheapest tier that can reliably handle the task.
+```
+Adjust `costRatio` values in each tier to reflect your actual provider pricing. The ratios don't need to be exact — they're directional signals for the agent.
 ### Rules
 The `rules` array in `tiers.json` controls when delegation happens. These are injected into the system prompt verbatim:
@@ -179,11 +266,33 @@ The `rules` array in `tiers.json` controls when delegation happens. These are in
     "Use @fast for any read-only exploration or research task",
     "Keep orchestration (planning, decisions, verification) for yourself -- delegate execution",
     "For trivial tasks (single grep, single file read), execute directly without delegation",
-    "Never delegate to @heavy if you are already running on an opus-class model -- do it yourself"
+    "Never delegate to @heavy if you are already running on an opus-class model -- do it yourself",
+    "If a task takes 1-2 tool calls, execute directly -- delegation overhead is not worth the cost",
+    "Consult the task routing guide below to match task type to the correct tier",
+    "Consider cost ratios when choosing tiers -- always use the cheapest tier that can reliably handle the task"
   ]
 }
 ```
+When a routing mode has `overrideRules`, those replace this array entirely for that mode.
+### Fallback
+The `fallback` section defines which presets to try when a provider fails:
+```json
+{
+  "fallback": {
+    "global": {
+      "anthropic": ["openai", "google", "github-copilot"],
+      "openai": ["anthropic", "google", "github-copilot"]
+    }
+  }
+}
+```
+When a delegated task fails with a provider/model/rate-limit error, the agent is instructed to retry with the next preset in the fallback chain.
 ## Commands
 | Command | Description |
@@ -191,6 +300,8 @@ The `rules` array in `tiers.json` controls when delegation happens. These are in
 | `/tiers` | Show active tier configuration and delegation rules |
 | `/preset` | List available presets |
 | `/preset <name>` | Switch to a different preset |
+| `/budget` | Show available routing modes and which is active |
+| `/budget <mode>` | Switch routing mode (`normal`, `budget`, or `quality`) |
 | `/annotate-plan [path]` | Annotate a plan file with `[tier:X]` tags for each step |
 ## Plan annotation
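
Note on the README hunks above: the new "Routing modes" and "Rules" sections both state that a mode's `overrideRules` replace the global `rules` array. A minimal sketch of that resolution follows, modelled on the `effectiveRules` expression in `src/index.ts`. The `Mode` and `Config` shapes and the `resolveRules` name are hypothetical and trimmed down for illustration; the real `RouterConfig` has more fields.

```typescript
// Hypothetical, trimmed-down shapes; the real RouterConfig has more fields.
interface Mode {
  defaultTier: string;
  description: string;
  overrideRules?: string[];
}

interface Config {
  rules: string[];
  modes: Record<string, Mode>;
  activeMode?: string;
}

// A non-empty overrideRules array replaces the global rules entirely;
// a missing or empty array falls back to the global rules.
function resolveRules(cfg: Config): string[] {
  const mode = cfg.activeMode ? cfg.modes[cfg.activeMode] : undefined;
  return mode?.overrideRules?.length ? mode.overrideRules : cfg.rules;
}

const cfg: Config = {
  rules: ["Use @fast for any read-only exploration or research task"],
  modes: {
    normal: { defaultTier: "medium", description: "Balanced quality and cost" },
    budget: {
      defaultTier: "fast",
      description: "Aggressive cost savings",
      overrideRules: [
        "Default ALL tasks to @fast unless they clearly require code edits",
      ],
    },
  },
};

const normalRules = resolveRules({ ...cfg, activeMode: "normal" });
const budgetRules = resolveRules({ ...cfg, activeMode: "budget" });
console.log(normalRules, budgetRules);
```

Because the check is on `length`, an empty `overrideRules` array behaves the same as an absent one, so a mode cannot use `[]` to suppress all rules.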

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "opencode-model-router",
-  "version": "1.1.0",
+  "version": "1.1.1",
   "description": "OpenCode plugin that routes tasks to tiered subagents (fast/medium/heavy) based on complexity",
   "type": "module",
   "main": "./src/index.ts",

package/src/index.ts CHANGED

@@ -310,23 +310,16 @@ function buildFallbackInstructions(cfg: RouterConfig): string {
   const map = presetMap && Object.keys(presetMap).length > 0 ? presetMap : fb.global;
   if (!map) return "";
-  const providerLines = Object.entries(map).flatMap(([provider, presetOrder]) => {
+  const chains = Object.entries(map).flatMap(([provider, presetOrder]) => {
     if (!Array.isArray(presetOrder)) return [];
-    const validOrder = presetOrder.filter(
-      (preset) => preset !== cfg.activePreset && Boolean(cfg.presets[preset]),
+    const valid = presetOrder.filter(
+      (p) => p !== cfg.activePreset && Boolean(cfg.presets[p]),
     );
-    return validOrder.length > 0 ? [`- ${provider}: ${validOrder.join(" -> ")}`] : [];
+    return valid.length > 0 ? [`${provider}→${valid.join("→")}`] : [];
   });
-  if (providerLines.length === 0) return "";
-  return [
-    "Fallback on delegated task errors:",
-    "1. If Task(...) returns provider/model/rate-limit/timeout/auth errors, retry once with a different tier suited to the same task.",
-    "2. If retry also fails, stop delegating that task and complete it directly in the primary agent.",
-    "3. Use the failing model prefix and this preset fallback order for next-run recovery (`/preset <name>` + restart):",
-    ...providerLines,
-  ].join("\n");
+  if (chains.length === 0) return "";
+  return `Err→retry-alt-tier→fail→direct. Chain: ${chains.join(" | ")}`;
 }
 // ---------------------------------------------------------------------------
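
Note on the hunk above: the multi-line fallback instructions collapse into a single line. The sketch below reproduces the added logic in a standalone form to show what that line looks like for the fallback map from the README. The `compactFallback` name and its parameters are hypothetical stand-ins for the relevant parts of `RouterConfig`, and the sketch assumes the active preset is `anthropic` with all four shipped presets configured.

```typescript
// Hypothetical stand-in for the fallback map in RouterConfig.
type FallbackMap = Record<string, string[]>;

function compactFallback(
  map: FallbackMap,
  activePreset: string,
  knownPresets: Set<string>,
): string {
  const chains = Object.entries(map).flatMap(([provider, order]) => {
    // Drop the active preset and any preset that is not configured.
    const valid = order.filter((p) => p !== activePreset && knownPresets.has(p));
    return valid.length > 0 ? [`${provider}→${valid.join("→")}`] : [];
  });
  if (chains.length === 0) return "";
  return `Err→retry-alt-tier→fail→direct. Chain: ${chains.join(" | ")}`;
}

const fallbackLine = compactFallback(
  {
    anthropic: ["openai", "google", "github-copilot"],
    openai: ["anthropic", "google", "github-copilot"],
  },
  "anthropic",
  new Set(["anthropic", "openai", "google", "github-copilot"]),
);
console.log(fallbackLine);
// Err→retry-alt-tier→fail→direct. Chain: anthropic→openai→google→github-copilot | openai→google→github-copilot
```

Compared with the removed text, the compact line no longer spells out the error classes (provider, model, rate-limit, timeout, auth) or the `/preset <name>` plus restart recovery step; the agent has to infer them from `Err→retry-alt-tier→fail→direct`.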
@@ -336,24 +329,10 @@ function buildFallbackInstructions(cfg: RouterConfig): string {
 function buildTaskTaxonomy(cfg: RouterConfig): string {
   if (!cfg.taskPatterns || Object.keys(cfg.taskPatterns).length === 0) return "";
-  const lines = ["Coding task routing guide:"];
-  for (const [tier, patterns] of Object.entries(cfg.taskPatterns)) {
-    if (Array.isArray(patterns) && patterns.length > 0) {
-      lines.push(`- @${tier}: ${patterns.join(", ")}`);
-    }
-  }
-  return lines.join("\n");
-}
-function buildCostAwareness(cfg: RouterConfig): string {
-  const tiers = getActiveTiers(cfg);
-  const costs = Object.entries(tiers)
-    .filter(([_, t]) => t.costRatio != null)
-    .map(([name, t]) => `@${name}=${t.costRatio}x`)
-    .join(", ");
-  if (!costs) return "";
-  return `Cost ratios: ${costs}. Always use the cheapest tier that can reliably handle the task.`;
+  return Object.entries(cfg.taskPatterns)
+    .filter(([_, p]) => Array.isArray(p) && p.length > 0)
+    .map(([tier, patterns]) => `@${tier}→${(patterns as string[]).join("/")}`)
+    .join("\n");
 }
 // ---------------------------------------------------------------------------
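
Note on the hunk above: the taxonomy changes from a headed bullet list to one `@tier→a/b/c` line per tier, and `buildCostAwareness` is removed outright. The sketch below puts the two formats side by side. The function names `taxonomyV110` and `taxonomyV111` and the `TaskPatterns` alias are hypothetical; the bodies follow the removed and added lines.

```typescript
type TaskPatterns = Record<string, string[]>;

// 1.1.0 formatting, following the removed lines.
function taxonomyV110(patterns: TaskPatterns): string {
  if (Object.keys(patterns).length === 0) return "";
  const lines = ["Coding task routing guide:"];
  for (const [tier, items] of Object.entries(patterns)) {
    if (items.length > 0) lines.push(`- @${tier}: ${items.join(", ")}`);
  }
  return lines.join("\n");
}

// 1.1.1 formatting, following the added lines.
function taxonomyV111(patterns: TaskPatterns): string {
  return Object.entries(patterns)
    .filter(([, items]) => items.length > 0)
    .map(([tier, items]) => `@${tier}→${items.join("/")}`)
    .join("\n");
}

// Sample input: a subset of the new "fast" patterns plus an empty tier.
const sample: TaskPatterns = {
  fast: ["search", "grep", "lookup-docs/types"],
  heavy: [],
};

const oldGuide = taxonomyV110(sample);
const newGuide = taxonomyV111(sample);
console.log(oldGuide);
console.log(newGuide);
// @fast→search/grep/lookup-docs/types
```

One thing the sample makes visible: the new separator is `/`, and the shipped pattern `lookup-docs/types` itself contains `/`, so item boundaries in the rendered line are ambiguous for that entry.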
@@ -363,51 +342,35 @@ function buildCostAwareness(cfg: RouterConfig): string {
 function buildDelegationProtocol(cfg: RouterConfig): string {
   const tiers = getActiveTiers(cfg);
-  const tierSummary = Object.entries(tiers)
+  // Compact tier summary: @name=model/variant(costRatio)
+  const tierLine = Object.entries(tiers)
     .map(([name, t]) => {
-      const shortModel = t.model.split("/").pop() ?? t.model;
-      const variant = t.variant ? ` (${t.variant})` : "";
-      return `@${name}=${shortModel}${variant}`;
+      const short = t.model.split("/").pop() ?? t.model;
+      const v = t.variant ? `/${t.variant}` : "";
+      const c = t.costRatio != null ? `(${t.costRatio}x)` : "";
+      return `@${name}=${short}${v}${c}`;
     })
-    .join(" | ");
+    .join(" ");
-  // Build per-tier whenToUse descriptions so the agent knows when to pick each tier
-  const tierDescriptions = Object.entries(tiers)
-    .map(([name, t]) => {
-      const uses = t.whenToUse.length > 0 ? t.whenToUse.join(", ") : t.description;
-      return `- @${name}: ${uses}`;
-    })
-    .join("\n");
+  // Compact mode
+  const mode = getActiveMode(cfg);
+  const modeSuffix = cfg.activeMode ? ` mode:${cfg.activeMode}` : "";
-  // Task taxonomy from config
+  // Compact task routing guide
   const taxonomy = buildTaskTaxonomy(cfg);
-  // Cost awareness
-  const costLine = buildCostAwareness(cfg);
-  // Mode-aware rules: if active mode has overrideRules, use those; otherwise use global rules
-  const mode = getActiveMode(cfg);
+  // Compact rules
   const effectiveRules = mode?.overrideRules?.length ? mode.overrideRules : cfg.rules;
-  const numberedRules = effectiveRules
-    .map((rule, i) => `${i + 1}. ${rule}`)
-    .join("\n");
+  const rulesLine = effectiveRules.map((r, i) => `${i + 1}.${r}`).join(" ");
-  const fallbackInstructions = buildFallbackInstructions(cfg);
+  const fallback = buildFallbackInstructions(cfg);
   return [
-    "## Model Delegation Protocol",
-    `Preset: ${cfg.activePreset}. Tiers: ${tierSummary}.`,
-    "",
-    "Tier capabilities:",
-    tierDescriptions,
-    ...(taxonomy ? ["", taxonomy] : []),
-    ...(costLine ? ["", costLine] : []),
-    ...(mode ? [`\nActive mode: ${cfg.activeMode} (${mode.description})`] : []),
-    "",
-    "Apply to every user message (plan and ad-hoc):",
-    numberedRules,
-    ...(fallbackInstructions ? ["", fallbackInstructions] : []),
-    "",
+    `## Model Delegation Protocol`,
+    `Preset: ${cfg.activePreset}. Tiers: ${tierLine}.${modeSuffix}`,
+    ...(taxonomy ? [taxonomy] : []),
+    rulesLine,
+    ...(fallback ? [fallback] : []),
     `Delegate with Task(subagent_type="fast|medium|heavy", prompt="...").`,
     "Keep orchestration and final synthesis in the primary agent.",
   ].join("\n");
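
Note on the hunk above: cost ratios now travel inside the tier summary instead of a separate sentence. The sketch below rebuilds the new header line for the default `anthropic` preset. The `Tier` interface and `tierLine` name are hypothetical, the `max` variants are taken from the README preset table, and `budget` as the active mode is an assumption for illustration.

```typescript
// Hypothetical subset of the tier config used by the summary line.
interface Tier {
  model: string;
  variant?: string;
  costRatio?: number;
}

// Renders each tier as @name=model/variant(costRatio), space-separated.
function tierLine(tiers: Record<string, Tier>): string {
  return Object.entries(tiers)
    .map(([name, t]) => {
      const short = t.model.split("/").pop() ?? t.model;
      const v = t.variant ? `/${t.variant}` : "";
      const c = t.costRatio != null ? `(${t.costRatio}x)` : "";
      return `@${name}=${short}${v}${c}`;
    })
    .join(" ");
}

const anthropicTiers: Record<string, Tier> = {
  fast: { model: "anthropic/claude-haiku-4-5", costRatio: 1 },
  medium: { model: "anthropic/claude-sonnet-4-5", variant: "max", costRatio: 5 },
  heavy: { model: "anthropic/claude-opus-4-6", variant: "max", costRatio: 20 },
};

const header = `Preset: anthropic. Tiers: ${tierLine(anthropicTiers)}. mode:budget`;
console.log(header);
// Preset: anthropic. Tiers: @fast=claude-haiku-4-5(1x) @medium=claude-sonnet-4-5/max(5x) @heavy=claude-opus-4-6/max(20x). mode:budget
```

As far as this diff shows, the sentence `Cost ratios: @fast=1x, @medium=5x, @heavy=20x.` quoted in the README's "Cost ratios" section is no longer emitted in 1.1.1, and the per-tier `whenToUse` block and the mode description are dropped from the prompt as well.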

package/tiers.json CHANGED

@@ -174,39 +174,9 @@
     }
   },
   "taskPatterns": {
-    "fast": [
-      "Find, search, locate, or grep files and code patterns",
-      "List or show directory structure and file contents",
-      "Read or display specific files or sections",
-      "Check git status, log, diff, or blame",
-      "Lookup documentation, API signatures, or type definitions",
-      "Count occurrences, lines, or matches",
-      "Check if a file, function, or class exists",
-      "Simple rename or string replacement across files"
-    ],
-    "medium": [
-      "Implement a new feature, function, or component",
-      "Refactor or restructure existing code",
-      "Write or update tests",
-      "Fix a bug (first or second attempt)",
-      "Modify or update existing code logic",
-      "Code review with suggested changes",
-      "Run build/lint/test and fix resulting errors",
-      "Create a new file from a template or pattern",
-      "Database migration or schema changes",
-      "API endpoint implementation",
-      "Configuration or dependency updates"
-    ],
-    "heavy": [
-      "Design system or module architecture from scratch",
-      "Debug a problem after 2+ failed attempts",
-      "Security audit or vulnerability review",
-      "Performance profiling and optimization",
-      "Migration strategy (framework, language, infrastructure)",
-      "Complex multi-system integration design",
-      "Evaluate tradeoffs between competing approaches",
-      "Root cause analysis of complex or elusive failures"
-    ]
+    "fast": ["search", "grep", "read", "git-info", "ls", "lookup-docs/types", "count", "exists-check", "rename"],
+    "medium": ["impl-feature", "refactor", "write-tests", "bugfix(≤2)", "edit-logic", "code-review", "build-fix", "create-file", "db-migrate", "api-endpoint", "config-update"],
+    "heavy": ["arch-design", "debug(≥3fail)", "sec-audit", "perf-opt", "migrate-strategy", "multi-system-integration", "tradeoff-analysis", "rca"]
   },
   "modes": {
     "normal": {
@@ -217,22 +187,22 @@
       "defaultTier": "fast",
       "description": "Aggressive cost savings — defaults to cheapest tier, escalates only when needed",
       "overrideRules": [
-        "Default ALL tasks to @fast unless they clearly require code edits or complex reasoning",
-        "Use @medium ONLY for: multi-file edits, complex refactors, test suites, or build-fix cycles",
-        "Use @heavy ONLY when explicitly requested by user or after 2+ failed @medium attempts",
-        "Prefer executing simple tasks directly (grep, read, glob) over delegating — zero delegation overhead",
-        "Batch multiple related searches into a single @fast delegation instead of multiple calls",
-        "When uncertain between @fast and @medium, choose @fast — escalate only on failure"
+        "default→@fast unless edits/complex-reasoning needed",
+        "@medium ONLY: multi-file-edit/refactor/test-suite/build-fix",
+        "@heavy ONLY: user-requested OR ≥2 @medium failures",
+        "trivial(grep/read/glob)→direct,no-delegate",
+        "batch related searches→single @fast",
+        "uncertain @fast vs @medium→@fast,escalate on fail"
       ]
     },
     "quality": {
       "defaultTier": "medium",
       "description": "Quality-first — uses stronger models more liberally for better results",
       "overrideRules": [
-        "Default to @medium for all tasks including exploration when deep context understanding matters",
-        "Use @heavy for any task involving architecture, debugging, security, or multi-file coordination",
-        "Use @fast only for trivial single-tool operations (one grep, one file read)",
-        "Prefer thoroughness over speed — better to over-qualify a task than under-qualify it"
+        "default→@medium incl exploration when deep-context matters",
+        "@heavy: arch/debug/security/multi-file-coord",
+        "@fast ONLY: trivial single-tool ops (1 grep/1 read)",
+        "prefer thoroughness over speed"
       ]
     }
   },
@@ -245,18 +215,14 @@
     }
   },
   "rules": [
-    "When a plan step contains [tier:fast], [tier:medium], or [tier:heavy], delegate to that agent",
-    "When a plan says 'use a fast/cheap model' -> delegate to @fast",
-    "When a plan says 'use a medium/balanced model' -> delegate to @medium",
-    "When a plan says 'use a heavy/powerful model' -> delegate to @heavy",
-    "Default to @medium for implementation tasks you could delegate",
-    "Use @fast for any read-only exploration or research task",
-    "Keep orchestration (planning, decisions, verification) for yourself - delegate execution",
-    "For trivial tasks (single grep, single file read), execute directly without delegation",
-    "Never delegate to @heavy if you are already running on an opus-class model - do it yourself",
-    "If a task takes 1-2 tool calls, execute directly — delegation overhead is not worth the cost",
-    "Consult the task routing guide below to match task type to the correct tier",
-    "Consider cost ratios when choosing tiers — always use the cheapest tier that can reliably handle the task"
+    "[tier:X]→delegate X",
+    "plan:fast/cheap→@fast | plan:medium→@medium | plan:heavy→@heavy",
+    "default:impl→@medium | readonly→@fast",
+    "orchestrate=self,delegate=exec",
+    "trivial(≤2tools)→direct,skip-delegate",
+    "self∈opus→never→@heavy,do-it-yourself",
+    "consult route-guide↑",
+    "min(cost,adequate-tier)"
   ],
   "defaultTier": "medium"
 }
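
Note on the `tiers.json` hunks above: the rewrite trades readable sentences for terse shorthand, presumably to shrink the injected system prompt. The sketch below gives a rough sense of the saving by rendering the first three `budget` override rules in both the 1.1.0 and 1.1.1 formats and comparing character counts. Character count is only a loose proxy for tokens, and the `renderV110` / `renderV111` helper names are hypothetical.

```typescript
// First three budget overrideRules as they appear in 1.1.0.
const verboseRules = [
  "Default ALL tasks to @fast unless they clearly require code edits or complex reasoning",
  "Use @medium ONLY for: multi-file edits, complex refactors, test suites, or build-fix cycles",
  "Use @heavy ONLY when explicitly requested by user or after 2+ failed @medium attempts",
];

// The same three rules as they appear in 1.1.1.
const compactRules = [
  "default→@fast unless edits/complex-reasoning needed",
  "@medium ONLY: multi-file-edit/refactor/test-suite/build-fix",
  "@heavy ONLY: user-requested OR ≥2 @medium failures",
];

// 1.1.0 rendered rules as a numbered list, one rule per line.
const renderV110 = (rules: string[]): string =>
  rules.map((r, i) => `${i + 1}. ${r}`).join("\n");

// 1.1.1 renders them on a single line, with no space after the number.
const renderV111 = (rules: string[]): string =>
  rules.map((r, i) => `${i + 1}.${r}`).join(" ");

const beforeChars = renderV110(verboseRules).length;
const afterChars = renderV111(compactRules).length;
console.log(`${beforeChars} -> ${afterChars} characters`);
```

Whether a model follows shorthand such as `self∈opus→never→@heavy` as reliably as the original sentences is not something this diff can show; it would need to be checked empirically.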