@brainst0rm/core 0.13.0 → 0.14.1
- package/dist/chunk-M7BBX56R.js +340 -0
- package/dist/chunk-M7BBX56R.js.map +1 -0
- package/dist/{chunk-SWXTFHC7.js → chunk-Z5D2QZY6.js} +3 -3
- package/dist/chunk-Z5D2QZY6.js.map +1 -0
- package/dist/chunk-Z6ZWNWWR.js +34 -0
- package/dist/index.d.ts +2717 -188
- package/dist/index.js +16178 -7949
- package/dist/index.js.map +1 -1
- package/dist/self-extend-47LWSK3E.js +52 -0
- package/dist/self-extend-47LWSK3E.js.map +1 -0
- package/dist/skills/builtin/api-and-interface-design/SKILL.md +300 -0
- package/dist/skills/builtin/browser-testing-with-devtools/SKILL.md +307 -0
- package/dist/skills/builtin/ci-cd-and-automation/SKILL.md +391 -0
- package/dist/skills/builtin/code-review-and-quality/SKILL.md +353 -0
- package/dist/skills/builtin/code-simplification/SKILL.md +340 -0
- package/dist/skills/builtin/context-engineering/SKILL.md +301 -0
- package/dist/skills/builtin/daemon-operations/SKILL.md +55 -0
- package/dist/skills/builtin/debugging-and-error-recovery/SKILL.md +306 -0
- package/dist/skills/builtin/deprecation-and-migration/SKILL.md +207 -0
- package/dist/skills/builtin/documentation-and-adrs/SKILL.md +295 -0
- package/dist/skills/builtin/frontend-ui-engineering/SKILL.md +333 -0
- package/dist/skills/builtin/git-workflow-and-versioning/SKILL.md +303 -0
- package/dist/skills/builtin/github-collaboration/SKILL.md +215 -0
- package/dist/skills/builtin/godmode-operations/SKILL.md +68 -0
- package/dist/skills/builtin/idea-refine/SKILL.md +186 -0
- package/dist/skills/builtin/idea-refine/examples.md +244 -0
- package/dist/skills/builtin/idea-refine/frameworks.md +101 -0
- package/dist/skills/builtin/idea-refine/refinement-criteria.md +126 -0
- package/dist/skills/builtin/idea-refine/scripts/idea-refine.sh +15 -0
- package/dist/skills/builtin/incremental-implementation/SKILL.md +243 -0
- package/dist/skills/builtin/memory-init/SKILL.md +54 -0
- package/dist/skills/builtin/memory-reflection/SKILL.md +59 -0
- package/dist/skills/builtin/multi-model-routing/SKILL.md +56 -0
- package/dist/skills/builtin/performance-optimization/SKILL.md +291 -0
- package/dist/skills/builtin/planning-and-task-breakdown/SKILL.md +240 -0
- package/dist/skills/builtin/security-and-hardening/SKILL.md +368 -0
- package/dist/skills/builtin/shipping-and-launch/SKILL.md +310 -0
- package/dist/skills/builtin/spec-driven-development/SKILL.md +212 -0
- package/dist/skills/builtin/test-driven-development/SKILL.md +376 -0
- package/dist/skills/builtin/using-agent-skills/SKILL.md +173 -0
- package/dist/trajectory-analyzer-ZAI2XUAI.js +14 -0
- package/dist/{trajectory-capture-RF7TUN6I.js → trajectory-capture-ERPIVYQJ.js} +3 -3
- package/package.json +14 -11
- package/dist/chunk-OU3NPQBH.js +0 -87
- package/dist/chunk-OU3NPQBH.js.map +0 -1
- package/dist/chunk-PZ5AY32C.js +0 -10
- package/dist/chunk-SWXTFHC7.js.map +0 -1
- package/dist/trajectory-MOCIJBV6.js +0 -8
- /package/dist/{chunk-PZ5AY32C.js.map → chunk-Z6ZWNWWR.js.map} +0 -0
- /package/dist/{trajectory-MOCIJBV6.js.map → trajectory-analyzer-ZAI2XUAI.js.map} +0 -0
- /package/dist/{trajectory-capture-RF7TUN6I.js.map → trajectory-capture-ERPIVYQJ.js.map} +0 -0
|
@@ -0,0 +1,243 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: incremental-implementation
|
|
3
|
+
description: Delivers changes incrementally. Use when implementing any feature or change that touches more than one file. Use when you're about to write a large amount of code at once, or when a task feels too big to land in one step.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Incremental Implementation
|
|
7
|
+
|
|
8
|
+
## Overview
|
|
9
|
+
|
|
10
|
+
Build in thin vertical slices — implement one piece, test it, verify it, then expand. Avoid implementing an entire feature in one pass. Each increment should leave the system in a working, testable state. This is the execution discipline that makes large features manageable.
|
|
11
|
+
|
|
12
|
+
## When to Use
|
|
13
|
+
|
|
14
|
+
- Implementing any multi-file change
|
|
15
|
+
- Building a new feature from a task breakdown
|
|
16
|
+
- Refactoring existing code
|
|
17
|
+
- Any time you're tempted to write more than ~100 lines before testing
|
|
18
|
+
|
|
19
|
+
**When NOT to use:** Single-file, single-function changes where the scope is already minimal.
|
|
20
|
+
|
|
21
|
+
## The Increment Cycle
|
|
22
|
+
|
|
23
|
+
```
|
|
24
|
+
┌──────────────────────────────────────┐
|
|
25
|
+
│ │
|
|
26
|
+
│ Implement ──→ Test ──→ Verify ──┐ │
|
|
27
|
+
│ ▲ │ │
|
|
28
|
+
│ └───── Commit ◄─────────────┘ │
|
|
29
|
+
│ │ │
|
|
30
|
+
│ ▼ │
|
|
31
|
+
│ Next slice │
|
|
32
|
+
│ │
|
|
33
|
+
└──────────────────────────────────────┘
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
For each slice:
|
|
37
|
+
|
|
38
|
+
1. **Implement** the smallest complete piece of functionality
|
|
39
|
+
2. **Test** — run the test suite (or write a test if none exists)
|
|
40
|
+
3. **Verify** — confirm the slice works as expected (tests pass, build succeeds, manual check)
|
|
41
|
+
4. **Commit** -- save your progress with a descriptive message (see `git-workflow-and-versioning` for atomic commit guidance)
|
|
42
|
+
5. **Move to the next slice** — carry forward, don't restart
|
|
43
|
+
|
|
44
|
+
## Slicing Strategies
|
|
45
|
+
|
|
46
|
+
### Vertical Slices (Preferred)
|
|
47
|
+
|
|
48
|
+
Build one complete path through the stack:
|
|
49
|
+
|
|
50
|
+
```
|
|
51
|
+
Slice 1: Create a task (DB + API + basic UI)
|
|
52
|
+
→ Tests pass, user can create a task via the UI
|
|
53
|
+
|
|
54
|
+
Slice 2: List tasks (query + API + UI)
|
|
55
|
+
→ Tests pass, user can see their tasks
|
|
56
|
+
|
|
57
|
+
Slice 3: Edit a task (update + API + UI)
|
|
58
|
+
→ Tests pass, user can modify tasks
|
|
59
|
+
|
|
60
|
+
Slice 4: Delete a task (delete + API + UI + confirmation)
|
|
61
|
+
→ Tests pass, full CRUD complete
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
Each slice delivers working end-to-end functionality.
|
|
65
|
+
|
|
66
|
+
### Contract-First Slicing
|
|
67
|
+
|
|
68
|
+
When backend and frontend need to develop in parallel:
|
|
69
|
+
|
|
70
|
+
```
|
|
71
|
+
Slice 0: Define the API contract (types, interfaces, OpenAPI spec)
|
|
72
|
+
Slice 1a: Implement backend against the contract + API tests
|
|
73
|
+
Slice 1b: Implement frontend against mock data matching the contract
|
|
74
|
+
Slice 2: Integrate and test end-to-end
|
|
75
|
+
```
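As a rough illustration of Slice 0 and Slice 1b, the contract can be a pair of shared types, with a mock that satisfies it until the real backend lands. This is a minimal sketch: `Task`, `TaskApi`, and `MockTaskApi` are hypothetical names chosen for the example, not part of any real API.

```typescript
// Slice 0: the shared contract. All names here are hypothetical.
interface Task {
  id: string;
  title: string;
  done: boolean;
}

interface TaskApi {
  createTask(title: string): Task;
  listTasks(): Task[];
}

// Slice 1b: an in-memory mock the frontend can build against
// until the real backend (Slice 1a) is ready.
class MockTaskApi implements TaskApi {
  private tasks: Task[] = [];
  private nextId = 1;

  createTask(title: string): Task {
    const task: Task = { id: String(this.nextId++), title, done: false };
    this.tasks.push(task);
    return task;
  }

  listTasks(): Task[] {
    // Return a copy so callers cannot mutate the mock's internal state.
    return this.tasks.slice();
  }
}
```

Because both sides compile against `TaskApi`, swapping the mock for the real client in Slice 2 should only change where the object is constructed.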
|
|
76
|
+
|
|
77
|
+
### Risk-First Slicing
|
|
78
|
+
|
|
79
|
+
Tackle the riskiest or most uncertain piece first:
|
|
80
|
+
|
|
81
|
+
```
|
|
82
|
+
Slice 1: Prove the WebSocket connection works (highest risk)
|
|
83
|
+
Slice 2: Build real-time task updates on the proven connection
|
|
84
|
+
Slice 3: Add offline support and reconnection
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
If Slice 1 fails, you discover it before investing in Slices 2 and 3.
|
|
88
|
+
|
|
89
|
+
## Implementation Rules
|
|
90
|
+
|
|
91
|
+
### Rule 0: Simplicity First
|
|
92
|
+
|
|
93
|
+
Before writing any code, ask: "What is the simplest thing that could work?"
|
|
94
|
+
|
|
95
|
+
After writing code, review it against these checks:
|
|
96
|
+
|
|
97
|
+
- Can this be done in fewer lines?
|
|
98
|
+
- Are these abstractions earning their complexity?
|
|
99
|
+
- Would a staff engineer look at this and say "why didn't you just..."?
|
|
100
|
+
- Am I building for hypothetical future requirements, or the current task?
|
|
101
|
+
|
|
102
|
+
```
|
|
103
|
+
SIMPLICITY CHECK:
|
|
104
|
+
✗ Generic EventBus with middleware pipeline for one notification
|
|
105
|
+
✓ Simple function call
|
|
106
|
+
|
|
107
|
+
✗ Abstract factory pattern for two similar components
|
|
108
|
+
✓ Two straightforward components with shared utilities
|
|
109
|
+
|
|
110
|
+
✗ Config-driven form builder for three forms
|
|
111
|
+
✓ Three form components
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
Three similar lines of code are better than a premature abstraction. Implement the naive, obviously correct version first. Optimize only after tests have proven correctness.
|
|
115
|
+
|
|
116
|
+
### Rule 0.5: Scope Discipline
|
|
117
|
+
|
|
118
|
+
Touch only what the task requires.
|
|
119
|
+
|
|
120
|
+
Do NOT:
|
|
121
|
+
|
|
122
|
+
- "Clean up" code adjacent to your change
|
|
123
|
+
- Refactor imports in files you're not modifying
|
|
124
|
+
- Remove comments you don't fully understand
|
|
125
|
+
- Add features not in the spec because they "seem useful"
|
|
126
|
+
- Modernize syntax in files you're only reading
|
|
127
|
+
|
|
128
|
+
If you notice something worth improving outside your task scope, note it — don't fix it:
|
|
129
|
+
|
|
130
|
+
```
|
|
131
|
+
NOTICED BUT NOT TOUCHING:
|
|
132
|
+
- src/utils/format.ts has an unused import (unrelated to this task)
|
|
133
|
+
- The auth middleware could use better error messages (separate task)
|
|
134
|
+
→ Want me to create tasks for these?
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
### Rule 1: One Thing at a Time
|
|
138
|
+
|
|
139
|
+
Each increment changes one logical thing. Don't mix concerns:
|
|
140
|
+
|
|
141
|
+
**Bad:** One commit that adds a new component, refactors an existing one, and updates the build config.
|
|
142
|
+
|
|
143
|
+
**Good:** Three separate commits — one for each change.
|
|
144
|
+
|
|
145
|
+
### Rule 2: Keep It Compilable
|
|
146
|
+
|
|
147
|
+
After each increment, the project must build and existing tests must pass. Don't leave the codebase in a broken state between slices.
|
|
148
|
+
|
|
149
|
+
### Rule 3: Feature Flags for Incomplete Features
|
|
150
|
+
|
|
151
|
+
If a feature isn't ready for users but you need to merge increments:
|
|
152
|
+
|
|
153
|
+
```typescript
|
|
154
|
+
// Feature flag for work-in-progress
|
|
155
|
+
const ENABLE_TASK_SHARING = process.env.FEATURE_TASK_SHARING === "true";
|
|
156
|
+
|
|
157
|
+
if (ENABLE_TASK_SHARING) {
|
|
158
|
+
// New sharing UI
|
|
159
|
+
}
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
This lets you merge small increments to the main branch without exposing incomplete work.
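One way to keep flag checks consistent is a small helper that treats anything other than the exact string `"true"` as off. The sketch below assumes flags live in environment-style variables prefixed with `FEATURE_`; `isFeatureEnabled` and `renderTaskActions` are hypothetical names.

```typescript
// Hypothetical helper: reads a flag from an environment-like record.
// Only the exact string "true" enables a flag; anything else, including
// a missing variable, leaves the feature off.
type Env = Record<string, string | undefined>;

function isFeatureEnabled(name: string, env: Env): boolean {
  return env[`FEATURE_${name}`] === "true";
}

function renderTaskActions(env: Env): string[] {
  const actions = ["edit", "delete"];
  if (isFeatureEnabled("TASK_SHARING", env)) {
    actions.push("share"); // work-in-progress UI, hidden by default
  }
  return actions;
}
```

In a Node.js process you would typically pass `process.env` as `env`; taking it as a parameter keeps the helper easy to test.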
|
|
163
|
+
|
|
164
|
+
### Rule 4: Safe Defaults
|
|
165
|
+
|
|
166
|
+
New code should default to safe, conservative behavior:
|
|
167
|
+
|
|
168
|
+
```typescript
|
|
169
|
+
// Safe: disabled by default, opt-in
|
|
170
|
+
export function createTask(data: TaskInput, options?: { notify?: boolean }) {
|
|
171
|
+
const shouldNotify = options?.notify ?? false;
|
|
172
|
+
// ...
|
|
173
|
+
}
|
|
174
|
+
```
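To make the opt-in behavior concrete, here is a self-contained variant of the function above with two calls. The `TaskInput` shape and the `sentNotifications` log are assumptions added purely for illustration.

```typescript
// Hypothetical types and side-effect log, for illustration only.
interface TaskInput {
  title: string;
}

const sentNotifications: string[] = [];

function createTask(data: TaskInput, options?: { notify?: boolean }) {
  // Callers must opt in; omitting the option never sends anything.
  const shouldNotify = options?.notify ?? false;
  if (shouldNotify) {
    sentNotifications.push(`Task created: ${data.title}`);
  }
  return { title: data.title, notified: shouldNotify };
}

const quiet = createTask({ title: "Draft spec" }); // no notification
const loud = createTask({ title: "Ship release" }, { notify: true }); // opted in
```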
|
|
175
|
+
|
|
176
|
+
### Rule 5: Rollback-Friendly
|
|
177
|
+
|
|
178
|
+
Each increment should be independently revertable:
|
|
179
|
+
|
|
180
|
+
- Additive changes (new files, new functions) are easy to revert
|
|
181
|
+
- Modifications to existing code should be minimal and focused
|
|
182
|
+
- Database migrations should have corresponding rollback migrations
|
|
183
|
+
- Avoid deleting something and adding its replacement in the same commit; split the removal and the replacement into separate commits
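The migration point can be sketched as an `up`/`down` pair in which `down` undoes exactly what `up` did. The in-memory `Schema` type and the `Migration` interface below are hypothetical stand-ins for whatever migration tool the project uses.

```typescript
// Hypothetical in-memory "schema": table name -> set of column names.
type Schema = Map<string, Set<string>>;

interface Migration {
  name: string;
  up(schema: Schema): void;
  down(schema: Schema): void; // must undo exactly what `up` did
}

const addDueDateToTasks: Migration = {
  name: "add-due-date-to-tasks",
  up(schema) {
    schema.get("tasks")?.add("due_date");
  },
  down(schema) {
    schema.get("tasks")?.delete("due_date");
  },
};

// Lists a table's columns in insertion order (empty if the table is missing).
function columns(schema: Schema, table: string): string[] {
  const cols = schema.get(table);
  return cols ? Array.from(cols) : [];
}

const schema: Schema = new Map([["tasks", new Set(["id", "title"])]]);
addDueDateToTasks.up(schema);
const afterUp = columns(schema, "tasks");
addDueDateToTasks.down(schema);
const afterDown = columns(schema, "tasks");
```

Running `up` followed by `down` should leave the schema exactly as it started, which is a cheap property to assert in a test for every migration.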
|
|
184
|
+
|
|
185
|
+
## Working with Agents
|
|
186
|
+
|
|
187
|
+
When directing an agent to implement incrementally:
|
|
188
|
+
|
|
189
|
+
```
|
|
190
|
+
"Let's implement Task 3 from the plan.
|
|
191
|
+
|
|
192
|
+
Start with just the database schema change and the API endpoint.
|
|
193
|
+
Don't touch the UI yet — we'll do that in the next increment.
|
|
194
|
+
|
|
195
|
+
After implementing, run `npm test` and `npm run build` to verify
|
|
196
|
+
nothing is broken."
|
|
197
|
+
```
|
|
198
|
+
|
|
199
|
+
Be explicit about what's in scope and what's NOT in scope for each increment.
|
|
200
|
+
|
|
201
|
+
## Increment Checklist
|
|
202
|
+
|
|
203
|
+
After each increment, verify:
|
|
204
|
+
|
|
205
|
+
- [ ] The change does one thing and does it completely
|
|
206
|
+
- [ ] All existing tests still pass (`npm test`)
|
|
207
|
+
- [ ] The build succeeds (`npm run build`)
|
|
208
|
+
- [ ] Type checking passes (`npx tsc --noEmit`)
|
|
209
|
+
- [ ] Linting passes (`npm run lint`)
|
|
210
|
+
- [ ] The new functionality works as expected
|
|
211
|
+
- [ ] The change is committed with a descriptive message
|
|
212
|
+
|
|
213
|
+
## Common Rationalizations
|
|
214
|
+
|
|
215
|
+
| Rationalization | Reality |
|
|
216
|
+
| -------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
|
|
217
|
+
| "I'll test it all at the end" | Bugs compound. A bug in Slice 1 makes Slices 2-5 wrong. Test each slice. |
|
|
218
|
+
| "It's faster to do it all at once" | It _feels_ faster until something breaks and you can't find which of 500 changed lines caused it. |
|
|
219
|
+
| "These changes are too small to commit separately" | Small commits are free. Large commits hide bugs and make rollbacks painful. |
|
|
220
|
+
| "I'll add the feature flag later" | If the feature isn't complete, it shouldn't be user-visible. Add the flag now. |
|
|
221
|
+
| "This refactor is small enough to include" | Refactors mixed with features make both harder to review and debug. Separate them. |
|
|
222
|
+
|
|
223
|
+
## Red Flags
|
|
224
|
+
|
|
225
|
+
- More than 100 lines of code written without running tests
|
|
226
|
+
- Multiple unrelated changes in a single increment
|
|
227
|
+
- "Let me just quickly add this too" scope expansion
|
|
228
|
+
- Skipping the test/verify step to move faster
|
|
229
|
+
- Build or tests broken between increments
|
|
230
|
+
- Large uncommitted changes accumulating
|
|
231
|
+
- Building abstractions before a third use case demands them
|
|
232
|
+
- Touching files outside the task scope "while I'm here"
|
|
233
|
+
- Creating new utility files for one-time operations
|
|
234
|
+
|
|
235
|
+
## Verification
|
|
236
|
+
|
|
237
|
+
After completing all increments for a task:
|
|
238
|
+
|
|
239
|
+
- [ ] Each increment was individually tested and committed
|
|
240
|
+
- [ ] The full test suite passes
|
|
241
|
+
- [ ] The build is clean
|
|
242
|
+
- [ ] The feature works end-to-end as specified
|
|
243
|
+
- [ ] No uncommitted changes remain
|
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: memory-init
|
|
3
|
+
description: Initialize memory from project files and optionally import from Claude Code session history. Use when starting a new project or running /init.
|
|
4
|
+
max_steps: 10
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Memory Initialization
|
|
8
|
+
|
|
9
|
+
Bootstrap the memory system for this project. Read project context files, extract key information, and create structured memory entries.
|
|
10
|
+
|
|
11
|
+
## Process
|
|
12
|
+
|
|
13
|
+
### Phase 1: Read Project Context (parallel)
|
|
14
|
+
|
|
15
|
+
Read these files if they exist (use `file_read`; don't fail if a file is missing):
|
|
16
|
+
|
|
17
|
+
1. `CLAUDE.md` — project instructions and conventions
|
|
18
|
+
2. `README.md` — project description, setup, architecture
|
|
19
|
+
3. `package.json` — dependencies, scripts, project name
|
|
20
|
+
4. `BRAINSTORM.md` — brainstorm-specific project context
|
|
21
|
+
5. `.github/CODEOWNERS` — team ownership structure
|
|
22
|
+
|
|
23
|
+
### Phase 2: Extract and Create Memory Entries
|
|
24
|
+
|
|
25
|
+
From the files above, create memory entries using the `memory` tool:
|
|
26
|
+
|
|
27
|
+
**System tier (always in prompt):**
|
|
28
|
+
|
|
29
|
+
- `project-overview` (type: project, tier: system) — What this project is, its stack, and primary purpose. 3-5 sentences.
|
|
30
|
+
- `conventions` (type: feedback, tier: system) — Coding conventions, style rules, and patterns found in CLAUDE.md or README.
|
|
31
|
+
- `user-identity` (type: user, tier: system) — Git user name and email from `git config user.name` and `git config user.email`.
|
|
32
|
+
|
|
33
|
+
**Archive tier (searchable on demand):**
|
|
34
|
+
|
|
35
|
+
- `dependencies` (type: reference, tier: archive) — Key dependencies and their purposes.
|
|
36
|
+
- `build-commands` (type: reference, tier: archive) — How to build, test, and run the project.
|
|
37
|
+
- `architecture` (type: project, tier: archive) — High-level architecture notes if found.
|
|
38
|
+
|
|
39
|
+
### Phase 3: Import Claude Code History (optional)
|
|
40
|
+
|
|
41
|
+
If `~/.claude/projects/` exists, look for session history matching this project path. Extract:
|
|
42
|
+
|
|
43
|
+
- User preferences expressed across sessions
|
|
44
|
+
- Recurring patterns or corrections
|
|
45
|
+
- Hard rules the user enforced
|
|
46
|
+
|
|
47
|
+
Create memory entries for each finding (type: feedback, tier: system for strong preferences, archive for incidental notes).
|
|
48
|
+
|
|
49
|
+
## Rules
|
|
50
|
+
|
|
51
|
+
- Do NOT create empty or placeholder memories. Only write entries with real content.
|
|
52
|
+
- Keep each entry concise — under 500 characters for system tier, under 1000 for archive.
|
|
53
|
+
- Use the `memory` tool's write operation for all entries.
|
|
54
|
+
- Run git commands to get user identity — this is always useful context.
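The first two rules are easy to check mechanically before writing an entry. The sketch below assumes a simple `MemoryEntry` shape; the real `memory` tool's schema may differ, and `validateEntry` is a hypothetical helper.

```typescript
// Hypothetical shape; the real `memory` tool's schema may differ.
type Tier = "system" | "archive";

interface MemoryEntry {
  name: string;
  tier: Tier;
  content: string;
}

// Character limits taken from the rules above.
const MAX_CHARS: Record<Tier, number> = { system: 500, archive: 1000 };

// Returns a list of problems; an empty list means the entry is OK to write.
function validateEntry(entry: MemoryEntry): string[] {
  const problems: string[] = [];
  const text = entry.content.trim();
  if (text.length === 0) {
    problems.push("empty or placeholder content");
  }
  if (text.length >= MAX_CHARS[entry.tier]) {
    problems.push(`${entry.tier} entries must stay under ${MAX_CHARS[entry.tier]} characters`);
  }
  return problems;
}
```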
|
|
@@ -0,0 +1,59 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: memory-reflection
|
|
3
|
+
description: Review and consolidate memory entries. Merge duplicates, resolve contradictions, update stale entries, and rebalance system vs archive tiers. Use when running /doctor or when triggered by KAIROS auto-reflection.
|
|
4
|
+
max_steps: 15
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Memory Reflection
|
|
8
|
+
|
|
9
|
+
Review all memory entries for quality, accuracy, and organization. This is the daemon's self-maintenance cycle.
|
|
10
|
+
|
|
11
|
+
## Process
|
|
12
|
+
|
|
13
|
+
### Phase 1: Inventory
|
|
14
|
+
|
|
15
|
+
Use `memory({ operation: "list" })` to see all entries grouped by tier.
|
|
16
|
+
|
|
17
|
+
### Phase 2: Analyze
|
|
18
|
+
|
|
19
|
+
For each system-tier entry:
|
|
20
|
+
|
|
21
|
+
1. Is it still accurate? Check against current project state.
|
|
22
|
+
2. Is it still relevant? If it hasn't been useful in the last few sessions, consider demoting to archive.
|
|
23
|
+
3. Is it a duplicate of another entry? Merge if so.
|
|
24
|
+
4. Does it contradict another entry? Keep the most recent one and update or delete the other.
|
|
25
|
+
|
|
26
|
+
For each archive-tier entry:
|
|
27
|
+
|
|
28
|
+
1. Should it be promoted to system? If the project frequently needs this info, promote it.
|
|
29
|
+
2. Is it stale? References to files that no longer exist, outdated conventions, etc.
|
|
30
|
+
3. Can it be merged with a related entry?
|
|
31
|
+
|
|
32
|
+
### Phase 3: Act
|
|
33
|
+
|
|
34
|
+
- **Merge duplicates:** Read both entries, combine into one, delete the other.
|
|
35
|
+
- **Resolve contradictions:** Keep the one that matches current project state. Update description to note the change.
|
|
36
|
+
- **Promote high-value:** `memory({ operation: "promote", id: "..." })` for entries that belong in every prompt.
|
|
37
|
+
- **Demote low-value:** `memory({ operation: "demote", id: "..." })` for entries that are rarely accessed.
|
|
38
|
+
- **Delete stale:** `memory({ operation: "delete", id: "..." })` for entries that reference things that no longer exist.
|
|
39
|
+
- **Update outdated:** `memory({ operation: "write", ... })` to refresh content.
|
|
40
|
+
|
|
41
|
+
### Phase 4: Report
|
|
42
|
+
|
|
43
|
+
Summarize what changed:
|
|
44
|
+
|
|
45
|
+
- Entries merged: N
|
|
46
|
+
- Entries promoted: N
|
|
47
|
+
- Entries demoted: N
|
|
48
|
+
- Entries deleted: N
|
|
49
|
+
- Entries updated: N
|
|
50
|
+
- Total system entries: N
|
|
51
|
+
- Total archive entries: N
|
|
52
|
+
|
|
53
|
+
## Rules
|
|
54
|
+
|
|
55
|
+
- Be conservative — when uncertain, keep the entry.
|
|
56
|
+
- Never delete entries with `[keep]` in the name.
|
|
57
|
+
- Convert relative dates to absolute dates (e.g., "yesterday" → "2026-04-06").
|
|
58
|
+
- Each memory entry should have a clear, descriptive one-line description.
|
|
59
|
+
- Keep the system tier under 10 entries to avoid prompt bloat.
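The date rule can be sketched as a small resolver that maps a relative word onto a calendar date given a reference date. The supported words and the `toAbsoluteDate` name are assumptions for this example; dates are computed in UTC.

```typescript
// Resolves a few relative date words against a reference date and
// returns an ISO calendar date (YYYY-MM-DD, computed in UTC).
// The set of supported words is an assumption for this sketch.
const DAY_OFFSETS = new Map<string, number>([
  ["yesterday", -1],
  ["today", 0],
  ["tomorrow", 1],
]);

function toAbsoluteDate(relative: string, reference: Date): string | undefined {
  const offset = DAY_OFFSETS.get(relative.trim().toLowerCase());
  if (offset === undefined) {
    return undefined; // unknown phrase: leave the original text alone
  }
  const resolved = new Date(reference.getTime());
  resolved.setUTCDate(resolved.getUTCDate() + offset); // rolls over month ends
  return resolved.toISOString().slice(0, 10);
}

const example = toAbsoluteDate("yesterday", new Date("2026-04-07T12:00:00Z"));
// example is "2026-04-06"
```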
|
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: multi-model-routing
|
|
3
|
+
description: Leverage brainstorm's intelligent model routing. Use when optimizing cost, selecting models for specific tasks, or understanding routing decisions.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Multi-Model Routing
|
|
7
|
+
|
|
8
|
+
Brainstorm routes each task to the optimal model using Thompson sampling across providers. Understanding the routing system lets you make better decisions about cost vs quality.
|
|
9
|
+
|
|
10
|
+
## Routing Strategies
|
|
11
|
+
|
|
12
|
+
| Strategy | When | Tradeoff |
|
|
13
|
+
| ------------- | -------------------------------------------------------- | ------------------------------ |
|
|
14
|
+
| quality-first | Complex reasoning, code generation, architecture | Higher cost, better results |
|
|
15
|
+
| cost-first | Simple queries, bulk operations, high volume | Lower cost, adequate quality |
|
|
16
|
+
| combined | General use (default) | Balanced |
|
|
17
|
+
| capability | Tasks requiring specific features (vision, long context) | Feature-driven |
|
|
18
|
+
| learned | After enough usage data | Thompson sampling optimization |
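For readers unfamiliar with Thompson sampling, the following is a generic sketch of the technique, not Brainstorm's actual implementation. It assumes each model's past outcomes are recorded as simple success and failure counts; `ArmStats`, `sampleBeta`, and `pickModel` are hypothetical names.

```typescript
// Per-model success/failure counts; the shape is an assumption.
interface ArmStats {
  model: string;
  successes: number;
  failures: number;
}

type Rng = () => number; // returns a float in [0, 1)

// Draws from Beta(a, b) for positive integers a and b, using the fact that
// the a-th smallest of (a + b - 1) uniform samples is Beta(a, b) distributed.
function sampleBeta(a: number, b: number, rng: Rng): number {
  const draws: number[] = [];
  for (let i = 0; i < a + b - 1; i++) {
    draws.push(rng());
  }
  draws.sort((x, y) => x - y);
  return draws[a - 1] ?? 0;
}

// Thompson sampling: sample a plausible success rate for every model from
// its Beta(successes + 1, failures + 1) posterior, then pick the highest.
function pickModel(arms: ArmStats[], rng: Rng = Math.random): string {
  let best = "";
  let bestSample = -1;
  for (const arm of arms) {
    const sample = sampleBeta(arm.successes + 1, arm.failures + 1, rng);
    if (sample > bestSample) {
      bestSample = sample;
      best = arm.model;
    }
  }
  return best;
}
```

Models with little data have wide posteriors, so they still get picked occasionally; that built-in exploration is the usual reason to prefer this approach over always choosing the current best average.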
|
|
19
|
+
|
|
20
|
+
## Cost-Aware Tool Selection
|
|
21
|
+
|
|
22
|
+
Before expensive operations, use `cost_estimate` to predict cost:
|
|
23
|
+
|
|
24
|
+
```
|
|
25
|
+
cost_estimate({ prompt: "the task description", strategy: "quality-first" })
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
For batch operations, prefer the `cost-first` strategy to reduce spend.
|
|
29
|
+
|
|
30
|
+
## Model Override
|
|
31
|
+
|
|
32
|
+
Use `set_routing_hint` to override routing for the next request:
|
|
33
|
+
|
|
34
|
+
```
|
|
35
|
+
set_routing_hint({ model: "claude-haiku-4-5", reason: "simple search task" })
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
## Capability Scores
|
|
39
|
+
|
|
40
|
+
Each model has scores across 7 dimensions:
|
|
41
|
+
|
|
42
|
+
- toolSelection, toolSequencing, codeGeneration
|
|
43
|
+
- multiStepReasoning, instructionFollowing
|
|
44
|
+
- contextUtilization, selfCorrection
|
|
45
|
+
|
|
46
|
+
Use these when choosing models for specific tasks — a model with high toolSequencing is better for multi-step workflows than one with only high codeGeneration.
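One way to apply the scores is a weighted sum over the dimensions a task cares about. This is an illustrative sketch only: the dimension names come from the list above, but `ModelProfile`, `scoreFor`, `bestModelFor`, and any score values you feed in are assumptions, not Brainstorm's API or data.

```typescript
// The seven dimensions named above.
type Dimension =
  | "toolSelection" | "toolSequencing" | "codeGeneration"
  | "multiStepReasoning" | "instructionFollowing"
  | "contextUtilization" | "selfCorrection";

type Scores = Record<Dimension, number>;

interface ModelProfile {
  name: string;
  scores: Scores;
}

// Weighted sum over the dimensions a task cares about; others count as 0.
function scoreFor(model: ModelProfile, weights: Partial<Scores>): number {
  let total = 0;
  for (const key of Object.keys(weights) as Dimension[]) {
    total += model.scores[key] * (weights[key] ?? 0);
  }
  return total;
}

// Returns the name of the highest-scoring model, or undefined for an empty list.
function bestModelFor(models: ModelProfile[], weights: Partial<Scores>): string | undefined {
  let best: ModelProfile | undefined;
  for (const model of models) {
    if (best === undefined || scoreFor(model, weights) > scoreFor(best, weights)) {
      best = model;
    }
  }
  return best?.name;
}
```

For a multi-step workflow you might pass `{ toolSequencing: 1, multiStepReasoning: 0.5 }` as weights, which favors a strong sequencer over a model that only excels at code generation.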
|
|
47
|
+
|
|
48
|
+
## Fallback Chain
|
|
49
|
+
|
|
50
|
+
Every routing decision includes a fallback chain. If the primary model fails:
|
|
51
|
+
|
|
52
|
+
1. First fallback (same quality tier, different provider)
|
|
53
|
+
2. Second fallback (lower tier)
|
|
54
|
+
3. Last resort (cheapest available)
|
|
55
|
+
|
|
56
|
+
The `--events` flag shows every routing decision and retry in real time.
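The fallback behavior described above amounts to trying an ordered list of models until one succeeds. The sketch below shows that pattern in isolation; `runWithFallback` and the `call` parameter are hypothetical, and a real implementation would likely be asynchronous.

```typescript
// Tries each model in order and returns the first successful result.
// `call` is a hypothetical stand-in for whatever invokes a model.
function runWithFallback<T>(
  chain: string[],
  call: (model: string) => T,
): { model: string; result: T; attempts: string[] } {
  const attempts: string[] = [];
  let lastError: unknown;
  for (const model of chain) {
    attempts.push(model);
    try {
      return { model, result: call(model), attempts };
    } catch (error) {
      lastError = error; // remember the failure and move down the chain
    }
  }
  throw new Error(`All ${attempts.length} models failed: ${String(lastError)}`);
}
```

Returning the `attempts` list alongside the result makes it straightforward to log which fallbacks were used for a given request.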
|