@oneie/claude 0.3.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,781 @@
1
+ ---
2
+ title: {Human-readable title}
3
+ slug: {kebab-slug}
4
+ type: plan
5
+ tier: simple # trivial | simple | complex — /do reads this before W0
6
+ mode: construction # discovery | construction | evolution | maintenance
7
+ tags: [] # drives pheromone routing + W2 context triggers
8
+
9
+ # ─── GOAL CONTRACT (the only thing that matters) ─────────────────────
10
+ # Both fields are load-bearing. /do reads them at plan start, every W2
11
+ # verifies its cycle moves outcome closer, every W4 re-runs the outcome
12
+ # command. Plan does not close until outcome exits 0.
13
+
14
+ goal: "" # ONE sentence — what becomes true that wasn't before. Not "ship X" but "user can do Y" or "system enforces Z".
15
+ outcome: "" # ONE bash command — exits 0 = goal achieved. Re-runs after every batch's W4. The kill-switch.
16
+ outcome_asserts: "" # ONE sentence — what passing the outcome command proves (human-readable).
17
+
18
+ deliverables: # Concrete artifacts that ship. Every entry maps to a cycle. If it isn't here, it doesn't ship.
19
+ # - {route|component|cli-verb|api|migration|doc}: {path} — {what the user sees / can do}
20
+ # Example:
21
+ # - route: /chat/memory — user can read & edit company memory inline
22
+ # - api: POST /api/memory/upsert — agent writes memory facts
23
+ # - cli: one memory add — operator writes memory facts from terminal
24
+
25
+ ux_before: "" # ONE sentence — what the user does TODAY (the current journey, friction included).
26
+ ux_after: "" # ONE sentence — what the user does AFTER this plan ships (the new journey).
27
+ ux_delta: "" # ONE sentence — the specific improvement (fewer clicks, clearer feedback, capability they didn't have, etc.).
28
+
29
+ # ─────────────────────────────────────────────────────────────────────
30
+
31
+
32
+ # ─── PARALLELISM CONTRACT ────────────────────────────────────────────
33
+ # Declared in frontmatter so /do can spawn the maximum safe fan-out
34
+ # without re-asking. Defaults below are conservative — raise per plan.
35
+
36
+ parallel_budget: # how many of each model can run simultaneously
37
+ # Two dials per agent: MODEL (haiku/sonnet/opus/bash) AND EFFORT (none/low/medium/high/xhigh).
38
+ # Pick the cheapest model that can decide, then the lowest effort that holds. See text/templates-plan.md.
39
+ haiku: 20 # recon (low) + verify rubric (medium) agents
40
+ sonnet: 10 # W3 edit agents — one per file, parallel (low mechanical / medium genuine edit)
41
+ opus: 2 # W2 architectural decisions (high) / substrate reconciliation (xhigh) — rarely > 1
42
+
43
+ batches: # plan-level cycle DAG, flattened into batches.
44
+ # Cycles inside the same batch run their waves IN PARALLEL.
45
+ # The next batch fires the moment the previous batch's W4 closes.
46
+ # Empty list → /do auto-derives from the cycle-level arrows below.
47
+ - [C1] # batch 1: foundation
48
+ - [C2, C3, C4] # batch 2: independent siblings (concurrent)
49
+ - [C5] # batch 3: composes C1-C4
50
+ - [C6] # batch 4: polish across all
51
+
52
+ shared_recon: # files /do reads ONCE at plan start (W0.5), shared by every cycle's W1
53
+ - {path} # ≤8 entries — the load-bearing specs cited across cycles
54
+
55
+ # ─────────────────────────────────────────────────────────────────────
56
+
57
+ source_of_truth: # ≤5 files — W2 auto-loads; only files that actually exist
58
+ - docs/relevant-spec.md
59
+ - src/relevant/file.ts
60
+ existing_primitives: # the components/libs this plan composes — NEVER reimplement these
61
+ # ≥3 entries. If you can't name 3, recon harder before writing the plan.
62
+ # If recon finds a primitive that does ≥70% of what a new file would do, the new file is rejected.
63
+ - {path}: {what it already does and which cycle uses it}
64
+ show: false # true = render cycle frames in --auto
65
+ escape: # plan-level halt: if condition is true, stop and take action
66
+ condition: "" # e.g. "C2 W4 fails delta_tsc > 0 twice"
67
+ action: "" # e.g. "halt; re-scope C2 before retrying"
68
+ context_triggers: # surgical injection — only loads when W1 findings match pattern
69
+ - pattern: "" # regex matched against W1 excerpts + file paths
70
+ inject: "" # "docs/foo.md § Section" — section only, not whole file
71
+ ---
72
+
73
+ # {Title}
74
+
75
+ ## Goal, outcome, deliverables, UX (the only thing that matters)
76
+
77
+ Everything below this section is *how*. This section is *what* and *for whom*. Every cycle, wave, and gate exists to make the four blocks below true. If a cycle doesn't visibly move one of these, it doesn't belong in this plan.
78
+
79
+ ### Goal
80
+
81
+ {ONE sentence — what becomes true that wasn't before. Mirrors `goal:` in frontmatter.}
82
+
83
+ ### Outcome (the kill-switch)
84
+
85
+ ```bash
86
+ {outcome command — exits 0 = goal achieved. Mirrors `outcome:` in frontmatter.}
87
+ ```
88
+
89
+ **What passing proves:** {mirrors `outcome_asserts:` — the observable behaviour the command verifies}
90
+
91
+ **Contract:** this command runs after every batch's W4. The plan does not close until it exits 0. The moment it passes, all remaining cycles enter `justify-or-drop` review — default verdict: drop.
92
+
93
+ ### Deliverables (what actually ships)
94
+
95
+ Concrete artifacts the user / agent / operator can touch. If it isn't on this list, it isn't in scope.
96
+
97
+ | Kind | Path / name | What the user sees or can do |
98
+ |---|---|---|
99
+ | route | `/foo` | {observable behaviour} |
100
+ | component | `web/src/components/foo/Bar.tsx` | {where it appears, what it does} |
101
+ | api | `POST /api/foo` | {request → response shape, who calls it} |
102
+ | cli verb | `one foo <args>` | {what operator can now do from terminal} |
103
+ | migration | `0042_foo.sql` | {schema change + what it unlocks} |
104
+ | doc | `text/foo-plan.md` | {who reads it and when} |
105
+
106
+ Mirror this table into `deliverables:` frontmatter. Every row is owned by exactly one cycle (note the cycle ID).
107
+
108
+ ### User experience: before → after
109
+
110
+ | | Today (ux_before) | After this plan (ux_after) |
111
+ |---|---|---|
112
+ | **Who** | {persona} | {persona — same or new} |
113
+ | **Goal** | {what they're trying to do} | {what they're trying to do — same goal, hopefully} |
114
+ | **Steps** | {1. … 2. … 3. …} | {1. … 2. …} |
115
+ | **Friction** | {what's painful / impossible today} | {what's removed / made trivial} |
116
+ | **Time** | {seconds / minutes / hours} | {seconds / minutes} |
117
+ | **Feedback** | {what they see along the way — or don't} | {what they see now} |
118
+
119
+ **The improvement (ux_delta):** {ONE sentence — the specific delta. "Three clicks become one." "A workflow that needed a developer now runs from the terminal." "Errors that were silent now surface with a fix path." If you can't name the delta, the plan is goldplating — drop it.}
120
+
121
+ **The one screenshot / log line / API response a future-you would point at to say "see, this is what we shipped":**
122
+
123
+ ```
124
+ {paste the after-state observable here — a JSON response shape, a UI flow snippet, a CLI session, whatever proves the UX is real}
125
+ ```
126
+
127
+ ---
128
+
129
+ ## Canon reference (the 7 sources of truth)
130
+
131
+ Every artifact reconciles against the canons that apply to its category. Running `do-reconcile.sh <canon>` on a file exits 0 when it reconciles.
132
+
133
+ | Canon | Lives in | Reconcile test |
134
+ |---|---|---|
135
+ | **Substrate** | `schema/one.tql` | extends the model, never forks; no new dim/verb; no dead name |
136
+ | **Dictionary** | `text/dictionary-plan.md` | no synonym, no dead name |
137
+ | **Authority** | `schema/roles.tql` | walk-up resolves it; no ad-hoc role equality check |
138
+ | **SDK** | `@oneie/sdk` receivers | joins a receiver; never redefines a verb method |
139
+ | **Design** | shadcn/ui + tokens | composes existing; promise terms appear in shipped artifact |
140
+ | **Navigation** | `lib/menu.ts` · `data/in-types.ts` | registered in the right manifest + ≥1 inbound link |
141
+ | **Types** | `tsc` | tsc delta ≤ 0 — no new errors |
142
+
143
+ ## Category — what each tag family carries
144
+
145
+ | Category | Tags | Reconciles with | Done when |
146
+ |---|---|---|---|
147
+ | **PROMISE** | copy · benefit · journey | voice contract + personas | concrete enough to prove; voice consistent |
148
+ | **DATA** | schema · types · migration | Substrate + Dictionary + Types | schema extended, types flow from it, delta_tsc ≤ 0 |
149
+ | **SURFACE** | ui · page · nav · component | Design + Navigation | composed from real primitives; **registered + linked**; every state (empty/loading/error/edge) built |
150
+ | **GATEWAY** | api · route · worker · channel | SDK + Authority | joins a receiver + route family; guarded by walk-up |
151
+ | **PROOF** | test · e2e · a11y | goal / outcome | asserts the destination, not the path; exits 0 |
152
+ | **TEACH** | doc · runbook · tutorial | Dictionary (names + links) | no stale name, no dead link |
153
+
154
+ **The naming law:** `template-<suffix>.md` fills to `<slug>-<suffix>.md`. Four templates, four phases:
155
+ - `template-feature.md` → `<slug>.md` (PROMISE)
156
+ - `template-plan.md` → `<slug>-plan.md` (DESIGN)
157
+ - `template-todo.md` → `<slug>-todo.md` (PLAN)
158
+ - `template-agent.md` → `.claude/agents/<name>.md` (BUILD)
159
+
160
+ ## Reuse contract (read before drafting any cycle)
161
+
162
+ **Power through simplicity.** The smallest amount of new code that closes the
163
+ loop wins. Every cycle in this plan must answer the **compose-or-construct**
164
+ question before W3 spawns any agent.
165
+
166
+ ### The compose-or-construct test
167
+
168
+ For every new file a cycle proposes, W2 must record one line:
169
+
170
+ > **`{file}`** — no existing primitive covers `{specific behaviour}`. Closest
171
+ > match: `{path}` does `{what}` but lacks `{gap}`. Composition would require
172
+ > `{≥N hacks}` and lose `{what}`.
173
+
174
+ If you can't fill that in, **delete the new file from the W3 list** and slot
175
+ the behaviour into the closest existing primitive instead.
176
+
177
+ ### Compose-first taxonomy
178
+
179
+ Before drafting any cycle, walk these registries top-to-bottom. The first
180
+ match wins; only fall through to "new file" if every layer fails.
181
+
182
+ | Layer | Where to look | Default verdict |
183
+ |---|---|---|
184
+ | 1. **Domain composition** | the surface's own folder (`web/src/components/{surface}/`) | extend the file that already renders this surface |
185
+ | 2. **Cross-surface composition** | sibling component folders (`chat/`, `crm/`, `cards/`, `in/`, `settings/`) | import + slot — do not copy |
186
+ | 3. **Design primitives** | `web/src/components/ai-elements/`, `web/src/components/ui/` | compose the primitive into a parent; never reimplement |
187
+ | 4. **Library primitives** | `@/lib/`, `@/engine/`, existing hooks | reuse the helper; do not parallel-write |
188
+ | 5. **Skill / rule packs** | `.claude/skills/`, `.claude/rules/` | invoke the existing skill; do not inline its knowledge |
189
+ | 6. **New file** | only if 1-5 all fail | requires the W2 justification line above |
190
+
191
+ **Anti-patterns rejected on sight:**
192
+
193
+ - ❌ New `<Composer>` / `<Toolbar>` / `<Picker>` tree when `PromptInput*` already covers it
194
+ - ❌ New `<ThreadView>` / `<MessageBubble>` when `Conversation` + `Message` + `MessageList` cover it
195
+ - ❌ New `<Modal>` / `<Drawer>` / `<Sheet>` when `Drawer.tsx` is in `ui/`
196
+ - ❌ Inline `<svg>` when `lucide-react` + `<Icon>` + `<IconBadge>` exist
197
+ - ❌ Bespoke `<textarea>` when `PromptInputTextarea` is one import away
198
+ - ❌ A new test file per component when one test file per surface is the norm
199
+ - ❌ Re-implementing a debounce / fetch / SSE helper that already lives in `@/lib/`
200
+
201
+ If your plan trips any of these, rewrite the cycle to compose instead.
202
+
203
+ ### Reuse audit (mandatory W4 line item — every cycle, no exceptions)
204
+
205
+ Every W4 includes these greps; the cycle does not close if any fail:
206
+
207
+ - [ ] `wc -l` for all new files in this cycle totals **<{budget} LOC** (set in W2)
208
+ - [ ] `delta_loc_net ≤ {target}` (negative deltas preferred — deletion is a win)
209
+ - [ ] No reimplementation of a primitive on the taxonomy table (named grep per cycle)
210
+
211
+ ---
212
+
213
+ ## Testing — goal-based, Vitest-first, autonomous
214
+
215
+ **The rule.** Every cycle has **one demo gate**: a bash command that exits 0 = pass. The command runs zero LLM tokens. If the command can't decide the cycle, it's the wrong command.
216
+
217
+ ### Goal-based, not implementation-based
218
+
219
+ | Bad (implementation) | Good (goal) |
220
+ |---|---|
221
+ | `expect(getByText('Mark')).toHaveClass('bg-primary')` | `await run('mark', e); expect(getStrength(edge)).toBe(1)` |
222
+ | "the button is visible and styled correctly" | "the action deposits pheromone on the edge" |
223
+ | asserts the path | asserts the destination |
224
+
225
+ A test that breaks when the styling changes is testing the wrong thing.
226
+
227
+ ### Vitest-first; Playwright only when justified
228
+
229
+ | Test type | Use for | Time | Tokens (fail debug) | Default |
230
+ |---|---|---|---|---|
231
+ | **Vitest pure** | classifier, dispatcher, hooks, parsers | <100ms | ~200 | **default** |
232
+ | **Vitest + msw** | API contract, fetch dispatch, SSE protocol | <500ms | ~400 | **default for network** |
233
+ | **Vitest + @testing-library/react** | component render given prop, role-gated visibility | <1s | ~500 | **default for UI** |
234
+ | **Lighthouse CLI** (Vitest wrapper) | perf budgets only | 30s | ~600 | when cycle ships perf |
235
+ | **Playwright** | visual regression · real multi-context SSE · pixel-drag | 5-30s | 2k-10k | only if cycle declares `requires_playwright: true` in frontmatter |
236
+
237
+ **Default to the cheapest tool that can express the goal.** A Playwright test where Vitest + msw would suffice is a token-waste and a flake risk.
238
+
239
+ ### One demo gate per cycle
240
+
241
+ ```yaml
242
+ # In cycle frontmatter or cycle header:
243
+ demo:
244
+ command: "bun vitest run tests/e2e/{cycle}.test.ts"
245
+ asserts: "{single goal sentence — what passing means}"
246
+ budget: "<2s wall · <500 LOC test"
247
+ ```
248
+
249
+ W4's "cycle demo passes" line resolves to `$(command) && echo pass`. The test file should be ≤ 100 LOC. Three `expect()` calls in one test beats three test files. Five `expect()` is a sign the goal is too broad — split the cycle.
250
+
251
+ ### Test-file LOC budget
252
+
253
+ | Tier | Budget per cycle's demo |
254
+ |---|---|
255
+ | trivial | ≤ 30 LOC (one `expect()`) |
256
+ | simple | ≤ 80 LOC |
257
+ | complex | ≤ 150 LOC (still one file) |
258
+
259
+ If a cycle needs more than 150 LOC of test, it's actually two cycles.
260
+
261
+ ### Model × effort × wave × token math
262
+
263
+ Two dials, set per agent. **Model** = cheapest that can decide. **Effort** = lowest that holds.
264
+ Full per-stage routing in `text/templates-plan.md`.
265
+
266
+ ```
267
+ W1 recon bash · none cache check 0 tokens on hit (saves ~12k vs live call); 14-day TTL
268
+ Haiku · low SDK on miss ~5,400-token prefix cached; saves ~4,900/call vs no-cache (40%)
269
+ (inline) ≤5 files no agent spawn — read inline
270
+ W2 decide Opus · high architectural Opus · xhigh if substrate/schema
271
+ Sonnet · medium mechanical inline if trivial
272
+ W3 edit Sonnet · low mechanical edit × N parallel, single message
273
+ Sonnet · medium genuine restructure
274
+ W4 verify bash · none `bun vitest run` 0 LLM tokens
275
+ Haiku · medium × 6 rubric SDK: ~10,900-token block cached; saves ~46k tokens/run (70%)
276
+ demo gate bash · none test exit code 0 tokens
277
+ ```
278
+
279
+ Every check that can be a bash command is a bash command. Tests are bash commands. **The cycle closes when the test exits 0** — no LLM judges the outcome. Progressive disclosure applies at every wave: a Haiku gets the last few turns and top paths, never the full history; a context doc loads only when a `context_triggers:` pattern matches.
280
+
281
+ ### Autonomy gates
282
+
283
+ Cycle closes autonomously when:
284
+ - `bun run verify` exits 0
285
+ - `delta_tsc_errors ≤ 0`
286
+ - Cycle's `demo.command` exits 0
287
+ - Rubric composite ≥ 0.65 (W4 — inline for simple/trivial, 6-Haiku SDK script for complex with spec block cached)
288
+
289
+ Any one fails → cycle stops, root cause filed, **no user prompt unless trust=cautious or W4 loops > 3**.
290
+
291
+ ### Live verification (cycles that touch deploy surfaces)
292
+
293
+ Cycles that modify `web/src/middleware.ts`, `web/astro.config.mjs`, `wrangler.toml`,
294
+ `web/src/pages/api/**`, or any `.tql`/D1 migration are required to add a post-deploy
295
+ HTTP check to W4. Local `bun run verify` is necessary but **not sufficient** — D1
296
+ schema drift and CF adapter changes only surface against the deployed runtime.
297
+
298
+ ```bash
299
+ # W4 step for deploy-surface cycles (zero LLM tokens):
300
+ for path in / /chat /agents; do
301
+ code=$(curl -s -o /dev/null -w "%{http_code}" "https://<deploy-url>$path?_t=$(date +%s)")
302
+ [ "$code" = "200" ] || [ "$code" = "302" ] || { echo "FAIL: $path → $code"; exit 1; }
303
+ done
304
+ ```
305
+
306
+ The cache buster (`?_t=...`) is mandatory — Astro's Layout sets long-cache headers
307
+ on error responses too, so a stale 500 can mask a successful redeploy for up to 24h.
308
+
309
+ ---
310
+
311
+ ## Parallel execution plan
312
+
313
+ The most important section in this file. Everything else is detail.
314
+
315
+ ### Goal-proof ordering (do this before drawing the DAG)
316
+
317
+ Before drawing dependency arrows, ask: **which cycle, if it passes, most cheaply reveals whether the plan goal is achievable?** That cycle goes in batch 1 — even out of strict dependency order if you can stub the missing pieces. The point is to fail fast on a misconceived plan, not to satisfy a build order.
318
+
319
+ | Heuristic | Why |
320
+ |---|---|
321
+ | Cycle whose `Goal delta:` is closest to plan outcome | Earliest signal the goal is reachable |
322
+ | Cycle that ships a user-visible deliverable (route, UI, CLI) | The user can react before you've finished — feedback within the plan, not after |
323
+ | Cycle that can run with stubs for later work | Don't wait for foundations to validate the destination |
324
+ | Cycle you'd demo first if all else failed | If you'd show this one to a user, it should ship first |
325
+
326
+ ### Interface Contract (pin before drawing the DAG)
327
+
328
+ Pin shared names, CLI signatures, and design decisions **here**, before drawing any arrow. Every Batch-1 cycle codes against these frozen decisions — no cycle waits on another's output file.
329
+
330
+ **What to pin:**
331
+
332
+ | # | What | Example |
333
+ |---|---|---|
334
+ | 1 | CLI signatures | `do-reconcile.sh <canon> [<file>… \| --self-test]` — exit 0 = reconciles |
335
+ | 2 | Canon / type names | `substrate · dictionary · authority · sdk · design · navigation · types` |
336
+ | 3 | Collapse decisions | `design canon owns promise-check (was in do-prove.sh)` |
337
+ | 4 | Self-test protocol | `do-reconcile.sh navigation --self-test` exits 0 |
338
+ | 5 | Agent invocation strings | `do-reconcile.sh <canon> <file>` per category |
339
+ | 6 | Shadow → live rename map | `do2-reconcile.sh → do-reconcile.sh` |
340
+ | 7 | Template names | `template-feature.md · template-plan.md · template-todo.md · template-agent.md` |
341
+ | 8 | W2 delegation | W2 = spawned single Opus agent; conductor stays Sonnet |
342
+
343
+ **Contract test:** could every cycle's W2 fill in its diff specs right now, without waiting for another cycle? If yes → contract complete. If no → pin the missing decision before drawing any arrow.
344
+
345
+ ### Cycle-level DAG (what blocks what) — Mermaid, required
346
+
347
+ Every plan ships this graph. `/do` reads it to compute batches and fire the maximum parallel fan-out. Siblings (no edge between them) run their waves concurrently.
348
+
349
+ ```mermaid
350
+ graph TD
351
+ C1[C1 foundation] --> C2[C2]
352
+ C1 --> C3[C3]
353
+ C1 --> C4[C4]
354
+ C2 --> C5[C5 composes C2-C4]
355
+ C3 --> C5
356
+ C4 --> C5
357
+ C5 --> C6[C6 polish]
358
+ ```
359
+
360
+ C2·C3·C4 have no edge between them → fully parallel. Label every edge with the file that justifies it (`C1 -->|writes lib/foo.ts| C2`) when it isn't obvious.
361
+
362
+ **The only valid arrow:** C_m reads a file that C_n **writes** (the file is absent or wrong until C_n completes on disk). **No other reason justifies an arrow.**
363
+
364
+ **Arrow test — before drawing any arrow, fill this in:**
365
+ ```
366
+ C_m → C_n because C_n imports/reads `{exact file path}` which C_m creates/rewrites.
367
+ ```
368
+ Cannot name the exact file → delete the arrow.
369
+
370
+ | Real blocker | Imaginary blocker — delete the arrow |
371
+ |---|---|
372
+ | C_n imports a type C_m defines in a new file | "same feature area" |
373
+ | C_n's API route reads a DB schema C_m migrates | "might have merge conflicts" |
374
+ | C_n's W2 needs C_m's output shape to make decisions | C_n creates a NEW file (no blocker — just create it) |
375
+ | C_n calls an endpoint C_m adds | "better to do in order" |
376
+ | | "logically should come first" |
377
+ | | "we don't want too much in flight" |
378
+ | | "I want to review C_m before starting C_n" — that's review policy, not a blocker |
379
+
380
+ **Single-cycle plan → skip the DAG, list only the parallel agent map below.**
381
+
382
+ ### Batches (DAG flattened — what fires together)
383
+
384
+ Mirrors `batches:` frontmatter. Each batch fires the moment the previous batch's W4 closes.
385
+
386
+ | Batch | Cycles | What runs in parallel |
387
+ |-------|--------|----------------------|
388
+ | 0 | (shared W0 + W1) | baseline + read of every `shared_recon:` file, ONE message of N Haikus |
389
+ | 1 | C1 | full W1→W4 |
390
+ | 2 | C2, C3, C4 | THREE cycles run W1→W4 in lockstep; their W3a's merge into ONE Sonnet message |
391
+ | 3 | C5 | full W1→W4 |
392
+ | 4 | C6 | full W1→W4 |
393
+
394
+ **The fan-out rule.** When batch N contains cycles C_a, C_b, C_c:
395
+
396
+ - **Shared W0:** never re-run. Baseline captured in batch 0; W4 diffs against the same `.w0-baseline.json` for every cycle.
397
+ - **W1 (recon):** every cycle's W1 file list is union'd, deduped against `shared_recon:`, and the remaining unique files are read in ONE message of `min(unique_files, parallel_budget.haiku)` agents.
398
+ - **W2 (decide):** ONE Opus call per cycle (can't merge — each cycle is its own architectural decision). These N Opus calls fire in parallel in ONE message (subject to `parallel_budget.opus`).
399
+ - **W3a (edit):** ALL independent edits from ALL cycles in the batch merge into ONE Sonnet spawn message. If C2 has 8 edits and C3 has 6 edits and C4 has 4 edits, that's 18 Sonnet agents in one message (subject to `parallel_budget.sonnet`).
400
+ - **W3b (edit, dependent):** runs the moment its W3a counterpart settles, regardless of which cycle it belongs to.
401
+ - **Demo gates:** all cycles' `demo.command` test files run in ONE `vitest run` invocation: `bun vitest run tests/e2e/c2.test.ts tests/e2e/c3.test.ts tests/e2e/c4.test.ts`. Single bash, zero LLM tokens, one ratchet check.
402
+ - **W4 rubric (if COMPLEX):** rubric agents are PER-CYCLE — 5 Haikus × N cycles = 5N Haikus in ONE message (subject to `parallel_budget.haiku`).
403
+
404
+ ### Plan-level shared steps (run once per plan, never per cycle)
405
+
406
+ | Step | When | What |
407
+ |------|------|------|
408
+ | Shared W0 baseline | At plan start | `bun run verify` + `.w0-baseline.json` — every cycle's W4 ratchet reads this |
409
+ | Shared W1 recon | At plan start, after W0 | All `shared_recon:` files read in one Haiku spawn; results cached for cycle consumption |
410
+ | Final compress sweep | At plan end | `ts-prune` + `noUnusedLocals` once, after the last batch |
411
+ | Final docs/improvements append | At plan end | One write, not N |
412
+
413
+ ### Cross-cycle W3 batch (the big win)
414
+
415
+ When batch N has 3 cycles each with 8 independent edits, the naive path is 3 sequential W3a's (3 round-trips). The batch path is **ONE** W3a — 24 Sonnet agents in one message.
416
+
417
+ **Pre-condition check (`/do` runs at batch start):**
418
+ ```bash
419
+ # All target files unique across cycles in this batch?
420
+ sort .batch-{N}-targets.txt | uniq -d
421
+ # Empty → cross-cycle W3 merge eligible.
422
+ # Non-empty → split: shared files go to W3b after the unique ones land.
423
+ ```
424
+
425
+ **Anti-pattern reject (W3 must NOT merge if):**
426
+ - Two cycles' W3a's edit the same file with different anchors (race; even if anchors don't collide on disk, the W4 agent gets confused which cycle authored which)
427
+ - One cycle's W3a creates a file another cycle's W3a edits in the same batch (W3b territory)
428
+
429
+ ### Per-cycle W3a/W3b template
430
+
431
+ Fill in at each cycle's W2. The cross-cycle merge happens automatically when batch eligibility passes.
432
+
433
+ ```
434
+ C_n W3a — independent edits (file-disjoint, spawned in one message):
435
+ agent-1 → src/pages/api/foo.ts # new endpoint
436
+ agent-2 → src/lib/foo.ts # helper module
437
+ agent-3 → docs/foo.md # doc update — always parallel to its code file
438
+
439
+ C_n W3b — dependent edits (run after W3a settles):
440
+ agent-4 → src/pages/api/foo.ts # adds import that agent-2 just defined
441
+ ```
442
+
443
+ Empty W3b = preferred. If W3b is non-empty, prefer to flip the dependency by having W3a define the symbol first.
444
+
445
+ ### Imaginary blockers — the explicit reject list
446
+
447
+ Reject any of these as reasons to add an arrow or move cycles to later batches:
448
+
449
+ - ❌ "Should test C1 before starting C2" — that's W4's job, not a dependency
450
+ - ❌ "Same feature area" / "same folder"
451
+ - ❌ "Both touch the database" — only blocks if they touch the same schema entity
452
+ - ❌ "Logical reading order matters for the doc" — write order, not build order
453
+ - ❌ "Don't want too many agents at once" — `parallel_budget:` already caps this
454
+ - ❌ "C2 depends on what we learn in C1" — that's W2 learning, not file dependency; if no file is read, no arrow
455
+ - ❌ "Cycles in the same plan should be sequential by default" — they should not
456
+ - ❌ "Need to see C1 results before scoping C2" — scope at plan time; if you can't, the plan is unready
457
+
458
+ ---
459
+
460
+ ## Checkbox auto-tick contract
461
+
462
+ Every actionable item in this plan is a checkbox. `/do` ticks them **the
463
+ moment the action finishes** — never at the end of the run, never on
464
+ "I'll do it next time." This is how a user reading the file mid-run can
465
+ tell at a glance what's done.
466
+
467
+ ### The rule
468
+
469
+ | When | Action | Who marks it |
470
+ |------|--------|--------------|
471
+ | A W1 recon agent returns findings for a file | `- [ ] {file}` → `- [x] {file}` | `/do` after the agent settles |
472
+ | A W2 decision item resolves (verdict filed, slot map populated, diff spec output) | item → `[x]` | `/do` inline |
473
+ | A W3 edit agent reports success on a file | `- [ ] {file}` → `- [x] {file}` | `/do` after the agent settles |
474
+ | A W4 check passes (`bun run verify`, demo command, rubric, doc-sync) | item → `[x]` | `/do` after the bash exits 0 |
475
+ | A wave's items are all `[x]` | the wave header (`W1 recon`, `W2 decide`, etc.) → `[x]` | `/do` derives |
476
+ | A cycle's four waves are all `[x]` | the cycle header (`C1 — name`) → `[x]` | `/do` derives |
477
+ | A batch's cycles are all `[x]` | the batch header (`Batch 2`) → `[x]` | `/do` derives |
478
+ | All batches `[x]` + plan rubric ≥ 0.65 | `Plan close` items → `[x]` | `/do` derives |
479
+
480
+ ### What this means in practice
481
+
482
+ - **No silent progress.** Every Bash exit and every agent return triggers a checkbox edit. The file IS the progress bar.
483
+ - **Mid-run readability.** A user can `cat dashboard-todo.md` while `/do --auto` is running and see exactly where the wave fan-out is.
484
+ - **Forward-only.** A `[x]` is never un-ticked except by an explicit `/do --wave N --redo`.
485
+ - **Derivation, not duplication.** When `/do` sees all four waves of C2 are `[x]`, it ticks the `C2 —` header. The user doesn't tick anything manually unless they're overriding.
486
+ - **Auto-advance.** When a batch closes, the next batch's blocker chain unblocks and its cycles flip from `state: blocked` to `state: ready`. `/do --auto` fires the next batch's W1 immediately — no prompt.
487
+
488
+ ### Granularity (one checkbox per atomic action)
489
+
490
+ | Layer | What gets a checkbox |
491
+ |-------|---------------------|
492
+ | Plan | Shared W0, shared W1, final compress sweep, final docs append, plan rubric, plan close |
493
+ | Batch | The batch header (auto-derived) |
494
+ | Cycle | The cycle header (auto-derived) + the four wave headers |
495
+ | Wave | Every sub-item (file, decision, check) |
496
+ | W1 | One checkbox per file recon'd |
497
+ | W2 | One per compose verdict, one per diff spec, one per doc-plan trigger |
498
+ | W3a | One per file edited (parallel) |
499
+ | W3b | One per file edited (sequential, after W3a) |
500
+ | W4 | One per verify check (verify · demo · doc-sync · rubric · ratchet) |
501
+
502
+ If an action has no checkbox, it isn't tracked — and untracked work is the loudest possible code smell.
503
+
504
+ ---
505
+
506
+ ## Status (DAG-derived kanban, not flat checkboxes)
507
+
508
+ `/do` reads this section and the `batches:` frontmatter together. A cycle's state is **derived**, not declared:
509
+
510
+ - `ready` — all blockers in the DAG are `[x]` AND the batch is active
511
+ - `blocked` — at least one blocker is `[ ]`
512
+ - `in_flight` — at least one wave `[~]`
513
+ - `done` — all four waves `[x]`
514
+
515
+ ```
516
+ Batch 0 (shared)
517
+ - [ ] W0 baseline (plan-level)
518
+ - [ ] W1 shared recon (plan-level)
519
+
520
+ Batch 1
521
+ - [ ] C1 — {name} state: ready
522
+ - [ ] W1 recon
523
+ - [ ] W2 decide
524
+ - [ ] W3 edit
525
+ - [ ] W4 verify
526
+
527
+ Batch 2 (fires the instant C1 closes)
528
+ - [ ] C2 — {name} state: blocked-on-C1
529
+ - [ ] W1 · W2 · W3 · W4
530
+ - [ ] C3 — {name} state: blocked-on-C1
531
+ - [ ] W1 · W2 · W3 · W4
532
+ - [ ] C4 — {name} state: blocked-on-C1
533
+ - [ ] W1 · W2 · W3 · W4
534
+ - [ ] demo batch (vitest run c2.test c3.test c4.test)
535
+
536
+ Batch 3
537
+ - [ ] C5 — {name} state: blocked-on-C2,C3,C4
538
+ - [ ] W1 · W2 · W3 · W4
539
+
540
+ Batch 4
541
+ - [ ] C6 — {name} state: blocked-on-C5
542
+ - [ ] W1 · W2 · W3 · W4
543
+
544
+ Plan close
545
+ - [ ] **Plan outcome command exits 0** (the ONLY definition of "plan shipped")
546
+ - [ ] **Every row in `deliverables:` table is shipped and reachable** (route returns 2xx, component renders, CLI verb runs, api responds)
547
+ - [ ] **ux_after journey is walkable end-to-end** — record the screenshot / log / CLI session that proves it
548
+ - [ ] Justify-or-drop review on any cycles unstarted after outcome passed
549
+ - [ ] Final compress sweep
550
+ - [ ] Final docs/improvements.md append
551
+ - [ ] Plan rubric ≥ 0.65 across all cycles (goal-fit weight 0.35)
552
+ ```
553
+
554
+ ---
555
+
556
+ ## C1 — {name} [tier: {trivial|simple|complex} · batch: 1]
557
+
558
+ **Goal delta:** {ONE sentence — after this cycle closes, plan outcome is closer because `{observable}` is now true. If you can't write this, drop the cycle.}
559
+
560
+ **Deliverable:** {ONE row from the plan `deliverables:` table — the artifact this cycle owns. e.g. `route: /chat/memory — operator can view + edit company memory`}
561
+
562
+ **UX delta:** {ONE sentence — what the user can do after this cycle that they couldn't before. "Operator now sees company memory in the chat sidebar." If "no user-visible change," say so explicitly — internal-only cycles must justify why they ship before a user-visible one.}
563
+
564
+ **Cycle outcome:** {verifiable — bash command / test name / API shape / Lighthouse score}
565
+
566
+ ✓ valid: "`bun run verify` passes AND `GET /api/foo` returns `{id, name}`"
567
+ ✗ invalid: "implementation complete" · "looks good" · "done" · anything needing human judgment
568
+
569
+ **Contributes to plan outcome:** yes / partial / no — if `no`, drop this cycle.
570
+
571
+ **Demo gate (the only test that decides this cycle):**
572
+ ```yaml
573
+ demo:
574
+ command: "bun vitest run tests/e2e/c1.test.ts"
575
+ asserts: "{one goal sentence}"
576
+ budget: "<2s wall · <80 LOC test"
577
+ ```
578
+
579
+ ### W1 — Recon [Haiku · parallel · merged across batch]
580
+
581
+ /do spawns recon agents for ALL files in this batch's W1 list in a **single message**, deduped against `shared_recon:` cache. List only cycle-specific files that exist now (shared files are already cached at plan start).
582
+
583
+ **Two mandatory recon tracks** — every cycle, no exceptions. Every file is a checkbox; `/do` ticks each as its Haiku settles.
584
+
585
+ 1. **Existing-code recon** (what currently does this job)
586
+ - [ ] `src/pages/api/chat.ts` — {what to find: current handler shape, streaming approach}
587
+ - [ ] `src/lib/agents.ts` — {what to find: agent registry structure}
588
+
589
+ 2. **Primitive-inventory recon** (what we will compose, not rewrite)
590
+ - [ ] `web/src/components/{nearest-folder}/` — list files; mark each `✓ shipped` or `✗ missing`
591
+ - [ ] `web/src/components/ai-elements/` — name the primitives in scope (e.g. `PromptInput*`, `Conversation`, `Message`, `MessageList`)
592
+ - [ ] `web/src/components/ui/` — name the primitives (`Card`, `Drawer`, `Button`, `Icon`, `IconBadge`)
593
+ - [ ] `@/lib/` — name any helpers (`emitClick`, `cn`, `fetchSSE`, etc.) this cycle will reuse
594
+
595
+ Recon agents return the **public API** (exported names + key prop signatures)
596
+ of every primitive they find. W2 cannot decide compose-vs-construct without
597
+ this — make it explicit, not implicit.
598
+
599
+ ### W2 — Decide [Opus if complex · Sonnet if simple · inline if trivial]
600
+
601
+ /do resolves these from W1 findings. Write the real questions now — W2 answers them.
602
+
603
+ Every item below is a checkbox; `/do` ticks each as it resolves.
604
+
605
+ - [ ] **Goal-delta verified** — the cycle's `Goal delta:` sentence holds against the proposed diff. If the diff doesn't move plan outcome closer, drop the cycle.
606
+ - [ ] **Deliverable confirmed** — this cycle owns exactly one `deliverables:` row and the diff produces it
607
+ - [ ] **UX delta articulated** — the after-state observable is named (route reachable, component visible, CLI verb returns, etc.)
608
+ - [ ] **Compose-or-construct verdict** filed for every proposed new file
609
+ - [ ] **Slot map** populated (when composing — before any W3 file is listed)
610
+ - [ ] **Architectural questions** answered
611
+ - [ ] **Diff specs output** for every W3 target
612
+ - [ ] **Doc-plan** filed (`.w2-doc-plan.json`) if any trigger applies (new primitive · rename · public surface · directory contract)
613
+
614
+ **Compose-or-construct verdict (mandatory — top of W2, before any other decision):**
615
+
616
+ For each proposed new file in this cycle:
617
+
618
+ | Proposed file | Closest existing primitive | Gap | Verdict |
619
+ |---|---|---|---|
620
+ | `{new-file-path}` | `{primitive path}` does `{X}` | `{lacks Y}` | **compose** (slot into primitive) / **extend** (PR to primitive) / **new** (justified — closes which behaviour) |
621
+
622
+ If the verdict column reads "new" for more than one file, justify each
623
+ separately. Default to "compose"; "new" is the exception, not the default.
624
+
625
+ **Slot map** (when composing — fill this in before any W3 file is listed):
626
+
627
+ | Primitive | Slot used | What this cycle puts in it |
628
+ |---|---|---|
629
+ | `PromptInputHeader` | header slot | `{this cycle's content}` |
630
+ | `PromptInputFooter` | footer slot | `{this cycle's content}` |
631
+ | `{other primitive}` | `{slot}` | `{this cycle's content}` |
632
+
633
+ **Then the architectural questions:**
634
+
635
+ - [ ] Does `{file}` need a new function or can the existing one extend?
636
+ - [ ] {specific architectural question this cycle must settle}
637
+
638
+ ### W3 — Edit [Sonnet · parallel]
639
+
640
+ W2 fills in the anchors. Mark which edits are independent vs dependent.
641
+
642
+ **W3a — independent (spawned in one message):**
643
+ - [ ] `src/pages/api/chat.ts` — {what changes}
644
+ - [ ] `text/chat-plan.md` — {doc update parallel to code change}
645
+
646
+ **W3b — dependent (single message, after W3a completes):**
647
+ - [ ] `src/lib/agents.ts` — {depends on W3a output in chat.ts}
648
+
649
+ If all edits are independent, leave W3b empty — empty W3b = one fewer round-trip.
650
+
651
+ ### W4 — Verify [Haiku×6 SDK-cached if complex · inline composite if simple/trivial]
652
+
653
+ - [ ] `bun run verify` green (biome + tsc + vitest)
654
+ - [ ] `delta_tsc_errors ≤ 0` (hard gate — no new type errors introduced)
655
+ - [ ] {specific functional check — curl / test name / route}
656
+ - [ ] **Reuse audit** (hard gate — block if any line fails):
657
+ - [ ] Every primitive in the W2 slot map appears as an import in the new code (`grep -l "from '@/components/{primitive}'" {new files}`)
658
+ - [ ] No reimplementation: greps from W2 anti-patterns table return zero hits in this cycle's new files
659
+ - [ ] `wc -l {new files}` total ≤ W2-declared LOC budget
660
+ - [ ] `delta_loc_net` matches or beats W2 target (negative preferred when this cycle replaces a bespoke widget)
661
+ - [ ] **Deliverable shipped** — the `Deliverable:` row is live: route returns 2xx, component renders, CLI verb runs, api responds. Verified by bash, not vibes.
662
+ - [ ] **UX delta observable** — record the after-state proof (curl output, screenshot path, log line) and paste it into the cycle close note.
663
+ - [ ] **Plan outcome re-check** — `$(plan.outcome)` exit code recorded; if 0, trigger justify-or-drop on remaining cycles.
664
+ - [ ] **Goal-fit ≥ 0.50** (hard gate — cycle that didn't measurably move plan outcome cannot pass).
665
+ - [ ] Rubric composite ≥ 0.65 — `0.35·goal-fit + 0.20·security + 0.20·stability + 0.15·simplicity + 0.10·speed`
666
+
667
+ Targets: **goal-fit ≥ 0.80** · security ≥ 0.90 · stability ≥ 0.85 · simplicity ≥ 0.85 · speed ≥ 0.80
668
+
669
+ **Simplicity scoring penalty:** any new file the reuse audit flags as
670
+ "could compose instead" drops simplicity by 0.10 per file. Two flagged files
671
+ fail the rubric on simplicity alone.
672
+
673
+ Report: `delta_tsc=±N delta_loc=±N compress_orphans=N new_files=N primitives_composed=N`
674
+
675
+ ---
676
+
677
+ ## C2 — {name} [tier: {trivial|simple|complex}]
678
+
679
+ **Goal delta:** {one sentence}
680
+ **Deliverable:** {one row from `deliverables:` table}
681
+ **UX delta:** {one sentence — or "internal-only, justified by X"}
682
+ **Cycle outcome:** {verifiable}
683
+
684
+ ### W1 — Recon [Haiku · parallel]
685
+
686
+ - `{file}` — {what to find}
687
+
688
+ ### W2 — Decide [Opus if complex · Sonnet if simple · inline if trivial]
689
+
690
+ - {question}
691
+
692
+ ### W3 — Edit [Sonnet · parallel]
693
+
694
+ **W3a:**
695
+ - [ ] `{file}` — {what changes}
696
+
697
+ **W3b:**
698
+ *(empty — all edits independent)*
699
+
700
+ ### W4 — Verify [Haiku×6 SDK-cached if complex · inline if simple/trivial]
701
+
702
+ - [ ] `bun run verify` green
703
+ - [ ] {specific check}
704
+ - [ ] deliverable shipped + ux delta observable
705
+ - [ ] plan outcome re-check recorded
706
+ - [ ] goal-fit ≥ 0.50 (hard) · composite ≥ 0.65
707
+
708
+ ---
709
+
710
+ ## See also
711
+
712
+ - `text/templates-plan.md` — per-stage template/skill/agent registry + model·effort routing
713
+ - `text/template-plan.md` — the DESIGN template this todo is planned from (DESIGN phase)
714
+ - `text/template-feature.md` — the PROMISE template (PROMISE phase)
715
+ - `docs/relevant-spec.md` — {why relevant to this plan}
716
+ - `src/relevant/file.ts` — {why relevant}
717
+ - `one/dictionary.md` — canonical names (always)
718
+ - `one/rubrics.md` — scoring bands (always)
719
+
720
+ ---
721
+
722
+ ## Authoring rules (strip this section before committing the file)
723
+
724
+ **Goal first, deliverables second, reuse third.** Before scaffolding any cycle:
725
+
726
+ 1. Fill in `goal:`, `outcome:`, `ux_before:`, `ux_after:`, `ux_delta:` in the frontmatter. If `outcome:` isn't a bash command that can exit 0, the plan is unready — go back to the goal.
727
+ 2. Fill in `deliverables:` — concrete artifacts (route, component, api, cli, migration, doc). Every row maps to exactly one cycle. If you can't name what ships, you can't plan it.
728
+ 3. For every cycle, write `Goal delta:`, `Deliverable:`, `UX delta:`. If you can't fill in any one of them, the cycle doesn't belong in this plan.
729
+ 4. Fill in `existing_primitives:` in the frontmatter (≥3 entries). If you can't
730
+ name 3, your recon is incomplete — go back to the codebase.
731
+ 5. Walk the compose-first taxonomy table above for every new file you imagine.
732
+ The default verdict is "compose", not "new".
733
+ 6. Every cycle's W1 has **two tracks** — existing-code AND primitive-inventory.
734
+ 7. Every cycle's W2 begins with the **goal-delta + deliverable + UX-delta check**, THEN the **compose-or-construct verdict table**.
735
+ 8. Every cycle's W4 runs the **reuse audit**, records the **deliverable proof** (curl/screenshot/log), and re-runs the **plan outcome command** as hard gates.
736
+
737
+ If a cycle proposes ≥3 new files, that is the loudest possible smell. Stop,
738
+ re-recon, and ask: *which of these is actually a slot-fill into something we
739
+ already ship?*
740
+
741
+ **Tiers:**
742
+ - `trivial` — ≤3 files, ≤20 LOC, no new types/routes/schema → /do skips all agent spawns
743
+ - `simple` — ≤6 files, clear scope, no new primitives → W2 Sonnet, W4 inline
744
+ - `complex` — multi-file architecture, new primitives, schema/API changes → full W1→W4
745
+
746
+ A `complex` plan with **no `existing_primitives:` entries** is malformed —
747
+ nothing in this codebase is built on bare ground.
748
+
749
+ **Blockers (cycle-level):**
750
+ - An arrow is valid only when you can name the exact file: `"C_n reads {file} that C_m writes"`
751
+ - Default is parallel — omit the arrow and let cycles run simultaneously unless the file test passes
752
+ - Never block on: "feels related", "same feature", "might conflict", "better to do in order"
753
+ - Only reference cycle IDs from THIS file or verified task IDs from other todo files
754
+
755
+ **Exit criteria:**
756
+ - Must be checkable with a command, test name, or API call
757
+ - Write it as: "`{command}` returns `{exact output}` OR `{test name}` passes"
758
+
759
+ **W1 files:**
760
+ - List only files that exist right now — /do validates paths before spawning
761
+ - If a file doesn't exist yet, note it as "create new" in the W3a list, not W1
762
+
763
+ **W3 split:**
764
+ - Default is W3a — all agents in one message
765
+ - Move an edit to W3b ONLY when: `"this edit targets {file} which W3a agent-N already touches"`
766
+ - If you can't name the W3a agent and file, it stays in W3a
767
+ - Empty W3b = one fewer round-trip — preferred outcome
768
+
769
+ **source_of_truth:**
770
+ - These files are injected into W2 context on every cycle
771
+ - Keep to ≤5; more = token waste; pick the most load-bearing spec files
772
+
773
+ **escape:**
774
+ - Write a verifiable condition, not a judgment ("C2 W4 delta_tsc > 0 twice" not "if it gets too hard")
775
+ - Action must say what the human should do next — re-scope / re-read spec / escalate
776
+ - Leave both fields empty (`""`) for plans with no known failure modes; /do auto-halts at W4 max loops
777
+
778
+ **context_triggers:**
779
+ - Pattern is matched against W1 findings (file paths + excerpts) — regex, case-insensitive
780
+ - Inject only the *section* you need: `"one/signals.md § Six Verbs"` not the whole file
781
+ - If you don't know which patterns will fire, leave empty — do.md's built-in triggers still apply