@martintrojer/mu 0.3.1 → 0.4.0

package/docs/ROADMAP.md CHANGED
@@ -1,542 +1,166 @@
1
1
  # Roadmap
2
2
 
3
- What's coming after [0.1.0](../CHANGELOG.md), with full design
4
- rationale per item. This is the **single forward-looking doc**: if
5
- a feature isn't listed here, it isn't planned. If it's listed but
6
- unbuilt, see its promotion criteria for what would move it.
3
+ The single forward-looking doc. If a feature isn't here, it isn't
4
+ planned.
7
5
 
8
- For canonical terms, see [VOCABULARY.md](VOCABULARY.md). For
9
- pillars that must not bend, see [VISION.md](VISION.md). For module
10
- layout and data flow, see [ARCHITECTURE.md](ARCHITECTURE.md).
6
+ For canonical terms see [VOCABULARY.md](VOCABULARY.md). For
7
+ load-bearing pillars see [VISION.md](VISION.md). For module
8
+ layout see [ARCHITECTURE.md](ARCHITECTURE.md). Shipped history
9
+ lives in [CHANGELOG.md](../CHANGELOG.md).
11
10
 
12
11
  ---
13
12
 
14
- ## Promotion criteria (the only bar)
13
+ ## Promotion criteria
15
14
 
16
15
  A roadmap item earns implementation when **all three** are true:
17
16
 
18
- 1. **Proven friction.** A real user (us, internal users, early
19
- adopters) hits the missing feature in a real workflow at least
20
- twice. "Imagined polish" doesn't count.
21
- 2. **No pillar refactor.** The addition fits the current substrate
22
- without bending any of the load-bearing pillars (see
23
- [VISION.md](VISION.md)).
24
- 3. **Bounded scope.** The addition fits in **<300 LOC** or has a
25
- clear smaller subset that does.
26
-
27
- If an item drops below the bar (no longer has criterion 1 met after
28
- real use), it moves to the bottom or is removed. We don't keep
29
- phantom plans alive.
30
-
31
- **Exception: data-loss footguns.** A change that fixes a default
32
- that silently destroys user artifacts (uncommitted output, scratch
33
- logs, benchmark results, etc.) ships on the **first** occurrence,
34
- not the second. The cost of waiting for criterion 1 is "lose more
35
- stuff"; that's the wrong cost to optimise. Document the friction
36
- in the commit message instead.
37
-
38
- **Polish doesn't count as promotion.** Bug fixes, ergonomic
39
- improvements, error-message wording, doc tightening, and similar
40
- "the existing thing works better" changes don't need promotion
41
- criteria — they just need to be small and to ship clean (typecheck
42
- + lint + tests + build). Polish is the dividend the project earns
43
- by refusing the things on this roadmap. Don't wait for occurrence
44
- #2 to fix a typo, tighten an error message, or truncate a runaway
45
- table column.
17
+ 1. **Proven friction.** A real user hits the missing feature in a
18
+ real workflow at least twice. Imagined polish doesn't count.
19
+ 2. **No pillar refactor.** Fits the current substrate without
20
+ bending any pillar in [VISION.md](VISION.md).
21
+ 3. **Bounded scope.** Fits in <300 LOC or has a clear smaller
22
+ subset that does.
23
+
24
+ **Exceptions.** Data-loss footguns (silent destruction of user
25
+ artifacts) ship on the *first* occurrence. Polish (bug fixes,
26
+ ergonomic tweaks, error-message wording, doc tightening) doesn't
27
+ need promotion at all; just ship clean (typecheck + lint + tests +
28
+ build).
46
29
 
47
30
  ---
48
31
 
49
- ## Anti-feature pledges (still in force; reinforced by an internal critique)
32
+ ## Anti-feature pledges
50
33
 
51
34
  We will NOT, until each one earns its way back via the criteria
52
- above. Each pledge is a specific accumulation a prior internal
53
- multi-agent runtime made and mu chose not to inherit; an internal
54
- critique made the case sharply (TL;DR: that runtime's breadth had
55
- hidden state, lifecycle bugs, unclear ownership of truth, and high
56
- model-facing tool entropy).
57
-
58
- - Add a configuration file. All config is CLI flags or env vars.
59
- - Add a daemon, watcher, or background process beyond what tmux /
35
+ above:
36
+
37
+ - **Config file.** All config is CLI flags or env vars.
38
+ - **Daemon / watcher / background process** beyond what tmux and
60
39
  SQLite give us.
61
- - Add abstractions that exist for "future flexibility" with no
62
- current consumer (a prior internal LLM-runtime's `RunContext`
63
- trait was the cautionary tale).
64
- - Add wrappers around wrappers (stream-of-streams wrappers we've
65
- seen before: `TextStream`/`TextState`/`StreamResult` shapes
66
- are the cautionary tale).
67
- - Generate code, embed a JS engine, or use any macro/decorator
68
- pattern beyond TypeScript itself. (Council: "A workflow DSL that
69
- becomes 'programming the runtime' is a liability.")
70
- - Ship a template/definition system for agent roles. Spawn flags +
71
- the orchestrator's first message are the only "definition."
72
- - Add a render layer beyond `cli-table3` + `picocolors`.
73
- - Bundle pi. The pi extension is the only anticipated future
74
- caller; even that is required to be a thin facade over the SDK
75
- (see [Pi extension and the three rules](#pi-extension-and-the-three-rules)
76
- below).
77
- - Add a plugin runtime, a web UI, an RPC layer, a chat or docs
78
- integration, a memory system, or a workflow engine. (These are
79
- the kinds of accumulated subsystems the council critique flagged
80
- as costing more than they pay for. mu has none and intends to
81
- keep it that way.)
40
+ - **Anticipatory abstractions** with zero current consumer (the
41
+ cautionary tale: a `RunContext` trait with no implementor).
42
+ - **Wrappers around wrappers** (cautionary tale:
43
+ `TextStream`/`TextState`/`StreamResult`).
44
+ - **Codegen, embedded JS engine, macros, decorators** beyond
45
+ TypeScript itself. No workflow DSL.
46
+ - **Template/definition system for agent roles.** Spawn flags +
47
+ the orchestrator's first message ARE the definition.
48
+ - **Render layers beyond `cli-table3` + `picocolors`**, except
49
+ `ink` confined to `src/cli/tui/`. No second TUI stack alongside
50
+ `ink`: if `ink` ever stops paying off, *replace* it; don't
51
+ stack stacks.
52
+ - **Bundle pi.** It's a peer dep.
53
+ - **Plugin runtime, web UI, RPC, chat/docs integrations, memory
54
+ system, workflow engine.** Rejected as a class — these are
55
+ exactly the accumulations a prior internal multi-agent runtime
56
+ collected, and not inheriting them is the point.
82
57
 
83
58
  ---
84
59
 
85
60
  ## Possible — small additions with an obvious shape
86
61
 
87
- These have a clear design but haven't yet hit criterion 1 (proven
88
- friction in ≥2 real workflows). They earn implementation when real
89
- use surfaces them.
90
-
91
- The section heading is deliberately "Possible," not "Next." "Next"
92
- implies it's coming. "Possible" doesn't. Items below ship if and
93
- when they earn it.
94
-
95
- ### Pi extension and the three rules
96
-
97
- The pi extension is the first "polish" tier — LLM-facing UX
98
- (typed `mu_*` tools, HUD widget, wakeups) that wraps the same core
99
- operations the CLI already exposes. Bundled in the same npm
100
- package; pi is a peer dep.
101
-
102
- The pi extension is **the only anticipated future caller**. When /
103
- if it lands, three rules stay non-negotiable:
104
-
105
- 1. **The DB is canonical.** All state in `<state-dir>/mu.db`.
106
- Extension reads/writes it through the same modules the CLI uses.
107
- No extension-only state.
108
- 2. **Every operation works from the CLI.** No tool registered in
109
- the extension has logic that doesn't exist in the CLI. The
110
- extension is a typed/integrated facade.
111
- 3. **The skill teaches the CLI.** Pi sessions without the extension
112
- still get a working mu by following [the bundled
113
- skill](../skills/mu/SKILL.md).
114
-
115
- If those three rules hold, mu stays driveable from a shell forever
116
- and the extension stays thin.
117
-
118
- ### `mu adopt <pane-id> [--name <agent>]` — SHIPPED in v0.2 (`e20af89`)
119
-
120
- Reconciliation surfaces orphan panes; `mu adopt` formally registers
121
- one of them as a managed agent. Promotion was triggered by the
122
- multi-agent dogfood pattern (orchestrator runs in a pane outside
123
- the `mu-<ws>` session and wants to be claimable as a worker).
62
+ These have a clear design but haven't yet hit promotion criterion
63
+ 1 (friction in ≥2 real workflows). They earn implementation when
64
+ real use surfaces them.
124
65
 
125
- ### Heterogeneous CLI status detection (claude, codex, ...)
66
+ ### Per-CLI status detection (claude, codex, ...)
126
67
 
127
- mu is a pi orchestrator today, BUT v0.2 added a Braille-spinner
128
- fallback (`f68838f`) that catches every TUI wrapper using
129
- standard spinner glyphs (U+2800–U+28FF). pi-meta + solo are now
130
- covered without a per-CLI detector. Other vanilla TUIs (claude,
131
- codex) inherit the same fallback.
68
+ mu is a pi orchestrator today. v0.2's Braille-spinner fallback
69
+ catches every TUI wrapper using standard spinner glyphs
70
+ (U+2800–U+28FF), so pi-meta + solo + many vanilla TUIs (claude,
71
+ codex) work without a per-CLI detector.
132
72
 
133
- For patterns the spinner fallback misses (e.g. permission
134
- prompts), a per-CLI `Detector` registry keyed by CLI name (~50
135
- LOC per CLI) is the obvious shape. Promote when a real
136
- specific-prompt-misclassification surfaces.
73
+ For patterns the spinner fallback misses (permission prompts,
74
+ specific busy markers), a per-CLI `Detector` registry keyed by
75
+ CLI name (~50 LOC per CLI) is the obvious shape. Promote when a
76
+ real specific-prompt-misclassification surfaces.
137
77
 
138
- Pattern sketch (ported from a prior internal multi-agent runtime's
139
- per-CLI detector — kept here for whoever picks it up):
78
+ Pattern sketch:
140
79
 
141
- | CLI | Busy patterns | Permission patterns |
142
- | -------- | ------------------------------------------ | --------------------------------------------------------- |
143
- | Claude | `to interrupt`, `\(.*[↑↓].*tokens\)` | `Allow once`, `Allow for this session`, `Esc to cancel` |
144
- | Codex | `esc to interrupt)`, `to cancel` | `enter to confirm`, `enter to submit \| esc to cancel` |
145
- | Pi | (well-known mu-defined marker) | (well-known mu-defined marker) — shipped |
80
+ | CLI | Busy patterns | Permission patterns |
81
+ | ------ | -------------------------------------- | --------------------------------------------------------- |
82
+ | Claude | `to interrupt`, `\(.*[↑↓].*tokens\)` | `Allow once`, `Allow for this session`, `Esc to cancel` |
83
+ | Codex | `esc to interrupt)`, `to cancel` | `enter to confirm`, `enter to submit \| esc to cancel` |
84
+ | Pi | (well-known mu-defined marker) | (well-known mu-defined marker) — shipped |
146
85
 
147
86
  Critical subtleties any new detector must keep:
148
87
 
149
88
  - **Tail-window extraction**: take last ~100 lines, strip trailing
150
- blanks, then take last ~20. Prevents stale scrollback
151
- false-positives. Already implemented for pi in `src/detect.ts`;
152
- the registry version factors this out.
89
+ blanks, then take last ~20. Already implemented for pi in
90
+ `src/detect.ts`; the registry version factors it out.
153
91
  - **Permission detection uses a narrower window than busy
154
- detection**: prevents already-answered prompts triggering
155
- re-detection.
156
- - **Permission patterns override busy** — if a permission prompt
157
- is visible, agent is `NeedsPermission`, not `Busy`.
158
-
159
- ### `tasks_v` enriched view
160
-
161
- ```sql
162
- CREATE VIEW tasks_v AS
163
- SELECT t.*,
164
- GROUP_CONCAT(n.content, char(10) || '---' || char(10)) AS notes,
165
- COUNT(n.id) AS note_count,
166
- MAX(n.created_at) AS last_note_at
167
- FROM tasks t
168
- LEFT JOIN task_notes n ON n.task_id = t.id
169
- GROUP BY t.id;
170
- ```
171
-
172
- Earns when `mu sql` queries against tasks + notes start getting
173
- verbose for a second consumer.
92
+ detection** to prevent already-answered prompts re-triggering.
93
+ - **Permission overrides busy** — if a permission prompt is
94
+ visible, agent is `NeedsPermission`, not `Busy`.
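
The registry shape and the three subtleties above can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions, not the contents of `src/detect.ts`: the names `Detector`, `AgentStatus`, `tailWindow`, and `classify`, the `Idle` status, and the 8-line permission window are all hypothetical. Only the patterns and the ~100/~20 line windows come from the table and bullets above.

```typescript
// Hypothetical sketch: every name here is illustrative, not mu's actual API.
type AgentStatus = "Busy" | "NeedsPermission" | "Idle";

interface Detector {
  busy: RegExp[];
  permission: RegExp[];
}

// Registry keyed by CLI name; patterns copied from the table above.
const detectors: Record<string, Detector> = {
  claude: {
    busy: [/to interrupt/, /\(.*[↑↓].*tokens\)/],
    permission: [/Allow once/, /Allow for this session/, /Esc to cancel/],
  },
  codex: {
    busy: [/esc to interrupt\)/, /to cancel/],
    permission: [/enter to confirm/, /enter to submit \| esc to cancel/],
  },
};

// Tail-window extraction: take the last `outer` lines, strip trailing
// blank lines, then keep the last `inner` lines.
function tailWindow(pane: string, outer = 100, inner = 20): string[] {
  const lines = pane.split("\n").slice(-outer);
  while (lines.length > 0 && (lines[lines.length - 1] ?? "").trim() === "") {
    lines.pop();
  }
  return lines.slice(-inner);
}

function matchesAny(patterns: RegExp[], lines: string[]): boolean {
  return lines.some((line) => patterns.some((p) => p.test(line)));
}

function classify(cli: string, pane: string): AgentStatus {
  const detector = detectors[cli];
  if (!detector) return "Idle"; // assumption: unknown CLIs are handled elsewhere

  const busyWindow = tailWindow(pane);
  // Narrower window for permission prompts; 8 lines is an assumed size.
  const permissionWindow = busyWindow.slice(-8);

  // Permission overrides busy.
  if (matchesAny(detector.permission, permissionWindow)) return "NeedsPermission";
  if (matchesAny(detector.busy, busyWindow)) return "Busy";
  return "Idle";
}
```

Checking permission first matters for Codex in particular: its `to cancel` busy pattern also matches the text of its own permission prompt, so the order of the two checks decides the result.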
174
95
 
175
- ---
96
+ ### Subscription-based wakeups
176
97
 
177
- ## Snapshots + undo
178
-
179
- Theme: every destructive action becomes recoverable.
180
-
181
- ### `snapshots` table + auto-snapshot before mutation — SHIPPED in v0.2 (schema v4; tables carried into v5, and unchanged in v6/v7)
182
-
183
- `captureSnapshot()` runs at the top of every destructive verb
184
- (workstream destroy, agent close, task close/reject/defer/release/
185
- delete, workspace free). Whole-DB copy via
186
- `VACUUM INTO` (synchronous, FK-page-level atomic). Files land in
187
- `<dirname(db-path)>/snapshots/<id>.db`; one row per capture in:
188
-
189
- ```sql
190
- CREATE TABLE snapshots (
191
- id INTEGER PRIMARY KEY AUTOINCREMENT,
192
- workstream TEXT, -- nullable: destroy spans all
193
- label TEXT NOT NULL, -- operation name + args
194
- db_path TEXT NOT NULL, -- abs path to .db file
195
- schema_version INTEGER NOT NULL, -- for restore-time version check
196
- created_at TEXT NOT NULL
197
- );
198
- ```
199
-
200
- GC opportunistic in-hook (<14 days OR <100 rows). NO FK on
201
- `workstream` — destroying a workstream must NOT cascade-delete
202
- its pre-destroy snapshot.
203
-
204
- ### `mu undo` + `mu snapshot {list,show}` — SHIPPED in v0.2 (snap_undo_verb)
205
-
206
- Three verbs on top of the snapshots substrate:
207
-
208
- - **`mu undo [--yes] [--to <id>]`** — top-level. Restores latest
209
- snapshot (or the one named by `--to`). Dry-run by default;
210
- `--yes` commits. Post-restore reconciles every workstream
211
- (best-effort per workstream, errors swallowed) and reports
212
- ghosts pruned + orphans surfaced.
213
- - **`mu snapshot list [-n N] [--json]`** — newest-first table.
214
- - **`mu snapshot show <id> [--json]`** — full row metadata.
215
-
216
- Design decisions held to:
217
-
218
- - **No `mu redo`.** Verbs have side-effects (tmux kill, git worktree
219
- remove) that aren't replayable. Each restore captures a
220
- pre-restore snapshot first, so a second `mu undo` rolls forward
221
- to that one. Verified end-to-end. Promote `mu redo` only if real
222
- use surfaces a need.
223
- - **Cross-version restores rejected** (snapshot.schema_version <
224
- CURRENT_SCHEMA_VERSION); migrations are forward-only. Maps to
225
- `SnapshotVersionMismatchError` (exit 4).
226
- - **Tmux state is NOT rolled back.** Restore + reconcile prunes
227
- ghost rows; orphan panes surface in next `mu agent list`.
228
- Documented honestly in the verb's stdout.
229
-
230
- Destructive verbs that already auto-snapshot now also advertise
231
- undo in their `Next:` blocks (`mu task delete`, `mu workstream
232
- destroy --yes`, etc.). Closes `snap_destroy_safety`.
233
-
234
- ---
235
-
236
- ## Stretch
237
-
238
- Items that meet criterion 2 (no pillar bend) and 3 (small) but
239
- haven't yet hit criterion 1 (proven friction). Stays parked until
240
- real use surfaces them.
241
-
242
- ### `task_artifacts` — generalized "this task produced X"
243
-
244
- ```sql
245
- CREATE TABLE task_artifacts (
246
- id INTEGER PRIMARY KEY AUTOINCREMENT,
247
- task_id TEXT NOT NULL REFERENCES tasks(local_id) ON DELETE CASCADE,
248
- kind TEXT NOT NULL, -- pr|file|url|commit|image
249
- ref TEXT NOT NULL,
250
- label TEXT,
251
- created_at TEXT NOT NULL
252
- );
253
- ```
254
-
255
- `mu task artifact add <task> --kind pr <url>`. Surfaces in `mu
256
- task show` and a future `tasks_v` enriched view.
257
-
258
- ### Other parked items
259
-
260
- | Item | Source / origin |
261
- | --- | --- |
262
- | `CancelScope` for long-running ops — Ctrl-C handling that cooperatively cancels in-flight tmux/exec calls | prior-art pattern (workflows) |
263
- | `mu.step()` replay cache for `mu run` — re-running a partially-failed script skips already-completed steps | prior-art pattern (workflows; `SqliteWorkflowStore` shape) |
264
- | `init_tracing(config)` + RAII guard — NDJSON to `<state-dir>/logs/`, MINUTELY rotation, last 100 files | prior-art pattern (tracing) |
265
- | Subscription-based wakeups — `mu log --tail` polls SQLite once per second; SQLite update hooks (via better-sqlite3) or fs.watch on the WAL would drop latency. | internal critique gap |
266
-
267
- ### Schema normalization — SHIPPED in v0.2 (schema v5)
268
-
269
- `tasks.id INTEGER PK + (workstream_id, local_id) UNIQUE` shipped
270
- as the universal substrate-wide pattern, not just on tasks. See
271
- [docs/ARCHITECTURE.md § Surrogate-PK + SDK-boundary discipline](ARCHITECTURE.md#surrogate-pk--sdk-boundary-discipline-load-bearing).
272
- Two operators both running `mu task add design` in different
273
- workstreams just works; same for agents.
274
-
275
- Post-v5 evolution: schema v6 added the cross-workstream archive
276
- tables (`archives`, `archived_tasks`, `archived_edges`,
277
- `archived_notes`, `archived_events`); schema v7 dropped the
278
- unused `approvals` table. The surrogate-PK shape is unchanged.
279
-
280
- ---
281
-
282
- ## Explicitly rejected
283
-
284
- These were considered and turned down, with the reason. Listed so
285
- we don't rediscover the same ideas every quarter.
286
-
287
- ### JavaScript DSL (`mu run` / `mu eval` / `mu repl`)
288
-
289
- Why it's tempting: atomicity-as-syntax, forward refs as a parser
290
- feature, LLMs reliably emit structured code.
291
-
292
- Why we rejected (twice — first as a Lisp like the prior runtime
293
- used, then as JS-via-`vm`):
294
-
295
- - The gap a DSL fills is "compose multiple verbs into one
296
- transactional script." `--json` on every read verb plus typed
297
- verbs that accept evidence arguments cover that without a
298
- sandbox, codegen, `.d.ts` shipping, or a parallel typed surface
299
- to maintain.
300
- - **Independent corroboration from an internal critique**: five
301
- orthogonal reviewers (architect, engineer, model-UX,
302
- thin-harness advocate, operator) all flagged DSL/workflow
303
- language as the worst maintenance liability of the prior
304
- internal runtime. "A workflow DSL that becomes 'programming
305
- the runtime' is a liability."
306
- - The `vm` sandbox would have to be maintained against Node's
307
- security model forever; a non-trivial commitment for a feature
308
- with no proven friction.
309
- - bash composition over `mu --json | jq` covers what real users
310
- do.
311
-
312
- What the DSL would have provided, and what ships instead:
313
-
314
- | Original DSL feature | Shipped substitute |
315
- | --------------------------------------------- | ------------------------------------------------------- |
316
- | `mu run script.ts` (transactional script) | `bash + jq + --json`; SDK in-proc for typed callers |
317
- | `mu eval` | `mu sql` for raw queries; `bash -c` for actions |
318
- | `mu repl` | `node` + `import("mu-agent")` for in-proc exploration |
319
- | `mu.create / spawn / claim / send / ...` | `mu task add / agent spawn / task claim / agent send` |
320
- | `mu.ready()` / `mu.parallelTracks()` | `mu task next -n 0 --json` / bare `mu --json` / `mu state --json` |
321
- | Forward refs via deferred string IDs | Add tasks in topological order, or use `mu task block` after-the-fact |
322
- | Atomic transactions wrapping a script | Per-verb transactions in the SDK; idempotent verbs |
323
- | `mu.step()` replay cache | Not built; if needed, build on top of `agent_logs` event seq |
324
-
325
- Re-earn requires repeated friction reports of "I keep writing the
326
- same bash" that bash + jq + `--json` couldn't fix.
327
-
328
- ### `defineOperation()` registry framework
329
-
330
- The only consumer that motivated this was the JS DSL's `.d.ts`
331
- autocomplete. With the DSL rejected, no consumer remains. The pi
332
- extension, if/when it ships, can share types directly via
333
- `src/index.ts` SDK exports without a registry layer. Classic case
334
- of an abstraction with one anticipated consumer.
335
-
336
- ### Markdown agent-definition discovery
337
-
338
- Spawn already accepts `--cli` / `--command` / `--workspace` /
339
- `--role` directly; an orchestrator's first message + spawn flags
340
- ARE the agent's "definition." The `agents/` directory and a
341
- `docs/AGENT_FORMAT.md` were considered and dropped.
342
-
343
- Earn back if real friction surfaces ("I'm copy-pasting the same
344
- role doc into five spawn invocations every day, twice a week").
345
-
346
- ### Build mu as a pure pi extension (no CLI)
347
-
348
- Why it's tempting: simpler distribution, one install, full access
349
- to pi's `ExtensionAPI` for HUD and events.
350
-
351
- Why rejected:
352
-
353
- - Children spawned by mu can't drive mu without re-loading the
354
- extension.
355
- - Humans can't `mu agent list` from a shell to debug.
356
- - Recursion requires special plumbing.
357
- - Couples mu to pi's release cycle and extension API.
358
- - Throws away the "any process can drive this" property.
359
-
360
- ### Build mu as a library that pi imports (no standalone CLI)
361
-
362
- Why it's tempting: zero subprocess overhead.
363
-
364
- Why rejected:
365
-
366
- - Multiple pi instances would each load the library and fight over
367
- the DB.
368
- - A standalone CLI on `$PATH` is the cleanest "shared resource"
369
- model.
370
- - The library/CLI split is well-trodden — every good tool ships
371
- both, and the CLI is canonical.
372
-
373
- ### Two binaries: `mu-agents` and `mu-tasks`
374
-
375
- Why it's tempting: cleaner separation of concerns.
376
-
377
- Why rejected:
378
-
379
- - Agent ↔ task integration (claim, owner field, agent_logs about
380
- tasks) needs them in one transactional surface.
381
- - One install, one mental model, one `mu doctor`.
382
- - A prior internal precedent of separating task-graph and
383
- agent-runtime crates created awkward join logic; mu collapsing
384
- them is a feature.
385
-
386
- ### `TaskSurface` adapter abstraction with multiple backends
387
-
388
- Sync to GitHub Issues / Linear / Asana. Why it's tempting:
389
- composability, "bring your own work tracker."
390
-
391
- Why rejected:
392
-
393
- - mu without a built-in task graph is just a fancier agent runner
394
- — the killer features (parallel tracks, claim, ROI
395
- prioritization) require a graph.
396
- - Adapter complexity for systems most users don't have.
397
- - Round-tripping inverts the model: mu's task graph is local and
398
- authoritative.
399
- - If wanted: a separate companion package, not core.
400
-
401
- ### Cross-machine state sync
402
-
403
- Local-first SQLite. Layer something like syncthing on top if you
404
- want it. Multi-machine sync would force a server, conflict
405
- resolution, identity, auth — every one of those breaks the "zero
406
- ops" pledge.
407
-
408
- ### HTTP API on top of the SQLite registry
409
-
410
- mu is a CLI; if you need RPC, write it. The schema is small and
411
- stable enough.
412
-
413
- ### A "hosted" mu
414
-
415
- Zero ops, no accounts. Your machine is the deployment.
416
-
417
- ### Plugin system / web UI / RPC / chat & docs integrations / memory system / workflow engine
418
-
419
- Not "rejected one at a time" — rejected as a class. An internal
420
- critique established that the prior internal runtime's accumulation
421
- of these adjacent product identities was its central design
422
- failure: "hidden state, lifecycle bugs, unclear ownership of
423
- truth, and high model-facing tool entropy."
424
-
425
- mu's anti-feature pledges (no plugin runtime, no codegen, no
426
- daemon, no web UI, no chat integration, no memory system, no
427
- workflow engine) are specifically the accumulations of that prior
428
- internal runtime that mu chose not to inherit. Each one is
429
- provable as the absence of a subsystem mu was tempted to copy.
430
-
431
- ### Anthropomorphic builtin agent names (`alice`, `bob`)
432
-
433
- Use role-based names (`worker-1`, `reviewer-1`). See
434
- [VOCABULARY.md §"Naming conventions"](VOCABULARY.md#agent-names-prefer-role-n-not-human-names).
98
+ `mu log --tail` polls SQLite once per second. SQLite update hooks
99
+ (via better-sqlite3) or `fs.watch` on the WAL would drop latency
100
+ at the cost of more machinery. Promote when the one-second poll
101
+ latency causes real friction.
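
One possible shape for the `fs.watch` variant, as a rough sketch only. `watchWal` and its parameters are hypothetical and not part of mu. It assumes the database runs in WAL mode, so a `mu.db-wal` file sits next to `mu.db`, and it keeps the existing one-second poll as a fallback when the watcher cannot be started or fails later.

```typescript
import { watch } from "node:fs";

// Hypothetical helper, not part of mu. Calls `onChange` whenever the WAL
// file changes. If fs.watch cannot be used (missing file, EMFILE, ENOSPC),
// it falls back to calling `onChange` every `pollMs` milliseconds.
// Returns a function that stops watching.
function watchWal(walPath: string, onChange: () => void, pollMs = 1000): () => void {
  let watcher: ReturnType<typeof watch> | undefined;
  let timer: ReturnType<typeof setInterval> | undefined;

  const startPolling = (): void => {
    if (timer === undefined) timer = setInterval(onChange, pollMs);
  };

  try {
    watcher = watch(walPath, () => onChange());
    // A watcher can also fail after start-up; degrade to polling then.
    watcher.on("error", () => {
      watcher?.close();
      watcher = undefined;
      startPolling();
    });
  } catch {
    // fs.watch throws synchronously when the path does not exist.
    startPolling();
  }

  return () => {
    watcher?.close();
    if (timer !== undefined) clearInterval(timer);
  };
}
```

A caller would pass the WAL path under `<state-dir>` and a callback that re-reads new rows, then call the returned function on exit. Change events can arrive in bursts, so a real version would probably coalesce them before querying.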
435
102
 
436
103
  ---
437
104
 
438
105
  ## Open questions
439
106
 
440
- These were live during initial design and remain partly unresolved.
441
- Listed so we don't pretend they're settled.
442
-
443
- - **`agents.cli` as TEXT vs enum.** Went with TEXT (originally for
444
- heterogeneous-CLI forward-compat). Today the only meaningful
445
- value is `pi`. We're keeping it TEXT — if multi-CLI re-earns its
446
- way back, the column doesn't need a schema migration.
447
- - **Composite `(workstream, local_id)` PK on tasks.** Currently
448
- `local_id` is global PK. Two workstreams can't both have a
449
- `design` task. Recorded as a deferred normalization above.
450
- - **Capability tags on operations.** The `defineOperation()`
451
- registry that would have carried these is rejected. The role
452
- flag on agents is stored but unenforced. The internal critique
453
- flagged "capability-gated mutations" as part of the minimal
454
- core; for now mu's only authorization surface is "the agent ran
455
- the verb." Earn capability enforcement when an agent actually
456
- does damage.
457
- - **Per-workstream config.** Resisted (the anti-feature pledge).
458
- "This workstream uses one pi binary, that one uses another" is
459
- a real gap that env vars don't solve cleanly. Revisit when the
460
- second user hits it.
461
- - **Subscription-based wakeups.** `mu log --tail` polls SQLite
462
- once per second. Real subscriptions (SQLite update hooks via
463
- better-sqlite3, or fs.watch on the WAL) would drop latency at
464
- the cost of more machinery. Not worth it until someone hits
465
- the cliff.
466
-
467
- ---
107
+ Live during initial design and still partly unresolved. Listed so
108
+ we don't pretend they're settled.
468
109
 
469
- ## Operational lessons we're stealing (reference for implementers)
470
-
471
- Each of these is a real failure mode pi-subagents or a prior
472
- internal multi-agent runtime has already fixed. Listed here so
473
- when one of the items above is picked up, the implementer doesn't
474
- have to rediscover the lesson.
475
-
476
- ### From pi-subagents (`src/runs/shared/`)
477
-
478
- | File | Lesson |
479
- | -------------------------- | ----------------------------------------------------------------- |
480
- | `frontmatter.ts` | Agent-frontmatter parser: 28 lines, handles CRLF, quoted values, kebab-case. Port verbatim. |
481
- | `long-running-guard.ts` | Mutating-bash detection via regex + unquoted-redirection scanner. Don't trust tool names; scan command bodies. |
482
- | `long-running-guard.ts` | Mutating-failure burst detection: rolling window, consecutive vs same-path failures, escalation threshold. |
483
- | `completion-guard.ts` | Expected-mutation detection from task prose, not agent role. Strips framework-injected lines before checking. |
484
- | `model-fallback.ts` | Curated regex list of retryable failures (rate limit, 429, quota, 502/503/504). Don't waste a fallback on auth errors. |
485
- | `model-fallback.ts` | `splitThinkingSuffix` always splits on **last** colon — preserves `provider/model:high`. |
486
- | `single-output.ts` | Three cases for output files: agent wrote it, agent didn't, file unreadable. `captureSingleOutputSnapshot` before run to disambiguate. |
487
- | `worktree.ts` | `node_modules` symlinking + tracking as synthetic-path. Generic across VCS. |
488
- | `worktree.ts` | Per-task `cwd:` conflict detection. Best-effort rollback on hook failure. |
489
- | `result-watcher.ts` | `fs.watch` with mandatory polling fallback on `EMFILE`/`ENOSPC`. `unref()` timers. Coalescer for rapid rename events. |
490
- | `pi-args.ts` | Long tasks → temp file + `@path` argv. System prompt via `mode: 0o600` temp file. Identity env vars passed down. |
491
- | `extension/doctor.ts` | `lineFromCheck(label, fn)` wrapper turns thrown errors into `failed — <text>` lines so one broken probe doesn't break the report. |
492
-
493
- ### From a prior internal multi-agent runtime
494
-
495
- | Topic | Lesson |
496
- | ----------------------------------------------- | ------------------------------------------------------------ |
497
- | shell-escape | `shell_escape` via single-quote wrapping. |
498
- | granular workspace-free results | A `WorkspaceFreeResult` with independent `committed`/`submitted`/`commitError`/`submitError`. |
499
- | submit guard | `timeout -k 5s {N}s sh -c 'exec jf submit --draft </dev/null'` to prevent hanging on TTY prompts. |
500
- | per-CLI detector | Per-CLI Detector trait + pattern registry. Tail-window + narrow-window distinction. (deferred; pi only today.) |
501
- | lifecycle state machine | Side-effect-free lifecycle state machine: `(state, event) → outcome`. Single point for tracing. Distinguishes manual `Free` from inferred idle. |
502
- | read-list reconciliation | "Reality wins": every `list()` queries the substrate, prunes ghosts, adopts orphans. **Implemented (`src/reconcile.ts`).** |
503
- | parallel-tracks | Parallel-tracks union-find with diamond-merge. **Implemented (`src/tracks.ts`).** |
504
- | built-in graph views | Built-in views: `ready`, `blocked`, `goals`. **Implemented.** |
505
- | pane-title-as-identity | Pane-title-as-identity for the claim protocol. **Implemented.** |
506
- | lisp DSL (rejected for mu, ideas not adopted) | Atomic transactions are per-verb in the SDK; idempotent re-imports work via `INSERT OR IGNORE` + idempotent verbs; forward-ref checking handled at task-add time. JS DSL also rejected (above). |
507
- | notes model | Append-only, FILES/DECISION/VERIFIED conventions. **Implemented.** |
110
+ - **Capability tags on operations.** mu's only authorization
111
+ surface today is "the agent ran the verb." Promote capability
112
+ enforcement when an agent actually does damage.
113
+ - **Per-workstream config.** Resisted (anti-feature pledge). "This
114
+ workstream uses one pi binary, that one uses another" is a real
115
+ gap env vars don't solve cleanly. Revisit when a second user
116
+ hits it.
508
117
 
509
118
  ---
510
119
 
511
- ## Documents still to write
120
+ ## Pi extension and the three rules
512
121
 
513
- Meta-docs the project will need eventually:
122
+ If/when a pi extension lands (typed `mu_*` tools, HUD widget,
123
+ wakeups) bundled in this same npm package, three rules stay
124
+ non-negotiable:
514
125
 
515
- - **CONTRIBUTING.md** once external PRs land. Contains the LOC
516
- caps, the lint rules, the "no traits with zero implementors"
517
- rule, the test-first conventions.
518
- - **MIGRATIONS.md** the v3→v4 in-process migration framework + the
519
- one-shot v4→v5 script have shipped and (`src/migrations.ts`)
520
- retired. Capturing the operator-facing contract for future schema
521
- bumps in one place is still useful; leave as a follow-up.
522
-
523
- ---
524
-
525
- ## How to use this roadmap
526
-
527
- If you're starting work on an item:
126
+ 1. **The DB is canonical.** All state in `<state-dir>/mu.db`.
127
+ Extension reads/writes through the same modules the CLI uses.
128
+ No extension-only state.
129
+ 2. **Every operation works from the CLI.** No tool registered in
130
+ the extension has logic that doesn't exist in the CLI.
131
+ 3. **The skill teaches the CLI.** Pi sessions without the
132
+ extension still get a working mu by following
133
+ [skills/mu/SKILL.md](../skills/mu/SKILL.md).
528
134
 
529
- 1. **Confirm it still meets the three promotion criteria.** Note
530
- the second real-use occurrence; cite the friction.
531
- 2. **Open a focused PR per item.** One typed verb per commit, one
532
- schema change per commit.
533
- 3. **Update [VOCABULARY.md](VOCABULARY.md) first** if you introduce
534
- a new concept or rename an existing one.
535
- 4. **Add a [CHANGELOG.md](../CHANGELOG.md) entry** under the
536
- upcoming version.
135
+ If those three rules hold, mu stays driveable from a shell forever
136
+ and the extension stays thin.
537
137
 
538
- If you're considering adding a new entry to this file:
138
+ ---
539
139
 
540
- - Read AGENTS.md §"What NOT to do" first.
541
- - Provide a concrete promotion-criteria assessment.
542
- - Match the format of existing entries.
140
+ ## Explicitly rejected (one-liners)
141
+
142
+ Listed so we don't rediscover them. See git history for the full
143
+ reasoning per item.
144
+
145
+ - **JS / Lisp DSL** (`mu run` / `mu eval` / `mu repl`) — bash +
146
+ jq + `--json` covers the gap. A workflow DSL is a maintenance
147
+ liability.
148
+ - **`defineOperation()` registry framework** — no consumer left
149
+ after the DSL was rejected.
150
+ - **Markdown agent-definition discovery** — spawn flags + first
151
+ message already are the definition.
152
+ - **mu as a pi extension only (no CLI)** — children couldn't drive
153
+ mu; humans couldn't debug from a shell.
154
+ - **mu as a library only (no CLI)** — multiple processes would
155
+ fight over the DB.
156
+ - **Two binaries (`mu-agents` + `mu-tasks`)** — agent ↔ task
157
+ integration needs one transactional surface.
158
+ - **`TaskSurface` adapter abstraction** — the built-in graph IS
159
+ the killer feature.
160
+ - **Cross-machine state sync** — local-first SQLite; layer
161
+ syncthing on top if you want it.
162
+ - **HTTP API on top of SQLite** — write your own RPC if you need
163
+ one.
164
+ - **A "hosted" mu** — your machine is the deployment.
165
+ - **Anthropomorphic agent names (`alice`, `bob`)** — use
166
+ role-based names (`worker-1`, `reviewer-1`).