@martintrojer/mu 0.3.2 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/docs/ROADMAP.md CHANGED
@@ -1,544 +1,166 @@
  # Roadmap
 
- What's coming after [0.1.0](../CHANGELOG.md), with full design
- rationale per item. This is the **single forward-looking doc**: if
- a feature isn't listed here, it isn't planned. If it's listed but
- unbuilt, see its promotion criteria for what would move it.
+ The single forward-looking doc. If a feature isn't here, it isn't
+ planned.
 
- For canonical terms, see [VOCABULARY.md](VOCABULARY.md). For
- pillars that must not bend, see [VISION.md](VISION.md). For module
- layout and data flow, see [ARCHITECTURE.md](ARCHITECTURE.md).
+ For canonical terms see [VOCABULARY.md](VOCABULARY.md). For
+ load-bearing pillars see [VISION.md](VISION.md). For module
+ layout see [ARCHITECTURE.md](ARCHITECTURE.md). Shipped history
+ lives in [CHANGELOG.md](../CHANGELOG.md).
 
  ---
 
- ## Promotion criteria (the only bar)
+ ## Promotion criteria
 
  A roadmap item earns implementation when **all three** are true:
 
- 1. **Proven friction.** A real user (us, internal users, early
- adopters) hits the missing feature in a real workflow at least
- twice. "Imagined polish" doesn't count.
- 2. **No pillar refactor.** The addition fits the current substrate
- without bending any of the load-bearing pillars (see
- [VISION.md](VISION.md)).
- 3. **Bounded scope.** The addition fits in **<300 LOC** or has a
- clear smaller subset that does.
-
- If an item drops below the bar (no longer has criterion 1 met after
- real use), it moves to the bottom or is removed. We don't keep
- phantom plans alive.
-
- **Exception: data-loss footguns.** A change that fixes a default
- that silently destroys user artifacts (uncommitted output, scratch
- logs, benchmark results, etc.) ships on the **first** occurrence,
- not the second. The cost of waiting for criterion 1 is "lose more
- stuff"; that's the wrong cost to optimise. Document the friction
- in the commit message instead.
-
- **Polish doesn't count as promotion.** Bug fixes, ergonomic
- improvements, error-message wording, doc tightening, and similar
- "the existing thing works better" changes don't need promotion
- criteria — they just need to be small and to ship clean (typecheck
- + lint + tests + build). Polish is the dividend the project earns
- by refusing the things on this roadmap. Don't wait for occurrence
- #2 to fix a typo, tighten an error message, or truncate a runaway
- table column.
+ 1. **Proven friction.** A real user hits the missing feature in a
+ real workflow at least twice. Imagined polish doesn't count.
+ 2. **No pillar refactor.** Fits the current substrate without
+ bending any pillar in [VISION.md](VISION.md).
+ 3. **Bounded scope.** Fits in <300 LOC or has a clear smaller
+ subset that does.
+
+ **Exceptions.** Data-loss footguns (silent destruction of user
+ artifacts) ship on the *first* occurrence. Polish (bug fixes,
+ ergonomic tweaks, error-message wording, doc tightening) doesn't
+ need promotion at all; just ship clean (typecheck + lint + tests +
+ build).
 
  ---
 
- ## Anti-feature pledges (still in force; reinforced by an internal critique)
+ ## Anti-feature pledges
 
  We will NOT, until each one earns its way back via the criteria
- above. Each pledge is a specific accumulation a prior internal
- multi-agent runtime made and mu chose not to inherit; an internal
- critique made the case sharply (TL;DR: that runtime's breadth had
- hidden state, lifecycle bugs, unclear ownership of truth, and high
- model-facing tool entropy).
-
- - Add a configuration file. All config is CLI flags or env vars.
- - Add a daemon, watcher, or background process beyond what tmux /
+ above:
+
+ - **Config file.** All config is CLI flags or env vars.
+ - **Daemon / watcher / background process** beyond what tmux and
  SQLite give us.
- - Add abstractions that exist for "future flexibility" with no
- current consumer (a prior internal LLM-runtime's `RunContext`
- trait was the cautionary tale).
- - Add wrappers around wrappers (the stream-of-streams
- `TextStream`/`TextState`/`StreamResult` shapes we've seen before
- are the cautionary tale).
- - Generate code, embed a JS engine, or use any macro/decorator
- pattern beyond TypeScript itself. (Council: "A workflow DSL that
- becomes 'programming the runtime' is a liability.")
- - Ship a template/definition system for agent roles. Spawn flags +
- the orchestrator's first message are the only "definition."
- - Add a render layer beyond `cli-table3` + `picocolors`.
- - Bundle pi. The pi extension is the only anticipated future
- caller; even that is required to be a thin facade over the SDK
- (see [Pi extension and the three rules](#pi-extension-and-the-three-rules)
- below).
- - Add a plugin runtime, a web UI, an RPC layer, a chat or docs
- integration, a memory system, or a workflow engine. (These are
- the kinds of accumulated subsystems the council critique flagged
- as costing more than they pay for. mu has none and intends to
- keep it that way.)
+ - **Anticipatory abstractions** with zero current consumer (the
+ cautionary tale: a `RunContext` trait with no implementor).
+ - **Wrappers around wrappers** (cautionary tale:
+ `TextStream`/`TextState`/`StreamResult`).
+ - **Codegen, embedded JS engine, macros, decorators** beyond
+ TypeScript itself. No workflow DSL.
+ - **Template/definition system for agent roles.** Spawn flags +
+ the orchestrator's first message ARE the definition.
+ - **Render layers beyond `cli-table3` + `picocolors`**, except
+ `ink` confined to `src/cli/tui/`. No second TUI stack alongside
+ `ink`: if `ink` ever stops paying off, *replace* it; don't
+ stack stacks.
+ - **Bundle pi.** It's a peer dep.
+ - **Plugin runtime, web UI, RPC, chat/docs integrations, memory
+ system, workflow engine.** Rejected as a class — these are
+ exactly the accumulations a prior internal multi-agent runtime
+ collected, and not inheriting them is the point.
 
  ---
 
  ## Possible — small additions with an obvious shape
 
- These have a clear design but haven't yet hit criterion 1 (proven
- friction in ≥2 real workflows). They earn implementation when real
- use surfaces them.
-
- The section heading is deliberately "Possible," not "Next." "Next"
- implies it's coming. "Possible" doesn't. Items below ship if and
- when they earn it.
-
- ### Pi extension and the three rules
-
- The pi extension is the first "polish" tier — LLM-facing UX
- (typed `mu_*` tools, HUD widget, wakeups) that wraps the same core
- operations the CLI already exposes. Bundled in the same npm
- package; pi is a peer dep.
-
- The pi extension is **the only anticipated future caller**. When /
- if it lands, three rules stay non-negotiable:
-
- 1. **The DB is canonical.** All state in `<state-dir>/mu.db`.
- Extension reads/writes it through the same modules the CLI uses.
- No extension-only state.
- 2. **Every operation works from the CLI.** No tool registered in
- the extension has logic that doesn't exist in the CLI. The
- extension is a typed/integrated facade.
- 3. **The skill teaches the CLI.** Pi sessions without the extension
- still get a working mu by following [the bundled
- skill](../skills/mu/SKILL.md).
-
- If those three rules hold, mu stays driveable from a shell forever
- and the extension stays thin.
-
- ### `mu agent adopt <pane-id> [--name <agent>]` — SHIPPED in v0.2 (`e20af89`)
-
- Reconciliation surfaces orphan panes; `mu agent adopt` formally
- registers one of them as a managed agent. Promotion was triggered
- by the multi-agent dogfood pattern (orchestrator runs in a pane
- outside the `mu-<ws>` session and wants to be claimable as a
- worker). Originally shipped at the top level as `mu adopt`; moved
- under `mu agent` to match every other agent-lifecycle verb.
+ These have a clear design but haven't yet hit promotion criterion
+ 1 (friction in ≥2 real workflows). They earn implementation when
+ real use surfaces them.
 
- ### Heterogeneous CLI status detection (claude, codex, ...)
+ ### Per-CLI status detection (claude, codex, ...)
 
- mu is a pi orchestrator today, BUT v0.2 added a Braille-spinner
- fallback (`f68838f`) that catches every TUI wrapper using
- standard spinner glyphs (U+2800–U+28FF). pi-meta + solo are now
- covered without a per-CLI detector. Other vanilla TUIs (claude,
- codex) inherit the same fallback.
+ mu is a pi orchestrator today. v0.2's Braille-spinner fallback
+ catches every TUI wrapper using standard spinner glyphs
+ (U+2800–U+28FF), so pi-meta + solo + many vanilla TUIs (claude,
+ codex) work without a per-CLI detector.
 
- For patterns the spinner fallback misses (e.g. permission
- prompts), a per-CLI `Detector` registry keyed by CLI name (~50
- LOC per CLI) is the obvious shape. Promote when a real
- specific-prompt-misclassification surfaces.
+ For patterns the spinner fallback misses (permission prompts,
+ specific busy markers), a per-CLI `Detector` registry keyed by
+ CLI name (~50 LOC per CLI) is the obvious shape. Promote when a
+ real specific-prompt-misclassification surfaces.
 
- Pattern sketch (ported from a prior internal multi-agent runtime's
- per-CLI detector — kept here for whoever picks it up):
+ Pattern sketch:
 
- | CLI | Busy patterns | Permission patterns |
- | -------- | ------------------------------------------ | --------------------------------------------------------- |
- | Claude | `to interrupt`, `\(.*[↑↓].*tokens\)` | `Allow once`, `Allow for this session`, `Esc to cancel` |
- | Codex | `esc to interrupt)`, `to cancel` | `enter to confirm`, `enter to submit \| esc to cancel` |
- | Pi | (well-known mu-defined marker) | (well-known mu-defined marker) — shipped |
+ | CLI | Busy patterns | Permission patterns |
+ | ------ | -------------------------------------- | --------------------------------------------------------- |
+ | Claude | `to interrupt`, `\(.*[↑↓].*tokens\)` | `Allow once`, `Allow for this session`, `Esc to cancel` |
+ | Codex | `esc to interrupt)`, `to cancel` | `enter to confirm`, `enter to submit \| esc to cancel` |
+ | Pi | (well-known mu-defined marker) | (well-known mu-defined marker) — shipped |
 
  Critical subtleties any new detector must keep:
 
  - **Tail-window extraction**: take last ~100 lines, strip trailing
- blanks, then take last ~20. Prevents stale scrollback
- false-positives. Already implemented for pi in `src/detect.ts`;
- the registry version factors this out.
+ blanks, then take last ~20. Already implemented for pi in
+ `src/detect.ts`; the registry version factors it out.
  - **Permission detection uses a narrower window than busy
- detection**, which prevents already-answered prompts from
- triggering re-detection.
- - **Permission patterns override busy** — if a permission prompt
- is visible, agent is `NeedsPermission`, not `Busy`.
-
- ### `tasks_v` enriched view
-
- ```sql
- CREATE VIEW tasks_v AS
- SELECT t.*,
- GROUP_CONCAT(n.content, char(10) || '---' || char(10)) AS notes,
- COUNT(n.id) AS note_count,
- MAX(n.created_at) AS last_note_at
- FROM tasks t
- LEFT JOIN task_notes n ON n.task_id = t.id
- GROUP BY t.id;
- ```
-
- Earns when `mu sql` queries against tasks + notes start getting
- verbose for a second consumer.
+ detection** to prevent already-answered prompts re-triggering.
+ - **Permission overrides busy** — if a permission prompt is
+ visible, agent is `NeedsPermission`, not `Busy`.
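The detector rules above can be sketched as a pure TypeScript function. This is a minimal illustration under stated assumptions, not the code in `src/detect.ts`: `classifyPane`, `DetectorPatterns`, and the `Idle` status are hypothetical names, the ~100/~20 line windows come from the text, and the 5-line permission window is an arbitrary stand-in for "narrower". The Claude entry is transcribed from the pattern table.

```typescript
// Hypothetical sketch of a per-CLI detector; not the code in src/detect.ts.
export type AgentStatus = "NeedsPermission" | "Busy" | "Idle";

export interface DetectorPatterns {
  busy: RegExp[];
  permission: RegExp[];
}

const CAPTURE_LINES = 100; // "take last ~100 lines"
const BUSY_WINDOW = 20; // "then take last ~20"
const PERMISSION_WINDOW = 5; // assumption: any window narrower than busy

// Braille Patterns block (U+2800–U+28FF): the spinner fallback.
const BRAILLE_SPINNER = /[\u2800-\u28FF]/;

// Last ~100 lines, trailing blank lines stripped, then the last `n`.
export function tailWindow(paneText: string, n: number): string[] {
  const lines = paneText.split("\n").slice(-CAPTURE_LINES);
  while (lines.length > 0 && lines[lines.length - 1].trim() === "") {
    lines.pop();
  }
  return lines.slice(-n);
}

export function classifyPane(paneText: string, patterns: DetectorPatterns): AgentStatus {
  // Checked first so a visible prompt overrides busy. The narrow window
  // means an answered prompt that has scrolled up stops matching.
  const permissionText = tailWindow(paneText, PERMISSION_WINDOW).join("\n");
  if (patterns.permission.some((re) => re.test(permissionText))) {
    return "NeedsPermission";
  }
  const busyText = tailWindow(paneText, BUSY_WINDOW).join("\n");
  if (patterns.busy.some((re) => re.test(busyText)) || BRAILLE_SPINNER.test(busyText)) {
    return "Busy";
  }
  return "Idle";
}

// Registry entry transcribed from the Claude row of the pattern table.
export const claudePatterns: DetectorPatterns = {
  busy: [/to interrupt/, /\(.*[↑↓].*tokens\)/],
  permission: [/Allow once/, /Allow for this session/, /Esc to cancel/],
};
```

Checking permission before busy is what makes it override busy; the Braille test only runs in the busy branch, so a visible prompt still wins over a spinning glyph.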
 
- ---
+ ### Subscription-based wakeups
 
- ## Snapshots + undo
-
- Theme: every destructive action becomes recoverable.
-
- ### `snapshots` table + auto-snapshot before mutation — SHIPPED in v0.2 (schema v4; tables carried into v5, and unchanged in v6/v7)
-
- `captureSnapshot()` runs at the top of every destructive verb
- (workstream destroy, agent close, task close/reject/defer/release/
- delete, workspace free). Whole-DB copy via
- `VACUUM INTO` (synchronous, FK-page-level atomic). Files land in
- `<dirname(db-path)>/snapshots/<id>.db`; one row per capture in:
-
- ```sql
- CREATE TABLE snapshots (
- id INTEGER PRIMARY KEY AUTOINCREMENT,
- workstream TEXT, -- nullable: destroy spans all
- label TEXT NOT NULL, -- operation name + args
- db_path TEXT NOT NULL, -- abs path to .db file
- schema_version INTEGER NOT NULL, -- for restore-time version check
- created_at TEXT NOT NULL
- );
- ```
-
- GC opportunistic in-hook (<14 days OR <100 rows). NO FK on
- `workstream` — destroying a workstream must NOT cascade-delete
- its pre-destroy snapshot.
-
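The GC bound `(<14 days OR <100 rows)` is terse. One reading, assumed here and not confirmed by the text: a snapshot is kept if it is younger than 14 days *or* among the 100 newest rows, so GC collects only rows that fail both tests. A hypothetical pure helper for that reading (`snapshotsToCollect` and `SnapshotRow` are invented names; deleting the row and its `.db` file is left to the caller):

```typescript
// Hypothetical row shape mirroring the snapshots table's id and created_at.
export interface SnapshotRow {
  id: number;
  createdAt: string; // ISO-8601
}

const MAX_AGE_MS = 14 * 24 * 60 * 60 * 1000; // 14 days
const MAX_ROWS = 100;

// Assumed reading of "(<14 days OR <100 rows)": keep a snapshot if it is
// younger than 14 days OR among the 100 newest; collect only rows failing both.
export function snapshotsToCollect(rows: readonly SnapshotRow[], now: Date = new Date()): number[] {
  // Newest first; ties broken by id, since AUTOINCREMENT ids only grow.
  const newestFirst = [...rows].sort(
    (a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt) || b.id - a.id,
  );
  return newestFirst
    .filter((row, rank) => {
      const young = now.getTime() - Date.parse(row.createdAt) < MAX_AGE_MS;
      const withinNewest = rank < MAX_ROWS;
      return !young && !withinNewest;
    })
    .map((row) => row.id);
}
```

Under the other reading (collect a row once it is older than 14 days *or* outside the newest 100), the `&&` in the filter becomes `||`; the roadmap text alone doesn't settle which was meant.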
- ### `mu undo` + `mu snapshot {list,show}` — SHIPPED in v0.2 (snap_undo_verb)
-
- Three verbs on top of the snapshots substrate:
-
- - **`mu undo [--yes] [--to <id>]`** — top-level. Restores latest
- snapshot (or the one named by `--to`). Dry-run by default;
- `--yes` commits. Post-restore reconciles every workstream
- (best-effort per workstream, errors swallowed) and reports
- ghosts pruned + orphans surfaced.
- - **`mu snapshot list [-n N] [--json]`** — newest-first table.
- - **`mu snapshot show <id> [--json]`** — full row metadata.
-
- Design decisions held to:
-
- - **No `mu redo`.** Verbs have side-effects (tmux kill, git worktree
- remove) that aren't replayable. Each restore captures a
- pre-restore snapshot first, so a second `mu undo` rolls forward
- to that one. Verified end-to-end. Promote `mu redo` only if real
- use surfaces a need.
- - **Cross-version restores rejected** (snapshot.schema_version <
- CURRENT_SCHEMA_VERSION); migrations are forward-only. Maps to
- `SnapshotVersionMismatchError` (exit 4).
- - **Tmux state is NOT rolled back.** Restore + reconcile prunes
- ghost rows; orphan panes surface in next `mu agent list`.
- Documented honestly in the verb's stdout.
-
- Destructive verbs that already auto-snapshot now also advertise
- undo in their `Next:` blocks (`mu task delete`, `mu workstream
- destroy --yes`, etc.). Closes `snap_destroy_safety`.
-
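The two `mu undo` design decisions above (capture a pre-restore snapshot so a second undo rolls forward, and reject snapshots whose `schema_version` is below the current one) can be modeled in memory. A sketch with hypothetical names apart from `SnapshotVersionMismatchError` and exit code 4, which come from the text; `CURRENT_SCHEMA_VERSION = 7` is an assumption based on the v7 mention, and copying a plain value stands in for `VACUUM INTO`:

```typescript
// In-memory model; only SnapshotVersionMismatchError and exit 4 come from the roadmap.
export const CURRENT_SCHEMA_VERSION = 7; // assumption

export class SnapshotVersionMismatchError extends Error {
  readonly exitCode = 4;
  constructor(snapshotVersion: number) {
    super(`snapshot schema v${snapshotVersion} is older than v${CURRENT_SCHEMA_VERSION}`);
    this.name = "SnapshotVersionMismatchError";
  }
}

interface Snapshot<S> {
  id: number;
  label: string;
  schemaVersion: number;
  state: S;
}

export class SnapshotStore<S> {
  state: S;
  private readonly clone: (s: S) => S;
  private readonly snapshots: Snapshot<S>[] = [];
  private nextId = 1;

  constructor(initial: S, clone: (s: S) => S) {
    this.clone = clone;
    this.state = clone(initial);
  }

  // Stand-in for captureSnapshot(): copy the whole state, record its schema version.
  capture(label: string, schemaVersion: number = CURRENT_SCHEMA_VERSION): number {
    const id = this.nextId++;
    this.snapshots.push({ id, label, schemaVersion, state: this.clone(this.state) });
    return id;
  }

  // Stand-in for `mu undo --yes [--to <id>]`: latest snapshot unless `to` is given.
  undo(to?: number): void {
    const target =
      to === undefined
        ? this.snapshots[this.snapshots.length - 1]
        : this.snapshots.find((s) => s.id === to);
    if (target === undefined) throw new Error("no snapshot to restore");
    if (target.schemaVersion < CURRENT_SCHEMA_VERSION) {
      // Migrations are forward-only, so an older snapshot can't be restored.
      throw new SnapshotVersionMismatchError(target.schemaVersion);
    }
    // Capture the current state before restoring: a second undo rolls forward to it.
    this.capture(`pre-restore of #${target.id}`);
    this.state = this.clone(target.state);
  }
}
```

The pre-restore capture has to happen *before* the state is overwritten; capturing afterwards would record the restored state, and a second undo would be a no-op.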
- ---
-
- ## Stretch
-
- Items that meet criterion 2 (no pillar bend) and 3 (small) but
- haven't yet hit criterion 1 (proven friction). Stays parked until
- real use surfaces them.
-
- ### `task_artifacts` — generalized "this task produced X"
-
- ```sql
- CREATE TABLE task_artifacts (
- id INTEGER PRIMARY KEY AUTOINCREMENT,
- task_id TEXT NOT NULL REFERENCES tasks(local_id) ON DELETE CASCADE,
- kind TEXT NOT NULL, -- pr|file|url|commit|image
- ref TEXT NOT NULL,
- label TEXT,
- created_at TEXT NOT NULL
- );
- ```
-
- `mu task artifact add <task> --kind pr <url>`. Surfaces in `mu
- task show` and a future `tasks_v` enriched view.
-
- ### Other parked items
-
- | Item | Source / origin |
- | --- | --- |
- | `CancelScope` for long-running ops — Ctrl-C handling that cooperatively cancels in-flight tmux/exec calls | prior-art pattern (workflows) |
- | `mu.step()` replay cache for `mu run` — re-running a partially-failed script skips already-completed steps | prior-art pattern (workflows; `SqliteWorkflowStore` shape) |
- | `init_tracing(config)` + RAII guard — NDJSON to `<state-dir>/logs/`, MINUTELY rotation, last 100 files | prior-art pattern (tracing) |
- | Subscription-based wakeups — `mu log --tail` polls SQLite once per second; SQLite update hooks (via better-sqlite3) or fs.watch on the WAL would drop latency. | internal critique gap |
-
- ### Schema normalization — SHIPPED in v0.2 (schema v5)
-
- `tasks.id INTEGER PK + (workstream_id, local_id) UNIQUE` shipped
- as the universal substrate-wide pattern, not just on tasks. See
- [docs/ARCHITECTURE.md § Surrogate-PK + SDK-boundary discipline](ARCHITECTURE.md#surrogate-pk--sdk-boundary-discipline-load-bearing).
- Two operators both running `mu task add design` in different
- workstreams just works; same for agents.
-
- Post-v5 evolution: schema v6 added the cross-workstream archive
- tables (`archives`, `archived_tasks`, `archived_edges`,
- `archived_notes`, `archived_events`); schema v7 dropped the
- unused `approvals` table. The surrogate-PK shape is unchanged.
-
- ---
-
- ## Explicitly rejected
-
- These were considered and turned down, with the reason. Listed so
- we don't rediscover the same ideas every quarter.
-
- ### JavaScript DSL (`mu run` / `mu eval` / `mu repl`)
-
- Why it's tempting: atomicity-as-syntax, forward refs as a parser
- feature, LLMs reliably emit structured code.
-
- Why we rejected (twice — first as a Lisp like the prior runtime
- used, then as JS-via-`vm`):
-
- - The gap a DSL fills is "compose multiple verbs into one
- transactional script." `--json` on every read verb plus typed
- verbs that accept evidence arguments cover that without a
- sandbox, codegen, `.d.ts` shipping, or a parallel typed surface
- to maintain.
- - **Independent corroboration from an internal critique**: five
- orthogonal reviewers (architect, engineer, model-UX,
- thin-harness advocate, operator) all flagged DSL/workflow
- language as the worst maintenance liability of the prior
- internal runtime. "A workflow DSL that becomes 'programming
- the runtime' is a liability."
- - The `vm` sandbox would have to be maintained against Node's
- security model forever; a non-trivial commitment for a feature
- with no proven friction.
- - bash composition over `mu --json | jq` covers what real users
- do.
-
- What the DSL would have provided, and what ships instead:
-
- | Original DSL feature | Shipped substitute |
- | --------------------------------------------- | ------------------------------------------------------- |
- | `mu run script.ts` (transactional script) | `bash + jq + --json`; SDK in-proc for typed callers |
- | `mu eval` | `mu sql` for raw queries; `bash -c` for actions |
- | `mu repl` | `node` + `import("mu-agent")` for in-proc exploration |
- | `mu.create / spawn / claim / send / ...` | `mu task add / agent spawn / task claim / agent send` |
- | `mu.ready()` / `mu.parallelTracks()` | `mu task next -n 0 --json` / bare `mu --json` / `mu state --json` |
- | Forward refs via deferred string IDs | Add tasks in topological order, or use `mu task block` after-the-fact |
- | Atomic transactions wrapping a script | Per-verb transactions in the SDK; idempotent verbs |
- | `mu.step()` replay cache | Not built; if needed, build on top of `agent_logs` event seq |
-
- Re-earn requires repeated friction reports of "I keep writing the
- same bash" that bash + jq + `--json` couldn't fix.
-
- ### `defineOperation()` registry framework
-
- The only consumer that motivated this was the JS DSL's `.d.ts`
- autocomplete. With the DSL rejected, no consumer remains. The pi
- extension, if/when it ships, can share types directly via
- `src/index.ts` SDK exports without a registry layer. Classic case
- of an abstraction with one anticipated consumer.
-
- ### Markdown agent-definition discovery
-
- Spawn already accepts `--cli` / `--command` / `--workspace` /
- `--role` directly; an orchestrator's first message + spawn flags
- ARE the agent's "definition." The `agents/` directory and a
- `docs/AGENT_FORMAT.md` were considered and dropped.
-
- Earn back if real friction surfaces ("I'm copy-pasting the same
- role doc into five spawn invocations every day, twice a week").
-
- ### Build mu as a pure pi extension (no CLI)
-
- Why it's tempting: simpler distribution, one install, full access
- to pi's `ExtensionAPI` for HUD and events.
-
- Why rejected:
-
- - Children spawned by mu can't drive mu without re-loading the
- extension.
- - Humans can't `mu agent list` from a shell to debug.
- - Recursion requires special plumbing.
- - Couples mu to pi's release cycle and extension API.
- - Throws away the "any process can drive this" property.
-
- ### Build mu as a library that pi imports (no standalone CLI)
-
- Why it's tempting: zero subprocess overhead.
-
- Why rejected:
-
- - Multiple pi instances would each load the library and fight over
- the DB.
- - A standalone CLI on `$PATH` is the cleanest "shared resource"
- model.
- - The library/CLI split is well-trodden — every good tool ships
- both, and the CLI is canonical.
-
- ### Two binaries: `mu-agents` and `mu-tasks`
-
- Why it's tempting: cleaner separation of concerns.
-
- Why rejected:
-
- - Agent ↔ task integration (claim, owner field, agent_logs about
- tasks) needs them in one transactional surface.
- - One install, one mental model, one `mu doctor`.
- - A prior internal precedent of separating task-graph and
- agent-runtime crates created awkward join logic; mu collapsing
- them is a feature.
-
- ### `TaskSurface` adapter abstraction with multiple backends
-
- Sync to GitHub Issues / Linear / Asana. Why it's tempting:
- composability, "bring your own work tracker."
-
- Why rejected:
-
- - mu without a built-in task graph is just a fancier agent runner
- — the killer features (parallel tracks, claim, ROI
- prioritization) require a graph.
- - Adapter complexity for systems most users don't have.
- - Round-tripping inverts the model: mu's task graph is local and
- authoritative.
- - If wanted: a separate companion package, not core.
-
- ### Cross-machine state sync
-
- Local-first SQLite. Layer something like syncthing on top if you
- want it. Multi-machine sync would force a server, conflict
- resolution, identity, auth — every one of those breaks the "zero
- ops" pledge.
-
- ### HTTP API on top of the SQLite registry
-
- mu is a CLI; if you need RPC, write it. The schema is small and
- stable enough.
-
- ### A "hosted" mu
-
- Zero ops, no accounts. Your machine is the deployment.
-
- ### Plugin system / web UI / RPC / chat & docs integrations / memory system / workflow engine
-
- Not "rejected one at a time" — rejected as a class. An internal
- critique established that the prior internal runtime's accumulation
- of these adjacent product identities was its central design
- failure: "hidden state, lifecycle bugs, unclear ownership of
- truth, and high model-facing tool entropy."
-
- mu's anti-feature pledges (no plugin runtime, no codegen, no
- daemon, no web UI, no chat integration, no memory system, no
- workflow engine) are specifically the accumulations of that prior
- internal runtime that mu chose not to inherit. Each one is
- provable as the absence of a subsystem mu was tempted to copy.
-
- ### Anthropomorphic builtin agent names (`alice`, `bob`)
-
- Use role-based names (`worker-1`, `reviewer-1`). See
- [VOCABULARY.md §"Naming conventions"](VOCABULARY.md#agent-names-prefer-role-n-not-human-names).
+ `mu log --tail` polls SQLite once per second. SQLite update hooks
+ (via better-sqlite3) or `fs.watch` on the WAL would drop latency
+ at the cost of more machinery. Promote when someone hits the
+ cliff.
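A sketch of the `fs.watch`-on-the-WAL option, under assumptions: the WAL path is the DB path plus `-wal` (SQLite's naming in WAL mode), `onChange` re-runs whatever query `mu log --tail` polls with today, and `watchWal` / `isWatchExhaustion` are hypothetical helpers. It borrows the pi-subagents lesson of a mandatory polling fallback on `EMFILE`/`ENOSPC`; it does not coalesce rapid events.

```typescript
import { watch, watchFile, unwatchFile, type FSWatcher, type Stats } from "node:fs";

// True for errno codes meaning "out of watch handles" (fd limit, inotify
// limit), as opposed to "the path is wrong".
export function isWatchExhaustion(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return code === "EMFILE" || code === "ENOSPC";
}

// Hypothetical helper: call `onChange` whenever the WAL file changes and
// return a function that stops watching. Handles are non-persistent, so an
// idle watcher does not keep the process alive.
export function watchWal(walPath: string, onChange: () => void, pollMs = 1000): () => void {
  let watcher: FSWatcher | undefined;
  let polling = false;

  const pollListener = (curr: Stats, prev: Stats): void => {
    if (curr.mtimeMs !== prev.mtimeMs || curr.size !== prev.size) onChange();
  };
  const startPolling = (): void => {
    if (polling) return;
    polling = true;
    // Same cadence as today's poll, but only fires when the file changed.
    watchFile(walPath, { interval: pollMs, persistent: false }, pollListener);
  };

  try {
    watcher = watch(walPath, { persistent: false }, () => onChange());
    // If the watch handle fails later, degrade to polling instead of
    // silently losing wakeups.
    watcher.on("error", () => {
      watcher?.close();
      watcher = undefined;
      startPolling();
    });
  } catch (err) {
    // Only resource exhaustion falls back; ENOENT and friends are real errors.
    if (!isWatchExhaustion(err)) throw err;
    startPolling();
  }

  return () => {
    watcher?.close();
    if (polling) unwatchFile(walPath, pollListener);
  };
}
```

A caller would pass something like `${dbPath}-wal` and run its existing tail query inside `onChange`; the 1-second poll stays as the floor when watching isn't available.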
 
  ---
 
  ## Open questions
 
- These were live during initial design and remain partly unresolved.
- Listed so we don't pretend they're settled.
-
- - **`agents.cli` as TEXT vs enum.** Went with TEXT (originally for
- heterogeneous-CLI forward-compat). Today the only meaningful
- value is `pi`. We're keeping it TEXT — if multi-CLI re-earns its
- way back, the column doesn't need a schema migration.
- - **Composite `(workstream, local_id)` PK on tasks.** Currently
- `local_id` is global PK. Two workstreams can't both have a
- `design` task. Recorded as a deferred normalization above.
- - **Capability tags on operations.** The `defineOperation()`
- registry that would have carried these is rejected. The role
- flag on agents is stored but unenforced. The internal critique
- flagged "capability-gated mutations" as part of the minimal
- core; for now mu's only authorization surface is "the agent ran
- the verb." Earn capability enforcement when an agent actually
- does damage.
- - **Per-workstream config.** Resisted (the anti-feature pledge).
- "This workstream uses one pi binary, that one uses another" is
- a real gap that env vars don't solve cleanly. Revisit when the
- second user hits it.
- - **Subscription-based wakeups.** `mu log --tail` polls SQLite
- once per second. Real subscriptions (SQLite update hooks via
- better-sqlite3, or fs.watch on the WAL) would drop latency at
- the cost of more machinery. Not worth it until someone hits
- the cliff.
-
- ---
+ Live during initial design and still partly unresolved. Listed so
+ we don't pretend they're settled.
 
- ## Operational lessons we're stealing (reference for implementers)
-
- Each of these is a real failure mode pi-subagents or a prior
- internal multi-agent runtime has already fixed. Listed here so
- when one of the items above is picked up, the implementer doesn't
- have to rediscover the lesson.
-
- ### From pi-subagents (`src/runs/shared/`)
-
- | File | Lesson |
- | -------------------------- | ----------------------------------------------------------------- |
- | `frontmatter.ts` | Agent-frontmatter parser: 28 lines, handles CRLF, quoted values, kebab-case. Port verbatim. |
- | `long-running-guard.ts` | Mutating-bash detection via regex + unquoted-redirection scanner. Don't trust tool names; scan command bodies. |
- | `long-running-guard.ts` | Mutating-failure burst detection: rolling window, consecutive vs same-path failures, escalation threshold. |
- | `completion-guard.ts` | Expected-mutation detection from task prose, not agent role. Strips framework-injected lines before checking. |
- | `model-fallback.ts` | Curated regex list of retryable failures (rate limit, 429, quota, 502/503/504). Don't waste a fallback on auth errors. |
- | `model-fallback.ts` | `splitThinkingSuffix` always splits on **last** colon — preserves `provider/model:high`. |
- | `single-output.ts` | Three cases for output files: agent wrote it, agent didn't, file unreadable. `captureSingleOutputSnapshot` before run to disambiguate. |
- | `worktree.ts` | `node_modules` symlinking + tracking as synthetic-path. Generic across VCS. |
- | `worktree.ts` | Per-task `cwd:` conflict detection. Best-effort rollback on hook failure. |
- | `result-watcher.ts` | `fs.watch` with mandatory polling fallback on `EMFILE`/`ENOSPC`. `unref()` timers. Coalescer for rapid rename events. |
- | `pi-args.ts` | Long tasks → temp file + `@path` argv. System prompt via `mode: 0o600` temp file. Identity env vars passed down. |
- | `extension/doctor.ts` | `lineFromCheck(label, fn)` wrapper turns thrown errors into `failed — <text>` lines so one broken probe doesn't break the report. |
-
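Three rows of the pi-subagents table above are small enough to sketch. These are re-implementations from the one-line descriptions, not ports of that code: the regexes are an assumed subset of the curated list, and the return shapes and line format are illustrative.

```typescript
// Split "provider/model:level" on the LAST colon so colons earlier in the
// model part survive. No colon means no thinking suffix.
export function splitThinkingSuffix(spec: string): { model: string; thinking?: string } {
  const i = spec.lastIndexOf(":");
  if (i === -1) return { model: spec };
  return { model: spec.slice(0, i), thinking: spec.slice(i + 1) };
}

// Assumed subset of retryable failures: rate limits, 429, quota, 502/503/504.
const RETRYABLE: RegExp[] = [/rate.?limit/i, /\b429\b/, /quota/i, /\b50[234]\b/];
// Auth failures never earn a fallback attempt, even if they mention a limit.
const AUTH: RegExp[] = [/\b40[13]\b/, /unauthori[sz]ed/i, /invalid.{0,10}api.?key/i];

export function isRetryableFailure(message: string): boolean {
  if (AUTH.some((re) => re.test(message))) return false;
  return RETRYABLE.some((re) => re.test(message));
}

// Run one doctor probe; a throw becomes a "failed — <text>" line instead of
// aborting the rest of the report.
export function lineFromCheck(label: string, fn: () => string): string {
  try {
    return `${label}: ${fn()}`;
  } catch (err) {
    const text = err instanceof Error ? err.message : String(err);
    return `${label}: failed — ${text}`;
  }
}
```

Checking auth patterns first is the design point: a 401 whose body happens to mention a rate limit should not burn a fallback model.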
- ### From a prior internal multi-agent runtime
-
- | Topic | Lesson |
- | ----------------------------------------------- | ------------------------------------------------------------ |
- | shell-escape | `shell_escape` via single-quote wrapping. |
- | granular workspace-free results | A `WorkspaceFreeResult` with independent `committed`/`submitted`/`commitError`/`submitError`. |
- | submit guard | `timeout -k 5s {N}s sh -c 'exec jf submit --draft </dev/null'` to prevent hanging on TTY prompts. |
- | per-CLI detector | Per-CLI Detector trait + pattern registry. Tail-window + narrow-window distinction. (deferred; pi only today.) |
- | lifecycle state machine | Side-effect-free lifecycle state machine: `(state, event) → outcome`. Single point for tracing. Distinguishes manual `Free` from inferred idle. |
- | read-list reconciliation | "Reality wins": every `list()` queries the substrate, prunes ghosts, adopts orphans. **Implemented (`src/reconcile.ts`).** |
- | parallel-tracks | Parallel-tracks union-find with diamond-merge. **Implemented (`src/tracks.ts`).** |
- | built-in graph views | Built-in views: `ready`, `blocked`, `goals`. **Implemented.** |
- | pane-title-as-identity | Pane-title-as-identity for the claim protocol. **Implemented.** |
- | lisp DSL (rejected for mu, ideas not adopted) | Atomic transactions are per-verb in the SDK; idempotent re-imports work via `INSERT OR IGNORE` + idempotent verbs; forward-ref checking handled at task-add time. JS DSL also rejected (above). |
- | notes model | Append-only, FILES/DECISION/VERIFIED conventions. **Implemented.** |
+ - **Capability tags on operations.** mu's only authorization
+ surface today is "the agent ran the verb." Promote capability
+ enforcement when an agent actually does damage.
+ - **Per-workstream config.** Resisted (anti-feature pledge). "This
+ workstream uses one pi binary, that one uses another" is a real
+ gap env vars don't solve cleanly. Revisit when a second user
+ hits it.
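Three rows of the "From a prior internal multi-agent runtime" table above, sketched in TypeScript with hypothetical signatures (the shipped versions in `src/reconcile.ts` and `src/tracks.ts` may differ): single-quote shell escaping, the "reality wins" ghost/orphan diff, and parallel tracks as union-find over dependency edges, where a diamond collapses into one track.

```typescript
// POSIX single-quote escaping: wrap in '...' and turn each ' into '\''
// (close quote, escaped quote, reopen quote).
export function shellEscape(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// "Reality wins": compare registry rows with what tmux reports. Rows without a
// live pane are ghosts (prune); live panes without a row are orphans (adopt or
// surface).
export function reconcile(
  rowPaneIds: readonly string[],
  livePaneIds: readonly string[],
): { ghosts: string[]; orphans: string[] } {
  const live = new Set(livePaneIds);
  const known = new Set(rowPaneIds);
  return {
    ghosts: rowPaneIds.filter((id) => !live.has(id)),
    orphans: livePaneIds.filter((id) => !known.has(id)),
  };
}

// Parallel tracks: tasks connected by any dependency edge share a track, so a
// diamond (a→b, a→c, b→d, c→d) merges into a single track.
export function parallelTracks(
  tasks: readonly string[],
  edges: ReadonlyArray<readonly [string, string]>,
): string[][] {
  const parent = new Map<string, string>();
  const find = (x: string): string => {
    if (!parent.has(x)) parent.set(x, x);
    let root = x;
    while (parent.get(root) !== root) root = parent.get(root)!;
    // Path compression: point every node on the walk straight at the root.
    let cur = x;
    while (cur !== root) {
      const next = parent.get(cur)!;
      parent.set(cur, root);
      cur = next;
    }
    return root;
  };
  for (const t of tasks) find(t);
  for (const [a, b] of edges) {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb);
  }
  // Group tasks by root, preserving input order within each track.
  const tracks = new Map<string, string[]>();
  for (const t of tasks) {
    const root = find(t);
    const track = tracks.get(root) ?? [];
    track.push(t);
    tracks.set(root, track);
  }
  return [...tracks.values()];
}
```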
 
  ---
 
- ## Documents still to write
+ ## Pi extension and the three rules
 
- Meta-docs the project will need eventually:
+ If/when a pi extension lands (typed `mu_*` tools, HUD widget,
+ wakeups) bundled in this same npm package, three rules stay
+ non-negotiable:
 
- - **CONTRIBUTING.md** once external PRs land. Contains the LOC
- caps, the lint rules, the "no traits with zero implementors"
- rule, the test-first conventions.
- - **MIGRATIONS.md**: the v3→v4 in-process migration framework
- (`src/migrations.ts`) and the one-shot v4→v5 script have shipped
- and been retired. Capturing the operator-facing contract for
- future schema bumps in one place is still useful; leave as a
- follow-up.
-
- ---
-
- ## How to use this roadmap
-
- If you're starting work on an item:
+ 1. **The DB is canonical.** All state in `<state-dir>/mu.db`.
+ Extension reads/writes through the same modules the CLI uses.
+ No extension-only state.
+ 2. **Every operation works from the CLI.** No tool registered in
+ the extension has logic that doesn't exist in the CLI.
+ 3. **The skill teaches the CLI.** Pi sessions without the
+ extension still get a working mu by following
+ [skills/mu/SKILL.md](../skills/mu/SKILL.md).
 
- 1. **Confirm it still meets the three promotion criteria.** Note
- the second real-use occurrence; cite the friction.
- 2. **Open a focused PR per item.** One typed verb per commit, one
- schema change per commit.
- 3. **Update [VOCABULARY.md](VOCABULARY.md) first** if you introduce
- a new concept or rename an existing one.
- 4. **Add a [CHANGELOG.md](../CHANGELOG.md) entry** under the
- upcoming version.
+ If those three rules hold, mu stays driveable from a shell forever
+ and the extension stays thin.
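Rules 1 and 2 amount to a shape: one core operation owns validation and state, and the CLI handler and the extension tool are both thin facades over it. A hypothetical sketch (`addTask`, `cliTaskAdd`, `muTaskAddTool`, and the argv layout are invented names, and an in-memory array stands in for `<state-dir>/mu.db`):

```typescript
// Stand-in for the canonical store; in mu this would be <state-dir>/mu.db.
export interface Db {
  tasks: { workstream: string; localId: string; title: string }[];
}

// The one core operation: validation and the state change live here only.
export function addTask(db: Db, workstream: string, localId: string, title: string): void {
  // Mirrors the (workstream, local_id) uniqueness from the schema notes.
  if (db.tasks.some((t) => t.workstream === workstream && t.localId === localId)) {
    throw new Error(`task ${localId} already exists in ${workstream}`);
  }
  db.tasks.push({ workstream, localId, title });
}

// CLI facade (hypothetical argv layout): parse, call the core, format for a shell.
export function cliTaskAdd(db: Db, argv: string[]): string {
  const [workstream, localId, ...words] = argv;
  if (!workstream || !localId) throw new Error("usage: <workstream> <id> <title...>");
  addTask(db, workstream, localId, words.join(" "));
  return `added ${workstream}/${localId}`;
}

// Extension facade: typed input, same core call, no state of its own.
export function muTaskAddTool(
  db: Db,
  input: { workstream: string; id: string; title: string },
): { ok: true } {
  addTask(db, input.workstream, input.id, input.title);
  return { ok: true };
}
```

Because `muTaskAddTool` has no branch of its own, anything it can do is also reachable through `cliTaskAdd`, which is rule 2 expressed as code structure.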
 
- If you're considering adding a new entry to this file:
+ ---
 
- - Read AGENTS.md §"What NOT to do" first.
- - Provide a concrete promotion-criteria assessment.
- - Match the format of existing entries.
+ ## Explicitly rejected (one-liners)
+
+ Listed so we don't rediscover them. See git history for the full
+ reasoning per item.
+
+ - **JS / Lisp DSL** (`mu run` / `mu eval` / `mu repl`) — bash +
+ jq + `--json` covers the gap. A workflow DSL is a maintenance
+ liability.
+ - **`defineOperation()` registry framework** — no consumer left
+ after the DSL was rejected.
+ - **Markdown agent-definition discovery** — spawn flags + first
+ message already are the definition.
+ - **mu as a pi extension only (no CLI)** — children couldn't drive
+ mu; humans couldn't debug from a shell.
+ - **mu as a library only (no CLI)** — multiple processes would
+ fight over the DB.
+ - **Two binaries (`mu-agents` + `mu-tasks`)** — agent ↔ task
+ integration needs one transactional surface.
+ - **`TaskSurface` adapter abstraction** — the built-in graph IS
+ the killer feature.
+ - **Cross-machine state sync** — local-first SQLite; layer
+ syncthing on top if you want it.
+ - **HTTP API on top of SQLite** — write your own RPC if you need
+ one.
+ - **A "hosted" mu** — your machine is the deployment.
+ - **Anthropomorphic agent names (`alice`, `bob`)** — use
+ role-based names (`worker-1`, `reviewer-1`).