@martintrojer/mu 0.3.1 → 0.4.0
- package/AGENTS.md +181 -42
- package/README.md +63 -22
- package/dist/cli.js +14143 -5403
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +1779 -1287
- package/dist/index.js +4339 -2943
- package/dist/index.js.map +1 -1
- package/docs/ARCHITECTURE.md +298 -58
- package/docs/HANDOVER.md +461 -0
- package/docs/ROADMAP.md +120 -496
- package/docs/USAGE_GUIDE.md +518 -152
- package/docs/VISION.md +48 -4
- package/docs/VOCABULARY.md +24 -8
- package/docs/img/tui-dashboard.png +0 -0
- package/package.json +12 -6
- package/skills/mu/SKILL.md +274 -443
package/docs/ROADMAP.md
CHANGED
|
@@ -1,542 +1,166 @@
|
|
|
1
1
|
# Roadmap
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
a feature isn't listed here, it isn't planned. If it's listed but
|
|
6
|
-
unbuilt, see its promotion criteria for what would move it.
|
|
3
|
+
The single forward-looking doc. If a feature isn't here, it isn't
|
|
4
|
+
planned.
|
|
7
5
|
|
|
8
|
-
For canonical terms
|
|
9
|
-
pillars
|
|
10
|
-
layout
|
|
6
|
+
For canonical terms see [VOCABULARY.md](VOCABULARY.md). For
|
|
7
|
+
load-bearing pillars see [VISION.md](VISION.md). For module
|
|
8
|
+
layout see [ARCHITECTURE.md](ARCHITECTURE.md). Shipped history
|
|
9
|
+
lives in [CHANGELOG.md](../CHANGELOG.md).
|
|
11
10
|
|
|
12
11
|
---
|
|
13
12
|
|
|
14
|
-
## Promotion criteria
|
|
13
|
+
## Promotion criteria
|
|
15
14
|
|
|
16
15
|
A roadmap item earns implementation when **all three** are true:
|
|
17
16
|
|
|
18
|
-
1. **Proven friction.** A real user
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
**Exception: data-loss footguns.** A change that fixes a default
|
|
32
|
-
that silently destroys user artifacts (uncommitted output, scratch
|
|
33
|
-
logs, benchmark results, etc.) ships on the **first** occurrence,
|
|
34
|
-
not the second. The cost of waiting for criterion 1 is "lose more
|
|
35
|
-
stuff"; that's the wrong cost to optimise. Document the friction
|
|
36
|
-
in the commit message instead.
|
|
37
|
-
|
|
38
|
-
**Polish doesn't count as promotion.** Bug fixes, ergonomic
|
|
39
|
-
improvements, error-message wording, doc tightening, and similar
|
|
40
|
-
"the existing thing works better" changes don't need promotion
|
|
41
|
-
criteria — they just need to be small and to ship clean (typecheck
|
|
42
|
-
+ lint + tests + build). Polish is the dividend the project earns
|
|
43
|
-
by refusing the things on this roadmap. Don't wait for occurrence
|
|
44
|
-
#2 to fix a typo, tighten an error message, or truncate a runaway
|
|
45
|
-
table column.
|
|
17
|
+
1. **Proven friction.** A real user hits the missing feature in a
|
|
18
|
+
real workflow at least twice. Imagined polish doesn't count.
|
|
19
|
+
2. **No pillar refactor.** Fits the current substrate without
|
|
20
|
+
bending any pillar in [VISION.md](VISION.md).
|
|
21
|
+
3. **Bounded scope.** Fits in <300 LOC or has a clear smaller
|
|
22
|
+
subset that does.
|
|
23
|
+
|
|
24
|
+
**Exceptions.** Data-loss footguns (silent destruction of user
|
|
25
|
+
artifacts) ship on the *first* occurrence. Polish — bug fixes,
|
|
26
|
+
ergonomic tweaks, error-message wording, doc tightening — doesn't
|
|
27
|
+
need promotion at all; just ship clean (typecheck + lint + tests +
|
|
28
|
+
build).
|
|
46
29
|
|
|
47
30
|
---
|
|
48
31
|
|
|
49
|
-
## Anti-feature pledges
|
|
32
|
+
## Anti-feature pledges
|
|
50
33
|
|
|
51
34
|
We will NOT, until each one earns its way back via the criteria
|
|
52
|
-
above
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
model-facing tool entropy).
|
|
57
|
-
|
|
58
|
-
- Add a configuration file. All config is CLI flags or env vars.
|
|
59
|
-
- Add a daemon, watcher, or background process beyond what tmux /
|
|
35
|
+
above:
|
|
36
|
+
|
|
37
|
+
- **Config file.** All config is CLI flags or env vars.
|
|
38
|
+
- **Daemon / watcher / background process** beyond what tmux and
|
|
60
39
|
SQLite give us.
|
|
61
|
-
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
- Bundle pi
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
integration, a memory system, or a workflow engine. (These are
|
|
79
|
-
the kinds of accumulated subsystems the council critique flagged
|
|
80
|
-
as costing more than they pay for. mu has none and intends to
|
|
81
|
-
keep it that way.)
|
|
40
|
+
- **Anticipatory abstractions** with zero current consumer (the
|
|
41
|
+
cautionary tale: a `RunContext` trait with no implementor).
|
|
42
|
+
- **Wrappers around wrappers** (cautionary tale:
|
|
43
|
+
`TextStream`/`TextState`/`StreamResult`).
|
|
44
|
+
- **Codegen, embedded JS engine, macros, decorators** beyond
|
|
45
|
+
TypeScript itself. No workflow DSL.
|
|
46
|
+
- **Template/definition system for agent roles.** Spawn flags +
|
|
47
|
+
the orchestrator's first message ARE the definition.
|
|
48
|
+
- **Render layers beyond `cli-table3` + `picocolors`**, except
|
|
49
|
+
`ink` confined to `src/cli/tui/`. No second TUI stack alongside
|
|
50
|
+
`ink` — if `ink` ever stops paying off, *replace* it; don't
|
|
51
|
+
stack stacks.
|
|
52
|
+
- **Bundle pi.** It's a peer dep.
|
|
53
|
+
- **Plugin runtime, web UI, RPC, chat/docs integrations, memory
|
|
54
|
+
system, workflow engine.** Rejected as a class — these are
|
|
55
|
+
exactly the accumulations a prior internal multi-agent runtime
|
|
56
|
+
collected, and not inheriting them is the point.
|
|
82
57
|
|
|
83
58
|
---
|
|
84
59
|
|
|
85
60
|
## Possible — small additions with an obvious shape
|
|
86
61
|
|
|
87
|
-
These have a clear design but haven't yet hit criterion
|
|
88
|
-
friction in ≥2 real workflows). They earn implementation when
|
|
89
|
-
use surfaces them.
|
|
90
|
-
|
|
91
|
-
The section heading is deliberately "Possible," not "Next." "Next"
|
|
92
|
-
implies it's coming. "Possible" doesn't. Items below ship if and
|
|
93
|
-
when they earn it.
|
|
94
|
-
|
|
95
|
-
### Pi extension and the three rules
|
|
96
|
-
|
|
97
|
-
The pi extension is the first "polish" tier — LLM-facing UX
|
|
98
|
-
(typed `mu_*` tools, HUD widget, wakeups) that wraps the same core
|
|
99
|
-
operations the CLI already exposes. Bundled in the same npm
|
|
100
|
-
package; pi is a peer dep.
|
|
101
|
-
|
|
102
|
-
The pi extension is **the only anticipated future caller**. When /
|
|
103
|
-
if it lands, three rules stay non-negotiable:
|
|
104
|
-
|
|
105
|
-
1. **The DB is canonical.** All state in `<state-dir>/mu.db`.
|
|
106
|
-
Extension reads/writes it through the same modules the CLI uses.
|
|
107
|
-
No extension-only state.
|
|
108
|
-
2. **Every operation works from the CLI.** No tool registered in
|
|
109
|
-
the extension has logic that doesn't exist in the CLI. The
|
|
110
|
-
extension is a typed/integrated facade.
|
|
111
|
-
3. **The skill teaches the CLI.** Pi sessions without the extension
|
|
112
|
-
still get a working mu by following [the bundled
|
|
113
|
-
skill](../skills/mu/SKILL.md).
|
|
114
|
-
|
|
115
|
-
If those three rules hold, mu stays driveable from a shell forever
|
|
116
|
-
and the extension stays thin.
|
|
117
|
-
|
|
118
|
-
### `mu adopt <pane-id> [--name <agent>]` — SHIPPED in v0.2 (`e20af89`)
|
|
119
|
-
|
|
120
|
-
Reconciliation surfaces orphan panes; `mu adopt` formally registers
|
|
121
|
-
one of them as a managed agent. Promotion was triggered by the
|
|
122
|
-
multi-agent dogfood pattern (orchestrator runs in a pane outside
|
|
123
|
-
the `mu-<ws>` session and wants to be claimable as a worker).
|
|
62
|
+
These have a clear design but haven't yet hit promotion criterion
|
|
63
|
+
1 (friction in ≥2 real workflows). They earn implementation when
|
|
64
|
+
real use surfaces them.
|
|
124
65
|
|
|
125
|
-
###
|
|
66
|
+
### Per-CLI status detection (claude, codex, …)
|
|
126
67
|
|
|
127
|
-
mu is a pi orchestrator today
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
codex) inherit the same fallback.
|
|
68
|
+
mu is a pi orchestrator today. v0.2's Braille-spinner fallback
|
|
69
|
+
catches every TUI wrapper using standard spinner glyphs
|
|
70
|
+
(U+2800–U+28FF), so pi-meta + solo + many vanilla TUIs (claude,
|
|
71
|
+
codex) work without a per-CLI detector.
|
|
132
72
|
|
|
133
|
-
For patterns the spinner fallback misses (
|
|
134
|
-
|
|
135
|
-
LOC per CLI) is the obvious shape. Promote when a
|
|
136
|
-
specific-prompt-misclassification surfaces.
|
|
73
|
+
For patterns the spinner fallback misses (permission prompts,
|
|
74
|
+
specific busy markers), a per-CLI `Detector` registry keyed by
|
|
75
|
+
CLI name (~50 LOC per CLI) is the obvious shape. Promote when a
|
|
76
|
+
real specific-prompt-misclassification surfaces.
|
|
137
77
|
|
|
138
|
-
Pattern sketch
|
|
139
|
-
per-CLI detector — kept here for whoever picks it up):
|
|
78
|
+
Pattern sketch:
|
|
140
79
|
|
|
141
|
-
| CLI
|
|
142
|
-
|
|
|
143
|
-
| Claude
|
|
144
|
-
| Codex
|
|
145
|
-
| Pi
|
|
80
|
+
| CLI | Busy patterns | Permission patterns |
|
|
81
|
+
| ------ | -------------------------------------- | --------------------------------------------------------- |
|
|
82
|
+
| Claude | `to interrupt`, `\(.*[↑↓].*tokens\)` | `Allow once`, `Allow for this session`, `Esc to cancel` |
|
|
83
|
+
| Codex | `esc to interrupt)`, `to cancel` | `enter to confirm`, `enter to submit \| esc to cancel` |
|
|
84
|
+
| Pi | (well-known mu-defined marker) | (well-known mu-defined marker) — shipped |
|
|
146
85
|
|
|
147
86
|
Critical subtleties any new detector must keep:
|
|
148
87
|
|
|
149
88
|
- **Tail-window extraction**: take last ~100 lines, strip trailing
|
|
150
|
-
blanks, then take last ~20.
|
|
151
|
-
|
|
152
|
-
the registry version factors this out.
|
|
89
|
+
blanks, then take last ~20. Already implemented for pi in
|
|
90
|
+
`src/detect.ts`; the registry version factors it out.
|
|
153
91
|
- **Permission detection uses a narrower window than busy
|
|
154
|
-
detection**
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
is visible, agent is `NeedsPermission`, not `Busy`.
|
|
158
|
-
|
|
159
|
-
### `tasks_v` enriched view
|
|
160
|
-
|
|
161
|
-
```sql
|
|
162
|
-
CREATE VIEW tasks_v AS
|
|
163
|
-
SELECT t.*,
|
|
164
|
-
GROUP_CONCAT(n.content, char(10) || '---' || char(10)) AS notes,
|
|
165
|
-
COUNT(n.id) AS note_count,
|
|
166
|
-
MAX(n.created_at) AS last_note_at
|
|
167
|
-
FROM tasks t
|
|
168
|
-
LEFT JOIN task_notes n ON n.task_id = t.id
|
|
169
|
-
GROUP BY t.id;
|
|
170
|
-
```
|
|
171
|
-
|
|
172
|
-
Earns when `mu sql` queries against tasks + notes start getting
|
|
173
|
-
verbose for a second consumer.
|
|
92
|
+
detection** to prevent already-answered prompts re-triggering.
|
|
93
|
+
- **Permission overrides busy** — if a permission prompt is
|
|
94
|
+
visible, agent is `NeedsPermission`, not `Busy`.
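
The three subtleties above can be sketched as a small registry plus one classifier. This is a minimal sketch under stated assumptions, not mu's implementation: the names `Detector`, `detectors`, `tailWindow`, and `classify`, the `Idle` status, and the 5-line permission window are all hypothetical. Only the patterns (from the table above), the ~100/~20 window sizes, and the `NeedsPermission` / `Busy` names come from the roadmap text.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from mu's source.
type Status = "NeedsPermission" | "Busy" | "Idle"; // "Idle" is an assumed third state

interface Detector {
  busy: RegExp[];
  permission: RegExp[];
}

// Registry keyed by CLI name; patterns copied from the pattern-sketch table.
const detectors: Record<string, Detector> = {
  claude: {
    busy: [/to interrupt/, /\(.*[↑↓].*tokens\)/],
    permission: [/Allow once/, /Allow for this session/, /Esc to cancel/],
  },
  codex: {
    busy: [/esc to interrupt\)/, /to cancel/],
    permission: [/enter to confirm/, /enter to submit \| esc to cancel/],
  },
};

// Tail-window extraction: last ~100 lines, strip trailing blanks, then last ~20.
function tailWindow(pane: string, outer = 100, inner = 20): string[] {
  const lines = pane.split("\n").slice(-outer);
  while (lines.length > 0 && (lines[lines.length - 1] ?? "").trim() === "") {
    lines.pop();
  }
  return lines.slice(-inner);
}

// permissionLines = 5 is an assumed width for the narrower permission window.
function classify(cli: string, pane: string, permissionLines = 5): Status {
  const detector = detectors[cli];
  if (!detector) return "Idle"; // unknown CLI: this sketch reports Idle
  const tail = tailWindow(pane);
  const narrow = tail.slice(-permissionLines).join("\n");
  // Permission is checked first, so a visible prompt overrides any busy marker.
  if (detector.permission.some((re) => re.test(narrow))) return "NeedsPermission";
  if (detector.busy.some((re) => re.test(tail.join("\n")))) return "Busy";
  return "Idle";
}
```

Checking permission patterns against the narrow slice only is what keeps a prompt that was answered several lines ago from re-triggering, while the busy check still sees the full 20-line tail.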
|
|
174
95
|
|
|
175
|
-
|
|
96
|
+
### Subscription-based wakeups
|
|
176
97
|
|
|
177
|
-
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
### `snapshots` table + auto-snapshot before mutation — SHIPPED in v0.2 (schema v4; tables carried into v5, and unchanged in v6/v7)
|
|
182
|
-
|
|
183
|
-
`captureSnapshot()` runs at the top of every destructive verb
|
|
184
|
-
(workstream destroy, agent close, task close/reject/defer/release/
|
|
185
|
-
delete, workspace free). Whole-DB copy via
|
|
186
|
-
`VACUUM INTO` (synchronous, FK-page-level atomic). Files land in
|
|
187
|
-
`<dirname(db-path)>/snapshots/<id>.db`; one row per capture in:
|
|
188
|
-
|
|
189
|
-
```sql
|
|
190
|
-
CREATE TABLE snapshots (
|
|
191
|
-
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
192
|
-
workstream TEXT, -- nullable: destroy spans all
|
|
193
|
-
label TEXT NOT NULL, -- operation name + args
|
|
194
|
-
db_path TEXT NOT NULL, -- abs path to .db file
|
|
195
|
-
schema_version INTEGER NOT NULL, -- for restore-time version check
|
|
196
|
-
created_at TEXT NOT NULL
|
|
197
|
-
);
|
|
198
|
-
```
|
|
199
|
-
|
|
200
|
-
GC opportunistic in-hook (<14 days OR <100 rows). NO FK on
|
|
201
|
-
`workstream` — destroying a workstream must NOT cascade-delete
|
|
202
|
-
its pre-destroy snapshot.
|
|
203
|
-
|
|
204
|
-
### `mu undo` + `mu snapshot {list,show}` — SHIPPED in v0.2 (snap_undo_verb)
|
|
205
|
-
|
|
206
|
-
Three verbs on top of the snapshots substrate:
|
|
207
|
-
|
|
208
|
-
- **`mu undo [--yes] [--to <id>]`** — top-level. Restores latest
|
|
209
|
-
snapshot (or the one named by `--to`). Dry-run by default;
|
|
210
|
-
`--yes` commits. Post-restore reconciles every workstream
|
|
211
|
-
(best-effort per workstream, errors swallowed) and reports
|
|
212
|
-
ghosts pruned + orphans surfaced.
|
|
213
|
-
- **`mu snapshot list [-n N] [--json]`** — newest-first table.
|
|
214
|
-
- **`mu snapshot show <id> [--json]`** — full row metadata.
|
|
215
|
-
|
|
216
|
-
Design decisions held to:
|
|
217
|
-
|
|
218
|
-
- **No `mu redo`.** Verbs have side-effects (tmux kill, git worktree
|
|
219
|
-
remove) that aren't replayable. Each restore captures a
|
|
220
|
-
pre-restore snapshot first, so a second `mu undo` rolls forward
|
|
221
|
-
to that one. Verified end-to-end. Promote `mu redo` only if real
|
|
222
|
-
use surfaces a need.
|
|
223
|
-
- **Cross-version restores rejected** (snapshot.schema_version <
|
|
224
|
-
CURRENT_SCHEMA_VERSION); migrations are forward-only. Maps to
|
|
225
|
-
`SnapshotVersionMismatchError` (exit 4).
|
|
226
|
-
- **Tmux state is NOT rolled back.** Restore + reconcile prunes
|
|
227
|
-
ghost rows; orphan panes surface in next `mu agent list`.
|
|
228
|
-
Documented honestly in the verb's stdout.
|
|
229
|
-
|
|
230
|
-
Destructive verbs that already auto-snapshot now also advertise
|
|
231
|
-
undo in their `Next:` blocks (`mu task delete`, `mu workstream
|
|
232
|
-
destroy --yes`, etc.). Closes `snap_destroy_safety`.
|
|
233
|
-
|
|
234
|
-
---
|
|
235
|
-
|
|
236
|
-
## Stretch
|
|
237
|
-
|
|
238
|
-
Items that meet criterion 2 (no pillar bend) and 3 (small) but
|
|
239
|
-
haven't yet hit criterion 1 (proven friction). Stays parked until
|
|
240
|
-
real use surfaces them.
|
|
241
|
-
|
|
242
|
-
### `task_artifacts` — generalized "this task produced X"
|
|
243
|
-
|
|
244
|
-
```sql
|
|
245
|
-
CREATE TABLE task_artifacts (
|
|
246
|
-
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
247
|
-
task_id TEXT NOT NULL REFERENCES tasks(local_id) ON DELETE CASCADE,
|
|
248
|
-
kind TEXT NOT NULL, -- pr|file|url|commit|image
|
|
249
|
-
ref TEXT NOT NULL,
|
|
250
|
-
label TEXT,
|
|
251
|
-
created_at TEXT NOT NULL
|
|
252
|
-
);
|
|
253
|
-
```
|
|
254
|
-
|
|
255
|
-
`mu task artifact add <task> --kind pr <url>`. Surfaces in `mu
|
|
256
|
-
task show` and a future `tasks_v` enriched view.
|
|
257
|
-
|
|
258
|
-
### Other parked items
|
|
259
|
-
|
|
260
|
-
| Item | Source / origin |
|
|
261
|
-
| --- | --- |
|
|
262
|
-
| `CancelScope` for long-running ops — Ctrl-C handling that cooperatively cancels in-flight tmux/exec calls | prior-art pattern (workflows) |
|
|
263
|
-
| `mu.step()` replay cache for `mu run` — re-running a partially-failed script skips already-completed steps | prior-art pattern (workflows; `SqliteWorkflowStore` shape) |
|
|
264
|
-
| `init_tracing(config)` + RAII guard — NDJSON to `<state-dir>/logs/`, MINUTELY rotation, last 100 files | prior-art pattern (tracing) |
|
|
265
|
-
| Subscription-based wakeups — `mu log --tail` polls SQLite once per second; SQLite update hooks (via better-sqlite3) or fs.watch on the WAL would drop latency. | internal critique gap |
|
|
266
|
-
|
|
267
|
-
### Schema normalization — SHIPPED in v0.2 (schema v5)
|
|
268
|
-
|
|
269
|
-
`tasks.id INTEGER PK + (workstream_id, local_id) UNIQUE` shipped
|
|
270
|
-
as the universal substrate-wide pattern, not just on tasks. See
|
|
271
|
-
[docs/ARCHITECTURE.md § Surrogate-PK + SDK-boundary discipline](ARCHITECTURE.md#surrogate-pk--sdk-boundary-discipline-load-bearing).
|
|
272
|
-
Two operators both running `mu task add design` in different
|
|
273
|
-
workstreams just works; same for agents.
|
|
274
|
-
|
|
275
|
-
Post-v5 evolution: schema v6 added the cross-workstream archive
|
|
276
|
-
tables (`archives`, `archived_tasks`, `archived_edges`,
|
|
277
|
-
`archived_notes`, `archived_events`); schema v7 dropped the
|
|
278
|
-
unused `approvals` table. The surrogate-PK shape is unchanged.
|
|
279
|
-
|
|
280
|
-
---
|
|
281
|
-
|
|
282
|
-
## Explicitly rejected
|
|
283
|
-
|
|
284
|
-
These were considered and turned down, with the reason. Listed so
|
|
285
|
-
we don't rediscover the same ideas every quarter.
|
|
286
|
-
|
|
287
|
-
### JavaScript DSL (`mu run` / `mu eval` / `mu repl`)
|
|
288
|
-
|
|
289
|
-
Why it's tempting: atomicity-as-syntax, forward refs as a parser
|
|
290
|
-
feature, LLMs reliably emit structured code.
|
|
291
|
-
|
|
292
|
-
Why we rejected (twice — first as a Lisp like the prior runtime
|
|
293
|
-
used, then as JS-via-`vm`):
|
|
294
|
-
|
|
295
|
-
- The gap a DSL fills is "compose multiple verbs into one
|
|
296
|
-
transactional script." `--json` on every read verb plus typed
|
|
297
|
-
verbs that accept evidence arguments cover that without a
|
|
298
|
-
sandbox, codegen, `.d.ts` shipping, or a parallel typed surface
|
|
299
|
-
to maintain.
|
|
300
|
-
- **Independent corroboration from an internal critique**: five
|
|
301
|
-
orthogonal reviewers (architect, engineer, model-UX,
|
|
302
|
-
thin-harness advocate, operator) all flagged DSL/workflow
|
|
303
|
-
language as the worst maintenance liability of the prior
|
|
304
|
-
internal runtime. "A workflow DSL that becomes 'programming
|
|
305
|
-
the runtime' is a liability."
|
|
306
|
-
- The `vm` sandbox would have to be maintained against Node's
|
|
307
|
-
security model forever; a non-trivial commitment for a feature
|
|
308
|
-
with no proven friction.
|
|
309
|
-
- bash composition over `mu --json | jq` covers what real users
|
|
310
|
-
do.
|
|
311
|
-
|
|
312
|
-
What the DSL would have provided, and what ships instead:
|
|
313
|
-
|
|
314
|
-
| Original DSL feature | Shipped substitute |
|
|
315
|
-
| --------------------------------------------- | ------------------------------------------------------- |
|
|
316
|
-
| `mu run script.ts` (transactional script) | `bash + jq + --json`; SDK in-proc for typed callers |
|
|
317
|
-
| `mu eval` | `mu sql` for raw queries; `bash -c` for actions |
|
|
318
|
-
| `mu repl` | `node` + `import("mu-agent")` for in-proc exploration |
|
|
319
|
-
| `mu.create / spawn / claim / send / ...` | `mu task add / agent spawn / task claim / agent send` |
|
|
320
|
-
| `mu.ready()` / `mu.parallelTracks()` | `mu task next -n 0 --json` / bare `mu --json` / `mu state --json` |
|
|
321
|
-
| Forward refs via deferred string IDs | Add tasks in topological order, or use `mu task block` after-the-fact |
|
|
322
|
-
| Atomic transactions wrapping a script | Per-verb transactions in the SDK; idempotent verbs |
|
|
323
|
-
| `mu.step()` replay cache | Not built; if needed, build on top of `agent_logs` event seq |
|
|
324
|
-
|
|
325
|
-
Re-earn requires repeated friction reports of "I keep writing the
|
|
326
|
-
same bash" that bash + jq + `--json` couldn't fix.
|
|
327
|
-
|
|
328
|
-
### `defineOperation()` registry framework
|
|
329
|
-
|
|
330
|
-
The only consumer that motivated this was the JS DSL's `.d.ts`
|
|
331
|
-
autocomplete. With the DSL rejected, no consumer remains. The pi
|
|
332
|
-
extension, if/when it ships, can share types directly via
|
|
333
|
-
`src/index.ts` SDK exports without a registry layer. Classic case
|
|
334
|
-
of an abstraction with one anticipated consumer.
|
|
335
|
-
|
|
336
|
-
### Markdown agent-definition discovery
|
|
337
|
-
|
|
338
|
-
Spawn already accepts `--cli` / `--command` / `--workspace` /
|
|
339
|
-
`--role` directly; an orchestrator's first message + spawn flags
|
|
340
|
-
ARE the agent's "definition." The `agents/` directory and a
|
|
341
|
-
`docs/AGENT_FORMAT.md` were considered and dropped.
|
|
342
|
-
|
|
343
|
-
Earn back if real friction surfaces ("I'm copy-pasting the same
|
|
344
|
-
role doc into five spawn invocations every day, twice a week").
|
|
345
|
-
|
|
346
|
-
### Build mu as a pure pi extension (no CLI)
|
|
347
|
-
|
|
348
|
-
Why it's tempting: simpler distribution, one install, full access
|
|
349
|
-
to pi's `ExtensionAPI` for HUD and events.
|
|
350
|
-
|
|
351
|
-
Why rejected:
|
|
352
|
-
|
|
353
|
-
- Children spawned by mu can't drive mu without re-loading the
|
|
354
|
-
extension.
|
|
355
|
-
- Humans can't `mu agent list` from a shell to debug.
|
|
356
|
-
- Recursion requires special plumbing.
|
|
357
|
-
- Couples mu to pi's release cycle and extension API.
|
|
358
|
-
- Throws away the "any process can drive this" property.
|
|
359
|
-
|
|
360
|
-
### Build mu as a library that pi imports (no standalone CLI)
|
|
361
|
-
|
|
362
|
-
Why it's tempting: zero subprocess overhead.
|
|
363
|
-
|
|
364
|
-
Why rejected:
|
|
365
|
-
|
|
366
|
-
- Multiple pi instances would each load the library and fight over
|
|
367
|
-
the DB.
|
|
368
|
-
- A standalone CLI on `$PATH` is the cleanest "shared resource"
|
|
369
|
-
model.
|
|
370
|
-
- The library/CLI split is well-trodden — every good tool ships
|
|
371
|
-
both, and the CLI is canonical.
|
|
372
|
-
|
|
373
|
-
### Two binaries: `mu-agents` and `mu-tasks`
|
|
374
|
-
|
|
375
|
-
Why it's tempting: cleaner separation of concerns.
|
|
376
|
-
|
|
377
|
-
Why rejected:
|
|
378
|
-
|
|
379
|
-
- Agent ↔ task integration (claim, owner field, agent_logs about
|
|
380
|
-
tasks) needs them in one transactional surface.
|
|
381
|
-
- One install, one mental model, one `mu doctor`.
|
|
382
|
-
- A prior internal precedent of separating task-graph and
|
|
383
|
-
agent-runtime crates created awkward join logic; mu collapsing
|
|
384
|
-
them is a feature.
|
|
385
|
-
|
|
386
|
-
### `TaskSurface` adapter abstraction with multiple backends
|
|
387
|
-
|
|
388
|
-
Sync to GitHub Issues / Linear / Asana. Why it's tempting:
|
|
389
|
-
composability, "bring your own work tracker."
|
|
390
|
-
|
|
391
|
-
Why rejected:
|
|
392
|
-
|
|
393
|
-
- mu without a built-in task graph is just a fancier agent runner
|
|
394
|
-
— the killer features (parallel tracks, claim, ROI
|
|
395
|
-
prioritization) require a graph.
|
|
396
|
-
- Adapter complexity for systems most users don't have.
|
|
397
|
-
- Round-tripping inverts the model: mu's task graph is local and
|
|
398
|
-
authoritative.
|
|
399
|
-
- If wanted: a separate companion package, not core.
|
|
400
|
-
|
|
401
|
-
### Cross-machine state sync
|
|
402
|
-
|
|
403
|
-
Local-first SQLite. Layer something like syncthing on top if you
|
|
404
|
-
want it. Multi-machine sync would force a server, conflict
|
|
405
|
-
resolution, identity, auth — every one of those breaks the "zero
|
|
406
|
-
ops" pledge.
|
|
407
|
-
|
|
408
|
-
### HTTP API on top of the SQLite registry
|
|
409
|
-
|
|
410
|
-
mu is a CLI; if you need RPC, write it. The schema is small and
|
|
411
|
-
stable enough.
|
|
412
|
-
|
|
413
|
-
### A "hosted" mu
|
|
414
|
-
|
|
415
|
-
Zero ops, no accounts. Your machine is the deployment.
|
|
416
|
-
|
|
417
|
-
### Plugin system / web UI / RPC / chat & docs integrations / memory system / workflow engine
|
|
418
|
-
|
|
419
|
-
Not "rejected one at a time" — rejected as a class. An internal
|
|
420
|
-
critique established that the prior internal runtime's accumulation
|
|
421
|
-
of these adjacent product identities was its central design
|
|
422
|
-
failure: "hidden state, lifecycle bugs, unclear ownership of
|
|
423
|
-
truth, and high model-facing tool entropy."
|
|
424
|
-
|
|
425
|
-
mu's anti-feature pledges (no plugin runtime, no codegen, no
|
|
426
|
-
daemon, no web UI, no chat integration, no memory system, no
|
|
427
|
-
workflow engine) are specifically the accumulations of that prior
|
|
428
|
-
internal runtime that mu chose not to inherit. Each one is
|
|
429
|
-
provable as the absence of a subsystem mu was tempted to copy.
|
|
430
|
-
|
|
431
|
-
### Anthropomorphic builtin agent names (`alice`, `bob`)
|
|
432
|
-
|
|
433
|
-
Use role-based names (`worker-1`, `reviewer-1`). See
|
|
434
|
-
[VOCABULARY.md §"Naming conventions"](VOCABULARY.md#agent-names-prefer-role-n-not-human-names).
|
|
98
|
+
`mu log --tail` polls SQLite once per second. SQLite update hooks
|
|
99
|
+
(via better-sqlite3) or `fs.watch` on the WAL would drop latency
|
|
100
|
+
at the cost of more machinery. Promote when someone hits the
|
|
101
|
+
cliff.
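
One way to keep that promotion cheap, sketched here under stated assumptions, is to separate the cursor-based drain from whatever wakes it. The names `LogRow`, `makeTailer`, `fetchSince`, and `drain` are hypothetical, and `fetchSince` merely stands in for a query of the form "rows with seq greater than the cursor"; none of this is taken from mu's source.

```typescript
// Hypothetical sketch: illustrative names only, not mu's actual log-tail code.
interface LogRow {
  seq: number;
  text: string;
}

// fetchSince is an injected stand-in for the SQLite read; emit receives each new row.
function makeTailer(
  fetchSince: (cursor: number) => LogRow[],
  emit: (row: LogRow) => void,
): () => number {
  let cursor = 0;
  // drain() is safe to call on every wake, spurious or not:
  // a fetch that returns nothing leaves the cursor untouched.
  return function drain(): number {
    const rows = fetchSince(cursor);
    for (const row of rows) {
      emit(row);
      cursor = Math.max(cursor, row.seq);
    }
    return rows.length;
  };
}
```

With this shape, a once-per-second timer and a file-watch or update-hook callback would both reduce to "call `drain()`", so a subscription could be added as an extra wake source while the timer stays as a fallback.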
|
|
435
102
|
|
|
436
103
|
---
|
|
437
104
|
|
|
438
105
|
## Open questions
|
|
439
106
|
|
|
440
|
-
|
|
441
|
-
|
|
442
|
-
|
|
443
|
-
- **`agents.cli` as TEXT vs enum.** Went with TEXT (originally for
|
|
444
|
-
heterogeneous-CLI forward-compat). Today the only meaningful
|
|
445
|
-
value is `pi`. We're keeping it TEXT — if multi-CLI re-earns its
|
|
446
|
-
way back, the column doesn't need a schema migration.
|
|
447
|
-
- **Composite `(workstream, local_id)` PK on tasks.** Currently
|
|
448
|
-
`local_id` is global PK. Two workstreams can't both have a
|
|
449
|
-
`design` task. Recorded as a deferred normalization above.
|
|
450
|
-
- **Capability tags on operations.** The `defineOperation()`
|
|
451
|
-
registry that would have carried these is rejected. The role
|
|
452
|
-
flag on agents is stored but unenforced. The internal critique
|
|
453
|
-
flagged "capability-gated mutations" as part of the minimal
|
|
454
|
-
core; for now mu's only authorization surface is "the agent ran
|
|
455
|
-
the verb." Earn capability enforcement when an agent actually
|
|
456
|
-
does damage.
|
|
457
|
-
- **Per-workstream config.** Resisted (the anti-feature pledge).
|
|
458
|
-
"This workstream uses one pi binary, that one uses another" is
|
|
459
|
-
a real gap that env vars don't solve cleanly. Revisit when the
|
|
460
|
-
second user hits it.
|
|
461
|
-
- **Subscription-based wakeups.** `mu log --tail` polls SQLite
|
|
462
|
-
once per second. Real subscriptions (SQLite update hooks via
|
|
463
|
-
better-sqlite3, or fs.watch on the WAL) would drop latency at
|
|
464
|
-
the cost of more machinery. Not worth it until someone hits
|
|
465
|
-
the cliff.
|
|
466
|
-
|
|
467
|
-
---
|
|
107
|
+
Live during initial design and still partly unresolved. Listed so
|
|
108
|
+
we don't pretend they're settled.
|
|
468
109
|
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
476
|
-
### From pi-subagents (`src/runs/shared/`)
|
|
477
|
-
|
|
478
|
-
| File | Lesson |
|
|
479
|
-
| -------------------------- | ----------------------------------------------------------------- |
|
|
480
|
-
| `frontmatter.ts` | Agent-frontmatter parser: 28 lines, handles CRLF, quoted values, kebab-case. Port verbatim. |
|
|
481
|
-
| `long-running-guard.ts` | Mutating-bash detection via regex + unquoted-redirection scanner. Don't trust tool names; scan command bodies. |
|
|
482
|
-
| `long-running-guard.ts` | Mutating-failure burst detection: rolling window, consecutive vs same-path failures, escalation threshold. |
|
|
483
|
-
| `completion-guard.ts` | Expected-mutation detection from task prose, not agent role. Strips framework-injected lines before checking. |
|
|
484
|
-
| `model-fallback.ts` | Curated regex list of retryable failures (rate limit, 429, quota, 502/503/504). Don't waste a fallback on auth errors. |
|
|
485
|
-
| `model-fallback.ts` | `splitThinkingSuffix` always splits on **last** colon — preserves `provider/model:high`. |
|
|
486
|
-
| `single-output.ts` | Three cases for output files: agent wrote it, agent didn't, file unreadable. `captureSingleOutputSnapshot` before run to disambiguate. |
|
|
487
|
-
| `worktree.ts` | `node_modules` symlinking + tracking as synthetic-path. Generic across VCS. |
|
|
488
|
-
| `worktree.ts` | Per-task `cwd:` conflict detection. Best-effort rollback on hook failure. |
|
|
489
|
-
| `result-watcher.ts` | `fs.watch` with mandatory polling fallback on `EMFILE`/`ENOSPC`. `unref()` timers. Coalescer for rapid rename events. |
|
|
490
|
-
| `pi-args.ts` | Long tasks → temp file + `@path` argv. System prompt via `mode: 0o600` temp file. Identity env vars passed down. |
|
|
491
|
-
| `extension/doctor.ts` | `lineFromCheck(label, fn)` wrapper turns thrown errors into `failed — <text>` lines so one broken probe doesn't break the report. |
|
|
492
|
-
|
|
493
|
-
### From a prior internal multi-agent runtime
|
|
494
|
-
|
|
495
|
-
| Topic | Lesson |
|
|
496
|
-
| ----------------------------------------------- | ------------------------------------------------------------ |
|
|
497
|
-
| shell-escape | `shell_escape` via single-quote wrapping. |
|
|
498
|
-
| granular workspace-free results | A `WorkspaceFreeResult` with independent `committed`/`submitted`/`commitError`/`submitError`. |
|
|
499
|
-
| submit guard | `timeout -k 5s {N}s sh -c 'exec jf submit --draft </dev/null'` to prevent hanging on TTY prompts. |
|
|
500
|
-
| per-CLI detector | Per-CLI Detector trait + pattern registry. Tail-window + narrow-window distinction. (deferred; pi only today.) |
|
|
501
|
-
| lifecycle state machine | Side-effect-free lifecycle state machine: `(state, event) → outcome`. Single point for tracing. Distinguishes manual `Free` from inferred idle. |
|
|
502
|
-
| read-list reconciliation | "Reality wins": every `list()` queries the substrate, prunes ghosts, adopts orphans. **Implemented (`src/reconcile.ts`).** |
|
|
503
|
-
| parallel-tracks | Parallel-tracks union-find with diamond-merge. **Implemented (`src/tracks.ts`).** |
|
|
504
|
-
| built-in graph views | Built-in views: `ready`, `blocked`, `goals`. **Implemented.** |
|
|
505
|
-
| pane-title-as-identity | Pane-title-as-identity for the claim protocol. **Implemented.** |
|
|
506
|
-
| lisp DSL (rejected for mu, ideas not adopted) | Atomic transactions are per-verb in the SDK; idempotent re-imports work via `INSERT OR IGNORE` + idempotent verbs; forward-ref checking handled at task-add time. JS DSL also rejected (above). |
|
|
507
|
-
| notes model | Append-only, FILES/DECISION/VERIFIED conventions. **Implemented.** |
|
|
110
|
+
- **Capability tags on operations.** mu's only authorization
|
|
111
|
+
surface today is "the agent ran the verb." Promote capability
|
|
112
|
+
enforcement when an agent actually does damage.
|
|
113
|
+
- **Per-workstream config.** Resisted (anti-feature pledge). "This
|
|
114
|
+
workstream uses one pi binary, that one uses another" is a real
|
|
115
|
+
gap env vars don't solve cleanly. Revisit when a second user
|
|
116
|
+
hits it.
|
|
508
117
|
|
|
509
118
|
---
|
|
510
119
|
|
|
511
|
-
##
|
|
120
|
+
## Pi extension and the three rules
|
|
512
121
|
|
|
513
|
-
|
|
122
|
+
If/when a pi extension lands (typed `mu_*` tools, HUD widget,
|
|
123
|
+
wakeups) bundled in this same npm package, three rules stay
|
|
124
|
+
non-negotiable:
|
|
514
125
|
|
|
515
|
-
|
|
516
|
-
|
|
517
|
-
|
|
518
|
-
|
|
519
|
-
|
|
520
|
-
|
|
521
|
-
|
|
522
|
-
|
|
523
|
-
---
|
|
524
|
-
|
|
525
|
-
## How to use this roadmap
|
|
526
|
-
|
|
527
|
-
If you're starting work on an item:
|
|
126
|
+
1. **The DB is canonical.** All state in `<state-dir>/mu.db`.
|
|
127
|
+
Extension reads/writes through the same modules the CLI uses.
|
|
128
|
+
No extension-only state.
|
|
129
|
+
2. **Every operation works from the CLI.** No tool registered in
|
|
130
|
+
the extension has logic that doesn't exist in the CLI.
|
|
131
|
+
3. **The skill teaches the CLI.** Pi sessions without the
|
|
132
|
+
extension still get a working mu by following
|
|
133
|
+
[skills/mu/SKILL.md](../skills/mu/SKILL.md).
|
|
528
134
|
|
|
529
|
-
|
|
530
|
-
|
|
531
|
-
2. **Open a focused PR per item.** One typed verb per commit, one
|
|
532
|
-
schema change per commit.
|
|
533
|
-
3. **Update [VOCABULARY.md](VOCABULARY.md) first** if you introduce
|
|
534
|
-
a new concept or rename an existing one.
|
|
535
|
-
4. **Add a [CHANGELOG.md](../CHANGELOG.md) entry** under the
|
|
536
|
-
upcoming version.
|
|
135
|
+
If those three rules hold, mu stays driveable from a shell forever
|
|
136
|
+
and the extension stays thin.
|
|
537
137
|
|
|
538
|
-
|
|
138
|
+
---
|
|
539
139
|
|
|
540
|
-
|
|
541
|
-
|
|
542
|
-
|
|
140
|
+
## Explicitly rejected (one-liners)
|
|
141
|
+
|
|
142
|
+
Listed so we don't rediscover them. See git history for the full
|
|
143
|
+
reasoning per item.
|
|
144
|
+
|
|
145
|
+
- **JS / Lisp DSL** (`mu run` / `mu eval` / `mu repl`) — bash +
|
|
146
|
+
jq + `--json` covers the gap. A workflow DSL is a maintenance
|
|
147
|
+
liability.
|
|
148
|
+
- **`defineOperation()` registry framework** — no consumer left
|
|
149
|
+
after the DSL was rejected.
|
|
150
|
+
- **Markdown agent-definition discovery** — spawn flags + first
|
|
151
|
+
message already are the definition.
|
|
152
|
+
- **mu as a pi extension only (no CLI)** — children couldn't drive
|
|
153
|
+
mu; humans couldn't debug from a shell.
|
|
154
|
+
- **mu as a library only (no CLI)** — multiple processes would
|
|
155
|
+
fight over the DB.
|
|
156
|
+
- **Two binaries (`mu-agents` + `mu-tasks`)** — agent ↔ task
|
|
157
|
+
integration needs one transactional surface.
|
|
158
|
+
- **`TaskSurface` adapter abstraction** — the built-in graph IS
|
|
159
|
+
the killer feature.
|
|
160
|
+
- **Cross-machine state sync** — local-first SQLite; layer
|
|
161
|
+
syncthing on top if you want it.
|
|
162
|
+
- **HTTP API on top of SQLite** — write your own RPC if you need
|
|
163
|
+
one.
|
|
164
|
+
- **A "hosted" mu** — your machine is the deployment.
|
|
165
|
+
- **Anthropomorphic agent names (`alice`, `bob`)** — use
|
|
166
|
+
role-based names (`worker-1`, `reviewer-1`).
|