@yemi33/minions 0.1.1950 → 0.1.1952

Files changed (40)
  1. package/dashboard/js/command-center.js +13 -2
  2. package/dashboard/js/modal-qa.js +10 -0
  3. package/dashboard/js/refresh.js +4 -0
  4. package/dashboard/js/render-dispatch.js +25 -0
  5. package/dashboard/js/render-other.js +109 -2
  6. package/dashboard/js/settings.js +1 -1
  7. package/dashboard/layout.html +2 -2
  8. package/dashboard/pages/engine.html +6 -0
  9. package/dashboard/slim.html +1987 -0
  10. package/dashboard/styles.css +8 -0
  11. package/dashboard.js +450 -40
  12. package/docs/completion-reports.md +25 -0
  13. package/docs/design-state-storage.md +1 -1
  14. package/docs/slim-ux/architecture-suggestions.md +467 -0
  15. package/docs/slim-ux/concepts.md +824 -0
  16. package/engine/ado-mcp-wrapper.js +33 -7
  17. package/engine/ado.js +123 -15
  18. package/engine/cc-worker-pool.js +41 -0
  19. package/engine/cleanup.js +71 -34
  20. package/engine/cli.js +37 -0
  21. package/engine/dispatch.js +32 -9
  22. package/engine/features.js +6 -0
  23. package/engine/gh-token.js +137 -0
  24. package/engine/github.js +166 -29
  25. package/engine/issues.js +29 -0
  26. package/engine/keep-process-sweep.js +397 -0
  27. package/engine/lifecycle.js +150 -33
  28. package/engine/playbook.js +17 -0
  29. package/engine/queries.js +71 -0
  30. package/engine/recovery.js +6 -0
  31. package/engine/shared.js +446 -14
  32. package/engine/spawn-agent.js +44 -2
  33. package/engine/timeout.js +34 -11
  34. package/engine/worktree-pool.js +410 -0
  35. package/engine.js +643 -119
  36. package/package.json +6 -3
  37. package/playbooks/review.md +2 -0
  38. package/playbooks/shared-rules.md +3 -1
  39. package/prompts/cc-system.md +24 -0
  40. package/engine/copilot-models.json +0 -5
@@ -19,6 +19,28 @@ Path shape (resolved by `shared.dispatchCompletionReportPath()` in `engine/share
19
19
 
20
20
  The agent must write the JSON to that exact path before exiting. Any character outside `[a-zA-Z0-9._-]` in the dispatch id is replaced with `-` by the engine when computing the path.
21
21
 
22
+ ## Trust boundary
23
+
24
+ Each spawn also receives a per-dispatch cryptographic value via the `MINIONS_COMPLETION_NONCE` environment variable. The engine generates this with `crypto.randomBytes(16).toString('hex')` in `engine.js:spawnAgent()` and stores it on the in-memory active-process record. The agent is required to copy the value verbatim into the report's `nonce` field. On parse, `engine/lifecycle.js:runPostCompletionHooks()` compares `report.nonce` against the in-memory value:
25
+
26
+ - **Match** — the report is trusted and processed normally.
27
+ - **Mismatch** — the report is treated as forged (a prompt-injected agent or a stale process writing into a sibling dispatch's completion path). Every signal it carries — `status`, `pr`, `noop`, `failure_class`, `retryable`, `needs_rerun`, fenced/summary fallbacks — is discarded. The dispatch is failed with `failure_class: 'completion-nonce-mismatch'` and the work item is marked failed (no auto-retry honors the agent's `retryable` claim).
28
+ - **Missing** — by default, the engine logs `[security] completion-nonce-missing dispatch=… required=false (degraded — report honored)` and still honors the report. Flip `ENGINE_DEFAULTS.completionNonceRequired` (or `engine.completionNonceRequired` in `config.json`) to `true` to hard-fail missing nonces too. Default is `false` for one release so older runtime caches and agents that haven't picked up the prompt change degrade with a warning instead of breaking.
29
+
30
+ Security event log lines are emitted on the `error` channel and are designed to be greppable:
31
+
32
+ ```
33
+ [security] completion-nonce-missing dispatch=<id> agent=<name> wi=<wiId> required=<bool>
34
+ [security] completion-nonce-mismatch dispatch=<id> agent=<name> wi=<wiId> expected=<8char> got=<8char>
35
+ ```
36
+
37
+ **Migration path.** Roll out in two phases:
38
+
39
+ 1. **Phase 1 (default):** Ship the engine + playbook change with `completionNonceRequired: false`. Agents that echo the nonce are validated; agents that don't continue to work but emit a warning per dispatch. Watch the `[security] completion-nonce-missing` warnings drain to zero across runtimes.
40
+ 2. **Phase 2:** Once warnings are quiet for at least one release window, flip `ENGINE_DEFAULTS.completionNonceRequired` to `true` (or set per-deployment via `config.json` → `engine.completionNonceRequired`). Missing nonces then hard-fail like mismatched ones.
41
+
42
+ Do **not** invent, regenerate, or share the nonce across dispatches — each spawn gets a unique value.
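A minimal sketch of the three outcomes described in this section, assuming the engine holds the expected nonce in memory and receives the parsed report object. `generateNonce`, `checkNonce`, and the returned labels are hypothetical names for illustration; the constant-time comparison is a design choice of this sketch and is not stated in the source.

```javascript
const crypto = require('node:crypto');

// Same generator the doc names: 16 random bytes, hex-encoded (32 characters).
function generateNonce() {
  return crypto.randomBytes(16).toString('hex');
}

// Hypothetical classifier; the return labels are illustrative, not engine constants.
function checkNonce(expected, report, required = false) {
  const got = report && report.nonce;
  if (typeof got !== 'string' || got.length === 0) {
    // Missing nonce: honored with a warning unless the strict flag is on.
    return required ? 'reject-missing' : 'accept-degraded';
  }
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(got, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  const match = a.length === b.length && crypto.timingSafeEqual(a, b);
  return match ? 'accept' : 'reject-mismatch';
}
```

With `required` left at `false` this corresponds to Phase 1 of the migration path; passing `true` corresponds to Phase 2.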
43
+
22
44
  ## Top-level schema
23
45
 
24
46
  ```json
@@ -31,6 +53,7 @@ The agent must write the JSON to that exact path before exiting. Any character o
31
53
  "retryable": false,
32
54
  "needs_rerun": false,
33
55
  "noop": false,
56
+ "nonce": "<value of MINIONS_COMPLETION_NONCE env var>",
34
57
  "artifacts": [
35
58
  {"type": "pr", "path": "https://github.com/owner/repo/pull/123", "title": "PR-123"}
36
59
  ]
@@ -49,6 +72,7 @@ The agent must write the JSON to that exact path before exiting. Any character o
49
72
  | `retryable` | boolean | `true` if the engine should auto-retry the dispatch on failure. Overrides the default per-class retry policy when present. |
50
73
  | `needs_rerun` | boolean | `true` if the same work needs to be re-dispatched (vs. retried). Used by build-fix and review-fix loops. |
51
74
  | `artifacts` | array | Durable artifacts the agent created or updated; surfaces in the dashboard work-item detail modal. See [Artifacts](#artifacts). |
75
+ | `nonce` | string | Per-spawn cryptographic value the engine injects via `MINIONS_COMPLETION_NONCE`. Copy the env var value verbatim; the engine validates it on read. See [Trust boundary](#trust-boundary). |
52
76
 
53
77
  ### Optional fields
54
78
 
@@ -76,6 +100,7 @@ Defined in `engine/shared.js` as `FAILURE_CLASS`. Use the canonical hyphenated s
76
100
  | `network-error` | API rate limit, DNS, connectivity | Default retry logic |
77
101
  | `out-of-context` | Context window exhausted | Flag for human review |
78
102
  | `max-turns` | Claude CLI `error_max_turns` — work in progress | Retry same agent |
103
+ | `completion-nonce-mismatch` | Completion JSON has a mismatched `nonce`, or a missing `nonce` while `completionNonceRequired` is `true` (treated as a forged completion). See [Trust boundary](#trust-boundary). | Never retry (untrusted) |
79
104
  | `unknown` | Unclassified failure | Default retry logic |
80
105
 
81
106
  Use `"N/A"` when `status` is `success` or `partial` without a failure.
@@ -46,7 +46,7 @@ release .lock file
46
46
  Key properties:
47
47
  - **Synchronous blocking** — `withFileLock` spins with `sleepMs(25)` until lock acquired or 5s timeout (source: `engine/shared.js:175-231`)
48
48
  - **Whole-file granularity** — updating one field in one work item rewrites all 180 items (370 KB)
49
- - **Stale lock recovery** — locks older than 60s are force-removed (source: `engine/shared.js:173`, `LOCK_STALE_MS`)
49
+ - **Stale lock recovery** — locks older than 5 min (`LOCK_STALE_MS = 300_000`) are force-removed; holders that recorded a `{pid, ts}` payload are kept alive past the threshold while `process.kill(pid, 0)` succeeds, with a hard last-resort cap at 5×LOCK_STALE_MS (source: `engine/shared.js`, P-b7d4e8f2)
50
50
  - **Read caching** — only `dispatch.json` has a 2s TTL cache (source: `engine/queries.js:82-91`)
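The revised stale-lock rule can be read as a small decision function. The sketch below is an assumption-laden illustration, not the code in `engine/shared.js`: `isLockStale`, `pidAlive`, and `LOCK_HARD_CAP_MS` are hypothetical names, and falling back to the lock file's mtime when no payload exists is a guess.

```javascript
const LOCK_STALE_MS = 300_000;              // 5 minutes, per the doc
const LOCK_HARD_CAP_MS = 5 * LOCK_STALE_MS; // last-resort cap (name is hypothetical)

// Returns true when signal 0 can be delivered, i.e. the process exists.
function pidAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return err.code === 'EPERM';
  }
}

// `payload` is the parsed {pid, ts} lock content, or null for a legacy lock.
function isLockStale(payload, lockMtimeMs, now = Date.now(), isAlive = pidAlive) {
  const ts = payload && Number.isFinite(payload.ts) ? payload.ts : lockMtimeMs;
  const age = now - ts;
  if (age <= LOCK_STALE_MS) return false;   // still fresh
  if (age > LOCK_HARD_CAP_MS) return true;  // hard cap: remove regardless of holder
  const hasPid = payload && Number.isInteger(payload.pid);
  return hasPid ? !isAlive(payload.pid) : true;
}
```

Injecting `isAlive` keeps the decision testable without spawning real processes.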
51
51
 
52
52
  ### 1.3 Read vs Write Ratio
@@ -0,0 +1,467 @@
1
+ # Slim UX — Architectural Suggestions
2
+
3
+ > Written after Round 1 (Phase A concept mapping + Phase B layout rebuild). These are **proposals**
4
+ > for engineering changes that would make the slim UX cheaper to maintain and more honest about
5
+ > what's happening under the hood. **None of these are implemented in Round 1.** They're for
6
+ > human discussion before any of them get prioritized.
7
+ >
8
+ > Format per section: **Problem → Proposal → Cost → Risk**.
9
+ >
10
+ > Cost is rough person-days. Risk is what could go wrong.
11
+
12
+ ---
13
+
14
+ ## 1. A `/api/cockpit` aggregate endpoint
15
+
16
+ **Problem.** Right now the slim cockpit has to hit `/api/status` every 5 seconds and pluck six
17
+ fields out of a ~60 KB blob (`dispatch.active`, `dispatch.pending`, `pullRequests`, `watches`,
18
+ `engine`, `dispatch.completed`). The full status payload includes `agents`, `inbox`, `notes`,
19
+ `metrics`, `prdProgress`, `verifyGuides`, `archivedPrds`, `skills`, `mcpServers`, `schedules`,
20
+ `pipelines`, `pinned`, `projects`, `autoMode`, `version` — all of which the cockpit ignores.
21
+ Polling that whole thing every 5 s on every open tab is wasteful.
22
+
23
+ **Proposal.** A new `/api/cockpit` that returns *only* the cockpit-relevant aggregates:
24
+
25
+ ```jsonc
26
+ {
27
+ "engine": { "running": true, "mode": "running", "startedAt": "…" },
28
+ "dispatches": { "active": 2, "pending": 5, "workingAgents": ["dallas", "ralph"] },
29
+ "prs": { "active": 3, "failingBuilds": 1 },
30
+ "watches": { "active": 4, "triggeredRecently": 1 },
31
+ "schedules":{ "dueWithinHour": 2 },
32
+ "lastEvents": [ { "kind": "completion", "ts": "…", "title": "…" }, … ]
33
+ }
34
+ ```
35
+
36
+ Compute it from the existing fast-state cache in `dashboard.js` (`_fastState`) so the cost is
37
+ near-zero — just a few field projections. Slim polls this; full dashboard keeps `/api/status`.
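As a rough sketch of what "a few field projections" could look like, the function below derives part of the proposed payload from a status-shaped object. Only the top-level keys (`dispatch.active`, `dispatch.pending`, `pullRequests`, `watches`, `engine`) come from the text; the per-item fields (`agent`, `status`, `buildStatus`) and the name `toCockpit` are assumptions.

```javascript
// Hypothetical projection from the full status blob to a subset of the cockpit payload.
// Assumed item shapes: dispatch entries carry `agent`; PRs carry `status` and
// `buildStatus`; watches carry `status`. None of these are confirmed by the source.
function toCockpit(status) {
  const active = status.dispatch?.active ?? [];
  const pending = status.dispatch?.pending ?? [];
  const prs = (status.pullRequests ?? []).filter((pr) => pr.status === 'active');
  const watches = status.watches ?? [];
  return {
    engine: status.engine ?? null,
    dispatches: {
      active: active.length,
      pending: pending.length,
      // De-duplicate agent names while preserving first-seen order.
      workingAgents: [...new Set(active.map((d) => d.agent).filter(Boolean))],
    },
    prs: {
      active: prs.length,
      failingBuilds: prs.filter((pr) => pr.buildStatus === 'failing').length,
    },
    watches: { active: watches.filter((w) => w.status === 'active').length },
  };
}
```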
38
+
39
+ **Cost.** ~0.5 day. New handler + a couple of unit tests covering aggregate math.
40
+
41
+ **Risk.** Low. Pure read-side; doesn't touch any state. The only failure mode is the slim
42
+ showing stale numbers if the cache pruner gets confused, but that's the same risk `/api/status`
43
+ already has.
44
+
45
+ ---
46
+
47
+ ## 2. SSE for cockpit updates instead of polling
48
+
49
+ **Problem.** 5-second polling is fine for Round 1 but feels laggy when an agent finishes — you
50
+ see "Sending…" disappear in the chat 30 s before "Active dispatches" updates from 1 to 0. The
51
+ full dashboard already has `/api/status-stream` (SSE); slim just doesn't use it.
52
+
53
+ **Proposal.** Slim subscribes to `/api/status-stream` (or a new `/api/cockpit-stream` that emits
54
+ deltas only). Polling becomes a fallback for browsers without SSE. Visibility-change pause
55
+ already exists in slim (round-1 code), so background tabs cost nothing.
56
+
57
+ **Cost.** ~0.5 day if reusing `/api/status-stream`; ~1.5 days if writing a delta-encoded
58
+ `/api/cockpit-stream`.
59
+
60
+ **Risk.** Medium. SSE connections occasionally drop and need exponential-backoff reconnect.
61
+ Multi-tab leaks are a real concern (each tab opens its own EventSource). Need a small connection
62
+ manager on the client.
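The exponential-backoff reconnect mentioned under Risk could be driven by a delay schedule like the one below. This is a sketch only: `reconnectDelayMs`, the 1 s base, and the 30 s ceiling are assumptions, and the jitter strategy is one common choice rather than anything the source prescribes.

```javascript
// Hypothetical reconnect schedule: doubles from baseMs up to maxMs, with jitter.
function reconnectDelayMs(attempt, { baseMs = 1000, maxMs = 30000, jitter = Math.random } = {}) {
  const capped = Math.min(maxMs, baseMs * 2 ** attempt);
  // Pick uniformly in [0, capped) so many tabs do not reconnect in lockstep.
  return Math.floor(jitter() * capped);
}
```

A client-side connection manager would call this after each `EventSource` error, wait the returned delay, reopen the stream, and reset `attempt` to 0 once a message arrives.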
63
+
64
+ ---
65
+
66
+ ## 3. A unified event stream for the History panel
67
+
68
+ **Problem.** The History panel currently merges three sources client-side: `dispatch.active`,
69
+ `dispatch.completed`, and `pullRequests`. Each has its own timestamp field (`startedAt`,
70
+ `completedAt`, `updatedAt`), its own shape, and its own truthiness rules. The merge logic is
71
+ fragile — adding a fourth source (e.g. consolidation runs, watch fires, schedule executions,
72
+ pipeline state changes) means more glue.
73
+
74
+ **Proposal.** A canonical `engine/events.jsonl` (append-only, rotated weekly) with a uniform
75
+ shape:
76
+
77
+ ```jsonc
78
+ { "ts": "2026-05-08T15:23:11Z", "kind": "completion", "agent": "dallas",
79
+ "title": "fix slim button", "workItemId": "W-…", "pr": "#42",
80
+ "summary": "…", "level": "info" }
81
+ ```
82
+
83
+ Plus `/api/events?since=…&limit=…` for paged reads. The engine already writes most of these
84
+ moments to `engine/log.json` (the audit ring buffer); we just need to enrich them and expose
85
+ them. The History panel becomes one fetch.
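To make the uniform shape concrete, here is a sketch that normalizes the three current sources and applies `since` / `limit` paging. The timestamp fields (`startedAt`, `completedAt`, `updatedAt`) are from the text; the function names, the `kind` labels other than `completion`, and the `title` / `agent` item fields are assumptions.

```javascript
// Hypothetical normalizer for the three sources the History panel merges today.
function toEvents({ active = [], completed = [], pullRequests = [] }) {
  const tag = (items, kind, tsField) =>
    items
      .filter((item) => item[tsField]) // skip entries without a timestamp
      .map((item) => ({ ts: item[tsField], kind, title: item.title ?? '', agent: item.agent ?? null }));
  return [
    ...tag(active, 'dispatch-start', 'startedAt'),
    ...tag(completed, 'completion', 'completedAt'),
    ...tag(pullRequests, 'pr-update', 'updatedAt'),
  ].sort((a, b) => Date.parse(b.ts) - Date.parse(a.ts)); // newest first
}

// Sketch of the query semantics: events strictly after `since`, at most `limit`.
function pageEvents(events, { since, limit = 50 } = {}) {
  const cutoff = since ? Date.parse(since) : -Infinity;
  return events.filter((e) => Date.parse(e.ts) > cutoff).slice(0, limit);
}
```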
86
+
87
+ **Cost.** ~1.5 days. The hard part is catching every event-emit site — dispatch start, dispatch
88
+ complete, PR sync, watch fire, schedule run, pipeline transition, meeting round advance,
89
+ consolidation run. Mostly mechanical.
90
+
91
+ **Risk.** Medium-low. Event emission is append-only and changes no engine state; the risk is missing an emit site. Add a unit test that asserts
92
+ each lifecycle path emits an event.
93
+
94
+ ---
95
+
96
+ ## 4. Should Work Items + Pipelines merge in the data model?
97
+
98
+ **Problem.** Carlos called out "Work Items" and "Pipelines" as conceptually overlapping in the
99
+ existing UI. Looking at the data: a Pipeline is a sequence of stages where each stage *is*
100
+ either a Work Item, a Meeting, or a Plan. So a Pipeline isn't really a separate concept — it's
101
+ a Work Item with a stage list. Yet they have entirely separate stores
102
+ (`engine/dispatch.json` vs `engine/pipeline-runs.json`), separate API surfaces
103
+ (`/api/work-items*` vs `/api/pipelines*`), and separate UIs.
104
+
105
+ **Proposal.** Don't merge the *storage* — that's a deep migration. Instead, introduce a
106
+ "workspace" view at the API layer that returns Work Items and Pipeline runs in one paginated
107
+ list, tagged with `kind: "work-item" | "pipeline-run" | "pipeline-stage"`, sorted by recency.
108
+ Slim's "Work" button opens this unified view; it can drill down into stages when the user
109
+ expands a pipeline.
110
+
111
+ **Cost.** ~2 days. Read-only aggregator + a UI that knows how to render the three flavors. No
112
+ storage changes.
113
+
114
+ **Risk.** Low at the engine layer (read-only); medium at the UX layer (pipelines have richer
115
+ state — wait conditions, retriggers — that don't map cleanly to a flat list).
116
+
117
+ ---
118
+
119
+ ## 5. Plan + PRD: collapse to one document?
120
+
121
+ **Problem.** Carlos asked "What is a PRD? Should I generate one to get started?" The answer is
122
+ "no, PRDs are auto-generated from approved plans by the `plan-to-prd` agent" — but that's
123
+ opaque. The two-stage flow (plan → PRD → work items) exists because the plan is a human
124
+ discussion artifact and the PRD is the machine-readable execution contract.
125
+
126
+ **Proposal.** Don't drop the distinction; *unify the surface*. Treat the plan as the primary
127
+ document and embed the PRD JSON as a fenced block inside the plan markdown:
128
+
129
+ ```markdown
130
+ # Plan: Refresh the Settings sidebar
131
+
132
+ ## Goal
133
+ Three small commits, no big-bang refactor.
134
+
135
+ ## Acceptance criteria
136
+ - …
137
+
138
+ ```prd
139
+ { "items": [ … ] }
140
+ ```
141
+ ```
142
+
143
+ The materializer reads the fenced block instead of a separate `prd/*.json` file. Plan and PRD
144
+ have the same lifecycle, the same `source_plan`, and the same archive path. The slim shows one
145
+ "Plans" button instead of two concepts.
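One way the materializer could read the embedded block is a small fenced-block extractor. This is a sketch under assumptions: `extractPrd` is a hypothetical name, and it takes the first `prd` fence whose opening and closing markers each sit on their own line.

```javascript
// Hypothetical extractor for a prd-tagged fenced block inside plan markdown.
function extractPrd(planMarkdown) {
  // Opening fence line, lazily captured body, closing fence line.
  const match = /^```prd[ \t]*\r?\n([\s\S]*?)\r?\n```[ \t]*$/m.exec(planMarkdown);
  if (!match) return null;      // plan has no embedded PRD yet
  return JSON.parse(match[1]);  // throws SyntaxError on malformed JSON
}
```

Returning `null` for a plan without a block lets callers distinguish "not yet approved" from a malformed PRD, which surfaces as a thrown `SyntaxError`.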
146
+
147
+ **Cost.** ~3 days. Migration script for existing PRDs + materializer changes + verify flow.
148
+ Risky because plan resume logic depends on reading the PRD JSON twin.
149
+
150
+ **Risk.** Medium-high. The plan-to-prd diff-aware update logic is non-trivial and would need to
151
+ be re-verified. **Recommendation: defer until at least one round of slim UX shows the unified
152
+ "Plans" button is the right primitive.**
153
+
154
+ ---
155
+
156
+ ## 6. Notes + KB + Pinned: which survives?
157
+
158
+ **Problem.** Carlos couldn't tell Notes vs KB vs Pinned Context apart. The reality:
159
+
160
+ - **Notes inbox** = raw, unsorted (one file per agent per task).
161
+ - **KB** = consolidated, classified into 5 categories.
162
+ - **Pinned** = hand-picked subset that gets prepended to every agent prompt.
163
+ - **`notes.md`** = a *separate* consolidated team-decisions blob, also injected into agent
164
+ prompts.
165
+
166
+ Four distinct concepts for "the team's memory" is at least one too many.
167
+
168
+ **Proposal.** Collapse to two concepts the human sees: **"Knowledge"** (everything durable)
169
+ and **"Pinned"** (the always-prepended subset). Hide the inbox/`notes.md` distinction from the
170
+ UI — they're consolidation pipeline internals.
171
+
172
+ The data layer keeps four stores under the hood; the UI presents two:
173
+
174
+ | UI label | Data sources |
175
+ |-----------|---------------------------------------------------------------|
176
+ | Knowledge | `notes/inbox/*.md` + `notes/archive/**/*.md` + `notes.md` |
177
+ | Pinned | `pinned.md` |
178
+
179
+ Pinning a Knowledge entry promotes it into `pinned.md`; un-pinning returns it. Inbox vs archived
180
+ becomes a "freshness" badge on the Knowledge entry rather than a separate tab.
181
+
182
+ **Cost.** ~1 day for the API aggregator + new UI tab. No engine changes.
183
+
184
+ **Risk.** Low. Pure presentation.
185
+
186
+ ---
187
+
188
+ ## 7. Schedule vs Watch: is one a special case of the other?
189
+
190
+ **Problem.** Schedule = "fire on a cron pattern." Watch = "fire when a condition flips." They
191
+ share most of the lifecycle (definition, fire history, pause/resume, expire/stopAfter). The
192
+ two state files (`schedule-runs.json` / `watches.json`) and parallel CRUD APIs add maintenance
193
+ without adding real concepts.
194
+
195
+ **Proposal.** Generalize to **Trigger**, with two `kind`s: `cron` and `event`. Backend stays
196
+ in two files for now (avoid the migration), but the API and UI present them as one.
197
+
198
+ ```jsonc
199
+ {
200
+ "id": "nightly-tests",
201
+ "kind": "cron",
202
+ "cron": "0 2 *",
203
+ "action": { "type": "work-item", "title": "Nightly tests", "agent": "dallas" }
204
+ }
205
+ ```
206
+
207
+ ```jsonc
208
+ {
209
+ "id": "watch-pr-42",
210
+ "kind": "event",
211
+ "target": "PR-42",
212
+ "condition": "merged",
213
+ "action": { "type": "notify", "channel": "inbox" }
214
+ }
215
+ ```
216
+
217
+ Slim's "Trigger" button opens one creator that branches on `kind`.
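Because the proposal keeps the two backing files, the unification would live in a thin adapter layer. A sketch, assuming schedule and watch records already expose the fields shown in the two JSON examples above (the function names are hypothetical):

```javascript
// Hypothetical adapters from the two existing stores to one Trigger shape.
function scheduleToTrigger(schedule) {
  return { id: schedule.id, kind: 'cron', cron: schedule.cron, action: schedule.action };
}

function watchToTrigger(watch) {
  return {
    id: watch.id,
    kind: 'event',
    target: watch.target,
    condition: watch.condition,
    action: watch.action,
  };
}

// One list for the API and UI; storage stays in two files.
function listTriggers(schedules, watches) {
  return [...schedules.map(scheduleToTrigger), ...watches.map(watchToTrigger)];
}
```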
218
+
219
+ **Cost.** ~2 days. New aggregator endpoint, one creator UI, normalized response shape.
220
+
221
+ **Risk.** Medium. Watches have a richer condition vocabulary
222
+ (`new-comments`, `status-change`, `vote-change`, `build-fail`, …) than Schedules. The unified
223
+ surface can't paper over that — the creator UI still needs branch-per-kind rendering. So this
224
+ is mostly a mental-model win, not a surface-area-savings win.
225
+
226
+ ---
227
+
228
+ ## 8. "Two unrelated tasks side-by-side on the same project"
229
+
230
+ **Problem.** Carlos asked how to run two unrelated work streams in parallel on the same project.
231
+ Today the answer is: just queue two work items; the dispatcher will pick two agents (up to
232
+ `engine.maxConcurrent = 5`) and they'll work in separate worktrees. **But** there's no UI
233
+ affordance for separating them — both show up in the same flat dispatch queue. There's no
234
+ "context A vs context B" boundary.
235
+
236
+ **Proposal.** Add a **"thread" / "stream" tag** to Work Items. Optional string, free-form. The
237
+ slim's Status and History panels group by thread when present. Default thread = `default`.
238
+ Threads aren't in the engine at all — they're a presentation grouping. CC accepts them on
239
+ dispatch.
240
+
241
+ ```jsonc
242
+ { "title": "fix login bug", "thread": "auth-rewrite" }
243
+ { "title": "rename buttons", "thread": "ux-cleanup" }
244
+ ```
245
+
246
+ This costs almost nothing technically, and gives the human a way to keep two storylines
247
+ mentally distinct.
248
+
249
+ **Cost.** ~1 day. Schema field on work items, group-by in queries, slim UX wiring.
250
+
251
+ **Risk.** Low. Backward compat by treating missing thread as `default`.
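Since threads are described as a presentation grouping, the group-by can live entirely in the query or UI layer. A minimal sketch, with `groupByThread` as a hypothetical name:

```javascript
// Group work items by their optional `thread` tag; untagged items fall into 'default'.
function groupByThread(workItems) {
  const groups = new Map(); // Map preserves first-seen thread order
  for (const item of workItems) {
    const thread = item.thread || 'default';
    if (!groups.has(thread)) groups.set(thread, []);
    groups.get(thread).push(item);
  }
  return groups;
}
```

Treating a missing or empty `thread` as `default` is what keeps existing work items backward compatible.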
252
+
253
+ ---
254
+
255
+ ## 9. Settings sprawl: split fast vs advanced
256
+
257
+ **Problem.** Carlos said Settings has too many options. The full dashboard's Settings page
258
+ mixes:
259
+
260
+ - Agents (per-agent CLI/model/skill/budget)
261
+ - Projects (paths, repo hosts, work sources)
262
+ - Engine knobs (timeouts, retries, concurrency)
263
+ - Runtime defaults (default CLI, default model)
264
+ - Feature flags
265
+ - ADO/GitHub auth
266
+ - MCP servers
267
+ - Cache controls / reset
268
+
269
+ Most users only ever touch feature flags + the slim-ux toggle. Most settings are either
270
+ *infra* (set once at install) or *fleet* (rarely changed).
271
+
272
+ **Proposal.** Two-tier settings:
273
+
274
+ - **Slim "Quick settings"**: just feature flags + a button to open the full settings.
275
+ - **Full "Advanced settings"**: everything else. Stays as is in the existing dashboard. The
276
+ slim deep-links to it.
277
+
278
+ Round 1 already does this. The next step is to *split* the full dashboard's Settings page into
279
+ "Quick" (the things `/api/settings/reset` fixes) and "Advanced" (the things you tune once).
280
+
281
+ **Cost.** ~1 day on the full dashboard side. Zero on the slim side (already done).
282
+
283
+ **Risk.** None.
284
+
285
+ ---
286
+
287
+ ## 10. Recent Completions: less metadata, more context
288
+
289
+ **Problem.** Carlos said the completions widget shows too much (ID column, agent column, no
290
+ click-through to source). His mental model is: "I want to know what shipped, when, and what
291
+ prompted it."
292
+
293
+ **Proposal.** Restructure each completion entry as a **prompt → result** pair:
294
+
295
+ ```
296
+ 12:04 Carlos: "Fix the dashboard typo on the Plans page"
297
+ → Dallas: ✅ shipped PR #43 (2 files, 4 lines) · 1m ago
298
+ view diff · view PR · view chat thread
299
+ ```
300
+
301
+ The "what prompted it" piece requires the engine to remember the originating CC message — which
302
+ it doesn't today. Either:
303
+
304
+ 1. **Cheap:** When CC dispatches, stash the originating message text in the work item's
305
+ `description` field (it already does this loosely).
306
+ 2. **Right:** Add `_originPrompt` (originating CC turn) and `_originSession` (CC session id) to
307
+ the work item. Slim's history panel can then render the prompt above the result and link
308
+ back to the chat.
309
+
310
+ **Cost.** ~1 day for the cheap version, ~2 days for the right version.
311
+
312
+ **Risk.** Low. Small additive change to the dispatch creation path.
313
+
314
+ ---
315
+
316
+ ## 11. "Skills + MCPs probably belong off the main screen"
317
+
318
+ **Problem.** Carlos called these out as "off the main screen." Skills and MCPs are
319
+ power-user/runtime concerns; they pollute the slim's mental model.
320
+
321
+ **Proposal.** Drop the "Tools" page entirely from slim. Keep them in the full dashboard.
322
+ Surface a single "X skills, Y MCPs available" line in slim's Settings dialog with a deep link.
323
+
324
+ **Cost.** Zero (already done in Round 1 — slim has no Tools page).
325
+
326
+ **Risk.** None.
327
+
328
+ ---
329
+
330
+ ## 12. Dashboard build pipeline: extract slim assets
331
+
332
+ **Problem.** The full dashboard SPA is built once at startup from `dashboard/layout.html` +
333
+ fragments and gzipped. The slim deliberately bypasses this so iteration is hot — read fresh on
334
+ every request. As slim grows past the 1000-line mark, that single-file model will hurt:
335
+
336
+ - CSS, action-prompt JS, cockpit JS, history JS all in one file.
337
+ - No component reuse between slim panels.
338
+ - Hot-reload works for the file but not for individual sections.
339
+
340
+ **Proposal.** Extract slim assets into `dashboard/slim/` once it crosses a complexity threshold
341
+ (round 3 or 4):
342
+
343
+ ```
344
+ dashboard/slim/
345
+ index.html ← references the assets below
346
+ cockpit.js
347
+ history.js
348
+ actions.js
349
+ chat.js
350
+ styles.css
351
+ ```
352
+
353
+ `serveSlimUx` reads `index.html` and inlines or links the others. Hot-reload still works. Each
354
+ file < 300 lines. **Don't do this yet** — the seam is wrong below 1500 lines.
355
+
356
+ **Cost.** ~1 day.
357
+
358
+ **Risk.** Low. CSP relaxation already in place; just needs to extend to include
359
+ `/dashboard/slim/*` if assets become cross-origin-y. (They won't for same-origin assets.)
360
+
361
+ ---
362
+
363
+ ## 13. CC actions: the "create a watch from chat" path needs a visual confirm
364
+
365
+ **Problem.** Round 1 wires the four Action buttons through CC by typing
366
+ "Create a work item: …" into the chat. CC parses, emits `===ACTIONS===`, and the action
367
+ executes. But the user has no easy way to confirm "yes, that's the work item I meant" before
368
+ it's queued. CC's free-text understanding is good but not perfect.
369
+
370
+ **Proposal.** Add an optional `dryRun: true` flag on CC actions. When set, CC echoes the parsed
371
+ action as a structured preview and *does not* enqueue. The slim modal sets `dryRun: true` by
372
+ default; the user clicks "Confirm" to submit a second turn that drops the flag.
373
+
374
+ Or, simpler: skip CC entirely for the four buttons, and call `/api/work-items` /
375
+ `/api/plans/create` / `/api/notes` / `/api/schedules` directly with form fields. Round 2 task.
376
+
377
+ **Cost.** ~1 day for direct API path. ~1.5 days for the dryRun preview flow.
378
+
379
+ **Risk.** Low for the direct path; medium for dryRun (needs CC system prompt edits + tests).
380
+ **Recommendation: do the direct API path; skip dryRun.**
381
+
382
+ ---
383
+
384
+ ## 14. Charter + Routing should be one page in slim
385
+
386
+ **Problem.** Charter is "who the agent is." Routing is "what work goes to whom." They're tightly
387
+ coupled (you can't route review to an agent whose charter says "doesn't review") but live in
388
+ totally separate places.
389
+
390
+ **Proposal.** Add a slim sub-page (off the gear menu) called **"Team"** that shows each agent
391
+ side-by-side with: charter excerpt, routing rules they own, recent completions, current
392
+ dispatch (if any). One screen for "what is the team doing?"
393
+
394
+ **Cost.** ~1.5 days. Read-only aggregator + simple grid UI.
395
+
396
+ **Risk.** Low.
397
+
398
+ ---
399
+
400
+ ## 15. The slim-ux feature flag itself: ramp plan
401
+
402
+ **Problem.** `slim-ux` is currently the only entry point; if it breaks, the user can't get back
403
+ to the full dashboard except via the `?fullDashboard=1` deep link (round-1 added this) or the
404
+ settings dialog. That's fine for early dogfooding, but at GA we'll want a smoother fallback.
405
+
406
+ **Proposal.** Two feature flags instead of one:
407
+
408
+ - `slim-ux` — show the slim *option* (renders a "Try slim UX" link from the full dashboard).
409
+ - `slim-ux-default` — slim becomes the root route; full moves to `/full`.
410
+
411
+ Round 1's behavior corresponds to `slim-ux: true && slim-ux-default: true`. Once slim is GA,
412
+ `slim-ux-default` flips on by default and the `slim-ux` registry entry is deleted as redundant.
413
+
414
+ **Cost.** ~0.5 day.
415
+
416
+ **Risk.** None. Simple gating.
417
+
418
+ ---
419
+
420
+ ## 16. Surprises found during Phase A
421
+
422
+ A few things were genuinely surprising while building the concept dictionary. Capturing them so
423
+ the team can decide whether they're features or footguns:
424
+
425
+ **a. `notes.md` and the KB are *separate* memory stores.** Both feed agent prompts; they don't
426
+ share content. Consolidation either writes to `notes.md` (decisions / pinned-style) or
427
+ classifies into a KB category. There's no "this fact lives in both" path. **Probably fine; just
428
+ worth a docstring somewhere.**
429
+
430
+ **b. CC sessions are *non-expiring*.** The CC system intentionally never prunes sessions
431
+ (per `CLAUDE.md` line ~XYZ). So a long-lived tab keeps growing the prompt forever. The
432
+ prompt-hash-mismatch invalidator is the only safety net. **Consider a soft "session age" warning
433
+ in the UI when context approaches the cache limit.**
434
+
435
+ **c. Build-failure cache can go stale.** `_buildStatusStale` is honored by the auto-fix
436
+ dispatcher but not by the cockpit display. Slim's PR tile could show "1 failing build" when the
437
+ build is actually green-after-rerun. **Add `_buildStatusStale` consideration to the cockpit
438
+ display.**
439
+
440
+ **d. `dispatchCompletionReportPath` writes to `engine/completions/<dispatchId>.json` but the
441
+ oversized-prompt sidecar writes to `engine/contexts/<dispatchId>.json`.** Two directories with
442
+ similar purposes. The dashboard widget for "open completion report" sometimes hits the wrong
443
+ one if you copy-paste a path. **Either merge the directories or make the API explicit about
444
+ which kind of artifact you want.**
445
+
446
+ **e. `WORK_TYPE` has 13 entries but `routing.md` only routes ~7 of them.** The unrouted types
447
+ (`MEETING`, `EXPLORE`, `ASK`, …) are routed by other code paths. **Either consolidate routing
448
+ or document the gap clearly.**
449
+
450
+ ---
451
+
452
+ ## Suggested order of operations
453
+
454
+ If we were prioritizing for a Round 2 sprint, this is roughly the order I'd go:
455
+
456
+ 1. **§1** `/api/cockpit` (½ day, immediate slim perf win)
457
+ 2. **§13** Direct API calls for the four Action buttons (1 day, removes the TEMP layer)
458
+ 3. **§3** `/api/events` + history feed (1.5 days, replaces three client-side merges)
459
+ 4. **§6** Knowledge / Pinned UI consolidation (1 day, big mental-model simplification)
460
+ 5. **§2** SSE for cockpit (½–1.5 days, removes the polling loop)
461
+ 6. **§10** Origin prompt linkage on completions (1 day, makes "what prompted this?" answerable)
462
+
463
+ Total: ~6 days for a meaningful Round 2 that earns real shipping confidence.
464
+
465
+ The bigger model-merger questions (§4 Work Items + Pipelines, §5 Plan + PRD, §7 Schedule +
466
+ Watch) should wait until at least Round 3 — we should let the slim UX expose where the
467
+ *model* friction actually is, not assume it from the current UX friction.