@yemi33/minions 0.1.1949 → 0.1.1951
- package/dashboard/js/command-center.js +9 -0
- package/dashboard/js/modal-qa.js +10 -0
- package/dashboard/js/refresh.js +4 -0
- package/dashboard/js/render-dispatch.js +25 -0
- package/dashboard/js/render-other.js +109 -2
- package/dashboard/js/settings.js +1 -1
- package/dashboard/layout.html +2 -2
- package/dashboard/pages/engine.html +6 -0
- package/dashboard/slim.html +1987 -0
- package/dashboard/styles.css +8 -0
- package/dashboard.js +450 -40
- package/docs/completion-reports.md +25 -0
- package/docs/design-state-storage.md +1 -1
- package/docs/slim-ux/architecture-suggestions.md +467 -0
- package/docs/slim-ux/concepts.md +824 -0
- package/engine/ado-mcp-wrapper.js +33 -7
- package/engine/ado.js +123 -15
- package/engine/cc-worker-pool.js +41 -0
- package/engine/cleanup.js +71 -34
- package/engine/cli.js +37 -0
- package/engine/dispatch.js +32 -9
- package/engine/features.js +6 -0
- package/engine/gh-token.js +137 -0
- package/engine/github.js +166 -29
- package/engine/issues.js +29 -0
- package/engine/keep-process-sweep.js +397 -0
- package/engine/lifecycle.js +150 -33
- package/engine/playbook.js +17 -0
- package/engine/queries.js +71 -0
- package/engine/recovery.js +6 -0
- package/engine/shared.js +481 -30
- package/engine/spawn-agent.js +44 -2
- package/engine/timeout.js +34 -11
- package/engine/worktree-pool.js +410 -0
- package/engine.js +643 -119
- package/package.json +6 -3
- package/playbooks/review.md +2 -0
- package/playbooks/shared-rules.md +3 -1
- package/prompts/cc-system.md +24 -0
- package/engine/copilot-models.json +0 -5
|
@@ -19,6 +19,28 @@ Path shape (resolved by `shared.dispatchCompletionReportPath()` in `engine/share
|
|
|
19
19
|
|
|
20
20
|
The agent must write the JSON to that exact path before exiting. Any character outside `[a-zA-Z0-9._-]` in the dispatch id is replaced with `-` by the engine when computing the path.
|
|
21
21
|
|
|
22
|
+
## Trust boundary
|
|
23
|
+
|
|
24
|
+
Each spawn also receives a per-dispatch cryptographic value via the `MINIONS_COMPLETION_NONCE` environment variable. The engine generates this with `crypto.randomBytes(16).toString('hex')` in `engine.js:spawnAgent()` and stores it on the in-memory active-process record. The agent is required to copy the value verbatim into the report's `nonce` field. On parse, `engine/lifecycle.js:runPostCompletionHooks()` compares `report.nonce` against the in-memory value:
|
|
25
|
+
|
|
26
|
+
- **Match** — the report is trusted and processed normally.
|
|
27
|
+
- **Mismatch** — the report is treated as forged (a prompt-injected agent or a stale process writing into a sibling dispatch's completion path). Every signal it carries — `status`, `pr`, `noop`, `failure_class`, `retryable`, `needs_rerun`, fenced/summary fallbacks — is discarded. The dispatch is failed with `failure_class: 'completion-nonce-mismatch'` and the work item is marked failed (no auto-retry honors the agent's `retryable` claim).
|
|
28
|
+
- **Missing** — by default, the engine logs `[security] completion-nonce-missing dispatch=… required=false (degraded — report honored)` and still honors the report. Flip `ENGINE_DEFAULTS.completionNonceRequired` (or `engine.completionNonceRequired` in `config.json`) to `true` to hard-fail missing nonces too. Default is `false` for one release so older runtime caches and agents that haven't picked up the prompt change degrade with a warning instead of breaking.
|
|
29
|
+
|
|
30
|
+
Security event log lines are emitted on the `error` channel and are designed to be greppable:
|
|
31
|
+
|
|
32
|
+
```
|
|
33
|
+
[security] completion-nonce-missing dispatch=<id> agent=<name> wi=<wiId> required=<bool>
|
|
34
|
+
[security] completion-nonce-mismatch dispatch=<id> agent=<name> wi=<wiId> expected=<8char> got=<8char>
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
**Migration path.** Roll out in two phases:
|
|
38
|
+
|
|
39
|
+
1. **Phase 1 (default):** Ship the engine + playbook change with `completionNonceRequired: false`. Agents that echo the nonce are validated; agents that don't continue to work but emit a warning per dispatch. Watch the `[security] completion-nonce-missing` warnings drain to zero across runtimes.
|
|
40
|
+
2. **Phase 2:** Once warnings are quiet for at least one release window, flip `ENGINE_DEFAULTS.completionNonceRequired` to `true` (or set per-deployment via `config.json` → `engine.completionNonceRequired`). Missing nonces then hard-fail like mismatched ones.
|
|
41
|
+
|
|
42
|
+
Do **not** invent, regenerate, or share the nonce across dispatches — each spawn gets a unique value.
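The three outcomes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's code: `classifyNonce` and its return labels are hypothetical names, and the constant-time comparison is this sketch's choice rather than something the doc says `runPostCompletionHooks()` does.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: classify a report's nonce against the value the engine
// kept in memory. `required` mirrors the `completionNonceRequired` setting.
function classifyNonce(expected, reported, required = false) {
  if (typeof reported !== 'string' || reported === '') {
    return required ? 'missing-rejected' : 'missing-honored';
  }
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(reported, 'utf8');
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return a.length === b.length && crypto.timingSafeEqual(a, b) ? 'match' : 'mismatch';
}

// Engine side: the generation call named above (16 random bytes, 32 hex chars).
const nonce = crypto.randomBytes(16).toString('hex');

// Agent side: copy the env var value verbatim into the report.
const agentEnv = { MINIONS_COMPLETION_NONCE: nonce };
const report = { status: 'success', nonce: agentEnv.MINIONS_COMPLETION_NONCE };

console.log(classifyNonce(nonce, report.nonce));      // match
console.log(classifyNonce(nonce, 'not-the-nonce'));   // mismatch
console.log(classifyNonce(nonce, undefined));         // missing-honored
console.log(classifyNonce(nonce, undefined, true));   // missing-rejected
```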
|
|
43
|
+
|
|
22
44
|
## Top-level schema
|
|
23
45
|
|
|
24
46
|
```json
|
|
@@ -31,6 +53,7 @@ The agent must write the JSON to that exact path before exiting. Any character o
|
|
|
31
53
|
"retryable": false,
|
|
32
54
|
"needs_rerun": false,
|
|
33
55
|
"noop": false,
|
|
56
|
+
"nonce": "<value of MINIONS_COMPLETION_NONCE env var>",
|
|
34
57
|
"artifacts": [
|
|
35
58
|
{"type": "pr", "path": "https://github.com/owner/repo/pull/123", "title": "PR-123"}
|
|
36
59
|
]
|
|
@@ -49,6 +72,7 @@ The agent must write the JSON to that exact path before exiting. Any character o
|
|
|
49
72
|
| `retryable` | boolean | `true` if the engine should auto-retry the dispatch on failure. Overrides the default per-class retry policy when present. |
|
|
50
73
|
| `needs_rerun` | boolean | `true` if the same work needs to be re-dispatched (vs. retried). Used by build-fix and review-fix loops. |
|
|
51
74
|
| `artifacts` | array | Durable artifacts the agent created or updated; surfaces in the dashboard work-item detail modal. See [Artifacts](#artifacts). |
|
|
75
|
+
| `nonce` | string | Per-spawn cryptographic value the engine injects via `MINIONS_COMPLETION_NONCE`. Copy the env var value verbatim; the engine validates it on read. See [Trust boundary](#trust-boundary). |
|
|
52
76
|
|
|
53
77
|
### Optional fields
|
|
54
78
|
|
|
@@ -76,6 +100,7 @@ Defined in `engine/shared.js` as `FAILURE_CLASS`. Use the canonical hyphenated s
|
|
|
76
100
|
| `network-error` | API rate limit, DNS, connectivity | Default retry logic |
|
|
77
101
|
| `out-of-context` | Context window exhausted | Flag for human review |
|
|
78
102
|
| `max-turns` | Claude CLI `error_max_turns` — work in progress | Retry same agent |
|
|
103
|
+
| `completion-nonce-mismatch` | Completion JSON missing or mismatched `nonce` (forged completion). See [Trust boundary](#trust-boundary). | Never retry (untrusted) |
|
|
79
104
|
| `unknown` | Unclassified failure | Default retry logic |
|
|
80
105
|
|
|
81
106
|
Use `"N/A"` when `status` is `success` or `partial` without a failure.
|
|
@@ -46,7 +46,7 @@ release .lock file
|
|
|
46
46
|
Key properties:
|
|
47
47
|
- **Synchronous blocking** — `withFileLock` spins with `sleepMs(25)` until lock acquired or 5s timeout (source: `engine/shared.js:175-231`)
|
|
48
48
|
- **Whole-file granularity** — updating one field in one work item rewrites all 180 items (370 KB)
|
|
49
|
-
- **Stale lock recovery** — locks older than
|
|
49
|
+
- **Stale lock recovery** — locks older than 5 min (`LOCK_STALE_MS = 300_000`) are force-removed; holders that recorded a `{pid, ts}` payload are kept alive past the threshold while `process.kill(pid, 0)` succeeds, with a hard last-resort cap at 5×LOCK_STALE_MS (source: `engine/shared.js`, P-b7d4e8f2)
|
|
50
50
|
- **Read caching** — only `dispatch.json` has a 2s TTL cache (source: `engine/queries.js:82-91`)
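The stale-lock rule in the list above can be sketched as a small decision function. The function names and the `ageMs` / `holderPid` parameters are assumptions made for illustration; how `engine/shared.js` measures a lock's age is not shown here.

```javascript
const LOCK_STALE_MS = 300_000;              // 5 minutes, per the bullet above
const LOCK_HARD_CAP_MS = 5 * LOCK_STALE_MS; // last-resort cap

// Signal 0 delivers nothing; it only checks that the pid can be signalled.
function pidAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical decision: should a lock of this age be force-removed?
// `holderPid` is the pid from the optional {pid, ts} payload, if any.
function shouldForceRemove(ageMs, holderPid, alive = pidAlive) {
  if (ageMs <= LOCK_STALE_MS) return false;   // still fresh
  if (ageMs > LOCK_HARD_CAP_MS) return true;  // hard cap ignores liveness
  return !(Number.isInteger(holderPid) && alive(holderPid));
}

console.log(shouldForceRemove(400_000));                // true  (no payload)
console.log(shouldForceRemove(400_000, process.pid));   // false (holder alive)
console.log(shouldForceRemove(1_600_000, process.pid)); // true  (past the cap)
```

Passing `alive` as a parameter keeps the decision testable without spawning processes.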
|
|
51
51
|
|
|
52
52
|
### 1.3 Read vs Write Ratio
|
|
@@ -0,0 +1,467 @@
|
|
|
1
|
+
# Slim UX — Architectural Suggestions
|
|
2
|
+
|
|
3
|
+
> Written after Round 1 (Phase A concept mapping + Phase B layout rebuild). These are **proposals**
|
|
4
|
+
> for engineering changes that would make the slim UX cheaper to maintain and more honest about
|
|
5
|
+
> what's happening under the hood. **None of these are implemented in Round 1.** They're for
|
|
6
|
+
> human discussion before any of them get prioritized.
|
|
7
|
+
>
|
|
8
|
+
> Format per section: **Problem → Proposal → Cost → Risk**.
|
|
9
|
+
>
|
|
10
|
+
> Cost is rough person-days. Risk is what could go wrong.
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## 1. A `/api/cockpit` aggregate endpoint
|
|
15
|
+
|
|
16
|
+
**Problem.** Right now the slim cockpit has to hit `/api/status` every 5 seconds and pluck six
|
|
17
|
+
fields out of a ~60 KB blob (`dispatch.active`, `dispatch.pending`, `pullRequests`, `watches`,
|
|
18
|
+
`engine`, `dispatch.completed`). The full status payload includes `agents`, `inbox`, `notes`,
|
|
19
|
+
`metrics`, `prdProgress`, `verifyGuides`, `archivedPrds`, `skills`, `mcpServers`, `schedules`,
|
|
20
|
+
`pipelines`, `pinned`, `projects`, `autoMode`, `version` — all of which the cockpit ignores.
|
|
21
|
+
Polling that whole thing every 5 s on every open tab is wasteful.
|
|
22
|
+
|
|
23
|
+
**Proposal.** A new `/api/cockpit` that returns *only* the cockpit-relevant aggregates:
|
|
24
|
+
|
|
25
|
+
```jsonc
|
|
26
|
+
{
|
|
27
|
+
"engine": { "running": true, "mode": "running", "startedAt": "…" },
|
|
28
|
+
"dispatches": { "active": 2, "pending": 5, "workingAgents": ["dallas", "ralph"] },
|
|
29
|
+
"prs": { "active": 3, "failingBuilds": 1 },
|
|
30
|
+
"watches": { "active": 4, "triggeredRecently": 1 },
|
|
31
|
+
"schedules":{ "dueWithinHour": 2 },
|
|
32
|
+
"lastEvents": [ { "kind": "completion", "ts": "…", "title": "…" }, … ]
|
|
33
|
+
}
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
Compute it from the existing fast-state cache in `dashboard.js` (`_fastState`) so the cost is
|
|
37
|
+
near-zero — just a few field projections. Slim polls this; full dashboard keeps `/api/status`.
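A rough sketch of what "a few field projections" could look like. `projectCockpit` is a hypothetical name, and the per-item fields it reads (`agent`, `buildStatus`) are assumptions for illustration; the real `_fastState` item shapes are not shown in this document.

```javascript
// Hypothetical projection from a status-shaped object to the cockpit payload.
function projectCockpit(status) {
  const active = status.dispatch?.active ?? [];
  const pending = status.dispatch?.pending ?? [];
  const prs = status.pullRequests ?? [];
  const watches = status.watches ?? [];
  return {
    engine: status.engine ?? null,
    dispatches: {
      active: active.length,
      pending: pending.length,
      // De-duplicate in case one agent holds several active dispatches.
      workingAgents: [...new Set(active.map((d) => d.agent).filter(Boolean))],
    },
    prs: {
      active: prs.length,
      failingBuilds: prs.filter((pr) => pr.buildStatus === 'failing').length,
    },
    watches: { active: watches.length },
  };
}

const sample = {
  engine: { running: true, mode: 'running' },
  dispatch: { active: [{ agent: 'dallas' }, { agent: 'ralph' }], pending: [{}, {}, {}] },
  pullRequests: [{ buildStatus: 'passing' }, { buildStatus: 'failing' }],
  watches: [{}, {}],
  agents: {}, // present in the full payload, ignored by the cockpit
};

console.log(JSON.stringify(projectCockpit(sample).dispatches));
// {"active":2,"pending":3,"workingAgents":["dallas","ralph"]}
```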
|
|
38
|
+
|
|
39
|
+
**Cost.** ~0.5 day. New handler + a couple of unit tests covering aggregate math.
|
|
40
|
+
|
|
41
|
+
**Risk.** Low. Pure read-side; doesn't touch any state. The only failure mode is the slim
|
|
42
|
+
showing stale numbers if the cache pruner gets confused, but that's the same risk `/api/status`
|
|
43
|
+
already has.
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## 2. SSE for cockpit updates instead of polling
|
|
48
|
+
|
|
49
|
+
**Problem.** 5-second polling is fine for Round 1 but feels laggy when an agent finishes — you
|
|
50
|
+
see "Sending…" disappear in the chat 30 s before "Active dispatches" updates from 1 to 0. The
|
|
51
|
+
full dashboard already has `/api/status-stream` (SSE); slim just doesn't use it.
|
|
52
|
+
|
|
53
|
+
**Proposal.** Slim subscribes to `/api/status-stream` (or a new `/api/cockpit-stream` that emits
|
|
54
|
+
deltas only). Polling becomes a fallback for browsers without SSE. Visibility-change pause
|
|
55
|
+
already exists in slim (round-1 code), so background tabs cost nothing.
|
|
56
|
+
|
|
57
|
+
**Cost.** ~0.5 day if reusing `/api/status-stream`; ~1.5 days if writing a delta-encoded
|
|
58
|
+
`/api/cockpit-stream`.
|
|
59
|
+
|
|
60
|
+
**Risk.** Medium. SSE connections occasionally drop and need exponential-backoff reconnect.
|
|
61
|
+
Multi-tab leaks are a real concern (each tab opens its own EventSource). Need a small connection
|
|
62
|
+
manager on the client.
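The reconnect piece of such a connection manager might look like the sketch below. `backoffDelay` and `subscribe` are hypothetical names, the 1 s base and 30 s cap are arbitrary, and the JSON message format is an assumption. The `EventSource` constructor and the timer are injected so the logic can be exercised outside a browser.

```javascript
// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Opens one stream and reopens it with backoff after an error.
// Returns a function that stops the subscription for good.
function subscribe(url, onMessage, EventSourceImpl = globalThis.EventSource, schedule = setTimeout) {
  let attempt = 0;
  let source = null;
  let closed = false;

  const open = () => {
    source = new EventSourceImpl(url);
    source.onopen = () => { attempt = 0; }; // healthy again: reset the backoff
    source.onmessage = (event) => onMessage(JSON.parse(event.data));
    source.onerror = () => {
      source.close(); // take over from the built-in reconnect
      if (!closed) schedule(open, backoffDelay(attempt++));
    };
  };

  open();
  return () => { closed = true; source.close(); };
}

console.log([0, 1, 3, 10].map((n) => backoffDelay(n)).join(', ')); // 1000, 2000, 8000, 30000
```

In a browser, `const stop = subscribe('/api/status-stream', render)` would use the native `EventSource`, where `render` stands in for whatever updates the cockpit. Sharing one connection across tabs is a separate problem this sketch does not address.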
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## 3. A unified event stream for the History panel
|
|
67
|
+
|
|
68
|
+
**Problem.** The History panel currently merges three sources client-side: `dispatch.active`,
|
|
69
|
+
`dispatch.completed`, and `pullRequests`. Each has its own timestamp field (`startedAt`,
|
|
70
|
+
`completedAt`, `updatedAt`), its own shape, and its own truthiness rules. The merge logic is
|
|
71
|
+
fragile — adding a fourth source (e.g. consolidation runs, watch fires, schedule executions,
|
|
72
|
+
pipeline state changes) means more glue.
|
|
73
|
+
|
|
74
|
+
**Proposal.** A canonical `engine/events.jsonl` (append-only, rotated weekly) with a uniform
|
|
75
|
+
shape:
|
|
76
|
+
|
|
77
|
+
```jsonc
|
|
78
|
+
{ "ts": "2026-05-08T15:23:11Z", "kind": "completion", "agent": "dallas",
|
|
79
|
+
"title": "fix slim button", "workItemId": "W-…", "pr": "#42",
|
|
80
|
+
"summary": "…", "level": "info" }
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
Plus `/api/events?since=…&limit=…` for paged reads. The engine already writes most of these
|
|
84
|
+
moments to `engine/log.json` (the audit ring buffer); we just need to enrich them and expose
|
|
85
|
+
them. The History panel becomes one fetch.
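A paged read over such a log could be sketched like this. `readEvents` is a hypothetical helper; the "strictly after `since`, oldest first" semantics and the event kinds other than `completion` are assumptions, since the proposal does not pin them down.

```javascript
// Hypothetical reader for an append-only JSONL event log.
// Returns up to `limit` events whose `ts` is strictly after `since`.
function readEvents(jsonlText, { since = null, limit = 50 } = {}) {
  const sinceMs = since ? Date.parse(since) : -Infinity;
  return jsonlText
    .split('\n')
    .filter((line) => line.trim() !== '') // tolerate a trailing newline
    .map((line) => JSON.parse(line))
    .filter((event) => Date.parse(event.ts) > sinceMs)
    .slice(0, limit);
}

const log = [
  '{"ts":"2026-05-08T15:23:11Z","kind":"dispatch-start","agent":"dallas"}',
  '{"ts":"2026-05-08T15:24:40Z","kind":"completion","agent":"dallas"}',
  '{"ts":"2026-05-08T15:30:02Z","kind":"pr-sync","agent":"ralph"}',
].join('\n');

const page = readEvents(log, { since: '2026-05-08T15:23:11Z', limit: 10 });
console.log(page.map((event) => event.kind).join(', ')); // completion, pr-sync
```

A client would pass the `ts` of the last event it rendered as the next `since`, so each poll fetches only what is new.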
|
|
86
|
+
|
|
87
|
+
**Cost.** ~1.5 days. The hard part is catching every event-emit site — dispatch start, dispatch
|
|
88
|
+
complete, PR sync, watch fire, schedule run, pipeline transition, meeting round advance,
|
|
89
|
+
consolidation run. Mostly mechanical.
|
|
90
|
+
|
|
91
|
+
**Risk.** Medium-low. Events only record what already happened; the risk is missing an emit site. Add a unit test that asserts
|
|
92
|
+
each lifecycle path emits an event.
|
|
93
|
+
|
|
94
|
+
---
|
|
95
|
+
|
|
96
|
+
## 4. Should Work Items + Pipelines merge in the data model?
|
|
97
|
+
|
|
98
|
+
**Problem.** Carlos called out "Work Items" and "Pipelines" as conceptually overlapping in the
|
|
99
|
+
existing UI. Looking at the data: a Pipeline is a sequence of stages where each stage *is*
|
|
100
|
+
either a Work Item, a Meeting, or a Plan. So a Pipeline isn't really a separate concept — it's
|
|
101
|
+
a Work Item with a stage list. Yet they have entirely separate stores
|
|
102
|
+
(`engine/dispatch.json` vs `engine/pipeline-runs.json`), separate API surfaces
|
|
103
|
+
(`/api/work-items*` vs `/api/pipelines*`), and separate UIs.
|
|
104
|
+
|
|
105
|
+
**Proposal.** Don't merge the *storage* — that's a deep migration. Instead, introduce a
|
|
106
|
+
"workspace" view at the API layer that returns Work Items and Pipeline runs in one paginated
|
|
107
|
+
list, tagged with `kind: "work-item" | "pipeline-run" | "pipeline-stage"`, sorted by recency.
|
|
108
|
+
Slim's "Work" button opens this unified view; it can drill down into stages when the user
|
|
109
|
+
expands a pipeline.
|
|
110
|
+
|
|
111
|
+
**Cost.** ~2 days. Read-only aggregator + a UI that knows how to render the three flavors. No
|
|
112
|
+
storage changes.
|
|
113
|
+
|
|
114
|
+
**Risk.** Low at the engine layer (read-only); medium at the UX layer (pipelines have richer
|
|
115
|
+
state — wait conditions, retriggers — that don't map cleanly to a flat list).
|
|
116
|
+
|
|
117
|
+
---
|
|
118
|
+
|
|
119
|
+
## 5. Plan + PRD: collapse to one document?
|
|
120
|
+
|
|
121
|
+
**Problem.** Carlos asked "What is a PRD? Should I generate one to get started?" The answer is
|
|
122
|
+
"no, PRDs are auto-generated from approved plans by the `plan-to-prd` agent" — but that's
|
|
123
|
+
opaque. The two-stage flow (plan → PRD → work items) exists because the plan is a human
|
|
124
|
+
discussion artifact and the PRD is the machine-readable execution contract.
|
|
125
|
+
|
|
126
|
+
**Proposal.** Don't drop the distinction; *unify the surface*. Treat the plan as the primary
|
|
127
|
+
document and embed the PRD JSON as a fenced block inside the plan markdown:
|
|
128
|
+
|
|
129
|
+
```markdown
|
|
130
|
+
# Plan: Refresh the Settings sidebar
|
|
131
|
+
|
|
132
|
+
## Goal
|
|
133
|
+
Three small commits, no big-bang refactor.
|
|
134
|
+
|
|
135
|
+
## Acceptance criteria
|
|
136
|
+
- …
|
|
137
|
+
|
|
138
|
+
```prd
|
|
139
|
+
{ "items": [ … ] }
|
|
140
|
+
```
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
The materializer reads the fenced block instead of a separate `prd/*.json` file. Plan and PRD
|
|
144
|
+
have the same lifecycle, the same `source_plan`, and the same archive path. The slim shows one
|
|
145
|
+
"Plans" button instead of two concepts.
|
|
146
|
+
|
|
147
|
+
**Cost.** ~3 days. Migration script for existing PRDs + materializer changes + verify flow.
|
|
148
|
+
Risky because plan resume logic depends on reading the PRD JSON twin.
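For the materializer change costed above, reading the embedded block could be as small as the sketch below. `extractPrd` is a hypothetical name and the sample plan is made up; the fence string is built at runtime only so that this example can itself sit inside a fenced block.

```javascript
const FENCE = '`'.repeat(3);

// Hypothetical extractor: returns the parsed PRD object, or null when the
// plan markdown has no prd-tagged fenced block.
function extractPrd(planMarkdown) {
  const pattern = new RegExp(
    '^' + FENCE + 'prd\\s*\\n([\\s\\S]*?)\\n' + FENCE + '\\s*$',
    'm'
  );
  const match = planMarkdown.match(pattern);
  return match ? JSON.parse(match[1]) : null;
}

const plan = [
  '# Plan: Refresh the Settings sidebar',
  '',
  '## Goal',
  'Three small commits, no big-bang refactor.',
  '',
  FENCE + 'prd',
  '{ "items": [ { "id": "P-1", "title": "Split sidebar" } ] }',
  FENCE,
  '',
].join('\n');

console.log(extractPrd(plan).items[0].id); // P-1
console.log(extractPrd('# Plan with no PRD yet\n')); // null
```

A plan that has not been through `plan-to-prd` yet simply yields `null`, which gives the resume logic an explicit "no PRD" signal instead of a missing file.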
|
|
149
|
+
|
|
150
|
+
**Risk.** Medium-high. The plan-to-prd diff-aware update logic is non-trivial and would need to
|
|
151
|
+
be re-verified. **Recommendation: defer until at least one round of slim UX shows the unified
|
|
152
|
+
"Plans" button is the right primitive.**
|
|
153
|
+
|
|
154
|
+
---
|
|
155
|
+
|
|
156
|
+
## 6. Notes + KB + Pinned: which survives?
|
|
157
|
+
|
|
158
|
+
**Problem.** Carlos couldn't tell Notes vs KB vs Pinned Context apart. The reality:
|
|
159
|
+
|
|
160
|
+
- **Notes inbox** = raw, unsorted (one file per agent per task).
|
|
161
|
+
- **KB** = consolidated, classified into 5 categories.
|
|
162
|
+
- **Pinned** = hand-picked subset that gets prepended to every agent prompt.
|
|
163
|
+
- **`notes.md`** = a *separate* consolidated team-decisions blob, also injected into agent
|
|
164
|
+
prompts.
|
|
165
|
+
|
|
166
|
+
Four distinct concepts for "the team's memory" is at least one too many.
|
|
167
|
+
|
|
168
|
+
**Proposal.** Collapse to two concepts the human sees: **"Knowledge"** (everything durable)
|
|
169
|
+
and **"Pinned"** (the always-prepended subset). Hide the inbox/`notes.md` distinction from the
|
|
170
|
+
UI — they're consolidation pipeline internals.
|
|
171
|
+
|
|
172
|
+
The data layer keeps four stores under the hood; the UI presents two:
|
|
173
|
+
|
|
174
|
+
| UI label | Data sources |
|
|
175
|
+
|-----------|---------------------------------------------------------------|
|
|
176
|
+
| Knowledge | `notes/inbox/*.md` + `notes/archive/**/*.md` + `notes.md` |
|
|
177
|
+
| Pinned | `pinned.md` |
|
|
178
|
+
|
|
179
|
+
Pinning a Knowledge entry promotes it into `pinned.md`; un-pinning returns it. Inbox vs archived
|
|
180
|
+
becomes a "freshness" badge on the Knowledge entry rather than a separate tab.
|
|
181
|
+
|
|
182
|
+
**Cost.** ~1 day for the API aggregator + new UI tab. No engine changes.
|
|
183
|
+
|
|
184
|
+
**Risk.** Low. Pure presentation.
|
|
185
|
+
|
|
186
|
+
---
|
|
187
|
+
|
|
188
|
+
## 7. Schedule vs Watch: is one a special case of the other?
|
|
189
|
+
|
|
190
|
+
**Problem.** Schedule = "fire on a cron pattern." Watch = "fire when a condition flips." They
|
|
191
|
+
share most of the lifecycle (definition, fire history, pause/resume, expire/stopAfter). The
|
|
192
|
+
two state files (`schedule-runs.json` / `watches.json`) and parallel CRUD APIs add maintenance
|
|
193
|
+
without adding real concepts.
|
|
194
|
+
|
|
195
|
+
**Proposal.** Generalize to **Trigger**, with two `kind`s: `cron` and `event`. Backend stays
|
|
196
|
+
in two files for now (avoid the migration), but the API and UI present them as one.
|
|
197
|
+
|
|
198
|
+
```jsonc
|
|
199
|
+
{
|
|
200
|
+
"id": "nightly-tests",
|
|
201
|
+
"kind": "cron",
|
|
202
|
+
"cron": "0 2 *",
|
|
203
|
+
"action": { "type": "work-item", "title": "Nightly tests", "agent": "dallas" }
|
|
204
|
+
}
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
```jsonc
|
|
208
|
+
{
|
|
209
|
+
"id": "watch-pr-42",
|
|
210
|
+
"kind": "event",
|
|
211
|
+
"target": "PR-42",
|
|
212
|
+
"condition": "merged",
|
|
213
|
+
"action": { "type": "notify", "channel": "inbox" }
|
|
214
|
+
}
|
|
215
|
+
```
|
|
216
|
+
|
|
217
|
+
Slim's "Trigger" button opens one creator that branches on `kind`.
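Branching on `kind` can be sketched with the two example shapes above. `describeTrigger` is a hypothetical helper that only produces a one-line summary; it reads no fields beyond those shown in the examples.

```javascript
// Hypothetical summary renderer that branches on the trigger's `kind`.
function describeTrigger(trigger) {
  switch (trigger.kind) {
    case 'cron':
      return `${trigger.id}: on schedule "${trigger.cron}" -> ${trigger.action.type}`;
    case 'event':
      return `${trigger.id}: when ${trigger.target} is ${trigger.condition} -> ${trigger.action.type}`;
    default:
      // Fail loudly so a new kind cannot silently render as nothing.
      throw new Error(`unknown trigger kind: ${trigger.kind}`);
  }
}

const nightly = {
  id: 'nightly-tests',
  kind: 'cron',
  cron: '0 2 *',
  action: { type: 'work-item', title: 'Nightly tests', agent: 'dallas' },
};
const watchPr = {
  id: 'watch-pr-42',
  kind: 'event',
  target: 'PR-42',
  condition: 'merged',
  action: { type: 'notify', channel: 'inbox' },
};

console.log(describeTrigger(nightly)); // nightly-tests: on schedule "0 2 *" -> work-item
console.log(describeTrigger(watchPr)); // watch-pr-42: when PR-42 is merged -> notify
```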
|
|
218
|
+
|
|
219
|
+
**Cost.** ~2 days. New aggregator endpoint, one creator UI, normalized response shape.
|
|
220
|
+
|
|
221
|
+
**Risk.** Medium. Watches have a richer condition vocabulary
|
|
222
|
+
(`new-comments`, `status-change`, `vote-change`, `build-fail`, …) than Schedules. The unified
|
|
223
|
+
surface can't paper over that — the creator UI still needs branch-per-kind rendering. So this
|
|
224
|
+
is mostly a mental-model win, not a surface-area-savings win.
|
|
225
|
+
|
|
226
|
+
---
|
|
227
|
+
|
|
228
|
+
## 8. "Two unrelated tasks side-by-side on the same project"
|
|
229
|
+
|
|
230
|
+
**Problem.** Carlos asked how to run two unrelated work streams in parallel on the same project.
|
|
231
|
+
Today the answer is: just queue two work items; the dispatcher will pick two agents (up to
|
|
232
|
+
`engine.maxConcurrent = 5`) and they'll work in separate worktrees. **But** there's no UI
|
|
233
|
+
affordance for separating them — both show up in the same flat dispatch queue. There's no
|
|
234
|
+
"context A vs context B" boundary.
|
|
235
|
+
|
|
236
|
+
**Proposal.** Add a **"thread" / "stream" tag** to Work Items. Optional string, free-form. The
|
|
237
|
+
slim's Status and History panels group by thread when present. Default thread = `default`.
|
|
238
|
+
Threads aren't in the engine at all — they're a presentation grouping. CC accepts them on
|
|
239
|
+
dispatch.
|
|
240
|
+
|
|
241
|
+
```jsonc
|
|
242
|
+
{ "title": "fix login bug", "thread": "auth-rewrite" }
|
|
243
|
+
{ "title": "rename buttons", "thread": "ux-cleanup" }
|
|
244
|
+
```
|
|
245
|
+
|
|
246
|
+
This costs almost nothing technically, and gives the human a way to keep two storylines
|
|
247
|
+
mentally distinct.
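The presentation-side grouping could be sketched as follows. `groupByThread` is a hypothetical helper; it treats a missing or empty `thread` as `default`, matching the backward-compatibility rule proposed here.

```javascript
// Hypothetical grouping for the Status and History panels.
// Returns a Map so groups keep the order in which threads first appear.
function groupByThread(workItems) {
  const groups = new Map();
  for (const item of workItems) {
    const thread = item.thread || 'default';
    if (!groups.has(thread)) groups.set(thread, []);
    groups.get(thread).push(item);
  }
  return groups;
}

const items = [
  { title: 'fix login bug', thread: 'auth-rewrite' },
  { title: 'rename buttons', thread: 'ux-cleanup' },
  { title: 'bump deps' }, // no thread: lands in "default"
  { title: 'rotate tokens', thread: 'auth-rewrite' },
];

const grouped = groupByThread(items);
console.log([...grouped.keys()].join(', ')); // auth-rewrite, ux-cleanup, default
```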
|
|
248
|
+
|
|
249
|
+
**Cost.** ~1 day. Schema field on work items, group-by in queries, slim UX wiring.
|
|
250
|
+
|
|
251
|
+
**Risk.** Low. Backward compat by treating missing thread as `default`.
|
|
252
|
+
|
|
253
|
+
---
|
|
254
|
+
|
|
255
|
+
## 9. Settings sprawl: split fast vs advanced
|
|
256
|
+
|
|
257
|
+
**Problem.** Carlos said Settings has too many options. The full dashboard's Settings page
|
|
258
|
+
mixes:
|
|
259
|
+
|
|
260
|
+
- Agents (per-agent CLI/model/skill/budget)
|
|
261
|
+
- Projects (paths, repo hosts, work sources)
|
|
262
|
+
- Engine knobs (timeouts, retries, concurrency)
|
|
263
|
+
- Runtime defaults (default CLI, default model)
|
|
264
|
+
- Feature flags
|
|
265
|
+
- ADO/GitHub auth
|
|
266
|
+
- MCP servers
|
|
267
|
+
- Cache controls / reset
|
|
268
|
+
|
|
269
|
+
Most users only ever touch feature flags + the slim-ux toggle. Most settings are either
|
|
270
|
+
*infra* (set once at install) or *fleet* (rarely changed).
|
|
271
|
+
|
|
272
|
+
**Proposal.** Two-tier settings:
|
|
273
|
+
|
|
274
|
+
- **Slim "Quick settings"**: just feature flags + a button to open the full settings.
|
|
275
|
+
- **Full "Advanced settings"**: everything else. Stays as is in the existing dashboard. The
|
|
276
|
+
slim deep-links to it.
|
|
277
|
+
|
|
278
|
+
Round 1 already does this. The next step is to *split* the full dashboard's Settings page into
|
|
279
|
+
"Quick" (the things `/api/settings/reset` fixes) and "Advanced" (the things you tune once).
|
|
280
|
+
|
|
281
|
+
**Cost.** ~1 day on the full dashboard side. Zero on the slim side (already done).
|
|
282
|
+
|
|
283
|
+
**Risk.** None.
|
|
284
|
+
|
|
285
|
+
---
|
|
286
|
+
|
|
287
|
+
## 10. Recent Completions: less metadata, more context
|
|
288
|
+
|
|
289
|
+
**Problem.** Carlos said the completions widget shows too much (ID column, agent column, no
|
|
290
|
+
click-through to source). His mental model is: "I want to know what shipped, when, and what
|
|
291
|
+
prompted it."
|
|
292
|
+
|
|
293
|
+
**Proposal.** Restructure each completion entry as a **prompt → result** pair:
|
|
294
|
+
|
|
295
|
+
```
|
|
296
|
+
12:04 Carlos: "Fix the dashboard typo on the Plans page"
|
|
297
|
+
→ Dallas: ✅ shipped PR #43 (2 files, 4 lines) · 1m ago
|
|
298
|
+
view diff · view PR · view chat thread
|
|
299
|
+
```
|
|
300
|
+
|
|
301
|
+
The "what prompted it" piece requires the engine to remember the originating CC message — which
|
|
302
|
+
it doesn't today. Either:
|
|
303
|
+
|
|
304
|
+
1. **Cheap:** When CC dispatches, stash the originating message text in the work item's
|
|
305
|
+
`description` field (it already does this loosely).
|
|
306
|
+
2. **Right:** Add `_originPrompt` (originating CC turn) and `_originSession` (CC session id) to
|
|
307
|
+
the work item. Slim's history panel can then render the prompt above the result and link
|
|
308
|
+
back to the chat.
|
|
309
|
+
|
|
310
|
+
**Cost.** ~1 day for the cheap version, ~2 days for the right version.
|
|
311
|
+
|
|
312
|
+
**Risk.** Low. Additive, write-side change to the dispatch creation path.
|
|
313
|
+
|
|
314
|
+
---
|
|
315
|
+
|
|
316
|
+
## 11. "Skills + MCPs probably belong off the main screen"
|
|
317
|
+
|
|
318
|
+
**Problem.** Carlos called these out as "off the main screen." Skills and MCPs are
|
|
319
|
+
power-user/runtime concerns; they pollute the slim's mental model.
|
|
320
|
+
|
|
321
|
+
**Proposal.** Drop the "Tools" page entirely from slim. Keep them in the full dashboard.
|
|
322
|
+
Surface a single "X skills, Y MCPs available" line in slim's Settings dialog with a deep link.
|
|
323
|
+
|
|
324
|
+
**Cost.** Zero (already done in Round 1 — slim has no Tools page).
|
|
325
|
+
|
|
326
|
+
**Risk.** None.
|
|
327
|
+
|
|
328
|
+
---
|
|
329
|
+
|
|
330
|
+
## 12. Dashboard build pipeline: extract slim assets
|
|
331
|
+
|
|
332
|
+
**Problem.** The full dashboard SPA is built once at startup from `dashboard/layout.html` +
|
|
333
|
+
fragments and gzipped. The slim deliberately bypasses this so iteration is hot — read fresh on
|
|
334
|
+
every request. As slim grows past the 1000-line mark, that single-file model will hurt:
|
|
335
|
+
|
|
336
|
+
- CSS, action-prompt JS, cockpit JS, history JS all in one file.
|
|
337
|
+
- No component reuse between slim panels.
|
|
338
|
+
- Hot-reload works for the file but not for individual sections.
|
|
339
|
+
|
|
340
|
+
**Proposal.** Extract slim assets into `dashboard/slim/` once it crosses a complexity threshold
|
|
341
|
+
(round 3 or 4):
|
|
342
|
+
|
|
343
|
+
```
|
|
344
|
+
dashboard/slim/
|
|
345
|
+
index.html ← references the assets below
|
|
346
|
+
cockpit.js
|
|
347
|
+
history.js
|
|
348
|
+
actions.js
|
|
349
|
+
chat.js
|
|
350
|
+
styles.css
|
|
351
|
+
```
|
|
352
|
+
|
|
353
|
+
`serveSlimUx` reads `index.html` and inlines or links the others. Hot-reload still works. Each
|
|
354
|
+
file < 300 lines. **Don't do this yet** — the seam is wrong below 1500 lines.
|
|
355
|
+
|
|
356
|
+
**Cost.** ~1 day.
|
|
357
|
+
|
|
358
|
+
**Risk.** Low. The CSP relaxation is already in place; it would only need extending to cover
|
|
359
|
+
`/dashboard/slim/*` if the assets were served cross-origin, which same-origin assets are not.
|
|
360
|
+
|
|
361
|
+
---
|
|
362
|
+
|
|
363
|
+
## 13. CC actions: the "create a watch from chat" path needs a visual confirm
|
|
364
|
+
|
|
365
|
+
**Problem.** Round 1 wires the four Action buttons through CC by typing
|
|
366
|
+
"Create a work item: …" into the chat. CC parses, emits `===ACTIONS===`, and the action
|
|
367
|
+
executes. But the user has no easy way to confirm "yes, that's the work item I meant" before
|
|
368
|
+
it's queued. CC's free-text understanding is good but not perfect.
|
|
369
|
+
|
|
370
|
+
**Proposal.** Add an optional `dryRun: true` flag on CC actions. When set, CC echoes the parsed
|
|
371
|
+
action as a structured preview and *does not* enqueue. The slim modal sets `dryRun: true` by
|
|
372
|
+
default; the user clicks "Confirm" to submit a second turn that drops the flag.
|
|
373
|
+
|
|
374
|
+
Or, simpler: skip CC entirely for the four buttons, and call `/api/work-items` /
|
|
375
|
+
`/api/plans/create` / `/api/notes` / `/api/schedules` directly with form fields. Round 2 task.
|
|
376
|
+
|
|
377
|
+
**Cost.** ~1 day for direct API path. ~1.5 days for the dryRun preview flow.
|
|
378
|
+
|
|
379
|
+
**Risk.** Low for the direct path; medium for dryRun (needs CC system prompt edits + tests).
|
|
380
|
+
**Recommendation: do the direct API path; skip dryRun.**
|
|
381
|
+
|
|
382
|
+
---
|
|
383
|
+
|
|
384
|
+
## 14. Charter + Routing should be one page in slim
|
|
385
|
+
|
|
386
|
+
**Problem.** Charter is "who the agent is." Routing is "what work goes to whom." They're tightly
|
|
387
|
+
coupled (you can't route review to an agent whose charter says "doesn't review") but live in
|
|
388
|
+
totally separate places.
|
|
389
|
+
|
|
390
|
+
**Proposal.** Add a slim sub-page (off the gear menu) called **"Team"** that shows each agent
|
|
391
|
+
side-by-side with: charter excerpt, routing rules they own, recent completions, current
|
|
392
|
+
dispatch (if any). One screen for "what is the team doing?"
|
|
393
|
+
|
|
394
|
+
**Cost.** ~1.5 days. Read-only aggregator + simple grid UI.
|
|
395
|
+
|
|
396
|
+
**Risk.** Low.
|
|
397
|
+
|
|
398
|
+
---
|
|
399
|
+
|
|
400
|
+
## 15. The slim-ux feature flag itself: ramp plan
|
|
401
|
+
|
|
402
|
+
**Problem.** `slim-ux` is currently the only entry point; if it breaks, the user can't get back
|
|
403
|
+
to the full dashboard except via the `?fullDashboard=1` deep link (round-1 added this) or the
|
|
404
|
+
settings dialog. That's fine for early dogfooding, but at GA we'll want a smoother fallback.
|
|
405
|
+
|
|
406
|
+
**Proposal.** Two feature flags instead of one:
|
|
407
|
+
|
|
408
|
+
- `slim-ux` — show the slim *option* (renders a "Try slim UX" link from the full dashboard).
|
|
409
|
+
- `slim-ux-default` — slim becomes the root route; full moves to `/full`.
|
|
410
|
+
|
|
411
|
+
Round 1's behavior corresponds to `slim-ux: true && slim-ux-default: true`. Once slim is GA,
|
|
412
|
+
default flips on; `slim-ux` registry entry gets deleted as redundant.
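One way to read the two-flag gating, as a sketch: `resolveRootRoute` and its return shape are hypothetical, and it assumes `slim-ux-default` works on its own, so that the `slim-ux` entry can later be deleted as the text suggests.

```javascript
// Hypothetical route resolution from the two proposed feature flags.
function resolveRootRoute(flags) {
  const slimDefault = flags['slim-ux-default'] === true;
  return {
    root: slimDefault ? 'slim' : 'full',
    fullPath: slimDefault ? '/full' : '/',
    // The "Try slim UX" link only makes sense while full is still the root.
    showTrySlimLink: flags['slim-ux'] === true && !slimDefault,
  };
}

const round1 = resolveRootRoute({ 'slim-ux': true, 'slim-ux-default': true });
console.log(round1.root, round1.fullPath); // slim /full

const optIn = resolveRootRoute({ 'slim-ux': true });
console.log(optIn.root, optIn.showTrySlimLink); // full true
```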
|
|
413
|
+
|
|
414
|
+
**Cost.** ~0.5 day.
|
|
415
|
+
|
|
416
|
+
**Risk.** None. Simple gating.
|
|
417
|
+
|
|
418
|
+
---
|
|
419
|
+
|
|
420
|
+
## 16. Surprises found during Phase A
|
|
421
|
+
|
|
422
|
+
A few things were genuinely surprising while building the concept dictionary. Capturing them so
|
|
423
|
+
the team can decide whether they're features or footguns:
|
|
424
|
+
|
|
425
|
+
**a. `notes.md` and the KB are *separate* memory stores.** Both feed agent prompts; they don't
|
|
426
|
+
share content. Consolidation either writes to `notes.md` (decisions / pinned-style) or
|
|
427
|
+
classifies into a KB category. There's no "this fact lives in both" path. **Probably fine; just
|
|
428
|
+
worth a docstring somewhere.**
|
|
429
|
+
|
|
430
|
+
**b. CC sessions are *non-expiring*.** The CC system intentionally never prunes sessions
|
|
431
|
+
(per `CLAUDE.md` line ~XYZ). So a long-lived tab keeps growing the prompt forever. The
|
|
432
|
+
prompt-hash-mismatch invalidator is the only safety net. **Consider a soft "session age" warning
|
|
433
|
+
in the UI when context approaches the cache limit.**
|
|
434
|
+
|
|
435
|
+
**c. Build-failure cache can go stale.** `_buildStatusStale` is honored by the auto-fix
|
|
436
|
+
dispatcher but not by the cockpit display. Slim's PR tile could show "1 failing build" when the
|
|
437
|
+
build is actually green-after-rerun. **Add `_buildStatusStale` consideration to the cockpit
|
|
438
|
+
display.**
|
|
439
|
+
|
|
440
|
+
**d. `dispatchCompletionReportPath` writes to `engine/completions/<dispatchId>.json` but the
|
|
441
|
+
oversized-prompt sidecar writes to `engine/contexts/<dispatchId>.json`.** Two directories with
|
|
442
|
+
similar purposes. The dashboard widget for "open completion report" sometimes hits the wrong
|
|
443
|
+
one if you copy-paste a path. **Either merge the directories or make the API explicit about
|
|
444
|
+
which kind of artifact you want.**
|
|
445
|
+
|
|
446
|
+
**e. `WORK_TYPE` has 13 entries but `routing.md` only routes ~7 of them.** The unrouted types
|
|
447
|
+
(`MEETING`, `EXPLORE`, `ASK`, …) are routed by other code paths. **Either consolidate routing
|
|
448
|
+
or document the gap clearly.**
|
|
449
|
+
|
|
450
|
+
---
|
|
451
|
+
|
|
452
|
+
## Suggested order of operations
|
|
453
|
+
|
|
454
|
+
If we were prioritizing for a Round 2 sprint, this is roughly the order I'd go:
|
|
455
|
+
|
|
456
|
+
1. **§1** `/api/cockpit` (½ day, immediate slim perf win)
|
|
457
|
+
2. **§13** Direct API calls for the four Action buttons (1 day, removes the TEMP layer)
|
|
458
|
+
3. **§3** `/api/events` + history feed (1.5 days, replaces three client-side merges)
|
|
459
|
+
4. **§6** Knowledge / Pinned UI consolidation (1 day, big mental-model simplification)
|
|
460
|
+
5. **§2** SSE for cockpit (½–1.5 days, removes the polling loop)
|
|
461
|
+
6. **§10** Origin prompt linkage on completions (1 day, makes "what prompted this?" answerable)
|
|
462
|
+
|
|
463
|
+
Total: ~6 days for a meaningful Round 2 that earns real shipping confidence.
|
|
464
|
+
|
|
465
|
+
The bigger model-merger questions (§4 Work Items + Pipelines, §5 Plan + PRD, §7 Schedule +
|
|
466
|
+
Watch) should wait until at least Round 3 — we should let the slim UX expose where the
|
|
467
|
+
*model* friction actually is, not assume it from the current UX friction.
|