@yemi33/minions 0.1.1574 → 0.1.1576

package/CHANGELOG.md CHANGED
@@ -1,5 +1,13 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.1.1576 (2026-04-27)
4
+
5
+ ### Fixes
6
+ - extend heartbeat for PowerShell/Monitor + teach run_in_background pattern (#1786) (#1787)
7
+
8
+ ### Other
9
+ - docs(SEC-07): RFC for completion.json agent control-plane protocol (#1785)
10
+
3
11
  ## 0.1.1574 (2026-04-27)
4
12
 
5
13
  ### Fixes
@@ -0,0 +1,420 @@
1
+ # RFC: `completion.json` — Structured Agent Control-Plane Protocol
2
+
3
+ > Author: Dallas (Engineer) | Date: 2026-04-27 | Status: **awaiting-approval**
4
+ > Plan: `minions-2026-04-27.json` | Plan item: `P-7a8b9c1d` (SEC-07 / C1)
5
+ > Note: this RFC lives in `docs/` rather than `plans/` because the repo's `.gitignore` excludes `plans/`. The task description explicitly allowed "or similar".
6
+
7
+ ## TL;DR
8
+
9
+ Replace stdout regex-scraping with a per-dispatch `completion.json` written by each agent into an engine-owned location. The engine reads that file post-run as the source of truth for PR links, completion status, failure class, review verdict, decomposition output, declared skills, and the learnings inbox path. The current ` ```completion ` fenced block (already parsed by `parseStructuredCompletion`) becomes the legacy fallback during a one-week dual-mode period and is then retired.
10
+
11
+ This closes a class of spoofable-stdout attacks in which agent output, or content the agent quotes (error logs, file reads, copied PR templates), happens to contain literal completion sentinels, ` ```skill ` blocks, `VERDICT: APPROVE` strings, GitHub/ADO PR URLs, or `===ACTIONS===` markers and is mistaken for a control-plane signal.
12
+
13
+ **No engine code or playbook code is modified by this item.** This RFC is the design; implementation is a follow-up plan.
14
+
15
+ ---
16
+
17
+ ## 1. Problem
18
+
19
+ The engine reconstructs control-plane state from the unstructured stdout of `claude` CLI subprocesses. Eight call sites in `engine/lifecycle.js` and one in `engine/timeout.js` regex-match the agent's output stream:
20
+
21
+ | # | Site | What it scrapes | Spoof surface |
22
+ |---|------|-----------------|---------------|
23
+ | 1 | `syncPrsFromOutput` (`lifecycle.js:704`) | PR numbers + URLs from `tool_result`, assistant text, `result` events; regex on `github.com/.../pull/N`, `dev.azure.com/.../pullrequest/N`, "PR created" phrases | Any agent that quotes a URL (e.g. linking the PR being reviewed, copying a build log, citing an existing issue body) creates a phantom new-PR record |
24
+ | 2 | `parseReviewVerdict` (`lifecycle.js:883`) | `VERDICT: APPROVE` / `VERDICT: REQUEST_CHANGES` regex | An agent that quotes another verdict (e.g. "the previous reviewer wrote VERDICT: APPROVE but I disagree") flips review status |
25
+ | 3 | `isReviewBailout` (`lifecycle.js:907`) | `bailing out` / `already posted` regex | Any tool stderr or quoted error containing those phrases turns a real failure into a silent no-op |
26
+ | 4 | `handleDecompositionResult` (`lifecycle.js:1535`) | First ` ```json ` fenced block parsed for sub-items | An agent that includes a sample JSON block earlier in its reasoning poisons the parent → children mapping |
27
+ | 5 | `extractSkillsFromOutput` (`lifecycle.js:1278`) | All ` ```skill ` fenced blocks | Highest-value attack: agent reads a doc/inbox note containing a ` ```skill ` block and quotes it, the engine writes it to `~/.claude/skills/` |
28
+ | 6 | `parseStructuredCompletion` (`lifecycle.js:1494`) | Last ` ```completion ` fenced block, parsed as `key: value` | An agent that includes a ` ```completion ` block in a quoted file (e.g. another playbook) overrides its own real status |
29
+ | 7 | `classifyFailure` (`lifecycle.js:2096`) | Failure-class regexes on combined stdout/stderr (`max_turns`, `permission denied`, `merge conflict`, …) | An agent that quotes one error class while genuinely failing on another gets the wrong recovery recipe |
30
+ | 8 | `checkForLearnings` (`lifecycle.js:1266`) | Filesystem scan for `notes/inbox/*<agentId>*<date>*` | Not stdout-based, but date-collisions cause cross-task attribution |
31
+ | 9 | `checkTimeouts` (`engine/timeout.js:189-219`) | Tail of `live-output.log` for `"type":"result"` and `[process-exit]` markers — completion-via-output detection for hung dispatches | Lower-risk: this is the claude CLI's own output, not agent-authored content |
32
+
33
+ Sites 1–8 are agent-spoofable, whether intentionally or accidentally. Site 9 is emitted by the claude CLI itself and stays on stdout (see §5.4).
34
+
35
+ The current ` ```completion ` fenced block (Site 6) was a half-step toward structured completion: it gives a status field but is still parsed from stdout, still spoofable by quoted text, and only carries six string fields. `completion.json` is the full step.
36
+
37
+ ## 2. Goals & Non-Goals
38
+
39
+ **Goals.**
40
+ 1. Make every control-plane signal an *intentional* write by the agent to a known location, not a string match against stdout.
41
+ 2. Preserve all data lifecycle.js currently extracts (sites 1–8).
42
+ 3. Cross-platform — works on Windows, macOS, Linux without shell-quoting hazards.
43
+ 4. Migration with a dual-mode read window so already-running agents and queued dispatches don't fail.
44
+ 5. Zero new dependencies — file write + JSON parse, same toolbox as the rest of Minions.
45
+
46
+ **Non-goals.**
47
+ 1. Replacing `live-output.log` for liveness/heartbeat tracking. The CLI's own stream-json output (`"type":"result"`, `subtype:"success"`, etc.) remains the authoritative liveness signal; see §5.4.
48
+ 2. Replacing `safeWrite`/`mutateJsonFileLocked` for engine state files. `completion.json` is one-shot, write-once, agent-authored — no concurrent writers.
49
+ 3. Hardening against a *malicious* agent. An attacker who controls the agent process could write any completion.json. The threat model is *accidental spoofing by quoted text* and *forward compatibility with structured tool outputs*.
50
+
51
+ ## 3. File Location & Write Protocol
52
+
53
+ ### 3.1 Location
54
+
55
+ `engine/tmp/completion-<dispatchId>.json` (absolute path injected via env var `MINIONS_COMPLETION_PATH`).
56
+
57
+ Why not `<worktree>/.minions-completion.json`?
58
+ - Read-only tasks (`explore`, `ask`, `meeting-*`, `plan-to-prd`) run with `cwd=rootDir` — writing into the user's repo would pollute the working tree and fight `.gitignore` per-project.
59
+ - A worktree-local path makes lifecycle bookkeeping race with worktree cleanup (worktrees are removed in `runPostCompletionHooks` itself).
60
+ - Engine-owned `engine/tmp/` is already gitignored (`.gitignore:30`), already used for prompts, PIDs, and sidecar files, and survives the worktree removal that happens later in the same hook.
61
+
62
+ Why per-dispatch ID? `dispatchId` is unique, monotonic, and already on the dispatch item, so concurrent agents on shared branches cannot collide. The engine also sweeps `engine/tmp/` every 10 ticks via `cleanup.js`, so old completion files don't accumulate.
63
+
64
+ ### 3.2 Injection
65
+
66
+ The engine sets the env var pre-spawn, alongside the existing `MINIONS_ADO_TOKEN` injection in `engine.js:865`:
67
+
68
+ ```js
69
+ childEnv.MINIONS_COMPLETION_PATH = path.join(ENGINE_DIR, 'tmp', `completion-${dispatchId}.json`);
70
+ ```
71
+
72
+ Agents read `process.env.MINIONS_COMPLETION_PATH` directly — no template variable, no playbook substitution, no shell quoting. Cross-platform: Node, Python, bash, and PowerShell all read env vars natively.
73
+
74
+ ### 3.3 Write Protocol — Atomic Temp + Rename
75
+
76
+ ```js
77
+ // pseudocode every playbook executes before final exit
78
+ const tmp = process.env.MINIONS_COMPLETION_PATH + '.tmp';
79
+ fs.writeFileSync(tmp, JSON.stringify(completion, null, 2));
80
+ fs.renameSync(tmp, process.env.MINIONS_COMPLETION_PATH);
81
+ ```
82
+
83
+ `fs.renameSync` is atomic on POSIX and on NTFS for same-volume renames (which `engine/tmp/` always is). The engine never observes a partial file — either the rename has happened (full JSON) or it hasn't (engine falls through to legacy stdout parse).
84
+
85
+ The agent must not write the file in pieces. Empty, truncated, or malformed JSON triggers fallback to the legacy stdout parser during dual-mode (§5) and a hard failure post-flip.
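A runnable version of the temp-plus-rename pattern above might look like the following. This is a minimal sketch: the helper name `writeCompletionAtomic` is hypothetical, and the demo writes into a throwaway temp directory that stands in for `engine/tmp/`.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper name; the RFC specifies only the temp-file-then-rename pattern.
function writeCompletionAtomic(targetPath, completion) {
  const tmpPath = `${targetPath}.tmp`;
  // Serialize first so a JSON.stringify failure never leaves a file behind.
  const body = JSON.stringify(completion, null, 2);
  fs.writeFileSync(tmpPath, body, 'utf8');
  // Same-directory rename: a reader sees either no file or the complete file.
  fs.renameSync(tmpPath, targetPath);
}

// Demo against a throwaway directory standing in for engine/tmp/.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'completion-demo-'));
const demoPath = path.join(demoDir, 'completion-demo.json');
writeCompletionAtomic(demoPath, { schemaVersion: 1, status: 'done', summary: 'Demo run.' });
const roundTrip = JSON.parse(fs.readFileSync(demoPath, 'utf8'));
```

In a real agent the target would be `process.env.MINIONS_COMPLETION_PATH` rather than a demo path.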
86
+
87
+ ### 3.4 Cleanup
88
+
89
+ `engine/cleanup.js` (every 10 ticks) gains a sweep over `engine/tmp/completion-*.json` older than 24h. The `runPostCompletionHooks` flow already removes the worktree but leaves `engine/tmp/` files for diagnostics — completion files are tiny (<10 KB typical) so a 24h window is generous and matches the existing temp-prompt retention.
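The 24h sweep could be sketched as below. The function name `sweepCompletionFiles` and the use of file mtime as the age signal are assumptions for illustration; the actual `engine/cleanup.js` code is not shown in this RFC.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24h retention window from the RFC

// Hypothetical function; not the actual engine/cleanup.js implementation.
function sweepCompletionFiles(tmpDir, now = Date.now()) {
  const removed = [];
  for (const name of fs.readdirSync(tmpDir)) {
    // Only touch completion files; prompts, PIDs and sidecars are left alone.
    if (!/^completion-.+\.json$/.test(name)) continue;
    const full = path.join(tmpDir, name);
    if (now - fs.statSync(full).mtimeMs > MAX_AGE_MS) {
      fs.unlinkSync(full);
      removed.push(name);
    }
  }
  return removed;
}

// Demo: one stale file (mtime pushed back two days) and one fresh file.
const sweepDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sweep-demo-'));
const stale = path.join(sweepDir, 'completion-old.json');
const fresh = path.join(sweepDir, 'completion-new.json');
fs.writeFileSync(stale, '{}');
fs.writeFileSync(fresh, '{}');
const twoDaysAgo = new Date(Date.now() - 2 * MAX_AGE_MS);
fs.utimesSync(stale, twoDaysAgo, twoDaysAgo);
const removedNames = sweepCompletionFiles(sweepDir);
```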
90
+
91
+ ## 4. Schema
92
+
93
+ ### 4.1 Top-Level Object
94
+
95
+ ```jsonc
96
+ {
97
+ "schemaVersion": 1, // bump on breaking schema changes
98
+ "dispatchId": "dallas-implement-mohs8s8r7dy6",
99
+ "agentId": "dallas",
100
+ "writtenAt": "2026-04-27T22:42:00.000Z",
101
+
102
+ // ── Always required ──────────────────────────────────────────────────────
103
+ "status": "done", // see §4.2
104
+ "summary": "Added /api/bot endpoint and wired Teams inbox.", // ≤500 chars
105
+ "filesChanged": ["engine/teams.js", "dashboard.js"], // optional, hint only
106
+
107
+ // ── PR control plane (replaces site 1 for fix/implement/verify) ─────────
108
+ "prs": [
109
+ {
110
+ "number": 1234,
111
+ "url": "https://github.com/yemi33/minions/pull/1234",
112
+ "branch": "feat/P-7a8b9c1d-rfc-completion-json",
113
+ "title": "feat: RFC for completion.json control-plane",
114
+ "host": "github", // "github" | "ado"
115
+ "action": "created" // "created" | "updated" | "linked"
116
+ }
117
+ ],
118
+
119
+ // ── Review control plane (replaces sites 2, 3 for review tasks) ─────────
120
+ "review": {
121
+ "verdict": "approve", // "approve" | "request-changes" | "bail"
122
+ "bailReason": null, // string when verdict==="bail"
123
+ "comments": [] // optional inline comments [{file,line,body}]
124
+ },
125
+
126
+ // ── Decomposition (replaces site 4 for decompose tasks) ─────────────────
127
+ "decomposition": {
128
+ "subItems": [
129
+ {
130
+ "id": "P-7a8b9c1d-1",
131
+ "title": "...",
132
+ "type": "implement", // "implement" | "implement:large"
133
+ "estimated_complexity": "medium",
134
+ "depends_on": [],
135
+ "acceptance_criteria": [],
136
+ "scope_boundaries": []
137
+ }
138
+ ]
139
+ },
140
+
141
+ // ── Failure classification (replaces site 7 when status !== "done") ─────
142
+ "failure": {
143
+ "class": "build-failure", // FAILURE_CLASS value from shared.js
144
+ "reason": "npm test exited 1 — 3 failing in test/unit.test.js",
145
+ "details": "..." // optional verbose context (≤2000 chars)
146
+ },
147
+
148
+ // ── Learnings (replaces site 8) ─────────────────────────────────────────
149
+ "learnings": {
150
+ "inboxFile": "notes/inbox/dallas-P-7a8b9c1d-2026-04-27-2242.md"
151
+ },
152
+
153
+ // ── Skills (replaces site 5) ────────────────────────────────────────────
154
+ // Replaces ```skill fenced-block scraping. Each entry is a fully formed
155
+ // skill manifest. The engine never re-parses agent prose for skills.
156
+ "skills": [
157
+ {
158
+ "name": "skill-name-here",
159
+ "description": "When to trigger",
160
+ "scope": "minions", // "minions" | "project"
161
+ "project": null, // string when scope==="project"
162
+ "body": "---\nname: skill-name-here\n---\n\n# Title\n..."
163
+ }
164
+ ],
165
+
166
+ // ── Optional checks (build-and-test playbook + verify) ──────────────────
167
+ "checks": {
168
+ "build": "pass", // "pass" | "fail" | "skipped" | "n/a"
169
+ "tests": "pass",
170
+ "lint": "pass"
171
+ },
172
+
173
+ // ── Meeting output (replaces collectMeetingFindings text scrape) ────────
174
+ // Only set by meeting-investigate / meeting-debate / meeting-conclude.
175
+ "meeting": {
176
+ "round": "investigate", // "investigate" | "debate" | "conclude"
177
+ "content": "<full markdown content>"
178
+ }
179
+ }
180
+ ```
181
+
182
+ ### 4.2 `status` Values
183
+
184
+ | Value | When | Engine action |
185
+ |-------|------|---------------|
186
+ | `done` | Work complete; PR pushed (if applicable) | Mark WI `done`, sync PRD |
187
+ | `partial` | Some progress; agent ran out of turns or hit a known stop point | Auto-retry per `RECOVERY_RECIPES` (`engine/recovery.js`) |
188
+ | `failed` | Hard failure; no recovery attempted by agent | Use `failure.class` to pick recipe |
189
+ | `noop` | Idempotent bail (review already posted, plan already shipped, etc.) | Mark WI `done` without retry, no failure metric |
190
+ | `needs-review` | Agent could not classify; flag for human | Set WI `needs-human-review` |
191
+
192
+ `noop` collapses the current `isReviewBailout` (lifecycle.js:907), the `verify-plan-already-shipped` family of skills, and the "shared-branch redispatch" skill into a single explicit signal. Any agent that detects "the work is already done" returns `status: "noop"` and a one-line `summary` — the engine takes the success path without retry.
193
+
194
+ ### 4.3 Cardinality & Required Fields by Task Type
195
+
196
+ | Task type | Required | Forbidden |
197
+ |-----------|----------|-----------|
198
+ | `implement`, `implement:large` | `status`, `summary`, `prs[]` (≥1 entry when `status` is `done`; may be empty otherwise, e.g. `noop`) | `review`, `decomposition`, `meeting` |
199
+ | `fix` | `status`, `summary`, `prs[]` (≥1 entry when `status` is `done`; may be empty otherwise, e.g. `noop`) | `decomposition`, `meeting` |
200
+ | `review` | `status`, `summary`, `review.verdict` | `decomposition`, `meeting` |
201
+ | `decompose` | `status`, `summary`, `decomposition.subItems` (≥1 if status===done) | `prs`, `review`, `meeting` |
202
+ | `verify` | `status`, `summary`, `prs[]`, `checks` | `review`, `decomposition`, `meeting` |
203
+ | `meeting-*` | `status`, `summary`, `meeting.round`, `meeting.content` | `prs`, `review`, `decomposition` |
204
+ | `plan-to-prd` | `status`, `summary` (PRD file existence is checked separately, see lifecycle.js:1721) | `review`, `decomposition`, `meeting` |
205
+ | `explore`, `ask`, `test`, `docs` | `status`, `summary` | `prs`, `review`, `decomposition`, `meeting` |
206
+
207
+ Validation lives in a new `validateCompletion(obj, taskType)` in `engine/shared.js` and runs in `runPostCompletionHooks` *before* any field is consumed.
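As a rough illustration of how the table above could be enforced, here is a minimal sketch of `validateCompletion`. Only four task types are covered, and the `RULES` layout and the `getPath` helper are assumptions; the `{ ok, reason }` return shape follows the usage shown in §5.1.

```javascript
const STATUSES = new Set(['done', 'partial', 'failed', 'noop', 'needs-review']);

// Subset of the §4.3 table; the rule layout itself is hypothetical.
const RULES = {
  implement: { required: [], nonEmptyWhenDone: 'prs', forbidden: ['review', 'decomposition', 'meeting'] },
  review: { required: ['review.verdict'], nonEmptyWhenDone: null, forbidden: ['decomposition', 'meeting'] },
  decompose: { required: ['decomposition.subItems'], nonEmptyWhenDone: 'decomposition.subItems', forbidden: ['prs', 'review', 'meeting'] },
  explore: { required: [], nonEmptyWhenDone: null, forbidden: ['prs', 'review', 'decomposition', 'meeting'] },
};

// Resolve a dotted path such as "review.verdict" without throwing on gaps.
function getPath(obj, dotted) {
  return dotted.split('.').reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

function validateCompletion(obj, taskType) {
  if (obj === null || typeof obj !== 'object' || Array.isArray(obj)) return { ok: false, reason: 'not a JSON object' };
  if (!STATUSES.has(obj.status)) return { ok: false, reason: `unknown status "${obj.status}"` };
  if (typeof obj.summary !== 'string' || obj.summary === '') return { ok: false, reason: 'missing summary' };
  const rule = RULES[taskType];
  if (!rule) return { ok: false, reason: `no rule for task type "${taskType}"` };
  const missing = rule.required.find((field) => getPath(obj, field) === undefined);
  if (missing) return { ok: false, reason: `missing required field ${missing}` };
  const present = rule.forbidden.find((field) => obj[field] !== undefined);
  if (present) return { ok: false, reason: `forbidden field ${present}` };
  // Cardinality rule: the list must be non-empty only when status is "done".
  if (obj.status === 'done' && rule.nonEmptyWhenDone) {
    const list = getPath(obj, rule.nonEmptyWhenDone);
    if (!Array.isArray(list) || list.length === 0) return { ok: false, reason: `${rule.nonEmptyWhenDone} must have at least one entry` };
  }
  return { ok: true };
}
```

Under these rules an `implement` dispatch with `status: "noop"` and no `prs` passes, while the same payload with `status: "done"` is rejected.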
208
+
209
+ ## 5. Engine Read Path & Migration
210
+
211
+ ### 5.1 New Helper
212
+
213
+ ```js
214
+ // engine/lifecycle.js (new — replaces parseStructuredCompletion as primary)
215
+ function readCompletionFile(dispatchItem) {
216
+ const p = path.join(ENGINE_DIR, 'tmp', `completion-${dispatchItem.id}.json`);
217
+ if (!fs.existsSync(p)) return null;
218
+ try {
219
+ const raw = fs.readFileSync(p, 'utf8');
220
+ const obj = JSON.parse(raw);
221
+ const valid = validateCompletion(obj, dispatchItem.type);
222
+ if (!valid.ok) {
223
+ log('warn', `completion.json for ${dispatchItem.id}: ${valid.reason} — falling back to stdout parse`);
224
+ return null;
225
+ }
226
+ return obj;
227
+ } catch (err) {
228
+ log('warn', `completion.json read failed for ${dispatchItem.id}: ${err.message}`);
229
+ return null;
230
+ }
231
+ }
232
+ ```
233
+
234
+ `runPostCompletionHooks` calls `readCompletionFile` *once* at entry and threads the resulting object through the existing call sites:
235
+
236
+ | Old call (regex on stdout) | New call (read from completion) | Fallback |
237
+ |----------------------------|----------------------------------|----------|
238
+ | `syncPrsFromOutput(stdout)` | `syncPrsFromCompletion(completion.prs)` | If `completion === null`, call old `syncPrsFromOutput`. **Never call both** — duplicate-PR detection on `id`/`url` already exists at `lifecycle.js:833` and would block, but the warn log noise is unwanted. |
239
+ | `parseReviewVerdict(text)` | `completion?.review?.verdict` | Old regex |
240
+ | `isReviewBailout(text)` | `completion?.review?.verdict === 'bail'` or `completion?.status === 'noop'` | Old regex |
241
+ | `handleDecompositionResult(stdout)` | `completion?.decomposition?.subItems` | Old regex on first ` ```json ` block |
242
+ | `extractSkillsFromOutput(stdout)` | `completion?.skills` | Old ` ```skill ` regex (still needed for inbox-file skill scan at `lifecycle.js:2017` — see §5.4) |
243
+ | `classifyFailure(code, stdout, stderr)` | `completion?.failure?.class` if present and valid `FAILURE_CLASS` | Old regex chain |
244
+ | `checkForLearnings(agentId)` | `completion?.learnings?.inboxFile && fs.existsSync(completion.learnings.inboxFile)` | Old date+agent file scan |
245
+ | `parseStructuredCompletion(stdout)` | Subsumed entirely | Kept as deprecated shim during dual-mode (see §5.3) |
246
+ | `collectMeetingFindings(output)` | `completion?.meeting?.content` | Old `parseStreamJsonOutput` text |
247
+
248
+ ### 5.2 Single Source of Truth — Conflict Resolution
249
+
250
+ When `completion.json` is present and validates: **completion.json wins, no fallback merging.** The engine logs a warning if stdout regex would have produced a different signal than completion.json (e.g., regex finds a PR URL but `completion.prs` is empty), but does not act on it. Mixing sources defeats the security goal.
251
+
252
+ When `completion.json` is absent or invalid: full fallback to stdout regex on every site, identical behavior to today.
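The precedence rule can be illustrated for the PR site alone. This is a sketch under assumptions: `resolvePrs` is a hypothetical name, `regexPrs` stands for whatever the legacy stdout scrape would return, and PRs are compared by `url` only.

```javascript
// Hypothetical shapes: `completion` is the validated file contents or null,
// `regexPrs` is the list the legacy stdout scrape would have produced.
function resolvePrs(completion, regexPrs, warn = console.warn) {
  // Absent or invalid file: behave exactly like today's stdout path.
  if (completion === null) return { source: 'stdout', prs: regexPrs };

  const declared = completion.prs || [];
  const declaredUrls = new Set(declared.map((pr) => pr.url));
  const undeclared = regexPrs.filter((pr) => !declaredUrls.has(pr.url));
  if (undeclared.length > 0) {
    // Divergence is logged for visibility but never acted on.
    warn(`stdout mentions ${undeclared.length} PR URL(s) not declared in completion.json; ignoring`);
  }
  return { source: 'completion', prs: declared };
}

// Demo: the agent quoted a PR URL in prose but declared no PRs.
const warnings = [];
const quoted = [{ url: 'https://github.com/yemi33/minions/pull/1234' }];
const fromFile = resolvePrs({ status: 'done', prs: [] }, quoted, (msg) => warnings.push(msg));
const fromStdout = resolvePrs(null, quoted, (msg) => warnings.push(msg));
```

The quoted URL produces one warning and no PR record when the file is present, and is only picked up when the file is missing.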
253
+
254
+ ### 5.3 Phased Migration
255
+
256
+ | Phase | Window | Behavior | Flip criterion |
257
+ |-------|--------|----------|----------------|
258
+ | **0. Preparation** (no flag) | Day 0 | Engine writes `MINIONS_COMPLETION_PATH` env var. Engine reads completion.json *opportunistically* (uses it when present, falls back to regex when absent). Playbooks updated to write the file. `parseStructuredCompletion`'s ` ```completion ` block continues to be parsed and merged with `completion.json` during this phase only — agents who upgrade slowly still work. | — |
259
+ | **1. Dual-mode** | Day 0 → Day 7 | Same as Phase 0, plus new metric `_engine.completionFile.{parsed,fallback,invalid}` per agent in `metrics.json`. Daily KB sweep posts a digest of fallback rates. | ≥95% of dispatches in the last 24h produce a parseable completion.json |
260
+ | **2. Strict** (gated by `engine.requireCompletionFile = false` → `true`) | Day 7 → Day 10 | When the flag is `true`, missing/invalid completion.json marks the dispatch `failed` with `failure.class = 'config-error'` (no retry, see `RECOVERY_RECIPES`). Default still `false`. | All permanent agents observed clean for 3 consecutive days |
261
+ | **3. Default flip** | Day 10 | `engine.requireCompletionFile` default becomes `true`. Stdout regex parsers (`syncPrsFromOutput`, `parseReviewVerdict`, etc.) become deprecated shims, registered in `docs/deprecated.json` with a `cleanup` date 3 days out (per the existing `/cleanup-deprecated` skill convention). | — |
262
+ | **4. Removal** | Day 13 | Stdout regex parsers deleted; ` ```completion ` block support removed. Only `completion.json` is read. | — |
263
+
264
+ Day 0 is the day the implementation PR merges, not the day this RFC is approved.
265
+
266
+ The flag name `engine.requireCompletionFile` mirrors existing engine flags (`autoFixBuilds`, `evalLoop`, `adoPollEnabled`).
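The Phase 1 flip criterion (at least 95% of dispatches in the last 24h producing a parseable file) reduces to a small ratio check. The sketch below assumes the three counters are disjoint and already windowed to 24h; the function names are hypothetical.

```javascript
// Hypothetical counter shape modelled on _engine.completionFile.{parsed,fallback,invalid}.
// Assumption: each dispatch increments exactly one of the three counters.
function parsedRate({ parsed = 0, fallback = 0, invalid = 0 }) {
  const total = parsed + fallback + invalid;
  return total === 0 ? 0 : parsed / total; // no data counts as "not ready"
}

function meetsDualModeCriterion(last24h, threshold = 0.95) {
  return parsedRate(last24h) >= threshold;
}

const ready = meetsDualModeCriterion({ parsed: 96, fallback: 3, invalid: 1 });
const notReady = meetsDualModeCriterion({ parsed: 90, fallback: 8, invalid: 2 });
```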
267
+
268
+ ### 5.4 What Does *Not* Switch
269
+
270
+ These paths stay on stdout / live-output.log:
271
+
272
+ 1. **`engine/timeout.js` completion-via-output detection** (`timeout.js:189-219`). The signal there is the claude CLI's own `"type":"result"` event, emitted by the binary even if the agent crashed before writing completion.json. Removing it would mean orphan/hung agents are never reaped. This stays as the heartbeat mechanism.
273
+ 2. **Per-tick liveness via `live-output.log` mtime** (`timeout.js:178`). Same reason — completion.json is written once at exit, not as a heartbeat.
274
+ 3. **`parseStreamJsonOutput` for `resultSummary`** in `parseAgentOutput` (`lifecycle.js:1483`). This extracts the human-readable summary from the CLI's stream-json. Even after the flip, `completion.summary` is *also* extracted, but the stream-json text remains the canonical "what did the agent say last" — used in dashboards, agent history, Teams notifications. The two coexist: `completion.summary` is for routing decisions, the stream-json text is for display.
275
+ 4. **Inbox-file skill scan** (`lifecycle.js:2013-2024`). Some agents write skills into their inbox findings file (a deliberate human-discoverable artifact). The completion file deprecates inline ` ```skill ` blocks in stdout, but the inbox file scan is opt-in and stays — it's a different surface (a real file the agent intentionally wrote, not regex-scraped from stdout).
276
+
277
+ ### 5.5 Backward Compatibility
278
+
279
+ - Agents that fail to write completion.json during Phase 1–2 silently fall back. Phase 3 fails them, but the deprecation tracker (`docs/deprecated.json`) flags this 3 days in advance and the daily fallback-rate digest gives operators visibility.
280
+ - The legacy ` ```completion ` block in `playbooks/fix.md` and `playbooks/implement-shared.md` is removed in Phase 0 (replaced by completion.json instructions). The `parseStructuredCompletion` reader is removed in Phase 4.
281
+ - `docs/deprecated.json` entries (added in the implementation PR, not this RFC):
282
+
283
+ ```json
284
+ {
285
+ "id": "stdout-regex-pr-scrape",
286
+ "summary": "syncPrsFromOutput stdout regex superseded by completion.json prs[]",
287
+ "deprecated": "Day 10",
288
+ "reason": "Spoofable surface — agents that quote PR URLs created phantom PR records.",
289
+ "locations": [
290
+ "engine/lifecycle.js:704 syncPrsFromOutput",
291
+ "engine/lifecycle.js:734 createdPattern, createdIdPattern",
292
+ "engine/lifecycle.js:748 prHeaderPattern (inbox scan)"
293
+ ],
294
+ "cleanup": "Delete syncPrsFromOutput; keep syncPrsFromCompletion."
295
+ }
296
+ ```
297
+
298
+ (One entry per site 1–7 above; site 8 is unaffected.)
299
+
300
+ ## 6. Threat Model & What's Closed
301
+
302
+ | Spoof vector | Closed? | How |
303
+ |--------------|---------|-----|
304
+ | Agent quotes a PR URL it didn't create → phantom PR record | ✅ | `prs[]` is an explicit list; quoted URLs in prose are ignored |
305
+ | Agent quotes "VERDICT: APPROVE" while disagreeing → flipped review | ✅ | `review.verdict` is explicit |
306
+ | Tool stderr contains "bailing out" → real failure becomes silent no-op | ✅ | `noop` is opt-in |
307
+ | Agent reads a doc with a ` ```skill ` block and quotes it → arbitrary skill written to `~/.claude/skills/` | ✅ | `skills[]` is explicit; ` ```skill ` regex removed in Phase 4 |
308
+ | Agent quotes a previous ` ```completion ` block → wrong status | ✅ | ` ```completion ` parser removed in Phase 4 |
309
+ | Agent quotes one error class while failing on another → wrong recovery recipe | ✅ | `failure.class` is explicit; if missing or invalid, falls through to `FAILURE_CLASS.UNKNOWN` (safe default) |
310
+ | Decompose agent emits a ` ```json ` block earlier in reasoning → corrupted children | ✅ | `decomposition.subItems` is explicit |
311
+ | Hung/orphaned agent never reaches the write site → no completion.json | ⚠️ | Engine's existing live-output.log heartbeat reaper (`timeout.js`) catches this; dispatch is marked failed via stdout completion-via-output signal |
312
+ | Malicious agent writes a fake completion.json (e.g. claims `noop` to avoid retry) | ❌ | Out of scope — see §2 non-goals. An adversarial agent owns its own write path regardless. |
313
+
314
+ The key shift: **the agent's intent is now in a place no quoted text can reach.** Stdout becomes display-only.
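The quoted-verdict row can be reproduced in a few lines. The regex below is illustrative only (the real pattern in `parseReviewVerdict` may differ), and the stdout string is the example from §1.

```javascript
// Illustrative regex; not copied from parseReviewVerdict.
const VERDICT_RE = /VERDICT:\s*(APPROVE|REQUEST_CHANGES)/;

// The reviewer disagrees with an earlier verdict but quotes it verbatim.
const stdout = 'The previous reviewer wrote VERDICT: APPROVE but I disagree; the tests fail.';
const completion = {
  status: 'done',
  summary: 'Requested changes.',
  review: { verdict: 'request-changes' },
};

const scraped = (stdout.match(VERDICT_RE) || [])[1]; // what a stdout scrape sees
const declared = completion.review.verdict;          // what the agent actually meant
```

The scrape yields `APPROVE` while the declared field says `request-changes`; reading only the declared field removes the ambiguity.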
315
+
316
+ ## 7. Playbook Changes
317
+
318
+ ### 7.1 Centralized Instruction in `shared-rules.md`
319
+
320
+ `playbooks/shared-rules.md` is auto-injected into every playbook (per `engine/playbook.js`). The completion-write block lives there once, so per-playbook diffs are minimal.
321
+
322
+ ```markdown
323
+ ## Completion Protocol — Required Before Exit
324
+
325
+ Before your final message, write a JSON object to the absolute path in the
326
+ `MINIONS_COMPLETION_PATH` environment variable. The engine reads this file
327
+ as the source of truth — fields not declared here are NOT detected even if
328
+ they appear in your stdout.
329
+
330
+ Schema reference: docs/rfc-completion-json.md §4.
331
+
332
+ Required for every task:
333
+ status, summary
334
+
335
+ Required for your task type (see §4.3):
336
+ - implement / implement:large / fix / verify → prs[]
337
+ - review → review.verdict
338
+ - decompose → decomposition.subItems
339
+ - meeting-* → meeting.round, meeting.content
340
+
341
+ Write atomically — temp file + rename:
342
+
343
+ # From a Bash tool call:
344
+ cat > "$MINIONS_COMPLETION_PATH.tmp" <<'JSON'
345
+ { "schemaVersion": 1, "status": "done", "summary": "...", "prs": [ ... ] }
346
+ JSON
347
+ mv "$MINIONS_COMPLETION_PATH.tmp" "$MINIONS_COMPLETION_PATH"
348
+
349
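+ # From PowerShell (WriteAllText emits UTF-8 without a BOM; Windows PowerShell 5.1's Out-File -Encoding UTF8 prepends one, which JSON.parse rejects):
350
+ [System.IO.File]::WriteAllText("$env:MINIONS_COMPLETION_PATH.tmp", $json)
351
+ Move-Item "$env:MINIONS_COMPLETION_PATH.tmp" $env:MINIONS_COMPLETION_PATH -Force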
352
+
353
+ If you cannot write completion.json (e.g. you bailed before any work), the
354
+ engine falls back to stdout parsing during the dual-mode period. After the
355
+ flip date, missing/invalid completion.json marks your dispatch failed.
356
+
357
+ Do NOT include sensitive data (tokens, API keys) — completion.json is read
358
+ by the engine and may surface in dashboard views and Teams notifications.
359
+ ```
360
+
361
+ ### 7.2 Per-Playbook Removals
362
+
363
+ The current ` ```completion ` block in `playbooks/fix.md:85-93` and `playbooks/implement-shared.md:86-93` is removed in Phase 0 (the new shared-rules block supersedes it).
364
+
365
+ `playbooks/decompose.md` already has a dedicated ` ```json ` block instruction; it is replaced with a one-liner that references the `decomposition.subItems` field of completion.json.
366
+
367
+ `playbooks/review.md` already documents `VERDICT: APPROVE`/`REQUEST_CHANGES`. Phase 0 keeps the human-readable verdict in stdout (for inline dashboard display) AND requires `review.verdict` in completion.json. Phase 4 makes completion.json the only source.
368
+
369
+ `playbooks/meeting-investigate.md`, `meeting-debate.md`, `meeting-conclude.md` add `meeting.round` + `meeting.content` to completion.json. The transcript inbox write at `meeting.js:365` continues to use the same content.
370
+
371
+ ### 7.3 No-PR Tasks
372
+
373
+ `explore`, `ask`, `test`, `docs`, `plan-to-prd`, and the read-only legs of `meeting-*` simply omit `prs[]`. They still write completion.json with `status` + `summary`. This makes "I had nothing to push" an explicit signal instead of inferred from "no PR URL found in stdout" (which today triggers the auto-retry-then-needs-review chain at `lifecycle.js:1943-1984`).
374
+
375
+ ## 8. Validation & Testing
376
+
377
+ The implementation PR ships:
378
+
379
+ 1. **Unit tests** in `test/unit.test.js`:
380
+ - `validateCompletion` accepts valid shapes for each task type.
381
+ - Rejects missing required fields, wrong cardinality, unknown enum values.
382
+ - Rejects payloads >256 KB (DoS guard).
383
+ 2. **Behavioral tests**:
384
+ - Stub `engine/tmp/completion-<id>.json` with each task-type fixture; assert `runPostCompletionHooks` updates work-items.json, pull-requests.json, and PRD JSON consistently.
385
+ - Stub *no* completion.json + the same stdout the regex path expects; assert identical end state (regression gate during Phase 0–2).
386
+ - Stub an invalid JSON; assert fallback to regex path with a `warn` log entry.
387
+ 3. **Migration regression**: replay 50 random completed dispatches from `engine/dispatch.json` history, assert that the regex path and the (synthetic) completion.json path produce the same `WI_STATUS` and same set of PR records.
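The 256 KB guard and the invalid-JSON fallback from the test list could be exercised with a reader along these lines. This is a sketch: `readCompletionGuarded` is a hypothetical name, and checking the size via `fs.statSync` before reading is one possible placement of the guard, not something the RFC specifies.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const MAX_COMPLETION_BYTES = 256 * 1024;

// Hypothetical pre-parse check; where the real guard lives is not stated in the RFC.
function readCompletionGuarded(filePath) {
  const { size } = fs.statSync(filePath);
  if (size > MAX_COMPLETION_BYTES) {
    return { ok: false, reason: `payload is ${size} bytes (limit ${MAX_COMPLETION_BYTES})` };
  }
  try {
    return { ok: true, value: JSON.parse(fs.readFileSync(filePath, 'utf8')) };
  } catch (err) {
    return { ok: false, reason: `invalid JSON: ${err.message}` };
  }
}

// Fixtures: one oversized file, one truncated file, one valid file.
const guardDir = fs.mkdtempSync(path.join(os.tmpdir(), 'guard-demo-'));
const bigPath = path.join(guardDir, 'completion-big.json');
const badPath = path.join(guardDir, 'completion-bad.json');
const okPath = path.join(guardDir, 'completion-ok.json');
fs.writeFileSync(bigPath, JSON.stringify({ summary: 'x'.repeat(MAX_COMPLETION_BYTES) }));
fs.writeFileSync(badPath, '{"status": "done"');
fs.writeFileSync(okPath, JSON.stringify({ status: 'noop', summary: 'Already shipped.' }));
```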
388
+
389
+ ## 9. Open Questions
390
+
391
+ 1. **Schema versioning & forward-compat.** `schemaVersion: 1` is declared, but there is no upgrade story yet. Recommend: add a `validateCompletion(obj, taskType, { strict: false })` mode where unknown top-level fields are tolerated with a warn log, so v2 fields don't break v1 readers.
392
+ 2. **Multi-PR dispatches** (e.g. cross-repo features). The `prs[]` array supports this natively. The current regex path *also* supports this but mis-attributes when URLs from multiple repos appear in close proximity. completion.json fixes that for free.
393
+ 3. **Resumed dispatches.** If an agent is killed mid-task and resumed via `--resume` (engine.js:1104), and the resumed run writes a *new* completion.json, the rename clobbers the original. This is the intended behavior — only the final run's completion is read — but worth a note in the implementation PR.
394
+ 4. **CC and doc-chat**. Command-Center and doc-chat use the `direct: true` LLM path that bypasses spawn-agent.js. They don't use dispatch IDs the same way. **Recommendation: out of scope.** CC's ` ===ACTIONS=== ` block lives in dashboard.js and operates on a different threat model (user-typed prompts, not agent-quoted text). Don't bundle it.
395
+
396
+ ## 10. Acceptance Criteria for the Implementation Plan
397
+
398
+ When this RFC is approved and the follow-up plan is drafted, the implementation must satisfy:
399
+
400
+ 1. `MINIONS_COMPLETION_PATH` is set on every spawned agent process (`engine.js` and `engine/spawn-agent.js`).
401
+ 2. `engine/shared.js` exports `validateCompletion(obj, taskType)`.
402
+ 3. `engine/lifecycle.js` adds `readCompletionFile(dispatchItem)` and threads its return value through `runPostCompletionHooks`.
403
+ 4. Every regex-based site listed in §1 has a completion-aware path with stdout fallback (Phase 0–2) or no fallback (Phase 4).
404
+ 5. `playbooks/shared-rules.md` carries the §7.1 instruction; `fix.md` / `implement-shared.md` drop their inline ` ```completion ` block.
405
+ 6. `engine.requireCompletionFile` is added to `ENGINE_DEFAULTS` (`shared.js`) defaulting to `false`.
406
+ 7. Metrics: `_engine.completionFile.{parsed,fallback,invalid}` counters added per agent.
407
+ 8. Tests per §8 land green; `npm test` passes with 0 new failures.
408
+ 9. `docs/deprecated.json` gains entries for each retired regex site, with the cleanup date set 3 days after the Phase 3 default flip (that is, on the Phase 4 removal date).
409
+
410
+ ## 11. References
411
+
412
+ - `engine/lifecycle.js:1494` — current `parseStructuredCompletion` (legacy ` ```completion ` block reader, half-step toward this design).
413
+ - `engine/lifecycle.js:704` — `syncPrsFromOutput` (Site 1, primary attack surface).
414
+ - `engine/lifecycle.js:1278` — `extractSkillsFromOutput` (Site 5, highest-value attack surface).
415
+ - `engine/timeout.js:189` — completion-via-output detection (stays on stdout per §5.4).
416
+ - `engine.js:865` — existing env-var injection pattern (`MINIONS_ADO_TOKEN`).
417
+ - `engine/cleanup.js` — temp-file sweep (extended in implementation PR).
418
+ - `playbooks/shared-rules.md` — auto-injected into every playbook.
419
+ - `docs/deprecated.json` — deprecation tracker driving `/cleanup-deprecated`.
420
+ - `docs/design-state-storage.md` — precedent for design-doc-in-`docs/` rather than `plans/`.
package/engine/timeout.js CHANGED
@@ -273,6 +273,26 @@ function checkTimeouts(config) {
    isBlocking = true;
    blockingTool = 'Bash';
  }
+ // PowerShell tool call — Windows-native shell with same explicit-timeout
+ // semantics as Bash (input.timeout, max 600s). Required for projects that
+ // build via PowerShell on Windows (gradlew.bat, MSBuild, dotnet test) where
+ // the cold-start phase produces no stdout for several minutes (#1786).
+ if (name === 'PowerShell') {
+   const psTimeout = input.timeout || 120000;
+   blockingTimeout = Math.max(itemHeartbeat, psTimeout + 60000);
+   isBlocking = true;
+   blockingTool = 'PowerShell';
+ }
+ // Monitor tool call — blocks waiting for stdout-line notifications from a
+ // background process started via Bash with run_in_background. Between
+ // notifications the call produces no output, so the heartbeat monitor
+ // must extend timeout. No fixed timeout on Monitor — match Agent (30min)
+ // since both are inherently long-running waits (#1786).
+ if (name === 'Monitor') {
+   blockingTimeout = Math.max(itemHeartbeat, 1800000); // 30min for background process waits
+   isBlocking = true;
+   blockingTool = 'Monitor';
+ }
  // Agent (subagent) tool call — parent waits silently for child to complete
  if (name === 'Agent') {
    blockingTimeout = Math.max(itemHeartbeat, 1800000); // 30min for subagents
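To make the effect of the new branches concrete, here is a small standalone sketch that mirrors the arithmetic in the hunk above. `blockingTimeoutFor` and the constant names are hypothetical and exist only for this illustration; the numeric values come from the diff, and `itemHeartbeat` is assumed to be in milliseconds because it is compared directly against `psTimeout + 60000`.

```javascript
// Hypothetical helper mirroring the PowerShell / Monitor / Agent branches.
const DEFAULT_SHELL_TIMEOUT_MS = 120000; // used when input.timeout is absent
const GRACE_MS = 60000;                  // slack added on top of the tool timeout
const LONG_WAIT_MS = 1800000;            // 30 min for Monitor and Agent

function blockingTimeoutFor(name, input, itemHeartbeat) {
  if (name === 'PowerShell') {
    const psTimeout = input.timeout || DEFAULT_SHELL_TIMEOUT_MS;
    return Math.max(itemHeartbeat, psTimeout + GRACE_MS);
  }
  if (name === 'Monitor' || name === 'Agent') {
    return Math.max(itemHeartbeat, LONG_WAIT_MS);
  }
  return itemHeartbeat; // assumption: other tools keep the plain heartbeat
}

const heartbeat = 300000; // 5 min, the default quoted in the playbooks
console.log(blockingTimeoutFor('PowerShell', {}, heartbeat));                  // 300000
console.log(blockingTimeoutFor('PowerShell', { timeout: 600000 }, heartbeat)); // 660000
console.log(blockingTimeoutFor('Monitor', {}, heartbeat));                     // 1800000
```

Note that a PowerShell call without an explicit `timeout` gains nothing here: 120000 + 60000 is below the 300000 ms heartbeat, so `Math.max` returns the heartbeat unchanged. Only an explicit `timeout` above 240000 ms extends the window.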
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@yemi33/minions",
-   "version": "0.1.1574",
+   "version": "0.1.1576",
    "description": "Multi-agent AI dev team that runs from ~/.minions/ — five autonomous agents share a single engine, dashboard, and knowledge base",
    "bin": {
      "minions": "bin/minions.js"
@@ -43,6 +43,8 @@ cargo build
  If the build **fails**, report the errors clearly and stop. Do NOT attempt to fix the code.

+ > ⚠️ **Cold builds are silent for minutes** (Gradle daemon spin-up, dotnet restore, fresh `npm install`). Run them via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` on the Bash call (max 600000 ms). Without one of these, the heartbeat monitor will kill the agent at ~5 min of silence. See **Long-Running Build / Test Commands** below.
+
  ### 4. Run tests

  ```bash
package/playbooks/fix.md CHANGED
@@ -47,6 +47,8 @@ Before pushing, verify the fix doesn't break anything:
  4. If the build fails 3 times, report the errors in your PR comment and stop.
  5. Do NOT push code that breaks existing tests or the build.

+ > ⚠️ **Long builds (Gradle, MSBuild, dotnet, fresh `npm install`)**: any command that may stay silent for more than ~4 minutes will be killed by the heartbeat monitor. Run it via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` (max 600000 ms). See **Long-Running Build / Test Commands** below for the full pattern.
+
  ## Push & Comment on PR

  Only after build and tests pass:
@@ -65,6 +65,8 @@ After implementation, verify everything works:
  5. If the build fails 3 times, report the errors in your findings and stop
  6. Do NOT push code with a broken build or failing tests that you introduced

+ > ⚠️ **Long builds (Gradle, MSBuild, dotnet, fresh `npm install`)**: any command that may stay silent for more than ~4 minutes will be killed by the heartbeat monitor. Run it via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` (max 600000 ms). See **Long-Running Build / Test Commands** below for the full pattern.
+
  ## Push

  Only after build and tests pass:
@@ -59,6 +59,8 @@ Build and test before pushing:
  4. **Run any other checks** the repo defines (linting, type checking, formatting).
  5. Do NOT push code with a broken build or failing tests that you introduced.

+ > ⚠️ **Long builds (Gradle, MSBuild, dotnet, fresh `npm install`)**: any command that may stay silent for more than ~4 minutes will be killed by the heartbeat monitor. Run it via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` (max 600000 ms). See **Long-Running Build / Test Commands** below for the full pattern.
+
  ## Push & Create PR

  Only after build and tests pass:
@@ -49,6 +49,37 @@ Your context window may be compacted or summarized mid-task by Claude's automati
  Do **not** create a skill for one-off bug fixes, isolated command outputs, obvious repo facts, or anything already covered by existing docs/playbooks/skills.
  - Do TDD where it makes sense — write failing tests first, then implement, then verify tests pass. Especially for bug fixes (write a test that reproduces the bug) and new utility functions.

+ ## Long-Running Build / Test Commands (Heartbeat Safety)
+
+ The engine kills agents that produce no stdout for `heartbeatTimeout` (default **300s / 5 min**). A blocking shell call with zero stdout for that long is treated as hung. Cold builds (Gradle daemon spin-up, MSBuild restore, fresh `npm install`, `cargo build`) routinely exceed this with no intermediate output.
+
+ **Two approved patterns — pick one when a command may exceed 4 minutes of silence:**
+
+ ### Pattern A — Stream output via background process + Monitor (preferred)
+
+ ```
+ 1. Bash({ command: "./gradlew test", run_in_background: true }) # returns a bash_id immediately
+ 2. Monitor({ bash_id: "<id>" }) # streams each stdout line as a notification
+ ```
+
+ Why: each line that the build emits arrives as a notification, which resets the heartbeat. You see live progress in the dashboard. The Monitor call itself is recognised by the engine as a blocking tool (heartbeat extended ~30 min).
+
+ ### Pattern B — Single Bash call with explicit `timeout`
+
+ ```
+ Bash({ command: "./gradlew test", timeout: 600000 }) # max 600000 = 10 min
+ ```
+
+ The engine reads `input.timeout` from the tool call and extends the heartbeat to `timeout + 60s` for that turn. **The extension is opt-in** — without an explicit `timeout`, the agent is killed at `heartbeatTimeout`. PowerShell tool follows the same rule.
+
+ ### What NOT to do
+
+ - Do NOT run `./gradlew`, `mvn`, `dotnet test`, or any cold-cache build as a default `Bash` call (no `timeout`, no `run_in_background`). It will hit the 120s Bash default, then the 300s heartbeat, and the engine will kill you.
+ - Do NOT loop `sleep` to "wait it out" — sleep produces no stdout and looks identical to a hang.
+ - Do NOT pipe through `tee` thinking that helps — heartbeat reads agent stdout, not the underlying file.
+
+ If you don't know how long a command takes, default to **Pattern A** — there is no downside to streaming.
+
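The difference between streaming and waiting silently can be illustrated with a toy heartbeat tracker. This is a sketch under stated assumptions: the `Heartbeat` class and its method names are hypothetical and are not the engine's API. It models only the rule described above, namely that any stdout line resets the clock and silence beyond `heartbeatTimeout` counts as a hang.

```javascript
// Hypothetical heartbeat tracker; illustrative only.
class Heartbeat {
  constructor(timeoutMs, startedAt) {
    this.timeoutMs = timeoutMs;
    this.lastOutputAt = startedAt;
  }
  // Any stdout line (for example a Monitor notification) is a sign of life.
  onOutput(now) {
    this.lastOutputAt = now;
  }
  // Expired once the silence since the last output exceeds the timeout.
  isExpired(now) {
    return now - this.lastOutputAt > this.timeoutMs;
  }
}

const MIN = 60000;
const silent = new Heartbeat(5 * MIN, 0);   // a sleep loop: no output at all
const streamed = new Heartbeat(5 * MIN, 0); // Pattern A: a line every 2 minutes
for (let t = 2 * MIN; t <= 6 * MIN; t += 2 * MIN) streamed.onOutput(t);

console.log(silent.isExpired(6 * MIN));   // true
console.log(streamed.isExpired(6 * MIN)); // false
```

Both runs last six minutes, but only the silent one crosses the five-minute threshold, which is why a `sleep` loop is indistinguishable from a hang while a streamed build is not.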
  ## Checking PR and Build Status

  When asked to check build status, CI results, or review state for a PR:
@@ -58,6 +58,8 @@ For each project worktree:
  If a build or test fails, **do NOT fix it** — report the exact error and continue with other projects.

+ > ⚠️ **Long builds (Gradle, MSBuild, dotnet, fresh `npm install`)**: any command that may stay silent for more than ~4 minutes will be killed by the heartbeat monitor. Run it via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` (max 600000 ms). See **Long-Running Build / Test Commands** below for the full pattern.
+
  ## Step 4: Start the Application (if applicable)

  Determine if the project has a **runnable application** (web server, API, desktop app, mobile emulator, etc.) by reading its documentation and build config. For mobile apps, check if an emulator/simulator can be launched or if building an APK/IPA is the appropriate verification step.