npm - @yemi33/minions - Versions diffs - 0.1.1633 → 0.1.1635 - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

package/CHANGELOG.md +10 -0
package/README.md +11 -11
package/dashboard.js +46 -0
package/docs/auto-discovery.md +17 -15
package/docs/blog-first-successful-dispatch.md +7 -10
package/docs/engine-restart.md +8 -11
package/docs/human-vs-automated.md +3 -4
package/docs/pr-review-fix-loop.md +1 -1
package/docs/rfc-completion-json.md +5 -5
package/engine/copilot-models.json +1 -1
package/engine/lifecycle.js +1 -1
package/engine/playbook.js +2 -1
package/engine/queries.js +4 -4
package/engine/shared.js +4 -12
package/engine/timeout.js +59 -168
package/engine.js +11 -42
package/package.json +1 -1
package/playbooks/build-and-test.md +22 -139
package/playbooks/docs.md +113 -0
package/playbooks/fix.md +1 -1
package/playbooks/implement-shared.md +1 -1
package/playbooks/implement.md +3 -7
package/playbooks/shared-rules.md +4 -45
package/playbooks/test.md +17 -40
package/playbooks/verify.md +29 -141
package/playbooks/work-item.md +1 -0
package/prompts/cc-system.md +2 -0

package/CHANGELOG.md CHANGED

@@ -1,5 +1,15 @@
 # Changelog
+## 0.1.1635 (2026-04-30)
+### Other
+- Make agent liveness process-based
+## 0.1.1634 (2026-04-30)
+### Features
+- build-and-test CC action + docs playbook
 ## 0.1.1633 (2026-04-30)
 ### Features

package/README.md CHANGED

@@ -227,7 +227,7 @@ You can also run scripts directly: `node ~/.minions/engine.js start`, `node ~/.m
 - **Pipelines** — multi-stage workflows chaining tasks, meetings, plans, and more. Cron triggers or manual. Artifacts flow between stages.
 - **Eval loop** — after implementation, auto-dispatches review → fix cycles (configurable iterations and cost ceiling per work item)
 - **Pinned notes** — critical context pinned to all agent prompts via `pinned.md`
-- **Heartbeat monitoring** — detects dead/hung agents via output file activity, not just timeouts
+- **Process-based liveness** — live agents may be quiet; output staleness is only used for orphan cleanup after process tracking is lost
 - **Auto-cleanup** — stale temp files, orphaned worktrees, zombie processes cleaned every 10 minutes
 ## Dashboard
@@ -403,7 +403,7 @@ No bash or shell involved — Node spawns Node directly. Dependency branches are
 - **MCP servers** — inherited from `~/.claude.json` (no extra config needed)
 - **Full tool access** — all built-in tools plus all MCP tools
 - **Permission mode** — `bypassPermissions` (no interactive prompts)
-- **Output format** — `stream-json` (real-time streaming for live dashboard + heartbeat)
+- **Output format** — `stream-json` (real-time streaming for live dashboard + completion recovery)
 ### Post-Completion
@@ -462,15 +462,15 @@ Playbooks are fully customizable — edit the shared templates in `playbooks/` t
 ## Health Monitoring
-### Heartbeat Check (every tick)
+### Liveness Check (every tick)
-Uses `live-output.log` file modification time as a heartbeat:
-- **Process alive + recent output** → healthy, keep running
-- **Process alive + in blocking tool call** → extended timeout (matches tool's timeout + grace period)
-- **Process alive + silent >5min** → hung, kill and mark failed
-- **No process + silent >5min** → orphaned (engine restarted), mark failed
+Agent liveness mirrors a normal CLI process:
+- **Tracked process alive** → keep running, even if stdout/stderr are quiet
+- **Tracked process exceeds `agentTimeout`** → stop and mark timed out
+- **Tracked process exits** → handle normal completion/failure
+- **No tracked process + stale output** → treat as an orphan from engine restart/process loss and mark failed
-Agents can run for hours as long as they're producing output. The `heartbeatTimeout` (default 5min) only triggers on silence. When an agent is in a blocking tool call (e.g., `TaskOutput` with `block:true`, `Bash` with long timeout), the engine detects this from the live output and extends the timeout automatically.
+Builds, dependency installs, tests, and other CLI commands can legitimately produce no output for long periods. The engine does not infer "hung" from stdout/stderr silence while it still has a live process handle. `heartbeatTimeout` is only the stale-orphan grace window used when the engine has lost process tracking.
 ### Automated Cleanup (every 10 ticks)
@@ -532,7 +532,7 @@ Engine behavior is controlled via `config.json`. Key settings:
 | `tickInterval` | 60000 (1min) | Milliseconds between engine ticks |
 | `maxConcurrent` | 5 | Max agents running simultaneously |
 | `agentTimeout` | 18000000 (5h) | Max total agent runtime |
-| `heartbeatTimeout` | 300000 (5min) | Kill agents silent longer than this |
+| `heartbeatTimeout` | 300000 (5min) | Stale-orphan grace after process tracking is lost |
 | `maxTurns` | 100 | Max Claude CLI turns per agent session |
 | `inboxConsolidateThreshold` | 5 | Inbox files needed before consolidation |
 | `worktreeCreateTimeout` | 300000 (5min) | Timeout for each `git worktree add` attempt |
@@ -649,7 +649,7 @@ To move to a new machine: `npm install -g @yemi33/minions && minions init --forc
     pipeline.js          <- Multi-stage pipeline orchestration
     meeting.js           <- Meeting creation, rounds, conclusion
     cleanup.js           <- Worktree + temp file cleanup
-    timeout.js           <- Agent timeout and heartbeat detection
+    timeout.js           <- Agent timeout and orphan detection
     cooldown.js          <- Dispatch cooldown with exponential backoff
     github.js            <- GitHub PR polling, comment polling, reconciliation
     routing.js           <- Agent routing and temp agent management
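
Note on the README hunks above: the four rules under "Liveness Check" read as a small decision table. The following is a minimal sketch of that table, not the package's `engine/timeout.js`. The helper name `classifyLiveness`, the snapshot fields (`tracked`, `alive`, `runtimeMs`, `outputAgeMs`), and the `'pending'` result for an untracked dispatch whose output is still fresh are all assumptions made for illustration. Only the two default values come from the config table in the diff.

```javascript
// Hypothetical sketch of the README's liveness rules; names are illustrative only.
const DEFAULTS = {
  agentTimeout: 18000000,   // 5h, from the README config table
  heartbeatTimeout: 300000, // 5min, stale-orphan grace
};

/**
 * Classify one active dispatch.
 * @param {{tracked: boolean, alive: boolean, runtimeMs: number, outputAgeMs: number}} d
 * @param {{agentTimeout: number, heartbeatTimeout: number}} cfg
 * @returns {'running'|'timed-out'|'exited'|'orphaned'|'pending'}
 */
function classifyLiveness(d, cfg = DEFAULTS) {
  if (d.tracked) {
    // A tracked process is judged by process state and runtime, never by output silence.
    if (!d.alive) return 'exited';
    if (d.runtimeMs > cfg.agentTimeout) return 'timed-out';
    return 'running';
  }
  // No process handle: output staleness is the only remaining (indirect) evidence.
  return d.outputAgeMs > cfg.heartbeatTimeout ? 'orphaned' : 'pending';
}
```

Under these assumptions, a tracked process that has been silent for an hour is still `'running'`, while the same silence with no process handle is `'orphaned'`.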

package/dashboard.js CHANGED

@@ -25,6 +25,9 @@ const ado = require('./engine/ado');
 const gh = require('./engine/github');
 const issues = require('./engine/issues');
 const watchesMod = require('./engine/watches');
+const routing = require('./engine/routing');
+const playbook = require('./engine/playbook');
+const dispatchMod = require('./engine/dispatch');
 const os = require('os');
 const { safeRead, safeReadDir, safeWrite, safeJson, safeJsonObj, safeJsonArr, safeUnlink, mutateJsonFileLocked, mutateWorkItems, getProjects: _getProjects, DONE_STATUSES, WI_STATUS, reopenWorkItem } = shared;
@@ -1238,6 +1241,49 @@ async function executeCCActions(actions) {
           results.push({ type: action.type, id, ok: true });
           break;
         }
+        case 'build-and-test': {
+          // Resolve PR by number, ID, or URL — same lookup that drives the link-pr / PR-row paths.
+          const allPrs = getPullRequests().filter(p => !p._ghost);
+          const pr = shared.findPrRecord(allPrs, action.pr) || null;
+          if (!pr) {
+            results.push({ type: 'build-and-test', error: `PR not found: ${action.pr}` });
+            break;
+          }
+          // Resolve project: explicit param wins, else PR's _project, else first configured project as last resort.
+          const projectName = action.project || pr._project || null;
+          const project = projectName
+            ? PROJECTS.find(p => p.name?.toLowerCase() === String(projectName).toLowerCase())
+            : null;
+          if (!project) {
+            results.push({ type: 'build-and-test', error: `Project not found for PR ${pr.id}: ${projectName || '(none)'}` });
+            break;
+          }
+          // Pick agent: explicit param wins; else routing for 'test' work type.
+          let agentId = action.agent && CONFIG.agents?.[action.agent] ? action.agent : null;
+          if (!agentId) {
+            agentId = routing.resolveAgent('test', CONFIG, { authorAgent: pr.agent });
+          }
+          if (!agentId) {
+            results.push({ type: 'build-and-test', error: 'No available agent for test routing' });
+            break;
+          }
+          const prNumber = shared.getPrNumber(pr);
+          const dispatchKey = `cc-bt-${project.name}-${pr.id}`;
+          const item = playbook.buildPrDispatch(agentId, CONFIG, project, pr, 'test', {
+            pr_id: pr.id, pr_number: prNumber, pr_title: pr.title || '', pr_branch: pr.branch || '',
+            pr_author: pr.agent || '', pr_url: pr.url || '',
+            project_path: project.localPath || '',
+            task: `Build & test ${pr.id}: ${pr.title || ''}`,
+          }, `Build & test ${pr.id}: ${pr.title || ''}`,
+          { dispatchKey, source: 'cc-build-and-test', pr, branch: pr.branch, project: { name: project.name, localPath: project.localPath } });
+          if (!item) {
+            results.push({ type: 'build-and-test', error: 'Failed to render build-and-test playbook' });
+            break;
+          }
+          const id = dispatchMod.addToDispatch(item);
+          results.push({ type: 'build-and-test', id, agent: agentId, pr: pr.id, ok: true });
+          break;
+        }
         case 'note': {
           shared.writeToInbox('command-center', shared.slugify(action.title || 'note'), `# ${action.title || 'Note'}\n\n${action.content || action.description || ''}`);
           results.push({ type: 'note', ok: true });
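
Note on the `build-and-test` case above: the hunk resolves the project from `action.project`, then from the PR's `_project`, and resolves the agent from `action.agent` (only if it is a configured agent), then from routing. The "first configured project as last resort" fallback mentioned in the code comment does not appear in the lines shown. The sketch below restates that order with hypothetical helpers (`resolveProject`, `resolveAgent`, and the `routeForTest` callback standing in for the routing call); the sample data is made up.

```javascript
// Hypothetical helpers mirroring the resolution order in the hunk; not package code.
function resolveProject(action, pr, projects) {
  // Explicit param wins, else the PR's own project; no further fallback is shown.
  const name = action.project || pr._project || null;
  if (!name) return null;
  const wanted = String(name).toLowerCase();
  return projects.find(p => p.name?.toLowerCase() === wanted) || null;
}

function resolveAgent(action, agents, routeForTest) {
  // An explicit agent is honored only if it exists in the configured agents map.
  if (action.agent && agents?.[action.agent]) return action.agent;
  return routeForTest() || null;
}

// Illustrative data only.
const projects = [{ name: 'WebApp', localPath: '/repos/webapp' }];
const agents = { alice: {}, bob: {} };
const pr = { id: 'PR-42', _project: 'webapp' };

const project = resolveProject({ type: 'build-and-test', pr: 'PR-42' }, pr, projects);
const agentId = resolveAgent({ agent: 'carol' }, agents, () => 'bob');
console.log(project && project.name, agentId);
```

Here the unknown agent `carol` is ignored and the routing stand-in supplies `bob`, while the case-insensitive match finds `WebApp` from the PR's `_project`.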

package/docs/auto-discovery.md CHANGED

@@ -8,7 +8,7 @@ The engine runs a tick every 60 seconds (configurable via `config.json` → `eng
 ```
 tick()
-  1. checkTimeouts()            Kill stale/hung agents (>heartbeatTimeout)
+  1. checkTimeouts()            Enforce runtime limits and stale-orphan cleanup
   2. consolidateInbox()         Merge learnings into notes.md (Haiku-powered)
   2.5 runCleanup()              Periodic cleanup (every 10 ticks ≈ 10min)
   2.6 pollPrStatus()            Poll ADO + GitHub for build, review, merge status (wall-clock cadence from prPollStatusEvery × tickInterval, default ≈ 12min)
@@ -283,7 +283,7 @@ proc.on('close')
   ├─ Post-completion hooks:
   │    review     → update PR minionsReview in pull-requests.json, vote on ADO
   │    fix        → set PR minionsReview back to "waiting"
-  │    build-test → (agent auto-files fix work items on failure)
+  │    build-test → record verification result and findings
   │
   ├─ Check for learnings in notes/inbox/
   │    (warns if agent didn't write findings)
@@ -346,10 +346,10 @@ ADO + GitHub REST ── pollPrStatus() ──► pull-requests.json
                                             │
                         ┌───────────┬───────┼───────┬──────────┐
                         ▼           ▼       ▼       ▼          ▼
-                   output.log  notes/  PRs    work-items  localhost
-                   (per agent) inbox/*.md  .json  .json       (if webapp,
-                                    │             (auto-filed  from build
-                          consolidateInbox()       on failure)  & test)
+                    output.log  notes/  PRs    work-items  localhost
+                    (per agent) inbox/*.md  .json  .json       (if webapp,
+                                    │                         from build
+                          consolidateInbox()                  & test)
                           (at 5+ files)
                                     │
                                     ▼
@@ -359,18 +359,20 @@ ADO + GitHub REST ── pollPrStatus() ──► pull-requests.json
                                playbooks)
 ```
-## Timeout & Stale Detection
+## Timeout & Stale-Orphan Detection
 Two layers of protection:
 **Agent timeout** (`engine.agentTimeout`, default 5 hours / 18,000,000ms):
-- Checks `activeProcesses` Map for elapsed time
-- Sends SIGTERM, then SIGKILL after 5s
+- Applies to tracked live processes regardless of output activity
+- Sends SIGTERM, then SIGKILL after a short grace period
-**Stale detection** (`engine.heartbeatTimeout`, default 5 min / 300,000ms):
-- Scans `dispatch.active` for items where `started_at` exceeds threshold
-- Catches cases where the process exited but dispatch wasn't cleaned up
-- Kills process if still tracked, marks dispatch as error, resets agent to idle
+**Stale-orphan detection** (`engine.heartbeatTimeout`, default 5 min / 300,000ms):
+- Applies only when an active dispatch has no live tracked process
+- Uses `live-output.log` mtime as indirect evidence after engine restart or process-handle loss
+- Marks stale orphaned dispatches failed and resets the agent to idle
+Lack of stdout/stderr is not treated as a hang while the engine still has a live process handle. Long builds, dependency installs, and tests can legitimately run quietly.
 ## Cooldown Behavior
@@ -391,8 +393,8 @@ All discovery behavior is controlled via `config.json`:
   "engine": {
     "tickInterval": 60000,       // ms between ticks
     "maxConcurrent": 5,          // max agents running at once
-    "agentTimeout": 18000000,     // 5 hours — kill hung processes
-    "heartbeatTimeout": 300000,  // 5min — kill stale/silent agents
+    "agentTimeout": 18000000,     // 5 hours — hard runtime limit
+    "heartbeatTimeout": 300000,  // 5min — stale-orphan grace after process tracking is lost
     "maxTurns": 100,             // max claude CLI turns per agent
     "worktreeCreateTimeout": 300000, // timeout for git worktree add on large repos
     "worktreeCreateRetries": 1   // retry count for transient add failures

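Note on the "Stale-orphan detection" bullets above: the check applies only when a dispatch has no live tracked process, and it uses the `live-output.log` modification time as indirect evidence. The following is a minimal sketch under stated assumptions. The helper names (`outputAgeMs`, `isStaleOrphan`) and the option names are hypothetical, and treating a missing log as stale is a choice made here, not something the diff specifies. `fs.statSync(...).mtimeMs` is the standard Node.js API.

```javascript
const fs = require('node:fs');

// Hypothetical helper: age of a log file in ms, or null if it cannot be read.
function outputAgeMs(logPath, now = Date.now()) {
  try {
    return now - fs.statSync(logPath).mtimeMs;
  } catch {
    return null;
  }
}

// Hypothetical check: true only for dispatches with no live process and stale output.
function isStaleOrphan({ hasLiveProcess, logPath, heartbeatTimeoutMs = 300000, now = Date.now() }) {
  // A live process handle always wins; output silence is not a hang signal.
  if (hasLiveProcess) return false;
  const age = outputAgeMs(logPath, now);
  // Assumption: a missing or unreadable log is treated as stale.
  return age === null || age > heartbeatTimeoutMs;
}
```

The default of 300000 ms matches the documented `heartbeatTimeout`; a caller would pass the configured value instead.
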
package/docs/blog-first-successful-dispatch.md CHANGED

@@ -59,9 +59,7 @@ proc.stdin.end();
 ### Output Format: json vs stream-json
-`--output-format json` produces **one JSON blob at exit**. No streaming output during execution. This broke:
-- Live output in dashboard (nothing to show until agent finishes)
-- Heartbeat monitoring (no file writes to check mtime against)
+`--output-format json` produces **one JSON blob at exit**. No streaming output during execution. This broke live output in the dashboard and made restart recovery less observable.
 Fix: switched to `--output-format stream-json` — streams events as they happen.
@@ -73,15 +71,15 @@ Agents would hang waiting for permission prompts (invisible in headless mode). A
 Claude Code sets `CLAUDECODE` env var to prevent nested sessions. Spawned agents inherit it and refuse to start. The engine strips it from `childEnv`, but the wrapper script was using `process.env` (which re-inherits from the parent). Fixed by stripping in both places.
-### Heartbeat vs Stale Detection
+### Process Liveness vs Stale-Orphan Detection
 Original approach: kill agents after a fixed time threshold (staleThreshold). Problem: agents can legitimately run for hours on complex tasks.
-New approach: heartbeat based on `live-output.log` mtime. As long as the agent produces output, it's alive. If silent for 5 minutes → declared dead. Catches orphaned processes (engine restart loses process handles) and hung agents.
+New approach: rely on the tracked process while the engine has a live process handle, regardless of whether stdout/stderr are quiet. `live-output.log` mtime is only used after process tracking is lost, such as an engine restart, to clean up stale orphaned dispatches.
 ### Engine Restart Orphan Problem
-When the engine restarts, the in-memory `activeProcesses` Map is lost. Active dispatch items stay in `dispatch.json` but the engine has no process handle. Old stale detection (6h threshold) was too slow to catch this. The heartbeat check catches it in 5 minutes.
+When the engine restarts, the in-memory `activeProcesses` Map is lost. Active dispatch items stay in `dispatch.json` but the engine has no process handle. Old stale detection (6h threshold) was too slow to catch this. Stale-orphan detection uses recent `live-output.log` activity and catches abandoned dispatches after the restart grace window.
 ## The Successful Run
@@ -107,9 +105,9 @@ When the engine restarts, the in-memory `activeProcesses` Map is lost. Active di
 1. **Never pass user content through shell expansion.** Use stdin or direct args via Node's `spawn` (without shell).
 2. **On Windows, npm-installed CLI tools are shell wrappers.** Resolve the actual `.js` entry point and spawn via `node`.
-3. **Streaming output format is essential** for monitoring long-running agents. One-shot JSON is useless for heartbeats.
+3. **Streaming output format is essential** for live dashboards and restart recovery. One-shot JSON hides everything until exit.
 4. **Environment variable inheritance is tricky** with nested spawns. Strip at every level.
-5. **Heartbeats > timeouts** for agents that can run for hours. Check liveness, not elapsed time.
+5. **Process liveness beats output-silence heuristics** for agents that can run quiet CLI commands for long periods.
 ## The Spawn Chain (Final Working Version)
@@ -120,9 +118,8 @@ engine.js (tick loop)
       → spawn(process.execPath, ['cli.js', '-p', '--system-prompt', content, ...args])
         → claude-code runs with prompt via stdin
           → agent works, streams JSON events to stdout
-            → engine captures to live-output.log (heartbeat)
+            → engine captures to live-output.log (dashboard + restart recovery)
             → dashboard polls /api/agent/:id/live (3s refresh)
 ```
 No bash. No shell. No metacharacter interpretation. Just Node spawning Node.

package/docs/engine-restart.md CHANGED

@@ -2,7 +2,7 @@
 ## The Problem
-When the engine restarts, it loses its in-memory process handles (`activeProcesses` Map). Claude CLI agents spawned before the restart are still running as OS processes, but the engine can't monitor their stdout, detect exit codes, or manage their lifecycle. Without protection, the heartbeat check (5-min default) would kill these agents as "orphans."
+When the engine restarts, it loses its in-memory process handles (`activeProcesses` Map). Claude CLI agents spawned before the restart may still be running as OS processes, but the engine can't monitor their process state, detect exit codes, or manage their lifecycle. Stale-orphan detection keeps these dispatch records from staying active forever after the restart grace period expires.
 ## What's Persisted vs Lost
@@ -10,7 +10,7 @@ When the engine restarts, it loses its in-memory process handles (`activeProcess
 |-------|---------|-----------------|
 | Dispatch queue (pending/active/completed) | `engine/dispatch.json` | Yes |
 | Agent status (working/idle/error) | Derived from `engine/dispatch.json` | Yes |
-| Agent live output | `agents/*/live-output.log` | Yes (mtime used as heartbeat) |
+| Agent live output | `agents/*/live-output.log` | Yes (mtime used for orphan cleanup) |
 | Process handles (`ChildProcess`) | In-memory Map | **No** |
 | Cooldown timestamps | In-memory Map | **No** (repopulated from `engine/cooldowns.json`) |
@@ -29,14 +29,11 @@ Configurable via `config.json`:
 }
 ```
-### 2. Blocking Tool Detection
+### 2. Process-Based Liveness
-Even after the grace period expires, the engine scans each agent's `live-output.log` for the most recent `tool_use` call. If the agent is in a known blocking tool:
+After the grace period expires, a dispatch with a tracked live process keeps running until the process exits or exceeds `engine.agentTimeout`. Quiet stdout/stderr alone is not a hang signal; long builds, dependency installs, and tests can legitimately be silent.
-- **`TaskOutput` with `block: true`** — timeout extended to the task's own timeout + 1 min
-- **`Bash` with long timeout (>5 min)** — timeout extended to the bash timeout + 1 min
-This works for both tracked processes and orphans (no process handle).
+If there is no live tracked process, the engine uses `live-output.log` mtime as indirect evidence. Once the log is stale for `engine.heartbeatTimeout`, the dispatch is treated as an orphan and marked failed.
 ### 3. Stop Warning
@@ -86,7 +83,7 @@ T+0-20m  Ticks run. Orphan detection skipped (grace period).
          Engine detects completed output on next tick via file scan.
 T+20m    Grace period expires.
-         Heartbeat check resumes. Blocking tool detection still active.
-         Agent in TaskOutput block:true gets extended timeout.
-         Agent with no output for 5min+ and no blocking tool → orphaned.
+          Stale-orphan detection resumes.
+          Dispatch with live tracked process → keep running.
+          Dispatch with no live process and stale output → orphaned.
 ```

package/docs/human-vs-automated.md CHANGED

@@ -50,8 +50,8 @@ These run continuously without you:
 - **Build failure detection** — auto-files fix tasks when CI fails
 - **Inbox consolidation** — LLM-powered dedup and categorization when inbox hits threshold
 - **Knowledge base classification** — auto-assigns category to consolidated notes
-- **Heartbeat monitoring** — detects hung/dead agents, marks them failed
-- **Blocking tool detection** — extends timeout when agent is in a long-running operation
+- **Process-based liveness** — tracks running agent processes and enforces the hard runtime limit
+- **Stale-orphan detection** — cleans up dispatches after process tracking is lost
 - **Metrics collection** — tracks tasks, errors, PRs, approvals per agent
 - **Dispatch priority** — fixes first, then reviews, then implementations
 - **Cooldown & backoff** — prevents re-dispatching recently failed items
@@ -98,11 +98,10 @@ If you start the engine and dashboard, then leave:
 2. Discovers pending work items, PRD gaps, PR reviews needed
 3. Dispatches agents (up to max concurrent)
 4. Agents create worktrees, write code, create PRs
-5. Engine monitors for completion, hung agents, build failures
+5. Engine monitors for process exit, hard timeouts, stale orphans, and build failures
 6. Successful work → PRs appear in your ADO/GitHub queue
 7. Failed work → marked failed, waiting for your retry
 8. Notes consolidated into team knowledge automatically
 9. Worktrees cleaned up after PRs merge
 **What blocks:** Plans waiting for approval. PRs waiting for your review vote. Failed tasks waiting for retry. Everything else keeps moving.

package/docs/pr-review-fix-loop.md CHANGED

@@ -96,7 +96,7 @@ When multiple problems coexist, earlier triggers get the first chance to enqueue
 | Build fix before CI runs | `_buildFixPushedAt` grace period (10min) |
 | Duplicate dispatch | `dispatchKey` dedup + cooldown |
 | Stale review status | Pre-dispatch live API check |
-| Orphan detection | Heartbeat timeout + output scan |
+| Orphan detection | Stale-orphan timeout + output scan |
 ## Key files

package/docs/rfc-completion-json.md CHANGED

@@ -28,7 +28,7 @@ The engine reconstructs control-plane state from the unstructured stdout of `cla
 | 6 | `parseStructuredCompletion` (`lifecycle.js:1494`) | Last ` ```completion ` fenced block, parsed as `key: value` | An agent that includes a ` ```completion ` block in a quoted file (e.g. another playbook) overrides its own real status |
 | 7 | `classifyFailure` (`lifecycle.js:2096`) | Failure-class regexes on combined stdout/stderr (`max_turns`, `permission denied`, `merge conflict`, …) | An agent that quotes one error class while genuinely failing on another gets the wrong recovery recipe |
 | 8 | `checkForLearnings` (`lifecycle.js:1266`) | Filesystem scan for `notes/inbox/*<agentId>*<date>*` | Not stdout-based, but date-collisions cause cross-task attribution |
-| 9 | `checkTimeouts` (`engine/timeout.js:189-219`) | Tail of `live-output.log` for `"type":"result"` and `[process-exit]` markers — completion-via-output detection for hung dispatches | Lower-risk: this is the claude CLI's own output, not agent-authored content |
+| 9 | `checkTimeouts` (`engine/timeout.js:189-219`) | Tail of `live-output.log` for `[process-exit]` markers — completion-via-output detection after process tracking is lost | Lower-risk: this is engine/CLI output, not agent-authored content |
 Sites 1–8 are agent-spoofable (intentionally or accidentally). Site 9 is claude-CLI-emitted and stays on stdout — see §6.
@@ -44,7 +44,7 @@ The current ` ```completion ` fenced block (Site 6) was a half-step toward struc
 5. Zero new dependencies — file write + JSON parse, same toolbox as the rest of Minions.
 **Non-goals.**
-1. Replacing `live-output.log` for liveness/heartbeat tracking. The CLI's own stream-json output is still the authoritative liveness signal (`"type":"result"`, `subtype:"success"` etc.) — see §6.
+1. Replacing process tracking or `live-output.log` recovery. A live tracked process is the authoritative liveness signal; `live-output.log` remains useful for completion recovery after process tracking is lost — see §6.
 2. Replacing `safeWrite`/`mutateJsonFileLocked` for engine state files. `completion.json` is one-shot, write-once, agent-authored — no concurrent writers.
 3. Hardening against a *malicious* agent. An attacker who controls the agent process could write any completion.json. The threat model is *accidental spoofing by quoted text* and *forward compatibility with structured tool outputs*.
@@ -269,8 +269,8 @@ The flag name `engine.requireCompletionFile` mirrors existing engine flags (`aut
 These paths stay on stdout / live-output.log:
-1. **`engine/timeout.js` completion-via-output detection** (`timeout.js:189-219`). The signal there is the claude CLI's own `"type":"result"` event, emitted by the binary even if the agent crashed before writing completion.json. Removing it would mean orphan/hung agents are never reaped. This stays as the heartbeat mechanism.
-2. **Per-tick liveness via `live-output.log` mtime** (`timeout.js:178`). Same reason — completion.json is written once at exit, not as a heartbeat.
+1. **`engine/timeout.js` completion-via-output detection** (`timeout.js:189-219`). The signal is the engine-written `[process-exit]` sentinel, emitted even if the agent crashed before writing completion.json. Removing it would mean orphans that finished during process-handle loss are never reconciled.
+2. **Stale-orphan cleanup via `live-output.log` mtime** (`timeout.js:178`). Completion.json is written once at exit, so `live-output.log` remains the best indirect signal after the engine loses process tracking.
 3. **`parseStreamJsonOutput` for `resultSummary`** in `parseAgentOutput` (`lifecycle.js:1483`). This extracts the human-readable summary from the CLI's stream-json. Even after the flip, `completion.summary` is *also* extracted, but the stream-json text remains the canonical "what did the agent say last" — used in dashboards, agent history, Teams notifications. The two coexist: `completion.summary` is for routing decisions, the stream-json text is for display.
 4. **Inbox-file skill scan** (`lifecycle.js:2013-2024`). Some agents write skills into their inbox findings file (a deliberate human-discoverable artifact). The completion file deprecates inline ` ```skill ` blocks in stdout, but the inbox file scan is opt-in and stays — it's a different surface (a real file the agent intentionally wrote, not regex-scraped from stdout).
@@ -308,7 +308,7 @@ These paths stay on stdout / live-output.log:
 | Agent quotes a previous ` ```completion ` block → wrong status | ✅ | ` ```completion ` parser removed in Phase 4 |
 | Agent quotes one error class while failing on another → wrong recovery recipe | ✅ | `failure.class` is explicit; if missing or invalid, falls through to `FAILURE_CLASS.UNKNOWN` (safe default) |
 | Decompose agent emits a ` ```json ` block earlier in reasoning → corrupted children | ✅ | `decomposition.subItems` is explicit |
-| Hung/orphaned agent never reaches the write site → no completion.json | ⚠️ | Engine's existing live-output.log heartbeat reaper (`timeout.js`) catches this; dispatch is marked failed via stdout completion-via-output signal |
+| Orphaned agent never reaches the write site → no completion.json | ⚠️ | Engine's stale-orphan cleanup (`timeout.js`) catches this after process tracking is lost; completed agents can still be reconciled via the `[process-exit]` sentinel |
 | Malicious agent writes a fake completion.json (e.g. claims `noop` to avoid retry) | ❌ | Out of scope — see §2 non-goals. An adversarial agent owns its own write path regardless. |
 The key shift: **the agent's intent is now in a place no quoted text can reach.** Stdout becomes display-only.

package/engine/copilot-models.json CHANGED

@@ -1,5 +1,5 @@
 {
   "runtime": "copilot",
   "models": null,
-  "cachedAt": "2026-04-30T08:56:05.025Z"
+  "cachedAt": "2026-04-30T10:13:15.188Z"
 }

package/engine/lifecycle.js CHANGED

@@ -1665,7 +1665,7 @@ async function runPostCompletionHooks(dispatchItem, agentId, code, stdout, confi
     log('info', `Structured completion reports PR (${structuredCompletion.pr}) but regex sync found none — PR may already be tracked`);
   }
-  // Auto-recover: if a failed implement/fix agent created PRs, it likely succeeded before being killed (e.g. heartbeat timeout)
+  // Auto-recover: if a failed implement/fix agent created PRs, it likely succeeded before the failure surfaced.
   const prCreatingType = type === WORK_TYPE.IMPLEMENT || type === WORK_TYPE.IMPLEMENT_LARGE || type === WORK_TYPE.FIX;
   const autoRecovered = !isSuccess && prsCreatedCount > 0 && prCreatingType && !!meta?.item?.id;
   if (autoRecovered) {

package/engine/playbook.js CHANGED

@@ -278,6 +278,7 @@ const PLAYBOOK_REQUIRED_VARS = {
   'decompose':            ['item_id', 'item_description', 'project_path'],
   'verify':               ['task_description'],
   'test':                 ['item_name'],
+  'docs':                 ['item_id', 'item_name'],
   'work-item':            ['item_id', 'item_name'],
   'meeting-investigate':  ['meeting_title', 'agenda'],
   'meeting-debate':       ['meeting_title', 'agenda'],
@@ -630,7 +631,7 @@ function selectPlaybook(workType, item) {
   if (workType === WORK_TYPE.REVIEW && !item?._pr && !item?.pr_id) {
     return 'work-item';
   }
-  const typeSpecificPlaybooks = ['explore', 'review', 'test', 'plan-to-prd', 'plan', 'ask', 'verify', 'decompose', 'meeting-investigate', 'meeting-debate', 'meeting-conclude'];
+  const typeSpecificPlaybooks = ['explore', 'review', 'test', 'plan-to-prd', 'plan', 'ask', 'verify', 'decompose', 'docs', 'meeting-investigate', 'meeting-debate', 'meeting-conclude'];
   return typeSpecificPlaybooks.includes(workType) ? workType : 'work-item';
 }

package/engine/queries.js CHANGED

@@ -300,7 +300,7 @@ function getAgentStatus(agentId) {
       branch: active.meta?.branch || '',
       started_at: active.started_at || active.created_at || null,
     };
-    // Surface blocking tool call state from dispatch annotation (set by timeout.js)
+    // Surface any legacy blocking-tool annotation until timeout.js clears it.
     if (active._blockingToolCall) {
       result._blockingToolCall = active._blockingToolCall;
     }
@@ -355,12 +355,12 @@ function getAgentStatus(agentId) {
   // Fallback: derive active state from work-item markers.
   // This protects UI status when dispatch.json briefly desyncs from work-item files.
-  // Guard: only trust dispatched state within 2x heartbeatTimeout to prevent stale
+  // Guard: only trust dispatched state within 2x stale-orphan timeout to prevent stale
   // dispatched items from permanently showing an agent as working after a dead process.
   try {
     const config = getConfig();
-    const heartbeatTimeout = config.engine?.heartbeatTimeout || ENGINE_DEFAULTS.heartbeatTimeout;
-    const staleThresholdMs = heartbeatTimeout * 2;
+    const staleOrphanTimeout = config.engine?.heartbeatTimeout || ENGINE_DEFAULTS.heartbeatTimeout;
+    const staleThresholdMs = staleOrphanTimeout * 2;
     const now = Date.now();
     const allItems = getWorkItems(config);
     const latestInFlight = allItems

package/engine/shared.js CHANGED

@@ -699,8 +699,8 @@ const ENGINE_DEFAULTS = {
   maxConcurrent: 5,
   inboxConsolidateThreshold: 5,
   agentTimeout: 18000000,  // 5h
-  heartbeatTimeout: 300000, // 5min — base heartbeat for most work types
-  heartbeatTimeouts: {}, // per-type overrides; merged with defaults at runtime (see timeout.js)
+  heartbeatTimeout: 300000, // 5min — stale-orphan grace after process tracking is lost
+  heartbeatTimeouts: {}, // optional per-type stale-orphan overrides; merged at runtime (see timeout.js)
   maxTurns: 100,
   worktreeCreateTimeout: 300000, // 5min for git worktree add on large Windows repos
   worktreeCreateRetries: 1, // retry once on transient timeout/lock races
@@ -758,7 +758,7 @@ const ENGINE_DEFAULTS = {
   copilotReasoningSummaries: false,  // Copilot --enable-reasoning-summaries (Anthropic-family models only)
   maxBudgetUsd: undefined,       // fleet USD ceiling for --max-budget-usd (per-agent override: agents.<id>.maxBudgetUsd). Honors 0 via ?? so a literal cap of $0 works
   disableModelDiscovery: false,  // skip runtime.listModels() REST calls fleet-wide (settings UI falls back to free-text)
-  heartbeatTimeouts: {}, // populated after WORK_TYPE is defined (below)
+  heartbeatTimeouts: {},
   maxPendingContexts: 20, // cap pendingContexts arrays in cooldowns.json to prevent unbounded growth
   maxPendingContextEntryBytes: 256 * 1024, // 256 KB — cap each pendingContexts entry to prevent huge PR comments from bloating cooldowns.json
   maxDispatchPromptBytes: 1024 * 1024, // 1 MB — dispatch items with prompts larger than this sidecar to engine/contexts/ to prevent dispatch.json OOM (#1167)
@@ -1063,14 +1063,6 @@ const WORK_TYPE = {
   MEETING: 'meeting', EXPLORE: 'explore', ASK: 'ask', TEST: 'test', DOCS: 'docs',
 };
-// Per-work-type heartbeat timeouts (ms) — read-heavy tasks need longer silence windows.
-// Keyed by WORK_TYPE constants; types not listed fall back to ENGINE_DEFAULTS.heartbeatTimeout.
-Object.assign(ENGINE_DEFAULTS.heartbeatTimeouts, {
-  [WORK_TYPE.EXPLORE]: 600000,   // 10 min — spends most time reading/analyzing, minimal stdout
-  [WORK_TYPE.ASK]:     600000,   // 10 min — research-heavy, long silent analysis periods
-  [WORK_TYPE.REVIEW]:  480000,   // 8 min — code review reads extensively before producing output
-});
 const PLAN_STATUS = {
   ACTIVE: 'active', AWAITING_APPROVAL: 'awaiting-approval', APPROVED: 'approved',
   PAUSED: 'paused', REJECTED: 'rejected', COMPLETED: 'completed',
@@ -1161,7 +1153,7 @@ const FAILURE_CLASS = {
   PERMISSION_BLOCKED: 'permission-blocked', // Trust gate, permission denied, auth failure
   MERGE_CONFLICT: 'merge-conflict',       // Git merge conflict in worktree or dependency
   BUILD_FAILURE: 'build-failure',         // Compilation, lint, or test failure
-  TIMEOUT: 'timeout',                     // Hard timeout or heartbeat timeout
+  TIMEOUT: 'timeout',                     // Hard runtime timeout or stale-orphan timeout
   EMPTY_OUTPUT: 'empty-output',           // Agent produced no meaningful output
   SPAWN_ERROR: 'spawn-error',             // Process failed to start or crashed immediately
   NETWORK_ERROR: 'network-error',         // API rate limit, DNS, connectivity