npm - pi-crew - Versions diffs - 0.9.4 → 0.9.7 - Mend

pi-crew 0.9.4 → 0.9.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

package/CHANGELOG.md +592 -0
package/README.md +55 -3
package/docs/HARNESS_BACKLOG.md +51 -3
package/docs/dynamic-workflows.md +315 -2
package/docs/fix-plan-disabletools-exit-null.md +219 -0
package/docs/troubleshooting.md +102 -0
package/package.json +8 -2
package/src/extension/command-completions.ts +1 -0
package/src/extension/crew-shortcuts.ts +1 -0
package/src/extension/register.ts +2 -0
package/src/extension/registration/commands.ts +3 -0
package/src/extension/team-tool/doctor.ts +14 -0
package/src/extension/team-tool/goal.ts +1 -0
package/src/extension/team-tool/run.ts +4 -0
package/src/runtime/background-runner.ts +24 -2
package/src/runtime/chain-runner.ts +1 -0
package/src/runtime/child-pi.ts +101 -10
package/src/runtime/crash-recovery.ts +78 -36
package/src/runtime/deterministic-ast.ts +161 -0
package/src/runtime/dwf-state-store.ts +97 -0
package/src/runtime/dynamic-workflow-context.ts +381 -7
package/src/runtime/dynamic-workflow-runner.ts +94 -2
package/src/runtime/goal-loop-runner.ts +2 -0
package/src/runtime/live-session-runtime.ts +1 -0
package/src/runtime/model-scope.ts +1 -0
package/src/runtime/peer-dep.ts +1 -0
package/src/runtime/pi-args.ts +11 -0
package/src/runtime/resilient-edit.ts +1 -0
package/src/runtime/result-extractor.ts +72 -7
package/src/runtime/task-runner.ts +1 -0
package/src/runtime/team-runner.ts +8 -3
package/src/runtime/zombie-scanner.ts +297 -0
package/src/schema/team-tool-schema.ts +28 -0
package/src/state/contracts.ts +1 -0
package/src/state/hook-instinct-bridge.ts +3 -0
package/src/state/state-store.ts +3 -0
package/src/state/types.ts +9 -0
package/src/ui/dashboard-panes/progress-pane.ts +5 -0
package/src/ui/dwf-phase-display.ts +151 -0
package/src/ui/run-snapshot-cache.ts +4 -0
package/src/ui/snapshot-types.ts +3 -0
package/src/utils/bm25-search.ts +2 -0
package/src/workflows/workflow-config.ts +3 -0
package/src/worktree/worktree-manager.ts +94 -0
package/types/dwf.d.ts +187 -0

package/CHANGELOG.md CHANGED

@@ -1,5 +1,597 @@
 # Changelog
+## [v0.9.7] — round-18 + process-safety fix (2026-06-23)
+P2-3 feature: durable checkpoint + resume for dynamic-workflow runs. The runner now
+persists a checkpoint after every `ctx.agent()` call, so when a `.dwf.ts` script
+crashes (timeout, OOM, agent error) between calls, `team action='resume' runId='X'`
+can continue from the last checkpoint instead of re-running from scratch. **Backward
+compatible** — fresh runs (no checkpoint) behave exactly as before.
+### Implementation
+**New file** `src/runtime/dwf-state-store.ts` (`DwfStore`):
+- Atomic CRUD for a single run's DWF checkpoint, modeled on `GoalStore` /
+  `FileCheckpointStore`.
+- Persists `DwfCheckpointState` (vars, phases, currentPhase, logs, spent, agentCount,
+  updatedAt) to `<stateRoot>/dwf-checkpoint.json` via `atomicWriteJson`.
+- `load()` returns `undefined` for a missing/corrupt checkpoint (fresh run); `delete()`
+  is best-effort and never throws.
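The store contract described here (atomic save, `undefined` for a missing or corrupt file, best-effort delete) can be sketched with plain `node:fs`. `DwfStoreSketch` and `atomicWriteJsonSketch` are hypothetical stand-ins, not the package's `DwfStore` or `atomicWriteJson`. The field types of the checkpoint are also assumed; only the field names and the `dwf-checkpoint.json` file name come from the changelog.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Field names from the changelog; the concrete types are assumptions.
interface DwfCheckpointState {
  vars: Record<string, unknown>;
  phases: string[];
  currentPhase: string | null;
  logs: string[];
  spent: number;
  agentCount: number;
  updatedAt: string;
}

// Write a sibling temp file, then rename it over the target. rename() replaces
// the file atomically on POSIX filesystems, so a crash mid-write leaves either
// the old checkpoint or the new one, never a torn file.
function atomicWriteJsonSketch(path: string, value: unknown): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, JSON.stringify(value));
  renameSync(tmp, path);
}

class DwfStoreSketch {
  private readonly path: string;

  constructor(stateRoot: string) {
    this.path = join(stateRoot, "dwf-checkpoint.json");
  }

  save(state: DwfCheckpointState): void {
    atomicWriteJsonSketch(this.path, state);
  }

  // A missing or unparsable file means "no checkpoint": the run starts fresh.
  load(): DwfCheckpointState | undefined {
    try {
      return JSON.parse(readFileSync(this.path, "utf8")) as DwfCheckpointState;
    } catch {
      return undefined;
    }
  }

  // Best-effort removal after a clean completion; never throws.
  delete(): void {
    try {
      rmSync(this.path, { force: true });
    } catch {
      // ignore
    }
  }
}

// Usage: round-trip, corrupt-file handling, and delete.
const root = mkdtempSync(join(tmpdir(), "dwf-"));
const store = new DwfStoreSketch(join(root, "state")); // directory created on first save
const beforeSave = store.load();
store.save({
  vars: { lastPhase: "scan" },
  phases: ["scan"],
  currentPhase: "scan",
  logs: [],
  spent: 15,
  agentCount: 1,
  updatedAt: new Date().toISOString(),
});
const afterSave = store.load();
writeFileSync(join(root, "state", "dwf-checkpoint.json"), "{ not json");
const afterCorrupt = store.load();
store.delete();
const afterDelete = store.load();
```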
+**`ctx.agent()` checkpoint hook** in `src/runtime/dynamic-workflow-context.ts`:
+- New `MakeWorkflowCtxOptions.onCheckpoint?: (state) => void` — invoked after each
+  `ctx.agent()` call (success OR fail) so a crash between calls leaves durable state.
+- New `MakeWorkflowCtxOptions.resumedState?` — hydrates `ctx.vars`, phase state, logs,
+  `budget.spent()`, and `agentCount` from the checkpoint on resume.
+- New closure counter `agentCount` (incremented in `agent()`'s `finally`), exposed via a
+  non-enumerable `__agentCount` getter.
+- New `getWorkflowCheckpoint(ctx)` helper (mirror of `getWorkflowPhaseState`).
+**Runner wiring** in `src/runtime/dynamic-workflow-runner.ts`:
+- On run start: `DwfStore.load()` → hydrate ctx (`resumedState`) + emit `dwf.resumed`.
+- `onCheckpoint` → `DwfStore.save()` (best-effort, errors swallowed).
+- On clean completion: `DwfStore.delete()` so a re-run starts fresh.
+### Resume semantics
+`team action='resume' runId='X'` re-dispatches with `runKind='dynamic-workflow'`. The
+runner loads the checkpoint, hydrates `ctx.vars`/phases/logs, and re-executes the
+script from the top. Scripts SHOULD be written defensively — check `ctx.vars.lastPhase`
+to skip completed work (documented in `docs/dynamic-workflows.md`). No partial-resume of
+a single agent call (it re-runs from scratch); checkpoints are written AFTER an agent
+completes, never before.
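A defensive script in the spirit described above might look like the following sketch. `CtxSlice` is a hypothetical stand-in for `WorkflowCtx`, and `lastPhase` is only a convention the script maintains for itself. Because checkpoints are written when a `ctx.agent()` call returns, a variable assigned after that call is presumably persisted only by the next agent call, so on resume the most recent step may run once more; the loop is written to tolerate that.

```typescript
// Hypothetical minimal slice of the context; a real script would import
// `WorkflowCtx` from "pi-crew/workflow" instead.
interface CtxSlice {
  vars: Record<string, unknown>;
  phase(title: string): void;
  agent(opts: { prompt: string }): Promise<{ ok: boolean; text?: string }>;
}

const PHASES = ["scan", "audit", "report"] as const;
type Phase = (typeof PHASES)[number];

export default async function run(ctx: CtxSlice): Promise<void> {
  // On resume, ctx.vars is hydrated from the checkpoint. indexOf() yields -1 for
  // a fresh run (or an unknown value), so the loop starts at the first phase.
  const resumeFrom = PHASES.indexOf(ctx.vars.lastPhase as Phase) + 1;
  for (const name of PHASES.slice(resumeFrom)) {
    ctx.phase(name);
    const res = await ctx.agent({ prompt: `Run the ${name} step` });
    if (!res.ok) throw new Error(`${name} step failed`);
    ctx.vars[`${name}Output`] = res.text;
    // Record progress only after the step succeeded, so a crash mid-step re-runs it.
    ctx.vars.lastPhase = name;
  }
}
```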
+### Tests (14 new)
+- `test/unit/dwf-state-store.test.ts` (10): save/load round-trip, missing→undefined,
+  delete, corrupt-file resilience, path layout, dir creation, large-state preservation.
+- `test/unit/dynamic-workflow-context.test.ts` (+8): onCheckpoint fires on success/fail,
+  agentCount accumulation, backward-compat (no callback), resumedState hydration,
+  shallow-copy isolation, getWorkflowCheckpoint snapshot.
+- `test/integration/dwf-setresult.test.ts` (+4): fresh run (no resumed event),
+  completed run (checkpoint deleted), resume (hydration + dwf.resumed + delete),
+  corrupt checkpoint treated as fresh run.
+### Docs
+- `docs/dynamic-workflows.md` — new "Resume & Checkpoint (round-18 P2-3)" section +
+  defensive-script example.
+- `types/dwf.d.ts` — resume pattern documented in header + `ctx.vars` JSDoc.
+- `package.json` — bumped 1.0.1 → 1.1.0 (minor — new opt-in capability).
+### Out of scope (future rounds)
+- P2-2 VM sandbox — still waiting for `isolated-vm` v1.5 (vm.createContext is not a
+  real security boundary). This was the LAST P2 item.
+### Process-safety follow-up fix (same release)
+A heuristic-based zombie "cleanup" had killed a live interactive main `pi`
+session by accident (uptime/RSS/orphan heuristics match a main session just
+as readily as a real orphaned sub-agent). Fixed authoritatively:
+- **`PI_CREW_KIND=subagent`** env marker, set by `buildPiWorkerArgs`
+  (`src/runtime/pi-args.ts`) on every child-pi spawn. A main session does NOT
+  carry it, so it can never be matched as a sub-agent. (An earlier draft also
+  added a `--crew-subagent` argv flag — removed because pi's strict option
+  parser rejects unknown flags and exits non-zero, which silently broke every
+  `ctx.agent()` call. The env var alone is the authoritative signal.)
+- **`src/runtime/zombie-scanner.ts`** (new): read-only scanner that matches
+  ONLY processes with `PI_CREW_KIND=subagent` AND a dead `PI_CREW_PARENT_PID`.
+  Never matches a main session. Never kills.
+- **`team action='doctor' focus='zombies'`**: renders the safe scan as a
+  human-readable report (zombies vs live sub-agents, with explicit do-not-kill
+  labelling for live parents).
+- **`PI_CREW_KIND`** added to the env allowlist in `child-pi.ts`.
+- **`docs/troubleshooting.md`** + **`.crew/knowledge.md`**: documented the
+  marker + the read-only rule so future agents never repeat the mistake.
+- 8 new unit tests in `test/unit/zombie-scanner.test.ts`, including a
+  regression test asserting the current (main-session) process is never matched.
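The matching rule is small enough to state as a pure function. This sketch assumes the scanner has already read each process's environment; `classify`, `isPidAlive` and the `unknown-parent` verdict are illustrative names, not the package's API. `process.kill(pid, 0)` only probes whether the PID exists, which keeps the check read-only.

```typescript
// Minimal view of a process; how env and PIDs are read per OS is out of scope.
interface ProcInfo {
  pid: number;
  env: Record<string, string | undefined>;
}

type Verdict = "not-subagent" | "live-subagent" | "zombie" | "unknown-parent";

// Signal 0 performs only the existence/permission check; nothing is delivered.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Read-only classification. Nothing here kills a process.
function classify(proc: ProcInfo, alive: (pid: number) => boolean = isPidAlive): Verdict {
  // A main session never carries the marker, so it can never be classified as a zombie.
  if (proc.env.PI_CREW_KIND !== "subagent") return "not-subagent";
  const parent = Number(proc.env.PI_CREW_PARENT_PID);
  if (!Number.isInteger(parent) || parent <= 0) return "unknown-parent";
  return alive(parent) ? "live-subagent" : "zombie";
}
```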
+### Real-world smoke testing findings (2026-06-24)
+Three bugs were caught by real `team action='run'` smoke tests that the unit
+suite missed (units don't shell out to the real `pi` binary). **All three are
+now fixed.**
+- **Fixed: `--crew-subagent` argv flag broke every `ctx.agent()` call.** Pi's
+  strict option parser exits non-zero on unknown flags. The marker is now the
+  `PI_CREW_KIND=subagent` ENV var only.
+- **Fixed: `ctx.agent({schema, systemPrompt})` silently dropped `systemPrompt`.**
+  The round-13 schema branch used the resolved role persona as the base for the
+  JSON-output instruction, ignoring the caller's explicit persona. Models then
+  returned prose and failed schema validation. Fix: `call.systemPrompt` is now
+  preferred as the base when both are set.
+- **Fixed: `ctx.agent({disableTools: true, maxTurns: 1})` returned `exit null`.**
+  Root cause (found via Phase-0 diagnostic instrumentation) was NOT the
+  final-drain race originally hypothesised — it was an erroneous
+  `killProcessTree` call in the steer-injection path. When `maxTurns` was
+  reached on a `turn_end` event, the code wrote a "wrap up" steer to
+  `child.stdin`; Node's `writable.write()` returns `false` on normal
+  backpressure, which the code mis-treated as a fatal injection failure and
+  killed the worker mid-answer (answer was in stdout; exit came back `null`).
+  The `disableTools` correlation was a red herring — the real trigger was
+  `maxTurns:1` hitting on the first turn. Fix: steer injection is now
+  ADVISORY — a `write() === false` or non-writable stdin is logged, not fatal;
+  the hard-abort at `maxTurns + graceTurns` remains the safety net for genuine
+  runaways. Verified: maxTurns=1 × 10 real-binary runs now 10/10 exit=0 (was
+  ~60% fail pre-fix). Regression guard: `test/unit/child-pi-steer-backpressure.test.ts`
+  (source-contract checks + opt-in real-binary smoke via `PI_CREW_SMOKE=1`).
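The advisory behaviour can be sketched against a plain Node `Writable`. `injectSteer` and the plain-text message are hypothetical (pi's actual steer format is not shown in this changelog), and the hard abort at `maxTurns + graceTurns` is deliberately left out. The point is only that a `false` return from `write()` is logged and nothing is killed.

```typescript
import { Writable } from "node:stream";

// Advisory steer: write the "wrap up" hint, but never treat a backpressured or
// impossible write as fatal. Runaway protection lives elsewhere (not shown).
function injectSteer(stdin: Writable, message: string, log: (line: string) => void): boolean {
  if (!stdin.writable) {
    log("steer skipped: stdin is not writable");
    return false;
  }
  // write() === false only means the buffer is at or above highWaterMark; the
  // chunk is still queued and will be flushed once the pipe drains.
  const belowHighWaterMark = stdin.write(`${message}\n`);
  if (!belowHighWaterMark) log("steer queued behind backpressure");
  return belowHighWaterMark;
}

// Usage: a 1-byte highWaterMark forces write() to report backpressure.
const logs: string[] = [];
const tiny = new Writable({
  highWaterMark: 1,
  write(_chunk, _enc, done) {
    done();
  },
});
const tinyResult = injectSteer(tiny, "wrap up", (line) => logs.push(line));

const closed = new Writable({
  write(_chunk, _enc, done) {
    done();
  },
});
closed.destroy();
const closedResult = injectSteer(closed, "wrap up", (line) => logs.push(line));
```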
+## [v0.9.7] — round-17 (P2-4 worktree isolation per agent) (2026-06-23)
+P2-4 feature: `ctx.agent({worktree: true})` spawns the agent in an isolated
+git worktree so parallel file-modifying agents don't conflict. Fully backward
+compatible — `worktree` defaults to false, so existing calls are unchanged.
+### Implementation
+**New helpers** in `src/worktree/worktree-manager.ts`:
+- `prepareAgentWorktree(manifest, opts)` — creates a worktree from HEAD on a
+  unique branch; returns `{path, branch}` or `undefined` when the cwd is not a
+  git repo (graceful fallback).
+- `cleanupAgentWorktree(manifest, worktreePath, branch?)` — removes the
+  worktree dir + branch, and captures a git diff as a side artifact when the
+  worktree has changes (for audit/merge).
+**`ctx.agent()` integration** in `src/runtime/dynamic-workflow-context.ts`:
+- New `worktree?: boolean` field on `AgentCallOpts`.
+- When true, the agent runs with the worktree path as its cwd.
+- Cleanup always runs (success, failure, or agent error) — no worktree leaks.
+- Non-git cwd or creation failure → fall back to normal cwd + `ctx.log()`
+  warning; the agent still runs.
+### Why worktrees for DWF
+DWF scripts commonly fan out parallel agents that each modify files (e.g.
+fixing different modules). Without isolation they race on the working tree.
+`worktree: true` gives each agent its own checkout; the diff is captured as
+an artifact for later merge.
+### Tests (8 new, 60 total in dynamic-workflow-context.test.ts)
+- prepareAgentWorktree creates an isolated worktree
+- cleanupAgentWorktree removes dir + branch (no leak)
+- cleanupAgentWorktree captures a diff artifact when there are changes
+- prepareAgentWorktree returns undefined for non-git cwd (graceful fallback)
+- ctx.agent({worktree:true}) isolates + cleans up (mock)
+- ctx.agent({worktree:false}) uses normal cwd (backward compat)
+- ctx.agent({worktree:true}) falls back gracefully + warns in non-git cwd
+- ctx.agent({worktree:true}) cleans up even when the agent fails
+### Docs
+- `docs/dynamic-workflows.md` — worktree option in API table + example
+- `types/dwf.d.ts` — `worktree?: boolean` on AgentCallOpts
+- `package.json` — bumped 1.0.0 → 1.0.1 (patch, additive opt-in)
+### Out of scope (future rounds)
+- P2-2 VM sandbox — waiting for `isolated-vm` v1.5 (vm.createContext is not
+  a real security boundary)
+- P2-3 Resume/checkpoint — round-18 candidate (large effort)
+## [v0.9.7] — round-16 (P2-1 pipeline primitive) (2026-06-23)
+Adds **`ctx.pipeline(items, ...stages)`** — a multi-stage transform primitive for
+dynamic workflows. **Backward compatible** — existing DWF scripts are unaffected;
+`pipeline` is a new opt-in capability.
+### Feature — Pipeline primitive (P2-1)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + `types/dwf.d.ts` + `docs/dynamic-workflows.md`
+Previously the DWF context only offered `ctx.fanOut()` (a single parallel map). The
+new `ctx.pipeline()` chains stages: each item flows through **all stages in sequence**
+(stage 1 → stage 2 → …), while **different items run concurrently**, bounded by the
+workflow concurrency (`mapConcurrent`, the same primitive as `fanOut`).
+Semantics (mirrors `pi-dynamic-workflows`' `pipeline()`):
+- Each stage receives `(previous, original, index)` — `previous` is the prior stage's
+  output (the raw item for the first stage), `original` is the unchanged input item.
+- A failed stage yields `null` for that item, logs `pipeline[i] failed: <msg>` via
+  `ctx.log()`, and the other items continue.
+- On **abort**, the error propagates (it is not swallowed into `null`).
+- Returns `(TResult | null)[]`, order-preserving.
+Signature:
+```ts
+ctx.pipeline<TItem, TResult = unknown>(
+  items: TItem[],
+  ...stages: Array<(previous: TResult, original: TItem, index: number) => Promise<TResult> | TResult>
+): Promise<(TResult | null)[]>;
+```
+Implementation notes:
+- Uses `mapConcurrent(items, concurrency, …)` — NOT unbounded `Promise.all` — so
+  item-level parallelism respects the workflow's configured concurrency. Stages that
+  spawn agents additionally acquire `ctx.semaphore` for agent-level throttling.
+- Validates inputs: non-array first arg → `TypeError`; non-function stage (or no
+  stages) → `TypeError`.
+- Empty items array short-circuits to `[]`.
+- Authoring types (`types/dwf.d.ts`) mirror the runtime signature for IDE IntelliSense.
+Example use case: scan → analyze → review each shard, up to `concurrency` shards at a
+time, with per-shard failure isolation.
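A standalone sketch of these semantics follows. `mapConcurrent` and `pipelineSketch` are hypothetical re-implementations, and the explicit `env` argument stands in for the concurrency, `ctx.log()` and abort state that the real `ctx.pipeline()` takes from the workflow context.

```typescript
type Stage<TItem, TResult> = (previous: TResult, original: TItem, index: number) => Promise<TResult> | TResult;

// Hypothetical stand-in for the runtime's mapConcurrent: at most `limit` workers
// pull indices from a shared cursor, and results keep input order.
async function mapConcurrent<T, R>(items: T[], limit: number, fn: (item: T, index: number) => Promise<R>): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  };
  const workers = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workers }, worker));
  return results;
}

async function pipelineSketch<TItem, TResult = unknown>(
  items: TItem[],
  env: { concurrency: number; log: (message: string) => void; signal?: AbortSignal },
  ...stages: Array<Stage<TItem, TResult>>
): Promise<(TResult | null)[]> {
  if (!Array.isArray(items)) throw new TypeError("pipeline: items must be an array");
  if (stages.length === 0 || stages.some((s) => typeof s !== "function")) {
    throw new TypeError("pipeline: expected at least one stage function");
  }
  if (items.length === 0) return [];
  return mapConcurrent(items, env.concurrency, async (item, index): Promise<TResult | null> => {
    // The first stage receives the raw item as `previous`.
    let value = item as unknown as TResult;
    try {
      for (const stage of stages) value = await stage(value, item, index);
      return value;
    } catch (err) {
      if (env.signal?.aborted) throw err; // aborts propagate instead of becoming null
      env.log(`pipeline[${index}] failed: ${err instanceof Error ? err.message : String(err)}`);
      return null;
    }
  });
}
```

Each item runs its stages strictly in sequence, while `mapConcurrent` caps how many items are in flight at once.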
+Tests: `test/unit/dynamic-workflow-context.test.ts` — single/multi-stage transforms,
+empty array, failed-stage isolation + logging, TypeError on bad inputs, stage-argument
+contract, async stages, and concurrency-bounded execution.
+## [v0.9.7] — round-15 (P1-4 phase UI) (2026-06-23)
+The progress pane now **renders DWF phase markers** (▶/✓/⏸) by consuming the
+`dwf.phase_started` / `dwf.phase_completed` events emitted by `ctx.phase()`
+(round-12). **Backward compatible** — non-DWF runs are unaffected.
+### Feature — Phase UI in progress-pane (P1-4)
+**Files:** `src/ui/dwf-phase-display.ts` (new) + `progress-pane.ts` +
+`run-snapshot-cache.ts` + `snapshot-types.ts`
+Previously the phase events were produced but the UI did not consume them.
+Now the progress pane shows a phase overview:
+- `▶ Phase: <name>` — currently running phase.
+- `✓ Phase: <name>` — completed phase.
+- `⏸ Phase: <name>` — a phase whose completion scrolled out of the recent-event
+  window and is not the current one.
+Implementation details:
+- New pure-function module `src/ui/dwf-phase-display.ts`:
+  `extractDwfPhaseState(events)` derives phase state from the event window
+  (returns `null` for non-DWF runs); `renderDwfPhaseLines(state, { ascii })`
+  renders markers with Unicode glyphs and ASCII fallbacks (`[>]`/`[v]`/`[ ]`).
+- `RunUiSnapshot` gains an optional `dwfPhaseState` field, computed from the
+  existing tailed `recentEvents` window (no extra I/O) in both sync and async
+  snapshot builders.
+- `progress-pane.ts` renders the phase lines right after the summary line,
+  before the task-based phase grouping. Non-DWF runs produce zero lines.
+- `signatureFor` includes `dwfPhaseState` so cache invalidation reflects phase
+  changes.
+- Tests: `test/unit/dwf-phase-display.test.ts` — phase state tracking from an
+  event sequence (incl. scrolled-off recovery), correct markers (Unicode +
+  ASCII), header gating, and non-DWF snapshots unaffected.
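Deriving phase state from the event window might look like this pure-function sketch. The event shape (`type` plus `data.title`) is an assumption, and although the function names mirror the changelog, the bodies are illustrative rather than the module's source.

```typescript
// Assumed event shape: only `type` and an optional `data.title` are read.
interface RunEvent {
  type: string;
  data?: { title?: string };
}

interface DwfPhaseState {
  order: string[]; // phase titles in first-seen order
  completed: string[];
  current: string | null;
}

function extractDwfPhaseState(events: RunEvent[]): DwfPhaseState | null {
  const order: string[] = [];
  const completed = new Set<string>();
  let current: string | null = null;
  for (const ev of events) {
    const title = ev.data?.title;
    if (typeof title !== "string") continue;
    if (ev.type === "dwf.phase_started") {
      if (!order.includes(title)) order.push(title);
      current = title;
    } else if (ev.type === "dwf.phase_completed") {
      if (!order.includes(title)) order.push(title);
      completed.add(title);
      if (current === title) current = null;
    }
  }
  // No phase events in the window: treat as a non-DWF run and render nothing.
  return order.length === 0 ? null : { order, completed: [...completed], current };
}

function renderDwfPhaseLines(state: DwfPhaseState | null, opts: { ascii: boolean }): string[] {
  if (state === null) return [];
  const { order, completed, current } = state;
  return order.map((title) => {
    let marker: string;
    if (title === current) marker = opts.ascii ? "[>]" : "▶";
    else if (completed.includes(title)) marker = opts.ascii ? "[v]" : "✓";
    else marker = opts.ascii ? "[ ]" : "⏸"; // not running, and no completion seen in this window
    return `${marker} Phase: ${title}`;
  });
}
```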
+## [v0.9.7] — round-14 (P1 DX + observability) (2026-06-23)
+Four additive P1 features land in this round — **authoring types**, **per-workflow
+token budget**, **log API**, and **typed args**. **Backward compatible** — existing
+DWF scripts continue to work unchanged. New behavior is opt-in.
+### Feature 1 — Authoring Types / IDE IntelliSense (P1-1)
+**Files:** `types/dwf.d.ts` (new) + `package.json` (`./workflow` export)
+A `.dwf.ts` script can now import the `WorkflowCtx` (and supporting) types from
+the package's `./workflow` export for full TypeScript IntelliSense:
+```ts
+import type { WorkflowCtx } from "pi-crew/workflow";
+export default async function run(ctx: WorkflowCtx): Promise<void> { /* ... */ }
+```
+- New file: `types/dwf.d.ts` — named exports mirroring the runtime types
+  (`WorkflowCtx`, `AgentCallOpts`, `AgentResult`, `WorkflowBudget`, ...).
+- `package.json` gains `"./workflow": { "types": "./types/dwf.d.ts" }` and ships
+  the `types/` directory.
+- New test: `test/unit/dwf-authoring-types.test.ts` — compiles a sample `.dwf.ts`
+  against the export (positive + negative `@ts-expect-error` check).
+### Feature 2 — Per-Workflow Token Budget (P1-2)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + dispatch wiring
+`ctx.budget` is a frozen `{total, spent(), remaining()}` surface. When a
+per-workflow token budget is set, `ctx.agent()` auto-rejects with `ok:false`
+(`"workflow token budget exhausted"`) **before** spawning a child worker. `spent()`
+accumulates each run's reported `usage.input + usage.output`.
+- `total` is `null` (unbounded) by default; `remaining()` is `Infinity` then.
+- New `MakeWorkflowCtxOptions.tokenBudget`, `WorkflowConfig.maxTokenBudget`,
+  `RunDynamicWorkflowInput.tokenBudget`, and the `team run` `tokenBudget` param
+  (param overrides the workflow value).
+- Budget check + accumulation wired into `ctx.agent()`; budget object passed
+  through `run.ts` and `background-runner.ts`.
+- Tests: 7 unit cases (default null, set total, spent/remaining, exhaustion,
+  accumulation from the mock's `{input:10,output:5}`, frozen check).
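A sketch of the budget surface and the pre-spawn check, assuming usage is reported as `{input, output}` after each run. `makeBudget` and `guard` are hypothetical names, and clamping `remaining()` at 0 is an assumption. Because the check happens before spawning, the last permitted call can overshoot `total`.

```typescript
interface WorkflowBudget {
  readonly total: number | null;
  spent(): number;
  remaining(): number;
}

type Usage = { input: number; output: number };
type AgentOutcome = { ok: true; usage: Usage } | { ok: false; error: string };

function makeBudget(total: number | null = null) {
  let spent = 0;
  // Frozen so a script can read the budget but not reassign its fields.
  const budget: WorkflowBudget = Object.freeze({
    total,
    spent: () => spent,
    // Clamping at 0 is an assumption of this sketch.
    remaining: () => (total === null ? Infinity : Math.max(0, total - spent)),
  });
  return {
    budget,
    // Checked before spawning: once exhausted, no child worker is started.
    async guard(spawn: () => Promise<{ usage: Usage }>): Promise<AgentOutcome> {
      if (budget.remaining() <= 0) return { ok: false, error: "workflow token budget exhausted" };
      const { usage } = await spawn();
      spent += usage.input + usage.output;
      return { ok: true, usage };
    },
  };
}
```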
+### Feature 3 — Log API (P1-3)
+**Files:** `src/state/contracts.ts` + `src/runtime/dynamic-workflow-context.ts`
+`ctx.log(message)` appends a workflow-level log line: stringifies non-strings,
+keeps a bounded in-memory copy (capped at **1000**), and always emits a durable
+`dwf.log` event (`{message}`) to the run's `events.jsonl`.
+- New event type `"dwf.log"` in `TEAM_EVENT_TYPES` (non-terminal).
+- New runner-only `getWorkflowLogs(ctx)` accessor (mirrors `getWorkflowPhaseState`).
+- Tests: 4 unit cases (append, stringify, event emission, 1000-cap) + 1
+  integration case (end-to-end through `runDynamicWorkflow`).
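The log contract can be sketched as follows. `makeLog` and `toLogText` are hypothetical names. The changelog does not say which entries are dropped once the in-memory copy reaches 1000, so this sketch assumes the oldest go first.

```typescript
const LOG_CAP = 1000;

type EmitEvent = (type: "dwf.log", data: { message: string }) => void;

// Stringify non-strings; fall back to String() for values JSON cannot encode
// (undefined, functions, symbols, BigInt, circular objects).
function toLogText(message: unknown): string {
  if (typeof message === "string") return message;
  try {
    const json = JSON.stringify(message);
    return json === undefined ? String(message) : json;
  } catch {
    return String(message);
  }
}

function makeLog(emit: EmitEvent) {
  const lines: string[] = [];
  function log(message: unknown): void {
    const text = toLogText(message);
    lines.push(text);
    // Assumption: keep the newest LOG_CAP entries in memory.
    if (lines.length > LOG_CAP) lines.shift();
    // The event is always emitted, so events.jsonl stays complete past the cap.
    emit("dwf.log", { message: text });
  }
  return { log, lines };
}
```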
+### Feature 4 — Typed Args (P1-5)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + `src/state/types.ts` +
+`src/schema/team-tool-schema.ts` + `src/state/state-store.ts` + dispatch wiring
+`ctx.args<T>()` returns typed workflow arguments (sourced from `manifest.args`,
+passed via the run `args` param). Defaults to `{}` when unset.
+- New `TeamRunManifest.args`, `MakeWorkflowCtxOptions.args`, `createRunManifest`
+  `args` param, and the `team run` `args` schema field (`Type.Unsafe`, any JSON
+  value — avoids `any`).
+- The runner reads `manifest.args` and forwards it to `makeWorkflowCtx`.
+- Tests: 3 unit cases (default `{}`, typed object, array) + 1 integration case
+  (reads `manifest.args` end-to-end).
+### Other
+- Version bumped `0.9.7` → `0.9.8`.
+- `docs/dynamic-workflows.md`: API table rows + Log/Budget/Args/Authoring-types
+  sections.
+## [v0.9.7] — round-13 (P0 AST determinism + structured output + abort cleanup) (2026-06-23)
+Three P0 features land in this round. **Backward compatible** — existing DWF
+scripts continue to work unchanged. New behavior is opt-in via the `schema`
+field on `ctx.agent()` and an env-var escape hatch for the determinism check.
+### Feature 1 — AST Determinism Check (P0-2)
+**File:** `src/runtime/deterministic-ast.ts` (new) +
+`src/runtime/dynamic-workflow-runner.ts` (integration)
+Dynamic workflow scripts must now be **deterministic**. The runner parses each
+`.dwf.ts` with `acorn` and walks the AST, rejecting `Date.now()`,
+`Math.random()`, and `new Date()` calls before `jiti` executes the script.
+Two runs of the same script against the same inputs now produce the same
+outputs — critical for regression testing and workflow replay.
+Why AST, not regex: regex matches `Date.now()` everywhere — including string
+literals, comments, and prompt text. AST walking distinguishes **calls** from
+strings, so prompts that say *"avoid `Date.now()` in your code"* still parse
+cleanly. Other `Date.*` and `Math.*` methods (`Date.parse`, `Date.UTC`,
+`Math.floor`, `Math.max`, etc.) are accepted — only `now` and `random` are
+blocked.
+- New dep: `acorn ^8.14.0` (small, well-maintained; verified Node ≥22 ESM/strip-types compatibility)
+- New file: `src/runtime/deterministic-ast.ts` (determinism walker; MIT-licensed adaptation from pi-dynamic-workflows, attribution in `NOTICE.md`)
+- New file: `test/unit/deterministic-ast.test.ts` (27 cases: accepts/rejects every form, comments, template literals, computed properties, parse-error delegation)
+- New tests in `test/integration/dwf-setresult.test.ts` (5 end-to-end cases including env-var opt-out)
+**Escape hatch:** `PI_CREW_DWF_SKIP_DETERMINISM_CHECK=1` bypasses the check for
+power users who legitimately need time/random (e.g. randomized benchmark
+scripts). Off by default.
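The core of such a walker is sketched below over ESTree-shaped nodes (the format acorn emits). To stay runnable without acorn, the usage feeds it a hand-built AST. `findNondeterministicCalls` is a hypothetical name, and flagging only the zero-argument `new Date()` form is an assumption about how strict the real check is.

```typescript
// ESTree-shaped node; only the fields read here matter.
type EsNode = { type: string; [key: string]: unknown };

const BANNED = new Map<string, Set<string>>([
  ["Date", new Set(["now"])],
  ["Math", new Set(["random"])],
]);

function isNode(value: unknown): value is EsNode {
  return typeof value === "object" && value !== null && typeof (value as EsNode).type === "string";
}

// Property name of `obj.prop` or `obj["prop"]`; undefined for dynamic keys.
function propertyName(member: EsNode): string | undefined {
  const prop = member.property;
  if (!isNode(prop)) return undefined;
  if (!member.computed && prop.type === "Identifier") return prop.name as string;
  const value = prop.value;
  if (member.computed && prop.type === "Literal" && typeof value === "string") return value;
  return undefined;
}

function findNondeterministicCalls(root: EsNode): string[] {
  const hits: string[] = [];
  const visit = (value: unknown): void => {
    if (Array.isArray(value)) {
      value.forEach(visit);
      return;
    }
    if (!isNode(value)) return;
    const callee = value.callee;
    if (value.type === "CallExpression" && isNode(callee) && callee.type === "MemberExpression") {
      const object = callee.object;
      const name = propertyName(callee);
      if (isNode(object) && object.type === "Identifier" && name !== undefined && BANNED.get(object.name as string)?.has(name)) {
        hits.push(`${object.name as string}.${name}()`);
      }
    }
    if (value.type === "NewExpression" && isNode(callee) && callee.type === "Identifier" && callee.name === "Date") {
      if (Array.isArray(value.arguments) && value.arguments.length === 0) hits.push("new Date()");
    }
    // String literals and template text are not call nodes (and acorn drops
    // comments by default), so prompt text mentioning Date.now() is never flagged.
    for (const [key, child] of Object.entries(value)) if (key !== "type") visit(child);
  };
  visit(root);
  return hits;
}
```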
+### Feature 2 — Structured Output Helper (P0-3)
+**Files:** `src/runtime/result-extractor.ts` + `src/runtime/dynamic-workflow-context.ts`
+`AgentCallOpts` gains an optional `schema?: TSchema` field (TypeBox). When set,
+`ctx.agent()` validates the extracted JSON against the schema via
+`@sinclair/typebox`'s `Value.Check`. Mismatch yields
+`{ok: false, error: "structured output does not match schema: ..."}` instead
+of an untyped `structured: { ... }` blob.
+How the runner helps the model comply:
+- Appends a JSON-output directive to the prompt.
+- Replaces the agent's system prompt suffix with a "structured-output
+  assistant" preamble that describes the schema's shape.
+When `schema` is **omitted**, behavior is byte-identical to the previous
+regex-based extractor — verified by the existing 30+ test cases plus 9 new
+schema-specific cases in `test/unit/result-extractor.test.ts` and 4 new
+end-to-end cases in `test/unit/dynamic-workflow-context.test.ts`.
+Caveat: pi-crew DWF spawns `pi` as a subprocess (`runChildPi`), not an
+in-memory `createAgentSession`. Subprocess structured output is captured via
+the same event-stream → JSON-line → schema-check pipeline used for everything
+else, so this round ships Option B (regex-extract + schema validation
+post-hoc). Option A (in-process terminating tool) is planned for round-14.
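Option B (extract JSON from the text, then validate it post hoc) can be sketched without TypeBox by passing the validator in as a type guard. `toStructured`, the brace-slicing heuristic and the no-JSON error text are assumptions; in the package the check is TypeBox's `Value.Check`.

```typescript
type StructuredResult<T> = { ok: true; structured: T } | { ok: false; error: string };

// Take the span from the first "{" to the last "}" and try to parse it. This
// also copes with JSON wrapped in a markdown fence or surrounded by prose.
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) return undefined;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return undefined;
  }
}

// `check` stands in for a schema validator such as TypeBox's Value.Check.
function toStructured<T>(text: string, check: (value: unknown) => value is T): StructuredResult<T> {
  const value = extractJsonObject(text);
  if (value === undefined) return { ok: false, error: "no JSON object found in agent output" };
  if (!check(value)) {
    return { ok: false, error: `structured output does not match schema: ${JSON.stringify(value)}` };
  }
  return { ok: true, structured: value };
}
```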
+### Feature 3 — Abort Listener Cleanup (P0-5) — NO-OP (already fixed in round 27)
+Audited `src/runtime/child-pi.ts` for AbortSignal listener leaks. The fix was
+landed in round 27 (BUG 4): both the `onParentAbort` flag handler and the
+`abort` cancellation handler are now removed inside `settle()` regardless of
+the exit path (normal completion, response timeout, hard kill, parent abort,
+forced final drain). On runs with >10 tasks sharing one AbortSignal (the
+common pattern under `background-runner`), this prevents the
+`MaxListenersExceededWarning` and per-task closure capture that previously
+pinned the worker stack frame in memory.
+No code changes needed in round 13. Documented for the audit trail.
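The cleanup pattern can be sketched as one idempotent `settle()` that removes both listeners, whichever exit path calls it. `watchAbort` is a hypothetical name; `getEventListeners` from `node:events` is used only to show that no listeners outlive their tasks.

```typescript
import { getEventListeners } from "node:events";

// Both handlers are registered up front and removed in settle(), which every
// exit path (normal exit, timeout, hard kill, parent abort, final drain) calls.
function watchAbort(signal: AbortSignal, cancelChild: () => void) {
  let parentAborted = false;
  let settled = false;
  const onParentAbort = () => {
    parentAborted = true;
  };
  const onAbort = () => cancelChild();
  signal.addEventListener("abort", onParentAbort, { once: true });
  signal.addEventListener("abort", onAbort, { once: true });
  const settle = (): void => {
    if (settled) return;
    settled = true;
    signal.removeEventListener("abort", onParentAbort);
    signal.removeEventListener("abort", onAbort);
  };
  return { settle, parentAborted: () => parentAborted };
}

// Usage: several tasks share one signal, as under background-runner.
const controller = new AbortController();
const watchers = Array.from({ length: 5 }, () => watchAbort(controller.signal, () => {}));
const listenersWhileRunning = getEventListeners(controller.signal, "abort").length;
watchers.forEach((w) => w.settle());
const listenersAfterSettle = getEventListeners(controller.signal, "abort").length;
```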
+### Verification
+- `npm run typecheck` — clean
+- `npx tsc --noEmit` — clean
+- 31 new unit tests across `deterministic-ast.test.ts` (27) and `dynamic-workflow-context.test.ts` (4), plus 9 schema-validation cases in `result-extractor.test.ts` — all pass
+- 5 new integration tests in `dwf-setresult.test.ts` — all pass
+- 0 regressions in the existing 4 round-12 tests
+## [v0.9.7] — round-12: DWF phases + structured-clone guard (2026-06-23)
+Two additive P0 features for dynamic-workflow (DWF) scripts, both fully
+backward-compatible (existing scripts continue to work unchanged). Researched
+and adopted from the public `pi-dynamic-workflows` (Michaelliv/v1.0.1)
+package — full comparison and adoption plan in
+`.crew/artifacts/team_20260623095016_b693d3f967f88048/shared/06_synthesize.md`.
+### Feature 1: `ctx.phase(title)` runtime phase API (P0-1)
+`WorkflowCtx` gains a new `phase(title: string): void` method. The orphan
+`dwf.phase_started` / `dwf.phase_completed` event types — declared in
+`src/state/contracts.ts:89-93` since v0.9.0 but never produced by any
+producer — finally have a producer. Use cases:
+- Group `ctx.agent()` calls under logical phases (e.g. "Scan", "Audit",
+  "Review") so downstream UI and log readers can group by phase.
+- Emit a clear phase boundary to the run's `events.jsonl` without writing
+  custom event-log code.
+- Drive live progress reporting from the script itself.
+Semantics:
+- Validates `title` is a non-empty string (throws `TypeError` otherwise).
+- Idempotent: calling `ctx.phase("Scan")` twice does not emit a duplicate
+  event or change state.
+- When a previous phase is still open, emits `dwf.phase_completed` for it
+  **before** emitting `dwf.phase_started` for the new one (consumers never
+  see two open phases at once).
+- The in-memory `phases[]` list (read-only via `getWorkflowPhaseState`,
+  mirrors the `__finalResult` non-enumerable getter pattern) is deduped and
+  capped at **100 distinct titles** to bound memory. Events still flow
+  past the cap — the events log is the durable source of truth.
+- The runner **auto-closes the last open phase** before emitting
+  `dwf.completed`, so a script that ends mid-phase still produces a
+  well-formed event sequence.
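These semantics (validation, idempotency, close-before-open, the 100-title cap, and the runner's auto-close) fit in a short sketch. `makePhaseApi` and `closeOpenPhase` are hypothetical names, and the `{ title }` event payload is assumed.

```typescript
type PhaseEvent = "dwf.phase_started" | "dwf.phase_completed";
const MAX_TRACKED_PHASES = 100;

function makePhaseApi(emit: (type: PhaseEvent, data: { title: string }) => void) {
  const phases: string[] = []; // deduped, capped in-memory list
  let current: string | null = null;

  function phase(title: string): void {
    if (typeof title !== "string" || title.length === 0) {
      throw new TypeError("ctx.phase(title): title must be a non-empty string");
    }
    if (title === current) return; // idempotent: no duplicate event, no state change
    // Close the open phase first so consumers never see two open phases.
    if (current !== null) emit("dwf.phase_completed", { title: current });
    emit("dwf.phase_started", { title });
    current = title;
    // Events keep flowing past the cap; only the in-memory list is bounded.
    if (!phases.includes(title) && phases.length < MAX_TRACKED_PHASES) phases.push(title);
  }

  // What the runner does before dwf.completed: close whatever is still open.
  function closeOpenPhase(): void {
    if (current !== null) emit("dwf.phase_completed", { title: current });
    current = null;
  }

  return { phase, closeOpenPhase, snapshot: () => ({ current, phases: [...phases] }) };
}
```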
+**Files changed:**
+- `src/runtime/dynamic-workflow-context.ts` — interface, implementation,
+  `__phaseState` getter, `getWorkflowPhaseState` helper
+- `src/runtime/dynamic-workflow-runner.ts` — auto-close on completion
+### Feature 2: structured-clone guard at the runner boundary (P0-4)
+Defensive `assertStructuredCloneable(value, name)` helper applied to the
+final artifact content and `manifest.summary` before they reach
+`writeArtifact` and the run-event-bus emitter. Today this is mostly
+future-proofing (the artifact file is read as a string, and strings are
+always structured-cloneable), but the guard surfaces a clear, actionable
+error pointing at the most common cause — forgetting `await` on
+`ctx.agent()` / `ctx.review()` — instead of letting a cryptic
+`DataCloneError` leak from deep inside the artifact store.
+**Files changed:**
+- `src/runtime/dynamic-workflow-runner.ts` — `assertStructuredCloneable`
+  helper, applied to `finalText` and `summaryText` (slice)
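One way to write such a guard is to probe with the global `structuredClone` (Node 17 and later) and rethrow with a hint. This is a sketch; the exact message and the Promise-specific hint are assumptions, not the runner's wording.

```typescript
// Probe with structuredClone and rethrow with an actionable message.
function assertStructuredCloneable(value: unknown, name: string): void {
  try {
    structuredClone(value);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // The most common cause: a Promise reached the boundary because `await` was missing.
    const hint = value instanceof Promise ? " Did you forget `await` on ctx.agent() / ctx.review()?" : "";
    throw new TypeError(`${name} is not structured-cloneable (${reason}).${hint}`);
  }
}
```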
+### Tests
+- 7 new unit tests in `test/unit/dynamic-workflow-context.test.ts`
+  (emission, idempotency, validation, sequence, helper, dedup, 100-cap).
+- 1 new integration test in `test/integration/dwf-setresult.test.ts`
+  (end-to-end phase event sequence, including runner auto-close).
+- All 23 existing DWF unit tests still pass; both pre-existing integration
+  tests still pass.
+### Docs
+- `docs/dynamic-workflows.md` — updated WorkflowCtx example to use
+  `ctx.phase("Scan")` / `ctx.phase("Audit")`; added a `ctx.phase` row to
+  the API table; added a "Phases (round-12)" subsection explaining
+  semantics, idempotency, and the 100-cap.
+### Out of scope (planned for future rounds)
+- AST determinism check (P0-2)
+- Structured output helper (P0-3)
+- Abort listener cleanup pattern (P0-5)
+- Authoring types / IDE IntelliSense (P1-1)
+- Token budget (P1-2)
+- Phase UI in `progress-pane` (P1-4)
+- Pipeline primitive (P2-1)
+- `isolated-vm` sandbox (P2-2, planned for v1.5)
+## [v0.9.5] — fix "team run hangs forever at 25%" (2026-06-23)
+Two coupled runtime bugs caused the recurring "run stuck at 25% (1/4)" failure
+observed across 4+ consecutive review/fast-fix runs. Both are now fixed; full
+diagnostics (background.log, events.jsonl, heartbeat.json) are preserved for
+all runs.
+### Bug X — `purgeStaleActiveRunIndex` destroyed the run's stateRoot (proximate cause)
+**File:** `src/runtime/crash-recovery.ts`
+**What was wrong:** `purgeStaleActiveRunIndex` decided whether a run was
+"orphaned" using `entry.updatedAt`, which is **frozen at registration** and
+never refreshed during execution. A long-running legitimate async run whose
+background worker had exited (e.g. after a 5–15 min explorer) would have its
+entire durable state (manifest/tasks/events/heartbeat) hard-deleted. Because
+`saveRunTasks()` silently no-ops once the state dir is missing, the workflow
+could never advance past the current task → **permanent invisible hang**
+("Run not found"), with all diagnostics lost.
+**Fix:**
+- Liveness now corroborated via (a) the on-disk `manifest.updatedAt` (rewritten
+  on every task transition) and (b) the team-level `heartbeat.json` mtime —
+  any one of which is sufficient to declare the run live.
+- Cancelling a run now **keeps its stateRoot** so the run stays queryable and
+  resumable, and its diagnostics survive. The finished-run pruner removes the
+  directory later on its normal schedule.
+- Removed two redundant `saveRunManifest(fullLoaded.manifest)` calls that
+  were clobbering the freshly-saved `cancelled` status back to `running`.
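The corroborated liveness rule can be expressed as a pure predicate. `isRunLive` and `staleAfterMs` are hypothetical, and the changelog does not state the staleness threshold. Under the fix, a run that fails this check is cancelled but keeps its stateRoot.

```typescript
interface LivenessSignals {
  manifestUpdatedAtMs?: number; // parsed from the on-disk manifest.updatedAt
  heartbeatMtimeMs?: number; // mtime of the team-level heartbeat.json
}

// Either fresh signal is enough to keep the run. The registry entry's
// updatedAt is deliberately not consulted: it is frozen at registration.
function isRunLive(signals: LivenessSignals, nowMs: number, staleAfterMs: number): boolean {
  const fresh = (t: number | undefined) => t !== undefined && nowMs - t < staleAfterMs;
  return fresh(signals.manifestUpdatedAtMs) || fresh(signals.heartbeatMtimeMs);
}
```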
+**New regression test:** `test/unit/crash-recovery-purge-liveness.test.ts`
+(3 cases: fresh manifest kept, orphan cancelled-but-preserved, fresh
+heartbeat kept — all using a live-worker-then-reap + `now`-time-shift
+harness to deterministically simulate the registration-then-aging race).
+### Bug Y — background runner crashed with EPIPE on the first post-detach `console.debug` (root cause)
+**File:** `src/runtime/background-runner.ts`
+**What was wrong:** The in-process console redirect only covered `console.log`
+and `console.error`; `console.debug` and `console.warn` still wrote to the
+original stdout/stderr pipes. The background runner is spawned with
+`detached:true` + `setsid:true`, so the parent disconnects the stdio pipes
+immediately after spawn. The first post-detach `console.debug` call from
+`team-runner.ts:242` (inside `mergeTaskUpdatesPreservingTerminal` →
+"Skipping stale merge") hit the closed stdout → unhandled `EPIPE` error →
+**process exit** → scheduler dead → run stuck at 25% forever.
+Prior investigators saw only "the run died silently right after explorer
+completed" and concluded (incorrectly) that the cause was a native crash
+(SIGKILL/segfault/V8 heap-OOM), because their [DIAG] handlers never fired.
+In reality the diagnostic handlers DID fire — but on an `EPIPE` write error,
+which `process.on('error')` doesn't catch. The fix below makes the crash
+observable AND non-fatal.
+**Fix:**
+- Extend the console redirect to also cover `console.debug` and `console.warn`,
+  so they go to the log file (logFd) instead of the disconnected stdio pipes.
+- Wrap the `fs.writeSync` in try-catch so any log-write failure (closed fd,
+  ENOSPC, etc.) can never crash the scheduler. The scheduler log is
+  best-effort by design.
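A sketch of the hardened redirect. It takes a console-like target instead of patching the global `console`, so it can be exercised in isolation. `redirectConsole` is a hypothetical name, and formatting arguments with `util.inspect` only approximates what `console` prints.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { inspect } from "node:util";

type ConsoleLike = Record<"log" | "error" | "warn" | "debug", (...args: unknown[]) => void>;

// Route all four methods (not just log/error) to the log fd: a detached
// worker's stdio pipes may already be closed, so nothing may fall through.
function redirectConsole(target: ConsoleLike, getLogFd: () => number | undefined): void {
  const write = (args: unknown[]): void => {
    const fd = getLogFd();
    if (fd === undefined) return;
    const line = args.map((a) => (typeof a === "string" ? a : inspect(a))).join(" ");
    try {
      writeSync(fd, `${line}\n`);
    } catch {
      // EBADF, ENOSPC, ...: scheduler logging is best-effort and must not crash it.
    }
  };
  for (const method of ["log", "error", "warn", "debug"] as const) {
    target[method] = (...args: unknown[]) => write(args);
  }
}

// Usage: write, then close the fd (write fails and is swallowed), then unset it.
const logPath = join(mkdtempSync(join(tmpdir(), "bg-")), "background.log");
const fd = openSync(logPath, "a");
let currentFd: number | undefined = fd;
const sink: ConsoleLike = { log() {}, error() {}, warn() {}, debug() {} };
redirectConsole(sink, () => currentFd);
sink.debug("Skipping stale merge", { task: 2 });
closeSync(fd);
sink.debug("after close"); // EBADF is swallowed
currentFd = undefined;
sink.warn("no log fd"); // no-op
const logged = readFileSync(logPath, "utf8");
```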
+**New regression test:** `test/unit/background-runner-console-redirect.test.ts`
+(4 cases: undefined logFd no-op, valid logFd writes correctly, EBADF on
+closed logFd is swallowed, post-undefined fd-toggle is safe). Replicates the
+`origWrite` pattern from the source so any drift between the two is easy to
+spot.
+### Why this took multiple attempts
+All prior attempts to diagnose the hang destroyed the only evidence (the
+stateRoot) the moment the `purgeStaleActiveRunIndex` heuristic misfired.
+The chain was always the same: a worker exits for any reason → purge sees
+dead PID + frozen-stale entry → **deletes stateRoot** → the run becomes
+"Run not found" with no log, no events, no heartbeat, no way to even resume.
+That hid the real cause (Bug Y) for the entire series of failed diagnostic
+runs. With Bug X fixed, the diagnostic trail (background.log 345 KB +
+events.jsonl 166 KB) survives long enough to read the actual EPIPE crash
+that Bug Y left behind.
+### Verification
+- 7/7 new regression tests pass (`crash-recovery-purge-liveness.test.ts` +
+  `background-runner-console-redirect.test.ts`).
+- Existing crash-recovery / active-run-registry / stale-reconciler /
+  async-stale / run-accumulation / auto-recovery suites: 71/71 pass.
+- End-to-end: a 4-step review run now advances 3/4 tasks (75%) instead of
+  hanging at 25%; the verify step now fails only for environmental reasons
+  (out-of-memory under load), not because of the fix.
+- `npx tsc --noEmit` is green.
+### Notes for users
+If you have a stuck "running" run from v0.9.4 or earlier (the symptom was
+"Run not found" / "25% hang" / "had to kill pi"), upgrading alone will not
+recover it — its `stateRoot` was already destroyed by the buggy purge.
+Re-dispatch the workflow. New runs are fully protected.
 ## [v0.9.4] — fix macOS CI: benchmark allowlist + cross-platform fixtures (2026-06-23)
 Patch fix for a CI failure introduced in v0.9.3 (caught by the macOS CI job,

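For readers following the Bug Y fix, here is a minimal TypeScript sketch of the same idea: redirect `console.log`, `console.error`, `console.debug`, and `console.warn` to a log file descriptor and wrap each `fs.writeSync` call in try/catch. It assumes a detached Node.js process that already holds an open `logFd`; the helper name `redirectConsoleToLog`, the `[method]` line prefix, and the restore callback are hypothetical and are not taken from pi-crew's `background-runner.ts`.

```typescript
import * as fs from "node:fs";
import { format } from "node:util";

// Hypothetical helper (not pi-crew's API): send console output to a log fd
// instead of stdio pipes that may have been closed after the process detached.
export function redirectConsoleToLog(logFd: number | undefined): () => void {
  // No log file: leave console untouched and return a no-op restore.
  if (logFd === undefined) return () => {};
  const fd: number = logFd;

  // Keep the originals so the redirect can be undone (handy in tests).
  const saved = {
    log: console.log,
    error: console.error,
    debug: console.debug,
    warn: console.warn,
  };

  // Cover debug and warn too: in the notes, an unredirected console.debug
  // is what hit the closed stdout pipe.
  for (const name of ["log", "error", "debug", "warn"] as const) {
    console[name] = (...args: unknown[]): void => {
      try {
        // Synchronous write so the line is on disk even if the process dies next.
        fs.writeSync(fd, `[${name}] ${format(...args)}\n`);
      } catch {
        // Best-effort by design: EBADF, ENOSPC, etc. must never crash the caller.
      }
    };
  }

  return () => {
    Object.assign(console, saved);
  };
}
```

Returning a restore function is a choice made for testability; a long-lived scheduler would more likely install the redirect once and leave it in place.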
package/README.md CHANGED

@@ -39,13 +39,65 @@ npm: pi-crew
 repo: https://github.com/baphuongna/pi-crew
 ```
-**v0.9.0**: See [CHANGELOG.md](CHANGELOG.md).
+**v0.9.4 / v0.9.5**: See [CHANGELOG.md](CHANGELOG.md).
-### Highlights (v0.6.4 → v0.9.0)
+### Highlights (v0.6.4 → v0.9.5)
 A long arc of **trust, cliff-resilience, and robustness** work. Principle: *build
 trust and cliff-resilience, stay lean, delete before adding.*
+#### v0.9.5 — fix "team run hangs forever at 25%" (2026-06-23)
+Two coupled runtime bugs caused recurring "run stuck at 25% (1/4)" failures
+across 4+ consecutive review/fast-fix runs. The combined symptom: the
+scheduler appeared to stop responding right after the first task (explorer)
+finished, the run never progressed to task 2, and `team action='status'`
+returned "Run not found" with **no diagnostic trail** to investigate.
+Manually killing the parent `pi` process was the only workaround.
+- **🩹 Bug X (proximate cause)** — `purgeStaleActiveRunIndex`
+  (`src/runtime/crash-recovery.ts`) destroyed a run's `stateRoot` based on a
+  **frozen** `entry.updatedAt` (set once at registration, never refreshed).
+  Any long-running legitimate async run (≥5 min) whose worker had exited
+  lost its entire durable state. `saveRunTasks()` then silently no-op'd on
+  the missing dir, and the workflow could never advance. Fix: corroborate
+  liveness via the on-disk `manifest.updatedAt` AND the team-level
+  `heartbeat.json`; keep `stateRoot` on cancel so runs stay queryable and
+  resumable.
+- **🩹 Bug Y (root cause — why the scheduler died in the first place)** —
+  `src/runtime/background-runner.ts` redirected only `console.log` /
+  `console.error` to the log file. The first post-detach `console.debug`
+  call from `team-runner.ts:242` (inside `mergeTaskUpdatesPreservingTerminal`
+  → "Skipping stale merge") hit the disconnected stdout pipe → unhandled
+  `EPIPE` → process exit. Prior investigators concluded (incorrectly) that
+  the cause was a native crash, because diagnostic `[DIAG]` handlers never
+  fired on the EPIPE. Fix: extend the console redirect to `console.debug` /
+  `console.warn`, and wrap `fs.writeSync` in try-catch so any log-write
+  failure can never crash the scheduler.
+- **🧪 Regression coverage** — 7 new tests: 3 in
+  `test/unit/crash-recovery-purge-liveness.test.ts` (fresh-manifest-kept,
+  orphan-cancelled-preserved, fresh-heartbeat-kept) + 4 in
+  `test/unit/background-runner-console-redirect.test.ts` (drift-detector
+  pattern that exercises undefined / valid / EBADF / post-toggle logFd).
+- **📖 See [CHANGELOG.md](CHANGELOG.md) for full details**, including
+  why prior attempts to diagnose the hang kept destroying the only
+  evidence (Bug X nuked the stateRoot before anyone could read the EPIPE
+  crash in Bug Y).
+> **Recovering a stuck run from v0.9.4 or earlier:** the `stateRoot` for
+> those runs is already gone. Re-dispatch the workflow — new runs are
+> fully protected.
+#### v0.9.4 — macOS CI fixture (2026-06-23)
+- **🧪 BSD-vs-GNU grep fix** — benchmark test fixtures used
+  `grep --help` (exits 0 on GNU/Linux, exits 2 on BSD/macOS). Switched
+  the exit-0 fixture to `echo ok`; the not-in-allowlist fixture is now
+  `ls`. CI matrix is now green on all 3 OSes.
+- **📌 Process note** — this release re-commits to: **tag/publish ONLY
+  after the full OS matrix CI is green.** v0.9.3 was published mid-CI-run
+  (the macOS job hadn't finished); the package itself was correct (the
+  broken file is test-only and not shipped), but the repo CI went red.
+  v0.9.4 restores green CI. v0.9.5 follows the same discipline.
 #### v0.9.0 — goal loops + dynamic workflows (2026-06-18)
 Two new features, both modeled on Claude Code, built on a shared `runKind`
 background-dispatch discriminator.
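The Bug X fix can be pictured as a purge decision that never trusts the frozen registry timestamp on its own. The sketch below is hypothetical: `RunLivenessInputs`, `decidePurge`, `preFixDecision`, the millisecond timestamps, and the exact 5-minute window (taken from the "≥5 min" note) are assumptions for illustration, not pi-crew's `crash-recovery.ts` API. It contrasts a caricature of the pre-fix rule with one that corroborates liveness via `manifest.updatedAt` and `heartbeat.json` and keeps `stateRoot` on cancel.

```typescript
// Hypothetical snapshot of what the purge step can observe for one run.
interface RunLivenessInputs {
  pidAlive: boolean; // is the recorded worker PID still running?
  entryUpdatedAt: number; // registry entry timestamp (ms), frozen at registration
  manifestUpdatedAt?: number; // on-disk manifest.updatedAt (ms), if readable
  heartbeatAt?: number; // team-level heartbeat.json timestamp (ms), if present
}

// Assumption: the staleness window matches the ">= 5 min" figure in the notes.
const STALE_AFTER_MS = 5 * 60 * 1000;

function isFresh(ts: number | undefined, now: number): boolean {
  return ts !== undefined && now - ts < STALE_AFTER_MS;
}

// Caricature of the pre-fix heuristic: dead PID + stale frozen entry => delete state.
export function preFixDecision(run: RunLivenessInputs, now: number): "keep" | "delete-stateRoot" {
  return !run.pidAlive && !isFresh(run.entryUpdatedAt, now) ? "delete-stateRoot" : "keep";
}

// Post-fix shape: entryUpdatedAt is deliberately not consulted. Liveness is
// corroborated with signals that keep moving while the run progresses, and
// stateRoot is never deleted, so a cancelled run stays queryable and resumable.
export function decidePurge(run: RunLivenessInputs, now: number): "keep" | "mark-cancelled" {
  if (run.pidAlive) return "keep";
  if (isFresh(run.manifestUpdatedAt, now) || isFresh(run.heartbeatAt, now)) return "keep";
  return "mark-cancelled";
}
```

Because neither branch of `decidePurge` deletes `stateRoot`, a wrong liveness guess costs some disk space rather than the run's entire durable state.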
@@ -145,7 +197,7 @@ background-dispatch discriminator.
 - **Plugin system** — framework-aware context injection (Next.js, Vite, Vitest) via plugin registry
 - **Health scoring** — penalty-based run health with time-series snapshots
 - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
-- **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
+- **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `ctx.phase()` marks logical phases; **round-14** adds `ctx.log()` (durable `dwf.log` events), `ctx.budget` (per-workflow token budget that auto-rejects `ctx.agent()` when exhausted), and `ctx.args<T>()` (typed workflow arguments). TypeScript IntelliSense is available via `import type { WorkflowCtx } from "pi-crew/workflow"`. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
 ---