pi-crew 0.9.4 → 0.9.7

Files changed (45)
  1. package/CHANGELOG.md +592 -0
  2. package/README.md +55 -3
  3. package/docs/HARNESS_BACKLOG.md +51 -3
  4. package/docs/dynamic-workflows.md +315 -2
  5. package/docs/fix-plan-disabletools-exit-null.md +219 -0
  6. package/docs/troubleshooting.md +102 -0
  7. package/package.json +8 -2
  8. package/src/extension/command-completions.ts +1 -0
  9. package/src/extension/crew-shortcuts.ts +1 -0
  10. package/src/extension/register.ts +2 -0
  11. package/src/extension/registration/commands.ts +3 -0
  12. package/src/extension/team-tool/doctor.ts +14 -0
  13. package/src/extension/team-tool/goal.ts +1 -0
  14. package/src/extension/team-tool/run.ts +4 -0
  15. package/src/runtime/background-runner.ts +24 -2
  16. package/src/runtime/chain-runner.ts +1 -0
  17. package/src/runtime/child-pi.ts +101 -10
  18. package/src/runtime/crash-recovery.ts +78 -36
  19. package/src/runtime/deterministic-ast.ts +161 -0
  20. package/src/runtime/dwf-state-store.ts +97 -0
  21. package/src/runtime/dynamic-workflow-context.ts +381 -7
  22. package/src/runtime/dynamic-workflow-runner.ts +94 -2
  23. package/src/runtime/goal-loop-runner.ts +2 -0
  24. package/src/runtime/live-session-runtime.ts +1 -0
  25. package/src/runtime/model-scope.ts +1 -0
  26. package/src/runtime/peer-dep.ts +1 -0
  27. package/src/runtime/pi-args.ts +11 -0
  28. package/src/runtime/resilient-edit.ts +1 -0
  29. package/src/runtime/result-extractor.ts +72 -7
  30. package/src/runtime/task-runner.ts +1 -0
  31. package/src/runtime/team-runner.ts +8 -3
  32. package/src/runtime/zombie-scanner.ts +297 -0
  33. package/src/schema/team-tool-schema.ts +28 -0
  34. package/src/state/contracts.ts +1 -0
  35. package/src/state/hook-instinct-bridge.ts +3 -0
  36. package/src/state/state-store.ts +3 -0
  37. package/src/state/types.ts +9 -0
  38. package/src/ui/dashboard-panes/progress-pane.ts +5 -0
  39. package/src/ui/dwf-phase-display.ts +151 -0
  40. package/src/ui/run-snapshot-cache.ts +4 -0
  41. package/src/ui/snapshot-types.ts +3 -0
  42. package/src/utils/bm25-search.ts +2 -0
  43. package/src/workflows/workflow-config.ts +3 -0
  44. package/src/worktree/worktree-manager.ts +94 -0
  45. package/types/dwf.d.ts +187 -0
package/CHANGELOG.md CHANGED
@@ -1,5 +1,597 @@
1
1
  # Changelog
2
2
 
3
+ ## [v0.9.7] — round-18 + process-safety fix (2026-06-23)
4
+
5
+ P2-3 feature: durable checkpoint + resume for dynamic-workflow runs. When a `.dwf.ts`
6
+ script crashes (timeout, OOM, agent error) between `ctx.agent()` calls, the runner now
7
+ persists a checkpoint after every agent call so `team action='resume' runId='X'` can
8
+ continue from the last checkpoint instead of re-running from scratch. **Backward
9
+ compatible** — fresh runs (no checkpoint) behave exactly as before.
10
+
11
+ ### Implementation
12
+
13
+ **New file** `src/runtime/dwf-state-store.ts` (`DwfStore`):
14
+ - Atomic CRUD for a single run's DWF checkpoint, modeled on `GoalStore` /
15
+ `FileCheckpointStore`.
16
+ - Persists `DwfCheckpointState` (vars, phases, currentPhase, logs, spent, agentCount,
17
+ updatedAt) to `<stateRoot>/dwf-checkpoint.json` via `atomicWriteJson`.
18
+ - `load()` returns `undefined` for a missing/corrupt checkpoint (fresh run); `delete()`
19
+ is best-effort and never throws.
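The "missing or corrupt checkpoint means a fresh run" rule described above can be sketched as a pure parsing step. This is only an illustration: the field names follow the changelog, but the field types and the `parseCheckpoint` helper are assumptions, not the actual `DwfStore` code.

```ts
// Hypothetical shape: field names come from the changelog, types are assumed.
interface DwfCheckpointState {
  vars: Record<string, unknown>;
  phases: string[];
  currentPhase: string | null;
  logs: string[];
  spent: number;
  agentCount: number;
  updatedAt: string;
}

// Returns undefined for missing or corrupt input so the caller starts fresh.
function parseCheckpoint(raw: string | undefined): DwfCheckpointState | undefined {
  if (raw === undefined) return undefined; // no checkpoint file on disk
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return undefined;
    const c = parsed as Partial<DwfCheckpointState>;
    const ok =
      typeof c.vars === "object" && c.vars !== null &&
      Array.isArray(c.phases) &&
      Array.isArray(c.logs) &&
      typeof c.spent === "number" &&
      typeof c.agentCount === "number";
    return ok ? (c as DwfCheckpointState) : undefined;
  } catch {
    return undefined; // truncated or otherwise invalid JSON
  }
}
```

Treating every failure mode as `undefined` keeps the resume path from ever blocking a run: the worst case is re-running from scratch.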
20
+
21
+ **`ctx.agent()` checkpoint hook** in `src/runtime/dynamic-workflow-context.ts`:
22
+ - New `MakeWorkflowCtxOptions.onCheckpoint?: (state) => void` — invoked after each
23
+ `ctx.agent()` call (success OR fail) so a crash between calls leaves durable state.
24
+ - New `MakeWorkflowCtxOptions.resumedState?` — hydrates `ctx.vars`, phase state, logs,
25
+ `budget.spent()`, and `agentCount` from the checkpoint on resume.
26
+ - New closure counter `agentCount` (incremented in `agent()`'s `finally`), exposed via a
27
+ non-enumerable `__agentCount` getter.
28
+ - New `getWorkflowCheckpoint(ctx)` helper (mirror of `getWorkflowPhaseState`).
29
+
30
+ **Runner wiring** in `src/runtime/dynamic-workflow-runner.ts`:
31
+ - On run start: `DwfStore.load()` → hydrate ctx (`resumedState`) + emit `dwf.resumed`.
32
+ - `onCheckpoint` → `DwfStore.save()` (best-effort, errors swallowed).
33
+ - On clean completion: `DwfStore.delete()` so a re-run starts fresh.
34
+
35
+ ### Resume semantics
36
+
37
+ `team action='resume' runId='X'` re-dispatches with `runKind='dynamic-workflow'`. The
38
+ runner loads the checkpoint, hydrates `ctx.vars`/phases/logs, and re-executes the
39
+ script from the top. Scripts SHOULD be written defensively — check `ctx.vars.lastPhase`
40
+ to skip completed work (documented in `docs/dynamic-workflows.md`). No partial-resume of
41
+ a single agent call (it re-runs from scratch); checkpoints are written AFTER an agent
42
+ completes, never before.
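Because the script re-executes from the top on resume, the defensive check on `ctx.vars.lastPhase` might look roughly like the sketch below. The `remainingPhases` helper and the phase names are hypothetical, and a plain object stands in for the hydrated `ctx.vars`.

```ts
// Hypothetical helper: given the ordered phases of a script and the value a
// previous run stored in ctx.vars.lastPhase, return the phases still to run.
function remainingPhases(order: string[], lastPhase: unknown): string[] {
  if (typeof lastPhase !== "string") return order.slice(); // fresh run
  const done = order.indexOf(lastPhase);
  return done === -1 ? order.slice() : order.slice(done + 1);
}

// Stand-in for ctx.vars as hydrated from a checkpoint on resume.
const vars: Record<string, unknown> = { lastPhase: "scan" };

for (const phase of remainingPhases(["scan", "audit", "review"], vars.lastPhase)) {
  // ... a real script would await ctx.agent(...) for this phase here ...
  vars.lastPhase = phase; // record progress so the next checkpoint captures it
}
```

Recording progress only after a phase finishes matches the rule that checkpoints are written after an agent completes, so an interrupted phase is simply run again.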
43
+
44
+ ### Tests (14 new)
45
+ - `test/unit/dwf-state-store.test.ts` (10): save/load round-trip, missing→undefined,
46
+ delete, corrupt-file resilience, path layout, dir creation, large-state preservation.
47
+ - `test/unit/dynamic-workflow-context.test.ts` (+8): onCheckpoint fires on success/fail,
48
+ agentCount accumulation, backward-compat (no callback), resumedState hydration,
49
+ shallow-copy isolation, getWorkflowCheckpoint snapshot.
50
+ - `test/integration/dwf-setresult.test.ts` (+4): fresh run (no resumed event),
51
+ completed run (checkpoint deleted), resume (hydration + dwf.resumed + delete),
52
+ corrupt checkpoint treated as fresh run.
53
+
54
+ ### Docs
55
+ - `docs/dynamic-workflows.md` — new "Resume & Checkpoint (round-18 P2-3)" section +
56
+ defensive-script example.
57
+ - `types/dwf.d.ts` — resume pattern documented in header + `ctx.vars` JSDoc.
58
+ - `package.json` — bumped 1.0.1 → 1.1.0 (minor — new opt-in capability).
59
+
60
+ ### Out of scope (future rounds)
61
+ - P2-2 VM sandbox — still waiting for `isolated-vm` v1.5 (vm.createContext is not a
62
+ real security boundary). This was the LAST P2 item.
63
+
64
+ ### Process-safety follow-up fix (same release)
65
+
66
+ A heuristic-based zombie "cleanup" had killed a live interactive main `pi`
67
+ session by accident (uptime/RSS/orphan heuristics match a main session just
68
+ as readily as a real orphaned sub-agent). Fixed authoritatively:
69
+
70
+ - **`PI_CREW_KIND=subagent`** env marker, set by `buildPiWorkerArgs`
71
+ (`src/runtime/pi-args.ts`) on every child-pi spawn. A main session does NOT
72
+ carry it, so it can never be matched as a sub-agent. (An earlier draft also
73
+ added a `--crew-subagent` argv flag — removed because pi's strict option
74
+ parser rejects unknown flags and exits non-zero, which silently broke every
75
+ `ctx.agent()` call. The env var alone is the authoritative signal.)
76
+ - **`src/runtime/zombie-scanner.ts`** (new): read-only scanner that matches
77
+ ONLY processes with `PI_CREW_KIND=subagent` AND a dead `PI_CREW_PARENT_PID`.
78
+ Never matches a main session. Never kills.
79
+ - **`team action='doctor' focus='zombies'`**: renders the safe scan as a
80
+ human-readable report (zombies vs live sub-agents, with explicit do-not-kill
81
+ labelling for live parents).
82
+ - **`PI_CREW_KIND`** added to the env allowlist in `child-pi.ts`.
83
+ - **`docs/troubleshooting.md`** + **`.crew/knowledge.md`**: documented the
84
+ marker + the read-only rule so future agents never repeat the mistake.
85
+ - 8 new unit tests in `test/unit/zombie-scanner.test.ts`, including a
86
+ regression test asserting the current (main-session) process is never matched.
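The matching rule (marker present AND parent dead) can be expressed as a small predicate. This is a sketch under assumptions: `ProcInfo` and `isZombieSubagent` are illustrative names, and treating a missing or malformed `PI_CREW_PARENT_PID` as "do not match" is a conservative choice made here, not something the changelog states.

```ts
// Minimal process view; env is the process environment as read by a scanner.
interface ProcInfo {
  pid: number;
  env: Record<string, string | undefined>;
}

// A process is reported only if it carries the sub-agent marker AND its
// recorded parent is dead. Anything without the marker is never matched.
function isZombieSubagent(proc: ProcInfo, isAlive: (pid: number) => boolean): boolean {
  if (proc.env.PI_CREW_KIND !== "subagent") return false; // main sessions stop here
  const parent = Number(proc.env.PI_CREW_PARENT_PID);
  if (!Number.isInteger(parent) || parent <= 0) return false; // unknown parent: do not guess
  return !isAlive(parent);
}
```

The predicate only classifies; consistent with the read-only rule above, it never signals or kills anything.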
87
+
88
+ ### Real-world smoke testing findings (2026-06-24)
89
+
90
+ Three bugs were caught by real `team action='run'` smoke tests that the unit
91
+ suite missed (units don't shell out to the real `pi` binary). **All three are
92
+ now fixed.**
93
+
94
+ - **Fixed: `--crew-subagent` argv flag broke every `ctx.agent()` call.** Pi's
95
+ strict option parser exits non-zero on unknown flags. The marker is now the
96
+ `PI_CREW_KIND=subagent` ENV var only.
97
+ - **Fixed: `ctx.agent({schema, systemPrompt})` silently dropped `systemPrompt`.**
98
+ The round-13 schema branch used the resolved role persona as the base for the
99
+ JSON-output instruction, ignoring the caller's explicit persona. Models then
100
+ returned prose and failed schema validation. Fix: `call.systemPrompt` is now
101
+ preferred as the base when both are set.
102
+ - **Fixed: `ctx.agent({disableTools: true, maxTurns: 1})` returned `exit null`.**
103
+ Root cause (found via Phase-0 diagnostic instrumentation) was NOT the
104
+ final-drain race originally hypothesised — it was an erroneous
105
+ `killProcessTree` call in the steer-injection path. When `maxTurns` was
106
+ reached on a `turn_end` event, the code wrote a "wrap up" steer to
107
+ `child.stdin`; Node's `writable.write()` returns `false` on normal
108
+ backpressure, which the code mis-treated as a fatal injection failure and
109
+ killed the worker mid-answer (answer was in stdout; exit came back `null`).
110
+ The `disableTools` correlation was a red herring — the real trigger was
111
+ `maxTurns:1` hitting on the first turn. Fix: steer injection is now
112
+ ADVISORY — a `write() === false` or non-writable stdin is logged, not fatal;
113
+ the hard-abort at `maxTurns + graceTurns` remains the safety net for genuine
114
+ runaways. Verified: maxTurns=1 × 10 real-binary runs now 10/10 exit=0 (was
115
+ ~60% fail pre-fix). Regression guard: `test/unit/child-pi-steer-backpressure.test.ts`
116
+ (source-contract checks + opt-in real-binary smoke via `PI_CREW_SMOKE=1`).
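The advisory behaviour of the steer write can be illustrated as follows. The `SteerTarget` interface is a structural stand-in for the parts of a Node `Writable` that matter here, and `injectSteer` is a hypothetical name; the actual code in `child-pi.ts` may be organised differently.

```ts
// Structural stand-in for the parts of a Node Writable this sketch needs.
interface SteerTarget {
  writable: boolean;
  write(chunk: string): boolean;
}

type SteerOutcome = "sent" | "backpressure" | "not-writable";

// Advisory injection: never kills the worker. A false return from write()
// only signals backpressure; the stream has still buffered the chunk.
function injectSteer(stdin: SteerTarget, message: string, log: (m: string) => void): SteerOutcome {
  if (!stdin.writable) {
    log("steer skipped: stdin not writable");
    return "not-writable";
  }
  const flushed = stdin.write(message + "\n");
  if (!flushed) {
    log("steer buffered under backpressure");
    return "backpressure";
  }
  return "sent";
}
```

None of the three outcomes is treated as fatal; a separate hard limit (here, `maxTurns + graceTurns`) remains responsible for stopping genuine runaways.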
117
+
118
+ ## [v0.9.7] — round-17 (P2-4 worktree isolation per agent) (2026-06-23)
119
+
120
+ P2-4 feature: `ctx.agent({worktree: true})` spawns the agent in an isolated
121
+ git worktree so parallel file-modifying agents don't conflict. Fully backward
122
+ compatible — `worktree` defaults to false, so existing calls are unchanged.
123
+
124
+ ### Implementation
125
+
126
+ **New helpers** in `src/worktree/worktree-manager.ts`:
127
+ - `prepareAgentWorktree(manifest, opts)` — creates a worktree from HEAD on a
128
+ unique branch; returns `{path, branch}` or `undefined` when the cwd is not a
129
+ git repo (graceful fallback).
130
+ - `cleanupAgentWorktree(manifest, worktreePath, branch?)` — removes the
131
+ worktree dir + branch, and captures a git diff as a side artifact when the
132
+ worktree has changes (for audit/merge).
133
+
134
+ **`ctx.agent()` integration** in `src/runtime/dynamic-workflow-context.ts`:
135
+ - New `worktree?: boolean` field on `AgentCallOpts`.
136
+ - When true, the agent runs with the worktree path as its cwd.
137
+ - Cleanup always runs (success, failure, or agent error) — no worktree leaks.
138
+ - Non-git cwd or creation failure → fall back to normal cwd + `ctx.log()`
139
+ warning; the agent still runs.
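The "cleanup always runs, fall back on failure" contract can be sketched with a `try`/`finally` wrapper. This is a simplified, synchronous illustration: the real `ctx.agent()` is asynchronous, and `withAgentWorktree` together with its callback parameters is a hypothetical stand-in for `prepareAgentWorktree` / `cleanupAgentWorktree`.

```ts
interface Worktree {
  path: string;
  branch: string;
}

// prepare may return undefined (non-git cwd) or throw (creation failure);
// both cases fall back to the normal cwd with a warning.
function withAgentWorktree<T>(
  normalCwd: string,
  prepare: () => Worktree | undefined,
  cleanup: (wt: Worktree) => void,
  warn: (msg: string) => void,
  run: (cwd: string) => T,
): T {
  let wt: Worktree | undefined;
  try {
    wt = prepare();
    if (wt === undefined) warn("cwd is not a git repo; using normal cwd");
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`worktree creation failed; using normal cwd: ${reason}`);
  }
  if (wt === undefined) return run(normalCwd);
  const active = wt;
  try {
    return run(active.path);
  } finally {
    cleanup(active); // runs on success and on failure, so no worktree leaks
  }
}
```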
140
+
141
+ ### Why worktrees for DWF
142
+
143
+ DWF scripts commonly fan out parallel agents that each modify files (e.g.
144
+ fixing different modules). Without isolation they race on the working tree.
145
+ `worktree: true` gives each agent its own checkout; the diff is captured as
146
+ an artifact for later merge.
147
+
148
+ ### Tests (8 new, 60 total in dynamic-workflow-context.test.ts)
149
+ - prepareAgentWorktree creates an isolated worktree
150
+ - cleanupAgentWorktree removes dir + branch (no leak)
151
+ - cleanupAgentWorktree captures a diff artifact when there are changes
152
+ - prepareAgentWorktree returns undefined for non-git cwd (graceful fallback)
153
+ - ctx.agent({worktree:true}) isolates + cleans up (mock)
154
+ - ctx.agent({worktree:false}) uses normal cwd (backward compat)
155
+ - ctx.agent({worktree:true}) falls back gracefully + warns in non-git cwd
156
+ - ctx.agent({worktree:true}) cleans up even when the agent fails
157
+
158
+ ### Docs
159
+ - `docs/dynamic-workflows.md` — worktree option in API table + example
160
+ - `types/dwf.d.ts` — `worktree?: boolean` on AgentCallOpts
161
+ - `package.json` — bumped 1.0.0 → 1.0.1 (patch, additive opt-in)
162
+
163
+ ### Out of scope (future rounds)
164
+ - P2-2 VM sandbox — waiting for `isolated-vm` v1.5 (vm.createContext is not
165
+ a real security boundary)
166
+ - P2-3 Resume/checkpoint — round-18 candidate (large effort)
167
+
168
+ ## [v0.9.7] — round-16 (P2-1 pipeline primitive) (2026-06-23)
169
+
170
+ Adds **`ctx.pipeline(items, ...stages)`** — a multi-stage transform primitive for
171
+ dynamic workflows. **Backward compatible** — existing DWF scripts are unaffected;
172
+ `pipeline` is a new opt-in capability.
173
+
174
+ ### Feature — Pipeline primitive (P2-1)
175
+
176
+ **Files:** `src/runtime/dynamic-workflow-context.ts` + `types/dwf.d.ts` + `docs/dynamic-workflows.md`
177
+
178
+ Previously the DWF context only offered `ctx.fanOut()` (a single parallel map). The
179
+ new `ctx.pipeline()` chains stages: each item flows through **all stages in sequence**
180
+ (stage 1 → stage 2 → …), while **different items run concurrently**, bounded by the
181
+ workflow concurrency (`mapConcurrent`, the same primitive as `fanOut`).
182
+
183
+ Semantics (mirrors `pi-dynamic-workflows`' `pipeline()`):
184
+
185
+ - Each stage receives `(previous, original, index)` — `previous` is the prior stage's
186
+ output (the raw item for the first stage), `original` is the unchanged input item.
187
+ - A failed stage yields `null` for that item, logs `pipeline[i] failed: <msg>` via
188
+ `ctx.log()`, and the other items continue.
189
+ - On **abort**, the error propagates (it is not swallowed into `null`).
190
+ - Returns `(TResult | null)[]`, order-preserving.
191
+
192
+ Signature:
193
+
+ ```ts
+ ctx.pipeline<TItem, TResult = unknown>(
+   items: TItem[],
+   ...stages: Array<(previous: TResult, original: TItem, index: number) => Promise<TResult> | TResult>
+ ): Promise<(TResult | null)[]>;
+ ```
200
+
201
+ Implementation notes:
202
+
203
+ - Uses `mapConcurrent(items, concurrency, …)` — NOT unbounded `Promise.all` — so
204
+ item-level parallelism respects the workflow's configured concurrency. Stages that
205
+ spawn agents additionally acquire `ctx.semaphore` for agent-level throttling.
206
+ - Validates inputs: non-array first arg → `TypeError`; non-function stage (or no
207
+ stages) → `TypeError`.
208
+ - Empty items array short-circuits to `[]`.
209
+ - Authoring types (`types/dwf.d.ts`) mirror the runtime signature for IDE IntelliSense.
210
+
211
+ Example use case: scan → analyze → review each shard, up to `concurrency` shards at a
212
+ time, with per-shard failure isolation.
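A minimal sketch of the semantics described above (per-item stage chaining, bounded item-level concurrency, `null` on failure) is shown below. It is not the package's implementation: `pipelineSketch` uses a simple worker pool in place of `mapConcurrent`, takes `concurrency` and `log` as explicit parameters, and omits abort propagation.

```ts
type Stage<TItem, TResult> = (
  previous: TResult,
  original: TItem,
  index: number,
) => Promise<TResult> | TResult;

async function pipelineSketch<TItem, TResult = unknown>(
  items: TItem[],
  stages: Stage<TItem, TResult>[],
  concurrency: number,
  log: (msg: string) => void,
): Promise<(TResult | null)[]> {
  if (!Array.isArray(items)) throw new TypeError("items must be an array");
  if (stages.length === 0 || stages.some((s) => typeof s !== "function")) {
    throw new TypeError("at least one stage function is required");
  }
  const results = new Array<TResult | null>(items.length).fill(null);
  let next = 0;
  // Each worker pulls the next unclaimed index, so at most `concurrency`
  // items are in flight while every item still runs its stages in order.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      try {
        let value = items[i] as unknown as TResult; // first stage sees the raw item
        for (const stage of stages) value = await stage(value, items[i], i);
        results[i] = value;
      } catch (err) {
        const msg = err instanceof Error ? err.message : String(err);
        log(`pipeline[${i}] failed: ${msg}`);
        results[i] = null; // failure is isolated to this item
      }
    }
  }
  const poolSize = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

Writing results by index keeps the output order-preserving even when items finish out of order.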
213
+
214
+ Tests: `test/unit/dynamic-workflow-context.test.ts` — single/multi-stage transforms,
215
+ empty array, failed-stage isolation + logging, TypeError on bad inputs, stage-argument
216
+ contract, async stages, and concurrency-bounded execution.
217
+
218
+ ## [v0.9.7] — round-15 (P1-4 phase UI) (2026-06-23)
219
+
220
+ The progress pane now **renders DWF phase markers** (▶/✓/⏸) by consuming the
221
+ `dwf.phase_started` / `dwf.phase_completed` events emitted by `ctx.phase()`
222
+ (round-12). **Backward compatible** — non-DWF runs are unaffected.
223
+
224
+ ### Feature — Phase UI in progress-pane (P1-4)
225
+
226
+ **Files:** `src/ui/dwf-phase-display.ts` (new) + `progress-pane.ts` +
227
+ `run-snapshot-cache.ts` + `snapshot-types.ts`
228
+
229
+ Previously the phase events were produced but the UI did not consume them.
230
+ Now the progress pane shows a phase overview:
231
+
232
+ - `▶ Phase: <name>` — currently running phase.
233
+ - `✓ Phase: <name>` — completed phase.
234
+ - `⏸ Phase: <name>` — a phase whose completion scrolled out of the recent-event
235
+ window and is not the current one.
236
+
237
+ Implementation details:
238
+
239
+ - New pure-function module `src/ui/dwf-phase-display.ts`:
240
+ `extractDwfPhaseState(events)` derives phase state from the event window
241
+ (returns `null` for non-DWF runs); `renderDwfPhaseLines(state, { ascii })`
242
+ renders markers with Unicode glyphs and ASCII fallbacks (`[>]`/`[v]`/`[ ]`).
243
+ - `RunUiSnapshot` gains an optional `dwfPhaseState` field, computed from the
244
+ existing tailed `recentEvents` window (no extra I/O) in both sync and async
245
+ snapshot builders.
246
+ - `progress-pane.ts` renders the phase lines right after the summary line,
247
+ before the task-based phase grouping. Non-DWF runs produce zero lines.
248
+ - `signatureFor` includes `dwfPhaseState` so cache invalidation reflects phase
249
+ changes.
250
+ - Tests: `test/unit/dwf-phase-display.test.ts` — phase state tracking from an
251
+ event sequence (incl. scrolled-off recovery), correct markers (Unicode +
252
+ ASCII), header gating, and non-DWF snapshots unaffected.
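The derive-then-render split can be illustrated with two pure functions. The event shape (a `title` field directly on the event) and the names `extractPhaseState` / `renderPhaseLines` are assumptions made for this sketch; the actual `dwf-phase-display.ts` module may differ.

```ts
interface PhaseEvent {
  type: string;
  title?: string; // assumed payload field
}

interface PhaseState {
  order: string[];
  completed: Set<string>;
  current: string | null;
}

// Derive phase state from an event window; null means "not a DWF run".
function extractPhaseState(events: PhaseEvent[]): PhaseState | null {
  const state: PhaseState = { order: [], completed: new Set(), current: null };
  for (const ev of events) {
    if (typeof ev.title !== "string") continue;
    if (ev.type !== "dwf.phase_started" && ev.type !== "dwf.phase_completed") continue;
    if (!state.order.includes(ev.title)) state.order.push(ev.title);
    if (ev.type === "dwf.phase_started") {
      state.current = ev.title;
    } else {
      state.completed.add(ev.title);
      if (state.current === ev.title) state.current = null;
    }
  }
  return state.order.length === 0 ? null : state;
}

function renderPhaseLines(state: PhaseState, opts: { ascii?: boolean } = {}): string[] {
  const glyph = opts.ascii
    ? { running: "[>]", done: "[v]", other: "[ ]" }
    : { running: "▶", done: "✓", other: "⏸" };
  return state.order.map((title) => {
    const mark =
      title === state.current ? glyph.running
      : state.completed.has(title) ? glyph.done
      : glyph.other;
    return `${mark} Phase: ${title}`;
  });
}
```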
253
+
254
+ ## [v0.9.7] — round-14 (P1 DX + observability) (2026-06-23)
255
+
256
+ Four additive P1 features land in this round — **authoring types**, **per-workflow
257
+ token budget**, **log API**, and **typed args**. **Backward compatible** — existing
258
+ DWF scripts continue to work unchanged. New behavior is opt-in.
259
+
260
+ ### Feature 1 — Authoring Types / IDE IntelliSense (P1-1)
261
+
262
+ **Files:** `types/dwf.d.ts` (new) + `package.json` (`./workflow` export)
263
+
264
+ A `.dwf.ts` script can now import the `WorkflowCtx` (and supporting) types from
265
+ the package's `./workflow` export for full TypeScript IntelliSense:
266
+
+ ```ts
+ import type { WorkflowCtx } from "pi-crew/workflow";
+ export default async function run(ctx: WorkflowCtx): Promise<void> { /* ... */ }
+ ```
271
+
272
+ - New file: `types/dwf.d.ts` — named exports mirroring the runtime types
273
+ (`WorkflowCtx`, `AgentCallOpts`, `AgentResult`, `WorkflowBudget`, ...).
274
+ - `package.json` gains `"./workflow": { "types": "./types/dwf.d.ts" }` and ships
275
+ the `types/` directory.
276
+ - New test: `test/unit/dwf-authoring-types.test.ts` — compiles a sample `.dwf.ts`
277
+ against the export (positive + negative `@ts-expect-error` check).
278
+
279
+ ### Feature 2 — Per-Workflow Token Budget (P1-2)
280
+
281
+ **Files:** `src/runtime/dynamic-workflow-context.ts` + dispatch wiring
282
+
283
+ `ctx.budget` is a frozen `{total, spent(), remaining()}` surface. When a
284
+ per-workflow token budget is set, `ctx.agent()` auto-rejects with `ok:false`
285
+ (`"workflow token budget exhausted"`) **before** spawning a child worker. `spent()`
286
+ accumulates each run's reported `usage.input + usage.output`.
287
+
288
+ - `total` is `null` (unbounded) by default; `remaining()` is `Infinity` then.
289
+ - New `MakeWorkflowCtxOptions.tokenBudget`, `WorkflowConfig.maxTokenBudget`,
290
+ `RunDynamicWorkflowInput.tokenBudget`, and the `team run` `tokenBudget` param
291
+ (param overrides the workflow value).
292
+ - Budget check + accumulation wired into `ctx.agent()`; budget object passed
293
+ through `run.ts` and `background-runner.ts`.
294
+ - Tests: 7 unit cases (default null, set total, spent/remaining, exhaustion,
295
+ accumulation from the mock's `{input:10,output:5}`, frozen check).
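The frozen `{total, spent(), remaining()}` surface can be sketched as a closure over a private counter. `makeBudget`, `record`, and `exhausted` are hypothetical names for this illustration, and clamping `remaining()` at zero is an assumption rather than documented behaviour.

```ts
interface WorkflowBudget {
  readonly total: number | null;
  spent(): number;
  remaining(): number;
}

interface BudgetHandle {
  budget: WorkflowBudget; // the read-only surface a script would see
  record(usage: { input: number; output: number }): void;
  exhausted(): boolean;
}

function makeBudget(total: number | null): BudgetHandle {
  let spent = 0; // private: scripts can read it but not write it
  const budget: WorkflowBudget = Object.freeze({
    total,
    spent: () => spent,
    remaining: () => (total === null ? Infinity : Math.max(0, total - spent)),
  });
  return {
    budget,
    record: (usage) => { spent += usage.input + usage.output; },
    exhausted: () => total !== null && spent >= total,
  };
}
```

A caller in the position of `ctx.agent()` would check `exhausted()` before spawning a worker and call `record()` with the reported usage afterwards.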
296
+
297
+ ### Feature 3 — Log API (P1-3)
298
+
299
+ **Files:** `src/state/contracts.ts` + `src/runtime/dynamic-workflow-context.ts`
300
+
301
+ `ctx.log(message)` appends a workflow-level log line: stringifies non-strings,
302
+ keeps a bounded in-memory copy (capped at **1000**), and always emits a durable
303
+ `dwf.log` event (`{message}`) to the run's `events.jsonl`.
304
+
305
+ - New event type `"dwf.log"` in `TEAM_EVENT_TYPES` (non-terminal).
306
+ - New runner-only `getWorkflowLogs(ctx)` accessor (mirrors `getWorkflowPhaseState`).
307
+ - Tests: 4 unit cases (append, stringify, event emission, 1000-cap) + 1
308
+ integration case (end-to-end through `runDynamicWorkflow`).
309
+
310
+ ### Feature 4 — Typed Args (P1-5)
311
+
312
+ **Files:** `src/runtime/dynamic-workflow-context.ts` + `src/state/types.ts` +
313
+ `src/schema/team-tool-schema.ts` + `src/state/state-store.ts` + dispatch wiring
314
+
315
+ `ctx.args<T>()` returns typed workflow arguments (sourced from `manifest.args`,
316
+ passed via the run `args` param). Defaults to `{}` when unset.
317
+
318
+ - New `TeamRunManifest.args`, `MakeWorkflowCtxOptions.args`, `createRunManifest`
319
+ `args` param, and the `team run` `args` schema field (`Type.Unsafe`, any JSON
320
+ value — avoids `any`).
321
+ - The runner reads `manifest.args` and forwards it to `makeWorkflowCtx`.
322
+ - Tests: 3 unit cases (default `{}`, typed object, array) + 1 integration case
323
+ (reads `manifest.args` end-to-end).
324
+
325
+ ### Other
326
+
327
+ - Version bumped `0.9.7` → `0.9.8`.
328
+ - `docs/dynamic-workflows.md`: API table rows + Log/Budget/Args/Authoring-types
329
+ sections.
330
+
331
+ ## [v0.9.7] — round-13 (P0 AST determinism + structured output + abort cleanup) (2026-06-23)
332
+
333
+ Three P0 features land in this round. **Backward compatible** — existing DWF
334
+ scripts continue to work unchanged. New behavior is opt-in via the `schema`
335
+ field on `ctx.agent()` and an env-var escape hatch for the determinism check.
336
+
337
+ ### Feature 1 — AST Determinism Check (P0-2)
338
+
339
+ **File:** `src/runtime/deterministic-ast.ts` (new) +
340
+ `src/runtime/dynamic-workflow-runner.ts` (integration)
341
+
342
+ Dynamic workflow scripts must now be **deterministic**. The runner parses each
343
+ `.dwf.ts` with `acorn` and walks the AST, rejecting `Date.now()`,
344
+ `Math.random()`, and `new Date()` calls before `jiti` executes the script.
345
+ Two runs of the same script against the same inputs now produce the same
346
+ outputs — critical for regression testing and workflow replay.
347
+
348
+ Why AST, not regex: regex matches `Date.now()` everywhere — including string
349
+ literals, comments, and prompt text. AST walking distinguishes **calls** from
350
+ strings, so prompts that say *"avoid `Date.now()` in your code"* still parse
351
+ cleanly. Other `Date.*` and `Math.*` methods (`Date.parse`, `Date.UTC`,
352
+ `Math.floor`, `Math.max`, etc.) are accepted — only `now` and `random` are
353
+ blocked.
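The difference between matching calls and matching text can be seen in a small walker over ESTree-shaped nodes (the shape a parser such as `acorn` produces). This sketch takes an already-parsed tree so it stays dependency-free; `findNondeterminism` is a hypothetical name, and flagging only argument-less `new Date()` is an assumption about the intended rule.

```ts
// Minimal ESTree-like node shape.
interface AstNode {
  type: string;
  [key: string]: unknown;
}

function isNode(v: unknown): v is AstNode {
  return typeof v === "object" && v !== null && typeof (v as AstNode).type === "string";
}

const BLOCKED: Record<string, string> = { Date: "now", Math: "random" };

function findNondeterminism(root: AstNode): string[] {
  const hits: string[] = [];
  const visit = (node: AstNode): void => {
    const callee = node.callee;
    if (node.type === "CallExpression" && isNode(callee) && callee.type === "MemberExpression") {
      const { object, property, computed } = callee;
      if (isNode(object) && object.type === "Identifier" && isNode(property)) {
        const obj = String(object.name);
        // Date["now"]() is computed access with a string literal key.
        const prop = computed
          ? (property.type === "Literal" ? String(property.value) : "")
          : String(property.name);
        if (BLOCKED[obj] === prop) hits.push(`${obj}.${prop}()`);
      }
    }
    if (
      node.type === "NewExpression" && isNode(callee) &&
      callee.type === "Identifier" && callee.name === "Date" &&
      Array.isArray(node.arguments) && node.arguments.length === 0
    ) {
      hits.push("new Date()");
    }
    // Recurse into every child node; string literals are leaves and never match.
    for (const value of Object.values(node)) {
      if (Array.isArray(value)) value.filter(isNode).forEach(visit);
      else if (isNode(value)) visit(value);
    }
  };
  visit(root);
  return hits;
}
```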
354
+
355
+ - New dep: `acorn ^8.14.0` (small, well-maintained; verified Node ≥22 ESM/strip-types compatibility)
356
+ - New file: `src/runtime/deterministic-ast.ts` (determinism walker; MIT-licensed adaptation from pi-dynamic-workflows, attribution in `NOTICE.md`)
357
+ - New file: `test/unit/deterministic-ast.test.ts` (27 cases: accepts/rejects every form, comments, template literals, computed properties, parse-error delegation)
358
+ - New tests in `test/integration/dwf-setresult.test.ts` (5 end-to-end cases including env-var opt-out)
359
+
360
+ **Escape hatch:** `PI_CREW_DWF_SKIP_DETERMINISM_CHECK=1` bypasses the check for
361
+ power users who legitimately need time/random (e.g. randomized benchmark
362
+ scripts). Off by default.
363
+
364
+ ### Feature 2 — Structured Output Helper (P0-3)
365
+
366
+ **Files:** `src/runtime/result-extractor.ts` + `src/runtime/dynamic-workflow-context.ts`
367
+
368
+ `AgentCallOpts` gains an optional `schema?: TSchema` field (TypeBox). When set,
369
+ `ctx.agent()` validates the extracted JSON against the schema via
370
+ `@sinclair/typebox`'s `Value.Check`. Mismatch yields
371
+ `{ok: false, error: "structured output does not match schema: ..."}` instead
372
+ of an untyped `structured: { ... }` blob.
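The extract-then-validate flow can be sketched as below. Note the substitutions: a caller-supplied type guard (`check`) stands in for TypeBox's `Value.Check`, and a first-brace-to-last-brace slice stands in for the package's extractor, whose exact rules are not shown in this changelog. `extractStructured` is a hypothetical name.

```ts
type StructuredResult<T> =
  | { ok: true; structured: T }
  | { ok: false; error: string };

function extractStructured<T>(
  text: string,
  check: (value: unknown) => value is T, // stand-in for a schema validator
): StructuredResult<T> {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, error: "no JSON object found in output" };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(text.slice(start, end + 1));
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  return check(parsed)
    ? { ok: true, structured: parsed }
    : { ok: false, error: "structured output does not match schema" };
}
```

Returning a discriminated result instead of throwing lets a script branch on `ok` in the same way it would on the result of `ctx.agent()`.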
373
+
374
+ How the runner helps the model comply:
375
+
376
+ - Appends a JSON-output directive to the prompt.
377
+ - Replaces the agent's system prompt suffix with a "structured-output
378
+ assistant" preamble that describes the schema's shape.
379
+
380
+ When `schema` is **omitted**, behavior is byte-identical to the previous
381
+ regex-based extractor — verified by the existing 30+ test cases plus 9 new
382
+ schema-specific cases in `test/unit/result-extractor.test.ts` and 4 new
383
+ end-to-end cases in `test/unit/dynamic-workflow-context.test.ts`.
384
+
385
+ Caveat: pi-crew DWF spawns `pi` as a subprocess (`runChildPi`), not an
386
+ in-memory `createAgentSession`. Subprocess structured output is captured via
387
+ the same event-stream → JSON-line → schema-check pipeline used for everything
388
+ else, so this round ships Option B (regex-extract + schema validation
389
+ post-hoc). Option A (in-process terminating tool) is planned for round-14.
390
+
391
+ ### Feature 3 — Abort Listener Cleanup (P0-5) — NO-OP (already fixed in round 27)
392
+
393
+ Audited `src/runtime/child-pi.ts` for AbortSignal listener leaks. The fix was
394
+ landed in round 27 (BUG 4): both the `onParentAbort` flag handler and the
395
+ `abort` cancellation handler are now removed inside `settle()` regardless of
396
+ the exit path (normal completion, response timeout, hard kill, parent abort,
397
+ forced final drain). On runs with >10 tasks sharing one AbortSignal (the
398
+ common pattern under `background-runner`), this prevents the
399
+ `MaxListenersExceededWarning` and per-task closure capture that previously
400
+ pinned the worker stack frame in memory.
401
+
402
+ No code changes needed in round 13. Documented for the audit trail.
403
+
404
+ ### Verification
405
+
406
+ - `npm run typecheck` — clean
407
+ - `npx tsc --noEmit` — clean
408
+ - 31 new unit tests across `deterministic-ast.test.ts` (27) and 9 schema-validation cases in `result-extractor.test.ts` and 4 new cases in `dynamic-workflow-context.test.ts` — all pass
409
+ - 5 new integration tests in `dwf-setresult.test.ts` — all pass
410
+ - 0 regressions in the existing 4 round-12 tests
411
+
412
+
413
+ ## [v0.9.7] — round-12: DWF phases + structured-clone guard (2026-06-23)
414
+
415
+ Two additive P0 features for dynamic-workflow (DWF) scripts, both fully
416
+ backward-compatible (existing scripts continue to work unchanged). Researched
417
+ and adopted from the public `pi-dynamic-workflows` (Michaelliv/v1.0.1)
418
+ package — full comparison and adoption plan in
419
+ `.crew/artifacts/team_20260623095016_b693d3f967f88048/shared/06_synthesize.md`.
420
+
421
+ ### Feature 1: `ctx.phase(title)` runtime phase API (P0-1)
422
+
423
+ `WorkflowCtx` gains a new `phase(title: string): void` method. The orphan
424
+ `dwf.phase_started` / `dwf.phase_completed` event types — declared in
425
+ `src/state/contracts.ts:89-93` since v0.9.0 but never emitted by any
426
+ code path — finally have a producer. Use cases:
427
+
428
+ - Group `ctx.agent()` calls under logical phases (e.g. "Scan", "Audit",
429
+ "Review") so downstream UI and log readers can group by phase.
430
+ - Emit a clear phase boundary to the run's `events.jsonl` without writing
431
+ custom event-log code.
432
+ - Drive live progress reporting from the script itself.
433
+
434
+ Semantics:
435
+
436
+ - Validates `title` is a non-empty string (throws `TypeError` otherwise).
437
+ - Idempotent: calling `ctx.phase("Scan")` twice does not emit a duplicate
438
+ event or change state.
439
+ - When a previous phase is still open, emits `dwf.phase_completed` for it
440
+ **before** emitting `dwf.phase_started` for the new one (consumers never
441
+ see two open phases at once).
442
+ - The in-memory `phases[]` list (read-only via `getWorkflowPhaseState`,
443
+ mirrors the `__finalResult` non-enumerable getter pattern) is deduped and
444
+ capped at **100 distinct titles** to bound memory. Events still flow
445
+ past the cap — the events log is the durable source of truth.
446
+ - The runner **auto-closes the last open phase** before emitting
447
+ `dwf.completed`, so a script that ends mid-phase still produces a
448
+ well-formed event sequence.
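The phase semantics listed above (validation, idempotency, close-before-open, bounded list) fit in a small tracker. `makePhaseTracker` and its `emit` callback are hypothetical; in the package these behaviours live on `ctx.phase()` and the runner.

```ts
type PhaseEventType = "dwf.phase_started" | "dwf.phase_completed";

function makePhaseTracker(emit: (type: PhaseEventType, title: string) => void, cap = 100) {
  const phases: string[] = [];
  let current: string | null = null;
  return {
    phase(title: string): void {
      if (typeof title !== "string" || title.length === 0) {
        throw new TypeError("phase title must be a non-empty string");
      }
      if (title === current) return; // idempotent: same open phase, no event
      if (current !== null) emit("dwf.phase_completed", current); // close before open
      emit("dwf.phase_started", title);
      current = title;
      // The in-memory list is deduped and capped; events keep flowing past the cap.
      if (!phases.includes(title) && phases.length < cap) phases.push(title);
    },
    // What a runner would call before emitting its own completion event.
    close(): void {
      if (current !== null) {
        emit("dwf.phase_completed", current);
        current = null;
      }
    },
    phases: (): readonly string[] => phases.slice(),
  };
}
```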
449
+
450
+ **Files changed:**
451
+ - `src/runtime/dynamic-workflow-context.ts` — interface, implementation,
452
+ `__phaseState` getter, `getWorkflowPhaseState` helper
453
+ - `src/runtime/dynamic-workflow-runner.ts` — auto-close on completion
454
+
455
+ ### Feature 2: structured-clone guard at the runner boundary (P0-4)
456
+
457
+ Defensive `assertStructuredCloneable(value, name)` helper applied to the
458
+ final artifact content and `manifest.summary` before they reach
459
+ `writeArtifact` and the run-event-bus emitter. Today this is mostly
460
+ future-proofing (the artifact file is read as a string, and strings are
461
+ always structured-cloneable), but the guard surfaces a clear, actionable
462
+ error pointing at the most common cause — forgetting `await` on
463
+ `ctx.agent()` / `ctx.review()` — instead of letting a cryptic
464
+ `DataCloneError` leak from deep inside the artifact store.
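A guard of this kind might look like the sketch below, which relies on the global `structuredClone` available in modern Node.js and browsers. The error wording and the `Promise` hint are illustrative and not the package's actual messages.

```ts
function assertStructuredCloneable(value: unknown, name: string): void {
  const clone = (globalThis as { structuredClone?: (v: unknown) => unknown }).structuredClone;
  if (typeof clone !== "function") return; // runtime without structuredClone: skip the guard
  try {
    clone(value);
  } catch (err) {
    // A Promise is the typical non-cloneable value when an await is missing.
    const hint = value instanceof Promise
      ? " (did you forget to await ctx.agent() / ctx.review()?)"
      : "";
    const reason = err instanceof Error ? err.message : String(err);
    throw new TypeError(`${name} is not structured-cloneable${hint}: ${reason}`);
  }
}
```

Checking at the boundary turns a late, opaque clone failure into an error that names the offending value.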
465
+
466
+ **Files changed:**
467
+ - `src/runtime/dynamic-workflow-runner.ts` — `assertStructuredCloneable`
468
+ helper, applied to `finalText` and `summaryText` (slice)
469
+
470
+ ### Tests
471
+
472
+ - 7 new unit tests in `test/unit/dynamic-workflow-context.test.ts`
473
+ (emission, idempotency, validation, sequence, helper, dedup, 100-cap).
474
+ - 1 new integration test in `test/integration/dwf-setresult.test.ts`
475
+ (end-to-end phase event sequence, including runner auto-close).
476
+ - All 23 existing DWF unit tests still pass; both pre-existing integration
477
+ tests still pass.
478
+
479
+ ### Docs
480
+
481
+ - `docs/dynamic-workflows.md` — updated WorkflowCtx example to use
482
+ `ctx.phase("Scan")` / `ctx.phase("Audit")`; added a `ctx.phase` row to
483
+ the API table; added a "Phases (round-12)" subsection explaining
484
+ semantics, idempotency, and the 100-cap.
485
+
486
+ ### Out of scope (planned for future rounds)
487
+
488
+ - AST determinism check (P0-2)
489
+ - Structured output helper (P0-3)
490
+ - Abort listener cleanup pattern (P0-5)
491
+ - Authoring types / IDE IntelliSense (P1-1)
492
+ - Token budget (P1-2)
493
+ - Phase UI in `progress-pane` (P1-4)
494
+ - Pipeline primitive (P2-1)
495
+ - `isolated-vm` sandbox (P2-2, planned for v1.5)
496
+
497
+ ## [v0.9.5] — fix "team run hangs forever at 25%" (2026-06-23)
498
+
499
+ Two coupled runtime bugs caused the recurring "run stuck at 25% (1/4)" failure
500
+ observed across 4+ consecutive review/fast-fix runs. Both are now fixed; full
501
+ diagnostics (background.log, events.jsonl, heartbeat.json) are preserved for
502
+ all runs.
503
+
504
+ ### Bug X — `purgeStaleActiveRunIndex` destroyed the run's stateRoot (proximate cause)
505
+
506
+ **File:** `src/runtime/crash-recovery.ts`
507
+
508
+ **What was wrong:** `purgeStaleActiveRunIndex` decided whether a run was
509
+ "orphaned" using `entry.updatedAt`, which is **frozen at registration** and
510
+ never refreshed during execution. A long-running legitimate async run whose
511
+ background worker had exited (e.g. after a 5–15 min explorer) would have its
512
+ entire durable state (manifest/tasks/events/heartbeat) hard-deleted. Because
513
+ `saveRunTasks()` silently no-ops once the state dir is missing, the workflow
514
+ could never advance past the current task → **permanent invisible hang**
515
+ ("Run not found"), with all diagnostics lost.
516
+
517
+ **Fix:**
518
+ - Liveness now corroborated via (a) the on-disk `manifest.updatedAt` (rewritten
519
+ on every task transition) and (b) the team-level `heartbeat.json` mtime —
520
+ any one of which is sufficient to declare the run live.
521
+ - Cancelling a run now **keeps its stateRoot** so the run stays queryable and
522
+ resumable, and its diagnostics survive. The finished-run pruner removes the
523
+ directory later on its normal schedule.
524
+ - Removed two redundant `saveRunManifest(fullLoaded.manifest)` calls that
525
+ were clobbering the freshly-saved `cancelled` status back to `running`.
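The corroborated liveness check reduces to "any fresh signal wins". In this sketch the `LivenessSignals` shape, the `isRunLive` name, and the staleness threshold parameter are assumptions; the changelog does not state the actual threshold.

```ts
interface LivenessSignals {
  manifestUpdatedAt?: number; // ms since epoch, rewritten on every task transition
  heartbeatMtime?: number;    // ms since epoch, mtime of heartbeat.json
}

// A run is live if ANY available signal is fresher than the threshold.
function isRunLive(now: number, signals: LivenessSignals, staleAfterMs: number): boolean {
  const fresh = (t: number | undefined): boolean =>
    t !== undefined && now - t < staleAfterMs;
  return fresh(signals.manifestUpdatedAt) || fresh(signals.heartbeatMtime);
}
```

Requiring every signal to be stale before treating a run as orphaned errs toward keeping state, which matches the decision to preserve the stateRoot on cancel.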
526
+
527
+ **New regression test:** `test/unit/crash-recovery-purge-liveness.test.ts`
528
+ (3 cases: fresh manifest kept, orphan cancelled-but-preserved, fresh
529
+ heartbeat kept — all using a live-worker-then-reap + `now`-time-shift
530
+ harness to deterministically simulate the registration-then-aging race).
531
+
532
+ ### Bug Y — background runner crashed with EPIPE on the first post-detach `console.debug` (root cause)
533
+
534
+ **File:** `src/runtime/background-runner.ts`
535
+
536
+ **What was wrong:** The in-process console redirect only covered `console.log`
537
+ and `console.error`; `console.debug` and `console.warn` still wrote to the
538
+ original stdout/stderr pipes. The background runner is spawned with
539
+ `detached:true` + `setsid:true`, so the parent disconnects the stdio pipes
540
+ immediately after spawn. The first post-detach `console.debug` call from
541
+ `team-runner.ts:242` (inside `mergeTaskUpdatesPreservingTerminal` →
542
+ "Skipping stale merge") hit the closed stdout → unhandled `EPIPE` error →
543
+ **process exit** → scheduler dead → run stuck at 25% forever.
544
+
545
+ Prior investigators saw only "the run died silently right after explorer
546
+ completed" and concluded (incorrectly) that the cause was a native crash
547
+ (SIGKILL/segfault/V8 heap-OOM), because their [DIAG] handlers never fired.
548
+ In reality the diagnostic handlers DID fire, but on an `EPIPE` write error,
549
+ which `process.on('error')` doesn't catch. The fix below makes the crash
550
+ observable AND non-fatal.
551
+
552
+ **Fix:**
553
+ - Extend the console redirect to also cover `console.debug` and `console.warn`,
554
+ so they go to the log file (logFd) instead of the disconnected stdio pipes.
555
+ - Wrap the `fs.writeSync` in try-catch so any log-write failure (closed fd,
556
+ ENOSPC, etc.) can never crash the scheduler. The scheduler log is
557
+ best-effort by design.
558
+
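A minimal sketch of the two-part fix above, assuming a log file descriptor that may be undefined or already closed. The names `safeWrite`, `redirectConsole`, and `getFd` are hypothetical and the log line format is invented for illustration; the actual `background-runner.ts` code (which the test description calls the `origWrite` pattern) may be structured differently.

```typescript
import * as fs from "node:fs";

type ConsoleMethod = "log" | "error" | "debug" | "warn";
const METHODS: ConsoleMethod[] = ["log", "error", "debug", "warn"];

// Best-effort write: returns false instead of throwing on a missing or
// closed descriptor, a full disk, or any other write failure.
function safeWrite(fd: number | undefined, line: string): boolean {
  if (fd === undefined) return false;
  try {
    fs.writeSync(fd, line);
    return true;
  } catch {
    return false; // the scheduler log is best-effort by design
  }
}

// Route all four console methods to the log fd so that nothing touches the
// stdio pipes after the parent disconnects. Returns a function that restores
// the original methods.
function redirectConsole(getFd: () => number | undefined): () => void {
  const originals = METHODS.map((m) => [m, console[m]] as const);
  for (const m of METHODS) {
    console[m] = (...args: unknown[]) => {
      safeWrite(getFd(), `[${m}] ${args.map(String).join(" ")}\n`);
    };
  }
  return () => {
    for (const [m, fn] of originals) console[m] = fn;
  };
}

// Usage: with no log fd available, console.debug becomes a harmless no-op.
const restore = redirectConsole(() => undefined);
console.debug("Skipping stale merge");
restore();
```

Reading the descriptor through a getter rather than capturing it once is a design choice of this sketch: it lets the redirect survive the fd being opened, closed, or swapped after the redirect is installed.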
559
+ **New regression test:** `test/unit/background-runner-console-redirect.test.ts`
560
+ (4 cases: undefined logFd no-op, valid logFd writes correctly, EBADF on
561
+ closed logFd is swallowed, post-undefined fd-toggle is safe). Replicates the
562
+ `origWrite` pattern from the source so any drift between the two is easy to
563
+ spot.
564
+
565
+ ### Why this took multiple attempts
566
+
567
+ All prior attempts to diagnose the hang destroyed the only evidence (the
568
+ stateRoot) the moment the `purgeStaleActiveRunIndex` heuristic misfired.
569
+ The chain was always the same: a worker exits for any reason → purge sees
570
+ dead PID + frozen-stale entry → **deletes stateRoot** → the run becomes
571
+ "Run not found" with no log, no events, no heartbeat, no way to even resume.
572
+ That hid the real cause (Bug Y) for the entire series of failed diagnostic
573
+ runs. With Bug X fixed, the diagnostic trail (background.log 345 KB +
574
+ events.jsonl 166 KB) survives long enough to read the actual EPIPE crash
575
+ that Bug Y left behind.
576
+
577
+ ### Verification
578
+
579
+ - 7/7 new regression tests pass (`crash-recovery-purge-liveness.test.ts` +
580
+ `background-runner-console-redirect.test.ts`).
581
+ - Existing crash-recovery / active-run-registry / stale-reconciler /
582
+ async-stale / run-accumulation / auto-recovery suites: 71/71 pass.
583
+ - End-to-end: a 4-step review run now advances 3/4 tasks (75%) instead of
584
+ hanging at 25%; the verify step that would have failed earlier now fails
585
+ only for environmental reasons (out-of-memory under load), not because of the fix.
586
+ - `npx tsc --noEmit` is green.
587
+
588
+ ### Notes for users
589
+
590
+ If you have a stuck "running" run from v0.9.4 or earlier (the symptom was
591
+ "Run not found" / "25% hang" / "had to kill pi"), upgrading alone will not
592
+ recover it — its `stateRoot` was already destroyed by the buggy purge.
593
+ Re-dispatch the workflow. New runs are fully protected.
594
+
3
595
  ## [v0.9.4] — fix macOS CI: benchmark allowlist + cross-platform fixtures (2026-06-23)
4
596
 
5
597
  Patch fix for a CI failure introduced in v0.9.3 (caught by the macOS CI job,
package/README.md CHANGED
@@ -39,13 +39,65 @@ npm: pi-crew
39
39
  repo: https://github.com/baphuongna/pi-crew
40
40
  ```
41
41
 
42
- **v0.9.0**: See [CHANGELOG.md](CHANGELOG.md).
42
+ **v0.9.4 / v0.9.5**: See [CHANGELOG.md](CHANGELOG.md).
43
43
 
44
- ### Highlights (v0.6.4 → v0.9.0)
44
+ ### Highlights (v0.6.4 → v0.9.5)
45
45
 
46
46
  A long arc of **trust, cliff-resilience, and robustness** work. Principle: *build
47
47
  trust and cliff-resilience, stay lean, delete before adding.*
48
48
 
49
+ #### v0.9.5 — fix "team run hangs forever at 25%" (2026-06-23)
50
+ Two coupled runtime bugs caused recurring "run stuck at 25% (1/4)" failures
51
+ across 4+ consecutive review/fast-fix runs. The combined symptom: scheduler
52
+ appears to stop responding right after the first task (explorer) finishes, no
53
+ progress to task 2, and `team action='status'` returns "Run not found" with
54
+ **no diagnostic trail** to investigate. Manual `kill` of the parent `pi`
55
+ process was the only workaround.
56
+
57
+ - **🩹 Bug X (proximate cause)** — `purgeStaleActiveRunIndex`
58
+ (`src/runtime/crash-recovery.ts`) destroyed a run's `stateRoot` based on a
59
+ **frozen** `entry.updatedAt` (set once at registration, never refreshed).
60
+ Any long-running legitimate async run (≥5 min) whose worker had exited
61
+ lost its entire durable state. `saveRunTasks()` then silently no-op'd on
62
+ the missing dir, and the workflow could never advance. Fix: corroborate
63
+ liveness via either the on-disk `manifest.updatedAt` or the team-level
64
+ `heartbeat.json`; keep `stateRoot` on cancel so runs stay queryable and
65
+ resumable.
66
+ - **🩹 Bug Y (root cause — why the scheduler died in the first place)** —
67
+ `src/runtime/background-runner.ts` redirected only `console.log` /
68
+ `console.error` to the log file. The first post-detach `console.debug`
69
+ call from `team-runner.ts:242` (inside `mergeTaskUpdatesPreservingTerminal`
70
+ → "Skipping stale merge") hit the disconnected stdout pipe → unhandled
71
+ `EPIPE` → process exit. Prior investigators concluded (incorrectly) that
72
+ the cause was a native crash, because diagnostic `[DIAG]` handlers never
73
+ fired on the EPIPE. Fix: extend the console redirect to `console.debug` /
74
+ `console.warn`, and wrap `fs.writeSync` in try-catch so any log-write
75
+ failure can never crash the scheduler.
76
+ - **🧪 Regression coverage** — 7 new tests: 3 in
77
+ `test/unit/crash-recovery-purge-liveness.test.ts` (fresh-manifest-kept,
78
+ orphan-cancelled-preserved, fresh-heartbeat-kept) + 4 in
79
+ `test/unit/background-runner-console-redirect.test.ts` (drift-detector
80
+ pattern that exercises undefined / valid / EBADF / post-toggle logFd).
81
+ - **📖 See [CHANGELOG.md](CHANGELOG.md) for full details**, including
82
+ why prior attempts to diagnose the hang kept destroying the only
83
+ evidence (Bug X nuked the stateRoot before anyone could read the EPIPE
84
+ crash in Bug Y).
85
+
86
+ > **Recovering a stuck run from v0.9.4 or earlier:** the `stateRoot` for
87
+ > those runs is already gone. Re-dispatch the workflow — new runs are
88
+ > fully protected.
89
+
90
+ #### v0.9.4 — macOS CI fixture (2026-06-23)
91
+ - **🧪 BSD-vs-GNU grep fix** — benchmark test fixtures used
92
+ `grep --help` (exits 0 on GNU/Linux, exits 2 on BSD/macOS). Switched
93
+ the exit-0 fixture to `echo ok`; the not-in-allowlist fixture is now
94
+ `ls`. CI matrix is now green on all 3 OSes.
95
+ - **📌 Process note** — this release re-commits to: **tag/publish ONLY
96
+ after the full OS matrix CI is green.** v0.9.3 was published mid-CI-run
97
+ (the macOS job hadn't finished); the package itself was correct (the
98
+ broken file is test-only and not shipped), but the repo CI went red.
99
+ v0.9.4 restores green CI. v0.9.5 follows the same discipline.
100
+
49
101
  #### v0.9.0 — goal loops + dynamic workflows (2026-06-18)
50
102
  Two new features, both modeled on Claude Code, built on a shared `runKind`
51
103
  background-dispatch discriminator.
@@ -145,7 +197,7 @@ background-dispatch discriminator.
145
197
  - **Plugin system** — framework-aware context injection (Next.js, Vite, Vitest) via plugin registry
146
198
  - **Health scoring** — penalty-based run health with time-series snapshots
147
199
  - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
148
- - **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
200
+ - **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `ctx.phase()` marks logical phases; **round-14** adds `ctx.log()` (durable `dwf.log` events), `ctx.budget` (per-workflow token budget that auto-rejects `ctx.agent()` when exhausted), and `ctx.args<T>()` (typed workflow arguments). TypeScript IntelliSense is available via `import type { WorkflowCtx } from "pi-crew/workflow"`. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
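To make the shape of a dynamic workflow script concrete, here is a small sketch. It is an illustration under stated assumptions only: `SketchCtx` is a locally declared stand-in, every method signature on it is guessed from the method names in the bullet above, and `ReviewArgs` and `reviewWorkflow` are hypothetical. The real type is the `WorkflowCtx` exported from `pi-crew/workflow`, and `docs/dynamic-workflows.md` is the authority on its API. `ctx.budget` is not modeled here.

```typescript
// Locally declared stand-in for the workflow context. All signatures are
// assumptions for illustration, not the published pi-crew API.
interface SketchCtx {
  args<T>(): T;                                 // typed workflow arguments
  phase(name: string): void;                    // marks a logical phase
  log(message: string): void;                   // durable log event
  agent(prompt: string): Promise<string>;       // one subagent call
  fanOut(prompts: string[]): Promise<string[]>; // parallel subagent calls
  setResult(result: string): void;              // only this reaches the main context
}

interface ReviewArgs {
  files: string[];
}

// Intermediate results stay in ordinary variables; only the final summary
// is handed back through setResult().
async function reviewWorkflow(ctx: SketchCtx): Promise<void> {
  const { files } = ctx.args<ReviewArgs>();

  ctx.phase("review");
  const reviews = await ctx.fanOut(files.map((f) => `Review ${f}`));
  ctx.log(`collected ${reviews.length} reviews`);

  ctx.phase("summarize");
  const summary = await ctx.agent(`Summarize these reviews:\n${reviews.join("\n")}`);
  ctx.setResult(summary);
}
```

Because the context is passed in as a parameter, a script written this way can be exercised against a mock object before it is ever dispatched to real subagents.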
149
201
 
150
202
  ---
151
203