pi-crew 0.9.5 → 0.9.7

package/CHANGELOG.md CHANGED
@@ -1,5 +1,499 @@
  # Changelog
 
+ ## [v0.9.7] — round-18 + process-safety fix (2026-06-23)
+
+ P2-3 feature: durable checkpoint + resume for dynamic-workflow runs. The runner now
+ persists a checkpoint after every `ctx.agent()` call, so when a `.dwf.ts` script
+ crashes (timeout, OOM, agent error) between calls, `team action='resume' runId='X'`
+ can continue from the last checkpoint instead of re-running from scratch. **Backward
+ compatible** — fresh runs (no checkpoint) behave exactly as before.
+
+ ### Implementation
+
+ **New file** `src/runtime/dwf-state-store.ts` (`DwfStore`):
+ - Atomic CRUD for a single run's DWF checkpoint, modeled on `GoalStore` /
+ `FileCheckpointStore`.
+ - Persists `DwfCheckpointState` (vars, phases, currentPhase, logs, spent, agentCount,
+ updatedAt) to `<stateRoot>/dwf-checkpoint.json` via `atomicWriteJson`.
+ - `load()` returns `undefined` for a missing/corrupt checkpoint (fresh run); `delete()`
+ is best-effort and never throws.
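A minimal sketch of the save/load/delete contract above, assuming `atomicWriteJson` uses the common write-to-temp-then-rename strategy (its internals are not described here). The `CheckpointStore` class and the trimmed `DwfCheckpointState` shape are illustrative, not the package's actual types.

```ts
import { mkdirSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Illustrative subset of the fields listed for DwfCheckpointState.
interface DwfCheckpointState {
  vars: Record<string, unknown>;
  currentPhase?: string;
  spent: number;
  agentCount: number;
  updatedAt: string;
}

// Assumed strategy: write a sibling temp file, then rename it over the target.
// A same-directory rename is atomic on POSIX, so readers never see a torn file.
function atomicWriteJson(path: string, value: unknown): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, JSON.stringify(value));
  renameSync(tmp, path);
}

class CheckpointStore {
  private readonly file: string;

  constructor(stateRoot: string) {
    this.file = join(stateRoot, "dwf-checkpoint.json");
  }

  save(state: DwfCheckpointState): void {
    atomicWriteJson(this.file, state);
  }

  // Missing or unparseable file => undefined, i.e. treat the run as fresh.
  load(): DwfCheckpointState | undefined {
    try {
      return JSON.parse(readFileSync(this.file, "utf8")) as DwfCheckpointState;
    } catch {
      return undefined;
    }
  }

  // Best-effort: never throws, even when the file is already gone.
  delete(): void {
    try {
      rmSync(this.file, { force: true });
    } catch {
      /* ignored on purpose */
    }
  }
}
```

Treating a corrupt file exactly like a missing one keeps the failure mode simple: the worst case is a re-run from scratch, never a crash at startup.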
+
+ **`ctx.agent()` checkpoint hook** in `src/runtime/dynamic-workflow-context.ts`:
+ - New `MakeWorkflowCtxOptions.onCheckpoint?: (state) => void` — invoked after each
+ `ctx.agent()` call (success OR fail) so a crash between calls leaves durable state.
+ - New `MakeWorkflowCtxOptions.resumedState?` — hydrates `ctx.vars`, phase state, logs,
+ `budget.spent()`, and `agentCount` from the checkpoint on resume.
+ - New closure counter `agentCount` (incremented in `agent()`'s `finally`), exposed via a
+ non-enumerable `__agentCount` getter.
+ - New `getWorkflowCheckpoint(ctx)` helper (mirror of `getWorkflowPhaseState`).
+
+ **Runner wiring** in `src/runtime/dynamic-workflow-runner.ts`:
+ - On run start: `DwfStore.load()`; if a checkpoint exists, hydrate ctx (`resumedState`)
+ and emit `dwf.resumed`.
+ - `onCheckpoint` → `DwfStore.save()` (best-effort, errors swallowed).
+ - On clean completion: `DwfStore.delete()` so a re-run starts fresh.
+
+ ### Resume semantics
+
+ `team action='resume' runId='X'` re-dispatches with `runKind='dynamic-workflow'`. The
+ runner loads the checkpoint, hydrates `ctx.vars`/phases/logs, and re-executes the
+ script from the top. Scripts SHOULD therefore be written defensively: check
+ `ctx.vars.lastPhase` to skip completed work (documented in `docs/dynamic-workflows.md`).
+ A single agent call is never partially resumed; an interrupted call re-runs from
+ scratch, because checkpoints are written AFTER an agent completes, never before.
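A minimal sketch of such a defensive script, assuming a trimmed-down, hypothetical `ResumableCtx` in place of the real `WorkflowCtx`: record progress in `ctx.vars` after each step and, when the script re-executes from the top, skip the steps the hydrated vars already cover. Because checkpoints are taken around agent calls rather than after every script statement, a step can run again after a crash, so each step should be safe to repeat.

```ts
// Hypothetical, trimmed-down context: only what this pattern needs.
interface ResumableCtx {
  vars: Record<string, unknown>;
  agent(prompt: string): Promise<string>;
}

const PHASES = ["scan", "audit", "review"] as const;

// Re-executed from the top on resume; ctx.vars has been hydrated from the checkpoint.
async function workflow(ctx: ResumableCtx): Promise<string[]> {
  const done = (ctx.vars.lastPhase as string | undefined) ?? null;
  // Start right after the last recorded phase (or at 0 on a fresh run).
  const startAt = done === null ? 0 : PHASES.indexOf(done as (typeof PHASES)[number]) + 1;
  const outputs = (ctx.vars.outputs as string[] | undefined) ?? [];
  for (let i = startAt; i < PHASES.length; i++) {
    outputs.push(await ctx.agent(`run ${PHASES[i]}`));
    // Record progress only after the agent call has returned.
    ctx.vars.outputs = outputs;
    ctx.vars.lastPhase = PHASES[i];
  }
  return outputs;
}
```

Keeping the progress marker in `ctx.vars` (rather than in a local variable) is what lets it survive the crash, since `ctx.vars` is part of the persisted checkpoint state.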
+
+ ### Tests
+ - `test/unit/dwf-state-store.test.ts` (10): save/load round-trip, missing→undefined,
+ delete, corrupt-file resilience, path layout, dir creation, large-state preservation.
+ - `test/unit/dynamic-workflow-context.test.ts` (+8): onCheckpoint fires on success/fail,
+ agentCount accumulation, backward-compat (no callback), resumedState hydration,
+ shallow-copy isolation, getWorkflowCheckpoint snapshot.
+ - `test/integration/dwf-setresult.test.ts` (+4): fresh run (no resumed event),
+ completed run (checkpoint deleted), resume (hydration + dwf.resumed + delete),
+ corrupt checkpoint treated as fresh run.
+
+ ### Docs
+ - `docs/dynamic-workflows.md` — new "Resume & Checkpoint (round-18 P2-3)" section +
+ defensive-script example.
+ - `types/dwf.d.ts` — resume pattern documented in header + `ctx.vars` JSDoc.
+ - `package.json` — bumped 1.0.1 → 1.1.0 (minor — new opt-in capability).
+
+ ### Out of scope (future rounds)
+ - P2-2 VM sandbox — still waiting for `isolated-vm` v1.5 (`vm.createContext` is not a
+ real security boundary). P2-2 is now the only outstanding P2 item.
+
+ ### Process-safety follow-up fix (same release)
+
+ A heuristic-based zombie "cleanup" had accidentally killed a live interactive main
+ `pi` session (uptime/RSS/orphan heuristics match a main session just as readily as
+ a real orphaned sub-agent). The fix replaces the heuristics with an explicit marker:
+
+ - **`PI_CREW_KIND=subagent`** env marker, set by `buildPiWorkerArgs`
+ (`src/runtime/pi-args.ts`) on every child-pi spawn. A main session does NOT
+ carry it, so it can never be matched as a sub-agent. (An earlier draft also
+ added a `--crew-subagent` argv flag; it was removed because pi's strict option
+ parser rejects unknown flags and exits non-zero, which silently broke every
+ `ctx.agent()` call. The env var alone is the authoritative signal.)
+ - **`src/runtime/zombie-scanner.ts`** (new): read-only scanner that matches
+ ONLY processes with `PI_CREW_KIND=subagent` AND a dead `PI_CREW_PARENT_PID`.
+ Never matches a main session. Never kills.
+ - **`team action='doctor' focus='zombies'`**: renders the safe scan as a
+ human-readable report (zombies vs live sub-agents, with explicit do-not-kill
+ labels on sub-agents whose parent is still alive).
+ - **`PI_CREW_KIND`** added to the env allowlist in `child-pi.ts`.
+ - **`docs/troubleshooting.md`** + **`.crew/knowledge.md`**: documented the
+ marker + the read-only rule so future agents never repeat the mistake.
+ - 8 new unit tests in `test/unit/zombie-scanner.test.ts`, including a
+ regression test asserting the current (main-session) process is never matched.
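The matching rule above can be sketched as a pure, report-only predicate. The `ProcessInfo` record and `findZombies` are hypothetical (how the real scanner reads another process's environment is not described here). The liveness probe uses Node's `process.kill(pid, 0)`, which sends no signal and only checks whether the PID exists.

```ts
// Hypothetical shape: a process plus the env vars the scanner could read for it.
interface ProcessInfo {
  pid: number;
  env: Record<string, string | undefined>;
}

// Signal 0 performs only the existence/permission check. ESRCH => no such
// process; EPERM => it exists but belongs to another user, so it is alive.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === "EPERM";
  }
}

// Read-only: returns candidates and never signals them. A main session has no
// PI_CREW_KIND marker, so it cannot match, whatever its uptime or RSS.
function findZombies(procs: ProcessInfo[], alive: (pid: number) => boolean = isAlive): ProcessInfo[] {
  return procs.filter((p) => {
    if (p.env.PI_CREW_KIND !== "subagent") return false;
    const parent = Number(p.env.PI_CREW_PARENT_PID);
    // Assumed conservative choice: a missing or invalid parent PID is not "dead".
    if (!Number.isInteger(parent) || parent <= 0) return false;
    return !alive(parent);
  });
}
```

Injecting `alive` keeps the predicate testable without real orphans, and the marker check comes first so no heuristic can ever reach a main session.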
+
+ ### Real-world smoke testing findings (2026-06-24)
+
+ Real `team action='run'` smoke tests caught three bugs that the unit suite
+ missed (the unit tests don't shell out to the real `pi` binary). **All three are
+ now fixed.**
+
+ - **Fixed: `--crew-subagent` argv flag broke every `ctx.agent()` call.** Pi's
+ strict option parser exits non-zero on unknown flags. The marker is now the
+ `PI_CREW_KIND=subagent` ENV var only.
+ - **Fixed: `ctx.agent({schema, systemPrompt})` silently dropped `systemPrompt`.**
+ The round-13 schema branch used the resolved role persona as the base for the
+ JSON-output instruction, ignoring the caller's explicit persona. Models then
+ returned prose and failed schema validation. Fix: `call.systemPrompt` is now
+ preferred as the base when both are set.
+ - **Fixed: `ctx.agent({disableTools: true, maxTurns: 1})` returned `exit null`.**
+ The root cause (found via Phase-0 diagnostic instrumentation) was NOT the
+ final-drain race originally hypothesised; it was an erroneous
+ `killProcessTree` call in the steer-injection path. When `maxTurns` was
+ reached on a `turn_end` event, the code wrote a "wrap up" steer to
+ `child.stdin`. Node's `writable.write()` returns `false` under normal
+ backpressure, which the code mis-treated as a fatal injection failure, so it
+ killed the worker mid-answer (the answer was in stdout; the exit came back `null`).
+ The `disableTools` correlation was a red herring: the real trigger was
+ `maxTurns: 1` being hit on the first turn. Fix: steer injection is now
+ ADVISORY. A `write() === false` or non-writable stdin is logged, not fatal;
+ the hard abort at `maxTurns + graceTurns` remains the safety net for genuine
+ runaways. Verified: 10/10 real-binary runs with `maxTurns: 1` now exit 0 (about
+ 60% failed before the fix). Regression guard: `test/unit/child-pi-steer-backpressure.test.ts`
+ (source-contract checks + opt-in real-binary smoke via `PI_CREW_SMOKE=1`).
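A sketch of the advisory-write rule, using a plain Node `Writable` in place of the child's stdin. `injectSteer` and its `log` callback are hypothetical names. The point it illustrates: `write()` returning `false` means the chunk was queued above the stream's `highWaterMark` (the caller may wait for `'drain'`), not that it was lost, so it must never trigger a kill.

```ts
import { Writable } from "node:stream";

type SteerOutcome = "written" | "buffered" | "skipped";

// Advisory: never throws and never kills the worker. In the changelog's design
// the hard abort at maxTurns + graceTurns stays the safety net for runaways.
function injectSteer(stdin: Writable, message: string, log: (msg: string) => void): SteerOutcome {
  if (!stdin.writable) {
    log("steer skipped: stdin is not writable");
    return "skipped";
  }
  // false => backpressure: the chunk is buffered, NOT dropped.
  const flushed = stdin.write(message + "\n");
  if (!flushed) {
    log("steer buffered under backpressure; continuing");
    return "buffered";
  }
  return "written";
}
```

Returning an outcome instead of a boolean makes the three cases explicit, so no caller can mistake "buffered" for "failed".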
+
+ ## [v0.9.7] — round-17 (P2-4 worktree isolation per agent) (2026-06-23)
+
+ P2-4 feature: `ctx.agent({worktree: true})` spawns the agent in an isolated
+ git worktree so parallel file-modifying agents don't conflict. Fully backward
+ compatible — `worktree` defaults to false, so existing calls are unchanged.
+
+ ### Implementation
+
+ **New helpers** in `src/worktree/worktree-manager.ts`:
+ - `prepareAgentWorktree(manifest, opts)` — creates a worktree from HEAD on a
+ unique branch; returns `{path, branch}`, or `undefined` when the cwd is not a
+ git repo (graceful fallback).
+ - `cleanupAgentWorktree(manifest, worktreePath, branch?)` — captures a git diff
+ as a side artifact when the worktree has changes (for audit/merge), then
+ removes the worktree dir + branch.
+
+ **`ctx.agent()` integration** in `src/runtime/dynamic-workflow-context.ts`:
+ - New `worktree?: boolean` field on `AgentCallOpts`.
+ - When true, the agent runs with the worktree path as its cwd.
+ - Cleanup always runs (success, failure, or agent error), so worktrees never leak.
+ - Non-git cwd or creation failure → fall back to the normal cwd + a `ctx.log()`
+ warning; the agent still runs.
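The lifecycle guarantees above (fallback on a non-git cwd or failed creation, cleanup on every exit path) reduce to a small try/finally wrapper. This sketch injects `prepare`/`cleanup` callbacks so it runs without git; in the package those roles belong to `prepareAgentWorktree` and `cleanupAgentWorktree`, whose real signatures take a manifest and are not reproduced here. `withAgentWorktree` and `WorktreeHooks` are hypothetical names.

```ts
interface Worktree {
  path: string;
  branch: string;
}

interface WorktreeHooks {
  prepare(): Promise<Worktree | undefined>; // undefined => cwd is not a git repo
  cleanup(wt: Worktree): Promise<void>;
  log(msg: string): void;
}

async function withAgentWorktree<T>(
  useWorktree: boolean,
  baseCwd: string,
  hooks: WorktreeHooks,
  run: (cwd: string) => Promise<T>,
): Promise<T> {
  if (!useWorktree) return run(baseCwd); // default path: behaviour unchanged
  let wt: Worktree | undefined;
  try {
    wt = await hooks.prepare();
  } catch {
    wt = undefined; // a creation failure is handled like a non-git cwd
  }
  if (!wt) {
    hooks.log("worktree unavailable; running agent in the normal cwd");
    return run(baseCwd);
  }
  const worktree = wt;
  try {
    // `return await` matters: without it, finally would run before the agent settles.
    return await run(worktree.path);
  } finally {
    // Runs on success, failure, or a thrown agent error, so worktrees never leak.
    await hooks.cleanup(worktree);
  }
}
```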
+
+ ### Why worktrees for DWF
+
+ DWF scripts commonly fan out parallel agents that each modify files (e.g.
+ fixing different modules). Without isolation they race on the working tree.
+ `worktree: true` gives each agent its own checkout; the diff is captured as
+ an artifact for later merge.
+
+ ### Tests (8 new, 60 total in dynamic-workflow-context.test.ts)
+ - `prepareAgentWorktree` creates an isolated worktree
+ - `cleanupAgentWorktree` removes dir + branch (no leak)
+ - `cleanupAgentWorktree` captures a diff artifact when there are changes
+ - `prepareAgentWorktree` returns undefined for non-git cwd (graceful fallback)
+ - `ctx.agent({worktree:true})` isolates + cleans up (mock)
+ - `ctx.agent({worktree:false})` uses normal cwd (backward compat)
+ - `ctx.agent({worktree:true})` falls back gracefully + warns in non-git cwd
+ - `ctx.agent({worktree:true})` cleans up even when the agent fails
+
+ ### Docs
+ - `docs/dynamic-workflows.md` — worktree option in API table + example
+ - `types/dwf.d.ts` — `worktree?: boolean` on `AgentCallOpts`
+ - `package.json` — bumped 1.0.0 → 1.0.1 (patch, additive opt-in)
+
+ ### Out of scope (future rounds)
+ - P2-2 VM sandbox — waiting for `isolated-vm` v1.5 (`vm.createContext` is not
+ a real security boundary)
+ - P2-3 Resume/checkpoint — round-18 candidate (large effort)
+
168
+ ## [v0.9.7] — round-16 (P2-1 pipeline primitive) (2026-06-23)
169
+
170
+ Adds **`ctx.pipeline(items, ...stages)`** — a multi-stage transform primitive for
171
+ dynamic workflows. **Backward compatible** — existing DWF scripts are unaffected;
172
+ `pipeline` is a new opt-in capability.
173
+
174
+ ### Feature — Pipeline primitive (P2-1)
175
+
176
+ **Files:** `src/runtime/dynamic-workflow-context.ts` + `types/dwf.d.ts` + `docs/dynamic-workflows.md`
177
+
178
+ Previously the DWF context only offered `ctx.fanOut()` (a single parallel map). The
179
+ new `ctx.pipeline()` chains stages: each item flows through **all stages in sequence**
180
+ (stage 1 → stage 2 → …), while **different items run concurrently**, bounded by the
181
+ workflow concurrency (`mapConcurrent`, the same primitive as `fanOut`).
182
+
183
+ Semantics (mirrors `pi-dynamic-workflows`' `pipeline()`):
184
+
185
+ - Each stage receives `(previous, original, index)` — `previous` is the prior stage's
186
+ output (the raw item for the first stage), `original` is the unchanged input item.
187
+ - A failed stage yields `null` for that item, logs `pipeline[i] failed: <msg>` via
188
+ `ctx.log()`, and the other items continue.
189
+ - On **abort**, the error propagates (it is not swallowed into `null`).
190
+ - Returns `(TResult | null)[]`, order-preserving.
191
+
192
+ Signature:
193
+
194
+ ```ts
195
+ ctx.pipeline<TItem, TResult = unknown>(
196
+ items: TItem[],
197
+ ...stages: Array<(previous: TResult, original: TItem, index: number) => Promise<TResult> | TResult>
198
+ ): Promise<(TResult | null)[]>;
199
+ ```
200
+
201
+ Implementation notes:
202
+
203
+ - Uses `mapConcurrent(items, concurrency, …)` — NOT unbounded `Promise.all` — so
204
+ item-level parallelism respects the workflow's configured concurrency. Stages that
205
+ spawn agents additionally acquire `ctx.semaphore` for agent-level throttling.
206
+ - Validates inputs: non-array first arg → `TypeError`; non-function stage (or no
207
+ stages) → `TypeError`.
208
+ - Empty items array short-circuits to `[]`.
209
+ - Authoring types (`types/dwf.d.ts`) mirror the runtime signature for IDE IntelliSense.
210
+
211
+ Example use case: scan → analyze → review each shard, up to `concurrency` shards at a
212
+ time, with per-shard failure isolation.
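A self-contained sketch of those semantics: sequential stages per item, bounded concurrency across items, `null` plus a log line for a failed item, and rethrow on abort. Unlike `ctx.pipeline`, it takes the stages as an array plus an options object, since it is not a ctx method. `mapConcurrent` here is a minimal worker-pool stand-in for the package's helper of the same name (whose real signature may differ), and detecting "abort" through an `AbortSignal` is an assumption.

```ts
// Minimal worker pool: at most `limit` calls in flight; results keep input order.
async function mapConcurrent<T, R>(items: T[], limit: number, fn: (item: T, i: number) => Promise<R>): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++; // claim an index synchronously, so no two workers share one
      results[i] = await fn(items[i], i);
    }
  };
  const workers = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workers }, worker));
  return results;
}

type Stage<TItem, TResult> = (previous: TResult, original: TItem, index: number) => Promise<TResult> | TResult;

async function pipeline<TItem, TResult = unknown>(
  items: TItem[],
  stages: Stage<TItem, TResult>[],
  opts: { concurrency: number; log: (msg: string) => void; signal?: AbortSignal },
): Promise<(TResult | null)[]> {
  if (!Array.isArray(items)) throw new TypeError("pipeline: items must be an array");
  if (stages.length === 0 || stages.some((s) => typeof s !== "function")) {
    throw new TypeError("pipeline: every stage must be a function");
  }
  if (items.length === 0) return [];
  return mapConcurrent(items, opts.concurrency, async (item, i) => {
    // The first stage sees the raw item as `previous`, as the documented signature implies.
    let value = item as unknown as TResult;
    try {
      for (const stage of stages) value = await stage(value, item, i);
      return value;
    } catch (err) {
      if (opts.signal?.aborted) throw err; // aborts propagate instead of becoming null
      opts.log(`pipeline[${i}] failed: ${err instanceof Error ? err.message : String(err)}`);
      return null;
    }
  });
}
```

Catching per item (inside the pool callback) is what gives per-shard failure isolation: one failed item never rejects the outer `Promise.all`.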
+
+ Tests: `test/unit/dynamic-workflow-context.test.ts` — single/multi-stage transforms,
+ empty array, failed-stage isolation + logging, TypeError on bad inputs, stage-argument
+ contract, async stages, and concurrency-bounded execution.
+
+ ## [v0.9.7] — round-15 (P1-4 phase UI) (2026-06-23)
+
+ The progress pane now **renders DWF phase markers** (▶/✓/⏸) by consuming the
+ `dwf.phase_started` / `dwf.phase_completed` events emitted by `ctx.phase()`
+ (round-12). **Backward compatible** — non-DWF runs are unaffected.
+
+ ### Feature — Phase UI in progress-pane (P1-4)
+
+ **Files:** `src/ui/dwf-phase-display.ts` (new) + `progress-pane.ts` +
+ `run-snapshot-cache.ts` + `snapshot-types.ts`
+
+ Previously the phase events were produced but the UI did not consume them.
+ Now the progress pane shows a phase overview:
+
+ - `▶ Phase: <name>` — currently running phase.
+ - `✓ Phase: <name>` — completed phase.
+ - `⏸ Phase: <name>` — a phase whose completion scrolled out of the recent-event
+ window and is not the current one.
+
+ Implementation details:
+
+ - New pure-function module `src/ui/dwf-phase-display.ts`:
+ `extractDwfPhaseState(events)` derives phase state from the event window
+ (returns `null` for non-DWF runs); `renderDwfPhaseLines(state, { ascii })`
+ renders markers with Unicode glyphs and ASCII fallbacks (`[>]`/`[v]`/`[ ]`).
+ - `RunUiSnapshot` gains an optional `dwfPhaseState` field, computed from the
+ existing tailed `recentEvents` window (no extra I/O) in both sync and async
+ snapshot builders.
+ - `progress-pane.ts` renders the phase lines right after the summary line,
+ before the task-based phase grouping. Non-DWF runs produce zero lines.
+ - `signatureFor` includes `dwfPhaseState` so cache invalidation reflects phase
+ changes.
+ - Tests: `test/unit/dwf-phase-display.test.ts` — phase state tracking from an
+ event sequence (incl. scrolled-off recovery), correct markers (Unicode +
+ ASCII), header gating, and non-DWF snapshots unaffected.
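A sketch of the pure events → markers transformation. The event shape (`{ type, data: { title } }`) is an assumption, and so is the derivation rule: the most recently started, not-yet-completed phase is "running", and any other phase without a visible completion gets ⏸. That is one plausible reading of the marker definitions above, not the module's actual code; `extractPhases` and `renderPhaseLines` are hypothetical names.

```ts
interface RunEvent {
  type: string;
  data?: { title?: string };
}

type PhaseStatus = "running" | "completed" | "unknown";
interface PhaseLine {
  title: string;
  status: PhaseStatus;
}

function extractPhases(events: RunEvent[]): PhaseLine[] | null {
  const order: string[] = [];
  const completed = new Set<string>();
  let current: string | null = null;
  for (const ev of events) {
    const title = ev.data?.title;
    if (!title) continue;
    if (ev.type === "dwf.phase_started") {
      if (!order.includes(title)) order.push(title);
      current = title;
    } else if (ev.type === "dwf.phase_completed") {
      if (!order.includes(title)) order.push(title);
      completed.add(title);
      if (current === title) current = null;
    }
  }
  if (order.length === 0) return null; // non-DWF run: render nothing
  return order.map((title): PhaseLine => ({
    title,
    status: completed.has(title) ? "completed" : title === current ? "running" : "unknown",
  }));
}

const GLYPHS = {
  unicode: { running: "▶", completed: "✓", unknown: "⏸" },
  ascii: { running: "[>]", completed: "[v]", unknown: "[ ]" },
} as const;

function renderPhaseLines(lines: PhaseLine[] | null, ascii = false): string[] {
  if (!lines) return [];
  const g = ascii ? GLYPHS.ascii : GLYPHS.unicode;
  return lines.map((l) => `${g[l.status]} Phase: ${l.title}`);
}
```

Keeping extraction and rendering as separate pure functions mirrors the split described above and lets the snapshot cache compare phase state without rendering anything.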
+
+ ## [v0.9.7] — round-14 (P1 DX + observability) (2026-06-23)
+
+ Four additive P1 features land in this round: **authoring types**, **per-workflow
+ token budget**, **log API**, and **typed args**. **Backward compatible** — existing
+ DWF scripts continue to work unchanged. New behavior is opt-in.
+
+ ### Feature 1 — Authoring Types / IDE IntelliSense (P1-1)
+
+ **Files:** `types/dwf.d.ts` (new) + `package.json` (`./workflow` export)
+
+ A `.dwf.ts` script can now import the `WorkflowCtx` (and supporting) types from
+ the package's `./workflow` export for full TypeScript IntelliSense:
+
+ ```ts
+ import type { WorkflowCtx } from "pi-crew/workflow";
+ export default async function run(ctx: WorkflowCtx): Promise<void> { /* ... */ }
+ ```
+
+ - New file: `types/dwf.d.ts` — named exports mirroring the runtime types
+ (`WorkflowCtx`, `AgentCallOpts`, `AgentResult`, `WorkflowBudget`, ...).
+ - `package.json` gains `"./workflow": { "types": "./types/dwf.d.ts" }` and ships
+ the `types/` directory.
+ - New test: `test/unit/dwf-authoring-types.test.ts` — compiles a sample `.dwf.ts`
+ against the export (positive + negative `@ts-expect-error` check).
+
+ ### Feature 2 — Per-Workflow Token Budget (P1-2)
+
+ **Files:** `src/runtime/dynamic-workflow-context.ts` + dispatch wiring
+
+ `ctx.budget` is a frozen `{total, spent(), remaining()}` surface. When a
+ per-workflow token budget is set and exhausted, `ctx.agent()` auto-rejects with
+ `ok:false` (`"workflow token budget exhausted"`) **before** spawning a child worker.
+ `spent()` accumulates each run's reported `usage.input + usage.output`.
+
+ - `total` is `null` (unbounded) by default; `remaining()` then returns `Infinity`.
+ - New `MakeWorkflowCtxOptions.tokenBudget`, `WorkflowConfig.maxTokenBudget`,
+ `RunDynamicWorkflowInput.tokenBudget`, and the `team run` `tokenBudget` param
+ (the param overrides the workflow value).
+ - Budget check + accumulation wired into `ctx.agent()`; budget object passed
+ through `run.ts` and `background-runner.ts`.
+ - Tests: 7 unit cases (default null, set total, spent/remaining, exhaustion,
+ accumulation from the mock's `{input:10,output:5}`, frozen check).
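A sketch of the budget surface and the pre-spawn check. `makeBudget`, `record`, and `guardedAgent` are hypothetical names; the returned budget object is frozen as described, and `usage` is assumed to carry `input`/`output` token counts, as in the mock mentioned above.

```ts
interface WorkflowBudget {
  readonly total: number | null;
  spent(): number;
  remaining(): number;
}

function makeBudget(total: number | null = null) {
  let spent = 0;
  const budget: WorkflowBudget = Object.freeze({
    total,
    spent: () => spent,
    // Unbounded budget => Infinity, so the pre-spawn check never trips.
    remaining: () => (total === null ? Infinity : Math.max(0, total - spent)),
  });
  const record = (usage: { input: number; output: number }): void => {
    spent += usage.input + usage.output;
  };
  return { budget, record };
}

type AgentResult = { ok: true; text: string } | { ok: false; error: string };

// Rejects BEFORE spawning anything once the budget is exhausted.
async function guardedAgent(
  b: ReturnType<typeof makeBudget>,
  spawn: () => Promise<{ text: string; usage: { input: number; output: number } }>,
): Promise<AgentResult> {
  if (b.budget.remaining() <= 0) return { ok: false, error: "workflow token budget exhausted" };
  const run = await spawn();
  b.record(run.usage);
  return { ok: true, text: run.text };
}
```

Because usage is only known after a run finishes, a single call can overshoot the total; the check prevents the *next* spawn rather than capping a call mid-flight.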
+
+ ### Feature 3 — Log API (P1-3)
+
+ **Files:** `src/state/contracts.ts` + `src/runtime/dynamic-workflow-context.ts`
+
+ `ctx.log(message)` appends a workflow-level log line: it stringifies non-strings,
+ keeps a bounded in-memory copy (capped at **1000** lines), and always emits a durable
+ `dwf.log` event (`{message}`) to the run's `events.jsonl`.
+
+ - New event type `"dwf.log"` in `TEAM_EVENT_TYPES` (non-terminal).
+ - New runner-only `getWorkflowLogs(ctx)` accessor (mirrors `getWorkflowPhaseState`).
+ - Tests: 4 unit cases (append, stringify, event emission, 1000-cap) + 1
+ integration case (end-to-end through `runDynamicWorkflow`).
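A sketch of the `ctx.log()` contract: stringify, keep a bounded in-memory copy, always emit a durable event. The changelog does not say whether the oldest or the newest line is dropped at the 1000-line cap, nor how objects are stringified; this sketch assumes oldest-first eviction and JSON for objects. `makeWorkflowLog` and the `emit` callback are hypothetical.

```ts
const LOG_CAP = 1000;

function makeWorkflowLog(emit: (type: "dwf.log", data: { message: string }) => void) {
  const lines: string[] = [];
  const log = (message: unknown): void => {
    // Assumed stringification: JSON for objects, String() as the fallback.
    let text: string;
    if (typeof message === "string") {
      text = message;
    } else {
      try {
        text = JSON.stringify(message) ?? String(message); // undefined/functions => String()
      } catch {
        text = String(message); // e.g. circular structures
      }
    }
    lines.push(text);
    if (lines.length > LOG_CAP) lines.shift(); // bounded memory (assumed: evict oldest)
    emit("dwf.log", { message: text }); // the event log stays complete past the cap
  };
  return { log, lines: () => lines.slice() };
}
```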
+
+ ### Feature 4 — Typed Args (P1-5)
+
+ **Files:** `src/runtime/dynamic-workflow-context.ts` + `src/state/types.ts` +
+ `src/schema/team-tool-schema.ts` + `src/state/state-store.ts` + dispatch wiring
+
+ `ctx.args<T>()` returns typed workflow arguments (sourced from `manifest.args`,
+ passed via the run `args` param). Defaults to `{}` when unset.
+
+ - New `TeamRunManifest.args`, `MakeWorkflowCtxOptions.args`, `createRunManifest`
+ `args` param, and the `team run` `args` schema field (`Type.Unsafe`, any JSON
+ value — avoids `any`).
+ - The runner reads `manifest.args` and forwards it to `makeWorkflowCtx`.
+ - Tests: 3 unit cases (default `{}`, typed object, array) + 1 integration case
+ (reads `manifest.args` end-to-end).
+
+ ### Other
+
+ - Version bumped `0.9.7` → `0.9.8`.
+ - `docs/dynamic-workflows.md`: API table rows + Log/Budget/Args/Authoring-types
+ sections.
+
+ ## [v0.9.7] — round-13 (P0 AST determinism + structured output + abort cleanup) (2026-06-23)
+
+ Three P0 features land in this round. **Backward compatible** for deterministic
+ scripts: existing DWF scripts that don't call `Date.now()`, `Math.random()`, or
+ `new Date()` continue to work unchanged. Structured output is opt-in via the
+ `schema` field on `ctx.agent()`; the determinism check is on by default, with an
+ env-var escape hatch.
+
+ ### Feature 1 — AST Determinism Check (P0-2)
+
+ **File:** `src/runtime/deterministic-ast.ts` (new) +
+ `src/runtime/dynamic-workflow-runner.ts` (integration)
+
+ Dynamic workflow scripts must now be **deterministic**. The runner parses each
+ `.dwf.ts` with `acorn` and walks the AST, rejecting `Date.now()`,
+ `Math.random()`, and `new Date()` calls before `jiti` executes the script.
+ As a result, the script's own control flow no longer varies between two runs
+ against the same inputs, which is critical for regression testing and workflow replay.
+
+ Why AST, not regex: a regex matches `Date.now()` everywhere, including string
+ literals, comments, and prompt text. AST walking distinguishes **calls** from
+ strings, so prompts that say *"avoid `Date.now()` in your code"* still parse
+ cleanly. Other `Date.*` and `Math.*` methods (`Date.parse`, `Date.UTC`,
+ `Math.floor`, `Math.max`, etc.) are accepted; only `now` and `random` are
+ blocked.
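A dependency-free sketch of the walk over an ESTree-shaped tree (the node format `acorn` produces). The parse step is left out so the example runs without `acorn` installed, and `findNondeterminism` is a hypothetical name. It inspects only `CallExpression`/`NewExpression` nodes, which is why a string literal containing `"Date.now()"` is never flagged. Treating only a zero-argument `new Date()` as non-deterministic is an assumption; the changelog does not say how `new Date(0)` is handled.

```ts
// Loose ESTree node: every node has a `type`; children live in arbitrary fields.
type Node = { type: string; [key: string]: unknown };

const isNode = (v: unknown): v is Node =>
  typeof v === "object" && v !== null && typeof (v as Node).type === "string";

// Property name for `obj.name` and for a computed string key `obj["name"]`.
function propName(member: Node): string | undefined {
  const p = member.property as Node;
  const v = p.value;
  if (!member.computed && p.type === "Identifier") return p.name as string;
  if (member.computed && p.type === "Literal" && typeof v === "string") return v;
  return undefined;
}

function isGlobalMember(callee: unknown, obj: string, prop: string): boolean {
  if (!isNode(callee) || callee.type !== "MemberExpression") return false;
  const o = callee.object as Node;
  return o.type === "Identifier" && o.name === obj && propName(callee) === prop;
}

function findNondeterminism(root: Node): string[] {
  const hits: string[] = [];
  const visit = (node: Node): void => {
    if (node.type === "CallExpression") {
      if (isGlobalMember(node.callee, "Date", "now")) hits.push("Date.now()");
      if (isGlobalMember(node.callee, "Math", "random")) hits.push("Math.random()");
    }
    if (node.type === "NewExpression") {
      const c = node.callee as Node;
      const args = node.arguments as unknown[];
      if (c.type === "Identifier" && c.name === "Date" && args.length === 0) hits.push("new Date()");
    }
    // Recurse into every child node or array of nodes (strings and numbers are skipped).
    for (const value of Object.values(node)) {
      if (Array.isArray(value)) value.filter(isNode).forEach(visit);
      else if (isNode(value)) visit(value);
    }
  };
  visit(root);
  return hits;
}
```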
+
+ - New dep: `acorn ^8.14.0` (small, well-maintained; verified Node ≥22 ESM/strip-types compatibility)
+ - New file: `src/runtime/deterministic-ast.ts` (determinism walker; MIT-licensed adaptation from pi-dynamic-workflows, attribution in `NOTICE.md`)
+ - New file: `test/unit/deterministic-ast.test.ts` (27 cases: accepts/rejects every form, comments, template literals, computed properties, parse-error delegation)
+ - New tests in `test/integration/dwf-setresult.test.ts` (5 end-to-end cases including env-var opt-out)
+
+ **Escape hatch:** `PI_CREW_DWF_SKIP_DETERMINISM_CHECK=1` bypasses the check for
+ power users who legitimately need time/random (e.g. randomized benchmark
+ scripts). The bypass is off by default.
+
+ ### Feature 2 — Structured Output Helper (P0-3)
+
+ **Files:** `src/runtime/result-extractor.ts` + `src/runtime/dynamic-workflow-context.ts`
+
+ `AgentCallOpts` gains an optional `schema?: TSchema` field (TypeBox). When set,
+ `ctx.agent()` validates the extracted JSON against the schema via
+ `@sinclair/typebox`'s `Value.Check`. A mismatch yields
+ `{ok: false, error: "structured output does not match schema: ..."}` instead
+ of an untyped `structured: { ... }` blob.
+
+ How the runner helps the model comply:
+
+ - Appends a JSON-output directive to the prompt.
+ - Replaces the agent's system prompt suffix with a "structured-output
+ assistant" preamble that describes the schema's shape.
+
+ When `schema` is **omitted**, behavior is byte-identical to the previous
+ regex-based extractor, verified by the existing 30+ test cases plus 9 new
+ schema-specific cases in `test/unit/result-extractor.test.ts` and 4 new
+ end-to-end cases in `test/unit/dynamic-workflow-context.test.ts`.
+
+ Caveat: pi-crew DWF spawns `pi` as a subprocess (`runChildPi`), not an
+ in-memory `createAgentSession`. Subprocess structured output is captured via
+ the same event-stream → JSON-line → schema-check pipeline used for everything
+ else, so this round ships Option B (regex-extract + post-hoc schema validation).
+ Option A (in-process terminating tool) is planned for round-14.
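A sketch of the post-hoc "Option B" path: pull the last JSON object out of the agent's final text, then validate it. To stay dependency-free, validation is an injected type-guard; in the package that role is played by TypeBox's `Value.Check(schema, value)`. The line-based extraction heuristic and the `extractStructured` name are assumptions, not the actual `result-extractor.ts` logic.

```ts
type Structured<T> = { ok: true; structured: T } | { ok: false; error: string };

// Scan lines from the end; the last line that parses as a JSON object wins.
function extractLastJsonObject(text: string): unknown {
  const lines = text.split(/\r?\n/);
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i].trim();
    if (!line.startsWith("{") || !line.endsWith("}")) continue;
    try {
      return JSON.parse(line);
    } catch {
      /* looked like JSON but wasn't; keep scanning */
    }
  }
  return undefined;
}

// `check` stands in for a schema validator such as TypeBox's Value.Check.
function extractStructured<T>(text: string, check: (v: unknown) => v is T): Structured<T> {
  const value = extractLastJsonObject(text);
  if (value === undefined) return { ok: false, error: "no JSON object found in agent output" };
  if (!check(value)) return { ok: false, error: "structured output does not match schema" };
  return { ok: true, structured: value };
}
```

Returning a discriminated union means callers must test `ok` before touching `structured`, which is the typed-result behaviour the `schema` option is meant to provide.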
390
+
391
+ ### Feature 3 — Abort Listener Cleanup (P0-5) — NO-OP (already fixed in round 27)
392
+
393
+ Audited `src/runtime/child-pi.ts` for AbortSignal listener leaks. The fix was
394
+ landed in round 27 (BUG 4): both the `onParentAbort` flag handler and the
395
+ `abort` cancellation handler are now removed inside `settle()` regardless of
396
+ the exit path (normal completion, response timeout, hard kill, parent abort,
397
+ forced final drain). On runs with >10 tasks sharing one AbortSignal (the
398
+ common pattern under `background-runner`), this prevents the
399
+ `MaxListenersExceededWarning` and per-task closure capture that previously
400
+ pinned the worker stack frame in memory.
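A sketch of the leak-free pattern: register the abort listener once, and detach it inside a single `settle()` that every exit path funnels through. `runWithAbort` is a hypothetical name and a timer stands in for the child process; `getEventListeners` from `node:events` is used only as a diagnostic to show that nothing stays attached.

```ts
import { getEventListeners } from "node:events";

function runWithAbort(signal: AbortSignal, ms: number): Promise<"done" | "aborted"> {
  return new Promise((resolve) => {
    let settled = false;
    const settle = (outcome: "done" | "aborted"): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      // Always detach, whichever path got here first; otherwise every task on a
      // shared signal leaves a listener (and its closure) behind.
      signal.removeEventListener("abort", onAbort);
      resolve(outcome);
    };
    const onAbort = (): void => settle("aborted");
    const timer = setTimeout(() => settle("done"), ms);
    if (signal.aborted) return settle("aborted");
    signal.addEventListener("abort", onAbort);
  });
}

// Diagnostic: how many abort listeners are still attached to a signal.
const abortListenerCount = (signal: AbortSignal): number => getEventListeners(signal, "abort").length;
```

The `settled` flag makes `settle()` idempotent, so a timer and an abort racing each other cannot resolve twice or double-detach.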
+
+ No code changes needed in round 13. Documented for the audit trail.
+
+ ### Verification
+
+ - `npm run typecheck` — clean
+ - `npx tsc --noEmit` — clean
+ - New unit tests: 27 in `deterministic-ast.test.ts`, 9 schema-validation cases in `result-extractor.test.ts`, and 4 in `dynamic-workflow-context.test.ts`; all pass
+ - 5 new integration tests in `dwf-setresult.test.ts` — all pass
+ - 0 regressions in the existing 4 round-12 tests
+
+ ## [v0.9.7] — round-12: DWF phases + structured-clone guard (2026-06-23)
+
+ Two additive P0 features for dynamic-workflow (DWF) scripts, both fully
+ backward-compatible (existing scripts continue to work unchanged). Researched
+ and adopted from the public `pi-dynamic-workflows` (Michaelliv/v1.0.1)
+ package — full comparison and adoption plan in
+ `.crew/artifacts/team_20260623095016_b693d3f967f88048/shared/06_synthesize.md`.
+
+ ### Feature 1: `ctx.phase(title)` runtime phase API (P0-1)
+
+ `WorkflowCtx` gains a new `phase(title: string): void` method. The orphan
+ `dwf.phase_started` / `dwf.phase_completed` event types, declared in
+ `src/state/contracts.ts:89-93` since v0.9.0 but never emitted anywhere,
+ finally have a producer. Use cases:
+
+ - Group `ctx.agent()` calls under logical phases (e.g. "Scan", "Audit",
+ "Review") so downstream UI and log readers can group by phase.
+ - Emit a clear phase boundary to the run's `events.jsonl` without writing
+ custom event-log code.
+ - Drive live progress reporting from the script itself.
+
+ Semantics:
+
+ - Validates that `title` is a non-empty string (throws `TypeError` otherwise).
+ - Idempotent: calling `ctx.phase("Scan")` twice does not emit a duplicate
+ event or change state.
+ - When a previous phase is still open, emits `dwf.phase_completed` for it
+ **before** emitting `dwf.phase_started` for the new one (consumers never
+ see two open phases at once).
+ - The in-memory `phases[]` list (read-only via `getWorkflowPhaseState`,
+ mirrors the `__finalResult` non-enumerable getter pattern) is deduped and
+ capped at **100 distinct titles** to bound memory. Events still flow
+ past the cap — the events log is the durable source of truth.
+ - The runner **auto-closes the last open phase** before emitting
+ `dwf.completed`, so a script that ends mid-phase still produces a
+ well-formed event sequence.
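A sketch of these phase rules: validation, idempotency, close-before-open, a 100-title cap that never suppresses events, and the runner's auto-close. `makePhaseTracker` and its `emit` signature are hypothetical (the package implements this inside the workflow context), and reading "idempotent" as "repeating the current phase is a no-op" is an interpretation.

```ts
type PhaseEvent = "dwf.phase_started" | "dwf.phase_completed";
const MAX_PHASES = 100;

function makePhaseTracker(emit: (type: PhaseEvent, title: string) => void) {
  let current: string | null = null;
  const phases: string[] = []; // deduped and bounded; the event log is the durable record

  const phase = (title: string): void => {
    if (typeof title !== "string" || title.length === 0) {
      throw new TypeError("ctx.phase(title): title must be a non-empty string");
    }
    if (title === current) return; // idempotent: no duplicate event, no state change
    if (current !== null) emit("dwf.phase_completed", current); // close before opening
    emit("dwf.phase_started", title);
    current = title;
    if (!phases.includes(title) && phases.length < MAX_PHASES) phases.push(title);
  };

  // What the runner does before dwf.completed: close whatever is still open.
  const closeOpenPhase = (): void => {
    if (current !== null) emit("dwf.phase_completed", current);
    current = null;
  };

  return { phase, closeOpenPhase, state: () => ({ current, phases: [...phases] }) };
}
```

Note that the cap only limits the in-memory list: `emit` is called for every transition, so past 100 titles the events log still records everything.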
+
+ **Files changed:**
+ - `src/runtime/dynamic-workflow-context.ts` — interface, implementation,
+ `__phaseState` getter, `getWorkflowPhaseState` helper
+ - `src/runtime/dynamic-workflow-runner.ts` — auto-close on completion
+
+ ### Feature 2: structured-clone guard at the runner boundary (P0-4)
+
+ Defensive `assertStructuredCloneable(value, name)` helper applied to the
+ final artifact content and `manifest.summary` before they reach
+ `writeArtifact` and the run-event-bus emitter. Today this is mostly
+ future-proofing (the artifact file is read as a string, and strings are
+ always structured-cloneable), but the guard surfaces a clear, actionable
+ error pointing at the most common cause (forgetting `await` on
+ `ctx.agent()` / `ctx.review()`) instead of letting a cryptic
+ `DataCloneError` leak from deep inside the artifact store.
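A sketch of such a guard built on the global `structuredClone` (Node 17+), which throws a `DataCloneError` for values such as functions and promises. The helper name matches the changelog, but the message wording, the explicit `Promise` check, and detecting clonability by actually cloning are assumptions about how it might be done.

```ts
function assertStructuredCloneable(value: unknown, name: string): void {
  // An un-awaited ctx.agent()/ctx.review() call shows up here as a Promise.
  if (value instanceof Promise) {
    throw new TypeError(`${name} is a Promise: did you forget \`await\` on ctx.agent()/ctx.review()?`);
  }
  try {
    structuredClone(value); // throws DataCloneError for functions, promises, etc.
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new TypeError(`${name} is not structured-cloneable (${reason}); check for a missing \`await\``);
  }
}
```

Cloning is the simplest faithful test, since it applies exactly the rules the downstream boundary will apply; the cost is a full copy, which is negligible for a final text and a summary slice.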
+
+ **Files changed:**
+ - `src/runtime/dynamic-workflow-runner.ts` — `assertStructuredCloneable`
+ helper, applied to `finalText` and `summaryText` (slice)
+
+ ### Tests
+
+ - 7 new unit tests in `test/unit/dynamic-workflow-context.test.ts`
+ (emission, idempotency, validation, sequence, helper, dedup, 100-cap).
+ - 1 new integration test in `test/integration/dwf-setresult.test.ts`
+ (end-to-end phase event sequence, including runner auto-close).
+ - All 23 existing DWF unit tests still pass; both pre-existing integration
+ tests still pass.
+
+ ### Docs
+
+ - `docs/dynamic-workflows.md` — updated WorkflowCtx example to use
+ `ctx.phase("Scan")` / `ctx.phase("Audit")`; added a `ctx.phase` row to
+ the API table; added a "Phases (round-12)" subsection explaining
+ semantics, idempotency, and the 100-cap.
+
+ ### Out of scope (planned for future rounds)
+
+ - AST determinism check (P0-2)
+ - Structured output helper (P0-3)
+ - Abort listener cleanup pattern (P0-5)
+ - Authoring types / IDE IntelliSense (P1-1)
+ - Token budget (P1-2)
+ - Phase UI in `progress-pane` (P1-4)
+ - Pipeline primitive (P2-1)
+ - `isolated-vm` sandbox (P2-2, planned for v1.5)
+
  ## [v0.9.5] — fix "team run hangs forever at 25%" (2026-06-23)
 
  Two coupled runtime bugs caused the recurring "run stuck at 25% (1/4)" failure
package/README.md CHANGED
@@ -197,7 +197,7 @@ background-dispatch discriminator.
  - **Plugin system** — framework-aware context injection (Next.js, Vite, Vitest) via plugin registry
  - **Health scoring** — penalty-based run health with time-series snapshots
  - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
- - **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
+ - **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `ctx.phase()` marks logical phases; **round-14** adds `ctx.log()` (durable `dwf.log` events), `ctx.budget` (per-workflow token budget that auto-rejects `ctx.agent()` when exhausted), and `ctx.args<T>()` (typed workflow arguments). TypeScript IntelliSense is available via `import type { WorkflowCtx } from "pi-crew/workflow"`. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
 
  ---
 
@@ -14,7 +14,13 @@ Use when an agent discovers a missing harness capability but should not change t
 
  **Risk**: normal
 
- **Status**: proposed
+ **Status**: ✅ PARTIALLY DONE (2026-06-24). The bulk of HB-001 was already
+ covered by 21 existing `test/integration/` files (team-runner path via
+ `mock-child-run`, `full-feature-smoke`, `phase3-6-*`). The genuine remaining
+ gap — interleaved manifest+task+event writes reloaded consistently (the
+ realistic run-load pattern) — is now covered by
+ `test/integration/state-durability-hb001.test.ts`. Child-process exit →
+ state-store reconcile is covered by `async-restart-recovery.test.ts`.
 
  ### HB-002: Windows-specific test coverage
 
@@ -26,7 +32,12 @@ Use when an agent discovers a missing harness capability but should not change t
 
  **Risk**: normal
 
- **Status**: proposed
+ **Status**: ✅ DONE (2026-06-24). `test/platform/` ships with two files:
+ `windows-rename.test.ts` (EBUSY/EPERM rename retry path via `renameWithRetry`,
+ self-skips off win32) and `posix-tools.test.ts` (BSD-vs-GNU grep, `/var` →
+ `/private/var` realpath, POSIX-shell resolution; self-skips on win32).
+ Runbook in `test/platform/README.md`. The CI OS matrix (ubuntu/windows/macos)
+ exercises each platform's tests.
 
  ### HB-003: Performance regression baseline
 
@@ -38,4 +49,41 @@ Use when an agent discovers a missing harness capability but should not change t
 
  **Risk**: tiny
 
- **Status**: proposed
+ **Status**: ✅ DONE (2026-06-24). `test/bench/` now has 6 benchmarks:
+ the pre-existing `register-startup`, `render-flush`, `snapshot-cache`, plus
+ three new ones covering the gaps HB-003 flagged — `atomic-write.bench.ts`
+ (`atomicWriteJson` cold/warm), `event-append.bench.ts` (serial lock
+ contention vs batch), `task-graph-scheduler.bench.ts` (DAG build/refresh/
+ full-run). All run via `npm run bench` → `test/bench/results.json`; the baseline
+ is captured via `npm run bench:capture`. Each prints min, p50, p95, p99, and max timings.
+
+ ### HB-004: Real-binary smoke tests for ctx.agent() paths
+
+ **Discovered while**: Real-world `team action='run'` smoke testing on 2026-06-24
+ caught three bugs that the unit suite (which mocks child-pi) missed entirely.
+
+ **Current pain**: The unit tests for `dynamic-workflow-context.ts` and
+ `child-pi.ts` use `PI_TEAMS_MOCK_CHILD_PI` and never shell out to the real `pi`
+ binary. As a result they cannot catch:
+ - argv flags the real `pi` rejects (e.g. the `--crew-subagent` regression),
+ - env/persona interactions that change real model output (e.g. the
+ schema+systemPrompt drop),
+ - exit-code races in the real spawn lifecycle (e.g. the
+ `disableTools:true` → `exit null` race).
+
+ **Suggested improvement**: Add `test/smoke/` (gated behind a `PI_CREW_SMOKE=1`
+ env var so CI doesn't bill tokens by default) that runs real `.dwf.ts` workflows
+ end-to-end via `team action='run'` and asserts on the resulting
+ `events.jsonl` + `summary.md`. One workflow per feature family
+ (phase/log/pipeline/agent/schema/worktree). Document the runbook in
+ `docs/troubleshooting.md`.
+
+ **Risk**: normal (token cost when run; otherwise read-only)
+
+ **Status**: ✅ DONE (2026-06-24). `test/smoke/` shipped with 5 smoke tests
+ (argv-flags, agent-plain, agent-schema, agent-disabletools, dwf-workflow),
+ all gated behind `PI_CREW_SMOKE=1`. `npm run test:smoke` runs them. CI
+ manual-dispatch workflow at `.github/workflows/smoke.yml` (requires
+ `PI_AUTH_JSON` secret). Runbook in `docs/troubleshooting.md`. Each smoke test
+ maps to a real bug it would have caught (HB-003a, the schema+systemPrompt
+ drop, the `--crew-subagent` argv regression).