npm - pi-crew - Versions diffs - 0.9.5 → 0.9.7 - Mend

pi-crew 0.9.5 → 0.9.7

Files changed (30)

package/CHANGELOG.md +494 -0
package/README.md +1 -1
package/docs/HARNESS_BACKLOG.md +51 -3
package/docs/dynamic-workflows.md +315 -2
package/docs/fix-plan-disabletools-exit-null.md +219 -0
package/docs/troubleshooting.md +76 -0
package/package.json +8 -2
package/src/extension/team-tool/doctor.ts +14 -0
package/src/extension/team-tool/run.ts +2 -0
package/src/runtime/background-runner.ts +1 -1
package/src/runtime/child-pi.ts +101 -10
package/src/runtime/deterministic-ast.ts +161 -0
package/src/runtime/dwf-state-store.ts +97 -0
package/src/runtime/dynamic-workflow-context.ts +381 -7
package/src/runtime/dynamic-workflow-runner.ts +93 -2
package/src/runtime/pi-args.ts +11 -0
package/src/runtime/result-extractor.ts +72 -7
package/src/runtime/team-runner.ts +8 -3
package/src/runtime/zombie-scanner.ts +297 -0
package/src/schema/team-tool-schema.ts +28 -0
package/src/state/contracts.ts +1 -0
package/src/state/state-store.ts +3 -0
package/src/state/types.ts +9 -0
package/src/ui/dashboard-panes/progress-pane.ts +5 -0
package/src/ui/dwf-phase-display.ts +151 -0
package/src/ui/run-snapshot-cache.ts +4 -0
package/src/ui/snapshot-types.ts +3 -0
package/src/workflows/workflow-config.ts +3 -0
package/src/worktree/worktree-manager.ts +94 -0
package/types/dwf.d.ts +187 -0

package/CHANGELOG.md

@@ -1,5 +1,499 @@
 # Changelog
+## [v0.9.7] — round-18 + process-safety fix (2026-06-23)
+P2-3 feature: durable checkpoint + resume for dynamic-workflow runs. When a `.dwf.ts`
+script crashes (timeout, OOM, agent error) between `ctx.agent()` calls, the runner now
+persists a checkpoint after every agent call so `team action='resume' runId='X'` can
+continue from the last checkpoint instead of re-running from scratch. **Backward
+compatible** — fresh runs (no checkpoint) behave exactly as before.
+### Implementation
+**New file** `src/runtime/dwf-state-store.ts` (`DwfStore`):
+- Atomic CRUD for a single run's DWF checkpoint, modeled on `GoalStore` /
+  `FileCheckpointStore`.
+- Persists `DwfCheckpointState` (vars, phases, currentPhase, logs, spent, agentCount,
+  updatedAt) to `<stateRoot>/dwf-checkpoint.json` via `atomicWriteJson`.
+- `load()` returns `undefined` for a missing/corrupt checkpoint (fresh run); `delete()`
+  is best-effort and never throws.
+**`ctx.agent()` checkpoint hook** in `src/runtime/dynamic-workflow-context.ts`:
+- New `MakeWorkflowCtxOptions.onCheckpoint?: (state) => void` — invoked after each
+  `ctx.agent()` call (success OR fail) so a crash between calls leaves durable state.
+- New `MakeWorkflowCtxOptions.resumedState?` — hydrates `ctx.vars`, phase state, logs,
+  `budget.spent()`, and `agentCount` from the checkpoint on resume.
+- New closure counter `agentCount` (incremented in `agent()`'s `finally`), exposed via a
+  non-enumerable `__agentCount` getter.
+- New `getWorkflowCheckpoint(ctx)` helper (mirror of `getWorkflowPhaseState`).
+**Runner wiring** in `src/runtime/dynamic-workflow-runner.ts`:
+- On run start: `DwfStore.load()` → hydrate ctx (`resumedState`) + emit `dwf.resumed`.
+- `onCheckpoint` → `DwfStore.save()` (best-effort, errors swallowed).
+- On clean completion: `DwfStore.delete()` so a re-run starts fresh.
+### Resume semantics
+`team action='resume' runId='X'` re-dispatches with `runKind='dynamic-workflow'`. The
+runner loads the checkpoint, hydrates `ctx.vars`/phases/logs, and re-executes the
+script from the top. Scripts SHOULD be written defensively — check `ctx.vars.lastPhase`
+to skip completed work (documented in `docs/dynamic-workflows.md`). No partial-resume of
+a single agent call (it re-runs from scratch); checkpoints are written AFTER an agent
+completes, never before.
+### Tests (14 new)
+- `test/unit/dwf-state-store.test.ts` (10): save/load round-trip, missing→undefined,
+  delete, corrupt-file resilience, path layout, dir creation, large-state preservation.
+- `test/unit/dynamic-workflow-context.test.ts` (+8): onCheckpoint fires on success/fail,
+  agentCount accumulation, backward-compat (no callback), resumedState hydration,
+  shallow-copy isolation, getWorkflowCheckpoint snapshot.
+- `test/integration/dwf-setresult.test.ts` (+4): fresh run (no resumed event),
+  completed run (checkpoint deleted), resume (hydration + dwf.resumed + delete),
+  corrupt checkpoint treated as fresh run.
+### Docs
+- `docs/dynamic-workflows.md` — new "Resume & Checkpoint (round-18 P2-3)" section +
+  defensive-script example.
+- `types/dwf.d.ts` — resume pattern documented in header + `ctx.vars` JSDoc.
+- `package.json` — bumped 1.0.1 → 1.1.0 (minor — new opt-in capability).
+### Out of scope (future rounds)
+- P2-2 VM sandbox — still waiting for `isolated-vm` v1.5 (vm.createContext is not a
+  real security boundary). This was the LAST P2 item.
+### Process-safety follow-up fix (same release)
+A heuristic-based zombie "cleanup" had killed a live interactive main `pi`
+session by accident (uptime/RSS/orphan heuristics match a main session just
+as readily as a real orphaned sub-agent). Fixed authoritatively:
+- **`PI_CREW_KIND=subagent`** env marker, set by `buildPiWorkerArgs`
+  (`src/runtime/pi-args.ts`) on every child-pi spawn. A main session does NOT
+  carry it, so it can never be matched as a sub-agent. (An earlier draft also
+  added a `--crew-subagent` argv flag — removed because pi's strict option
+  parser rejects unknown flags and exits non-zero, which silently broke every
+  `ctx.agent()` call. The env var alone is the authoritative signal.)
+- **`src/runtime/zombie-scanner.ts`** (new): read-only scanner that matches
+  ONLY processes with `PI_CREW_KIND=subagent` AND a dead `PI_CREW_PARENT_PID`.
+  Never matches a main session. Never kills.
+- **`team action='doctor' focus='zombies'`**: renders the safe scan as a
+  human-readable report (zombies vs live sub-agents, with explicit do-not-kill
+  labelling for live parents).
+- **`PI_CREW_KIND`** added to the env allowlist in `child-pi.ts`.
+- **`docs/troubleshooting.md`** + **`.crew/knowledge.md`**: documented the
+  marker + the read-only rule so future agents never repeat the mistake.
+- 8 new unit tests in `test/unit/zombie-scanner.test.ts`, including a
+  regression test asserting the current (main-session) process is never matched.
+### Real-world smoke testing findings (2026-06-24)
+Three bugs were caught by real `team action='run'` smoke tests that the unit
+suite missed (units don't shell out to the real `pi` binary). **All three are
+now fixed.**
+- **Fixed: `--crew-subagent` argv flag broke every `ctx.agent()` call.** Pi's
+  strict option parser exits non-zero on unknown flags. The marker is now the
+  `PI_CREW_KIND=subagent` ENV var only.
+- **Fixed: `ctx.agent({schema, systemPrompt})` silently dropped `systemPrompt`.**
+  The round-13 schema branch used the resolved role persona as the base for the
+  JSON-output instruction, ignoring the caller's explicit persona. Models then
+  returned prose and failed schema validation. Fix: `call.systemPrompt` is now
+  preferred as the base when both are set.
+- **Fixed: `ctx.agent({disableTools: true, maxTurns: 1})` returned `exit null`.**
+  Root cause (found via Phase-0 diagnostic instrumentation) was NOT the
+  final-drain race originally hypothesised — it was an erroneous
+  `killProcessTree` call in the steer-injection path. When `maxTurns` was
+  reached on a `turn_end` event, the code wrote a "wrap up" steer to
+  `child.stdin`; Node's `writable.write()` returns `false` on normal
+  backpressure, which the code mis-treated as a fatal injection failure and
+  killed the worker mid-answer (answer was in stdout; exit came back `null`).
+  The `disableTools` correlation was a red herring — the real trigger was
+  `maxTurns:1` hitting on the first turn. Fix: steer injection is now
+  ADVISORY — a `write() === false` or non-writable stdin is logged, not fatal;
+  the hard-abort at `maxTurns + graceTurns` remains the safety net for genuine
+  runaways. Verified: maxTurns=1 × 10 real-binary runs now 10/10 exit=0 (was
+  ~60% fail pre-fix). Regression guard: `test/unit/child-pi-steer-backpressure.test.ts`
+  (source-contract checks + opt-in real-binary smoke via `PI_CREW_SMOKE=1`).
+## [v0.9.7] — round-17 (P2-4 worktree isolation per agent) (2026-06-23)
+P2-4 feature: `ctx.agent({worktree: true})` spawns the agent in an isolated
+git worktree so parallel file-modifying agents don't conflict. Fully backward
+compatible — `worktree` defaults to false, so existing calls are unchanged.
+### Implementation
+**New helpers** in `src/worktree/worktree-manager.ts`:
+- `prepareAgentWorktree(manifest, opts)` — creates a worktree from HEAD on a
+  unique branch; returns `{path, branch}` or `undefined` when the cwd is not a
+  git repo (graceful fallback).
+- `cleanupAgentWorktree(manifest, worktreePath, branch?)` — removes the
+  worktree dir + branch, and captures a git diff as a side artifact when the
+  worktree has changes (for audit/merge).
+**`ctx.agent()` integration** in `src/runtime/dynamic-workflow-context.ts`:
+- New `worktree?: boolean` field on `AgentCallOpts`.
+- When true, the agent runs with the worktree path as its cwd.
+- Cleanup always runs (success, failure, or agent error) — no worktree leaks.
+- Non-git cwd or creation failure → fall back to normal cwd + `ctx.log()`
+  warning; the agent still runs.
+### Why worktrees for DWF
+DWF scripts commonly fan out parallel agents that each modify files (e.g.
+fixing different modules). Without isolation they race on the working tree.
+`worktree: true` gives each agent its own checkout; the diff is captured as
+an artifact for later merge.
+### Tests (8 new, 60 total in dynamic-workflow-context.test.ts)
+- prepareAgentWorktree creates an isolated worktree
+- cleanupAgentWorktree removes dir + branch (no leak)
+- cleanupAgentWorktree captures a diff artifact when there are changes
+- prepareAgentWorktree returns undefined for non-git cwd (graceful fallback)
+- ctx.agent({worktree:true}) isolates + cleans up (mock)
+- ctx.agent({worktree:false}) uses normal cwd (backward compat)
+- ctx.agent({worktree:true}) falls back gracefully + warns in non-git cwd
+- ctx.agent({worktree:true}) cleans up even when the agent fails
+### Docs
+- `docs/dynamic-workflows.md` — worktree option in API table + example
+- `types/dwf.d.ts` — `worktree?: boolean` on AgentCallOpts
+- `package.json` — bumped 1.0.0 → 1.0.1 (patch, additive opt-in)
+### Out of scope (future rounds)
+- P2-2 VM sandbox — waiting for `isolated-vm` v1.5 (vm.createContext is not
+  a real security boundary)
+- P2-3 Resume/checkpoint — round-18 candidate (large effort)
+## [v0.9.7] — round-16 (P2-1 pipeline primitive) (2026-06-23)
+Adds **`ctx.pipeline(items, ...stages)`** — a multi-stage transform primitive for
+dynamic workflows. **Backward compatible** — existing DWF scripts are unaffected;
+`pipeline` is a new opt-in capability.
+### Feature — Pipeline primitive (P2-1)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + `types/dwf.d.ts` + `docs/dynamic-workflows.md`
+Previously the DWF context only offered `ctx.fanOut()` (a single parallel map). The
+new `ctx.pipeline()` chains stages: each item flows through **all stages in sequence**
+(stage 1 → stage 2 → …), while **different items run concurrently**, bounded by the
+workflow concurrency (`mapConcurrent`, the same primitive as `fanOut`).
+Semantics (mirrors `pi-dynamic-workflows`' `pipeline()`):
+- Each stage receives `(previous, original, index)` — `previous` is the prior stage's
+  output (the raw item for the first stage), `original` is the unchanged input item.
+- A failed stage yields `null` for that item, logs `pipeline[i] failed: <msg>` via
+  `ctx.log()`, and the other items continue.
+- On **abort**, the error propagates (it is not swallowed into `null`).
+- Returns `(TResult | null)[]`, order-preserving.
+Signature:
+```ts
+ctx.pipeline<TItem, TResult = unknown>(
+  items: TItem[],
+  ...stages: Array<(previous: TResult, original: TItem, index: number) => Promise<TResult> | TResult>
+): Promise<(TResult | null)[]>;
+```
+Implementation notes:
+- Uses `mapConcurrent(items, concurrency, …)` — NOT unbounded `Promise.all` — so
+  item-level parallelism respects the workflow's configured concurrency. Stages that
+  spawn agents additionally acquire `ctx.semaphore` for agent-level throttling.
+- Validates inputs: non-array first arg → `TypeError`; non-function stage (or no
+  stages) → `TypeError`.
+- Empty items array short-circuits to `[]`.
+- Authoring types (`types/dwf.d.ts`) mirror the runtime signature for IDE IntelliSense.
+Example use case: scan → analyze → review each shard, up to `concurrency` shards at a
+time, with per-shard failure isolation.
+Tests: `test/unit/dynamic-workflow-context.test.ts` — single/multi-stage transforms,
+empty array, failed-stage isolation + logging, TypeError on bad inputs, stage-argument
+contract, async stages, and concurrency-bounded execution.
+## [v0.9.7] — round-15 (P1-4 phase UI) (2026-06-23)
+The progress pane now **renders DWF phase markers** (▶/✓/⏸) by consuming the
+`dwf.phase_started` / `dwf.phase_completed` events emitted by `ctx.phase()`
+(round-12). **Backward compatible** — non-DWF runs are unaffected.
+### Feature — Phase UI in progress-pane (P1-4)
+**Files:** `src/ui/dwf-phase-display.ts` (new) + `progress-pane.ts` +
+`run-snapshot-cache.ts` + `snapshot-types.ts`
+Previously the phase events were produced but the UI did not consume them.
+Now the progress pane shows a phase overview:
+- `▶ Phase: <name>` — currently running phase.
+- `✓ Phase: <name>` — completed phase.
+- `⏸ Phase: <name>` — a phase whose completion scrolled out of the recent-event
+  window and is not the current one.
+Implementation details:
+- New pure-function module `src/ui/dwf-phase-display.ts`:
+  `extractDwfPhaseState(events)` derives phase state from the event window
+  (returns `null` for non-DWF runs); `renderDwfPhaseLines(state, { ascii })`
+  renders markers with Unicode glyphs and ASCII fallbacks (`[>]`/`[v]`/`[ ]`).
+- `RunUiSnapshot` gains an optional `dwfPhaseState` field, computed from the
+  existing tailed `recentEvents` window (no extra I/O) in both sync and async
+  snapshot builders.
+- `progress-pane.ts` renders the phase lines right after the summary line,
+  before the task-based phase grouping. Non-DWF runs produce zero lines.
+- `signatureFor` includes `dwfPhaseState` so cache invalidation reflects phase
+  changes.
+- Tests: `test/unit/dwf-phase-display.test.ts` — phase state tracking from an
+  event sequence (incl. scrolled-off recovery), correct markers (Unicode +
+  ASCII), header gating, and non-DWF snapshots unaffected.
+## [v0.9.7] — round-14 (P1 DX + observability) (2026-06-23)
+Four additive P1 features land in this round — **authoring types**, **per-workflow
+token budget**, **log API**, and **typed args**. **Backward compatible** — existing
+DWF scripts continue to work unchanged. New behavior is opt-in.
+### Feature 1 — Authoring Types / IDE IntelliSense (P1-1)
+**Files:** `types/dwf.d.ts` (new) + `package.json` (`./workflow` export)
+A `.dwf.ts` script can now import the `WorkflowCtx` (and supporting) types from
+the package's `./workflow` export for full TypeScript IntelliSense:
+```ts
+import type { WorkflowCtx } from "pi-crew/workflow";
+export default async function run(ctx: WorkflowCtx): Promise<void> { /* ... */ }
+```
+- New file: `types/dwf.d.ts` — named exports mirroring the runtime types
+  (`WorkflowCtx`, `AgentCallOpts`, `AgentResult`, `WorkflowBudget`, ...).
+- `package.json` gains `"./workflow": { "types": "./types/dwf.d.ts" }` and ships
+  the `types/` directory.
+- New test: `test/unit/dwf-authoring-types.test.ts` — compiles a sample `.dwf.ts`
+  against the export (positive + negative `@ts-expect-error` check).
+### Feature 2 — Per-Workflow Token Budget (P1-2)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + dispatch wiring
+`ctx.budget` is a frozen `{total, spent(), remaining()}` surface. When a
+per-workflow token budget is set, `ctx.agent()` auto-rejects with `ok:false`
+(`"workflow token budget exhausted"`) **before** spawning a child worker. `spent()`
+accumulates each run's reported `usage.input + usage.output`.
+- `total` is `null` (unbounded) by default; `remaining()` is `Infinity` then.
+- New `MakeWorkflowCtxOptions.tokenBudget`, `WorkflowConfig.maxTokenBudget`,
+  `RunDynamicWorkflowInput.tokenBudget`, and the `team run` `tokenBudget` param
+  (param overrides the workflow value).
+- Budget check + accumulation wired into `ctx.agent()`; budget object passed
+  through `run.ts` and `background-runner.ts`.
+- Tests: 7 unit cases (default null, set total, spent/remaining, exhaustion,
+  accumulation from the mock's `{input:10,output:5}`, frozen check).
+### Feature 3 — Log API (P1-3)
+**Files:** `src/state/contracts.ts` + `src/runtime/dynamic-workflow-context.ts`
+`ctx.log(message)` appends a workflow-level log line: stringifies non-strings,
+keeps a bounded in-memory copy (capped at **1000**), and always emits a durable
+`dwf.log` event (`{message}`) to the run's `events.jsonl`.
+- New event type `"dwf.log"` in `TEAM_EVENT_TYPES` (non-terminal).
+- New runner-only `getWorkflowLogs(ctx)` accessor (mirrors `getWorkflowPhaseState`).
+- Tests: 4 unit cases (append, stringify, event emission, 1000-cap) + 1
+  integration case (end-to-end through `runDynamicWorkflow`).
+### Feature 4 — Typed Args (P1-5)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + `src/state/types.ts` +
+`src/schema/team-tool-schema.ts` + `src/state/state-store.ts` + dispatch wiring
+`ctx.args<T>()` returns typed workflow arguments (sourced from `manifest.args`,
+passed via the run `args` param). Defaults to `{}` when unset.
+- New `TeamRunManifest.args`, `MakeWorkflowCtxOptions.args`, `createRunManifest`
+  `args` param, and the `team run` `args` schema field (`Type.Unsafe`, any JSON
+  value — avoids `any`).
+- The runner reads `manifest.args` and forwards it to `makeWorkflowCtx`.
+- Tests: 3 unit cases (default `{}`, typed object, array) + 1 integration case
+  (reads `manifest.args` end-to-end).
+### Other
+- Version bumped `0.9.7` → `0.9.8`.
+- `docs/dynamic-workflows.md`: API table rows + Log/Budget/Args/Authoring-types
+  sections.
+## [v0.9.7] — round-13 (P0 AST determinism + structured output + abort cleanup) (2026-06-23)
+Three P0 features land in this round. **Backward compatible** — existing DWF
+scripts continue to work unchanged. New behavior is opt-in via the `schema`
+field on `ctx.agent()` and an env-var escape hatch for the determinism check.
+### Feature 1 — AST Determinism Check (P0-2)
+**File:** `src/runtime/deterministic-ast.ts` (new) +
+`src/runtime/dynamic-workflow-runner.ts` (integration)
+Dynamic workflow scripts must now be **deterministic**. The runner parses each
+`.dwf.ts` with `acorn` and walks the AST, rejecting `Date.now()`,
+`Math.random()`, and `new Date()` calls before `jiti` executes the script.
+Two runs of the same script against the same inputs now produce the same
+outputs — critical for regression testing and workflow replay.
+Why AST, not regex: regex matches `Date.now()` everywhere — including string
+literals, comments, and prompt text. AST walking distinguishes **calls** from
+strings, so prompts that say *"avoid `Date.now()` in your code"* still parse
+cleanly. Other `Date.*` and `Math.*` methods (`Date.parse`, `Date.UTC`,
+`Math.floor`, `Math.max`, etc.) are accepted — only `now` and `random` are
+blocked.
+- New dep: `acorn ^8.14.0` (small, well-maintained; verified Node ≥22 ESM/strip-types compatibility)
+- New file: `src/runtime/deterministic-ast.ts` (determinism walker; MIT-licensed adaptation from pi-dynamic-workflows, attribution in `NOTICE.md`)
+- New file: `test/unit/deterministic-ast.test.ts` (27 cases: accepts/rejects every form, comments, template literals, computed properties, parse-error delegation)
+- New tests in `test/integration/dwf-setresult.test.ts` (5 end-to-end cases including env-var opt-out)
+**Escape hatch:** `PI_CREW_DWF_SKIP_DETERMINISM_CHECK=1` bypasses the check for
+power users who legitimately need time/random (e.g. randomized benchmark
+scripts). Off by default.
+### Feature 2 — Structured Output Helper (P0-3)
+**Files:** `src/runtime/result-extractor.ts` + `src/runtime/dynamic-workflow-context.ts`
+`AgentCallOpts` gains an optional `schema?: TSchema` field (TypeBox). When set,
+`ctx.agent()` validates the extracted JSON against the schema via
+`@sinclair/typebox`'s `Value.Check`. Mismatch yields
+`{ok: false, error: "structured output does not match schema: ..."}` instead
+of an untyped `structured: { ... }` blob.
+How the runner helps the model comply:
+- Appends a JSON-output directive to the prompt.
+- Replaces the agent's system prompt suffix with a "structured-output
+  assistant" preamble that describes the schema's shape.
+When `schema` is **omitted**, behavior is byte-identical to the previous
+regex-based extractor — verified by the existing 30+ test cases plus 9 new
+schema-specific cases in `test/unit/result-extractor.test.ts` and 4 new
+end-to-end cases in `test/unit/dynamic-workflow-context.test.ts`.
+Caveat: pi-crew DWF spawns `pi` as a subprocess (`runChildPi`), not an
+in-memory `createAgentSession`. Subprocess structured output is captured via
+the same event-stream → JSON-line → schema-check pipeline used for everything
+else, so this round ships Option B (regex-extract + schema validation
+post-hoc). Option A (in-process terminating tool) is planned for round-14.
+### Feature 3 — Abort Listener Cleanup (P0-5) — NO-OP (already fixed in round 27)
+Audited `src/runtime/child-pi.ts` for AbortSignal listener leaks. The fix was
+landed in round 27 (BUG 4): both the `onParentAbort` flag handler and the
+`abort` cancellation handler are now removed inside `settle()` regardless of
+the exit path (normal completion, response timeout, hard kill, parent abort,
+forced final drain). On runs with >10 tasks sharing one AbortSignal (the
+common pattern under `background-runner`), this prevents the
+`MaxListenersExceededWarning` and per-task closure capture that previously
+pinned the worker stack frame in memory.
+No code changes needed in round 13. Documented for the audit trail.
+### Verification
+- `npm run typecheck` — clean
+- `npx tsc --noEmit` — clean
+- 31 new unit tests across `deterministic-ast.test.ts` (27) and 9 schema-validation cases in `result-extractor.test.ts` and 4 new cases in `dynamic-workflow-context.test.ts` — all pass
+- 5 new integration tests in `dwf-setresult.test.ts` — all pass
+- 0 regressions in the existing 4 round-12 tests
+## [v0.9.7] — round-12: DWF phases + structured-clone guard (2026-06-23)
+Two additive P0 features for dynamic-workflow (DWF) scripts, both fully
+backward-compatible (existing scripts continue to work unchanged). Researched
+and adopted from the public `pi-dynamic-workflows` (Michaelliv/v1.0.1)
+package — full comparison and adoption plan in
+`.crew/artifacts/team_20260623095016_b693d3f967f88048/shared/06_synthesize.md`.
+### Feature 1: `ctx.phase(title)` runtime phase API (P0-1)
+`WorkflowCtx` gains a new `phase(title: string): void` method. The orphan
+`dwf.phase_started` / `dwf.phase_completed` event types — declared in
+`src/state/contracts.ts:89-93` since v0.9.0 but never produced by any
+producer — finally have a producer. Use cases:
+- Group `ctx.agent()` calls under logical phases (e.g. "Scan", "Audit",
+  "Review") so downstream UI and log readers can group by phase.
+- Emit a clear phase boundary to the run's `events.jsonl` without writing
+  custom event-log code.
+- Drive live progress reporting from the script itself.
+Semantics:
+- Validates `title` is a non-empty string (throws `TypeError` otherwise).
+- Idempotent: calling `ctx.phase("Scan")` twice does not emit a duplicate
+  event or change state.
+- When a previous phase is still open, emits `dwf.phase_completed` for it
+  **before** emitting `dwf.phase_started` for the new one (consumers never
+  see two open phases at once).
+- The in-memory `phases[]` list (read-only via `getWorkflowPhaseState`,
+  mirrors the `__finalResult` non-enumerable getter pattern) is deduped and
+  capped at **100 distinct titles** to bound memory. Events still flow
+  past the cap — the events log is the durable source of truth.
+- The runner **auto-closes the last open phase** before emitting
+  `dwf.completed`, so a script that ends mid-phase still produces a
+  well-formed event sequence.
+**Files changed:**
+- `src/runtime/dynamic-workflow-context.ts` — interface, implementation,
+  `__phaseState` getter, `getWorkflowPhaseState` helper
+- `src/runtime/dynamic-workflow-runner.ts` — auto-close on completion
+### Feature 2: structured-clone guard at the runner boundary (P0-4)
+Defensive `assertStructuredCloneable(value, name)` helper applied to the
+final artifact content and `manifest.summary` before they reach
+`writeArtifact` and the run-event-bus emitter. Today this is mostly
+future-proofing (the artifact file is read as a string, and strings are
+always structured-cloneable), but the guard surfaces a clear, actionable
+error pointing at the most common cause — forgetting `await` on
+`ctx.agent()` / `ctx.review()` — instead of letting a cryptic
+`DataCloneError` leak from deep inside the artifact store.
+**Files changed:**
+- `src/runtime/dynamic-workflow-runner.ts` — `assertStructuredCloneable`
+  helper, applied to `finalText` and `summaryText` (slice)
+### Tests
+- 7 new unit tests in `test/unit/dynamic-workflow-context.test.ts`
+  (emission, idempotency, validation, sequence, helper, dedup, 100-cap).
+- 1 new integration test in `test/integration/dwf-setresult.test.ts`
+  (end-to-end phase event sequence, including runner auto-close).
+- All 23 existing DWF unit tests still pass; both pre-existing integration
+  tests still pass.
+### Docs
+- `docs/dynamic-workflows.md` — updated WorkflowCtx example to use
+  `ctx.phase("Scan")` / `ctx.phase("Audit")`; added a `ctx.phase` row to
+  the API table; added a "Phases (round-12)" subsection explaining
+  semantics, idempotency, and the 100-cap.
+### Out of scope (planned for future rounds)
+- AST determinism check (P0-2)
+- Structured output helper (P0-3)
+- Abort listener cleanup pattern (P0-5)
+- Authoring types / IDE IntelliSense (P1-1)
+- Token budget (P1-2)
+- Phase UI in `progress-pane` (P1-4)
+- Pipeline primitive (P2-1)
+- `isolated-vm` sandbox (P2-2, planned for v1.5)
 ## [v0.9.5] — fix "team run hangs forever at 25%" (2026-06-23)
 Two coupled runtime bugs caused the recurring "run stuck at 25% (1/4)" failure

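The round-16 changelog entry above describes `ctx.pipeline()` in prose. The following is a minimal sketch of those semantics, not the package's implementation: `pipelineSketch`, `mapConcurrentSketch`, and the explicit `concurrency` and `log` parameters are hypothetical stand-ins for `ctx.pipeline`, `mapConcurrent`, the workflow concurrency, and `ctx.log()`. Abort propagation and the agent-level semaphore are deliberately left out.

```typescript
type Stage<TItem, TResult> = (
  previous: TResult,
  original: TItem,
  index: number,
) => Promise<TResult> | TResult;

// Hypothetical stand-in for the package's `mapConcurrent`: at most `limit`
// calls in flight, results returned in input order.
async function mapConcurrentSketch<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

async function pipelineSketch<TItem, TResult = unknown>(
  items: TItem[],
  concurrency: number,
  log: (message: string) => void,
  ...stages: Stage<TItem, TResult>[]
): Promise<(TResult | null)[]> {
  if (!Array.isArray(items)) throw new TypeError("pipeline: items must be an array");
  if (stages.length === 0 || stages.some((s) => typeof s !== "function")) {
    throw new TypeError("pipeline: at least one stage function is required");
  }
  if (items.length === 0) return []; // empty input short-circuits

  return mapConcurrentSketch(items, concurrency, async (item, index) => {
    // The first stage receives the raw item as `previous`.
    let previous = item as unknown as TResult;
    try {
      for (const stage of stages) {
        previous = await stage(previous, item, index);
      }
      return previous;
    } catch (err) {
      // A failed stage isolates to this item: log it and yield null.
      const msg = err instanceof Error ? err.message : String(err);
      log(`pipeline[${index}] failed: ${msg}`);
      return null;
    }
  });
}

// Two stages over three items, at most two items in flight.
// The second stage fails for the item whose original value is 2.
const logs: string[] = [];
const demo = pipelineSketch<number, number>(
  [1, 2, 3],
  2,
  (m) => { logs.push(m); },
  (prev) => prev * 2,
  (prev, original) => {
    if (original === 2) throw new Error("boom");
    return prev + 1;
  },
);
demo.then((out) => console.log(out, logs));
// prints [ 3, null, 7 ] [ 'pipeline[1] failed: boom' ]
```

The worker-pool shape is one way to bound item-level parallelism without an unbounded `Promise.all`; the real `mapConcurrent` may be structured differently.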
package/README.md

@@ -197,7 +197,7 @@ background-dispatch discriminator.
 - **Plugin system** — framework-aware context injection (Next.js, Vite, Vitest) via plugin registry
 - **Health scoring** — penalty-based run health with time-series snapshots
 - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
-- **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
+- **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `ctx.phase()` marks logical phases; **round-14** adds `ctx.log()` (durable `dwf.log` events), `ctx.budget` (per-workflow token budget that auto-rejects `ctx.agent()` when exhausted), and `ctx.args<T>()` (typed workflow arguments). TypeScript IntelliSense is available via `import type { WorkflowCtx } from "pi-crew/workflow"`. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
 ---

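The README line above mentions that `ctx.budget` auto-rejects `ctx.agent()` once the per-workflow token budget is exhausted. The sketch below illustrates that behavior under stated assumptions: `makeBudgetSketch` and `guardedAgentCall` are hypothetical names, the agent call is synchronous here for brevity (the real `ctx.agent()` is asynchronous), and clamping `remaining()` at zero is an assumption the changelog does not confirm.

```typescript
interface Usage {
  input: number;
  output: number;
}

interface BudgetSketch {
  readonly total: number | null; // null means unbounded
  spent(): number;
  remaining(): number;
}

function makeBudgetSketch(total: number | null = null) {
  let spent = 0;
  const budget: BudgetSketch = Object.freeze({
    total,
    spent: () => spent,
    // Assumption: remaining() is clamped at 0 rather than going negative.
    remaining: () => (total === null ? Infinity : Math.max(0, total - spent)),
  });
  return {
    budget,
    // Accumulates input + output tokens, as the changelog describes.
    record(usage: Usage): void {
      spent += usage.input + usage.output;
    },
  };
}

type AgentOutcome =
  | { ok: true; usage: Usage }
  | { ok: false; error: string };

// Hypothetical guard: check the budget BEFORE doing any work, then record usage.
function guardedAgentCall(
  tracker: ReturnType<typeof makeBudgetSketch>,
  run: () => Usage,
): AgentOutcome {
  if (tracker.budget.remaining() <= 0) {
    return { ok: false, error: "workflow token budget exhausted" };
  }
  const usage = run();
  tracker.record(usage);
  return { ok: true, usage };
}

// A budget of 20 tokens with a mock worker reporting 15 tokens per call.
const tracker = makeBudgetSketch(20);
const mockRun = (): Usage => ({ input: 10, output: 5 });
const first = guardedAgentCall(tracker, mockRun);  // ok, spent = 15
const second = guardedAgentCall(tracker, mockRun); // ok, spent = 30 (5 tokens were left)
const third = guardedAgentCall(tracker, mockRun);  // rejected before the worker runs
console.log(first.ok, second.ok, third.ok, tracker.budget.spent());
// prints true true false 30
```

Because the check happens before the call, a single call can overshoot the total; only the next call is rejected. Whether the package behaves the same way at the boundary is not stated in the source.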
package/docs/HARNESS_BACKLOG.md

@@ -14,7 +14,13 @@ Use when an agent discovers a missing harness capability but should not change t
 **Risk**: normal
-**Status**: proposed
+**Status**: ✅ PARTIALLY DONE (2026-06-24). The bulk of HB-001 was already
+covered by 21 existing `test/integration/` files (team-runner path via
+`mock-child-run`, `full-feature-smoke`, `phase3-6-*`). The genuine remaining
+gap — interleaved manifest+task+event writes reloaded consistently (the
+realistic run-load pattern) — is now covered by
+`test/integration/state-durability-hb001.test.ts`. Child-process exit →
+state-store reconcile is covered by `async-restart-recovery.test.ts`.
 ### HB-002: Windows-specific test coverage
@@ -26,7 +32,12 @@ Use when an agent discovers a missing harness capability but should not change t
 **Risk**: normal
-**Status**: proposed
+**Status**: ✅ DONE (2026-06-24). `test/platform/` ships with two files:
+`windows-rename.test.ts` (EBUSY/EPERM rename retry path via `renameWithRetry`,
+self-skips off win32) and `posix-tools.test.ts` (BSD-vs-GNU grep, /var →
+/private/var realpath, POSIX-shell resolution — self-skips on win32).
+Runbook in `test/platform/README.md`. The CI OS matrix (ubuntu/windows/macos)
+exercises each platform's tests.
 ### HB-003: Performance regression baseline
@@ -38,4 +49,41 @@ Use when an agent discovers a missing harness capability but should not change t
 **Risk**: tiny
-**Status**: proposed
+**Status**: ✅ DONE (2026-06-24). `test/bench/` now has 6 benchmarks:
+the pre-existing `register-startup`, `render-flush`, `snapshot-cache`, plus
+three new ones covering the gaps HB-003 flagged — `atomic-write.bench.ts`
+(`atomicWriteJson` cold/warm), `event-append.bench.ts` (serial lock
+contention vs batch), `task-graph-scheduler.bench.ts` (DAG build/refresh/
+full-run). All run via `npm run bench` → `test/bench/results.json`; baseline
+via `npm run bench:capture`. Each prints min/p50/p95/p99/max percentiles.
+### HB-004: Real-binary smoke tests for ctx.agent() paths
+**Discovered while**: Real-world `team action='run'` smoke testing on 2026-06-24
+caught three bugs that the unit suite (which mocks child-pi) missed entirely.
+**Current pain**: The unit tests for `dynamic-workflow-context.ts` and
+`child-pi.ts` use `PI_TEAMS_MOCK_CHILD_PI` and never shell out to the real `pi`
+binary. As a result they cannot catch:
+  - argv flags the real `pi` rejects (e.g. the `--crew-subagent` regression),
+  - env/persona interactions that change real model output (e.g. the
+    schema+systemPrompt drop),
+  - exit-code races in the real spawn lifecycle (e.g. the
+    `disableTools:true` → `exit null` race).
+**Suggested improvement**: Add `test/smoke/` (gated behind a `PI_CREW_SMOKE=1`
+env so CI doesn't bill tokens by default) that runs real `.dwf.ts` workflows
+end-to-end via `team action='run'` and asserts on the resulting
+`events.jsonl` + `summary.md`. One workflow per feature family
+(phase/log/pipeline/agent/schema/worktree). Document the runbook in
+`docs/troubleshooting.md`.
+**Risk**: normal (token cost when run; otherwise read-only)
+**Status**: ✅ DONE (2026-06-24). `test/smoke/` shipped with 5 smoke tests
+(argv-flags, agent-plain, agent-schema, agent-disabletools, dwf-workflow),
+all gated behind `PI_CREW_SMOKE=1`. `npm run test:smoke` runs them. CI
+manual-dispatch workflow at `.github/workflows/smoke.yml` (requires
+`PI_AUTH_JSON` secret). Runbook in `docs/troubleshooting.md`. Each smoke test
+maps to a real bug it would have caught (HB-003a, the schema+systemPrompt
+drop, the `--crew-subagent` argv regression).
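
HB-004 above gates the real-binary smoke tests behind `PI_CREW_SMOKE=1` so that ordinary CI runs do not spend tokens. A rough sketch of such a gate follows. The test names come from the status note; `smokeEnabled`, `planSmokeRun`, and the exact-match rule (only the literal value `"1"` enables the suite) are assumptions for illustration, not the package's code.

```typescript
type Env = Record<string, string | undefined>;

// Names taken from the HB-004 status note; the planner itself is hypothetical.
const SMOKE_TESTS = [
  "argv-flags",
  "agent-plain",
  "agent-schema",
  "agent-disabletools",
  "dwf-workflow",
] as const;

type SmokeName = (typeof SMOKE_TESTS)[number];

interface SmokePlan {
  run: SmokeName[];
  skipped: SmokeName[];
}

// Assumption: only the exact value "1" opts in; anything else keeps the suite off.
function smokeEnabled(env: Env): boolean {
  return env["PI_CREW_SMOKE"] === "1";
}

// Decide up front which tests run, so a default CI job never shells out to `pi`.
function planSmokeRun(env: Env): SmokePlan {
  return smokeEnabled(env)
    ? { run: [...SMOKE_TESTS], skipped: [] }
    : { run: [], skipped: [...SMOKE_TESTS] };
}

console.log(planSmokeRun({}).run.length);                       // prints 0
console.log(planSmokeRun({ PI_CREW_SMOKE: "1" }).run.length);   // prints 5
```

Taking the environment as a parameter, rather than reading it globally, keeps the gate itself testable without setting real environment variables.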