npm - pi-crew - Versions diffs - 0.9.5 → 0.9.8 - Mend

pi-crew 0.9.5 → 0.9.8


Files changed (37)

package/CHANGELOG.md +556 -0
package/README.md +10 -3
package/docs/HARNESS_BACKLOG.md +51 -3
package/docs/dynamic-workflows.md +315 -2
package/docs/fix-plan-disabletools-exit-null.md +219 -0
package/docs/troubleshooting.md +76 -0
package/package.json +10 -3
package/src/config/defaults.ts +8 -4
package/src/extension/team-tool/doctor.ts +14 -0
package/src/extension/team-tool/run.ts +2 -0
package/src/runtime/background-runner.ts +1 -1
package/src/runtime/capability-inventory.ts +20 -1
package/src/runtime/child-pi.ts +109 -11
package/src/runtime/deterministic-ast.ts +161 -0
package/src/runtime/dwf-state-store.ts +97 -0
package/src/runtime/dynamic-workflow-context.ts +381 -7
package/src/runtime/dynamic-workflow-runner.ts +93 -2
package/src/runtime/pi-args.ts +11 -0
package/src/runtime/result-extractor.ts +72 -7
package/src/runtime/task-output-context.ts +25 -9
package/src/runtime/team-runner.ts +8 -3
package/src/runtime/zombie-scanner.ts +297 -0
package/src/schema/team-tool-schema.ts +28 -0
package/src/skills/discover-skills.ts +61 -8
package/src/skills/validate.ts +267 -0
package/src/state/contracts.ts +1 -0
package/src/state/state-store.ts +3 -0
package/src/state/types.ts +9 -0
package/src/ui/dashboard-panes/progress-pane.ts +5 -0
package/src/ui/dwf-phase-display.ts +151 -0
package/src/ui/keybinding-map.ts +128 -41
package/src/ui/run-event-bus.ts +83 -0
package/src/ui/run-snapshot-cache.ts +4 -0
package/src/ui/snapshot-types.ts +3 -0
package/src/workflows/workflow-config.ts +3 -0
package/src/worktree/worktree-manager.ts +94 -0
package/types/dwf.d.ts +187 -0

package/CHANGELOG.md CHANGED

@@ -1,5 +1,561 @@
 # Changelog
+## [v0.9.8] — deer-flow learning integration: L1/L2/L3/L4 (2026-06-24)
+Four improvements distilled from researching [bytedance/deer-flow](https://github.com/bytedance/deer-flow) and the wider Pi-ecosystem (pi-boomerang, pi-subagents, pi-dynamic-workflows). Each was calibrated against real pi-crew code (the research over-reported gaps — several patterns pi-crew already does *better* than deer-flow) and sized from measured data, not guesses.
+### L3 — Strict SKILL.md frontmatter validation (commit 5348c47)
+Malformed skills now **fail-fast at discovery** instead of silently producing broken behavior at runtime. New `src/skills/validate.ts` validates frontmatter against the `ALLOWED_SKILL_PROPS` whitelist using a **HYBRID policy**:
+- **HARD errors** (missing/malformed `name` or `description`, type mismatches) → skill excluded from `discoverSkills()`.
+- **SOFT warnings** (unknown props like `origin`/`triggers`, missing `name` derived from directory) → skill kept, surfaced via `getLastDiscoveryDiagnostics()` / `buildSkillValidationDiagnostics()`.
+Replaces the fragile line-prefix parser (broke on multi-line folded scalars `description: >`, quoted strings, nested YAML) with the `yaml` package (^2.9.0, already transitive, added as direct dep — zero install cost). Back-compat preserved: missing `name` derives from the directory; no-frontmatter skills still load with empty description.
+**Bonus value**: pre-flight on the real environment surfaced 2 user skills that were silently broken (`agent-browser`: `allowed-tools` wrong type; `spike-wrap-up`: `<>` in description).
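The hybrid policy described above can be illustrated with a minimal sketch. It is not the code in `src/skills/validate.ts`: the whitelist contents, the `validateFrontmatter` name, and the result shape are assumptions, and the input is taken to be frontmatter that has already been parsed into an object.

```typescript
// Hypothetical whitelist; the real ALLOWED_SKILL_PROPS contents are not shown in this diff.
const ALLOWED_PROPS = new Set<string>(["name", "description", "allowed-tools"]);

interface ValidationResult {
  ok: boolean;        // false => the skill would be excluded from discovery
  name: string;       // explicit name, or the directory name as a fallback
  errors: string[];   // HARD: exclude the skill
  warnings: string[]; // SOFT: keep the skill, surface a diagnostic
}

function validateFrontmatter(fm: Record<string, unknown>, dirName: string): ValidationResult {
  const errors: string[] = [];
  const warnings: string[] = [];
  let name = dirName;

  const rawName = fm["name"];
  if (rawName === undefined) {
    // Assumed reading of the back-compat rule: an absent name is derived, not fatal.
    warnings.push(`name missing; derived "${dirName}" from directory`);
  } else if (typeof rawName !== "string" || rawName.trim() === "") {
    errors.push("name must be a non-empty string");
  } else {
    name = rawName;
  }

  const rawDescription = fm["description"];
  if (rawDescription !== undefined && typeof rawDescription !== "string") {
    errors.push("description must be a string");
  }

  // Unknown properties are tolerated but reported.
  for (const key of Object.keys(fm)) {
    if (!ALLOWED_PROPS.has(key)) warnings.push(`unknown property "${key}"`);
  }

  return { ok: errors.length === 0, name, errors, warnings };
}
```

A caller would drop results with `ok === false` and collect `warnings` from the rest for a diagnostics report.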
+### L2 — Data-driven keybinding dispatch (commit 35fc3c6)
+Replaced the 30-line imperative `if (includes(...))` chain in `dashboardActionForKey` with a single `for (const b of BINDINGS)` loop driven by a declarative `BINDINGS[]` table. Adding a key now means editing ONE place (the table) instead of two (table + dispatch) — removes the DRY violation that caused table-vs-dispatch drift. `KEY_RESERVED` is now exported and derived.
+Behavior is **provably identical** to the old chain: a golden-snapshot parity test asserts every `(data, activePane)` pair returns the same action (~190 pairs). Pane-scoped bindings (`mailbox-detail`, `health-*`) precede their generic competitors so first-match-wins reproduces old precedence.
+The `inTextInput` guard from the original plan was **intentionally skipped** — overlays are mutually exclusive and each handles its own input (`mailbox-compose-overlay.ts` captures every single-char key), so there is no leak path. Documented in the commit.
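As an illustration of the table-driven dispatch described in this entry, here is a minimal sketch. The bindings, action names, and the `actionForKey` helper are invented for the example and are not the contents of `src/ui/keybinding-map.ts`; only the first-match-wins loop and the derived reserved-key set follow the changelog text.

```typescript
type Action = "close" | "reply" | "refresh" | "next-pane";

interface Binding {
  keys: string[];
  action: Action;
  pane?: string; // when set, the binding applies only while that pane is active
}

// Pane-scoped entries precede their generic competitors so that
// first-match-wins gives them priority.
const BINDINGS: Binding[] = [
  { keys: ["r"], action: "reply", pane: "mailbox-detail" },
  { keys: ["r"], action: "refresh" },
  { keys: ["q", "\u001b"], action: "close" },
  { keys: ["\t"], action: "next-pane" },
];

function actionForKey(data: string, activePane: string): Action | undefined {
  for (const b of BINDINGS) {
    if (b.pane !== undefined && b.pane !== activePane) continue;
    if (b.keys.includes(data)) return b.action;
  }
  return undefined;
}

// Derived from the table instead of hand-maintained, so it cannot drift.
const KEY_RESERVED = new Set<string>();
for (const b of BINDINGS) {
  for (const k of b.keys) KEY_RESERVED.add(k);
}
```

Adding a key in this shape means adding one row to `BINDINGS`; both the dispatch and the reserved set pick it up.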
+### L1 — RunEventBus.onWithReplay catch-up primitive (commit a2a478b)
+Closes the transient-subscriber-absence gap: when an overlay/widget is disposed and recreated (toggle, reconnect), live events emitted in that window are lost as notification triggers. `onWithReplay(runId, eventsPath, lastSeenSeq, callback)` replays missed events from the durable JSONL log before attaching the live listener, then dedups via `metadata.seq` so each event fires exactly once.
+Unlike deer-flow's 256-event RAM ring buffer (lost on crash), this reuses pi-crew's existing `readEventsCursor` — O(new bytes) via byte-offset incremental reads, monotonic seq, tail-capped. Strictly better: survives crashes, bounded memory. `RunEventPayload` gains optional `seq`; `emitFromTeamEvent` stamps it.
+The **primitive is landed + fully tested** (7 cases: replay order, dedup race, transient live-only, cursor bound, sinceSeq filter, missing-log fallback, unsubscribe). Dashboard wiring (switching `onAny()` → `onWithReplay()` per-run) is deferred — the dashboard subscribes across multiple runs and needs a subscription-model refactor; state isn't lost during absence anyway (`run-snapshot-cache` rebuilds from disk, TTL 1500ms).
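The replay-then-attach idea can be sketched in memory as follows. This is a simplified illustration, not the `RunEventBus` implementation: `MiniBus` is hypothetical, `durableLog` stands in for events read back from the JSONL file, and the log is assumed to be ordered by a monotonic `seq`.

```typescript
interface RunEvent { seq: number; type: string }
type Listener = (e: RunEvent) => void;

// Hypothetical stand-in for the real event bus.
class MiniBus {
  private listeners = new Set<Listener>();
  on(cb: Listener): () => void {
    this.listeners.add(cb);
    return () => { this.listeners.delete(cb); };
  }
  emit(e: RunEvent): void {
    Array.from(this.listeners).forEach((cb) => cb(e));
  }
}

function onWithReplay(
  bus: MiniBus,
  durableLog: RunEvent[],
  lastSeenSeq: number,
  cb: Listener,
): () => void {
  let highest = lastSeenSeq;
  const deliver = (e: RunEvent): void => {
    if (e.seq <= highest) return; // already delivered, either replayed or live
    highest = e.seq;
    cb(e);
  };
  for (const e of durableLog) deliver(e); // 1. catch up from the durable log
  return bus.on(deliver);                 // 2. then attach the live listener
}
```

Because both the replay and the live path go through the same `deliver` function, an event that appears in the log and is also emitted live fires the callback once.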
+### L4 — Data-driven output thresholds + head/tail compaction (commit 463d08d)
+Worker output was being truncated at 3 points with thresholds sized by guess, not data. Measured 27 real result artifacts: **max 9226 bytes, median 8272, 100% under 16KB**. The old thresholds cut **62% of real outputs** (head-only, no recovery path). This change sizes thresholds from that data and switches compaction from head-only to head+tail so closing markdown structure (code fences, headings) survives.
+| Threshold | Before | After |
+|---|---|---|
+| `maxAssistantTextChars` | 8192 | **16384** |
+| `maxToolResultChars` | 1024 | **8192** |
+| `maxCompactContentChars` | 4096 | **8192** |
+| `maxToolInputChars` | 2048 | **4096** |
+| `readIfSmall` (3 inconsistent values) | 24K/40K/80K | **single 32KB** |
+| Compaction shape | head-only | **head(75%)+tail(25%)** |
+**Why not caveman-shrink** (the alternative considered): tested it on the same 27 artifacts — only 3.9% compression (vs 42% on prose fixtures) because pi-crew output is code-citation-heavy with little prose to strip, AND it has a real data-loss bug (`funccall` protected-pattern eats sentinel placeholders for the `identifier (inline-code)` pattern, corrupting 24/27 files with null bytes, 127 inline codes lost). caveman's *concept* (detect/validate) is worth borrowing but its engine doesn't fit pi-crew's content type. Threshold-only wins on the data.
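A head+tail compaction of the kind listed in the table might look like the sketch below. The function name, the marker text, and the decision to count the marker against the limit are assumptions made for illustration; only the 75%/25% split comes from the changelog.

```typescript
function compactHeadTail(text: string, maxChars: number, headRatio = 0.75): string {
  if (text.length <= maxChars) return text; // fits: returned unchanged

  const marker = "\n[... truncated ...]\n"; // hypothetical marker text
  const budget = maxChars - marker.length;
  if (budget <= 0) return text.slice(0, maxChars); // too small for a marker: plain head cut

  const headLen = Math.floor(budget * headRatio);
  const tailLen = budget - headLen;
  const tail = tailLen > 0 ? text.slice(text.length - tailLen) : "";
  return text.slice(0, headLen) + marker + tail;
}
```

Keeping the tail is what lets a closing code fence or final heading survive, which a head-only cut would drop.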
+### Tests & verification
+- 10 new L4 tests, 25 L3 validator tests, 7 L1 replay tests, 7 L2 parity tests.
+- `npm run typecheck` + `check:lazy-imports` green.
+- End-to-end team-run smoke tests confirm all 4 features load and run without crash.
+- Real-world smoke scripts at `test/manual/l{1,2,3}-*-smoke.mjs`.
+- Research artifacts at `source/deer-flow/.research/` + `.crew/research/worker-output-handling.md`.
+### Backward compatibility
+All four changes are additive or behavior-preserving:
+- L3: valid skills unaffected; only malformed ones now excluded (was: silent breakage).
+- L2: golden-snapshot parity test proves identical dispatch.
+- L1: new method added; existing `on`/`onAny`/`emit` unchanged.
+- L4: outputs that fit (100% of measured real outputs) are unchanged; only oversized ones now keep head+tail instead of head-only.
+## [v0.9.7] — round-18 + process-safety fix (2026-06-23)
+P2-3 feature: durable checkpoint + resume for dynamic-workflow runs. When a `.dwf.ts`
+script crashes (timeout, OOM, agent error) between `ctx.agent()` calls, the runner now
+persists a checkpoint after every agent call so `team action='resume' runId='X'` can
+continue from the last checkpoint instead of re-running from scratch. **Backward
+compatible** — fresh runs (no checkpoint) behave exactly as before.
+### Implementation
+**New file** `src/runtime/dwf-state-store.ts` (`DwfStore`):
+- Atomic CRUD for a single run's DWF checkpoint, modeled on `GoalStore` /
+  `FileCheckpointStore`.
+- Persists `DwfCheckpointState` (vars, phases, currentPhase, logs, spent, agentCount,
+  updatedAt) to `<stateRoot>/dwf-checkpoint.json` via `atomicWriteJson`.
+- `load()` returns `undefined` for a missing/corrupt checkpoint (fresh run); `delete()`
+  is best-effort and never throws.
+**`ctx.agent()` checkpoint hook** in `src/runtime/dynamic-workflow-context.ts`:
+- New `MakeWorkflowCtxOptions.onCheckpoint?: (state) => void` — invoked after each
+  `ctx.agent()` call (success OR fail) so a crash between calls leaves durable state.
+- New `MakeWorkflowCtxOptions.resumedState?` — hydrates `ctx.vars`, phase state, logs,
+  `budget.spent()`, and `agentCount` from the checkpoint on resume.
+- New closure counter `agentCount` (incremented in `agent()`'s `finally`), exposed via a
+  non-enumerable `__agentCount` getter.
+- New `getWorkflowCheckpoint(ctx)` helper (mirror of `getWorkflowPhaseState`).
+**Runner wiring** in `src/runtime/dynamic-workflow-runner.ts`:
+- On run start: `DwfStore.load()` → hydrate ctx (`resumedState`) + emit `dwf.resumed`.
+- `onCheckpoint` → `DwfStore.save()` (best-effort, errors swallowed).
+- On clean completion: `DwfStore.delete()` so a re-run starts fresh.
+### Resume semantics
+`team action='resume' runId='X'` re-dispatches with `runKind='dynamic-workflow'`. The
+runner loads the checkpoint, hydrates `ctx.vars`/phases/logs, and re-executes the
+script from the top. Scripts SHOULD be written defensively — check `ctx.vars.lastPhase`
+to skip completed work (documented in `docs/dynamic-workflows.md`). No partial-resume of
+a single agent call (it re-runs from scratch); checkpoints are written AFTER an agent
+completes, never before.
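A defensively written script along the lines described here could look like the following sketch. `MiniCtx` is a hypothetical stand-in covering only the parts of the workflow context used in the example, and the phase names are invented. Because checkpoints are described as being written after each agent call, a value assigned to `ctx.vars` here would presumably be persisted by the following call's checkpoint, so at worst one phase is repeated on resume.

```typescript
// Hypothetical minimal shape; the real WorkflowCtx has a larger surface.
interface MiniCtx {
  vars: Record<string, unknown>;
  agent(opts: { prompt: string }): Promise<{ ok: boolean }>;
}

const PHASES: string[] = ["scan", "audit", "review"];

async function run(ctx: MiniCtx): Promise<void> {
  // On resume, ctx.vars is hydrated from the checkpoint, so lastPhase
  // records how far the previous attempt got.
  const last = ctx.vars["lastPhase"];
  const done = typeof last === "string" ? PHASES.indexOf(last) : -1;

  for (let i = done + 1; i < PHASES.length; i++) {
    const result = await ctx.agent({ prompt: `Run the ${PHASES[i]} phase` });
    if (!result.ok) return; // leave lastPhase untouched so a resume retries this phase
    ctx.vars["lastPhase"] = PHASES[i];
  }
}
```

On a fresh run `lastPhase` is unset, `done` is `-1`, and every phase executes, which matches the stated behavior that runs without a checkpoint are unchanged.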
+### Tests (14 new)
+- `test/unit/dwf-state-store.test.ts` (10): save/load round-trip, missing→undefined,
+  delete, corrupt-file resilience, path layout, dir creation, large-state preservation.
+- `test/unit/dynamic-workflow-context.test.ts` (+8): onCheckpoint fires on success/fail,
+  agentCount accumulation, backward-compat (no callback), resumedState hydration,
+  shallow-copy isolation, getWorkflowCheckpoint snapshot.
+- `test/integration/dwf-setresult.test.ts` (+4): fresh run (no resumed event),
+  completed run (checkpoint deleted), resume (hydration + dwf.resumed + delete),
+  corrupt checkpoint treated as fresh run.
+### Docs
+- `docs/dynamic-workflows.md` — new "Resume & Checkpoint (round-18 P2-3)" section +
+  defensive-script example.
+- `types/dwf.d.ts` — resume pattern documented in header + `ctx.vars` JSDoc.
+- `package.json` — bumped 1.0.1 → 1.1.0 (minor — new opt-in capability).
+### Out of scope (future rounds)
+- P2-2 VM sandbox — still waiting for `isolated-vm` v1.5 (vm.createContext is not a
+  real security boundary). This was the LAST P2 item.
+### Process-safety follow-up fix (same release)
+A heuristic-based zombie "cleanup" had killed a live interactive main `pi`
+session by accident (uptime/RSS/orphan heuristics match a main session just
+as readily as a real orphaned sub-agent). Fixed authoritatively:
+- **`PI_CREW_KIND=subagent`** env marker, set by `buildPiWorkerArgs`
+  (`src/runtime/pi-args.ts`) on every child-pi spawn. A main session does NOT
+  carry it, so it can never be matched as a sub-agent. (An earlier draft also
+  added a `--crew-subagent` argv flag — removed because pi's strict option
+  parser rejects unknown flags and exits non-zero, which silently broke every
+  `ctx.agent()` call. The env var alone is the authoritative signal.)
+- **`src/runtime/zombie-scanner.ts`** (new): read-only scanner that matches
+  ONLY processes with `PI_CREW_KIND=subagent` AND a dead `PI_CREW_PARENT_PID`.
+  Never matches a main session. Never kills.
+- **`team action='doctor' focus='zombies'`**: renders the safe scan as a
+  human-readable report (zombies vs live sub-agents, with explicit do-not-kill
+  labelling for live parents).
+- **`PI_CREW_KIND`** added to the env allowlist in `child-pi.ts`.
+- **`docs/troubleshooting.md`** + **`.crew/knowledge.md`**: documented the
+  marker + the read-only rule so future agents never repeat the mistake.
+- 8 new unit tests in `test/unit/zombie-scanner.test.ts`, including a
+  regression test asserting the current (main-session) process is never matched.
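The matching rule can be expressed as a small pure function. This sketch is not the code in `src/runtime/zombie-scanner.ts`: the `ProcInfo` shape, the `classify` name, and the conservative handling of a missing or malformed parent PID are assumptions. Only the two environment variable names come from the changelog.

```typescript
interface ProcInfo {
  pid: number;
  env: Record<string, string | undefined>;
}

type Verdict = "not-subagent" | "live-subagent" | "zombie";

// isAlive is injected so the rule stays pure and easy to test.
function classify(proc: ProcInfo, isAlive: (pid: number) => boolean): Verdict {
  // A main session does not carry the marker, so it can never match.
  if (proc.env["PI_CREW_KIND"] !== "subagent") return "not-subagent";

  const parent = Number(proc.env["PI_CREW_PARENT_PID"]);
  // Assumption: an unknown parent is treated as alive rather than flagged.
  if (!Number.isInteger(parent) || parent <= 0) return "live-subagent";

  return isAlive(parent) ? "live-subagent" : "zombie";
}
```

The function only reports a verdict; in keeping with the read-only rule above, nothing in it terminates a process.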
+### Real-world smoke testing findings (2026-06-24)
+Three bugs were caught by real `team action='run'` smoke tests that the unit
+suite missed (units don't shell out to the real `pi` binary). **All three are
+now fixed.**
+- **Fixed: `--crew-subagent` argv flag broke every `ctx.agent()` call.** Pi's
+  strict option parser exits non-zero on unknown flags. The marker is now the
+  `PI_CREW_KIND=subagent` ENV var only.
+- **Fixed: `ctx.agent({schema, systemPrompt})` silently dropped `systemPrompt`.**
+  The round-13 schema branch used the resolved role persona as the base for the
+  JSON-output instruction, ignoring the caller's explicit persona. Models then
+  returned prose and failed schema validation. Fix: `call.systemPrompt` is now
+  preferred as the base when both are set.
+- **Fixed: `ctx.agent({disableTools: true, maxTurns: 1})` returned `exit null`.**
+  Root cause (found via Phase-0 diagnostic instrumentation) was NOT the
+  final-drain race originally hypothesised — it was an erroneous
+  `killProcessTree` call in the steer-injection path. When `maxTurns` was
+  reached on a `turn_end` event, the code wrote a "wrap up" steer to
+  `child.stdin`; Node's `writable.write()` returns `false` on normal
+  backpressure, which the code mis-treated as a fatal injection failure and
+  killed the worker mid-answer (answer was in stdout; exit came back `null`).
+  The `disableTools` correlation was a red herring — the real trigger was
+  `maxTurns:1` hitting on the first turn. Fix: steer injection is now
+  ADVISORY — a `write() === false` or non-writable stdin is logged, not fatal;
+  the hard-abort at `maxTurns + graceTurns` remains the safety net for genuine
+  runaways. Verified: maxTurns=1 × 10 real-binary runs now 10/10 exit=0 (was
+  ~60% fail pre-fix). Regression guard: `test/unit/child-pi-steer-backpressure.test.ts`
+  (source-contract checks + opt-in real-binary smoke via `PI_CREW_SMOKE=1`).
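The advisory treatment of backpressure can be sketched as below. `StdinLike` is a hypothetical minimal interface (a Node `Writable` satisfies it), and `injectSteer` and its log messages are invented for illustration; this is not the code in `src/runtime/child-pi.ts`.

```typescript
// Minimal subset of a writable stream used by the sketch.
interface StdinLike {
  writable: boolean;
  write(chunk: string): boolean;
}

type SteerOutcome = "sent" | "backpressure" | "not-writable";

function injectSteer(
  stdin: StdinLike,
  message: string,
  log: (line: string) => void,
): SteerOutcome {
  if (!stdin.writable) {
    log("steer skipped: stdin not writable");
    return "not-writable";
  }
  // A false return from write() signals that the internal buffer is above its
  // high-water mark. The chunk is still queued, so this is logged, not fatal.
  const flushed = stdin.write(message + "\n");
  if (!flushed) {
    log("steer queued under backpressure");
    return "backpressure";
  }
  return "sent";
}
```

None of the three outcomes kills the worker; a caller would rely on a separate hard limit to stop a genuinely runaway process.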
+## [v0.9.7] — round-17 (P2-4 worktree isolation per agent) (2026-06-23)
+P2-4 feature: `ctx.agent({worktree: true})` spawns the agent in an isolated
+git worktree so parallel file-modifying agents don't conflict. Fully backward
+compatible — `worktree` defaults to false, so existing calls are unchanged.
+### Implementation
+**New helpers** in `src/worktree/worktree-manager.ts`:
+- `prepareAgentWorktree(manifest, opts)` — creates a worktree from HEAD on a
+  unique branch; returns `{path, branch}` or `undefined` when the cwd is not a
+  git repo (graceful fallback).
+- `cleanupAgentWorktree(manifest, worktreePath, branch?)` — removes the
+  worktree dir + branch, and captures a git diff as a side artifact when the
+  worktree has changes (for audit/merge).
+**`ctx.agent()` integration** in `src/runtime/dynamic-workflow-context.ts`:
+- New `worktree?: boolean` field on `AgentCallOpts`.
+- When true, the agent runs with the worktree path as its cwd.
+- Cleanup always runs (success, failure, or agent error) — no worktree leaks.
+- Non-git cwd or creation failure → fall back to normal cwd + `ctx.log()`
+  warning; the agent still runs.
+### Why worktrees for DWF
+DWF scripts commonly fan out parallel agents that each modify files (e.g.
+fixing different modules). Without isolation they race on the working tree.
+`worktree: true` gives each agent its own checkout; the diff is captured as
+an artifact for later merge.
+### Tests (8 new, 60 total in dynamic-workflow-context.test.ts)
+- prepareAgentWorktree creates an isolated worktree
+- cleanupAgentWorktree removes dir + branch (no leak)
+- cleanupAgentWorktree captures a diff artifact when there are changes
+- prepareAgentWorktree returns undefined for non-git cwd (graceful fallback)
+- ctx.agent({worktree:true}) isolates + cleans up (mock)
+- ctx.agent({worktree:false}) uses normal cwd (backward compat)
+- ctx.agent({worktree:true}) falls back gracefully + warns in non-git cwd
+- ctx.agent({worktree:true}) cleans up even when the agent fails
+### Docs
+- `docs/dynamic-workflows.md` — worktree option in API table + example
+- `types/dwf.d.ts` — `worktree?: boolean` on AgentCallOpts
+- `package.json` — bumped 1.0.0 → 1.0.1 (patch, additive opt-in)
+### Out of scope (future rounds)
+- P2-2 VM sandbox — waiting for `isolated-vm` v1.5 (vm.createContext is not
+  a real security boundary)
+- P2-3 Resume/checkpoint — round-18 candidate (large effort)
+## [v0.9.7] — round-16 (P2-1 pipeline primitive) (2026-06-23)
+Adds **`ctx.pipeline(items, ...stages)`** — a multi-stage transform primitive for
+dynamic workflows. **Backward compatible** — existing DWF scripts are unaffected;
+`pipeline` is a new opt-in capability.
+### Feature — Pipeline primitive (P2-1)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + `types/dwf.d.ts` + `docs/dynamic-workflows.md`
+Previously the DWF context only offered `ctx.fanOut()` (a single parallel map). The
+new `ctx.pipeline()` chains stages: each item flows through **all stages in sequence**
+(stage 1 → stage 2 → …), while **different items run concurrently**, bounded by the
+workflow concurrency (`mapConcurrent`, the same primitive as `fanOut`).
+Semantics (mirrors `pi-dynamic-workflows`' `pipeline()`):
+- Each stage receives `(previous, original, index)` — `previous` is the prior stage's
+  output (the raw item for the first stage), `original` is the unchanged input item.
+- A failed stage yields `null` for that item, logs `pipeline[i] failed: <msg>` via
+  `ctx.log()`, and the other items continue.
+- On **abort**, the error propagates (it is not swallowed into `null`).
+- Returns `(TResult | null)[]`, order-preserving.
+Signature:
+```ts
+ctx.pipeline<TItem, TResult = unknown>(
+  items: TItem[],
+  ...stages: Array<(previous: TResult, original: TItem, index: number) => Promise<TResult> | TResult>
+): Promise<(TResult | null)[]>;
+```
+Implementation notes:
+- Uses `mapConcurrent(items, concurrency, …)` — NOT unbounded `Promise.all` — so
+  item-level parallelism respects the workflow's configured concurrency. Stages that
+  spawn agents additionally acquire `ctx.semaphore` for agent-level throttling.
+- Validates inputs: non-array first arg → `TypeError`; non-function stage (or no
+  stages) → `TypeError`.
+- Empty items array short-circuits to `[]`.
+- Authoring types (`types/dwf.d.ts`) mirror the runtime signature for IDE IntelliSense.
+Example use case: scan → analyze → review each shard, up to `concurrency` shards at a
+time, with per-shard failure isolation.
+Tests: `test/unit/dynamic-workflow-context.test.ts` — single/multi-stage transforms,
+empty array, failed-stage isolation + logging, TypeError on bad inputs, stage-argument
+contract, async stages, and concurrency-bounded execution.
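The semantics above can be approximated with a standalone sketch. It differs from `ctx.pipeline()` in that concurrency and the logger are passed explicitly, abort propagation is omitted, and this `mapConcurrent` is a simple re-implementation written for the example rather than the package's helper.

```typescript
type Stage = (previous: unknown, original: unknown, index: number) => unknown;

// Runs fn over items with at most `limit` calls in flight, preserving order.
async function mapConcurrent<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  };
  const workers = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workers }, () => worker()));
  return results;
}

async function pipeline(
  items: unknown[],
  concurrency: number,
  log: (message: string) => void,
  ...stages: Stage[]
): Promise<unknown[]> {
  if (!Array.isArray(items)) throw new TypeError("items must be an array");
  if (stages.length === 0 || stages.some((s) => typeof s !== "function")) {
    throw new TypeError("stages must be one or more functions");
  }
  if (items.length === 0) return [];

  return mapConcurrent(items, concurrency, async (item, i) => {
    try {
      let value: unknown = item; // the first stage receives the raw item
      for (const stage of stages) value = await stage(value, item, i);
      return value;
    } catch (err) {
      // A failed stage isolates to this item; the others continue.
      log(`pipeline[${i}] failed: ${err instanceof Error ? err.message : String(err)}`);
      return null;
    }
  });
}
```

Each item walks its stages in sequence inside one `mapConcurrent` slot, so item-level parallelism is bounded while per-item ordering is preserved.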
+## [v0.9.7] — round-15 (P1-4 phase UI) (2026-06-23)
+The progress pane now **renders DWF phase markers** (▶/✓/⏸) by consuming the
+`dwf.phase_started` / `dwf.phase_completed` events emitted by `ctx.phase()`
+(round-12). **Backward compatible** — non-DWF runs are unaffected.
+### Feature — Phase UI in progress-pane (P1-4)
+**Files:** `src/ui/dwf-phase-display.ts` (new) + `progress-pane.ts` +
+`run-snapshot-cache.ts` + `snapshot-types.ts`
+Previously the phase events were produced but the UI did not consume them.
+Now the progress pane shows a phase overview:
+- `▶ Phase: <name>` — currently running phase.
+- `✓ Phase: <name>` — completed phase.
+- `⏸ Phase: <name>` — a phase whose completion scrolled out of the recent-event
+  window and is not the current one.
+Implementation details:
+- New pure-function module `src/ui/dwf-phase-display.ts`:
+  `extractDwfPhaseState(events)` derives phase state from the event window
+  (returns `null` for non-DWF runs); `renderDwfPhaseLines(state, { ascii })`
+  renders markers with Unicode glyphs and ASCII fallbacks (`[>]`/`[v]`/`[ ]`).
+- `RunUiSnapshot` gains an optional `dwfPhaseState` field, computed from the
+  existing tailed `recentEvents` window (no extra I/O) in both sync and async
+  snapshot builders.
+- `progress-pane.ts` renders the phase lines right after the summary line,
+  before the task-based phase grouping. Non-DWF runs produce zero lines.
+- `signatureFor` includes `dwfPhaseState` so cache invalidation reflects phase
+  changes.
+- Tests: `test/unit/dwf-phase-display.test.ts` — phase state tracking from an
+  event sequence (incl. scrolled-off recovery), correct markers (Unicode +
+  ASCII), header gating, and non-DWF snapshots unaffected.
+## [v0.9.7] — round-14 (P1 DX + observability) (2026-06-23)
+Four additive P1 features land in this round — **authoring types**, **per-workflow
+token budget**, **log API**, and **typed args**. **Backward compatible** — existing
+DWF scripts continue to work unchanged. New behavior is opt-in.
+### Feature 1 — Authoring Types / IDE IntelliSense (P1-1)
+**Files:** `types/dwf.d.ts` (new) + `package.json` (`./workflow` export)
+A `.dwf.ts` script can now import the `WorkflowCtx` (and supporting) types from
+the package's `./workflow` export for full TypeScript IntelliSense:
+```ts
+import type { WorkflowCtx } from "pi-crew/workflow";
+export default async function run(ctx: WorkflowCtx): Promise<void> { /* ... */ }
+```
+- New file: `types/dwf.d.ts` — named exports mirroring the runtime types
+  (`WorkflowCtx`, `AgentCallOpts`, `AgentResult`, `WorkflowBudget`, ...).
+- `package.json` gains `"./workflow": { "types": "./types/dwf.d.ts" }` and ships
+  the `types/` directory.
+- New test: `test/unit/dwf-authoring-types.test.ts` — compiles a sample `.dwf.ts`
+  against the export (positive + negative `@ts-expect-error` check).
+### Feature 2 — Per-Workflow Token Budget (P1-2)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + dispatch wiring
+`ctx.budget` is a frozen `{total, spent(), remaining()}` surface. When a
+per-workflow token budget is set, `ctx.agent()` auto-rejects with `ok:false`
+(`"workflow token budget exhausted"`) **before** spawning a child worker. `spent()`
+accumulates each run's reported `usage.input + usage.output`.
+- `total` is `null` (unbounded) by default; `remaining()` is `Infinity` then.
+- New `MakeWorkflowCtxOptions.tokenBudget`, `WorkflowConfig.maxTokenBudget`,
+  `RunDynamicWorkflowInput.tokenBudget`, and the `team run` `tokenBudget` param
+  (param overrides the workflow value).
+- Budget check + accumulation wired into `ctx.agent()`; budget object passed
+  through `run.ts` and `background-runner.ts`.
+- Tests: 7 unit cases (default null, set total, spent/remaining, exhaustion,
+  accumulation from the mock's `{input:10,output:5}`, frozen check).
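One way to picture the budget surface is the sketch below. The `makeBudget` and `guard` helpers are hypothetical, and clamping `remaining()` at zero is an assumption; the `{total, spent(), remaining()}` shape, the `null` default, and the error string follow the changelog text.

```typescript
interface Budget {
  readonly total: number | null;
  spent(): number;
  remaining(): number;
}

function makeBudget(total: number | null): {
  budget: Budget;
  record: (input: number, output: number) => void;
} {
  let spent = 0; // closure state, not reachable through the frozen surface
  const budget: Budget = Object.freeze({
    total,
    spent: () => spent,
    remaining: () => (total === null ? Infinity : Math.max(0, total - spent)),
  });
  return {
    budget,
    record: (input, output) => { spent += input + output; },
  };
}

// Checked before spawning: an exhausted budget rejects without starting a worker.
function guard(budget: Budget): { ok: true } | { ok: false; error: string } {
  return budget.remaining() <= 0
    ? { ok: false, error: "workflow token budget exhausted" }
    : { ok: true };
}
```

Freezing the object keeps scripts from overwriting `total`, while the runner keeps the private `record` function to accumulate usage.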
+### Feature 3 — Log API (P1-3)
+**Files:** `src/state/contracts.ts` + `src/runtime/dynamic-workflow-context.ts`
+`ctx.log(message)` appends a workflow-level log line: stringifies non-strings,
+keeps a bounded in-memory copy (capped at **1000**), and always emits a durable
+`dwf.log` event (`{message}`) to the run's `events.jsonl`.
+- New event type `"dwf.log"` in `TEAM_EVENT_TYPES` (non-terminal).
+- New runner-only `getWorkflowLogs(ctx)` accessor (mirrors `getWorkflowPhaseState`).
+- Tests: 4 unit cases (append, stringify, event emission, 1000-cap) + 1
+  integration case (end-to-end through `runDynamicWorkflow`).
+### Feature 4 — Typed Args (P1-5)
+**Files:** `src/runtime/dynamic-workflow-context.ts` + `src/state/types.ts` +
+`src/schema/team-tool-schema.ts` + `src/state/state-store.ts` + dispatch wiring
+`ctx.args<T>()` returns typed workflow arguments (sourced from `manifest.args`,
+passed via the run `args` param). Defaults to `{}` when unset.
+- New `TeamRunManifest.args`, `MakeWorkflowCtxOptions.args`, `createRunManifest`
+  `args` param, and the `team run` `args` schema field (`Type.Unsafe`, any JSON
+  value — avoids `any`).
+- The runner reads `manifest.args` and forwards it to `makeWorkflowCtx`.
+- Tests: 3 unit cases (default `{}`, typed object, array) + 1 integration case
+  (reads `manifest.args` end-to-end).
+### Other
+- Version bumped `0.9.7` → `0.9.8`.
+- `docs/dynamic-workflows.md`: API table rows + Log/Budget/Args/Authoring-types
+  sections.
+## [v0.9.7] — round-13 (P0 AST determinism + structured output + abort cleanup) (2026-06-23)
+Three P0 features land in this round. **Backward compatible** — existing DWF
+scripts continue to work unchanged. New behavior is opt-in via the `schema`
+field on `ctx.agent()` and an env-var escape hatch for the determinism check.
+### Feature 1 — AST Determinism Check (P0-2)
+**File:** `src/runtime/deterministic-ast.ts` (new) +
+`src/runtime/dynamic-workflow-runner.ts` (integration)
+Dynamic workflow scripts must now be **deterministic**. The runner parses each
+`.dwf.ts` with `acorn` and walks the AST, rejecting `Date.now()`,
+`Math.random()`, and `new Date()` calls before `jiti` executes the script.
+Two runs of the same script against the same inputs now produce the same
+outputs — critical for regression testing and workflow replay.
+Why AST, not regex: regex matches `Date.now()` everywhere — including string
+literals, comments, and prompt text. AST walking distinguishes **calls** from
+strings, so prompts that say *"avoid `Date.now()` in your code"* still parse
+cleanly. Other `Date.*` and `Math.*` methods (`Date.parse`, `Date.UTC`,
+`Math.floor`, `Math.max`, etc.) are accepted — only `now` and `random` are
+blocked.
+- New dep: `acorn ^8.14.0` (small, well-maintained; verified Node ≥22 ESM/strip-types compatibility)
+- New file: `src/runtime/deterministic-ast.ts` (determinism walker; MIT-licensed adaptation from pi-dynamic-workflows, attribution in `NOTICE.md`)
+- New file: `test/unit/deterministic-ast.test.ts` (27 cases: accepts/rejects every form, comments, template literals, computed properties, parse-error delegation)
+- New tests in `test/integration/dwf-setresult.test.ts` (5 end-to-end cases including env-var opt-out)
+**Escape hatch:** `PI_CREW_DWF_SKIP_DETERMINISM_CHECK=1` bypasses the check for
+power users who legitimately need time/random (e.g. randomized benchmark
+scripts). Off by default.
+### Feature 2 — Structured Output Helper (P0-3)
+**Files:** `src/runtime/result-extractor.ts` + `src/runtime/dynamic-workflow-context.ts`
+`AgentCallOpts` gains an optional `schema?: TSchema` field (TypeBox). When set,
+`ctx.agent()` validates the extracted JSON against the schema via
+`@sinclair/typebox`'s `Value.Check`. Mismatch yields
+`{ok: false, error: "structured output does not match schema: ..."}` instead
+of an untyped `structured: { ... }` blob.
+How the runner helps the model comply:
+- Appends a JSON-output directive to the prompt.
+- Replaces the agent's system prompt suffix with a "structured-output
+  assistant" preamble that describes the schema's shape.
+When `schema` is **omitted**, behavior is byte-identical to the previous
+regex-based extractor — verified by the existing 30+ test cases plus 9 new
+schema-specific cases in `test/unit/result-extractor.test.ts` and 4 new
+end-to-end cases in `test/unit/dynamic-workflow-context.test.ts`.
+Caveat: pi-crew DWF spawns `pi` as a subprocess (`runChildPi`), not an
+in-memory `createAgentSession`. Subprocess structured output is captured via
+the same event-stream → JSON-line → schema-check pipeline used for everything
+else, so this round ships Option B (regex-extract + schema validation
+post-hoc). Option A (in-process terminating tool) is planned for round-14.
+### Feature 3 — Abort Listener Cleanup (P0-5) — NO-OP (already fixed in round 27)
+Audited `src/runtime/child-pi.ts` for AbortSignal listener leaks. The fix was
+landed in round 27 (BUG 4): both the `onParentAbort` flag handler and the
+`abort` cancellation handler are now removed inside `settle()` regardless of
+the exit path (normal completion, response timeout, hard kill, parent abort,
+forced final drain). On runs with >10 tasks sharing one AbortSignal (the
+common pattern under `background-runner`), this prevents the
+`MaxListenersExceededWarning` and per-task closure capture that previously
+pinned the worker stack frame in memory.
+No code changes needed in round 13. Documented for the audit trail.
+### Verification
+- `npm run typecheck` — clean
+- `npx tsc --noEmit` — clean
+- New unit tests: 27 in `deterministic-ast.test.ts`, 9 schema-validation cases in `result-extractor.test.ts`, and 4 in `dynamic-workflow-context.test.ts`; all pass
+- 5 new integration tests in `dwf-setresult.test.ts` — all pass
+- 0 regressions in the existing 4 round-12 tests
+## [v0.9.7] — round-12: DWF phases + structured-clone guard (2026-06-23)
+Two additive P0 features for dynamic-workflow (DWF) scripts, both fully
+backward-compatible (existing scripts continue to work unchanged). Researched
+and adopted from the public `pi-dynamic-workflows` (Michaelliv/v1.0.1)
+package — full comparison and adoption plan in
+`.crew/artifacts/team_20260623095016_b693d3f967f88048/shared/06_synthesize.md`.
+### Feature 1: `ctx.phase(title)` runtime phase API (P0-1)
+`WorkflowCtx` gains a new `phase(title: string): void` method. The orphan
+`dwf.phase_started` / `dwf.phase_completed` event types — declared in
+`src/state/contracts.ts:89-93` since v0.9.0 but never produced by any
+producer — finally have a producer. Use cases:
+- Group `ctx.agent()` calls under logical phases (e.g. "Scan", "Audit",
+  "Review") so downstream UI and log readers can group by phase.
+- Emit a clear phase boundary to the run's `events.jsonl` without writing
+  custom event-log code.
+- Drive live progress reporting from the script itself.
+Semantics:
+- Validates `title` is a non-empty string (throws `TypeError` otherwise).
+- Idempotent: calling `ctx.phase("Scan")` twice does not emit a duplicate
+  event or change state.
+- When a previous phase is still open, emits `dwf.phase_completed` for it
+  **before** emitting `dwf.phase_started` for the new one (consumers never
+  see two open phases at once).
+- The in-memory `phases[]` list (read-only via `getWorkflowPhaseState`,
+  mirrors the `__finalResult` non-enumerable getter pattern) is deduped and
+  capped at **100 distinct titles** to bound memory. Events still flow
+  past the cap — the events log is the durable source of truth.
+- The runner **auto-closes the last open phase** before emitting
+  `dwf.completed`, so a script that ends mid-phase still produces a
+  well-formed event sequence.
+**Files changed:**
+- `src/runtime/dynamic-workflow-context.ts` — interface, implementation,
+  `__phaseState` getter, `getWorkflowPhaseState` helper
+- `src/runtime/dynamic-workflow-runner.ts` — auto-close on completion
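The phase semantics listed above can be sketched as a small tracker. `makePhaseTracker` and its `close()` method are hypothetical names, and idempotency is read here as "same title as the currently open phase"; this is an illustration rather than the `ctx.phase()` implementation.

```typescript
type PhaseEvent = {
  type: "dwf.phase_started" | "dwf.phase_completed";
  title: string;
};

function makePhaseTracker(emit: (e: PhaseEvent) => void, cap = 100) {
  const phases: string[] = []; // deduped, capped list of titles
  let current: string | undefined;

  return {
    phase(title: string): void {
      if (typeof title !== "string" || title.length === 0) {
        throw new TypeError("phase title must be a non-empty string");
      }
      if (title === current) return; // repeated call: no event, no state change
      // Close the open phase first so consumers never see two open at once.
      if (current !== undefined) emit({ type: "dwf.phase_completed", title: current });
      current = title;
      if (!phases.includes(title) && phases.length < cap) phases.push(title);
      emit({ type: "dwf.phase_started", title }); // events still flow past the cap
    },
    // What a runner would call before emitting its own completion event.
    close(): void {
      if (current !== undefined) emit({ type: "dwf.phase_completed", title: current });
      current = undefined;
    },
    state: () => ({ phases: phases.slice(), current }),
  };
}
```

The cap bounds only the in-memory list; the emitted events remain the complete record.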
+### Feature 2: structured-clone guard at the runner boundary (P0-4)
+Defensive `assertStructuredCloneable(value, name)` helper applied to the
+final artifact content and `manifest.summary` before they reach
+`writeArtifact` and the run-event-bus emitter. Today this is mostly
+future-proofing (the artifact file is read as a string, and strings are
+always structured-cloneable), but the guard surfaces a clear, actionable
+error pointing at the most common cause — forgetting `await` on
+`ctx.agent()` / `ctx.review()` — instead of letting a cryptic
+`DataCloneError` leak from deep inside the artifact store.
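A guard of this kind could look roughly like the sketch below. It assumes the global `structuredClone` (Node.js 17+ and modern browsers); the error wording and the function body are illustrative guesses, not the helper shipped in `dynamic-workflow-runner.ts`.

```typescript
// Hypothetical sketch; the real helper may differ in behavior and message text.
function assertStructuredCloneable(value: unknown, name: string): void {
  try {
    structuredClone(value); // throws DataCloneError for Promises, functions, etc.
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    throw new TypeError(
      `${name} is not structured-cloneable (${detail}). ` +
        "Did you forget to await ctx.agent() or ctx.review()?",
    );
  }
}

// A string always passes.
assertStructuredCloneable("final report text", "finalText");

// A pending Promise stands in for a forgotten `await`.
const pending: Promise<string> = Promise.resolve("summary");
let guardMessage = "";
try {
  assertStructuredCloneable({ summary: pending }, "manifest.summary");
} catch (err) {
  guardMessage = (err as Error).message;
}
console.log(guardMessage);
```

Cloning the value just to validate it costs a full copy, which is acceptable for a final artifact checked once per run but would be a poor fit for a hot path.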
+**Files changed:**
+- `src/runtime/dynamic-workflow-runner.ts` — `assertStructuredCloneable`
+  helper, applied to `finalText` and `summaryText` (slice)
+### Tests
+- 7 new unit tests in `test/unit/dynamic-workflow-context.test.ts`
+  (emission, idempotency, validation, sequence, helper, dedup, 100-cap).
+- 1 new integration test in `test/integration/dwf-setresult.test.ts`
+  (end-to-end phase event sequence, including runner auto-close).
+- All 23 existing DWF unit tests still pass; both pre-existing integration
+  tests still pass.
+### Docs
+- `docs/dynamic-workflows.md` — updated WorkflowCtx example to use
+  `ctx.phase("Scan")` / `ctx.phase("Audit")`; added a `ctx.phase` row to
+  the API table; added a "Phases (round-12)" subsection explaining
+  semantics, idempotency, and the 100-cap.
+### Out of scope (planned for future rounds)
+- AST determinism check (P0-2)
+- Structured output helper (P0-3)
+- Abort listener cleanup pattern (P0-5)
+- Authoring types / IDE IntelliSense (P1-1)
+- Token budget (P1-2)
+- Phase UI in `progress-pane` (P1-4)
+- Pipeline primitive (P2-1)
+- `isolated-vm` sandbox (P2-2, planned for v1.5)
 ## [v0.9.5] — fix "team run hangs forever at 25%" (2026-06-23)
 Two coupled runtime bugs caused the recurring "run stuck at 25% (1/4)" failure

package/README.md CHANGED

@@ -39,9 +39,9 @@ npm: pi-crew
 repo: https://github.com/baphuongna/pi-crew
 ```
-**v0.9.4 / v0.9.5**: See [CHANGELOG.md](CHANGELOG.md).
+**v0.9.4 / v0.9.5 / v0.9.8**: See [CHANGELOG.md](CHANGELOG.md).
-### Highlights (v0.6.4 → v0.9.5)
+### Highlights (v0.6.4 → v0.9.8)
 A long arc of **trust, cliff-resilience, and robustness** work. Principle: *build
 trust and cliff-resilience, stay lean, delete before adding.*
@@ -197,7 +197,10 @@ background-dispatch discriminator.
 - **Plugin system** — framework-aware context injection (Next.js, Vite, Vitest) via plugin registry
 - **Health scoring** — penalty-based run health with time-series snapshots
 - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
-- **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
+- **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `ctx.phase()` marks logical phases; **round-14** adds `ctx.log()` (durable `dwf.log` events), `ctx.budget` (per-workflow token budget that auto-rejects `ctx.agent()` when exhausted), and `ctx.args<T>()` (typed workflow arguments). TypeScript IntelliSense is available via `import type { WorkflowCtx } from "pi-crew/workflow"`. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
+- **Strict SKILL.md validation** (L3, v0.9.8) — skills with malformed frontmatter (missing/malformed `name`/`description`, type mismatches) now **fail-fast at discovery** with visible diagnostics, instead of silently producing broken behavior at runtime. HYBRID policy: HARD on required fields, SOFT (warn) on unknown props for forward-compat. Surfaced via `buildSkillValidationDiagnostics()`.
+- **Durable event replay** (L1, v0.9.8) — `RunEventBus.onWithReplay()` catches up a re-subscribing dashboard/overlay with events it missed during transient absence (toggle, reconnect), replaying from the durable JSONL log with seq-based dedup. No information loss even if the live subscriber was briefly gone.
+- **Lossless-by-default output handling** (L4, v0.9.8) — worker output thresholds sized from measured data (100% of real outputs fit without compaction); when compaction is unavoidable it keeps head+tail (preserves closing code fences/headings) instead of head-only truncation. No more `[pi-crew compacted N chars]` markers eating the end of a worker's result.
 ---
@@ -468,6 +471,10 @@ pi-crew survives Pi's context compaction. When the context is compacted (auto or
 Context compacted. 1 pi-crew run(s) still in-flight — use team status to continue.
 ```
+**Durable event replay** (v0.9.8, L1): even if a dashboard/overlay is briefly gone during compaction or a reconnect, `RunEventBus.onWithReplay()` catches it up with the events it missed, replaying from the durable JSONL log with seq-based dedup — no information loss. (The dashboard wires this up per-run; the primitive is available for any subscriber.)
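To make the replay-plus-dedup idea concrete, here is a minimal sketch. `ReplayBus`, its event shape, and the `lastSeenSeq` parameter are assumptions made for illustration; an in-memory array of JSON lines stands in for the durable JSONL file, and the real `RunEventBus.onWithReplay()` signature may differ.

```typescript
// Hypothetical sketch; not pi-crew's RunEventBus.
type RunEvent = { seq: number; type: string };
type Listener = (event: RunEvent) => void;

class ReplayBus {
  private readonly log: string[] = []; // stand-in for the durable JSONL log
  private readonly listeners = new Set<Listener>();
  private nextSeq = 1;

  publish(type: string): void {
    const event: RunEvent = { seq: this.nextSeq++, type };
    this.log.push(JSON.stringify(event)); // persist first, then fan out live
    for (const listener of this.listeners) listener(event);
  }

  // Subscribes live, then replays logged events newer than `lastSeenSeq`.
  onWithReplay(listener: Listener, lastSeenSeq = 0): () => void {
    let highest = lastSeenSeq;
    const deliver: Listener = (event) => {
      if (event.seq <= highest) return; // seq-based dedup: drop already-seen events
      highest = event.seq;
      listener(event);
    };
    this.listeners.add(deliver);
    for (const line of this.log) deliver(JSON.parse(line) as RunEvent);
    return () => {
      this.listeners.delete(deliver);
    };
  }
}

const bus = new ReplayBus();
const seen: number[] = [];
const unsubscribe = bus.onWithReplay((event) => seen.push(event.seq));
bus.publish("task.started"); // seq 1, delivered live
unsubscribe(); // dashboard toggled off
bus.publish("task.progress"); // seq 2, missed
bus.publish("task.completed"); // seq 3, missed
bus.onWithReplay((event) => seen.push(event.seq), seen[seen.length - 1] ?? 0);
bus.publish("run.completed"); // seq 4, delivered live
console.log(seen.join(","));
```

Because every delivery passes through the same sequence check, an event that appears in both the replayed log and the live stream is handed to the subscriber only once.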
+**Lossless-by-default worker output** (v0.9.8, L4): output-handling thresholds are sized from measured real data (100% of real worker outputs fit without any compaction). When compaction *is* unavoidable, it keeps head+tail instead of head-only truncation, so closing code fences and headings survive — no more `[pi-crew compacted N chars]` markers eating the end of a result.
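The difference between head-only truncation and head+tail compaction can be shown in a few lines. The function name, the even split between head and tail, and the marker text below are all assumptions for illustration, not pi-crew's actual thresholds or format.

```typescript
// Hypothetical sketch; split ratio and marker wording are assumptions.
function compactHeadTail(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text; // lossless path: nothing to compact
  const headLen = Math.ceil(maxChars / 2);
  const tailLen = maxChars - headLen;
  const omitted = text.length - headLen - tailLen;
  const head = text.slice(0, headLen);
  const tail = text.slice(text.length - tailLen);
  // The marker sits in the middle, so the start and the end both survive.
  return `${head}\n[... ${omitted} chars omitted ...]\n${tail}`;
}

const fence = "`".repeat(3); // built this way to avoid a literal fence in the example
const workerOutput =
  "# Report\n" + "x".repeat(200) + `\n${fence}ts\nconst done = true;\n${fence}`;
const compacted = compactHeadTail(workerOutput, 60);

console.log(compacted.startsWith("# Report")); // heading kept
console.log(compacted.endsWith(fence)); // closing code fence kept
```

Note that in this sketch `maxChars` budgets only the retained text; the marker line is added on top of it.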
 ### Plan-level human-in-the-loop (HITL)
 Set `runtime.requirePlanApproval = true` to gate **any workflow** at the plan→execute boundary. After the read-only (planning) phases complete, the run pauses for explicit approval before mutating tasks run: