pi-crew 0.9.7 → 0.9.9

package/CHANGELOG.md CHANGED
@@ -1,5 +1,100 @@
1
1
  # Changelog
2
2
 
3
+ ## [v0.9.9] — gajae-code distillation (4 P0) + notification race fix (2026-06-25)
4
+
5
+ Six changes: four high-impact/low-effort features distilled from researching [Yeachan-Heo/gajae-code](https://github.com/Yeachan-Heo/gajae-code) (full report: `research-findings/gajae-code-distill.md`), plus a fix for a redundant-notification bug the leader directly hit while running that research. Each was calibrated against real pi-crew code — two reported "gaps" turned out to be patterns pi-crew already implements (prompt-level stablePrefix, detached spawning), and four areas where pi-crew is already superior were deliberately left untouched (crash-recovery byte-offset cursor, declarative workflow + semaphores, run-snapshot-cache, event sourcing).
6
+
7
+ ### P0 #1 — Crash classification taxonomy (commit fb8c4a8)
8
+
9
+ `child-pi.ts` captured stderr/exit codes but never bucketed failure modes. New pure `classifyProcessCrash()` (port of gajae-code's `crash-diagnostics.ts`) maps exits to 9 semantic classes (`clean_exit | non_zero_exit | signal_exit | timeout | cancelled | spawn_error | protocol_exit | native_panic | unknown`) with precedence timeout > cancelled > spawn_error > native_panic > signal. Attached to `WorkerExitStatus` at both settle paths; kill/drain/timeout logic untouched. 30 unit tests.
10
+
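Reviewer note: the precedence rule above can be read as an ordered chain of checks. The sketch below is a minimal illustration, not the shipped `classifyProcessCrash()`: the `ExitFacts` shape, the `classifyExit` name, and the stderr marker used to detect a native panic are all assumptions, and `protocol_exit` is left unmodelled because the changelog does not say what triggers it.

```typescript
// The nine classes named in the changelog entry.
type CrashClass =
  | "clean_exit" | "non_zero_exit" | "signal_exit" | "timeout" | "cancelled"
  | "spawn_error" | "protocol_exit" | "native_panic" | "unknown";

// Hypothetical input shape; the real WorkerExitStatus fields are not shown in the diff.
interface ExitFacts {
  exitCode: number | null;
  signal: string | null;
  timedOut?: boolean;
  cancelled?: boolean;
  spawnError?: boolean;
  stderrTail?: string;
}

function classifyExit(f: ExitFacts): CrashClass {
  // Precedence from the changelog: timeout > cancelled > spawn_error > native_panic > signal.
  if (f.timedOut) return "timeout";
  if (f.cancelled) return "cancelled";
  if (f.spawnError) return "spawn_error";
  // Assumption: a native panic is recognised by a "panicked at" marker in stderr.
  if (f.stderrTail !== undefined && /panicked at/.test(f.stderrTail)) return "native_panic";
  if (f.signal !== null) return "signal_exit";
  if (f.exitCode === 0) return "clean_exit";
  if (f.exitCode !== null) return "non_zero_exit";
  return "unknown";
}
```

Keeping the function pure (facts in, class out) is what makes it cheap to cover with unit tests, since no process has to be spawned.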
11
+ ### P0 #2 — Staleness-aware tool output pruning (commit 13acf37)
12
+
13
+ The L4 size-based compaction retained every copy of a re-read file until a size threshold tripped. New `tool-output-pruner.ts` (port of gajae-code's `pruning.ts`) drops superseded tool results — same-file re-reads and read-then-edit — **before** they are injected into a downstream worker's prompt via `task-output-context.ts`. Replaces stale content with a digest notice (first/last lines + count for bash/grep/search). OPT-IN via `DEFAULT_PRUNE_CONFIG`; does NOT regress the L4 head+tail(75/25) behavior. 25 unit tests.
14
+
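Reviewer note: a rough sketch of the "superseded result" idea described above, assuming a simplified `ToolResult` shape and a hypothetical `pruneSuperseded` helper. It only covers same-file re-reads and read-then-edit, and it replaces stale output with a one-line notice rather than the first/last-line digest the entry mentions.

```typescript
// Hypothetical, simplified record of one tool call and its output.
interface ToolResult {
  tool: "read" | "edit" | "bash";
  path?: string; // file touched by read/edit
  output: string;
}

function pruneSuperseded(results: readonly ToolResult[]): ToolResult[] {
  // Index of the last read/edit per path; any earlier read of that path is stale.
  const lastTouch = new Map<string, number>();
  results.forEach((r, i) => {
    if ((r.tool === "read" || r.tool === "edit") && r.path !== undefined) {
      lastTouch.set(r.path, i);
    }
  });
  return results.map((r, i) => {
    const stale =
      r.tool === "read" && r.path !== undefined && (lastTouch.get(r.path) ?? i) > i;
    if (!stale) return r;
    const lineCount = r.output.split("\n").length;
    // Keep the entry so ordering stays intact, but drop the stale payload.
    return { ...r, output: `[pruned: superseded read of ${r.path}, ${lineCount} lines]` };
  });
}
```

Only `read` results are ever pruned here, so edit confirmations and shell output pass through untouched.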
15
+ ### P0 #3 — OwnedProcess abstraction (commit fe3bdde)
16
+
17
+ Background spawns used `detached:true` but had no unified ownership primitive for guaranteed teardown. New `process-lifecycle.ts` adds `OwnedProcess` (escalating SIGTERM → grace → SIGKILL; Windows `taskkill /F /T /PID` fallback; idempotent `dispose()`; bounded `awaitExit()`; `onExit`) plus `registerResourceOwner()`/`disposeAllOwners()` for non-process resources (timers, sockets, Workers) and root-exit drain reconciliation. **Incremental adoption** — deliberately NOT migrating `child-pi.ts`'s battle-tested `killProcessTree`/post-exit-stdio-guard/hard-kill-timer or `async-runner.ts`'s intentionally-detached background spawns; the primitive is available for future ownership-scoped spawns (MCP/LSP/DAP servers, eval workers). 22 unit tests.
18
+
19
+ ### P0 #4 — IRC reply support, side-channel Q&A (commit 43bcd65)
20
+
21
+ The `irc-tool` was fire-and-forget despite an `awaitReply` param marked "Not yet supported". New `respondAsBackground()` on `live-agent-manager.ts` delivers a DM to a recipient's session **without blocking its main loop** (`sendCustomMessage({triggerTurn:false})`) and awaits an event-driven, timeout-bounded reply via an in-memory pending-reply registry keyed by correlation id. `awaitReply:true` DMs now route through this side-channel and return reply content; broadcast stays fire-and-forget. Coexists with mailbox.ts's existing file-based reply fields (cross-process). 10 unit tests.
22
+
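Reviewer note: the pending-reply registry described above is essentially a map from correlation id to a promise resolver plus a timeout. The class below is a sketch under that assumption; `PendingReplies`, `wait`, and `deliver` are illustrative names, not the API added to `live-agent-manager.ts`.

```typescript
type Resolver = (reply: string) => void;

class PendingReplies {
  private readonly pending = new Map<
    string,
    { resolve: Resolver; timer: ReturnType<typeof setTimeout> }
  >();

  /** Wait for a reply to `correlationId`; rejects once `timeoutMs` elapses. */
  wait(correlationId: string, timeoutMs: number): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(correlationId);
        reject(new Error(`reply timeout for ${correlationId}`));
      }, timeoutMs);
      this.pending.set(correlationId, { resolve, timer });
    });
  }

  /** Deliver a reply; returns false when nobody is waiting (late or unknown id). */
  deliver(correlationId: string, reply: string): boolean {
    const entry = this.pending.get(correlationId);
    if (!entry) return false;
    clearTimeout(entry.timer); // the timeout must not fire after a successful reply
    this.pending.delete(correlationId);
    entry.resolve(reply);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Because the registry lives in memory, it only works within one process, which fits the entry's remark that the file-based mailbox fields remain the cross-process path.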
23
+ ### Notification race fix — Rule 2 + Rule 1 (commits 592d9ea, c22cbb9)
24
+
25
+ While running the gajae-code research, the leader observed redundant "background subagent changed state" notifications arriving a turn late, after results were already read. Root cause: the completion callback (`SubagentManager.onComplete`) fires from inside the `record.promise` IIFE `finally` block — **before the promise resolves** — so a leader calling `get_subagent_result(wait:true)` sets `resultConsumed=true` only afterward, and the synchronous `if (record.resultConsumed) return` guard always saw `false`. A latent test bug (`assert sentMessages.length === 0` on an array that was unconditionally empty because `sendAgentWakeUp` prefers `sendUserMessage`) masked it.
26
+
27
+ - **Rule 2** (592d9ea): defer notification emission to a `setTimeout(0)` **macrotask** (not `queueMicrotask` — microtasks queued in the finally run before the promise-resolution microtask), then recheck `resultConsumed` (in-memory `getRecord` + persisted `readPersistedSubagentRecord`) before emitting; suppress if already consumed. Covers all three `onComplete` call sites via the single emit point. Fixed the test assertion to `sentUserMessages` and added two explicit regression tests (notify still fires when leader does NOT pre-consume; notify suppressed when leader pre-consumes via `wait:true`).
28
+ - **Rule 1** (c22cbb9): new `BatchBarrier` registry + optional `batch_id` param on the Agent tool. Background agents sharing a `batch_id` never emit individual notifications; instead each completion is recorded in the barrier and **one consolidated** "All N background subagents in batch \"X\" have finished" notification fires exactly once when every member reaches a terminal state (`blocked` is NOT terminal — a blocked agent resumes later). Verified end-to-end with 1/2/5-agent batches (one with a queued member and staggered 10–25s sleeps): exactly 1 consolidated notification, 0 individual leaks. Composes with Rule 2 (a batched agent whose result was already consumed is still suppressed via the `resultConsumed` recheck). 10 unit tests + 2 integration tests.
29
+
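Reviewer note: the ordering argument behind Rule 2 can be reproduced in a few lines. This is a standalone illustration with stand-in names (`record`, `check`), not the extension's code: a check made synchronously or in a microtask from the `finally` block runs before the joiner's continuation, while a `setTimeout(0)` check runs after it.

```typescript
const record = { resultConsumed: false };
const log: string[] = [];

function check(label: string): void {
  log.push(`${label}:${record.resultConsumed ? "suppressed" : "notified"}`);
}

// Stand-in for the subagent's record.promise IIFE.
const promise: Promise<string> = (async () => {
  try {
    await Promise.resolve(); // pretend work
    return "result";
  } finally {
    check("sync"); // old guard: runs before the promise resolves
    queueMicrotask(() => check("microtask")); // queued ahead of the joiner's continuation
    setTimeout(() => check("macrotask"), 0); // runs after the microtask queue drains
  }
})();

// Stand-in for the leader joining the result and marking it consumed.
void promise.then(() => {
  record.resultConsumed = true;
});
```

Once the timers have run, `log` holds `sync:notified`, `microtask:notified`, `macrotask:suppressed`: only the macrotask check observes the consumed flag.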
30
+ Design doc: `research-findings/subagent-notification-race-fix.md`.
31
+
32
+ ### What was NOT adopted (pi-crew already superior)
33
+
34
+ Crash recovery (`crash-recovery.ts`, 421 lines, byte-offset event-log cursor), declarative workflow + semaphores, `run-snapshot-cache` (disk rebuild, TTL 1500ms), and event sourcing (`readEventsCursor`) are all more sophisticated than gajae-code's equivalents and were left intact.
35
+
36
+ ## [v0.9.8] — deer-flow learning integration: L1/L2/L3/L4 (2026-06-24)
37
+
38
+ Four improvements distilled from researching [bytedance/deer-flow](https://github.com/bytedance/deer-flow) and the wider Pi-ecosystem (pi-boomerang, pi-subagents, pi-dynamic-workflows). Each was calibrated against real pi-crew code (the research over-reported gaps — several patterns pi-crew already does *better* than deer-flow) and sized from measured data, not guesses.
39
+
40
+ ### L3 — Strict SKILL.md frontmatter validation (commit 5348c47)
41
+
42
+ Malformed skills now **fail-fast at discovery** instead of silently producing broken behavior at runtime. New `src/skills/validate.ts` validates frontmatter against the `ALLOWED_SKILL_PROPS` whitelist using a **HYBRID policy**:
43
+
44
+ - **HARD errors** (missing/malformed `name` or `description`, type mismatches) → skill excluded from `discoverSkills()`.
45
+ - **SOFT warnings** (unknown props like `origin`/`triggers`, missing `name` derived from directory) → skill kept, surfaced via `getLastDiscoveryDiagnostics()` / `buildSkillValidationDiagnostics()`.
46
+
47
+ Replaces the fragile line-prefix parser (broke on multi-line folded scalars `description: >`, quoted strings, nested YAML) with the `yaml` package (^2.9.0, already transitive, added as direct dep — zero install cost). Back-compat preserved: missing `name` derives from the directory; no-frontmatter skills still load with empty description.
48
+
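Reviewer note: one possible reading of the HYBRID policy, sketched on already-parsed frontmatter so that no YAML parser is needed. Everything here is an assumption for illustration: the `validateSkill` name, the `ValidationResult` shape, and the contents of the whitelist (the real `ALLOWED_SKILL_PROPS` is not listed in this diff). Following the back-compat note, a missing `name` is treated as a soft warning and derived from the directory, while a wrongly typed `name` or an absent `description` is treated as a hard error.

```typescript
type Frontmatter = Record<string, unknown>;

interface ValidationResult {
  ok: boolean; // false means the skill would be excluded from discovery
  name: string;
  errors: string[];
  warnings: string[];
}

// Assumed whitelist, for illustration only.
const ALLOWED = new Set(["name", "description", "allowed-tools"]);

function validateSkill(fm: Frontmatter, dirName: string): ValidationResult {
  const errors: string[] = [];
  const warnings: string[] = [];
  let name = dirName;

  const rawName = fm["name"];
  if (rawName === undefined) {
    warnings.push(`missing name, derived "${dirName}" from directory`); // SOFT
  } else if (typeof rawName !== "string" || rawName.trim() === "") {
    errors.push("name must be a non-empty string"); // HARD
  } else {
    name = rawName;
  }

  const rawDescription = fm["description"];
  if (typeof rawDescription !== "string" || rawDescription.trim() === "") {
    errors.push("description must be a non-empty string"); // HARD
  }

  for (const key of Object.keys(fm)) {
    if (!ALLOWED.has(key)) warnings.push(`unknown property "${key}"`); // SOFT
  }
  return { ok: errors.length === 0, name, errors, warnings };
}
```

Returning warnings alongside errors, instead of throwing, is what lets a discovery pass keep the skill while still surfacing diagnostics.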
49
+ **Bonus value**: pre-flight on the real environment surfaced 2 user skills that were silently broken (`agent-browser`: `allowed-tools` wrong type; `spike-wrap-up`: `<>` in description).
50
+
51
+ ### L2 — Data-driven keybinding dispatch (commit 35fc3c6)
52
+
53
+ Replaced the 30-line imperative `if (includes(...))` chain in `dashboardActionForKey` with a single `for (const b of BINDINGS)` loop driven by a declarative `BINDINGS[]` table. Adding a key now means editing ONE place (the table) instead of two (table + dispatch) — removes the DRY violation that caused table-vs-dispatch drift. `KEY_RESERVED` is now exported and derived.
54
+
55
+ Behavior is **provably identical** to the old chain: a golden-snapshot parity test asserts every `(data, activePane)` pair returns the same action (~190 pairs). Pane-scoped bindings (`mailbox-detail`, `health-*`) precede their generic competitors so first-match-wins reproduces old precedence.
56
+
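Reviewer note: a small sketch of table-driven dispatch with first-match-wins, to make the precedence point concrete. The pane names, keys, and actions are invented for the example; only the overall shape (a `BINDINGS` table, a single loop, a derived `KEY_RESERVED`) follows the entry above.

```typescript
type Pane = "agents" | "mailbox-detail" | "health-summary";

interface Binding {
  keys: readonly string[];
  action: string;
  pane?: Pane; // when set, the binding only applies in that pane
}

// Pane-scoped rows come first so first-match-wins keeps them ahead of generic rows.
const BINDINGS: readonly Binding[] = [
  { keys: ["r"], action: "mailbox-reply", pane: "mailbox-detail" },
  { keys: ["r"], action: "refresh" },
  { keys: ["q", "\u001b"], action: "close" },
];

function actionForKey(data: string, activePane: Pane): string | undefined {
  for (const b of BINDINGS) {
    if (b.pane !== undefined && b.pane !== activePane) continue;
    if (b.keys.includes(data)) return b.action;
  }
  return undefined;
}

// Reserved keys are derived from the table instead of being maintained by hand.
const KEY_RESERVED = new Set<string>();
for (const b of BINDINGS) {
  for (const k of b.keys) KEY_RESERVED.add(k);
}
```

Since row order encodes precedence, a parity test over every `(data, activePane)` pair is a reasonable way to catch an accidental reordering.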
57
+ The `inTextInput` guard from the original plan was **intentionally skipped** — overlays are mutually exclusive and each handles its own input (`mailbox-compose-overlay.ts` captures every single-char key), so there is no leak path. Documented in the commit.
58
+
59
+ ### L1 — RunEventBus.onWithReplay catch-up primitive (commit a2a478b)
60
+
61
+ Closes the transient-subscriber-absence gap: when an overlay/widget is disposed and recreated (toggle, reconnect), live events emitted in that window are lost as notification triggers. `onWithReplay(runId, eventsPath, lastSeenSeq, callback)` replays missed events from the durable JSONL log before attaching the live listener, then dedups via `metadata.seq` so each event fires exactly once.
62
+
63
+ Unlike deer-flow's 256-event RAM ring buffer (lost on crash), this reuses pi-crew's existing `readEventsCursor` — O(new bytes) via byte-offset incremental reads, monotonic seq, tail-capped. Strictly better: survives crashes, bounded memory. `RunEventPayload` gains optional `seq`; `emitFromTeamEvent` stamps it.
64
+
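Reviewer note: the replay-then-attach pattern with seq-based dedup, reduced to an in-memory sketch. The real `onWithReplay(runId, eventsPath, lastSeenSeq, callback)` reads a JSONL log through `readEventsCursor`; here an array stands in for the durable log, and `MiniBus` is a hypothetical stand-in for `RunEventBus`.

```typescript
interface RunEvent {
  seq: number;
  type: string;
}
type Listener = (e: RunEvent) => void;

// Minimal stand-in for the event bus.
class MiniBus {
  private readonly listeners = new Set<Listener>();
  on(l: Listener): () => void {
    this.listeners.add(l);
    return () => {
      this.listeners.delete(l);
    };
  }
  emit(e: RunEvent): void {
    this.listeners.forEach((l) => l(e));
  }
}

function onWithReplay(
  bus: MiniBus,
  durableLog: readonly RunEvent[], // stand-in for the JSONL event log
  lastSeenSeq: number,
  cb: Listener,
): () => void {
  let highest = lastSeenSeq;
  const deliver: Listener = (e) => {
    if (e.seq <= highest) return; // already delivered, via replay or an earlier live emit
    highest = e.seq;
    cb(e);
  };
  for (const e of durableLog) deliver(e); // 1. catch up from the durable log
  return bus.on(deliver); // 2. then attach the live listener
}
```

Routing both the replayed and the live events through the same `deliver` function is what gives the exactly-once behaviour, under the assumption that `seq` is monotonic.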
65
+ The **primitive is landed + fully tested** (7 cases: replay order, dedup race, transient live-only, cursor bound, sinceSeq filter, missing-log fallback, unsubscribe). Dashboard wiring (switching `onAny()` → `onWithReplay()` per-run) is deferred — the dashboard subscribes across multiple runs and needs a subscription-model refactor; state isn't lost during absence anyway (`run-snapshot-cache` rebuilds from disk, TTL 1500ms).
66
+
67
+ ### L4 — Data-driven output thresholds + head/tail compaction (commit 463d08d)
68
+
69
+ Worker output was being truncated at 3 points with thresholds sized by guess, not data. Measured 27 real result artifacts: **max 9226 bytes, median 8272, 100% under 16KB**. The old thresholds cut **62% of real outputs** (head-only, no recovery path). This change sizes thresholds from that data and switches compaction from head-only to head+tail so closing markdown structure (code fences, headings) survives.
70
+
71
+ | Threshold | Before | After |
72
+ |---|---|---|
73
+ | `maxAssistantTextChars` | 8192 | **16384** |
74
+ | `maxToolResultChars` | 1024 | **8192** |
75
+ | `maxCompactContentChars` | 4096 | **8192** |
76
+ | `maxToolInputChars` | 2048 | **4096** |
77
+ | `readIfSmall` (3 inconsistent values) | 24K/40K/80K | **single 32KB** |
78
+ | Compaction shape | head-only | **head(75%)+tail(25%)** |
79
+
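Reviewer note: the head(75%)+tail(25%) shape from the table can be sketched as below. `compactHeadTail` and the marker text are illustrative, not the shipped implementation, and the marker itself is not counted against the budget in this simplified version.

```typescript
function compactHeadTail(text: string, maxChars: number, headRatio = 0.75): string {
  if (text.length <= maxChars) return text; // fits: returned unchanged
  const headLen = Math.floor(maxChars * headRatio);
  const tailLen = maxChars - headLen;
  const dropped = text.length - headLen - tailLen;
  const head = text.slice(0, headLen);
  // Keeping the tail is what lets closing code fences and final headings survive.
  const tail = tailLen > 0 ? text.slice(text.length - tailLen) : "";
  return `${head}\n[compacted ${dropped} chars]\n${tail}`;
}
```

With `maxChars` set to 8, a 20-character input keeps its first 6 and last 2 characters and reports 12 dropped.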
80
+ **Why not caveman-shrink** (the alternative considered): on the same 27 artifacts it achieved only 3.9% compression (vs 42% on prose fixtures), because pi-crew output is code-citation-heavy with little prose to strip. It also has a real data-loss bug: the `funccall` protected-pattern eats sentinel placeholders for the `identifier (inline-code)` pattern, which corrupted 24/27 files with null bytes and lost 127 inline codes. caveman's *concept* (detect/validate) is worth borrowing, but its engine doesn't fit pi-crew's content type. Threshold-only wins on the data.
81
+
82
+ ### Tests & verification
83
+
84
+ - 10 new L4 tests, 25 L3 validator tests, 7 L1 replay tests, 7 L2 parity tests.
85
+ - `npm run typecheck` + `check:lazy-imports` green.
86
+ - End-to-end team-run smoke tests confirm all 4 features load and run without crash.
87
+ - Real-world smoke scripts at `test/manual/l{1,2,3}-*-smoke.mjs`.
88
+ - Research artifacts at `source/deer-flow/.research/` + `.crew/research/worker-output-handling.md`.
89
+
90
+ ### Backward compatibility
91
+
92
+ All four changes are additive or behavior-preserving:
93
+ - L3: valid skills unaffected; only malformed ones now excluded (was: silent breakage).
94
+ - L2: golden-snapshot parity test proves identical dispatch.
95
+ - L1: new method added; existing `on`/`onAny`/`emit` unchanged.
96
+ - L4: outputs that fit (100% of measured real outputs) are unchanged; only oversized ones now keep head+tail instead of head-only.
97
+
3
98
  ## [v0.9.7] — round-18 + process-safety fix (2026-06-23)
4
99
 
5
100
  P2-3 feature: durable checkpoint + resume for dynamic-workflow runs. When a `.dwf.ts`
package/README.md CHANGED
@@ -39,9 +39,9 @@ npm: pi-crew
39
39
  repo: https://github.com/baphuongna/pi-crew
40
40
  ```
41
41
 
42
- **v0.9.4 / v0.9.5**: See [CHANGELOG.md](CHANGELOG.md).
42
+ **v0.9.4 / v0.9.5 / v0.9.8 / v0.9.9**: See [CHANGELOG.md](CHANGELOG.md).
43
43
 
44
- ### Highlights (v0.6.4 → v0.9.5)
44
+ ### Highlights (v0.6.4 → v0.9.9)
45
45
 
46
46
  A long arc of **trust, cliff-resilience, and robustness** work. Principle: *build
47
47
  trust and cliff-resilience, stay lean, delete before adding.*
@@ -198,6 +198,9 @@ background-dispatch discriminator.
198
198
  - **Health scoring** — penalty-based run health with time-series snapshots
199
199
  - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
200
200
  - **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `ctx.phase()` marks logical phases; **round-14** adds `ctx.log()` (durable `dwf.log` events), `ctx.budget` (per-workflow token budget that auto-rejects `ctx.agent()` when exhausted), and `ctx.args<T>()` (typed workflow arguments). TypeScript IntelliSense is available via `import type { WorkflowCtx } from "pi-crew/workflow"`. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
201
+ - **Strict SKILL.md validation** (L3, v0.9.8) — skills with malformed frontmatter (missing/malformed `name`/`description`, type mismatches) now **fail-fast at discovery** with visible diagnostics, instead of silently producing broken behavior at runtime. HYBRID policy: HARD on required fields, SOFT (warn) on unknown props for forward-compat. Surfaced via `buildSkillValidationDiagnostics()`.
202
+ - **Durable event replay** (L1, v0.9.8) — `RunEventBus.onWithReplay()` catches up a re-subscribing dashboard/overlay with events it missed during transient absence (toggle, reconnect), replaying from the durable JSONL log with seq-based dedup. No information loss even if the live subscriber was briefly gone.
203
+ - **Lossless-by-default output handling** (L4, v0.9.8) — worker output thresholds sized from measured data (100% of real outputs fit without compaction); when compaction is unavoidable it keeps head+tail (preserves closing code fences/headings) instead of head-only truncation. No more `[pi-crew compacted N chars]` markers eating the end of a worker's result.
201
204
 
202
205
  ---
203
206
 
@@ -468,6 +471,10 @@ pi-crew survives Pi's context compaction. When the context is compacted (auto or
468
471
  Context compacted. 1 pi-crew run(s) still in-flight — use team status to continue.
469
472
  ```
470
473
 
474
+ **Durable event replay** (v0.9.8, L1): even if a dashboard/overlay is briefly gone during compaction or a reconnect, a subscriber that uses `RunEventBus.onWithReplay()` is caught up with the events it missed, replayed from the durable JSONL log with seq-based dedup, so no notification trigger is lost. (Per the v0.9.8 changelog, per-run dashboard wiring is deferred; the primitive is available for any subscriber.)
475
+
476
+ **Lossless-by-default worker output** (v0.9.8, L4): output-handling thresholds are sized from measured real data (100% of real worker outputs fit without any compaction). When compaction *is* unavoidable, it keeps head+tail instead of head-only truncation, so closing code fences and headings survive — no more `[pi-crew compacted N chars]` markers eating the end of a result.
477
+
471
478
  ### Plan-level human-in-the-loop (HITL)
472
479
 
473
480
  Set `runtime.requirePlanApproval = true` to gate **any workflow** at the plan→execute boundary. After the read-only (planning) phases complete, the run pauses for explicit approval before mutating tasks run:
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-crew",
3
- "version": "0.9.7",
3
+ "version": "0.9.9",
4
4
  "description": "Pi extension for coordinated AI teams, workflows, worktrees, and async task orchestration",
5
5
  "author": "baphuongna",
6
6
  "license": "MIT",
@@ -90,7 +90,8 @@
90
90
  "ajv": "^8.20.0",
91
91
  "cli-highlight": "^2.1.11",
92
92
  "diff": "^5.2.0",
93
- "jiti": "^2.7.0"
93
+ "jiti": "^2.7.0",
94
+ "yaml": "^2.9.0"
94
95
  },
95
96
  "devDependencies": {
96
97
  "@biomejs/biome": "^2.4.15",
@@ -16,10 +16,14 @@ export const DEFAULT_CHILD_PI: Readonly<{
16
16
  // Keep this as a coarse stuck-worker guard rather than a short per-message latency budget.
17
17
  responseTimeoutMs: 5 * 60_000,
18
18
  maxCaptureBytes: 256 * 1024,
19
- maxAssistantTextChars: 8192,
20
- maxToolResultChars: 1024,
21
- maxToolInputChars: 2048,
22
- maxCompactContentChars: 4096,
19
+ // L4 output-handling: thresholds sized from real worker-output data
20
+ // (27 result artifacts measured: max 9226 bytes, median 8272, 100% < 16KB).
21
+ // Previous values (8192/1024/4096) truncated 62% of real results.
22
+ // See .crew/research/worker-output-handling.md + source/deer-flow/.research/.
23
+ maxAssistantTextChars: 16_384,
24
+ maxToolResultChars: 8_192,
25
+ maxToolInputChars: 4_096,
26
+ maxCompactContentChars: 8_192,
23
27
  };
24
28
 
25
29
  export const DEFAULT_LIVE_SESSION = {
@@ -56,7 +56,11 @@ import { createManifestCache } from "../runtime/manifest-cache.ts";
56
56
  import { CrewScheduler } from "../runtime/scheduler.ts";
57
57
  import { loadRunManifestById, updateRunStatus } from "../state/state-store.ts";
58
58
  import type { TeamRunManifest } from "../state/types.ts";
59
- import { SubagentManager } from "../subagents/manager.ts";
59
+ import {
60
+ SubagentManager,
61
+ readPersistedSubagentRecord,
62
+ } from "../subagents/manager.ts";
63
+ import { BatchBarrier, type BatchMember } from "../runtime/batch-barrier.ts";
60
64
  import { terminateActiveChildPiProcesses } from "../subagents/spawn.ts";
61
65
  import {
62
66
  type CrewWidgetState,
@@ -635,6 +639,7 @@ export function registerPiTeams(pi: ExtensionAPI): void {
635
639
  !cleanedUp &&
636
640
  currentCtx === ctx &&
637
641
  sessionGeneration === ownerGeneration;
642
+ const batchBarrier = new BatchBarrier();
638
643
  const subagentManager = new SubagentManager(
639
644
  4,
640
645
  (record) => {
@@ -651,22 +656,90 @@ export function registerPiTeams(pi: ExtensionAPI): void {
651
656
  durationMs: record.durationMs,
652
657
  });
653
658
  }
654
- if (!record.background || record.resultConsumed) return;
659
+ if (!record.background) return;
655
660
  if (!isOwnerSessionCurrent(record.ownerSessionGeneration)) return;
656
661
  if (
657
- record.status === "completed" ||
658
- record.status === "failed" ||
659
- record.status === "cancelled" ||
660
- record.status === "blocked" ||
661
- record.status === "error"
662
- ) {
662
+ record.status !== "completed" &&
663
+ record.status !== "failed" &&
664
+ record.status !== "cancelled" &&
665
+ record.status !== "blocked" &&
666
+ record.status !== "error"
667
+ )
668
+ return;
669
+ // Rule 2 (consume-race fix): this callback fires from inside the
670
+ // record.promise IIFE `finally` block — BEFORE the promise resolves,
671
+ // i.e. before a leader calling `get_subagent_result(wait:true)` can
672
+ // set resultConsumed=true. The old synchronous guard always saw
673
+ // resultConsumed=false here. Defer emission to a MACROTASK
674
+ // (setTimeout, NOT queueMicrotask): macrotasks run only after the
675
+ // microtask queue drains — which includes the leader's
676
+ // `await record.promise` continuation that marks resultConsumed=true.
677
+ // Then recheck in-memory + persisted before emitting.
678
+ const agentId = record.id;
679
+ const ownerGen = record.ownerSessionGeneration;
680
+ const agentStatus = record.status;
681
+ const agentType = record.type;
682
+ const agentDescription = record.description;
683
+ const agentRunId = record.runId;
684
+ const agentBatchId = record.batchId;
685
+ setTimeout(() => {
686
+ if (cleanedUp) return;
687
+ const fresh = subagentManager.getRecord(agentId);
688
+ const persisted = currentCtx
689
+ ? readPersistedSubagentRecord(currentCtx.cwd, agentId)
690
+ : undefined;
691
+ // Leader already joined the result -> suppress redundant notify.
692
+ if (fresh?.resultConsumed || persisted?.resultConsumed) return;
693
+ if (!isOwnerSessionCurrent(fresh?.ownerSessionGeneration ?? ownerGen))
694
+ return;
695
+ // Rule 1 (batch coalescing): if this agent belongs to a batch, never
696
+ // emit an individual notification. Instead record its terminal state
697
+ // in the barrier; emit ONE consolidated notification only when ALL
698
+ // members are terminal. Suppressed members wait silently.
699
+ if (agentBatchId) {
700
+ const member: BatchMember = {
701
+ id: agentId,
702
+ description: agentDescription,
703
+ type: agentType,
704
+ status: agentStatus,
705
+ };
706
+ const snap = batchBarrier.markTerminal(agentBatchId, member);
707
+ if (snap.allDone && !snap.notified) {
708
+ batchBarrier.markNotified(agentBatchId);
709
+ const roster = snap.terminal
710
+ .map(
711
+ (m) =>
712
+ `- ${m.id} [${m.status}] (${m.type ?? "agent"}): ${m.description ?? ""}`,
713
+ )
714
+ .join("\n");
715
+ const joinInstruction = [
716
+ `All ${snap.terminal.length} background subagents in batch "${agentBatchId}" have finished.`,
717
+ "Members:",
718
+ roster,
719
+ "",
720
+ `Call get_subagent_result for each agent_id above, read the outputs, then continue the user's original task.`,
721
+ ].join("\n");
722
+ sendAgentWakeUp(pi, joinInstruction);
723
+ notifyOperator({
724
+ id: `subagent-batch:${agentBatchId}:completed`,
725
+ severity: "info",
726
+ source: "subagent-completed",
727
+ runId: agentRunId,
728
+ title: `pi-crew batch "${agentBatchId}" complete (${snap.terminal.length} agents).`,
729
+ body: `Members: ${snap.terminal.map((m) => m.id).join(", ")}`,
730
+ });
731
+ }
732
+ // Either we just emitted the consolidated notify, or we are still
733
+ // waiting for other members — in both cases do NOT emit individual.
734
+ return;
735
+ }
663
736
  const metadata = JSON.stringify(
664
737
  {
665
- id: record.id,
666
- status: record.status,
667
- type: record.type,
668
- runId: record.runId,
669
- description: record.description,
738
+ id: agentId,
739
+ status: agentStatus,
740
+ type: agentType,
741
+ runId: agentRunId,
742
+ description: agentDescription,
670
743
  },
671
744
  null,
672
745
  2,
@@ -677,19 +750,18 @@ export function registerPiTeams(pi: ExtensionAPI): void {
677
750
  "```json",
678
751
  metadata,
679
752
  "```",
680
- `Call get_subagent_result with agent_id="${record.id}" now, read the output, then continue the user's original task without waiting for another user prompt.`,
753
+ `Call get_subagent_result with agent_id="${agentId}" now, read the output, then continue the user's original task without waiting for another user prompt.`,
681
754
  ].join("\n");
682
755
  sendAgentWakeUp(pi, joinInstruction);
683
756
  notifyOperator({
684
- id: `subagent:${record.id}:${record.status}`,
685
- severity:
686
- record.status === "completed" ? "info" : "warning",
757
+ id: `subagent:${agentId}:${agentStatus}`,
758
+ severity: agentStatus === "completed" ? "info" : "warning",
687
759
  source: "subagent-completed",
688
- runId: record.runId,
689
- title: `pi-crew subagent ${record.id} ${record.status}.`,
690
- body: `Use get_subagent_result with agent_id=${record.id} for output.`,
760
+ runId: agentRunId,
761
+ title: `pi-crew subagent ${agentId} ${agentStatus}.`,
762
+ body: `Use get_subagent_result with agent_id=${agentId} for output.`,
691
763
  });
692
- }
764
+ }, 0);
693
765
  },
694
766
  1000,
695
767
  (event, payload) => {
@@ -2044,6 +2116,7 @@ export function registerPiTeams(pi: ExtensionAPI): void {
2044
2116
  ownerSessionGeneration: captureSessionGeneration,
2045
2117
  startForegroundRun: (ctx, runner, runId) =>
2046
2118
  startForegroundRun(ctx as ExtensionContext, runner, runId),
2119
+ batchBarrier,
2047
2120
  });
2048
2121
  time("register.tools");
2049
2122
 
@@ -98,5 +98,6 @@ export function __test__subagentSpawnParams(params: Record<string, unknown>, ctx
98
98
  model: typeof params.model === "string" && params.model.trim() ? params.model.trim() : undefined,
99
99
  skill: parseSkillParam(params.skill),
100
100
  maxTurns: typeof params.max_turns === "number" && Number.isFinite(params.max_turns) ? params.max_turns : undefined,
101
+ batchId: typeof params.batch_id === "string" && params.batch_id.trim() ? params.batch_id.trim() : undefined,
101
102
  };
102
103
  }
@@ -15,6 +15,7 @@ async function handleTeamTool(params: Parameters<typeof HandleTeamToolFn>[0], ct
15
15
  }
16
16
  import { checkSubagentSpawnPermission, currentCrewRole } from "../../runtime/role-permission.ts";
17
17
  import { readPersistedSubagentRecord, savePersistedSubagentRecord, type SubagentManager, type SubagentSpawnOptions } from "../../subagents/manager.ts";
18
+ import type { BatchBarrier } from "../../runtime/batch-barrier.ts";
18
19
  import { loadConfig } from "../../config/config.ts";
19
20
  import { logInternalError } from "../../utils/internal-error.ts";
20
21
  import { __test__subagentSpawnParams, formatSubagentRecord, readSubagentRunResult, refreshPersistedSubagentRecord, subagentToolResult } from "./subagent-helpers.ts";
@@ -32,6 +33,9 @@ type OnUpdate = (chunk: { content: { type: "text"; text: string }[] }) => void;
32
33
  export interface SubagentToolRegistrationOptions {
33
34
  ownerSessionGeneration?: () => number;
34
35
  startForegroundRun?: (ctx: unknown, runner: (signal?: AbortSignal) => Promise<void>, runId?: string) => void;
36
+ /** Rule 1 batch barrier. When present, agents spawned with a batchId are
37
+ * registered here so their completion notifications are coalesced. */
38
+ batchBarrier?: BatchBarrier;
35
39
  }
36
40
 
37
41
  export function registerSubagentTools(pi: ExtensionAPI, subagentManager: SubagentManager, options: SubagentToolRegistrationOptions = {}): void {
@@ -53,6 +57,7 @@ export function registerSubagentTools(pi: ExtensionAPI, subagentManager: Subagen
53
57
  skill: Type.Optional(Type.Union([Type.String(), Type.Array(Type.String()), Type.Boolean()], { description: "Skill name(s) to inject for this subagent, or false to disable selected/default skills." })),
54
58
  max_turns: Type.Optional(Type.Number({ description: "Reserved for live-session subagents; child-process runtime may ignore this." })),
55
59
  run_in_background: Type.Optional(Type.Boolean({ description: "Run in background and return an agent ID immediately." })),
60
+ batch_id: Type.Optional(Type.String({ description: "Optional batch grouping id. Background agents sharing the same batch_id receive ONE consolidated completion notification when ALL members finish (instead of N individual notifications). Use this when launching several background agents in one turn and you do not join them immediately. Omit for the default individual-notification behavior." })),
56
61
  }) as never,
57
62
  async execute(_id, params, signal, onUpdate, ctx) {
58
63
  // Diagnostic: detect pre-aborted signal before spawn
@@ -71,6 +76,10 @@ export function registerSubagentTools(pi: ExtensionAPI, subagentManager: Subagen
71
76
  const ctxWithSession = withSessionId(ctx);
72
77
  const runner = async (currentOptions: SubagentSpawnOptions, childSignal?: AbortSignal) => handleTeamTool({ action: "run", agent: currentOptions.type, goal: currentOptions.prompt, model: currentOptions.model, skill: currentOptions.skill, async: currentOptions.background, config: currentOptions.maxTurns ? { runtime: { maxTurns: currentOptions.maxTurns } } : undefined } as TeamToolParamsValue, { ...ctxWithSession, signal: childSignal, ...(options.startForegroundRun ? { startForegroundRun: (runRunner: (sig?: AbortSignal) => Promise<void>, runId?: string) => options.startForegroundRun!(ctxWithSession, runRunner, runId) } : {}) });
73
78
  const record = subagentManager.spawn(spawnOptions, runner, spawnOptions.background ? undefined : signal);
79
+ // Rule 1: register batch membership so completions can be coalesced.
80
+ if (spawnOptions.batchId && spawnOptions.background) {
81
+ options.batchBarrier?.register(spawnOptions.batchId, record.id, { description: record.description, type: record.type });
82
+ }
74
83
  if (spawnOptions.background || record.status === "queued") {
75
84
  // Phase 1.1a: Terminate turn for background queued — no LLM follow-up needed.
76
85
  // Phase 1.6: Record was terminated for telemetry.
@@ -0,0 +1,145 @@
1
+ /**
2
+ * BatchBarrier — Rule 1 (no-wait batch grouping).
3
+ *
4
+ * When a leader launches several background subagents with the SAME `batchId`
5
+ * and does NOT join them immediately (`get_subagent_result(wait:true)`), the
6
+ * completion notifications are coalesced: instead of N individual
7
+ * "changed state" wake-ups, the leader receives ONE consolidated notification
8
+ * once ALL members of the batch have reached a terminal state.
9
+ *
10
+ * Semantics:
11
+ * - `register(batchId, agentId)` is called at spawn time (synchronous within a
12
+ * leader turn). All members of a batch are therefore known by the time the
13
+ * first completion fires (completion is observed via the 1000ms poll loop).
14
+ * - `markTerminal(batchId, member)` returns whether THIS completion made every
15
+ * registered member terminal ("allDone"). When allDone, the caller emits a
16
+ * single consolidated notification and calls `markNotified`.
17
+ * - If a member reaches terminal after the batch already notified (late spawn
18
+ * edge case), the straggler is NOT covered by a second consolidated
19
+ * notification, but the snapshot's `notified` flag lets the caller suppress
20
+ * stray individual notifications once the consolidated one fired.
21
+ *
22
+ * Thread-safety: single-threaded JS event loop. No locks needed.
23
+ */
24
+
25
+ export interface BatchMember {
26
+ id: string;
27
+ description?: string;
28
+ type?: string;
29
+ status: string;
30
+ }
31
+
32
+ export interface BatchSnapshot {
33
+ batchId: string;
34
+ members: BatchMember[];
35
+ terminal: BatchMember[];
36
+ /** true when every registered member has reached a terminal state. */
37
+ allDone: boolean;
38
+ /** true once the consolidated notification has been emitted. */
39
+ notified: boolean;
40
+ }
41
+
42
+ const TERMINAL_STATUSES = new Set([
43
+ "completed",
44
+ "failed",
45
+ "cancelled",
46
+ "error",
47
+ "stopped",
48
+ ]);
49
+
50
+ export function isTerminalStatus(status: string): boolean {
51
+ return TERMINAL_STATUSES.has(status);
52
+ }
53
+
54
+ export class BatchBarrier {
55
+ private readonly batches = new Map<
56
+ string,
57
+ {
58
+ members: Map<string, BatchMember>;
59
+ terminal: Map<string, BatchMember>;
60
+ notified: boolean;
61
+ }
62
+ >();
63
+
64
+ /** Register a member at spawn time. Idempotent per (batchId, agentId). */
65
+ register(batchId: string, agentId: string, meta?: { description?: string; type?: string }): void {
66
+ let batch = this.batches.get(batchId);
67
+ if (!batch) {
68
+ batch = { members: new Map(), terminal: new Map(), notified: false };
69
+ this.batches.set(batchId, batch);
70
+ }
71
+ if (!batch.members.has(agentId)) {
72
+ batch.members.set(agentId, {
73
+ id: agentId,
74
+ description: meta?.description,
75
+ type: meta?.type,
76
+ status: "running",
77
+ });
78
+ }
79
+ }
80
+
81
+ /**
82
+ * Record that a member reached a terminal state. Returns the batch snapshot.
83
+ * `snapshot.allDone` is true iff every registered member is now terminal.
84
+ * If the batch was never seen (defensive edge case), the member is registered
85
+ * on-the-fly as a batch-of-one so its terminal state is not silently lost.
86
+ */
87
+ markTerminal(batchId: string, member: BatchMember): BatchSnapshot {
88
+ let batch = this.batches.get(batchId);
89
+ if (!batch) {
90
+ batch = { members: new Map(), terminal: new Map(), notified: false };
91
+ this.batches.set(batchId, batch);
92
+ }
93
+ // Ensure the member is known (auto-register for the defensive case).
94
+ if (!batch.members.has(member.id)) {
95
+ batch.members.set(member.id, { ...member, status: member.status });
96
+ }
97
+ if (isTerminalStatus(member.status)) {
98
+ batch.terminal.set(member.id, { ...member });
99
+ const existing = batch.members.get(member.id);
100
+ if (existing) batch.members.set(member.id, { ...existing, status: member.status });
101
+ }
102
+ const allDone =
103
+ batch.members.size > 0 &&
104
+ [...batch.members.keys()].every((id) => batch.terminal.has(id));
105
+ return {
106
+ batchId,
107
+ members: [...batch.members.values()],
108
+ terminal: [...batch.terminal.values()],
109
+ allDone,
110
+ notified: batch.notified,
111
+ };
112
+ }
113
+
114
+ /** Has the consolidated notification already been emitted for this batch? */
115
+ alreadyNotified(batchId: string): boolean {
116
+ return this.batches.get(batchId)?.notified ?? false;
117
+ }
118
+
119
+ /** Mark the consolidated notification as emitted. No-op if already set. */
120
+ markNotified(batchId: string): void {
121
+ const batch = this.batches.get(batchId);
122
+ if (batch) batch.notified = true;
123
+ }
124
+
125
+ /** Read-only snapshot (for tests / debugging). */
126
+ snapshot(batchId: string): BatchSnapshot | undefined {
127
+ const batch = this.batches.get(batchId);
128
+ if (!batch) return undefined;
129
+ return {
130
+ batchId,
131
+ members: [...batch.members.values()],
132
+ terminal: [...batch.terminal.values()],
133
+ allDone:
134
+ batch.members.size > 0 &&
135
+ [...batch.members.keys()].every((id) => batch.terminal.has(id)),
136
+ notified: batch.notified,
137
+ };
138
+ }
139
+
140
+ /** Drop a batch (used on cleanup / test reset). */
141
+ dispose(batchId?: string): void {
142
+ if (batchId === undefined) this.batches.clear();
143
+ else this.batches.delete(batchId);
144
+ }
145
+ }
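The barrier contract above (register at spawn, `markTerminal` on each completion, notify once when `allDone` and not yet `notified`) can be sketched from the caller's side. This is a minimal illustration, not the package's actual wiring: `MiniBatchBarrier` is a condensed restatement of the class in this diff with the `description`/`type` metadata dropped, and `onAgentFinished` and `demoBatchCoalescing` are hypothetical names standing in for whatever the poll loop calls.

```typescript
type Member = { id: string; status: string };
type Snapshot = { members: Member[]; terminal: Member[]; allDone: boolean; notified: boolean };
type Batch = { members: Map<string, Member>; terminal: Map<string, Member>; notified: boolean };

const TERMINAL = new Set(["completed", "failed", "cancelled", "error", "stopped"]);

// Condensed restatement of the barrier in the diff (metadata fields omitted).
class MiniBatchBarrier {
  private readonly batches = new Map<string, Batch>();

  // Look up a batch, creating an empty one on first use.
  private batchFor(batchId: string): Batch {
    let batch = this.batches.get(batchId);
    if (!batch) {
      batch = { members: new Map(), terminal: new Map(), notified: false };
      this.batches.set(batchId, batch);
    }
    return batch;
  }

  register(batchId: string, agentId: string): void {
    const batch = this.batchFor(batchId);
    if (!batch.members.has(agentId)) batch.members.set(agentId, { id: agentId, status: "running" });
  }

  markTerminal(batchId: string, member: Member): Snapshot {
    const batch = this.batchFor(batchId);
    if (!batch.members.has(member.id)) batch.members.set(member.id, { ...member });
    if (TERMINAL.has(member.status)) {
      batch.terminal.set(member.id, { ...member });
      batch.members.set(member.id, { ...member });
    }
    const allDone =
      batch.members.size > 0 &&
      Array.from(batch.members.keys()).every((id) => batch.terminal.has(id));
    return {
      members: Array.from(batch.members.values()),
      terminal: Array.from(batch.terminal.values()),
      allDone,
      notified: batch.notified,
    };
  }

  markNotified(batchId: string): void {
    const batch = this.batches.get(batchId);
    if (batch) batch.notified = true;
  }
}

// Hypothetical completion handler: emits at most one message per batch.
function onAgentFinished(barrier: MiniBatchBarrier, batchId: string, member: Member, sent: string[]): void {
  const snap = barrier.markTerminal(batchId, member);
  if (snap.allDone && !snap.notified) {
    sent.push(`batch ${batchId}: ${snap.terminal.length}/${snap.members.length} agents finished`);
    barrier.markNotified(batchId);
  }
}

function demoBatchCoalescing(): string[] {
  const barrier = new MiniBatchBarrier();
  const sent: string[] = [];
  for (const id of ["a1", "a2", "a3"]) barrier.register("batch-1", id);
  onAgentFinished(barrier, "batch-1", { id: "a1", status: "completed" }, sent); // 1 of 3 terminal
  onAgentFinished(barrier, "batch-1", { id: "a2", status: "failed" }, sent); // 2 of 3 terminal
  onAgentFinished(barrier, "batch-1", { id: "a3", status: "completed" }, sent); // all terminal: notify once
  onAgentFinished(barrier, "batch-1", { id: "a3", status: "completed" }, sent); // repeated poll: suppressed
  return sent;
}
```

Checking `snap.notified` before emitting is what keeps a repeated poll observation of the same terminal agent from producing a second consolidated message.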
@@ -2,7 +2,8 @@ import type { AgentConfig, ResourceSource } from "../agents/agent-config.ts";
2
2
  import { discoverAgents } from "../agents/discover-agents.ts";
3
3
  import { discoverTeams } from "../teams/discover-teams.ts";
4
4
  import { discoverWorkflows } from "../workflows/discover-workflows.ts";
5
- import { discoverSkills } from "../skills/discover-skills.ts";
5
+ import { discoverSkills, getLastDiscoveryDiagnostics } from "../skills/discover-skills.ts";
6
+ import type { SkillValidationError } from "../skills/validate.ts";
6
7
  import type { PiTeamsConfig } from "../config/config.ts";
7
8
 
8
9
  export type CapabilityKind = "team" | "workflow" | "agent" | "skill" | "tool" | "runtime";
@@ -114,3 +115,21 @@ export function buildCapabilityInventory(cwd: string, config?: PiTeamsConfig): C
114
115
 
115
116
  return items.sort((a, b) => a.id.localeCompare(b.id));
116
117
  }
118
+
119
+ /**
120
+ * L3: surface skill-validation diagnostics from the most recent
121
+ * `discoverSkills()` call. Skills that fail HARD validation are silently
122
+ * excluded from `buildCapabilityInventory()`; this function exposes the
123
+ * underlying errors so users see WHY a skill is missing instead of just
124
+ * noticing the absence.
125
+ *
126
+ * Soft warnings (unknown props, derived-name fallback) are also returned so
127
+ * skill authors can clean up their frontmatter over time.
128
+ *
129
+ * IMPORTANT: `discoverSkills()` is internally cached for 30s, so this
130
+ * function returns diagnostics from whichever call last populated the cache.
131
+ * Call `buildCapabilityInventory(cwd)` first to ensure a fresh pass.
132
+ */
133
+ export function buildSkillValidationDiagnostics(): SkillValidationError[] {
134
+ return getLastDiscoveryDiagnostics();
135
+ }
@@ -12,6 +12,7 @@ import { attachPostExitStdioGuard, trySignalChild } from "./post-exit-stdio-guar
12
12
  import { redactJsonLine } from "../utils/redaction.ts";
13
13
  import { sanitizeEnvSecrets } from "../utils/env-filter.ts";
14
14
  import { registerChildProcess, unregisterChildProcess } from "../extension/crew-cleanup.ts";
15
+ import { classifyProcessCrash } from "./crash-classification.ts";
15
16
  import { resolveRealContainedPath } from "../utils/safe-paths.ts";
16
17
 
17
18
  const POST_EXIT_STDIO_GUARD_MS = DEFAULT_CHILD_PI.postExitStdioGuardMs;
@@ -380,7 +381,14 @@ function appendTranscript(input: ChildPiRunInput, line: string): void {
380
381
 
381
382
  function compactString(value: string, maxChars = MAX_COMPACT_CONTENT_CHARS): string {
382
383
  if (value.length <= maxChars) return value;
383
- return `${value.slice(0, maxChars)}\n[pi-crew compacted ${value.length - maxChars} chars]`;
384
+ // L4: head + tail instead of head-only. Keeps closing markdown structure
385
+ // (code fences, headings, list tails) instead of dropping them — the old
386
+ // head-only slice left unclosed ``` fences that downstream parsers and
387
+ // output-validator.ts flagged as "output may be truncated". Head gets 75%
388
+ // (opening structure + bulk of content); tail gets 25% (closing structure).
389
+ const head = Math.floor(maxChars * 0.75);
390
+ const tail = maxChars - head;
391
+ return `${value.slice(0, head)}\n...[pi-crew compacted ${value.length - maxChars} chars, head+tail preserved]...\n${value.slice(-tail)}`;
384
392
  }
385
393
 
386
394
  function compactValue(value: unknown): unknown {
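The 75/25 head+tail split in the `compactString` hunk above can be tried in isolation. The sketch below mirrors the arithmetic from the diff under a hypothetical name (`compactHeadTail`), with `maxChars` made a required parameter and a shortened marker text; it is an illustration, not the shipped function.

```typescript
// Standalone sketch of the head+tail compaction shown in the diff.
// `compactHeadTail` is a hypothetical name; the marker wording is shortened.
function compactHeadTail(value: string, maxChars: number): string {
  if (value.length <= maxChars) return value;
  const head = Math.floor(maxChars * 0.75); // opening structure and bulk of content
  const tail = maxChars - head; // closing structure (fences, list tails)
  const dropped = value.length - maxChars;
  // slice(value.length - tail) instead of slice(-tail): the latter returns the
  // whole string when tail is 0, because -0 is treated as index 0.
  return `${value.slice(0, head)}\n...[compacted ${dropped} chars]...\n${value.slice(value.length - tail)}`;
}

const compacted = compactHeadTail("0123456789ABCDEFGHIJ", 8);
// head = 6 chars ("012345"), tail = 2 chars ("IJ"), 12 chars dropped
```

The only behavioral difference from the diff's `value.slice(-tail)` is the degenerate `maxChars = 0` case, which the default `MAX_COMPACT_CONTENT_CHARS` presumably never hits.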
@@ -905,7 +913,7 @@ export async function runChildPi(input: ChildPiRunInput): Promise<ChildPiRunResu
905
913
  } catch (err) {
906
914
  logInternalError("child-pi.on-lifecycle-event", err, `event=error, pid=${child.pid}`);
907
915
  }
908
- settle({ exitCode: null, stdout, stderr, error: processError.message });
916
+ settle({ exitCode: null, stdout, stderr, error: processError.message, exitStatus: { exitCode: null, cancelled: abortRequested, timedOut: responseTimeoutHit, killed: false, cleanupErrors, finalDrainMs, crashClass: classifyProcessCrash({ exitCode: null, cancelled: abortRequested, timedOut: responseTimeoutHit, spawnError: error, stderrSnippet: stderr ? stderr.slice(-1000) : undefined }).crashClass } });
909
917
  });
910
918
  child.on("exit", (code, signal) => {
911
919
  if (child.pid) {
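The body of `classifyProcessCrash()` is not part of this diff, so the following is only a sketch of how a pure classifier with the changelog's stated precedence (timeout > cancelled > spawn_error > native_panic > signal) might look. The input fields and the `.crashClass` result property are taken from the call sites in this diff; the function name `classifyCrashSketch`, the `PANIC_MARKERS` list, and the omission of any `protocol_exit` rule are assumptions made for illustration.

```typescript
type CrashClass =
  | "clean_exit" | "non_zero_exit" | "signal_exit" | "timeout" | "cancelled"
  | "spawn_error" | "protocol_exit" | "native_panic" | "unknown";

// Field names follow the call sites in the diff; optionality is assumed.
interface CrashInput {
  exitCode: number | null;
  signal?: string;
  cancelled?: boolean;
  timedOut?: boolean;
  killed?: boolean;
  spawnError?: unknown;
  stderrSnippet?: string;
}

// ASSUMPTION: illustrative markers only; the real panic detector is not shown.
const PANIC_MARKERS = ["panicked at", "fatal runtime error"];

// Hypothetical stand-in for classifyProcessCrash(); "protocol_exit" is never
// produced here because its trigger is not described in the diff.
function classifyCrashSketch(input: CrashInput): { crashClass: CrashClass } {
  const stderr = input.stderrSnippet ?? "";
  if (input.timedOut) return { crashClass: "timeout" };
  if (input.cancelled) return { crashClass: "cancelled" };
  if (input.spawnError !== undefined && input.spawnError !== null) return { crashClass: "spawn_error" };
  if (PANIC_MARKERS.some((marker) => stderr.indexOf(marker) !== -1)) return { crashClass: "native_panic" };
  if (input.signal) return { crashClass: "signal_exit" };
  if (input.exitCode === 0) return { crashClass: "clean_exit" };
  if (typeof input.exitCode === "number") return { crashClass: "non_zero_exit" };
  return { crashClass: "unknown" };
}
```

Ordering the checks as early returns makes the precedence explicit: a run that both timed out and was cancelled is reported as `timeout`, and a signal is only reported when nothing more specific applies.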
@@ -994,7 +1002,19 @@ export async function runChildPi(input: ChildPiRunInput): Promise<ChildPiRunResu
994
1002
  // is logged, not fatal). The steerError branch is retained for safety in
995
1003
  // case a future change reintroduces a fatal steer path.
996
1004
  const steerError = steerInjectionFailed ? "Steer injection failed due to stdin backpressure; process killed" : undefined;
997
- settle({ exitCode: finalExitCode, stdout, stderr, ...(timeoutError ? { error: timeoutError.error } : {}), ...(steerError ? { error: steerError } : {}), aborted: wasGraceAborted || wasParentAborted, steered: softLimitReached && !wasGraceAborted, exitStatus: { exitCode: finalExitCode, cancelled: abortRequested, timedOut: responseTimeoutHit, killed: hardKilled, cleanupErrors, finalDrainMs } });
1005
+ // P0 crash taxonomy: classify the exit so callers/dashboards can bucket
1006
+ // failure modes (timeout vs cancel vs native panic vs signal …).
1007
+ // The classifier is a pure function; this is the single integration point.
1008
+ const crashClassification = classifyProcessCrash({
1009
+ exitCode: finalExitCode,
1010
+ signal: child.signalCode ?? undefined,
1011
+ cancelled: abortRequested,
1012
+ timedOut: responseTimeoutHit,
1013
+ killed: hardKilled,
1014
+ spawnError: undefined,
1015
+ stderrSnippet: stderr ? stderr.slice(-1000) : undefined,
1016
+ });
1017
+ settle({ exitCode: finalExitCode, stdout, stderr, ...(timeoutError ? { error: timeoutError.error } : {}), ...(steerError ? { error: steerError } : {}), aborted: wasGraceAborted || wasParentAborted, steered: softLimitReached && !wasGraceAborted, exitStatus: { exitCode: finalExitCode, cancelled: abortRequested, timedOut: responseTimeoutHit, killed: hardKilled, cleanupErrors, finalDrainMs, crashClass: crashClassification.crashClass } });
998
1018
  });
999
1019
  });
1000
1020
  } finally {