npm - pi-crew - Versions diffs - 0.9.7 → 0.9.9 - Mend

pi-crew 0.9.7 → 0.9.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

package/CHANGELOG.md +95 -0
package/README.md +9 -2
package/package.json +3 -2
package/src/config/defaults.ts +8 -4
package/src/extension/register.ts +94 -21
package/src/extension/registration/subagent-helpers.ts +1 -0
package/src/extension/registration/subagent-tools.ts +9 -0
package/src/runtime/batch-barrier.ts +145 -0
package/src/runtime/capability-inventory.ts +20 -1
package/src/runtime/child-pi.ts +23 -3
package/src/runtime/crash-classification.ts +208 -0
package/src/runtime/custom-tools/irc-tool.ts +47 -7
package/src/runtime/live-agent-manager.ts +185 -0
package/src/runtime/process-lifecycle.ts +481 -0
package/src/runtime/subagent-manager.ts +6 -0
package/src/runtime/task-output-context.ts +77 -10
package/src/runtime/tool-output-pruner.ts +334 -0
package/src/skills/discover-skills.ts +61 -8
package/src/skills/validate.ts +267 -0
package/src/state/types.ts +5 -0
package/src/ui/keybinding-map.ts +128 -41
package/src/ui/run-event-bus.ts +83 -0

package/CHANGELOG.md CHANGED

@@ -1,5 +1,100 @@
 # Changelog
+## [v0.9.9] — gajae-code distillation (4 P0) + notification race fix (2026-06-25)
+Six changes: four high-impact/low-effort features distilled from researching [Yeachan-Heo/gajae-code](https://github.com/Yeachan-Heo/gajae-code) (full report: `research-findings/gajae-code-distill.md`), plus a fix for a redundant-notification bug the leader directly hit while running that research. Each was calibrated against real pi-crew code — two reported "gaps" turned out to be patterns pi-crew already implements (prompt-level stablePrefix, detached spawning), and four areas where pi-crew is already superior were deliberately left untouched (crash-recovery byte-offset cursor, declarative workflow + semaphores, run-snapshot-cache, event sourcing).
+### P0 #1 — Crash classification taxonomy (commit fb8c4a8)
+`child-pi.ts` captured stderr/exit codes but never bucketed failure modes. New pure `classifyProcessCrash()` (port of gajae-code's `crash-diagnostics.ts`) maps exits to 9 semantic classes (`clean_exit | non_zero_exit | signal_exit | timeout | cancelled | spawn_error | protocol_exit | native_panic | unknown`) with precedence timeout > cancelled > spawn_error > native_panic > signal. Attached to `WorkerExitStatus` at both settle paths; kill/drain/timeout logic untouched. 30 unit tests.
+### P0 #2 — Staleness-aware tool output pruning (commit 13acf37)
+The L4 size-based compaction retained every copy of a re-read file until a size threshold tripped. New `tool-output-pruner.ts` (port of gajae-code's `pruning.ts`) drops superseded tool results — same-file re-reads and read-then-edit — **before** they are injected into a downstream worker's prompt via `task-output-context.ts`. Replaces stale content with a digest notice (first/last lines + count for bash/grep/search). OPT-IN via `DEFAULT_PRUNE_CONFIG`; does NOT regress the L4 head+tail(75/25) behavior. 25 unit tests.
+### P0 #3 — OwnedProcess abstraction (commit fe3bdde)
+Background spawns used `detached:true` but had no unified ownership primitive for guaranteed teardown. New `process-lifecycle.ts` adds `OwnedProcess` (escalating SIGTERM → grace → SIGKILL; Windows `taskkill /F /T /PID` fallback; idempotent `dispose()`; bounded `awaitExit()`; `onExit`) plus `registerResourceOwner()`/`disposeAllOwners()` for non-process resources (timers, sockets, Workers) and root-exit drain reconciliation. **Incremental adoption** — deliberately NOT migrating `child-pi.ts`'s battle-tested `killProcessTree`/post-exit-stdio-guard/hard-kill-timer or `async-runner.ts`'s intentionally-detached background spawns; the primitive is available for future ownership-scoped spawns (MCP/LSP/DAP servers, eval workers). 22 unit tests.
+### P0 #4 — IRC reply support, side-channel Q&A (commit 43bcd65)
+The `irc-tool` was fire-and-forget despite an `awaitReply` param marked "Not yet supported". New `respondAsBackground()` on `live-agent-manager.ts` delivers a DM to a recipient's session **without blocking its main loop** (`sendCustomMessage({triggerTurn:false})`) and awaits an event-driven, timeout-bounded reply via an in-memory pending-reply registry keyed by correlation id. `awaitReply:true` DMs now route through this side-channel and return reply content; broadcast stays fire-and-forget. Coexists with mailbox.ts's existing file-based reply fields (cross-process). 10 unit tests.
+### Notification race fix — Rule 2 + Rule 1 (commits 592d9ea, c22cbb9)
+While running the gajae-code research, the leader observed redundant "background subagent changed state" notifications arriving a turn late, after results were already read. Root cause: the completion callback (`SubagentManager.onComplete`) fires from inside the `record.promise` IIFE `finally` block — **before the promise resolves** — so a leader calling `get_subagent_result(wait:true)` sets `resultConsumed=true` only afterward, and the synchronous `if (record.resultConsumed) return` guard always saw `false`. A latent test bug (`assert sentMessages.length === 0` on an array that was unconditionally empty because `sendAgentWakeUp` prefers `sendUserMessage`) masked it.
+- **Rule 2** (592d9ea): defer notification emission to a `setTimeout(0)` **macrotask** (not `queueMicrotask` — microtasks queued in the finally run before the promise-resolution microtask), then recheck `resultConsumed` (in-memory `getRecord` + persisted `readPersistedSubagentRecord`) before emitting; suppress if already consumed. Covers all three `onComplete` call sites via the single emit point. Fixed the test assertion to `sentUserMessages` and added two explicit regression tests (notify still fires when leader does NOT pre-consume; notify suppressed when leader pre-consumes via `wait:true`).
+- **Rule 1** (c22cbb9): new `BatchBarrier` registry + optional `batch_id` param on the Agent tool. Background agents sharing a `batch_id` never emit individual notifications; instead each completion is recorded in the barrier and **one consolidated** "All N background subagents in batch \"X\" have finished" notification fires exactly once when every member reaches a terminal state (`blocked` is NOT terminal — a blocked agent resumes later). Verified end-to-end with 1/2/5-agent batches (one with a queued member and staggered 10–25s sleeps): exactly 1 consolidated notification, 0 individual leaks. Composes with Rule 2 (a batched agent whose result was already consumed is still suppressed via the `resultConsumed` recheck). 10 unit tests + 2 integration tests.
+Design doc: `research-findings/subagent-notification-race-fix.md`.
+### What was NOT adopted (pi-crew already superior)
+Crash recovery (`crash-recovery.ts`, 421 lines, byte-offset event-log cursor), declarative workflow + semaphores, `run-snapshot-cache` (disk rebuild, TTL 1500ms), and event sourcing (`readEventsCursor`) are all more sophisticated than gajae-code's equivalents and were left intact.
+## [v0.9.8] — deer-flow learning integration: L1/L2/L3/L4 (2026-06-24)
+Four improvements distilled from researching [bytedance/deer-flow](https://github.com/bytedance/deer-flow) and the wider Pi-ecosystem (pi-boomerang, pi-subagents, pi-dynamic-workflows). Each was calibrated against real pi-crew code (the research over-reported gaps — several patterns pi-crew already does *better* than deer-flow) and sized from measured data, not guesses.
+### L3 — Strict SKILL.md frontmatter validation (commit 5348c47)
+Malformed skills now **fail-fast at discovery** instead of silently producing broken behavior at runtime. New `src/skills/validate.ts` validates frontmatter against the `ALLOWED_SKILL_PROPS` whitelist using a **HYBRID policy**:
+- **HARD errors** (missing/malformed `name` or `description`, type mismatches) → skill excluded from `discoverSkills()`.
+- **SOFT warnings** (unknown props like `origin`/`triggers`, missing `name` derived from directory) → skill kept, surfaced via `getLastDiscoveryDiagnostics()` / `buildSkillValidationDiagnostics()`.
+Replaces the fragile line-prefix parser (broke on multi-line folded scalars `description: >`, quoted strings, nested YAML) with the `yaml` package (^2.9.0, already transitive, added as direct dep — zero install cost). Back-compat preserved: missing `name` derives from the directory; no-frontmatter skills still load with empty description.
+**Bonus value**: pre-flight on the real environment surfaced 2 user skills that were silently broken (`agent-browser`: `allowed-tools` wrong type; `spike-wrap-up`: `<>` in description).
+### L2 — Data-driven keybinding dispatch (commit 35fc3c6)
+Replaced the 30-line imperative `if (includes(...))` chain in `dashboardActionForKey` with a single `for (const b of BINDINGS)` loop driven by a declarative `BINDINGS[]` table. Adding a key now means editing ONE place (the table) instead of two (table + dispatch) — removes the DRY violation that caused table-vs-dispatch drift. `KEY_RESERVED` is now exported and derived.
+Behavior is **provably identical** to the old chain: a golden-snapshot parity test asserts every `(data, activePane)` pair returns the same action (~190 pairs). Pane-scoped bindings (`mailbox-detail`, `health-*`) precede their generic competitors so first-match-wins reproduces old precedence.
+The `inTextInput` guard from the original plan was **intentionally skipped** — overlays are mutually exclusive and each handles its own input (`mailbox-compose-overlay.ts` captures every single-char key), so there is no leak path. Documented in the commit.
+### L1 — RunEventBus.onWithReplay catch-up primitive (commit a2a478b)
+Closes the transient-subscriber-absence gap: when an overlay/widget is disposed and recreated (toggle, reconnect), live events emitted in that window are lost as notification triggers. `onWithReplay(runId, eventsPath, lastSeenSeq, callback)` replays missed events from the durable JSONL log before attaching the live listener, then dedups via `metadata.seq` so each event fires exactly once.
+Unlike deer-flow's 256-event RAM ring buffer (lost on crash), this reuses pi-crew's existing `readEventsCursor` — O(new bytes) via byte-offset incremental reads, monotonic seq, tail-capped. Strictly better: survives crashes, bounded memory. `RunEventPayload` gains optional `seq`; `emitFromTeamEvent` stamps it.
+The **primitive is landed + fully tested** (7 cases: replay order, dedup race, transient live-only, cursor bound, sinceSeq filter, missing-log fallback, unsubscribe). Dashboard wiring (switching `onAny()` → `onWithReplay()` per-run) is deferred — the dashboard subscribes across multiple runs and needs a subscription-model refactor; state isn't lost during absence anyway (`run-snapshot-cache` rebuilds from disk, TTL 1500ms).
+### L4 — Data-driven output thresholds + head/tail compaction (commit 463d08d)
+Worker output was being truncated at 3 points with thresholds sized by guess, not data. Measured 27 real result artifacts: **max 9226 bytes, median 8272, 100% under 16KB**. The old thresholds cut **62% of real outputs** (head-only, no recovery path). This change sizes thresholds from that data and switches compaction from head-only to head+tail so closing markdown structure (code fences, headings) survives.
+| Threshold | Before | After |
+|---|---|---|
+| `maxAssistantTextChars` | 8192 | **16384** |
+| `maxToolResultChars` | 1024 | **8192** |
+| `maxCompactContentChars` | 4096 | **8192** |
+| `maxToolInputChars` | 2048 | **4096** |
+| `readIfSmall` (3 inconsistent values) | 24K/40K/80K | **single 32KB** |
+| Compaction shape | head-only | **head(75%)+tail(25%)** |
+**Why not caveman-shrink** (the alternative considered): tested it on the same 27 artifacts — only 3.9% compression (vs 42% on prose fixtures) because pi-crew output is code-citation-heavy with little prose to strip, AND it has a real data-loss bug (`funccall` protected-pattern eats sentinel placeholders for the `identifier (inline-code)` pattern, corrupting 24/27 files with null bytes, 127 inline codes lost). caveman's *concept* (detect/validate) is worth borrowing but its engine doesn't fit pi-crew's content type. Threshold-only wins on the data.
+### Tests & verification
+- 10 new L4 tests, 25 L3 validator tests, 7 L1 replay tests, 7 L2 parity tests.
+- `npm run typecheck` + `check:lazy-imports` green.
+- End-to-end team-run smoke tests confirm all 4 features load and run without crash.
+- Real-world smoke scripts at `test/manual/l{1,2,3}-*-smoke.mjs`.
+- Research artifacts at `source/deer-flow/.research/` + `.crew/research/worker-output-handling.md`.
+### Backward compatibility
+All four changes are additive or behavior-preserving:
+- L3: valid skills unaffected; only malformed ones now excluded (was: silent breakage).
+- L2: golden-snapshot parity test proves identical dispatch.
+- L1: new method added; existing `on`/`onAny`/`emit` unchanged.
+- L4: outputs that fit (100% of measured real outputs) are unchanged; only oversized ones now keep head+tail instead of head-only.
 ## [v0.9.7] — round-18 + process-safety fix (2026-06-23)
 P2-3 feature: durable checkpoint + resume for dynamic-workflow runs. When a `.dwf.ts`
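The precedence listed under P0 #1 (timeout > cancelled > spawn_error > native_panic > signal) can be sketched as a pure function. This is a minimal sketch, not the package's `classifyProcessCrash()`: the `ExitFacts` shape, the name `classifyExitSketch`, the stderr regex, and the placement of `protocol_exit` and the exit-code classes after `signal_exit` are all assumptions made for illustration. Only the nine class names and the stated precedence come from the changelog.

```typescript
type CrashClass =
  | "clean_exit" | "non_zero_exit" | "signal_exit" | "timeout" | "cancelled"
  | "spawn_error" | "protocol_exit" | "native_panic" | "unknown";

// Hypothetical input shape; the real WorkerExitStatus fields are not shown in this diff.
interface ExitFacts {
  timedOut?: boolean;
  cancelled?: boolean;
  spawnError?: boolean;
  stderr?: string;
  signal?: string | null;
  protocolViolation?: boolean;
  exitCode?: number | null;
}

// Checks run in precedence order, so the first matching condition wins.
function classifyExitSketch(f: ExitFacts): CrashClass {
  if (f.timedOut) return "timeout";
  if (f.cancelled) return "cancelled";
  if (f.spawnError) return "spawn_error";
  // Assumed heuristic: treat well-known runtime panic strings as a native panic.
  if (f.stderr && /panicked at|segmentation fault/i.test(f.stderr)) return "native_panic";
  if (f.signal) return "signal_exit";
  // Assumed ordering below this point (not stated in the changelog).
  if (f.protocolViolation) return "protocol_exit";
  if (f.exitCode === 0) return "clean_exit";
  if (typeof f.exitCode === "number") return "non_zero_exit";
  return "unknown";
}
```

Because a timed-out worker is usually also killed by a signal, checking `timedOut` first is what keeps such an exit from being reported as a plain `signal_exit`.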

package/README.md CHANGED

@@ -39,9 +39,9 @@ npm: pi-crew
 repo: https://github.com/baphuongna/pi-crew
 ```
-**v0.9.4 / v0.9.5**: See [CHANGELOG.md](CHANGELOG.md).
+**v0.9.4 / v0.9.5 / v0.9.8 / v0.9.9**: See [CHANGELOG.md](CHANGELOG.md).
-### Highlights (v0.6.4 → v0.9.5)
+### Highlights (v0.6.4 → v0.9.9)
 A long arc of **trust, cliff-resilience, and robustness** work. Principle: *build
 trust and cliff-resilience, stay lean, delete before adding.*
@@ -198,6 +198,9 @@ background-dispatch discriminator.
 - **Health scoring** — penalty-based run health with time-series snapshots
 - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
 - **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `ctx.phase()` marks logical phases; **round-14** adds `ctx.log()` (durable `dwf.log` events), `ctx.budget` (per-workflow token budget that auto-rejects `ctx.agent()` when exhausted), and `ctx.args<T>()` (typed workflow arguments). TypeScript IntelliSense is available via `import type { WorkflowCtx } from "pi-crew/workflow"`. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
+- **Strict SKILL.md validation** (L3, v0.9.8) — skills with malformed frontmatter (missing/malformed `name`/`description`, type mismatches) now **fail-fast at discovery** with visible diagnostics, instead of silently producing broken behavior at runtime. HYBRID policy: HARD on required fields, SOFT (warn) on unknown props for forward-compat. Surfaced via `buildSkillValidationDiagnostics()`.
+- **Durable event replay** (L1, v0.9.8) — `RunEventBus.onWithReplay()` catches up a re-subscribing dashboard/overlay with events it missed during transient absence (toggle, reconnect), replaying from the durable JSONL log with seq-based dedup. No information loss even if the live subscriber was briefly gone.
+- **Lossless-by-default output handling** (L4, v0.9.8) — worker output thresholds sized from measured data (100% of real outputs fit without compaction); when compaction is unavoidable it keeps head+tail (preserves closing code fences/headings) instead of head-only truncation. No more `[pi-crew compacted N chars]` markers eating the end of a worker's result.
 ---
@@ -468,6 +471,10 @@ pi-crew survives Pi's context compaction. When the context is compacted (auto or
 Context compacted. 1 pi-crew run(s) still in-flight — use team status to continue.
 ```
+**Durable event replay** (v0.9.8, L1): even if a dashboard/overlay is briefly gone during compaction or a reconnect, `RunEventBus.onWithReplay()` catches it up with the events it missed, replaying from the durable JSONL log with seq-based dedup — no information loss. (The dashboard wires this up per-run; the primitive is available for any subscriber.)
+**Lossless-by-default worker output** (v0.9.8, L4): output-handling thresholds are sized from measured real data (100% of real worker outputs fit without any compaction). When compaction *is* unavoidable, it keeps head+tail instead of head-only truncation, so closing code fences and headings survive — no more `[pi-crew compacted N chars]` markers eating the end of a result.
 ### Plan-level human-in-the-loop (HITL)
 Set `runtime.requirePlanApproval = true` to gate **any workflow** at the plan→execute boundary. After the read-only (planning) phases complete, the run pauses for explicit approval before mutating tasks run:
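The replay-then-subscribe idea behind `RunEventBus.onWithReplay()` (replay missed events from the durable log, then attach the live listener, deduplicating by `seq`) can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions: `MiniBus`, `SeqEvent`, and `onWithReplaySketch` are hypothetical names, and an array stands in for the JSONL log read through `readEventsCursor`. The real signature takes `(runId, eventsPath, lastSeenSeq, callback)` and is not reproduced here.

```typescript
interface SeqEvent { seq: number; type: string }
type Listener = (e: SeqEvent) => void;

// Hypothetical minimal bus; stands in for the live side of RunEventBus.
class MiniBus {
  private readonly listeners = new Set<Listener>();
  on(cb: Listener): () => void {
    this.listeners.add(cb);
    return () => { this.listeners.delete(cb); };
  }
  emit(e: SeqEvent): void {
    for (const cb of this.listeners) cb(e);
  }
}

function onWithReplaySketch(
  bus: MiniBus,
  durableLog: readonly SeqEvent[], // assumption: already-parsed log entries
  lastSeenSeq: number,
  callback: Listener,
): () => void {
  let highest = lastSeenSeq;
  // Deliver an event only if its seq is newer than anything already delivered.
  const deliver: Listener = (e) => {
    if (e.seq <= highest) return;
    highest = e.seq;
    callback(e);
  };
  // 1) Catch up from the durable log in seq order.
  for (const e of [...durableLog].sort((a, b) => a.seq - b.seq)) deliver(e);
  // 2) Attach the live listener; events already replayed are dropped by seq.
  return bus.on(deliver);
}
```

The monotonic `seq` is what makes the dedup safe: an event that appears both in the log and on the live channel is delivered once, whichever path reaches the subscriber first.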

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-crew",
-  "version": "0.9.7",
+  "version": "0.9.9",
   "description": "Pi extension for coordinated AI teams, workflows, worktrees, and async task orchestration",
   "author": "baphuongna",
   "license": "MIT",
@@ -90,7 +90,8 @@
     "ajv": "^8.20.0",
     "cli-highlight": "^2.1.11",
     "diff": "^5.2.0",
-    "jiti": "^2.7.0"
+    "jiti": "^2.7.0",
+    "yaml": "^2.9.0"
   },
   "devDependencies": {
     "@biomejs/biome": "^2.4.15",

package/src/config/defaults.ts CHANGED

@@ -16,10 +16,14 @@ export const DEFAULT_CHILD_PI: Readonly<{
 	// Keep this as a coarse stuck-worker guard rather than a short per-message latency budget.
 	responseTimeoutMs: 5 * 60_000,
 	maxCaptureBytes: 256 * 1024,
-	maxAssistantTextChars: 8192,
-	maxToolResultChars: 1024,
-	maxToolInputChars: 2048,
-	maxCompactContentChars: 4096,
+	// L4 output-handling: thresholds sized from real worker-output data
+	// (27 result artifacts measured: max 9226 bytes, median 8272, 100% < 16KB).
+	// Previous values (8192/1024/4096) truncated 62% of real results.
+	// See .crew/research/worker-output-handling.md + source/deer-flow/.research/.
+	maxAssistantTextChars: 16_384,
+	maxToolResultChars: 8_192,
+	maxToolInputChars: 4_096,
+	maxCompactContentChars: 8_192,
 };
 export const DEFAULT_LIVE_SESSION = {
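The head(75%)+tail(25%) compaction shape that accompanies these thresholds can be sketched as follows. This is an illustrative sketch, not the package's compaction code: the function name `compactHeadTail`, the marker text, and the choice to let the marker sit outside the `maxChars` budget are assumptions.

```typescript
// Keep the first `headRatio` share and the last (1 - headRatio) share of the
// character budget, and replace the middle with a short marker.
function compactHeadTail(text: string, maxChars: number, headRatio = 0.75): string {
  if (text.length <= maxChars) return text; // fits: returned unchanged
  const headLen = Math.floor(maxChars * headRatio);
  const tailLen = maxChars - headLen;
  const omitted = text.length - maxChars;
  // Hypothetical marker wording; it is added on top of the maxChars budget.
  const marker = `\n[... ${omitted} chars omitted ...]\n`;
  return text.slice(0, headLen) + marker + text.slice(text.length - tailLen);
}
```

Keeping a tail is what lets a closing code fence or final heading survive, which head-only truncation cannot do.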

package/src/extension/register.ts CHANGED

@@ -56,7 +56,11 @@ import { createManifestCache } from "../runtime/manifest-cache.ts";
 import { CrewScheduler } from "../runtime/scheduler.ts";
 import { loadRunManifestById, updateRunStatus } from "../state/state-store.ts";
 import type { TeamRunManifest } from "../state/types.ts";
-import { SubagentManager } from "../subagents/manager.ts";
+import {
+	SubagentManager,
+	readPersistedSubagentRecord,
+} from "../subagents/manager.ts";
+import { BatchBarrier, type BatchMember } from "../runtime/batch-barrier.ts";
 import { terminateActiveChildPiProcesses } from "../subagents/spawn.ts";
 import {
 	type CrewWidgetState,
@@ -635,6 +639,7 @@ export function registerPiTeams(pi: ExtensionAPI): void {
 		!cleanedUp &&
 		currentCtx === ctx &&
 		sessionGeneration === ownerGeneration;
+	const batchBarrier = new BatchBarrier();
 	const subagentManager = new SubagentManager(
 		4,
 		(record) => {
@@ -651,22 +656,90 @@ export function registerPiTeams(pi: ExtensionAPI): void {
 					durationMs: record.durationMs,
 				});
 			}
-			if (!record.background || record.resultConsumed) return;
+			if (!record.background) return;
 			if (!isOwnerSessionCurrent(record.ownerSessionGeneration)) return;
 			if (
-				record.status === "completed" ||
-				record.status === "failed" ||
-				record.status === "cancelled" ||
-				record.status === "blocked" ||
-				record.status === "error"
-			) {
+				record.status !== "completed" &&
+				record.status !== "failed" &&
+				record.status !== "cancelled" &&
+				record.status !== "blocked" &&
+				record.status !== "error"
+			)
+				return;
+			// Rule 2 (consume-race fix): this callback fires from inside the
+			// record.promise IIFE `finally` block — BEFORE the promise resolves,
+			// i.e. before a leader calling `get_subagent_result(wait:true)` can
+			// set resultConsumed=true. The old synchronous guard always saw
+			// resultConsumed=false here. Defer emission to a MACROTASK
+			// (setTimeout, NOT queueMicrotask): macrotasks run only after the
+			// microtask queue drains — which includes the leader's
+			// `await record.promise` continuation that marks resultConsumed=true.
+			// Then recheck in-memory + persisted before emitting.
+			const agentId = record.id;
+			const ownerGen = record.ownerSessionGeneration;
+			const agentStatus = record.status;
+			const agentType = record.type;
+			const agentDescription = record.description;
+			const agentRunId = record.runId;
+			const agentBatchId = record.batchId;
+			setTimeout(() => {
+				if (cleanedUp) return;
+				const fresh = subagentManager.getRecord(agentId);
+				const persisted = currentCtx
+					? readPersistedSubagentRecord(currentCtx.cwd, agentId)
+					: undefined;
+				// Leader already joined the result -> suppress redundant notify.
+				if (fresh?.resultConsumed || persisted?.resultConsumed) return;
+				if (!isOwnerSessionCurrent(fresh?.ownerSessionGeneration ?? ownerGen))
+					return;
+				// Rule 1 (batch coalescing): if this agent belongs to a batch, never
+				// emit an individual notification. Instead record its terminal state
+				// in the barrier; emit ONE consolidated notification only when ALL
+				// members are terminal. Suppressed members wait silently.
+				if (agentBatchId) {
+					const member: BatchMember = {
+						id: agentId,
+						description: agentDescription,
+						type: agentType,
+						status: agentStatus,
+					};
+					const snap = batchBarrier.markTerminal(agentBatchId, member);
+					if (snap.allDone && !snap.notified) {
+						batchBarrier.markNotified(agentBatchId);
+						const roster = snap.terminal
+							.map(
+								(m) =>
+									`- ${m.id} [${m.status}] (${m.type ?? "agent"}): ${m.description ?? ""}`,
+							)
+							.join("\n");
+						const joinInstruction = [
+							`All ${snap.terminal.length} background subagents in batch "${agentBatchId}" have finished.`,
+							"Members:",
+							roster,
+							"",
+							`Call get_subagent_result for each agent_id above, read the outputs, then continue the user's original task.`,
+						].join("\n");
+						sendAgentWakeUp(pi, joinInstruction);
+						notifyOperator({
+							id: `subagent-batch:${agentBatchId}:completed`,
+							severity: "info",
+							source: "subagent-completed",
+							runId: agentRunId,
+							title: `pi-crew batch "${agentBatchId}" complete (${snap.terminal.length} agents).`,
+							body: `Members: ${snap.terminal.map((m) => m.id).join(", ")}`,
+						});
+					}
+					// Either we just emitted the consolidated notify, or we are still
+					// waiting for other members — in both cases do NOT emit individual.
+					return;
+				}
 				const metadata = JSON.stringify(
 					{
-						id: record.id,
-						status: record.status,
-						type: record.type,
-						runId: record.runId,
-						description: record.description,
+						id: agentId,
+						status: agentStatus,
+						type: agentType,
+						runId: agentRunId,
+						description: agentDescription,
 					},
 					null,
 					2,
@@ -677,19 +750,18 @@ export function registerPiTeams(pi: ExtensionAPI): void {
 					"```json",
 					metadata,
 					"```",
-					`Call get_subagent_result with agent_id="${record.id}" now, read the output, then continue the user's original task without waiting for another user prompt.`,
+					`Call get_subagent_result with agent_id="${agentId}" now, read the output, then continue the user's original task without waiting for another user prompt.`,
 				].join("\n");
 				sendAgentWakeUp(pi, joinInstruction);
 				notifyOperator({
-					id: `subagent:${record.id}:${record.status}`,
-					severity:
-						record.status === "completed" ? "info" : "warning",
+					id: `subagent:${agentId}:${agentStatus}`,
+					severity: agentStatus === "completed" ? "info" : "warning",
 					source: "subagent-completed",
-					runId: record.runId,
-					title: `pi-crew subagent ${record.id} ${record.status}.`,
-					body: `Use get_subagent_result with agent_id=${record.id} for output.`,
+					runId: agentRunId,
+					title: `pi-crew subagent ${agentId} ${agentStatus}.`,
+					body: `Use get_subagent_result with agent_id=${agentId} for output.`,
 				});
-			}
+			}, 0);
 		},
 		1000,
 		(event, payload) => {
@@ -2044,6 +2116,7 @@ export function registerPiTeams(pi: ExtensionAPI): void {
 		ownerSessionGeneration: captureSessionGeneration,
 		startForegroundRun: (ctx, runner, runId) =>
 			startForegroundRun(ctx as ExtensionContext, runner, runId),
+		batchBarrier,
 	});
 	time("register.tools");
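The ordering argument in the Rule 2 comment (a callback fired from the `finally` block runs before the awaiting caller's continuation, a microtask queued there still runs too early, and only a macrotask runs late enough) can be reproduced in isolation. The sketch below uses only standard JavaScript scheduling primitives; `demoConsumeRace` and the `record` object are hypothetical stand-ins for `SubagentManager` and its record, not the package's code.

```typescript
interface OrderingResult { sync: boolean; micro: boolean; macro: boolean }

async function demoConsumeRace(): Promise<OrderingResult> {
  const record = { resultConsumed: false };
  const seen: OrderingResult = { sync: true, micro: true, macro: false };

  // Stand-in for record.promise: an async IIFE whose `finally` plays the
  // role of the onComplete callback.
  const promise = (async () => {
    try {
      await Promise.resolve();
      return "worker output";
    } finally {
      // Old guard: a synchronous check inside the callback.
      seen.sync = record.resultConsumed;
      // Microtask: queued before the promise settles, so it still runs
      // ahead of the awaiting caller's continuation.
      queueMicrotask(() => { seen.micro = record.resultConsumed; });
      // Macrotask: runs only after the microtask queue has drained.
      setTimeout(() => { seen.macro = record.resultConsumed; }, 0);
    }
  })();

  // Stand-in for the leader's get_subagent_result(wait:true).
  await promise;
  record.resultConsumed = true;

  // Wait one more macrotask so the timer above has fired.
  await new Promise<void>((resolve) => setTimeout(resolve, 0));
  return seen; // resolves to { sync: false, micro: false, macro: true }
}
```

Only the `setTimeout` check observes `resultConsumed === true`, which is why the fix defers emission to a macrotask and rechecks there.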

package/src/extension/registration/subagent-helpers.ts CHANGED

@@ -98,5 +98,6 @@ export function __test__subagentSpawnParams(params: Record<string, unknown>, ctx
 		model: typeof params.model === "string" && params.model.trim() ? params.model.trim() : undefined,
 		skill: parseSkillParam(params.skill),
 		maxTurns: typeof params.max_turns === "number" && Number.isFinite(params.max_turns) ? params.max_turns : undefined,
+		batchId: typeof params.batch_id === "string" && params.batch_id.trim() ? params.batch_id.trim() : undefined,
 	};
 }

package/src/extension/registration/subagent-tools.ts CHANGED

@@ -15,6 +15,7 @@ async function handleTeamTool(params: Parameters<typeof HandleTeamToolFn>[0], ct
 }
 import { checkSubagentSpawnPermission, currentCrewRole } from "../../runtime/role-permission.ts";
 import { readPersistedSubagentRecord, savePersistedSubagentRecord, type SubagentManager, type SubagentSpawnOptions } from "../../subagents/manager.ts";
+import type { BatchBarrier } from "../../runtime/batch-barrier.ts";
 import { loadConfig } from "../../config/config.ts";
 import { logInternalError } from "../../utils/internal-error.ts";
 import { __test__subagentSpawnParams, formatSubagentRecord, readSubagentRunResult, refreshPersistedSubagentRecord, subagentToolResult } from "./subagent-helpers.ts";
@@ -32,6 +33,9 @@ type OnUpdate = (chunk: { content: { type: "text"; text: string }[] }) => void;
 export interface SubagentToolRegistrationOptions {
 	ownerSessionGeneration?: () => number;
 	startForegroundRun?: (ctx: unknown, runner: (signal?: AbortSignal) => Promise<void>, runId?: string) => void;
+	/** Rule 1 batch barrier. When present, agents spawned with a batchId are
+	 * registered here so their completion notifications are coalesced. */
+	batchBarrier?: BatchBarrier;
 }
 export function registerSubagentTools(pi: ExtensionAPI, subagentManager: SubagentManager, options: SubagentToolRegistrationOptions = {}): void {
@@ -53,6 +57,7 @@ export function registerSubagentTools(pi: ExtensionAPI, subagentManager: Subagen
 			skill: Type.Optional(Type.Union([Type.String(), Type.Array(Type.String()), Type.Boolean()], { description: "Skill name(s) to inject for this subagent, or false to disable selected/default skills." })),
 			max_turns: Type.Optional(Type.Number({ description: "Reserved for live-session subagents; child-process runtime may ignore this." })),
 			run_in_background: Type.Optional(Type.Boolean({ description: "Run in background and return an agent ID immediately." })),
+			batch_id: Type.Optional(Type.String({ description: "Optional batch grouping id. Background agents sharing the same batch_id receive ONE consolidated completion notification when ALL members finish (instead of N individual notifications). Use this when launching several background agents in one turn and you do not join them immediately. Omit for the default individual-notification behavior." })),
 		}) as never,
 		async execute(_id, params, signal, onUpdate, ctx) {
 			// Diagnostic: detect pre-aborted signal before spawn
@@ -71,6 +76,10 @@ export function registerSubagentTools(pi: ExtensionAPI, subagentManager: Subagen
 			const ctxWithSession = withSessionId(ctx);
 			const runner = async (currentOptions: SubagentSpawnOptions, childSignal?: AbortSignal) => handleTeamTool({ action: "run", agent: currentOptions.type, goal: currentOptions.prompt, model: currentOptions.model, skill: currentOptions.skill, async: currentOptions.background, config: currentOptions.maxTurns ? { runtime: { maxTurns: currentOptions.maxTurns } } : undefined } as TeamToolParamsValue, { ...ctxWithSession, signal: childSignal, ...(options.startForegroundRun ? { startForegroundRun: (runRunner: (sig?: AbortSignal) => Promise<void>, runId?: string) => options.startForegroundRun!(ctxWithSession, runRunner, runId) } : {}) });
 			const record = subagentManager.spawn(spawnOptions, runner, spawnOptions.background ? undefined : signal);
+			// Rule 1: register batch membership so completions can be coalesced.
+			if (spawnOptions.batchId && spawnOptions.background) {
+				options.batchBarrier?.register(spawnOptions.batchId, record.id, { description: record.description, type: record.type });
+			}
 			if (spawnOptions.background || record.status === "queued") {
 				// Phase 1.1a: Terminate turn for background queued — no LLM follow-up needed.
 				// Phase 1.6: Record was terminated for telemetry.
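The register-at-spawn, notify-once-at-the-end flow that `batch_id` enables can be sketched with a much smaller stand-in than the real `BatchBarrier`. Everything here is hypothetical and simplified for illustration: `MiniBatchBarrier`, its `complete()` method, and the return convention are assumptions, and status tracking (for example `blocked` not counting as terminal) is left out.

```typescript
// Hypothetical compressed stand-in for a batch barrier: tracks which members
// are still pending and reports when the last one finishes.
class MiniBatchBarrier {
  private readonly pending = new Map<string, Set<string>>();
  private readonly sizes = new Map<string, number>();

  // Called at spawn time for every background agent that carries a batch id.
  register(batchId: string, agentId: string): void {
    const set = this.pending.get(batchId) ?? new Set<string>();
    if (!set.has(agentId)) {
      set.add(agentId);
      this.sizes.set(batchId, (this.sizes.get(batchId) ?? 0) + 1);
    }
    this.pending.set(batchId, set);
  }

  // Returns the batch size when this completion finished the batch,
  // otherwise undefined (still waiting, unknown member, or already done).
  complete(batchId: string, agentId: string): number | undefined {
    const set = this.pending.get(batchId);
    if (!set || !set.delete(agentId)) return undefined;
    if (set.size > 0) return undefined;
    const total = this.sizes.get(batchId);
    this.pending.delete(batchId);
    this.sizes.delete(batchId);
    return total;
  }
}

// Usage: three agents in one batch produce exactly one notification.
function simulateBatch(): string[] {
  const barrier = new MiniBatchBarrier();
  const notifications: string[] = [];
  for (const id of ["a1", "a2", "a3"]) barrier.register("research", id);
  for (const id of ["a2", "a1", "a3"]) {
    const total = barrier.complete("research", id);
    if (total !== undefined) {
      notifications.push(`All ${total} background subagents in batch "research" have finished.`);
    }
  }
  return notifications;
}
```

Registering synchronously at spawn time matters: the full membership is known before any completion is recorded, so an early finisher cannot be mistaken for the whole batch.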

package/src/runtime/batch-barrier.ts ADDED

@@ -0,0 +1,145 @@
+/**
+ * BatchBarrier — Rule 1 (no-wait batch grouping).
+ *
+ * When a leader launches several background subagents with the SAME `batchId`
+ * and does NOT join them immediately (`get_subagent_result(wait:true)`), the
+ * completion notifications are coalesced: instead of N individual
+ * "changed state" wake-ups, the leader receives ONE consolidated notification
+ * once ALL members of the batch have reached a terminal state.
+ *
+ * Semantics:
+ * - `register(batchId, agentId)` is called at spawn time (synchronous within a
+ *   leader turn). All members of a batch are therefore known by the time the
+ *   first completion fires (completion is observed via the 1000ms poll loop).
+ * - `markTerminal(batchId, agentId)` returns whether THIS completion made every
+ *   registered member terminal ("allDone"). When allDone, the caller emits a
+ *   single consolidated notification and calls `markNotified`.
+ * - If a member reaches terminal after the batch already notified (late-spawn
+ *   edge case), a second consolidated notification for the straggler is NOT
+ *   covered, but `alreadyNotified` lets the caller suppress stray
+ *   individual notifications once the consolidated one fired.
+ *
+ * Thread-safety: single-threaded JS event loop. No locks needed.
+ */
+export interface BatchMember {
+	id: string;
+	description?: string;
+	type?: string;
+	status: string;
+}
+export interface BatchSnapshot {
+	batchId: string;
+	members: BatchMember[];
+	terminal: BatchMember[];
+	/** true when every registered member has reached a terminal state. */
+	allDone: boolean;
+	/** true once the consolidated notification has been emitted. */
+	notified: boolean;
+}
+const TERMINAL_STATUSES = new Set([
+	"completed",
+	"failed",
+	"cancelled",
+	"error",
+	"stopped",
+]);
+export function isTerminalStatus(status: string): boolean {
+	return TERMINAL_STATUSES.has(status);
+}
+export class BatchBarrier {
+	private readonly batches = new Map<
+		string,
+		{
+			members: Map<string, BatchMember>;
+			terminal: Map<string, BatchMember>;
+			notified: boolean;
+		}
+	>();
+	/** Register a member at spawn time. Idempotent per (batchId, agentId). */
+	register(batchId: string, agentId: string, meta?: { description?: string; type?: string }): void {
+		let batch = this.batches.get(batchId);
+		if (!batch) {
+			batch = { members: new Map(), terminal: new Map(), notified: false };
+			this.batches.set(batchId, batch);
+		}
+		if (!batch.members.has(agentId)) {
+			batch.members.set(agentId, {
+				id: agentId,
+				description: meta?.description,
+				type: meta?.type,
+				status: "running",
+			});
+		}
+	}
+	/**
+	 * Record that a member reached a terminal state. Returns the batch snapshot.
+	 * `snapshot.allDone` is true iff every registered member is now terminal.
+	 * If the batch was never seen (defensive edge case), the member is registered
+	 * on-the-fly as a batch-of-one so its terminal state is not silently lost.
+	 */
+	markTerminal(batchId: string, member: BatchMember): BatchSnapshot {
+		let batch = this.batches.get(batchId);
+		if (!batch) {
+			batch = { members: new Map(), terminal: new Map(), notified: false };
+			this.batches.set(batchId, batch);
+		}
+		// Ensure the member is known (auto-register for the defensive case).
+		if (!batch.members.has(member.id)) {
+			batch.members.set(member.id, { ...member, status: member.status });
+		}
+		if (isTerminalStatus(member.status)) {
+			batch.terminal.set(member.id, { ...member });
+			const existing = batch.members.get(member.id);
+			if (existing) batch.members.set(member.id, { ...existing, status: member.status });
+		}
+		const allDone =
+			batch.members.size > 0 &&
+			[...batch.members.keys()].every((id) => batch.terminal.has(id));
+		return {
+			batchId,
+			members: [...batch.members.values()],
+			terminal: [...batch.terminal.values()],
+			allDone,
+			notified: batch.notified,
+		};
+	}
+	/** Has the consolidated notification already been emitted for this batch? */
+	alreadyNotified(batchId: string): boolean {
+		return this.batches.get(batchId)?.notified ?? false;
+	}
+	/** Mark the consolidated notification as emitted. Idempotent; no-op for an unknown batch. */
+	markNotified(batchId: string): void {
+		const batch = this.batches.get(batchId);
+		if (batch) batch.notified = true;
+	}
+	/** Read-only snapshot (for tests / debugging). */
+	snapshot(batchId: string): BatchSnapshot | undefined {
+		const batch = this.batches.get(batchId);
+		if (!batch) return undefined;
+		return {
+			batchId,
+			members: [...batch.members.values()],
+			terminal: [...batch.terminal.values()],
+			allDone:
+				batch.members.size > 0 &&
+				[...batch.members.keys()].every((id) => batch.terminal.has(id)),
+			notified: batch.notified,
+		};
+	}
+	/** Drop a batch (used on cleanup / test reset). */
+	dispose(batchId?: string): void {
+		if (batchId === undefined) this.batches.clear();
+		else this.batches.delete(batchId);
+	}
+}
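
The coalescing behaviour described in the header comment can be sketched in a few lines. This is a simplified stand-in, not the class above: `MiniBarrier`, `complete`, and `runDemo` are hypothetical names, and the sketch folds `markTerminal`, `alreadyNotified`, and `markNotified` into one call to make the flow easier to follow.

```typescript
// Simplified stand-in for BatchBarrier; all names here are illustrative only.
type Batch = { members: Set<string>; terminal: Set<string>; notified: boolean };

class MiniBarrier {
	private readonly batches = new Map<string, Batch>();

	/** Register a member at spawn time. */
	register(batchId: string, agentId: string): void {
		const batch: Batch = this.batches.get(batchId) ?? {
			members: new Set<string>(),
			terminal: new Set<string>(),
			notified: false,
		};
		batch.members.add(agentId);
		this.batches.set(batchId, batch);
	}

	/** Returns true only for the completion that should emit the one consolidated notification. */
	complete(batchId: string, agentId: string): boolean {
		const batch = this.batches.get(batchId);
		if (!batch) return false;
		batch.terminal.add(agentId);
		const allDone = [...batch.members].every((id) => batch.terminal.has(id));
		if (!allDone || batch.notified) return false;
		batch.notified = true;
		return true;
	}
}

function runDemo(): string[] {
	const barrier = new MiniBarrier();
	const notifications: string[] = [];
	// All members are registered before any completion is observed.
	for (const id of ["a1", "a2", "a3"]) barrier.register("batch-1", id);
	// Completions arrive in arbitrary order; only the last one notifies.
	for (const id of ["a2", "a1", "a3"]) {
		if (barrier.complete("batch-1", id)) notifications.push(`batch-1 done after ${id}`);
	}
	// A late straggler is suppressed because the batch already notified.
	barrier.register("batch-1", "a4");
	if (barrier.complete("batch-1", "a4")) notifications.push("unexpected second notification");
	return notifications;
}
```

Calling `runDemo()` yields a single entry, `"batch-1 done after a3"`: three completions collapse into one wake-up, and the late member produces none.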

package/src/runtime/capability-inventory.ts CHANGED Viewed

@@ -2,7 +2,8 @@ import type { AgentConfig, ResourceSource } from "../agents/agent-config.ts";
 import { discoverAgents } from "../agents/discover-agents.ts";
 import { discoverTeams } from "../teams/discover-teams.ts";
 import { discoverWorkflows } from "../workflows/discover-workflows.ts";
-import { discoverSkills } from "../skills/discover-skills.ts";
+import { discoverSkills, getLastDiscoveryDiagnostics } from "../skills/discover-skills.ts";
+import type { SkillValidationError } from "../skills/validate.ts";
 import type { PiTeamsConfig } from "../config/config.ts";
 export type CapabilityKind = "team" | "workflow" | "agent" | "skill" | "tool" | "runtime";
@@ -114,3 +115,21 @@ export function buildCapabilityInventory(cwd: string, config?: PiTeamsConfig): C
 	return items.sort((a, b) => a.id.localeCompare(b.id));
 }
+/**
+ * L3: surface skill-validation diagnostics from the most recent
+ * `discoverSkills()` call. Skills that fail HARD validation are silently
+ * excluded from `buildCapabilityInventory()`; this function exposes the
+ * underlying errors so users see WHY a skill is missing instead of just
+ * noticing the absence.
+ *
+ * Soft warnings (unknown props, derived-name fallback) are also returned so
+ * skill authors can clean up their frontmatter over time.
+ *
+ * IMPORTANT: `discoverSkills()` is internally cached for 30s, so this
+ * function returns diagnostics from whichever call last populated the cache.
+ * Call `buildCapabilityInventory(cwd)` first to ensure a fresh pass.
+ */
+export function buildSkillValidationDiagnostics(): SkillValidationError[] {
+	return getLastDiscoveryDiagnostics();
+}
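
The "diagnostics from the most recent call" contract noted in the comment above can be illustrated with a small sketch. Everything here is hypothetical: `DiagnosticSketch`, `discoverSkillsSketch`, `getLastDiagnosticsSketch`, and the "missing description" rule are stand-ins, since the real `SkillValidationError` shape and validation rules are not shown in this diff.

```typescript
// Hypothetical stand-ins; the real SkillValidationError shape is not shown in this diff.
interface DiagnosticSketch {
	skill: string;
	message: string;
}

// Module-level state overwritten by every discovery pass.
let lastDiagnostics: DiagnosticSketch[] = [];

function discoverSkillsSketch(candidates: Record<string, { description?: string }>): string[] {
	const diagnostics: DiagnosticSketch[] = [];
	const accepted: string[] = [];
	for (const [name, frontmatter] of Object.entries(candidates)) {
		// Assumed hard-validation rule, for illustration only.
		if (!frontmatter.description) {
			diagnostics.push({ skill: name, message: "missing description" });
			continue; // the skill is excluded from the returned inventory
		}
		accepted.push(name);
	}
	lastDiagnostics = diagnostics;
	return accepted;
}

function getLastDiagnosticsSketch(): DiagnosticSketch[] {
	// Return a copy so callers cannot mutate the stored diagnostics.
	return [...lastDiagnostics];
}
```

Under these assumptions, the getter returns an empty list until a discovery pass has run, and afterwards reflects only that latest pass. This is why the comment above advises calling `buildCapabilityInventory(cwd)` first.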

package/src/runtime/child-pi.ts CHANGED Viewed

@@ -12,6 +12,7 @@ import { attachPostExitStdioGuard, trySignalChild } from "./post-exit-stdio-guar
 import { redactJsonLine } from "../utils/redaction.ts";
 import { sanitizeEnvSecrets } from "../utils/env-filter.ts";
 import { registerChildProcess, unregisterChildProcess } from "../extension/crew-cleanup.ts";
+import { classifyProcessCrash } from "./crash-classification.ts";
 import { resolveRealContainedPath } from "../utils/safe-paths.ts";
 const POST_EXIT_STDIO_GUARD_MS = DEFAULT_CHILD_PI.postExitStdioGuardMs;
@@ -380,7 +381,14 @@ function appendTranscript(input: ChildPiRunInput, line: string): void {
 function compactString(value: string, maxChars = MAX_COMPACT_CONTENT_CHARS): string {
 	if (value.length <= maxChars) return value;
-	return `${value.slice(0, maxChars)}\n[pi-crew compacted ${value.length - maxChars} chars]`;
+	// L4: head + tail instead of head-only. Keeps closing markdown structure
+	// (code fences, headings, list tails) instead of dropping them — the old
+	// head-only slice left unclosed ``` fences that downstream parsers and
+	// output-validator.ts flagged as "output may be truncated". Head gets 75%
+	// (opening structure + bulk of content); tail gets 25% (closing structure).
+	const head = Math.floor(maxChars * 0.75);
+	const tail = maxChars - head;
+	return `${value.slice(0, head)}\n...[pi-crew compacted ${value.length - maxChars} chars, head+tail preserved]...\n${value.slice(-tail)}`;
 }
 function compactValue(value: unknown): unknown {
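
A small worked example of the 75/25 head+tail split from the hunk above. The function body mirrors the new `compactString` lines; the name `compactHeadTail`, the 40-character limit, and the sample text are illustrative assumptions.

```typescript
// Mirrors the head+tail split from the diff; the limit and sample text are illustrative.
function compactHeadTail(value: string, maxChars: number): string {
	if (value.length <= maxChars) return value;
	const head = Math.floor(maxChars * 0.75); // opening structure and bulk of content
	const tail = maxChars - head; // closing structure
	return `${value.slice(0, head)}\n...[pi-crew compacted ${value.length - maxChars} chars, head+tail preserved]...\n${value.slice(-tail)}`;
}

// Build a fenced sample without writing a literal fence inside this block.
const fence = "`".repeat(3);
const sample = `${fence}ts\n${"x".repeat(100)}\n${fence}`; // 110 characters in total
const compacted = compactHeadTail(sample, 40); // head = 30, tail = 10, 70 chars dropped
```

With a 40-character budget, the first 30 and last 10 characters are kept, so `compacted` still begins with the opening fence and ends with the closing one. The returned string is longer than `maxChars` because the marker line is added on top of the preserved characters.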
@@ -905,7 +913,7 @@ export async function runChildPi(input: ChildPiRunInput): Promise<ChildPiRunResu
 				} catch (err) {
 					logInternalError("child-pi.on-lifecycle-event", err, `event=error, pid=${child.pid}`);
 				}
-				settle({ exitCode: null, stdout, stderr, error: processError.message });
+				settle({ exitCode: null, stdout, stderr, error: processError.message, exitStatus: { exitCode: null, cancelled: abortRequested, timedOut: responseTimeoutHit, killed: false, cleanupErrors, finalDrainMs, crashClass: classifyProcessCrash({ exitCode: null, cancelled: abortRequested, timedOut: responseTimeoutHit, spawnError: error, stderrSnippet: stderr ? stderr.slice(-1000) : undefined }).crashClass } });
 			});
 			child.on("exit", (code, signal) => {
 				if (child.pid) {
@@ -994,7 +1002,19 @@ export async function runChildPi(input: ChildPiRunInput): Promise<ChildPiRunResu
 				// is logged, not fatal). The steerError branch is retained for safety in
 				// case a future change reintroduces a fatal steer path.
 				const steerError = steerInjectionFailed ? "Steer injection failed due to stdin backpressure; process killed" : undefined;
-				settle({ exitCode: finalExitCode, stdout, stderr, ...(timeoutError ? { error: timeoutError.error } : {}), ...(steerError ? { error: steerError } : {}), aborted: wasGraceAborted || wasParentAborted, steered: softLimitReached && !wasGraceAborted, exitStatus: { exitCode: finalExitCode, cancelled: abortRequested, timedOut: responseTimeoutHit, killed: hardKilled, cleanupErrors, finalDrainMs } });
+				// P0 crash taxonomy: classify the exit so callers/dashboards can bucket
+				// failure modes (timeout vs cancel vs native panic vs signal …).
+				// The classifier is a pure function; this is the single integration point.
+				const crashClassification = classifyProcessCrash({
+					exitCode: finalExitCode,
+					signal: child.signalCode ?? undefined,
+					cancelled: abortRequested,
+					timedOut: responseTimeoutHit,
+					killed: hardKilled,
+					spawnError: undefined,
+					stderrSnippet: stderr ? stderr.slice(-1000) : undefined,
+				});
+				settle({ exitCode: finalExitCode, stdout, stderr, ...(timeoutError ? { error: timeoutError.error } : {}), ...(steerError ? { error: steerError } : {}), aborted: wasGraceAborted || wasParentAborted, steered: softLimitReached && !wasGraceAborted, exitStatus: { exitCode: finalExitCode, cancelled: abortRequested, timedOut: responseTimeoutHit, killed: hardKilled, cleanupErrors, finalDrainMs, crashClass: crashClassification.crashClass } });
 			});
 		});
 	} finally {