npm - pi-crew - Versions diffs - 0.9.7 → 0.9.8 - Mend

pi-crew 0.9.7 → 0.9.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/CHANGELOG.md +62 -0
package/README.md +9 -2
package/package.json +3 -2
package/src/config/defaults.ts +8 -4
package/src/runtime/capability-inventory.ts +20 -1
package/src/runtime/child-pi.ts +8 -1
package/src/runtime/task-output-context.ts +25 -9
package/src/skills/discover-skills.ts +61 -8
package/src/skills/validate.ts +267 -0
package/src/ui/keybinding-map.ts +128 -41
package/src/ui/run-event-bus.ts +83 -0

package/CHANGELOG.md CHANGED Viewed

@@ -1,5 +1,67 @@
 # Changelog
+## [v0.9.8] — deer-flow learning integration: L1/L2/L3/L4 (2026-06-24)
+Four improvements distilled from researching [bytedance/deer-flow](https://github.com/bytedance/deer-flow) and the wider Pi-ecosystem (pi-boomerang, pi-subagents, pi-dynamic-workflows). Each was calibrated against real pi-crew code (the research over-reported gaps — several patterns pi-crew already does *better* than deer-flow) and sized from measured data, not guesses.
+### L3 — Strict SKILL.md frontmatter validation (commit 5348c47)
+Malformed skills now **fail-fast at discovery** instead of silently producing broken behavior at runtime. New `src/skills/validate.ts` validates frontmatter against the `ALLOWED_SKILL_PROPS` whitelist using a **HYBRID policy**:
+- **HARD errors** (missing/malformed `name` or `description`, type mismatches) → skill excluded from `discoverSkills()`.
+- **SOFT warnings** (unknown props like `origin`/`triggers`, missing `name` derived from directory) → skill kept, surfaced via `getLastDiscoveryDiagnostics()` / `buildSkillValidationDiagnostics()`.
+Replaces the fragile line-prefix parser (broke on multi-line folded scalars `description: >`, quoted strings, nested YAML) with the `yaml` package (^2.9.0, already transitive, added as direct dep — zero install cost). Back-compat preserved: missing `name` derives from the directory; no-frontmatter skills still load with empty description.
+**Bonus value**: pre-flight on the real environment surfaced 2 user skills that were silently broken (`agent-browser`: `allowed-tools` wrong type; `spike-wrap-up`: `<>` in description).
+### L2 — Data-driven keybinding dispatch (commit 35fc3c6)
+Replaced the 30-line imperative `if (includes(...))` chain in `dashboardActionForKey` with a single `for (const b of BINDINGS)` loop driven by a declarative `BINDINGS[]` table. Adding a key now means editing ONE place (the table) instead of two (table + dispatch) — removes the DRY violation that caused table-vs-dispatch drift. `KEY_RESERVED` is now exported and derived.
+Behavior is **provably identical** to the old chain: a golden-snapshot parity test asserts every `(data, activePane)` pair returns the same action (~190 pairs). Pane-scoped bindings (`mailbox-detail`, `health-*`) precede their generic competitors so first-match-wins reproduces old precedence.
+The `inTextInput` guard from the original plan was **intentionally skipped** — overlays are mutually exclusive and each handles its own input (`mailbox-compose-overlay.ts` captures every single-char key), so there is no leak path. Documented in the commit.
+### L1 — RunEventBus.onWithReplay catch-up primitive (commit a2a478b)
+Closes the transient-subscriber-absence gap: when an overlay/widget is disposed and recreated (toggle, reconnect), live events emitted in that window are lost as notification triggers. `onWithReplay(runId, eventsPath, lastSeenSeq, callback)` replays missed events from the durable JSONL log before attaching the live listener, then dedups via `metadata.seq` so each event fires exactly once.
+Unlike deer-flow's 256-event RAM ring buffer (lost on crash), this reuses pi-crew's existing `readEventsCursor` — O(new bytes) via byte-offset incremental reads, monotonic seq, tail-capped. Strictly better: survives crashes, bounded memory. `RunEventPayload` gains optional `seq`; `emitFromTeamEvent` stamps it.
+The **primitive is landed + fully tested** (7 cases: replay order, dedup race, transient live-only, cursor bound, sinceSeq filter, missing-log fallback, unsubscribe). Dashboard wiring (switching `onAny()` → `onWithReplay()` per-run) is deferred — the dashboard subscribes across multiple runs and needs a subscription-model refactor; state isn't lost during absence anyway (`run-snapshot-cache` rebuilds from disk, TTL 1500ms).
+### L4 — Data-driven output thresholds + head/tail compaction (commit 463d08d)
+Worker output was being truncated at 3 points with thresholds sized by guess, not data. Measured 27 real result artifacts: **max 9226 bytes, median 8272, 100% under 16KB**. The old thresholds cut **62% of real outputs** (head-only, no recovery path). This change sizes thresholds from that data and switches compaction from head-only to head+tail so closing markdown structure (code fences, headings) survives.
+| Threshold | Before | After |
+|---|---|---|
+| `maxAssistantTextChars` | 8192 | **16384** |
+| `maxToolResultChars` | 1024 | **8192** |
+| `maxCompactContentChars` | 4096 | **8192** |
+| `maxToolInputChars` | 2048 | **4096** |
+| `readIfSmall` (3 inconsistent values) | 24K/40K/80K | **single 32KB** |
+| Compaction shape | head-only | **head(75%)+tail(25%)** |
+**Why not caveman-shrink** (the alternative considered): tested it on the same 27 artifacts — only 3.9% compression (vs 42% on prose fixtures) because pi-crew output is code-citation-heavy with little prose to strip, AND it has a real data-loss bug (`funccall` protected-pattern eats sentinel placeholders for the `identifier (inline-code)` pattern, corrupting 24/27 files with null bytes, 127 inline codes lost). caveman's *concept* (detect/validate) is worth borrowing but its engine doesn't fit pi-crew's content type. Threshold-only wins on the data.
+### Tests & verification
+- 10 new L4 tests, 25 L3 validator tests, 7 L1 replay tests, 7 L2 parity tests.
+- `npm run typecheck` + `check:lazy-imports` green.
+- End-to-end team-run smoke tests confirm all 4 features load and run without crash.
+- Real-world smoke scripts at `test/manual/l{1,2,3}-*-smoke.mjs`.
+- Research artifacts at `source/deer-flow/.research/` + `.crew/research/worker-output-handling.md`.
+### Backward compatibility
+All four changes are additive or behavior-preserving:
+- L3: valid skills unaffected; only malformed ones now excluded (was: silent breakage).
+- L2: golden-snapshot parity test proves identical dispatch.
+- L1: new method added; existing `on`/`onAny`/`emit` unchanged.
+- L4: outputs that fit (100% of measured real outputs) are unchanged; only oversized ones now keep head+tail instead of head-only.
 ## [v0.9.7] — round-18 + process-safety fix (2026-06-23)
 P2-3 feature: durable checkpoint + resume for dynamic-workflow runs. When a `.dwf.ts`

package/README.md CHANGED Viewed

@@ -39,9 +39,9 @@ npm: pi-crew
 repo: https://github.com/baphuongna/pi-crew
 ```
-**v0.9.4 / v0.9.5**: See [CHANGELOG.md](CHANGELOG.md).
+**v0.9.4 / v0.9.5 / v0.9.8**: See [CHANGELOG.md](CHANGELOG.md).
-### Highlights (v0.6.4 → v0.9.5)
+### Highlights (v0.6.4 → v0.9.8)
 A long arc of **trust, cliff-resilience, and robustness** work. Principle: *build
 trust and cliff-resilience, stay lean, delete before adding.*
@@ -198,6 +198,9 @@ background-dispatch discriminator.
 - **Health scoring** — penalty-based run health with time-series snapshots
 - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
 - **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `ctx.phase()` marks logical phases; **round-14** adds `ctx.log()` (durable `dwf.log` events), `ctx.budget` (per-workflow token budget that auto-rejects `ctx.agent()` when exhausted), and `ctx.args<T>()` (typed workflow arguments). TypeScript IntelliSense is available via `import type { WorkflowCtx } from "pi-crew/workflow"`. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
+- **Strict SKILL.md validation** (L3, v0.9.8) — skills with malformed frontmatter (missing/malformed `name`/`description`, type mismatches) now **fail-fast at discovery** with visible diagnostics, instead of silently producing broken behavior at runtime. HYBRID policy: HARD on required fields, SOFT (warn) on unknown props for forward-compat. Surfaced via `buildSkillValidationDiagnostics()`.
+- **Durable event replay** (L1, v0.9.8) — `RunEventBus.onWithReplay()` catches up a re-subscribing dashboard/overlay with events it missed during transient absence (toggle, reconnect), replaying from the durable JSONL log with seq-based dedup. No information loss even if the live subscriber was briefly gone.
+- **Lossless-by-default output handling** (L4, v0.9.8) — worker output thresholds sized from measured data (100% of real outputs fit without compaction); when compaction is unavoidable it keeps head+tail (preserves closing code fences/headings) instead of head-only truncation. No more `[pi-crew compacted N chars]` markers eating the end of a worker's result.
 ---
@@ -468,6 +471,10 @@ pi-crew survives Pi's context compaction. When the context is compacted (auto or
 Context compacted. 1 pi-crew run(s) still in-flight — use team status to continue.
 ```
+**Durable event replay** (v0.9.8, L1): even if a dashboard/overlay is briefly gone during compaction or a reconnect, `RunEventBus.onWithReplay()` catches it up with the events it missed, replaying from the durable JSONL log with seq-based dedup — no information loss. (The dashboard wires this up per-run; the primitive is available for any subscriber.)
+**Lossless-by-default worker output** (v0.9.8, L4): output-handling thresholds are sized from measured real data (100% of real worker outputs fit without any compaction). When compaction *is* unavoidable, it keeps head+tail instead of head-only truncation, so closing code fences and headings survive — no more `[pi-crew compacted N chars]` markers eating the end of a result.
 ### Plan-level human-in-the-loop (HITL)
 Set `runtime.requirePlanApproval = true` to gate **any workflow** at the plan→execute boundary. After the read-only (planning) phases complete, the run pauses for explicit approval before mutating tasks run:

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "pi-crew",
-  "version": "0.9.7",
+  "version": "0.9.8",
   "description": "Pi extension for coordinated AI teams, workflows, worktrees, and async task orchestration",
   "author": "baphuongna",
   "license": "MIT",
@@ -90,7 +90,8 @@
     "ajv": "^8.20.0",
     "cli-highlight": "^2.1.11",
     "diff": "^5.2.0",
-    "jiti": "^2.7.0"
+    "jiti": "^2.7.0",
+    "yaml": "^2.9.0"
   },
   "devDependencies": {
     "@biomejs/biome": "^2.4.15",

package/src/config/defaults.ts CHANGED Viewed

@@ -16,10 +16,14 @@ export const DEFAULT_CHILD_PI: Readonly<{
 	// Keep this as a coarse stuck-worker guard rather than a short per-message latency budget.
 	responseTimeoutMs: 5 * 60_000,
 	maxCaptureBytes: 256 * 1024,
-	maxAssistantTextChars: 8192,
-	maxToolResultChars: 1024,
-	maxToolInputChars: 2048,
-	maxCompactContentChars: 4096,
+	// L4 output-handling: thresholds sized from real worker-output data
+	// (27 result artifacts measured: max 9226 bytes, median 8272, 100% < 16KB).
+	// Previous values (8192/1024/4096) truncated 62% of real results.
+	// See .crew/research/worker-output-handling.md + source/deer-flow/.research/.
+	maxAssistantTextChars: 16_384,
+	maxToolResultChars: 8_192,
+	maxToolInputChars: 4_096,
+	maxCompactContentChars: 8_192,
 };
 export const DEFAULT_LIVE_SESSION = {

package/src/runtime/capability-inventory.ts CHANGED Viewed

@@ -2,7 +2,8 @@ import type { AgentConfig, ResourceSource } from "../agents/agent-config.ts";
 import { discoverAgents } from "../agents/discover-agents.ts";
 import { discoverTeams } from "../teams/discover-teams.ts";
 import { discoverWorkflows } from "../workflows/discover-workflows.ts";
-import { discoverSkills } from "../skills/discover-skills.ts";
+import { discoverSkills, getLastDiscoveryDiagnostics } from "../skills/discover-skills.ts";
+import type { SkillValidationError } from "../skills/validate.ts";
 import type { PiTeamsConfig } from "../config/config.ts";
 export type CapabilityKind = "team" | "workflow" | "agent" | "skill" | "tool" | "runtime";
@@ -114,3 +115,21 @@ export function buildCapabilityInventory(cwd: string, config?: PiTeamsConfig): C
 	return items.sort((a, b) => a.id.localeCompare(b.id));
 }
+/**
+ * L3: surface skill-validation diagnostics from the most recent
+ * `discoverSkills()` call. Skills that fail HARD validation are silently
+ * excluded from `buildCapabilityInventory()`; this function exposes the
+ * underlying errors so users see WHY a skill is missing instead of just
+ * noticing the absence.
+ *
+ * Soft warnings (unknown props, derived-name fallback) are also returned so
+ * skill authors can clean up their frontmatter over time.
+ *
+ * IMPORTANT: `discoverSkills()` is internally cached for 30s, so this
+ * function returns diagnostics from whichever call last populated the cache.
+ * Call `buildCapabilityInventory(cwd)` first to ensure a fresh pass.
+ */
+export function buildSkillValidationDiagnostics(): SkillValidationError[] {
+	return getLastDiscoveryDiagnostics();
+}

package/src/runtime/child-pi.ts CHANGED Viewed

@@ -380,7 +380,14 @@ function appendTranscript(input: ChildPiRunInput, line: string): void {
 function compactString(value: string, maxChars = MAX_COMPACT_CONTENT_CHARS): string {
 	if (value.length <= maxChars) return value;
-	return `${value.slice(0, maxChars)}\n[pi-crew compacted ${value.length - maxChars} chars]`;
+	// L4: head + tail instead of head-only. Keeps closing markdown structure
+	// (code fences, headings, list tails) instead of dropping them — the old
+	// head-only slice left unclosed ``` fences that downstream parsers and
+	// output-validator.ts flagged as "output may be truncated". Head gets 75%
+	// (opening structure + bulk of content); tail gets 25% (closing structure).
+	const head = Math.floor(maxChars * 0.75);
+	const tail = maxChars - head;
+	return `${value.slice(0, head)}\n...[pi-crew compacted ${value.length - maxChars} chars, head+tail preserved]...\n${value.slice(-tail)}`;
 }
 function compactValue(value: unknown): unknown {

package/src/runtime/task-output-context.ts CHANGED Viewed

@@ -30,20 +30,36 @@ function containedExists(filePath: string, baseDir?: string): boolean {
 	}
 }
-function readIfSmall(filePath: string, maxBytes = 24_000, baseDir?: string): string | undefined {
+/**
+ * L4 output-handling: single consistent threshold for all artifact reads.
+ * Sized from real data (27 result artifacts: max 9226 bytes; 100% < 16KB).
+ * 32KB gives 2x headroom over the largest observed real output while still
+ * bounding memory. Larger than the old inconsistent per-call-site values
+ * (24K/40K/80K) which truncated the same artifact differently depending on
+ * which code path read it.
+ */
+const MAX_RESULT_INLINE_BYTES = 32_000;
+function readIfSmall(filePath: string, baseDir?: string): string | undefined {
+	const maxBytes = MAX_RESULT_INLINE_BYTES;
 	try {
 		const safePath = baseDir ? resolveRealContainedPath(baseDir, filePath) : filePath;
 		const stat = fs.statSync(safePath);
 		if (stat.size > maxBytes) {
-			// Use bounded read to avoid loading entire file into memory
-			const buf = Buffer.alloc(maxBytes);
+			// L4: head + tail instead of head-only. Keeps closing markdown
+			// structure (code fences, headings) instead of leaving them truncated.
+			const head = Math.floor(maxBytes * 0.75);
+			const tail = maxBytes - head;
+			const headBuf = Buffer.alloc(head);
+			const tailBuf = Buffer.alloc(tail);
 			const fd = fs.openSync(safePath, "r");
 			try {
-				fs.readSync(fd, buf, 0, maxBytes, 0);
+				fs.readSync(fd, headBuf, 0, head, 0);
+				fs.readSync(fd, tailBuf, 0, tail, stat.size - tail);
 			} finally {
 				fs.closeSync(fd);
 			}
-			return `${buf.toString("utf-8")}\n\n...(truncated ${stat.size - maxBytes} bytes)`;
+			return `${headBuf.toString("utf-8")}\n\n...[pi-crew truncated ${stat.size - maxBytes} bytes, head+tail preserved]...\n${tailBuf.toString("utf-8")}`;
 		}
 		return fs.readFileSync(safePath, "utf-8");
 	} catch {
@@ -99,7 +115,7 @@ export function collectDependencyOutputContext(manifest: TeamRunManifest, tasks:
 	const byStep = new Map(tasks.map((item) => [item.stepId, item]).filter((entry): entry is [string, TeamTaskState] => Boolean(entry[0])));
 	const byId = new Map(tasks.map((item) => [item.id, item]));
 	const dependencies = task.dependsOn.map((dep) => byStep.get(dep) ?? byId.get(dep)).filter((item): item is TeamTaskState => Boolean(item)).map((item) => {
-		const resultText = item.resultArtifact ? readIfSmall(item.resultArtifact.path, 24_000, manifest.artifactsRoot) : undefined;
+		const resultText = item.resultArtifact ? readIfSmall(item.resultArtifact.path, manifest.artifactsRoot) : undefined;
 		return {
 			taskId: item.id,
 			role: item.role,
@@ -113,7 +129,7 @@ export function collectDependencyOutputContext(manifest: TeamRunManifest, tasks:
 	});
 	const sharedReads = (step.reads === false ? [] : step.reads ?? []).map((name) => {
 		const filePath = sharedPath(manifest, name);
-		return { name, path: filePath, content: readIfSmall(filePath, 24_000, path.resolve(manifest.artifactsRoot, "shared")) ?? "" };
+		return { name, path: filePath, content: readIfSmall(filePath, path.resolve(manifest.artifactsRoot, "shared")) ?? "" };
 	}).filter((item) => item.content.trim().length > 0);
 	return { dependencies, sharedReads };
 }
@@ -139,7 +155,7 @@ export function renderDependencyOutputContext(context: DependencyOutputContext):
 export function writeTaskSharedOutput(manifest: TeamRunManifest, step: WorkflowStep, task: TeamTaskState): ArtifactDescriptor | undefined {
 	if (step.output === false) return undefined;
 	const name = safeSharedName(step.output || `${task.id}.md`);
-	const source = task.resultArtifact ? readIfSmall(task.resultArtifact.path, 80_000, manifest.artifactsRoot) : undefined;
+	const source = task.resultArtifact ? readIfSmall(task.resultArtifact.path, manifest.artifactsRoot) : undefined;
 	if (!source) return undefined;
 	return writeArtifact(manifest.artifactsRoot, {
 		kind: "metadata",
@@ -160,7 +176,7 @@ export function writeTaskInputsArtifact(manifest: TeamRunManifest, task: TeamTas
 export function aggregateTaskOutputs(tasks: TeamTaskState[], manifest?: TeamRunManifest): string {
 	return tasks.map((task, index) => {
-		const body = task.resultArtifact ? readIfSmall(task.resultArtifact.path, 40_000, manifest?.artifactsRoot) : undefined;
+		const body = task.resultArtifact ? readIfSmall(task.resultArtifact.path, manifest?.artifactsRoot) : undefined;
 		const hasBody = Boolean(body?.trim());
 		const expectedMissing = task.resultArtifact && !containedExists(task.resultArtifact.path, manifest?.artifactsRoot);
 		const status = task.status === "skipped"

package/src/skills/discover-skills.ts CHANGED Viewed

@@ -7,6 +7,7 @@ import { fileURLToPath } from "node:url";
 import { getAgentDir } from "../runtime/peer-dep.ts";
 import { logInternalError } from "../utils/internal-error.ts";
 import { isSafePathId, resolveContainedPath, resolveRealContainedPath } from "../utils/safe-paths.ts";
+import { parseSkillFrontmatter, validateSkillFrontmatter, type SkillValidationError } from "./validate.ts";
 const PACKAGE_SKILLS_DIR = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "..", "..", "skills");
@@ -54,16 +55,44 @@ function listSkillDirs(cwd: string): Array<{ root: string; source: SkillDescript
 	];
 }
-function frontmatterDescription(content: string): string | undefined {
-	const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(content);
-	if (!match) return undefined;
-	const line = match[1].split(/\r?\n/).find((entry) => entry.startsWith("description:"));
-	return line?.slice("description:".length).trim();
+// ── Diagnostics (L3) ──────────────────────────────────────────────────────────
+// Module-level buffer populated each `discoverSkills()` call. Cleared at the
+// start of every call so callers see only the most recent run's diagnostics.
+// Surfaced via `getLastDiscoveryDiagnostics()` so capability-inventory and
+// other consumers can convert silent exclusions into visible feedback.
+let lastDiagnostics: SkillValidationError[] = [];
+export function getLastDiscoveryDiagnostics(): SkillValidationError[] {
+	return lastDiagnostics;
+}
+/**
+ * Parse frontmatter defensively. Falls back to the legacy line-prefix match
+ * if YAML parsing fails — preserves back-compat for malformed but readable
+ * SKILL.md files that pre-date the validator (we record a diagnostic in that
+ * case but still return the description we could salvage).
+ */
+function readDescription(content: string): { description: string; parseError: string | null } {
+	const parsed = parseSkillFrontmatter(content);
+	if (parsed.ok) {
+		const d = parsed.data.description;
+		return { description: typeof d === "string" ? d : "", parseError: null };
+	}
+	// YAML parse failed — fall back to legacy line-prefix match so we don't
+	// regress existing skills whose frontmatter the old parser could read.
+	const legacyMatch = /^---\r?\n([\s\S]*?)\r?\n---/.exec(content);
+	if (legacyMatch) {
+		const line = legacyMatch[1].split(/\r?\n/).find((entry) => entry.startsWith("description:"));
+		const fallback = line?.slice("description:".length).trim() ?? "";
+		return { description: fallback, parseError: parsed.error };
+	}
+	return { description: "", parseError: parsed.error };
 }
 export function discoverSkills(cwd: string): SkillDescriptor[] {
 	if (cache && cache.cwd === cwd && Date.now() - cache.cachedAt < CACHE_TTL_MS) return cache.skills;
 	const results: SkillDescriptor[] = [];
+	const diagnostics: SkillValidationError[] = [];
 	for (const dir of listSkillDirs(cwd)) {
 		if (!fs.existsSync(dir.root)) continue;
 		try {
@@ -94,7 +123,16 @@ export function discoverSkills(cwd: string): SkillDescriptor[] {
 						// (e.g. macOS /var → /private/var). Fall through with un-resolved path.
 					}
 					const content = fs.readFileSync(readPath, "utf-8");
-					description = frontmatterDescription(content) ?? "";
+					const { description: desc, parseError } = readDescription(content);
+					description = desc;
+					if (parseError) {
+						diagnostics.push({
+							path: path.dirname(skillMdPath),
+							field: "frontmatter",
+							reason: parseError,
+							severity: "error",
+						});
+					}
 				} catch (error) {
 					logInternalError("discoverSkills.readSkill", error, `skill=${entry.name}`);
 				}
@@ -104,6 +142,21 @@ export function discoverSkills(cwd: string): SkillDescriptor[] {
 			logInternalError("discoverSkills.readdir", error, `root=${dir.root}`);
 		}
 	}
-	cache = { skills: results, cachedAt: Date.now(), cwd };
-	return results;
+	// L3: strict validation pass after we've collected every (skill, source)
+	// candidate. Excludes malformed skills (HYBRID policy: missing/malformed
+	// `name`/`description` hard-fail; unknown props warn). Diagnostics are
+	// always recorded, including for skills that PASSED validation but had
+	// unknown-prop warnings.
+	const filtered: SkillDescriptor[] = [];
+	for (const skill of results) {
+		const validation = validateSkillFrontmatter(path.dirname(skill.path));
+		if (validation.ok) {
+			filtered.push(skill);
+		} else {
+			diagnostics.push(...validation.errors);
+		}
+	}
+	lastDiagnostics = diagnostics;
+	cache = { skills: filtered, cachedAt: Date.now(), cwd };
+	return filtered;
 }

package/src/skills/validate.ts ADDED Viewed

@@ -0,0 +1,267 @@
+/**
+ * SKILL.md frontmatter validation (L3 of deer-flow→pi-crew plan).
+ *
+ * Parses a SKILL.md's YAML frontmatter and validates it against the
+ * `ALLOWED_SKILL_PROPS` whitelist using HYBRID policy:
+ *   - HARD errors (missing/malformed `name`/`description`, type mismatches,
+ *     read/parse failures): EXCLUDE the skill from `discoverSkills()`.
+ *   - SOFT warnings (unknown props, missing `name` derived from directory):
+ *     KEEP the skill; surface via `getLastDiscoveryDiagnostics()`.
+ *
+ * YAML parsing uses the `yaml` package (^2.9.0). It is now a direct dep;
+ * before L3 it was only transitively available through
+ * `@earendil-works/pi-coding-agent`. Adding it as a direct dep is justified by:
+ *   - Replaces a fragile line-prefix parser that broke on multi-line folded
+ *     scalars (`description: >`), quoted strings, and nested YAML.
+ *   - Standard lib (eemeli/yaml), MIT, actively maintained.
+ *   - Already in the lockfile at the same version → zero install cost.
+ *   - Frontmatter is small and well-formed; YAML parsing cost is negligible.
+ *
+ * The validator runs once per skill at discovery. Discovery is already cached
+ * (`CACHE_TTL_MS = 30_000` in discover-skills.ts) so the validation cost is
+ * bounded regardless of how many skills exist.
+ */
+import * as fs from "node:fs";
+import * as path from "node:path";
+import yaml from "yaml";
+/**
+ * Properties allowed in SKILL.md frontmatter.
+ *
+ * Why this list:
+ *   - `name`, `description` are the contract used by pi-crew's prompt
+ *     rendering (see src/runtime/skill-instructions.ts) and capability
+ *     inventory. Both are HARD-required.
+ *   - `license`, `allowed-tools`, `compatibility`, `version`, `author` are
+ *     common metadata; we surface but don't enforce content beyond type.
+ *   - `metadata` is a free-form key/value bag (mirrors deer-flow `validation.py`).
+ *
+ * Unknown props (e.g. bundled skills' `origin`, `triggers`) are SOFT-warned,
+ * not rejected — see HYBRID policy in the module docstring.
+ */
+export const ALLOWED_SKILL_PROPS = new Set<string>([
+	"name",
+	"description",
+	"license",
+	"allowed-tools",
+	"metadata",
+	"compatibility",
+	"version",
+	"author",
+]);
+/** Hyphen-case name regex (Anthropic Agent Skills spec compatible). */
+const NAME_REGEX = /^[a-z0-9]+(-[a-z0-9]+)*$/;
+const NAME_MAX_LEN = 64;
+const DESCRIPTION_MAX_LEN = 1024;
+const VERSION_REGEX = /^\d+\.\d+(\.\d+)?(-[A-Za-z0-9.-]+)?$/;
+/**
+ * Structured error so callers (capability inventory, logs) can present
+ * actionable diagnostics instead of silently dropping malformed skills.
+ */
+export interface SkillValidationError {
+	/** Absolute path to the skill directory. */
+	path: string;
+	/** Field name (e.g. "name", "description", "<unknown-prop>") or "frontmatter" for parse errors. */
+	field: string;
+	/** Human-readable reason. Safe to surface in capability listings. */
+	reason: string;
+	/** Severity — "error" excludes the skill; "warn" keeps it. */
+	severity: "error" | "warn";
+}
+/**
+ * Validated manifest exposes the parsed frontmatter fields in a typed shape.
+ * Only present for valid skills; undefined for invalid ones.
+ */
+export interface ValidatedSkillManifest {
+	name: string;
+	description: string;
+	license?: string;
+	allowedTools?: string[];
+	metadata?: Record<string, unknown>;
+	compatibility?: string;
+	version?: string;
+	author?: string;
+}
+export interface ValidationResult {
+	ok: boolean;
+	errors: SkillValidationError[];
+	manifest?: ValidatedSkillManifest;
+}
+/**
+ * Parse YAML frontmatter from SKILL.md content.
+ *
+ * Returns an empty object when there is no frontmatter block. Parsing errors
+ * surface as `{ ok: false }` rather than throwing — discovery must remain
+ * exception-safe.
+ */
+export function parseSkillFrontmatter(
+	content: string,
+): { ok: true; data: Record<string, unknown> } | { ok: false; error: string } {
+	const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(content);
+	if (!match) return { ok: true, data: {} };
+	try {
+		const parsed = yaml.parse(match[1]);
+		if (parsed === null || parsed === undefined) return { ok: true, data: {} };
+		if (typeof parsed !== "object" || Array.isArray(parsed)) {
+			return { ok: false, error: "Frontmatter must be a YAML mapping, not a scalar or list." };
+		}
+		return { ok: true, data: parsed as Record<string, unknown> };
+	} catch (e) {
+		return { ok: false, error: `YAML parse error: ${(e as Error).message}` };
+	}
+}
+function hard(path: string, field: string, reason: string): SkillValidationError {
+	return { path, field, reason, severity: "error" };
+}
+function warn(path: string, field: string, reason: string): SkillValidationError {
+	return { path, field, reason, severity: "warn" };
+}
+/**
+ * Validate a single skill's frontmatter.
+ *
+ * @param skillDir Absolute path to the skill directory (must contain SKILL.md).
+ * @returns ValidationResult — `ok` is true when there are no HARD errors.
+ *          `errors[]` always lists ALL violations (HARD + SOFT).
+ *
+ * Back-compat: when `name` is missing from frontmatter, the validator
+ * DERIVES it from the directory name and emits a SOFT warning. Bundled
+ * pi-crew skills always set `name` explicitly, so the warning is informational.
+ */
+export function validateSkillFrontmatter(skillDir: string): ValidationResult {
+	const errors: SkillValidationError[] = [];
+	const skillMdPath = path.join(skillDir, "SKILL.md");
+	const derivedName = path.basename(skillDir);
+	let content: string;
+	try {
+		content = fs.readFileSync(skillMdPath, "utf-8");
+	} catch (e) {
+		errors.push(hard(skillDir, "SKILL.md", `Cannot read SKILL.md: ${(e as Error).message}`));
+		return { ok: false, errors };
+	}
+	const parsed = parseSkillFrontmatter(content);
+	if (!parsed.ok) {
+		errors.push(hard(skillDir, "frontmatter", parsed.error));
+		return { ok: false, errors };
+	}
+	const data = parsed.data;
+	// No frontmatter at all → back-compat: derive everything from the
+	// directory. Pre-L3 skills shipped without frontmatter and we don't want
+	// to regress them. Surface a SOFT warning so authors know to add YAML.
+	if (Object.keys(data).length === 0) {
+		errors.push(warn(skillDir, "frontmatter", "No frontmatter block; deriving name from directory and leaving description empty."));
+		return {
+			ok: true,
+			errors,
+			manifest: { name: derivedName, description: "" },
+		};
+	}
+	// ── name: HARD if type/length/regex bad; SOFT if missing (derive from dir)
+	const nameRaw = data.name;
+	let resolvedName = derivedName;
+	if (nameRaw === undefined || nameRaw === null) {
+		errors.push(warn(skillDir, "name", `Frontmatter 'name' missing; using directory name "${derivedName}" as fallback. Add explicit 'name' to silence this.`));
+	} else if (typeof nameRaw !== "string") {
+		errors.push(hard(skillDir, "name", `'name' must be a string, got ${typeof nameRaw}.`));
+	} else if (nameRaw.length === 0) {
+		errors.push(hard(skillDir, "name", "'name' is empty."));
+	} else if (nameRaw.length > NAME_MAX_LEN) {
+		errors.push(hard(skillDir, "name", `'name' exceeds ${NAME_MAX_LEN} chars (got ${nameRaw.length}).`));
+	} else if (!NAME_REGEX.test(nameRaw)) {
+		errors.push(hard(skillDir, "name", `'name' must be hyphen-case lowercase (a-z, 0-9, single hyphens); got "${nameRaw}".`));
+	} else {
+		resolvedName = nameRaw;
+	}
+	// ── description: HARD required
+	const descRaw = data.description;
+	if (descRaw === undefined || descRaw === null) {
+		errors.push(hard(skillDir, "description", "Required field 'description' is missing."));
+	} else if (typeof descRaw !== "string") {
+		errors.push(hard(skillDir, "description", `'description' must be a string, got ${typeof descRaw}.`));
+	} else if (descRaw.length > DESCRIPTION_MAX_LEN) {
+		errors.push(hard(skillDir, "description", `'description' exceeds ${DESCRIPTION_MAX_LEN} chars (got ${descRaw.length}).`));
+	} else if (descRaw.includes("<") || descRaw.includes(">")) {
+		errors.push(hard(skillDir, "description", `'description' must not contain '<' or '>' (prompt-safety).`));
+	}
+	// ── OPTIONAL: license
+	if (data.license !== undefined && typeof data.license !== "string") {
+		errors.push(hard(skillDir, "license", `'license' must be a string.`));
+	}
+	// ── OPTIONAL: allowed-tools
+	if (data["allowed-tools"] !== undefined) {
+		const at = data["allowed-tools"];
+		if (!Array.isArray(at) || !at.every((x) => typeof x === "string")) {
+			errors.push(hard(skillDir, "allowed-tools", `'allowed-tools' must be an array of strings.`));
+		}
+	}
+	// ── OPTIONAL: metadata
+	if (data.metadata !== undefined) {
+		const m = data.metadata;
+		if (typeof m !== "object" || m === null || Array.isArray(m)) {
+			errors.push(hard(skillDir, "metadata", `'metadata' must be an object.`));
+		}
+	}
+	// ── OPTIONAL: compatibility
+	if (data.compatibility !== undefined && typeof data.compatibility !== "string") {
+		errors.push(hard(skillDir, "compatibility", `'compatibility' must be a string.`));
+	}
+	// ── OPTIONAL: version
+	if (data.version !== undefined) {
+		if (typeof data.version !== "string" || !VERSION_REGEX.test(data.version)) {
+			errors.push(hard(skillDir, "version", `'version' must be a semver string (e.g. "1.2.3" or "1.2.3-beta.1"); got "${String(data.version)}".`));
+		}
+	}
+	// ── OPTIONAL: author
+	if (data.author !== undefined && typeof data.author !== "string") {
+		errors.push(hard(skillDir, "author", `'author' must be a string.`));
+	}
+	// ── UNKNOWN PROPS: SOFT warn (HYBRID policy)
+	for (const key of Object.keys(data)) {
+		if (!ALLOWED_SKILL_PROPS.has(key)) {
+			errors.push(warn(skillDir, `<unknown-prop:${key}>`, `Unknown property '${key}' is not in the whitelist; keeping for forward-compat.`));
+		}
+	}
+	// ── Result: ok iff no HARD errors
+	const hasHardError = errors.some((e) => e.severity === "error");
+	if (!hasHardError) {
+		const manifest: ValidatedSkillManifest = {
+			name: resolvedName,
+			description: typeof data.description === "string" ? data.description : "",
+		};
+		if (typeof data.license === "string") manifest.license = data.license;
+		if (Array.isArray(data["allowed-tools"])) {
+			manifest.allowedTools = (data["allowed-tools"] as unknown[]).filter((x) => typeof x === "string") as string[];
+		}
+		if (typeof data.metadata === "object" && data.metadata !== null && !Array.isArray(data.metadata)) {
+			manifest.metadata = data.metadata as Record<string, unknown>;
+		}
+		if (typeof data.compatibility === "string") manifest.compatibility = data.compatibility;
+		if (typeof data.version === "string") manifest.version = data.version;
+		if (typeof data.author === "string") manifest.author = data.author;
+		return { ok: true, errors, manifest };
+	}
+	return { ok: false, errors };
+}

package/src/ui/keybinding-map.ts CHANGED Viewed

@@ -1,3 +1,28 @@
+/**
+ * Dashboard keybinding map (L2 refactor: data-driven dispatch).
+ *
+ * Before L2 this module exposed `DASHBOARD_KEYS` (a data table) but dispatched
+ * via a 30-line `if (includes(...)) return "..."` chain — adding a key meant
+ * editing BOTH the table AND the dispatch, a DRY violation. L2 collapses the
+ * dispatch into a single `for (const b of BINDINGS)` loop driven by the
+ * `BINDINGS` table below. `DASHBOARD_KEYS` is retained as the raw key data so
+ * existing imports and the dead-but-intentional `KEY_RESERVED` set keep working.
+ *
+ * Recalibration vs. the original L2 plan: the plan also called for an
+ * `inTextInput` guard to prevent letter-key leaks into TUI text inputs.
+ * Verified during implementation that this is NOT needed — overlays are
+ * mutually exclusive and each has its own `handleInput`. `mailbox-compose-overlay.ts:111`
+ * captures every single-char key via `appendText(data)` and never delegates to
+ * `dashboardActionForKey`, so there is no leak path. Adding the guard would
+ * complicate the API (`run-dashboard.ts:485` has no text-input state to pass)
+ * for zero benefit. The input-guard half of L2 is therefore intentionally
+ * skipped; only the DRY/data-driven dispatch refactor landed.
+ *
+ * Origin pattern: deer-flow `frontend/src/components/workspace/command-palette.tsx:39-50`
+ * drives shortcuts from a single data array consumed by one loop in
+ * `use-global-shortcuts.ts:38-61`.
+ */
 export const DASHBOARD_KEYS = {
 	close: ["q", "\u001b"],
 	select: ["\r", "\n", "s"],
@@ -21,20 +46,23 @@ export const DASHBOARD_KEYS = {
 	notification: { dismissAll: ["H"] },
 } as const;
-/** @internal */
-const KEY_RESERVED = new Set<string>([
-	...DASHBOARD_KEYS.close,
-	...DASHBOARD_KEYS.select,
-	...Object.values(DASHBOARD_KEYS.root).flat(),
-	...Object.values(DASHBOARD_KEYS.pane).flat(),
-	...Object.values(DASHBOARD_KEYS.navigation).flat(),
-	...Object.values(DASHBOARD_KEYS.mailbox).flat(),
-	...Object.values(DASHBOARD_KEYS.health).flat(),
-	...Object.values(DASHBOARD_KEYS.notification).flat(),
-]);
+/**
+ * Pane identifiers that can scope a binding. `undefined` means the binding
+ * fires in every pane.
+ */
+export type ActivePane = "agents" | "progress" | "mailbox" | "output" | "health" | "metrics";
-function includes(values: readonly string[], data: string): boolean {
-	return values.includes(data);
+/**
+ * A single keybinding: the keys that trigger it, the action it produces, and
+ * an optional pane restriction. The dispatch loop returns the FIRST matching
+ * binding, so table ORDER IS SIGNIFICANT and must mirror the old if-chain
+ * precedence (pane-specific overrides before their generic competitors).
+ */
+export interface KeyBinding {
+	readonly keys: readonly string[];
+	readonly action: DashboardKeyAction;
+	/** When set, the binding only fires when `activePane === pane`. */
+	readonly pane?: ActivePane;
 }
 export type DashboardKeyAction =
@@ -65,34 +93,93 @@ export type DashboardKeyAction =
 	| "health-diagnostic-export"
 	| "notifications-dismiss";
-export function dashboardActionForKey(data: string, activePane?: "agents" | "progress" | "mailbox" | "output" | "health" | "metrics"): DashboardKeyAction | undefined {
-	if (includes(DASHBOARD_KEYS.close, data)) return "close";
-	if (activePane === "mailbox" && includes(DASHBOARD_KEYS.mailbox.openDetail, data)) return "mailbox-detail";
-	if (activePane === "health") {
-		if (includes(DASHBOARD_KEYS.health.recovery, data)) return "health-recovery";
-		if (includes(DASHBOARD_KEYS.health.killStale, data)) return "health-kill-stale";
-		if (includes(DASHBOARD_KEYS.health.diagnosticExport, data)) return "health-diagnostic-export";
+/**
+ * The dispatch table. ORDER MATTERS — first match wins.
+ *
+ * Precedence notes (must match the pre-L2 if-chain exactly):
+ *   1. `close` always wins (q / Esc).
+ *   2. `mailbox-detail` (\r, \n) is pane-scoped to mailbox and MUST precede
+ *      `select` (which also binds \r, \n) so Enter opens the detail instead of
+ *      triggering select while in the mailbox pane.
+ *   3. `health-*` are pane-scoped to health.
+ *   4. `notifications-dismiss` (H) is global.
+ *   5. `select`, then the root actions, pane switches, and navigation.
+ *
+ * NOTE: mailbox action keys A/N/C/P/X (ack/nudge/compose/preview/ackAll) are
+ * intentionally NOT in this table. They live in `DASHBOARD_KEYS.mailbox` for
+ * reservation but are handled by the mailbox overlay's own `handleInput`,
+ * not by the dashboard dispatch. Adding them here would change behavior.
+ */
+const BINDINGS: readonly KeyBinding[] = [
+	{ keys: DASHBOARD_KEYS.close, action: "close" },
+	{ keys: DASHBOARD_KEYS.mailbox.openDetail, action: "mailbox-detail", pane: "mailbox" },
+	{ keys: DASHBOARD_KEYS.health.recovery, action: "health-recovery", pane: "health" },
+	{ keys: DASHBOARD_KEYS.health.killStale, action: "health-kill-stale", pane: "health" },
+	{ keys: DASHBOARD_KEYS.health.diagnosticExport, action: "health-diagnostic-export", pane: "health" },
+	{ keys: DASHBOARD_KEYS.notification.dismissAll, action: "notifications-dismiss" },
+	{ keys: DASHBOARD_KEYS.select, action: "select" },
+	{ keys: DASHBOARD_KEYS.root.summary, action: "summary" },
+	{ keys: DASHBOARD_KEYS.root.artifacts, action: "artifacts" },
+	{ keys: DASHBOARD_KEYS.root.api, action: "api" },
+	{ keys: DASHBOARD_KEYS.root.agents, action: "agents" },
+	{ keys: DASHBOARD_KEYS.root.mailbox, action: "mailbox" },
+	{ keys: DASHBOARD_KEYS.root.events, action: "events" },
+	{ keys: DASHBOARD_KEYS.root.output, action: "output" },
+	{ keys: DASHBOARD_KEYS.root.transcript, action: "transcript" },
+	{ keys: DASHBOARD_KEYS.root.liveConversation, action: "live-conversation" },
+	{ keys: DASHBOARD_KEYS.root.reload, action: "reload" },
+	{ keys: DASHBOARD_KEYS.root.progressToggle, action: "progressToggle" },
+	{ keys: DASHBOARD_KEYS.pane.agents, action: "pane-agents" },
+	{ keys: DASHBOARD_KEYS.pane.progress, action: "pane-progress" },
+	{ keys: DASHBOARD_KEYS.pane.mailbox, action: "pane-mailbox" },
+	{ keys: DASHBOARD_KEYS.pane.output, action: "pane-output" },
+	{ keys: DASHBOARD_KEYS.pane.health, action: "pane-health" },
+	{ keys: DASHBOARD_KEYS.pane.metrics, action: "pane-metrics" },
+	{ keys: DASHBOARD_KEYS.navigation.up, action: "up" },
+	{ keys: DASHBOARD_KEYS.navigation.down, action: "down" },
+];
+/**
+ * Reserved keys — every key the dashboard claims, including mailbox/health
+ * action keys that are NOT dispatched here but are handled by their own
+ * overlays. Derived from `DASHBOARD_KEYS` (the full key set) rather than from
+ * `BINDINGS` (the dispatched subset) so overlay-handled keys stay reserved.
+ *
+ * @internal Currently unused outside this module but retained to document
+ * intent and support future callers that need to know which keys the
+ * dashboard ecosystem owns.
+ */
+const KEY_RESERVED = new Set<string>([
+	...DASHBOARD_KEYS.close,
+	...DASHBOARD_KEYS.select,
+	...Object.values(DASHBOARD_KEYS.root).flat(),
+	...Object.values(DASHBOARD_KEYS.pane).flat(),
+	...Object.values(DASHBOARD_KEYS.navigation).flat(),
+	...Object.values(DASHBOARD_KEYS.mailbox).flat(),
+	...Object.values(DASHBOARD_KEYS.health).flat(),
+	...Object.values(DASHBOARD_KEYS.notification).flat(),
+]);
+export { KEY_RESERVED };
+/**
+ * Resolve a raw input `data` string to a dashboard action.
+ *
+ * Data-driven dispatch: iterates `BINDINGS` in order and returns the action of
+ * the first binding whose `keys` contain `data` and whose optional `pane`
+ * restriction matches `activePane`. Behavior is identical to the pre-L2
+ * if-chain (verified by `test/unit/keybinding-map.parity.test.ts`).
+ *
+ * @param data Raw key input (single char or escape sequence).
+ * @param activePane Currently focused pane; pane-scoped bindings only fire
+ *                   when this matches. `undefined` disables all pane-scoped
+ *                   bindings (matching the old behavior where omitting the
+ *                   arg skipped the `activePane === ...` branches).
+ */
+export function dashboardActionForKey(data: string, activePane?: ActivePane): DashboardKeyAction | undefined {
+	for (const binding of BINDINGS) {
+		if (binding.pane !== undefined && binding.pane !== activePane) continue;
+		if (binding.keys.includes(data)) return binding.action;
 	}
-	if (includes(DASHBOARD_KEYS.notification.dismissAll, data)) return "notifications-dismiss";
-	if (includes(DASHBOARD_KEYS.select, data)) return "select";
-	if (includes(DASHBOARD_KEYS.root.summary, data)) return "summary";
-	if (includes(DASHBOARD_KEYS.root.artifacts, data)) return "artifacts";
-	if (includes(DASHBOARD_KEYS.root.api, data)) return "api";
-	if (includes(DASHBOARD_KEYS.root.agents, data)) return "agents";
-	if (includes(DASHBOARD_KEYS.root.mailbox, data)) return "mailbox";
-	if (includes(DASHBOARD_KEYS.root.events, data)) return "events";
-	if (includes(DASHBOARD_KEYS.root.output, data)) return "output";
-	if (includes(DASHBOARD_KEYS.root.transcript, data)) return "transcript";
-	if (includes(DASHBOARD_KEYS.root.liveConversation, data)) return "live-conversation";
-	if (includes(DASHBOARD_KEYS.root.reload, data)) return "reload";
-	if (includes(DASHBOARD_KEYS.root.progressToggle, data)) return "progressToggle";
-	if (includes(DASHBOARD_KEYS.pane.agents, data)) return "pane-agents";
-	if (includes(DASHBOARD_KEYS.pane.progress, data)) return "pane-progress";
-	if (includes(DASHBOARD_KEYS.pane.mailbox, data)) return "pane-mailbox";
-	if (includes(DASHBOARD_KEYS.pane.output, data)) return "pane-output";
-	if (includes(DASHBOARD_KEYS.pane.health, data)) return "pane-health";
-	if (includes(DASHBOARD_KEYS.pane.metrics, data)) return "pane-metrics";
-	if (includes(DASHBOARD_KEYS.navigation.up, data)) return "up";
-	if (includes(DASHBOARD_KEYS.navigation.down, data)) return "down";
 	return undefined;
 }

package/src/ui/run-event-bus.ts CHANGED Viewed

@@ -1,4 +1,5 @@
 import type { TeamEvent } from "../state/event-log.ts";
+import { readEventsCursor } from "../state/event-log.ts";
 export type RunEventType =
 	| "task_started"
@@ -59,6 +60,18 @@ export interface RunEventPayload {
 	timestamp?: string;
 	data?: unknown;
 	channel?: EventChannel;
+	/**
+	 * L1: monotonic sequence from the durable event log
+	 * (`TeamEvent.metadata.seq`). Present on events that originated from a
+	 * logged TeamEvent (via emitFromTeamEvent). Absent on transient live-only
+	 * events (e.g. worker_status from the stream bridge) that are never
+	 * persisted and therefore cannot be replayed or deduped.
+	 *
+	 * Used by onWithReplay() to dedup: a live event with seq <= the last seq
+	 * replayed to a subscriber is suppressed (it was already delivered from
+	 * the durable log).
+	 */
+	seq?: number;
 }
 export type RunEventCallback = (event: RunEventPayload) => void;
@@ -115,6 +128,73 @@ class RunEventBus {
 		};
 	}
+	/**
+	 * L1: subscribe with a catch-up replay from the durable event log.
+	 *
+	 * Closes the transient-subscriber-absence gap: when an overlay/widget is
+	 * disposed and recreated (toggle, reconnect), live events emitted in that
+	 * window are lost as notification triggers. This method replays the
+	 * missed TeamEvents from the durable JSONL log BEFORE attaching the live
+	 * listener, then dedups so events delivered both ways fire exactly once.
+	 *
+	 * Unlike deer-flow's 256-event RAM ring buffer (lost on crash), this uses
+	 * pi-crew's existing durable `readEventsCursor` — O(new bytes) via
+	 * byte-offset incremental reads, monotonic seq, tail-capped. Strictly
+	 * better: survives crashes, bounded memory.
+	 *
+	 * @param runId       Run to subscribe to (live listener scope).
+	 * @param eventsPath  Path to the run's events JSONL (manifest.eventsPath).
+	 * @param lastSeenSeq Last seq the caller processed; events with seq > this
+	 *                    are replayed. Pass 0 to replay everything.
+	 * @param callback    Receives both replayed and live events. Replayed
+	 *                    events are delivered directly (NOT via emit, so no
+	 *                    fan-out to other subscribers).
+	 * @returns unsubscribe handle (detaches the live listener).
+	 */
+	onWithReplay(
+		runId: string,
+		eventsPath: string,
+		lastSeenSeq: number,
+		callback: RunEventCallback,
+	): () => void {
+		// Phase 1: replay missed events from the durable log directly to this
+		// callback. Bounded by limit; readEventsCursor already tail-caps.
+		let maxReplayedSeq = lastSeenSeq;
+		try {
+			const cursor = readEventsCursor(eventsPath, { sinceSeq: lastSeenSeq, limit: 1000 });
+			for (const teamEvent of cursor.events) {
+				const type = teamEventToRunEventType(teamEvent);
+				if (!type) continue; // not all TeamEvents map to a RunEventType
+				const payload: RunEventPayload = {
+					type,
+					runId: teamEvent.runId,
+					taskId: teamEvent.taskId,
+					timestamp: teamEvent.time,
+					data: teamEvent.data,
+					channel: classifyEventChannel(type),
+					seq: teamEvent.metadata?.seq,
+				};
+				try { callback(payload); } catch { /* subscriber errors are non-fatal */ }
+				if (typeof teamEvent.metadata?.seq === "number") {
+					maxReplayedSeq = Math.max(maxReplayedSeq, teamEvent.metadata.seq);
+				}
+			}
+		} catch {
+			// Log read failures are non-fatal — fall through to live-only
+			// subscription. The durable log may not exist yet for a brand-new run.
+		}
+		// Phase 2: attach the live listener with dedup. A live event whose seq
+		// was already replayed (seq <= maxReplayedSeq) is suppressed. Events
+		// without a seq (transient live-only, e.g. worker_status) always
+		// deliver — they are never persisted and thus never replayed.
+		const liveCallback: RunEventCallback = (event) => {
+			if (typeof event.seq === "number" && event.seq <= maxReplayedSeq) return;
+			callback(event);
+		};
+		return this.on(runId, liveCallback);
+	}
 	emit(event: RunEventPayload): void {
 		// Auto-classify channel if not already set.
 		// M2: Use local variable for routing, but also set on event
@@ -206,5 +286,8 @@ export function emitFromTeamEvent(event: TeamEvent): void {
 		taskId: event.taskId,
 		timestamp: event.time,
 		data: event.data,
+		// L1: stamp the durable-log seq so onWithReplay() can dedup live
+		// delivery against replayed events.
+		seq: event.metadata?.seq,
 	});
 }