npm - @os-eco/overstory-cli - Versions diffs - 0.7.3 → 0.7.4 - Mend

@os-eco/overstory-cli 0.7.3 → 0.7.4

Files changed (20)

package/README.md +20 -8
package/agents/builder.md +6 -0
package/agents/coordinator.md +2 -2
package/agents/lead.md +4 -1
package/agents/merger.md +3 -2
package/agents/monitor.md +1 -1
package/agents/reviewer.md +1 -0
package/agents/scout.md +1 -0
package/package.json +2 -2
package/src/commands/agents.ts +18 -8
package/src/commands/prime.test.ts +1 -0
package/src/commands/prime.ts +1 -16
package/src/index.ts +1 -1
package/src/metrics/pricing.ts +80 -0
package/src/metrics/transcript.test.ts +58 -1
package/src/metrics/transcript.ts +9 -68
package/src/runtimes/pi-guards.test.ts +29 -0
package/src/runtimes/pi-guards.ts +23 -6
package/src/tracker/beads.test.ts +454 -0
package/src/tracker/seeds.test.ts +461 -0

package/README.md CHANGED

@@ -1,18 +1,21 @@
 # Overstory
-Multi-agent orchestration for Claude Code.
+Multi-agent orchestration for AI coding agents.
 [![npm](https://img.shields.io/npm/v/@os-eco/overstory-cli)](https://www.npmjs.com/package/@os-eco/overstory-cli)
 [![CI](https://github.com/jayminwest/overstory/actions/workflows/ci.yml/badge.svg)](https://github.com/jayminwest/overstory/actions/workflows/ci.yml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
-Overstory turns a single Claude Code session into a multi-agent team by spawning worker agents in git worktrees via tmux, coordinating them through a custom SQLite mail system, and merging their work back with tiered conflict resolution.
+Overstory turns a single coding session into a multi-agent team by spawning worker agents in git worktrees via tmux, coordinating them through a custom SQLite mail system, and merging their work back with tiered conflict resolution. A pluggable `AgentRuntime` interface lets you swap between runtimes — Claude Code, [Pi](https://github.com/nichochar/pi-coding-agent), or your own adapter.
 > **Warning: Agent swarms are not a universal solution.** Do not deploy Overstory without understanding the risks of multi-agent orchestration — compounding error rates, cost amplification, debugging complexity, and merge conflicts are the normal case, not edge cases. Read [STEELMAN.md](STEELMAN.md) for a full risk analysis and the [Agentic Engineering Book](https://github.com/jayminwest/agentic-engineering-book) ([web version](https://jayminwest.com/agentic-engineering-book)) before using this tool in production.
 ## Install
-Requires [Bun](https://bun.sh) v1.0+, [Claude Code](https://docs.anthropic.com/en/docs/claude-code), git, and tmux.
+Requires [Bun](https://bun.sh) v1.0+, git, and tmux. At least one supported agent runtime must be installed:
+- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (`claude` CLI)
+- [Pi](https://github.com/nichochar/pi-coding-agent) (`pi` CLI)
 ```bash
 bun install -g @os-eco/overstory-cli
@@ -158,11 +161,20 @@ Every command supports `--json` where noted. Global flags: `-q`/`--quiet`, `--ti
 ## Architecture
-Overstory uses CLAUDE.md overlays and PreToolUse hooks to turn Claude Code sessions into orchestrated agents. Each agent runs in an isolated git worktree via tmux. Inter-agent messaging is handled by a custom SQLite mail system (WAL mode, ~1-5ms per query) with typed protocol messages and broadcast support. A FIFO merge queue with 4-tier conflict resolution merges agent branches back to canonical. A tiered watchdog system (Tier 0 mechanical daemon, Tier 1 AI-assisted triage, Tier 2 monitor agent) ensures fleet health. See [CLAUDE.md](CLAUDE.md) for full technical details.
+Overstory uses instruction overlays and tool-call guards to turn agent sessions into orchestrated workers. Each agent runs in an isolated git worktree via tmux. Inter-agent messaging is handled by a custom SQLite mail system (WAL mode, ~1-5ms per query) with typed protocol messages and broadcast support. A FIFO merge queue with 4-tier conflict resolution merges agent branches back to canonical. A tiered watchdog system (Tier 0 mechanical daemon, Tier 1 AI-assisted triage, Tier 2 monitor agent) ensures fleet health. See [CLAUDE.md](CLAUDE.md) for full technical details.
+### Runtime Adapters
+Overstory is runtime-agnostic. The `AgentRuntime` interface (`src/runtimes/types.ts`) defines the contract — each adapter handles spawning, config deployment, guard enforcement, readiness detection, and transcript parsing for its runtime. Set the default in `config.yaml` or override per-agent with `ov sling --runtime <name>`.
+| Runtime | CLI | Guard Mechanism | Status |
+|---------|-----|-----------------|--------|
+| Claude Code | `claude` | `settings.local.json` hooks | Stable |
+| Pi | `pi` | `.pi/extensions/` guard extension | Active development |
 ## How It Works
-CLAUDE.md + hooks + the `ov` CLI turn your Claude Code session into a multi-agent orchestrator. A persistent coordinator agent manages task decomposition and dispatch, while a mechanical watchdog daemon monitors agent health in the background.
+Instruction overlays + tool-call guards + the `ov` CLI turn your coding session into a multi-agent orchestrator. A persistent coordinator agent manages task decomposition and dispatch, while a mechanical watchdog daemon monitors agent health in the background.
 ```
 Coordinator (persistent orchestrator at project root)
@@ -190,10 +202,10 @@ Coordinator (persistent orchestrator at project root)
 - **Worktrees**: Each agent gets an isolated git worktree — no file conflicts between agents
 - **Merge**: FIFO merge queue (SQLite-backed) with 4-tier conflict resolution
 - **Watchdog**: Tiered health monitoring — Tier 0 mechanical daemon (tmux/pid liveness), Tier 1 AI-assisted failure triage, Tier 2 monitor agent for continuous fleet patrol
-- **Tool Enforcement**: PreToolUse hooks mechanically block file modifications for non-implementation agents and dangerous git operations for all agents
+- **Tool Enforcement**: Runtime-specific guards (hooks for Claude Code, extensions for Pi) mechanically block file modifications for non-implementation agents and dangerous git operations for all agents
 - **Task Groups**: Batch coordination with auto-close when all member issues complete
 - **Session Lifecycle**: Checkpoint save/restore for compaction survivability, handoff orchestration for crash recovery
-- **Token Instrumentation**: Session metrics extracted from Claude Code transcript JSONL files
+- **Token Instrumentation**: Session metrics extracted from runtime transcript files (JSONL)
 ## Project Structure
@@ -252,7 +264,7 @@ overstory/
     merge/                        FIFO queue + conflict resolution
     watchdog/                     Tiered health monitoring (daemon, triage, health)
     logging/                      Multi-format logger + sanitizer + reporter + color control + shared theme/format
-    metrics/                      SQLite metrics + transcript parsing
+    metrics/                      SQLite metrics + pricing + transcript parsing
     doctor/                       Health check modules (10 checks)
     insights/                     Session insight analyzer for auto-expertise
     runtimes/                     AgentRuntime abstraction (registry + adapters: Claude, Pi)

package/agents/builder.md CHANGED

@@ -54,8 +54,10 @@ Your task-specific context (task ID, file scope, spec path, branch name, parent
 5. **Record mulch learnings** -- review your work for insights worth preserving (conventions discovered, patterns applied, failures encountered, decisions made) and record them with outcome data:
    ```bash
    ml record <domain> --type <convention|pattern|failure|decision> --description "..." \
+     --classification <foundational|tactical|observational> \
      --outcome-status success --outcome-agent $OVERSTORY_AGENT_NAME
    ```
+   Classification guide: use `foundational` for stable conventions confirmed across sessions, `tactical` for session-specific patterns (default), `observational` for unverified one-off findings.
    This is a required gate, not optional. Every implementation session produces learnings. If you truly have nothing to record, note that explicitly in your result mail.
 6. Send `worker_done` mail to your parent with structured payload:
    ```bash
@@ -99,6 +101,10 @@ You are an implementation specialist. Given a spec and a set of files you own, y
 ### Expertise
 - **Load context:** `ml prime [domain]` to load domain expertise before implementing
 - **Record patterns:** `ml record <domain>` to capture useful patterns you discover
+- **Classify records:** Always pass `--classification` when recording:
+  - `foundational` — core conventions confirmed across multiple sessions (e.g., "all SQLite DBs use WAL mode")
+  - `tactical` — session-specific patterns useful for similar tasks (default if omitted)
+  - `observational` — one-off findings or unverified hypotheses worth noting
 ## workflow

package/agents/coordinator.md CHANGED

@@ -145,7 +145,7 @@ Coordinator (you, depth 0)
 ### Expertise
 - **Load context:** `ml prime [domain]` to understand the problem space before planning
-- **Record insights:** `ml record <domain> --type <type> --description "<insight>"` to capture orchestration patterns, dispatch decisions, and failure learnings
+- **Record insights:** `ml record <domain> --type <type> --classification <foundational|tactical|observational> --description "<insight>"` to capture orchestration patterns, dispatch decisions, and failure learnings. Use `foundational` for stable conventions, `tactical` for session-specific patterns, `observational` for unverified findings.
 - **Search knowledge:** `ml search <query>` to find relevant past decisions
 ## workflow
@@ -243,7 +243,7 @@ When a batch is complete (task group auto-closed, all issues resolved):
 1. Verify all issues are closed: run `{{TRACKER_CLI}} show <id>` for each issue in the group.
 2. Verify all branches are merged: check `ov status` for unmerged branches.
 3. Clean up worktrees: `ov worktree clean --completed`.
-4. Record orchestration insights: `ml record <domain> --type <type> --description "<insight>"`.
+4. Record orchestration insights: `ml record <domain> --type <type> --classification <foundational|tactical|observational> --description "<insight>"`.
 5. Report to the human operator: summarize what was accomplished, what was merged, any issues encountered.
 6. Check for follow-up work: `{{TRACKER_CLI}} ready` to see if new issues surfaced during the batch.

package/agents/lead.md CHANGED

@@ -121,6 +121,7 @@ ov sling <task-id> \
 - **Load domain context:** `ml prime [domain]` to understand the problem space before decomposing
 - **Record patterns:** `ml record <domain>` to capture orchestration insights
 - **Record worker insights:** When worker result mails contain notable findings, record them via `ml record` if they represent reusable patterns or conventions.
+- **Classify records:** Always pass `--classification` when recording. Use `foundational` for core conventions confirmed across sessions, `tactical` for session-specific patterns (default), `observational` for one-off findings.
 ## task-complexity-assessment
@@ -297,8 +298,10 @@ Good decomposition follows these principles:
 3. Run integration tests if applicable: {{QUALITY_GATE_INLINE}}.
 4. **Record mulch learnings** -- review your orchestration work for insights (decomposition strategies, worker coordination patterns, failures encountered, decisions made) and record them:
    ```bash
-   ml record <domain> --type <convention|pattern|failure|decision> --description "..."
+   ml record <domain> --type <convention|pattern|failure|decision> --description "..." \
+     --classification <foundational|tactical|observational>
    ```
+   Classification guide: use `foundational` for stable conventions confirmed across sessions, `tactical` for session-specific patterns (default), `observational` for unverified one-off findings.
    This is required. Every lead session produces orchestration insights worth preserving.
 5. Run `{{TRACKER_CLI}} close <task-id> --reason "<summary of what was accomplished>"`.
 6. Send a `status` mail to the coordinator confirming all subtasks are complete.

package/agents/merger.md CHANGED

@@ -51,7 +51,8 @@ Your task-specific context (task ID, branches to merge, target branch, merge ord
 {{QUALITY_GATE_STEPS}}
 4. **Record mulch learnings** -- capture merge resolution insights (conflict patterns, resolution strategies, branch integration issues):
    ```bash
-   ml record <domain> --type <convention|pattern|failure> --description "..."
+   ml record <domain> --type <convention|pattern|failure> --description "..." \
+     --classification <foundational|tactical|observational>
    ```
    This is required for non-trivial merges (Tier 2+). Merge resolution patterns are highly reusable knowledge for future mergers. Skip for clean Tier 1 merges with no conflicts.
 5. Send a `result` mail to your parent with: tier used, conflicts resolved (if any), test status.
@@ -92,7 +93,7 @@ You are a branch integration specialist. When workers complete their tasks on se
 ### Expertise
 - **Load context:** `ml prime [domain]` to understand the code being merged
-- **Record patterns:** `ml record <domain>` to capture merge resolution insights
+- **Record patterns:** `ml record <domain> --classification <foundational|tactical|observational>` to capture merge resolution insights. Use `foundational` for stable merge conventions, `tactical` for resolution strategies, `observational` for one-off conflict patterns.
 ## workflow

package/agents/monitor.md CHANGED

@@ -72,7 +72,7 @@ You are the watchdog's brain. While Tier 0 (mechanical daemon) checks tmux/pid l
 ### Expertise
 - **Load context:** `ml prime [domain]` to understand project patterns
-- **Record insights:** `ml record <domain> --type <type> --description "<insight>"` to capture monitoring patterns, failure signatures, and recovery strategies
+- **Record insights:** `ml record <domain> --type <type> --classification <foundational|tactical|observational> --description "<insight>"` to capture monitoring patterns, failure signatures, and recovery strategies. Use `foundational` for stable monitoring conventions, `tactical` for incident-specific patterns, `observational` for unverified anomaly observations.
 - **Search knowledge:** `ml search <query>` to find relevant past incidents
 ## workflow

package/agents/reviewer.md CHANGED

@@ -91,6 +91,7 @@ You are a validation specialist. Given code to review, you check it for correctn
 ### Expertise
 - **Load conventions:** `ml prime [domain]` to understand project standards
 - **Surface insights:** Include notable findings (convention violations, code quality patterns) in your result mail so your parent has full context.
+- **Classification guidance for parents:** When including notable findings in your result mail, indicate suggested classification: `foundational` (confirmed stable convention), `tactical` (task-specific pattern), or `observational` (unverified finding). This helps your parent record accurately.
 ## workflow

package/agents/scout.md CHANGED

@@ -93,6 +93,7 @@ You perform reconnaissance. Given a research question, exploration target, or an
 ### Expertise
 - **Query expertise:** `ml prime [domain]` to load relevant context
 - **Surface insights:** Include notable findings (patterns, conventions, gotchas) in your result mail so your parent has full context for spec writing.
+- **Classification guidance for parents:** When including notable findings in your result mail, indicate suggested classification: `foundational` (confirmed stable convention), `tactical` (task-specific pattern), or `observational` (unverified finding). This helps your parent record accurately.
 ## workflow

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
 	"name": "@os-eco/overstory-cli",
-	"version": "0.7.3",
-	"description": "Multi-agent orchestration for Claude Code — spawn worker agents in git worktrees via tmux, coordinate through SQLite mail, merge with tiered conflict resolution",
+	"version": "0.7.4",
+	"description": "Multi-agent orchestration for AI coding agents — spawn workers in git worktrees via tmux, coordinate through SQLite mail, merge with tiered conflict resolution. Pluggable runtime adapters for Claude Code, Pi, and more.",
 	"author": "Jaymin West",
 	"license": "MIT",
 	"type": "module",

package/src/commands/agents.ts CHANGED

@@ -29,9 +29,15 @@ export interface DiscoveredAgent {
 	lastActivity: string;
 }
+/** Known instruction file paths, tried in order until one exists. */
+const KNOWN_INSTRUCTION_PATHS = [
+	join(".claude", "CLAUDE.md"), // Claude Code, Pi
+	"AGENTS.md", // Codex (future)
+];
 /**
- * Extract file scope from an agent's overlay CLAUDE.md.
- * Returns empty array if overlay doesn't exist, has no file scope restrictions,
+ * Extract file scope from an agent's overlay instruction file.
+ * Returns empty array if no overlay exists, has no file scope restrictions,
  * or can't be read.
  *
  * @param worktreePath - Absolute path to the agent's worktree
@@ -39,15 +45,19 @@ export interface DiscoveredAgent {
  */
 export async function extractFileScope(worktreePath: string): Promise<string[]> {
 	try {
-		const overlayPath = join(worktreePath, ".claude", "CLAUDE.md");
-		const overlayFile = Bun.file(overlayPath);
-		if (!(await overlayFile.exists())) {
+		let content: string | null = null;
+		for (const relPath of KNOWN_INSTRUCTION_PATHS) {
+			const overlayPath = join(worktreePath, relPath);
+			const overlayFile = Bun.file(overlayPath);
+			if (await overlayFile.exists()) {
+				content = await overlayFile.text();
+				break;
+			}
+		}
+		if (content === null) {
 			return [];
 		}
-		const content = await overlayFile.text();
 		// Find the section between "## File Scope (exclusive ownership)" and "## Expertise"
 		const startMarker = "## File Scope (exclusive ownership)";
 		const endMarker = "## Expertise";
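
The lookup order introduced in this hunk can be exercised in isolation. The following is a minimal sketch, assuming Node's synchronous `fs` calls in place of `Bun.file` so that it runs outside Bun; `CANDIDATE_PATHS` and `readFirstInstructionFile` are hypothetical names used only for illustration and are not part of the package.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Same candidate order as KNOWN_INSTRUCTION_PATHS in the hunk above.
const CANDIDATE_PATHS = [join(".claude", "CLAUDE.md"), "AGENTS.md"];

// Hypothetical helper: return the content of the first candidate that exists,
// or null when none of them is present in the worktree.
function readFirstInstructionFile(worktreePath: string): string | null {
	for (const relPath of CANDIDATE_PATHS) {
		const fullPath = join(worktreePath, relPath);
		if (existsSync(fullPath)) {
			return readFileSync(fullPath, "utf8");
		}
	}
	return null;
}

// Demo: an empty temporary worktree, then one with only AGENTS.md,
// then one that also has .claude/CLAUDE.md.
const worktree = mkdtempSync(join(tmpdir(), "overlay-demo-"));
const beforeAny = readFirstInstructionFile(worktree);

writeFileSync(join(worktree, "AGENTS.md"), "from AGENTS.md");
const fallbackOnly = readFirstInstructionFile(worktree);

mkdirSync(join(worktree, ".claude"));
writeFileSync(join(worktree, ".claude", "CLAUDE.md"), "from CLAUDE.md");
const preferred = readFirstInstructionFile(worktree);

rmSync(worktree, { recursive: true, force: true });
console.log({ beforeAny, fallbackOnly, preferred });
```

Because the loop stops at the first existing file, `.claude/CLAUDE.md` takes precedence whenever both files are present, and `AGENTS.md` is read only as a fallback.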

package/src/commands/prime.test.ts CHANGED

@@ -366,6 +366,7 @@ recentTasks: []
 !hooks.json
 !groups.json
 !agent-defs/
+!README.md
 `;
 		test("creates .overstory/.gitignore if missing", async () => {

package/src/commands/prime.ts CHANGED

@@ -18,22 +18,7 @@ import { createMulchClient } from "../mulch/client.ts";
 import { openSessionStore } from "../sessions/compat.ts";
 import type { AgentIdentity, AgentManifest, SessionCheckpoint, SessionMetrics } from "../types.ts";
 import { getCurrentSessionName } from "../worktree/tmux.ts";
-/**
- * Gitignore content for .overstory/.gitignore.
- * TODO: Import from init.ts once it's exported (parallel branch change).
- * Wildcard+whitelist pattern: ignore everything except tracked config files.
- */
-const OVERSTORY_GITIGNORE = `# Wildcard+whitelist: ignore everything, whitelist tracked files
-# Auto-healed by ov prime on each session start
-*
-!.gitignore
-!config.yaml
-!agent-manifest.json
-!hooks.json
-!groups.json
-!agent-defs/
-`;
+import { OVERSTORY_GITIGNORE } from "./init.ts";
 export interface PrimeOptions {
 	agent?: string;

package/src/index.ts CHANGED

@@ -45,7 +45,7 @@ import { OverstoryError, WorktreeError } from "./errors.ts";
 import { jsonError } from "./json.ts";
 import { brand, chalk, muted, setQuiet } from "./logging/color.ts";
-export const VERSION = "0.7.3";
+export const VERSION = "0.7.4";
 const rawArgs = process.argv.slice(2);

package/src/metrics/pricing.ts ADDED

@@ -0,0 +1,80 @@
+/**
+ * Runtime-agnostic pricing and cost estimation for AI models.
+ *
+ * Extracted from transcript.ts so any runtime can use cost estimation
+ * without pulling in Claude Code-specific JSONL parsing logic.
+ *
+ * To add support for a new provider model, add an entry to MODEL_PRICING
+ * using a lowercase substring that uniquely identifies the model tier
+ * (e.g. "opus", "sonnet", "haiku").
+ */
+/** Canonical token usage representation shared across all runtimes. */
+export interface TokenUsage {
+	inputTokens: number;
+	outputTokens: number;
+	cacheReadTokens: number;
+	cacheCreationTokens: number;
+	modelUsed: string | null;
+}
+/** Pricing per million tokens (USD). */
+export interface ModelPricing {
+	inputPerMTok: number;
+	outputPerMTok: number;
+	cacheReadPerMTok: number;
+	cacheCreationPerMTok: number;
+}
+/** Hardcoded pricing for known Claude models. */
+const MODEL_PRICING: Record<string, ModelPricing> = {
+	opus: {
+		inputPerMTok: 15,
+		outputPerMTok: 75,
+		cacheReadPerMTok: 1.5, // 10% of input
+		cacheCreationPerMTok: 3.75, // 25% of input
+	},
+	sonnet: {
+		inputPerMTok: 3,
+		outputPerMTok: 15,
+		cacheReadPerMTok: 0.3, // 10% of input
+		cacheCreationPerMTok: 0.75, // 25% of input
+	},
+	haiku: {
+		inputPerMTok: 0.8,
+		outputPerMTok: 4,
+		cacheReadPerMTok: 0.08, // 10% of input
+		cacheCreationPerMTok: 0.2, // 25% of input
+	},
+};
+/**
+ * Determine the pricing tier for a given model string.
+ * Matches on substring: "opus" -> opus pricing, "sonnet" -> sonnet, "haiku" -> haiku.
+ * Returns null if unrecognized.
+ */
+export function getPricingForModel(model: string): ModelPricing | null {
+	const lower = model.toLowerCase();
+	if (lower.includes("opus")) return MODEL_PRICING.opus ?? null;
+	if (lower.includes("sonnet")) return MODEL_PRICING.sonnet ?? null;
+	if (lower.includes("haiku")) return MODEL_PRICING.haiku ?? null;
+	return null;
+}
+/**
+ * Calculate the estimated cost in USD for a given usage and model.
+ * Returns null if the model is unrecognized.
+ */
+export function estimateCost(usage: TokenUsage): number | null {
+	if (usage.modelUsed === null) return null;
+	const pricing = getPricingForModel(usage.modelUsed);
+	if (pricing === null) return null;
+	const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok;
+	const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok;
+	const cacheReadCost = (usage.cacheReadTokens / 1_000_000) * pricing.cacheReadPerMTok;
+	const cacheCreationCost = (usage.cacheCreationTokens / 1_000_000) * pricing.cacheCreationPerMTok;
+	return inputCost + outputCost + cacheReadCost + cacheCreationCost;
+}
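
As a worked check of the arithmetic in `estimateCost`, the sketch below restates the two interfaces and the opus row from the file above and evaluates a single usage record. `opusPricing` and `costFor` are hypothetical names introduced for this illustration; the package itself resolves pricing through `getPricingForModel`.

```typescript
// Illustrative restatement of the per-million-token arithmetic in pricing.ts.
// `opusPricing` and `costFor` are hypothetical names used only in this sketch.
interface TokenUsage {
	inputTokens: number;
	outputTokens: number;
	cacheReadTokens: number;
	cacheCreationTokens: number;
	modelUsed: string | null;
}

interface ModelPricing {
	inputPerMTok: number;
	outputPerMTok: number;
	cacheReadPerMTok: number;
	cacheCreationPerMTok: number;
}

// Values copied from the opus entry of MODEL_PRICING above (USD per million tokens).
const opusPricing: ModelPricing = {
	inputPerMTok: 15,
	outputPerMTok: 75,
	cacheReadPerMTok: 1.5,
	cacheCreationPerMTok: 3.75,
};

// Sum of (tokens / 1,000,000) * rate over the four token categories.
function costFor(usage: TokenUsage, pricing: ModelPricing): number {
	const perMillion = (tokens: number, rate: number): number => (tokens / 1_000_000) * rate;
	return (
		perMillion(usage.inputTokens, pricing.inputPerMTok) +
		perMillion(usage.outputTokens, pricing.outputPerMTok) +
		perMillion(usage.cacheReadTokens, pricing.cacheReadPerMTok) +
		perMillion(usage.cacheCreationTokens, pricing.cacheCreationPerMTok)
	);
}

// One million tokens in every category, as in the re-export parity test.
const usage: TokenUsage = {
	inputTokens: 1_000_000,
	outputTokens: 1_000_000,
	cacheReadTokens: 1_000_000,
	cacheCreationTokens: 1_000_000,
	modelUsed: "claude-opus-4-6",
};

const total = costFor(usage, opusPricing);
console.log(total); // 15 + 75 + 1.5 + 3.75 = 95.25
```

With one million tokens in each category the cost reduces to the sum of the four per-million rates, which makes the figure easy to verify by hand.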

package/src/metrics/transcript.test.ts CHANGED

@@ -1,8 +1,13 @@
 /**
- * Tests for Claude Code transcript JSONL parser.
+ * Tests for Claude Code transcript JSONL parser and pricing.ts module.
  *
  * Uses temp files with real-format JSONL data. No mocks.
  * Philosophy: "never mock what you can use for real" (mx-252b16).
+ *
+ * Coverage:
+ *   - parseTranscriptUsage (transcript.ts)
+ *   - estimateCost re-export (transcript.ts -> pricing.ts)
+ *   - getPricingForModel (pricing.ts)
  */
 import { afterEach, beforeEach, describe, expect, test } from "bun:test";
@@ -10,6 +15,7 @@ import { mkdtemp } from "node:fs/promises";
 import { tmpdir } from "node:os";
 import { join } from "node:path";
 import { cleanupTempDir } from "../test-helpers.ts";
+import { getPricingForModel, estimateCost as pricingEstimateCost } from "./pricing.ts";
 import { estimateCost, parseTranscriptUsage } from "./transcript.ts";
 let tempDir: string;
@@ -354,3 +360,54 @@ describe("estimateCost", () => {
 		}
 	});
 });
+// === getPricingForModel (pricing.ts) ===
+describe("getPricingForModel", () => {
+	test("matches opus substring", () => {
+		const pricing = getPricingForModel("claude-opus-4-6");
+		expect(pricing).not.toBeNull();
+		if (pricing !== null) {
+			expect(pricing.inputPerMTok).toBe(15);
+			expect(pricing.outputPerMTok).toBe(75);
+		}
+	});
+	test("matches sonnet substring", () => {
+		const pricing = getPricingForModel("claude-sonnet-4-20250514");
+		expect(pricing).not.toBeNull();
+		if (pricing !== null) {
+			expect(pricing.inputPerMTok).toBe(3);
+			expect(pricing.outputPerMTok).toBe(15);
+		}
+	});
+	test("matches haiku substring", () => {
+		const pricing = getPricingForModel("claude-haiku-3-5-20241022");
+		expect(pricing).not.toBeNull();
+		if (pricing !== null) {
+			expect(pricing.inputPerMTok).toBe(0.8);
+			expect(pricing.outputPerMTok).toBe(4);
+		}
+	});
+	test("returns null for unknown model", () => {
+		const pricing = getPricingForModel("gpt-4o");
+		expect(pricing).toBeNull();
+	});
+});
+// === re-export parity ===
+describe("estimateCost re-export parity", () => {
+	test("transcript.estimateCost and pricing.estimateCost produce same result", () => {
+		const usage = {
+			inputTokens: 1_000_000,
+			outputTokens: 1_000_000,
+			cacheReadTokens: 1_000_000,
+			cacheCreationTokens: 1_000_000,
+			modelUsed: "claude-opus-4-6",
+		};
+		expect(estimateCost(usage)).toBe(pricingEstimateCost(usage));
+	});
+});

package/src/metrics/transcript.ts CHANGED

@@ -1,8 +1,12 @@
 /**
  * Parser for Claude Code transcript JSONL files.
  *
- * Extracts token usage data from assistant-type entries in transcript files
- * at ~/.claude/projects/{project-slug}/{session-id}.jsonl.
+ * This is a Claude Code-specific JSONL parser that extracts token usage data
+ * from assistant-type entries in transcript files at
+ * ~/.claude/projects/{project-slug}/{session-id}.jsonl.
+ *
+ * Runtime-agnostic pricing logic lives in ./pricing.ts. Other runtimes
+ * implement their own transcript parsing via AgentRuntime.parseTranscript().
  *
  * Each assistant entry contains per-turn usage:
  * {
@@ -19,74 +23,11 @@
  * }
  */
-export interface TranscriptUsage {
-	inputTokens: number;
-	outputTokens: number;
-	cacheReadTokens: number;
-	cacheCreationTokens: number;
-	modelUsed: string | null;
-}
-/** Pricing per million tokens (USD). */
-interface ModelPricing {
-	inputPerMTok: number;
-	outputPerMTok: number;
-	cacheReadPerMTok: number;
-	cacheCreationPerMTok: number;
-}
-/** Hardcoded pricing for known Claude models. */
-const MODEL_PRICING: Record<string, ModelPricing> = {
-	opus: {
-		inputPerMTok: 15,
-		outputPerMTok: 75,
-		cacheReadPerMTok: 1.5, // 10% of input
-		cacheCreationPerMTok: 3.75, // 25% of input
-	},
-	sonnet: {
-		inputPerMTok: 3,
-		outputPerMTok: 15,
-		cacheReadPerMTok: 0.3, // 10% of input
-		cacheCreationPerMTok: 0.75, // 25% of input
-	},
-	haiku: {
-		inputPerMTok: 0.8,
-		outputPerMTok: 4,
-		cacheReadPerMTok: 0.08, // 10% of input
-		cacheCreationPerMTok: 0.2, // 25% of input
-	},
-};
-/**
- * Determine the pricing tier for a given model string.
- * Matches on substring: "opus" -> opus pricing, "sonnet" -> sonnet, "haiku" -> haiku.
- * Returns null if unrecognized.
- */
-function getPricingForModel(model: string): ModelPricing | null {
-	const lower = model.toLowerCase();
-	if (lower.includes("opus")) return MODEL_PRICING.opus ?? null;
-	if (lower.includes("sonnet")) return MODEL_PRICING.sonnet ?? null;
-	if (lower.includes("haiku")) return MODEL_PRICING.haiku ?? null;
-	return null;
-}
-/**
- * Calculate the estimated cost in USD for a given usage and model.
- * Returns null if the model is unrecognized.
- */
-export function estimateCost(usage: TranscriptUsage): number | null {
-	if (usage.modelUsed === null) return null;
+import type { TokenUsage } from "./pricing.ts";
-	const pricing = getPricingForModel(usage.modelUsed);
-	if (pricing === null) return null;
+export type TranscriptUsage = TokenUsage;
-	const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok;
-	const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok;
-	const cacheReadCost = (usage.cacheReadTokens / 1_000_000) * pricing.cacheReadPerMTok;
-	const cacheCreationCost = (usage.cacheCreationTokens / 1_000_000) * pricing.cacheCreationPerMTok;
-	return inputCost + outputCost + cacheReadCost + cacheCreationCost;
-}
+export { estimateCost } from "./pricing.ts";
 /**
  * Narrow an unknown value to determine if it looks like a transcript assistant entry.

package/src/runtimes/pi-guards.test.ts CHANGED

@@ -397,6 +397,35 @@ describe("generatePiGuardExtension", () => {
 			expect(generated).toContain('pi.on("tool_execution_end", async (event) => {');
 			expect(generated).not.toContain('pi.on("tool_execution_end", async (_event) => {');
 		});
+		test('generated code contains pi.on("agent_end", ...)', () => {
+			const generated = generatePiGuardExtension(builderHooks());
+			expect(generated).toContain('pi.on("agent_end",');
+		});
+		test("generated code awaits pi.exec ov log session-end in agent_end handler", () => {
+			const generated = generatePiGuardExtension(builderHooks());
+			// agent_end handler must await (not fire-and-forget) so it completes
+			// before Pi moves on, ensuring the SessionStore is updated.
+			const agentEndIdx = generated.indexOf('pi.on("agent_end"');
+			const sessionShutdownIdx = generated.indexOf('pi.on("session_shutdown"');
+			expect(agentEndIdx).toBeGreaterThan(-1);
+			expect(sessionShutdownIdx).toBeGreaterThan(-1);
+			// agent_end must come before session_shutdown
+			expect(agentEndIdx).toBeLessThan(sessionShutdownIdx);
+			// Extract the agent_end handler body
+			const handlerBody = generated.slice(agentEndIdx, sessionShutdownIdx);
+			expect(handlerBody).toContain(
+				'await pi.exec("ov", ["log", "session-end", "--agent", AGENT_NAME])',
+			);
+		});
+		test("agent_end handler is present for all capabilities", () => {
+			for (const hooks of [builderHooks(), scoutHooks(), coordinatorHooks()]) {
+				const generated = generatePiGuardExtension(hooks);
+				expect(generated).toContain('pi.on("agent_end",');
+			}
+		});
 	});
 	describe("PiRuntime integration", () => {
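
The handler shape these tests assert on can be tried without Pi installed. The sketch below is a rough illustration only: `FakePi` is a hypothetical stand-in that models just `on()` and `exec()`, it records commands instead of running the `ov` CLI, and the `session_shutdown` handler body is assumed to mirror the `agent_end` one. It is not Pi's actual `ExtensionAPI`.

```typescript
// Hypothetical stand-in for Pi's ExtensionAPI; only on() and exec() are modelled.
type Handler = (event: unknown) => Promise<void>;

class FakePi {
	readonly calls: string[] = [];
	private readonly handlers = new Map<string, Handler[]>();

	on(event: string, handler: Handler): void {
		const list = this.handlers.get(event) ?? [];
		list.push(handler);
		this.handlers.set(event, list);
	}

	// Records the command line instead of spawning a process.
	async exec(command: string, args: string[]): Promise<void> {
		await Promise.resolve(); // simulate asynchronous completion
		this.calls.push([command, ...args].join(" "));
	}

	// Emits an event and waits for each registered handler in turn.
	async emit(event: string): Promise<void> {
		for (const handler of this.handlers.get(event) ?? []) {
			await handler({});
		}
	}
}

const AGENT_NAME = "demo-builder"; // hypothetical agent name
const pi = new FakePi();

// Same call as the string the agent_end test expects in the generated extension.
pi.on("agent_end", async (_event) => {
	await pi.exec("ov", ["log", "session-end", "--agent", AGENT_NAME]).catch(() => {});
});

// Assumed to mirror agent_end; acts as the fallback for non-graceful exits.
pi.on("session_shutdown", async (_event) => {
	await pi.exec("ov", ["log", "session-end", "--agent", AGENT_NAME]).catch(() => {});
});

// Returns the calls recorded once agent_end has settled, then fires shutdown.
async function runDemo(): Promise<string[]> {
	await pi.emit("agent_end");
	const afterAgentEnd = [...pi.calls];
	await pi.emit("session_shutdown");
	return afterAgentEnd;
}
```

Because the handler awaits `exec`, the session-end call is already recorded by the time `emit("agent_end")` resolves. A fire-and-forget call would give no such guarantee.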

package/src/runtimes/pi-guards.ts CHANGED

@@ -8,7 +8,7 @@
 // to prevent tool execution — equivalent to Claude Code's PreToolUse hooks.
 //
 // Activity tracking fires via pi.exec("ov log ...") on tool_call,
-// tool_execution_end, and session_shutdown events so the SessionStore
+// tool_execution_end, agent_end, and session_shutdown events so the SessionStore
 // lastActivity stays fresh and the watchdog does not zombie-classify agents.
 import {
@@ -113,7 +113,11 @@ function toRegExpArrayLiteral(patterns: string[]): string {
  * Activity tracking:
  * - tool_call handler: fire-and-forget "ov log tool-start" to update lastActivity.
  * - tool_execution_end handler: fire-and-forget "ov log tool-end".
- * - session_shutdown handler: awaited "ov log session-end" to mark agent completed.
+ * - agent_end handler: awaited "ov log session-end" — fires when the agentic loop
+ *   completes (task done). Without this, completed Pi agents get watchdog-escalated
+ *   through stalled → nudge → triage → terminate.
+ * - session_shutdown handler: awaited "ov log session-end" — fires on Ctrl+C/SIGTERM.
+ *   Kept as a safety net in case agent_end does not fire (e.g., crash, force-kill).
  *
  * These tracking calls prevent the watchdog from zombie-classifying Pi agents due
  * to stale lastActivity timestamps (the root cause of the zombie state bug).
@@ -190,7 +194,7 @@ export function generatePiGuardExtension(hooks: HooksDef): string {
 		`//`,
 		`// Uses Pi's ExtensionAPI factory style: export default function(pi: ExtensionAPI) { ... }`,
 		`// pi.on("tool_call", ...) returns { block: true, reason } to prevent tool execution.`,
-		`// pi.exec("ov", [...]) calls the overstory CLI for activity tracking.`,
+		`// pi.exec("ov", [...]) calls the overstory CLI for activity tracking and lifecycle.`,
 		`import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";`,
 		``,
 		`const AGENT_NAME = "${agentName}";`,
@@ -331,10 +335,23 @@ export function generatePiGuardExtension(hooks: HooksDef): string {
 		`\t});`,
 		``,
 		`\t/**`,
-		`\t * Session shutdown: log session-end so the agent transitions to "completed" state.`,
+		`\t * Agent end: log session-end when the agentic loop completes (task done).`,
 		`\t *`,
-		`\t * Awaited so it completes before Pi exits. Without this call, the agent stays in`,
-		`\t * "booting" or "working" state forever, requiring manual cleanup or watchdog termination.`,
+		`\t * Awaited so it completes before Pi moves on. Without this handler, completed`,
+		`\t * Pi agents never transition to "completed" state in the SessionStore, causing`,
+		`\t * the watchdog to escalate them through stalled → nudge → triage → terminate.`,
+		`\t *`,
+		`\t * Fires when the agent finishes its work — before session_shutdown.`,
+		`\t */`,
+		`\tpi.on("agent_end", async (_event) => {`,
+		`\t\tawait pi.exec("ov", ["log", "session-end", "--agent", AGENT_NAME]).catch(() => {});`,
+		`\t});`,
+		``,
+		`\t/**`,
+		`\t * Session shutdown: safety-net session-end log for non-graceful exits.`,
+		`\t *`,
+		`\t * Awaited so it completes before Pi exits. Kept as a fallback in case`,
+		`\t * agent_end does not fire (e.g., crash, force-kill, Ctrl+C before task completes).`,
 		`\t *`,
 		`\t * Fires on Ctrl+C, Ctrl+D, or SIGTERM.`,
 		`\t */`,