claude-overnight 1.25.19 → 1.25.20 (npm)

Files changed (18)

package/README.md +35 -15
package/dist/_version.d.ts +1 -1
package/dist/_version.js +1 -1
package/dist/index.js +37 -10
package/dist/planner-query.d.ts +2 -0
package/dist/planner-query.js +115 -8
package/dist/planner.d.ts +4 -4
package/dist/planner.js +11 -11
package/dist/run.js +1 -1
package/dist/steering.d.ts +1 -1
package/dist/steering.js +3 -3
package/dist/swarm.d.ts +3 -0
package/dist/swarm.js +38 -3
package/dist/transcripts.d.ts +5 -0
package/dist/transcripts.js +38 -0
package/package.json +2 -2
package/plugins/claude-overnight/.claude-plugin/plugin.json +1 -1
package/plugins/claude-overnight/skills/claude-overnight/SKILL.md +7 -3

package/README.md CHANGED

@@ -4,14 +4,14 @@ Parallel Claude agents in isolated git worktrees. Set a usage cap so your intera
 Hand it an objective and a session budget, walk away, review the diff when the run ends. Every agent runs in its own worktree on its own branch — a misbehaving agent can't trash your working tree. Unmerged branches are preserved for manual review, never discarded.
-Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk). Pair any planner (Opus, Sonnet) with any executor — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
+Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits, verified by the worker next wave). Pair any planner (Opus, Sonnet) with any worker — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
 ## Run on Qwen 3.6 Plus
-Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Alibaba Cloud's DashScope gateway is a drop-in executor that speaks the Anthropic Messages API  -- same client, same flow, pennies per run.
+Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Alibaba Cloud's DashScope gateway is a drop-in worker that speaks the Anthropic Messages API  -- same client, same flow, pennies per run.
 1. **Get an API key.** Sign up at [Alibaba Cloud](https://account.alibabacloud.com/login/login.htm?oauth_callback=https%3A%2F%2Fmodelstudio.console.alibabacloud.com%2Fap-southeast-1%3Ftab%3Ddashboard%23%2Fapi-key&clearRedirectCookie=1)  -- the link takes you straight to the API key dashboard.
-2. **Configure the provider.** Run `claude-overnight`, choose `Other…` on the executor step, and fill in:
+2. **Configure the provider.** Run `claude-overnight`, choose `Other…` on the worker step, and fill in:
    | Field | Value |
    |---|---|
@@ -20,7 +20,7 @@ Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Al
    | Model id | `qwen3.6-plus` |
    | API key | your DashScope key |
-3. That's it. Planner runs on Sonnet (or Opus), executor runs on Qwen.
+3. That's it. Planner runs on Sonnet (or Opus), worker runs on Qwen.
 Or set it via env directly:
@@ -33,7 +33,7 @@ claude-overnight
 ## Run via Cursor API Proxy
-Use Cursor's model gateway as an executor -- `auto` (delegates to best available), `composer`, or `composer-2` models. Runs locally through a proxy that speaks the Anthropic Messages API, so it's a drop-in replacement for any other provider.
+Use Cursor's model gateway as a worker -- `auto` (delegates to best available), `composer`, or `composer-2` models. Runs locally through a proxy that speaks the Anthropic Messages API, so it's a drop-in replacement for any other provider.
 ### macOS: Cursor agent shell patch
@@ -130,7 +130,7 @@ claude-overnight
   ● Opus  -- Opus 4.6 · Most capable
   ○ Sonnet  -- Sonnet 4.6 · Best for everyday tasks
-⑤ Executor model (what runs the tasks  -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
+⑤ Worker model (what runs the tasks  -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
   ● Sonnet  -- Sonnet 4.6 · Best for everyday tasks
   ○ Opus  -- Opus 4.6 · Most capable
   ○ Other… · custom OpenAI/Anthropic-compatible endpoint
@@ -211,9 +211,15 @@ Every run gets its own folder in `.claude-overnight/runs/`. Nothing is ever over
 .claude-overnight/
   runs/
     2026-04-04T18-52-49/     ← run A (done, $200, 200 tasks)
-      run.json, status.md, goal.md, milestones/, sessions/
-    2026-04-05T10-30-00/     ← run B (crashed)
-      run.json, sessions/
+      run.json          ← full resume state (models, budget, wave history)
+      status.md, goal.md, themes.md
+      designs/          ← per-focus research docs from the thinking wave
+      tasks.json        ← the plan the swarm is executing
+      transcripts/      ← NDJSON per planner query: themes, orchestrate, steer-wave-N, ...
+      steering/         ← steering decisions per wave
+      milestones/, sessions/
+    2026-04-05T10-30-00/     ← run B (crashed mid-planning)
+      run.json, transcripts/themes.ndjson   ← see exactly what the planner was doing
 ```
 Any run that stops before the steering system declares the objective complete  -- capped at usage limit, Ctrl+C, crash, rate limit timeout, steering failure  -- is automatically resumable:
@@ -243,6 +249,20 @@ If the thinking phase succeeds but orchestration crashes, the next run detects t
 **Knowledge carries forward**  -- new runs inherit knowledge from completed previous runs. Thinking sessions and steering see what past runs built. Run 2 knows run 1 already built the auth system.
+### Transcripts and streaming
+Every planner/steering query streams through the Agent SDK with `includePartialMessages: true`, so tool calls, thinking, and text deltas are captured as they happen. Each query also appends an NDJSON transcript under `runs/<ts>/transcripts/<name>.ndjson` — so if the planner crashes mid-think you still have the forensic trail (prompt preview, every tool use, every text/thinking delta, rate-limit events, and the final result or error). `themes.md` is also written as a human-readable summary right after the thinking wave.
+Not every provider delivers the same streaming granularity:
+| Provider | Tool-use events | Thinking deltas | Text deltas |
+| --- | --- | --- | --- |
+| Anthropic (direct) | ✓ | ✓ | ✓ |
+| Cursor proxy (`cursor-composer-in-claude`) | — | — | ✓ (final answer only) |
+| Qwen / OpenRouter / custom Anthropic-compatible | depends on upstream | depends | usually ✓ |
+When a provider doesn't stream partials (or the model is a reasoning model on the Cursor proxy — the proxy suppresses the thinking phase and only emits the final answer), the ticker shows elapsed time with no live text, then the completed result lands in one go. The UI, transcripts, and the resume flow all behave identically either way — streaming is used when available, never required.
 Add `.claude-overnight/` to your `.gitignore` (with the trailing slash  -- see below).
 A separate, tiny `claude-overnight.log.md` is also written at the repo root on every run. It's human-readable, append-only, one block per run (objective, start/finish, cost, outcome, branch), and is designed to be **committed**  -- so even after `.claude-overnight/` is cleaned up you can still recover which prompt produced which commits. Use `.claude-overnight/` (with trailing slash) in your gitignore so this file isn't matched by accident.
@@ -289,7 +309,7 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
 |---|---|---|
 | `--budget=N` | `10` | Total agent sessions |
 | `--concurrency=N` | `5` | Parallel agents |
-| `--model=NAME` | prompted | Worker model  -- interactive picks planner + executor separately; `Other…` adds Qwen / OpenRouter / any Anthropic-compat endpoint. In non-interactive mode, a saved provider's model id is auto-resolved to the provider. |
+| `--model=NAME` | prompted | Worker model  -- interactive picks planner + worker separately; `Other…` adds Qwen / OpenRouter / any Anthropic-compat endpoint. In non-interactive mode, a saved provider's model id is auto-resolved to the provider. |
 | `--usage-cap=N` | unlimited | Stop at N% utilization |
 | `--allow-extra-usage` | off | Allow extra/overage usage (billed separately) |
 | `--extra-usage-budget=N` |  -- | Max $ for extra usage (implies --allow-extra-usage) |
@@ -313,12 +333,12 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
 ## Custom providers (Qwen, OpenRouter, any Anthropic-compatible endpoint)
-Planner and executor are picked separately  -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of execution.
+Planner, worker, and optional fast model are each picked separately  -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of work.
-From the interactive picker, choose `Other…` on the planner or executor step:
+From the interactive picker, choose `Other…` on the planner, worker, or fast step:
 ```
-⑤ Executor model (what runs the tasks  -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
+⑤ Worker model (what runs the tasks  -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
   ○ Sonnet
   ○ Opus
   ● Other…
@@ -333,9 +353,9 @@ From the interactive picker, choose `Other…` on the planner or executor step:
 Saved providers live user-level at `~/.claude/claude-overnight/providers.json` (mode 0600) and show up automatically in every repo. No per-project config.
-**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`)  -- planner queries use the planner provider, executor queries use the executor provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
+**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`)  -- planner queries use the planner provider, worker queries use the worker provider, fast queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
-**Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ executor preflight failed: ...` instead of N scattered mid-run errors.
+**Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ worker preflight failed: ...` instead of N scattered mid-run errors.
 **Resume.** Provider ids are persisted in `run.json` and rehydrated on resume. If you deleted a provider between runs, resume refuses to start and tells you exactly which id is missing.
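
The "Transcripts and streaming" section added in this README diff describes one NDJSON file per planner query. A minimal sketch of how such a file could be inspected follows; it is not part of the package. It assumes only that each line is a JSON object with a `kind` field (the kinds used in the sample are ones written by `planner-query.js` in this diff); any other on-disk fields are not assumed, and `summarizeTranscript` is a hypothetical helper name.

```javascript
// Hypothetical helper: count transcript events by `kind`.
// Assumes one JSON object per line. Blank or unparseable lines are skipped,
// since a crash may leave a partially written final line.
function summarizeTranscript(ndjsonText) {
    const counts = {};
    for (const line of ndjsonText.split("\n")) {
        const trimmed = line.trim();
        if (!trimmed) continue;
        let event;
        try {
            event = JSON.parse(trimmed);
        } catch {
            continue; // truncated or corrupt line
        }
        const kind = typeof event?.kind === "string" ? event.kind : "unknown";
        counts[kind] = (counts[kind] ?? 0) + 1;
    }
    return counts;
}

// In-memory sample; the last line is deliberately truncated.
const sample = [
    JSON.stringify({ kind: "session_start", model: "example-model" }),
    JSON.stringify({ kind: "tool_use", tool: "Read", input: { file_path: "src/a.ts" } }),
    JSON.stringify({ kind: "text_delta", text: "Looking at" }),
    JSON.stringify({ kind: "text_delta", text: " the auth module" }),
    '{"kind":"resu',
].join("\n");

console.log(summarizeTranscript(sample));
```

For a real run, the text would come from something like `fs.readFileSync(".claude-overnight/runs/<ts>/transcripts/themes.ndjson", "utf-8")`, with `<ts>` replaced by the run folder name.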

package/dist/_version.d.ts CHANGED

@@ -1 +1 @@
-export declare const VERSION = "1.25.19";
+export declare const VERSION = "1.25.20";

package/dist/_version.js CHANGED

@@ -1,2 +1,2 @@
 // Auto-generated by build — do not edit manually.
-export const VERSION = "1.25.19";
+export const VERSION = "1.25.20";

package/dist/index.js CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env node
-import { readFileSync, existsSync, readdirSync, mkdirSync } from "fs";
+import { readFileSync, existsSync, readdirSync, mkdirSync, writeFileSync } from "fs";
 import { resolve, dirname, join } from "path";
 import { fileURLToPath } from "url";
 import chalk from "chalk";
@@ -9,6 +9,7 @@ import { Swarm } from "./swarm.js";
 import { planTasks, refinePlan, identifyThemes, buildThinkingTasks, orchestrate, salvageFromFile } from "./planner.js";
 import { modelDisplayName, formatContextWindow, DEFAULT_MODEL } from "./models.js";
 import { setPlannerEnvResolver } from "./planner-query.js";
+import { setTranscriptRunDir } from "./transcripts.js";
 import { pickModel, loadProviders, preflightProvider, buildEnvResolver, healthCheckCursorProxy, PROXY_DEFAULT_URL, isCursorProxyProvider, readCursorProxyLogTail, ensureCursorProxyRunning, bundledComposerProxyShellCommand, warnMacCursorAgentShellPatchIfNeeded, hasCursorAgentToken, } from "./providers.js";
 import { RunDisplay } from "./ui.js";
 import { renderSummary } from "./render.js";
@@ -72,10 +73,17 @@ async function promptResumeOverrides(state, cliFlags, argv, noTTY, runDir) {
         const extraStr = state.allowExtraUsage
             ? (state.extraUsageBudget ? `$${state.extraUsageBudget}` : "unlimited")
             : "off";
+        const modelLine = (label, m) => m ? `  ${chalk.dim(label.padEnd(11))}${chalk.white(m)} ${chalk.dim(`(${formatContextWindow(m)} context)`)}` : null;
         console.log();
         console.log(`  ${chalk.dim("Resume settings")}`);
         console.log(`  ${chalk.dim("─".repeat(40))}`);
-        console.log(`  ${chalk.dim("model      ")}${chalk.white(state.workerModel)} ${chalk.dim(`(${formatContextWindow(state.workerModel)} context)`)}`);
+        const lines = [
+            modelLine("planner", state.plannerModel),
+            modelLine("worker", state.workerModel),
+            modelLine("fast", state.fastModel),
+        ].filter(Boolean);
+        for (const l of lines)
+            console.log(l);
         console.log(`  ${chalk.dim("remaining  ")}${chalk.white(String(remaining))} ${chalk.dim("sessions")}`);
         console.log(`  ${chalk.dim("concur     ")}${chalk.white(String(state.concurrency))}`);
         console.log(`  ${chalk.dim("usage cap  ")}${chalk.white(capStr)}`);
@@ -185,7 +193,7 @@ async function main() {
     --dry-run              Show planned tasks without running them
     --budget=N             Target number of agent runs ${chalk.dim("(default: 10)")}
     --concurrency=N        Max parallel agents ${chalk.dim("(default: 5)")}
-    --model=NAME           Worker model override ${chalk.dim("(interactive mode picks planner + executor separately  -- supports 'Other…' for Qwen / OpenRouter / etc.)")}
+    --model=NAME           Worker model override ${chalk.dim("(interactive mode picks planner + worker separately  -- supports 'Other…' for Qwen / OpenRouter / etc.)")}
     --fast-model=NAME      Fast model for quick tasks ${chalk.dim("(optional  -- checked by worker model in next wave)")}
     --usage-cap=N          Stop at N% utilization ${chalk.dim("(e.g. 90 to save 10% for other work)")}
     --allow-extra-usage    Allow extra/overage usage ${chalk.dim("(default: stop when plan limits hit)")}
@@ -472,8 +480,11 @@ async function main() {
                         const flexNote = `This is wave 1 of an adaptive multi-wave run (total budget: ${remainingBudget}). Plan the highest-impact foundational work first. Future waves will iterate based on what's learned.`;
                         console.log(chalk.cyan(`\n  ◆ Re-orchestrating plan from existing designs...\n`));
                         process.stdout.write("\x1B[?25l");
+                        // Route transcripts into the resumed run so this call's events
+                        // land alongside the prior run's planning trail.
+                        setTranscriptRunDir(resumeRunDir);
                         try {
-                            const orchTasks = await orchestrate(resumeState.objective, designs, cwd, resumeState.plannerModel, resumeState.workerModel, resumeState.permissionMode, orchBudget, resumeState.concurrency, makeProgressLog(), flexNote, join(resumeRunDir, "tasks.json"));
+                            const orchTasks = await orchestrate(resumeState.objective, designs, cwd, resumeState.plannerModel, resumeState.workerModel, resumeState.permissionMode, orchBudget, resumeState.concurrency, makeProgressLog(), flexNote, join(resumeRunDir, "tasks.json"), "orchestrate-resume");
                             resumeState.currentTasks = orchTasks;
                             process.stdout.write(`\x1B[2K\r  ${chalk.green(`✓ ${orchTasks.length} tasks`)}\n`);
                         }
@@ -588,7 +599,7 @@ async function main() {
         const plannerPick = await pickModel(`${chalk.cyan("④")} Planner model ${chalk.dim("(thinking, steering  -- use your strongest)")}:`, models);
         plannerModel = plannerPick.model;
         plannerProvider = plannerPick.provider;
-        const workerPick = await pickModel(`${chalk.cyan("⑤")} Executor model ${chalk.dim("(what runs the tasks  -- Qwen 3.6 Plus / OpenRouter / etc via Other…)")}:`, models);
+        const workerPick = await pickModel(`${chalk.cyan("⑤")} Worker model ${chalk.dim("(what runs the tasks  -- Qwen 3.6 Plus / OpenRouter / etc via Other…)")}:`, models);
         workerModel = workerPick.model;
         workerProvider = workerPick.provider;
         // ⑤b Optional fast model for quick tasks that will be verified
@@ -782,7 +793,7 @@ async function main() {
         const seen = new Set();
         const all = [
             ["planner", plannerProvider],
-            ["executor", workerProvider],
+            ["worker", workerProvider],
             ["fast", fastProvider],
         ];
         const pending = [];
@@ -855,6 +866,10 @@ async function main() {
     const runDir = resuming && resumeRunDir ? resumeRunDir : (orphanedDir ?? createRunDir(rootDir));
     if (resuming && resumeRunDir)
         updateLatestSymlink(rootDir, resumeRunDir);
+    // Route all planner/steering stream events to <runDir>/transcripts/*.ndjson
+    // so crashes during planning leave a forensic trail and resumes can inspect
+    // what the planner was doing mid-flight. See src/transcripts.ts.
+    setTranscriptRunDir(runDir);
     const previousKnowledge = readPreviousRunKnowledge(rootDir);
     const needsPlan = tasks.length === 0 && (!resuming || replanFromScratch);
     const designDir = join(runDir, "designs");
@@ -867,8 +882,9 @@ async function main() {
             saveRunState(runDir, {
                 id: runDir.split(/[/\\]/).pop() ?? "",
                 objective, budget: budget ?? 10, remaining: budget ?? 10,
-                workerModel, plannerModel,
+                workerModel, plannerModel, fastModel,
                 workerProviderId: workerProvider?.id, plannerProviderId: plannerProvider?.id,
+                fastProviderId: fastProvider?.id,
                 concurrency, permissionMode,
                 usageCap, allowExtraUsage, extraUsageBudget,
                 flex, useWorktrees, mergeStrategy,
@@ -894,7 +910,16 @@ async function main() {
         const thinkingCount = useThinking ? Math.min(Math.max(concurrency, Math.ceil((budget ?? 10) * 0.005)), 10) : 0;
         try {
             if (useThinking) {
-                let themes = await identifyThemes(objective, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog());
+                // Persist themes as a Markdown doc so a planning-phase crash leaves a
+                // readable record (and a future resume can skip identifyThemes).
+                const saveThemesMd = (list) => {
+                    try {
+                        writeFileSync(join(runDir, "themes.md"), `# Themes\n\n**Objective:** ${objective}\n\n${list.map((t, i) => `${i + 1}. ${t}`).join("\n")}\n`, "utf-8");
+                    }
+                    catch { }
+                };
+                let themes = await identifyThemes(objective, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog(), "themes");
+                saveThemesMd(themes);
                 process.stdout.write(`\x1B[2K\r  ${chalk.green(`✓ ${themes.length} themes`)}\n\n`);
                 planRestore();
                 let reviewing = true;
@@ -913,7 +938,8 @@ async function main() {
                             continue;
                         process.stdout.write("\x1B[?25l");
                         try {
-                            themes = await identifyThemes(`${objective}\n\nUser feedback: ${feedback}`, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog());
+                            themes = await identifyThemes(`${objective}\n\nUser feedback: ${feedback}`, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog(), "themes-refine");
+                            saveThemesMd(themes);
                             process.stdout.write(`\x1B[2K\r  ${chalk.green(`✓ ${themes.length} themes`)}\n\n`);
                         }
                         catch (err) {
@@ -990,8 +1016,9 @@ async function main() {
                             saveRunState(runDir, {
                                 id: runDir.split(/[/\\]/).pop() ?? "",
                                 objective: objective, budget: budget ?? 10, remaining: (budget ?? 10) - thinkingUsed,
-                                workerModel, plannerModel,
+                                workerModel, plannerModel, fastModel,
                                 workerProviderId: workerProvider?.id, plannerProviderId: plannerProvider?.id,
+                                fastProviderId: fastProvider?.id,
                                 concurrency, permissionMode,
                                 usageCap, allowExtraUsage, extraUsageBudget,
                                 flex, useWorktrees, mergeStrategy,
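
The `saveThemesMd` helper added in this file writes `themes.md` from a single template string. As an illustration of the resulting file layout, the sketch below lifts that same template into a pure function; `renderThemesMd` is a hypothetical name, and the objective and theme strings are made up for the example.

```javascript
// Same template string as saveThemesMd in the diff above, without the file write.
// `renderThemesMd` is an illustrative name, not a function in the package.
function renderThemesMd(objective, list) {
    return `# Themes\n\n**Objective:** ${objective}\n\n${list.map((t, i) => `${i + 1}. ${t}`).join("\n")}\n`;
}

// Made-up inputs for illustration.
const themesDoc = renderThemesMd("Harden the auth flow", ["Session handling", "Token refresh"]);
console.log(themesDoc);
// Prints:
// # Themes
//
// **Objective:** Harden the auth flow
//
// 1. Session handling
// 2. Token refresh
```

Because the objective is interpolated verbatim, a multi-line objective would spill past the `**Objective:**` line; the diff does not escape or truncate it.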

package/dist/planner-query.d.ts CHANGED

@@ -23,6 +23,8 @@ export interface PlannerOpts {
         type: "json_schema";
         schema: Record<string, unknown>;
     };
+    /** When set, stream events are appended to <runDir>/transcripts/<name>.ndjson */
+    transcriptName?: string;
 }
 export declare function setPlannerEnvResolver(fn: ((model?: string) => Record<string, string> | undefined) | undefined): void;
 export declare function getTotalPlannerCost(): number;
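
The `setPlannerEnvResolver` declaration above takes a function from a model id to a set of env overrides, which is the per-query routing the README describes (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`). The sketch below shows one way a resolver of that shape could look. It is an assumption-laden illustration: the package's real resolver comes from `buildEnvResolver` in `providers.js`, which is not shown in this diff, and the provider record fields (`baseUrl`, `apiKey`), the placeholder URL, and the meaning of returning `undefined` are all hypothetical.

```javascript
// Hypothetical provider records keyed by model id; field names are illustrative only.
const providersByModel = new Map([
    ["qwen3.6-plus", { baseUrl: "https://example.invalid/anthropic", apiKey: "sk-placeholder" }],
]);

// Matches the declared shape: (model?: string) => Record<string, string> | undefined
function resolveEnv(model) {
    const provider = model ? providersByModel.get(model) : undefined;
    // Assumption: undefined means "no override, use the default environment".
    if (!provider) return undefined;
    return {
        ANTHROPIC_BASE_URL: provider.baseUrl,
        ANTHROPIC_AUTH_TOKEN: provider.apiKey,
    };
}

console.log(resolveEnv("qwen3.6-plus"));
console.log(resolveEnv("some-other-model"));
```

A function like this would be registered once with `setPlannerEnvResolver(resolveEnv)`; returning a fresh object per call keeps overrides scoped to the query rather than mutating `process.env`.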

package/dist/planner-query.js CHANGED

@@ -1,6 +1,7 @@
 import { query } from "@anthropic-ai/claude-agent-sdk";
 import { readFileSync } from "fs";
 import { NudgeError } from "./types.js";
+import { writeTranscriptEvent } from "./transcripts.js";
 // ── Shared env resolver (set once at run start, used by every planner query) ──
 //
 // Swarm and planner calls share a model→env map so a custom provider configured
@@ -63,6 +64,22 @@ async function throttlePlanner(onLog, aborted) {
     }
     // Exhausted backoffs — proceed anyway, the retry loop will catch a rejection.
 }
+/**
+ * Pick a short, human-readable target for a tool invocation (Read/Grep/Bash/…).
+ * Prefers explicit file paths; falls back to the first few tokens of a shell
+ * command. Returns `""` when the input has no useful identifier.
+ */
+function extractToolTarget(input) {
+    if (!input)
+        return "";
+    const p = input.path ?? input.file_path ?? input.pattern;
+    if (typeof p === "string" && p)
+        return p;
+    if (typeof input.command === "string" && input.command) {
+        return input.command.split(" ").slice(0, 3).join(" ");
+    }
+    return "";
+}
 // ── Query execution ──
 const NUDGE_MS = 15 * 60 * 1000;
 const HARD_TIMEOUT_MS = 30 * 60 * 1000;
@@ -110,6 +127,17 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
     const startedAt = Date.now();
     const isResume = !!opts.resumeSessionId;
     const envOverride = _envResolver?.(opts.model);
+    const tname = opts.transcriptName;
+    if (tname) {
+        writeTranscriptEvent(tname, {
+            kind: "session_start",
+            model: opts.model,
+            isResume,
+            resumeSessionId: opts.resumeSessionId,
+            promptPreview: prompt.slice(0, 2000),
+            promptBytes: prompt.length,
+        });
+    }
     const pq = query({
         prompt,
         options: {
@@ -167,6 +195,18 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
         };
         timer = setTimeout(check, timeoutMs);
     });
+    // Tool-use blocks can arrive in two shapes:
+    //  (a) content_block_start carries the full `input` (native Anthropic non-partial)
+    //  (b) content_block_start carries `input: {}` and the JSON is streamed via
+    //      input_json_delta frames (Anthropic streaming spec, cursor-composer-in-claude v0.9+).
+    // Track the open tool block so we can re-log with the enriched target once
+    // the input arrives, and write a complete transcript entry on block stop.
+    let pendingTool = null;
+    const logTool = (name, input) => {
+        const target = extractToolTarget(input);
+        lastLogText = target ? `${name} ${target}` : name;
+        onLog(target ? `${name} → ${target}` : name, "event");
+    };
     const consume = async () => {
         for await (const msg of pq) {
             lastActivity = Date.now();
@@ -178,21 +218,34 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
                     const cb = ev.content_block;
                     if (cb?.type === "tool_use") {
                         toolCount++;
-                        const toolName = cb.name;
-                        const input = cb.input;
-                        // Enrich event with target file/path for readability
-                        const target = input?.path ?? input?.file_path ?? input?.command
-                            ? (typeof input?.command === "string" ? input.command.split(" ").slice(0, 3).join(" ") : "")
-                            : "";
-                        lastLogText = target ? `${toolName} ${target}` : toolName;
-                        onLog(target ? `${toolName} → ${target}` : toolName, "event");
+                        const input = (cb.input ?? {});
+                        const hasInput = Object.keys(input).length > 0;
+                        pendingTool = {
+                            index: ev.index ?? 0,
+                            name: cb.name,
+                            id: cb.id,
+                            input,
+                            buf: "",
+                            logged: hasInput,
+                        };
+                        if (hasInput) {
+                            logTool(cb.name, input);
+                            if (tname)
+                                writeTranscriptEvent(tname, { kind: "tool_use", tool: cb.name, input });
+                        }
                     }
                     else if (cb?.type === "thinking" || cb?.type === "redacted_thinking") {
                         lastLogText = "thinking…";
+                        if (tname)
+                            writeTranscriptEvent(tname, { kind: "thinking_start" });
                     }
                 }
                 if (ev?.type === "content_block_delta") {
                     const delta = ev.delta;
+                    if (delta?.type === "input_json_delta" && pendingTool && typeof delta.partial_json === "string") {
+                        pendingTool.buf += delta.partial_json;
+                        continue;
+                    }
                     // thinking_delta carries reasoning text under `delta.thinking`;
                     // text_delta carries final-answer text under `delta.text`.
                     const raw = delta?.type === "text_delta" ? delta.text
@@ -202,7 +255,23 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
                         const snippet = raw.trim().replace(/[{}"\\,[\]]+/g, " ").replace(/\s+/g, " ").trim();
                         if (snippet.length > 5)
                             lastLogText = snippet.slice(-60);
+                        if (tname)
+                            writeTranscriptEvent(tname, { kind: delta.type, text: raw });
+                    }
+                }
+                if (ev?.type === "content_block_stop" && pendingTool) {
+                    if (!pendingTool.logged && pendingTool.buf) {
+                        try {
+                            pendingTool.input = JSON.parse(pendingTool.buf);
+                        }
+                        catch { }
+                    }
+                    if (!pendingTool.logged) {
+                        logTool(pendingTool.name, pendingTool.input);
+                        if (tname)
+                            writeTranscriptEvent(tname, { kind: "tool_use", tool: pendingTool.name, input: pendingTool.input });
                     }
+                    pendingTool = null;
                 }
             }
             if (msg.type === "rate_limit_event") {
@@ -222,6 +291,15 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
                             resetsAt: info.resetsAt,
                         });
                     }
+                    if (tname)
+                        writeTranscriptEvent(tname, {
+                            kind: "rate_limit",
+                            utilization: info.utilization ?? 0,
+                            status: info.status,
+                            rateLimitType: info.rateLimitType,
+                            resetsAt: info.resetsAt,
+                            isUsingOverage: !!info.isUsingOverage,
+                        });
                 }
             }
             if (msg.type === "result") {
@@ -234,8 +312,27 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
                 if (msg.subtype === "success") {
                     structuredOutput = r.structured_output;
                     resultText = r.result || "";
+                    if (tname)
+                        writeTranscriptEvent(tname, {
+                            kind: "result",
+                            subtype: "success",
+                            costUsd,
+                            durationMs: Date.now() - startedAt,
+                            toolCount,
+                            resultPreview: typeof resultText === "string" ? resultText.slice(0, 4000) : undefined,
+                            hasStructuredOutput: structuredOutput != null,
+                        });
                 }
                 else {
+                    if (tname)
+                        writeTranscriptEvent(tname, {
+                            kind: "result",
+                            subtype: msg.subtype,
+                            costUsd,
+                            durationMs: Date.now() - startedAt,
+                            toolCount,
+                            error: r.result,
+                        });
                     throw new Error(`Planner failed: ${r.result || msg.subtype}`);
                 }
             }
@@ -244,6 +341,16 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
     try {
         await Promise.race([consume(), watchdog]);
     }
+    catch (err) {
+        if (tname)
+            writeTranscriptEvent(tname, {
+                kind: "error",
+                message: err instanceof Error ? err.message : String(err),
+                durationMs: Date.now() - startedAt,
+                toolCount,
+            });
+        throw err;
+    }
     finally {
         clearTimeout(timer);
         clearInterval(ticker);
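
The `pendingTool` logic in this file handles tool-use blocks whose `input` either arrives complete on `content_block_start` or is streamed as `input_json_delta` fragments and parsed on `content_block_stop`. The following is a simplified, standalone sketch of that buffering idea, replayed over a hand-written event list. `collectToolCalls` is a hypothetical name, the events are fabricated for illustration, and logging and transcript writes are replaced by pushing to an array.

```javascript
// Replays content-block stream events and returns the tool calls seen.
// Shape (a): input is present on content_block_start and recorded immediately.
// Shape (b): input is {} at start, JSON arrives via input_json_delta fragments,
//            and is parsed when the block stops.
function collectToolCalls(events) {
    const calls = [];
    let pending = null;
    for (const ev of events) {
        if (ev.type === "content_block_start" && ev.content_block?.type === "tool_use") {
            const input = ev.content_block.input ?? {};
            const complete = Object.keys(input).length > 0;
            pending = { name: ev.content_block.name, input, buf: "", complete };
            if (complete) calls.push({ tool: pending.name, input });
        } else if (ev.type === "content_block_delta" && ev.delta?.type === "input_json_delta" && pending) {
            if (typeof ev.delta.partial_json === "string") pending.buf += ev.delta.partial_json;
        } else if (ev.type === "content_block_stop" && pending) {
            if (!pending.complete) {
                if (pending.buf) {
                    try {
                        pending.input = JSON.parse(pending.buf);
                    } catch {
                        // Malformed JSON: keep the empty input rather than failing.
                    }
                }
                calls.push({ tool: pending.name, input: pending.input });
            }
            pending = null;
        }
    }
    return calls;
}

// Fabricated events: one streamed tool call, one with input up front.
const streamed = [
    { type: "content_block_start", index: 0, content_block: { type: "tool_use", name: "Read", input: {} } },
    { type: "content_block_delta", index: 0, delta: { type: "input_json_delta", partial_json: '{"file_pa' } },
    { type: "content_block_delta", index: 0, delta: { type: "input_json_delta", partial_json: 'th":"src/auth.ts"}' } },
    { type: "content_block_stop", index: 0 },
    { type: "content_block_start", index: 1, content_block: { type: "tool_use", name: "Grep", input: { pattern: "TODO" } } },
    { type: "content_block_stop", index: 1 },
];

const toolCalls = collectToolCalls(streamed);
console.log(JSON.stringify(toolCalls));
```

Like the code in the diff, this sketch tracks a single open tool block and does not match `content_block_stop` against the block index, so it assumes blocks are not interleaved.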

package/dist/planner.d.ts CHANGED

@@ -1,8 +1,8 @@
 import type { Task, PermMode } from "./types.js";
 export declare function salvageFromFile(outFile: string | undefined, budget: number | undefined, onLog: (text: string, kind?: "status" | "event") => void, why: string): Task[] | null;
 export declare const DESIGN_THINKING = "\nHOW TO THINK ABOUT EVERY TASK:\n\nStart from the user's job. What is someone hiring this product to do? \"I need to send money abroad cheaply\"  -- not \"I need a currency conversion API.\" Every decision  -- what to build, how fast it needs to respond, what happens on error  -- flows from the job.\n\nThe experience IS the product. A 200ms server response is not a \"performance metric\"  -- it's the difference between an app that feels alive and one that feels broken. A loading state is not \"polish\"  -- it's the user knowing the app heard them. An error message is not \"error handling\"  -- it's the app being honest. There is no line between backend and UX. The server, the API, the database query, the render  -- they're all one experience the user either trusts or doesn't.\n\nBuild the core, verify it works, learn, iterate. Don't plan 20 features and build them all. Build the ONE thing that matters most, run it, see if it actually works from a user's chair. What you learn from seeing it run will change what you build next. Each wave should make what exists better before adding what doesn't exist yet.\n\nConsistency is what makes complex things feel simple. One design system, rigid rules, no exceptions. This is how Revolut ships a super-app with 30+ features that doesn't feel like chaos.\n";
-export declare function planTasks(objective: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string): Promise<Task[]>;
-export declare function identifyThemes(objective: string, count: number, cwd: string, model: string, permissionMode: PermMode, onLog?: (text: string) => void): Promise<string[]>;
+export declare function planTasks(objective: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string, transcriptName?: string): Promise<Task[]>;
+export declare function identifyThemes(objective: string, count: number, cwd: string, model: string, permissionMode: PermMode, onLog?: (text: string) => void, transcriptName?: string): Promise<string[]>;
 export declare function buildThinkingTasks(objective: string, themes: string[], designDir: string, plannerModel: string, previousKnowledge?: string): Task[];
-export declare function orchestrate(objective: string, designDocs: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string): Promise<Task[]>;
-export declare function refinePlan(objective: string, previousTasks: Task[], feedback: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void): Promise<Task[]>;
+export declare function orchestrate(objective: string, designDocs: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string, transcriptName?: string): Promise<Task[]>;
+export declare function refinePlan(objective: string, previousTasks: Task[], feedback: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, transcriptName?: string): Promise<Task[]>;

package/dist/planner.js CHANGED

@@ -152,13 +152,13 @@ Respond with ONLY a JSON object (no markdown fences):
 }`;
 }
 // ── Planning functions ──
-export async function planTasks(objective, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile) {
+export async function planTasks(objective, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile, transcriptName = "plan") {
     onLog("Analyzing codebase...");
     const prompt = plannerPrompt(objective, workerModel, budget, concurrency, flexNote);
     const fileInstruction = outFile ? `\n\nAFTER generating the JSON, also write it to ${outFile} using the Write tool.` : "";
     let resultText;
     try {
-        resultText = await runPlannerQuery(prompt + fileInstruction, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
+        resultText = await runPlannerQuery(prompt + fileInstruction, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
     }
     catch (err) {
         const salvaged = salvageFromFile(outFile, budget, onLog, err?.message ?? String(err));
@@ -168,7 +168,7 @@ export async function planTasks(objective, cwd, plannerModel, workerModel, permi
     }
     const parsed = await extractTaskJson(resultText, async () => {
         onLog("Retrying...");
-        return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
+        return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
     }, onLog, outFile);
     let tasks = (parsed.tasks || []).map((t, i) => ({
         id: String(i), prompt: typeof t === "string" ? t : t.prompt,
@@ -179,7 +179,7 @@ export async function planTasks(objective, cwd, plannerModel, workerModel, permi
     onLog(`${tasks.length} tasks`);
     return tasks;
 }
-export async function identifyThemes(objective, count, cwd, model, permissionMode, onLog = () => { }) {
+export async function identifyThemes(objective, count, cwd, model, permissionMode, onLog = () => { }, transcriptName = "themes") {
     const resultText = await runPlannerQuery(`You are picking ${count} research angles for architects who will deeply explore a codebase next.
 First do a BRIEF recon (3-6 tool calls max, don't go deep): read package.json and README if present, glob the top-level directory, peek at one or two config files that reveal the stack. You are learning what this codebase actually IS -- not solving anything.
@@ -188,7 +188,7 @@ Then pick ${count} angles that carve up THIS specific codebase orthogonally. Pre
 Objective: ${objective}
-Return ONLY a JSON object: {"themes": ["angle description", ...]}`, { cwd, model, permissionMode, outputFormat: THEMES_SCHEMA }, onLog);
+Return ONLY a JSON object: {"themes": ["angle description", ...]}`, { cwd, model, permissionMode, outputFormat: THEMES_SCHEMA, transcriptName }, onLog);
     const parsed = attemptJsonParse(resultText);
     if (parsed?.themes && Array.isArray(parsed.themes))
         return parsed.themes.slice(0, count);
@@ -229,7 +229,7 @@ Be thorough  -- your findings drive the execution plan.`,
         model: plannerModel,
     }));
 }
-export async function orchestrate(objective, designDocs, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile) {
+export async function orchestrate(objective, designDocs, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile, transcriptName = "orchestrate") {
     const constraint = contextConstraintNote(workerModel);
     const flexLine = flexNote ? `\n\n${flexNote}` : "";
     const fileInstruction = outFile ? `\n\nAFTER generating the JSON, also write it to ${outFile} using the Write tool.` : "";
@@ -259,7 +259,7 @@ Respond with ONLY a JSON object (no markdown fences):
     onLog("Synthesizing...");
     let resultText;
     try {
-        resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
+        resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
     }
     catch (err) {
         const salvaged = salvageFromFile(outFile, budget, onLog, err?.message ?? String(err));
@@ -269,7 +269,7 @@ Respond with ONLY a JSON object (no markdown fences):
     }
     const parsed = await extractTaskJson(resultText, async () => {
         onLog("Retrying...");
-        return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
+        return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
     }, onLog, outFile);
     let tasks = (parsed.tasks || []).map((t, i) => ({
         id: String(i), prompt: typeof t === "string" ? t : t.prompt,
@@ -280,7 +280,7 @@ Respond with ONLY a JSON object (no markdown fences):
     onLog(`${tasks.length} tasks`);
     return tasks;
 }
-export async function refinePlan(objective, previousTasks, feedback, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog) {
+export async function refinePlan(objective, previousTasks, feedback, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, transcriptName = "refine") {
     onLog("Refining plan...");
     const prev = previousTasks.map((t, i) => `${i + 1}. ${t.prompt}`).join("\n");
     const constraint = contextConstraintNote(workerModel);
@@ -303,10 +303,10 @@ ${scaleNote} ${concurrency} agents run in parallel. Update the plan accordingly.
 Respond with ONLY a JSON object (no markdown):
 {"tasks":[{"prompt":"..."}]}`;
-    const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
+    const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
     const parsed = await extractTaskJson(resultText, async () => {
         onLog("Retrying...");
-        return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
+        return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
     }, onLog);
     let tasks = (parsed.tasks || []).map((t, i) => ({
         id: String(i), prompt: typeof t === "string" ? t : t.prompt,

package/dist/run.js CHANGED

@@ -272,7 +272,7 @@ export async function executeRun(cfg) {
                 const appliedGuidance = memory.userGuidance;
                 if (appliedGuidance)
                     display.appendSteeringEvent(`User directives applied: ${appliedGuidance.slice(0, 80)}`);
-                const steer = await steerWave(objective, waveHistory, remaining, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, steeringLog, memory);
+                const steer = await steerWave(objective, waveHistory, remaining, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, steeringLog, memory, `steer-wave-${waveNum}-attempt-${steerAttempts}`);
                 accCost += getTotalPlannerCost() - plannerCostBefore;
                 syncRunInfo();
                 if (steer.statusUpdate)

package/dist/steering.d.ts CHANGED

@@ -1,3 +1,3 @@
 import type { PermMode, SteerResult, RunMemory, WaveSummary } from "./types.js";
 import { type PlannerLog } from "./planner-query.js";
-export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, fastModel: string | undefined, permissionMode: PermMode, concurrency: number, onLog: PlannerLog, runMemory?: RunMemory): Promise<SteerResult>;
+export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, fastModel: string | undefined, permissionMode: PermMode, concurrency: number, onLog: PlannerLog, runMemory?: RunMemory, transcriptName?: string): Promise<SteerResult>;

package/dist/steering.js CHANGED

@@ -23,7 +23,7 @@ const STEER_SCHEMA = {
         required: ["done", "tasks", "reasoning", "statusUpdate", "estimatedSessionsRemaining"],
     },
 };
-export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory) {
+export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory, transcriptName = "steer") {
     const constraint = contextConstraintNote(workerModel);
     const recentWaves = history.slice(-3);
     const recentText = recentWaves.length > 0 ? recentWaves.map(w => {
@@ -114,14 +114,14 @@ Set "noWorktree": true for verify/user-test tasks  -- they need the real project
 If done: {"done": true, "reasoning": "...", "statusUpdate": "...", "estimatedSessionsRemaining": 0, "tasks": []}`;
     onLog("Assessing...", "status");
     onLog(`Reading codebase  -- wave ${history.length + 1}`, "event");
-    const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA }, onLog);
+    const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName }, onLog);
     const parsed = await (async () => {
         const first = attemptJsonParse(resultText);
         if (first)
             return first;
         onLog(`Steering parse failed (${resultText.length} chars). Asking model to fix...`, "event");
         const snippet = resultText.length > 2000 ? resultText.slice(0, 1000) + "\n...\n" + resultText.slice(-800) : resultText;
-        const retryText = await runPlannerQuery(`Your previous steering response could not be parsed as JSON. Here is what you returned:\n\n---\n${snippet}\n---\n\nExtract or rewrite the above as ONLY a valid JSON object with this schema: {"done":boolean,"reasoning":"...","statusUpdate":"...","tasks":[{"prompt":"..."}]}\n\nRespond with ONLY the JSON, no markdown fences, no explanation.`, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA }, onLog);
+        const retryText = await runPlannerQuery(`Your previous steering response could not be parsed as JSON. Here is what you returned:\n\n---\n${snippet}\n---\n\nExtract or rewrite the above as ONLY a valid JSON object with this schema: {"done":boolean,"reasoning":"...","statusUpdate":"...","tasks":[{"prompt":"..."}]}\n\nRespond with ONLY the JSON, no markdown fences, no explanation.`, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
         const retryParsed = attemptJsonParse(retryText);
         if (retryParsed)
             return retryParsed;
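
Read together with the `steerWave` call in run.js and `transcriptPath` in transcripts.js, the hunks above suggest that each steering attempt gets its own transcript file and that the JSON-repair retry is written to a sibling file. A minimal sketch of that naming follows. `steerTranscriptFiles` is a hypothetical helper, not part of the package, and it assumes a `/` path separator where the package uses `path.join`.

```javascript
// Hypothetical helper: mirrors the naming visible in this diff.
// It is not an export of claude-overnight.
function steerTranscriptFiles(runDir, waveNum, attempt) {
  // Name that run.js passes as the last steerWave argument.
  const base = `steer-wave-${waveNum}-attempt-${attempt}`;
  // steering.js appends "-retry" when the first response fails to parse as JSON.
  const names = [base, `${base}-retry`];
  // transcripts.js stores each name at <runDir>/transcripts/<name>.ndjson.
  return names.map((name) => `${runDir}/transcripts/${name}.ndjson`);
}
```

The retry file only exists if a retry actually ran, so its absence likely means the first steering response parsed cleanly.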

package/dist/swarm.d.ts CHANGED

@@ -67,6 +67,7 @@ export declare class Swarm {
     private worktreeBase?;
     private activeQueries;
     private cleanedUp;
+    private pendingTools;
     logFile?: string;
     readonly model: string | undefined;
     usageCap: number | undefined;
@@ -116,5 +117,7 @@ export declare class Swarm {
     private windowRejectedReset;
     private runAgent;
     private agentSummary;
+    /** Log a tool invocation with a short target extracted from its input. */
+    private logToolUse;
     private handleMsg;
 }

package/dist/swarm.js CHANGED

@@ -72,6 +72,10 @@ export class Swarm {
     worktreeBase;
     activeQueries = new Set();
     cleanedUp = false;
+    // Per-agent open tool_use block: cursor-composer-in-claude v0.9 opens the block
+    // with empty `input` and streams the real payload via `input_json_delta`, so we
+    // need to wait for content_block_stop before we can log the file/path target.
+    pendingTools = new WeakMap();
     logFile;
     model;
     usageCap;
@@ -700,6 +704,16 @@ export class Swarm {
         return `Agent ${agent.id} ${verb}: ${m}m ${s}s, ${agent.toolCalls} tools${files}`;
     }
     // ── Message handler ──
+    /** Log a tool invocation with a short target extracted from its input. */
+    logToolUse(agent, name, input) {
+        const p = input.path ?? input.file_path ?? input.pattern;
+        const target = typeof p === "string" && p
+            ? p
+            : typeof input.command === "string" && input.command
+                ? input.command.split(" ").slice(0, 3).join(" ")
+                : "";
+        this.log(agent.id, target ? `${name} \u2192 ${target}` : name);
+    }
     handleMsg(agent, msg) {
         // Any message that isn't a rate-limit event counts as real progress and
         // resets the stall watchdog + clears the per-agent blocked flag.
@@ -730,9 +744,11 @@ export class Swarm {
                     if (cb?.type === "tool_use") {
                         agent.currentTool = cb.name;
                         agent.toolCalls++;
-                        const input = cb.input;
-                        const target = input?.path ?? input?.file_path ?? (typeof input?.command === "string" ? input.command.split(" ").slice(0, 3).join(" ") : "");
-                        this.log(agent.id, target ? `${cb.name} \u2192 ${target}` : cb.name);
+                        const input = (cb.input ?? {});
+                        const hasInput = Object.keys(input).length > 0;
+                        this.pendingTools.set(agent, { name: cb.name, input, buf: "", logged: hasInput });
+                        if (hasInput)
+                            this.logToolUse(agent, cb.name, input);
                     }
                     else if (cb?.type === "thinking" || cb?.type === "redacted_thinking") {
                         agent.lastText = "thinking…";
@@ -740,6 +756,11 @@ export class Swarm {
                 }
                 else if (ev.type === "content_block_delta") {
                     const delta = ev.delta;
+                    const pending = this.pendingTools.get(agent);
+                    if (delta?.type === "input_json_delta" && pending && typeof delta.partial_json === "string") {
+                        pending.buf += delta.partial_json;
+                        break;
+                    }
                     // thinking_delta: `delta.thinking`; text_delta: `delta.text`.
                     const raw = delta?.type === "text_delta" ? delta.text
                         : delta?.type === "thinking_delta" ? delta.thinking
@@ -750,6 +771,20 @@ export class Swarm {
                             agent.lastText = t.slice(-80);
                     }
                 }
+                else if (ev.type === "content_block_stop") {
+                    const pending = this.pendingTools.get(agent);
+                    if (pending && !pending.logged) {
+                        if (pending.buf) {
+                            try {
+                                pending.input = JSON.parse(pending.buf);
+                            }
+                            catch { }
+                        }
+                        this.logToolUse(agent, pending.name, pending.input);
+                        pending.logged = true;
+                    }
+                    this.pendingTools.delete(agent);
+                }
                 break;
             }
             case "result": {
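
The swarm.js hunks above defer tool logging until the streamed input is complete. The sketch below shows the same idea in isolation, for a single agent: buffer `input_json_delta` fragments and parse them at `content_block_stop`. `createToolTracker` and `describeTool` are hypothetical names. The package itself keeps this state in a `WeakMap` keyed by agent, and the event shape here (`content_block` on the start event) is an assumption about the streaming payload.

```javascript
// Hypothetical stand-alone sketch; not the package's Swarm class.
function describeTool(name, input) {
  // Prefer a path-like field, else the first three words of a shell command.
  const p = input.path ?? input.file_path ?? input.pattern;
  const target = typeof p === "string" && p
    ? p
    : typeof input.command === "string" && input.command
      ? input.command.split(" ").slice(0, 3).join(" ")
      : "";
  return target ? `${name} \u2192 ${target}` : name;
}

function createToolTracker(log) {
  let pending = null; // the currently open tool_use block, if any
  return function handle(ev) {
    if (ev.type === "content_block_start" && ev.content_block?.type === "tool_use") {
      const input = ev.content_block.input ?? {};
      const hasInput = Object.keys(input).length > 0;
      pending = { name: ev.content_block.name, input, buf: "", logged: hasInput };
      // Input already present: log immediately, as before the change.
      if (hasInput) log(describeTool(pending.name, input));
    } else if (ev.type === "content_block_delta" && pending
        && ev.delta?.type === "input_json_delta"
        && typeof ev.delta.partial_json === "string") {
      pending.buf += ev.delta.partial_json; // accumulate the JSON fragments
    } else if (ev.type === "content_block_stop") {
      if (pending && !pending.logged) {
        if (pending.buf) {
          try { pending.input = JSON.parse(pending.buf); } catch { /* keep the empty input */ }
        }
        log(describeTool(pending.name, pending.input));
      }
      pending = null;
    }
  };
}
```

If the buffered JSON is malformed, the tool is still logged by name only, which matches the empty `catch` in the diff.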

package/dist/transcripts.d.ts ADDED

@@ -0,0 +1,5 @@
+export declare function setTranscriptRunDir(dir: string | undefined): void;
+export declare function getTranscriptRunDir(): string | undefined;
+export declare function transcriptPath(name: string): string | undefined;
+/** Append a single event; silent on error (disk full, permission, etc.). */
+export declare function writeTranscriptEvent(name: string, event: Record<string, unknown>): void;

package/dist/transcripts.js ADDED

@@ -0,0 +1,38 @@
+import { appendFileSync, mkdirSync } from "fs";
+import { dirname, join } from "path";
+/**
+ * Crash-safe NDJSON transcripts for planner/steering queries.
+ *
+ * Each query writes to `<runDir>/transcripts/<name>.ndjson`  -- one JSON object
+ * per line, so partial writes survive crashes. Multiple invocations of the same
+ * name append with a `session_start` marker separating them.
+ *
+ * Why NDJSON:
+ *   - append-only → no read-modify-write race under parallel waves
+ *   - one line per event → `tail -f` works; a killed process never leaves
+ *     the file in an unparseable state
+ *   - machine-readable → this assistant and future tools can `jq` through it
+ *
+ * Consumed by: planner-query.ts (stream_event, rate_limit_event, result, error).
+ */
+let _runDir;
+export function setTranscriptRunDir(dir) {
+    _runDir = dir;
+}
+export function getTranscriptRunDir() {
+    return _runDir;
+}
+export function transcriptPath(name) {
+    return _runDir ? join(_runDir, "transcripts", `${name}.ndjson`) : undefined;
+}
+/** Append a single event; silent on error (disk full, permission, etc.). */
+export function writeTranscriptEvent(name, event) {
+    const path = transcriptPath(name);
+    if (!path)
+        return;
+    try {
+        mkdirSync(dirname(path), { recursive: true });
+        appendFileSync(path, JSON.stringify({ t: Date.now(), ...event }) + "\n", "utf-8");
+    }
+    catch { }
+}
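
Because each event is one JSON object per line, a reader can tolerate a process that was killed mid-write by skipping any line that does not parse. A minimal sketch of such a reader is below. `summarizeTranscript` is a hypothetical name, not part of the package. It takes the file contents as a string and assumes only the `kind` field that appears in this diff.

```javascript
// Hypothetical reader for the NDJSON text produced by writeTranscriptEvent.
function summarizeTranscript(ndjsonText) {
  const counts = Object.create(null); // kind -> number of events
  let skipped = 0;                    // lines that were not valid JSON
  for (const line of ndjsonText.split("\n")) {
    if (!line.trim()) continue; // ignore blank lines and the trailing newline
    let ev;
    try {
      ev = JSON.parse(line);
    } catch {
      skipped++; // e.g. a last line truncated by a crash
      continue;
    }
    const kind = typeof ev?.kind === "string" ? ev.kind : "unknown";
    counts[kind] = (counts[kind] ?? 0) + 1;
  }
  return { counts, skipped };
}
```

A non-zero `skipped` count on the final line is the expected signature of an interrupted write and does not invalidate the earlier events.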

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-overnight",
-  "version": "1.25.19",
+  "version": "1.25.20",
   "description": "Parallel Claude agents in git worktrees with a usage cap that reserves headroom for your interactive Claude Code. Crash-safe resume. Provider-agnostic model catalog (Anthropic, Cursor, OpenAI, Gemini, DeepSeek, Llama, Qwen) with capability-based task scoping.",
   "type": "module",
   "bin": {
@@ -17,7 +17,7 @@
   "dependencies": {
     "@anthropic-ai/claude-agent-sdk": "^0.2.92",
     "chalk": "^5.4.1",
-    "cursor-composer-in-claude": "0.8.0",
+    "cursor-composer-in-claude": "0.9.0",
     "jsonwebtoken": "^9.0.2"
   },
   "devDependencies": {

package/plugins/claude-overnight/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-overnight",
-  "version": "1.25.19",
+  "version": "1.25.20",
   "description": "Claude Code skill for understanding, installing, and inspecting claude-overnight runs  -- parallel Claude agents in git worktrees with thinking waves, multi-wave steering, and crash-safe resume. Supports Cursor API Proxy, Qwen, OpenRouter.",
   "author": {
     "name": "Francesco Fornace"

package/plugins/claude-overnight/skills/claude-overnight/SKILL.md CHANGED

@@ -11,7 +11,7 @@ description: >
 # What it is
-`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK  -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, executor waves run them in parallel, and steering decides between more execution, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable  -- nothing is lost.
+`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK  -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits verified by the worker next wave). A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, worker waves run them in parallel, and steering decides between more work, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable  -- nothing is lost.
 **Three-layer review system** runs on every wave:
 1. **Per-agent self-review**  -- after each agent finishes, the same session continues via SDK session resume (continue mechanism) with a follow-up prompt to review and simplify its own `git diff`. The agent's full context stays warm  -- no initial context bloat.
@@ -55,16 +55,20 @@ Every run lives at `<repo>/.claude-overnight/runs/<ISO-timestamp>/`:
 | File / dir           | What it tells you                                                                 |
 |----------------------|-----------------------------------------------------------------------------------|
-| `run.json`           | Machine state: objective, model, budget, cost, waves done, branches, done flag.   |
+| `run.json`           | Machine state: objective, planner/worker/fast models, budget, cost, waves done, branches, done flag. |
 | `status.md`          | **Living project snapshot**, rewritten by steering every wave. First line = short status. |
 | `goal.md`            | Evolving "north star"  -- what the run currently thinks "amazing" means.            |
+| `themes.md`          | The thinking-wave research angles picked for this objective (human-readable).     |
 | `milestones/*.md`    | Strategic snapshots archived ~every 5 waves. Long-term memory of the run.         |
 | `designs/*.md`       | Architect outputs from the thinking wave. Deleted once the objective is complete. |
+| `tasks.json`         | The execution plan written by the orchestrator.                                   |
+| `steering/wave-N-attempt-M.json` | Steering decision per wave: done flag, reasoning, status/goal updates.   |
+| `transcripts/*.ndjson` | Crash-safe NDJSON stream for every planner/steering query: `themes`, `orchestrate`, `plan`, `steer-wave-N-attempt-M`. Each line = one event (session_start, tool_use, text_delta, thinking_delta, rate_limit, result, error). Use `jq -c '.kind' <file>` to get a quick shape; read full objects to reconstruct what the planner was doing. Survives process crashes because writes are append-only. |
 | `sessions/wave-N.json` | Per-wave agent records: prompt, status, cost, files changed, branch, error.    |
 The newest subfolder under `runs/` is the current/last run. A run that never reached "done" is **resumable**  -- `run.json` will not be marked complete and `designs/` may still be present.
-To assess status of a run from scratch, read in this order: `goal.md` → `status.md` → newest file in `milestones/` → newest `sessions/wave-*.json` → `run.json`. Five reads and you know exactly where it stands.
+To assess status of a run from scratch, read in this order: `goal.md` → `status.md` → newest file in `milestones/` → newest `sessions/wave-*.json` → `run.json`. Five reads and you know exactly where it stands. If the run died during planning (no `sessions/` yet), read `themes.md` + the newest `transcripts/*.ndjson` instead — they show exactly what the planner was doing when it crashed.
 **Durable run history (committed, survives cleanup):** `claude-overnight.log.md` at the repo root is updated on every run with a block per run ID  -- original objective, start/finish times, cost, outcome, branch. If the user asks "what was my prompt" or "what did last night's run do" and `.claude-overnight/runs/` is empty, this file is the canonical recovery path.