claude-overnight 1.25.19 → 1.25.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,14 +4,14 @@ Parallel Claude agents in isolated git worktrees. Set a usage cap so your intera
 
  Hand it an objective and a session budget, walk away, review the diff when the run ends. Every agent runs in its own worktree on its own branch — a misbehaving agent can't trash your working tree. Unmerged branches are preserved for manual review, never discarded.
 
- Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk). Pair any planner (Opus, Sonnet) with any executor — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
+ Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits, verified by the worker next wave). Pair any planner (Opus, Sonnet) with any worker — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
 
  ## Run on Qwen 3.6 Plus
 
- Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Alibaba Cloud's DashScope gateway is a drop-in executor that speaks the Anthropic Messages API -- same client, same flow, pennies per run.
+ Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Alibaba Cloud's DashScope gateway is a drop-in worker that speaks the Anthropic Messages API -- same client, same flow, pennies per run.
 
  1. **Get an API key.** Sign up at [Alibaba Cloud](https://account.alibabacloud.com/login/login.htm?oauth_callback=https%3A%2F%2Fmodelstudio.console.alibabacloud.com%2Fap-southeast-1%3Ftab%3Ddashboard%23%2Fapi-key&clearRedirectCookie=1) -- the link takes you straight to the API key dashboard.
- 2. **Configure the provider.** Run `claude-overnight`, choose `Other…` on the executor step, and fill in:
+ 2. **Configure the provider.** Run `claude-overnight`, choose `Other…` on the worker step, and fill in:
 
  | Field | Value |
  |---|---|
@@ -20,7 +20,7 @@ Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Al
  | Model id | `qwen3.6-plus` |
  | API key | your DashScope key |
 
- 3. That's it. Planner runs on Sonnet (or Opus), executor runs on Qwen.
+ 3. That's it. Planner runs on Sonnet (or Opus), worker runs on Qwen.
 
  Or set it via env directly:
 
@@ -33,7 +33,7 @@ claude-overnight
 
  ## Run via Cursor API Proxy
 
- Use Cursor's model gateway as an executor -- `auto` (delegates to best available), `composer`, or `composer-2` models. Runs locally through a proxy that speaks the Anthropic Messages API, so it's a drop-in replacement for any other provider.
+ Use Cursor's model gateway as a worker -- `auto` (delegates to best available), `composer`, or `composer-2` models. Runs locally through a proxy that speaks the Anthropic Messages API, so it's a drop-in replacement for any other provider.
 
  ### macOS: Cursor agent shell patch
 
@@ -130,7 +130,7 @@ claude-overnight
  ● Opus -- Opus 4.6 · Most capable
  ○ Sonnet -- Sonnet 4.6 · Best for everyday tasks
 
- Executor model (what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
+ Worker model (what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
  ● Sonnet -- Sonnet 4.6 · Best for everyday tasks
  ○ Opus -- Opus 4.6 · Most capable
  ○ Other… · custom OpenAI/Anthropic-compatible endpoint
@@ -211,9 +211,15 @@ Every run gets its own folder in `.claude-overnight/runs/`. Nothing is ever over
  .claude-overnight/
  runs/
  2026-04-04T18-52-49/ ← run A (done, $200, 200 tasks)
- run.json, status.md, goal.md, milestones/, sessions/
- 2026-04-05T10-30-00/ ← run B (crashed)
- run.json, sessions/
+ run.json full resume state (models, budget, wave history)
+ status.md, goal.md, themes.md
+ designs/ ← per-focus research docs from the thinking wave
+ tasks.json ← the plan the swarm is executing
+ transcripts/ ← NDJSON per planner query: themes, orchestrate, steer-wave-N, ...
+ steering/ ← steering decisions per wave
+ milestones/, sessions/
+ 2026-04-05T10-30-00/ ← run B (crashed mid-planning)
+ run.json, transcripts/themes.ndjson ← see exactly what the planner was doing
  ```
 
  Any run that stops before the steering system declares the objective complete -- capped at usage limit, Ctrl+C, crash, rate limit timeout, steering failure -- is automatically resumable:
@@ -243,6 +249,20 @@ If the thinking phase succeeds but orchestration crashes, the next run detects t
 
  **Knowledge carries forward** -- new runs inherit knowledge from completed previous runs. Thinking sessions and steering see what past runs built. Run 2 knows run 1 already built the auth system.
 
+ ### Transcripts and streaming
+
+ Every planner/steering query streams through the Agent SDK with `includePartialMessages: true`, so tool calls, thinking, and text deltas are captured as they happen. Each query also appends an NDJSON transcript under `runs/<ts>/transcripts/<name>.ndjson` — so if the planner crashes mid-think you still have the forensic trail (prompt preview, every tool use, every text/thinking delta, rate-limit events, and the final result or error). `themes.md` is also written as a human-readable summary right after the thinking wave.
+
+ Not every provider delivers the same streaming granularity:
+
+ | Provider | Tool-use events | Thinking deltas | Text deltas |
+ | --- | --- | --- | --- |
+ | Anthropic (direct) | ✓ | ✓ | ✓ |
+ | Cursor proxy (`cursor-composer-in-claude`) | — | — | ✓ (final answer only) |
+ | Qwen / OpenRouter / custom Anthropic-compatible | depends on upstream | depends | usually ✓ |
+
+ When a provider doesn't stream partials (or the model is a reasoning model on the Cursor proxy — the proxy suppresses the thinking phase and only emits the final answer), the ticker shows elapsed time with no live text, then the completed result lands in one go. The UI, transcripts, and the resume flow all behave identically either way — streaming is used when available, never required.
+
  Add `.claude-overnight/` to your `.gitignore` (with the trailing slash -- see below).
 
  A separate, tiny `claude-overnight.log.md` is also written at the repo root on every run. It's human-readable, append-only, one block per run (objective, start/finish, cost, outcome, branch), and is designed to be **committed** -- so even after `.claude-overnight/` is cleaned up you can still recover which prompt produced which commits. Use `.claude-overnight/` (with trailing slash) in your gitignore so this file isn't matched by accident.
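The transcript behaviour described in this hunk (one NDJSON file per planner query under `runs/<ts>/transcripts/<name>.ndjson`, written even if the query later crashes) can be sketched as an append-only logger. This is a minimal illustration, not the package's `src/transcripts.ts`: the function names mirror the `setTranscriptRunDir` and `writeTranscriptEvent` imports in `dist/index.js` and `dist/planner-query.js`, while their bodies, the added `ts` field, the `readTranscript` helper and the swallow-on-error policy are assumptions.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical module state: the run directory transcripts are routed into.
// The names mirror the imports in the diff; the bodies are an assumption.
let transcriptRunDir: string | undefined;

function setTranscriptRunDir(dir: string | undefined): void {
  transcriptRunDir = dir;
}

// Append one JSON object per line to <runDir>/transcripts/<name>.ndjson.
// Best-effort: a failed write never interrupts the planner query.
function writeTranscriptEvent(name: string, event: Record<string, unknown>): void {
  if (!transcriptRunDir) return; // no run dir routed yet: drop the event
  try {
    const dir = join(transcriptRunDir, "transcripts");
    mkdirSync(dir, { recursive: true });
    const line = JSON.stringify({ ts: new Date().toISOString(), ...event });
    appendFileSync(join(dir, `${name}.ndjson`), line + "\n", "utf-8");
  } catch {
    // Transcripts are diagnostic only, so I/O errors are swallowed.
  }
}

// Read a transcript back: one parsed event per non-empty line.
function readTranscript(runDir: string, name: string): Array<Record<string, unknown>> {
  const raw = readFileSync(join(runDir, "transcripts", `${name}.ndjson`), "utf-8");
  return raw
    .split("\n")
    .filter((l) => l.length > 0)
    .map((l) => JSON.parse(l) as Record<string, unknown>);
}

// Usage: route into a throwaway run dir, write two events, read them back.
const demoRunDir = mkdtempSync(join(tmpdir(), "overnight-run-"));
setTranscriptRunDir(demoRunDir);
writeTranscriptEvent("themes", { kind: "session_start", model: "opus" });
writeTranscriptEvent("themes", { kind: "text_delta", text: "hello" });
console.log(readTranscript(demoRunDir, "themes").map((e) => e.kind));
```

Appending one line per event, rather than rewriting a JSON array, is what keeps a partial file useful after a crash: every complete line is still valid JSON on its own.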
@@ -289,7 +309,7 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
  |---|---|---|
  | `--budget=N` | `10` | Total agent sessions |
  | `--concurrency=N` | `5` | Parallel agents |
- | `--model=NAME` | prompted | Worker model -- interactive picks planner + executor separately; `Other…` adds Qwen / OpenRouter / any Anthropic-compat endpoint. In non-interactive mode, a saved provider's model id is auto-resolved to the provider. |
+ | `--model=NAME` | prompted | Worker model -- interactive picks planner + worker separately; `Other…` adds Qwen / OpenRouter / any Anthropic-compat endpoint. In non-interactive mode, a saved provider's model id is auto-resolved to the provider. |
  | `--usage-cap=N` | unlimited | Stop at N% utilization |
  | `--allow-extra-usage` | off | Allow extra/overage usage (billed separately) |
  | `--extra-usage-budget=N` | -- | Max $ for extra usage (implies --allow-extra-usage) |
@@ -313,12 +333,12 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
 
  ## Custom providers (Qwen, OpenRouter, any Anthropic-compatible endpoint)
 
- Planner and executor are picked separately -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of execution.
+ Planner, worker, and optional fast model are each picked separately -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of work.
 
- From the interactive picker, choose `Other…` on the planner or executor step:
+ From the interactive picker, choose `Other…` on the planner, worker, or fast step:
 
  ```
- Executor model (what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
+ Worker model (what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
  ○ Sonnet
  ○ Opus
  ● Other…
@@ -333,9 +353,9 @@ From the interactive picker, choose `Other…` on the planner or executor step:
 
  Saved providers live user-level at `~/.claude/claude-overnight/providers.json` (mode 0600) and show up automatically in every repo. No per-project config.
 
- **How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`) -- planner queries use the planner provider, executor queries use the executor provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
+ **How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`) -- planner queries use the planner provider, worker queries use the worker provider, fast queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
 
- **Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ executor preflight failed: ...` instead of N scattered mid-run errors.
+ **Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ worker preflight failed: ...` instead of N scattered mid-run errors.
 
  **Resume.** Provider ids are persisted in `run.json` and rehydrated on resume. If you deleted a provider between runs, resume refuses to start and tells you exactly which id is missing.
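The routing rule in "How routing works" (a per-`query()` env override, selected by role) reduces to a lookup from model id to `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`. The sketch below assumes a simple `Provider` shape and a hypothetical `makeModelEnvResolver`; only the two variable names and the resolver signature (as declared for `setPlannerEnvResolver` in `planner-query.d.ts`) come from this diff. A role without a custom provider yields `undefined`, meaning no override.

```typescript
// Hypothetical shape of a saved custom provider (the real providers.json schema is not shown here).
interface Provider {
  id: string;
  baseURL: string;
  apiKey: string;
}

type Role = "planner" | "worker" | "fast";

interface RolePick {
  model: string;
  provider?: Provider; // undefined = default Anthropic credentials, no override
}

type EnvResolver = (model?: string) => Record<string, string> | undefined;

// Build a model -> env-override lookup with the same signature that
// setPlannerEnvResolver accepts in planner-query.d.ts.
function makeModelEnvResolver(picks: Partial<Record<Role, RolePick>>): EnvResolver {
  const byModel = new Map<string, Record<string, string>>();
  for (const pick of Object.values(picks)) {
    if (!pick?.provider) continue;
    byModel.set(pick.model, {
      ANTHROPIC_BASE_URL: pick.provider.baseURL,
      ANTHROPIC_AUTH_TOKEN: pick.provider.apiKey,
    });
  }
  return (model) => (model === undefined ? undefined : byModel.get(model));
}

// Usage: Opus planner on Anthropic, Qwen worker on a custom endpoint.
// The URL and key are placeholders, not real DashScope values.
const resolveEnv = makeModelEnvResolver({
  planner: { model: "opus" },
  worker: {
    model: "qwen3.6-plus",
    provider: { id: "qwen", baseURL: "https://example.invalid/anthropic", apiKey: "sk-placeholder" },
  },
});

// Each query merges its own override into a fresh object instead of mutating process.env.
const workerEnv = { ...process.env, ...(resolveEnv("qwen3.6-plus") ?? {}) };
console.log(workerEnv.ANTHROPIC_BASE_URL);
```

Because this sketch keys overrides by model id (as the "model→env map" comment in `planner-query.js` suggests), two roles that use the same model id on different providers would collide.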
 
@@ -1 +1 @@
- export declare const VERSION = "1.25.19";
+ export declare const VERSION = "1.25.20";
package/dist/_version.js CHANGED
@@ -1,2 +1,2 @@
  // Auto-generated by build — do not edit manually.
- export const VERSION = "1.25.19";
+ export const VERSION = "1.25.20";
package/dist/index.js CHANGED
@@ -1,5 +1,5 @@
  #!/usr/bin/env node
- import { readFileSync, existsSync, readdirSync, mkdirSync } from "fs";
+ import { readFileSync, existsSync, readdirSync, mkdirSync, writeFileSync } from "fs";
  import { resolve, dirname, join } from "path";
  import { fileURLToPath } from "url";
  import chalk from "chalk";
@@ -9,6 +9,7 @@ import { Swarm } from "./swarm.js";
  import { planTasks, refinePlan, identifyThemes, buildThinkingTasks, orchestrate, salvageFromFile } from "./planner.js";
  import { modelDisplayName, formatContextWindow, DEFAULT_MODEL } from "./models.js";
  import { setPlannerEnvResolver } from "./planner-query.js";
+ import { setTranscriptRunDir } from "./transcripts.js";
  import { pickModel, loadProviders, preflightProvider, buildEnvResolver, healthCheckCursorProxy, PROXY_DEFAULT_URL, isCursorProxyProvider, readCursorProxyLogTail, ensureCursorProxyRunning, bundledComposerProxyShellCommand, warnMacCursorAgentShellPatchIfNeeded, hasCursorAgentToken, } from "./providers.js";
  import { RunDisplay } from "./ui.js";
  import { renderSummary } from "./render.js";
@@ -72,10 +73,17 @@ async function promptResumeOverrides(state, cliFlags, argv, noTTY, runDir) {
  const extraStr = state.allowExtraUsage
  ? (state.extraUsageBudget ? `$${state.extraUsageBudget}` : "unlimited")
  : "off";
+ const modelLine = (label, m) => m ? ` ${chalk.dim(label.padEnd(11))}${chalk.white(m)} ${chalk.dim(`(${formatContextWindow(m)} context)`)}` : null;
  console.log();
  console.log(` ${chalk.dim("Resume settings")}`);
  console.log(` ${chalk.dim("─".repeat(40))}`);
- console.log(` ${chalk.dim("model ")}${chalk.white(state.workerModel)} ${chalk.dim(`(${formatContextWindow(state.workerModel)} context)`)}`);
+ const lines = [
+ modelLine("planner", state.plannerModel),
+ modelLine("worker", state.workerModel),
+ modelLine("fast", state.fastModel),
+ ].filter(Boolean);
+ for (const l of lines)
+ console.log(l);
  console.log(` ${chalk.dim("remaining ")}${chalk.white(String(remaining))} ${chalk.dim("sessions")}`);
  console.log(` ${chalk.dim("concur ")}${chalk.white(String(state.concurrency))}`);
  console.log(` ${chalk.dim("usage cap ")}${chalk.white(capStr)}`);
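The new resume summary prints one row per configured role and drops roles that are unset (typically `fast`). Ported to TypeScript, the same pattern needs a type guard, because `.filter(Boolean)` does not narrow `(string | null)[]` to `string[]`. A sketch with chalk styling removed; `contextOf` is a hypothetical stand-in for `formatContextWindow`, and the `"200k"` value in the usage line is made up.

```typescript
// Plain-text version of the resume "model lines" (chalk styling dropped).
// `contextOf` is a hypothetical stand-in for formatContextWindow().
function modelLine(label: string, model: string | undefined, contextOf: (m: string) => string): string | null {
  return model ? ` ${label.padEnd(11)}${model} (${contextOf(model)} context)` : null;
}

// .filter(Boolean) keeps the (string | null)[] type in TypeScript;
// this type guard narrows the filtered result to string[].
const isLine = (x: string | null): x is string => x !== null;

interface ResumeModels {
  plannerModel?: string;
  workerModel?: string;
  fastModel?: string;
}

function resumeModelLines(state: ResumeModels, contextOf: (m: string) => string): string[] {
  return [
    modelLine("planner", state.plannerModel, contextOf),
    modelLine("worker", state.workerModel, contextOf),
    modelLine("fast", state.fastModel, contextOf), // omitted when no fast model was picked
  ].filter(isLine);
}

// Usage: a run without a fast model prints two rows.
for (const l of resumeModelLines({ plannerModel: "opus", workerModel: "qwen3.6-plus" }, () => "200k")) {
  console.log(l);
}
```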
@@ -185,7 +193,7 @@ async function main() {
  --dry-run Show planned tasks without running them
  --budget=N Target number of agent runs ${chalk.dim("(default: 10)")}
  --concurrency=N Max parallel agents ${chalk.dim("(default: 5)")}
- --model=NAME Worker model override ${chalk.dim("(interactive mode picks planner + executor separately -- supports 'Other…' for Qwen / OpenRouter / etc.)")}
+ --model=NAME Worker model override ${chalk.dim("(interactive mode picks planner + worker separately -- supports 'Other…' for Qwen / OpenRouter / etc.)")}
  --fast-model=NAME Fast model for quick tasks ${chalk.dim("(optional -- checked by worker model in next wave)")}
  --usage-cap=N Stop at N% utilization ${chalk.dim("(e.g. 90 to save 10% for other work)")}
  --allow-extra-usage Allow extra/overage usage ${chalk.dim("(default: stop when plan limits hit)")}
@@ -472,8 +480,11 @@ async function main() {
  const flexNote = `This is wave 1 of an adaptive multi-wave run (total budget: ${remainingBudget}). Plan the highest-impact foundational work first. Future waves will iterate based on what's learned.`;
  console.log(chalk.cyan(`\n ◆ Re-orchestrating plan from existing designs...\n`));
  process.stdout.write("\x1B[?25l");
+ // Route transcripts into the resumed run so this call's events
+ // land alongside the prior run's planning trail.
+ setTranscriptRunDir(resumeRunDir);
  try {
- const orchTasks = await orchestrate(resumeState.objective, designs, cwd, resumeState.plannerModel, resumeState.workerModel, resumeState.permissionMode, orchBudget, resumeState.concurrency, makeProgressLog(), flexNote, join(resumeRunDir, "tasks.json"));
+ const orchTasks = await orchestrate(resumeState.objective, designs, cwd, resumeState.plannerModel, resumeState.workerModel, resumeState.permissionMode, orchBudget, resumeState.concurrency, makeProgressLog(), flexNote, join(resumeRunDir, "tasks.json"), "orchestrate-resume");
  resumeState.currentTasks = orchTasks;
  process.stdout.write(`\x1B[2K\r ${chalk.green(`✓ ${orchTasks.length} tasks`)}\n`);
  }
@@ -588,7 +599,7 @@ async function main() {
  const plannerPick = await pickModel(`${chalk.cyan("④")} Planner model ${chalk.dim("(thinking, steering -- use your strongest)")}:`, models);
  plannerModel = plannerPick.model;
  plannerProvider = plannerPick.provider;
- const workerPick = await pickModel(`${chalk.cyan("⑤")} Executor model ${chalk.dim("(what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…)")}:`, models);
+ const workerPick = await pickModel(`${chalk.cyan("⑤")} Worker model ${chalk.dim("(what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…)")}:`, models);
  workerModel = workerPick.model;
  workerProvider = workerPick.provider;
  // ⑤b Optional fast model for quick tasks that will be verified
@@ -782,7 +793,7 @@ async function main() {
  const seen = new Set();
  const all = [
  ["planner", plannerProvider],
- ["executor", workerProvider],
+ ["worker", workerProvider],
  ["fast", fastProvider],
  ];
  const pending = [];
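The preflight hunk builds a `[role, provider]` list plus a `seen` set, which suggests each custom provider is checked once even when several roles share it. Below is a sketch of that de-duplication and of turning failures into the `✗ worker preflight failed: ...` message quoted in the README. `Provider`, `ping`, and the concurrent `Promise.allSettled` strategy are assumptions, since the rest of the loop is not shown in this diff.

```typescript
type Role = "planner" | "worker" | "fast";

// Hypothetical minimal provider shape; only the id matters for de-duplication.
interface Provider {
  id: string;
}

// Keep each custom provider once, remembering the first role that uses it,
// so an error can name the role ("worker preflight failed: ...").
function uniqueProviders(all: Array<[Role, Provider | undefined]>): Array<[Role, Provider]> {
  const seen = new Set<string>();
  const out: Array<[Role, Provider]> = [];
  for (const [role, p] of all) {
    if (!p || seen.has(p.id)) continue; // built-in Anthropic, or already queued
    seen.add(p.id);
    out.push([role, p]);
  }
  return out;
}

// Ping every unique provider concurrently and collect failure messages.
// `ping` is a hypothetical stand-in for the real 1-turn auth check.
async function preflightAll(
  all: Array<[Role, Provider | undefined]>,
  ping: (p: Provider) => Promise<void>,
): Promise<string[]> {
  const pairs = uniqueProviders(all);
  const settled = await Promise.allSettled(pairs.map(([, p]) => ping(p)));
  const failures: string[] = [];
  settled.forEach((r, i) => {
    if (r.status === "rejected") {
      const msg = r.reason instanceof Error ? r.reason.message : String(r.reason);
      failures.push(`✗ ${pairs[i][0]} preflight failed: ${msg}`);
    }
  });
  return failures;
}
```

`Promise.allSettled` reports every bad key in one pass instead of stopping at the first rejection; a sequential loop would be an equally valid reading of the elided code.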
@@ -855,6 +866,10 @@ async function main() {
  const runDir = resuming && resumeRunDir ? resumeRunDir : (orphanedDir ?? createRunDir(rootDir));
  if (resuming && resumeRunDir)
  updateLatestSymlink(rootDir, resumeRunDir);
+ // Route all planner/steering stream events to <runDir>/transcripts/*.ndjson
+ // so crashes during planning leave a forensic trail and resumes can inspect
+ // what the planner was doing mid-flight. See src/transcripts.ts.
+ setTranscriptRunDir(runDir);
  const previousKnowledge = readPreviousRunKnowledge(rootDir);
  const needsPlan = tasks.length === 0 && (!resuming || replanFromScratch);
  const designDir = join(runDir, "designs");
@@ -867,8 +882,9 @@ async function main() {
  saveRunState(runDir, {
  id: runDir.split(/[/\\]/).pop() ?? "",
  objective, budget: budget ?? 10, remaining: budget ?? 10,
- workerModel, plannerModel,
+ workerModel, plannerModel, fastModel,
  workerProviderId: workerProvider?.id, plannerProviderId: plannerProvider?.id,
+ fastProviderId: fastProvider?.id,
  concurrency, permissionMode,
  usageCap, allowExtraUsage, extraUsageBudget,
  flex, useWorktrees, mergeStrategy,
@@ -894,7 +910,16 @@ async function main() {
  const thinkingCount = useThinking ? Math.min(Math.max(concurrency, Math.ceil((budget ?? 10) * 0.005)), 10) : 0;
  try {
  if (useThinking) {
- let themes = await identifyThemes(objective, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog());
+ // Persist themes as a Markdown doc so a planning-phase crash leaves a
+ // readable record (and a future resume can skip identifyThemes).
+ const saveThemesMd = (list) => {
+ try {
+ writeFileSync(join(runDir, "themes.md"), `# Themes\n\n**Objective:** ${objective}\n\n${list.map((t, i) => `${i + 1}. ${t}`).join("\n")}\n`, "utf-8");
+ }
+ catch { }
+ };
+ let themes = await identifyThemes(objective, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog(), "themes");
+ saveThemesMd(themes);
  process.stdout.write(`\x1B[2K\r ${chalk.green(`✓ ${themes.length} themes`)}\n\n`);
  planRestore();
  let reviewing = true;
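`saveThemesMd` renders the themes into a fixed Markdown template and writes it best-effort. Splitting rendering from writing makes the template testable. The sketch below is a TypeScript rendition of the same template; the helper names and the boolean return value are assumptions (the original returns nothing and silently ignores write errors).

```typescript
import { writeFileSync } from "node:fs";
import { join } from "node:path";

// Pure renderer using the same template as saveThemesMd in the diff.
function renderThemesMd(objective: string, themes: string[]): string {
  const items = themes.map((t, i) => `${i + 1}. ${t}`).join("\n");
  return `# Themes\n\n**Objective:** ${objective}\n\n${items}\n`;
}

// Best-effort write: failing to persist the summary must not abort planning.
// Returns false instead of throwing (the original simply ignores the error).
function saveThemesMd(runDir: string, objective: string, themes: string[]): boolean {
  try {
    writeFileSync(join(runDir, "themes.md"), renderThemesMd(objective, themes), "utf-8");
    return true;
  } catch {
    return false;
  }
}

// Usage: print the document that would land in <runDir>/themes.md.
console.log(renderThemesMd("Ship auth", ["login flow", "session storage"]));
```

Calling the writer after both the initial `identifyThemes` and the `themes-refine` retry means `themes.md` always reflects the latest accepted list.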
@@ -913,7 +938,8 @@ async function main() {
  continue;
  process.stdout.write("\x1B[?25l");
  try {
- themes = await identifyThemes(`${objective}\n\nUser feedback: ${feedback}`, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog());
+ themes = await identifyThemes(`${objective}\n\nUser feedback: ${feedback}`, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog(), "themes-refine");
+ saveThemesMd(themes);
  process.stdout.write(`\x1B[2K\r ${chalk.green(`✓ ${themes.length} themes`)}\n\n`);
  }
  catch (err) {
@@ -990,8 +1016,9 @@ async function main() {
  saveRunState(runDir, {
  id: runDir.split(/[/\\]/).pop() ?? "",
  objective: objective, budget: budget ?? 10, remaining: (budget ?? 10) - thinkingUsed,
- workerModel, plannerModel,
+ workerModel, plannerModel, fastModel,
  workerProviderId: workerProvider?.id, plannerProviderId: plannerProvider?.id,
+ fastProviderId: fastProvider?.id,
  concurrency, permissionMode,
  usageCap, allowExtraUsage, extraUsageBudget,
  flex, useWorktrees, mergeStrategy,
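Both `saveRunState` calls now persist `fastModel` and `fastProviderId` alongside the planner and worker fields, and the README states that resume refuses to start when a referenced provider id no longer exists. The check itself is not part of this diff; the following is a hypothetical sketch of it over a subset of `run.json` (`RunStateModels` and `missingProviderIds` are invented names).

```typescript
// Subset of run.json touched by this diff; all other persisted fields are omitted.
interface RunStateModels {
  plannerModel: string;
  workerModel: string;
  fastModel?: string; // absent when no fast model was picked
  plannerProviderId?: string;
  workerProviderId?: string;
  fastProviderId?: string;
}

// List provider ids referenced by run.json that are no longer saved,
// labelled by role, so resume can refuse to start with a precise message.
function missingProviderIds(state: RunStateModels, saved: ReadonlySet<string>): string[] {
  const refs: Array<[string, string | undefined]> = [
    ["planner", state.plannerProviderId],
    ["worker", state.workerProviderId],
    ["fast", state.fastProviderId],
  ];
  return refs
    .filter(([, id]) => id !== undefined && !saved.has(id))
    .map(([role, id]) => `${role}: ${id}`);
}

// JSON.stringify drops keys whose value is undefined, so a run without a
// fast model simply has no fastModel/fastProviderId keys in run.json.
const persisted = JSON.parse(
  JSON.stringify({ plannerModel: "opus", workerModel: "qwen3.6-plus", workerProviderId: "qwen", fastModel: undefined }),
) as RunStateModels;
console.log(missingProviderIds(persisted, new Set(["qwen"]))); // []
```

In this sketch, a `run.json` written by 1.25.19 (which never had fast keys) passes the same check unchanged.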
@@ -23,6 +23,8 @@ export interface PlannerOpts {
  type: "json_schema";
  schema: Record<string, unknown>;
  };
+ /** When set, stream events are appended to <runDir>/transcripts/<name>.ndjson */
+ transcriptName?: string;
  }
  export declare function setPlannerEnvResolver(fn: ((model?: string) => Record<string, string> | undefined) | undefined): void;
  export declare function getTotalPlannerCost(): number;
@@ -1,6 +1,7 @@
  import { query } from "@anthropic-ai/claude-agent-sdk";
  import { readFileSync } from "fs";
  import { NudgeError } from "./types.js";
+ import { writeTranscriptEvent } from "./transcripts.js";
  // ── Shared env resolver (set once at run start, used by every planner query) ──
  //
  // Swarm and planner calls share a model→env map so a custom provider configured
@@ -63,6 +64,22 @@ async function throttlePlanner(onLog, aborted) {
  }
  // Exhausted backoffs — proceed anyway, the retry loop will catch a rejection.
  }
+ /**
+ * Pick a short, human-readable target for a tool invocation (Read/Grep/Bash/…).
+ * Prefers explicit file paths; falls back to the first few tokens of a shell
+ * command. Returns `""` when the input has no useful identifier.
+ */
+ function extractToolTarget(input) {
+ if (!input)
+ return "";
+ const p = input.path ?? input.file_path ?? input.pattern;
+ if (typeof p === "string" && p)
+ return p;
+ if (typeof input.command === "string" && input.command) {
+ return input.command.split(" ").slice(0, 3).join(" ");
+ }
+ return "";
+ }
  // ── Query execution ──
  const NUDGE_MS = 15 * 60 * 1000;
  const HARD_TIMEOUT_MS = 30 * 60 * 1000;
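`extractToolTarget` replaces an inline expression that had a precedence bug: `input?.path ?? input?.file_path ?? input?.command ? (...) : ""` parses as `(path ?? file_path ?? command) ? (...) : ""`, and the inner branch only handled `command`, so a `Read` call carrying only `file_path` logged no target at all. The helper checks paths (now including `pattern`) first and falls back to the first three shell tokens. Below is a typed TypeScript port of the same logic; the `ToolInput` interface is an assumption about the input shape.

```typescript
// Assumed input shape: tool inputs are loose JSON objects, so fields are unknown.
interface ToolInput {
  path?: unknown;
  file_path?: unknown;
  pattern?: unknown;
  command?: unknown;
}

// Typed port of extractToolTarget from the diff above (same logic).
function extractToolTarget(input: ToolInput | null | undefined): string {
  if (!input) return "";
  // Explicit paths or search patterns win over shell commands.
  const p = input.path ?? input.file_path ?? input.pattern;
  if (typeof p === "string" && p) return p;
  if (typeof input.command === "string" && input.command) {
    // Keep only the first three space-separated tokens of a shell command.
    return input.command.split(" ").slice(0, 3).join(" ");
  }
  return "";
}

// Usage: the case the old inline expression got wrong.
console.log(extractToolTarget({ file_path: "src/auth.ts" }));
```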
@@ -110,6 +127,17 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
  const startedAt = Date.now();
  const isResume = !!opts.resumeSessionId;
  const envOverride = _envResolver?.(opts.model);
+ const tname = opts.transcriptName;
+ if (tname) {
+ writeTranscriptEvent(tname, {
+ kind: "session_start",
+ model: opts.model,
+ isResume,
+ resumeSessionId: opts.resumeSessionId,
+ promptPreview: prompt.slice(0, 2000),
+ promptBytes: prompt.length,
+ });
+ }
  const pq = query({
  prompt,
  options: {
@@ -167,6 +195,18 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
  };
  timer = setTimeout(check, timeoutMs);
  });
+ // Tool-use blocks can arrive in two shapes:
+ // (a) content_block_start carries the full `input` (native Anthropic non-partial)
+ // (b) content_block_start carries `input: {}` and the JSON is streamed via
+ // input_json_delta frames (Anthropic streaming spec, cursor-composer-in-claude v0.9+).
+ // Track the open tool block so we can re-log with the enriched target once
+ // the input arrives, and write a complete transcript entry on block stop.
+ let pendingTool = null;
+ const logTool = (name, input) => {
+ const target = extractToolTarget(input);
+ lastLogText = target ? `${name} ${target}` : name;
+ onLog(target ? `${name} → ${target}` : name, "event");
+ };
  const consume = async () => {
  for await (const msg of pq) {
  lastActivity = Date.now();
@@ -178,21 +218,34 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
  const cb = ev.content_block;
  if (cb?.type === "tool_use") {
  toolCount++;
- const toolName = cb.name;
- const input = cb.input;
- // Enrich event with target file/path for readability
- const target = input?.path ?? input?.file_path ?? input?.command
- ? (typeof input?.command === "string" ? input.command.split(" ").slice(0, 3).join(" ") : "")
- : "";
- lastLogText = target ? `${toolName} ${target}` : toolName;
- onLog(target ? `${toolName} → ${target}` : toolName, "event");
+ const input = (cb.input ?? {});
+ const hasInput = Object.keys(input).length > 0;
+ pendingTool = {
+ index: ev.index ?? 0,
+ name: cb.name,
+ id: cb.id,
+ input,
+ buf: "",
+ logged: hasInput,
+ };
+ if (hasInput) {
+ logTool(cb.name, input);
+ if (tname)
+ writeTranscriptEvent(tname, { kind: "tool_use", tool: cb.name, input });
+ }
  }
  else if (cb?.type === "thinking" || cb?.type === "redacted_thinking") {
  lastLogText = "thinking…";
+ if (tname)
+ writeTranscriptEvent(tname, { kind: "thinking_start" });
  }
  }
  if (ev?.type === "content_block_delta") {
  const delta = ev.delta;
+ if (delta?.type === "input_json_delta" && pendingTool && typeof delta.partial_json === "string") {
+ pendingTool.buf += delta.partial_json;
+ continue;
+ }
  // thinking_delta carries reasoning text under `delta.thinking`;
  // text_delta carries final-answer text under `delta.text`.
  const raw = delta?.type === "text_delta" ? delta.text
@@ -202,7 +255,23 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
  const snippet = raw.trim().replace(/[{}"\\,[\]]+/g, " ").replace(/\s+/g, " ").trim();
  if (snippet.length > 5)
  lastLogText = snippet.slice(-60);
+ if (tname)
+ writeTranscriptEvent(tname, { kind: delta.type, text: raw });
+ }
+ }
+ if (ev?.type === "content_block_stop" && pendingTool) {
+ if (!pendingTool.logged && pendingTool.buf) {
+ try {
+ pendingTool.input = JSON.parse(pendingTool.buf);
+ }
+ catch { }
+ }
+ if (!pendingTool.logged) {
+ logTool(pendingTool.name, pendingTool.input);
+ if (tname)
+ writeTranscriptEvent(tname, { kind: "tool_use", tool: pendingTool.name, input: pendingTool.input });
  }
+ pendingTool = null;
  }
  }
  if (msg.type === "rate_limit_event") {
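This hunk and the previous one handle the two tool-use shapes described in the comment above: input delivered whole on `content_block_start`, or `input: {}` followed by `input_json_delta` fragments that only parse once `content_block_stop` arrives. The reducer below replays that logic over a simplified event list. The `StreamEvent` type is a hand-written subset of the Anthropic streaming events, not the Agent SDK's own types, and `collectToolCalls` is a hypothetical name.

```typescript
// Hand-written subset of Anthropic streaming events (only the fields used here).
type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: string; name?: string; input?: Record<string, unknown> } }
  | { type: "content_block_delta"; index: number; delta: { type: string; partial_json?: string; text?: string } }
  | { type: "content_block_stop"; index: number };

interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

interface PendingTool {
  name: string;
  input: Record<string, unknown>;
  buf: string; // concatenated partial_json fragments
  done: boolean; // true when the input arrived complete on block start
}

// Collect complete tool calls from a stream in which tool input may arrive
// either inline on content_block_start or as input_json_delta fragments.
function collectToolCalls(events: StreamEvent[]): ToolCall[] {
  const calls: ToolCall[] = [];
  let pending: PendingTool | null = null;
  for (const ev of events) {
    if (ev.type === "content_block_start" && ev.content_block.type === "tool_use") {
      const input = ev.content_block.input ?? {};
      const done = Object.keys(input).length > 0; // shape (a): input already complete
      pending = { name: ev.content_block.name ?? "", input, buf: "", done };
      if (done) calls.push({ name: pending.name, input });
    } else if (ev.type === "content_block_delta" && ev.delta.type === "input_json_delta" && pending) {
      pending.buf += ev.delta.partial_json ?? ""; // shape (b): accumulate fragments
    } else if (ev.type === "content_block_stop" && pending) {
      if (!pending.done) {
        try {
          pending.input = pending.buf ? (JSON.parse(pending.buf) as Record<string, unknown>) : {};
        } catch {
          // Malformed JSON: keep the empty input, as the diff does.
        }
        calls.push({ name: pending.name, input: pending.input });
      }
      pending = null;
    }
  }
  return calls;
}
```

As in the diff, fragments are concatenated and parsed only at block stop, because individual `partial_json` chunks are generally not valid JSON on their own.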
@@ -222,6 +291,15 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
  resetsAt: info.resetsAt,
  });
  }
+ if (tname)
+ writeTranscriptEvent(tname, {
+ kind: "rate_limit",
+ utilization: info.utilization ?? 0,
+ status: info.status,
+ rateLimitType: info.rateLimitType,
+ resetsAt: info.resetsAt,
+ isUsingOverage: !!info.isUsingOverage,
+ });
  }
  }
  if (msg.type === "result") {
@@ -234,8 +312,27 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
  if (msg.subtype === "success") {
  structuredOutput = r.structured_output;
  resultText = r.result || "";
+ if (tname)
+ writeTranscriptEvent(tname, {
+ kind: "result",
+ subtype: "success",
+ costUsd,
+ durationMs: Date.now() - startedAt,
+ toolCount,
+ resultPreview: typeof resultText === "string" ? resultText.slice(0, 4000) : undefined,
+ hasStructuredOutput: structuredOutput != null,
+ });
  }
  else {
+ if (tname)
+ writeTranscriptEvent(tname, {
+ kind: "result",
+ subtype: msg.subtype,
+ costUsd,
+ durationMs: Date.now() - startedAt,
+ toolCount,
+ error: r.result,
+ });
  throw new Error(`Planner failed: ${r.result || msg.subtype}`);
  }
  }
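With `session_start`, `tool_use`, delta, `rate_limit`, `result` and `error` events on disk, a transcript is enough to tell how a query ended. A hedged sketch: the `TranscriptEvent` union is reconstructed from the `writeTranscriptEvent` calls in this diff (optional fields and the `resetsAt` type are guesses), and it assumes a retried query appends a fresh `session_start` to the same file. A non-success `result` is followed by an `error` event, because the `throw` after it is caught by the `catch` added around `Promise.race`, so the sketch checks for `result` first.

```typescript
// Event shapes reconstructed from the writeTranscriptEvent calls in this diff.
// Optional fields are guesses; extra fields (e.g. a timestamp) are ignored.
type TranscriptEvent =
  | { kind: "session_start"; model?: string; isResume: boolean; resumeSessionId?: string; promptPreview: string; promptBytes: number }
  | { kind: "tool_use"; tool: string; input: Record<string, unknown> }
  | { kind: "thinking_start" }
  | { kind: "text_delta" | "thinking_delta"; text: string }
  | { kind: "rate_limit"; utilization: number; status?: string; rateLimitType?: string; resetsAt?: number; isUsingOverage: boolean }
  | { kind: "result"; subtype: string; costUsd?: number; durationMs: number; toolCount: number; resultPreview?: string; hasStructuredOutput?: boolean; error?: string }
  | { kind: "error"; message: string; durationMs: number; toolCount: number };

type Outcome = "success" | "failed" | "error" | "unfinished";

// Classify the most recent session in an NDJSON transcript.
function classifyLastSession(ndjson: string): Outcome {
  const events = ndjson
    .split("\n")
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l) as TranscriptEvent);
  // Assumption: each retry starts with its own session_start line.
  let start = 0;
  for (let i = 0; i < events.length; i++) {
    if (events[i].kind === "session_start") start = i;
  }
  const session = events.slice(start);
  // A result line is authoritative; a failed result is also followed by an
  // "error" line from the catch block, so results are checked first.
  for (const e of session) {
    if (e.kind === "result") return e.subtype === "success" ? "success" : "failed";
  }
  // No result: either an error was caught (e.g. a watchdog timeout) or the
  // process died mid-query (crash, Ctrl+C) before anything terminal was written.
  return session.some((e) => e.kind === "error") ? "error" : "unfinished";
}
```

An `"unfinished"` transcript is the crashed-mid-planning case from the README: the file ends after the last tool use or delta that made it to disk.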
@@ -244,6 +341,16 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
  try {
  await Promise.race([consume(), watchdog]);
  }
+ catch (err) {
+ if (tname)
+ writeTranscriptEvent(tname, {
+ kind: "error",
+ message: err instanceof Error ? err.message : String(err),
+ durationMs: Date.now() - startedAt,
+ toolCount,
+ });
+ throw err;
+ }
  finally {
  clearTimeout(timer);
  clearInterval(ticker);
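The new `catch` records the failure and rethrows, leaving the existing `finally` to clear the timers. Below is the same try/catch/finally shape as a standalone helper. `runWithWatchdog` and its callback signature are hypothetical, and the watchdog here is a plain timeout rather than the package's nudge and hard-timeout logic (`NUDGE_MS`, `HARD_TIMEOUT_MS`).

```typescript
interface WatchdogErrorEvent {
  kind: "error";
  message: string;
  durationMs: number;
}

// Race `work` against a timeout, record any failure, rethrow it, and
// always clear the timer (mirroring the try/catch/finally in the diff).
async function runWithWatchdog<T>(
  work: Promise<T>,
  timeoutMs: number,
  onError: (e: WatchdogErrorEvent) => void,
): Promise<T> {
  const startedAt = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const watchdog = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    return await Promise.race([work, watchdog]);
  } catch (err) {
    onError({
      kind: "error",
      message: err instanceof Error ? err.message : String(err),
      durationMs: Date.now() - startedAt,
    });
    throw err; // the caller still sees the original failure
  } finally {
    clearTimeout(timer); // prevents a late rejection once work has settled
  }
}
```

`Promise.race` does not cancel the losing promise: after a timeout, `work` keeps running unless the caller aborts it separately.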
package/dist/planner.d.ts CHANGED
@@ -1,8 +1,8 @@
  import type { Task, PermMode } from "./types.js";
  export declare function salvageFromFile(outFile: string | undefined, budget: number | undefined, onLog: (text: string, kind?: "status" | "event") => void, why: string): Task[] | null;
  export declare const DESIGN_THINKING = "\nHOW TO THINK ABOUT EVERY TASK:\n\nStart from the user's job. What is someone hiring this product to do? \"I need to send money abroad cheaply\" -- not \"I need a currency conversion API.\" Every decision -- what to build, how fast it needs to respond, what happens on error -- flows from the job.\n\nThe experience IS the product. A 200ms server response is not a \"performance metric\" -- it's the difference between an app that feels alive and one that feels broken. A loading state is not \"polish\" -- it's the user knowing the app heard them. An error message is not \"error handling\" -- it's the app being honest. There is no line between backend and UX. The server, the API, the database query, the render -- they're all one experience the user either trusts or doesn't.\n\nBuild the core, verify it works, learn, iterate. Don't plan 20 features and build them all. Build the ONE thing that matters most, run it, see if it actually works from a user's chair. What you learn from seeing it run will change what you build next. Each wave should make what exists better before adding what doesn't exist yet.\n\nConsistency is what makes complex things feel simple. One design system, rigid rules, no exceptions. This is how Revolut ships a super-app with 30+ features that doesn't feel like chaos.\n";
- export declare function planTasks(objective: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string): Promise<Task[]>;
- export declare function identifyThemes(objective: string, count: number, cwd: string, model: string, permissionMode: PermMode, onLog?: (text: string) => void): Promise<string[]>;
+ export declare function planTasks(objective: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string, transcriptName?: string): Promise<Task[]>;
+ export declare function identifyThemes(objective: string, count: number, cwd: string, model: string, permissionMode: PermMode, onLog?: (text: string) => void, transcriptName?: string): Promise<string[]>;
  export declare function buildThinkingTasks(objective: string, themes: string[], designDir: string, plannerModel: string, previousKnowledge?: string): Task[];
- export declare function orchestrate(objective: string, designDocs: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string): Promise<Task[]>;
- export declare function refinePlan(objective: string, previousTasks: Task[], feedback: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void): Promise<Task[]>;
+ export declare function orchestrate(objective: string, designDocs: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string, transcriptName?: string): Promise<Task[]>;
+ export declare function refinePlan(objective: string, previousTasks: Task[], feedback: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, transcriptName?: string): Promise<Task[]>;
package/dist/planner.js CHANGED
@@ -152,13 +152,13 @@ Respond with ONLY a JSON object (no markdown fences):
152
152
  }`;
153
153
  }
154
154
  // ── Planning functions ──
155
- export async function planTasks(objective, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile) {
155
+ export async function planTasks(objective, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile, transcriptName = "plan") {
156
156
  onLog("Analyzing codebase...");
157
157
  const prompt = plannerPrompt(objective, workerModel, budget, concurrency, flexNote);
158
158
  const fileInstruction = outFile ? `\n\nAFTER generating the JSON, also write it to ${outFile} using the Write tool.` : "";
159
159
  let resultText;
160
160
  try {
161
- resultText = await runPlannerQuery(prompt + fileInstruction, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
161
+ resultText = await runPlannerQuery(prompt + fileInstruction, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
162
162
  }
163
163
  catch (err) {
164
164
  const salvaged = salvageFromFile(outFile, budget, onLog, err?.message ?? String(err));
@@ -168,7 +168,7 @@ export async function planTasks(objective, cwd, plannerModel, workerModel, permi
168
168
  }
169
169
  const parsed = await extractTaskJson(resultText, async () => {
170
170
  onLog("Retrying...");
171
- return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
171
+ return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
172
172
  }, onLog, outFile);
173
173
  let tasks = (parsed.tasks || []).map((t, i) => ({
174
174
  id: String(i), prompt: typeof t === "string" ? t : t.prompt,
@@ -179,7 +179,7 @@ export async function planTasks(objective, cwd, plannerModel, workerModel, permi
179
179
  onLog(`${tasks.length} tasks`);
180
180
  return tasks;
181
181
  }
182
- export async function identifyThemes(objective, count, cwd, model, permissionMode, onLog = () => { }) {
182
+ export async function identifyThemes(objective, count, cwd, model, permissionMode, onLog = () => { }, transcriptName = "themes") {
183
183
  const resultText = await runPlannerQuery(`You are picking ${count} research angles for architects who will deeply explore a codebase next.
184
184
 
185
185
  First do a BRIEF recon (3-6 tool calls max, don't go deep): read package.json and README if present, glob the top-level directory, peek at one or two config files that reveal the stack. You are learning what this codebase actually IS -- not solving anything.
@@ -188,7 +188,7 @@ Then pick ${count} angles that carve up THIS specific codebase orthogonally. Pre
188
188
 
189
189
  Objective: ${objective}
190
190
 
191
- Return ONLY a JSON object: {"themes": ["angle description", ...]}`, { cwd, model, permissionMode, outputFormat: THEMES_SCHEMA }, onLog);
191
+ Return ONLY a JSON object: {"themes": ["angle description", ...]}`, { cwd, model, permissionMode, outputFormat: THEMES_SCHEMA, transcriptName }, onLog);
192
192
  const parsed = attemptJsonParse(resultText);
193
193
  if (parsed?.themes && Array.isArray(parsed.themes))
194
194
  return parsed.themes.slice(0, count);
@@ -229,7 +229,7 @@ Be thorough -- your findings drive the execution plan.`,
229
229
  model: plannerModel,
230
230
  }));
231
231
  }
232
- export async function orchestrate(objective, designDocs, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile) {
232
+ export async function orchestrate(objective, designDocs, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile, transcriptName = "orchestrate") {
233
233
  const constraint = contextConstraintNote(workerModel);
234
234
  const flexLine = flexNote ? `\n\n${flexNote}` : "";
235
235
  const fileInstruction = outFile ? `\n\nAFTER generating the JSON, also write it to ${outFile} using the Write tool.` : "";
@@ -259,7 +259,7 @@ Respond with ONLY a JSON object (no markdown fences):
259
259
  onLog("Synthesizing...");
260
260
  let resultText;
261
261
  try {
262
- resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
262
+ resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
263
263
  }
264
264
  catch (err) {
265
265
  const salvaged = salvageFromFile(outFile, budget, onLog, err?.message ?? String(err));
@@ -269,7 +269,7 @@ Respond with ONLY a JSON object (no markdown fences):
269
269
  }
270
270
  const parsed = await extractTaskJson(resultText, async () => {
271
271
  onLog("Retrying...");
272
- return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
272
+ return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
273
273
  }, onLog, outFile);
274
274
  let tasks = (parsed.tasks || []).map((t, i) => ({
275
275
  id: String(i), prompt: typeof t === "string" ? t : t.prompt,
@@ -280,7 +280,7 @@ Respond with ONLY a JSON object (no markdown fences):
280
280
  onLog(`${tasks.length} tasks`);
281
281
  return tasks;
282
282
  }
283
- export async function refinePlan(objective, previousTasks, feedback, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog) {
283
+ export async function refinePlan(objective, previousTasks, feedback, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, transcriptName = "refine") {
284
284
  onLog("Refining plan...");
285
285
  const prev = previousTasks.map((t, i) => `${i + 1}. ${t.prompt}`).join("\n");
286
286
  const constraint = contextConstraintNote(workerModel);
@@ -303,10 +303,10 @@ ${scaleNote} ${concurrency} agents run in parallel. Update the plan accordingly.
303
303
 
304
304
  Respond with ONLY a JSON object (no markdown):
305
305
  {"tasks":[{"prompt":"..."}]}`;
306
- const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
306
+ const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
307
307
  const parsed = await extractTaskJson(resultText, async () => {
308
308
  onLog("Retrying...");
309
- return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
309
+ return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
310
310
  }, onLog);
311
311
  let tasks = (parsed.tasks || []).map((t, i) => ({
312
312
  id: String(i), prompt: typeof t === "string" ? t : t.prompt,
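Every planner entry point in this hunk gains the same shape. There is a trailing optional `transcriptName` with a per-function default (`"plan"`, `"themes"`, `"orchestrate"`, `"refine"`). The JSON-repair retry writes to `${transcriptName}-retry`, so the retry never mixes into the first query's transcript. The sketch below shows that pattern in isolation. `QueryFn` is a hypothetical stand-in for `runPlannerQuery`, and plain `JSON.parse` replaces the package's `extractTaskJson`. It illustrates the naming only and is not the package's implementation.

```typescript
// Hypothetical stand-in for runPlannerQuery: a prompt plus a transcript name.
type QueryFn = (prompt: string, opts: { transcriptName: string }) => Promise<string>;

// Sketch of the pattern: trailing optional `transcriptName` with a default,
// and a `-retry` suffix for the JSON-repair query.
async function queryWithRepair(
  query: QueryFn,
  prompt: string,
  transcriptName = "plan",
): Promise<unknown> {
  const first = await query(prompt, { transcriptName });
  try {
    return JSON.parse(first);
  } catch {
    // The repair attempt logs to its own transcript, e.g. "plan-retry".
    const retry = await query(
      `Your previous response was not valid JSON.\n\n${prompt}`,
      { transcriptName: `${transcriptName}-retry` },
    );
    return JSON.parse(retry);
  }
}
```

Because the parameter comes last and has a default, existing callers that do not pass it keep compiling unchanged.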
package/dist/run.js CHANGED
@@ -272,7 +272,7 @@ export async function executeRun(cfg) {
272
272
  const appliedGuidance = memory.userGuidance;
273
273
  if (appliedGuidance)
274
274
  display.appendSteeringEvent(`User directives applied: ${appliedGuidance.slice(0, 80)}`);
275
- const steer = await steerWave(objective, waveHistory, remaining, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, steeringLog, memory);
275
+ const steer = await steerWave(objective, waveHistory, remaining, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, steeringLog, memory, `steer-wave-${waveNum}-attempt-${steerAttempts}`);
276
276
  accCost += getTotalPlannerCost() - plannerCostBefore;
277
277
  syncRunInfo();
278
278
  if (steer.statusUpdate)
@@ -1,3 +1,3 @@
1
1
  import type { PermMode, SteerResult, RunMemory, WaveSummary } from "./types.js";
2
2
  import { type PlannerLog } from "./planner-query.js";
3
- export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, fastModel: string | undefined, permissionMode: PermMode, concurrency: number, onLog: PlannerLog, runMemory?: RunMemory): Promise<SteerResult>;
3
+ export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, fastModel: string | undefined, permissionMode: PermMode, concurrency: number, onLog: PlannerLog, runMemory?: RunMemory, transcriptName?: string): Promise<SteerResult>;
package/dist/steering.js CHANGED
@@ -23,7 +23,7 @@ const STEER_SCHEMA = {
23
23
  required: ["done", "tasks", "reasoning", "statusUpdate", "estimatedSessionsRemaining"],
24
24
  },
25
25
  };
26
- export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory) {
26
+ export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory, transcriptName = "steer") {
27
27
  const constraint = contextConstraintNote(workerModel);
28
28
  const recentWaves = history.slice(-3);
29
29
  const recentText = recentWaves.length > 0 ? recentWaves.map(w => {
@@ -114,14 +114,14 @@ Set "noWorktree": true for verify/user-test tasks -- they need the real project
114
114
  If done: {"done": true, "reasoning": "...", "statusUpdate": "...", "estimatedSessionsRemaining": 0, "tasks": []}`;
115
115
  onLog("Assessing...", "status");
116
116
  onLog(`Reading codebase -- wave ${history.length + 1}`, "event");
117
- const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA }, onLog);
117
+ const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName }, onLog);
118
118
  const parsed = await (async () => {
119
119
  const first = attemptJsonParse(resultText);
120
120
  if (first)
121
121
  return first;
122
122
  onLog(`Steering parse failed (${resultText.length} chars). Asking model to fix...`, "event");
123
123
  const snippet = resultText.length > 2000 ? resultText.slice(0, 1000) + "\n...\n" + resultText.slice(-800) : resultText;
124
- const retryText = await runPlannerQuery(`Your previous steering response could not be parsed as JSON. Here is what you returned:\n\n---\n${snippet}\n---\n\nExtract or rewrite the above as ONLY a valid JSON object with this schema: {"done":boolean,"reasoning":"...","statusUpdate":"...","tasks":[{"prompt":"..."}]}\n\nRespond with ONLY the JSON, no markdown fences, no explanation.`, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA }, onLog);
124
+ const retryText = await runPlannerQuery(`Your previous steering response could not be parsed as JSON. Here is what you returned:\n\n---\n${snippet}\n---\n\nExtract or rewrite the above as ONLY a valid JSON object with this schema: {"done":boolean,"reasoning":"...","statusUpdate":"...","tasks":[{"prompt":"..."}]}\n\nRespond with ONLY the JSON, no markdown fences, no explanation.`, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
125
125
  const retryParsed = attemptJsonParse(retryText);
126
126
  if (retryParsed)
127
127
  return retryParsed;
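On the steering side, `run.js` now passes `steer-wave-${waveNum}-attempt-${steerAttempts}` as the transcript name. `steerWave` derives `${transcriptName}-retry` for its parse-repair query. The hypothetical helper below only spells out which transcript files one steering attempt can produce, relative to the run directory. Whether wave and attempt numbers start at 0 or 1 is not shown in this diff, so the helper takes them as given.

```typescript
// Hypothetical helper: transcript files (relative to the run dir) that one
// steering attempt can produce, following the naming used in run.js/steering.js.
function steerTranscriptFiles(waveNum: number, attempt: number, retried: boolean): string[] {
  const base = `steer-wave-${waveNum}-attempt-${attempt}`;
  // The parse-repair query only runs when the first response is not valid JSON.
  const names = retried ? [base, `${base}-retry`] : [base];
  return names.map((name) => `transcripts/${name}.ndjson`);
}
```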
package/dist/swarm.d.ts CHANGED
@@ -67,6 +67,7 @@ export declare class Swarm {
67
67
  private worktreeBase?;
68
68
  private activeQueries;
69
69
  private cleanedUp;
70
+ private pendingTools;
70
71
  logFile?: string;
71
72
  readonly model: string | undefined;
72
73
  usageCap: number | undefined;
@@ -116,5 +117,7 @@ export declare class Swarm {
116
117
  private windowRejectedReset;
117
118
  private runAgent;
118
119
  private agentSummary;
120
+ /** Log a tool invocation with a short target extracted from its input. */
121
+ private logToolUse;
119
122
  private handleMsg;
120
123
  }
package/dist/swarm.js CHANGED
@@ -72,6 +72,10 @@ export class Swarm {
72
72
  worktreeBase;
73
73
  activeQueries = new Set();
74
74
  cleanedUp = false;
75
+ // Per-agent open tool_use block: cursor-composer-in-claude v0.9 opens the block
76
+ // with empty `input` and streams the real payload via `input_json_delta`, so we
77
+ // need to wait for content_block_stop before we can log the file/path target.
78
+ pendingTools = new WeakMap();
75
79
  logFile;
76
80
  model;
77
81
  usageCap;
@@ -700,6 +704,16 @@ export class Swarm {
700
704
  return `Agent ${agent.id} ${verb}: ${m}m ${s}s, ${agent.toolCalls} tools${files}`;
701
705
  }
702
706
  // ── Message handler ──
707
+ /** Log a tool invocation with a short target extracted from its input. */
708
+ logToolUse(agent, name, input) {
709
+ const p = input.path ?? input.file_path ?? input.pattern;
710
+ const target = typeof p === "string" && p
711
+ ? p
712
+ : typeof input.command === "string" && input.command
713
+ ? input.command.split(" ").slice(0, 3).join(" ")
714
+ : "";
715
+ this.log(agent.id, target ? `${name} \u2192 ${target}` : name);
716
+ }
703
717
  handleMsg(agent, msg) {
704
718
  // Any message that isn't a rate-limit event counts as real progress and
705
719
  // resets the stall watchdog + clears the per-agent blocked flag.
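The new `logToolUse` chooses a short log target in a fixed order: `path`, then `file_path`, then `pattern` (new in this version), and finally the first three words of a `command`. The typed sketch below mirrors that precedence so the output format is easy to see. The helper names `toolTarget` and `formatToolLog` are illustrative and do not exist in the package.

```typescript
type ToolInput = Record<string, unknown>;

// Same precedence as logToolUse: path, then file_path, then pattern.
function toolTarget(input: ToolInput): string {
  const p = input.path ?? input.file_path ?? input.pattern;
  if (typeof p === "string" && p) return p;
  // Fall back to the first three words of a shell command.
  const cmd = input.command;
  if (typeof cmd === "string" && cmd) return cmd.split(" ").slice(0, 3).join(" ");
  return "";
}

// "Name → target", or just the tool name when no target is found.
function formatToolLog(name: string, input: ToolInput): string {
  const target = toolTarget(input);
  return target ? `${name} \u2192 ${target}` : name;
}
```

For example, a `Bash` call with `npm run build --silent` logs as `Bash → npm run build`.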
@@ -730,9 +744,11 @@ export class Swarm {
730
744
  if (cb?.type === "tool_use") {
731
745
  agent.currentTool = cb.name;
732
746
  agent.toolCalls++;
733
- const input = cb.input;
734
- const target = input?.path ?? input?.file_path ?? (typeof input?.command === "string" ? input.command.split(" ").slice(0, 3).join(" ") : "");
735
- this.log(agent.id, target ? `${cb.name} \u2192 ${target}` : cb.name);
747
+ const input = (cb.input ?? {});
748
+ const hasInput = Object.keys(input).length > 0;
749
+ this.pendingTools.set(agent, { name: cb.name, input, buf: "", logged: hasInput });
750
+ if (hasInput)
751
+ this.logToolUse(agent, cb.name, input);
736
752
  }
737
753
  else if (cb?.type === "thinking" || cb?.type === "redacted_thinking") {
738
754
  agent.lastText = "thinking…";
@@ -740,6 +756,11 @@ export class Swarm {
740
756
  }
741
757
  else if (ev.type === "content_block_delta") {
742
758
  const delta = ev.delta;
759
+ const pending = this.pendingTools.get(agent);
760
+ if (delta?.type === "input_json_delta" && pending && typeof delta.partial_json === "string") {
761
+ pending.buf += delta.partial_json;
762
+ break;
763
+ }
743
764
  // thinking_delta: `delta.thinking`; text_delta: `delta.text`.
744
765
  const raw = delta?.type === "text_delta" ? delta.text
745
766
  : delta?.type === "thinking_delta" ? delta.thinking
@@ -750,6 +771,20 @@ export class Swarm {
750
771
  agent.lastText = t.slice(-80);
751
772
  }
752
773
  }
774
+ else if (ev.type === "content_block_stop") {
775
+ const pending = this.pendingTools.get(agent);
776
+ if (pending && !pending.logged) {
777
+ if (pending.buf) {
778
+ try {
779
+ pending.input = JSON.parse(pending.buf);
780
+ }
781
+ catch { }
782
+ }
783
+ this.logToolUse(agent, pending.name, pending.input);
784
+ pending.logged = true;
785
+ }
786
+ this.pendingTools.delete(agent);
787
+ }
753
788
  break;
754
789
  }
755
790
  case "result": {
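The `pendingTools` change handles the following case: a `tool_use` block can open with an empty `input`, and the real arguments then arrive as `input_json_delta` fragments. The handler therefore buffers the fragments and logs only at `content_block_stop`. When `input` is already populated at block start, it logs immediately and marks the entry as logged. Below is a simplified single-agent sketch of that state machine. The event types include only the fields used here and are assumptions, not the SDK's full types. Unlike the diff, the sketch also ignores parsed JSON that is not an object.

```typescript
// Simplified stream events: only the fields this sketch reads.
type StreamEvent =
  | { type: "content_block_start"; content_block: { type: string; name?: string; input?: Record<string, unknown> } }
  | { type: "content_block_delta"; delta: { type: string; partial_json?: string } }
  | { type: "content_block_stop" };

type OnTool = (name: string, input: Record<string, unknown>) => void;

interface Pending { name: string; input: Record<string, unknown>; buf: string; logged: boolean }

class ToolUseTracker {
  private pending: Pending | undefined;
  private readonly onTool: OnTool;

  constructor(onTool: OnTool) {
    this.onTool = onTool;
  }

  handle(ev: StreamEvent): void {
    if (ev.type === "content_block_start" && ev.content_block.type === "tool_use") {
      const input = ev.content_block.input ?? {};
      const hasInput = Object.keys(input).length > 0;
      this.pending = { name: ev.content_block.name ?? "tool", input, buf: "", logged: hasInput };
      // Input already present: log now; otherwise wait for the streamed JSON.
      if (hasInput) this.onTool(this.pending.name, input);
    } else if (ev.type === "content_block_delta") {
      if (ev.delta.type === "input_json_delta" && this.pending && typeof ev.delta.partial_json === "string") {
        this.pending.buf += ev.delta.partial_json;
      }
    } else if (ev.type === "content_block_stop") {
      const p = this.pending;
      if (p && !p.logged) {
        if (p.buf) {
          try {
            const parsed: unknown = JSON.parse(p.buf);
            if (parsed !== null && typeof parsed === "object") p.input = parsed as Record<string, unknown>;
          } catch {
            // Malformed buffer: fall back to the (empty) start input.
          }
        }
        this.onTool(p.name, p.input);
        p.logged = true;
      }
      this.pending = undefined;
    }
  }
}
```

The diff keys the pending state by agent in a `WeakMap`. An entry is dropped automatically once its agent object is no longer referenced, so no per-agent cleanup is needed.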
@@ -0,0 +1,5 @@
1
+ export declare function setTranscriptRunDir(dir: string | undefined): void;
2
+ export declare function getTranscriptRunDir(): string | undefined;
3
+ export declare function transcriptPath(name: string): string | undefined;
4
+ /** Append a single event; silent on error (disk full, permission, etc.). */
5
+ export declare function writeTranscriptEvent(name: string, event: Record<string, unknown>): void;
@@ -0,0 +1,38 @@
1
+ import { appendFileSync, mkdirSync } from "fs";
2
+ import { dirname, join } from "path";
3
+ /**
4
+ * Crash-safe NDJSON transcripts for planner/steering queries.
5
+ *
6
+ * Each query writes to `<runDir>/transcripts/<name>.ndjson` -- one JSON object
7
+ * per line, so partial writes survive crashes. Multiple invocations of the same
8
+ * name append with a `session_start` marker separating them.
9
+ *
10
+ * Why NDJSON:
11
+ * - append-only → no read-modify-write race under parallel waves
12
+ * - one line per event → `tail -f` works; a killed process never leaves
13
+ * the file in an unparseable state
14
+ * - machine-readable → this assistant and future tools can `jq` through it
15
+ *
16
+ * Consumed by: planner-query.ts (stream_event, rate_limit_event, result, error).
17
+ */
18
+ let _runDir;
19
+ export function setTranscriptRunDir(dir) {
20
+ _runDir = dir;
21
+ }
22
+ export function getTranscriptRunDir() {
23
+ return _runDir;
24
+ }
25
+ export function transcriptPath(name) {
26
+ return _runDir ? join(_runDir, "transcripts", `${name}.ndjson`) : undefined;
27
+ }
28
+ /** Append a single event; silent on error (disk full, permission, etc.). */
29
+ export function writeTranscriptEvent(name, event) {
30
+ const path = transcriptPath(name);
31
+ if (!path)
32
+ return;
33
+ try {
34
+ mkdirSync(dirname(path), { recursive: true });
35
+ appendFileSync(path, JSON.stringify({ t: Date.now(), ...event }) + "\n", "utf-8");
36
+ }
37
+ catch { }
38
+ }
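`writeTranscriptEvent` appends one `JSON.stringify({ t: Date.now(), ...event })` line per event. Any reader can therefore process a transcript line by line. Below is a minimal reader sketch. It assumes only the `t` field shown above and treats every other field as open-ended. It also skips a line that fails to parse: a single `appendFileSync` call is not guaranteed to be atomic if the process is killed mid-write, so this is a cheap safeguard.

```typescript
import { readFileSync } from "fs";

// Every line written by writeTranscriptEvent carries a `t` timestamp;
// the remaining fields depend on the caller.
interface TranscriptEvent {
  t: number;
  [key: string]: unknown;
}

// Parse NDJSON text, skipping blank lines and any line that fails to parse.
function parseTranscript(text: string): TranscriptEvent[] {
  const events: TranscriptEvent[] = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      const ev: unknown = JSON.parse(line);
      if (ev !== null && typeof ev === "object" && typeof (ev as { t?: unknown }).t === "number") {
        events.push(ev as TranscriptEvent);
      }
    } catch {
      // A truncated trailing line from an interrupted append is ignored.
    }
  }
  return events;
}

function readTranscript(path: string): TranscriptEvent[] {
  return parseTranscript(readFileSync(path, "utf-8"));
}
```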
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-overnight",
3
- "version": "1.25.19",
3
+ "version": "1.25.20",
4
4
  "description": "Parallel Claude agents in git worktrees with a usage cap that reserves headroom for your interactive Claude Code. Crash-safe resume. Provider-agnostic model catalog (Anthropic, Cursor, OpenAI, Gemini, DeepSeek, Llama, Qwen) with capability-based task scoping.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -17,7 +17,7 @@
17
17
  "dependencies": {
18
18
  "@anthropic-ai/claude-agent-sdk": "^0.2.92",
19
19
  "chalk": "^5.4.1",
20
- "cursor-composer-in-claude": "0.8.0",
20
+ "cursor-composer-in-claude": "0.9.0",
21
21
  "jsonwebtoken": "^9.0.2"
22
22
  },
23
23
  "devDependencies": {
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-overnight",
3
- "version": "1.25.19",
3
+ "version": "1.25.20",
4
4
  "description": "Claude Code skill for understanding, installing, and inspecting claude-overnight runs -- parallel Claude agents in git worktrees with thinking waves, multi-wave steering, and crash-safe resume. Supports Cursor API Proxy, Qwen, OpenRouter.",
5
5
  "author": {
6
6
  "name": "Francesco Fornace"
@@ -11,7 +11,7 @@ description: >
11
11
 
12
12
  # What it is
13
13
 
14
- `claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, executor waves run them in parallel, and steering decides between more execution, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable -- nothing is lost.
14
+ `claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits verified by the worker next wave). A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, worker waves run them in parallel, and steering decides between more work, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable -- nothing is lost.
15
15
 
16
16
  **Three-layer review system** runs on every wave:
17
17
  1. **Per-agent self-review** -- after each agent finishes, the same session continues via SDK session resume (continue mechanism) with a follow-up prompt to review and simplify its own `git diff`. The agent's full context stays warm -- no initial context bloat.
@@ -55,16 +55,20 @@ Every run lives at `<repo>/.claude-overnight/runs/<ISO-timestamp>/`:
55
55
 
56
56
  | File / dir | What it tells you |
57
57
  |----------------------|-----------------------------------------------------------------------------------|
58
- | `run.json` | Machine state: objective, model, budget, cost, waves done, branches, done flag. |
58
+ | `run.json` | Machine state: objective, planner/worker/fast models, budget, cost, waves done, branches, done flag. |
59
59
  | `status.md` | **Living project snapshot**, rewritten by steering every wave. First line = short status. |
60
60
  | `goal.md` | Evolving "north star" -- what the run currently thinks "amazing" means. |
61
+ | `themes.md` | The thinking-wave research angles picked for this objective (human-readable). |
61
62
  | `milestones/*.md` | Strategic snapshots archived ~every 5 waves. Long-term memory of the run. |
62
63
  | `designs/*.md` | Architect outputs from the thinking wave. Deleted once the objective is complete. |
64
+ | `tasks.json` | The execution plan written by the orchestrator. |
65
+ | `steering/wave-N-attempt-M.json` | Steering decision per wave: done flag, reasoning, status/goal updates. |
66
+ | `transcripts/*.ndjson` | Crash-safe NDJSON stream for every planner/steering query: `themes`, `orchestrate`, `plan`, `steer-wave-N-attempt-M`. Each line = one event (session_start, tool_use, text_delta, thinking_delta, rate_limit, result, error). Use `jq -c '.kind' <file>` to get a quick shape; read full objects to reconstruct what the planner was doing. Survives process crashes because writes are append-only. |
63
67
  | `sessions/wave-N.json` | Per-wave agent records: prompt, status, cost, files changed, branch, error. |
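The table suggests `jq -c '.kind'` to get the outline of a transcript. The sketch below does roughly what `jq -c '.kind' file | uniq -c` would do: it collapses consecutive events of the same `kind` into runs. It relies on the `kind` field named in the table and is only an illustration, not a tool shipped with the package.

```typescript
// Collapse consecutive events with the same `kind` into "kind xN" runs.
function transcriptShape(ndjson: string): string[] {
  const runs: { kind: string; n: number }[] = [];
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    let kind = "(unparseable)";
    try {
      const ev: unknown = JSON.parse(line);
      const k = ev !== null && typeof ev === "object" ? (ev as { kind?: unknown }).kind : undefined;
      kind = typeof k === "string" ? k : "(no kind)";
    } catch {
      // Keep "(unparseable)" for a torn or corrupt line.
    }
    const last = runs[runs.length - 1];
    if (last && last.kind === kind) last.n++;
    else runs.push({ kind, n: 1 });
  }
  return runs.map((r) => (r.n > 1 ? `${r.kind} x${r.n}` : r.kind));
}
```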
64
68
 
65
69
  The newest subfolder under `runs/` is the current/last run. A run that never reached "done" is **resumable** -- `run.json` will not be marked complete and `designs/` may still be present.
66
70
 
67
- To assess status of a run from scratch, read in this order: `goal.md` → `status.md` → newest file in `milestones/` → newest `sessions/wave-*.json` → `run.json`. Five reads and you know exactly where it stands.
71
+ To assess status of a run from scratch, read in this order: `goal.md` → `status.md` → newest file in `milestones/` → newest `sessions/wave-*.json` → `run.json`. Five reads and you know exactly where it stands. If the run died during planning (no `sessions/` yet), read `themes.md` + the newest `transcripts/*.ndjson` instead — they show exactly what the planner was doing when it crashed.
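For the crashed-during-planning case, "the newest `transcripts/*.ndjson`" can be located programmatically. Take the newest run folder (its ISO-timestamp name sorts chronologically, assuming every run uses the same fixed-width format). Inside it, take the transcript with the latest modification time. Because transcripts are append-only, that mtime reflects the last event written. The helper names below are hypothetical.

```typescript
import { existsSync, readdirSync, statSync } from "fs";
import { join } from "path";

// ISO-timestamp folder names: the lexicographic maximum is the newest run.
function latestRunName(names: string[]): string | undefined {
  let best: string | undefined;
  for (const n of names) if (best === undefined || n > best) best = n;
  return best;
}

// Pick the file whose last append happened most recently.
function latestByMtime(files: { path: string; mtimeMs: number }[]): string | undefined {
  let best: { path: string; mtimeMs: number } | undefined;
  for (const f of files) if (!best || f.mtimeMs > best.mtimeMs) best = f;
  return best?.path;
}

// Newest transcript of the newest run under <repo>/.claude-overnight/runs/.
function newestTranscript(repo: string): string | undefined {
  const runsDir = join(repo, ".claude-overnight", "runs");
  if (!existsSync(runsDir)) return undefined;
  const run = latestRunName(
    readdirSync(runsDir, { withFileTypes: true }).filter((d) => d.isDirectory()).map((d) => d.name),
  );
  if (!run) return undefined;
  const dir = join(runsDir, run, "transcripts");
  if (!existsSync(dir)) return undefined;
  const files = readdirSync(dir)
    .filter((name) => name.endsWith(".ndjson"))
    .map((name) => {
      const path = join(dir, name);
      return { path, mtimeMs: statSync(path).mtimeMs };
    });
  return latestByMtime(files);
}
```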
68
72
 
69
73
  **Durable run history (committed, survives cleanup):** `claude-overnight.log.md` at the repo root is updated on every run with a block per run ID -- original objective, start/finish times, cost, outcome, branch. If the user asks "what was my prompt" or "what did last night's run do" and `.claude-overnight/runs/` is empty, this file is the canonical recovery path.
70
74