claude-overnight 1.25.19 → 1.25.20
- package/README.md +35 -15
- package/dist/_version.d.ts +1 -1
- package/dist/_version.js +1 -1
- package/dist/index.js +37 -10
- package/dist/planner-query.d.ts +2 -0
- package/dist/planner-query.js +115 -8
- package/dist/planner.d.ts +4 -4
- package/dist/planner.js +11 -11
- package/dist/run.js +1 -1
- package/dist/steering.d.ts +1 -1
- package/dist/steering.js +3 -3
- package/dist/swarm.d.ts +3 -0
- package/dist/swarm.js +38 -3
- package/dist/transcripts.d.ts +5 -0
- package/dist/transcripts.js +38 -0
- package/package.json +2 -2
- package/plugins/claude-overnight/.claude-plugin/plugin.json +1 -1
- package/plugins/claude-overnight/skills/claude-overnight/SKILL.md +7 -3
package/README.md
CHANGED
|
@@ -4,14 +4,14 @@ Parallel Claude agents in isolated git worktrees. Set a usage cap so your intera
|
|
|
4
4
|
|
|
5
5
|
Hand it an objective and a session budget, walk away, review the diff when the run ends. Every agent runs in its own worktree on its own branch — a misbehaving agent can't trash your working tree. Unmerged branches are preserved for manual review, never discarded.
|
|
6
6
|
|
|
7
|
-
Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk). Pair any planner (Opus, Sonnet) with any
|
|
7
|
+
Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits, verified by the worker next wave). Pair any planner (Opus, Sonnet) with any worker — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
|
|
8
8
|
|
|
9
9
|
## Run on Qwen 3.6 Plus
|
|
10
10
|
|
|
11
|
-
Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Alibaba Cloud's DashScope gateway is a drop-in
|
|
11
|
+
Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Alibaba Cloud's DashScope gateway is a drop-in worker that speaks the Anthropic Messages API -- same client, same flow, pennies per run.
|
|
12
12
|
|
|
13
13
|
1. **Get an API key.** Sign up at [Alibaba Cloud](https://account.alibabacloud.com/login/login.htm?oauth_callback=https%3A%2F%2Fmodelstudio.console.alibabacloud.com%2Fap-southeast-1%3Ftab%3Ddashboard%23%2Fapi-key&clearRedirectCookie=1) -- the link takes you straight to the API key dashboard.
|
|
14
|
-
2. **Configure the provider.** Run `claude-overnight`, choose `Other…` on the
|
|
14
|
+
2. **Configure the provider.** Run `claude-overnight`, choose `Other…` on the worker step, and fill in:
|
|
15
15
|
|
|
16
16
|
| Field | Value |
|
|
17
17
|
|---|---|
|
|
@@ -20,7 +20,7 @@ Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Al
|
|
|
20
20
|
| Model id | `qwen3.6-plus` |
|
|
21
21
|
| API key | your DashScope key |
|
|
22
22
|
|
|
23
|
-
3. That's it. Planner runs on Sonnet (or Opus),
|
|
23
|
+
3. That's it. Planner runs on Sonnet (or Opus), worker runs on Qwen.
|
|
24
24
|
|
|
25
25
|
Or set it via env directly:
|
|
26
26
|
|
|
@@ -33,7 +33,7 @@ claude-overnight
|
|
|
33
33
|
|
|
34
34
|
## Run via Cursor API Proxy
|
|
35
35
|
|
|
36
|
-
Use Cursor's model gateway as
|
|
36
|
+
Use Cursor's model gateway as a worker -- `auto` (delegates to best available), `composer`, or `composer-2` models. Runs locally through a proxy that speaks the Anthropic Messages API, so it's a drop-in replacement for any other provider.
|
|
37
37
|
|
|
38
38
|
### macOS: Cursor agent shell patch
|
|
39
39
|
|
|
@@ -130,7 +130,7 @@ claude-overnight
|
|
|
130
130
|
● Opus -- Opus 4.6 · Most capable
|
|
131
131
|
○ Sonnet -- Sonnet 4.6 · Best for everyday tasks
|
|
132
132
|
|
|
133
|
-
⑤
|
|
133
|
+
⑤ Worker model (what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
|
|
134
134
|
● Sonnet -- Sonnet 4.6 · Best for everyday tasks
|
|
135
135
|
○ Opus -- Opus 4.6 · Most capable
|
|
136
136
|
○ Other… · custom OpenAI/Anthropic-compatible endpoint
|
|
@@ -211,9 +211,15 @@ Every run gets its own folder in `.claude-overnight/runs/`. Nothing is ever over
|
|
|
211
211
|
.claude-overnight/
|
|
212
212
|
runs/
|
|
213
213
|
2026-04-04T18-52-49/ ← run A (done, $200, 200 tasks)
|
|
214
|
-
run.json
|
|
215
|
-
|
|
216
|
-
|
|
214
|
+
run.json ← full resume state (models, budget, wave history)
|
|
215
|
+
status.md, goal.md, themes.md
|
|
216
|
+
designs/ ← per-focus research docs from the thinking wave
|
|
217
|
+
tasks.json ← the plan the swarm is executing
|
|
218
|
+
transcripts/ ← NDJSON per planner query: themes, orchestrate, steer-wave-N, ...
|
|
219
|
+
steering/ ← steering decisions per wave
|
|
220
|
+
milestones/, sessions/
|
|
221
|
+
2026-04-05T10-30-00/ ← run B (crashed mid-planning)
|
|
222
|
+
run.json, transcripts/themes.ndjson ← see exactly what the planner was doing
|
|
217
223
|
```
|
|
218
224
|
|
|
219
225
|
Any run that stops before the steering system declares the objective complete -- capped at usage limit, Ctrl+C, crash, rate limit timeout, steering failure -- is automatically resumable:
|
|
@@ -243,6 +249,20 @@ If the thinking phase succeeds but orchestration crashes, the next run detects t
|
|
|
243
249
|
|
|
244
250
|
**Knowledge carries forward** -- new runs inherit knowledge from completed previous runs. Thinking sessions and steering see what past runs built. Run 2 knows run 1 already built the auth system.
|
|
245
251
|
|
|
252
|
+
### Transcripts and streaming
|
|
253
|
+
|
|
254
|
+
Every planner/steering query streams through the Agent SDK with `includePartialMessages: true`, so tool calls, thinking, and text deltas are captured as they happen. Each query also appends an NDJSON transcript under `runs/<ts>/transcripts/<name>.ndjson` — so if the planner crashes mid-think you still have the forensic trail (prompt preview, every tool use, every text/thinking delta, rate-limit events, and the final result or error). `themes.md` is also written as a human-readable summary right after the thinking wave.
|
|
255
|
+
|
|
256
|
+
Not every provider delivers the same streaming granularity:
|
|
257
|
+
|
|
258
|
+
| Provider | Tool-use events | Thinking deltas | Text deltas |
|
|
259
|
+
| --- | --- | --- | --- |
|
|
260
|
+
| Anthropic (direct) | ✓ | ✓ | ✓ |
|
|
261
|
+
| Cursor proxy (`cursor-composer-in-claude`) | — | — | ✓ (final answer only) |
|
|
262
|
+
| Qwen / OpenRouter / custom Anthropic-compatible | depends on upstream | depends | usually ✓ |
|
|
263
|
+
|
|
264
|
+
When a provider doesn't stream partials (or the model is a reasoning model on the Cursor proxy — the proxy suppresses the thinking phase and only emits the final answer), the ticker shows elapsed time with no live text, then the completed result lands in one go. The UI, transcripts, and the resume flow all behave identically either way — streaming is used when available, never required.
|
|
265
|
+
|
|
246
266
|
Add `.claude-overnight/` to your `.gitignore` (with the trailing slash -- see below).
|
|
247
267
|
|
|
248
268
|
A separate, tiny `claude-overnight.log.md` is also written at the repo root on every run. It's human-readable, append-only, one block per run (objective, start/finish, cost, outcome, branch), and is designed to be **committed** -- so even after `.claude-overnight/` is cleaned up you can still recover which prompt produced which commits. Use `.claude-overnight/` (with trailing slash) in your gitignore so this file isn't matched by accident.
|
|
@@ -289,7 +309,7 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
|
|
|
289
309
|
|---|---|---|
|
|
290
310
|
| `--budget=N` | `10` | Total agent sessions |
|
|
291
311
|
| `--concurrency=N` | `5` | Parallel agents |
|
|
292
|
-
| `--model=NAME` | prompted | Worker model -- interactive picks planner +
|
|
312
|
+
| `--model=NAME` | prompted | Worker model -- interactive picks planner + worker separately; `Other…` adds Qwen / OpenRouter / any Anthropic-compat endpoint. In non-interactive mode, a saved provider's model id is auto-resolved to the provider. |
|
|
293
313
|
| `--usage-cap=N` | unlimited | Stop at N% utilization |
|
|
294
314
|
| `--allow-extra-usage` | off | Allow extra/overage usage (billed separately) |
|
|
295
315
|
| `--extra-usage-budget=N` | -- | Max $ for extra usage (implies --allow-extra-usage) |
|
|
@@ -313,12 +333,12 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
|
|
|
313
333
|
|
|
314
334
|
## Custom providers (Qwen, OpenRouter, any Anthropic-compatible endpoint)
|
|
315
335
|
|
|
316
|
-
Planner and
|
|
336
|
+
Planner, worker, and optional fast model are each picked separately -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of work.
|
|
317
337
|
|
|
318
|
-
From the interactive picker, choose `Other…` on the planner or
|
|
338
|
+
From the interactive picker, choose `Other…` on the planner, worker, or fast step:
|
|
319
339
|
|
|
320
340
|
```
|
|
321
|
-
⑤
|
|
341
|
+
⑤ Worker model (what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
|
|
322
342
|
○ Sonnet
|
|
323
343
|
○ Opus
|
|
324
344
|
● Other…
|
|
@@ -333,9 +353,9 @@ From the interactive picker, choose `Other…` on the planner or executor step:
|
|
|
333
353
|
|
|
334
354
|
Saved providers live user-level at `~/.claude/claude-overnight/providers.json` (mode 0600) and show up automatically in every repo. No per-project config.
|
|
335
355
|
|
|
336
|
-
**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`) -- planner queries use the planner provider,
|
|
356
|
+
**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`) -- planner queries use the planner provider, worker queries use the worker provider, fast queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
|
|
337
357
|
|
|
338
|
-
**Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗
|
|
358
|
+
**Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ worker preflight failed: ...` instead of N scattered mid-run errors.
|
|
339
359
|
|
|
340
360
|
**Resume.** Provider ids are persisted in `run.json` and rehydrated on resume. If you deleted a provider between runs, resume refuses to start and tells you exactly which id is missing.
|
|
341
361
|
|
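The "How routing works" paragraph in the README describes a per-query env override keyed by model. A minimal sketch of that idea follows, assuming a resolver shaped like the `(model?: string) => Record<string, string> | undefined` callback that `setPlannerEnvResolver` accepts. The `makeEnvResolver` helper, the provider record fields (`model`, `baseUrl`, `apiKey`), and the example URL are hypothetical and are not taken from the package's `buildEnvResolver`, whose internals this diff does not show.

```javascript
// Hypothetical sketch of a model -> env resolver. Not the package's implementation.
// Each provider record is assumed to look like { model, baseUrl, apiKey }.
function makeEnvResolver(providers) {
  // Index providers by model id so each query can look up its own override.
  const byModel = new Map(providers.map((p) => [p.model, p]));
  return (model) => {
    const provider = model ? byModel.get(model) : undefined;
    // No saved provider for this model: return undefined so no override is applied.
    if (!provider) return undefined;
    return {
      ANTHROPIC_BASE_URL: provider.baseUrl,
      ANTHROPIC_AUTH_TOKEN: provider.apiKey,
    };
  };
}

// Demo: one custom worker provider; the endpoint below is a placeholder.
const demoResolver = makeEnvResolver([
  { model: "qwen3.6-plus", baseUrl: "https://example.invalid/anthropic", apiKey: "sk-demo" },
]);
const demoEnv = demoResolver("qwen3.6-plus");
```

Because the override is returned per call rather than written to `process.env`, two queries running concurrently against different providers would not see each other's credentials, which is the property the README paragraph emphasizes.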
package/dist/_version.d.ts
CHANGED
@@ -1 +1 @@
-export declare const VERSION = "1.25.
+export declare const VERSION = "1.25.20";
package/dist/_version.js
CHANGED
@@ -1,2 +1,2 @@
 // Auto-generated by build — do not edit manually.
-export const VERSION = "1.25.
+export const VERSION = "1.25.20";
package/dist/index.js
CHANGED
@@ -1,5 +1,5 @@
 #!/usr/bin/env node
-import { readFileSync, existsSync, readdirSync, mkdirSync } from "fs";
+import { readFileSync, existsSync, readdirSync, mkdirSync, writeFileSync } from "fs";
 import { resolve, dirname, join } from "path";
 import { fileURLToPath } from "url";
 import chalk from "chalk";
@@ -9,6 +9,7 @@ import { Swarm } from "./swarm.js";
|
|
|
9
9
|
import { planTasks, refinePlan, identifyThemes, buildThinkingTasks, orchestrate, salvageFromFile } from "./planner.js";
|
|
10
10
|
import { modelDisplayName, formatContextWindow, DEFAULT_MODEL } from "./models.js";
|
|
11
11
|
import { setPlannerEnvResolver } from "./planner-query.js";
|
|
12
|
+
import { setTranscriptRunDir } from "./transcripts.js";
|
|
12
13
|
import { pickModel, loadProviders, preflightProvider, buildEnvResolver, healthCheckCursorProxy, PROXY_DEFAULT_URL, isCursorProxyProvider, readCursorProxyLogTail, ensureCursorProxyRunning, bundledComposerProxyShellCommand, warnMacCursorAgentShellPatchIfNeeded, hasCursorAgentToken, } from "./providers.js";
|
|
13
14
|
import { RunDisplay } from "./ui.js";
|
|
14
15
|
import { renderSummary } from "./render.js";
|
|
@@ -72,10 +73,17 @@ async function promptResumeOverrides(state, cliFlags, argv, noTTY, runDir) {
|
|
|
72
73
|
const extraStr = state.allowExtraUsage
|
|
73
74
|
? (state.extraUsageBudget ? `$${state.extraUsageBudget}` : "unlimited")
|
|
74
75
|
: "off";
|
|
76
|
+
const modelLine = (label, m) => m ? ` ${chalk.dim(label.padEnd(11))}${chalk.white(m)} ${chalk.dim(`(${formatContextWindow(m)} context)`)}` : null;
|
|
75
77
|
console.log();
|
|
76
78
|
console.log(` ${chalk.dim("Resume settings")}`);
|
|
77
79
|
console.log(` ${chalk.dim("─".repeat(40))}`);
|
|
78
|
-
|
|
80
|
+
const lines = [
|
|
81
|
+
modelLine("planner", state.plannerModel),
|
|
82
|
+
modelLine("worker", state.workerModel),
|
|
83
|
+
modelLine("fast", state.fastModel),
|
|
84
|
+
].filter(Boolean);
|
|
85
|
+
for (const l of lines)
|
|
86
|
+
console.log(l);
|
|
79
87
|
console.log(` ${chalk.dim("remaining ")}${chalk.white(String(remaining))} ${chalk.dim("sessions")}`);
|
|
80
88
|
console.log(` ${chalk.dim("concur ")}${chalk.white(String(state.concurrency))}`);
|
|
81
89
|
console.log(` ${chalk.dim("usage cap ")}${chalk.white(capStr)}`);
|
|
@@ -185,7 +193,7 @@ async function main() {
|
|
|
185
193
|
--dry-run Show planned tasks without running them
|
|
186
194
|
--budget=N Target number of agent runs ${chalk.dim("(default: 10)")}
|
|
187
195
|
--concurrency=N Max parallel agents ${chalk.dim("(default: 5)")}
|
|
188
|
-
--model=NAME Worker model override ${chalk.dim("(interactive mode picks planner +
|
|
196
|
+
--model=NAME Worker model override ${chalk.dim("(interactive mode picks planner + worker separately -- supports 'Other…' for Qwen / OpenRouter / etc.)")}
|
|
189
197
|
--fast-model=NAME Fast model for quick tasks ${chalk.dim("(optional -- checked by worker model in next wave)")}
|
|
190
198
|
--usage-cap=N Stop at N% utilization ${chalk.dim("(e.g. 90 to save 10% for other work)")}
|
|
191
199
|
--allow-extra-usage Allow extra/overage usage ${chalk.dim("(default: stop when plan limits hit)")}
|
|
@@ -472,8 +480,11 @@ async function main() {
|
|
|
472
480
|
const flexNote = `This is wave 1 of an adaptive multi-wave run (total budget: ${remainingBudget}). Plan the highest-impact foundational work first. Future waves will iterate based on what's learned.`;
|
|
473
481
|
console.log(chalk.cyan(`\n ◆ Re-orchestrating plan from existing designs...\n`));
|
|
474
482
|
process.stdout.write("\x1B[?25l");
|
|
483
|
+
// Route transcripts into the resumed run so this call's events
|
|
484
|
+
// land alongside the prior run's planning trail.
|
|
485
|
+
setTranscriptRunDir(resumeRunDir);
|
|
475
486
|
try {
|
|
476
|
-
const orchTasks = await orchestrate(resumeState.objective, designs, cwd, resumeState.plannerModel, resumeState.workerModel, resumeState.permissionMode, orchBudget, resumeState.concurrency, makeProgressLog(), flexNote, join(resumeRunDir, "tasks.json"));
|
|
487
|
+
const orchTasks = await orchestrate(resumeState.objective, designs, cwd, resumeState.plannerModel, resumeState.workerModel, resumeState.permissionMode, orchBudget, resumeState.concurrency, makeProgressLog(), flexNote, join(resumeRunDir, "tasks.json"), "orchestrate-resume");
|
|
477
488
|
resumeState.currentTasks = orchTasks;
|
|
478
489
|
process.stdout.write(`\x1B[2K\r ${chalk.green(`✓ ${orchTasks.length} tasks`)}\n`);
|
|
479
490
|
}
|
|
@@ -588,7 +599,7 @@ async function main() {
|
|
|
588
599
|
const plannerPick = await pickModel(`${chalk.cyan("④")} Planner model ${chalk.dim("(thinking, steering -- use your strongest)")}:`, models);
|
|
589
600
|
plannerModel = plannerPick.model;
|
|
590
601
|
plannerProvider = plannerPick.provider;
|
|
591
|
-
const workerPick = await pickModel(`${chalk.cyan("⑤")}
|
|
602
|
+
const workerPick = await pickModel(`${chalk.cyan("⑤")} Worker model ${chalk.dim("(what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…)")}:`, models);
|
|
592
603
|
workerModel = workerPick.model;
|
|
593
604
|
workerProvider = workerPick.provider;
|
|
594
605
|
// ⑤b Optional fast model for quick tasks that will be verified
|
|
@@ -782,7 +793,7 @@ async function main() {
|
|
|
782
793
|
const seen = new Set();
|
|
783
794
|
const all = [
|
|
784
795
|
["planner", plannerProvider],
|
|
785
|
-
["
|
|
796
|
+
["worker", workerProvider],
|
|
786
797
|
["fast", fastProvider],
|
|
787
798
|
];
|
|
788
799
|
const pending = [];
|
|
@@ -855,6 +866,10 @@ async function main() {
|
|
|
855
866
|
const runDir = resuming && resumeRunDir ? resumeRunDir : (orphanedDir ?? createRunDir(rootDir));
|
|
856
867
|
if (resuming && resumeRunDir)
|
|
857
868
|
updateLatestSymlink(rootDir, resumeRunDir);
|
|
869
|
+
// Route all planner/steering stream events to <runDir>/transcripts/*.ndjson
|
|
870
|
+
// so crashes during planning leave a forensic trail and resumes can inspect
|
|
871
|
+
// what the planner was doing mid-flight. See src/transcripts.ts.
|
|
872
|
+
setTranscriptRunDir(runDir);
|
|
858
873
|
const previousKnowledge = readPreviousRunKnowledge(rootDir);
|
|
859
874
|
const needsPlan = tasks.length === 0 && (!resuming || replanFromScratch);
|
|
860
875
|
const designDir = join(runDir, "designs");
|
|
@@ -867,8 +882,9 @@ async function main() {
|
|
|
867
882
|
saveRunState(runDir, {
|
|
868
883
|
id: runDir.split(/[/\\]/).pop() ?? "",
|
|
869
884
|
objective, budget: budget ?? 10, remaining: budget ?? 10,
|
|
870
|
-
workerModel, plannerModel,
|
|
885
|
+
workerModel, plannerModel, fastModel,
|
|
871
886
|
workerProviderId: workerProvider?.id, plannerProviderId: plannerProvider?.id,
|
|
887
|
+
fastProviderId: fastProvider?.id,
|
|
872
888
|
concurrency, permissionMode,
|
|
873
889
|
usageCap, allowExtraUsage, extraUsageBudget,
|
|
874
890
|
flex, useWorktrees, mergeStrategy,
|
|
@@ -894,7 +910,16 @@ async function main() {
|
|
|
894
910
|
const thinkingCount = useThinking ? Math.min(Math.max(concurrency, Math.ceil((budget ?? 10) * 0.005)), 10) : 0;
|
|
895
911
|
try {
|
|
896
912
|
if (useThinking) {
|
|
897
|
-
|
|
913
|
+
// Persist themes as a Markdown doc so a planning-phase crash leaves a
|
|
914
|
+
// readable record (and a future resume can skip identifyThemes).
|
|
915
|
+
const saveThemesMd = (list) => {
|
|
916
|
+
try {
|
|
917
|
+
writeFileSync(join(runDir, "themes.md"), `# Themes\n\n**Objective:** ${objective}\n\n${list.map((t, i) => `${i + 1}. ${t}`).join("\n")}\n`, "utf-8");
|
|
918
|
+
}
|
|
919
|
+
catch { }
|
|
920
|
+
};
|
|
921
|
+
let themes = await identifyThemes(objective, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog(), "themes");
|
|
922
|
+
saveThemesMd(themes);
|
|
898
923
|
process.stdout.write(`\x1B[2K\r ${chalk.green(`✓ ${themes.length} themes`)}\n\n`);
|
|
899
924
|
planRestore();
|
|
900
925
|
let reviewing = true;
|
|
@@ -913,7 +938,8 @@ async function main() {
|
|
|
913
938
|
continue;
|
|
914
939
|
process.stdout.write("\x1B[?25l");
|
|
915
940
|
try {
|
|
916
|
-
themes = await identifyThemes(`${objective}\n\nUser feedback: ${feedback}`, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog());
|
|
941
|
+
themes = await identifyThemes(`${objective}\n\nUser feedback: ${feedback}`, thinkingCount, cwd, plannerModel, permissionMode, makeProgressLog(), "themes-refine");
|
|
942
|
+
saveThemesMd(themes);
|
|
917
943
|
process.stdout.write(`\x1B[2K\r ${chalk.green(`✓ ${themes.length} themes`)}\n\n`);
|
|
918
944
|
}
|
|
919
945
|
catch (err) {
|
|
@@ -990,8 +1016,9 @@ async function main() {
|
|
|
990
1016
|
saveRunState(runDir, {
|
|
991
1017
|
id: runDir.split(/[/\\]/).pop() ?? "",
|
|
992
1018
|
objective: objective, budget: budget ?? 10, remaining: (budget ?? 10) - thinkingUsed,
|
|
993
|
-
workerModel, plannerModel,
|
|
1019
|
+
workerModel, plannerModel, fastModel,
|
|
994
1020
|
workerProviderId: workerProvider?.id, plannerProviderId: plannerProvider?.id,
|
|
1021
|
+
fastProviderId: fastProvider?.id,
|
|
995
1022
|
concurrency, permissionMode,
|
|
996
1023
|
usageCap, allowExtraUsage, extraUsageBudget,
|
|
997
1024
|
flex, useWorktrees, mergeStrategy,
|
package/dist/planner-query.d.ts
CHANGED
@@ -23,6 +23,8 @@ export interface PlannerOpts {
         type: "json_schema";
         schema: Record<string, unknown>;
     };
+    /** When set, stream events are appended to <runDir>/transcripts/<name>.ndjson */
+    transcriptName?: string;
 }
 export declare function setPlannerEnvResolver(fn: ((model?: string) => Record<string, string> | undefined) | undefined): void;
 export declare function getTotalPlannerCost(): number;
package/dist/planner-query.js
CHANGED
@@ -1,6 +1,7 @@
 import { query } from "@anthropic-ai/claude-agent-sdk";
 import { readFileSync } from "fs";
 import { NudgeError } from "./types.js";
+import { writeTranscriptEvent } from "./transcripts.js";
 // ── Shared env resolver (set once at run start, used by every planner query) ──
 //
 // Swarm and planner calls share a model→env map so a custom provider configured
@@ -63,6 +64,22 @@ async function throttlePlanner(onLog, aborted) {
|
|
|
63
64
|
}
|
|
64
65
|
// Exhausted backoffs — proceed anyway, the retry loop will catch a rejection.
|
|
65
66
|
}
|
|
67
|
+
/**
|
|
68
|
+
* Pick a short, human-readable target for a tool invocation (Read/Grep/Bash/…).
|
|
69
|
+
* Prefers explicit file paths; falls back to the first few tokens of a shell
|
|
70
|
+
* command. Returns `""` when the input has no useful identifier.
|
|
71
|
+
*/
|
|
72
|
+
function extractToolTarget(input) {
|
|
73
|
+
if (!input)
|
|
74
|
+
return "";
|
|
75
|
+
const p = input.path ?? input.file_path ?? input.pattern;
|
|
76
|
+
if (typeof p === "string" && p)
|
|
77
|
+
return p;
|
|
78
|
+
if (typeof input.command === "string" && input.command) {
|
|
79
|
+
return input.command.split(" ").slice(0, 3).join(" ");
|
|
80
|
+
}
|
|
81
|
+
return "";
|
|
82
|
+
}
|
|
66
83
|
// ── Query execution ──
|
|
67
84
|
const NUDGE_MS = 15 * 60 * 1000;
|
|
68
85
|
const HARD_TIMEOUT_MS = 30 * 60 * 1000;
|
|
@@ -110,6 +127,17 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
|
|
|
110
127
|
const startedAt = Date.now();
|
|
111
128
|
const isResume = !!opts.resumeSessionId;
|
|
112
129
|
const envOverride = _envResolver?.(opts.model);
|
|
130
|
+
const tname = opts.transcriptName;
|
|
131
|
+
if (tname) {
|
|
132
|
+
writeTranscriptEvent(tname, {
|
|
133
|
+
kind: "session_start",
|
|
134
|
+
model: opts.model,
|
|
135
|
+
isResume,
|
|
136
|
+
resumeSessionId: opts.resumeSessionId,
|
|
137
|
+
promptPreview: prompt.slice(0, 2000),
|
|
138
|
+
promptBytes: prompt.length,
|
|
139
|
+
});
|
|
140
|
+
}
|
|
113
141
|
const pq = query({
|
|
114
142
|
prompt,
|
|
115
143
|
options: {
|
|
@@ -167,6 +195,18 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
|
|
|
167
195
|
};
|
|
168
196
|
timer = setTimeout(check, timeoutMs);
|
|
169
197
|
});
|
|
198
|
+
// Tool-use blocks can arrive in two shapes:
|
|
199
|
+
// (a) content_block_start carries the full `input` (native Anthropic non-partial)
|
|
200
|
+
// (b) content_block_start carries `input: {}` and the JSON is streamed via
|
|
201
|
+
// input_json_delta frames (Anthropic streaming spec, cursor-composer-in-claude v0.9+).
|
|
202
|
+
// Track the open tool block so we can re-log with the enriched target once
|
|
203
|
+
// the input arrives, and write a complete transcript entry on block stop.
|
|
204
|
+
let pendingTool = null;
|
|
205
|
+
const logTool = (name, input) => {
|
|
206
|
+
const target = extractToolTarget(input);
|
|
207
|
+
lastLogText = target ? `${name} ${target}` : name;
|
|
208
|
+
onLog(target ? `${name} → ${target}` : name, "event");
|
|
209
|
+
};
|
|
170
210
|
const consume = async () => {
|
|
171
211
|
for await (const msg of pq) {
|
|
172
212
|
lastActivity = Date.now();
|
|
@@ -178,21 +218,34 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
|
|
|
178
218
|
const cb = ev.content_block;
|
|
179
219
|
if (cb?.type === "tool_use") {
|
|
180
220
|
toolCount++;
|
|
181
|
-
const
|
|
182
|
-
const
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
:
|
|
187
|
-
|
|
188
|
-
|
|
221
|
+
const input = (cb.input ?? {});
|
|
222
|
+
const hasInput = Object.keys(input).length > 0;
|
|
223
|
+
pendingTool = {
|
|
224
|
+
index: ev.index ?? 0,
|
|
225
|
+
name: cb.name,
|
|
226
|
+
id: cb.id,
|
|
227
|
+
input,
|
|
228
|
+
buf: "",
|
|
229
|
+
logged: hasInput,
|
|
230
|
+
};
|
|
231
|
+
if (hasInput) {
|
|
232
|
+
logTool(cb.name, input);
|
|
233
|
+
if (tname)
|
|
234
|
+
writeTranscriptEvent(tname, { kind: "tool_use", tool: cb.name, input });
|
|
235
|
+
}
|
|
189
236
|
}
|
|
190
237
|
else if (cb?.type === "thinking" || cb?.type === "redacted_thinking") {
|
|
191
238
|
lastLogText = "thinking…";
|
|
239
|
+
if (tname)
|
|
240
|
+
writeTranscriptEvent(tname, { kind: "thinking_start" });
|
|
192
241
|
}
|
|
193
242
|
}
|
|
194
243
|
if (ev?.type === "content_block_delta") {
|
|
195
244
|
const delta = ev.delta;
|
|
245
|
+
if (delta?.type === "input_json_delta" && pendingTool && typeof delta.partial_json === "string") {
|
|
246
|
+
pendingTool.buf += delta.partial_json;
|
|
247
|
+
continue;
|
|
248
|
+
}
|
|
196
249
|
// thinking_delta carries reasoning text under `delta.thinking`;
|
|
197
250
|
// text_delta carries final-answer text under `delta.text`.
|
|
198
251
|
const raw = delta?.type === "text_delta" ? delta.text
|
|
@@ -202,7 +255,23 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
|
|
|
202
255
|
const snippet = raw.trim().replace(/[{}"\\,[\]]+/g, " ").replace(/\s+/g, " ").trim();
|
|
203
256
|
if (snippet.length > 5)
|
|
204
257
|
lastLogText = snippet.slice(-60);
|
|
258
|
+
if (tname)
|
|
259
|
+
writeTranscriptEvent(tname, { kind: delta.type, text: raw });
|
|
260
|
+
}
|
|
261
|
+
}
|
|
262
|
+
if (ev?.type === "content_block_stop" && pendingTool) {
|
|
263
|
+
if (!pendingTool.logged && pendingTool.buf) {
|
|
264
|
+
try {
|
|
265
|
+
pendingTool.input = JSON.parse(pendingTool.buf);
|
|
266
|
+
}
|
|
267
|
+
catch { }
|
|
268
|
+
}
|
|
269
|
+
if (!pendingTool.logged) {
|
|
270
|
+
logTool(pendingTool.name, pendingTool.input);
|
|
271
|
+
if (tname)
|
|
272
|
+
writeTranscriptEvent(tname, { kind: "tool_use", tool: pendingTool.name, input: pendingTool.input });
|
|
205
273
|
}
|
|
274
|
+
pendingTool = null;
|
|
206
275
|
}
|
|
207
276
|
}
|
|
208
277
|
if (msg.type === "rate_limit_event") {
|
|
@@ -222,6 +291,15 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
|
|
|
222
291
|
resetsAt: info.resetsAt,
|
|
223
292
|
});
|
|
224
293
|
}
|
|
294
|
+
if (tname)
|
|
295
|
+
writeTranscriptEvent(tname, {
|
|
296
|
+
kind: "rate_limit",
|
|
297
|
+
utilization: info.utilization ?? 0,
|
|
298
|
+
status: info.status,
|
|
299
|
+
rateLimitType: info.rateLimitType,
|
|
300
|
+
resetsAt: info.resetsAt,
|
|
301
|
+
isUsingOverage: !!info.isUsingOverage,
|
|
302
|
+
});
|
|
225
303
|
}
|
|
226
304
|
}
|
|
227
305
|
if (msg.type === "result") {
|
|
@@ -234,8 +312,27 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
|
|
|
234
312
|
if (msg.subtype === "success") {
|
|
235
313
|
structuredOutput = r.structured_output;
|
|
236
314
|
resultText = r.result || "";
|
|
315
|
+
if (tname)
|
|
316
|
+
writeTranscriptEvent(tname, {
|
|
317
|
+
kind: "result",
|
|
318
|
+
subtype: "success",
|
|
319
|
+
costUsd,
|
|
320
|
+
durationMs: Date.now() - startedAt,
|
|
321
|
+
toolCount,
|
|
322
|
+
resultPreview: typeof resultText === "string" ? resultText.slice(0, 4000) : undefined,
|
|
323
|
+
hasStructuredOutput: structuredOutput != null,
|
|
324
|
+
});
|
|
237
325
|
}
|
|
238
326
|
else {
|
|
327
|
+
if (tname)
|
|
328
|
+
writeTranscriptEvent(tname, {
|
|
329
|
+
kind: "result",
|
|
330
|
+
subtype: msg.subtype,
|
|
331
|
+
costUsd,
|
|
332
|
+
durationMs: Date.now() - startedAt,
|
|
333
|
+
toolCount,
|
|
334
|
+
error: r.result,
|
|
335
|
+
});
|
|
239
336
|
throw new Error(`Planner failed: ${r.result || msg.subtype}`);
|
|
240
337
|
}
|
|
241
338
|
}
|
|
@@ -244,6 +341,16 @@ async function runPlannerQueryOnce(prompt, opts, onLog) {
|
|
|
244
341
|
try {
|
|
245
342
|
await Promise.race([consume(), watchdog]);
|
|
246
343
|
}
|
|
344
|
+
catch (err) {
|
|
345
|
+
if (tname)
|
|
346
|
+
writeTranscriptEvent(tname, {
|
|
347
|
+
kind: "error",
|
|
348
|
+
message: err instanceof Error ? err.message : String(err),
|
|
349
|
+
durationMs: Date.now() - startedAt,
|
|
350
|
+
toolCount,
|
|
351
|
+
});
|
|
352
|
+
throw err;
|
|
353
|
+
}
|
|
247
354
|
finally {
|
|
248
355
|
clearTimeout(timer);
|
|
249
356
|
clearInterval(ticker);
|
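The planner-query.js changes above handle tool-use blocks that arrive in two shapes: with the full `input` on `content_block_start`, or with an empty `input` followed by `input_json_delta` frames that are joined and parsed on `content_block_stop`. The sketch below isolates that buffering logic over a plain array of events so it can be read and run on its own. It is a simplified restatement, not the package's code: the `collectToolUses` name and the array-in, array-out shape are assumptions, and logging and transcript writes are left out.

```javascript
// Simplified sketch of the two-shape tool_use handling; not the package's implementation.
// `events` is an array of stream events with `type`, `content_block`, and `delta` fields.
function collectToolUses(events) {
  const toolUses = [];
  let pending = null; // the currently open tool_use block, if any
  for (const ev of events) {
    if (ev.type === "content_block_start" && ev.content_block?.type === "tool_use") {
      const input = ev.content_block.input ?? {};
      const hasInput = Object.keys(input).length > 0;
      pending = { name: ev.content_block.name, input, buf: "", emitted: hasInput };
      // Shape (a): input is already complete, so record it immediately.
      if (hasInput) toolUses.push({ name: pending.name, input });
    } else if (
      ev.type === "content_block_delta" &&
      pending &&
      ev.delta?.type === "input_json_delta" &&
      typeof ev.delta.partial_json === "string"
    ) {
      // Shape (b): accumulate JSON fragments until the block closes.
      pending.buf += ev.delta.partial_json;
    } else if (ev.type === "content_block_stop" && pending) {
      if (!pending.emitted) {
        if (pending.buf) {
          try {
            pending.input = JSON.parse(pending.buf);
          } catch {
            // Malformed or incomplete JSON: keep the empty input rather than fail.
          }
        }
        toolUses.push({ name: pending.name, input: pending.input });
      }
      pending = null;
    }
  }
  return toolUses;
}
```

Deferring the record until `content_block_stop` means each tool call is reported once with its full input, at the cost of showing nothing for that call while the fragments are still arriving.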
package/dist/planner.d.ts
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
import type { Task, PermMode } from "./types.js";
|
|
2
2
|
export declare function salvageFromFile(outFile: string | undefined, budget: number | undefined, onLog: (text: string, kind?: "status" | "event") => void, why: string): Task[] | null;
|
|
3
3
|
export declare const DESIGN_THINKING = "\nHOW TO THINK ABOUT EVERY TASK:\n\nStart from the user's job. What is someone hiring this product to do? \"I need to send money abroad cheaply\" -- not \"I need a currency conversion API.\" Every decision -- what to build, how fast it needs to respond, what happens on error -- flows from the job.\n\nThe experience IS the product. A 200ms server response is not a \"performance metric\" -- it's the difference between an app that feels alive and one that feels broken. A loading state is not \"polish\" -- it's the user knowing the app heard them. An error message is not \"error handling\" -- it's the app being honest. There is no line between backend and UX. The server, the API, the database query, the render -- they're all one experience the user either trusts or doesn't.\n\nBuild the core, verify it works, learn, iterate. Don't plan 20 features and build them all. Build the ONE thing that matters most, run it, see if it actually works from a user's chair. What you learn from seeing it run will change what you build next. Each wave should make what exists better before adding what doesn't exist yet.\n\nConsistency is what makes complex things feel simple. One design system, rigid rules, no exceptions. This is how Revolut ships a super-app with 30+ features that doesn't feel like chaos.\n";
|
|
4
|
-
export declare function planTasks(objective: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string): Promise<Task[]>;
|
|
5
|
-
export declare function identifyThemes(objective: string, count: number, cwd: string, model: string, permissionMode: PermMode, onLog?: (text: string) => void): Promise<string[]>;
|
|
4
|
+
export declare function planTasks(objective: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string, transcriptName?: string): Promise<Task[]>;
|
|
5
|
+
export declare function identifyThemes(objective: string, count: number, cwd: string, model: string, permissionMode: PermMode, onLog?: (text: string) => void, transcriptName?: string): Promise<string[]>;
|
|
6
6
|
export declare function buildThinkingTasks(objective: string, themes: string[], designDir: string, plannerModel: string, previousKnowledge?: string): Task[];
|
|
7
|
-
export declare function orchestrate(objective: string, designDocs: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string): Promise<Task[]>;
|
|
8
|
-
export declare function refinePlan(objective: string, previousTasks: Task[], feedback: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void): Promise<Task[]>;
|
|
7
|
+
export declare function orchestrate(objective: string, designDocs: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number, concurrency: number, onLog: (text: string) => void, flexNote?: string, outFile?: string, transcriptName?: string): Promise<Task[]>;
|
|
8
|
+
export declare function refinePlan(objective: string, previousTasks: Task[], feedback: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, transcriptName?: string): Promise<Task[]>;
|
package/dist/planner.js
CHANGED
|
@@ -152,13 +152,13 @@ Respond with ONLY a JSON object (no markdown fences):
|
|
|
152
152
|
}`;
|
|
153
153
|
}
|
|
154
154
|
// ── Planning functions ──
|
|
155
|
-
export async function planTasks(objective, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile) {
|
|
155
|
+
export async function planTasks(objective, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile, transcriptName = "plan") {
|
|
156
156
|
onLog("Analyzing codebase...");
|
|
157
157
|
const prompt = plannerPrompt(objective, workerModel, budget, concurrency, flexNote);
|
|
158
158
|
const fileInstruction = outFile ? `\n\nAFTER generating the JSON, also write it to ${outFile} using the Write tool.` : "";
|
|
159
159
|
let resultText;
|
|
160
160
|
try {
|
|
161
|
-
resultText = await runPlannerQuery(prompt + fileInstruction, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
|
|
161
|
+
resultText = await runPlannerQuery(prompt + fileInstruction, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
|
|
162
162
|
}
|
|
163
163
|
catch (err) {
|
|
164
164
|
const salvaged = salvageFromFile(outFile, budget, onLog, err?.message ?? String(err));
|
|
@@ -168,7 +168,7 @@ export async function planTasks(objective, cwd, plannerModel, workerModel, permi
|
|
|
168
168
|
}
|
|
169
169
|
const parsed = await extractTaskJson(resultText, async () => {
|
|
170
170
|
onLog("Retrying...");
|
|
171
|
-
return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
|
|
171
|
+
return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
|
|
172
172
|
}, onLog, outFile);
|
|
173
173
|
let tasks = (parsed.tasks || []).map((t, i) => ({
|
|
174
174
|
id: String(i), prompt: typeof t === "string" ? t : t.prompt,
|
|
@@ -179,7 +179,7 @@ export async function planTasks(objective, cwd, plannerModel, workerModel, permi
|
|
|
179
179
|
onLog(`${tasks.length} tasks`);
|
|
180
180
|
return tasks;
|
|
181
181
|
}
|
|
182
|
-
export async function identifyThemes(objective, count, cwd, model, permissionMode, onLog = () => { }) {
|
|
182
|
+
export async function identifyThemes(objective, count, cwd, model, permissionMode, onLog = () => { }, transcriptName = "themes") {
|
|
183
183
|
const resultText = await runPlannerQuery(`You are picking ${count} research angles for architects who will deeply explore a codebase next.
|
|
184
184
|
|
|
185
185
|
First do a BRIEF recon (3-6 tool calls max, don't go deep): read package.json and README if present, glob the top-level directory, peek at one or two config files that reveal the stack. You are learning what this codebase actually IS -- not solving anything.
|
|
@@ -188,7 +188,7 @@ Then pick ${count} angles that carve up THIS specific codebase orthogonally. Pre
|
|
|
188
188
|
|
|
189
189
|
Objective: ${objective}
|
|
190
190
|
|
|
191
|
-
Return ONLY a JSON object: {"themes": ["angle description", ...]}`, { cwd, model, permissionMode, outputFormat: THEMES_SCHEMA }, onLog);
|
|
191
|
+
Return ONLY a JSON object: {"themes": ["angle description", ...]}`, { cwd, model, permissionMode, outputFormat: THEMES_SCHEMA, transcriptName }, onLog);
|
|
192
192
|
const parsed = attemptJsonParse(resultText);
|
|
193
193
|
if (parsed?.themes && Array.isArray(parsed.themes))
|
|
194
194
|
return parsed.themes.slice(0, count);
|
|
@@ -229,7 +229,7 @@ Be thorough -- your findings drive the execution plan.`,
|
|
|
229
229
|
model: plannerModel,
|
|
230
230
|
}));
|
|
231
231
|
}
|
|
232
|
-
export async function orchestrate(objective, designDocs, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile) {
|
|
232
|
+
export async function orchestrate(objective, designDocs, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote, outFile, transcriptName = "orchestrate") {
|
|
233
233
|
const constraint = contextConstraintNote(workerModel);
|
|
234
234
|
const flexLine = flexNote ? `\n\n${flexNote}` : "";
|
|
235
235
|
const fileInstruction = outFile ? `\n\nAFTER generating the JSON, also write it to ${outFile} using the Write tool.` : "";
|
|
@@ -259,7 +259,7 @@ Respond with ONLY a JSON object (no markdown fences):
|
|
|
259
259
|
onLog("Synthesizing...");
|
|
260
260
|
let resultText;
|
|
261
261
|
try {
|
|
262
|
-
resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
|
|
262
|
+
resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
|
|
263
263
|
}
|
|
264
264
|
catch (err) {
|
|
265
265
|
const salvaged = salvageFromFile(outFile, budget, onLog, err?.message ?? String(err));
|
|
@@ -269,7 +269,7 @@ Respond with ONLY a JSON object (no markdown fences):
|
|
|
269
269
|
}
|
|
270
270
|
const parsed = await extractTaskJson(resultText, async () => {
|
|
271
271
|
onLog("Retrying...");
|
|
272
|
-
return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
|
|
272
|
+
return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
|
|
273
273
|
}, onLog, outFile);
|
|
274
274
|
let tasks = (parsed.tasks || []).map((t, i) => ({
|
|
275
275
|
id: String(i), prompt: typeof t === "string" ? t : t.prompt,
|
|
@@ -280,7 +280,7 @@ Respond with ONLY a JSON object (no markdown fences):
|
|
|
280
280
|
onLog(`${tasks.length} tasks`);
|
|
281
281
|
return tasks;
|
|
282
282
|
}
|
|
283
|
-
export async function refinePlan(objective, previousTasks, feedback, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog) {
|
|
283
|
+
export async function refinePlan(objective, previousTasks, feedback, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, transcriptName = "refine") {
|
|
284
284
|
onLog("Refining plan...");
|
|
285
285
|
const prev = previousTasks.map((t, i) => `${i + 1}. ${t.prompt}`).join("\n");
|
|
286
286
|
const constraint = contextConstraintNote(workerModel);
|
|
@@ -303,10 +303,10 @@ ${scaleNote} ${concurrency} agents run in parallel. Update the plan accordingly.
|
|
|
303
303
|
|
|
304
304
|
Respond with ONLY a JSON object (no markdown):
|
|
305
305
|
{"tasks":[{"prompt":"..."}]}`;
|
|
306
|
-
const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
|
|
306
|
+
const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName }, onLog);
|
|
307
307
|
const parsed = await extractTaskJson(resultText, async () => {
|
|
308
308
|
onLog("Retrying...");
|
|
309
|
-
return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA }, onLog);
|
|
309
|
+
return runPlannerQuery(`Your previous response was not valid JSON. Respond with ONLY a JSON object {"tasks":[{"prompt":"..."}]}.\n\n${prompt}`, { cwd, model: plannerModel, permissionMode, outputFormat: TASKS_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
|
|
310
310
|
}, onLog);
|
|
311
311
|
let tasks = (parsed.tasks || []).map((t, i) => ({
|
|
312
312
|
id: String(i), prompt: typeof t === "string" ? t : t.prompt,
|
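The planner hunks above repeat one pattern: a trailing `transcriptName` parameter with a default value, plus a `-retry` suffix on the follow-up query. The sketch below illustrates only that naming pattern. `runQuery` and `planSketch` are hypothetical stand-ins (the real `runPlannerQuery` and `extractTaskJson` bodies are not part of this diff), and the canned responses are assumptions made so the retry path runs.

```javascript
// Hypothetical stand-in for runPlannerQuery: records the transcript name each
// query was issued under and returns a canned response.
const issued = [];
async function runQuery(prompt, opts) {
  issued.push(opts.transcriptName);
  // The first call returns invalid JSON so the retry path is exercised.
  return issued.length === 1 ? "not json" : '{"tasks":[{"prompt":"x"}]}';
}

// Trailing default parameter: callers that omit the argument keep working.
async function planSketch(objective, transcriptName = "plan") {
  const first = await runQuery(objective, { transcriptName });
  try {
    return JSON.parse(first);
  } catch {
    // The retry is issued under its own name, mirroring the "-retry" suffix.
    const retry = await runQuery(objective, {
      transcriptName: `${transcriptName}-retry`,
    });
    return JSON.parse(retry);
  }
}
```

Because the new parameter is last and optional, call sites written against 1.25.19 should not need changes; they simply get the default name.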
package/dist/run.js
CHANGED
|
@@ -272,7 +272,7 @@ export async function executeRun(cfg) {
|
|
|
272
272
|
const appliedGuidance = memory.userGuidance;
|
|
273
273
|
if (appliedGuidance)
|
|
274
274
|
display.appendSteeringEvent(`User directives applied: ${appliedGuidance.slice(0, 80)}`);
|
|
275
|
-
const steer = await steerWave(objective, waveHistory, remaining, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, steeringLog, memory);
|
|
275
|
+
const steer = await steerWave(objective, waveHistory, remaining, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, steeringLog, memory, `steer-wave-${waveNum}-attempt-${steerAttempts}`);
|
|
276
276
|
accCost += getTotalPlannerCost() - plannerCostBefore;
|
|
277
277
|
syncRunInfo();
|
|
278
278
|
if (steer.statusUpdate)
|
package/dist/steering.d.ts
CHANGED
|
@@ -1,3 +1,3 @@
|
|
|
1
1
|
import type { PermMode, SteerResult, RunMemory, WaveSummary } from "./types.js";
|
|
2
2
|
import { type PlannerLog } from "./planner-query.js";
|
|
3
|
-
export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, fastModel: string | undefined, permissionMode: PermMode, concurrency: number, onLog: PlannerLog, runMemory?: RunMemory): Promise<SteerResult>;
|
|
3
|
+
export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, fastModel: string | undefined, permissionMode: PermMode, concurrency: number, onLog: PlannerLog, runMemory?: RunMemory, transcriptName?: string): Promise<SteerResult>;
|
package/dist/steering.js
CHANGED
|
@@ -23,7 +23,7 @@ const STEER_SCHEMA = {
|
|
|
23
23
|
required: ["done", "tasks", "reasoning", "statusUpdate", "estimatedSessionsRemaining"],
|
|
24
24
|
},
|
|
25
25
|
};
|
|
26
|
-
export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory) {
|
|
26
|
+
export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory, transcriptName = "steer") {
|
|
27
27
|
const constraint = contextConstraintNote(workerModel);
|
|
28
28
|
const recentWaves = history.slice(-3);
|
|
29
29
|
const recentText = recentWaves.length > 0 ? recentWaves.map(w => {
|
|
@@ -114,14 +114,14 @@ Set "noWorktree": true for verify/user-test tasks -- they need the real project
|
|
|
114
114
|
If done: {"done": true, "reasoning": "...", "statusUpdate": "...", "estimatedSessionsRemaining": 0, "tasks": []}`;
|
|
115
115
|
onLog("Assessing...", "status");
|
|
116
116
|
onLog(`Reading codebase -- wave ${history.length + 1}`, "event");
|
|
117
|
-
const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA }, onLog);
|
|
117
|
+
const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName }, onLog);
|
|
118
118
|
const parsed = await (async () => {
|
|
119
119
|
const first = attemptJsonParse(resultText);
|
|
120
120
|
if (first)
|
|
121
121
|
return first;
|
|
122
122
|
onLog(`Steering parse failed (${resultText.length} chars). Asking model to fix...`, "event");
|
|
123
123
|
const snippet = resultText.length > 2000 ? resultText.slice(0, 1000) + "\n...\n" + resultText.slice(-800) : resultText;
|
|
124
|
-
const retryText = await runPlannerQuery(`Your previous steering response could not be parsed as JSON. Here is what you returned:\n\n---\n${snippet}\n---\n\nExtract or rewrite the above as ONLY a valid JSON object with this schema: {"done":boolean,"reasoning":"...","statusUpdate":"...","tasks":[{"prompt":"..."}]}\n\nRespond with ONLY the JSON, no markdown fences, no explanation.`, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA }, onLog);
|
|
124
|
+
const retryText = await runPlannerQuery(`Your previous steering response could not be parsed as JSON. Here is what you returned:\n\n---\n${snippet}\n---\n\nExtract or rewrite the above as ONLY a valid JSON object with this schema: {"done":boolean,"reasoning":"...","statusUpdate":"...","tasks":[{"prompt":"..."}]}\n\nRespond with ONLY the JSON, no markdown fences, no explanation.`, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName: `${transcriptName}-retry` }, onLog);
|
|
125
125
|
const retryParsed = attemptJsonParse(retryText);
|
|
126
126
|
if (retryParsed)
|
|
127
127
|
return retryParsed;
|
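The steering retry above sends the model a shortened copy of its own unparseable output. A minimal sketch of that truncation, using the same thresholds as the hunk (over 2000 characters: keep the first 1000 and the last 800). `makeSnippet` and `tryParse` are illustrative names; `tryParse` is only an assumed stand-in for `attemptJsonParse`, whose real body is not shown in this diff.

```javascript
// Keep the head and tail of an oversized response so the retry prompt stays
// small; short responses pass through unchanged.
function makeSnippet(text) {
  return text.length > 2000
    ? text.slice(0, 1000) + "\n...\n" + text.slice(-800)
    : text;
}

// Assumed stand-in for attemptJsonParse: returns null instead of throwing.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}
```

Keeping both ends of the text is a reasonable choice here because a JSON object that was wrapped in prose usually has its opening near the start and its closing brace near the end.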
package/dist/swarm.d.ts
CHANGED
|
@@ -67,6 +67,7 @@ export declare class Swarm {
|
|
|
67
67
|
private worktreeBase?;
|
|
68
68
|
private activeQueries;
|
|
69
69
|
private cleanedUp;
|
|
70
|
+
private pendingTools;
|
|
70
71
|
logFile?: string;
|
|
71
72
|
readonly model: string | undefined;
|
|
72
73
|
usageCap: number | undefined;
|
|
@@ -116,5 +117,7 @@ export declare class Swarm {
|
|
|
116
117
|
private windowRejectedReset;
|
|
117
118
|
private runAgent;
|
|
118
119
|
private agentSummary;
|
|
120
|
+
/** Log a tool invocation with a short target extracted from its input. */
|
|
121
|
+
private logToolUse;
|
|
119
122
|
private handleMsg;
|
|
120
123
|
}
|
package/dist/swarm.js
CHANGED
|
@@ -72,6 +72,10 @@ export class Swarm {
|
|
|
72
72
|
worktreeBase;
|
|
73
73
|
activeQueries = new Set();
|
|
74
74
|
cleanedUp = false;
|
|
75
|
+
// Per-agent open tool_use block: cursor-composer-in-claude v0.9 opens the block
|
|
76
|
+
// with empty `input` and streams the real payload via `input_json_delta`, so we
|
|
77
|
+
// need to wait for content_block_stop before we can log the file/path target.
|
|
78
|
+
pendingTools = new WeakMap();
|
|
75
79
|
logFile;
|
|
76
80
|
model;
|
|
77
81
|
usageCap;
|
|
@@ -700,6 +704,16 @@ export class Swarm {
|
|
|
700
704
|
return `Agent ${agent.id} ${verb}: ${m}m ${s}s, ${agent.toolCalls} tools${files}`;
|
|
701
705
|
}
|
|
702
706
|
// ── Message handler ──
|
|
707
|
+
/** Log a tool invocation with a short target extracted from its input. */
|
|
708
|
+
logToolUse(agent, name, input) {
|
|
709
|
+
const p = input.path ?? input.file_path ?? input.pattern;
|
|
710
|
+
const target = typeof p === "string" && p
|
|
711
|
+
? p
|
|
712
|
+
: typeof input.command === "string" && input.command
|
|
713
|
+
? input.command.split(" ").slice(0, 3).join(" ")
|
|
714
|
+
: "";
|
|
715
|
+
this.log(agent.id, target ? `${name} \u2192 ${target}` : name);
|
|
716
|
+
}
|
|
703
717
|
handleMsg(agent, msg) {
|
|
704
718
|
// Any message that isn't a rate-limit event counts as real progress and
|
|
705
719
|
// resets the stall watchdog + clears the per-agent blocked flag.
|
|
@@ -730,9 +744,11 @@ export class Swarm {
|
|
|
730
744
|
if (cb?.type === "tool_use") {
|
|
731
745
|
agent.currentTool = cb.name;
|
|
732
746
|
agent.toolCalls++;
|
|
733
|
-
const input = cb.input;
|
|
734
|
-
const
|
|
735
|
-
this.
|
|
747
|
+
const input = (cb.input ?? {});
|
|
748
|
+
const hasInput = Object.keys(input).length > 0;
|
|
749
|
+
this.pendingTools.set(agent, { name: cb.name, input, buf: "", logged: hasInput });
|
|
750
|
+
if (hasInput)
|
|
751
|
+
this.logToolUse(agent, cb.name, input);
|
|
736
752
|
}
|
|
737
753
|
else if (cb?.type === "thinking" || cb?.type === "redacted_thinking") {
|
|
738
754
|
agent.lastText = "thinking…";
|
|
@@ -740,6 +756,11 @@ export class Swarm {
|
|
|
740
756
|
}
|
|
741
757
|
else if (ev.type === "content_block_delta") {
|
|
742
758
|
const delta = ev.delta;
|
|
759
|
+
const pending = this.pendingTools.get(agent);
|
|
760
|
+
if (delta?.type === "input_json_delta" && pending && typeof delta.partial_json === "string") {
|
|
761
|
+
pending.buf += delta.partial_json;
|
|
762
|
+
break;
|
|
763
|
+
}
|
|
743
764
|
// thinking_delta: `delta.thinking`; text_delta: `delta.text`.
|
|
744
765
|
const raw = delta?.type === "text_delta" ? delta.text
|
|
745
766
|
: delta?.type === "thinking_delta" ? delta.thinking
|
|
@@ -750,6 +771,20 @@ export class Swarm {
|
|
|
750
771
|
agent.lastText = t.slice(-80);
|
|
751
772
|
}
|
|
752
773
|
}
|
|
774
|
+
else if (ev.type === "content_block_stop") {
|
|
775
|
+
const pending = this.pendingTools.get(agent);
|
|
776
|
+
if (pending && !pending.logged) {
|
|
777
|
+
if (pending.buf) {
|
|
778
|
+
try {
|
|
779
|
+
pending.input = JSON.parse(pending.buf);
|
|
780
|
+
}
|
|
781
|
+
catch { }
|
|
782
|
+
}
|
|
783
|
+
this.logToolUse(agent, pending.name, pending.input);
|
|
784
|
+
pending.logged = true;
|
|
785
|
+
}
|
|
786
|
+
this.pendingTools.delete(agent);
|
|
787
|
+
}
|
|
753
788
|
break;
|
|
754
789
|
}
|
|
755
790
|
case "result": {
|
|
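The `swarm.js` change above defers logging a tool call until its streamed input is complete. The sketch below replays that logic outside the class so it can be run on its own. It assumes the Anthropic streaming event shapes (`content_block_start`, `content_block_delta` with `input_json_delta`, `content_block_stop`); `handleEvent`, `targetOf`, and `logLines` are illustrative names, not part of the package.

```javascript
// Per-agent pending tool state, keyed by the agent object itself.
const pendingTools = new WeakMap();
const logLines = [];

// Pick a short target: a path-like field first, else the first three words
// of a shell command, else nothing.
function targetOf(input) {
  const p = input.path ?? input.file_path ?? input.pattern;
  if (typeof p === "string" && p) return p;
  if (typeof input.command === "string" && input.command) {
    return input.command.split(" ").slice(0, 3).join(" ");
  }
  return "";
}

function logToolUse(agent, name, input) {
  const target = targetOf(input);
  logLines.push(target ? `${name} \u2192 ${target}` : name);
}

function handleEvent(agent, ev) {
  if (ev.type === "content_block_start" && ev.content_block?.type === "tool_use") {
    const input = ev.content_block.input ?? {};
    const hasInput = Object.keys(input).length > 0;
    pendingTools.set(agent, { name: ev.content_block.name, input, buf: "", logged: hasInput });
    // Input already present: log now, as before the change.
    if (hasInput) logToolUse(agent, ev.content_block.name, input);
  } else if (ev.type === "content_block_delta") {
    const pending = pendingTools.get(agent);
    if (ev.delta?.type === "input_json_delta" && pending && typeof ev.delta.partial_json === "string") {
      pending.buf += ev.delta.partial_json; // accumulate JSON fragments
    }
  } else if (ev.type === "content_block_stop") {
    const pending = pendingTools.get(agent);
    if (pending && !pending.logged) {
      if (pending.buf) {
        try {
          pending.input = JSON.parse(pending.buf);
        } catch {
          // Unparseable buffer: fall back to the (empty) initial input.
        }
      }
      logToolUse(agent, pending.name, pending.input);
    }
    pendingTools.delete(agent);
  }
}
```

A tool block that opens with an empty `input` is logged once, at `content_block_stop`, with the target recovered from the buffered fragments.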
package/dist/transcripts.d.ts
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
1
|
+
export declare function setTranscriptRunDir(dir: string | undefined): void;
|
|
2
|
+
export declare function getTranscriptRunDir(): string | undefined;
|
|
3
|
+
export declare function transcriptPath(name: string): string | undefined;
|
|
4
|
+
/** Append a single event; silent on error (disk full, permission, etc.). */
|
|
5
|
+
export declare function writeTranscriptEvent(name: string, event: Record<string, unknown>): void;
|
|
package/dist/transcripts.js
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
import { appendFileSync, mkdirSync } from "fs";
|
|
2
|
+
import { dirname, join } from "path";
|
|
3
|
+
/**
|
|
4
|
+
* Crash-safe NDJSON transcripts for planner/steering queries.
|
|
5
|
+
*
|
|
6
|
+
* Each query writes to `<runDir>/transcripts/<name>.ndjson` -- one JSON object
|
|
7
|
+
* per line, so partial writes survive crashes. Multiple invocations of the same
|
|
8
|
+
* name append with a `session_start` marker separating them.
|
|
9
|
+
*
|
|
10
|
+
* Why NDJSON:
|
|
11
|
+
* - append-only → no read-modify-write race under parallel waves
|
|
12
|
+
* - one line per event → `tail -f` works; a killed process never leaves
|
|
13
|
+
* the file in an unparseable state
|
|
14
|
+
* - machine-readable → this assistant and future tools can `jq` through it
|
|
15
|
+
*
|
|
16
|
+
* Consumed by: planner-query.ts (stream_event, rate_limit_event, result, error).
|
|
17
|
+
*/
|
|
18
|
+
let _runDir;
|
|
19
|
+
export function setTranscriptRunDir(dir) {
|
|
20
|
+
_runDir = dir;
|
|
21
|
+
}
|
|
22
|
+
export function getTranscriptRunDir() {
|
|
23
|
+
return _runDir;
|
|
24
|
+
}
|
|
25
|
+
export function transcriptPath(name) {
|
|
26
|
+
return _runDir ? join(_runDir, "transcripts", `${name}.ndjson`) : undefined;
|
|
27
|
+
}
|
|
28
|
+
/** Append a single event; silent on error (disk full, permission, etc.). */
|
|
29
|
+
export function writeTranscriptEvent(name, event) {
|
|
30
|
+
const path = transcriptPath(name);
|
|
31
|
+
if (!path)
|
|
32
|
+
return;
|
|
33
|
+
try {
|
|
34
|
+
mkdirSync(dirname(path), { recursive: true });
|
|
35
|
+
appendFileSync(path, JSON.stringify({ t: Date.now(), ...event }) + "\n", "utf-8");
|
|
36
|
+
}
|
|
37
|
+
catch { }
|
|
38
|
+
}
|
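The writer above appends one JSON object per line, so a reader can process a transcript line by line. A minimal reader sketch under stated assumptions: `readTranscript` and `countKinds` are hypothetical helpers, not part of the package. The `kind` field is assumed from the `jq -c '.kind'` hint in SKILL.md, since the writer itself only guarantees the `t` timestamp. The reader also skips any line that fails to parse, on the assumption that a crash could cut the final line short.

```javascript
const fs = require("node:fs");

// Parse an NDJSON transcript, skipping blank lines and any line that is not
// valid JSON (for example a final line cut short by a crash).
function readTranscript(file) {
  const events = [];
  for (const line of fs.readFileSync(file, "utf-8").split("\n")) {
    if (!line.trim()) continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // Ignore the damaged line and keep the events that did parse.
    }
  }
  return events;
}

// Count events per `kind` (assumed field name; see the note above).
function countKinds(events) {
  const counts = {};
  for (const e of events) {
    const k = typeof e.kind === "string" ? e.kind : "(none)";
    counts[k] = (counts[k] ?? 0) + 1;
  }
  return counts;
}
```

For a run that died during planning, pointing `readTranscript` at the newest file under `transcripts/` and inspecting the last few events is one way to see what the planner was doing.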
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "claude-overnight",
|
|
3
|
-
"version": "1.25.
|
|
3
|
+
"version": "1.25.20",
|
|
4
4
|
"description": "Parallel Claude agents in git worktrees with a usage cap that reserves headroom for your interactive Claude Code. Crash-safe resume. Provider-agnostic model catalog (Anthropic, Cursor, OpenAI, Gemini, DeepSeek, Llama, Qwen) with capability-based task scoping.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
@@ -17,7 +17,7 @@
|
|
|
17
17
|
"dependencies": {
|
|
18
18
|
"@anthropic-ai/claude-agent-sdk": "^0.2.92",
|
|
19
19
|
"chalk": "^5.4.1",
|
|
20
|
-
"cursor-composer-in-claude": "0.
|
|
20
|
+
"cursor-composer-in-claude": "0.9.0",
|
|
21
21
|
"jsonwebtoken": "^9.0.2"
|
|
22
22
|
},
|
|
23
23
|
"devDependencies": {
|
|
package/plugins/claude-overnight/.claude-plugin/plugin.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "claude-overnight",
|
|
3
|
-
"version": "1.25.
|
|
3
|
+
"version": "1.25.20",
|
|
4
4
|
"description": "Claude Code skill for understanding, installing, and inspecting claude-overnight runs -- parallel Claude agents in git worktrees with thinking waves, multi-wave steering, and crash-safe resume. Supports Cursor API Proxy, Qwen, OpenRouter.",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Francesco Fornace"
|
|
package/plugins/claude-overnight/skills/claude-overnight/SKILL.md
CHANGED
|
@@ -11,7 +11,7 @@ description: >
|
|
|
11
11
|
|
|
12
12
|
# What it is
|
|
13
13
|
|
|
14
|
-
`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks,
|
|
14
|
+
`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits verified by the worker next wave). A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, worker waves run them in parallel, and steering decides between more work, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable -- nothing is lost.
|
|
15
15
|
|
|
16
16
|
**Three-layer review system** runs on every wave:
|
|
17
17
|
1. **Per-agent self-review** -- after each agent finishes, the same session continues via SDK session resume (continue mechanism) with a follow-up prompt to review and simplify its own `git diff`. The agent's full context stays warm -- no initial context bloat.
|
|
@@ -55,16 +55,20 @@ Every run lives at `<repo>/.claude-overnight/runs/<ISO-timestamp>/`:
|
|
|
55
55
|
|
|
56
56
|
| File / dir | What it tells you |
|
|
57
57
|
|----------------------|-----------------------------------------------------------------------------------|
|
|
58
|
-
| `run.json` | Machine state: objective,
|
|
58
|
+
| `run.json` | Machine state: objective, planner/worker/fast models, budget, cost, waves done, branches, done flag. |
|
|
59
59
|
| `status.md` | **Living project snapshot**, rewritten by steering every wave. First line = short status. |
|
|
60
60
|
| `goal.md` | Evolving "north star" -- what the run currently thinks "amazing" means. |
|
|
61
|
+
| `themes.md` | The thinking-wave research angles picked for this objective (human-readable). |
|
|
61
62
|
| `milestones/*.md` | Strategic snapshots archived ~every 5 waves. Long-term memory of the run. |
|
|
62
63
|
| `designs/*.md` | Architect outputs from the thinking wave. Deleted once the objective is complete. |
|
|
64
|
+
| `tasks.json` | The execution plan written by the orchestrator. |
|
|
65
|
+
| `steering/wave-N-attempt-M.json` | Steering decision per wave: done flag, reasoning, status/goal updates. |
|
|
66
|
+
| `transcripts/*.ndjson` | Crash-safe NDJSON stream for every planner/steering query: `themes`, `orchestrate`, `plan`, `steer-wave-N-attempt-M`. Each line = one event (session_start, tool_use, text_delta, thinking_delta, rate_limit, result, error). Use `jq -c '.kind' <file>` to get a quick shape; read full objects to reconstruct what the planner was doing. Survives process crashes because writes are append-only. |
|
|
63
67
|
| `sessions/wave-N.json` | Per-wave agent records: prompt, status, cost, files changed, branch, error. |
|
|
64
68
|
|
|
65
69
|
The newest subfolder under `runs/` is the current/last run. A run that never reached "done" is **resumable** -- `run.json` will not be marked complete and `designs/` may still be present.
|
|
66
70
|
|
|
67
|
-
To assess status of a run from scratch, read in this order: `goal.md` → `status.md` → newest file in `milestones/` → newest `sessions/wave-*.json` → `run.json`. Five reads and you know exactly where it stands.
|
|
71
|
+
To assess status of a run from scratch, read in this order: `goal.md` → `status.md` → newest file in `milestones/` → newest `sessions/wave-*.json` → `run.json`. Five reads and you know exactly where it stands. If the run died during planning (no `sessions/` yet), read `themes.md` + the newest `transcripts/*.ndjson` instead — they show exactly what the planner was doing when it crashed.
|
|
68
72
|
|
|
69
73
|
**Durable run history (committed, survives cleanup):** `claude-overnight.log.md` at the repo root is updated on every run with a block per run ID -- original objective, start/finish times, cost, outcome, branch. If the user asks "what was my prompt" or "what did last night's run do" and `.claude-overnight/runs/` is empty, this file is the canonical recovery path.
|
|
70
74
|
|