claude-overnight 1.25.41 → 1.25.43
- package/README.md +3 -3
- package/dist/_version.d.ts +1 -1
- package/dist/_version.js +1 -1
- package/dist/index.js +2 -2
- package/dist/planner-query.js +15 -0
- package/dist/providers.js +5 -0
- package/dist/run.js +152 -54
- package/dist/settings.js +4 -4
- package/dist/state.d.ts +1 -1
- package/dist/state.js +6 -2
- package/dist/steering.d.ts +49 -0
- package/dist/steering.js +116 -40
- package/dist/swarm.js +2 -22
- package/dist/transcripts.d.ts +1 -1
- package/dist/transcripts.js +10 -2
- package/dist/types.d.ts +2 -1
- package/package.json +1 -1
- package/plugins/claude-overnight/.claude-plugin/plugin.json +1 -1
- package/plugins/claude-overnight/skills/claude-overnight/SKILL.md +2 -2
- package/plugins/claude-overnight/skills/coach/SKILL.md +14 -13
package/README.md
CHANGED
|
@@ -4,7 +4,7 @@ Parallel Claude agents in isolated git worktrees. Set a usage cap so your intera
|
|
|
4
4
|
|
|
5
5
|
Hand it an objective and a session budget, walk away, review the diff when the run ends. Every agent runs in its own worktree on its own branch — a misbehaving agent can't trash your working tree. Unmerged branches are preserved for manual review, never discarded.
|
|
6
6
|
|
|
7
|
-
Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast**
|
|
7
|
+
Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **main worker** (runs the tasks), and an optional **fast worker** (a cheaper/faster second worker for well-scoped tasks, verified by the next wave's workers). Pair any planner (Opus, Sonnet) with any worker — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
|
|
8
8
|
|
|
9
9
|
## Run on Qwen 3.6 Plus
|
|
10
10
|
|
|
@@ -333,7 +333,7 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
|
|
|
333
333
|
|
|
334
334
|
## Custom providers (Qwen, OpenRouter, any Anthropic-compatible endpoint)
|
|
335
335
|
|
|
336
|
-
Planner, worker, and optional fast
|
|
336
|
+
Planner, main worker, and optional fast worker are each picked separately -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of work. The fast worker is a real worker (same tools, same env), just on a cheaper/faster model — steering routes well-scoped tasks to it by default.
|
|
337
337
|
|
|
338
338
|
From the interactive picker, choose `Other…` on the planner, worker, or fast step:
|
|
339
339
|
|
|
@@ -353,7 +353,7 @@ From the interactive picker, choose `Other…` on the planner, worker, or fast s
|
|
|
353
353
|
|
|
354
354
|
Saved providers live user-level at `~/.claude/claude-overnight/providers.json` (mode 0600) and show up automatically in every repo. No per-project config.
|
|
355
355
|
|
|
356
|
-
**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`) -- planner queries use the planner provider, worker queries use the worker provider, fast queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
|
|
356
|
+
**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`) -- planner queries use the planner provider, main-worker queries use the worker provider, fast-worker queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
|
|
357
357
|
|
|
358
358
|
**Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ worker preflight failed: ...` instead of N scattered mid-run errors.
|
|
359
359
|
|
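The per-query routing described under "How routing works" can be sketched as a pure function that builds a fresh env object for each call. This is a minimal illustration under stated assumptions, not the package's code: `envForRole`, the shape of the `providers` map, and the example URLs are hypothetical. Only the two variable names `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` come from the README text.

```javascript
// Hypothetical sketch: build a per-query env without touching process.env.
// `providers` maps a role ("planner" | "worker" | "fast") to { baseUrl, token }.
function envForRole(role, providers, baseEnv) {
  const provider = providers[role];
  // Copy first so the caller's env object is never mutated.
  const env = { ...baseEnv };
  if (provider) {
    env.ANTHROPIC_BASE_URL = provider.baseUrl;
    env.ANTHROPIC_AUTH_TOKEN = provider.token;
  }
  return env;
}

// Example values are placeholders, not real endpoints.
const providers = {
  planner: { baseUrl: "https://planner.example.invalid", token: "planner-token" },
  fast: { baseUrl: "https://fast.example.invalid", token: "fast-token" },
};
const shellEnv = { PATH: "/usr/bin" };

const plannerEnv = envForRole("planner", providers, shellEnv);
const workerEnv = envForRole("worker", providers, shellEnv); // no custom provider configured
console.log(plannerEnv.ANTHROPIC_BASE_URL, workerEnv.ANTHROPIC_BASE_URL);
```

Because each call returns its own object, a planner query and a fast-worker query can run side by side with different endpoints, and the shared `shellEnv` stays unchanged.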
package/dist/_version.d.ts
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
export declare const VERSION = "1.25.
|
|
1
|
+
export declare const VERSION = "1.25.42";
|
package/dist/_version.js
CHANGED
|
@@ -1,2 +1,2 @@
|
|
|
1
1
|
// Auto-generated by build — do not edit manually.
|
|
2
|
-
export const VERSION = "1.25.
|
|
2
|
+
export const VERSION = "1.25.42";
|
package/dist/index.js
CHANGED
|
@@ -156,7 +156,7 @@ async function main() {
|
|
|
156
156
|
--budget=N Target number of agent runs ${chalk.dim("(default: 10)")}
|
|
157
157
|
--concurrency=N Max parallel agents ${chalk.dim("(default: 5)")}
|
|
158
158
|
--model=NAME Worker model override ${chalk.dim("(interactive mode picks planner + worker separately -- supports 'Other…' for Qwen / OpenRouter / etc.)")}
|
|
159
|
-
--fast-model=NAME Fast model for quick tasks ${chalk.dim("(optional -- checked by
|
|
159
|
+
--fast-model=NAME Fast worker model for quick tasks ${chalk.dim("(optional -- checked by next wave's workers)")}
|
|
160
160
|
--usage-cap=N Stop at N% utilization ${chalk.dim("(e.g. 90 to save 10% for other work)")}
|
|
161
161
|
--allow-extra-usage Allow extra/overage usage ${chalk.dim("(default: stop when plan limits hit)")}
|
|
162
162
|
--extra-usage-budget=N Max $ for extra usage ${chalk.dim("(implies --allow-extra-usage)")}
|
|
@@ -888,7 +888,7 @@ async function main() {
|
|
|
888
888
|
console.error(chalk.red(` Fix the provider at ~/.claude/claude-overnight/providers.json and retry.`));
|
|
889
889
|
}
|
|
890
890
|
if (degradable) {
|
|
891
|
-
console.error(chalk.yellow(` Continuing without fast — fast-eligible tasks will run on the worker model instead.`));
|
|
891
|
+
console.error(chalk.yellow(` Continuing without the fast worker — fast-eligible tasks will run on the main worker model instead.`));
|
|
892
892
|
console.error("");
|
|
893
893
|
fastDegraded = true;
|
|
894
894
|
continue;
|
package/dist/planner-query.js
CHANGED
|
@@ -619,6 +619,21 @@ function extractOutermostBraces(text) {
|
|
|
619
619
|
return null;
|
|
620
620
|
}
|
|
621
621
|
export function attemptJsonParse(text) {
|
|
622
|
+
// Strip conversational prefaces/suffixes that weak-schema models sometimes
|
|
623
|
+
// wrap around the JSON body (e.g. "Here is the JSON: { ... } Let me know…").
|
|
624
|
+
const preface = /^\s*(?:Here (?:is|are)[^{]*|Let me[^{]*|I'?ll[^{]*|Sure[^{]*|Okay[^{]*)/i;
|
|
625
|
+
const suffix = /\n\n(?:Let me know|Hope this|Please let me)[\s\S]*$/i;
|
|
626
|
+
if (preface.test(text) || suffix.test(text)) {
|
|
627
|
+
const cleaned = text.replace(preface, "").replace(suffix, "").trim();
|
|
628
|
+
if (cleaned && cleaned !== text) {
|
|
629
|
+
try {
|
|
630
|
+
const obj = JSON.parse(cleaned);
|
|
631
|
+
if (typeof obj === "object" && obj !== null)
|
|
632
|
+
return obj;
|
|
633
|
+
}
|
|
634
|
+
catch { }
|
|
635
|
+
}
|
|
636
|
+
}
|
|
622
637
|
try {
|
|
623
638
|
const obj = JSON.parse(text);
|
|
624
639
|
if (typeof obj === "object" && obj !== null)
|
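The preface/suffix stripping added to `attemptJsonParse` can be tried in isolation. The sketch below reuses the two regular expressions from the hunk verbatim; the wrapper `parseWithChatterStripped` and the sample reply are hypothetical and exist only for illustration.

```javascript
// Same two patterns as the lines added to attemptJsonParse.
const preface = /^\s*(?:Here (?:is|are)[^{]*|Let me[^{]*|I'?ll[^{]*|Sure[^{]*|Okay[^{]*)/i;
const suffix = /\n\n(?:Let me know|Hope this|Please let me)[\s\S]*$/i;

// Hypothetical helper: strip chatter, then parse; returns null on failure.
function parseWithChatterStripped(text) {
  const cleaned = text.replace(preface, "").replace(suffix, "").trim();
  try {
    const obj = JSON.parse(cleaned);
    return typeof obj === "object" && obj !== null ? obj : null;
  } catch {
    return null;
  }
}

const reply = 'Here is the JSON: {"tasks":[{"prompt":"add tests"}]}\n\nLet me know if you want more.';
const parsed = parseWithChatterStripped(reply);
console.log(parsed.tasks[0].prompt); // prints "add tests"
```

Note that every preface alternative consumes text up to the first `{`, so a reply with no `{` at all is reduced to an empty string and fails to parse. A reply that already starts with `{` does not match the preface pattern and is parsed unchanged.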
package/dist/providers.js
CHANGED
|
@@ -178,6 +178,11 @@ export function envFor(p) {
|
|
|
178
178
|
base.ANTHROPIC_AUTH_TOKEN = key;
|
|
179
179
|
}
|
|
180
180
|
delete base.ANTHROPIC_API_KEY;
|
|
181
|
+
// Prevent CURSOR_API_KEY from leaking into non-proxy envs — would cause
|
|
182
|
+
// isCursorProxyEnv false-positive, silently rerouting through direct fetch
|
|
183
|
+
// which ignores outputFormat (no JSON schema enforcement).
|
|
184
|
+
delete base.CURSOR_API_KEY;
|
|
185
|
+
delete base.CURSOR_AUTH_TOKEN;
|
|
181
186
|
return base;
|
|
182
187
|
}
|
|
183
188
|
/**
|
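The comment in this hunk says a leftover `CURSOR_API_KEY` can make `isCursorProxyEnv` report a false positive. The real detection logic is not shown in this diff, so the sketch below uses a stand-in detector (`looksLikeCursorProxy`, an assumption) purely to show why deleting the keys from the copied env changes the outcome.

```javascript
// Hypothetical detector: a stand-in for the package's isCursorProxyEnv,
// whose actual implementation does not appear in this diff.
function looksLikeCursorProxy(env) {
  return Boolean(env.CURSOR_API_KEY || env.CURSOR_AUTH_TOKEN);
}

// Mirrors the deletes in envFor(): drop credentials that belong to other routes.
function scrubForAnthropicCompatible(env) {
  const base = { ...env };
  delete base.ANTHROPIC_API_KEY;
  delete base.CURSOR_API_KEY;
  delete base.CURSOR_AUTH_TOKEN;
  return base;
}

const inherited = { CURSOR_API_KEY: "stale-key", ANTHROPIC_AUTH_TOKEN: "tok" };
const scrubbed = scrubForAnthropicCompatible(inherited);
console.log(looksLikeCursorProxy(inherited), looksLikeCursorProxy(scrubbed)); // prints "true false"
```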
package/dist/run.js
CHANGED
|
@@ -3,8 +3,8 @@ import { join } from "path";
|
|
|
3
3
|
import { execSync } from "child_process";
|
|
4
4
|
import chalk from "chalk";
|
|
5
5
|
import { Swarm } from "./swarm.js";
|
|
6
|
-
import { steerWave } from "./steering.js";
|
|
7
|
-
import { getTotalPlannerCost, getPlannerRateLimitInfo, getPeakPlannerContext, runPlannerQuery, setPlannerEnvResolver } from "./planner-query.js";
|
|
6
|
+
import { steerWave, STEER_SCHEMA } from "./steering.js";
|
|
7
|
+
import { getTotalPlannerCost, getPlannerRateLimitInfo, getPeakPlannerContext, runPlannerQuery, setPlannerEnvResolver, attemptJsonParse } from "./planner-query.js";
|
|
8
8
|
import { contextFillInfo } from "./render.js";
|
|
9
9
|
import { getModelCapability } from "./models.js";
|
|
10
10
|
import { buildEnvResolver, isCursorProxyProvider } from "./providers.js";
|
|
@@ -55,6 +55,8 @@ export async function executeRun(cfg) {
|
|
|
55
55
|
let lastCapped = false, lastAborted = false, objectiveComplete = false;
|
|
56
56
|
let lastEstimate;
|
|
57
57
|
const branches = [];
|
|
58
|
+
let healFailStreak = 0; // consecutive waves where heal-0 agent changed 0 files
|
|
59
|
+
let zeroFileWaves = 0; // consecutive waves with 0 files across non-heal tasks
|
|
58
60
|
if (cfg.resuming && cfg.resumeState) {
|
|
59
61
|
const rs = cfg.resumeState;
|
|
60
62
|
remaining = Math.max(1, rs.remaining);
|
|
@@ -295,8 +297,21 @@ export async function executeRun(cfg) {
|
|
|
295
297
|
// Shared steering logic used by both resume-steering and in-loop steering
|
|
296
298
|
const runSteering = async () => {
|
|
297
299
|
let steered = false;
|
|
300
|
+
// ── B1: Skip steering when ≥2 unresolved merge-failed branches exist ──
|
|
301
|
+
const mergeFailedBranches = branches.filter(b => b.status === "merge-failed");
|
|
302
|
+
if (mergeFailedBranches.length >= 2) {
|
|
303
|
+
currentTasks = mergeFailedBranches.map((b, i) => ({
|
|
304
|
+
id: `branch-retry-${i}`,
|
|
305
|
+
prompt: `Your previous attempt at this task merge-failed against main. Redo it against the current state of main with minimal, focused edits. Original task:\n\n${b.taskPrompt}`,
|
|
306
|
+
model: workerModel,
|
|
307
|
+
postcondition: "pnpm run build",
|
|
308
|
+
}));
|
|
309
|
+
display.appendSteeringEvent(`Skipping steering — ${mergeFailedBranches.length} merge-failed branches form the wave`);
|
|
310
|
+
return true;
|
|
311
|
+
}
|
|
298
312
|
let steerAttempts = 0;
|
|
299
|
-
|
|
313
|
+
const MAX_STEER_ATTEMPTS = 2; // B2: retry threshold 3 → 2
|
|
314
|
+
while (!steered && remaining > 0 && !stopping && steerAttempts < MAX_STEER_ATTEMPTS) {
|
|
300
315
|
steerAttempts++;
|
|
301
316
|
const plannerCostBefore = getTotalPlannerCost();
|
|
302
317
|
try {
|
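The B1 block in this hunk turns merge-failed branch records into the next wave's tasks. A minimal sketch of that mapping follows. The field names (`status`, `taskPrompt`) follow the diff, while `retryTasksFor`, its `minFailed` parameter, and the shortened prompt text are assumptions made for illustration.

```javascript
// Hypothetical helper mirroring the B1 block: build retry tasks from
// merge-failed branch records, or return null to fall through to steering.
function retryTasksFor(branches, workerModel, minFailed = 2) {
  const failed = branches.filter(b => b.status === "merge-failed");
  if (failed.length < minFailed) return null;
  return failed.map((b, i) => ({
    id: `branch-retry-${i}`,
    prompt: `Redo this task against the current state of main:\n\n${b.taskPrompt}`,
    model: workerModel,
  }));
}

const sampleBranches = [
  { branch: "agent-1", status: "merge-failed", taskPrompt: "fix auth bug" },
  { branch: "agent-2", status: "merged", taskPrompt: "add tests" },
  { branch: "agent-3", status: "merge-failed", taskPrompt: "update docs" },
];
const retries = retryTasksFor(sampleBranches, "worker-model");
console.log(retries.map(t => t.id));
```

The same helper with `minFailed = 1` corresponds to the later decomposer path, which recycles even a single merge-failed branch. The sketch omits the `postcondition: "pnpm run build"` field that the diff attaches to every retry task.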
|
@@ -350,23 +365,52 @@ export async function executeRun(cfg) {
|
|
|
350
365
|
}
|
|
351
366
|
catch (err) {
|
|
352
367
|
accCost += getTotalPlannerCost() - plannerCostBefore;
|
|
353
|
-
|
|
354
|
-
|
|
368
|
+
const rawPreview = err?.message?.slice(0, 200) || "(no details)";
|
|
369
|
+
if (steerAttempts < MAX_STEER_ATTEMPTS) {
|
|
370
|
+
display.appendSteeringEvent(`Steering failed (attempt ${steerAttempts}/${MAX_STEER_ATTEMPTS}) -- retrying... ${rawPreview}`);
|
|
355
371
|
continue;
|
|
356
372
|
}
|
|
357
|
-
|
|
358
|
-
|
|
373
|
+
// ── B3: Decomposer fallback (replaces single-giant-fallback) ──
|
|
374
|
+
display.appendSteeringEvent(`Steering failed ${MAX_STEER_ATTEMPTS}× — decomposer fallback`);
|
|
375
|
+
// First: try merge-failed recycling even if only 1 unresolved branch exists
|
|
376
|
+
const stillFailed = branches.filter(b => b.status === "merge-failed");
|
|
377
|
+
if (stillFailed.length >= 1) {
|
|
378
|
+
currentTasks = stillFailed.map((b, i) => ({
|
|
379
|
+
id: `branch-retry-${i}`,
|
|
380
|
+
prompt: `Your previous attempt at this task merge-failed against main. Redo it against the current state of main with minimal, focused edits. Original task:\n\n${b.taskPrompt}`,
|
|
381
|
+
model: workerModel,
|
|
382
|
+
postcondition: "pnpm run build",
|
|
383
|
+
}));
|
|
384
|
+
display.appendSteeringEvent(`Decomposer: ${stillFailed.length} merge-failed branch(es) retried as swarm tasks`);
|
|
385
|
+
steered = true;
|
|
386
|
+
break;
|
|
387
|
+
}
|
|
388
|
+
// Second: minimal-prompt planner query
|
|
389
|
+
display.appendSteeringEvent("Decomposer: minimal planner query…");
|
|
359
390
|
try {
|
|
360
|
-
|
|
391
|
+
let statusText = "";
|
|
392
|
+
try {
|
|
393
|
+
statusText = readFileSync(join(runDir, "status.md"), "utf-8");
|
|
394
|
+
}
|
|
395
|
+
catch { }
|
|
396
|
+
const minimalPrompt = `${objective ? `Objective: ${objective}` : ""}\n\nStatus:\n${statusText || "(none)"}\n\nReturn tasks: string[] — 3-6 specific follow-ups. JSON only. {"tasks":[{"prompt":"..."}]}`;
|
|
397
|
+
const minimalText = await runPlannerQuery(minimalPrompt, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName: "decomposer-minimal", maxTurns: 40 }, () => { });
|
|
398
|
+
const parsed = attemptJsonParse(minimalText);
|
|
399
|
+
if (parsed?.tasks?.length > 0) {
|
|
400
|
+
currentTasks = parsed.tasks.map((t, i) => ({
|
|
401
|
+
id: `decompose-${i}`,
|
|
402
|
+
prompt: typeof t === "string" ? t : t.prompt,
|
|
403
|
+
model: workerModel,
|
|
404
|
+
}));
|
|
405
|
+
display.appendSteeringEvent(`Decomposer: ${currentTasks.length} tasks from minimal planner`);
|
|
406
|
+
steered = true;
|
|
407
|
+
break;
|
|
408
|
+
}
|
|
361
409
|
}
|
|
362
410
|
catch { }
|
|
363
|
-
|
|
364
|
-
|
|
365
|
-
|
|
366
|
-
type: "execute",
|
|
367
|
-
}];
|
|
368
|
-
steered = true;
|
|
369
|
-
break;
|
|
411
|
+
// Finally: halt
|
|
412
|
+
display.appendSteeringEvent(`Decomposer: no tasks produced — halting`);
|
|
413
|
+
return false;
|
|
370
414
|
}
|
|
371
415
|
}
|
|
372
416
|
return steered;
|
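The B3 fallback in this hunk tries merge-failed recycling first, then a minimal planner query, and halts if neither yields tasks. The sketch below shows that ordering in a generic form. `decomposeFallback` and the strategy objects are hypothetical, and the sketch is synchronous for brevity, whereas the diff awaits `runPlannerQuery`.

```javascript
// Hypothetical sketch of the B3 ordering: run each strategy in turn, use the
// first one that yields tasks, otherwise report that steering should halt.
function decomposeFallback(strategies) {
  for (const { name, run } of strategies) {
    let tasks = [];
    try {
      tasks = run() || [];
    } catch {
      // A failing strategy is skipped, like the empty catch in the diff.
    }
    if (tasks.length > 0) return { steered: true, source: name, tasks };
  }
  return { steered: false, source: null, tasks: [] };
}

const result = decomposeFallback([
  { name: "merge-failed retry", run: () => [] },
  { name: "minimal planner", run: () => [{ id: "decompose-0", prompt: "add tests" }] },
]);
console.log(result.source); // prints "minimal planner"
```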
|
@@ -389,12 +433,26 @@ export async function executeRun(cfg) {
|
|
|
389
433
|
// Health check before each wave: a broken build poisons every subsequent
|
|
390
434
|
// agent context, so prepend a heal task when detected. Steering-planned
|
|
391
435
|
// tasks still run, just after the build is green again.
|
|
436
|
+
// Skip if prior heal changed 0 files (heal unable to fix).
|
|
392
437
|
{
|
|
393
|
-
const
|
|
394
|
-
if (
|
|
395
|
-
const
|
|
396
|
-
|
|
397
|
-
|
|
438
|
+
const healTasks = healFailStreak > 0 ? [] : checkProjectHealth(cwd);
|
|
439
|
+
if (healTasks.length > 0 && remaining > 0) {
|
|
440
|
+
const healIds = healTasks.map(t => t.id);
|
|
441
|
+
const withoutDup = currentTasks.filter(t => !healIds.includes(t.id));
|
|
442
|
+
currentTasks = [...healTasks, ...withoutDup];
|
|
443
|
+
display.appendSteeringEvent(`Health check: build broken — queued ${healTasks.length} heal task(s)`);
|
|
444
|
+
}
|
|
445
|
+
else if (healTasks.length === 0 && healFailStreak > 0 && checkProjectHealth(cwd).length > 0) {
|
|
446
|
+
display.appendSteeringEvent(`Health check: build broken — heal skipped after ${healFailStreak} failed attempts, needs manual intervention`);
|
|
447
|
+
try {
|
|
448
|
+
const statusPath2 = join(runDir, "status.md");
|
|
449
|
+
const existing2 = existsSync(statusPath2) ? readFileSync(statusPath2, "utf-8") : "";
|
|
450
|
+
const marker = "## Heal blocked";
|
|
451
|
+
if (!existing2.includes(marker)) {
|
|
452
|
+
writeFileSync(statusPath2, `${existing2}${existing2 ? "\n\n" : ""}${marker}\nBuild has been broken for ${healFailStreak} waves, heal agents unable to fix — intervene manually.\n`, "utf-8");
|
|
453
|
+
}
|
|
454
|
+
}
|
|
455
|
+
catch { }
|
|
398
456
|
}
|
|
399
457
|
}
|
|
400
458
|
if (currentTasks.length > remaining)
|
|
@@ -598,7 +656,7 @@ export async function executeRun(cfg) {
|
|
|
598
656
|
liveConfig.remaining = remaining;
|
|
599
657
|
lastCapped = swarm.cappedOut;
|
|
600
658
|
lastAborted = swarm.aborted;
|
|
601
|
-
recordBranches(swarm.agents, swarm.mergeResults, branches);
|
|
659
|
+
recordBranches(swarm.agents, swarm.mergeResults, branches, waveNum);
|
|
602
660
|
saveWaveSession(runDir, waveNum, swarm.agents, swarm.totalCostUsd);
|
|
603
661
|
// Tasks that never made it into the swarm (queue cleared on abort/cap)
|
|
604
662
|
// are preserved as currentTasks so resume picks them up. Budget for these
|
|
@@ -623,6 +681,34 @@ export async function executeRun(cfg) {
|
|
|
623
681
|
};
|
|
624
682
|
}),
|
|
625
683
|
});
|
|
684
|
+
// Track heal fail streak: if a heal-0 task existed this wave and changed 0 files, increment.
|
|
685
|
+
// If any non-heal execute task changed files, reset.
|
|
686
|
+
const lastWave = waveHistory[waveHistory.length - 1];
|
|
687
|
+
const healTask = lastWave?.tasks.find(t => t.type === "heal");
|
|
688
|
+
if (healTask && !healTask.filesChanged) {
|
|
689
|
+
healFailStreak++;
|
|
690
|
+
}
|
|
691
|
+
else if (lastWave?.tasks.some(t => (t.type !== "heal") && (t.filesChanged ?? 0) > 0)) {
|
|
692
|
+
healFailStreak = 0;
|
|
693
|
+
}
|
|
694
|
+
// C1: Circuit breaker — halt after 2 consecutive waves with 0 files across non-heal tasks
|
|
695
|
+
const nonHealFiles = lastWave?.tasks.filter(t => t.type !== "heal").reduce((sum, t) => sum + (t.filesChanged ?? 0), 0) ?? 0;
|
|
696
|
+
if (nonHealFiles === 0 && waveNum > 0) {
|
|
697
|
+
zeroFileWaves++;
|
|
698
|
+
if (zeroFileWaves >= 2) {
|
|
699
|
+
display.appendSteeringEvent(`Circuit breaker: 2 consecutive waves produced no merged changes — halting to prevent budget drain`);
|
|
700
|
+
display.stop();
|
|
701
|
+
saveRunState(runDir, buildRunState({ remaining, phase: "stopped", currentTasks: [] }));
|
|
702
|
+
display.stop();
|
|
703
|
+
restore();
|
|
704
|
+
console.log(chalk.red(`\n Circuit breaker: 2 consecutive waves produced no merged changes.`));
|
|
705
|
+
console.log(chalk.red(` Halting to prevent budget drain. Run preserved at ${runDir}.`));
|
|
706
|
+
process.exit(3);
|
|
707
|
+
}
|
|
708
|
+
}
|
|
709
|
+
else {
|
|
710
|
+
zeroFileWaves = 0;
|
|
711
|
+
}
|
|
626
712
|
// Hook-blocked work: agents that touched files but nothing landed on the
|
|
627
713
|
// branch (pre-commit hooks, gitignore, writes outside worktree). Surface
|
|
628
714
|
// as a wave-level warning so steering sees it, not just a per-agent log.
|
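The C1 circuit breaker in this hunk counts consecutive waves whose non-heal tasks changed no files and halts at two. A minimal sketch of that counter follows. `makeCircuitBreaker` and its `limit` parameter are assumptions, and the sketch leaves out the diff's `waveNum > 0` guard, which keeps the first wave from counting toward the streak.

```javascript
// Hypothetical sketch of the C1 counter: track consecutive waves in which
// non-heal tasks changed zero files, and trip once the limit is reached.
function makeCircuitBreaker(limit = 2) {
  let zeroFileWaves = 0;
  return function record(wave) {
    const nonHealFiles = wave.tasks
      .filter(t => t.type !== "heal")
      .reduce((sum, t) => sum + (t.filesChanged ?? 0), 0);
    zeroFileWaves = nonHealFiles === 0 ? zeroFileWaves + 1 : 0;
    return zeroFileWaves >= limit; // true means "halt the run"
  };
}

const record = makeCircuitBreaker();
const idleWave = { tasks: [{ type: "execute", filesChanged: 0 }, { type: "heal", filesChanged: 4 }] };
const busyWave = { tasks: [{ type: "execute", filesChanged: 3 }] };
const outcomes = [record(idleWave), record(idleWave), record(busyWave)];
console.log(outcomes); // prints [ false, true, false ]
```

Files changed by heal tasks are excluded from the sum, so a wave in which only the heal agent made progress still counts as idle.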
|
@@ -670,6 +756,20 @@ export async function executeRun(cfg) {
|
|
|
670
756
|
}
|
|
671
757
|
if (next !== existing)
|
|
672
758
|
writeFileSync(statusPath, next, "utf-8");
|
|
759
|
+
// GC ghost branches: delete merge-failed branches ≥2 waves old and mark discarded.
|
|
760
|
+
// Safe: their work never landed. The decomposer (Phase B) will re-attempt from saved taskPrompt.
|
|
761
|
+
const gcCandidates = branches.filter(b => b.status === "merge-failed" && b.firstFailedWave !== undefined && (waveNum - b.firstFailedWave) >= 2);
|
|
762
|
+
let gcCount = 0;
|
|
763
|
+
for (const b of gcCandidates) {
|
|
764
|
+
try {
|
|
765
|
+
execSync(`git branch -D "${b.branch}"`, { cwd, stdio: "ignore" });
|
|
766
|
+
}
|
|
767
|
+
catch { }
|
|
768
|
+
b.status = "discarded";
|
|
769
|
+
gcCount++;
|
|
770
|
+
}
|
|
771
|
+
if (gcCount > 0)
|
|
772
|
+
display.appendSteeringEvent(`GC: discarded ${gcCount} ghost branch(es) ≥2 waves old`);
|
|
673
773
|
}
|
|
674
774
|
catch { }
|
|
675
775
|
// Fire-and-forget debrief after each wave.
|
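The ghost-branch GC in this hunk selects merge-failed branches whose first failure is at least two waves old. The selection rule can be sketched without the `git branch -D` call. `selectGhostBranches` and the `minAge` parameter are hypothetical names, and the record fields follow the diff.

```javascript
// Hypothetical sketch of the GC selection rule only; no git command is run.
function selectGhostBranches(branches, waveNum, minAge = 2) {
  return branches.filter(b =>
    b.status === "merge-failed" &&
    b.firstFailedWave !== undefined &&
    waveNum - b.firstFailedWave >= minAge);
}

const records = [
  { branch: "agent-1", status: "merge-failed", firstFailedWave: 1 },
  { branch: "agent-2", status: "merge-failed", firstFailedWave: 3 },
  { branch: "agent-3", status: "merged" },
  { branch: "agent-4", status: "merge-failed" }, // no wave recorded, never selected
];
const ghosts = selectGhostBranches(records, 4);
console.log(ghosts.map(b => b.branch)); // prints [ 'agent-1' ]
```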
|
@@ -979,34 +1079,11 @@ export async function executeRun(cfg) {
|
|
|
979
1079
|
}
|
|
980
1080
|
function reviewPrompt(scope, objective) {
|
|
981
1081
|
const scopeLine = scope === "wave"
|
|
982
|
-
? "
|
|
1082
|
+
? "Review and simplify all changes from the most recent wave."
|
|
983
1083
|
: `You are the final quality gate before this autonomous run completes.\n\nThe objective was: ${objective || "improve the codebase"}`;
|
|
984
|
-
const diffCmd = scope === "wave"
|
|
985
|
-
? "Run `git diff` to see what changed."
|
|
986
|
-
: "Run `git diff main` (or `git diff HEAD` if on the same branch) to see ALL changes made during this run.";
|
|
987
|
-
const checks = scope === "wave"
|
|
988
|
-
? `1. **Missed reuse**: Did any agent write something that already exists elsewhere? Find existing utilities and suggest replacements.
|
|
989
|
-
2. **Quality issues**: Redundant state, copy-paste variations, leaky abstractions, stringly-typed code where enums exist, unnecessary JSX nesting, comments that narrate what the code does.
|
|
990
|
-
3. **Efficiency problems**: Redundant computations, sequential operations that could be parallel, hot-path bloat, recurring no-op updates, TOCTOU patterns, memory leaks.
|
|
991
|
-
4. **Merge conflicts or inconsistencies**: Changes that work against each other or break existing patterns.`
|
|
992
|
-
: `1. **Architecture coherence**: Do the changes form a coherent whole, or are they a patchwork of independent edits that don't fit together?
|
|
993
|
-
2. **Missed reuse**: Any new code that duplicates existing functionality?
|
|
994
|
-
3. **Quality**: Redundant state, copy-paste variations, leaky abstractions, stringly-typed code, unnecessary nesting, narrative comments.
|
|
995
|
-
4. **Efficiency**: N+1 patterns, redundant computations, hot-path bloat, missing cleanup, unbounded data structures.
|
|
996
|
-
5. **Consistency**: Do all changes follow the project's existing patterns, conventions, and design system?
|
|
997
|
-
6. **Build and test**: Run the build and any existing tests. Fix any breakage.`;
|
|
998
|
-
const close = scope === "wave"
|
|
999
|
-
? "Fix issues directly. Delete and simplify rather than add. If the code is already clean, skip."
|
|
1000
|
-
: "Fix issues directly. Delete and simplify. If the codebase is clean and the build passes, say so.";
|
|
1001
1084
|
return `${scopeLine}
|
|
1002
1085
|
|
|
1003
|
-
|
|
1004
|
-
|
|
1005
|
-
${checks}
|
|
1006
|
-
|
|
1007
|
-
${close}
|
|
1008
|
-
|
|
1009
|
-
No need to explain your changes -- just fix them.`;
|
|
1086
|
+
Invoke the \`simplify\` skill to review changed code for reuse, quality, and efficiency, then fix any issues found.`;
|
|
1010
1087
|
}
|
|
1011
1088
|
async function runReview(opts, scope, objective, onSwarm) {
|
|
1012
1089
|
const swarm = new Swarm({
|
|
@@ -1062,24 +1139,45 @@ async function promptBudgetExtension(ctx) {
|
|
|
1062
1139
|
return suggested;
|
|
1063
1140
|
return n;
|
|
1064
1141
|
}
|
|
1142
|
+
/** Detect build errors and return one or more heal tasks. If errors span ≥2 files,
|
|
1143
|
+
* emit one task per file so they heal in parallel without merge conflicts. */
|
|
1065
1144
|
function checkProjectHealth(cwd) {
|
|
1066
1145
|
const cmd = detectHealthCommand(cwd);
|
|
1067
1146
|
if (!cmd)
|
|
1068
|
-
return
|
|
1147
|
+
return [];
|
|
1069
1148
|
try {
|
|
1070
1149
|
execSync(cmd, { cwd, encoding: "utf-8", stdio: "pipe", timeout: 60_000 });
|
|
1071
|
-
return
|
|
1150
|
+
return [];
|
|
1072
1151
|
}
|
|
1073
1152
|
catch (err) {
|
|
1074
1153
|
if (err.killed)
|
|
1075
|
-
return
|
|
1154
|
+
return [];
|
|
1076
1155
|
const output = ((err.stdout || "") + "\n" + (err.stderr || "")).trim();
|
|
1077
1156
|
const trimmed = output.length > 4000 ? output.slice(0, 2000) + "\n…\n" + output.slice(-2000) : output;
|
|
1078
|
-
|
|
1079
|
-
|
|
1080
|
-
|
|
1081
|
-
|
|
1082
|
-
|
|
1157
|
+
// B4: Split heal by file — extract distinct source file paths from errors
|
|
1158
|
+
const fileRe = /\/src\/[\w./-]+\.(ts|tsx|js|jsx)/g;
|
|
1159
|
+
const files = new Set();
|
|
1160
|
+
for (const m of trimmed.matchAll(fileRe))
|
|
1161
|
+
files.add(m[0]);
|
|
1162
|
+
if (files.size >= 2) {
|
|
1163
|
+
// One task per file — each agent gets only that file's error context
|
|
1164
|
+
const fileErrors = new Map();
|
|
1165
|
+
for (const f of files) {
|
|
1166
|
+
// Extract lines mentioning this file
|
|
1167
|
+
const lines = trimmed.split("\n").filter(l => l.includes(f));
|
|
1168
|
+
fileErrors.set(f, lines.slice(0, 30).join("\n"));
|
|
1169
|
+
}
|
|
1170
|
+
return Array.from(fileErrors.entries()).map(([file, errs], i) => ({
|
|
1171
|
+
id: `heal-${i}`,
|
|
1172
|
+
prompt: `Fix the broken build errors in \`${file}\`. \`${cmd}\` fails:\n\`\`\`\n${errs}\n\`\`\`\nFix every error in this file. Run \`${cmd}\` when done to verify.`,
|
|
1173
|
+
type: "heal",
|
|
1174
|
+
}));
|
|
1175
|
+
}
|
|
1176
|
+
return [{
|
|
1177
|
+
id: "heal-0",
|
|
1178
|
+
prompt: `Fix the broken build. \`${cmd}\` fails after merging parallel work:\n\`\`\`\n${trimmed}\n\`\`\`\nFix every error. Run \`${cmd}\` when done to verify.`,
|
|
1179
|
+
type: "heal",
|
|
1180
|
+
}];
|
|
1083
1181
|
}
|
|
1084
1182
|
}
|
|
1085
1183
|
function detectHealthCommand(cwd) {
|
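The B4 split in `checkProjectHealth` extracts distinct `/src/...` paths from the build output and gives each heal task only the lines that mention its file. The sketch below reproduces that grouping. `groupErrorsByFile` and the sample output are hypothetical. One deliberate difference: the sketch lists `tsx` and `jsx` before `ts` and `js` in the alternation, because with the order used in the diff (`ts|tsx|js|jsx`) a path such as `model.tsx` is matched only up to `model.ts`.

```javascript
// Variant of the B4 pattern with longer extensions listed first (an assumption
// of this sketch; the diff uses the order ts|tsx|js|jsx).
const fileRe = /\/src\/[\w./-]+\.(tsx|ts|jsx|js)/g;

// Hypothetical helper: group build-output lines by the source file they mention.
function groupErrorsByFile(output, maxLines = 30) {
  const files = new Set();
  for (const m of output.matchAll(fileRe)) files.add(m[0]);
  const grouped = new Map();
  for (const f of files) {
    const lines = output.split("\n").filter(l => l.includes(f));
    grouped.set(f, lines.slice(0, maxLines).join("\n"));
  }
  return grouped;
}

// Made-up compiler output for illustration.
const buildOutput = [
  "/repo/src/auth.ts(3,1): error TS2304: Cannot find name 'foo'.",
  "/repo/src/user/model.tsx(9,5): error TS2322: Type mismatch.",
  "/repo/src/auth.ts(8,2): error TS1005: ';' expected.",
].join("\n");

const grouped = groupErrorsByFile(buildOutput);
console.log([...grouped.keys()]); // prints [ '/src/auth.ts', '/src/user/model.tsx' ]
```

The pattern requires a slash before `src`, so compilers that print relative paths such as `src/auth.ts` produce no matches, and the single `heal-0` task is used instead.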
package/dist/settings.js
CHANGED
|
@@ -25,12 +25,12 @@ export async function editRunSettings(options) {
|
|
|
25
25
|
s.workerModel = workerPick.model;
|
|
26
26
|
s.workerProviderId = workerPick.providerId;
|
|
27
27
|
const suggestFast = !!(options.defaults?.fastModel);
|
|
28
|
-
const fastChoice = await select(`${chalk.cyan("③")} Fast model ${chalk.dim("(optional -- Haiku/Qwen for
|
|
29
|
-
{ name: "Skip", value: "skip", hint: "
|
|
30
|
-
{ name: "Pick a fast
|
|
28
|
+
const fastChoice = await select(`${chalk.cyan("③")} Fast worker model ${chalk.dim("(optional -- Haiku/Qwen for well-scoped tasks, checked by next wave's workers)")}:`, [
|
|
29
|
+
{ name: "Skip", value: "skip", hint: "single-worker mode (main worker handles everything)" },
|
|
30
|
+
{ name: "Pick a fast worker", value: "pick", hint: "Haiku, Qwen, or any provider -- a cheaper, faster second worker" },
|
|
31
31
|
], suggestFast ? 1 : 0);
|
|
32
32
|
if (fastChoice === "pick") {
|
|
33
|
-
const fastPick = await pickModel(`${chalk.cyan("③b")} Fast model:`, models, options.defaults?.fastModel ?? s.fastModel);
|
|
33
|
+
const fastPick = await pickModel(`${chalk.cyan("③b")} Fast worker model:`, models, options.defaults?.fastModel ?? s.fastModel);
|
|
34
34
|
s.fastModel = fastPick.model;
|
|
35
35
|
s.fastProviderId = fastPick.providerId;
|
|
36
36
|
}
|
package/dist/state.d.ts
CHANGED
|
@@ -72,6 +72,6 @@ export declare function recordBranches(agents: {
|
|
|
72
72
|
}[], mergeResults: {
|
|
73
73
|
branch: string;
|
|
74
74
|
ok: boolean;
|
|
75
|
-
}[], branches: BranchRecord[]): void;
|
|
75
|
+
}[], branches: BranchRecord[], currentWave?: number): void;
|
|
76
76
|
export declare function autoMergeBranches(cwd: string, branches: BranchRecord[], onLog: (msg: string) => void): void;
|
|
77
77
|
export declare function archiveMilestone(baseDir: string, waveNum: number): void;
|
package/dist/state.js
CHANGED
|
@@ -461,7 +461,7 @@ export function loadWaveHistory(runDir) {
|
|
|
461
461
|
}
|
|
462
462
|
}
|
|
463
463
|
// ── Branch management ──
|
|
464
|
-
export function recordBranches(agents, mergeResults, branches) {
|
|
464
|
+
export function recordBranches(agents, mergeResults, branches, currentWave) {
|
|
465
465
|
for (const a of agents) {
|
|
466
466
|
if (a.branch) {
|
|
467
467
|
branches.push({
|
|
@@ -475,8 +475,12 @@ export function recordBranches(agents, mergeResults, branches) {
|
|
|
475
475
|
}
|
|
476
476
|
for (const mr of mergeResults) {
|
|
477
477
|
const br = branches.find(b => b.branch === mr.branch);
|
|
478
|
-
if (br)
|
|
478
|
+
if (br) {
|
|
479
479
|
br.status = mr.ok ? "merged" : "merge-failed";
|
|
480
|
+
if (!mr.ok && !br.firstFailedWave && currentWave !== undefined) {
|
|
481
|
+
br.firstFailedWave = currentWave;
|
|
482
|
+
}
|
|
483
|
+
}
|
|
480
484
|
}
|
|
481
485
|
}
|
|
482
486
|
export function autoMergeBranches(cwd, branches, onLog) {
|
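The change to `recordBranches` stamps the wave in which a branch first failed to merge, which the ghost-branch GC later reads. A minimal sketch of that stamping follows. `markMergeResult` is a hypothetical name. The sketch checks `=== undefined` rather than the diff's `!br.firstFailedWave`, since a falsy test treats a stored wave `0` as unset; that difference would only matter if the same branch were recorded again in a later wave.

```javascript
// Hypothetical sketch: record merge status and remember the first failing wave.
// An explicit undefined check keeps a stored wave 0 from being overwritten.
function markMergeResult(record, ok, currentWave) {
  record.status = ok ? "merged" : "merge-failed";
  if (!ok && record.firstFailedWave === undefined && currentWave !== undefined) {
    record.firstFailedWave = currentWave;
  }
  return record;
}

const branchRecord = { branch: "agent-1" };
markMergeResult(branchRecord, false, 0);
markMergeResult(branchRecord, false, 3);
console.log(branchRecord.firstFailedWave); // prints 0
```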
package/dist/steering.d.ts
CHANGED
|
@@ -1,3 +1,52 @@
|
|
|
1
1
|
import type { PermMode, SteerResult, RunMemory, WaveSummary } from "./types.js";
|
|
2
2
|
import { type PlannerLog } from "./planner-query.js";
|
|
3
|
+
export declare const STEER_SCHEMA: {
|
|
4
|
+
type: "json_schema";
|
|
5
|
+
schema: {
|
|
6
|
+
type: string;
|
|
7
|
+
properties: {
|
|
8
|
+
done: {
|
|
9
|
+
type: string;
|
|
10
|
+
};
|
|
11
|
+
reasoning: {
|
|
12
|
+
type: string;
|
|
13
|
+
};
|
|
14
|
+
statusUpdate: {
|
|
15
|
+
type: string;
|
|
16
|
+
};
|
|
17
|
+
goalUpdate: {
|
|
18
|
+
type: string;
|
|
19
|
+
};
|
|
20
|
+
estimatedSessionsRemaining: {
|
|
21
|
+
type: string;
|
|
22
|
+
};
|
|
23
|
+
tasks: {
|
|
24
|
+
type: string;
|
|
25
|
+
items: {
|
|
26
|
+
type: string;
|
|
27
|
+
properties: {
|
|
28
|
+
prompt: {
|
|
29
|
+
type: string;
|
|
30
|
+
};
|
|
31
|
+
model: {
|
|
32
|
+
type: string;
|
|
33
|
+
};
|
|
34
|
+
noWorktree: {
|
|
35
|
+
type: string;
|
|
36
|
+
};
|
|
37
|
+
type: {
|
|
38
|
+
type: string;
|
|
39
|
+
enum: string[];
|
|
40
|
+
};
|
|
41
|
+
postcondition: {
|
|
42
|
+
type: string;
|
|
43
|
+
};
|
|
44
|
+
};
|
|
45
|
+
required: string[];
|
|
46
|
+
};
|
|
47
|
+
};
|
|
48
|
+
};
|
|
49
|
+
required: string[];
|
|
50
|
+
};
|
|
51
|
+
};
|
|
3
52
|
export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, fastModel: string | undefined, permissionMode: PermMode, concurrency: number, onLog: PlannerLog, runMemory?: RunMemory, transcriptName?: string): Promise<SteerResult>;
|
package/dist/steering.js
CHANGED
|
@@ -2,7 +2,10 @@ import { runPlannerQuery, attemptJsonParse, postProcess } from "./planner-query.
|
|
|
2
2
|
import { contextConstraintNote } from "./models.js";
|
|
3
3
|
import { DESIGN_THINKING } from "./planner.js";
|
|
4
4
|
import { createTurn, beginTurn, endTurn } from "./turns.js";
|
|
5
|
-
|
|
5
|
+
import { writeFileSync, mkdirSync } from "fs";
|
|
6
|
+
import { join } from "path";
|
|
7
|
+
import { getTranscriptRunDir } from "./transcripts.js";
|
|
8
|
+
export const STEER_SCHEMA = {
|
|
6
9
|
type: "json_schema",
|
|
7
10
|
schema: {
|
|
8
11
|
type: "object",
|
|
@@ -24,10 +27,11 @@ const STEER_SCHEMA = {
|
|
|
24
27
|
required: ["done", "tasks", "reasoning", "statusUpdate", "estimatedSessionsRemaining"],
|
|
25
28
|
},
|
|
26
29
|
};
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
const
|
|
30
|
+
const PROMPT_BUDGET = 6000;
|
|
31
|
+
/** Build a compact wave summary; keepLast controls how many recent waves to include. */
|
|
32
|
+
function buildRecentText(history, keepLast) {
|
|
33
|
+
const recentWaves = history.slice(-keepLast);
|
|
34
|
+
return recentWaves.length > 0 ? recentWaves.map(w => {
|
|
31
35
|
const lines = w.tasks.map(t => {
|
|
32
36
|
const isExecute = !t.type || t.type === "execute";
|
|
33
37
|
const files = t.filesChanged ? ` (${t.filesChanged} files)` : isExecute ? " (0 files)" : " (read-only)";
|
|
@@ -39,16 +43,25 @@ export async function steerWave(objective, history, remainingBudget, cwd, planne
|
|
|
39
43
|
const warn = totalExecute > 0 && zeroExecute > totalExecute / 2 ? `\n ⚠ ${zeroExecute}/${totalExecute} execute tasks changed 0 files -- tasks may be mis-scoped or blocked` : "";
|
|
40
44
|
return `Wave ${w.wave + 1}:\n${lines}${warn}`;
|
|
41
45
|
}).join("\n\n") : "(first wave)";
|
|
46
|
+
}
|
|
47
|
+
export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory, transcriptName = "steer") {
|
|
48
|
+
const constraint = contextConstraintNote(workerModel);
|
|
42
49
|
const cap = (s, max) => s.length > max ? s.slice(0, max) + "\n...(truncated)" : s;
|
|
43
50
|
const statusBlock = runMemory?.status ? `\nCurrent project status:\n${runMemory.status}\n` : "";
|
|
44
|
-
const milestoneBlock = runMemory?.milestones ? `\nMilestone snapshots:\n${cap(runMemory.milestones,
|
|
45
|
-
const designBlock = runMemory?.designs ? `\nArchitectural research:\n${cap(runMemory.designs,
|
|
46
|
-
const reflectionBlock = runMemory?.reflections ? `\nLatest quality reports:\n${cap(runMemory.reflections,
|
|
47
|
-
const verificationBlock = runMemory?.verifications ? `\nVerification results (from actually running the app):\n${cap(runMemory.verifications,
|
|
51
|
+
const milestoneBlock = runMemory?.milestones ? `\nMilestone snapshots:\n${cap(runMemory.milestones, 2000)}\n` : "";
|
|
52
|
+
const designBlock = runMemory?.designs ? `\nArchitectural research:\n${cap(runMemory.designs, 1500)}\n` : "";
|
|
53
|
+
const reflectionBlock = runMemory?.reflections ? `\nLatest quality reports:\n${cap(runMemory.reflections, 1000)}\n` : "";
|
|
54
|
+
const verificationBlock = runMemory?.verifications ? `\nVerification results (from actually running the app):\n${cap(runMemory.verifications, 1000)}\n` : "";
|
|
48
55
|
const goalBlock = runMemory?.goal ? `\nNorth star -- what "amazing" means:\n${runMemory.goal}\n` : "";
|
|
49
|
-
const prevRunBlock = runMemory?.previousRuns ? `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns,
|
|
56
|
+
const prevRunBlock = runMemory?.previousRuns ? `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns, 800)}\n` : "";
|
|
50
57
|
const guidanceBlock = runMemory?.userGuidance ? `\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\nUSER DIRECTIVES -- highest priority\nThese come directly from the user running this session. They override prior assumptions about status, goal, and next steps. Incorporate them into the wave you compose below. If they conflict with earlier decisions, the user wins. Reflect the new direction in statusUpdate so future waves remember.\n\n${cap(runMemory.userGuidance, 4000)}\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n` : "";
|
|
51
|
-
|
|
58
|
+
// Collapse archetype menu after wave 3 to save ~2 KB
|
|
59
|
+
const archetypesShort = `Archetypes: execute | explore | critique | synthesize | verify | user-test | polish | simplify`;
|
|
60
|
+
const archetypeBlock = history.length >= 3
|
|
61
|
+
? archetypesShort
|
|
62
|
+
: null;
|
|
63
|
+
let recentText = buildRecentText(history, 3);
|
|
64
|
+
let prompt = `You are the quality director for an autonomous multi-wave agent system. Your job is to push the work toward "amazing," not just "done."
|
|
52
65
|
${guidanceBlock}
|
|
53
66
|
Objective: ${objective}
|
|
54
67
|
${goalBlock}${statusBlock}${milestoneBlock}${prevRunBlock}
|
|
@@ -66,7 +79,7 @@ If verification found issues, those are the priority. Fix what's broken before b
|
|
|
66
79
|
|
|
67
80
|
## Compose the next wave
|
|
68
81
|
|
|
69
|
-
You have full creative freedom. Design the wave that will have the highest impact right now
|
|
82
|
+
You have full creative freedom. Design the wave that will have the highest impact right now.${archetypeBlock ? `\n\nUse these archetypes as shorthand — mix, adapt, or invent your own:\n\n${archetypeBlock}` : ` Here are archetypes to draw from -- mix, adapt, or invent your own:
|
|
70
83
|
|
|
71
84
|
**Execute** -- Agents implement concrete changes in parallel. Each touches different files. The bread and butter.
|
|
72
85
|
Example: 5 agents each owning a different feature or fix
|
|
@@ -89,47 +102,87 @@ You have full creative freedom. Design the wave that will have the highest impac
|
|
|
89
102
|
**Polish** -- Agents focus purely on feel: loading states, error messages, micro-interactions, empty states, responsiveness. Not features -- the texture that makes users trust the product.
|
|
90
103
|
Example: 2 agents, one on happy paths, one on error/edge states
|
|
91
104
|
|
|
92
|
-
**Simplify** --
|
|
93
|
-
Example: 1
|
|
94
|
-
|
|
95
|
-
You can combine these. A wave can have 3 execute agents + 1 verification agent. Or 2 divergent explorers. Whatever the situation calls for.
|
|
105
|
+
**Simplify** -- Invoke the 'simplify' skill. It reviews changed code and spawns parallel sub-agents for thorough review.
|
|
106
|
+
Example: 1 agent per wave with task type "review", let the skill handle the rest`}
|
|
96
107
|
|
|
97
|
-
For non-execute tasks (critique, verify, user-test, synthesize), tell agents to write their output to files in the run directory so findings persist for future waves. Use paths like: .claude-overnight/latest/reflections/wave-
|
|
108
|
+
For non-execute tasks (critique, verify, user-test, synthesize), tell agents to write their output to files in the run directory so findings persist for future waves. Use paths like: .claude-overnight/latest/reflections/wave-n-{topic}.md or .claude-overnight/latest/verifications/wave-n-{topic}.md.
|
|
98
109
|
|
|
99
110
|
IMPORTANT: You cannot declare "done" unless at least one verification has confirmed the app works. If you're considering done but haven't verified, compose a verification task first.
|
|
100
111
|
|
|
101
112
|
Respond with ONLY a JSON object (no markdown fences):
|
|
102
|
-
{
|
|
103
|
-
"done": false,
|
|
104
|
-
"reasoning": "your assessment and why you chose this wave composition",
|
|
105
|
-
"goalUpdate": "optional -- refine what 'amazing' means as you learn more",
|
|
106
|
-
"statusUpdate": "REQUIRED -- concise project status: what's built, what works, what's rough, quality level, key gaps. This replaces the previous status.",
|
|
107
|
-
"estimatedSessionsRemaining": 15,
|
|
108
|
-
"tasks": [
|
|
109
|
-
{"prompt": "task instruction...", "model": "worker", "postcondition": "test -f src/new-file.ts"},
|
|
110
|
-
{"prompt": "quick icon fix, verified by worker next wave...", "model": "fast"},
|
|
111
|
-
{"prompt": "verify the app end-to-end...", "model": "worker", "noWorktree": true}
|
|
112
|
-
]
|
|
113
|
-
}
|
|
113
|
+
{"done":boolean,"reasoning":"...","statusUpdate":"REQUIRED","estimatedSessionsRemaining":N,"tasks":[{"prompt":"...","model":"worker|fast","noWorktree":true/false,"postcondition":"..."}]}
|
|
114
114
|
|
|
115
115
|
"estimatedSessionsRemaining" is REQUIRED. Your best honest estimate of how many MORE agent sessions (beyond the wave you just composed above) are needed to reach 'amazing' -- include follow-up fixes, polish, verification, and anything else you'd want before shipping. Be realistic, not optimistic. Use 0 only if truly done.
|
|
116
116
|
|
|
117
|
-
The "model" field on each task —
|
|
117
|
+
The "model" field on each task — two kinds of workers. Pick the right one:
|
|
118
|
+
|
|
119
|
+
**Fast worker — "fast" (${fastModel ?? "not set"})** for well-scoped, mechanical tasks: single-file edits, refactors, renames, read/research, build checks, simple critiques, docs updates.
|
|
118
120
|
|
|
119
|
-
**
|
|
120
|
-
- Single-file edits, refactors, renames
|
|
121
|
-
- Read/research: scan files, summarize findings
|
|
122
|
-
- Build checks, postcondition verification
|
|
123
|
-
- E2E test runs with concrete steps
|
|
124
|
-
- Simple critiques, polish tweaks
|
|
121
|
+
**Main worker — "worker" (${workerModel})** for tasks that need deeper reasoning: multi-file features, complex logic, architectural changes, ambiguous specs.
|
|
125
122
|
|
|
126
|
-
|
|
123
|
+
When in doubt, pick "fast".
|
|
127
124
|
|
|
128
|
-
Set "noWorktree": true for verify/user-test tasks
|
|
125
|
+
Set "noWorktree": true for verify/user-test tasks.
|
|
129
126
|
|
|
130
|
-
OPTIONAL "postcondition": a single shell one-liner that exits 0 when the task is truly done.
|
|
127
|
+
OPTIONAL "postcondition": a single shell one-liner that exits 0 when the task is truly done. Keep it cheap. Omit for exploratory tasks.
|
|
131
128
|
|
|
132
|
-
If done: {"done":
|
|
129
|
+
If done: {"done":true,"reasoning":"...","statusUpdate":"...","estimatedSessionsRemaining":0,"tasks":[]}`;
|
|
130
|
+
// ── Hard 6 KB budget: trim non-critical blocks if over limit ──
|
|
131
|
+
let trimmed = 0;
|
|
132
|
+
if (prompt.length > PROMPT_BUDGET) {
|
|
133
|
+
// 1. Keep last 2 waves instead of 3
|
|
134
|
+
recentText = buildRecentText(history, 2);
|
|
135
|
+
prompt = prompt.replace(`Recent waves:\n${buildRecentText(history, 3)}`, `Recent waves:\n${recentText}`);
|
|
136
|
+
trimmed++;
|
|
137
|
+
}
|
|
138
|
+
if (prompt.length > PROMPT_BUDGET && runMemory?.milestones) {
|
|
139
|
+
const old = `\nMilestone snapshots:\n${cap(runMemory.milestones, 2000)}\n`;
|
|
140
|
+
const neu = `\nMilestone snapshots:\n${cap(runMemory.milestones, 1000)}\n`;
|
|
141
|
+
if (old !== neu) {
|
|
142
|
+
prompt = prompt.replace(old, neu);
|
|
143
|
+
trimmed++;
|
|
144
|
+
}
|
|
145
|
+
}
|
|
146
|
+
if (prompt.length > PROMPT_BUDGET && runMemory?.designs) {
|
|
147
|
+
const old = `\nArchitectural research:\n${cap(runMemory.designs, 1500)}\n`;
|
|
148
|
+
const neu = `\nArchitectural research:\n${cap(runMemory.designs, 1000)}\n`;
|
|
149
|
+
if (old !== neu) {
|
|
150
|
+
prompt = prompt.replace(old, neu);
|
|
151
|
+
trimmed++;
|
|
152
|
+
}
|
|
153
|
+
}
|
|
154
|
+
if (prompt.length > PROMPT_BUDGET && runMemory?.reflections) {
|
|
155
|
+
const old = `\nLatest quality reports:\n${cap(runMemory.reflections, 1000)}\n`;
|
|
156
|
+
const neu = `\nLatest quality reports:\n${cap(runMemory.reflections, 500)}\n`;
|
|
157
|
+
if (old !== neu) {
|
|
158
|
+
prompt = prompt.replace(old, neu);
|
|
159
|
+
trimmed++;
|
|
160
|
+
}
|
|
161
|
+
}
|
|
162
|
+
if (prompt.length > PROMPT_BUDGET && runMemory?.verifications) {
|
|
163
|
+
const old = `\nVerification results (from actually running the app):\n${cap(runMemory.verifications, 1000)}\n`;
|
|
164
|
+
const neu = `\nVerification results (from actually running the app):\n${cap(runMemory.verifications, 500)}\n`;
|
|
165
|
+
if (old !== neu) {
|
|
166
|
+
prompt = prompt.replace(old, neu);
|
|
167
|
+
trimmed++;
|
|
168
|
+
}
|
|
169
|
+
}
|
|
170
|
+
if (prompt.length > PROMPT_BUDGET && runMemory?.previousRuns) {
|
|
171
|
+
const old = `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns, 800)}\n`;
|
|
172
|
+
const neu = `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns, 400)}\n`;
|
|
173
|
+
if (old !== neu) {
|
|
174
|
+
prompt = prompt.replace(old, neu);
|
|
175
|
+
trimmed++;
|
|
176
|
+
}
|
|
177
|
+
}
|
|
178
|
+
if (trimmed > 0) {
|
|
179
|
+
onLog(`Steering prompt trimmed ${trimmed} blocks (${prompt.length}/${PROMPT_BUDGET} chars)`, "event");
|
|
180
|
+
}
|
|
181
|
+
// ── Non-Claude planner JSON hardening ──
|
|
182
|
+
if (!/^claude/i.test(plannerModel)) {
|
|
183
|
+
const directive = `OUTPUT: single JSON object. No prose. No markdown fences.`;
|
|
184
|
+
prompt = `${directive}\n\n${prompt}\n\n${directive}`;
|
|
185
|
+
}
|
|
133
186
|
onLog("Assessing...", "status");
|
|
134
187
|
onLog(`Reading codebase -- wave ${history.length + 1}`, "event");
|
|
135
188
|
const turn = createTurn("steer", `Steer wave ${history.length + 1}`, `steer-${history.length}`, plannerModel);
|
|
@@ -140,11 +193,34 @@ If done: {"done": true, "reasoning": "...", "statusUpdate": "...", "estimatedSes
|
|
|
140
193
|
if (first)
|
|
141
194
|
return first;
|
|
142
195
|
onLog(`Steering parse failed (${resultText.length} chars). Asking model to fix...`, "event");
|
|
196
|
+
// C2: persist raw output on parse failure
|
|
197
|
+
const steerDir = getTranscriptRunDir() ? join(getTranscriptRunDir(), "steering") : undefined;
|
|
198
|
+
if (steerDir) {
|
|
199
|
+
try {
|
|
200
|
+
mkdirSync(steerDir, { recursive: true });
|
|
201
|
+
}
|
|
202
|
+
catch { }
|
|
203
|
+
// Extract wave info from transcriptName (e.g. "steer-wave-32-attempt-1")
|
|
204
|
+
const waveMatch = transcriptName.match(/wave-(\d+)-attempt-(\d+)/);
|
|
205
|
+
if (waveMatch) {
|
|
206
|
+
writeFileSync(join(steerDir, `wave-${waveMatch[1]}-attempt-${waveMatch[2]}-raw.txt`), resultText, "utf-8");
|
|
207
|
+
}
|
|
208
|
+
}
|
|
143
209
|
const snippet = resultText.length > 2000 ? resultText.slice(0, 1000) + "\n...\n" + resultText.slice(-800) : resultText;
|
|
144
210
|
const retryText = await runPlannerQuery(`Your previous steering response could not be parsed as JSON. Here is what you returned:\n\n---\n${snippet}\n---\n\nExtract or rewrite the above as ONLY a valid JSON object with this schema: {"done":boolean,"reasoning":"...","statusUpdate":"...","tasks":[{"prompt":"..."}]}\n\nRespond with ONLY the JSON, no markdown fences, no explanation.`, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName: `${transcriptName}-retry`, turnId: turn.id }, onLog);
|
|
145
211
|
const retryParsed = attemptJsonParse(retryText);
|
|
146
212
|
if (retryParsed)
|
|
147
213
|
return retryParsed;
|
|
214
|
+
// C2: persist retry raw output
|
|
215
|
+
if (steerDir) {
|
|
216
|
+
try {
|
|
217
|
+
const waveMatch2 = transcriptName.match(/wave-(\d+)-attempt-(\d+)/);
|
|
218
|
+
if (waveMatch2) {
|
|
219
|
+
writeFileSync(join(steerDir, `wave-${waveMatch2[1]}-attempt-${waveMatch2[2]}-retry-raw.txt`), retryText, "utf-8");
|
|
220
|
+
}
|
|
221
|
+
}
|
|
222
|
+
catch { }
|
|
223
|
+
}
|
|
148
224
|
throw new Error(`Could not parse steering response after retry (${resultText.length} chars: ${resultText.slice(0, 120)}...)`);
|
|
149
225
|
})();
|
|
150
226
|
const isDone = parsed.done === true;
|
package/dist/swarm.js
CHANGED
|
@@ -29,29 +29,9 @@ function withCursorWorkspaceHeader(env, cwd) {
|
|
|
29
29
|
}
|
|
30
30
|
import { getModelCapability } from "./models.js";
|
|
31
31
|
import { createTurn, beginTurn, endTurn, updateTurn } from "./turns.js";
|
|
32
|
-
const SIMPLIFY_PROMPT = `You just finished your task.
|
|
32
|
+
const SIMPLIFY_PROMPT = `You just finished your task. Review and simplify your changes.
|
|
33
33
|
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
1. **Reuse**: Search the codebase -- did you write something that already exists? Use existing utilities, helpers, patterns instead. Hand-rolled string manipulation, manual path handling, custom env checks, ad-hoc type guards -- all candidates for existing utilities.
|
|
37
|
-
|
|
38
|
-
2. **Quality**:
|
|
39
|
-
- Redundant state: cached values that could be derived, observers that could be direct calls
|
|
40
|
-
- Copy-paste with slight variation: near-duplicate blocks that should be unified
|
|
41
|
-
- Leaky abstractions: exposing internals or breaking existing abstraction boundaries
|
|
42
|
-
- Stringly-typed code: raw strings where enums/unions already exist
|
|
43
|
-
- Unnecessary JSX nesting: wrappers that add no layout value
|
|
44
|
-
- Comments narrating WHAT the code does -- delete them; keep only non-obvious WHY
|
|
45
|
-
|
|
46
|
-
3. **Efficiency**:
|
|
47
|
-
- Redundant computations, repeated file reads, duplicate API calls
|
|
48
|
-
- Sequential operations that could be parallel
|
|
49
|
-
- Hot-path bloat: new blocking work in startup or per-request paths
|
|
50
|
-
- Recurring no-op updates: state/store updates inside polling loops that fire unconditionally -- add change-detection guard
|
|
51
|
-
- Unnecessary existence checks before operating (TOCTOU anti-pattern)
|
|
52
|
-
- Memory: unbounded data structures, missing cleanup, event listener leaks
|
|
53
|
-
|
|
54
|
-
Less code is better. Delete and simplify rather than add. Fix directly -- no need to explain.`;
|
|
34
|
+
Invoke the \`simplify\` skill to review your changes for reuse, quality, and efficiency, then fix any issues found.`;
|
|
55
35
|
export class Swarm {
|
|
56
36
|
agents = [];
|
|
57
37
|
logs = [];
|
package/dist/transcripts.d.ts
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
export declare function setTranscriptRunDir(dir: string | undefined): void;
|
|
2
2
|
export declare function getTranscriptRunDir(): string | undefined;
|
|
3
3
|
export declare function transcriptPath(name: string): string | undefined;
|
|
4
|
-
/** Append a single event;
|
|
4
|
+
/** Append a single event; log to stderr once per name on failure (C5). */
|
|
5
5
|
export declare function writeTranscriptEvent(name: string, event: Record<string, unknown>): void;
|
package/dist/transcripts.js
CHANGED
|
@@ -25,7 +25,9 @@ export function getTranscriptRunDir() {
|
|
|
25
25
|
export function transcriptPath(name) {
|
|
26
26
|
return _runDir ? join(_runDir, "transcripts", `${name}.ndjson`) : undefined;
|
|
27
27
|
}
|
|
28
|
-
/**
|
|
28
|
+
/** Names that already errored — guard against repeated stderr spam. */
|
|
29
|
+
const _seenErrors = new Set();
|
|
30
|
+
/** Append a single event; log to stderr once per name on failure (C5). */
|
|
29
31
|
export function writeTranscriptEvent(name, event) {
|
|
30
32
|
const path = transcriptPath(name);
|
|
31
33
|
if (!path)
|
|
@@ -34,5 +36,11 @@ export function writeTranscriptEvent(name, event) {
|
|
|
34
36
|
mkdirSync(dirname(path), { recursive: true });
|
|
35
37
|
appendFileSync(path, JSON.stringify({ t: Date.now(), ...event }) + "\n", "utf-8");
|
|
36
38
|
}
|
|
37
|
-
catch {
|
|
39
|
+
catch (err) {
|
|
40
|
+
if (!_seenErrors.has(name)) {
|
|
41
|
+
_seenErrors.add(name);
|
|
42
|
+
const msg = err instanceof Error ? err.message : String(err);
|
|
43
|
+
process.stderr.write(`[transcript] writeTranscriptEvent("${name}") failed: ${msg}\n`);
|
|
44
|
+
}
|
|
45
|
+
}
|
|
38
46
|
}
|
package/dist/types.d.ts
CHANGED
|
@@ -156,9 +156,10 @@ export type MergeStrategy = "yolo" | "branch";
|
|
|
156
156
|
export interface BranchRecord {
|
|
157
157
|
branch: string;
|
|
158
158
|
taskPrompt: string;
|
|
159
|
-
status: "merged" | "unmerged" | "failed" | "merge-failed";
|
|
159
|
+
status: "merged" | "unmerged" | "failed" | "merge-failed" | "discarded";
|
|
160
160
|
filesChanged: number;
|
|
161
161
|
costUsd: number;
|
|
162
|
+
firstFailedWave?: number;
|
|
162
163
|
}
|
|
163
164
|
/** Per-window rate limit snapshot (matches SDK rateLimitType). */
|
|
164
165
|
export interface RateLimitWindow {
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "claude-overnight",
|
|
3
|
-
"version": "1.25.
|
|
3
|
+
"version": "1.25.43",
|
|
4
4
|
"description": "Parallel Claude agents in git worktrees with a usage cap that reserves headroom for your interactive Claude Code. Crash-safe resume. Provider-agnostic model catalog (Anthropic, Cursor, OpenAI, Gemini, DeepSeek, Llama, Qwen) with capability-based task scoping.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "claude-overnight",
|
|
3
|
-
"version": "1.25.
|
|
3
|
+
"version": "1.25.43",
|
|
4
4
|
"description": "Claude Code skill for understanding, installing, and inspecting claude-overnight runs -- parallel Claude agents in git worktrees with thinking waves, multi-wave steering, and crash-safe resume. Supports Cursor API Proxy, Qwen, OpenRouter.",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Francesco Fornace"
|
|
@@ -11,7 +11,7 @@ description: >
|
|
|
11
11
|
|
|
12
12
|
# What it is
|
|
13
13
|
|
|
14
|
-
`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast**
|
|
14
|
+
`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **main worker** (runs the tasks), and an optional **fast worker** (a cheaper/faster second worker for well-scoped tasks, verified by the next wave's workers). A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, worker waves run them in parallel, and steering decides between more work, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable -- nothing is lost.
|
|
15
15
|
|
|
16
16
|
**Three-layer review system** runs on every wave:
|
|
17
17
|
1. **Per-agent self-review** -- after each agent finishes, the same session continues via SDK session resume (continue mechanism) with a follow-up prompt to review and simplify its own `git diff`. The agent's full context stays warm -- no initial context bloat.
|
|
@@ -55,7 +55,7 @@ Every run lives at `<repo>/.claude-overnight/runs/<ISO-timestamp>/`:
|
|
|
55
55
|
|
|
56
56
|
| File / dir | What it tells you |
|
|
57
57
|
|----------------------|-----------------------------------------------------------------------------------|
|
|
58
|
-
| `run.json` | Machine state: objective, planner/worker/fast models, budget, cost, waves done, branches, done flag. |
|
|
58
|
+
| `run.json` | Machine state: objective, planner/main-worker/fast-worker models, budget, cost, waves done, branches, done flag. |
|
|
59
59
|
| `status.md` | **Living project snapshot**, rewritten by steering every wave. First line = short status. |
|
|
60
60
|
| `goal.md` | Evolving "north star" -- what the run currently thinks "amazing" means. |
|
|
61
61
|
| `themes.md` | The thinking-wave research angles picked for this objective (human-readable). |
|
|
@@ -2,8 +2,9 @@
|
|
|
2
2
|
name: claude-overnight-coach
|
|
3
3
|
description: >
|
|
4
4
|
Setup coach for claude-overnight. Turns a raw user objective into a ready
|
|
5
|
-
objective plus recommended run settings (budget, concurrency, planner/
|
|
6
|
-
models, flex, usage cap, permission mode)
|
|
5
|
+
objective plus recommended run settings (budget, concurrency, planner /
|
|
6
|
+
main-worker / optional fast-worker models, flex, usage cap, permission mode)
|
|
7
|
+
and an actionable preflight
|
|
7
8
|
checklist. Invoked once, before the interactive pickers, to catch prompt-shape
|
|
8
9
|
failures (vague, overambitious, multi-goal, unverifiable) and environmental
|
|
9
10
|
failures (missing keys, dirty tree, missing .env) while they're still cheap
|
|
@@ -69,8 +70,8 @@ Rules:
|
|
|
69
70
|
- `improvedObjective` preserves the user's voice and domain vocabulary. It MUST include a `Done:` line, a `Critical:` line (or `Critical: none` when nothing is off-limits), and a `Verify by:` line.
|
|
70
71
|
- `recommended.budget` is an integer ≥ 1. `concurrency` is an integer in [1, 12]. `usageCap` is either `null` (unlimited) or a float in (0, 1].
|
|
71
72
|
- `recommended.permissionMode` is `"auto" | "bypassPermissions" | "default"`.
|
|
72
|
-
- `fastModel` is `null` unless adding one is clearly warranted for this scope + budget AND a cheap fast model is reachable from the available providers.
|
|
73
|
-
- `recommended.plannerModel` / `workerModel` / `fastModel` MUST be model IDs that the user can actually reach given the providers listed in the input. Stock Anthropic IDs (e.g. `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) are only valid when "Anthropic direct: available" appears in the input.
|
|
73
|
+
- `fastModel` (the fast-worker model) is `null` unless adding one is clearly warranted for this scope + budget AND a cheap fast-worker model is reachable from the available providers.
|
|
74
|
+
- `recommended.plannerModel` (planner) / `workerModel` (main worker) / `fastModel` (fast worker) MUST be model IDs that the user can actually reach given the providers listed in the input. Stock Anthropic IDs (e.g. `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) are only valid when "Anthropic direct: available" appears in the input.
|
|
74
75
|
- `checklist` `remediation` is an informational label — the host does NOT auto-act on it. Set it to the slug that best describes the issue, or `"none"` for purely advisory items.
|
|
75
76
|
- `questions` is reserved for a future clarification loop; return `[]` for now.
|
|
76
77
|
|
|
@@ -105,27 +106,27 @@ Rows: scope. Each cell is a starting point — adjust by one step when repo fact
|
|
|
105
106
|
|
|
106
107
|
`conc` ⇒ `recommended.concurrency` (clamp to ≤ budget).
|
|
107
108
|
`flex` ⇒ `recommended.flex`.
|
|
108
|
-
`fast=true` ⇒ recommend a fast model **if the user has one configured and reachable** from their available providers. Pick whatever the cheapest fast model is among their providers (e.g. `claude-haiku-4-5`, `composer-2-fast`, `qwen3` variants). If
|
|
109
|
-
`fast=null` ⇒ do not recommend a fast
|
|
109
|
+
`fast=true` ⇒ recommend a fast-worker model **if the user has one configured and reachable** from their available providers. The fast worker is a real worker (same tools, same env) on a cheaper/faster model — steering routes well-scoped tasks to it by default. Pick whatever the cheapest fast-worker model is among their providers (e.g. `claude-haiku-4-5`, `composer-2-fast`, `qwen3` variants). If none is reachable, set `null`.
|
|
110
|
+
`fast=null` ⇒ do not recommend a fast worker (scope too complex or no suitable fast-worker model available).
|
|
110
111
|
`cap=null` ⇒ unlimited (`recommended.usageCap = null`).
|
|
111
112
|
|
|
112
|
-
## Planner / worker model selection
|
|
113
|
+
## Planner / main-worker / fast-worker model selection
|
|
113
114
|
|
|
114
|
-
Pick the strongest reachable model for the planner; pick a cheap-but-capable reachable model for the worker.
|
|
115
|
+
Pick the strongest reachable model for the planner; pick a cheap-but-capable reachable model for the main worker; optionally add a cheaper/faster second model as the fast worker.
|
|
115
116
|
|
|
116
117
|
Decision order (stop at the first row whose providers are present):
|
|
117
118
|
|
|
118
119
|
1. **Anthropic direct available**
|
|
119
120
|
- planner: `claude-opus-4-7` (or its `-thinking-high` variant when scope is `audit-and-fix` / `research-and-implement` / `migration`).
|
|
120
|
-
- worker: `claude-sonnet-4-6` for normal work; `claude-opus-4-7` for `wide`/`saturated` migrations or research.
|
|
121
|
-
- fastModel: recommend the cheapest fast model available among the user's reachable providers when the matrix says `fast=true`.
|
|
121
|
+
- main worker: `claude-sonnet-4-6` for normal work; `claude-opus-4-7` for `wide`/`saturated` migrations or research.
|
|
122
|
+
- fast worker (`fastModel`): recommend the cheapest fast-worker model available among the user's reachable providers when the matrix says `fast=true`.
|
|
122
123
|
2. **Custom Anthropic-compatible provider with a strong model** (e.g. `qwen3.6-plus`, `qwen3-coder-plus`)
|
|
123
124
|
- planner: the strongest such model the user has.
|
|
124
|
-
- worker: same model, or a cheaper sibling if the user has one.
|
|
125
|
+
- main worker: same model, or a cheaper sibling if the user has one.
|
|
125
126
|
3. **Cursor proxy is the only reachable provider**
|
|
126
127
|
- planner: `claude-opus-4-7` via Cursor (only if the proxy exposes it).
|
|
127
|
-
- worker: `claude-sonnet-4-6` via Cursor, or `composer-2` for the cheapest path.
|
|
128
|
-
- fastModel: recommend a Cursor fast model (e.g. `composer-2-fast`) when the matrix says `fast=true`.
|
|
128
|
+
- main worker: `claude-sonnet-4-6` via Cursor, or `composer-2` for the cheapest path.
|
|
129
|
+
- fast worker (`fastModel`): recommend a Cursor fast-worker model (e.g. `composer-2-fast`) when the matrix says `fast=true`.
|
|
129
130
|
4. **No reachable provider** — leave `plannerModel` and `workerModel` as `claude-sonnet-4-6` and emit a `blocking` checklist item titled "No reachable provider".
|
|
130
131
|
|
|
131
132
|
Never recommend Cursor models when the input does not list a `cursor proxy` provider, and never recommend stock Anthropic IDs when the input does not say "Anthropic direct: available". `fastModel` MUST be `null` rather than guessed.
|
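The decision order in the coach skill above reads as a first-match-wins chain. A simplified sketch follows, under several stated assumptions: the `pickModels` function and its input shape (`anthropicDirect`, `customModels`, `cursorProxy`, `wantFast`, `fastCandidate`) are hypothetical, since the real coach receives providers as text; `customModels` is assumed to be ordered strongest-first; the Cursor proxy is assumed to expose the listed models; and the scope-dependent variants (the `-thinking-high` planner, an Opus main worker for wide migrations) are left out.

```javascript
// Hypothetical sketch of the first-match-wins decision order; the input shape is assumed.
function pickModels(
  { anthropicDirect = false, customModels = [], cursorProxy = false } = {},
  { wantFast = false, fastCandidate = null } = {}
) {
  // fastModel is set only when the matrix asks for one AND a reachable
  // candidate was supplied by the caller; it is never guessed.
  const fastModel = wantFast && fastCandidate ? fastCandidate : null;

  if (anthropicDirect) {
    return { plannerModel: "claude-opus-4-7", workerModel: "claude-sonnet-4-6", fastModel, blocking: null };
  }
  if (customModels.length > 0) {
    // Planner takes the strongest model; the main worker takes a cheaper sibling if present.
    const [strongest, cheaper = strongest] = customModels;
    return { plannerModel: strongest, workerModel: cheaper, fastModel, blocking: null };
  }
  if (cursorProxy) {
    return { plannerModel: "claude-opus-4-7", workerModel: "claude-sonnet-4-6", fastModel, blocking: null };
  }
  // No reachable provider: fall back to the documented placeholder IDs and flag it.
  return {
    plannerModel: "claude-sonnet-4-6",
    workerModel: "claude-sonnet-4-6",
    fastModel: null,
    blocking: "No reachable provider",
  };
}
```

Because each branch returns immediately, a later row is never consulted once an earlier provider is present, which matches the "stop at the first row" wording.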