npm - claude-overnight - Versions diffs - 1.25.41 → 1.25.43 - Mend

claude-overnight 1.25.41 → 1.25.43

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/README.md +3 -3
package/dist/_version.d.ts +1 -1
package/dist/_version.js +1 -1
package/dist/index.js +2 -2
package/dist/planner-query.js +15 -0
package/dist/providers.js +5 -0
package/dist/run.js +152 -54
package/dist/settings.js +4 -4
package/dist/state.d.ts +1 -1
package/dist/state.js +6 -2
package/dist/steering.d.ts +49 -0
package/dist/steering.js +116 -40
package/dist/swarm.js +2 -22
package/dist/transcripts.d.ts +1 -1
package/dist/transcripts.js +10 -2
package/dist/types.d.ts +2 -1
package/package.json +1 -1
package/plugins/claude-overnight/.claude-plugin/plugin.json +1 -1
package/plugins/claude-overnight/skills/claude-overnight/SKILL.md +2 -2
package/plugins/claude-overnight/skills/coach/SKILL.md +14 -13

package/README.md

@@ -4,7 +4,7 @@ Parallel Claude agents in isolated git worktrees. Set a usage cap so your intera
 Hand it an objective and a session budget, walk away, review the diff when the run ends. Every agent runs in its own worktree on its own branch — a misbehaving agent can't trash your working tree. Unmerged branches are preserved for manual review, never discarded.
-Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits, verified by the worker next wave). Pair any planner (Opus, Sonnet) with any worker — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
+Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **main worker** (runs the tasks), and an optional **fast worker** (a cheaper/faster second worker for well-scoped tasks, verified by the next wave's workers). Pair any planner (Opus, Sonnet) with any worker — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
 ## Run on Qwen 3.6 Plus
@@ -333,7 +333,7 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
 ## Custom providers (Qwen, OpenRouter, any Anthropic-compatible endpoint)
-Planner, worker, and optional fast model are each picked separately  -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of work.
+Planner, main worker, and optional fast worker are each picked separately  -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of work. The fast worker is a real worker (same tools, same env), just on a cheaper/faster model — steering routes well-scoped tasks to it by default.
 From the interactive picker, choose `Other…` on the planner, worker, or fast step:
@@ -353,7 +353,7 @@ From the interactive picker, choose `Other…` on the planner, worker, or fast s
 Saved providers live user-level at `~/.claude/claude-overnight/providers.json` (mode 0600) and show up automatically in every repo. No per-project config.
-**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`)  -- planner queries use the planner provider, worker queries use the worker provider, fast queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
+**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`)  -- planner queries use the planner provider, main-worker queries use the worker provider, fast-worker queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
 **Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ worker preflight failed: ...` instead of N scattered mid-run errors.

package/dist/_version.d.ts

@@ -1 +1 @@
-export declare const VERSION = "1.25.41";
+export declare const VERSION = "1.25.42";

package/dist/_version.js

@@ -1,2 +1,2 @@
 // Auto-generated by build — do not edit manually.
-export const VERSION = "1.25.41";
+export const VERSION = "1.25.42";

package/dist/index.js

@@ -156,7 +156,7 @@ async function main() {
     --budget=N             Target number of agent runs ${chalk.dim("(default: 10)")}
     --concurrency=N        Max parallel agents ${chalk.dim("(default: 5)")}
     --model=NAME           Worker model override ${chalk.dim("(interactive mode picks planner + worker separately  -- supports 'Other…' for Qwen / OpenRouter / etc.)")}
-    --fast-model=NAME      Fast model for quick tasks ${chalk.dim("(optional  -- checked by worker model in next wave)")}
+    --fast-model=NAME      Fast worker model for quick tasks ${chalk.dim("(optional  -- checked by next wave's workers)")}
     --usage-cap=N          Stop at N% utilization ${chalk.dim("(e.g. 90 to save 10% for other work)")}
     --allow-extra-usage    Allow extra/overage usage ${chalk.dim("(default: stop when plan limits hit)")}
     --extra-usage-budget=N Max $ for extra usage ${chalk.dim("(implies --allow-extra-usage)")}
@@ -888,7 +888,7 @@ async function main() {
                     console.error(chalk.red(`  Fix the provider at ~/.claude/claude-overnight/providers.json and retry.`));
                 }
                 if (degradable) {
-                    console.error(chalk.yellow(`  Continuing without fast — fast-eligible tasks will run on the worker model instead.`));
+                    console.error(chalk.yellow(`  Continuing without the fast worker — fast-eligible tasks will run on the main worker model instead.`));
                     console.error("");
                     fastDegraded = true;
                     continue;

package/dist/planner-query.js

@@ -619,6 +619,21 @@ function extractOutermostBraces(text) {
     return null;
 }
 export function attemptJsonParse(text) {
+    // Strip conversational prefaces/suffixes that weak-schema models sometimes
+    // wrap around the JSON body (e.g. "Here is the JSON: { ... } Let me know…").
+    const preface = /^\s*(?:Here (?:is|are)[^{]*|Let me[^{]*|I'?ll[^{]*|Sure[^{]*|Okay[^{]*)/i;
+    const suffix = /\n\n(?:Let me know|Hope this|Please let me)[\s\S]*$/i;
+    if (preface.test(text) || suffix.test(text)) {
+        const cleaned = text.replace(preface, "").replace(suffix, "").trim();
+        if (cleaned && cleaned !== text) {
+            try {
+                const obj = JSON.parse(cleaned);
+                if (typeof obj === "object" && obj !== null)
+                    return obj;
+            }
+            catch { }
+        }
+    }
     try {
         const obj = JSON.parse(text);
         if (typeof obj === "object" && obj !== null)
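
The preface/suffix handling added above can be tried outside the package. What follows is a minimal sketch, assuming a standalone helper named `parseWrappedJson` (a hypothetical name, not an export of `planner-query.js`). It reuses the two regular expressions from the hunk and falls back to parsing the raw text, but it omits whatever further fallbacks the real `attemptJsonParse` applies after the lines shown.

```javascript
// Hypothetical standalone helper mirroring the added hunk; not the package's own export.
const PREFACE = /^\s*(?:Here (?:is|are)[^{]*|Let me[^{]*|I'?ll[^{]*|Sure[^{]*|Okay[^{]*)/i;
const SUFFIX = /\n\n(?:Let me know|Hope this|Please let me)[\s\S]*$/i;

function parseWrappedJson(text) {
    // Strip a conversational lead-in and sign-off, then try the cleaned text first.
    const cleaned = text.replace(PREFACE, "").replace(SUFFIX, "").trim();
    for (const candidate of [cleaned, text]) {
        try {
            const obj = JSON.parse(candidate);
            if (typeof obj === "object" && obj !== null) return obj;
        } catch {
            // Not valid JSON in this form; try the next candidate.
        }
    }
    return null;
}

const reply = 'Here is the JSON: {"tasks":[{"prompt":"add tests"}]}\n\nLet me know if you need changes.';
console.log(parseWrappedJson(reply));
```

Because the preface patterns consume everything up to the first `{`, a reply whose body is a bare JSON array rather than an object would not be recovered by the stripping step alone.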

package/dist/providers.js

@@ -178,6 +178,11 @@ export function envFor(p) {
         base.ANTHROPIC_AUTH_TOKEN = key;
     }
     delete base.ANTHROPIC_API_KEY;
+    // Prevent CURSOR_API_KEY from leaking into non-proxy envs — would cause
+    // isCursorProxyEnv false-positive, silently rerouting through direct fetch
+    // which ignores outputFormat (no JSON schema enforcement).
+    delete base.CURSOR_API_KEY;
+    delete base.CURSOR_AUTH_TOKEN;
     return base;
 }
 /**
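
The idea behind this hunk is that each query gets a copy of the environment with provider-specific variables set and potentially conflicting ones removed. Here is a minimal sketch of that pattern, assuming a hypothetical `buildProviderEnv` helper and hypothetical `baseUrl` / `key` fields on the provider object; the real `envFor` may differ in how it reads the provider record.

```javascript
// Hypothetical helper; `baseUrl` and `key` are assumed field names, not the package's schema.
const SCRUBBED = ["ANTHROPIC_API_KEY", "CURSOR_API_KEY", "CURSOR_AUTH_TOKEN"];

function buildProviderEnv(parentEnv, provider) {
    // Copy first so the caller's environment object is never mutated.
    const env = { ...parentEnv };
    if (provider.baseUrl) env.ANTHROPIC_BASE_URL = provider.baseUrl;
    if (provider.key) env.ANTHROPIC_AUTH_TOKEN = provider.key;
    // Remove keys that could make a downstream check misdetect the provider type.
    for (const name of SCRUBBED) delete env[name];
    return env;
}

const parent = { PATH: "/usr/bin", CURSOR_API_KEY: "leaky", ANTHROPIC_API_KEY: "old" };
console.log(buildProviderEnv(parent, { baseUrl: "https://example.invalid", key: "secret" }));
```

Working on a copy keeps the override local to one query, which matches the README's claim that there is no `process.env` pollution between calls.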

package/dist/run.js

@@ -3,8 +3,8 @@ import { join } from "path";
 import { execSync } from "child_process";
 import chalk from "chalk";
 import { Swarm } from "./swarm.js";
-import { steerWave } from "./steering.js";
-import { getTotalPlannerCost, getPlannerRateLimitInfo, getPeakPlannerContext, runPlannerQuery, setPlannerEnvResolver } from "./planner-query.js";
+import { steerWave, STEER_SCHEMA } from "./steering.js";
+import { getTotalPlannerCost, getPlannerRateLimitInfo, getPeakPlannerContext, runPlannerQuery, setPlannerEnvResolver, attemptJsonParse } from "./planner-query.js";
 import { contextFillInfo } from "./render.js";
 import { getModelCapability } from "./models.js";
 import { buildEnvResolver, isCursorProxyProvider } from "./providers.js";
@@ -55,6 +55,8 @@ export async function executeRun(cfg) {
     let lastCapped = false, lastAborted = false, objectiveComplete = false;
     let lastEstimate;
     const branches = [];
+    let healFailStreak = 0; // consecutive waves where heal-0 agent changed 0 files
+    let zeroFileWaves = 0; // consecutive waves with 0 files across non-heal tasks
     if (cfg.resuming && cfg.resumeState) {
         const rs = cfg.resumeState;
         remaining = Math.max(1, rs.remaining);
@@ -295,8 +297,21 @@ export async function executeRun(cfg) {
     // Shared steering logic used by both resume-steering and in-loop steering
     const runSteering = async () => {
         let steered = false;
+        // ── B1: Skip steering when ≥2 unresolved merge-failed branches exist ──
+        const mergeFailedBranches = branches.filter(b => b.status === "merge-failed");
+        if (mergeFailedBranches.length >= 2) {
+            currentTasks = mergeFailedBranches.map((b, i) => ({
+                id: `branch-retry-${i}`,
+                prompt: `Your previous attempt at this task merge-failed against main. Redo it against the current state of main with minimal, focused edits. Original task:\n\n${b.taskPrompt}`,
+                model: workerModel,
+                postcondition: "pnpm run build",
+            }));
+            display.appendSteeringEvent(`Skipping steering — ${mergeFailedBranches.length} merge-failed branches form the wave`);
+            return true;
+        }
         let steerAttempts = 0;
-        while (!steered && remaining > 0 && !stopping && steerAttempts < 3) {
+        const MAX_STEER_ATTEMPTS = 2; // B2: retry threshold 3 → 2
+        while (!steered && remaining > 0 && !stopping && steerAttempts < MAX_STEER_ATTEMPTS) {
             steerAttempts++;
             const plannerCostBefore = getTotalPlannerCost();
             try {
@@ -350,23 +365,52 @@ export async function executeRun(cfg) {
             }
             catch (err) {
                 accCost += getTotalPlannerCost() - plannerCostBefore;
-                if (steerAttempts < 3) {
-                    display.appendSteeringEvent(`Steering failed (attempt ${steerAttempts}/3)  -- retrying...`);
+                const rawPreview = err?.message?.slice(0, 200) || "(no details)";
+                if (steerAttempts < MAX_STEER_ATTEMPTS) {
+                    display.appendSteeringEvent(`Steering failed (attempt ${steerAttempts}/${MAX_STEER_ATTEMPTS})  -- retrying... ${rawPreview}`);
                     continue;
                 }
-                display.appendSteeringEvent(`Steering failed ${steerAttempts}×  -- falling back`);
-                let fallbackStatus = "";
+                // ── B3: Decomposer fallback (replaces single-giant-fallback) ──
+                display.appendSteeringEvent(`Steering failed ${MAX_STEER_ATTEMPTS}×  — decomposer fallback`);
+                // First: try merge-failed recycling even if only 1 unresolved branch exists
+                const stillFailed = branches.filter(b => b.status === "merge-failed");
+                if (stillFailed.length >= 1) {
+                    currentTasks = stillFailed.map((b, i) => ({
+                        id: `branch-retry-${i}`,
+                        prompt: `Your previous attempt at this task merge-failed against main. Redo it against the current state of main with minimal, focused edits. Original task:\n\n${b.taskPrompt}`,
+                        model: workerModel,
+                        postcondition: "pnpm run build",
+                    }));
+                    display.appendSteeringEvent(`Decomposer: ${stillFailed.length} merge-failed branch(es) retried as swarm tasks`);
+                    steered = true;
+                    break;
+                }
+                // Second: minimal-prompt planner query
+                display.appendSteeringEvent("Decomposer: minimal planner query…");
                 try {
-                    fallbackStatus = readFileSync(join(runDir, "status.md"), "utf-8");
+                    let statusText = "";
+                    try {
+                        statusText = readFileSync(join(runDir, "status.md"), "utf-8");
+                    }
+                    catch { }
+                    const minimalPrompt = `${objective ? `Objective: ${objective}` : ""}\n\nStatus:\n${statusText || "(none)"}\n\nReturn tasks: string[] — 3-6 specific follow-ups. JSON only. {"tasks":[{"prompt":"..."}]}`;
+                    const minimalText = await runPlannerQuery(minimalPrompt, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName: "decomposer-minimal", maxTurns: 40 }, () => { });
+                    const parsed = attemptJsonParse(minimalText);
+                    if (parsed?.tasks?.length > 0) {
+                        currentTasks = parsed.tasks.map((t, i) => ({
+                            id: `decompose-${i}`,
+                            prompt: typeof t === "string" ? t : t.prompt,
+                            model: workerModel,
+                        }));
+                        display.appendSteeringEvent(`Decomposer: ${currentTasks.length} tasks from minimal planner`);
+                        steered = true;
+                        break;
+                    }
                 }
                 catch { }
-                currentTasks = [{
-                        id: "fallback-0",
-                        prompt: `Steering couldn't decide the next step. Read the project, assess what's done vs. remaining, and do the most impactful work.\n\nObjective: ${objective}${fallbackStatus ? `\n\nStatus:\n${fallbackStatus}` : ""}`,
-                        type: "execute",
-                    }];
-                steered = true;
-                break;
+                // Finally: halt
+                display.appendSteeringEvent(`Decomposer: no tasks produced — halting`);
+                return false;
             }
         }
         return steered;
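
The hunk above builds the same `branch-retry-*` task list in two places and then falls through to a minimal planner query before halting. The ordering can be sketched as two pure functions; `retryTasksFor` and `chooseFallback` are hypothetical names, and passing the postcondition as a parameter is an assumption made for illustration (the diff hardcodes `"pnpm run build"`).

```javascript
// Hypothetical helper: turn merge-failed branch records into retry tasks.
function retryTasksFor(branches, workerModel, postcondition) {
    return branches
        .filter(b => b.status === "merge-failed")
        .map((b, i) => ({
            id: `branch-retry-${i}`,
            prompt: `Your previous attempt at this task merge-failed against main. Redo it against the current state of main with minimal, focused edits. Original task:\n\n${b.taskPrompt}`,
            model: workerModel,
            postcondition,
        }));
}

// Fallback order as read from the hunk: retries first, then planner-supplied tasks, else null (halt).
function chooseFallback(branches, plannerTasks, workerModel) {
    const retries = retryTasksFor(branches, workerModel, "pnpm run build");
    if (retries.length > 0) return retries;
    if (Array.isArray(plannerTasks) && plannerTasks.length > 0) {
        return plannerTasks.map((t, i) => ({
            id: `decompose-${i}`,
            prompt: typeof t === "string" ? t : t.prompt,
            model: workerModel,
        }));
    }
    return null;
}

console.log(chooseFallback([], ["tighten types", { prompt: "add tests" }], "worker-model"));
```

A `null` result corresponds to the "no tasks produced, halting" branch, where the diff returns `false` from `runSteering`.
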
@@ -389,12 +433,26 @@ export async function executeRun(cfg) {
             // Health check before each wave: a broken build poisons every subsequent
             // agent context, so prepend a heal task when detected. Steering-planned
             // tasks still run, just after the build is green again.
+            // Skip if prior heal changed 0 files (heal unable to fix).
             {
-                const healTask = checkProjectHealth(cwd);
-                if (healTask && remaining > 0) {
-                    const withoutDup = currentTasks.filter(t => t.id !== "heal-0");
-                    currentTasks = [healTask, ...withoutDup];
-                    display.appendSteeringEvent(`Health check: build broken — queued heal task`);
+                const healTasks = healFailStreak > 0 ? [] : checkProjectHealth(cwd);
+                if (healTasks.length > 0 && remaining > 0) {
+                    const healIds = healTasks.map(t => t.id);
+                    const withoutDup = currentTasks.filter(t => !healIds.includes(t.id));
+                    currentTasks = [...healTasks, ...withoutDup];
+                    display.appendSteeringEvent(`Health check: build broken — queued ${healTasks.length} heal task(s)`);
+                }
+                else if (healTasks.length === 0 && healFailStreak > 0 && checkProjectHealth(cwd).length > 0) {
+                    display.appendSteeringEvent(`Health check: build broken — heal skipped after ${healFailStreak} failed attempts, needs manual intervention`);
+                    try {
+                        const statusPath2 = join(runDir, "status.md");
+                        const existing2 = existsSync(statusPath2) ? readFileSync(statusPath2, "utf-8") : "";
+                        const marker = "## Heal blocked";
+                        if (!existing2.includes(marker)) {
+                            writeFileSync(statusPath2, `${existing2}${existing2 ? "\n\n" : ""}${marker}\nBuild has been broken for ${healFailStreak} waves, heal agents unable to fix — intervene manually.\n`, "utf-8");
+                        }
+                    }
+                    catch { }
                 }
             }
             if (currentTasks.length > remaining)
@@ -598,7 +656,7 @@ export async function executeRun(cfg) {
             liveConfig.remaining = remaining;
             lastCapped = swarm.cappedOut;
             lastAborted = swarm.aborted;
-            recordBranches(swarm.agents, swarm.mergeResults, branches);
+            recordBranches(swarm.agents, swarm.mergeResults, branches, waveNum);
             saveWaveSession(runDir, waveNum, swarm.agents, swarm.totalCostUsd);
             // Tasks that never made it into the swarm (queue cleared on abort/cap)
             // are preserved as currentTasks so resume picks them up. Budget for these
@@ -623,6 +681,34 @@ export async function executeRun(cfg) {
                     };
                 }),
             });
+            // Track heal fail streak: if a heal-0 task existed this wave and changed 0 files, increment.
+            // If any non-heal execute task changed files, reset.
+            const lastWave = waveHistory[waveHistory.length - 1];
+            const healTask = lastWave?.tasks.find(t => t.type === "heal");
+            if (healTask && !healTask.filesChanged) {
+                healFailStreak++;
+            }
+            else if (lastWave?.tasks.some(t => (t.type !== "heal") && (t.filesChanged ?? 0) > 0)) {
+                healFailStreak = 0;
+            }
+            // C1: Circuit breaker — halt after 2 consecutive waves with 0 files across non-heal tasks
+            const nonHealFiles = lastWave?.tasks.filter(t => t.type !== "heal").reduce((sum, t) => sum + (t.filesChanged ?? 0), 0) ?? 0;
+            if (nonHealFiles === 0 && waveNum > 0) {
+                zeroFileWaves++;
+                if (zeroFileWaves >= 2) {
+                    display.appendSteeringEvent(`Circuit breaker: 2 consecutive waves produced no merged changes — halting to prevent budget drain`);
+                    display.stop();
+                    saveRunState(runDir, buildRunState({ remaining, phase: "stopped", currentTasks: [] }));
+                    display.stop();
+                    restore();
+                    console.log(chalk.red(`\n  Circuit breaker: 2 consecutive waves produced no merged changes.`));
+                    console.log(chalk.red(`  Halting to prevent budget drain. Run preserved at ${runDir}.`));
+                    process.exit(3);
+                }
+            }
+            else {
+                zeroFileWaves = 0;
+            }
             // Hook-blocked work: agents that touched files but nothing landed on the
             // branch (pre-commit hooks, gitignore, writes outside worktree). Surface
             // as a wave-level warning so steering sees it, not just a per-agent log.
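
The circuit breaker above is a consecutive-failure counter: it increments when a wave's non-heal tasks changed no files and resets otherwise. Below is a minimal pure-function sketch of that counter, assuming hypothetical names `nextZeroFileStreak` and `shouldHalt`; the task objects carry only the `type` and `filesChanged` fields seen in the hunk.

```javascript
// Hypothetical pure version of the zero-file counter from the hunk.
const HALT_AFTER = 2; // threshold used in the diff

function nextZeroFileStreak(streak, tasks, waveNum) {
    // Sum files changed by everything except heal tasks.
    const nonHealFiles = tasks
        .filter(t => t.type !== "heal")
        .reduce((sum, t) => sum + (t.filesChanged ?? 0), 0);
    // Same guard as the diff: wave 0 never counts toward the streak.
    return nonHealFiles === 0 && waveNum > 0 ? streak + 1 : 0;
}

function shouldHalt(streak) {
    return streak >= HALT_AFTER;
}

let streak = 0;
streak = nextZeroFileStreak(streak, [{ type: "execute", filesChanged: 0 }], 1);
streak = nextZeroFileStreak(streak, [{ type: "heal", filesChanged: 5 }], 2);
console.log(streak, shouldHalt(streak));
```

Note that a wave in which only heal tasks changed files still counts as a zero-file wave, since heal tasks are filtered out before summing.
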
@@ -670,6 +756,20 @@ export async function executeRun(cfg) {
                 }
                 if (next !== existing)
                     writeFileSync(statusPath, next, "utf-8");
+                // GC ghost branches: delete merge-failed branches ≥2 waves old and mark discarded.
+                // Safe: their work never landed. The decomposer (Phase B) will re-attempt from saved taskPrompt.
+                const gcCandidates = branches.filter(b => b.status === "merge-failed" && b.firstFailedWave !== undefined && (waveNum - b.firstFailedWave) >= 2);
+                let gcCount = 0;
+                for (const b of gcCandidates) {
+                    try {
+                        execSync(`git branch -D "${b.branch}"`, { cwd, stdio: "ignore" });
+                    }
+                    catch { }
+                    b.status = "discarded";
+                    gcCount++;
+                }
+                if (gcCount > 0)
+                    display.appendSteeringEvent(`GC: discarded ${gcCount} ghost branch(es) ≥2 waves old`);
             }
             catch { }
             // Fire-and-forget debrief after each wave.
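
The garbage-collection step above selects branches by status and by how many waves have passed since their first failed merge. A minimal sketch of just the selection, assuming a hypothetical `selectGcCandidates` helper (the `git branch -D` call and the status update are left out):

```javascript
// Hypothetical helper: pick merge-failed branches at least `minAge` waves old.
function selectGcCandidates(branches, waveNum, minAge = 2) {
    return branches.filter(b =>
        b.status === "merge-failed" &&
        b.firstFailedWave !== undefined &&
        (waveNum - b.firstFailedWave) >= minAge);
}

const branches = [
    { branch: "a", status: "merge-failed", firstFailedWave: 1 },
    { branch: "b", status: "merge-failed", firstFailedWave: 3 },
    { branch: "c", status: "merged", firstFailedWave: 0 },
    { branch: "d", status: "merge-failed" }, // never stamped, so never collected
];
console.log(selectGcCandidates(branches, 3).map(b => b.branch));
```

Branches without a `firstFailedWave` stamp are skipped, so records created before this version would not be collected by this rule.
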
@@ -979,34 +1079,11 @@ export async function executeRun(cfg) {
 }
 function reviewPrompt(scope, objective) {
     const scopeLine = scope === "wave"
-        ? "You are reviewing all changes made in the most recent wave of agent work."
+        ? "Review and simplify all changes from the most recent wave."
         : `You are the final quality gate before this autonomous run completes.\n\nThe objective was: ${objective || "improve the codebase"}`;
-    const diffCmd = scope === "wave"
-        ? "Run `git diff` to see what changed."
-        : "Run `git diff main` (or `git diff HEAD` if on the same branch) to see ALL changes made during this run.";
-    const checks = scope === "wave"
-        ? `1. **Missed reuse**: Did any agent write something that already exists elsewhere? Find existing utilities and suggest replacements.
-2. **Quality issues**: Redundant state, copy-paste variations, leaky abstractions, stringly-typed code where enums exist, unnecessary JSX nesting, comments that narrate what the code does.
-3. **Efficiency problems**: Redundant computations, sequential operations that could be parallel, hot-path bloat, recurring no-op updates, TOCTOU patterns, memory leaks.
-4. **Merge conflicts or inconsistencies**: Changes that work against each other or break existing patterns.`
-        : `1. **Architecture coherence**: Do the changes form a coherent whole, or are they a patchwork of independent edits that don't fit together?
-2. **Missed reuse**: Any new code that duplicates existing functionality?
-3. **Quality**: Redundant state, copy-paste variations, leaky abstractions, stringly-typed code, unnecessary nesting, narrative comments.
-4. **Efficiency**: N+1 patterns, redundant computations, hot-path bloat, missing cleanup, unbounded data structures.
-5. **Consistency**: Do all changes follow the project's existing patterns, conventions, and design system?
-6. **Build and test**: Run the build and any existing tests. Fix any breakage.`;
-    const close = scope === "wave"
-        ? "Fix issues directly. Delete and simplify rather than add. If the code is already clean, skip."
-        : "Fix issues directly. Delete and simplify. If the codebase is clean and the build passes, say so.";
     return `${scopeLine}
-${diffCmd} Review for:
-${checks}
-${close}
-No need to explain your changes  -- just fix them.`;
+Invoke the \`simplify\` skill to review changed code for reuse, quality, and efficiency, then fix any issues found.`;
 }
 async function runReview(opts, scope, objective, onSwarm) {
     const swarm = new Swarm({
@@ -1062,24 +1139,45 @@ async function promptBudgetExtension(ctx) {
         return suggested;
     return n;
 }
+/** Detect build errors and return one or more heal tasks. If errors span ≥2 files,
+ *  emit one task per file so they heal in parallel without merge conflicts. */
 function checkProjectHealth(cwd) {
     const cmd = detectHealthCommand(cwd);
     if (!cmd)
-        return undefined;
+        return [];
     try {
         execSync(cmd, { cwd, encoding: "utf-8", stdio: "pipe", timeout: 60_000 });
-        return undefined;
+        return [];
     }
     catch (err) {
         if (err.killed)
-            return undefined;
+            return [];
         const output = ((err.stdout || "") + "\n" + (err.stderr || "")).trim();
         const trimmed = output.length > 4000 ? output.slice(0, 2000) + "\n…\n" + output.slice(-2000) : output;
-        return {
-            id: "heal-0",
-            prompt: `Fix the broken build. \`${cmd}\` fails after merging parallel work:\n\`\`\`\n${trimmed}\n\`\`\`\nFix every error. Run \`${cmd}\` when done to verify.`,
-            type: "heal",
-        };
+        // B4: Split heal by file — extract distinct source file paths from errors
+        const fileRe = /\/src\/[\w./-]+\.(ts|tsx|js|jsx)/g;
+        const files = new Set();
+        for (const m of trimmed.matchAll(fileRe))
+            files.add(m[0]);
+        if (files.size >= 2) {
+            // One task per file — each agent gets only that file's error context
+            const fileErrors = new Map();
+            for (const f of files) {
+                // Extract lines mentioning this file
+                const lines = trimmed.split("\n").filter(l => l.includes(f));
+                fileErrors.set(f, lines.slice(0, 30).join("\n"));
+            }
+            return Array.from(fileErrors.entries()).map(([file, errs], i) => ({
+                id: `heal-${i}`,
+                prompt: `Fix the broken build errors in \`${file}\`. \`${cmd}\` fails:\n\`\`\`\n${errs}\n\`\`\`\nFix every error in this file. Run \`${cmd}\` when done to verify.`,
+                type: "heal",
+            }));
+        }
+        return [{
+                id: "heal-0",
+                prompt: `Fix the broken build. \`${cmd}\` fails after merging parallel work:\n\`\`\`\n${trimmed}\n\`\`\`\nFix every error. Run \`${cmd}\` when done to verify.`,
+                type: "heal",
+            }];
     }
 }
 function detectHealthCommand(cwd) {
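
The per-file split in `checkProjectHealth` can be sketched on its own. The helper name `groupErrorsByFile` is hypothetical. The sketch also departs from the diff in one labeled way: the alternation is reordered to `tsx|ts|jsx|js`. JavaScript alternation takes the first alternative that matches, so with the diff's order (`ts|tsx|js|jsx`) a path such as `/src/App.tsx` appears to be captured as `/src/App.ts`.

```javascript
// Hypothetical helper; regex adapted from the diff with the alternation reordered
// so that longer extensions (.tsx, .jsx) are tried before their prefixes.
const FILE_RE = /\/src\/[\w./-]+\.(tsx|ts|jsx|js)/g;

function groupErrorsByFile(output, maxLinesPerFile = 30) {
    const files = new Set();
    for (const m of output.matchAll(FILE_RE)) files.add(m[0]);
    const lines = output.split("\n");
    const grouped = new Map();
    for (const f of files) {
        // Keep only the lines that mention this file, capped per file.
        grouped.set(f, lines.filter(l => l.includes(f)).slice(0, maxLinesPerFile).join("\n"));
    }
    return grouped;
}

const buildOutput = [
    "/app/src/App.tsx(1,1): error TS1: first",
    "/app/src/util.ts(2,2): error TS2: second",
    "/app/src/App.tsx(3,1): error TS3: third",
].join("\n");
console.log([...groupErrorsByFile(buildOutput).keys()]);
```

Both patterns require a `/` before `src`, so build tools that print relative paths starting with `src/` would produce no per-file split and fall back to the single `heal-0` task.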

package/dist/settings.js

@@ -25,12 +25,12 @@ export async function editRunSettings(options) {
     s.workerModel = workerPick.model;
     s.workerProviderId = workerPick.providerId;
     const suggestFast = !!(options.defaults?.fastModel);
-    const fastChoice = await select(`${chalk.cyan("③")} Fast model ${chalk.dim("(optional  -- Haiku/Qwen for quick tasks, checked by worker)")}:`, [
-        { name: "Skip", value: "skip", hint: "two-tier mode only (current setup)" },
-        { name: "Pick a fast model", value: "pick", hint: "Haiku, Qwen, or any provider  -- for well-scoped tasks" },
+    const fastChoice = await select(`${chalk.cyan("③")} Fast worker model ${chalk.dim("(optional  -- Haiku/Qwen for well-scoped tasks, checked by next wave's workers)")}:`, [
+        { name: "Skip", value: "skip", hint: "single-worker mode (main worker handles everything)" },
+        { name: "Pick a fast worker", value: "pick", hint: "Haiku, Qwen, or any provider  -- a cheaper, faster second worker" },
     ], suggestFast ? 1 : 0);
     if (fastChoice === "pick") {
-        const fastPick = await pickModel(`${chalk.cyan("③b")} Fast model:`, models, options.defaults?.fastModel ?? s.fastModel);
+        const fastPick = await pickModel(`${chalk.cyan("③b")} Fast worker model:`, models, options.defaults?.fastModel ?? s.fastModel);
         s.fastModel = fastPick.model;
         s.fastProviderId = fastPick.providerId;
     }

package/dist/state.d.ts

@@ -72,6 +72,6 @@ export declare function recordBranches(agents: {
 }[], mergeResults: {
     branch: string;
     ok: boolean;
-}[], branches: BranchRecord[]): void;
+}[], branches: BranchRecord[], currentWave?: number): void;
 export declare function autoMergeBranches(cwd: string, branches: BranchRecord[], onLog: (msg: string) => void): void;
 export declare function archiveMilestone(baseDir: string, waveNum: number): void;

package/dist/state.js

@@ -461,7 +461,7 @@ export function loadWaveHistory(runDir) {
     }
 }
 // ── Branch management ──
-export function recordBranches(agents, mergeResults, branches) {
+export function recordBranches(agents, mergeResults, branches, currentWave) {
     for (const a of agents) {
         if (a.branch) {
             branches.push({
@@ -475,8 +475,12 @@ export function recordBranches(agents, mergeResults, branches) {
     }
     for (const mr of mergeResults) {
         const br = branches.find(b => b.branch === mr.branch);
-        if (br)
+        if (br) {
             br.status = mr.ok ? "merged" : "merge-failed";
+            if (!mr.ok && !br.firstFailedWave && currentWave !== undefined) {
+                br.firstFailedWave = currentWave;
+            }
+        }
     }
 }
 export function autoMergeBranches(cwd, branches, onLog) {
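
The stamp added in this hunk records the wave in which a branch first failed to merge. One detail worth checking: `!br.firstFailedWave` is also true when the stored value is `0`. If waves are numbered from 0 (an assumption, suggested by the `w.wave + 1` display in `steering.js`), a branch that failed in wave 0 could be re-stamped later. A minimal sketch with an explicit `undefined` check, using a hypothetical `applyMergeResult` helper:

```javascript
// Hypothetical helper; differs from the diff by testing for undefined instead of falsiness.
function applyMergeResult(br, ok, currentWave) {
    br.status = ok ? "merged" : "merge-failed";
    if (!ok && br.firstFailedWave === undefined && currentWave !== undefined) {
        br.firstFailedWave = currentWave; // stamp only once, even when the wave is 0
    }
    return br;
}

const record = { branch: "agent-1" };
applyMergeResult(record, false, 0);
applyMergeResult(record, false, 3);
console.log(record.firstFailedWave);
```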

package/dist/steering.d.ts

@@ -1,3 +1,52 @@
 import type { PermMode, SteerResult, RunMemory, WaveSummary } from "./types.js";
 import { type PlannerLog } from "./planner-query.js";
+export declare const STEER_SCHEMA: {
+    type: "json_schema";
+    schema: {
+        type: string;
+        properties: {
+            done: {
+                type: string;
+            };
+            reasoning: {
+                type: string;
+            };
+            statusUpdate: {
+                type: string;
+            };
+            goalUpdate: {
+                type: string;
+            };
+            estimatedSessionsRemaining: {
+                type: string;
+            };
+            tasks: {
+                type: string;
+                items: {
+                    type: string;
+                    properties: {
+                        prompt: {
+                            type: string;
+                        };
+                        model: {
+                            type: string;
+                        };
+                        noWorktree: {
+                            type: string;
+                        };
+                        type: {
+                            type: string;
+                            enum: string[];
+                        };
+                        postcondition: {
+                            type: string;
+                        };
+                    };
+                    required: string[];
+                };
+            };
+        };
+        required: string[];
+    };
+};
 export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, fastModel: string | undefined, permissionMode: PermMode, concurrency: number, onLog: PlannerLog, runMemory?: RunMemory, transcriptName?: string): Promise<SteerResult>;
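
Now that `STEER_SCHEMA` is exported, callers can check a parsed planner reply against its `required` list (shown in the `steering.js` hunk) before using it. A minimal sketch of a presence-only check, assuming a hypothetical `missingSteerFields` helper; it copies the required keys rather than importing the schema, and it does not validate field types.

```javascript
// Required keys copied from the schema's `required` array in steering.js.
const REQUIRED = ["done", "tasks", "reasoning", "statusUpdate", "estimatedSessionsRemaining"];

// Hypothetical helper: list required keys that are absent from a parsed reply.
function missingSteerFields(obj) {
    if (typeof obj !== "object" || obj === null) return [...REQUIRED];
    return REQUIRED.filter(k => !(k in obj));
}

console.log(missingSteerFields({ tasks: [{ prompt: "add tests" }] }));
```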

package/dist/steering.js

@@ -2,7 +2,10 @@ import { runPlannerQuery, attemptJsonParse, postProcess } from "./planner-query.
 import { contextConstraintNote } from "./models.js";
 import { DESIGN_THINKING } from "./planner.js";
 import { createTurn, beginTurn, endTurn } from "./turns.js";
-const STEER_SCHEMA = {
+import { writeFileSync, mkdirSync } from "fs";
+import { join } from "path";
+import { getTranscriptRunDir } from "./transcripts.js";
+export const STEER_SCHEMA = {
     type: "json_schema",
     schema: {
         type: "object",
@@ -24,10 +27,11 @@ const STEER_SCHEMA = {
         required: ["done", "tasks", "reasoning", "statusUpdate", "estimatedSessionsRemaining"],
     },
 };
-export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory, transcriptName = "steer") {
-    const constraint = contextConstraintNote(workerModel);
-    const recentWaves = history.slice(-3);
-    const recentText = recentWaves.length > 0 ? recentWaves.map(w => {
+const PROMPT_BUDGET = 6000;
+/** Build a compact wave summary; keepLast controls how many recent waves to include. */
+function buildRecentText(history, keepLast) {
+    const recentWaves = history.slice(-keepLast);
+    return recentWaves.length > 0 ? recentWaves.map(w => {
         const lines = w.tasks.map(t => {
             const isExecute = !t.type || t.type === "execute";
             const files = t.filesChanged ? ` (${t.filesChanged} files)` : isExecute ? " (0 files)" : " (read-only)";
@@ -39,16 +43,25 @@ export async function steerWave(objective, history, remainingBudget, cwd, planne
         const warn = totalExecute > 0 && zeroExecute > totalExecute / 2 ? `\n  ⚠ ${zeroExecute}/${totalExecute} execute tasks changed 0 files  -- tasks may be mis-scoped or blocked` : "";
         return `Wave ${w.wave + 1}:\n${lines}${warn}`;
     }).join("\n\n") : "(first wave)";
+}
+export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, fastModel, permissionMode, concurrency, onLog, runMemory, transcriptName = "steer") {
+    const constraint = contextConstraintNote(workerModel);
     const cap = (s, max) => s.length > max ? s.slice(0, max) + "\n...(truncated)" : s;
     const statusBlock = runMemory?.status ? `\nCurrent project status:\n${runMemory.status}\n` : "";
-    const milestoneBlock = runMemory?.milestones ? `\nMilestone snapshots:\n${cap(runMemory.milestones, 4000)}\n` : "";
-    const designBlock = runMemory?.designs ? `\nArchitectural research:\n${cap(runMemory.designs, 4000)}\n` : "";
-    const reflectionBlock = runMemory?.reflections ? `\nLatest quality reports:\n${cap(runMemory.reflections, 3000)}\n` : "";
-    const verificationBlock = runMemory?.verifications ? `\nVerification results (from actually running the app):\n${cap(runMemory.verifications, 3000)}\n` : "";
+    const milestoneBlock = runMemory?.milestones ? `\nMilestone snapshots:\n${cap(runMemory.milestones, 2000)}\n` : "";
+    const designBlock = runMemory?.designs ? `\nArchitectural research:\n${cap(runMemory.designs, 1500)}\n` : "";
+    const reflectionBlock = runMemory?.reflections ? `\nLatest quality reports:\n${cap(runMemory.reflections, 1000)}\n` : "";
+    const verificationBlock = runMemory?.verifications ? `\nVerification results (from actually running the app):\n${cap(runMemory.verifications, 1000)}\n` : "";
     const goalBlock = runMemory?.goal ? `\nNorth star  -- what "amazing" means:\n${runMemory.goal}\n` : "";
-    const prevRunBlock = runMemory?.previousRuns ? `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns, 3000)}\n` : "";
+    const prevRunBlock = runMemory?.previousRuns ? `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns, 800)}\n` : "";
     const guidanceBlock = runMemory?.userGuidance ? `\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\nUSER DIRECTIVES  -- highest priority\nThese come directly from the user running this session. They override prior assumptions about status, goal, and next steps. Incorporate them into the wave you compose below. If they conflict with earlier decisions, the user wins. Reflect the new direction in statusUpdate so future waves remember.\n\n${cap(runMemory.userGuidance, 4000)}\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n` : "";
-    const prompt = `You are the quality director for an autonomous multi-wave agent system. Your job is to push the work toward "amazing," not just "done."
+    // Collapse archetype menu after wave 3 to save ~2 KB
+    const archetypesShort = `Archetypes: execute | explore | critique | synthesize | verify | user-test | polish | simplify`;
+    const archetypeBlock = history.length >= 3
+        ? archetypesShort
+        : null;
+    let recentText = buildRecentText(history, 3);
+    let prompt = `You are the quality director for an autonomous multi-wave agent system. Your job is to push the work toward "amazing," not just "done."
 ${guidanceBlock}
 Objective: ${objective}
 ${goalBlock}${statusBlock}${milestoneBlock}${prevRunBlock}
@@ -66,7 +79,7 @@ If verification found issues, those are the priority. Fix what's broken before b
 ## Compose the next wave
-You have full creative freedom. Design the wave that will have the highest impact right now. Here are archetypes to draw from  -- mix, adapt, or invent your own:
+You have full creative freedom. Design the wave that will have the highest impact right now.${archetypeBlock ? `\n\nUse these archetypes as shorthand — mix, adapt, or invent your own:\n\n${archetypeBlock}` : ` Here are archetypes to draw from  -- mix, adapt, or invent your own:
 **Execute**  -- Agents implement concrete changes in parallel. Each touches different files. The bread and butter.
   Example: 5 agents each owning a different feature or fix
@@ -89,47 +102,87 @@ You have full creative freedom. Design the wave that will have the highest impac
 **Polish**  -- Agents focus purely on feel: loading states, error messages, micro-interactions, empty states, responsiveness. Not features  -- the texture that makes users trust the product.
   Example: 2 agents, one on happy paths, one on error/edge states
-**Simplify**  -- A fast model agent reviews recent changes and cleans them up. Set "model": "fast".
-  Example: 1 fast agent reviews files changed in the last wave, runs git diff, removes bloat
-You can combine these. A wave can have 3 execute agents + 1 verification agent. Or 2 divergent explorers. Whatever the situation calls for.
+**Simplify**  -- Invoke the 'simplify' skill. It reviews changed code and spawns parallel sub-agents for thorough review.
+  Example: 1 agent per wave with task type "review", let the skill handle the rest`}
-For non-execute tasks (critique, verify, user-test, synthesize), tell agents to write their output to files in the run directory so findings persist for future waves. Use paths like: .claude-overnight/latest/reflections/wave-N-{topic}.md or .claude-overnight/latest/verifications/wave-N-{topic}.md.
+For non-execute tasks (critique, verify, user-test, synthesize), tell agents to write their output to files in the run directory so findings persist for future waves. Use paths like: .claude-overnight/latest/reflections/wave-n-{topic}.md or .claude-overnight/latest/verifications/wave-n-{topic}.md.
 IMPORTANT: You cannot declare "done" unless at least one verification has confirmed the app works. If you're considering done but haven't verified, compose a verification task first.
 Respond with ONLY a JSON object (no markdown fences):
-{
-  "done": false,
-  "reasoning": "your assessment and why you chose this wave composition",
-  "goalUpdate": "optional  -- refine what 'amazing' means as you learn more",
-  "statusUpdate": "REQUIRED  -- concise project status: what's built, what works, what's rough, quality level, key gaps. This replaces the previous status.",
-  "estimatedSessionsRemaining": 15,
-  "tasks": [
-    {"prompt": "task instruction...", "model": "worker", "postcondition": "test -f src/new-file.ts"},
-    {"prompt": "quick icon fix, verified by worker next wave...", "model": "fast"},
-    {"prompt": "verify the app end-to-end...", "model": "worker", "noWorktree": true}
-  ]
-}
+{"done":boolean,"reasoning":"...","statusUpdate":"REQUIRED","estimatedSessionsRemaining":N,"tasks":[{"prompt":"...","model":"worker|fast","noWorktree":true/false,"postcondition":"..."}]}
 "estimatedSessionsRemaining" is REQUIRED. Your best honest estimate of how many MORE agent sessions (beyond the wave you just composed above) are needed to reach 'amazing'  -- include follow-up fixes, polish, verification, and anything else you'd want before shipping. Be realistic, not optimistic. Use 0 only if truly done.
-The "model" field on each task — pick based on the task's scope and risk:
+The "model" field on each task — two kinds of workers. Pick the right one:
+**Fast worker — "fast" (${fastModel ?? "not set"})** for well-scoped, mechanical tasks: single-file edits, refactors, renames, read/research, build checks, simple critiques, docs updates.
-**Use "fast" (${fastModel ?? "not set"})** for well-scoped, mechanical tasks where speed matters more than deep reasoning. The next wave's worker pass will catch and fix any issues:
-- Single-file edits, refactors, renames
-- Read/research: scan files, summarize findings
-- Build checks, postcondition verification
-- E2E test runs with concrete steps
-- Simple critiques, polish tweaks
+**Main worker — "worker" (${workerModel})** for tasks that need deeper reasoning: multi-file features, complex logic, architectural changes, ambiguous specs.
-**Use "worker" (${workerModel})** for multi-file features, complex logic, architectural changes, and any task where model quality matters.
+When in doubt, pick "fast".
-Set "noWorktree": true for verify/user-test tasks -- they need the real project directory with env files, dependencies, and local config.
+Set "noWorktree": true for verify/user-test tasks.
-OPTIONAL "postcondition": a single shell one-liner that exits 0 when the task is truly done. The framework runs it after merge; if it fails, the agent's "no-op" claim is rejected and the task is retried with the failure output as context. Use it whenever the task has a concrete, machine-checkable outcome. Examples: \`test -f src/tracking/watchlist-poller.ts && grep -q "runWatchlistPoll" src/tracking/watchlist-poller.ts\`, \`grep -q "watchlistPollerTask" src/scraper/scheduler.ts\`, \`pnpm run build\`, \`diff -q src/public/index.html frontend/dist/index.html\`. Keep it cheap (sub-second, no network). Omit for exploratory/research tasks where there is no crisp check.
+OPTIONAL "postcondition": a single shell one-liner that exits 0 when the task is truly done. Keep it cheap. Omit for exploratory tasks.
-If done: {"done": true, "reasoning": "...", "statusUpdate": "...", "estimatedSessionsRemaining": 0, "tasks": []}`;
+If done: {"done":true,"reasoning":"...","statusUpdate":"...","estimatedSessionsRemaining":0,"tasks":[]}`;
+    // ── Hard 6 KB budget: trim non-critical blocks if over limit ──
+    let trimmed = 0;
+    if (prompt.length > PROMPT_BUDGET) {
+        // 1. Keep last 2 waves instead of 3
+        recentText = buildRecentText(history, 2);
+        prompt = prompt.replace(`Recent waves:\n${buildRecentText(history, 3)}`, `Recent waves:\n${recentText}`);
+        trimmed++;
+    }
+    if (prompt.length > PROMPT_BUDGET && runMemory?.milestones) {
+        const old = `\nMilestone snapshots:\n${cap(runMemory.milestones, 2000)}\n`;
+        const neu = `\nMilestone snapshots:\n${cap(runMemory.milestones, 1000)}\n`;
+        if (old !== neu) {
+            prompt = prompt.replace(old, neu);
+            trimmed++;
+        }
+    }
+    if (prompt.length > PROMPT_BUDGET && runMemory?.designs) {
+        const old = `\nArchitectural research:\n${cap(runMemory.designs, 1500)}\n`;
+        const neu = `\nArchitectural research:\n${cap(runMemory.designs, 1000)}\n`;
+        if (old !== neu) {
+            prompt = prompt.replace(old, neu);
+            trimmed++;
+        }
+    }
+    if (prompt.length > PROMPT_BUDGET && runMemory?.reflections) {
+        const old = `\nLatest quality reports:\n${cap(runMemory.reflections, 1000)}\n`;
+        const neu = `\nLatest quality reports:\n${cap(runMemory.reflections, 500)}\n`;
+        if (old !== neu) {
+            prompt = prompt.replace(old, neu);
+            trimmed++;
+        }
+    }
+    if (prompt.length > PROMPT_BUDGET && runMemory?.verifications) {
+        const old = `\nVerification results (from actually running the app):\n${cap(runMemory.verifications, 1000)}\n`;
+        const neu = `\nVerification results (from actually running the app):\n${cap(runMemory.verifications, 500)}\n`;
+        if (old !== neu) {
+            prompt = prompt.replace(old, neu);
+            trimmed++;
+        }
+    }
+    if (prompt.length > PROMPT_BUDGET && runMemory?.previousRuns) {
+        const old = `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns, 800)}\n`;
+        const neu = `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns, 400)}\n`;
+        if (old !== neu) {
+            prompt = prompt.replace(old, neu);
+            trimmed++;
+        }
+    }
+    if (trimmed > 0) {
+        onLog(`Steering prompt trimmed ${trimmed} blocks (${prompt.length}/${PROMPT_BUDGET} chars)`, "event");
+    }
+    // ── Non-Claude planner JSON hardening ──
+    if (!/^claude/i.test(plannerModel)) {
+        const directive = `OUTPUT: single JSON object. No prose. No markdown fences.`;
+        prompt = `${directive}\n\n${prompt}\n\n${directive}`;
+    }
     onLog("Assessing...", "status");
     onLog(`Reading codebase  -- wave ${history.length + 1}`, "event");
     const turn = createTurn("steer", `Steer wave ${history.length + 1}`, `steer-${history.length}`, plannerModel);
@@ -140,11 +193,34 @@ If done: {"done": true, "reasoning": "...", "statusUpdate": "...", "estimatedSes
         if (first)
             return first;
         onLog(`Steering parse failed (${resultText.length} chars). Asking model to fix...`, "event");
+        // C2: persist raw output on parse failure
+        const steerDir = getTranscriptRunDir() ? join(getTranscriptRunDir(), "steering") : undefined;
+        if (steerDir) {
+            try {
+                mkdirSync(steerDir, { recursive: true });
+            }
+            catch { }
+            // Extract wave info from transcriptName (e.g. "steer-wave-32-attempt-1")
+            const waveMatch = transcriptName.match(/wave-(\d+)-attempt-(\d+)/);
+            if (waveMatch) {
+                writeFileSync(join(steerDir, `wave-${waveMatch[1]}-attempt-${waveMatch[2]}-raw.txt`), resultText, "utf-8");
+            }
+        }
         const snippet = resultText.length > 2000 ? resultText.slice(0, 1000) + "\n...\n" + resultText.slice(-800) : resultText;
         const retryText = await runPlannerQuery(`Your previous steering response could not be parsed as JSON. Here is what you returned:\n\n---\n${snippet}\n---\n\nExtract or rewrite the above as ONLY a valid JSON object with this schema: {"done":boolean,"reasoning":"...","statusUpdate":"...","tasks":[{"prompt":"..."}]}\n\nRespond with ONLY the JSON, no markdown fences, no explanation.`, { cwd, model: plannerModel, permissionMode, outputFormat: STEER_SCHEMA, transcriptName: `${transcriptName}-retry`, turnId: turn.id }, onLog);
         const retryParsed = attemptJsonParse(retryText);
         if (retryParsed)
             return retryParsed;
+        // C2: persist retry raw output
+        if (steerDir) {
+            try {
+                const waveMatch2 = transcriptName.match(/wave-(\d+)-attempt-(\d+)/);
+                if (waveMatch2) {
+                    writeFileSync(join(steerDir, `wave-${waveMatch2[1]}-attempt-${waveMatch2[2]}-retry-raw.txt`), retryText, "utf-8");
+                }
+            }
+            catch { }
+        }
         throw new Error(`Could not parse steering response after retry (${resultText.length} chars: ${resultText.slice(0, 120)}...)`);
     })();
     const isDone = parsed.done === true;
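
The five memory-block trimming steps in the hunk above share one shape, so they can be sketched as a table-driven loop. This is an illustrative rewrite under stated assumptions, not the package's code: `TRIM_STEPS` and `trimToBudget` are hypothetical names, the recent-waves step is left out, and the headers and cap sizes are copied from the diff. One deliberate difference: the sketch passes a function to `String.prototype.replace`, because a plain replacement string expands patterns such as `$&` or `$1`, which could occur in run-memory text.

```javascript
const PROMPT_BUDGET = 6000; // same value as in the diff

// Same truncation helper as in steerWave.
const cap = (s, max) => s.length > max ? s.slice(0, max) + "\n...(truncated)" : s;

// Hypothetical table: block header, runMemory key, initial cap, reduced cap.
const TRIM_STEPS = [
    { header: "\nMilestone snapshots:\n", key: "milestones", from: 2000, to: 1000 },
    { header: "\nArchitectural research:\n", key: "designs", from: 1500, to: 1000 },
    { header: "\nLatest quality reports:\n", key: "reflections", from: 1000, to: 500 },
    { header: "\nVerification results (from actually running the app):\n", key: "verifications", from: 1000, to: 500 },
    { header: "\nKnowledge from previous runs:\n", key: "previousRuns", from: 800, to: 400 },
];

/** Shrink memory blocks one at a time until the prompt fits the budget. */
function trimToBudget(prompt, runMemory, budget = PROMPT_BUDGET) {
    let trimmed = 0;
    for (const { header, key, from, to } of TRIM_STEPS) {
        if (prompt.length <= budget) break;
        const text = runMemory?.[key];
        if (!text) continue;
        const old = `${header}${cap(text, from)}\n`;
        const neu = `${header}${cap(text, to)}\n`;
        // Skip blocks that are already short enough or not present verbatim.
        if (old === neu || !prompt.includes(old)) continue;
        // A function replacer inserts `neu` literally, with no "$" expansion.
        prompt = prompt.replace(old, () => neu);
        trimmed++;
    }
    return { prompt, trimmed };
}
```

As in the original, the result can still exceed the budget once every step has run; the loop only reduces the blocks it knows about.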

package/dist/swarm.js CHANGED

@@ -29,29 +29,9 @@ function withCursorWorkspaceHeader(env, cwd) {
 }
 import { getModelCapability } from "./models.js";
 import { createTurn, beginTurn, endTurn, updateTurn } from "./turns.js";
-const SIMPLIFY_PROMPT = `You just finished your task. Now review and simplify your changes.
+const SIMPLIFY_PROMPT = `You just finished your task. Review and simplify your changes.
-Run \`git diff\` to see what you changed, then fix any issues:
-1. **Reuse**: Search the codebase  -- did you write something that already exists? Use existing utilities, helpers, patterns instead. Hand-rolled string manipulation, manual path handling, custom env checks, ad-hoc type guards  -- all candidates for existing utilities.
-2. **Quality**:
-   - Redundant state: cached values that could be derived, observers that could be direct calls
-   - Copy-paste with slight variation: near-duplicate blocks that should be unified
-   - Leaky abstractions: exposing internals or breaking existing abstraction boundaries
-   - Stringly-typed code: raw strings where enums/unions already exist
-   - Unnecessary JSX nesting: wrappers that add no layout value
-   - Comments narrating WHAT the code does  -- delete them; keep only non-obvious WHY
-3. **Efficiency**:
-   - Redundant computations, repeated file reads, duplicate API calls
-   - Sequential operations that could be parallel
-   - Hot-path bloat: new blocking work in startup or per-request paths
-   - Recurring no-op updates: state/store updates inside polling loops that fire unconditionally  -- add change-detection guard
-   - Unnecessary existence checks before operating (TOCTOU anti-pattern)
-   - Memory: unbounded data structures, missing cleanup, event listener leaks
-Less code is better. Delete and simplify rather than add. Fix directly  -- no need to explain.`;
+Invoke the \`simplify\` skill to review your changes for reuse, quality, and efficiency, then fix any issues found.`;
 export class Swarm {
     agents = [];
     logs = [];

package/dist/transcripts.d.ts CHANGED

@@ -1,5 +1,5 @@
 export declare function setTranscriptRunDir(dir: string | undefined): void;
 export declare function getTranscriptRunDir(): string | undefined;
 export declare function transcriptPath(name: string): string | undefined;
-/** Append a single event; silent on error (disk full, permission, etc.). */
+/** Append a single event; log to stderr once per name on failure (C5). */
 export declare function writeTranscriptEvent(name: string, event: Record<string, unknown>): void;

package/dist/transcripts.js CHANGED

@@ -25,7 +25,9 @@ export function getTranscriptRunDir() {
 export function transcriptPath(name) {
     return _runDir ? join(_runDir, "transcripts", `${name}.ndjson`) : undefined;
 }
-/** Append a single event; silent on error (disk full, permission, etc.). */
+/** Names that already errored — guard against repeated stderr spam. */
+const _seenErrors = new Set();
+/** Append a single event; log to stderr once per name on failure (C5). */
 export function writeTranscriptEvent(name, event) {
     const path = transcriptPath(name);
     if (!path)
@@ -34,5 +36,11 @@ export function writeTranscriptEvent(name, event) {
         mkdirSync(dirname(path), { recursive: true });
         appendFileSync(path, JSON.stringify({ t: Date.now(), ...event }) + "\n", "utf-8");
     }
-    catch { }
+    catch (err) {
+        if (!_seenErrors.has(name)) {
+            _seenErrors.add(name);
+            const msg = err instanceof Error ? err.message : String(err);
+            process.stderr.write(`[transcript] writeTranscriptEvent("${name}") failed: ${msg}\n`);
+        }
+    }
 }
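
The change above replaces a silent `catch` with a report that fires once per transcript name. The same pattern can be sketched in isolation; `makeOncePerKeyReporter` is a hypothetical name and the `report` callback is an assumption added so the sketch can run without touching stderr.

```javascript
/**
 * Hypothetical generic form of the `_seenErrors` guard:
 * report the first failure for each key and stay quiet afterwards.
 */
function makeOncePerKeyReporter(report) {
    const seen = new Set();
    return (key, err) => {
        if (seen.has(key)) return false; // already reported for this key
        seen.add(key);
        const msg = err instanceof Error ? err.message : String(err);
        report(`[transcript] "${key}" failed: ${msg}\n`);
        return true;
    };
}
```

Passing `line => process.stderr.write(line)` as `report` would give roughly the behavior in the diff. The set grows by one entry per distinct failing name, which stays small as long as transcript names are bounded.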

package/dist/types.d.ts CHANGED

@@ -156,9 +156,10 @@ export type MergeStrategy = "yolo" | "branch";
 export interface BranchRecord {
     branch: string;
     taskPrompt: string;
-    status: "merged" | "unmerged" | "failed" | "merge-failed";
+    status: "merged" | "unmerged" | "failed" | "merge-failed" | "discarded";
     filesChanged: number;
     costUsd: number;
+    firstFailedWave?: number;
 }
 /** Per-window rate limit snapshot (matches SDK rateLimitType). */
 export interface RateLimitWindow {

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-overnight",
-  "version": "1.25.41",
+  "version": "1.25.43",
   "description": "Parallel Claude agents in git worktrees with a usage cap that reserves headroom for your interactive Claude Code. Crash-safe resume. Provider-agnostic model catalog (Anthropic, Cursor, OpenAI, Gemini, DeepSeek, Llama, Qwen) with capability-based task scoping.",
   "type": "module",
   "bin": {

package/plugins/claude-overnight/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-overnight",
-  "version": "1.25.41",
+  "version": "1.25.43",
   "description": "Claude Code skill for understanding, installing, and inspecting claude-overnight runs  -- parallel Claude agents in git worktrees with thinking waves, multi-wave steering, and crash-safe resume. Supports Cursor API Proxy, Qwen, OpenRouter.",
   "author": {
     "name": "Francesco Fornace"

package/plugins/claude-overnight/skills/claude-overnight/SKILL.md CHANGED

@@ -11,7 +11,7 @@ description: >
 # What it is
-`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK  -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits verified by the worker next wave). A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, worker waves run them in parallel, and steering decides between more work, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable  -- nothing is lost.
+`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK  -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **main worker** (runs the tasks), and an optional **fast worker** (a cheaper/faster second worker for well-scoped tasks, verified by the next wave's workers). A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, worker waves run them in parallel, and steering decides between more work, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable  -- nothing is lost.
 **Three-layer review system** runs on every wave:
 1. **Per-agent self-review**  -- after each agent finishes, the same session continues via SDK session resume (continue mechanism) with a follow-up prompt to review and simplify its own `git diff`. The agent's full context stays warm  -- no initial context bloat.
@@ -55,7 +55,7 @@ Every run lives at `<repo>/.claude-overnight/runs/<ISO-timestamp>/`:
 | File / dir           | What it tells you                                                                 |
 |----------------------|-----------------------------------------------------------------------------------|
-| `run.json`           | Machine state: objective, planner/worker/fast models, budget, cost, waves done, branches, done flag. |
+| `run.json`           | Machine state: objective, planner/main-worker/fast-worker models, budget, cost, waves done, branches, done flag. |
 | `status.md`          | **Living project snapshot**, rewritten by steering every wave. First line = short status. |
 | `goal.md`            | Evolving "north star"  -- what the run currently thinks "amazing" means.            |
 | `themes.md`          | The thinking-wave research angles picked for this objective (human-readable).     |

package/plugins/claude-overnight/skills/coach/SKILL.md CHANGED

@@ -2,8 +2,9 @@
 name: claude-overnight-coach
 description: >
   Setup coach for claude-overnight. Turns a raw user objective into a ready
-  objective plus recommended run settings (budget, concurrency, planner/worker
-  models, flex, usage cap, permission mode) and an actionable preflight
+  objective plus recommended run settings (budget, concurrency, planner /
+  main-worker / optional fast-worker models, flex, usage cap, permission mode)
+  and an actionable preflight
   checklist. Invoked once, before the interactive pickers, to catch prompt-shape
   failures (vague, overambitious, multi-goal, unverifiable) and environmental
   failures (missing keys, dirty tree, missing .env) while they're still cheap
@@ -69,8 +70,8 @@ Rules:
 - `improvedObjective` preserves the user's voice and domain vocabulary. It MUST include a `Done:` line, a `Critical:` line (or `Critical: none` when nothing is off-limits), and a `Verify by:` line.
 - `recommended.budget` is an integer ≥ 1. `concurrency` is an integer in [1, 12]. `usageCap` is either `null` (unlimited) or a float in (0, 1].
 - `recommended.permissionMode` is `"auto" | "bypassPermissions" | "default"`.
-- `fastModel` is `null` unless adding one is clearly warranted for this scope + budget AND a cheap fast model is reachable from the available providers.
-- `recommended.plannerModel` / `workerModel` / `fastModel` MUST be model IDs that the user can actually reach given the providers listed in the input. Stock Anthropic IDs (e.g. `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) are only valid when "Anthropic direct: available" appears in the input.
+- `fastModel` (the fast-worker model) is `null` unless adding one is clearly warranted for this scope + budget AND a cheap fast-worker model is reachable from the available providers.
+- `recommended.plannerModel` (planner) / `workerModel` (main worker) / `fastModel` (fast worker) MUST be model IDs that the user can actually reach given the providers listed in the input. Stock Anthropic IDs (e.g. `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) are only valid when "Anthropic direct: available" appears in the input.
 - `checklist` `remediation` is an informational label — the host does NOT auto-act on it. Set it to the slug that best describes the issue, or `"none"` for purely advisory items.
 - `questions` is reserved for a future clarification loop; return `[]` for now.
@@ -105,27 +106,27 @@ Rows: scope. Each cell is a starting point — adjust by one step when repo fact
 `conc` ⇒ `recommended.concurrency` (clamp to ≤ budget).
 `flex` ⇒ `recommended.flex`.
-`fast=true` ⇒ recommend a fast model **if the user has one configured and reachable** from their available providers. Pick whatever the cheapest fast model is among their providers (e.g. `claude-haiku-4-5`, `composer-2-fast`, `qwen3` variants). If no fast model is reachable, set `null`.
-`fast=null` ⇒ do not recommend a fast model (scope too complex or no suitable fast model available).
+`fast=true` ⇒ recommend a fast-worker model **if the user has one configured and reachable** from their available providers. The fast worker is a real worker (same tools, same env) on a cheaper/faster model — steering routes well-scoped tasks to it by default. Pick whatever the cheapest fast-worker model is among their providers (e.g. `claude-haiku-4-5`, `composer-2-fast`, `qwen3` variants). If none is reachable, set `null`.
+`fast=null` ⇒ do not recommend a fast worker (scope too complex or no suitable fast-worker model available).
 `cap=null` ⇒ unlimited (`recommended.usageCap = null`).
-## Planner / worker model selection
+## Planner / main-worker / fast-worker model selection
-Pick the strongest reachable model for the planner; pick a cheap-but-capable reachable model for the worker.
+Pick the strongest reachable model for the planner; pick a cheap-but-capable reachable model for the main worker; optionally add a cheaper/faster second model as the fast worker.
 Decision order (stop at the first row whose providers are present):
 1. **Anthropic direct available**
    - planner: `claude-opus-4-7` (or its `-thinking-high` variant when scope is `audit-and-fix` / `research-and-implement` / `migration`).
-   - worker: `claude-sonnet-4-6` for normal work; `claude-opus-4-7` for `wide`/`saturated` migrations or research.
-   - fastModel: recommend the cheapest fast model available among the user's reachable providers when the matrix says `fast=true`.
+   - main worker: `claude-sonnet-4-6` for normal work; `claude-opus-4-7` for `wide`/`saturated` migrations or research.
+   - fast worker (`fastModel`): recommend the cheapest fast-worker model available among the user's reachable providers when the matrix says `fast=true`.
 2. **Custom Anthropic-compatible provider with a strong model** (e.g. `qwen3.6-plus`, `qwen3-coder-plus`)
    - planner: the strongest such model the user has.
-   - worker: same model, or a cheaper sibling if the user has one.
+   - main worker: same model, or a cheaper sibling if the user has one.
 3. **Cursor proxy is the only reachable provider**
    - planner: `claude-opus-4-7` via Cursor (only if the proxy exposes it).
-   - worker: `claude-sonnet-4-6` via Cursor, or `composer-2` for the cheapest path.
-   - fastModel: recommend a Cursor fast model (e.g. `composer-2-fast`) when the matrix says `fast=true`.
+   - main worker: `claude-sonnet-4-6` via Cursor, or `composer-2` for the cheapest path.
+   - fast worker (`fastModel`): recommend a Cursor fast-worker model (e.g. `composer-2-fast`) when the matrix says `fast=true`.
 4. **No reachable provider** — leave `plannerModel` and `workerModel` as `claude-sonnet-4-6` and emit a `blocking` checklist item titled "No reachable provider".
 Never recommend Cursor models when the input does not list a `cursor proxy` provider, and never recommend stock Anthropic IDs when the input does not say "Anthropic direct: available". `fastModel` MUST be `null` rather than guessed.
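
The decision order above can be read as a first-match rule table. The sketch below is a simplified illustration, not the coach's implementation: `pickModels` and its input fields are hypothetical, since the real coach input format is not shown in this diff. It ignores the scope-dependent variants (the `-thinking-high` planner, the Opus worker for wide migrations, `composer-2`) and does not check whether the Cursor proxy exposes a given model.

```javascript
/**
 * Hypothetical first-match model picker following the four-row decision order.
 * `cheapestFastModel` is whatever fast-worker model the caller already knows
 * to be reachable, or null; it is never guessed here.
 */
function pickModels({
    anthropicDirect = false,
    customStrongModel = null,
    cursorProxy = false,
    wantFast = false,
    cheapestFastModel = null,
} = {}) {
    // fast=true in the matrix only yields a model if one is actually reachable.
    const fastModel = wantFast ? cheapestFastModel : null;

    if (anthropicDirect) {
        return { plannerModel: "claude-opus-4-7", workerModel: "claude-sonnet-4-6", fastModel, blocking: null };
    }
    if (customStrongModel) {
        // Same model for both roles; a cheaper sibling is left to the caller.
        return { plannerModel: customStrongModel, workerModel: customStrongModel, fastModel, blocking: null };
    }
    if (cursorProxy) {
        return { plannerModel: "claude-opus-4-7", workerModel: "claude-sonnet-4-6", fastModel, blocking: null };
    }
    // Row 4: placeholders plus a blocking checklist item.
    return {
        plannerModel: "claude-sonnet-4-6",
        workerModel: "claude-sonnet-4-6",
        fastModel: null,
        blocking: "No reachable provider",
    };
}
```

Returning the blocking title alongside the placeholder IDs keeps row 4 explicit, so a caller cannot mistake the fallback for a real recommendation.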