npm - claude-overnight - Versions diffs - 1.25.39 → 1.25.42 - Mend

claude-overnight 1.25.39 → 1.25.42

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/README.md +3 -3
package/dist/_version.d.ts +1 -1
package/dist/_version.js +1 -1
package/dist/index.js +36 -7
package/dist/providers.js +5 -0
package/dist/run.js +2 -25
package/dist/settings.js +4 -4
package/dist/steering.js +22 -3
package/dist/swarm.js +27 -24
package/docs/PROXIED_FAST_MODEL_RESEARCH.md +403 -0
package/package.json +2 -2
package/plugins/claude-overnight/.claude-plugin/plugin.json +1 -1
package/plugins/claude-overnight/skills/claude-overnight/SKILL.md +2 -2
package/plugins/claude-overnight/skills/coach/SKILL.md +21 -19

package/README.md CHANGED

@@ -4,7 +4,7 @@ Parallel Claude agents in isolated git worktrees. Set a usage cap so your intera
 Hand it an objective and a session budget, walk away, review the diff when the run ends. Every agent runs in its own worktree on its own branch — a misbehaving agent can't trash your working tree. Unmerged branches are preserved for manual review, never discarded.
-Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits, verified by the worker next wave). Pair any planner (Opus, Sonnet) with any worker — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
+Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) — every session runs on the SDK's agent harness. Three roles, each picked independently: **planner** (thinks, steers, reviews), **main worker** (runs the tasks), and an optional **fast worker** (a cheaper/faster second worker for well-scoped tasks, verified by the next wave's workers). Pair any planner (Opus, Sonnet) with any worker — Anthropic, Cursor, Qwen, OpenRouter, or any Anthropic-compatible endpoint.
 ## Run on Qwen 3.6 Plus
@@ -333,7 +333,7 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
 ## Custom providers (Qwen, OpenRouter, any Anthropic-compatible endpoint)
-Planner, worker, and optional fast model are each picked separately  -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of work.
+Planner, main worker, and optional fast worker are each picked separately  -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of work. The fast worker is a real worker (same tools, same env), just on a cheaper/faster model — steering routes well-scoped tasks to it by default.
 From the interactive picker, choose `Other…` on the planner, worker, or fast step:
@@ -353,7 +353,7 @@ From the interactive picker, choose `Other…` on the planner, worker, or fast s
 Saved providers live user-level at `~/.claude/claude-overnight/providers.json` (mode 0600) and show up automatically in every repo. No per-project config.
-**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`)  -- planner queries use the planner provider, worker queries use the worker provider, fast queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
+**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`)  -- planner queries use the planner provider, main-worker queries use the worker provider, fast-worker queries use the fast provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
 **Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ worker preflight failed: ...` instead of N scattered mid-run errors.

package/dist/_version.d.ts CHANGED

@@ -1 +1 @@
-export declare const VERSION = "1.25.38";
+export declare const VERSION = "1.25.42";

package/dist/_version.js CHANGED

@@ -1,2 +1,2 @@
 // Auto-generated by build — do not edit manually.
-export const VERSION = "1.25.38";
+export const VERSION = "1.25.42";

package/dist/index.js CHANGED

@@ -156,7 +156,7 @@ async function main() {
     --budget=N             Target number of agent runs ${chalk.dim("(default: 10)")}
     --concurrency=N        Max parallel agents ${chalk.dim("(default: 5)")}
     --model=NAME           Worker model override ${chalk.dim("(interactive mode picks planner + worker separately  -- supports 'Other…' for Qwen / OpenRouter / etc.)")}
-    --fast-model=NAME      Fast model for quick tasks ${chalk.dim("(optional  -- checked by worker model in next wave)")}
+    --fast-model=NAME      Fast worker model for quick tasks ${chalk.dim("(optional  -- checked by next wave's workers)")}
     --usage-cap=N          Stop at N% utilization ${chalk.dim("(e.g. 90 to save 10% for other work)")}
     --allow-extra-usage    Allow extra/overage usage ${chalk.dim("(default: stop when plan limits hit)")}
     --extra-usage-budget=N Max $ for extra usage ${chalk.dim("(implies --allow-extra-usage)")}
@@ -843,20 +843,36 @@ async function main() {
          *  preflight now also runs a write-capability probe (see probeCursorWriteCapability) that
          *  asks cursor to Bash a marker file — so the total budget must cover auth ping + write turn. */
         const preflightMs = (p) => isCursorProxyProvider(p) ? 90_000 : 20_000;
-        const results = await Promise.all(pending.map(async ([role, p]) => {
+        // Cursor's composer-2 pipeline intermittently stalls for 100s+ on a write-tool turn
+        // even though the tool succeeded (proxy logs it as "SLOW response"). A single retry
+        // almost always clears it — so we retry once on timeout-style failures for cursor
+        // proxy providers before giving up.
+        const isTimeoutError = (err) => /^timeout after /.test(err) || /: timeout after /.test(err);
+        const runPreflight = async (role, p) => {
             statuses.set(role, "connecting…");
             renderStatus();
-            const result = await preflightProvider(p, cwd, preflightMs(p), {
+            let result = await preflightProvider(p, cwd, preflightMs(p), {
                 onProgress: (msg) => { statuses.set(role, msg); renderStatus(); },
             });
+            if (!result.ok && isCursorProxyProvider(p) && isTimeoutError(result.error)) {
+                statuses.set(role, "retrying after timeout…");
+                renderStatus();
+                result = await preflightProvider(p, cwd, preflightMs(p), {
+                    onProgress: (msg) => { statuses.set(role, `retry: ${msg}`); renderStatus(); },
+                });
+            }
             statuses.delete(role);
             renderStatus();
             return { role, provider: p, result };
-        }));
+        };
+        const results = await Promise.all(pending.map(([role, p]) => runPreflight(role, p)));
         clearStatusLine();
+        let fastDegraded = false;
         for (const { role, provider, result } of results) {
             if (!result.ok) {
-                console.error(chalk.red(`  ✗ ${role} preflight failed: ${chalk.dim(result.error)}`));
+                const degradable = role === "fast";
+                const prefix = degradable ? chalk.yellow(`  ⚠ ${role} preflight failed`) : chalk.red(`  ✗ ${role} preflight failed`);
+                console.error(`${prefix}: ${chalk.dim(result.error)}`);
                 if (isCursorProxyProvider(provider)) {
                     const tail = readCursorProxyLogTail(25);
                     if (tail) {
@@ -865,16 +881,29 @@ async function main() {
                             console.error(chalk.dim(`    ${line}`));
                     }
                     const cmd = bundledComposerProxyShellCommand();
-                    console.error(chalk.yellow(`  The proxy at ${PROXY_DEFAULT_URL} may have crashed or timed out (e.g. keychain/UI). Retry, or start the bundled proxy: ${cmd ?? "npm install in the claude-overnight package, then re-run"}`));
+                    const proxyUrl = provider.baseURL || PROXY_DEFAULT_URL;
+                    console.error(chalk.yellow(`  The proxy at ${proxyUrl} may have crashed or timed out (e.g. keychain/UI). Retry, or start the bundled proxy: ${cmd ?? "npm install in the claude-overnight package, then re-run"}`));
                 }
-                else {
+                else if (!degradable) {
                     console.error(chalk.red(`  Fix the provider at ~/.claude/claude-overnight/providers.json and retry.`));
                 }
+                if (degradable) {
+                    console.error(chalk.yellow(`  Continuing without the fast worker — fast-eligible tasks will run on the main worker model instead.`));
+                    console.error("");
+                    fastDegraded = true;
+                    continue;
+                }
                 console.error("");
                 process.exit(1);
             }
             console.log(`  ${chalk.green(`✓ ${role} ready`)} ${chalk.dim(`· ${provider.displayName} · ${provider.model}`)}`);
         }
+        if (fastDegraded) {
+            fastModel = undefined;
+            fastProvider = undefined;
+            const rebuilt = buildEnvResolver({ plannerModel, plannerProvider, workerModel, workerProvider, fastModel, fastProvider });
+            setPlannerEnvResolver(rebuilt);
+        }
     }
     if (nonInteractive) {
         const capStr = usageCap != null ? `  cap=${Math.round(usageCap * 100)}%` : "";
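
Reviewer note on the preflight hunk above: the retry-once-on-timeout pattern can be sketched in isolation as follows. This is a minimal sketch, not the package's code. Only the two regexes are copied from the diff; `preflightWithOneRetry`, `makeFlakyProbe`, and the error string `"timeout after 90000ms"` are hypothetical names and values chosen for illustration.

```js
// Regexes copied from the diff; everything else here is a hypothetical sketch.
const isTimeoutError = (err) =>
  /^timeout after /.test(err) || /: timeout after /.test(err);

// Runs `attempt` once, then a second time only when the first result is a
// timeout-style failure and the caller marked the provider as retryable.
async function preflightWithOneRetry(attempt, retryable) {
  let result = await attempt();
  if (!result.ok && retryable && isTimeoutError(result.error)) {
    result = await attempt();
  }
  return result;
}

// Stub probe (illustration only): times out on the first call, succeeds after.
function makeFlakyProbe() {
  let calls = 0;
  const probe = async () => {
    calls += 1;
    return calls === 1
      ? { ok: false, error: "timeout after 90000ms" } // made-up error text
      : { ok: true };
  };
  probe.calls = () => calls;
  return probe;
}
```

Capping the retry at one attempt keeps the worst-case preflight time at twice the per-provider budget, which matters because all roles are probed in parallel and the slowest one gates the run.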

package/dist/providers.js CHANGED

@@ -1011,6 +1011,11 @@ async function startProxyProcess(baseUrl, url, port) {
         // cursor-composer chat-only mode fakes HOME to a temp dir; on macOS the agent still waits on
         // Keychain (~30s) for `cursor-user` despite CURSOR_API_KEY. Use the real workspace profile.
         CURSOR_BRIDGE_CHAT_ONLY_WORKSPACE: "false",
+        // Broad base so per-request `X-Cursor-Workspace` headers (set from each
+        // agent's cwd in swarm.ts) validate under the proxy's `resolveWorkspace`
+        // check. Without this, proxied agents in worktrees all resolve to the
+        // proxy's startup cwd.
+        CURSOR_BRIDGE_WORKSPACE: "/",
     };
     if (sysNode && agentJs) {
         proxyEnv.CURSOR_AGENT_NODE = sysNode;
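
Reviewer note on `CURSOR_BRIDGE_WORKSPACE: "/"`: the comment in the hunk says a broad base lets per-request workspace paths pass the proxy's validation. The proxy's actual `resolveWorkspace` is not shown in this diff, so the following is only an assumed model of such a containment check, for absolute POSIX paths. `normalizePosix` and `isUnderBase` are hypothetical names.

```js
// Hypothetical stand-in for a "requested path must be under base" check.
// Assumes absolute POSIX paths; not the proxy's real implementation.
function normalizePosix(p) {
  const out = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue; // skip empty and current-dir segments
    if (seg === "..") out.pop();             // step up one directory
    else out.push(seg);
  }
  return "/" + out.join("/");
}

function isUnderBase(base, requested) {
  const b = normalizePosix(base);
  const r = normalizePosix(requested);
  if (b === "/") return true; // the root contains every absolute path
  return r === b || r.startsWith(b + "/");
}
```

Under this model, a base equal to a single repo directory rejects worktrees created elsewhere (for example under `/tmp`), while a base of `/` accepts them all. The trade-off is that the check no longer restricts anything.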

package/dist/run.js CHANGED

@@ -979,34 +979,11 @@ export async function executeRun(cfg) {
 }
 function reviewPrompt(scope, objective) {
     const scopeLine = scope === "wave"
-        ? "You are reviewing all changes made in the most recent wave of agent work."
+        ? "Review and simplify all changes from the most recent wave."
         : `You are the final quality gate before this autonomous run completes.\n\nThe objective was: ${objective || "improve the codebase"}`;
-    const diffCmd = scope === "wave"
-        ? "Run `git diff` to see what changed."
-        : "Run `git diff main` (or `git diff HEAD` if on the same branch) to see ALL changes made during this run.";
-    const checks = scope === "wave"
-        ? `1. **Missed reuse**: Did any agent write something that already exists elsewhere? Find existing utilities and suggest replacements.
-2. **Quality issues**: Redundant state, copy-paste variations, leaky abstractions, stringly-typed code where enums exist, unnecessary JSX nesting, comments that narrate what the code does.
-3. **Efficiency problems**: Redundant computations, sequential operations that could be parallel, hot-path bloat, recurring no-op updates, TOCTOU patterns, memory leaks.
-4. **Merge conflicts or inconsistencies**: Changes that work against each other or break existing patterns.`
-        : `1. **Architecture coherence**: Do the changes form a coherent whole, or are they a patchwork of independent edits that don't fit together?
-2. **Missed reuse**: Any new code that duplicates existing functionality?
-3. **Quality**: Redundant state, copy-paste variations, leaky abstractions, stringly-typed code, unnecessary nesting, narrative comments.
-4. **Efficiency**: N+1 patterns, redundant computations, hot-path bloat, missing cleanup, unbounded data structures.
-5. **Consistency**: Do all changes follow the project's existing patterns, conventions, and design system?
-6. **Build and test**: Run the build and any existing tests. Fix any breakage.`;
-    const close = scope === "wave"
-        ? "Fix issues directly. Delete and simplify rather than add. If the code is already clean, skip."
-        : "Fix issues directly. Delete and simplify. If the codebase is clean and the build passes, say so.";
     return `${scopeLine}
-${diffCmd} Review for:
-${checks}
-${close}
-No need to explain your changes  -- just fix them.`;
+Invoke the \`simplify\` skill to review changed code for reuse, quality, and efficiency, then fix any issues found.`;
 }
 async function runReview(opts, scope, objective, onSwarm) {
     const swarm = new Swarm({

package/dist/settings.js CHANGED

@@ -25,12 +25,12 @@ export async function editRunSettings(options) {
     s.workerModel = workerPick.model;
     s.workerProviderId = workerPick.providerId;
     const suggestFast = !!(options.defaults?.fastModel);
-    const fastChoice = await select(`${chalk.cyan("③")} Fast model ${chalk.dim("(optional  -- Haiku/Qwen for quick tasks, checked by worker)")}:`, [
-        { name: "Skip", value: "skip", hint: "two-tier mode only (current setup)" },
-        { name: "Pick a fast model", value: "pick", hint: "Haiku, Qwen, or any provider  -- for well-scoped tasks" },
+    const fastChoice = await select(`${chalk.cyan("③")} Fast worker model ${chalk.dim("(optional  -- Haiku/Qwen for well-scoped tasks, checked by next wave's workers)")}:`, [
+        { name: "Skip", value: "skip", hint: "single-worker mode (main worker handles everything)" },
+        { name: "Pick a fast worker", value: "pick", hint: "Haiku, Qwen, or any provider  -- a cheaper, faster second worker" },
     ], suggestFast ? 1 : 0);
     if (fastChoice === "pick") {
-        const fastPick = await pickModel(`${chalk.cyan("③b")} Fast model:`, models, options.defaults?.fastModel ?? s.fastModel);
+        const fastPick = await pickModel(`${chalk.cyan("③b")} Fast worker model:`, models, options.defaults?.fastModel ?? s.fastModel);
         s.fastModel = fastPick.model;
         s.fastProviderId = fastPick.providerId;
     }

package/dist/steering.js CHANGED

@@ -89,6 +89,9 @@ You have full creative freedom. Design the wave that will have the highest impac
 **Polish**  -- Agents focus purely on feel: loading states, error messages, micro-interactions, empty states, responsiveness. Not features  -- the texture that makes users trust the product.
   Example: 2 agents, one on happy paths, one on error/edge states
+**Simplify**  -- Invoke the 'simplify' skill. It reviews changed code and spawns parallel sub-agents for thorough review.
+  Example: 1 agent per wave with task type "review", let the skill handle the rest
 You can combine these. A wave can have 3 execute agents + 1 verification agent. Or 2 divergent explorers. Whatever the situation calls for.
 For non-execute tasks (critique, verify, user-test, synthesize), tell agents to write their output to files in the run directory so findings persist for future waves. Use paths like: .claude-overnight/latest/reflections/wave-N-{topic}.md or .claude-overnight/latest/verifications/wave-N-{topic}.md.
@@ -104,15 +107,31 @@ Respond with ONLY a JSON object (no markdown fences):
   "estimatedSessionsRemaining": 15,
   "tasks": [
     {"prompt": "task instruction...", "model": "worker", "postcondition": "test -f src/new-file.ts"},
-    {"prompt": "quick icon fix, verified by worker next wave...", "model": "fast"},
+    {"prompt": "quick icon fix, verified by next wave's workers...", "model": "fast"},
     {"prompt": "verify the app end-to-end...", "model": "worker", "noWorktree": true}
   ]
 }
 "estimatedSessionsRemaining" is REQUIRED. Your best honest estimate of how many MORE agent sessions (beyond the wave you just composed above) are needed to reach 'amazing'  -- include follow-up fixes, polish, verification, and anything else you'd want before shipping. Be realistic, not optimistic. Use 0 only if truly done.
-The "model" field on each task: use "worker" (${workerModel}) for all tasks. Use "fast" (${fastModel ?? "not set"}) for small, single-file changes that will be checked by the worker in the next wave.
-Set "noWorktree": true for verify/user-test tasks  -- they need the real project directory with env files, dependencies, and local config.
+The "model" field on each task — you have **two kinds of workers**, both first-class. Pick the right one per task:
+**Fast worker — "fast" (${fastModel ?? "not set"})** is the default workhorse for well-scoped, mechanical tasks. It's a real worker, same tools, same environment — just a cheaper, faster model. The next wave's workers (fast or main) will catch and fix any issues. Route here by default when any of these apply:
+- Single-file edits, refactors, renames
+- Surgical multi-line changes with a clear spec (add a param, wrap a call, tweak a prompt line)
+- Read/research: scan files, summarize findings
+- Build checks, postcondition verification
+- E2E test runs with concrete steps
+- Simple critiques, polish tweaks
+- Running existing scripts/tests and capturing output
+- Docs / markdown updates
+- Stdlib-only utility scripts with a crisp spec
+**Main worker — "worker" (${workerModel})** is for tasks that genuinely need deeper reasoning: multi-file features, complex logic, architectural changes, ambiguous specs, anything where a mis-step costs more than a wave to recover from.
+When in doubt, pick "fast". Both are workers; the wave loop iterates. Over-using "worker" is a real cost — aim to route the clear majority of well-scoped tasks to the fast worker whenever a fast worker is configured.
+Set "noWorktree": true for verify/user-test tasks -- they need the real project directory with env files, dependencies, and local config.
 OPTIONAL "postcondition": a single shell one-liner that exits 0 when the task is truly done. The framework runs it after merge; if it fails, the agent's "no-op" claim is rejected and the task is retried with the failure output as context. Use it whenever the task has a concrete, machine-checkable outcome. Examples: \`test -f src/tracking/watchlist-poller.ts && grep -q "runWatchlistPoll" src/tracking/watchlist-poller.ts\`, \`grep -q "watchlistPollerTask" src/scraper/scheduler.ts\`, \`pnpm run build\`, \`diff -q src/public/index.html frontend/dist/index.html\`. Keep it cheap (sub-second, no network). Omit for exploratory/research tasks where there is no crisp check.
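
Reviewer note on the routing prompt above: each task carries a `"model"` tag of `"worker"` or `"fast"`, and the index.js hunk clears `fastModel` when the fast preflight fails so that fast-eligible tasks run on the main worker. How the package resolves the tag to a concrete model is not shown in this diff. The sketch below is one assumed way to express that fallback; `resolveTaskModel` and the model names in the usage lines are hypothetical.

```js
// Hypothetical resolver: maps a task's "model" tag to a concrete model name.
// Falls back to the main worker when no fast worker is configured.
function resolveTaskModel(task, { workerModel, fastModel }) {
  if (task.model === "fast" && fastModel) return fastModel;
  return workerModel;
}

// Usage with placeholder model names:
const configured = { workerModel: "main-model", fastModel: "fast-model" };
const degraded = { workerModel: "main-model", fastModel: undefined };
const quickFix = { prompt: "quick icon fix", model: "fast" };
const routedNormally = resolveTaskModel(quickFix, configured);
const routedDegraded = resolveTaskModel(quickFix, degraded);
```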

package/dist/swarm.js CHANGED

@@ -6,31 +6,32 @@ import { query } from "@anthropic-ai/claude-agent-sdk";
 import { NudgeError, RATE_LIMIT_WINDOW_SHORT, extractToolTarget, sumUsageTokens } from "./types.js";
 import { gitExec, autoCommit, mergeAllBranches, warnDirtyTree, cleanStaleWorktrees, writeSwarmLog } from "./merge.js";
 import { ensureCursorProxyRunning, PROXY_DEFAULT_URL } from "./providers.js";
+/**
+ * Proxied Cursor models ignore SDK `cwd` and use their own workspace
+ * resolution. Inject `X-Cursor-Workspace` via ANTHROPIC_CUSTOM_HEADERS so the
+ * proxy's per-request workspace override points at this agent's cwd.
+ * Requires the proxy to run with `CURSOR_BRIDGE_WORKSPACE=/` (or a parent of
+ * all worktree paths) so the header value passes the safety check.
+ */
+function withCursorWorkspaceHeader(env, cwd) {
+    if (!env)
+        return undefined;
+    if (env.ANTHROPIC_BASE_URL !== PROXY_DEFAULT_URL)
+        return env;
+    const hdr = `X-Cursor-Workspace: ${cwd}`;
+    const existing = env.ANTHROPIC_CUSTOM_HEADERS?.trim();
+    return {
+        ...env,
+        ANTHROPIC_CUSTOM_HEADERS: existing
+            ? `${existing}\n${hdr}`
+            : hdr,
+    };
+}
 import { getModelCapability } from "./models.js";
 import { createTurn, beginTurn, endTurn, updateTurn } from "./turns.js";
-const SIMPLIFY_PROMPT = `You just finished your task. Now review and simplify your changes.
-Run \`git diff\` to see what you changed, then fix any issues:
-1. **Reuse**: Search the codebase  -- did you write something that already exists? Use existing utilities, helpers, patterns instead. Hand-rolled string manipulation, manual path handling, custom env checks, ad-hoc type guards  -- all candidates for existing utilities.
-2. **Quality**:
-   - Redundant state: cached values that could be derived, observers that could be direct calls
-   - Copy-paste with slight variation: near-duplicate blocks that should be unified
-   - Leaky abstractions: exposing internals or breaking existing abstraction boundaries
-   - Stringly-typed code: raw strings where enums/unions already exist
-   - Unnecessary JSX nesting: wrappers that add no layout value
-   - Comments narrating WHAT the code does  -- delete them; keep only non-obvious WHY
-3. **Efficiency**:
-   - Redundant computations, repeated file reads, duplicate API calls
-   - Sequential operations that could be parallel
-   - Hot-path bloat: new blocking work in startup or per-request paths
-   - Recurring no-op updates: state/store updates inside polling loops that fire unconditionally  -- add change-detection guard
-   - Unnecessary existence checks before operating (TOCTOU anti-pattern)
-   - Memory: unbounded data structures, missing cleanup, event listener leaks
+const SIMPLIFY_PROMPT = `You just finished your task. Review and simplify your changes.
-Less code is better. Delete and simplify rather than add. Fix directly  -- no need to explain.`;
+Invoke the \`simplify\` skill to review your changes for reuse, quality, and efficiency, then fix any issues found.`;
 export class Swarm {
     agents = [];
     logs = [];
@@ -561,7 +562,7 @@ export class Swarm {
                             ? `You are working in an isolated git worktree. Focus only on this task. Do NOT commit your changes  -- the framework handles that.\n\n${preamble}${task.prompt}${postBlock}`
                             : `${preamble}${task.prompt}${postBlock}`;
                     const effectiveModel = task.model || this.config.model;
-                    const envOverride = this.config.envForModel?.(effectiveModel);
+                    const envOverride = withCursorWorkspaceHeader(this.config.envForModel?.(effectiveModel), agentCwd);
                     const agentQuery = query({
                         prompt: agentPrompt,
                         options: {
@@ -786,7 +787,9 @@ Respond with JSON: {"keep": true/false, "reason": "brief explanation"}`;
                         allowDangerouslySkipPermissions: true,
                         maxTurns: 1,
                         persistSession: false,
-                        ...(envFor?.(evalModel) && { env: envFor(evalModel) }),
+                        ...(envFor?.(evalModel) && {
+                            env: withCursorWorkspaceHeader(envFor(evalModel), this.config.cwd),
+                        }),
                     },
                 });
                 this.activeQueries.add(eq);
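
Reviewer note on `withCursorWorkspaceHeader`: the function appends one `Name: value` line to `ANTHROPIC_CUSTOM_HEADERS`, which the research notes in this release describe as newline-separated `Key: Value` pairs. The sketch below isolates that string handling so the merged value is easy to inspect. `appendCustomHeader` and `parseCustomHeaders` are hypothetical helpers, and the parser only reflects the format as described in the notes, not the SDK's own parsing code.

```js
// Appends one "Name: value" line to a newline-separated header string.
function appendCustomHeader(existing, name, value) {
  const line = `${name}: ${value}`;
  const trimmed = existing?.trim();
  return trimmed ? `${trimmed}\n${line}` : line;
}

// Splits the newline-separated string back into [name, value] pairs.
// Lines without a colon are skipped; only the first colon separates the name.
function parseCustomHeaders(raw) {
  return raw
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.includes(":"))
    .map((l) => {
      const i = l.indexOf(":");
      return [l.slice(0, i).trim(), l.slice(i + 1).trim()];
    });
}
```

Because the value is appended rather than replaced, headers a user already configured for the provider survive alongside the workspace header.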

package/docs/PROXIED_FAST_MODEL_RESEARCH.md ADDED

@@ -0,0 +1,403 @@
+# Proxied fast-model research — Skills, tool_use, workspace, and cursor-native translation
+Session date: 2026-04-18. Status: **research notes, no code changes yet.** Picks up where `CURSOR_PROXY_MACOS_DISCOVERY.md` left off.
+Goal: understand what happens when a proxied Cursor model (composer-2-fast via cursor-composer-in-claude) is dispatched through the Agent SDK's `query()` — specifically whether Anthropic skills and tool-use introspection work, and what would be needed to make proxied fast models feel "just like another endpoint" (qwen-style).
+## TL;DR findings
+1. **Proxied fast models cannot invoke the Skill tool.** Not a phrasing issue — cursor-agent has its own hardcoded tool loop and treats SDK-provided tools (Skill, Task, sub-Agent, etc.) as text context only.
+2. **Zero `tool_use` content blocks surface to the SDK.** cursor-agent emits rich `tool_call` events in its `stream-json` output, but the proxy's `cli-stream-parser.ts` only parses `type:"assistant"` blocks with nested `part.type==="tool_use"`. It drops every `tool_call` event on the floor. ~30 LOC fix.
+3. **SDK `cwd` option is ignored** by cursor-agent. Needs per-request `X-Cursor-Workspace` header (already supported by the proxy) + `CURSOR_BRIDGE_WORKSPACE=/` (or broad enough base) for worktree isolation with proxied agents.
+4. **Proxy version floor is 0.9.4.** v0.9.2 forced `--mode ask` (read-only); fixed in 0.9.3 but 0.9.3 was never published. `npm install cursor-composer-in-claude@0.9.4` gets agent-mode default.
+5. **The cloud endpoint is `https://agentn.global.api5.cursor.sh/agent.v1.AgentService/Run`** — HTTP/2 + protobuf, not JSON. It's an *agent* endpoint, not a pure model endpoint.
+6. **Cursor-native rules work perfectly as skill equivalents.** `.cursor/rules/*.mdc` files with frontmatter are discovered, read, and followed by cursor-agent verbatim — including slash-command invocation like `/simplify`.
+## Baseline: what works vs doesn't
+| | Haiku 4.5 direct | composer-2-fast via proxy (0.9.4) |
+|---|---|---|
+| `/simplify` Skill invocation | ✅ 12 tool calls, follows skill recipe (3 parallel review agents) | ❌ model says "Skill tool isn't wired up in this session" |
+| File actually simplified | ✅ | ✅ (done inline via cursor-agent's internal tools) |
+| `tool_use` blocks surface to SDK | ✅ Read, Edit, Bash, Agent visible | ❌ zero — everything is invisible |
+| `cwd: <path>` option | ✅ respected | ❌ cursor-agent uses its own workspace resolution |
+| Cost | $0.21 | $0.068 (≈3× cheaper) |
+| Duration | 41s | 24–43s |
+## How I tested
+All probes in `/tmp/simplify-probe/` (scratch dir, not committed). Created a trivial messy TypeScript file:
+```ts
+export function add(a: number, b: number): number {
+  const result: number = a + b;
+  return result;
+}
+```
+Then spawned `query()` from `@anthropic-ai/claude-agent-sdk` with different model/env combinations, each time asking it to simplify the file.
+### 1. Haiku 4.5 direct (baseline)
+```js
+const agent = query({
+  prompt: "Please run /simplify on messy.ts in the current directory.",
+  options: { cwd: "/tmp/simplify-probe", model: "claude-haiku-4-5-20251001", permissionMode: "bypassPermissions" },
+});
+```
+- **Result:** invoked `Skill({skill:"simplify", args:"messy.ts"})` on turn 1, then launched 3 parallel `general-purpose` subagents (reuse/quality/efficiency), then edited.
+- **Tool calls surfaced:** Skill, Read (×2), Bash (×3), Agent (×3), Edit (×1).
+- **Cost / time:** $0.21 / 41s.
+### 2. composer-2-fast via cursor-composer-in-claude (v0.9.2 — broken)
+Symptoms that led us to debug:
+- Text reply: *"### Ask mode — I can't run `/simplify` or change messy.ts from here. That needs Agent mode."*
+- Zero tool calls.
+- File not modified.
+- Looked for the file in the proxy's startup cwd, not the SDK's.
+Root cause (from `cursor-composer-in-claude/CHANGELOG.md` 0.9.3):
+> `--mode agent` is now the default — Previously the proxy always appended `--mode <plan|ask>` to every cursor-agent invocation. Current cursor-agent treats both as strictly read-only (Write/Bash calls are silently dropped, exit 0 with empty stdout).
+Fix: `npm install cursor-composer-in-claude@0.9.4`. The package.json already pins `^0.9.4` but our `node_modules` had stale 0.9.2.
+### 3. composer-2-fast via v0.9.4 (now agent-mode default)
+Model now does real work but:
+- Edits `src/__tests__/simplify-target.ts` in the claude-overnight repo instead of `/tmp/simplify-probe/messy.ts`, because it resolves cwd from the proxy's startup dir, not the SDK's `cwd: "/tmp/simplify-probe"` option. **Real bug for claude-overnight worktree isolation.**
+- Still zero `tool_use` blocks surfaced. File changes happen through cursor-agent's internal Write tool and don't bubble up.
+### 4. composer-2-fast with workspace header (the fix)
+```js
+const env = envFor(p);
+env.ANTHROPIC_CUSTOM_HEADERS = "X-Cursor-Workspace: /tmp/simplify-probe";
+// and start proxy with CURSOR_BRIDGE_WORKSPACE=/
+```
+- Agent SDK honors `ANTHROPIC_CUSTOM_HEADERS` env var (newline-separated `Key: Value` pairs — confirmed in `cli.js` string `ANTHROPIC_CUSTOM_HEADERS`).
+- Proxy's `resolveWorkspace()` in `workspace.ts:50` reads `x-cursor-workspace` header; validates that the requested path is under `config.workspace` (the proxy's base). Setting base to `/` (or a broad parent) lets arbitrary worktree paths validate.
+- Three prompt variants (`/simplify`, "use the simplify skill", concrete instructions) all simplified correctly now. Still 0 tool_use blocks.
+### 5. Forcing the Skill tool explicitly (confirmation test)
+Prompt: *"You have a tool named Skill. Invoke it now with parameters {skill: \"simplify\", args: \"messy.ts\"}. Do not do any work yourself — your only job is to emit that one Skill tool call."*
+Response: *"I don't have a `Skill` tool in this Cursor session, so I can't emit that call here."*
+Confirmed: the model is correctly reporting that the Skill tool isn't actually callable from its vantage point. Not a prompting issue.
+## Why tool_use doesn't surface
+Ran cursor-agent directly, bypassing the proxy:
+```bash
+CI=true CURSOR_SKIP_KEYCHAIN=1 CURSOR_API_KEY="..." \
+  /opt/homebrew/bin/node /Users/francesco/.local/share/cursor-agent/versions/2026.04.17-479fd04/index.js \
+  -p --output-format stream-json --stream-partial-output \
+  --trust --workspace /tmp/simplify-probe --model composer-2-fast \
+  "read messy.ts then edit it to remove the intermediate result variable"
+```
+**Cursor-agent emits rich `tool_call` events** (not `tool_use`):
+```json
+{"type":"tool_call","subtype":"started","call_id":"tool_…","tool_call":{"readToolCall":{"args":{"path":"/tmp/simplify-probe/messy.ts"}}}}
+{"type":"tool_call","subtype":"completed","call_id":"tool_…","tool_call":{"readToolCall":{"args":{…},"result":{"success":{"content":"…","totalLines":5,"fileSize":103,"path":"…","readRange":{"startLine":1,"endLine":5}}}}}}
+{"type":"tool_call","subtype":"started","tool_call":{"editToolCall":{"args":{"path":"/tmp/simplify-probe/messy.ts","streamContent":"export function add(a: number, b: number): number {\n  return a + b;\n}"}}}}
+{"type":"tool_call","subtype":"completed","tool_call":{"editToolCall":{"args":{…},"result":{"success":{"linesAdded":1,"linesRemoved":2,"diffString":"--- a//tmp/simplify-probe/messy.ts\n+++ …"}}}}}
+{"type":"tool_call","subtype":"started","tool_call":{"readLintsToolCall":{"args":{"paths":["/tmp/simplify-probe/messy.ts"]}}}}
+```
+Tool taxonomy observed (there are more — this is just what I triggered):
+| cursor-agent event | Mapping to Anthropic standard |
+|---|---|
+| `readToolCall` | `Read` |
+| `editToolCall` | `Edit` (also `Write` when streamContent is full file) |
+| `readLintsToolCall` | (no direct equivalent — could be "LSP diagnostics") |
+| `globToolCall` | `Glob` |
+| `grepToolCall` | `Grep` |
+| `shellToolCall` | `Bash` |
+| `taskToolCall` | `Task` / `Agent` (parallel sub-agents — confirmed working) |
+| `webFetchToolCall` | `WebFetch` |
+| `webSearchToolCall` | `WebSearch` |
+The proxy's `cli-stream-parser.ts` only handles:
+```ts
+if (obj.type === "assistant" && obj.message?.content) {
+  for (const part of obj.message.content) {
+    if (part.type === "text") …
+    else if (part.type === "thinking") …
+    else if (part.type === "tool_use" && part.id && part.name) …
+  }
+}
+if (obj.type === "result" && obj.subtype === "success") { done = true; onDone(); }
+```
+**It never matches `obj.type === "tool_call"`.** That's the bug. `anthropic-sse-writer.ts` (lines 59–82) already has a full `kind: "tool_use"` → SSE `content_block_start` path. We just don't feed it.
+Fix sketch (~30 LOC in `cli-stream-parser.ts`):
+```ts
+if (obj.type === "tool_call" && obj.subtype === "started") {
+  const [kind, body] = Object.entries(obj.tool_call)[0]; // e.g. ["readToolCall", {args, ...}]
+  const name = mapToolName(kind); // readToolCall → Read
+  const input = translateArgs(kind, body.args); // keep args shape the Anthropic SDK expects
+  onEvent({ kind: "tool_use", id: obj.call_id, name, input });
+}
+```
+(May also need to buffer tool results and forward them as `tool_result` content blocks in the next turn, depending on how the Agent SDK wants to correlate them.)
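A slightly fuller version of the fix sketch, assuming the event shapes shown in the captured stream above. `TOOL_NAME_MAP`, `toToolUse`, and the `ToolUseEvent` type are hypothetical names for illustration, not the proxy's actual code, and argument translation (`translateArgs`) is left out:

```typescript
// Hypothetical types and names; only the tool kinds come from the taxonomy table.
type ToolUseEvent = { kind: "tool_use"; id: string | undefined; name: string; input: unknown };

// A Map avoids accidental hits on Object.prototype keys such as "toString".
const TOOL_NAME_MAP = new Map<string, string>([
  ["readToolCall", "Read"],
  ["editToolCall", "Edit"],
  ["globToolCall", "Glob"],
  ["grepToolCall", "Grep"],
  ["shellToolCall", "Bash"],
  ["taskToolCall", "Task"],
  ["webFetchToolCall", "WebFetch"],
  ["webSearchToolCall", "WebSearch"],
]);

function toToolUse(obj: any): ToolUseEvent | undefined {
  // Only "started" events open a tool_use block in this sketch.
  if (obj?.type !== "tool_call" || obj.subtype !== "started") return undefined;
  const entries = Object.entries(obj.tool_call ?? {});
  if (entries.length === 0) return undefined;
  const [kind, body] = entries[0] as [string, { args?: unknown }];
  const name = TOOL_NAME_MAP.get(kind);
  if (name === undefined) return undefined; // unmapped kinds are skipped here
  // Assumption: args are passed through untranslated.
  return { kind: "tool_use", id: obj.call_id, name, input: body?.args ?? {} };
}
```

Unmapped kinds such as `readLintsToolCall` return `undefined` here; whether to drop them or surface them under a custom name is a design choice this sketch does not settle.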
+## The cloud endpoint — what Cursor actually talks to
+Instrumented cursor-agent with a `NODE_OPTIONS=--require` preload (`/tmp/simplify-probe/fetch-logger.cjs`) that hooks `global.fetch`, `http.request`, `https.request`, and `http2.connect`. Only http2 captured the real chat traffic — cursor-agent uses undici under the hood, but the chat RPC goes through node:http2.
+```
+HTTP/2 POST https://agentn.global.api5.cursor.sh/agent.v1.AgentService/Run
+Authorization: Bearer <JWT>
+Content-Type: (protobuf, inferred — body is binary)
+Body: 153 KB for a "what is 2+2" prompt  (!!)
+Response: streaming, ~9 KB+ rolling
+```
+Plus many auxiliary JSON HTTP/1.1 calls to `https://api2.cursor.sh/aiserver.v1.*Service/*`:
+- `AnalyticsService/BootstrapStatsig`
+- `DashboardService/GetMe`, `GetTeamAdminSettings…`, `GetTeamHooks`, `GetManagedSkills`
+- `ServerConfigService/GetServerConfig`
+- `AiService/GetUsableModels`, `GetDefaultModelForCli`
+- `AnalyticsService/SubmitLogs`, `TrackEvents`
+- `DashboardService/GetCliDownloadUrl`
+- `/v1/traces` (OTEL)
+The chat endpoint `agent.v1.AgentService/Run` is revealing: **it's an agent-loop RPC, not a model-completion endpoint**. It expects the client to hold conversational state, execute tools locally, and feed tool results back for the next step. The 153 KB initial payload carries the whole context (prompt + tool defs + workspace hints + history).
+So composer-2-fast's *only* public interface is the agent loop. There's no bare "generate text from this prompt" endpoint to call qwen-style.
+## Full path A: bypass cursor-agent (the qwen dream) — not recommended
+What it would take:
+1. Extract `agent.v1.*` proto schema from `cursor-agent-svc.js` (contains hundreds of message type definitions — looks doable but tedious).
+2. Implement protobuf codec for request + streaming response.
+3. Handle JWT refresh (observed short-lived tokens ~1h expiry).
+4. Translate Anthropic tool_use ↔ cursor tool_call format bidirectionally.
+5. Handle all the auxiliary RPCs (`BootstrapStatsig`, `GetUsableModels`, etc.) that cursor-agent fires on startup.
+6. Maintain against Cursor's API churn indefinitely.
+**Weeks of work, permanent maintenance tax, can break any time.** Probably also violates Cursor's TOS.
+Also: even if we do this, SDK-provided tools like Skill wouldn't automatically "just work" — we'd need to map them to cursor's native tool concepts anyway, which we can do without the protobuf spike.
+## Full path B+C: fix the parser + expose cursor tools as Anthropic names (recommended)
+Scope:
+1. **`cli-stream-parser.ts` — translate `tool_call` events to `tool_use` events.** ~30 LOC. Gives the SDK full tool visibility: progress UI, budget tracking, nudge-on-silence, logs.
+2. **Tool-name mapping** (tiny table in the proxy): `readToolCall → Read`, `editToolCall → Edit`, `globToolCall → Glob`, `shellToolCall → Bash`, etc.
+3. **Rewrite `toolsToSystemText`**: drop SDK-provided tools that cursor-agent can't honor (Skill, Task, sub-Agent) from the system text. Advertise only the cursor-native tools that actually execute, under Anthropic-standard names.
+After this, the SDK sees: `assistant → tool_use(Read) → tool_result → tool_use(Edit) → …` exactly like a direct Anthropic session.
+## Path D — **skill translation via `.cursor/rules/*.mdc`** (the killer unlock)
+cursor-agent supports `.cursor/rules/<name>.mdc` files natively (confirmed: `cursor-agent rule` subcommand, `generate-rule`, rules auto-discovered). Shape:
+```markdown
+---
+description: Short description for the model to decide when to apply
+alwaysApply: false
+# globs: optional
+---
+# Rule body
+Instructions the agent follows…
+```
+**Proof that cursor-agent resolves them autonomously** — wrote `/tmp/skilltest/.cursor/rules/simplify.mdc` with a description matching Anthropic's simplify skill, then ran:
+```bash
+cursor-agent -p --workspace /tmp/skilltest --model composer-2-fast "/simplify messy.ts"
+```
+First emitted tool call:
+```json
+{"tool_call":{"readToolCall":{"args":{"path":"/tmp/skilltest/.cursor/rules/simplify.mdc"}}}}
+```
+**Cursor-agent autonomously discovered, read, and followed the rule.** File was simplified according to the rule body. Full tool stream: read rule → glob for target → read target → edit → lint.
+### Translation map
+| Anthropic | Cursor |
+|---|---|
+| `SKILL.md` frontmatter `name`, `description`, `type` | `.mdc` frontmatter `description`, `alwaysApply`, `globs` |
+| Skill body | Rule body |
+| Skill lives in plugin/user dir | Rule lives in `.cursor/rules/` or `~/.cursor/rules/` |
+| Slash invocation `/simplify` | Slash invocation `/simplify` (identical UX — model resolves from description) |
+| Model-selected based on task | Model-selected based on task (identical) |
+| MCP tools | `.cursor/mcp.json` MCP tools (universal MCP protocol — no translation) |
+| `CLAUDE.md` | `.cursor/rules/_always.mdc` with `alwaysApply: true` |
+### Proxy behavior after adding skill translation
+Per request:
+1. Receive Anthropic `/v1/messages` with tools + system + user prompt.
+2. Extract skill metadata (names + descriptions). Full bodies either:
+   - (a) bundled in the proxy for well-known Anthropic skills, OR
+   - (b) sent by claude-overnight as custom headers / system-prompt extra blocks, OR
+   - (c) the Agent SDK exposes them via a mechanism TBD.
+3. Materialize each advertised skill as `.cursor/rules/<name>.mdc` in the workspace (or per-request temp dir if `chatOnlyWorkspace`).
+4. Strip Skill/Task/sub-Agent from `toolsToSystemText` (they're unneeded now — skills live on disk as rules).
+5. Run cursor-agent.
+6. `tool_call` → `tool_use` translation streams back (from B).
+**Result:** from the SDK's view, proxied fast models now honor skills. From cursor-agent's view, it's a normal Cursor session.
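Step 3 above (materializing a skill as a rule file) could look roughly like the following. This is a sketch under assumptions: the `Skill` interface, `renderRule`, and `rulePath` are hypothetical names, the frontmatter keys are the ones shown in the rule shape earlier, and the actual file write is left to the caller:

```typescript
// Hypothetical shape for a skill once its body has reached the proxy.
interface Skill {
  name: string;
  description: string;
  body: string;
  alwaysApply?: boolean;
}

// Render the .mdc text: frontmatter block, then the rule body.
function renderRule(skill: Skill): string {
  // Collapse whitespace so the description stays on one frontmatter line.
  // Note: a description containing ": " may need quoting; not handled here.
  const description = skill.description.replace(/\s+/g, " ").trim();
  return [
    "---",
    `description: ${description}`,
    `alwaysApply: ${skill.alwaysApply ?? false}`,
    "---",
    skill.body.trimEnd(),
    "",
  ].join("\n");
}

// Build the destination path inside a per-request workspace.
function rulePath(workspace: string, skill: Skill): string {
  // Keep the file name to a conservative character set.
  const safe = skill.name.replace(/[^A-Za-z0-9_-]/g, "-");
  return `${workspace.replace(/\/+$/, "")}/.cursor/rules/${safe}.mdc`;
}
```

Keeping both functions pure makes it easy to point them at a per-request temp dir, which matters for the workspace-isolation caveat below.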
+### Caveats
+- **Skill bodies need to travel** — simplest path: bundle the common ones (simplify, security-review, etc.) with the proxy. Less clean but works day one.
+- **Rule-file writes need per-request workspace isolation** — tie-in with the `X-Cursor-Workspace` fix. Don't stomp on parallel agents.
+- **`alwaysApply: false`** rules are model-selected based on description — works well in practice (test confirmed composer-2-fast picked up the rule on `/simplify`). For stronger guarantees use `alwaysApply: true` or matching `globs`.
+- **Sub-skill chains** (skill A invokes skill B) — Cursor rules can reference other rules (`@ruleName`). Needs a naming convention.
+- **Parallel sub-agents DO work.** An earlier version of this doc claimed cursor-agent was single-agent; that was wrong. cursor-agent ships a first-class `TaskToolCall` (proto `agent.v1.TaskToolCallArgsProto`, fields `description`/`prompt`/`model`/`subagent_type`/`resume`/`readonly`/`run_in_background`/`attachments`), whose shape closely mirrors Anthropic's Task tool. The runtime creates `kind: "subagent"` sessions with their own `agentId`, and the UI explicitly groups parallel `taskToolCall`s. See the "Parallel sub-agents" section below for the empirical test. `/simplify`'s 3-reviewer fan-out replicates directly.
+## Parallel sub-agents — confirmed (2026-04-18)
+Empirical test that cursor-agent runs sub-agents concurrently, not sequentially.
+Setup: `/tmp/subagent-probe/` with `messy.ts` and `.cursor/rules/fanout.mdc`. Rule body instructs the model to spawn three Task sub-agents in a single turn (count lines / count exports / find inline candidates).
+Invocation:
+```bash
+CI=true CURSOR_SKIP_KEYCHAIN=1 CURSOR_API_KEY=… \
+  /opt/homebrew/bin/node /Users/francesco/.local/share/cursor-agent/versions/2026.04.17-479fd04/index.js \
+  -p --output-format stream-json --trust \
+  --workspace /tmp/subagent-probe --model composer-2-fast \
+  "/fanout messy.ts"
+```
+Observed in `stream-json`:
+```
+other started: readToolCall            # rule discovery
+other started: readToolCall            # target file
+task started    id=tool_171e…  desc=Count lines in messy.ts
+task started    id=tool_d2ab…  desc=Count exports in messy.ts
+task started    id=tool_da0d…  desc=Inline candidates in messy.ts
+task completed  id=tool_d2ab…
+task completed  id=tool_da0d…
+task completed  id=tool_171e…
+```
+Three `taskToolCall`s dispatched in the same assistant turn. **All three started before any completed, and the completion order (d2ab, da0d, 171e) differs from the start order (171e, d2ab, da0d), so the sub-agents ran concurrently rather than one after another.** Each sub-agent got its own `agentId` and ran its own internal tools independently (one used `shellToolCall` for `wc -l`, the others used `readToolCall`).
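The concurrency argument can be checked mechanically against the saved stream. A minimal sketch, assuming each task event has been reduced to its `subtype` and `id` (the `TaskEvent` type and `maxConcurrent` are hypothetical names, not part of cursor-agent or the proxy):

```typescript
type TaskEvent = { subtype: "started" | "completed"; id: string };

// Returns the largest number of tasks that were open at the same time.
// Any value above 1 means at least two tasks overlapped.
function maxConcurrent(events: TaskEvent[]): number {
  const open = new Set<string>();
  let peak = 0;
  for (const ev of events) {
    if (ev.subtype === "started") {
      open.add(ev.id);
      peak = Math.max(peak, open.size);
    } else {
      open.delete(ev.id);
    }
  }
  return peak;
}

// Event order as observed in the stream above (ids truncated as in the log).
const observed: TaskEvent[] = [
  { subtype: "started", id: "tool_171e" },
  { subtype: "started", id: "tool_d2ab" },
  { subtype: "started", id: "tool_da0d" },
  { subtype: "completed", id: "tool_d2ab" },
  { subtype: "completed", id: "tool_da0d" },
  { subtype: "completed", id: "tool_171e" },
];
const peakObserved = maxConcurrent(observed);
```

For the observed order the peak is 3; a strictly sequential run (start, complete, start, complete, ...) would give 1.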
+Task call payload shape (what the SDK must encode when surfacing):
+```json
+{
+  "taskToolCall": {
+    "args": {
+      "description": "Count lines in messy.ts",
+      "prompt": "Read the file at absolute path /tmp/subagent-probe/messy.ts. Report ONLY the total number of lines…",
+      "subagentType": {"unspecified": {}},
+      "model": "composer-2-fast",
+      "agentId": "0b2fd6e9-9e3f-406a-92b6-8c87072303be",
+      "attachments": [],
+      "mode": "TASK_MODE_UNSPECIFIED",
+      "respondingToMessageIds": []
+    },
+    "result": {"success": {"conversationSteps": [ /* nested tool calls executed by the subagent */ ]}}
+  }
+}
+```
+Totals: 12.1s, 9.8k input / 826 output tokens for the full fan-out including parent aggregation.
+**Implications for Path B/D:**
+1. `cli-stream-parser.ts` tool-name table must include `taskToolCall → Task` (or `Agent`, whichever name the SDK expects for the parent-visible sub-agent tool).
+2. Subagent inner events live inside `result.success.conversationSteps`. Decide whether to flatten them into the outer event stream (so the SDK sees `tool_use(Task) → tool_use(Read) inside → tool_result(Task)` as a nested tree) or collapse them into just the outer Task tool_use/tool_result pair. The latter is simpler and matches Anthropic's Task-tool UX, where sub-agent internals are opaque to the caller.
+3. `subagent_type` can be left unspecified; cursor-agent accepts it. `model` defaults to the parent's model (inherited), which is the right default.
+Raw stream preserved at `/tmp/subagent-probe/run.jsonl` for later inspection.
+## Per-workspace isolation — the adjacent bug
+Independent of skills, claude-overnight currently has a real correctness issue for proxied agents in worktrees:
+```ts
+// src/swarm.ts:578 — current spawn
+const agentQuery = query({
+  prompt: agentPrompt,
+  options: {
+    cwd: agentCwd, model: effectiveModel, permissionMode: perm,
+    allowedTools: this.config.allowedTools,
+    …
+  },
+});
+```
+For proxied agents, `cwd: agentCwd` has no effect. Two agents in separate worktrees would both execute in the proxy's startup cwd. Fix:
+```ts
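+const env = this.config.envForModel?.(effectiveModel);
+if (env && isCursorProxiedModel(effectiveModel)) {
+  // ANTHROPIC_CUSTOM_HEADERS is newline-separated `Key: Value`; append so any existing headers survive.
+  const header = `X-Cursor-Workspace: ${agentCwd}`;
+  const existing = env.ANTHROPIC_CUSTOM_HEADERS;
+  env.ANTHROPIC_CUSTOM_HEADERS = existing ? `${existing}\n${header}` : header;
+}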
+```
+Plus ensure the proxy is started with `CURSOR_BRIDGE_WORKSPACE=/` (or a common parent of all worktree dirs).
+This is a separate fix that should land regardless of the skill-translation work.
+## Code locations for reference
+### `cursor-composer-in-claude` (sibling repo, Francesco's fork at ../cursor-composer-in-claude)
+- `src/lib/agent-cmd-args.ts` — builds `--mode` / `--workspace` / `--model` flags. 0.9.3 made `agent` default.
+- `src/lib/env.ts:276–281` — `CURSOR_BRIDGE_MODE` parsing (`plan` | `ask` | `agent`).
+- `src/lib/env.ts:256–258` — `workspace` config (defaults to proxy's `process.cwd()`).
+- `src/lib/workspace.ts:50–106` — `resolveWorkspace()`: reads `x-cursor-workspace` header, validates path is under base.
+- `src/lib/handlers/anthropic-messages.ts:147–159` — per-request header-based workspace resolution.
+- `src/lib/openai.ts:58–87` — `toolsToSystemText()`: how SDK tool defs get serialized to system-prompt text (this is where to rewrite when exposing cursor tools under Anthropic names).
+- `src/lib/cli-stream-parser.ts:41–75` — the parser that needs the `tool_call` case added.
+- `src/lib/anthropic-sse-writer.ts:59–82` — already-wired SSE emitter for `tool_use` events.
+### `claude-overnight`
+- `src/providers.ts:160–215` — `envFor()`: where per-model env (including proxy auth + bridge settings) is built. Add `X-Cursor-Workspace` injection here, driven by the agent's `cwd`.
+- `src/swarm.ts:563–584` — agent spawn. `env` is already passed via `envForModel(effectiveModel)`; just needs per-agent cwd propagation.
+### Agent SDK (`@anthropic-ai/claude-agent-sdk`)
+- `cli.js` — honors `ANTHROPIC_CUSTOM_HEADERS` env var (newline-separated `Key: Value`), string confirmed present.
+- `sdk.d.ts:700–710` — `headers` field on McpHttpServerConfig (not the right one for our use — the env var is the right path).
+### Cursor
+- `https://agentn.global.api5.cursor.sh/agent.v1.AgentService/Run` — the chat RPC (HTTP/2 + protobuf).
+- `https://api2.cursor.sh/aiserver.v1.*Service/*` — auxiliary REST/JSON endpoints.
+- Proto schema lives in `/Users/francesco/.local/share/cursor-agent/versions/<ver>/cursor-agent-svc.js` (bundled, minified) — contains hundreds of `aiserver.v1.*` / `agent.v1.*` message type definitions.
+## Quick artifacts for picking this up later
+- Scratch test dir: `/tmp/simplify-probe/` — has all probe scripts (probe.mjs, probe-proxy.mjs, probe-proxy-v2.mjs, probe-proxy-v3.mjs, probe-skill-direct.mjs, fetch-logger.cjs).
+- Cursor-rule test dir: `/tmp/skilltest/` — has the `.cursor/rules/simplify.mdc` demo.
+- Proxy logs: `/Users/francesco/.cursor-api-proxy/proxy.out.log` and `sessions.log`.
+- Cursor-agent CLI: `/Users/francesco/.local/bin/cursor-agent` (avoid — segfaults with bundled Node on macOS); use `/opt/homebrew/bin/node <cursor-agent-install>/index.js` instead.
+## Recommended next steps (in order)
+1. **Land the `X-Cursor-Workspace` fix in claude-overnight** — independent, fixes a real worktree-isolation bug. Small patch in `providers.ts:envFor()` + start proxy with `CURSOR_BRIDGE_WORKSPACE=/`.
+2. **Patch the proxy's `cli-stream-parser.ts`** to translate `tool_call` → `tool_use`. ~30 LOC. Gives full tool visibility in claude-overnight's UI/logs for proxied agents.
+3. **Update `toolsToSystemText`** to drop non-executable SDK tools (Skill/Task/sub-Agent) for proxied sessions and list cursor-native tools under Anthropic names.
+4. **Bundle skill → rule translation** in the proxy. Start with `/simplify`, `/review`, `/security-review`, `/init`. Materialize into workspace on request. Confirm end-to-end.
+5. **Update steering/planner prompts** to give concrete operational briefs instead of skill invocations (works for both direct and proxied models — concrete is the common denominator).
+6. **Optional/far future:** Path A (bypass cursor-agent entirely) only if the ceiling of B+C+skill-translation turns out to be too low — which seems unlikely given the experiments so far.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-overnight",
-  "version": "1.25.39",
+  "version": "1.25.42",
   "description": "Parallel Claude agents in git worktrees with a usage cap that reserves headroom for your interactive Claude Code. Crash-safe resume. Provider-agnostic model catalog (Anthropic, Cursor, OpenAI, Gemini, DeepSeek, Llama, Qwen) with capability-based task scoping.",
   "type": "module",
   "bin": {
@@ -17,7 +17,7 @@
   "dependencies": {
     "@anthropic-ai/claude-agent-sdk": "^0.2.92",
     "chalk": "^5.4.1",
-    "cursor-composer-in-claude": "^0.9.4",
+    "cursor-composer-in-claude": "^0.10.0",
     "jsonwebtoken": "^9.0.2"
   },
   "devDependencies": {

package/plugins/claude-overnight/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-overnight",
-  "version": "1.25.39",
+  "version": "1.25.42",
   "description": "Claude Code skill for understanding, installing, and inspecting claude-overnight runs  -- parallel Claude agents in git worktrees with thinking waves, multi-wave steering, and crash-safe resume. Supports Cursor API Proxy, Qwen, OpenRouter.",
   "author": {
     "name": "Francesco Fornace"

package/plugins/claude-overnight/skills/claude-overnight/SKILL.md CHANGED

@@ -11,7 +11,7 @@ description: >
 # What it is
-`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK  -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **worker** (runs the tasks), and an optional **fast** model (quick well-scoped edits verified by the worker next wave). A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, worker waves run them in parallel, and steering decides between more work, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable  -- nothing is lost.
+`claude-overnight` is a CLI (npm: `claude-overnight`, bin: `claude-overnight`) that takes an objective + budget and launches many Claude agent sessions in parallel, each in an isolated git worktree. It's a local multi-session orchestrator built on top of the Claude Agent SDK  -- not itself an agent harness, but a layer that plans, dispatches, and steers many sessions that run on the SDK's harness. Three roles are picked independently: **planner** (thinks, steers, reviews), **main worker** (runs the tasks), and an optional **fast worker** (a cheaper/faster second worker for well-scoped tasks, verified by the next wave's workers). A "thinking wave" of architect sessions explores the codebase, an orchestrator synthesizes concrete tasks, worker waves run them in parallel, and steering decides between more work, reflection, or declaring done. Rate limits, crashes, and usage caps are all resumable  -- nothing is lost.
 **Three-layer review system** runs on every wave:
 1. **Per-agent self-review**  -- after each agent finishes, the same session continues via SDK session resume (continue mechanism) with a follow-up prompt to review and simplify its own `git diff`. The agent's full context stays warm  -- no initial context bloat.
@@ -55,7 +55,7 @@ Every run lives at `<repo>/.claude-overnight/runs/<ISO-timestamp>/`:
 | File / dir           | What it tells you                                                                 |
 |----------------------|-----------------------------------------------------------------------------------|
-| `run.json`           | Machine state: objective, planner/worker/fast models, budget, cost, waves done, branches, done flag. |
+| `run.json`           | Machine state: objective, planner/main-worker/fast-worker models, budget, cost, waves done, branches, done flag. |
 | `status.md`          | **Living project snapshot**, rewritten by steering every wave. First line = short status. |
 | `goal.md`            | Evolving "north star"  -- what the run currently thinks "amazing" means.            |
 | `themes.md`          | The thinking-wave research angles picked for this objective (human-readable).     |

package/plugins/claude-overnight/skills/coach/SKILL.md CHANGED

@@ -2,8 +2,9 @@
 name: claude-overnight-coach
 description: >
   Setup coach for claude-overnight. Turns a raw user objective into a ready
-  objective plus recommended run settings (budget, concurrency, planner/worker
-  models, flex, usage cap, permission mode) and an actionable preflight
+  objective plus recommended run settings (budget, concurrency, planner /
+  main-worker / optional fast-worker models, flex, usage cap, permission mode)
+  and an actionable preflight
   checklist. Invoked once, before the interactive pickers, to catch prompt-shape
   failures (vague, overambitious, multi-goal, unverifiable) and environmental
   failures (missing keys, dirty tree, missing .env) while they're still cheap
@@ -69,8 +70,8 @@ Rules:
 - `improvedObjective` preserves the user's voice and domain vocabulary. It MUST include a `Done:` line, a `Critical:` line (or `Critical: none` when nothing is off-limits), and a `Verify by:` line.
 - `recommended.budget` is an integer ≥ 1. `concurrency` is an integer in [1, 12]. `usageCap` is either `null` (unlimited) or a float in (0, 1].
 - `recommended.permissionMode` is `"auto" | "bypassPermissions" | "default"`.
-- `fastModel` is `null` unless adding one is clearly warranted for this scope + budget AND a cheap fast model is reachable from the available providers.
-- `recommended.plannerModel` / `workerModel` / `fastModel` MUST be model IDs that the user can actually reach given the providers listed in the input. Stock Anthropic IDs (e.g. `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) are only valid when "Anthropic direct: available" appears in the input.
+- `fastModel` (the fast-worker model) is `null` unless adding one is clearly warranted for this scope + budget AND a cheap fast-worker model is reachable from the available providers.
+- `recommended.plannerModel` (planner) / `workerModel` (main worker) / `fastModel` (fast worker) MUST be model IDs that the user can actually reach given the providers listed in the input. Stock Anthropic IDs (e.g. `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) are only valid when "Anthropic direct: available" appears in the input.
 - `checklist` `remediation` is an informational label — the host does NOT auto-act on it. Set it to the slug that best describes the issue, or `"none"` for purely advisory items.
 - `questions` is reserved for a future clarification loop; return `[]` for now.
@@ -95,36 +96,37 @@ Rows: scope. Each cell is a starting point — adjust by one step when repo fact
 | scope                    | tight ≤ 10                                   | standard 11–25                                | wide 26–60                                    | saturated > 60                                  |
 | ------------------------ | -------------------------------------------- | --------------------------------------------- | --------------------------------------------- | ----------------------------------------------- |
-| bugfix                   | conc=2, flex=false, fast=null, cap=0.75      | conc=3, flex=true, fast=null, cap=0.75        | conc=4, flex=true, fast=Haiku, cap=0.9        | conc=5, flex=true, fast=Haiku, cap=null         |
-| feature-add              | conc=2, flex=true, fast=null, cap=0.75       | conc=4, flex=true, fast=null, cap=0.75        | conc=6, flex=true, fast=Haiku, cap=0.9        | conc=8, flex=true, fast=Haiku, cap=null         |
-| refactor                 | conc=2, flex=false, fast=null, cap=0.75      | conc=4, flex=false, fast=null, cap=0.75       | conc=6, flex=true, fast=null, cap=0.9         | conc=8, flex=true, fast=Haiku, cap=null         |
-| audit-and-fix            | conc=3, flex=true, fast=Haiku, cap=0.75      | conc=5, flex=true, fast=Haiku, cap=0.9        | conc=8, flex=true, fast=Haiku, cap=0.9        | conc=10, flex=true, fast=Haiku, cap=null        |
-| migration                | conc=2, flex=true, fast=null, cap=0.75       | conc=4, flex=true, fast=null, cap=0.9         | conc=6, flex=true, fast=null, cap=0.9         | conc=8, flex=true, fast=null, cap=null          |
-| research-and-implement   | conc=2, flex=true, fast=null, cap=0.75       | conc=3, flex=true, fast=null, cap=0.75        | conc=4, flex=true, fast=null, cap=0.9         | conc=5, flex=true, fast=Haiku, cap=null         |
-| polish-and-verify        | conc=3, flex=false, fast=Haiku, cap=0.75     | conc=5, flex=false, fast=Haiku, cap=0.75      | conc=8, flex=true, fast=Haiku, cap=0.9        | conc=10, flex=true, fast=Haiku, cap=null        |
+| bugfix                   | conc=2, flex=false, fast=null, cap=0.75      | conc=3, flex=true, fast=null, cap=0.75        | conc=4, flex=true, fast=true, cap=0.9          | conc=5, flex=true, fast=true, cap=null          |
+| feature-add              | conc=2, flex=true, fast=null, cap=0.75       | conc=4, flex=true, fast=null, cap=0.75        | conc=6, flex=true, fast=true, cap=0.9          | conc=8, flex=true, fast=true, cap=null          |
+| refactor                 | conc=2, flex=false, fast=null, cap=0.75      | conc=4, flex=false, fast=null, cap=0.75       | conc=6, flex=true, fast=null, cap=0.9          | conc=8, flex=true, fast=true, cap=null          |
+| audit-and-fix            | conc=3, flex=true, fast=true, cap=0.75       | conc=5, flex=true, fast=true, cap=0.9         | conc=8, flex=true, fast=true, cap=0.9          | conc=10, flex=true, fast=true, cap=null         |
+| migration                | conc=2, flex=true, fast=null, cap=0.75       | conc=4, flex=true, fast=null, cap=0.9         | conc=6, flex=true, fast=null, cap=0.9          | conc=8, flex=true, fast=null, cap=null          |
+| research-and-implement   | conc=2, flex=true, fast=null, cap=0.75       | conc=3, flex=true, fast=null, cap=0.75        | conc=4, flex=true, fast=null, cap=0.9          | conc=5, flex=true, fast=true, cap=null          |
+| polish-and-verify        | conc=3, flex=false, fast=true, cap=0.75      | conc=5, flex=false, fast=true, cap=0.75       | conc=8, flex=true, fast=true, cap=0.9          | conc=10, flex=true, fast=true, cap=null         |
 `conc` ⇒ `recommended.concurrency` (clamp to ≤ budget).
 `flex` ⇒ `recommended.flex`.
-`fast=Haiku` ⇒ recommend a Haiku-class fast model **only if** Anthropic direct is available or a saved provider exposes one (e.g. `claude-haiku-4-5`); otherwise `null`.
+`fast=true` ⇒ recommend a fast-worker model **if the user has one configured and reachable** from their available providers. The fast worker is a real worker (same tools, same env) on a cheaper/faster model — steering routes well-scoped tasks to it by default. Pick whatever the cheapest fast-worker model is among their providers (e.g. `claude-haiku-4-5`, `composer-2-fast`, `qwen3` variants). If none is reachable, set `null`.
+`fast=null` ⇒ do not recommend a fast worker (scope too complex or no suitable fast-worker model available).
 `cap=null` ⇒ unlimited (`recommended.usageCap = null`).
-## Planner / worker model selection
+## Planner / main-worker / fast-worker model selection
-Pick the strongest reachable model for the planner; pick a cheap-but-capable reachable model for the worker.
+Pick the strongest reachable model for the planner; pick a cheap-but-capable reachable model for the main worker; optionally add a cheaper/faster second model as the fast worker.
 Decision order (stop at the first row whose providers are present):
 1. **Anthropic direct available**
    - planner: `claude-opus-4-7` (or its `-thinking-high` variant when scope is `audit-and-fix` / `research-and-implement` / `migration`).
-   - worker: `claude-sonnet-4-6` for normal work; `claude-opus-4-7` for `wide`/`saturated` migrations or research.
-   - fastModel: `claude-haiku-4-5` when the matrix says `fast=Haiku`.
+   - main worker: `claude-sonnet-4-6` for normal work; `claude-opus-4-7` for `wide`/`saturated` migrations or research.
+   - fast worker (`fastModel`): recommend the cheapest fast-worker model available among the user's reachable providers when the matrix says `fast=true`.
 2. **Custom Anthropic-compatible provider with a strong model** (e.g. `qwen3.6-plus`, `qwen3-coder-plus`)
    - planner: the strongest such model the user has.
-   - worker: same model, or a cheaper sibling if the user has one.
+   - main worker: same model, or a cheaper sibling if the user has one.
 3. **Cursor proxy is the only reachable provider**
    - planner: `claude-opus-4-7` via Cursor (only if the proxy exposes it).
-   - worker: `claude-sonnet-4-6` via Cursor, or `composer-2` for the cheapest path.
-   - fastModel: `composer-2-fast` when the matrix says `fast=Haiku`.
+   - main worker: `claude-sonnet-4-6` via Cursor, or `composer-2` for the cheapest path.
+   - fast worker (`fastModel`): recommend a Cursor fast-worker model (e.g. `composer-2-fast`) when the matrix says `fast=true`.
 4. **No reachable provider** — leave `plannerModel` and `workerModel` as `claude-sonnet-4-6` and emit a `blocking` checklist item titled "No reachable provider".
 Never recommend Cursor models when the input does not list a `cursor proxy` provider, and never recommend stock Anthropic IDs when the input does not say "Anthropic direct: available". `fastModel` MUST be `null` rather than guessed.