npm - claude-overnight - Versions diffs - 0.5.1 → 1.0.0 - Mend

claude-overnight 0.5.1 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/README.md CHANGED Viewed

@@ -1,8 +1,10 @@
 # claude-overnight
-Fire off Claude agents, come back to shipped work.
+Run 10, 100, or 1000 Claude agents overnight. Come back to shipped work.
-Describe what to build. Set a budget — 10 agents, 100, 1000. A planner agent analyzes your codebase, breaks the objective into independent tasks, and launches them all. Each agent runs in its own git worktree with full tooling (Read, Edit, Bash, Grep — everything). Rate limits? It waits. Windows reset? It resumes. It doesn't stop until every task is done.
+Describe what to build. Set a budget. The tool plans, explores your codebase, breaks the objective into tasks, launches parallel agents in isolated git worktrees, iterates toward quality, and handles rate limits automatically. You press Run once, then go to sleep.
+Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk). Works with Claude Opus, Sonnet, and Haiku.
 ## Install
@@ -10,18 +12,14 @@ Describe what to build. Set a budget — 10 agents, 100, 1000. A planner agent a
 npm install -g claude-overnight
 ```
-Requires Node.js >= 20 and Claude authentication (OAuth via `claude` CLI, or `ANTHROPIC_API_KEY`).
-## Usage
+Requires Node.js >= 20 and Claude authentication (`claude auth login`, or set `ANTHROPIC_API_KEY`).
-### Interactive
+## Quick start
 ```bash
 claude-overnight
 ```
-A guided flow walks you through each step:
 ```
 🌙  claude-overnight
 ────────────────────────────────────
@@ -29,98 +27,102 @@ A guided flow walks you through each step:
 ① What should the agents do?
   > refactor auth, add tests, update docs
-② Budget [10]: 50
+② Budget [10]: 200
 ③ Worker model:
   ● Sonnet — Sonnet 4.6 · Best for everyday tasks
   ○ Opus — Opus 4.6 · Most capable
-  ○ Haiku — Haiku 4.5 · Fastest
 ④ Usage:
-  ● Unlimited · full capacity, wait through rate limits
-  ○ 90% · leave 10% for other work
-╭────────────────────────────────────╮
-│  sonnet · budget 50 · 5× · flex   │
-╰────────────────────────────────────╯
+  ● 90% · leave 10% for other work
+╭──────────────────────────────────────────╮
+│  sonnet · budget 200 · 5× · flex · 90%  │
+╰──────────────────────────────────────────╯
+✓ 5 themes → review, press Run, walk away
+◆ Thinking: 5 agents exploring...     ← architects analyze your codebase
+◆ Orchestrating plan...               ← synthesizes 50 concrete tasks
+◆ Wave 1 · 50 tasks                   ← fully autonomous from here
+◆ Assessing... how close to amazing?
+◆ Wave 2 · 30 tasks                   ← improvements from assessment
+◆ Reflection: 2 agents reviewing      ← deep quality audit
+◆ Wave 3 · 20 tasks                   ← fixes from review findings
+◆ Assessing... ✓ Vision met
 ```
-For large budgets, the planner identifies research themes — review them, then press Run. Everything after that is fully autonomous: thinking agents explore, the orchestrator synthesizes tasks, execution waves run, and steering adapts between waves. No further interaction needed — go to sleep.
+You interact once (objective, budget, model, review themes), then everything runs autonomously — thinking, planning, executing, reflecting, steering. Rate-limited? It waits and retries. Crash? Resume where you left off.
-### Task file
+## How it works
-```bash
-claude-overnight tasks.json
-```
+### 1. Thinking wave
-### Inline
+For budgets > 15, the tool launches **architect agents** that explore your codebase before any code is written. Each one gets a different research angle (architecture, data models, APIs, testing, etc.) and writes a structured design document. The number scales with budget: 5 for budget=50, 10 for budget=2000.
-```bash
-claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
-```
-## How the planner works
-The planner always runs on the best available model (Opus) regardless of which model you pick for workers. This ensures high-quality task decomposition even when workers use a cheaper model.
+### 2. Orchestration
-### Thinking wave
+An orchestrator agent reads all design documents and synthesizes concrete execution tasks — grounded in real files and patterns the architects found. No guesswork.
-For large budgets (`budget > concurrency * 3`), the planner doesn't try to generate hundreds of tasks from scratch. Instead, it launches a **thinking wave** — a team of architect agents that explore your codebase in parallel before any code is written.
+### 3. Iterative execution
-```
-⠋ identifying themes...          → splits objective into N angles (< 30s)
-✓ 10 themes                      → review themes, press Run, walk away
-◆ Thinking: 10 agents exploring  → each explores from its angle, writes a design doc
-◆ Orchestrating plan...          → reads all design docs, synthesizes execution tasks
-◆ Wave 1 · 50 tasks              → fully autonomous from here
-◆ Steering...                    → adapts between waves, retries on rate limits
-```
+Tasks run in parallel (each agent in its own git worktree). After each wave, steering assesses: "how good is this?" — not "what's missing?" It can:
-The review prompt appears right after theme identification — the last thing requiring your presence. After you press Run, the thinking wave, orchestration, execution, and steering all run autonomously. Rate-limited? The planner waits and retries. Go to sleep.
+- **Execute** more tasks to build features, fix bugs, polish UX
+- **Reflect** by spinning up 1-2 review agents for deep quality/architecture audits
+- **Declare done** when the vision is met at high quality
-The number of thinking agents scales with budget: 5 for budget=50, 10 for budget=2000+. Each agent explores the codebase from a different angle and writes a structured design document. The orchestrator then reads all design docs and produces grounded execution tasks referencing real files and patterns.
+### 4. Goal refinement
-For small budgets (≤ `concurrency * 3`), the planner skips the thinking wave and generates tasks directly — fast and efficient for focused work.
+The tool starts with your broad objective but evolves its definition of "amazing" as it learns your codebase. Steering refines the goal after each wave. Late waves are informed by early discoveries.
-### Model-aware task design
+### 5. Three-layer context
-The planner calibrates task ambition based on your worker model:
+Long runs stay sharp because steering maintains three layers of memory:
-**Opus workers**: Each session is a powerhouse — it can own entire epics, do deep codebase research, make architectural decisions, implement complex multi-file systems, and use browser tools for analysis. The planner gives these agents full ownership and autonomy.
+- **Status** — a living project snapshot, updated every wave. Compressed, never truncated.
+- **Milestones** — strategic snapshots archived every ~5 waves. Long-term memory.
+- **Goal** — the evolving north star. What "amazing" means for this codebase.
-**Sonnet workers**: Capable of substantial implementation, refactoring, and testing. The planner gives meaningful missions with room for decision-making.
+## Run history and resume
-**Haiku workers**: Fast and efficient, best for focused tasks. The planner gives specific, well-scoped instructions with clear file paths and expected changes.
+Every run gets its own folder in `.claude-overnight/runs/`. Nothing is ever overwritten.
-### Budget scaling
+```
+.claude-overnight/
+  runs/
+    2026-04-04T18-52-49/     ← run A (done, $200, 200 tasks)
+      run.json, status.md, goal.md, milestones/, sessions/
+    2026-04-05T10-30-00/     ← run B (crashed)
+      run.json, sessions/
+```
-The budget also shapes task granularity:
+If a run crashes, gets rate-limited, or you Ctrl+C:
-**Small budget (1-15)**: Specific, file-level tasks. "In `src/auth.ts`, refactor `validateToken()` to use JWT."
+```
+  ⚠ Interrupted run
+  ╭──────────────────────────────────────────────────╮
+  │  refactor auth, add tests, update docs           │
+  │  50/200 sessions · 3 waves · $69.16              │
+  │  34 merged · 16 unmerged · 0 failed branches     │
+  ╰──────────────────────────────────────────────────╯
+  Resume  │  Fresh  │  Quit
+```
-**Medium budget (16-50)**: Autonomous missions. "Design and implement the complete favorites system: DB schema, API routes, client hooks, error handling."
+On resume: unmerged branches auto-merge, the wave loop continues, all context is preserved.
-**Large budget (50+)**: Thinking wave + orchestration. Architects explore, then execution tasks are synthesized from their findings. Each task is a substantial work session grounded in real codebase analysis.
+**Knowledge carries forward** — new runs inherit knowledge from completed previous runs. Thinking agents and steering see what past runs built. Run 2 knows run 1 already built the auth system.
-A budget of 200 is not 200 micro-edits. It's ~5 architects + ~195 senior-engineer work sessions, planned in waves. A budget of 2000 gets 10 architects.
+Add `.claude-overnight` to your `.gitignore`.
-## Usage limits
+## Other usage modes
-Control how much of your plan capacity the run consumes:
+### Task file
+```bash
+claude-overnight tasks.json
 ```
-④ Usage:
-  ● Unlimited · full capacity, wait through rate limits
-  ○ 90% · leave 10% for other work
-  ○ 75% · conservative, plenty of headroom
-  ○ 50% · use half, keep the rest
-```
-When utilization hits your cap, the swarm stops dispatching new tasks and lets active agents finish gracefully. This way you can run a big overnight job and still have capacity left for manual Claude usage.
-Use `--usage-cap=90` on the command line, or `"usageCap": 90` in task files.
-## Task file format
 ```json
 {
@@ -135,71 +137,67 @@ Use `--usage-cap=90` on the command line, or `"usageCap": 90` in task files.
 }
 ```
-A plain array also works: `["task one", "task two"]`.
-For multi-wave runs from a task file, add `objective` and `flexiblePlan`:
+For multi-wave runs, add `objective` and `flexiblePlan`:
 ```json
 {
-  "objective": "Modernize the auth system and add comprehensive tests",
+  "objective": "Modernize the auth system",
   "flexiblePlan": true,
   "tasks": ["Refactor auth middleware", "Add JWT validation"],
   "usageCap": 90
 }
 ```
-The initial tasks run first. After each wave, a steering agent reads the codebase and plans the next wave until the objective is met or the budget runs out.
+### Inline
-| Field | Type | Default | Description |
-|---|---|---|---|
-| `tasks` | `(string \| {prompt, cwd?, model?})[]` | required | Tasks to run |
-| `objective` | `string` | — | High-level goal for multi-wave steering (required when `flexiblePlan` is true) |
-| `flexiblePlan` | `boolean` | `false` | Enable adaptive multi-wave planning from task files |
-| `model` | `string` | prompted | Worker model (per-task overridable) |
-| `concurrency` | `number` | `5` | Max parallel agents |
-| `worktrees` | `boolean` | auto (git repo) | Isolate each agent in a git worktree |
-| `permissionMode` | `"auto" \| "bypassPermissions" \| "default"` | `"auto"` | How agents handle dangerous operations |
-| `cwd` | `string` | `process.cwd()` | Working directory |
-| `allowedTools` | `string[]` | all | Restrict agent tools |
-| `mergeStrategy` | `"yolo" \| "branch"` | `"yolo"` | Merge into HEAD or a new branch |
-| `usageCap` | `number (0-100)` | unlimited | Stop at N% utilization (e.g. 90) |
+```bash
+claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
+```
 ## CLI flags
 | Flag | Default | Description |
 |---|---|---|
-| `--budget=N` | `10` | Total agent sessions the planner targets |
-| `--concurrency=N` | `5` | How many agents run simultaneously |
-| `--model=NAME` | prompted | Worker model (planner always uses best available) |
+| `--budget=N` | `10` | Total agent sessions |
+| `--concurrency=N` | `5` | Parallel agents |
+| `--model=NAME` | prompted | Worker model (planner uses best available) |
 | `--usage-cap=N` | unlimited | Stop at N% utilization |
-| `--timeout=SECONDS` | `300` | Inactivity timeout (kills only silent agents) |
-| `--no-flex` | — | Disable adaptive multi-wave planning (run all tasks in one shot) |
+| `--timeout=SECONDS` | `300` | Inactivity timeout per agent |
+| `--no-flex` | — | Disable multi-wave steering |
 | `--dry-run` | — | Show planned tasks without running |
-| `-h, --help` | — | Help |
-| `-v, --version` | — | Version |
-Budget = total work. Concurrency = pace. A budget of 100 with concurrency 5 means 100 tasks, 5 at a time.
+## Task file fields
-## Rate limits and long runs
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `tasks` | `(string \| {prompt, cwd?, model?})[]` | required | Tasks to run |
+| `objective` | `string` | — | High-level goal for steering |
+| `flexiblePlan` | `boolean` | `false` | Enable multi-wave planning |
+| `model` | `string` | prompted | Worker model |
+| `concurrency` | `number` | `5` | Parallel agents |
+| `worktrees` | `boolean` | auto | Git worktree isolation |
+| `permissionMode` | `"auto" \| "bypassPermissions" \| "default"` | `"auto"` | Permission handling |
+| `mergeStrategy` | `"yolo" \| "branch"` | `"yolo"` | Merge into HEAD or new branch |
+| `usageCap` | `number (0-100)` | unlimited | Stop at N% utilization |
-Built for unattended runs lasting hours, days, or weeks.
+## Rate limits
-- **Usage bar**: the live UI shows current utilization with a visual bar, percentage, and countdown to reset when rate-limited.
-- **Hard block**: API returns a reset timestamp — swarm pauses and resumes exactly when the window opens.
-- **Soft throttle**: at >75% utilization, dispatch slows to avoid hitting the limit.
-- **Retry with backoff**: transient errors (429, overloaded, connection reset) retry with exponential backoff.
-- **Usage cap**: set a ceiling and the swarm stops dispatching when it's reached — active agents finish, no new ones start.
+Built for unattended runs lasting hours or days.
-No tasks are dropped. Set a budget of 1000 and go to sleep.
+- **Hard block**: pauses until the rate limit window resets, then resumes
+- **Soft throttle**: slows dispatch at >75% utilization
+- **Retry with backoff**: transient errors (429, overloaded) retry automatically
+- **Usage cap**: set a ceiling, active agents finish, no new ones start
+- **Planner retries**: steering and orchestration also retry on rate limits (30s/60s/120s backoff)
 ## Worktrees and merging
-Each agent gets an isolated git worktree on a `swarm/task-N` branch. Changes auto-commit when the agent finishes. After all agents complete, branches merge back sequentially.
+Each agent gets an isolated git worktree (`swarm/task-N` branch). Changes auto-commit. After all agents complete, branches merge back.
-- `"yolo"` (default): merges directly into your current branch
-- `"branch"`: creates a `swarm/run-{timestamp}` branch (main untouched)
+- `"yolo"` (default): merges into your current branch
+- `"branch"`: creates a new `swarm/run-{timestamp}` branch
-Merge conflicts retry with `-X theirs`. If that fails, the branch is preserved for manual resolution. Stale worktrees and `swarm/*` branches from previous runs are cleaned up on startup.
+Conflicts retry with `-X theirs`. Unresolved branches are preserved for manual merge.
 ## Exit codes
@@ -208,3 +206,7 @@ Merge conflicts retry with `-X theirs`. If that fails, the branch is preserved f
 | `0` | All tasks succeeded |
 | `1` | Some tasks failed |
 | `2` | All failed or none completed |
+## License
+MIT

package/dist/index.js CHANGED Viewed

@@ -1,5 +1,5 @@
 #!/usr/bin/env node
-import { readFileSync, existsSync, mkdirSync, readdirSync, rmSync } from "fs";
+import { readFileSync, existsSync, mkdirSync, readdirSync, rmSync, writeFileSync } from "fs";
 import { resolve, dirname, join } from "path";
 import { fileURLToPath } from "url";
 import { execSync } from "child_process";
@@ -7,7 +7,7 @@ import { createInterface } from "readline";
 import chalk from "chalk";
 import { query } from "@anthropic-ai/claude-agent-sdk";
 import { Swarm } from "./swarm.js";
-import { planTasks, refinePlan, detectModelTier, steerWave, identifyThemes, buildThinkingTasks, orchestrate } from "./planner.js";
+import { planTasks, refinePlan, detectModelTier, steerWave, identifyThemes, buildThinkingTasks, buildReflectionTasks, orchestrate } from "./planner.js";
 import { startRenderLoop, renderSummary } from "./ui.js";
 // ── CLI flag parsing ──
 function parseCliFlags(argv) {
@@ -270,7 +270,7 @@ function showPlan(tasks) {
     }
     console.log(chalk.dim(`  ${"─".repeat(ruleLen)}\n`));
 }
-function readDesignDocs(dir) {
+function readMdDir(dir) {
     try {
         const files = readdirSync(dir).filter(f => f.endsWith(".md")).sort();
         return files.map(f => {
@@ -282,6 +282,191 @@ function readDesignDocs(dir) {
         return "";
     }
 }
+function readRunMemory(runDir, previousRuns) {
+    let goal = "", status = "";
+    try {
+        goal = readFileSync(join(runDir, "goal.md"), "utf-8");
+    }
+    catch { }
+    try {
+        status = readFileSync(join(runDir, "status.md"), "utf-8");
+    }
+    catch { }
+    return {
+        designs: readMdDir(join(runDir, "designs")),
+        reflections: readMdDir(join(runDir, "reflections")),
+        milestones: readMdDir(join(runDir, "milestones")),
+        status,
+        goal,
+        previousRuns,
+    };
+}
+function writeStatus(baseDir, status) {
+    writeFileSync(join(baseDir, "status.md"), status, "utf-8");
+}
+function saveRunState(runDir, state) {
+    mkdirSync(runDir, { recursive: true });
+    writeFileSync(join(runDir, "run.json"), JSON.stringify(state, null, 2), "utf-8");
+}
+function loadRunState(runDir) {
+    try {
+        return JSON.parse(readFileSync(join(runDir, "run.json"), "utf-8"));
+    }
+    catch {
+        return null;
+    }
+}
+/** Find the latest incomplete run, or null. */
+function findIncompleteRun(rootDir) {
+    const runsDir = join(rootDir, "runs");
+    try {
+        const dirs = readdirSync(runsDir).sort().reverse(); // newest first
+        for (const d of dirs) {
+            const state = loadRunState(join(runsDir, d));
+            if (state && state.phase !== "done")
+                return { dir: join(runsDir, d), state };
+        }
+    }
+    catch { }
+    return null;
+}
+/** Read final status + goal from all completed previous runs (newest first, max 5). */
+function readPreviousRunKnowledge(rootDir) {
+    const runsDir = join(rootDir, "runs");
+    try {
+        const dirs = readdirSync(runsDir).sort().reverse();
+        const summaries = [];
+        for (const d of dirs) {
+            if (summaries.length >= 5)
+                break;
+            const state = loadRunState(join(runsDir, d));
+            if (!state || state.phase !== "done")
+                continue;
+            let status = "";
+            try {
+                status = readFileSync(join(runsDir, d, "status.md"), "utf-8");
+            }
+            catch { }
+            let goal = "";
+            try {
+                goal = readFileSync(join(runsDir, d, "goal.md"), "utf-8");
+            }
+            catch { }
+            const date = d.replace("T", " ").slice(0, 19);
+            const cost = state.accCost > 0 ? ` · $${state.accCost.toFixed(2)}` : "";
+            summaries.push(`### Run ${date} (${state.accCompleted} tasks${cost})\n${status || "(no status recorded)"}\n${goal ? `Goal: ${goal.slice(0, 500)}` : ""}`);
+        }
+        return summaries.join("\n\n");
+    }
+    catch {
+        return "";
+    }
+}
+function createRunDir(rootDir) {
+    const ts = new Date().toISOString().replace(/[:.]/g, "-").slice(0, 19);
+    const runDir = join(rootDir, "runs", ts);
+    mkdirSync(join(runDir, "designs"), { recursive: true });
+    mkdirSync(join(runDir, "reflections"), { recursive: true });
+    mkdirSync(join(runDir, "milestones"), { recursive: true });
+    mkdirSync(join(runDir, "sessions"), { recursive: true });
+    return runDir;
+}
+function saveWaveSession(baseDir, waveNum, kind, swarm) {
+    const dir = join(baseDir, "sessions");
+    mkdirSync(dir, { recursive: true });
+    writeFileSync(join(dir, `wave-${waveNum}.json`), JSON.stringify({
+        wave: waveNum, kind,
+        agents: swarm.agents.map(a => ({
+            id: a.id,
+            prompt: a.task.prompt,
+            status: a.status,
+            error: a.error,
+            cost: a.costUsd,
+            toolCalls: a.toolCalls,
+            filesChanged: a.filesChanged,
+            duration: a.finishedAt && a.startedAt ? a.finishedAt - a.startedAt : 0,
+            branch: a.branch,
+        })),
+        totalCost: swarm.totalCostUsd,
+    }, null, 2), "utf-8");
+}
+function recordBranches(swarm, branches) {
+    for (const a of swarm.agents) {
+        if (a.branch) {
+            branches.push({
+                branch: a.branch,
+                taskPrompt: a.task.prompt.slice(0, 200),
+                status: a.status === "done" ? "unmerged" : "failed",
+                filesChanged: a.filesChanged ?? 0,
+                costUsd: a.costUsd ?? 0,
+            });
+        }
+    }
+    // Update with merge results
+    for (const mr of swarm.mergeResults) {
+        const br = branches.find(b => b.branch === mr.branch);
+        if (br)
+            br.status = mr.ok ? "merged" : "merge-failed";
+    }
+}
+function autoMergeBranches(cwd, branches, onLog) {
+    const unmerged = branches.filter(b => b.status === "unmerged" && b.filesChanged > 0);
+    if (unmerged.length === 0)
+        return;
+    onLog(`Merging ${unmerged.length} unmerged branches...`);
+    for (const br of unmerged) {
+        try {
+            execSync(`git merge --no-edit "${br.branch}"`, { cwd, encoding: "utf-8", stdio: "pipe" });
+            br.status = "merged";
+            onLog(`  ✓ ${br.branch} (${br.filesChanged} files)`);
+        }
+        catch {
+            try {
+                try {
+                    execSync("git merge --abort", { cwd, encoding: "utf-8", stdio: "pipe" });
+                }
+                catch { }
+                execSync(`git merge --no-edit -X theirs "${br.branch}"`, { cwd, encoding: "utf-8", stdio: "pipe" });
+                br.status = "merged";
+                onLog(`  ✓ ${br.branch} (auto-resolved)`);
+            }
+            catch {
+                try {
+                    execSync("git merge --abort", { cwd, encoding: "utf-8", stdio: "pipe" });
+                }
+                catch { }
+                br.status = "merge-failed";
+                onLog(`  ✗ ${br.branch} (conflict — preserved for manual merge)`);
+            }
+        }
+    }
+}
+function archiveMilestone(baseDir, waveNum) {
+    const statusPath = join(baseDir, "status.md");
+    if (!existsSync(statusPath))
+        return;
+    const content = readFileSync(statusPath, "utf-8");
+    if (!content.trim())
+        return;
+    const milestoneDir = join(baseDir, "milestones");
+    mkdirSync(milestoneDir, { recursive: true });
+    const ts = new Date().toISOString().slice(0, 19).replace("T", " ");
+    writeFileSync(join(milestoneDir, `wave-${waveNum}.md`), `# Milestone — Wave ${waveNum} (${ts})\n\n${content}`, "utf-8");
+}
+function writeGoalUpdate(baseDir, update) {
+    const goalPath = join(baseDir, "goal.md");
+    let existing = "";
+    try {
+        existing = readFileSync(goalPath, "utf-8");
+    }
+    catch { }
+    const ts = new Date().toISOString().slice(0, 19).replace("T", " ");
+    const entry = `\n\n## Update — ${ts}\n${update}`;
+    const full = existing + entry;
+    // Keep it bounded: original + last ~3000 chars of updates
+    const trimmed = full.length > 4000 ? full.slice(0, 1000) + "\n\n...\n\n" + full.slice(-3000) : full;
+    writeFileSync(goalPath, trimmed, "utf-8");
+}
 const BRAILLE = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];
 function makeProgressLog() {
     let frame = 0;
@@ -386,6 +571,77 @@ async function main() {
     }
     if (noTTY)
         console.log(chalk.dim("  Non-interactive mode — using defaults\n"));
+    // ── Show run history ──
+    const rootDir = join(cwd, ".claude-overnight");
+    const runsDir = join(rootDir, "runs");
+    let completedRuns = [];
+    try {
+        const dirs = readdirSync(runsDir).sort().reverse();
+        for (const d of dirs) {
+            const s = loadRunState(join(runsDir, d));
+            if (s && s.phase === "done")
+                completedRuns.push({ dir: join(runsDir, d), state: s });
+        }
+    }
+    catch { }
+    if (completedRuns.length > 0 && !noTTY) {
+        console.log(chalk.dim(`\n  ${completedRuns.length} previous run${completedRuns.length > 1 ? "s" : ""}`));
+        for (const r of completedRuns.slice(0, 3)) {
+            const date = r.state.startedAt?.slice(0, 10) || "unknown";
+            const obj = r.state.objective?.slice(0, 40) || "";
+            const cost = r.state.accCost > 0 ? ` · $${r.state.accCost.toFixed(0)}` : "";
+            console.log(chalk.dim(`     ${date} · ${r.state.accCompleted} tasks${cost}${obj ? ` · ${obj}` : ""}${obj.length >= 40 ? "…" : ""}`));
+        }
+    }
+    // ── Resume detection ──
+    let resuming = false;
+    let resumeState = null;
+    let resumeRunDir;
+    const incomplete = findIncompleteRun(rootDir);
+    if (incomplete && incomplete.state.cwd === cwd && !noTTY && tasks.length === 0) {
+        const prev = incomplete.state;
+        const merged = prev.branches.filter(b => b.status === "merged").length;
+        const unmerged = prev.branches.filter(b => b.status === "unmerged").length;
+        const failed = prev.branches.filter(b => b.status === "failed" || b.status === "merge-failed").length;
+        const obj = prev.objective?.slice(0, 50) || "";
+        // Read last status for context
+        let lastStatus = "";
+        try {
+            lastStatus = readFileSync(join(incomplete.dir, "status.md"), "utf-8").trim().slice(0, 120);
+        }
+        catch { }
+        console.log(chalk.yellow(`\n  ⚠ Interrupted run`));
+        const boxLines = [
+            `${obj}${obj.length >= 50 ? "…" : ""}`,
+            `${prev.accCompleted}/${prev.budget} sessions · ${prev.waveNum + 1} waves · $${prev.accCost.toFixed(2)}`,
+        ];
+        if (lastStatus)
+            boxLines.push(lastStatus);
+        if (merged + unmerged + failed > 0)
+            boxLines.push(`${merged} merged · ${unmerged} unmerged · ${failed} failed branches`);
+        const boxW = Math.max(...boxLines.map(l => l.length)) + 4;
+        console.log(chalk.dim(`  ╭${"─".repeat(boxW)}╮`));
+        for (const line of boxLines)
+            console.log(chalk.dim("  │") + `  ${line.padEnd(boxW - 2)}` + chalk.dim("│"));
+        console.log(chalk.dim(`  ╰${"─".repeat(boxW)}╯`));
+        const action = await selectKey("", [
+            { key: "r", desc: "esume" },
+            { key: "f", desc: "resh" },
+            { key: "q", desc: "uit" },
+        ]);
+        if (action === "q") {
+            process.exit(0);
+        }
+        if (action === "r") {
+            resuming = true;
+            resumeState = prev;
+            resumeRunDir = incomplete.dir;
+            if (unmerged > 0) {
+                console.log("");
+                autoMergeBranches(cwd, prev.branches, (msg) => console.log(chalk.dim(`  ${msg}`)));
+            }
+        }
+    }
     // ── Interactive flow: Objective → Budget → Model → Usage cap → Plan → Review ──
     let workerModel;
     let plannerModel;
@@ -463,6 +719,8 @@ async function main() {
             parts.push("flex");
         if (usageCap != null)
             parts.push(`cap ${Math.round(usageCap * 100)}%`);
+        if (completedRuns.length > 0)
+            parts.push(`${completedRuns.length} prior`);
         const inner = parts.join(chalk.dim(" · "));
         const innerLen = parts.join(" · ").length;
         console.log(chalk.dim(`\n  ╭${"─".repeat(innerLen + 4)}╮`));
@@ -506,13 +764,17 @@ async function main() {
         console.log(chalk.dim(`  ${workerModel}  concurrency=${concurrency}  worktrees=${useWorktrees}  merge=${mergeStrategy}  perms=${permissionMode}${capStr}`));
     }
     // ── Flex mode: adaptive multi-wave planning ──
-    const flex = !argv.includes("--no-flex") && (fileCfg?.flexiblePlan ?? objective != null) && objective != null && (budget ?? 10) > 2;
+    let flex = !argv.includes("--no-flex") && (fileCfg?.flexiblePlan ?? objective != null) && objective != null && (budget ?? 10) > 2;
     const agentTimeoutMs = cliFlags.timeout ? parseFloat(cliFlags.timeout) * 1000 : undefined;
     let thinkingUsed = 0;
     let thinkingCost = 0, thinkingIn = 0, thinkingOut = 0, thinkingTools = 0;
-    let designContext;
+    let thinkingHistory;
+    // Create run directory early so thinking wave can use it
+    const runDir = resuming && resumeRunDir ? resumeRunDir : createRunDir(rootDir);
+    const previousKnowledge = readPreviousRunKnowledge(rootDir);
     // ── Plan phase (interactive: review loop, non-interactive: auto-plan or skip) ──
     const needsPlan = tasks.length === 0;
+    const designDir = join(runDir, "designs");
     if (needsPlan) {
         if (noTTY) {
             console.error(chalk.red("  No tasks provided and stdin is not a TTY. Provide tasks via args or a .json file."));
@@ -522,7 +784,6 @@ async function main() {
         const planRestore = () => process.stdout.write("\x1B[?25h");
         const useThinking = flex && (budget ?? 10) > concurrency * 3;
         const thinkingCount = useThinking ? Math.min(Math.max(concurrency, Math.ceil((budget ?? 10) * 0.005)), 10) : 0;
-        const designDir = join(cwd, ".claude-overnight", "designs");
         try {
             if (useThinking) {
                 // Phase 1: Quick theme identification → review → then autonomous
@@ -580,7 +841,7 @@ async function main() {
                 process.stdout.write("\x1B[?25l");
                 // Phase 2: Thinking wave
                 mkdirSync(designDir, { recursive: true });
-                const thinkingTasks = buildThinkingTasks(objective, themes, designDir, plannerModel);
+                const thinkingTasks = buildThinkingTasks(objective, themes, designDir, plannerModel, previousKnowledge || undefined);
                 console.log(chalk.cyan(`\n  ◆ Thinking: ${thinkingTasks.length} agents exploring...\n`));
                 const thinkingSwarm = new Swarm({
                     tasks: thinkingTasks, concurrency, cwd,
@@ -604,13 +865,24 @@ async function main() {
                 thinkingIn = thinkingSwarm.totalInputTokens;
                 thinkingOut = thinkingSwarm.totalOutputTokens;
                 thinkingTools = thinkingSwarm.agents.reduce((sum, a) => sum + a.toolCalls, 0);
+                // Record thinking wave so steering knows what happened
+                thinkingHistory = {
+                    wave: -1,
+                    kind: "think",
+                    tasks: thinkingSwarm.agents.map(a => ({
+                        prompt: a.task.prompt.slice(0, 200),
+                        status: a.status,
+                        filesChanged: a.filesChanged,
+                        error: a.error,
+                    })),
+                };
                 // Phase 3: Orchestrate from design docs
-                designContext = readDesignDocs(designDir);
-                if (designContext) {
+                const designs = readMdDir(designDir);
+                if (designs) {
                     const orchBudget = Math.min(50, Math.max(concurrency, Math.ceil(((budget ?? 10) - thinkingUsed) * 0.5)));
                     const flexNote = `This is wave 1 of an adaptive multi-wave run (total budget: ${(budget ?? 10) - thinkingUsed}). Plan the highest-impact foundational work first. Future waves will iterate based on what's learned.`;
                     console.log(chalk.cyan(`\n  ◆ Orchestrating plan...\n`));
-                    tasks = await orchestrate(objective, designContext, cwd, plannerModel, workerModel, permissionMode, orchBudget, concurrency, makeProgressLog(), flexNote);
+                    tasks = await orchestrate(objective, designs, cwd, plannerModel, workerModel, permissionMode, orchBudget, concurrency, makeProgressLog(), flexNote);
                     process.stdout.write(`\x1B[2K\r  ${chalk.green(`\u2713 ${tasks.length} tasks`)}\n\n`);
                 }
                 else {
@@ -713,14 +985,62 @@ async function main() {
     process.stdout.write("\x1B[?25l");
     const restore = () => process.stdout.write("\x1B[?25h\n");
     const runStartedAt = Date.now();
-    // Wave-loop state
+    // Wave-loop state — either fresh or resumed
+    mkdirSync(join(runDir, "reflections"), { recursive: true });
+    mkdirSync(join(runDir, "milestones"), { recursive: true });
+    mkdirSync(join(runDir, "sessions"), { recursive: true });
     let currentSwarm;
-    let remaining = (budget ?? tasks.length) - thinkingUsed;
-    let currentTasks = tasks;
-    let waveNum = 0;
+    let remaining;
+    let currentTasks;
+    let waveNum;
     const waveHistory = [];
-    let accCost = thinkingCost, accIn = thinkingIn, accOut = thinkingOut, accCompleted = 0, accFailed = 0, accTools = thinkingTools;
+    let accCost, accCompleted, accFailed, accTools;
+    let accIn = 0, accOut = 0;
     let lastCapped = false, lastAborted = false;
+    let lastWaveKind;
+    let reflectionBudgetUsed;
+    const branches = [];
+    if (resuming && resumeState) {
+        // Restore ALL config from saved state
+        remaining = resumeState.remaining;
+        currentTasks = resumeState.currentTasks;
+        waveNum = resumeState.waveNum;
+        accCost = resumeState.accCost;
+        accCompleted = resumeState.accCompleted;
+        accFailed = resumeState.accFailed;
+        accTools = 0;
+        lastWaveKind = resumeState.lastWaveKind;
+        reflectionBudgetUsed = resumeState.reflectionBudgetUsed;
+        branches.push(...resumeState.branches);
+        objective = resumeState.objective;
+        workerModel = resumeState.workerModel;
+        plannerModel = resumeState.plannerModel;
+        budget = resumeState.budget;
+        concurrency = resumeState.concurrency;
+        flex = resumeState.flex;
+        usageCap = resumeState.usageCap;
+        console.log(chalk.green(`\n  ✓ Resumed`) + chalk.dim(` · wave ${waveNum + 1} · ${remaining} remaining · $${accCost.toFixed(2)} spent\n`));
+    }
+    else {
+        // Fresh run
+        if (objective && !existsSync(join(runDir, "goal.md"))) {
+            writeFileSync(join(runDir, "goal.md"), `## Original Objective\n${objective}`, "utf-8");
+        }
+        remaining = (budget ?? tasks.length) - thinkingUsed;
+        currentTasks = tasks;
+        waveNum = 0;
+        if (thinkingHistory)
+            waveHistory.push(thinkingHistory);
+        accCost = thinkingCost;
+        accCompleted = 0;
+        accFailed = 0;
+        accTools = thinkingTools;
+        accIn = thinkingIn;
+        accOut = thinkingOut;
+        lastWaveKind = "execute";
+        reflectionBudgetUsed = 0;
+    }
+    const maxReflectionBudget = Math.max(2, Math.ceil((budget ?? 10) * 0.05));
     // For flex + branch strategy: create one target branch, waves merge via yolo into it
     let runBranch;
     let originalRef;
@@ -791,8 +1111,18 @@ async function main() {
         remaining -= swarm.completed + swarm.failed;
         lastCapped = swarm.cappedOut;
         lastAborted = swarm.aborted;
+        recordBranches(swarm, branches);
+        saveWaveSession(runDir, waveNum, lastWaveKind, swarm);
+        saveRunState(runDir, {
+            id: `run-${new Date().toISOString().slice(0, 19)}`, objective: objective, budget: budget ?? tasks.length,
+            remaining, workerModel, plannerModel, concurrency, permissionMode,
+            usageCap, flex, useWorktrees, mergeStrategy, waveNum, currentTasks,
+            lastWaveKind, reflectionBudgetUsed, accCost, accCompleted, accFailed,
+            branches, phase: "steering", startedAt: new Date(runStartedAt).toISOString(), cwd,
+        });
         waveHistory.push({
             wave: waveNum,
+            kind: lastWaveKind,
             tasks: swarm.agents.map(a => ({
                 prompt: a.task.prompt,
                 status: a.status,
@@ -802,30 +1132,116 @@ async function main() {
         });
         if (!flex || remaining <= 0 || swarm.aborted || swarm.cappedOut)
             break;
-        // ── Steer next wave ──
-        console.log(chalk.cyan("\n  ◆ Steering...\n"));
-        process.stdout.write("\x1B[?25l");
-        try {
-            const steer = await steerWave(objective, waveHistory, remaining, cwd, plannerModel, workerModel, permissionMode, concurrency, makeProgressLog(), designContext);
-            process.stdout.write(`\x1B[2K\r`);
-            process.stdout.write("\x1B[?25h");
-            if (steer.done) {
-                console.log(chalk.green(`  \u2713 ${steer.reasoning}\n`));
+        // ── Steer: assess quality and decide next action ──
+        // May loop through reflect→re-steer cycles before producing execution tasks
+        let steerDone = false;
+        let steerAttempts = 0;
+        while (!steerDone && remaining > 0 && !stopping && steerAttempts < 4) {
+            steerAttempts++;
+            console.log(chalk.cyan(`\n  ◆ Assessing...\n`));
+            process.stdout.write("\x1B[?25l");
+            try {
+                const memory = readRunMemory(runDir, previousKnowledge || undefined);
+                const steer = await steerWave(objective, waveHistory, remaining, cwd, plannerModel, workerModel, permissionMode, concurrency, makeProgressLog(), memory);
+                process.stdout.write(`\x1B[2K\r`);
+                process.stdout.write("\x1B[?25h");
+                // Persist context layers
+                if (steer.statusUpdate)
+                    writeStatus(runDir, steer.statusUpdate);
+                if (steer.goalUpdate) {
+                    writeGoalUpdate(runDir, steer.goalUpdate);
+                    console.log(chalk.dim(`  Goal refined: ${steer.goalUpdate.slice(0, 100)}\n`));
+                }
+                // Archive milestone every ~5 execution waves
+                const execWaves = waveHistory.filter(w => w.kind === "execute").length;
+                if (execWaves > 0 && execWaves % 5 === 0)
+                    archiveMilestone(runDir, waveNum);
+                if (steer.done || steer.action === "done") {
+                    console.log(chalk.green(`  \u2713 ${steer.reasoning}\n`));
+                    steerDone = true;
+                    remaining = 0; // exit outer loop too
+                    break;
+                }
+                if (steer.action === "reflect") {
+                    // Safety: no consecutive reflections, budget cap
+                    const canReflect = lastWaveKind !== "reflect" && reflectionBudgetUsed + 2 <= maxReflectionBudget;
+                    if (!canReflect) {
+                        console.log(chalk.dim(`  ${steer.reasoning}`));
+                        console.log(chalk.yellow(`  Reflection skipped (${lastWaveKind === "reflect" ? "consecutive" : "budget cap"}) — re-assessing\n`));
+                        lastWaveKind = "execute"; // allow next steer to see non-reflect
+                        continue; // re-steer in this inner loop
+                    }
+                    // Run reflection wave
+                    console.log(chalk.dim(`  ${steer.reasoning}`));
+                    console.log(chalk.cyan(`\n  ◆ Reflection: 2 agents reviewing...\n`));
+                    const reflectionDir = join(runDir, "reflections");
+                    waveNum++;
+                    const reflTasks = buildReflectionTasks(objective, memory.goal, reflectionDir, waveNum, plannerModel);
+                    const reflSwarm = new Swarm({
+                        tasks: reflTasks, concurrency: 2, cwd,
+                        model: plannerModel, permissionMode,
+                        useWorktrees: false, mergeStrategy: "yolo",
+                        agentTimeoutMs, usageCap,
+                    });
+                    currentSwarm = reflSwarm;
+                    const stopReflRender = startRenderLoop(reflSwarm);
+                    try {
+                        await reflSwarm.run();
+                    }
+                    finally {
+                        stopReflRender();
+                    }
+                    console.log(renderSummary(reflSwarm));
+                    accCost += reflSwarm.totalCostUsd;
+                    accIn += reflSwarm.totalInputTokens;
+                    accOut += reflSwarm.totalOutputTokens;
+                    accCompleted += reflSwarm.completed;
+                    accFailed += reflSwarm.failed;
+                    accTools += reflSwarm.agents.reduce((sum, a) => sum + a.toolCalls, 0);
+                    remaining -= reflSwarm.completed + reflSwarm.failed;
+                    reflectionBudgetUsed += reflSwarm.completed + reflSwarm.failed;
+                    waveHistory.push({
+                        wave: waveNum,
+                        kind: "reflect",
+                        tasks: reflSwarm.agents.map(a => ({ prompt: a.task.prompt, status: a.status, filesChanged: a.filesChanged, error: a.error })),
+                    });
+                    lastWaveKind = "reflect";
+                    continue; // re-steer with reflection artifacts
+                }
+                // action === "execute"
+                if (steer.tasks.length === 0) {
+                    console.log(chalk.green(`  \u2713 ${steer.reasoning}\n`));
+                    remaining = 0;
+                    break;
+                }
+                console.log(chalk.dim(`  ${steer.reasoning}\n`));
+                currentTasks = steer.tasks;
+                lastWaveKind = "execute";
+                steerDone = true; // exit inner loop, outer loop runs the tasks
+            }
+            catch (err) {
+                process.stdout.write("\x1B[?25h");
+                console.log(chalk.yellow(`  Steering failed: ${err.message?.slice(0, 80)} \u2014 stopping\n`));
+                remaining = 0;
                 break;
             }
-            console.log(chalk.dim(`  ${steer.reasoning}\n`));
-            currentTasks = steer.tasks;
-            waveNum++;
-        }
-        catch (err) {
-            process.stdout.write("\x1B[?25h");
-            console.log(chalk.yellow(`  Steering failed: ${err.message?.slice(0, 80)} \u2014 stopping\n`));
-            break;
         }
+        waveNum++;
+    }
+    // Mark run as done — keep sessions/milestones/status/goal, clean transient files
+    saveRunState(runDir, {
+        id: `run-${new Date().toISOString().slice(0, 19)}`, objective: objective ?? "", budget: budget ?? tasks.length,
+        remaining, workerModel, plannerModel, concurrency, permissionMode,
+        usageCap, flex, useWorktrees, mergeStrategy, waveNum, currentTasks: [],
+        lastWaveKind, reflectionBudgetUsed, accCost, accCompleted, accFailed,
+        branches, phase: "done", startedAt: new Date(runStartedAt).toISOString(), cwd,
+    });
+    try {
+        rmSync(join(runDir, "designs"), { recursive: true, force: true });
     }
-    // Clean up design docs
+    catch { }
     try {
-        rmSync(join(cwd, ".claude-overnight", "designs"), { recursive: true, force: true });
+        rmSync(join(runDir, "reflections"), { recursive: true, force: true });
     }
     catch { }
     // Switch back if we created a run branch
@@ -837,48 +1253,40 @@ async function main() {
     }
     // ── Final summary ──
     const waves = waveNum + 1;
-    const cappedNote = lastCapped ? chalk.yellow(` (capped at ${usageCap != null ? Math.round(usageCap * 100) : 100}%)`) : "";
-    const summaryText = accFailed > 0
-        ? chalk.yellow(`${accCompleted} done, ${accFailed} failed`) + cappedNote
-        : chalk.green(`${accCompleted} done`) + cappedNote;
-    const costText = accCost > 0 ? chalk.dim(` · $${accCost.toFixed(3)}`) : "";
-    const wavePart = waves > 1 ? chalk.dim(`${waves} waves · `) : "";
+    const elapsed = Math.round((Date.now() - runStartedAt) / 1000);
+    const elapsedStr = elapsed < 60 ? `${elapsed}s` : elapsed < 3600 ? `${Math.floor(elapsed / 60)}m ${elapsed % 60}s` : `${Math.floor(elapsed / 3600)}h ${Math.floor((elapsed % 3600) / 60)}m`;
+    const totalMerged = branches.filter(b => b.status === "merged").length;
+    const totalConflicts = branches.filter(b => b.status === "merge-failed").length;
     console.log(chalk.dim(`\n  ${"─".repeat(36)}`));
-    console.log(`  ${chalk.green("✓")} ${chalk.bold("Complete")}  ${wavePart}${summaryText}${costText}`);
-    if (accFailed > 0 && waves === 1) {
-        const failedAgents = currentSwarm?.agents.filter((a) => a.status === "error") ?? [];
-        if (failedAgents.length > 0) {
-            console.log(chalk.red(`\n  Failed agents:`));
-            for (const a of failedAgents) {
-                console.log(chalk.red(`    \u2717 Agent ${a.id + 1}: ${a.task.prompt.slice(0, 60)}${a.task.prompt.length > 60 ? "\u2026" : ""}`));
-                console.log(chalk.dim(`      ${a.error || "unknown error"}`));
-            }
-        }
+    console.log(`  ${accFailed === 0 ? chalk.green("✓") : chalk.yellow("⚠")} ${chalk.bold("Complete")}\n`);
+    const boxLines = [];
+    const statusLine = accFailed > 0 ? `${accCompleted} done · ${accFailed} failed` : `${accCompleted} done`;
+    boxLines.push(`${waves} wave${waves > 1 ? "s" : ""} · ${statusLine} · $${accCost.toFixed(2)}`);
+    boxLines.push(`${elapsedStr} · ${fmtTokens(accIn)} in / ${fmtTokens(accOut)} out · ${accTools} tools`);
+    if (totalMerged > 0 || totalConflicts > 0)
+        boxLines.push(`${totalMerged} merged${totalConflicts > 0 ? ` · ${totalConflicts} conflicts` : ""}`);
+    if (reflectionBudgetUsed > 0)
+        boxLines.push(`${reflectionBudgetUsed} reflection agents`);
+    if (lastCapped)
+        boxLines.push(chalk.yellow(`Capped at ${usageCap != null ? Math.round(usageCap * 100) : 100}%`));
+    const boxW = Math.max(...boxLines.map(l => l.replace(/\x1B\[[0-9;]*m/g, "").length)) + 4;
+    console.log(chalk.dim(`  ╭${"─".repeat(boxW)}╮`));
+    for (const line of boxLines) {
+        const plainLen = line.replace(/\x1B\[[0-9;]*m/g, "").length;
+        console.log(chalk.dim("  │") + `  ${line}${" ".repeat(Math.max(0, boxW - 2 - plainLen))}` + chalk.dim("│"));
+    }
+    console.log(chalk.dim(`  ╰${"─".repeat(boxW)}╯`));
+    if (totalConflicts > 0) {
+        const conflictBranches = branches.filter(b => b.status === "merge-failed");
+        console.log(chalk.red(`\n  Unresolved conflicts:`));
+        for (const c of conflictBranches)
+            console.log(chalk.red(`    ${c.branch}`));
+        console.log(chalk.dim("  git merge <branch> to resolve"));
     }
-    const elapsed = Math.round((Date.now() - runStartedAt) / 1000);
-    const elapsedStr = elapsed < 60 ? `${elapsed}s` : `${Math.floor(elapsed / 60)}m ${elapsed % 60}s`;
-    console.log(chalk.dim(`  ${elapsedStr}  ${fmtTokens(accIn)} in / ${fmtTokens(accOut)} out  ${accTools} tool calls`));
     if (runBranch) {
-        console.log(chalk.dim(`  Branch: ${runBranch} \u2014 create a PR or: git merge ${runBranch}`));
-    }
-    else if (currentSwarm?.mergeResults && currentSwarm.mergeResults.length > 0) {
-        const merged = currentSwarm.mergeResults.filter((r) => r.ok);
-        const autoResolved = merged.filter((r) => r.autoResolved).length;
-        const conflicts = currentSwarm.mergeResults.filter((r) => !r.ok);
-        const target = currentSwarm.mergeBranch || "HEAD";
-        if (merged.length > 0) {
-            const extra = autoResolved > 0 ? chalk.yellow(` (${autoResolved} auto-resolved)`) : "";
-            console.log(chalk.green(`  Merged ${merged.length} branch(es) into ${target}`) + extra);
-        }
-        if (currentSwarm.mergeBranch)
-            console.log(chalk.dim(`  Branch: ${currentSwarm.mergeBranch} \u2014 create a PR or: git merge ${currentSwarm.mergeBranch}`));
-        if (conflicts.length > 0) {
-            console.log(chalk.red(`  ${conflicts.length} unresolved conflict(s):`));
-            for (const c of conflicts)
-                console.log(chalk.red(`    ${c.branch}`));
-            console.log(chalk.dim("  Merge manually: git merge <branch>"));
-        }
+        console.log(chalk.dim(`\n  Branch: ${runBranch} — git merge ${runBranch}`));
     }
+    console.log(chalk.dim(`  Run: ${runDir}`));
     if (currentSwarm?.logFile)
         console.log(chalk.dim(`  Log: ${currentSwarm.logFile}`));
     console.log("");

package/dist/planner.d.ts CHANGED Viewed

@@ -1,6 +1,7 @@
 import type { Task, PermMode } from "./types.js";
 export interface WaveSummary {
     wave: number;
+    kind: "execute" | "reflect" | "think";
     tasks: {
         prompt: string;
         status: string;
@@ -10,14 +11,26 @@ export interface WaveSummary {
 }
 export interface SteerResult {
     done: boolean;
+    action: "execute" | "reflect" | "done";
     tasks: Task[];
     reasoning: string;
+    goalUpdate?: string;
+    statusUpdate?: string;
+}
+export interface RunMemory {
+    designs: string;
+    reflections: string;
+    milestones: string;
+    status: string;
+    goal: string;
+    previousRuns?: string;
 }
 export type ModelTier = "opus" | "sonnet" | "haiku" | "unknown";
 export declare function detectModelTier(model: string): ModelTier;
 export declare function planTasks(objective: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void, flexNote?: string): Promise<Task[]>;
 export declare function identifyThemes(objective: string, count: number, model: string, permissionMode: PermMode): Promise<string[]>;
-export declare function buildThinkingTasks(objective: string, themes: string[], designDir: string, plannerModel: string): Task[];
+export declare function buildThinkingTasks(objective: string, themes: string[], designDir: string, plannerModel: string, previousKnowledge?: string): Task[];
+export declare function buildReflectionTasks(objective: string, goal: string, reflectionDir: string, waveNum: number, plannerModel: string): Task[];
 export declare function orchestrate(objective: string, designDocs: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number, concurrency: number, onLog: (text: string) => void, flexNote?: string): Promise<Task[]>;
 export declare function refinePlan(objective: string, previousTasks: Task[], feedback: string, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, budget: number | undefined, concurrency: number, onLog: (text: string) => void): Promise<Task[]>;
-export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, concurrency: number, onLog: (text: string) => void, designContext?: string): Promise<SteerResult>;
+export declare function steerWave(objective: string, history: WaveSummary[], remainingBudget: number, cwd: string, plannerModel: string, workerModel: string, permissionMode: PermMode, concurrency: number, onLog: (text: string) => void, runMemory?: RunMemory): Promise<SteerResult>;

package/dist/planner.js CHANGED Viewed

@@ -361,13 +361,14 @@ Return ONLY a JSON object: {"themes": ["angle description", ...]}`,
     const fallback = ["architecture, patterns, and conventions", "data models, state, and persistence", "user-facing flows, components, and UX", "APIs, integrations, and services", "testing, quality, and error handling", "security, performance, and infrastructure", "build, deployment, and configuration", "documentation and developer experience"];
     return Array.from({ length: count }, (_, i) => fallback[i % fallback.length]);
 }
-export function buildThinkingTasks(objective, themes, designDir, plannerModel) {
+export function buildThinkingTasks(objective, themes, designDir, plannerModel, previousKnowledge) {
+    const prevBlock = previousKnowledge ? `\nKNOWLEDGE FROM PREVIOUS RUNS:\n${previousKnowledge}\n\nBuild on this — don't re-discover what's already known.\n` : "";
     return themes.map((theme, i) => ({
         id: `think-${i}`,
         prompt: `You are a senior architect exploring a codebase to design a solution.
 OVERALL OBJECTIVE: ${objective}
+${prevBlock}
 YOUR FOCUS: ${theme}
 Explore the codebase thoroughly using Read, Glob, and Grep. Then write a design document to ${designDir}/focus-${i}.md with these sections:
@@ -389,6 +390,44 @@ Be thorough — your findings drive the execution plan.`,
         model: plannerModel,
     }));
 }
+export function buildReflectionTasks(objective, goal, reflectionDir, waveNum, plannerModel) {
+    const goalBlock = goal ? `\nEVOLVED GOAL:\n${goal}\n` : "";
+    return [
+        {
+            id: "review-0",
+            prompt: `You are a senior code reviewer performing a deep quality audit.
+OBJECTIVE: ${objective}
+${goalBlock}
+Read the codebase thoroughly. Assess:
+- **Correctness**: Bugs, missing error handling, broken flows?
+- **Architecture**: Clean design? Unnecessary or missing abstractions?
+- **Code quality**: Readability, naming, duplication, dead code?
+- **Completeness**: What's missing vs. the objective? Half-done work?
+- **Polish**: Edge cases, error messages, loading states?
+Write findings to ${reflectionDir}/wave-${waveNum}-quality.md.
+End with a ## Verdict: is this closer to "good enough" or "amazing"? What would make the biggest difference?`,
+            model: plannerModel,
+        },
+        {
+            id: "review-1",
+            prompt: `You are a UX and integration reviewer.
+OBJECTIVE: ${objective}
+${goalBlock}
+Read the codebase. Assess:
+- **UX coherence**: Do user-facing flows make sense end-to-end? Consistent experience?
+- **Integration**: Do pieces fit together? Seams, inconsistencies, broken contracts?
+- **Testing**: Meaningful coverage? Testing the right things?
+- **Gaps**: Unhandled use cases? What would surprise a user?
+Write findings to ${reflectionDir}/wave-${waveNum}-ux.md.
+End with ## Priorities: rank the top 3 things that would most improve the result.`,
+            model: plannerModel,
+        },
+    ];
+}
 export async function orchestrate(objective, designDocs, cwd, plannerModel, workerModel, permissionMode, budget, concurrency, onLog, flexNote) {
     const capability = modelCapabilityBlock(workerModel);
     const flexLine = flexNote ? `\n\n${flexNote}` : "";
@@ -549,34 +588,67 @@ async function extractTaskJson(raw, retry) {
     throw new Error("Planner did not return valid task JSON after retry");
 }
 // ── Wave steering ──
-export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, permissionMode, concurrency, onLog, designContext) {
+export async function steerWave(objective, history, remainingBudget, cwd, plannerModel, workerModel, permissionMode, concurrency, onLog, runMemory) {
     const capability = modelCapabilityBlock(workerModel);
-    const historyText = history.map(w => {
+    // Three-layer context: status (current), milestones (strategic), recent waves (tactical)
+    const recentWaves = history.slice(-3);
+    const recentText = recentWaves.length > 0 ? recentWaves.map(w => {
+        const tag = w.kind === "reflect" ? " (reflection)" : w.kind === "think" ? " (thinking)" : "";
         const lines = w.tasks.map(t => {
             const files = t.filesChanged ? ` (${t.filesChanged} files)` : "";
             const err = t.error ? ` — ${t.error}` : "";
             return `  - [${t.status}] ${t.prompt.slice(0, 120)}${files}${err}`;
         }).join("\n");
-        return `Wave ${w.wave + 1}:\n${lines}`;
-    }).join("\n\n");
-    const prompt = `You are steering an autonomous multi-wave agent system. Read the codebase to understand current state, then decide what's next.
+        return `Wave ${w.wave + 1}${tag}:\n${lines}`;
+    }).join("\n\n") : "(first wave)";
+    const lastWasReflection = history.length > 0 && history[history.length - 1].kind === "reflect";
+    const noReflectHint = lastWasReflection ? `\nIMPORTANT: The previous wave was a reflection. You MUST choose "execute" or "done" — not "reflect" again.\n` : "";
+    const cap = (s, max) => s.length > max ? s.slice(0, max) + "\n...(truncated)" : s;
+    const statusBlock = runMemory?.status ? `\nCurrent project status:\n${runMemory.status}\n` : "";
+    const milestoneBlock = runMemory?.milestones ? `\nMilestone snapshots:\n${cap(runMemory.milestones, 4000)}\n` : "";
+    const designBlock = runMemory?.designs ? `\nArchitectural research:\n${cap(runMemory.designs, 4000)}\n` : "";
+    const reflectionBlock = runMemory?.reflections ? `\nLatest quality reports:\n${cap(runMemory.reflections, 3000)}\n` : "";
+    const goalBlock = runMemory?.goal ? `\nNorth star — what "amazing" means:\n${runMemory.goal}\n` : "";
+    const prevRunBlock = runMemory?.previousRuns ? `\nKnowledge from previous runs:\n${cap(runMemory.previousRuns, 3000)}\n` : "";
+    const prompt = `You are the quality director for an autonomous multi-wave agent system. Your job is to push the work toward "amazing," not just "done."
 Objective: ${objective}
-Work completed so far:
-${historyText}
-${designContext ? `\nOriginal architectural research:\n${designContext}\n` : ""}
+${goalBlock}${statusBlock}${milestoneBlock}${prevRunBlock}
+Recent waves:
+${recentText}
+${designBlock}${reflectionBlock}
 Remaining budget: ${remainingBudget} agent sessions. ${concurrency} agents run in parallel — tasks must touch DIFFERENT files.
 ${capability}
+Total waves completed: ${history.length}
+Read the codebase. Assess: how close is this to the VISION? Not "what's missing" — "how good is what we built?"
-Read the codebase. Then decide:
-- Is the objective fully met? → {"done": true, "reasoning": "..."}
-- More work needed? Plan the next wave → {"done": false, "reasoning": "what needs doing and why", "tasks": [{"prompt": "..."}]}
+Then choose ONE action:
-Think like a tech lead between sprints: what shipped, what's missing, what needs polish, what should be scrapped and redone, what's over-engineered. Less is more — don't add work for the sake of filling budget.
+**"reflect"** — Spin up 1-2 review agents for a deep quality audit. Choose when:
+  - Substantial new code shipped and hasn't been reviewed
+  - You're unsure about quality and need expert eyes
+  - A subsystem just "completed" and deserves verification
-Respond with ONLY a JSON object (no markdown fences).`;
-    onLog("Reading codebase...");
+**"execute"** — Plan the next batch of tasks. Choose when:
+  - You know what needs doing (from reviews or your own assessment)
+  - There are clear gaps, bugs, or improvements to make
+**"done"** — The objective is met at high quality. Choose when:
+  - The code works correctly and handles edge cases
+  - The architecture is clean and pieces fit together
+  - Further work would be diminishing returns
+${noReflectHint}
+Respond with ONLY a JSON object (no markdown fences):
+{
+  "action": "execute" | "reflect" | "done",
+  "done": true/false,
+  "reasoning": "your assessment and why you chose this action",
+  "goalUpdate": "optional — refine what 'amazing' means as you learn more",
+  "statusUpdate": "REQUIRED — write a concise project status: what's built, what works, what's rough, quality level, key gaps. This replaces the previous status and is your memory for future waves.",
+  "tasks": [{"prompt": "..."}]
+}`;
+    onLog("Assessing...");
     const resultText = await runPlannerQuery(prompt, { cwd, model: plannerModel, permissionMode }, onLog);
     const parsed = await (async () => {
         const first = attemptJsonParse(resultText);
@@ -585,21 +657,26 @@ Respond with ONLY a JSON object (no markdown fences).`;
         onLog("Retrying...");
         let retryText = "";
         for await (const msg of query({
-            prompt: `Output ONLY a JSON object: {"done":true/false,"reasoning":"...","tasks":[{"prompt":"..."}]}`,
+            prompt: `Output ONLY a JSON object: {"action":"execute"|"reflect"|"done","done":true/false,"reasoning":"...","tasks":[{"prompt":"..."}]}`,
             options: { cwd, model: plannerModel, permissionMode, ...(permissionMode === "bypassPermissions" && { allowDangerouslySkipPermissions: true }), persistSession: false },
         })) {
             if (msg.type === "result" && msg.subtype === "success")
                 retryText = msg.result || "";
         }
-        return attemptJsonParse(retryText) ?? { done: true, reasoning: "Could not parse steering response" };
+        return attemptJsonParse(retryText) ?? { action: "done", done: true, reasoning: "Could not parse steering response" };
     })();
-    if (parsed.done) {
-        return { done: true, tasks: [], reasoning: parsed.reasoning || "Objective complete" };
+    const action = parsed.action || (parsed.done ? "done" : "execute");
+    const statusUpdate = parsed.statusUpdate || undefined;
+    if (action === "done") {
+        return { done: true, action: "done", tasks: [], reasoning: parsed.reasoning || "Objective complete", goalUpdate: parsed.goalUpdate, statusUpdate };
+    }
+    if (action === "reflect") {
+        return { done: false, action: "reflect", tasks: [], reasoning: parsed.reasoning || "Quality audit needed", goalUpdate: parsed.goalUpdate, statusUpdate };
     }
     let tasks = (parsed.tasks || []).map((t, i) => ({
         id: String(i),
         prompt: typeof t === "string" ? t : t.prompt,
     }));
     tasks = postProcess(tasks, remainingBudget, onLog);
-    return { done: tasks.length === 0, tasks, reasoning: parsed.reasoning || "" };
+    return { done: tasks.length === 0, action: tasks.length === 0 ? "done" : "execute", tasks, reasoning: parsed.reasoning || "", goalUpdate: parsed.goalUpdate, statusUpdate };
 }

package/dist/types.d.ts CHANGED Viewed

@@ -95,3 +95,37 @@ export type SwarmPhase = "planning" | "running" | "merging" | "done";
  * - "branch": Create a new branch, merge everything there (main untouched).
  */
 export type MergeStrategy = "yolo" | "branch";
+/** Tracks a git branch created by an agent. */
+export interface BranchRecord {
+    branch: string;
+    taskPrompt: string;
+    status: "merged" | "unmerged" | "failed" | "merge-failed";
+    filesChanged: number;
+    costUsd: number;
+}
+/** Persisted run state for crash recovery and resume. */
+export interface RunState {
+    id: string;
+    objective: string;
+    budget: number;
+    remaining: number;
+    workerModel: string;
+    plannerModel: string;
+    concurrency: number;
+    permissionMode: PermMode;
+    usageCap?: number;
+    flex: boolean;
+    useWorktrees: boolean;
+    mergeStrategy: MergeStrategy;
+    waveNum: number;
+    currentTasks: Task[];
+    lastWaveKind: "execute" | "reflect" | "think";
+    reflectionBudgetUsed: number;
+    accCost: number;
+    accCompleted: number;
+    accFailed: number;
+    branches: BranchRecord[];
+    phase: "executing" | "steering" | "reflecting" | "done";
+    startedAt: string;
+    cwd: string;
+}

package/package.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "name": "claude-overnight",
-  "version": "0.5.1",
-  "description": "Fire off Claude agents, come back days later to shipped work. Maximizes every token in your plan.",
+  "version": "1.0.0",
+  "description": "Run 10, 100, or 1000 Claude agents overnight. Parallel autonomous AI coding with thinking waves, iterative quality steering, crash recovery, and rate limit handling. Built on the Claude Agent SDK.",
   "type": "module",
   "bin": {
     "claude-overnight": "dist/index.js"
@@ -28,14 +28,20 @@
   },
   "keywords": [
     "claude",
-    "ai",
-    "agents",
-    "parallel",
-    "autonomous",
-    "background",
+    "claude-agent-sdk",
+    "ai-agents",
+    "ai-coding",
+    "parallel-agents",
+    "autonomous-coding",
+    "swarm",
     "overnight",
     "cli",
-    "orchestration"
+    "orchestration",
+    "multi-agent",
+    "code-generation",
+    "anthropic",
+    "worktrees",
+    "iterative"
   ],
   "engines": {
     "node": ">=20"