claude-overnight 0.5.1 → 1.0.1

package/README.md CHANGED
@@ -1,8 +1,10 @@
1
1
  # claude-overnight
2
2
 
3
- Fire off Claude agents, come back to shipped work.
3
+ Run 10, 100, or 1000 Claude agents overnight. Come back to shipped work.
4
4
 
5
- Describe what to build. Set a budget: 10 agents, 100, 1000. A planner agent analyzes your codebase, breaks the objective into independent tasks, and launches them all. Each agent runs in its own git worktree with full tooling (Read, Edit, Bash, Grep: everything). Rate limits? It waits. Windows reset? It resumes. It doesn't stop until every task is done.
5
+ Describe what to build. Set a budget. The tool plans, explores your codebase, breaks the objective into tasks, launches parallel agents in isolated git worktrees, iterates toward quality, and handles rate limits automatically. You press Run once, then go to sleep.
6
+
7
+ Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk). Works with Claude Opus, Sonnet, and Haiku.
6
8
 
7
9
  ## Install
8
10
 
@@ -10,18 +12,14 @@ Describe what to build. Set a budget — 10 agents, 100, 1000. A planner agent a
10
12
  npm install -g claude-overnight
11
13
  ```
12
14
 
13
- Requires Node.js >= 20 and Claude authentication (OAuth via `claude` CLI, or `ANTHROPIC_API_KEY`).
14
-
15
- ## Usage
15
+ Requires Node.js >= 20 and Claude authentication (`claude auth login`, or set `ANTHROPIC_API_KEY`).
16
16
 
17
- ### Interactive
17
+ ## Quick start
18
18
 
19
19
  ```bash
20
20
  claude-overnight
21
21
  ```
22
22
 
23
- A guided flow walks you through each step:
24
-
25
23
  ```
26
24
  🌙 claude-overnight
27
25
  ────────────────────────────────────
@@ -29,98 +27,102 @@ A guided flow walks you through each step:
29
27
  ① What should the agents do?
30
28
  > refactor auth, add tests, update docs
31
29
 
32
- ② Budget [10]: 50
30
+ ② Budget [10]: 200
33
31
 
34
32
  ③ Worker model:
35
33
  ● Sonnet — Sonnet 4.6 · Best for everyday tasks
36
34
  ○ Opus — Opus 4.6 · Most capable
37
- ○ Haiku — Haiku 4.5 · Fastest
38
35
 
39
36
  ④ Usage:
40
- ● Unlimited · full capacity, wait through rate limits
41
- ○ 90% · leave 10% for other work
42
-
43
- ╭────────────────────────────────────╮
44
- │ sonnet · budget 50 · 5× · flex │
45
- ╰────────────────────────────────────╯
37
+ ● 90% · leave 10% for other work
38
+
39
+ ╭──────────────────────────────────────────╮
40
+ │ sonnet · budget 200 · 5× · flex · 90% │
41
+ ╰──────────────────────────────────────────╯
42
+
43
+ ✓ 5 themes → review, press Run, walk away
44
+
45
+ ◆ Thinking: 5 agents exploring... ← architects analyze your codebase
46
+ ◆ Orchestrating plan... ← synthesizes 50 concrete tasks
47
+ ◆ Wave 1 · 50 tasks ← fully autonomous from here
48
+ ◆ Assessing... how close to amazing?
49
+ ◆ Wave 2 · 30 tasks ← improvements from assessment
50
+ ◆ Reflection: 2 agents reviewing ← deep quality audit
51
+ ◆ Wave 3 · 20 tasks ← fixes from review findings
52
+ ◆ Assessing... ✓ Vision met
46
53
  ```
47
54
 
48
- For large budgets, the planner identifies research themes. Review them, then press Run. Everything after that is fully autonomous: thinking agents explore, the orchestrator synthesizes tasks, execution waves run, and steering adapts between waves. No further interaction is needed, so go to sleep.
55
+ You interact once (objective, budget, model, review themes), then everything runs autonomously: thinking, planning, executing, reflecting, steering. Rate-limited? It waits and retries. Crash? Resume where you left off.
49
56
 
50
- ### Task file
57
+ ## How it works
51
58
 
52
- ```bash
53
- claude-overnight tasks.json
54
- ```
59
+ ### 1. Thinking wave
55
60
 
56
- ### Inline
61
+ For budgets > 15, the tool launches **architect agents** that explore your codebase before any code is written. Each one gets a different research angle (architecture, data models, APIs, testing, etc.) and writes a structured design document. The number scales with budget: 5 for budget=50, 10 for budget=2000.
57
62
 
58
- ```bash
59
- claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
60
- ```
61
-
62
- ## How the planner works
63
-
64
- The planner always runs on the best available model (Opus) regardless of which model you pick for workers. This ensures high-quality task decomposition even when workers use a cheaper model.
63
+ ### 2. Orchestration
65
64
 
66
- ### Thinking wave
65
+ An orchestrator agent reads all design documents and synthesizes concrete execution tasks — grounded in real files and patterns the architects found. No guesswork.
67
66
 
68
- For large budgets (`budget > concurrency * 3`), the planner doesn't try to generate hundreds of tasks from scratch. Instead, it launches a **thinking wave** — a team of architect agents that explore your codebase in parallel before any code is written.
67
+ ### 3. Iterative execution
69
68
 
70
- ```
71
- ⠋ identifying themes... → splits objective into N angles (< 30s)
72
- ✓ 10 themes → review themes, press Run, walk away
73
- ◆ Thinking: 10 agents exploring → each explores from its angle, writes a design doc
74
- ◆ Orchestrating plan... → reads all design docs, synthesizes execution tasks
75
- ◆ Wave 1 · 50 tasks → fully autonomous from here
76
- ◆ Steering... → adapts between waves, retries on rate limits
77
- ```
69
+ Tasks run in parallel (each agent in its own git worktree). After each wave, steering assesses: "how good is this?" — not "what's missing?" It can:
78
70
 
79
- The review prompt appears right after theme identification — the last thing requiring your presence. After you press Run, the thinking wave, orchestration, execution, and steering all run autonomously. Rate-limited? The planner waits and retries. Go to sleep.
71
+ - **Execute** more tasks to build features, fix bugs, polish UX
72
+ - **Reflect** by spinning up 1-2 review agents for deep quality/architecture audits
73
+ - **Declare done** when the vision is met at high quality
80
74
 
81
- The number of thinking agents scales with budget: 5 for budget=50, 10 for budget=2000+. Each agent explores the codebase from a different angle and writes a structured design document. The orchestrator then reads all design docs and produces grounded execution tasks referencing real files and patterns.
75
+ ### 4. Goal refinement
82
76
 
83
- For small budgets (≤ `concurrency * 3`), the planner skips the thinking wave and generates tasks directly, which is fast and efficient for focused work.
77
+ The tool starts with your broad objective but evolves its definition of "amazing" as it learns your codebase. Steering refines the goal after each wave. Late waves are informed by early discoveries.
84
78
 
85
- ### Model-aware task design
79
+ ### 5. Three-layer context
86
80
 
87
- The planner calibrates task ambition based on your worker model:
81
+ Long runs stay sharp because steering maintains three layers of memory:
88
82
 
89
- **Opus workers**: Each session is a powerhouse: it can own entire epics, do deep codebase research, make architectural decisions, implement complex multi-file systems, and use browser tools for analysis. The planner gives these agents full ownership and autonomy.
83
+ - **Status**: a living project snapshot, updated every wave. Compressed, never truncated.
84
+ - **Milestones**: strategic snapshots archived every ~5 waves. Long-term memory.
85
+ - **Goal**: the evolving north star. What "amazing" means for this codebase.
90
86
 
91
- **Sonnet workers**: Capable of substantial implementation, refactoring, and testing. The planner gives meaningful missions with room for decision-making.
87
+ ## Run history and resume
92
88
 
93
- **Haiku workers**: Fast and efficient, best for focused tasks. The planner gives specific, well-scoped instructions with clear file paths and expected changes.
89
+ Every run gets its own folder in `.claude-overnight/runs/`. Nothing is ever overwritten.
94
90
 
95
- ### Budget scaling
91
+ ```
92
+ .claude-overnight/
93
+   runs/
94
+     2026-04-04T18-52-49/    ← run A (done, $200, 200 tasks)
95
+       run.json, status.md, goal.md, milestones/, sessions/
96
+     2026-04-05T10-30-00/    ← run B (crashed)
97
+       run.json, sessions/
98
+ ```
96
99
 
97
- The budget also shapes task granularity:
100
+ If a run crashes, gets rate-limited, or you Ctrl+C:
98
101
 
99
- **Small budget (1-15)**: Specific, file-level tasks. "In `src/auth.ts`, refactor `validateToken()` to use JWT."
102
+ ```
103
+ ⚠ Interrupted run
104
+ ╭──────────────────────────────────────────────────╮
105
+ │ refactor auth, add tests, update docs │
106
+ │ 50/200 sessions · 3 waves · $69.16 │
107
+ │ 34 merged · 16 unmerged · 0 failed branches │
108
+ ╰──────────────────────────────────────────────────╯
109
+
110
+ Resume │ Fresh │ Quit
111
+ ```
100
112
 
101
- **Medium budget (16-50)**: Autonomous missions. "Design and implement the complete favorites system: DB schema, API routes, client hooks, error handling."
113
+ On resume: unmerged branches auto-merge, the wave loop continues, all context is preserved.
102
114
 
103
- **Large budget (50+)**: Thinking wave + orchestration. Architects explore, then execution tasks are synthesized from their findings. Each task is a substantial work session grounded in real codebase analysis.
115
+ **Knowledge carries forward**: new runs inherit knowledge from completed previous runs. Thinking agents and steering see what past runs built. Run 2 knows run 1 already built the auth system.
104
116
 
105
- A budget of 200 is not 200 micro-edits. It's ~5 architects + ~195 senior-engineer work sessions, planned in waves. A budget of 2000 gets 10 architects.
117
+ Add `.claude-overnight` to your `.gitignore`.
106
118
 
107
- ## Usage limits
119
+ ## Other usage modes
108
120
 
109
- Control how much of your plan capacity the run consumes:
121
+ ### Task file
110
122
 
123
+ ```bash
124
+ claude-overnight tasks.json
111
125
  ```
112
- ④ Usage:
113
- ● Unlimited · full capacity, wait through rate limits
114
- ○ 90% · leave 10% for other work
115
- ○ 75% · conservative, plenty of headroom
116
- ○ 50% · use half, keep the rest
117
- ```
118
-
119
- When utilization hits your cap, the swarm stops dispatching new tasks and lets active agents finish gracefully. This way you can run a big overnight job and still have capacity left for manual Claude usage.
120
-
121
- Use `--usage-cap=90` on the command line, or `"usageCap": 90` in task files.
122
-
123
- ## Task file format
124
126
 
125
127
  ```json
126
128
  {
@@ -135,71 +137,67 @@ Use `--usage-cap=90` on the command line, or `"usageCap": 90` in task files.
135
137
  }
136
138
  ```
137
139
 
138
- A plain array also works: `["task one", "task two"]`.
139
-
140
- For multi-wave runs from a task file, add `objective` and `flexiblePlan`:
140
+ For multi-wave runs, add `objective` and `flexiblePlan`:
141
141
 
142
142
  ```json
143
143
  {
144
- "objective": "Modernize the auth system and add comprehensive tests",
144
+ "objective": "Modernize the auth system",
145
145
  "flexiblePlan": true,
146
146
  "tasks": ["Refactor auth middleware", "Add JWT validation"],
147
147
  "usageCap": 90
148
148
  }
149
149
  ```
150
150
 
151
- The initial tasks run first. After each wave, a steering agent reads the codebase and plans the next wave until the objective is met or the budget runs out.
151
+ ### Inline
152
152
 
153
- | Field | Type | Default | Description |
154
- |---|---|---|---|
155
- | `tasks` | `(string \| {prompt, cwd?, model?})[]` | required | Tasks to run |
156
- | `objective` | `string` | — | High-level goal for multi-wave steering (required when `flexiblePlan` is true) |
157
- | `flexiblePlan` | `boolean` | `false` | Enable adaptive multi-wave planning from task files |
158
- | `model` | `string` | prompted | Worker model (per-task overridable) |
159
- | `concurrency` | `number` | `5` | Max parallel agents |
160
- | `worktrees` | `boolean` | auto (git repo) | Isolate each agent in a git worktree |
161
- | `permissionMode` | `"auto" \| "bypassPermissions" \| "default"` | `"auto"` | How agents handle dangerous operations |
162
- | `cwd` | `string` | `process.cwd()` | Working directory |
163
- | `allowedTools` | `string[]` | all | Restrict agent tools |
164
- | `mergeStrategy` | `"yolo" \| "branch"` | `"yolo"` | Merge into HEAD or a new branch |
165
- | `usageCap` | `number (0-100)` | unlimited | Stop at N% utilization (e.g. 90) |
153
+ ```bash
154
+ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
155
+ ```
166
156
 
167
157
  ## CLI flags
168
158
 
169
159
  | Flag | Default | Description |
170
160
  |---|---|---|
171
- | `--budget=N` | `10` | Total agent sessions the planner targets |
172
- | `--concurrency=N` | `5` | How many agents run simultaneously |
173
- | `--model=NAME` | prompted | Worker model (planner always uses best available) |
161
+ | `--budget=N` | `10` | Total agent sessions |
162
+ | `--concurrency=N` | `5` | Parallel agents |
163
+ | `--model=NAME` | prompted | Worker model (planner uses best available) |
174
164
  | `--usage-cap=N` | unlimited | Stop at N% utilization |
175
- | `--timeout=SECONDS` | `300` | Inactivity timeout (kills only silent agents) |
176
- | `--no-flex` | — | Disable adaptive multi-wave planning (run all tasks in one shot) |
165
+ | `--timeout=SECONDS` | `300` | Inactivity timeout per agent |
166
+ | `--no-flex` | — | Disable multi-wave steering |
177
167
  | `--dry-run` | — | Show planned tasks without running |
178
- | `-h, --help` | — | Help |
179
- | `-v, --version` | — | Version |
180
168
 
181
- Budget = total work. Concurrency = pace. A budget of 100 with concurrency 5 means 100 tasks, 5 at a time.
169
+ ## Task file fields
182
170
 
183
- ## Rate limits and long runs
171
+ | Field | Type | Default | Description |
172
+ |---|---|---|---|
173
+ | `tasks` | `(string \| {prompt, cwd?, model?})[]` | required | Tasks to run |
174
+ | `objective` | `string` | — | High-level goal for steering |
175
+ | `flexiblePlan` | `boolean` | `false` | Enable multi-wave planning |
176
+ | `model` | `string` | prompted | Worker model |
177
+ | `concurrency` | `number` | `5` | Parallel agents |
178
+ | `worktrees` | `boolean` | auto | Git worktree isolation |
179
+ | `permissionMode` | `"auto" \| "bypassPermissions" \| "default"` | `"auto"` | Permission handling |
180
+ | `mergeStrategy` | `"yolo" \| "branch"` | `"yolo"` | Merge into HEAD or new branch |
181
+ | `usageCap` | `number (0-100)` | unlimited | Stop at N% utilization |
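
For readers who want to check a task file before a run, the fields in the table above can be written down as a type. This is a minimal sketch, not code from the package: `TaskEntry`, `TaskFile`, and `validateTaskFile` are hypothetical names, the non-empty `tasks` check is an assumption, and the `objective` rule is taken from the 0.5.1 wording ("required when `flexiblePlan` is true").

```typescript
// Hypothetical types mirroring the "Task file fields" table; not taken from the package source.
type TaskEntry = string | { prompt: string; cwd?: string; model?: string };

interface TaskFile {
  tasks: TaskEntry[];
  objective?: string;
  flexiblePlan?: boolean;
  model?: string;
  concurrency?: number;
  worktrees?: boolean;
  permissionMode?: "auto" | "bypassPermissions" | "default";
  mergeStrategy?: "yolo" | "branch";
  usageCap?: number; // 0-100, omitted means unlimited
}

// Return a list of problems; an empty list means the file looks usable.
function validateTaskFile(file: TaskFile): string[] {
  const problems: string[] = [];
  // Assumption: an empty task list is treated as an error.
  if (!Array.isArray(file.tasks) || file.tasks.length === 0) {
    problems.push("tasks must be a non-empty array");
  }
  if (file.usageCap !== undefined && (file.usageCap < 0 || file.usageCap > 100)) {
    problems.push("usageCap must be between 0 and 100");
  }
  // The 0.5.1 README states that objective is required when flexiblePlan is true.
  if (file.flexiblePlan && !file.objective) {
    problems.push("objective is required when flexiblePlan is true");
  }
  return problems;
}

// The multi-wave example from the README passes all three checks.
const exampleTaskFile: TaskFile = {
  objective: "Modernize the auth system",
  flexiblePlan: true,
  tasks: ["Refactor auth middleware", "Add JWT validation"],
  usageCap: 90,
};
const exampleProblems = validateTaskFile(exampleTaskFile);
```

Returning a list of problems instead of throwing lets a caller report every issue in one pass.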
184
182
 
185
- Built for unattended runs lasting hours, days, or weeks.
183
+ ## Rate limits
186
184
 
187
- - **Usage bar**: the live UI shows current utilization with a visual bar, percentage, and countdown to reset when rate-limited.
188
- - **Hard block**: API returns a reset timestamp — swarm pauses and resumes exactly when the window opens.
189
- - **Soft throttle**: at >75% utilization, dispatch slows to avoid hitting the limit.
190
- - **Retry with backoff**: transient errors (429, overloaded, connection reset) retry with exponential backoff.
191
- - **Usage cap**: set a ceiling and the swarm stops dispatching when it's reached — active agents finish, no new ones start.
185
+ Built for unattended runs lasting hours or days.
192
186
 
193
- No tasks are dropped. Set a budget of 1000 and go to sleep.
187
+ - **Hard block**: pauses until the rate limit window resets, then resumes
188
+ - **Soft throttle**: slows dispatch at >75% utilization
189
+ - **Retry with backoff**: transient errors (429, overloaded) retry automatically
190
+ - **Usage cap**: set a ceiling; when it is reached, active agents finish and no new ones start
191
+ - **Planner retries**: steering and orchestration also retry on rate limits (30s/60s/120s backoff)
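
As an illustration only (this is not code from the package), the rules in the list above could be sketched as two small functions. The names `dispatchMode` and `plannerRetryDelayMs` are hypothetical. The 75% threshold and the 30s/60s/120s delays come from the README text; giving up after the third retry is an assumption.

```typescript
// Hypothetical sketch; none of these names come from claude-overnight itself.
type DispatchMode = "normal" | "throttled" | "stopped";

// Decide how to dispatch new tasks given current utilization (0-100).
// usageCap is optional, mirroring the "unlimited" default in the README.
function dispatchMode(utilization: number, usageCap?: number): DispatchMode {
  // Usage cap reached: active agents finish, no new ones start.
  if (usageCap !== undefined && utilization >= usageCap) return "stopped";
  // Soft throttle: slow dispatch above 75% utilization.
  if (utilization > 75) return "throttled";
  return "normal";
}

// Delay before planner retry number `attempt` (1-based): 30s, 60s, 120s.
// Assumption: return null (give up) for any attempt outside that schedule.
function plannerRetryDelayMs(attempt: number): number | null {
  const delaysSeconds = [30, 60, 120];
  const delay = delaysSeconds[attempt - 1];
  return delay === undefined ? null : delay * 1000;
}
```

Checking the cap before the throttle means a cap below 75% still stops dispatch outright rather than merely slowing it.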
194
192
 
195
193
  ## Worktrees and merging
196
194
 
197
- Each agent gets an isolated git worktree on a `swarm/task-N` branch. Changes auto-commit when the agent finishes. After all agents complete, branches merge back sequentially.
195
+ Each agent gets an isolated git worktree (`swarm/task-N` branch). Changes auto-commit. After all agents complete, branches merge back.
198
196
 
199
- - `"yolo"` (default): merges directly into your current branch
200
- - `"branch"`: creates a `swarm/run-{timestamp}` branch (main untouched)
197
+ - `"yolo"` (default): merges into your current branch
198
+ - `"branch"`: creates a new `swarm/run-{timestamp}` branch
201
199
 
202
- Merge conflicts retry with `-X theirs`. If that fails, the branch is preserved for manual resolution. Stale worktrees and `swarm/*` branches from previous runs are cleaned up on startup.
200
+ Conflicts retry with `-X theirs`. Unresolved branches are preserved for manual merge.
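
The merge fallback described above can be sketched roughly as follows. This is an assumption-laden illustration, not the package's implementation: `GitRunner` and `mergeBranch` are hypothetical names, the git runner is injected so the logic can be exercised without a repository, and aborting the failed merge between attempts is an assumption. The git arguments themselves (`merge --no-edit`, `merge --abort`, `-X theirs`) are standard git options.

```typescript
// Hypothetical sketch. `run(args)` stands for "execute `git <args>` and
// return true on success"; how it is implemented is left to the caller.
type GitRunner = (args: string[]) => boolean;

type MergeOutcome = "merged" | "merged-theirs" | "preserved";

function mergeBranch(branch: string, run: GitRunner): MergeOutcome {
  // First attempt: a plain merge.
  if (run(["merge", "--no-edit", branch])) return "merged";
  // Assumption: clean up the conflicted state before retrying.
  run(["merge", "--abort"]);
  // Retry, resolving conflicting hunks in favor of the agent's branch.
  if (run(["merge", "--no-edit", "-X", "theirs", branch])) return "merged-theirs";
  run(["merge", "--abort"]);
  // Still failing: leave the branch in place for a manual merge.
  return "preserved";
}
```

In a real script, `run` could wrap Node's `child_process.spawnSync("git", args)` and check the exit status.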
203
201
 
204
202
  ## Exit codes
205
203
 
@@ -208,3 +206,7 @@ Merge conflicts retry with `-X theirs`. If that fails, the branch is preserved f
208
206
  | `0` | All tasks succeeded |
209
207
  | `1` | Some tasks failed |
210
208
  | `2` | All failed or none completed |
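
The exit code rows above amount to a three-way mapping. A minimal sketch, assuming the caller already has counts of succeeded and failed tasks (the function name `exitCodeFor` is hypothetical and does not come from the package):

```typescript
// Hypothetical helper; the mapping follows the exit code table only.
function exitCodeFor(succeeded: number, failed: number): 0 | 1 | 2 {
  if (succeeded === 0) return 2; // all failed or none completed
  if (failed > 0) return 1;      // some tasks failed
  return 0;                      // all tasks succeeded
}
```

A wrapper script can branch on these values, for example treating `1` as "review the failed branches" and `2` as "nothing usable was produced".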
209
+
210
+ ## License
211
+
212
+ MIT