claude-overnight 0.5.0 → 1.0.0

package/README.md CHANGED
@@ -1,8 +1,10 @@
1
1
  # claude-overnight
2
2
 
3
- Fire off Claude agents, come back to shipped work.
3
+ Run 10, 100, or 1000 Claude agents overnight. Come back to shipped work.
4
4
 
5
- Describe what to build. Set a budget: 10 agents, 100, 1000. A planner agent analyzes your codebase, breaks the objective into independent tasks, and launches them all. Each agent runs in its own git worktree with full tooling (Read, Edit, Bash, Grep, everything). Rate limits? It waits. Windows reset? It resumes. It doesn't stop until every task is done.
5
+ Describe what to build. Set a budget. The tool plans, explores your codebase, breaks the objective into tasks, launches parallel agents in isolated git worktrees, iterates toward quality, and handles rate limits automatically. You press Run once, then go to sleep.
6
+
7
+ Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk). Works with Claude Opus, Sonnet, and Haiku.
6
8
 
7
9
  ## Install
8
10
 
@@ -10,18 +12,14 @@ Describe what to build. Set a budget — 10 agents, 100, 1000. A planner agent a
10
12
  npm install -g claude-overnight
11
13
  ```
12
14
 
13
- Requires Node.js >= 20 and Claude authentication (OAuth via `claude` CLI, or `ANTHROPIC_API_KEY`).
14
-
15
- ## Usage
15
+ Requires Node.js >= 20 and Claude authentication (`claude auth login`, or set `ANTHROPIC_API_KEY`).
16
16
 
17
- ### Interactive
17
+ ## Quick start
18
18
 
19
19
  ```bash
20
20
  claude-overnight
21
21
  ```
22
22
 
23
- A guided flow walks you through each step:
24
-
25
23
  ```
26
24
  🌙 claude-overnight
27
25
  ────────────────────────────────────
@@ -29,95 +27,102 @@ A guided flow walks you through each step:
29
27
  ① What should the agents do?
30
28
  > refactor auth, add tests, update docs
31
29
 
32
- ② Budget [10]: 50
30
+ ② Budget [10]: 200
33
31
 
34
32
  ③ Worker model:
35
33
  ● Sonnet — Sonnet 4.6 · Best for everyday tasks
36
34
  ○ Opus — Opus 4.6 · Most capable
37
- ○ Haiku — Haiku 4.5 · Fastest
38
35
 
39
36
  ④ Usage:
40
- ● Unlimited · full capacity, wait through rate limits
41
- ○ 90% · leave 10% for other work
42
-
43
- ╭────────────────────────────────────╮
44
- │ sonnet · budget 50 · 5× · flex │
45
- ╰────────────────────────────────────╯
37
+ ● 90% · leave 10% for other work
38
+
39
+ ╭──────────────────────────────────────────╮
40
+ │ sonnet · budget 200 · 5× · flex · 90% │
41
+ ╰──────────────────────────────────────────╯
42
+
43
+ ✓ 5 themes → review, press Run, walk away
44
+
45
+ ◆ Thinking: 5 agents exploring... ← architects analyze your codebase
46
+ ◆ Orchestrating plan... ← synthesizes 50 concrete tasks
47
+ ◆ Wave 1 · 50 tasks ← fully autonomous from here
48
+ ◆ Assessing... how close to amazing?
49
+ ◆ Wave 2 · 30 tasks ← improvements from assessment
50
+ ◆ Reflection: 2 agents reviewing ← deep quality audit
51
+ ◆ Wave 3 · 20 tasks ← fixes from review findings
52
+ ◆ Assessing... ✓ Vision met
46
53
  ```
47
54
 
48
- The planner generates tasks: review, edit, or chat about them, then run.
55
+ You interact once (objective, budget, model, review themes), then everything runs autonomously — thinking, planning, executing, reflecting, steering. Rate-limited? It waits and retries. Crash? Resume where you left off.
49
56
 
50
- ### Task file
57
+ ## How it works
51
58
 
52
- ```bash
53
- claude-overnight tasks.json
54
- ```
59
+ ### 1. Thinking wave
55
60
 
56
- ### Inline
61
+ For budgets > 15, the tool launches **architect agents** that explore your codebase before any code is written. Each one gets a different research angle (architecture, data models, APIs, testing, etc.) and writes a structured design document. The number scales with budget: 5 for budget=50, 10 for budget=2000.
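The two README versions state the thinking-wave threshold differently: 0.5.0 says `budget > concurrency * 3`, while 1.0.0 says "budgets > 15". A small sketch, with hypothetical function names that are not part of the package, shows that the two rules agree at the default concurrency of 5 and diverge otherwise:

```typescript
// Thresholds copied from the two README versions.
// Function names are hypothetical, not taken from claude-overnight.
function thinkingWaveV050(budget: number, concurrency: number = 5): boolean {
  // 0.5.0 README: "For large budgets (budget > concurrency * 3)"
  return budget > concurrency * 3;
}

function thinkingWaveV100(budget: number): boolean {
  // 1.0.0 README: "For budgets > 15"
  return budget > 15;
}

// With the default concurrency of 5, concurrency * 3 === 15,
// so both rules switch on at the same budget.
const sameAtDefault = thinkingWaveV050(16) === thinkingWaveV100(16);

// With concurrency 10, the old rule needs budget > 30; the new wording does not.
const differsAtTen = thinkingWaveV050(20, 10) !== thinkingWaveV100(20);
```

Whether 1.0.0 still takes concurrency into account internally is not stated in the README; the sketch only compares the documented wording.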
57
62
 
58
- ```bash
59
- claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
60
- ```
61
-
62
- ## How the planner works
63
-
64
- The planner always runs on the best available model (Opus) regardless of which model you pick for workers. This ensures high-quality task decomposition even when workers use a cheaper model.
63
+ ### 2. Orchestration
65
64
 
66
- ### Thinking wave
65
+ An orchestrator agent reads all design documents and synthesizes concrete execution tasks — grounded in real files and patterns the architects found. No guesswork.
67
66
 
68
- For large budgets (`budget > concurrency * 3`), the planner doesn't try to generate hundreds of tasks from scratch. Instead, it launches a **thinking wave** — a team of architect agents that explore your codebase in parallel before any code is written.
67
+ ### 3. Iterative execution
69
68
 
70
- ```
71
- ⠋ identifying themes... → splits objective into N angles (< 30s)
72
- ◆ Thinking: 5 agents exploring → each explores from its angle, writes a design doc
73
- ◆ Orchestrating plan... → reads all design docs, synthesizes execution tasks
74
- ```
69
+ Tasks run in parallel (each agent in its own git worktree). After each wave, steering assesses: "how good is this?" — not "what's missing?" It can:
75
70
 
76
- Each thinking agent gets a different research focus (architecture, data, UI, APIs, testing, etc.), explores using Read/Glob/Grep, and writes a structured design document with findings, proposed work items, and key files. The orchestrator then reads all design docs and produces grounded, well-informed execution tasks that reference specific files and patterns the researchers found.
71
+ - **Execute** more tasks to build features, fix bugs, polish UX
72
+ - **Reflect** by spinning up 1-2 review agents for deep quality/architecture audits
73
+ - **Declare done** when the vision is met at high quality
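The three steering outcomes listed above could be modeled as a discriminated union. This is a minimal sketch for illustration: `SteeringDecision`, `describeDecision`, and the field names are assumptions, not the package's actual types.

```typescript
// Hypothetical model of the three steering outcomes (execute, reflect, done).
// None of these names come from claude-overnight itself.
type SteeringDecision =
  | { kind: "execute"; tasks: string[] }   // run another wave of tasks
  | { kind: "reflect"; reviewers: 1 | 2 }  // README: "1-2 review agents"
  | { kind: "done"; reason: string };      // vision met at high quality

function describeDecision(d: SteeringDecision): string {
  switch (d.kind) {
    case "execute":
      return `run ${d.tasks.length} more task(s)`;
    case "reflect":
      return `spin up ${d.reviewers} review agent(s)`;
    case "done":
      return `stop: ${d.reason}`;
  }
}
```

A union like this forces every consumer to handle all three outcomes, which fits a loop that runs unattended.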
77
74
 
78
- This means a budget of 200 doesn't generate 200 tasks from a single LLM call guessing at your codebase. It sends 5 architects to study the code first, then plans 50 tasks based on their findings, executes them, steers, and repeats.
75
+ ### 4. Goal refinement
79
76
 
80
- For small budgets (≤ `concurrency * 3`), the planner skips the thinking wave and generates tasks directly, which is fast and efficient for focused work.
77
+ The tool starts with your broad objective but evolves its definition of "amazing" as it learns your codebase. Steering refines the goal after each wave. Late waves are informed by early discoveries.
81
78
 
82
- ### Model-aware task design
79
+ ### 5. Three-layer context
83
80
 
84
- The planner calibrates task ambition based on your worker model:
81
+ Long runs stay sharp because steering maintains three layers of memory:
85
82
 
86
- **Opus workers**: Each session is a powerhouse that can own entire epics, do deep codebase research, make architectural decisions, implement complex multi-file systems, and use browser tools for analysis. The planner gives these agents full ownership and autonomy.
83
+ - **Status**: a living project snapshot, updated every wave. Compressed, never truncated.
84
+ - **Milestones** — strategic snapshots archived every ~5 waves. Long-term memory.
85
+ - **Goal** — the evolving north star. What "amazing" means for this codebase.
87
86
 
88
- **Sonnet workers**: Capable of substantial implementation, refactoring, and testing. The planner gives meaningful missions with room for decision-making.
87
+ ## Run history and resume
89
88
 
90
- **Haiku workers**: Fast and efficient, best for focused tasks. The planner gives specific, well-scoped instructions with clear file paths and expected changes.
89
+ Every run gets its own folder in `.claude-overnight/runs/`. Nothing is ever overwritten.
91
90
 
92
- ### Budget scaling
91
+ ```
92
+ .claude-overnight/
93
+   runs/
94
+     2026-04-04T18-52-49/ ← run A (done, $200, 200 tasks)
95
+       run.json, status.md, goal.md, milestones/, sessions/
96
+     2026-04-05T10-30-00/ ← run B (crashed)
97
+       run.json, sessions/
98
+ ```
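The run folder names in the tree above look like ISO timestamps with the colons replaced by hyphens. A sketch of one way to produce such a name follows; `runFolderName` is a hypothetical helper, and whether the tool uses UTC or local time is an assumption (UTC here).

```typescript
// Hypothetical helper: formats a Date like "2026-04-04T18-52-49".
// Assumes UTC; the README does not say which time zone the tool uses.
function runFolderName(date: Date): string {
  return date
    .toISOString()      // e.g. "2026-04-04T18:52:49.000Z"
    .slice(0, 19)       // drop milliseconds and the trailing "Z"
    .replace(/:/g, "-"); // colons are not valid in Windows file names
}

const example = runFolderName(new Date("2026-04-04T18:52:49Z"));
// example === "2026-04-04T18-52-49"
```

Second-level resolution keeps names sortable and makes a collision between two runs unlikely, which matches the claim that nothing is overwritten.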
93
99
 
94
- The budget also shapes task granularity:
100
+ If a run crashes, gets rate-limited, or you Ctrl+C:
95
101
 
96
- **Small budget (1-15)**: Specific, file-level tasks. "In `src/auth.ts`, refactor `validateToken()` to use JWT."
102
+ ```
103
+ ⚠ Interrupted run
104
+ ╭──────────────────────────────────────────────────╮
105
+ │ refactor auth, add tests, update docs │
106
+ │ 50/200 sessions · 3 waves · $69.16 │
107
+ │ 34 merged · 16 unmerged · 0 failed branches │
108
+ ╰──────────────────────────────────────────────────╯
109
+
110
+ Resume │ Fresh │ Quit
111
+ ```
97
112
 
98
- **Medium budget (16-50)**: Autonomous missions. "Design and implement the complete favorites system: DB schema, API routes, client hooks, error handling."
113
+ On resume: unmerged branches auto-merge, the wave loop continues, all context is preserved.
99
114
 
100
- **Large budget (50+)**: Thinking wave + orchestration. Architects explore, then execution tasks are synthesized from their findings. Each task is a substantial work session grounded in real codebase analysis.
115
+ **Knowledge carries forward**: new runs inherit knowledge from completed previous runs. Thinking agents and steering see what past runs built. Run 2 knows run 1 already built the auth system.
101
116
 
102
- A budget of 200 is not 200 micro-edits. It's 5 architects + ~195 senior-engineer work sessions, planned in waves.
117
+ Add `.claude-overnight` to your `.gitignore`.
103
118
 
104
- ## Usage limits
119
+ ## Other usage modes
105
120
 
106
- Control how much of your plan capacity the run consumes:
121
+ ### Task file
107
122
 
123
+ ```bash
124
+ claude-overnight tasks.json
108
125
  ```
109
- ④ Usage:
110
- ● Unlimited · full capacity, wait through rate limits
111
- ○ 90% · leave 10% for other work
112
- ○ 75% · conservative, plenty of headroom
113
- ○ 50% · use half, keep the rest
114
- ```
115
-
116
- When utilization hits your cap, the swarm stops dispatching new tasks and lets active agents finish gracefully. This way you can run a big overnight job and still have capacity left for manual Claude usage.
117
-
118
- Use `--usage-cap=90` on the command line, or `"usageCap": 90` in task files.
119
-
120
- ## Task file format
121
126
 
122
127
  ```json
123
128
  {
@@ -132,71 +137,67 @@ Use `--usage-cap=90` on the command line, or `"usageCap": 90` in task files.
132
137
  }
133
138
  ```
134
139
 
135
- A plain array also works: `["task one", "task two"]`.
136
-
137
- For multi-wave runs from a task file, add `objective` and `flexiblePlan`:
140
+ For multi-wave runs, add `objective` and `flexiblePlan`:
138
141
 
139
142
  ```json
140
143
  {
141
- "objective": "Modernize the auth system and add comprehensive tests",
144
+ "objective": "Modernize the auth system",
142
145
  "flexiblePlan": true,
143
146
  "tasks": ["Refactor auth middleware", "Add JWT validation"],
144
147
  "usageCap": 90
145
148
  }
146
149
  ```
147
150
 
148
- The initial tasks run first. After each wave, a steering agent reads the codebase and plans the next wave until the objective is met or the budget runs out.
151
+ ### Inline
149
152
 
150
- | Field | Type | Default | Description |
151
- |---|---|---|---|
152
- | `tasks` | `(string \| {prompt, cwd?, model?})[]` | required | Tasks to run |
153
- | `objective` | `string` | — | High-level goal for multi-wave steering (required when `flexiblePlan` is true) |
154
- | `flexiblePlan` | `boolean` | `false` | Enable adaptive multi-wave planning from task files |
155
- | `model` | `string` | prompted | Worker model (per-task overridable) |
156
- | `concurrency` | `number` | `5` | Max parallel agents |
157
- | `worktrees` | `boolean` | auto (git repo) | Isolate each agent in a git worktree |
158
- | `permissionMode` | `"auto" \| "bypassPermissions" \| "default"` | `"auto"` | How agents handle dangerous operations |
159
- | `cwd` | `string` | `process.cwd()` | Working directory |
160
- | `allowedTools` | `string[]` | all | Restrict agent tools |
161
- | `mergeStrategy` | `"yolo" \| "branch"` | `"yolo"` | Merge into HEAD or a new branch |
162
- | `usageCap` | `number (0-100)` | unlimited | Stop at N% utilization (e.g. 90) |
153
+ ```bash
154
+ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
155
+ ```
163
156
 
164
157
  ## CLI flags
165
158
 
166
159
  | Flag | Default | Description |
167
160
  |---|---|---|
168
- | `--budget=N` | `10` | Total agent sessions the planner targets |
169
- | `--concurrency=N` | `5` | How many agents run simultaneously |
170
- | `--model=NAME` | prompted | Worker model (planner always uses best available) |
161
+ | `--budget=N` | `10` | Total agent sessions |
162
+ | `--concurrency=N` | `5` | Parallel agents |
163
+ | `--model=NAME` | prompted | Worker model (planner uses best available) |
171
164
  | `--usage-cap=N` | unlimited | Stop at N% utilization |
172
- | `--timeout=SECONDS` | `300` | Inactivity timeout (kills only silent agents) |
173
- | `--no-flex` | — | Disable adaptive multi-wave planning (run all tasks in one shot) |
165
+ | `--timeout=SECONDS` | `300` | Inactivity timeout per agent |
166
+ | `--no-flex` | — | Disable multi-wave steering |
174
167
  | `--dry-run` | — | Show planned tasks without running |
175
- | `-h, --help` | — | Help |
176
- | `-v, --version` | — | Version |
177
168
 
178
- Budget = total work. Concurrency = pace. A budget of 100 with concurrency 5 means 100 tasks, 5 at a time.
169
+ ## Task file fields
179
170
 
180
- ## Rate limits and long runs
171
+ | Field | Type | Default | Description |
172
+ |---|---|---|---|
173
+ | `tasks` | `(string \| {prompt, cwd?, model?})[]` | required | Tasks to run |
174
+ | `objective` | `string` | — | High-level goal for steering |
175
+ | `flexiblePlan` | `boolean` | `false` | Enable multi-wave planning |
176
+ | `model` | `string` | prompted | Worker model |
177
+ | `concurrency` | `number` | `5` | Parallel agents |
178
+ | `worktrees` | `boolean` | auto | Git worktree isolation |
179
+ | `permissionMode` | `"auto" \| "bypassPermissions" \| "default"` | `"auto"` | Permission handling |
180
+ | `mergeStrategy` | `"yolo" \| "branch"` | `"yolo"` | Merge into HEAD or new branch |
181
+ | `usageCap` | `number (0-100)` | unlimited | Stop at N% utilization |
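The field table above maps naturally onto a TypeScript interface. The sketch below is inferred from the table only: the names `TaskEntry`, `TaskFile`, and `checkTaskFile` are hypothetical, and the rule that `objective` is required when `flexiblePlan` is true comes from the 0.5.0 table, so it may not hold in 1.0.0.

```typescript
// Shape inferred from the "Task file fields" table; type names are hypothetical.
type TaskEntry = string | { prompt: string; cwd?: string; model?: string };

interface TaskFile {
  tasks: TaskEntry[];                 // required
  objective?: string;                 // high-level goal for steering
  flexiblePlan?: boolean;             // default false
  model?: string;                     // prompted if omitted
  concurrency?: number;               // default 5
  worktrees?: boolean;                // default: auto
  permissionMode?: "auto" | "bypassPermissions" | "default";
  mergeStrategy?: "yolo" | "branch";  // default "yolo"
  usageCap?: number;                  // 0-100, unlimited if omitted
}

// Returns a list of problems; an empty list means the file matches the documented rules.
function checkTaskFile(file: TaskFile): string[] {
  const problems: string[] = [];
  if (file.usageCap !== undefined && (file.usageCap < 0 || file.usageCap > 100)) {
    problems.push("usageCap must be between 0 and 100");
  }
  // Rule stated in the 0.5.0 table; assumed to still apply.
  if (file.flexiblePlan === true && file.objective === undefined) {
    problems.push("objective is required when flexiblePlan is true");
  }
  return problems;
}
```

For example, `checkTaskFile({ tasks: ["Refactor auth middleware"], flexiblePlan: true })` reports the missing `objective`.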
181
182
 
182
- Built for unattended runs lasting hours, days, or weeks.
183
+ ## Rate limits
183
184
 
184
- - **Usage bar**: the live UI shows current utilization with a visual bar, percentage, and countdown to reset when rate-limited.
185
- - **Hard block**: API returns a reset timestamp — swarm pauses and resumes exactly when the window opens.
186
- - **Soft throttle**: at >75% utilization, dispatch slows to avoid hitting the limit.
187
- - **Retry with backoff**: transient errors (429, overloaded, connection reset) retry with exponential backoff.
188
- - **Usage cap**: set a ceiling and the swarm stops dispatching when it's reached — active agents finish, no new ones start.
185
+ Built for unattended runs lasting hours or days.
189
186
 
190
- No tasks are dropped. Set a budget of 1000 and go to sleep.
187
+ - **Hard block**: pauses until the rate limit window resets, then resumes
188
+ - **Soft throttle**: slows dispatch at >75% utilization
189
+ - **Retry with backoff**: transient errors (429, overloaded) retry automatically
190
+ - **Usage cap**: set a ceiling, active agents finish, no new ones start
191
+ - **Planner retries**: steering and orchestration also retry on rate limits (30s/60s/120s backoff)
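The throttle, cap, and backoff rules in the list above can be sketched as follows. Only the numbers (75%, the usage cap, and the 30s/60s/120s delays) come from the README; `dispatchMode`, `backoffDelayMs`, `retryWithBackoff`, and the `isRetryable` callback are hypothetical names used for illustration.

```typescript
type DispatchMode = "normal" | "slow" | "stop";

// utilization and usageCap are percentages (0-100); usageCap undefined means unlimited.
function dispatchMode(utilization: number, usageCap?: number): DispatchMode {
  if (usageCap !== undefined && utilization >= usageCap) return "stop"; // usage cap reached
  if (utilization > 75) return "slow";                                  // soft throttle
  return "normal";
}

// Delays from the README: 30s, 60s, 120s. Returns undefined once retries are exhausted.
const BACKOFF_MS: number[] = [30000, 60000, 120000];

function backoffDelayMs(attempt: number): number | undefined {
  return BACKOFF_MS[attempt];
}

// Retries fn after each documented delay, then rethrows the last error.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => { setTimeout(resolve, ms); }),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const delay = backoffDelayMs(attempt);
      if (delay === undefined || !isRetryable(err)) throw err;
      await sleep(delay);
    }
  }
}
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets a test substitute an instant sleep instead of waiting 30 seconds.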
191
192
 
192
193
  ## Worktrees and merging
193
194
 
194
- Each agent gets an isolated git worktree on a `swarm/task-N` branch. Changes auto-commit when the agent finishes. After all agents complete, branches merge back sequentially.
195
+ Each agent gets an isolated git worktree (`swarm/task-N` branch). Changes auto-commit. After all agents complete, branches merge back.
195
196
 
196
- - `"yolo"` (default): merges directly into your current branch
197
- - `"branch"`: creates a `swarm/run-{timestamp}` branch (main untouched)
197
+ - `"yolo"` (default): merges into your current branch
198
+ - `"branch"`: creates a new `swarm/run-{timestamp}` branch
198
199
 
199
- Merge conflicts retry with `-X theirs`. If that fails, the branch is preserved for manual resolution. Stale worktrees and `swarm/*` branches from previous runs are cleaned up on startup.
200
+ Conflicts retry with `-X theirs`. Unresolved branches are preserved for manual merge.
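The conflict retry described above corresponds to git's `-X theirs` strategy option. The README does not show the exact git invocation the tool uses, so the sketch below is an assumption: `mergeArgs` and `mergeAttempts` are hypothetical helpers that only build argument lists for `git merge`.

```typescript
// Hypothetical helper: builds the argument list for one `git merge` call.
function mergeArgs(branch: string, preferTheirs: boolean): string[] {
  const args = ["merge", "--no-edit"];
  if (preferTheirs) args.push("-X", "theirs"); // resolve conflicting hunks in favor of the branch
  args.push(branch);
  return args;
}

// Order of attempts for one branch: a plain merge first, then the `-X theirs` retry.
// If both fail, the README says the branch is preserved for manual merge.
function mergeAttempts(branch: string): string[][] {
  return [mergeArgs(branch, false), mergeArgs(branch, true)];
}
```

Each list could be passed to `child_process.execFileSync("git", args)`; a failed first attempt would normally need `git merge --abort` before the retry.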
200
201
 
201
202
  ## Exit codes
202
203
 
@@ -205,3 +206,7 @@ Merge conflicts retry with `-X theirs`. If that fails, the branch is preserved f
205
206
  | `0` | All tasks succeeded |
206
207
  | `1` | Some tasks failed |
207
208
  | `2` | All failed or none completed |
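The exit-code table can be read as a small decision function. This is a sketch of the documented mapping only; the function name and its two counters are hypothetical.

```typescript
// Mapping taken from the exit-code table; name and input shape are hypothetical.
function exitCode(succeeded: number, failed: number): 0 | 1 | 2 {
  if (succeeded === 0) return 2; // all failed or none completed
  if (failed > 0) return 1;      // some tasks failed
  return 0;                      // all tasks succeeded
}
```

A wrapper script could use this to distinguish a partial run (`1`), worth inspecting and resuming, from a total failure (`2`).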
209
+
210
+ ## License
211
+
212
+ MIT