claude-overnight 1.16.15 → 1.17.0
- package/QUICKSHEET_PLAYWRIGHT.md +65 -0
- package/README.md +91 -55
- package/dist/bin.js +1 -1
- package/dist/cli.d.ts +1 -1
- package/dist/cli.js +4 -4
- package/dist/index.js +87 -46
- package/dist/merge.d.ts +1 -1
- package/dist/merge.js +7 -7
- package/dist/planner-query.d.ts +1 -1
- package/dist/planner-query.js +8 -8
- package/dist/planner.d.ts +1 -1
- package/dist/planner.js +21 -21
- package/dist/providers.d.ts +3 -1
- package/dist/providers.js +5 -3
- package/dist/render.d.ts +1 -1
- package/dist/render.js +3 -3
- package/dist/run.d.ts +21 -14
- package/dist/run.js +177 -74
- package/dist/state.d.ts +2 -2
- package/dist/state.js +9 -9
- package/dist/steering.d.ts +1 -1
- package/dist/steering.js +21 -21
- package/dist/swarm.d.ts +3 -3
- package/dist/swarm.js +38 -24
- package/dist/types.d.ts +47 -14
- package/dist/types.js +2 -2
- package/dist/ui.d.ts +4 -4
- package/dist/ui.js +4 -4
- package/package.json +3 -1
- package/plugins/claude-overnight/.claude-plugin/plugin.json +10 -0
- package/plugins/claude-overnight/skills/claude-overnight/SKILL.md +97 -0
package/QUICKSHEET_PLAYWRIGHT.md
@@ -0,0 +1,65 @@
+# Playwright Parallel Usage
+
+When running claude-overnight with parallel agents that use the Playwright MCP server, avoid lock conflicts and session cross-contamination.
+
+## Isolation Levels
+
+| Goal | Approach |
+|---|---|
+| Non-disruptive, no focus steal | Headless mode (default) |
+| Several agents in parallel, no shared cookies | Headless + each MCP server: `--isolated` (or `isolated: true`) |
+| Several agents, each with saved login | Headless + each MCP server: unique `userDataDir` or its own `--storage-state` file |
+| Anti-bot interception (CAPTCHA, Cloudflare) | Fall back to headed mode only when necessary |
+
+**Headless preferred by default.** Every headed browser launch becomes the foreground app on macOS, which is disruptive during long runs. Only fall back to headed when anti-bot detection (CAPTCHA, Cloudflare challenge, etc.) requires visible browser interaction.
+
+## MCP Server Configuration
+
+Add to your `settings.json` or `.claude/settings.local.json`:
+
+```json
+{
+  "mcpServers": {
+    "playwright-1": {
+      "command": "npx",
+      "args": ["@anthropic-ai/mcp-playwright@latest", "--isolated", "--headless"],
+      "env": {}
+    },
+    "playwright-2": {
+      "command": "npx",
+      "args": ["@anthropic-ai/mcp-playwright@latest", "--isolated", "--headless"],
+      "env": {}
+    }
+  }
+}
+```
+
+For saved logins, give each server its own `userDataDir`:
+
+```json
+{
+  "mcpServers": {
+    "playwright-agent-a": {
+      "command": "npx",
+      "args": ["@anthropic-ai/mcp-playwright@latest", "--headless", "--userDataDir", "/tmp/pw-agent-a"],
+      "env": {}
+    },
+    "playwright-agent-b": {
+      "command": "npx",
+      "args": ["@anthropic-ai/mcp-playwright@latest", "--headless", "--userDataDir", "/tmp/pw-agent-b"],
+      "env": {}
+    }
+  }
+}
+```
+
+## Context7 Documentation
+
+For the latest Playwright API docs and patterns:
+
+```bash
+npx ctx7@latest library playwright "parallel browser instances isolation"
+npx ctx7@latest docs <libraryId> "parallel browser instances"
+```
+
+**Note:** ctx7 requires authentication (`npx ctx7@latest login` or `CONTEXT7_API_KEY` env var). If unauthenticated, lookups will fail -- agents should fall back to training data.
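The two quicksheet configs differ only in their per-server flags, so the `mcpServers` object can be generated instead of hand-copied. The sketch below is one possible way to do that and is not part of the package: `buildPlaywrightServers` and its `profileRoot` parameter are hypothetical names, and the package specifier and flags are copied verbatim from the quicksheet, so check them against the Playwright MCP server you actually install.

```typescript
// Shape of one entry under "mcpServers", as shown in the quicksheet.
type McpServer = { command: string; args: string[]; env: Record<string, string> };

// Hypothetical helper (not from claude-overnight): builds `count` server entries.
// Without `profileRoot`, every server runs with --isolated (no shared cookies).
// With `profileRoot`, every server gets its own userDataDir (saved logins).
function buildPlaywrightServers(
  count: number,
  profileRoot?: string,
): Record<string, McpServer> {
  const pkg = "@anthropic-ai/mcp-playwright@latest"; // copied from the quicksheet
  const servers: Record<string, McpServer> = {};
  for (let i = 1; i <= count; i++) {
    const args =
      profileRoot === undefined
        ? [pkg, "--isolated", "--headless"]
        : [pkg, "--headless", "--userDataDir", `${profileRoot}/pw-agent-${i}`];
    servers[`playwright-${i}`] = { command: "npx", args, env: {} };
  }
  return servers;
}

// Print a settings.json fragment for two isolated servers.
console.log(JSON.stringify({ mcpServers: buildPlaywrightServers(2) }, null, 2));
```

Passing a directory as the second argument, for example `buildPlaywrightServers(2, "/tmp")`, switches every entry to its own profile directory, which corresponds to the saved-login row of the isolation table.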
package/README.md
CHANGED
@@ -1,18 +1,18 @@
 # claude-overnight
 
-**A background lane for your Claude Max plan.** Runs a capped swarm of Claude Agent SDK sessions in isolated git worktrees
+**A background lane for your Claude Max plan.** Runs a capped swarm of Claude Agent SDK sessions in isolated git worktrees -- stops at a usage cap you set, so your interactive Claude Code always has headroom. Rate-limited? It waits. Crash? It resumes with full context.
 
 Your Max plan rate limits eat interactive coding time. One deep refactor and the 5-hour window is gone before lunch. `claude-overnight` runs background agent sessions up to the percentage cap you pick (90% is typical), leaving the rest free for your own Claude Code session. Hand it an objective and a session budget, walk away, review the diff when the run ends.
 
-Isolated by default. Every agent runs in its own git worktree on its own branch, so a misbehaving agent can't trash your working tree. You choose what agents can do before the run starts
+Isolated by default. Every agent runs in its own git worktree on its own branch, so a misbehaving agent can't trash your working tree. You choose what agents can do before the run starts -- no surprise escalation mid-flight. Unmerged branches are preserved for manual review, never discarded. Built on the [Claude Agent SDK](https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk) -- not a Claude Code replacement, but a background lane that runs alongside it.
 
-Different shape from hosted agent harnesses like [Claude Managed Agents](https://platform.claude.com/docs/en/managed-agents/overview): instead of one agent in one cloud container billed separately, you get many parallel sessions on your own machine, in your real repo, against your own Max plan (or API key). Works with Claude Opus, Sonnet, and Haiku
+Different shape from hosted agent harnesses like [Claude Managed Agents](https://platform.claude.com/docs/en/managed-agents/overview): instead of one agent in one cloud container billed separately, you get many parallel sessions on your own machine, in your real repo, against your own Max plan (or API key). Works with Claude Opus, Sonnet, and Haiku -- or pair an Anthropic planner with a cheaper executor on Qwen, OpenRouter, or any Anthropic-compatible endpoint.
 
 ## Run on Qwen 3.6 Plus
 
-Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Alibaba Cloud's DashScope gateway is a drop-in executor that speaks the Anthropic Messages API
+Hit your Claude Max plan limits? Running on a tight budget? Qwen 3.6 Plus via Alibaba Cloud's DashScope gateway is a drop-in executor that speaks the Anthropic Messages API -- same client, same flow, pennies per run.
 
-1. **Get an API key.** Sign up at [Alibaba Cloud](https://account.alibabacloud.com/login/login.htm?oauth_callback=https%3A%2F%2Fmodelstudio.console.alibabacloud.com%2Fap-southeast-1%3Ftab%3Ddashboard%23%2Fapi-key&clearRedirectCookie=1)
+1. **Get an API key.** Sign up at [Alibaba Cloud](https://account.alibabacloud.com/login/login.htm?oauth_callback=https%3A%2F%2Fmodelstudio.console.alibabacloud.com%2Fap-southeast-1%3Ftab%3Ddashboard%23%2Fapi-key&clearRedirectCookie=1) -- the link takes you straight to the API key dashboard.
 2. **Configure the provider.** Run `claude-overnight`, choose `Other…` on the executor step, and fill in:
 
 | Field | Value |
@@ -39,7 +39,7 @@ claude-overnight
 npm install -g claude-overnight
 ```
 
-Requires Node.js ≥ 20 and Claude authentication (`claude auth login` or `ANTHROPIC_API_KEY`). No Anthropic plan or key? See **Run on Qwen 3.6 Plus** above
+Requires Node.js ≥ 20 and Claude authentication (`claude auth login` or `ANTHROPIC_API_KEY`). No Anthropic plan or key? See **Run on Qwen 3.6 Plus** above -- a cheap, drop-in alternative.
 
 ## Quick start
 
@@ -56,13 +56,13 @@ claude-overnight
 
 ② Budget [10]: 200
 
-④ Planner model (thinking, steering
-● Opus
-○ Sonnet
+④ Planner model (thinking, steering -- use your strongest):
+● Opus -- Opus 4.6 · Most capable
+○ Sonnet -- Sonnet 4.6 · Best for everyday tasks
 
-⑤ Executor model (what runs the tasks
-● Sonnet
-○ Opus
+⑤ Executor model (what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
+● Sonnet -- Sonnet 4.6 · Best for everyday tasks
+○ Opus -- Opus 4.6 · Most capable
 ○ Other… · custom OpenAI/Anthropic-compatible endpoint
 
 ⑥ Usage cap:
@@ -89,58 +89,64 @@ claude-overnight
 ◆ Assessing... ✓ Done
 ```
 
-You interact once (objective, budget, model, review themes), then the rest runs unattended
+You interact once (objective, budget, model, review themes), then the rest runs unattended -- thinking, planning, executing, reflecting, steering. Rate-limited? It waits and retries. Crash? Resume where you left off. Capped at usage limit? Pick up next time with full context preserved.
 
 ## How it differs
 
 - vs **Claude Code**: many agents, no driver, capped so your Claude Code session keeps its headroom
-- vs **[Managed Agents](https://platform.claude.com/docs/en/managed-agents/overview)**: on your machine, against your Max plan, in your real git history
+- vs **[Managed Agents](https://platform.claude.com/docs/en/managed-agents/overview)**: on your machine, against your Max plan, in your real git history -- not a cloud container billed separately
 - vs **Cursor / Copilot / Cline**: asynchronous, off the keyboard
 
 ## Use cases
 
-- **Overnight refactors**
-- **Batch feature implementation**
-- **Codebase-wide cleanups**
-- **Test generation at scale**
-- **Documentation sprints**
-- **Framework migrations**
-- **Quality audits**
-- **Long research runs**
+- **Overnight refactors** -- "Modernize the auth system" at budget 200.
+- **Batch feature implementation** -- dozens of features from a task file, parallelized.
+- **Codebase-wide cleanups** -- deduplicate, simplify, rename, normalize.
+- **Test generation at scale** -- integration tests for every route or module.
+- **Documentation sprints** -- API docs, READMEs, inline comments, changelogs.
+- **Framework migrations** -- version upgrades, type annotations, config format swaps.
+- **Quality audits** -- reflection waves surface architectural issues and code smells.
+- **Long research runs** -- architect sessions explore a large codebase before any code lands.
 
 Typical shape: one objective + a $20–$200 spend cap + walk away.
 
 ## How it works
 
-### 1. Thinking phase
+### 1. Thinking phase -- parallel architect sessions
 
 For budgets > 15, the tool launches **architect agents** that explore your codebase before any code is written. Each one gets a different research angle (architecture, data models, APIs, testing, etc.) and writes a structured design document. The number scales with budget: 5 for budget=50, 10 for budget=2000.
 
 ### 2. Task orchestration
 
-An orchestrator session reads all design documents and synthesizes concrete execution tasks
+An orchestrator session reads all design documents and synthesizes concrete execution tasks -- grounded in real files and patterns the architects found. The task plan is also written to a file for resilience -- if orchestration is interrupted, partial results survive.
 
 ### 3. Parallel execution waves
 
-Tasks run in parallel agent sessions (each in its own git worktree). After completing its task, each session automatically runs a **simplify pass**
+Tasks run in parallel agent sessions (each in its own git worktree). After completing its task, each session automatically runs a **simplify pass** -- reviewing its own `git diff` for code reuse opportunities, quality issues, and inefficiencies, then fixing them before the framework commits. This is done via the SDK's **session resume** mechanism: the same agent session continues with a follow-up prompt, so the agent's full context from its task is still available -- no need to re-instruct or re-fill context.
 
-
+### 4. Post-wave review
+
+After each wave (flex mode, budget remaining), a dedicated **review agent** inspects the consolidated diff for issues the individual agents may have blind-spotted: missed reuse opportunities, copy-paste variations, leaky abstractions, efficiency regressions. Runs as a single-agent wave -- one session reviews what the swarm just produced.
+
+### 5. Post-run final gate
+
+When the run completes (steering declares done), a final **comprehensive review** runs against the full `git diff main`. Checks architecture coherence, consistency with existing patterns, build integrity, and test pass. The last quality gate before the diff lands.
+
+### 6. Steering
+
+After each wave, steering assesses: "how good is this?" -- not "what's missing?" It can:
 
 - **Execute** more tasks to build features, fix bugs, polish UX
 - **Reflect** by spinning up 1-2 review sessions for deep quality/architecture audits
 - **Declare done** when the vision is met at high quality
 
-###
-
-The tool starts with your broad objective but refines its definition of quality as it learns your codebase. Steering updates the goal after each wave. Late waves are informed by early discoveries.
-
-### 5. Three-layer context memory
+### Three-layer context memory
 
 Long runs stay sharp because steering maintains three layers of memory:
 
-- **Status**
-- **Milestones**
-- **Goal**
+- **Status** -- a living project snapshot, updated every wave. Compressed, never truncated.
+- **Milestones** -- strategic snapshots archived every ~5 waves. Long-term memory.
+- **Goal** -- the evolving north star. What quality means for this codebase.
 
 ## Run history, resume, and knowledge carryforward
 
@@ -155,7 +161,7 @@ Every run gets its own folder in `.claude-overnight/runs/`. Nothing is ever over
 run.json, sessions/
 ```
 
-Any run that stops before the steering system declares the objective complete
+Any run that stops before the steering system declares the objective complete -- capped at usage limit, Ctrl+C, crash, rate limit timeout, steering failure -- is automatically resumable:
 
 ```
 ⚠ Unfinished run
@@ -170,7 +176,7 @@ Any run that stops before the steering system declares the objective complete
 
 On resume: unmerged branches auto-merge, the wave loop continues, all context is preserved. Designs and reflections stay on disk until the objective is truly complete.
 
-If the thinking phase succeeds but orchestration crashes, the next run detects the orphaned design docs and reuses them
+If the thinking phase succeeds but orchestration crashes, the next run detects the orphaned design docs and reuses them -- no re-running $9 worth of architect sessions:
 
 ```
 ✓ Reusing 5 design docs (from prior attempt)
@@ -180,11 +186,11 @@ If the thinking phase succeeds but orchestration crashes, the next run detects t
 ...
 ```
 
-**Knowledge carries forward**
+**Knowledge carries forward** -- new runs inherit knowledge from completed previous runs. Thinking sessions and steering see what past runs built. Run 2 knows run 1 already built the auth system.
 
-Add `.claude-overnight/` to your `.gitignore` (with the trailing slash
+Add `.claude-overnight/` to your `.gitignore` (with the trailing slash -- see below).
 
-A separate, tiny `claude-overnight.log.md` is also written at the repo root on every run. It's human-readable, append-only, one block per run (objective, start/finish, cost, outcome, branch), and is designed to be **committed**
+A separate, tiny `claude-overnight.log.md` is also written at the repo root on every run. It's human-readable, append-only, one block per run (objective, start/finish, cost, outcome, branch), and is designed to be **committed** -- so even after `.claude-overnight/` is cleaned up you can still recover which prompt produced which commits. Use `.claude-overnight/` (with trailing slash) in your gitignore so this file isn't matched by accident.
 
 ## Task file and inline modes
 
@@ -228,20 +234,20 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
 |---|---|---|
 | `--budget=N` | `10` | Total agent sessions |
 | `--concurrency=N` | `5` | Parallel agents |
-| `--model=NAME` | prompted | Worker model
+| `--model=NAME` | prompted | Worker model -- interactive picks planner + executor separately; `Other…` adds Qwen / OpenRouter / any Anthropic-compat endpoint. In non-interactive mode, a saved provider's model id is auto-resolved to the provider. |
 | `--usage-cap=N` | unlimited | Stop at N% utilization |
 | `--allow-extra-usage` | off | Allow extra/overage usage (billed separately) |
-| `--extra-usage-budget=N` |
+| `--extra-usage-budget=N` | -- | Max $ for extra usage (implies --allow-extra-usage) |
 | `--timeout=SECONDS` | `900` | Inactivity timeout per agent (nudges at timeout, kills at 2×) |
-| `--no-flex` |
-| `--dry-run` |
+| `--no-flex` | -- | Disable multi-wave steering |
+| `--dry-run` | -- | Show planned tasks without running |
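The `--timeout` row describes a two-stage watchdog: an agent is nudged once it has been silent for the timeout, and killed once the silence reaches twice that. A minimal sketch of that decision follows, assuming both thresholds are inclusive and measured in seconds of inactivity; `idleAction` is a hypothetical name and not a function from the package.

```typescript
// What the watchdog would do for an agent that has been silent for a while.
type IdleAction = "ok" | "nudge" | "kill";

// Hypothetical helper: `timeoutSeconds` defaults to 900, the documented default
// for --timeout. Whether the real thresholds are inclusive is an assumption.
function idleAction(idleSeconds: number, timeoutSeconds = 900): IdleAction {
  if (idleSeconds >= 2 * timeoutSeconds) return "kill"; // silent for 2x the timeout
  if (idleSeconds >= timeoutSeconds) return "nudge"; // silent for 1x the timeout
  return "ok";
}
```

With the default of 900 seconds, an agent would be nudged after 15 minutes of silence and killed after 30.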
 
 ## Task file fields
 
 | Field | Type | Default | Description |
 |---|---|---|---|
 | `tasks` | `(string \| {prompt, cwd?, model?})[]` | required | Tasks to run |
-| `objective` | `string` |
+| `objective` | `string` | -- | High-level goal for steering |
 | `flexiblePlan` | `boolean` | `false` | Enable multi-wave planning |
 | `model` | `string` | prompted | Worker model |
 | `concurrency` | `number` | `5` | Parallel agents |
@@ -252,12 +258,12 @@ claude-overnight "fix auth bug in src/auth.ts" "add tests for user model"
 
 ## Custom providers (Qwen, OpenRouter, any Anthropic-compatible endpoint)
 
-Planner and executor are picked separately
+Planner and executor are picked separately -- pair Opus-on-Anthropic for the planner/thinker with a cheaper model on another provider for the bulk of execution.
 
 From the interactive picker, choose `Other…` on the planner or executor step:
 
 ```
-⑤ Executor model (what runs the tasks
+⑤ Executor model (what runs the tasks -- Qwen 3.6 Plus / OpenRouter / etc via Other…):
 ○ Sonnet
 ○ Opus
 ● Other…
@@ -272,13 +278,43 @@ From the interactive picker, choose `Other…` on the planner or executor step:
 
 Saved providers live user-level at `~/.claude/claude-overnight/providers.json` (mode 0600) and show up automatically in every repo. No per-project config.
 
-**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`)
+**How routing works.** Each `query()` gets its own env override (`ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`) -- planner queries use the planner provider, executor queries use the executor provider. No global shell env, no proxy daemon, no `process.env` pollution between calls.
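The routing paragraph says each query carries its own copy of the environment rather than mutating a shared one. The sketch below illustrates that idea only; it is not the package's code. The `Provider` shape, the `envFor` helper, and the example URL and key are all hypothetical, and only the two variable names come from the README.

```typescript
// Hypothetical provider record; the real providers.json schema is not shown in this diff.
type Provider = { id: string; baseUrl: string; apiKey: string };
type Env = Record<string, string | undefined>;

// Build a fresh env object for one query. The base env is copied, never mutated,
// so a planner call and an executor call cannot leak settings into each other.
function envFor(provider: Provider | undefined, base: Env): Env {
  if (provider === undefined) return { ...base }; // default Anthropic routing
  return {
    ...base,
    ANTHROPIC_BASE_URL: provider.baseUrl,
    ANTHROPIC_AUTH_TOKEN: provider.apiKey,
  };
}

// Example with made-up values: the executor gets the override, the planner does not.
const baseEnv: Env = { PATH: "/usr/bin" };
const executor: Provider = {
  id: "example-provider",
  baseUrl: "https://example.invalid/anthropic",
  apiKey: "sk-example",
};
const plannerEnv = envFor(undefined, baseEnv);
const executorEnv = envFor(executor, baseEnv);
```

Because every call receives its own object, two concurrent queries can point at different endpoints without any ordering concerns around a shared `process.env`.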
 
 **Pre-flight.** Before the swarm starts, each custom provider is pinged with a 1-turn auth check. Bad keys fail fast with `✗ executor preflight failed: ...` instead of N scattered mid-run errors.
 
 **Resume.** Provider ids are persisted in `run.json` and rehydrated on resume. If you deleted a provider between runs, resume refuses to start and tells you exactly which id is missing.
 
-**Non-interactive / CI.** `claude-overnight --model=qwen3.6-plus` auto-resolves the model id to a saved provider
+**Non-interactive / CI.** `claude-overnight --model=qwen3.6-plus` auto-resolves the model id to a saved provider -- no separate `--provider` flag.
+
+## Parallel Playwright Testing
+
+When agents use the Playwright MCP server for testing, parallel instances conflict on browser locks and cookie state. Add multiple MCP entries to `settings.json`:
+
+```json
+{
+  "mcpServers": {
+    "playwright-1": {
+      "command": "npx",
+      "args": ["@anthropic-ai/mcp-playwright@latest", "--isolated", "--headless"]
+    },
+    "playwright-2": {
+      "command": "npx",
+      "args": ["@anthropic-ai/mcp-playwright@latest", "--isolated", "--headless"]
+    }
+  }
+}
+```
+
+**Isolation levels:**
+
+| Goal | Approach |
+|---|---|
+| Non-disruptive, no focus steal | Headless mode (default) |
+| Parallel agents, no shared cookies | Headless + `--isolated` per MCP server |
+| Parallel agents, each with saved login | Headless + unique `userDataDir` or `--storage-state` per server |
+| Anti-bot interception (CAPTCHA, Cloudflare) | Drop `--headless` only when necessary |
+
+See `QUICKSHEET_PLAYWRIGHT.md` for full config examples.
 
 ## Spend caps and usage controls
 
@@ -286,8 +322,8 @@ Saved providers live user-level at `~/.claude/claude-overnight/providers.json` (
 
 By default, extra/overage usage is **blocked**. When your plan's rate limits are exhausted, the run stops cleanly and is resumable. You control this in the interactive prompt (step ⑤) or via CLI flags:
 
-- `--allow-extra-usage`
-- `--extra-usage-budget=20`
+- `--allow-extra-usage` -- opt in to extra usage (billed separately)
+- `--extra-usage-budget=20` -- allow up to $20 of extra usage, then stop
 
 ### Live controls during execution
 
@@ -299,11 +335,11 @@ Press these keys while agents are running:
 | `t` | Change usage cap threshold (0-100%) |
 | `q` | Graceful stop (press twice to force quit) |
 
-Changes take effect between waves
+Changes take effect between waves -- active agents finish their current task.
 
 ### Multi-window usage display
 
-The usage bar cycles through all rate limit windows (5h, 7d, etc.) every 3 seconds, showing utilization per window. Usage info is shown during all phases
+The usage bar cycles through all rate limit windows (5h, 7d, etc.) every 3 seconds, showing utilization per window. Usage info is shown during all phases -- thinking, orchestration, steering, and execution.
 
 When using extra usage with a budget, a dedicated progress bar shows spend vs limit with color-coded fill (magenta → yellow → red).
 
@@ -311,14 +347,14 @@ When using extra usage with a budget, a dedicated progress bar shows spend vs li
 
 Built for unattended runs lasting hours or days.
 
-- **Smooth overage transition**: when extra usage is allowed, plan limit rejection is seamless
-- **Interrupt + resume**: agents and planner queries that go silent are interrupted and resumed with full conversation context via SDK session resume
+- **Smooth overage transition**: when extra usage is allowed, plan limit rejection is seamless -- no dispatch blocking, agents continue into overage
+- **Interrupt + resume**: agents and planner queries that go silent are interrupted and resumed with full conversation context via SDK session resume -- not killed and restarted from scratch
 - **Hard block**: pauses until the rate limit window resets, then resumes
 - **Soft throttle**: slows dispatch at >75% utilization
 - **Extra usage guard**: detects overage billing and stops unless explicitly allowed
 - **Cooldown between phases**: waits for rate limit reset after thinking before starting orchestration
 - **Retry with backoff**: transient errors (429, overloaded) retry automatically
-- **Usage cap**: set a ceiling, active agents finish, no new ones start
+- **Usage cap**: set a ceiling, active agents finish, no new ones start -- run is resumable
 - **Planner retries**: steering and orchestration retry on rate limits (30s/60s/120s backoff) with full context
 
 ## Git worktrees and branch merging
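The planner retry bullet in the README hunk gives a fixed schedule of 30s, 60s, and 120s on rate limits. The sketch below shows one way such a schedule could be applied; it is an illustration under assumptions, not the package's implementation. `backoffDelayMs`, `withRetries`, and the `isRateLimit` and `sleep` parameters are hypothetical names, and giving up after the third delay is an assumption.

```typescript
// Delays taken from the README bullet: 30s, 60s, 120s.
const PLANNER_BACKOFF_MS: number[] = [30_000, 60_000, 120_000];

// Delay before retry number `attempt` (0-based), or undefined when retries are exhausted.
function backoffDelayMs(attempt: number): number | undefined {
  return PLANNER_BACKOFF_MS[attempt];
}

// Hypothetical wrapper: re-runs `run` after each rate-limit failure, waiting the
// scheduled delay in between. Any other error, or a rate limit after the last
// scheduled delay, is rethrown to the caller.
async function withRetries<T>(
  run: () => Promise<T>,
  isRateLimit: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      const delay = backoffDelayMs(attempt);
      if (delay === undefined || !isRateLimit(err)) throw err;
      await sleep(delay);
    }
  }
}
```

Injecting `sleep` keeps the wrapper testable without real waiting. The README's "with full context" part would come from resuming the same session inside `run`, which this sketch leaves to the caller.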
package/dist/bin.js
CHANGED
@@ -1,7 +1,7 @@
 #!/usr/bin/env node
 // Tiny launcher: prints a splash the instant node is ready, then dynamically
 // imports the real entrypoint. Loading `@anthropic-ai/claude-agent-sdk` and the
-// rest of the module graph takes several seconds on a cold cache
+// rest of the module graph takes several seconds on a cold cache -- without
 // this, the terminal sits black that whole time. index.ts stops the splash
 // via `globalThis.__coStopSplash` as soon as its header is about to print.
 const argv = process.argv.slice(2);
package/dist/cli.d.ts
CHANGED
@@ -30,7 +30,7 @@ export declare function appendPasteToSegments(segs: InputSegment[], text: string
 export declare function backspaceSegments(segs: InputSegment[]): void;
 /**
  * Read a line from the user with bracketed-paste awareness.
- * Pasted multi-line text stays in the buffer as a single block
+ * Pasted multi-line text stays in the buffer as a single block -- only a typed
  * Enter submits. Falls back to cooked readline when stdin isn't a TTY.
  */
 export declare function ask(question: string): Promise<string>;
package/dist/cli.js
CHANGED
@@ -58,11 +58,11 @@ export async function fetchModels(timeoutMs = 10_000) {
             // Silent: callers fall back to a text prompt with the current value as default.
         }
         else if (isAuthError(err)) {
-            console.error(chalk.red("\n Authentication failed
+            console.error(chalk.red("\n Authentication failed -- check your API key or run: claude auth\n"));
             process.exit(1);
         }
         else {
-            console.warn(chalk.yellow(`\n Could not fetch models: ${String(err.message || err).slice(0, 80)}
+            console.warn(chalk.yellow(`\n Could not fetch models: ${String(err.message || err).slice(0, 80)} -- continuing with defaults`));
         }
         return [];
     }
@@ -72,7 +72,7 @@ export async function fetchModels(timeoutMs = 10_000) {
 // When the terminal is in bracketed paste mode, pasted content is wrapped with
 // \x1B[200~ ... \x1B[201~ so we can distinguish typed Enter from pasted newlines.
 // Multi-line or long pastes are stored as opaque segments and shown as a compact
-// [Pasted +N lines] placeholder while editing
+// [Pasted +N lines] placeholder while editing -- the full text is substituted on submit.
 export const PASTE_START = "\x1B[200~";
 export const PASTE_END = "\x1B[201~";
 export const PASTE_PLACEHOLDER_MAX = 80;
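The comment in this hunk describes how bracketed paste markers let the prompt tell typed input apart from pasted blocks. The sketch below shows that separation on a raw input string; it is an illustration only and does not reproduce the package's `InputSegment` handling, which this diff does not show. `Segment`, `splitBracketedPaste`, and `renderPlaceholder` are hypothetical names, and treating an unterminated paste as running to the end of the input is an assumption.

```typescript
// Escape sequences as defined in the hunk above.
const PASTE_START = "\x1B[200~";
const PASTE_END = "\x1B[201~";

// Hypothetical segment shape, for illustration only.
type Segment = { kind: "typed" | "paste"; text: string };

// Split raw terminal input into typed runs and pasted blocks.
function splitBracketedPaste(input: string): Segment[] {
  const segments: Segment[] = [];
  let pos = 0;
  while (pos < input.length) {
    const start = input.indexOf(PASTE_START, pos);
    if (start === -1) {
      // No further paste markers: the rest was typed.
      segments.push({ kind: "typed", text: input.slice(pos) });
      break;
    }
    if (start > pos) segments.push({ kind: "typed", text: input.slice(pos, start) });
    const bodyStart = start + PASTE_START.length;
    const end = input.indexOf(PASTE_END, bodyStart);
    // Assumption: a paste without a closing marker runs to the end of the input.
    const bodyEnd = end === -1 ? input.length : end;
    segments.push({ kind: "paste", text: input.slice(bodyStart, bodyEnd) });
    pos = end === -1 ? input.length : end + PASTE_END.length;
  }
  return segments;
}

// Compact display form while editing; typed text is shown as-is.
function renderPlaceholder(seg: Segment): string {
  if (seg.kind === "typed") return seg.text;
  return `[Pasted +${seg.text.split("\n").length} lines]`;
}
```

A newline inside a `paste` segment stays part of the buffer, so only a newline in a `typed` segment would count as the Enter that submits.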
@@ -150,7 +150,7 @@ function stripAnsi(s) {
 // ── Interactive primitives ──
 /**
  * Read a line from the user with bracketed-paste awareness.
- * Pasted multi-line text stays in the buffer as a single block
+ * Pasted multi-line text stays in the buffer as a single block -- only a typed
  * Enter submits. Falls back to cooked readline when stdin isn't a TTY.
  */
 export function ask(question) {