shipwright-cli 1.7.0
- package/LICENSE +21 -0
- package/README.md +926 -0
- package/claude-code/CLAUDE.md.shipwright +125 -0
- package/claude-code/hooks/notify-idle.sh +35 -0
- package/claude-code/hooks/pre-compact-save.sh +57 -0
- package/claude-code/hooks/task-completed.sh +170 -0
- package/claude-code/hooks/teammate-idle.sh +68 -0
- package/claude-code/settings.json.template +184 -0
- package/completions/_shipwright +140 -0
- package/completions/shipwright.bash +89 -0
- package/completions/shipwright.fish +107 -0
- package/docs/KNOWN-ISSUES.md +199 -0
- package/docs/TIPS.md +331 -0
- package/docs/definition-of-done.example.md +16 -0
- package/docs/patterns/README.md +139 -0
- package/docs/patterns/audit-loop.md +149 -0
- package/docs/patterns/bug-hunt.md +183 -0
- package/docs/patterns/feature-implementation.md +159 -0
- package/docs/patterns/refactoring.md +183 -0
- package/docs/patterns/research-exploration.md +144 -0
- package/docs/patterns/test-generation.md +173 -0
- package/package.json +49 -0
- package/scripts/adapters/docker-deploy.sh +50 -0
- package/scripts/adapters/fly-deploy.sh +41 -0
- package/scripts/adapters/iterm2-adapter.sh +122 -0
- package/scripts/adapters/railway-deploy.sh +34 -0
- package/scripts/adapters/tmux-adapter.sh +87 -0
- package/scripts/adapters/vercel-deploy.sh +35 -0
- package/scripts/adapters/wezterm-adapter.sh +103 -0
- package/scripts/cct +242 -0
- package/scripts/cct-cleanup.sh +172 -0
- package/scripts/cct-cost.sh +590 -0
- package/scripts/cct-daemon.sh +3189 -0
- package/scripts/cct-doctor.sh +328 -0
- package/scripts/cct-fix.sh +478 -0
- package/scripts/cct-fleet.sh +904 -0
- package/scripts/cct-init.sh +282 -0
- package/scripts/cct-logs.sh +273 -0
- package/scripts/cct-loop.sh +1332 -0
- package/scripts/cct-memory.sh +1148 -0
- package/scripts/cct-pipeline.sh +3844 -0
- package/scripts/cct-prep.sh +1352 -0
- package/scripts/cct-ps.sh +168 -0
- package/scripts/cct-reaper.sh +390 -0
- package/scripts/cct-session.sh +284 -0
- package/scripts/cct-status.sh +169 -0
- package/scripts/cct-templates.sh +242 -0
- package/scripts/cct-upgrade.sh +422 -0
- package/scripts/cct-worktree.sh +405 -0
- package/scripts/postinstall.mjs +96 -0
- package/templates/pipelines/autonomous.json +71 -0
- package/templates/pipelines/cost-aware.json +95 -0
- package/templates/pipelines/deployed.json +79 -0
- package/templates/pipelines/enterprise.json +114 -0
- package/templates/pipelines/fast.json +63 -0
- package/templates/pipelines/full.json +104 -0
- package/templates/pipelines/hotfix.json +63 -0
- package/templates/pipelines/standard.json +91 -0
- package/tmux/claude-teams-overlay.conf +109 -0
- package/tmux/templates/architecture.json +19 -0
- package/tmux/templates/bug-fix.json +24 -0
- package/tmux/templates/code-review.json +24 -0
- package/tmux/templates/devops.json +19 -0
- package/tmux/templates/documentation.json +19 -0
- package/tmux/templates/exploration.json +19 -0
- package/tmux/templates/feature-dev.json +24 -0
- package/tmux/templates/full-stack.json +24 -0
- package/tmux/templates/migration.json +24 -0
- package/tmux/templates/refactor.json +19 -0
- package/tmux/templates/security-audit.json +24 -0
- package/tmux/templates/testing.json +24 -0
- package/tmux/tmux.conf +167 -0
package/docs/TIPS.md
ADDED
@@ -0,0 +1,331 @@

# Power User Tips

Patterns and tricks for getting the most out of Claude Code Agent Teams with tmux.

---

## Team Patterns That Actually Work

Based on [Addy Osmani's research](https://addyosmani.com/blog/claude-code-agent-teams/) and community experience:

### When Teams Add Value
- **Competing hypotheses**: Multiple agents investigating different theories for a bug
- **Parallel review**: Security, performance, and test coverage by dedicated reviewers
- **Cross-layer features**: Frontend, backend, and tests developed simultaneously

### When to Stay Single-Agent
- Sequential, tightly-coupled work where each step depends on the last
- Simple bugs or single-file changes
- Tasks where coordination overhead exceeds the parallel benefit

### The Task Sizing Sweet Spot
Too small and coordination overhead dominates. Too large and agents work too long without check-ins. Aim for **5-6 focused tasks per agent** with clear deliverables.

### Specification Quality = Output Quality
Detailed spawn prompts with technical constraints, acceptance criteria, and domain context produce dramatically better results. Don't just say "fix the tests". Say "fix the auth tests in `src/auth/__tests__/`, ensuring all edge cases for expired tokens are covered, using the existing MockAuthProvider pattern."

---

## Hook Patterns for Teams

### Quality Gates (Most Valuable)
- **TeammateIdle**: Run typecheck before letting agents idle. Catches errors early.
- **TaskCompleted**: Run lint + related tests before allowing task completion.
- **Stop**: Verify all work is complete before Claude stops responding.

### Observability
- **Notification**: Desktop alerts so you can work on other things.
- **PostToolUse** on `Bash`: Log all commands agents run to a file.
- **SubagentStart/SubagentStop**: Track when agents spawn and finish.

### Context Preservation
- **PreCompact**: Save git status, recent commits, and project reminders before compaction.
- **SessionStart** on `compact`: Re-inject critical context after compaction.

---

## Team Size & Structure

### Keep teams small

Limit teams to **2-3 agents**. More agents increase the risk of the tmux `send-keys` race condition (#23615) and create more coordination overhead than they save in parallel work.

```
Good:   2 agents  - backend + frontend
Good:   3 agents  - backend + frontend + tests
Risky:  4+ agents - race conditions, context pressure
```

### Assign different files to each agent

File conflicts are the #1 source of wasted work in agent teams. If two agents edit the same file, one will overwrite the other. Always partition work by file ownership:

```
Agent 1 (backend):   src/api/, src/services/
Agent 2 (frontend):  apps/web/src/
Agent 3 (tests):     src/tests/, *.test.ts
```
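
An ownership map like the one above is easier to enforce when it can be queried. The sketch below is illustrative only: `owner_of` is a hypothetical helper, not part of shipwright, and its patterns simply mirror the example partition.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a repo-relative path to the agent that owns it.
# The patterns mirror the example partition above; adjust them to your layout.
owner_of() {
  case "$1" in
    src/tests/*|*.test.ts)     echo "tests" ;;     # test files win over directory rules
    src/api/*|src/services/*)  echo "backend" ;;
    apps/web/src/*)            echo "frontend" ;;
    *)                         echo "unowned" ;;   # no agent has been assigned this path
  esac
}

owner_of "src/api/users.ts"        # prints: backend
owner_of "src/api/users.test.ts"   # prints: tests
```

Listing the test patterns first is a deliberate choice: a file such as `src/api/users.test.ts` would otherwise match two owners.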

### Use git worktrees for complete isolation

For maximum safety, use [git worktrees](https://git-scm.com/docs/git-worktree) so each agent works in its own checkout of the repo:

```bash
# Create a worktree for each agent.
# Each branch must already exist locally or on a remote; use -b <branch> to create a new one.
git worktree add ../project-backend feature/backend
git worktree add ../project-frontend feature/frontend
git worktree add ../project-tests feature/tests

# Each agent works in its own directory, so agents cannot overwrite each other's
# working files. Conflicts can only surface later, when the branches are merged.
```
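
For more than a couple of agents, the worktree commands can be generated from a list of roles. This is a minimal sketch under stated assumptions: `worktree_plan` is a hypothetical name, the `feature/<role>` branch scheme follows the example above, and the commands are only printed so you can review them before running anything.

```bash
#!/usr/bin/env bash
# Print (do not run) the git commands that give each agent its own worktree.
# "project" and the role names are placeholders taken from the example above.
worktree_plan() {
  local project="$1"; shift
  local role
  for role in "$@"; do
    # -b creates the branch; drop it if feature/<role> already exists
    echo "git worktree add -b feature/${role} ../${project}-${role}"
  done
}

worktree_plan project backend frontend tests
```

When a team is finished, `git worktree list` shows what is still checked out and `git worktree remove <path>` deletes a worktree.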

---

## Agent Configuration

### Use `delegate` mode for maximum autonomy

When you trust the agents to work independently (e.g., they have clear, well-scoped tasks), use `delegate` mode to minimize permission prompts:

```bash
# In your Claude Code launch or CLAUDE.md
# "mode": "delegate" gives agents more autonomy
```

### Use haiku for subagent lookups

Subagents (spawned via the `Task` tool) don't need a powerful model for simple file searches and code lookups. Save money and latency:

```json
{
  "env": {
    "CLAUDE_CODE_SUBAGENT_MODEL": "haiku"
  }
}
```

### Prevent context overflow

Agent teams burn through context faster than solo sessions. Set aggressive auto-compact:

```json
{
  "env": {
    "CLAUDE_CODE_AUTOCOMPACT_PCT_OVERRIDE": "70"
  }
}
```

This compacts the conversation when it hits 70% of the context window (default is 80%).
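
To see what the percentage means in tokens, the arithmetic can be worked through directly. The 200000-token window below is an assumed figure for illustration, not a value taken from this guide, and `compact_threshold` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Tokens at which compaction would start for a given window size and percentage.
# The 200000-token window is an assumed figure for illustration only.
compact_threshold() {
  local window="$1" pct="$2"
  echo $(( window * pct / 100 ))
}

compact_threshold 200000 80   # default threshold:   160000
compact_threshold 200000 70   # with the override:   140000
```

Under that assumption, lowering the threshold from 80 to 70 triggers compaction 20000 tokens earlier.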

---

## Monitoring & Management

### Watch all agents at once

Use `shipwright status` (aliases for `shipwright`: `cct`, `sw`) to see a dashboard of running team sessions:

```bash
shipwright status
```

Or press `prefix + Ctrl-t` in tmux to show the dashboard inline.

### Zoom into a single agent

Press `prefix + G` to toggle zoom on the current pane. This makes one agent fill the entire terminal, which is useful for reading long output. Press again to return to the tiled layout.

### Synchronized input

Press `prefix + Alt-t` to toggle synchronized panes. When enabled, anything you type goes to ALL panes simultaneously. Useful for:
- Stopping all agents at once (`Ctrl-C` in all panes)
- Running the same command in all agent directories

**Remember to turn it off** when you're done; otherwise your input goes everywhere.

### Capture pane contents

Press `prefix + Alt-s` to save the current pane's visible content to a file in `/tmp/`. Useful for debugging agent output after the fact.

---

## Hook Patterns

### Quality gates

The included `teammate-idle.sh` hook blocks agents from going idle until TypeScript errors are fixed. You can extend this pattern for other checks:

```bash
#!/usr/bin/env bash
# Example: lint check on idle
# find_project_root is not defined in this snippet; this stand-in assumes a git checkout.
find_project_root() { git rev-parse --show-toplevel 2>/dev/null; }
cd "$(find_project_root)" || exit 0
pnpm lint 2>&1 || {
  # Exit code 2 blocks the agent; the message goes to stderr so it is reported back
  echo "::error::Lint errors found. Fix them before going idle." >&2
  exit 2
}
exit 0
```
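
When a hook needs several checks, the same blocking pattern can be wrapped in a loop. The following is a sketch, not the bundled hook: `run_gates` is a hypothetical helper, and the commands in the usage comment are placeholders for whatever your project runs.

```bash
#!/usr/bin/env bash
# Run each gate command in order and stop at the first failure.
# Returns 2 on failure, matching the blocking exit code used in the hook above.
run_gates() {
  local gate
  for gate in "$@"; do
    if ! bash -c "$gate" >/dev/null 2>&1; then
      echo "Gate failed: $gate" >&2
      return 2
    fi
  done
  return 0
}

# In a hook you might call (commands are placeholders for your project):
#   run_gates "pnpm lint" "pnpm typecheck" || exit 2
```

Stopping at the first failure keeps the feedback short, so the agent sees one actionable error at a time.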

### Notification sounds

Play a sound when an agent completes a task (macOS):

```bash
#!/usr/bin/env bash
# task-completed.sh
afplay /System/Library/Sounds/Glass.aiff &
exit 0
```

### Auto-format on task completion

Run a formatter when agents complete work:

```bash
#!/usr/bin/env bash
# task-completed.sh
# find_project_root is not defined in this snippet; this stand-in assumes a git checkout.
find_project_root() { git rev-parse --show-toplevel 2>/dev/null; }
cd "$(find_project_root)" || exit 0
pnpm format --write 2>&1
exit 0
```

---

## Task Design

### Write focused task descriptions

Vague tasks lead to wasted context and unfocused work. Compare:

```
Bad:  "Improve the authentication system"
Good: "Add rate limiting to POST /api/auth/login - max 5 attempts per
       IP per minute. Add the rate limiter in src/api/middleware/
       and tests in src/tests/auth-rate-limit.test.ts"
```

### 5-6 tasks per agent is the sweet spot

With too few tasks, an agent finishes early and sits idle. With too many, context pressure builds and the agent loses focus.

### Put dependencies first

When creating task lists, order tasks so that independent work and prerequisites come first and the tasks that depend on them come later. The team lead should assign a blocked task only after its dependencies are complete.
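
Dependency ordering like this is a topological sort, and the POSIX `tsort` utility can compute it from a list of pairs. The sketch below assumes a simple "prerequisite task" pair format; `order_tasks` and the task names are made up for illustration.

```bash
#!/usr/bin/env bash
# Order tasks so prerequisites come first, using the POSIX tsort utility.
# Input: one "prerequisite task" pair per line. Task names are examples.
order_tasks() {
  tsort
}

order_tasks <<'EOF'
user-model jwt-routes
jwt-routes integration-tests
EOF
```

For this chain the only valid order is `user-model`, `jwt-routes`, `integration-tests`. When tasks are independent of each other, `tsort` may emit them in any order, which is exactly the set of tasks that can be handed out in parallel.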

---

## tmux Session Management

### Named sessions

Always use named sessions so you can find them later:

```bash
tmux new -s my-feature    # Not just "tmux new"
```

### Detach and reattach

You can detach from a session (`prefix + d`) and agents keep running. Reattach later:

```bash
tmux attach -t my-feature
```

### Clean up orphaned sessions

After a team finishes, clean up leftover tmux sessions and panes:

```bash
shipwright cleanup          # Dry-run: shows what would be killed
shipwright cleanup --force  # Actually kills orphaned sessions
```

---

## Environment Variables Reference

| Variable | Default | What it does |
|----------|---------|--------------|
| `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` | (none) | **Required.** Enables agent teams feature |
| `CLAUDE_CODE_SUBAGENT_MODEL` | (parent model) | Model for subagent lookups. Set to `"haiku"` to save money |
| `CLAUDE_CODE_AUTOCOMPACT_PCT_OVERRIDE` | `"80"` | Context compaction threshold. Lower = more aggressive |
| `CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY` | `"3"` | Parallel tool calls per agent. Higher = faster but more API usage |
| `CLAUDE_CODE_GLOB_HIDDEN` | (none) | Include dotfiles in glob searches |
| `CLAUDE_CODE_BASH_MAINTAIN_PROJECT_WORKING_DIR` | (none) | Keep bash cwd consistent across tool calls |
| `CLAUDE_CODE_EMIT_TOOL_USE_SUMMARIES` | (none) | Show tool use summaries in output |
| `CLAUDE_CODE_TST_NAMES_IN_MESSAGES` | (none) | Show teammate names in messages |
| `CLAUDE_CODE_EAGER_FLUSH` | (none) | Flush output eagerly (reduces perceived latency) |
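
Since the first variable in the table is required, a launch script can check for it before starting a team. This is only a sketch: `teams_enabled` is a hypothetical helper, and treating any non-empty value as "enabled" is an assumption, because the table does not say which values are accepted.

```bash
#!/usr/bin/env bash
# Check whether the required variable from the table is set before launching a team.
# Treating any non-empty value as "enabled" is an assumption of this sketch.
teams_enabled() {
  [ -n "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-}" ]
}

if teams_enabled; then
  echo "agent teams: enabled"
else
  echo "agent teams: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS is not set" >&2
fi
```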

---

## Wave-Style Iteration

For complex, multi-step tasks, use **wave patterns**: iterative cycles of parallel agent work followed by synthesis. See the full pattern guides in [docs/patterns/](patterns/).

### The Wave Cycle

Each wave follows four steps:

1. **Assess**: Read agent outputs from the previous wave. What succeeded? What failed?
2. **Decompose**: What work remains? What can run in parallel?
3. **Spawn**: Launch agents in separate tmux panes for each independent task
4. **Synthesize**: Gather results, update the state file, plan the next wave

Repeat until done. Set a reasonable wave limit (5-10 for most tasks).
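
The cycle and its wave limit can be pictured as a bounded loop. The skeleton below is a sketch, not how shipwright drives teams: `run_wave` is a placeholder you would replace with real work, and here it simply pretends the goal is met in wave 3.

```bash
#!/usr/bin/env bash
# Skeleton of a bounded wave loop. run_wave is a placeholder you would replace:
# it should return 0 once the goal is met and non-zero if more waves are needed.
run_wave() {
  [ "$1" -ge 3 ]   # stand-in: pretend the goal is met in wave 3
}

run_waves() {
  local max_waves="$1" wave
  for (( wave = 1; wave <= max_waves; wave++ )); do
    echo "wave ${wave}: assess, decompose, spawn, synthesize"
    if run_wave "$wave"; then
      echo "done after ${wave} wave(s)"
      return 0
    fi
  done
  echo "wave limit (${max_waves}) reached without finishing" >&2
  return 1
}

run_waves 5
```

Returning a non-zero status when the limit is hit makes the "stop and rethink" case explicit instead of letting the loop run on.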

### File-Based State

Track progress through a markdown state file instead of keeping everything in agent memory. The file survives compactions and context resets, and it lets any agent pick up where others left off.

**State file:** `.claude/team-state.local.md`

```markdown
---
wave: 2
status: in_progress
goal: "Build user auth with JWT"
started_at: 2026-02-07T10:00:00Z
---

## Completed
- [x] Scanned existing auth patterns
- [x] Built User model

## In Progress
- [ ] JWT route handlers
- [ ] React login components

## Blocked
- Integration tests blocked on route completion
```
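
Because the state file keeps its metadata in a front matter block, a script can read single fields from it without parsing the rest. A minimal sketch, assuming the layout shown above; `state_field` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Read one key from the front matter (the block between the first two "---" lines).
# Usage: state_field <file> <key>. The function name is hypothetical.
state_field() {
  awk -v key="$2" '
    /^---[[:space:]]*$/ { fences++; next }        # count the delimiter lines
    fences == 1 && index($0, key ":") == 1 {      # only look inside the front matter
      sub(/^[^:]*:[[:space:]]*/, "")              # drop the "key: " prefix
      print
      exit
    }
  ' "$1"
}

# Example: state_field .claude/team-state.local.md wave
```

Values are returned exactly as written, so a quoted field such as `goal` comes back with its quotes.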

**Agent outputs:** `.claude/team-outputs/*.md`

Each agent writes findings/results to a file in this directory. The team lead reads all outputs between waves.

**Add to `.gitignore`:**
```
.claude/team-state.local.md
.claude/team-outputs/
```
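
If a setup script adds these entries, it helps to make the step safe to run more than once. The sketch below appends a line only when that exact line is missing; `ignore_once` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Append each entry to an ignore file only if that exact line is not already there.
ignore_once() {
  local file="$1"; shift
  local entry
  touch "$file"
  for entry in "$@"; do
    grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
  done
}

# Example: ignore_once .gitignore .claude/team-state.local.md .claude/team-outputs/
```

`grep -x` matches whole lines and `-F` treats the entry as a literal string, so the dots and slashes in the paths are not interpreted as patterns.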

### When to Use Waves vs. Single-Pass Teams

| Situation | Approach |
|-----------|----------|
| Independent tasks with clear file ownership | Single-pass team: spawn agents, collect results |
| Tasks that require iteration (tests must pass, errors must be fixed) | Wave pattern: iterate until completion criteria met |
| Exploratory work that builds on previous findings | Wave pattern: each wave goes deeper based on last wave's results |
| Simple parallel review (code quality + security + tests) | Single-pass team: each reviewer works independently |

### Quick Reference: Five Wave Patterns

| Pattern | Waves | Agents | Best For |
|---------|-------|--------|----------|
| [Feature Implementation](patterns/feature-implementation.md) | 3-4 | 2-3 | Multi-component features |
| [Research & Exploration](patterns/research-exploration.md) | 2-3 | 2-3 | Understanding codebases |
| [Test Generation](patterns/test-generation.md) | 3-4+ | 2-3 | Coverage campaigns |
| [Refactoring](patterns/refactoring.md) | 3-4 | 2 | Large-scale transformations |
| [Bug Hunt](patterns/bug-hunt.md) | 3-4 | 2-3 | Complex, elusive bugs |

package/docs/definition-of-done.example.md
ADDED
@@ -0,0 +1,16 @@

# Definition of Done

Use this template with `shipwright loop --definition-of-done <file>` to enforce completion criteria.
Copy and customize for your project.

## Checklist

- [ ] All specified functionality is implemented
- [ ] Unit tests exist for new code
- [ ] All tests pass
- [ ] No TODO/FIXME markers in new code
- [ ] Public functions have docstrings/comments
- [ ] README reflects current state
- [ ] No hardcoded values that should be configurable
- [ ] Error handling covers likely failure modes
- [ ] Code follows existing patterns in the codebase
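
A checklist in this format is easy to inspect from a script. The sketch below counts the items that are still unchecked; it is not how `shipwright loop` evaluates the file (this document does not describe that), and `dod_remaining` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Count unchecked "- [ ]" items in a Definition of Done file.
dod_remaining() {
  grep -c '^- \[ \]' "$1" || true   # grep exits 1 when the count is 0
}

# Example: [ "$(dod_remaining dod.md)" -eq 0 ] && echo "all items checked"
```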

package/docs/patterns/README.md
ADDED
@@ -0,0 +1,139 @@

# Wave-Style Team Patterns

Structured patterns for running Claude Code Agent Teams in tmux using iterative, parallel "waves" of work.

---

## What Are Wave Patterns?

A **wave** is a cycle of parallel work followed by synthesis. Instead of one agent grinding through a task sequentially, you decompose work into independent chunks, assign them to agents in separate tmux panes, and iterate until done.

```
Wave 1: Research      Wave 2: Build                  Wave 3: Integrate
┌──────┬──────┐       ┌───────┬────────┬─────┐       ┌──────┬───────┐
│  A1  │  A2  │   →   │  A1   │   A2   │ A3  │   →   │  A1  │  A2   │
│ scan │ scan │       │ model │ routes │ UI  │       │ wire │ tests │
└──────┴──────┘       └───────┴────────┴─────┘       └──────┴───────┘
  ↓ synthesize          ↓ synthesize                   ↓ done
```

Each wave:
1. **Assess**: What did the previous wave accomplish? What failed?
2. **Decompose**: What can be done in parallel now?
3. **Spawn**: Launch agents in tmux panes for each independent task
4. **Synthesize**: Gather results, update state, plan next wave

---

## Available Patterns

| Pattern | When to Use | Typical Waves | Team Size |
|---------|-------------|---------------|-----------|
| [Feature Implementation](feature-implementation.md) | Building multi-component features | 3-4 | 2-3 agents |
| [Research & Exploration](research-exploration.md) | Understanding a codebase or problem space | 2-3 | 2-3 agents |
| [Test Generation](test-generation.md) | Comprehensive test coverage campaigns | 3-4+ | 2-3 agents |
| [Refactoring](refactoring.md) | Large-scale code transformations | 3-4 | 2 agents |
| [Bug Hunt](bug-hunt.md) | Tracking down complex, elusive bugs | 3-4 | 2-3 agents |

---

## File-Based State

Wave patterns use a **state file** to track progress across iterations. This works everywhere, with no special tools required.

**State file:** `.claude/team-state.local.md`

```markdown
---
wave: 2
status: in_progress
goal: "Build user auth with JWT"
started_at: 2026-02-07T10:00:00Z
---

## Completed
- [x] Scanned existing auth patterns
- [x] Identified middleware structure
- [x] Built User model

## In Progress
- [ ] JWT route handlers
- [ ] Login/signup React components

## Blocked
- None

## Agent Outputs
- wave-1-scan-auth.md: Existing auth analysis
- wave-1-scan-deps.md: Dependency audit
- wave-2-model.md: User model implementation notes
```

**Agent outputs directory:** `.claude/team-outputs/`

Each agent writes its results to a markdown file in this directory. The team lead reads all outputs between waves to synthesize progress.
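
Reading "all outputs between waves" can be as simple as concatenating one wave's files. The sketch below assumes the `wave-<n>-<topic>.md` naming used in the example state file; `wave_outputs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print every output file for one wave, each under a header with its filename.
# Assumes the wave-<n>-<topic>.md naming shown in the example state file.
wave_outputs() {
  local dir="$1" wave="$2" file
  for file in "$dir"/wave-"$wave"-*.md; do
    [ -e "$file" ] || continue        # the glob matched nothing
    printf '=== %s ===\n' "$(basename "$file")"
    cat "$file"
  done
}

# Example: wave_outputs .claude/team-outputs 1
```

The trailing hyphen in the pattern keeps wave 1 from also matching wave 10 and later.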

> **Tip:** Add `.claude/team-state.local.md` and `.claude/team-outputs/` to your `.gitignore`. These are ephemeral working files.

---

## Quick Start

Pick a pattern, then use `shipwright` (aliases: `cct`, `sw`) to set up the team:

```bash
# Start a tmux session
tmux new -s my-feature

# Create a 3-agent team
shipwright session my-feature

# In the team lead pane, describe the work using a wave pattern
# The team lead decomposes into waves and assigns tasks
```

---

## Key Principles

### 1. Parallel Everything
If two tasks don't depend on each other, run them at the same time in separate panes. The whole point of waves is maximizing parallel throughput.

### 2. Synthesize Between Waves
Don't just fire-and-forget. After each wave, the team lead reads all agent outputs, identifies gaps, and adjusts the plan. This is where the real value happens.

### 3. Iterate Until Done
Waves repeat until the goal is met. Failed tasks get retried with better instructions. Each wave builds on the last. Set a reasonable max (5-10 waves for most tasks).

### 4. File-Based State Is the Source of Truth
The `.claude/team-state.local.md` file tracks what's done, what's pending, and what's blocked. Agents update their output files; the team lead updates the state file.

### 5. Keep Teams Small
Use 2-3 focused agents per team. More agents mean more tmux panes, more coordination overhead, and more risk of file conflicts.

---

## Anti-Patterns

| Don't | Why | Instead |
|-------|-----|---------|
| Spawn 5+ agents per wave | Coordination overhead, race conditions | 2-3 agents per wave |
| Skip synthesis between waves | You'll lose track of progress and duplicate work | Always read outputs and update state |
| Give vague task descriptions | Agents waste context figuring out what to do | Be specific: files, functions, acceptance criteria |
| Let agents touch overlapping files | One will overwrite the other's changes | Partition files by agent |
| Keep iterating when stuck | Wastes tokens and your time | After 3 failed attempts, rethink the approach |
| Use waves for trivial tasks | Overhead exceeds benefit | Just do it in a single agent |
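
The "after 3 failed attempts" rule from the table can be made mechanical with a small retry wrapper. This is a sketch under that one assumption; `retry_or_rethink` is a hypothetical helper and the command in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
# Retry a command up to a fixed number of attempts, then give up with a clear message.
# The limit of 3 follows the anti-pattern table above.
retry_or_rethink() {
  local max_attempts=3 attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed" >&2
  done
  echo "still failing after ${max_attempts} attempts: rethink the approach" >&2
  return 1
}

# Example: retry_or_rethink npm test
```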

---

## Model Selection

Choose the right model for each agent's task:

| Task Type | Model | Why |
|-----------|-------|-----|
| File search, simple lookups | `haiku` | Fast, cheap |
| Implementation, clear requirements | `sonnet` | Balanced speed/quality |
| Architecture decisions, complex debugging | `opus` | Best reasoning |
| Test generation | `sonnet` | Good pattern matching |
| Documentation, reports | `sonnet` | Clear writing |
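
If a launch script picks the model for you, the table reduces to a lookup. In the sketch below the task-type keywords and the `model_for` name are invented for illustration, and the fallback for unknown types is an assumption rather than something the table states.

```bash
#!/usr/bin/env bash
# Look up the suggested model for a task type, following the table above.
# The task-type keywords are invented for this sketch.
model_for() {
  case "$1" in
    search|lookup)              echo "haiku" ;;
    architecture|debugging)     echo "opus" ;;
    implementation|tests|docs)  echo "sonnet" ;;
    *)                          echo "sonnet" ;;  # assumed fallback, not from the table
  esac
}

model_for search         # prints: haiku
model_for architecture   # prints: opus
```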

package/docs/patterns/audit-loop.md
ADDED
@@ -0,0 +1,149 @@

# Pattern: Audit Loop

Add self-reflection and quality gates to the continuous agent loop (`shipwright loop`) to prevent premature completion, catch regressions, and enforce project-specific standards.

---

## When to Use

- Running `shipwright loop` on tasks where **correctness matters more than speed** (production features, refactors, data migrations)
- The agent keeps declaring LOOP_COMPLETE **before the work is actually done**
- You want **automated quality checks** (tests, linting, type-checking) between iterations
- Your project has a **Definition of Done** that goes beyond "code compiles"

**Don't use** for quick prototypes, throwaway scripts, or exploration tasks where speed matters more than rigor.

---

## Audit Modes

### `--audit` (Self-Reflection)

The agent pauses after each iteration to review its own work before deciding whether to continue or declare completion.

**Cost:** Minimal. Adds ~30 seconds per iteration (one extra prompt to the same agent).

**Best for:** Solo agent work where you want a sanity check without the overhead of a second agent.

```bash
shipwright loop "Build user auth with JWT" --audit --test-cmd "npm test"
```

### `--audit-agent` (Separate Auditor)

Spawns a dedicated auditor agent that reviews the work agent's output each iteration. The auditor can reject LOOP_COMPLETE and send the work agent back with specific feedback.

**Cost:** Higher. Each iteration runs two agents (worker + auditor), roughly 2x the API cost.

**Best for:** Complex features, production code, or tasks where you've seen the agent cut corners.

```bash
shipwright loop "Refactor auth to use refresh tokens" --audit-agent --model sonnet
```

### `--quality-gates` (Automated Checks)

Runs your test command, linter, or type-checker between iterations. The loop only advances if gates pass.

**Cost:** Depends on your test suite. Adds wall-clock time but no extra API cost.

**Best for:** Projects with existing CI checks you want to enforce locally.

```bash
shipwright loop "Add pagination to API" --quality-gates --test-cmd "npm test && npm run lint"
```

### Combining Modes

Modes stack. The most rigorous setup combines all three:

```bash
shipwright loop "Build payment integration" \
  --audit-agent \
  --quality-gates \
  --test-cmd "npm test" \
  --definition-of-done dod.md
```

---

## Writing Effective Definition of Done Files

A good DoD file is the single most effective way to prevent premature LOOP_COMPLETE.

### Template

```bash
cp ~/.shipwright/templates/definition-of-done.example.md my-dod.md
```

### Tips

- **Be specific.** "Tests pass" is weak. "Unit tests cover the 3 API endpoints and the auth middleware" is strong.
- **Include negative checks.** "No hardcoded API keys" or "No TODO markers" catch things agents skip.
- **Keep it short.** 8-15 items. More than that and the agent loses focus.
- **Order by importance.** The agent checks items top-to-bottom. Put critical items first.

### Example: Feature DoD

```markdown
# Definition of Done: Payment Integration

- [ ] Stripe webhook handler processes charge.succeeded and charge.failed
- [ ] Idempotency keys prevent duplicate charges
- [ ] Unit tests cover success, failure, and duplicate scenarios
- [ ] Integration test hits Stripe test mode
- [ ] All amounts stored as cents (integer), never floats
- [ ] No Stripe secret keys in source code
- [ ] Error responses follow existing API error format
```

---

## Preventing Premature LOOP_COMPLETE

The most common failure mode is the agent declaring victory too early. Countermeasures:

| Technique | How it helps |
|-----------|-------------|
| `--audit` | Agent re-reads its own output and catches obvious gaps |
| `--audit-agent` | Second opinion catches blind spots the worker has |
| `--definition-of-done` | Explicit checklist the agent must verify before completing |
| `--quality-gates` | Hard gate: tests must pass or the loop continues |
| `--test-cmd` | Even without quality gates, a test command gives the agent feedback |
| `--max-iterations` | Safety net: prevents infinite loops if nothing else works |

**Pro tip:** If the agent still completes too early, make your goal statement more specific. "Build auth" is vague. "Build JWT auth with login, signup, password reset, and refresh token rotation" gives the agent a clear finish line.

---

## Example Commands

```bash
# Quick audit for a small task
shipwright loop "Fix the N+1 query in user list" --audit --test-cmd "pytest tests/test_users.py"

# Rigorous audit for production feature
shipwright loop "Add RBAC to the API" --audit-agent --quality-gates \
  --test-cmd "npm test" --definition-of-done rbac-dod.md

# Cost-conscious: quality gates only, no extra agent
shipwright loop "Migrate DB schema" --quality-gates --test-cmd "npm run db:test"

# Maximum rigor: all checks enabled
shipwright loop "PCI compliance updates" --audit-agent --quality-gates \
  --test-cmd "npm test && npm run lint && npm run typecheck" \
  --definition-of-done pci-dod.md --max-iterations 15
```

---

## Anti-Patterns

| Don't | Why |
|-------|-----|
| Use `--audit-agent` for trivial tasks | 2x cost for a one-file fix is wasteful |
| Write a 30-item DoD | The agent loses focus. Keep it under 15 items |
| Skip `--test-cmd` with `--quality-gates` | Quality gates with no test command do nothing useful |
| Set `--max-iterations 1` with `--audit` | The audit has nowhere to send feedback if there's only one iteration |
| Rely solely on `--audit` for critical work | Self-reflection catches ~60% of issues. Add `--quality-gates` for the rest |
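
Two of the combinations in the table can be caught before a loop ever starts. The sketch below only inspects the arguments you intend to pass and never runs `shipwright`; `check_loop_flags` is a hypothetical helper, and it recognizes only flags that appear in this guide.

```bash
#!/usr/bin/env bash
# Pre-flight check for two flag combinations called out in the table above.
# It inspects arguments only and never runs shipwright itself.
check_loop_flags() {
  local gates=0 has_test_cmd=0 audit=0 max_iter="" problems=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --quality-gates)  gates=1 ;;
      --audit)          audit=1 ;;
      --test-cmd)       has_test_cmd=1 ;;
      --max-iterations) max_iter="${2:-}" ;;   # the value is the next argument
    esac
    shift
  done
  if [ "$gates" -eq 1 ] && [ "$has_test_cmd" -eq 0 ]; then
    echo "--quality-gates without --test-cmd has nothing to run" >&2
    problems=1
  fi
  if [ "$audit" -eq 1 ] && [ "$max_iter" = "1" ]; then
    echo "--audit with --max-iterations 1 leaves no iteration for feedback" >&2
    problems=1
  fi
  return "$problems"
}

# Example: check_loop_flags "Migrate DB schema" --quality-gates --test-cmd "npm run db:test"
```

The function returns non-zero when it finds a problem, so a wrapper script can refuse to launch the loop.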