shipwright-cli 1.7.0

Files changed (72)
  1. package/LICENSE +21 -0
  2. package/README.md +926 -0
  3. package/claude-code/CLAUDE.md.shipwright +125 -0
  4. package/claude-code/hooks/notify-idle.sh +35 -0
  5. package/claude-code/hooks/pre-compact-save.sh +57 -0
  6. package/claude-code/hooks/task-completed.sh +170 -0
  7. package/claude-code/hooks/teammate-idle.sh +68 -0
  8. package/claude-code/settings.json.template +184 -0
  9. package/completions/_shipwright +140 -0
  10. package/completions/shipwright.bash +89 -0
  11. package/completions/shipwright.fish +107 -0
  12. package/docs/KNOWN-ISSUES.md +199 -0
  13. package/docs/TIPS.md +331 -0
  14. package/docs/definition-of-done.example.md +16 -0
  15. package/docs/patterns/README.md +139 -0
  16. package/docs/patterns/audit-loop.md +149 -0
  17. package/docs/patterns/bug-hunt.md +183 -0
  18. package/docs/patterns/feature-implementation.md +159 -0
  19. package/docs/patterns/refactoring.md +183 -0
  20. package/docs/patterns/research-exploration.md +144 -0
  21. package/docs/patterns/test-generation.md +173 -0
  22. package/package.json +49 -0
  23. package/scripts/adapters/docker-deploy.sh +50 -0
  24. package/scripts/adapters/fly-deploy.sh +41 -0
  25. package/scripts/adapters/iterm2-adapter.sh +122 -0
  26. package/scripts/adapters/railway-deploy.sh +34 -0
  27. package/scripts/adapters/tmux-adapter.sh +87 -0
  28. package/scripts/adapters/vercel-deploy.sh +35 -0
  29. package/scripts/adapters/wezterm-adapter.sh +103 -0
  30. package/scripts/cct +242 -0
  31. package/scripts/cct-cleanup.sh +172 -0
  32. package/scripts/cct-cost.sh +590 -0
  33. package/scripts/cct-daemon.sh +3189 -0
  34. package/scripts/cct-doctor.sh +328 -0
  35. package/scripts/cct-fix.sh +478 -0
  36. package/scripts/cct-fleet.sh +904 -0
  37. package/scripts/cct-init.sh +282 -0
  38. package/scripts/cct-logs.sh +273 -0
  39. package/scripts/cct-loop.sh +1332 -0
  40. package/scripts/cct-memory.sh +1148 -0
  41. package/scripts/cct-pipeline.sh +3844 -0
  42. package/scripts/cct-prep.sh +1352 -0
  43. package/scripts/cct-ps.sh +168 -0
  44. package/scripts/cct-reaper.sh +390 -0
  45. package/scripts/cct-session.sh +284 -0
  46. package/scripts/cct-status.sh +169 -0
  47. package/scripts/cct-templates.sh +242 -0
  48. package/scripts/cct-upgrade.sh +422 -0
  49. package/scripts/cct-worktree.sh +405 -0
  50. package/scripts/postinstall.mjs +96 -0
  51. package/templates/pipelines/autonomous.json +71 -0
  52. package/templates/pipelines/cost-aware.json +95 -0
  53. package/templates/pipelines/deployed.json +79 -0
  54. package/templates/pipelines/enterprise.json +114 -0
  55. package/templates/pipelines/fast.json +63 -0
  56. package/templates/pipelines/full.json +104 -0
  57. package/templates/pipelines/hotfix.json +63 -0
  58. package/templates/pipelines/standard.json +91 -0
  59. package/tmux/claude-teams-overlay.conf +109 -0
  60. package/tmux/templates/architecture.json +19 -0
  61. package/tmux/templates/bug-fix.json +24 -0
  62. package/tmux/templates/code-review.json +24 -0
  63. package/tmux/templates/devops.json +19 -0
  64. package/tmux/templates/documentation.json +19 -0
  65. package/tmux/templates/exploration.json +19 -0
  66. package/tmux/templates/feature-dev.json +24 -0
  67. package/tmux/templates/full-stack.json +24 -0
  68. package/tmux/templates/migration.json +24 -0
  69. package/tmux/templates/refactor.json +19 -0
  70. package/tmux/templates/security-audit.json +24 -0
  71. package/tmux/templates/testing.json +24 -0
  72. package/tmux/tmux.conf +167 -0
package/docs/TIPS.md ADDED
@@ -0,0 +1,331 @@
1
+ # Power User Tips
2
+
3
+ Patterns and tricks for getting the most out of Claude Code Agent Teams with tmux.
4
+
5
+ ---
6
+
7
+ ## Team Patterns That Actually Work
8
+
9
+ Based on [Addy Osmani's research](https://addyosmani.com/blog/claude-code-agent-teams/) and community experience:
10
+
11
+ ### When Teams Add Value
12
+ - **Competing hypotheses** — Multiple agents investigating different theories for a bug
13
+ - **Parallel review** — Security, performance, and test coverage by dedicated reviewers
14
+ - **Cross-layer features** — Frontend, backend, and tests developed simultaneously
15
+
16
+ ### When to Stay Single-Agent
17
+ - Sequential, tightly-coupled work where each step depends on the last
18
+ - Simple bugs or single-file changes
19
+ - Tasks where coordination overhead exceeds the parallel benefit
20
+
21
+ ### The Task Sizing Sweet Spot
22
+ Too small and coordination overhead dominates. Too large and agents work too long without check-ins. Aim for **5-6 focused tasks per agent** with clear deliverables.
23
+
24
+ ### Specification Quality = Output Quality
25
+ Detailed spawn prompts with technical constraints, acceptance criteria, and domain context produce dramatically better results. Don't just say "fix the tests" — say "fix the auth tests in src/auth/__tests__/, ensuring all edge cases for expired tokens are covered, using the existing MockAuthProvider pattern."
26
+
27
+ ---
28
+
29
+ ## Hook Patterns for Teams
30
+
31
+ ### Quality Gates (Most Valuable)
32
+ - **TeammateIdle** — Run typecheck before letting agents idle. Catches errors early.
33
+ - **TaskCompleted** — Run lint + related tests before allowing task completion.
34
+ - **Stop** — Verify all work is complete before Claude stops responding.
35
+
36
+ ### Observability
37
+ - **Notification** — Desktop alerts so you can work on other things.
38
+ - **PostToolUse** on `Bash` — Log all commands agents run to a file.
39
+ - **SubagentStart/SubagentStop** — Track when agents spawn and finish.
40
+
41
+ ### Context Preservation
42
+ - **PreCompact** — Save git status, recent commits, and project reminders before compaction.
43
+ - **SessionStart** on `compact` — Re-inject critical context after compaction.
44
+
45
+ ---
46
+
47
+ ## Team Size & Structure
48
+
49
+ ### Keep teams small
50
+
51
+ Limit teams to **2-3 agents**. More agents increase the risk of the tmux `send-keys` race condition (#23615) and create more coordination overhead than they save in parallel work.
52
+
53
+ ```
54
+ Good: 2 agents — backend + frontend
55
+ Good: 3 agents — backend + frontend + tests
56
+ Risky: 4+ agents — race conditions, context pressure
57
+ ```
58
+
59
+ ### Assign different files to each agent
60
+
61
+ File conflicts are the #1 source of wasted work in agent teams. If two agents edit the same file, one will overwrite the other. Always partition work by file ownership:
62
+
63
+ ```
64
+ Agent 1 (backend): src/api/, src/services/
65
+ Agent 2 (frontend): apps/web/src/
66
+ Agent 3 (tests): src/tests/, *.test.ts
67
+ ```
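As a minimal sketch of how the partition above could be checked mechanically, the function below maps a path to the agent that owns it. The name `owner_of` is hypothetical and not part of shipwright; the path prefixes are the ones from the example, and test files are matched first so that a `*.test.ts` file under `src/api/` still belongs to the tests agent.

```shell
# owner_of PATH: print which agent owns PATH under the example partition.
# Hypothetical helper for illustration; adjust the patterns to your repo.
owner_of() {
  case "$1" in
    src/tests/*|*.test.ts)    echo "tests" ;;     # checked first on purpose
    src/api/*|src/services/*) echo "backend" ;;
    apps/web/src/*)           echo "frontend" ;;
    *)                        echo "unowned" ;;   # nobody is assigned this path
  esac
}
```

One possible use is to run it over the output of `git diff --name-only` before a task is marked complete and flag any file whose owner is not the agent that changed it.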
68
+
69
+ ### Use git worktrees for complete isolation
70
+
71
+ For maximum safety, use [git worktrees](https://git-scm.com/docs/git-worktree) so each agent works in its own copy of the repo:
72
+
73
+ ```bash
74
+ # Create worktrees for each agent
75
+ git worktree add ../project-backend feature/backend
76
+ git worktree add ../project-frontend feature/frontend
77
+ git worktree add ../project-tests feature/tests
78
+
79
+ # Each agent works in its own directory — zero conflict risk
80
+ ```
81
+
82
+ ---
83
+
84
+ ## Agent Configuration
85
+
86
+ ### Use `delegate` mode for maximum autonomy
87
+
88
+ When you trust the agents to work independently (e.g., they have clear, well-scoped tasks), use `delegate` mode to minimize permission prompts:
89
+
90
+ ```bash
91
+ # In your Claude Code launch or CLAUDE.md
92
+ # "mode": "delegate" gives agents more autonomy
93
+ ```
94
+
95
+ ### Use haiku for subagent lookups
96
+
97
+ Subagents (spawned via the `Task` tool) don't need a powerful model for simple file searches and code lookups. Save money and latency:
98
+
99
+ ```json
100
+ {
101
+ "env": {
102
+ "CLAUDE_CODE_SUBAGENT_MODEL": "haiku"
103
+ }
104
+ }
105
+ ```
106
+
107
+ ### Prevent context overflow
108
+
109
+ Agent teams burn through context faster than solo sessions. Set aggressive auto-compact:
110
+
111
+ ```json
112
+ {
113
+ "env": {
114
+ "CLAUDE_CODE_AUTOCOMPACT_PCT_OVERRIDE": "70"
115
+ }
116
+ }
117
+ ```
118
+
119
+ This compacts the conversation when it hits 70% of the context window (default is 80%).
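To make the percentages concrete, here is the arithmetic as a small sketch. The 200,000-token window is an assumption for illustration only (the real window depends on the model), and `compact_threshold` is a hypothetical helper, not a Claude Code or shipwright command.

```shell
# compact_threshold WINDOW_TOKENS PCT
# Print the token count at which compaction would start, taking the
# percentage at face value. WINDOW_TOKENS is an assumed window size.
compact_threshold() {
  echo $(( $1 * $2 / 100 ))
}

compact_threshold 200000 70   # prints 140000
compact_threshold 200000 80   # prints 160000
```

Under that assumption, lowering the setting from 80 to 70 starts compaction 20,000 tokens earlier.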
120
+
121
+ ---
122
+
123
+ ## Monitoring & Management
124
+
125
+ ### Watch all agents at once
126
+
127
+ Use `shipwright status` to see a dashboard of running team sessions (`shipwright` is also available under the aliases `cct` and `sw`):
128
+
129
+ ```bash
130
+ shipwright status
131
+ ```
132
+
133
+ Or press `prefix + Ctrl-t` in tmux to show the dashboard inline.
134
+
135
+ ### Zoom into a single agent
136
+
137
+ Press `prefix + G` to toggle zoom on the current pane. This makes one agent fill the entire terminal — useful for reading long output. Press again to return to the tiled layout.
138
+
139
+ ### Synchronized input
140
+
141
+ Press `prefix + Alt-t` to toggle synchronized panes. When enabled, anything you type goes to ALL panes simultaneously. Useful for:
142
+ - Stopping all agents at once (`Ctrl-C` in all panes)
143
+ - Running the same command in all agent directories
144
+
145
+ **Remember to turn it off** when you're done — otherwise your input goes everywhere.
146
+
147
+ ### Capture pane contents
148
+
149
+ Press `prefix + Alt-s` to save the current pane's visible content to a file in `/tmp/`. Useful for debugging agent output after the fact.
150
+
151
+ ---
152
+
153
+ ## Hook Patterns
154
+
155
+ ### Quality gates
156
+
157
+ The included `teammate-idle.sh` hook blocks agents from going idle until TypeScript errors are fixed. You can extend this pattern for other checks:
158
+
159
+ ```bash
160
+ #!/usr/bin/env bash
161
+ # Example: lint check on idle (assumes a find_project_root helper is defined or sourced)
162
+ cd "$(find_project_root)" || exit 0
163
+ pnpm lint 2>&1 || {
164
+ echo "::error::Lint errors found. Fix them before going idle." >&2
165
+ exit 2
166
+ }
167
+ exit 0
168
+ ```
169
+
170
+ ### Notification sounds
171
+
172
+ Play a sound when an agent completes a task (macOS):
173
+
174
+ ```bash
175
+ #!/usr/bin/env bash
176
+ # task-completed.sh
177
+ afplay /System/Library/Sounds/Glass.aiff &
178
+ exit 0
179
+ ```
180
+
181
+ ### Auto-format on task completion
182
+
183
+ Run a formatter when agents complete work:
184
+
185
+ ```bash
186
+ #!/usr/bin/env bash
187
+ # task-completed.sh
188
+ cd "$(find_project_root)" || exit 0
189
+ pnpm format --write 2>&1
190
+ exit 0
191
+ ```
192
+
193
+ ---
194
+
195
+ ## Task Design
196
+
197
+ ### Write focused task descriptions
198
+
199
+ Vague tasks lead to wasted context and unfocused work. Compare:
200
+
201
+ ```
202
+ Bad: "Improve the authentication system"
203
+ Good: "Add rate limiting to POST /api/auth/login — max 5 attempts per
204
+ IP per minute. Add the rate limiter in src/api/middleware/
205
+ and tests in src/tests/auth-rate-limit.test.ts"
206
+ ```
207
+
208
+ ### 5-6 tasks per agent is the sweet spot
209
+
210
+ Too few tasks = agent finishes early and sits idle. Too many = context pressure and loss of focus.
211
+
212
+ ### Put dependencies first
213
+
214
+ When creating task lists, order tasks so that prerequisite and independent work comes first and the tasks that depend on it come later. The team lead should assign blocked tasks only after their dependencies are complete.
215
+
216
+ ---
217
+
218
+ ## tmux Session Management
219
+
220
+ ### Named sessions
221
+
222
+ Always use named sessions so you can find them later:
223
+
224
+ ```bash
225
+ tmux new -s my-feature # Not just "tmux new"
226
+ ```
227
+
228
+ ### Detach and reattach
229
+
230
+ You can detach from a session (`prefix + d`) and agents keep running. Reattach later:
231
+
232
+ ```bash
233
+ tmux attach -t my-feature
234
+ ```
235
+
236
+ ### Clean up orphaned sessions
237
+
238
+ After a team finishes, clean up leftover tmux sessions and panes:
239
+
240
+ ```bash
241
+ shipwright cleanup # Dry-run: shows what would be killed
242
+ shipwright cleanup --force # Actually kills orphaned sessions
243
+ ```
244
+
245
+ ---
246
+
247
+ ## Environment Variables Reference
248
+
249
+ | Variable | Default | What it does |
250
+ |----------|---------|--------------|
251
+ | `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` | — | **Required.** Enables agent teams feature |
252
+ | `CLAUDE_CODE_SUBAGENT_MODEL` | (parent model) | Model for subagent lookups. Set to `"haiku"` to save money |
253
+ | `CLAUDE_CODE_AUTOCOMPACT_PCT_OVERRIDE` | `"80"` | Context compaction threshold. Lower = more aggressive |
254
+ | `CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY` | `"3"` | Parallel tool calls per agent. Higher = faster but more API usage |
255
+ | `CLAUDE_CODE_GLOB_HIDDEN` | — | Include dotfiles in glob searches |
256
+ | `CLAUDE_CODE_BASH_MAINTAIN_PROJECT_WORKING_DIR` | — | Keep bash cwd consistent across tool calls |
257
+ | `CLAUDE_CODE_EMIT_TOOL_USE_SUMMARIES` | — | Show tool use summaries in output |
258
+ | `CLAUDE_CODE_TST_NAMES_IN_MESSAGES` | — | Show teammate names in messages |
259
+ | `CLAUDE_CODE_EAGER_FLUSH` | — | Flush output eagerly (reduces perceived latency) |
260
+
261
+ ---
262
+
263
+ ## Wave-Style Iteration
264
+
265
+ For complex, multi-step tasks, use **wave patterns** — iterative cycles of parallel agent work followed by synthesis. See the full pattern guides in [docs/patterns/](patterns/).
266
+
267
+ ### The Wave Cycle
268
+
269
+ Each wave follows four steps:
270
+
271
+ 1. **Assess** — Read agent outputs from the previous wave. What succeeded? What failed?
272
+ 2. **Decompose** — What work remains? What can run in parallel?
273
+ 3. **Spawn** — Launch agents in separate tmux panes for each independent task
274
+ 4. **Synthesize** — Gather results, update the state file, plan the next wave
275
+
276
+ Repeat until done. Set a reasonable wave limit (5-10 for most tasks).
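The cycle and its wave limit can be sketched as a plain shell loop. This is only an outline under stated assumptions: `run_wave` and `goal_met` are hypothetical placeholders for your own spawn and assessment steps (here stubbed so that the goal counts as met on wave 3), and `run_waves` is not a shipwright command.

```shell
# Hypothetical placeholders: replace with your own spawn and check logic.
run_wave() { echo "running wave $1"; }
goal_met() { [ "$1" -ge 3 ]; }   # stub: pretend the goal is met on wave 3

# run_waves MAX_WAVES: repeat waves until goal_met succeeds or the limit is hit.
run_waves() {
  rw_max="$1"
  rw_wave=1
  while [ "$rw_wave" -le "$rw_max" ]; do
    run_wave "$rw_wave"                 # Spawn (Assess and Decompose happen in the lead's prompt)
    if goal_met "$rw_wave"; then        # Synthesize: decide whether to stop
      echo "done after $rw_wave waves"
      return 0
    fi
    rw_wave=$((rw_wave + 1))
  done
  echo "stopped at wave limit $rw_max"
  return 1
}
```

With the stubs above, `run_waves 5` stops after the third wave, while `run_waves 2` reaches the limit and returns a non-zero status.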
277
+
278
+ ### File-Based State
279
+
280
+ Track progress through a markdown state file instead of keeping everything in agent memory. This survives compactions, context resets, and lets any agent pick up where others left off.
281
+
282
+ **State file:** `.claude/team-state.local.md`
283
+
284
+ ```markdown
285
+ ---
286
+ wave: 2
287
+ status: in_progress
288
+ goal: "Build user auth with JWT"
289
+ started_at: 2026-02-07T10:00:00Z
290
+ ---
291
+
292
+ ## Completed
293
+ - [x] Scanned existing auth patterns
294
+ - [x] Built User model
295
+
296
+ ## In Progress
297
+ - [ ] JWT route handlers
298
+ - [ ] React login components
299
+
300
+ ## Blocked
301
+ - Integration tests blocked on route completion
302
+ ```
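Because the state file is plain Markdown with a small front-matter header, it can be read with standard tools. The sketch below assumes the layout shown above (a `---` delimited header of `key: value` lines, then `- [ ]` checklist items); the helper names `state_field` and `open_tasks` are hypothetical and not part of shipwright.

```shell
# state_field KEY FILE: print the value of KEY from the front matter.
state_field() {
  awk -v key="$1" '
    /^---[[:space:]]*$/ { fence++; next }   # count the --- delimiters
    fence == 1 {
      split($0, kv, ":")
      if (kv[1] == key) {
        sub(/^[^:]*:[[:space:]]*/, "")      # drop "key: "
        gsub(/^"|"$/, "")                   # drop surrounding quotes
        print
        exit
      }
    }
    fence >= 2 { exit }                     # past the header, stop reading
  ' "$2"
}

# open_tasks FILE: count unchecked "- [ ]" items.
open_tasks() {
  grep -c '^- \[ \]' "$1"
}
```

For the example file above, `state_field wave` would print the wave number and `open_tasks` would count the two items under "In Progress".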
303
+
304
+ **Agent outputs:** `.claude/team-outputs/*.md`
305
+
306
+ Each agent writes findings/results to a file in this directory. The team lead reads all outputs between waves.
307
+
308
+ **Add to `.gitignore`:**
309
+ ```
310
+ .claude/team-state.local.md
311
+ .claude/team-outputs/
312
+ ```
313
+
314
+ ### When to Use Waves vs. Single-Pass Teams
315
+
316
+ | Situation | Approach |
317
+ |-----------|----------|
318
+ | Independent tasks with clear file ownership | Single-pass team — spawn agents, collect results |
319
+ | Tasks that require iteration (tests must pass, errors must be fixed) | Wave pattern — iterate until completion criteria met |
320
+ | Exploratory work that builds on previous findings | Wave pattern — each wave goes deeper based on last wave's results |
321
+ | Simple parallel review (code quality + security + tests) | Single-pass team — each reviewer works independently |
322
+
323
+ ### Quick Reference: Five Wave Patterns
324
+
325
+ | Pattern | Waves | Agents | Best For |
326
+ |---------|-------|--------|----------|
327
+ | [Feature Implementation](patterns/feature-implementation.md) | 3-4 | 2-3 | Multi-component features |
328
+ | [Research & Exploration](patterns/research-exploration.md) | 2-3 | 2-3 | Understanding codebases |
329
+ | [Test Generation](patterns/test-generation.md) | 3-4+ | 2-3 | Coverage campaigns |
330
+ | [Refactoring](patterns/refactoring.md) | 3-4 | 2 | Large-scale transformations |
331
+ | [Bug Hunt](patterns/bug-hunt.md) | 3-4 | 2-3 | Complex, elusive bugs |
package/docs/definition-of-done.example.md ADDED
@@ -0,0 +1,16 @@
1
+ # Definition of Done
2
+
3
+ Use this template with `shipwright loop --definition-of-done <file>` to enforce completion criteria.
4
+ Copy and customize for your project.
5
+
6
+ ## Checklist
7
+
8
+ - [ ] All specified functionality is implemented
9
+ - [ ] Unit tests exist for new code
10
+ - [ ] All tests pass
11
+ - [ ] No TODO/FIXME markers in new code
12
+ - [ ] Public functions have docstrings/comments
13
+ - [ ] README reflects current state
14
+ - [ ] No hardcoded values that should be configurable
15
+ - [ ] Error handling covers likely failure modes
16
+ - [ ] Code follows existing patterns in the codebase
package/docs/patterns/README.md ADDED
@@ -0,0 +1,139 @@
1
+ # Wave-Style Team Patterns
2
+
3
+ Structured patterns for running Claude Code Agent Teams in tmux using iterative, parallel "waves" of work.
4
+
5
+ ---
6
+
7
+ ## What Are Wave Patterns?
8
+
9
+ A **wave** is a cycle of parallel work followed by synthesis. Instead of one agent grinding through a task sequentially, you decompose work into independent chunks, assign them to agents in separate tmux panes, and iterate until done.
10
+
11
+ ```
12
+ Wave 1: Research  Wave 2: Build            Wave 3: Integrate
13
+ ┌─────┬─────┐     ┌─────┬──────┬─────┐     ┌─────┬─────┐
14
+ │ A1  │ A2  │  →  │ A1  │  A2  │ A3  │  →  │ A1  │ A2  │
15
+ │scan │scan │     │model│routes│ UI  │     │wire │tests│
16
+ └─────┴─────┘     └─────┴──────┴─────┘     └─────┴─────┘
17
+   ↓ synthesize      ↓ synthesize             ↓ done
18
+ ```
19
+
20
+ Each wave:
21
+ 1. **Assess** — What did the previous wave accomplish? What failed?
22
+ 2. **Decompose** — What can be done in parallel now?
23
+ 3. **Spawn** — Launch agents in tmux panes for each independent task
24
+ 4. **Synthesize** — Gather results, update state, plan next wave
25
+
26
+ ---
27
+
28
+ ## Available Patterns
29
+
30
+ | Pattern | When to Use | Typical Waves | Team Size |
31
+ |---------|-------------|---------------|-----------|
32
+ | [Feature Implementation](feature-implementation.md) | Building multi-component features | 3-4 | 2-3 agents |
33
+ | [Research & Exploration](research-exploration.md) | Understanding a codebase or problem space | 2-3 | 2-3 agents |
34
+ | [Test Generation](test-generation.md) | Comprehensive test coverage campaigns | 3-4+ | 2-3 agents |
35
+ | [Refactoring](refactoring.md) | Large-scale code transformations | 3-4 | 2 agents |
36
+ | [Bug Hunt](bug-hunt.md) | Tracking down complex, elusive bugs | 3-4 | 2-3 agents |
37
+
38
+ ---
39
+
40
+ ## File-Based State
41
+
42
+ Wave patterns use a **state file** to track progress across iterations. This works everywhere and requires no special tools.
43
+
44
+ **State file:** `.claude/team-state.local.md`
45
+
46
+ ```markdown
47
+ ---
48
+ wave: 2
49
+ status: in_progress
50
+ goal: "Build user auth with JWT"
51
+ started_at: 2026-02-07T10:00:00Z
52
+ ---
53
+
54
+ ## Completed
55
+ - [x] Scanned existing auth patterns
56
+ - [x] Identified middleware structure
57
+ - [x] Built User model
58
+
59
+ ## In Progress
60
+ - [ ] JWT route handlers
61
+ - [ ] Login/signup React components
62
+
63
+ ## Blocked
64
+ - None
65
+
66
+ ## Agent Outputs
67
+ - wave-1-scan-auth.md — Existing auth analysis
68
+ - wave-1-scan-deps.md — Dependency audit
69
+ - wave-2-model.md — User model implementation notes
70
+ ```
71
+
72
+ **Agent outputs directory:** `.claude/team-outputs/`
73
+
74
+ Each agent writes its results to a markdown file in this directory. The team lead reads all outputs between waves to synthesize progress.
75
+
76
+ > **Tip:** Add `.claude/team-state.local.md` and `.claude/team-outputs/` to your `.gitignore`. These are ephemeral working files.
77
+
78
+ ---
79
+
80
+ ## Quick Start
81
+
82
+ Pick a pattern, then use `shipwright` (alias: `cct`, `sw`) to set up the team:
83
+
84
+ ```bash
85
+ # Start a tmux session
86
+ tmux new -s my-feature
87
+
88
+ # Create a 3-agent team
89
+ shipwright session my-feature
90
+
91
+ # In the team lead pane, describe the work using a wave pattern
92
+ # The team lead decomposes into waves and assigns tasks
93
+ ```
94
+
95
+ ---
96
+
97
+ ## Key Principles
98
+
99
+ ### 1. Parallel Everything
100
+ If two tasks don't depend on each other, run them at the same time in separate panes. The whole point of waves is maximizing parallel throughput.
101
+
102
+ ### 2. Synthesize Between Waves
103
+ Don't just fire-and-forget. After each wave, the team lead reads all agent outputs, identifies gaps, and adjusts the plan. This is where the real value happens.
104
+
105
+ ### 3. Iterate Until Done
106
+ Waves repeat until the goal is met. Failed tasks get retried with better instructions. Each wave builds on the last. Set a reasonable max (5-10 waves for most tasks).
107
+
108
+ ### 4. File-Based State Is the Source of Truth
109
+ The `.claude/team-state.local.md` file tracks what's done, what's pending, and what's blocked. Agents update their output files; the team lead updates the state file.
110
+
111
+ ### 5. Keep Teams Small
112
+ Limit each team to 2-3 focused agents. More agents mean more tmux panes, more coordination overhead, and more risk of file conflicts.
113
+
114
+ ---
115
+
116
+ ## Anti-Patterns
117
+
118
+ | Don't | Why | Instead |
119
+ |-------|-----|---------|
120
+ | Spawn 5+ agents per wave | Coordination overhead, race conditions | 2-3 agents per wave |
121
+ | Skip synthesis between waves | You'll lose track of progress and duplicate work | Always read outputs and update state |
122
+ | Give vague task descriptions | Agents waste context figuring out what to do | Be specific: files, functions, acceptance criteria |
123
+ | Let agents touch overlapping files | One will overwrite the other's changes | Partition files by agent |
124
+ | Keep iterating when stuck | Wastes tokens and your time | After 3 failed attempts, rethink the approach |
125
+ | Use waves for trivial tasks | Overhead exceeds benefit | Just do it in a single agent |
126
+
127
+ ---
128
+
129
+ ## Model Selection
130
+
131
+ Choose the right model for each agent's task:
132
+
133
+ | Task Type | Model | Why |
134
+ |-----------|-------|-----|
135
+ | File search, simple lookups | `haiku` | Fast, cheap |
136
+ | Implementation, clear requirements | `sonnet` | Balanced speed/quality |
137
+ | Architecture decisions, complex debugging | `opus` | Best reasoning |
138
+ | Test generation | `sonnet` | Good pattern matching |
139
+ | Documentation, reports | `sonnet` | Clear writing |
package/docs/patterns/audit-loop.md ADDED
@@ -0,0 +1,149 @@
1
+ # Pattern: Audit Loop
2
+
3
+ Add self-reflection and quality gates to the continuous agent loop (`shipwright loop`) to prevent premature completion, catch regressions, and enforce project-specific standards.
4
+
5
+ ---
6
+
7
+ ## When to Use
8
+
9
+ - Running `shipwright loop` on tasks where **correctness matters more than speed** (production features, refactors, data migrations)
10
+ - The agent keeps declaring LOOP_COMPLETE **before the work is actually done**
11
+ - You want **automated quality checks** (tests, linting, type-checking) between iterations
12
+ - Your project has a **Definition of Done** that goes beyond "code compiles"
13
+
14
+ **Don't use** for quick prototypes, throwaway scripts, or exploration tasks where speed matters more than rigor.
15
+
16
+ ---
17
+
18
+ ## Audit Modes
19
+
20
+ ### `--audit` (Self-Reflection)
21
+
22
+ The agent pauses after each iteration to review its own work before deciding whether to continue or declare completion.
23
+
24
+ **Cost:** Minimal — adds ~30 seconds per iteration (one extra prompt to the same agent).
25
+
26
+ **Best for:** Solo agent work where you want a sanity check without the overhead of a second agent.
27
+
28
+ ```bash
29
+ shipwright loop "Build user auth with JWT" --audit --test-cmd "npm test"
30
+ ```
31
+
32
+ ### `--audit-agent` (Separate Auditor)
33
+
34
+ Spawns a dedicated auditor agent that reviews the work agent's output each iteration. The auditor can reject LOOP_COMPLETE and send the work agent back with specific feedback.
35
+
36
+ **Cost:** Higher — each iteration runs two agents (worker + auditor). Roughly 2x the API cost.
37
+
38
+ **Best for:** Complex features, production code, or tasks where you've seen the agent cut corners.
39
+
40
+ ```bash
41
+ shipwright loop "Refactor auth to use refresh tokens" --audit-agent --model sonnet
42
+ ```
43
+
44
+ ### `--quality-gates` (Automated Checks)
45
+
46
+ Runs your test command, linter, or type-checker between iterations. The loop only advances if gates pass.
47
+
48
+ **Cost:** Depends on your test suite. Adds wall-clock time but no extra API cost.
49
+
50
+ **Best for:** Projects with existing CI checks you want to enforce locally.
51
+
52
+ ```bash
53
+ shipwright loop "Add pagination to API" --quality-gates --test-cmd "npm test && npm run lint"
54
+ ```
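The idea of a gate can be illustrated with a few lines of shell. This is a sketch of the concept, not how `shipwright loop` implements `--quality-gates`; `quality_gate` is a hypothetical name, and the command string stands in for whatever you pass to `--test-cmd`.

```shell
# quality_gate CMD: run CMD in a child shell and report pass or fail.
# Returns non-zero on failure so a caller can decide to iterate again.
quality_gate() {
  if sh -c "$1" >/dev/null 2>&1; then
    echo "gate passed"
  else
    echo "gate failed"
    return 1
  fi
}
```

A caller would advance to the next step only when, for example, `quality_gate "npm test && npm run lint"` returns zero.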
55
+
56
+ ### Combining Modes
57
+
58
+ Modes stack. The most rigorous setup combines all three:
59
+
60
+ ```bash
61
+ shipwright loop "Build payment integration" \
62
+ --audit-agent \
63
+ --quality-gates \
64
+ --test-cmd "npm test" \
65
+ --definition-of-done dod.md
66
+ ```
67
+
68
+ ---
69
+
70
+ ## Writing Effective Definition of Done Files
71
+
72
+ A good DoD file is the single most effective way to prevent premature LOOP_COMPLETE.
73
+
74
+ ### Template
75
+
76
+ ```bash
77
+ cp ~/.shipwright/templates/definition-of-done.example.md my-dod.md
78
+ ```
79
+
80
+ ### Tips
81
+
82
+ - **Be specific.** "Tests pass" is weak. "Unit tests cover the 3 API endpoints and the auth middleware" is strong.
83
+ - **Include negative checks.** "No hardcoded API keys" or "No TODO markers" catch things agents skip.
84
+ - **Keep it short.** 8-15 items. More than that and the agent loses focus.
85
+ - **Order by importance.** The agent checks items top-to-bottom. Put critical items first.
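The length guideline above is easy to check automatically. The sketch below counts Markdown checklist items in a DoD file and compares the count against the 8-15 range; the helper names are hypothetical and nothing here is built into shipwright.

```shell
# dod_item_count FILE: count checklist items ("- [ ]" or "- [x]").
dod_item_count() {
  grep -c -E '^[[:space:]]*- \[[ xX]\]' "$1"
}

# dod_size_check FILE: report whether the item count is within 8-15.
dod_size_check() {
  dod_n=$(dod_item_count "$1" || true)   # grep exits 1 on zero matches
  if [ "$dod_n" -lt 8 ]; then
    echo "short: $dod_n items"
  elif [ "$dod_n" -gt 15 ]; then
    echo "long: $dod_n items"
  else
    echo "ok: $dod_n items"
  fi
}
```

Running `dod_size_check my-dod.md` before starting a loop gives a quick signal that the list has drifted outside the range.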
86
+
87
+ ### Example: Feature DoD
88
+
89
+ ```markdown
90
+ # Definition of Done — Payment Integration
91
+
92
+ - [ ] Stripe webhook handler processes charge.succeeded and charge.failed
93
+ - [ ] Idempotency keys prevent duplicate charges
94
+ - [ ] Unit tests cover success, failure, and duplicate scenarios
95
+ - [ ] Integration test hits Stripe test mode
96
+ - [ ] All amounts stored as cents (integer), never floats
97
+ - [ ] No Stripe secret keys in source code
98
+ - [ ] Error responses follow existing API error format
99
+ ```
100
+
101
+ ---
102
+
103
+ ## Preventing Premature LOOP_COMPLETE
104
+
105
+ The most common failure mode is the agent declaring victory too early. Countermeasures:
106
+
107
+ | Technique | How it helps |
108
+ |-----------|-------------|
109
+ | `--audit` | Agent re-reads its own output and catches obvious gaps |
110
+ | `--audit-agent` | Second opinion catches blind spots the worker has |
111
+ | `--definition-of-done` | Explicit checklist the agent must verify before completing |
112
+ | `--quality-gates` | Hard gate — tests must pass or the loop continues |
113
+ | `--test-cmd` | Even without quality gates, a test command gives the agent feedback |
114
+ | `--max-iterations` | Safety net — prevents infinite loops if nothing else works |
115
+
116
+ **Pro tip:** If the agent still completes too early, make your goal statement more specific. "Build auth" is vague. "Build JWT auth with login, signup, password reset, and refresh token rotation" gives the agent a clear finish line.
117
+
118
+ ---
119
+
120
+ ## Example Commands
121
+
122
+ ```bash
123
+ # Quick audit for a small task
124
+ shipwright loop "Fix the N+1 query in user list" --audit --test-cmd "pytest tests/test_users.py"
125
+
126
+ # Rigorous audit for production feature
127
+ shipwright loop "Add RBAC to the API" --audit-agent --quality-gates \
128
+ --test-cmd "npm test" --definition-of-done rbac-dod.md
129
+
130
+ # Cost-conscious: quality gates only, no extra agent
131
+ shipwright loop "Migrate DB schema" --quality-gates --test-cmd "npm run db:test"
132
+
133
+ # Maximum rigor: all checks enabled
134
+ shipwright loop "PCI compliance updates" --audit-agent --quality-gates \
135
+ --test-cmd "npm test && npm run lint && npm run typecheck" \
136
+ --definition-of-done pci-dod.md --max-iterations 15
137
+ ```
138
+
139
+ ---
140
+
141
+ ## Anti-Patterns
142
+
143
+ | Don't | Why |
144
+ |-------|-----|
145
+ | Use `--audit-agent` for trivial tasks | 2x cost for a one-file fix is wasteful |
146
+ | Write a 30-item DoD | The agent loses focus. Keep it under 15 items |
147
+ | Skip `--test-cmd` with `--quality-gates` | Quality gates with no test command do nothing useful |
148
+ | Set `--max-iterations 1` with `--audit` | The audit has nowhere to send feedback if there's only one iteration |
149
+ | Rely solely on `--audit` for critical work | Self-reflection catches ~60% of issues. Add `--quality-gates` for the rest |