anvil-dev-framework 0.1.6

Files changed (190)
  1. package/README.md +719 -0
  2. package/VERSION +1 -0
  3. package/docs/ANVIL-REPO-IMPLEMENTATION-PLAN.md +441 -0
  4. package/docs/FIRST-SKILL-TUTORIAL.md +408 -0
  5. package/docs/INSTALLATION-RETRO-NOTES.md +458 -0
  6. package/docs/INSTALLATION.md +984 -0
  7. package/docs/anvil-hud.md +469 -0
  8. package/docs/anvil-init.md +255 -0
  9. package/docs/anvil-state.md +210 -0
  10. package/docs/boris-cherny-ralph-wiggum-insights.md +608 -0
  11. package/docs/command-reference.md +2022 -0
  12. package/docs/hooks-tts.md +368 -0
  13. package/docs/implementation-guide.md +810 -0
  14. package/docs/linear-github-integration.md +247 -0
  15. package/docs/local-issues.md +677 -0
  16. package/docs/patterns/README.md +419 -0
  17. package/docs/planning-responsibilities.md +139 -0
  18. package/docs/session-workflow.md +573 -0
  19. package/docs/simplification-plan-template.md +297 -0
  20. package/docs/simplification-principles.md +129 -0
  21. package/docs/specifications/CCS-RALPH-INTEGRATION-DESIGN.md +633 -0
  22. package/docs/specifications/CCS-RESEARCH-REPORT.md +169 -0
  23. package/docs/specifications/PLAN-ANV-verification-ralph-wiggum.md +403 -0
  24. package/docs/specifications/PLAN-parallel-tracks-anvil-memory-ccs.md +494 -0
  25. package/docs/specifications/SPEC-ANV-VRW/component-01-verify.md +208 -0
  26. package/docs/specifications/SPEC-ANV-VRW/component-02-stop-gate.md +226 -0
  27. package/docs/specifications/SPEC-ANV-VRW/component-03-posttooluse.md +209 -0
  28. package/docs/specifications/SPEC-ANV-VRW/component-04-ralph-wiggum.md +604 -0
  29. package/docs/specifications/SPEC-ANV-VRW/component-05-atomic-actions.md +311 -0
  30. package/docs/specifications/SPEC-ANV-VRW/component-06-verify-subagent.md +264 -0
  31. package/docs/specifications/SPEC-ANV-VRW/component-07-claude-md.md +363 -0
  32. package/docs/specifications/SPEC-ANV-VRW/index.md +182 -0
  33. package/docs/specifications/SPEC-ANV-anvil-memory.md +573 -0
  34. package/docs/specifications/SPEC-ANV-context-checkpoints.md +781 -0
  35. package/docs/specifications/SPEC-ANV-verification-ralph-wiggum.md +789 -0
  36. package/docs/sync.md +122 -0
  37. package/global/CLAUDE.md +140 -0
  38. package/global/agents/verify-app.md +164 -0
  39. package/global/commands/anvil-settings.md +527 -0
  40. package/global/commands/anvil-sync.md +121 -0
  41. package/global/commands/change.md +197 -0
  42. package/global/commands/clarify.md +252 -0
  43. package/global/commands/cleanup.md +292 -0
  44. package/global/commands/commit-push-pr.md +207 -0
  45. package/global/commands/decay-review.md +127 -0
  46. package/global/commands/discover.md +158 -0
  47. package/global/commands/doc-coverage.md +122 -0
  48. package/global/commands/evidence.md +307 -0
  49. package/global/commands/explore.md +121 -0
  50. package/global/commands/force-exit.md +135 -0
  51. package/global/commands/handoff.md +191 -0
  52. package/global/commands/healthcheck.md +302 -0
  53. package/global/commands/hud.md +84 -0
  54. package/global/commands/insights.md +319 -0
  55. package/global/commands/linear-setup.md +184 -0
  56. package/global/commands/lint-fix.md +198 -0
  57. package/global/commands/orient.md +510 -0
  58. package/global/commands/plan.md +228 -0
  59. package/global/commands/ralph.md +346 -0
  60. package/global/commands/ready.md +182 -0
  61. package/global/commands/release.md +305 -0
  62. package/global/commands/retro.md +96 -0
  63. package/global/commands/shard.md +166 -0
  64. package/global/commands/spec.md +227 -0
  65. package/global/commands/sprint.md +184 -0
  66. package/global/commands/tasks.md +228 -0
  67. package/global/commands/test-and-commit.md +151 -0
  68. package/global/commands/validate.md +132 -0
  69. package/global/commands/verify.md +251 -0
  70. package/global/commands/weekly-review.md +156 -0
  71. package/global/hooks/__pycache__/ralph_context_monitor.cpython-314.pyc +0 -0
  72. package/global/hooks/__pycache__/statusline_agent_sync.cpython-314.pyc +0 -0
  73. package/global/hooks/anvil_memory_observe.ts +322 -0
  74. package/global/hooks/anvil_memory_session.ts +166 -0
  75. package/global/hooks/anvil_memory_stop.ts +187 -0
  76. package/global/hooks/parse_transcript.py +116 -0
  77. package/global/hooks/post_merge_cleanup.sh +132 -0
  78. package/global/hooks/post_tool_format.sh +215 -0
  79. package/global/hooks/ralph_context_monitor.py +240 -0
  80. package/global/hooks/ralph_stop.sh +502 -0
  81. package/global/hooks/statusline.sh +1110 -0
  82. package/global/hooks/statusline_agent_sync.py +224 -0
  83. package/global/hooks/stop_gate.sh +250 -0
  84. package/global/lib/.claude/anvil-state.json +21 -0
  85. package/global/lib/__pycache__/agent_registry.cpython-314.pyc +0 -0
  86. package/global/lib/__pycache__/claim_service.cpython-314.pyc +0 -0
  87. package/global/lib/__pycache__/coderabbit_service.cpython-314.pyc +0 -0
  88. package/global/lib/__pycache__/config_service.cpython-314.pyc +0 -0
  89. package/global/lib/__pycache__/coordination_service.cpython-314.pyc +0 -0
  90. package/global/lib/__pycache__/doc_coverage_service.cpython-314.pyc +0 -0
  91. package/global/lib/__pycache__/gate_logger.cpython-314.pyc +0 -0
  92. package/global/lib/__pycache__/github_service.cpython-314.pyc +0 -0
  93. package/global/lib/__pycache__/hygiene_service.cpython-314.pyc +0 -0
  94. package/global/lib/__pycache__/issue_models.cpython-314.pyc +0 -0
  95. package/global/lib/__pycache__/issue_provider.cpython-314.pyc +0 -0
  96. package/global/lib/__pycache__/linear_data_service.cpython-314.pyc +0 -0
  97. package/global/lib/__pycache__/linear_provider.cpython-314.pyc +0 -0
  98. package/global/lib/__pycache__/local_provider.cpython-314.pyc +0 -0
  99. package/global/lib/__pycache__/quality_service.cpython-314.pyc +0 -0
  100. package/global/lib/__pycache__/ralph_state.cpython-314.pyc +0 -0
  101. package/global/lib/__pycache__/state_manager.cpython-314.pyc +0 -0
  102. package/global/lib/__pycache__/transcript_parser.cpython-314.pyc +0 -0
  103. package/global/lib/__pycache__/verification_runner.cpython-314.pyc +0 -0
  104. package/global/lib/__pycache__/verify_iteration.cpython-314.pyc +0 -0
  105. package/global/lib/__pycache__/verify_subagent.cpython-314.pyc +0 -0
  106. package/global/lib/agent_registry.py +995 -0
  107. package/global/lib/anvil-state.sh +435 -0
  108. package/global/lib/claim_service.py +515 -0
  109. package/global/lib/coderabbit_service.py +314 -0
  110. package/global/lib/config_service.py +423 -0
  111. package/global/lib/coordination_service.py +331 -0
  112. package/global/lib/doc_coverage_service.py +1305 -0
  113. package/global/lib/gate_logger.py +316 -0
  114. package/global/lib/github_service.py +310 -0
  115. package/global/lib/handoff_generator.py +775 -0
  116. package/global/lib/hygiene_service.py +712 -0
  117. package/global/lib/issue_models.py +257 -0
  118. package/global/lib/issue_provider.py +339 -0
  119. package/global/lib/linear_data_service.py +210 -0
  120. package/global/lib/linear_provider.py +987 -0
  121. package/global/lib/linear_provider.py.backup +671 -0
  122. package/global/lib/local_provider.py +486 -0
  123. package/global/lib/orient_fast.py +457 -0
  124. package/global/lib/quality_service.py +470 -0
  125. package/global/lib/ralph_prompt_generator.py +563 -0
  126. package/global/lib/ralph_state.py +1202 -0
  127. package/global/lib/state_manager.py +417 -0
  128. package/global/lib/transcript_parser.py +597 -0
  129. package/global/lib/verification_runner.py +557 -0
  130. package/global/lib/verify_iteration.py +490 -0
  131. package/global/lib/verify_subagent.py +250 -0
  132. package/global/skills/README.md +155 -0
  133. package/global/skills/quality-gates/SKILL.md +252 -0
  134. package/global/skills/skill-template/SKILL.md +109 -0
  135. package/global/skills/testing-strategies/SKILL.md +337 -0
  136. package/global/templates/CHANGE-template.md +105 -0
  137. package/global/templates/HANDOFF-template.md +63 -0
  138. package/global/templates/PLAN-template.md +111 -0
  139. package/global/templates/SPEC-template.md +93 -0
  140. package/global/templates/ralph/PROMPT.md.template +89 -0
  141. package/global/templates/ralph/fix_plan.md.template +31 -0
  142. package/global/templates/ralph/progress.txt.template +23 -0
  143. package/global/tests/__pycache__/test_doc_coverage.cpython-314.pyc +0 -0
  144. package/global/tests/test_doc_coverage.py +520 -0
  145. package/global/tests/test_issue_models.py +299 -0
  146. package/global/tests/test_local_provider.py +323 -0
  147. package/global/tools/README.md +178 -0
  148. package/global/tools/__pycache__/anvil-hud.cpython-314.pyc +0 -0
  149. package/global/tools/anvil-hud.py +3622 -0
  150. package/global/tools/anvil-hud.py.bak +3318 -0
  151. package/global/tools/anvil-issue.py +432 -0
  152. package/global/tools/anvil-memory/CLAUDE.md +49 -0
  153. package/global/tools/anvil-memory/README.md +42 -0
  154. package/global/tools/anvil-memory/bun.lock +25 -0
  155. package/global/tools/anvil-memory/bunfig.toml +9 -0
  156. package/global/tools/anvil-memory/package.json +23 -0
  157. package/global/tools/anvil-memory/src/__tests__/ccs/context-monitor.test.ts +535 -0
  158. package/global/tools/anvil-memory/src/__tests__/ccs/edge-cases.test.ts +645 -0
  159. package/global/tools/anvil-memory/src/__tests__/ccs/fixtures.ts +363 -0
  160. package/global/tools/anvil-memory/src/__tests__/ccs/index.ts +8 -0
  161. package/global/tools/anvil-memory/src/__tests__/ccs/integration.test.ts +417 -0
  162. package/global/tools/anvil-memory/src/__tests__/ccs/prompt-generator.test.ts +571 -0
  163. package/global/tools/anvil-memory/src/__tests__/ccs/ralph-stop.test.ts +440 -0
  164. package/global/tools/anvil-memory/src/__tests__/ccs/test-utils.ts +252 -0
  165. package/global/tools/anvil-memory/src/__tests__/commands.test.ts +657 -0
  166. package/global/tools/anvil-memory/src/__tests__/db.test.ts +641 -0
  167. package/global/tools/anvil-memory/src/__tests__/hooks.test.ts +272 -0
  168. package/global/tools/anvil-memory/src/__tests__/performance.test.ts +427 -0
  169. package/global/tools/anvil-memory/src/__tests__/test-utils.ts +113 -0
  170. package/global/tools/anvil-memory/src/commands/checkpoint.ts +197 -0
  171. package/global/tools/anvil-memory/src/commands/get.ts +115 -0
  172. package/global/tools/anvil-memory/src/commands/init.ts +94 -0
  173. package/global/tools/anvil-memory/src/commands/observe.ts +163 -0
  174. package/global/tools/anvil-memory/src/commands/search.ts +112 -0
  175. package/global/tools/anvil-memory/src/db.ts +638 -0
  176. package/global/tools/anvil-memory/src/index.ts +205 -0
  177. package/global/tools/anvil-memory/src/types.ts +122 -0
  178. package/global/tools/anvil-memory/tsconfig.json +29 -0
  179. package/global/tools/ralph-loop.sh +359 -0
  180. package/package.json +45 -0
  181. package/scripts/anvil +822 -0
  182. package/scripts/extract_patterns.py +222 -0
  183. package/scripts/init-project.sh +541 -0
  184. package/scripts/install.sh +229 -0
  185. package/scripts/postinstall.js +41 -0
  186. package/scripts/rollback.sh +188 -0
  187. package/scripts/sync.sh +623 -0
  188. package/scripts/test-statusline.sh +248 -0
  189. package/scripts/update_claude_md.py +224 -0
  190. package/scripts/verify.sh +255 -0
@@ -0,0 +1,608 @@
1
+ # Boris Cherny & Ralph Wiggum: Claude Code Best Practices Research
2
+
3
+ > Comprehensive analysis of Boris Cherny's (Claude Code creator) workflow and Geoffrey Huntley's Ralph Wiggum technique for integration into the Anvil framework.
4
+
5
+ **Research Date**: 2026-01-04
6
+ **Sources**:
7
+ - [Boris Cherny's Twitter Thread](https://twitter-thread.com/t/2007179832300581177)
8
+ - [DEV Community Breakdown](https://dev.to/sivarampg/how-the-creator-of-claude-code-uses-claude-code-a-complete-breakdown-4f07)
9
+ - [Anthropic: Claude Code Best Practices](https://www.anthropic.com/engineering/claude-code-best-practices)
10
+ - [nibzard: Boris Cherny's Guide](https://www.nibzard.com/claude-code/)
11
+ - [Geoffrey Huntley: Ralph Wiggum as a Software Engineer](https://ghuntley.com/ralph-wiggum/)
12
+ - [Ralph Wiggum Plugin](https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum)
13
+
14
+ ---
15
+
16
+ ## Table of Contents
17
+
18
+ - [Part 1: Boris Cherny's Claude Code Setup](#part-1-boris-chernys-claude-code-setup)
19
+ - [Part 2: Ralph Wiggum Technique](#part-2-ralph-wiggum-technique)
20
+ - [Part 3: Gap Analysis - Anvil vs Best Practices](#part-3-gap-analysis---anvil-vs-best-practices)
21
+ - [Part 4: Implementation Recommendations](#part-4-implementation-recommendations)
22
+
23
+ ---
24
+
25
+ ## Part 1: Boris Cherny's Claude Code Setup
26
+
27
+ Boris Cherny created Claude Code as a side project in September 2024. His setup is "surprisingly vanilla" - he says Claude Code works great out of the box.
28
+
29
+ ### Core Principles
30
+
31
+ | Principle | Details |
32
+ |-----------|---------|
33
+ | **Verification is #1** | "Give Claude a way to verify its work. It will 2-3x the quality of the final result" |
34
+ | **Opus over Sonnet** | Fewer iterations with stronger model beats rapid weak iterations |
35
+ | **Plan Mode First** | "Invest time upfront in planning, and save way more time in execution" |
36
+ | **Inner Loop Commands** | Every repeated workflow becomes a slash command |
37
+
38
+ ### Parallel Session Architecture
39
+
40
+ Boris runs **15+ concurrent Claude sessions**:
41
+ - 5 terminal instances with numbered tabs and system notifications
42
+ - 5-10 web sessions on claude.ai/code
43
+ - iOS app sessions started from phone, monitored on desktop
44
+ - Cross-platform handoff using `&` (background to web) and `--teleport <session-id>`
45
+
46
+ ### Model Selection
47
+
48
+ > "Sonnet (fast) → 5 iterations → Total time: 5 minutes
49
+ > Opus (slow) → 1 iteration → Total time: 2 minutes"
50
+
51
+ **Rationale**: Fewer iterations with stronger model beats rapid weak iterations. Opus provides better autonomous tool selection without constant steering.
52
+
53
+ ### Results
54
+
55
+ In thirty days, Boris landed **259 PRs** — 497 commits, 40k lines added, 38k lines removed. Every single line was written by Claude Code + Opus 4.5.
56
+
57
+ ### CLAUDE.md Structure
58
+
59
+ Boris maintains a team-shared CLAUDE.md file:
60
+ - **Checked into git**
61
+ - **Updated multiple times weekly** by team members
62
+ - When Claude makes errors, team adds entries so "Claude knows not to do it next time"
63
+ - Captures tribal knowledge not in official docs
64
+
65
+ **File Hierarchy**:
66
+ ```
67
+ /<enterprise root>/CLAUDE.md → Organizational policies
68
+ ~/.claude/CLAUDE.md → User-global context
69
+ <project>/CLAUDE.md → Project-specific (checked into git)
70
+ <project>/CLAUDE.local.md → Local overrides (not in git)
71
+ ```
72
+
73
+ **Best Practice**: Keep to 100-200 lines maximum. If larger, move details into per-folder CLAUDE.md files.
74
+
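The size guideline above can be checked mechanically. The following is a minimal sketch, assuming every file named `CLAUDE.md` under a project root should be measured; the function name `oversized_claude_files` and the 200-line default are illustrative choices, not part of Boris's setup or of Anvil.

```python
from pathlib import Path

MAX_LINES = 200  # upper bound taken from the guideline above


def oversized_claude_files(root: Path, max_lines: int = MAX_LINES) -> list[tuple[Path, int]]:
    """Return (path, line_count) for every CLAUDE.md under root that exceeds max_lines."""
    oversized = []
    for path in sorted(root.rglob("CLAUDE.md")):
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        if line_count > max_lines:
            oversized.append((path, line_count))
    return oversized
```

Any file this reports is a candidate for splitting into per-folder CLAUDE.md files.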
75
+ ### Automated Knowledge Extraction
76
+
77
+ Boris's team has a GitHub Action triggered by `@.claude` PR comments that:
78
+ 1. Analyzes pull requests
79
+ 2. Extracts patterns, anti-patterns, conventions
80
+ 3. Auto-commits CLAUDE.md updates
81
+ 4. Implements "compounding engineering" where knowledge accumulates
82
+
83
+ ### Slash Commands
84
+
85
+ Pre-configured workflows stored in `.claude/commands/`:
86
+
87
+ **`/commit-push-pr`**:
88
+ - Inline bash executes `git status`, `git branch`, `git log -5`
89
+ - Stages and commits with conventional format
90
+ - Creates PR with detailed description
91
+ - Requests appropriate reviews
92
+
93
+ **`/test-and-commit`**:
94
+ - Runs `npm test`, `npm run typecheck`, `npm run lint`
95
+ - Only commits if all checks pass
96
+ - Reports failures without committing
97
+
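The gating behaviour of `/test-and-commit` can be sketched in a few lines. This is an illustration only: the real command is a Markdown slash command, and `check_then_commit`, the injectable `run` callable, and the `git commit -am` invocation are assumptions made so the logic is easy to test.

```python
import subprocess
from typing import Callable, Sequence

# Check commands named in the text above.
CHECKS: list[list[str]] = [
    ["npm", "test"],
    ["npm", "run", "typecheck"],
    ["npm", "run", "lint"],
]

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def check_then_commit(message: str, run: Runner = run_command) -> list[str]:
    """Run every check; commit only if all pass. Returns the failed commands."""
    failed = [" ".join(cmd) for cmd in CHECKS if run(cmd) != 0]
    if not failed:
        run(["git", "commit", "-am", message])
    return failed
```

All checks run even after the first failure, so the report lists every failing step rather than only the first.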
98
+ ### Subagents
99
+
100
+ Domain-specific AI personalities with:
101
+ - Separate context windows (isolated from main)
102
+ - Custom system prompts
103
+ - Limited tool access
104
+
105
+ **Examples**:
106
+ - `code-simplifier` - Reduces complexity, extracts repeated logic
107
+ - `verify-app` - Automated verification post-completion
108
+
109
+ ### Hooks
110
+
111
+ **PostToolUse Hook for Formatting**:
112
+ ```json
113
+ {
114
+   "hooks": {
115
+     "PostToolUse": [
116
+       {
117
+         "matcher": "Edit|Write",
118
+         "hooks": [
119
+           {
120
+             "type": "command",
121
+             "command": "npx prettier --write \"$CLAUDE_FILE_PATH\""
122
+           }
123
+         ]
124
+       }
125
+     ]
126
+   }
127
+ }
128
+ ```
129
+
130
+ "Last 10% to avoid formatting errors" - catches issues before CI.
131
+
132
+ ### Stop Hook for Verification Gate
133
+
134
+ ```bash
135
+ #!/bin/bash
136
+ npm test
137
+ if [ $? -eq 0 ]; then
138
+   echo "✓ All tests passing"
139
+   exit 0
140
+ else
141
+   echo "✗ Tests failing - continue working" >&2 # stderr is fed back to Claude
142
+   exit 2 # Exit code 2 blocks the stop and forces continuation
143
+ fi
144
+ ```
145
+
146
+ Exit code 2 blocks Claude from stopping and feeds stderr back to it, forcing iteration until tests pass. Other non-zero exit codes are reported as non-blocking errors.
147
+
148
+ ### Verification Methods
149
+
150
+ Boris emphasizes verification as "probably the most important thing":
151
+
152
+ | Type | Implementation |
153
+ |------|----------------|
154
+ | **Command-line** | `npm test`, `npm run typecheck`, `npm run lint` |
155
+ | **Test Suite** | Unit, integration, e2e tests with >80% coverage |
156
+ | **Browser** | Chrome Extension opens localhost, tests forms, validates UX |
157
+ | **API** | curl tests for creation, validation, auth flows |
158
+ | **Pre-commit** | Hooks run lint, typecheck, test before each commit |
159
+
160
+ ### Workflow Patterns
161
+
162
+ **Explore, Plan, Code, Commit**:
163
+ 1. Request Claude read relevant files without writing code initially
164
+ 2. Ask Claude to create a detailed plan using thinking mode
165
+ 3. Have Claude implement solutions while verifying reasonableness
166
+ 4. Request commits and documentation updates
167
+
168
+ > "Steps #1-#2 are crucial—without them, Claude tends to jump straight to coding."
169
+
170
+ **Test-Driven Development (TDD)**:
171
+ 1. Request test cases based on input/output pairs
172
+ 2. Confirm tests fail initially
173
+ 3. Commit tests before implementation
174
+ 4. Have Claude code to pass tests with multiple iterations
175
+ 5. Verify implementation doesn't overfit
176
+ 6. Commit final code
177
+
178
+ **Visual Iteration**:
179
+ 1. Provide browser screenshots via Puppeteer MCP or drag-and-drop
180
+ 2. Supply design mocks as reference points
181
+ 3. Request Claude iterate 2-3 times for refinement
182
+ 4. Commit when satisfied
183
+
184
+ ### Permission Management
185
+
186
+ Instead of `--dangerously-skip-permissions`, use `/permissions` to pre-allow safe commands:
187
+ - git operations (status, diff, commit, push)
188
+ - npm scripts (test, build, lint)
189
+ - File operations (cat, ls, grep)
190
+
191
+ Stored in `.claude/settings.json` for team access.
192
+
193
+ ---
194
+
195
+ ## Part 2: Ralph Wiggum Technique
196
+
197
+ Geoffrey Huntley's Ralph Wiggum is a technique for long-running, unattended Claude Code execution. Named after the Simpsons character, it embodies persistent iteration despite setbacks.
198
+
199
+ ### Core Philosophy
200
+
201
+ > "The technique is deterministically bad in an undeterministic world."
202
+
203
+ Ralph is designed to handle the non-determinism of LLMs through iteration. It requires:
204
+ - Faith in eventual consistency
205
+ - Belief that most issues can be resolved through more loops
206
+ - Patience - "Ralph will test you"
207
+
208
+ ### The Basic Loop
209
+
210
+ In its purest form, Ralph is a Bash loop:
211
+
212
+ ```bash
213
+ while :; do cat PROMPT.md | npx --yes @sourcegraph/amp ; done
214
+ ```
215
+
216
+ Ralph can be done with any tool that doesn't cap tool calls and usage.
217
+
218
+ ### Key Insight: One Item Per Loop
219
+
220
+ > "To get good outcomes with Ralph, you need to ask Ralph to do one thing per loop. **Only one thing.**"
221
+
222
+ You may relax this as the project progresses, but if Ralph goes off the rails, narrow it back to just one item.
223
+
224
+ ### Context Window Management
225
+
226
+ > "The name of the game is that you only have approximately 170k of context window to work with. So it's essential to use as little of it as possible. The more you use the context window, the worse the outcomes you'll get."
227
+
228
+ **Solution**: Don't spend the primary context window on expensive work. Spawn subagents instead.
229
+
230
+ - Primary context window = **scheduler**
231
+ - Subagents = **workers** doing expensive operations
232
+ - Use up to **500 parallel subagents** for file searching/research
233
+ - Use only **1 subagent** for build/tests (running them across many subagents at once creates bad backpressure)
234
+
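The scheduler/worker split above can be sketched as a pool whose size depends on the kind of work. This is a rough analogy under stated assumptions: threads stand in for subagents, each task is a hypothetical callable returning a short summary string, and `fan_out` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_RESEARCH_WORKERS = 500  # fan-out limit quoted in the text
MAX_BUILD_WORKERS = 1       # builds and tests run one at a time


def fan_out(tasks: Iterable[Callable[[], str]], kind: str) -> list[str]:
    """Run tasks in a pool sized by kind ('research' or 'build') and collect summaries."""
    task_list = list(tasks)
    if not task_list:
        return []
    limit = MAX_BUILD_WORKERS if kind == "build" else MAX_RESEARCH_WORKERS
    workers = min(limit, len(task_list))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for task in task_list]
        # The scheduler only sees each worker's summary, in submission order.
        return [future.result() for future in futures]
```

The point of the design is that only the returned summaries reach the scheduler, so the expensive intermediate output never enters the primary context.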
235
+ ### The Three Phases
236
+
237
+ #### Phase 1: Requirements
238
+
239
+ - Have a long conversation about requirements
240
+ - Don't ask the agent to implement - just discuss what you're about to implement
241
+ - Once agent understands the task, write specifications out
242
+ - One file per topic in the specifications folder
243
+
244
+ #### Phase 2: TODO (Planning)
245
+
246
+ - Study specs to learn about requirements
247
+ - Manually allocate context window with discussion about the activity
248
+ - Goal: shape the context window, not do implementation
249
+ - Target the job-to-be-done with up to 100 subagents
250
+ - Write out an `IMPLEMENTATION_PLAN.md`
251
+
252
+ #### Phase 3: Implementation (Incremental Loop)
253
+
254
+ - Study specs and IMPLEMENTATION_PLAN.md
255
+ - Pick most important item to address
256
+ - Update implementation plan once tests pass
257
+ - Use as many subagents as needed for research
258
+ - Only use **one subagent** when doing tests/builds
259
+ - Commit on success
260
+
261
+ ### Key Files
262
+
263
+ | File | Purpose |
264
+ |------|---------|
265
+ | `PROMPT.md` | Main prompt fed each loop |
266
+ | `AGENT.md` | How Ralph should compile/run (self-improvement allowed) |
267
+ | `fix_plan.md` | Living TODO list sorted by priority |
268
+ | `specs/` | Specifications folder (one file per topic) |
269
+
270
+ ### Deterministic Stack Allocation
271
+
272
+ > "Deterministically allocate the stack the same way every loop."
273
+
274
+ Every loop should:
275
+ 1. Study specs/* to learn about specifications
276
+ 2. Note that the source code is in src/
277
+ 3. Study fix_plan.md
278
+ 4. Then do the task
279
+
280
+ This ensures consistent context regardless of iteration number.
281
+
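One way to get the same context on every iteration is to assemble the prompt from a fixed template. The sketch below assumes specs live as `*.md` files in a `specs/` folder under the project root; `build_loop_prompt` and the exact wording of each step are hypothetical.

```python
from pathlib import Path


def build_loop_prompt(root: Path, task: str) -> str:
    """Assemble the per-iteration prompt in a fixed order: specs, source location, plan, task."""
    # Sorting makes the spec order independent of filesystem listing order.
    spec_files = sorted(p.name for p in (root / "specs").glob("*.md"))
    sections = [
        "Study these specifications: " + ", ".join(f"specs/{name}" for name in spec_files),
        "The source code is in src/",
        "Study fix_plan.md",
        f"Task: {task}",
    ]
    return "\n".join(f"{i}. {text}" for i, text in enumerate(sections, start=1))
```

Because the spec list is sorted and the section order is hard-coded, two calls with the same inputs produce identical prompts.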
282
+ ### Backpressure
283
+
284
+ Backpressure = mechanisms that reject invalid code:
285
+ - Type systems
286
+ - Test suites
287
+ - Linters
288
+ - Security scanners
289
+ - Static analyzers
290
+
291
+ > "After implementing functionality or resolving problems, run the tests for that unit of code that was improved."
292
+
293
+ For dynamically typed languages, strongly recommend wiring in static analyzers (e.g., Pyrefly for Python).
294
+
295
+ ### No Cheating (Anti-Placeholder Pattern)
296
+
297
+ Claude has an inherent bias toward minimal/placeholder implementations. Counter this:
298
+
299
+ > "DO NOT IMPLEMENT PLACEHOLDER OR SIMPLE IMPLEMENTATIONS. WE WANT FULL IMPLEMENTATIONS. **DO IT OR I WILL YELL AT YOU**"
300
+
301
+ If Ralph ignores this, you can always run more Ralphs to identify placeholders and transform them into a todo list.
302
+
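The follow-up pass that turns placeholders into a todo list can be approximated with a plain text scan. This is a minimal sketch: the marker list in `PLACEHOLDER_PATTERN` is an assumption, and a real pass would likely use Ralph itself rather than a regular expression.

```python
import re

# Markers treated as placeholders; this list is an assumption, not from the source.
PLACEHOLDER_PATTERN = re.compile(
    r"\b(TODO|FIXME|placeholder|NotImplementedError)\b", re.IGNORECASE
)


def placeholders_to_todo(files: dict[str, str]) -> str:
    """Turn placeholder markers found in {path: source_text} into a Markdown bullet list."""
    items = []
    for path in sorted(files):
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            if PLACEHOLDER_PATTERN.search(line):
                items.append(f"- {path}:{lineno}: {line.strip()}")
    return "\n".join(items)
```

The resulting bullets could be appended to `fix_plan.md` so the next loop picks them up.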
303
+ ### Don't Assume It's Not Implemented
304
+
305
+ > "Before making changes search codebase (don't assume not implemented) using subagents. Think hard."
306
+
307
+ A common failure is when the LLM runs ripgrep and incorrectly concludes code hasn't been implemented. Erect a sign for Ralph: don't make assumptions.
308
+
309
+ ### Self-Learning
310
+
311
+ > "When you learn something new about how to run the compiler or examples make sure you update @AGENT.md using a subagent but keep it brief."
312
+
313
+ Allow Ralph to take himself to "university" - permit self-improvement of the AGENT.md file.
314
+
315
+ ### Commit on Success
316
+
317
+ > "When the tests pass update the fix_plan.md, then add changed code and @fix_plan.md with 'git add -A' via bash then do a 'git commit' with a message that describes the changes. After the commit do a 'git push'."
318
+
319
+ > "As soon as there are no build or test errors create a git tag. Start at 0.0.0 and increment patch by 1."
320
+
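The tagging rule in the second quote is simple arithmetic on the patch number. A sketch, assuming tags are plain `major.minor.patch` strings and that anything else in the tag list should be ignored; `next_patch_tag` is an illustrative helper name.

```python
import re

TAG_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def next_patch_tag(existing_tags: list[str]) -> str:
    """Return 0.0.0 when no version tags exist, otherwise the highest tag with patch + 1."""
    versions = []
    for tag in existing_tags:
        match = TAG_PATTERN.match(tag)
        if match:
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        return "0.0.0"
    major, minor, patch = max(versions)
    return f"{major}.{minor}.{patch + 1}"
```

Comparing integer tuples rather than strings keeps `0.0.10` above `0.0.9`.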
321
+ ### Loop Back is Everything
322
+
323
+ > "You want to program in ways where Ralph can loop himself back into the LLM for evaluation. This is incredibly important."
324
+
325
+ Examples:
326
+ - Add additional logging
327
+ - Compile the application and look at LLVM IR representation
328
+ - Run the app and check output
329
+
330
+ ### The TODO List
331
+
332
+ ```
333
+ study specs/* to learn about the compiler specifications and fix_plan.md to understand plan so far.
334
+
335
+ The source code of the compiler is in src/*
336
+
337
+ First task is to study @fix_plan.md (it may be incorrect) and is to use up to 500 subagents to study existing source code in src/ and compare it against the compiler specifications. From that create/update a @fix_plan.md which is a bullet point list sorted in priority of the items which have yet to be implemented.
338
+
339
+ Consider searching for TODO, minimal implementations and placeholders. Study @fix_plan.md to determine starting point for research and keep it up to date with items considered complete/incomplete.
340
+ ```
341
+
342
+ ### Results
343
+
344
+ - **Y Combinator hackathon**: Generated 6 repositories overnight
345
+ - **Real contract**: $50k USD contract delivered, tested, reviewed for **$297** in API costs
346
+ - **CURSED**: Built an entire programming language over 3 months using this technique
347
+
348
+ ### Three States of Ralph
349
+
350
+ > "Ralph has three states. Under baked, baked, or baked with unspecified latent behaviours (which are sometimes quite nice!)"
351
+
352
+ ### When to Use Ralph
353
+
354
+ **Good for**:
355
+ - Greenfield projects
356
+ - Well-defined tasks with clear success criteria
357
+ - Tasks requiring iteration (getting tests to pass)
358
+ - Tasks with automatic verification
359
+
360
+ **Not good for**:
361
+ - Existing codebases (Ralph works best for bootstrapping)
362
+ - Tasks requiring human judgment
363
+ - One-shot operations
364
+ - Tasks with unclear success criteria
365
+
366
+ ---
367
+
368
+ ## Part 3: Gap Analysis - Anvil vs Best Practices
369
+
370
+ ### Verification Loops
371
+
372
+ | Aspect | Boris's Approach | Anvil Current | Gap |
373
+ |--------|------------------|---------------|-----|
374
+ | Feedback loop | Chrome extension, test suites, API tests | `/validate`, `/evidence` capture | No **iteration on failure** |
375
+ | Stop hook gate | Blocks exit until tests pass | No stop hook verification | **Missing** |
376
+ | Retry mechanism | Ralph Wiggum pattern | None | **Missing** |
377
+
378
+ ### CLAUDE.md & Knowledge
379
+
380
+ | Aspect | Boris's Approach | Anvil Current | Gap |
381
+ |--------|------------------|---------------|-----|
382
+ | Shared CLAUDE.md | Team-updated, git-tracked | Project `.claude/CLAUDE.md` | **Aligned** |
383
+ | Auto-extraction | GitHub Action on PRs | `/insights` generates patches | **Manual application** |
384
+ | Pattern capture | PR-triggered | Session-based via `/retro` | Different trigger |
385
+
386
+ ### Slash Commands
387
+
388
+ | Aspect | Boris's Approach | Anvil Current | Gap |
389
+ |--------|------------------|---------------|-----|
390
+ | Workflow commands | `/commit-push-pr`, `/test-and-commit` | 23 workflow commands | Missing **atomic actions** |
391
+ | Inner loops | Codified for daily tasks | Comprehensive but workflow-oriented | Could add action commands |
392
+
393
+ ### Subagents
394
+
395
+ | Aspect | Boris's Approach | Anvil Current | Gap |
396
+ |--------|------------------|---------------|-----|
397
+ | code-simplifier | Reduces complexity | None | **Missing** |
398
+ | verify-app | Post-completion verification | None | **Missing** |
399
+ | Parallel research | Up to 500 subagents (from the Ralph technique) | Task tool with various agents | Could add Ralph-style limits |
400
+
401
+ ### Hooks
402
+
403
+ | Aspect | Boris's Approach | Anvil Current | Gap |
404
+ |--------|------------------|---------------|-----|
405
+ | PostToolUse formatting | prettier/eslint on Edit/Write | None | **Missing** |
406
+ | Stop verification | Tests must pass to exit | None | **Missing** |
407
+ | Session tracking | StatusLine | StatusLine hook | **Aligned** |
408
+
409
+ ### Ralph Wiggum Integration
410
+
411
+ | Aspect | Ralph Technique | Anvil Current | Gap |
412
+ |--------|-----------------|---------------|-----|
413
+ | Loop mechanism | Stop hook re-feeds prompt | No loop mechanism | **Missing** |
414
+ | One item per loop | Single task focus | Multi-task capable | Philosophy difference |
415
+ | fix_plan.md | Living TODO list | TodoWrite tool | Different format |
416
+ | AGENT.md | Self-improvement allowed | CLAUDE.md static | Could enable |
417
+ | Completion promise | `<promise>COMPLETE</promise>` | None | **Missing** |
418
+
419
+ ---
420
+
421
+ ## Part 4: Implementation Recommendations
422
+
423
+ ### Priority 1: Verification Feedback Loop (HIGH IMPACT)
424
+
425
+ **Why**: Boris calls this "probably the most important thing" and says it will "2-3x the quality of the final result."
426
+
427
+ **Implementation**:
428
+
429
+ 1. **New `/verify` Command**
430
+ ```markdown
431
+ # /verify - Verification Feedback Loop
432
+
433
+ 1. Run test suite, lint, typecheck
434
+ 2. If failures:
435
+    - Parse error output into structured fields: file, line, error type, message
436
+    - Return to Claude: "Fix these issues and run /verify again"
437
+    - Track iteration count
438
+ 3. If passing:
439
+    - Output: "✅ Verification passed"
440
+    - Save evidence to .claude/evidence/
441
+
442
+ Options:
443
+ --loop # Keep iterating until pass (max 10)
444
+ --strict # Block session end until pass
445
+ ```
446
+
447
+ 2. **Stop Hook Verification Gate**
448
+ ```bash
449
+ # global/hooks/stop_verification.sh
450
+ if [ "$ANVIL_VERIFY_ON_STOP" = "true" ]; then
451
+   npm test && npm run lint && npm run typecheck
452
+   if [ $? -ne 0 ]; then
453
+     echo "⚠️ VERIFICATION FAILED - Fix issues before ending" >&2
454
+     exit 2 # Exit code 2 blocks the stop; exit 1 would be a non-blocking error
455
+   fi
456
+ fi
457
+ ```
458
+
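The `--loop` behaviour proposed for `/verify` amounts to a bounded retry loop. The sketch below assumes two hypothetical callables: `run_checks`, which returns a list of error strings (empty on success), and `request_fix`, which hands those errors back to the agent. Neither is an existing Anvil API.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ITERATIONS = 10  # cap taken from the --loop option above


@dataclass
class VerifyResult:
    passed: bool
    iterations: int
    last_errors: list[str]


def verify_loop(
    run_checks: Callable[[], list[str]],
    request_fix: Callable[[list[str]], None],
    max_iterations: int = MAX_ITERATIONS,
) -> VerifyResult:
    """Run checks, hand failures back for fixing, and stop on pass or at the cap."""
    errors: list[str] = []
    for iteration in range(1, max_iterations + 1):
        errors = run_checks()
        if not errors:
            return VerifyResult(True, iteration, [])
        if iteration < max_iterations:
            # No point requesting a fix that will never be re-checked.
            request_fix(errors)
    return VerifyResult(False, max_iterations, errors)
```

Returning the iteration count alongside the result gives the evidence step something concrete to record.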
459
+ ### Priority 2: PostToolUse Formatting Hook
460
+
461
+ **Why**: Catches "last 10%" of formatting issues before CI.
462
+
463
+ **Implementation**:
464
+ ```json
465
+ {
466
+   "hooks": {
467
+     "PostToolUse": [{
468
+       "matcher": "Edit|Write",
469
+       "hooks": [{
470
+         "type": "command",
471
+         "command": "prettier --write \"$CLAUDE_FILE_PATH\" 2>/dev/null; eslint --fix \"$CLAUDE_FILE_PATH\" 2>/dev/null || true"
472
+       }]
473
+     }]
474
+   }
475
+ }
476
+ ```
477
+
478
+ ### Priority 3: Ralph Wiggum Mode
479
+
480
+ **Why**: Enables hands-off, long-running execution for well-defined tasks.
481
+
482
+ **Implementation**:
483
+
484
+ 1. **New `/ralph` Command (or Plugin)**
485
+ ```markdown
486
+ # /ralph - Start Ralph Wiggum Loop
487
+
488
+ /ralph "<prompt>" --max-iterations 50 --completion-promise "COMPLETE"
489
+
490
+ Behavior:
491
+ 1. Sets up stop hook to intercept exit
492
+ 2. Re-feeds prompt until completion promise or max iterations
493
+ 3. Tracks iteration count
494
+ 4. On completion: reports summary
495
+ ```
496
+
497
+ 2. **Ralph-Compatible File Structure**
498
+ ```
499
+ .claude/
500
+ ├── ralph/
501
+ │   ├── PROMPT.md     # Current task prompt
502
+ │   ├── AGENT.md      # Self-improvement file
503
+ │   └── fix_plan.md   # Living TODO list
504
+ ```
505
+
506
+ 3. **Stop Hook for Ralph**
507
+ ```bash
508
+ # Check for completion promise in last output
509
+ if grep -q "<promise>COMPLETE</promise>" /tmp/claude_last_output; then
510
+   exit 0 # Allow exit
511
+ fi
512
+
513
+ if [ "${RALPH_ITERATIONS:-0}" -lt "${RALPH_MAX_ITERATIONS:-0}" ]; then
514
+   # Re-feed prompt on stderr, which a blocking hook passes back to Claude
515
+   cat .claude/ralph/PROMPT.md >&2
516
+   exit 2 # Exit code 2 blocks exit and continues the loop
517
+ fi
518
+ ```
519
+
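The stop decision for Ralph mode reduces to two conditions: the completion promise and the iteration budget. A minimal sketch, assuming the last assistant output and the current iteration count are available to the hook; `ralph_should_continue` is an illustrative name.

```python
def ralph_should_continue(
    last_output: str,
    iteration: int,
    max_iterations: int,
    promise: str = "COMPLETE",
) -> bool:
    """Return True when the loop should re-feed the prompt, False when it may stop."""
    if f"<promise>{promise}</promise>" in last_output:
        return False  # completion promise seen
    return iteration < max_iterations  # stop once the budget is spent
```

A hook built on this would block the stop when the function returns `True` and allow it otherwise.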
520
+ ### Priority 4: Atomic Action Commands
521
+
522
+ **Why**: Boris has these for daily inner loops.
523
+
524
+ **Implementation**:
525
+ - `/commit-push-pr` - Stage, commit, push, create PR
526
+ - `/test-and-commit` - Run tests, only commit if passing
527
+ - `/format` - Run formatters on changed files
528
+
529
+ ### Priority 5: Verification Subagent
530
+
531
+ **Why**: Boris uses `verify-app` for post-completion checks.
532
+
533
+ **Implementation**:
534
+ ```yaml
535
+ # .claude/agents/verify-app.yaml
536
+ name: verify-app
537
+ description: Post-implementation verification agent
538
+ tools: [Bash, Read, Grep, Glob]
539
+ system_prompt: |
540
+   You are a verification specialist. Your job is to:
541
+   1. Run the test suite and report results
542
+   2. Run lint and typecheck
543
+   3. Verify no regressions
544
+   4. Report pass/fail with specific errors
545
+
546
+   You do NOT fix issues - you only verify and report.
547
+ ```
548
+
549
+ ### Priority 6: Automated CLAUDE.md Updates
550
+
551
+ **Why**: Boris's team has a GitHub Action for this.
552
+
553
+ **Implementation**:
554
+ ```yaml
555
+ # .github/workflows/claude-knowledge.yml
556
+ name: Extract Claude Knowledge
557
+ on:
558
+   pull_request:
559
+     types: [closed]
560
+
561
+ jobs:
562
+   extract:
563
+     if: github.event.pull_request.merged == true
564
+     runs-on: ubuntu-latest
565
+     steps:
566
+       - uses: actions/checkout@v4
567
+       - name: Extract patterns
568
+         run: |
569
+           # Analyze PR diff for patterns
570
+           # Update CLAUDE.md with findings
571
+           # Commit if changes made
572
+ ```
573
+
574
+ ---
575
+
576
+ ## Summary: Key Takeaways
577
+
578
+ ### From Boris Cherny
579
+
580
+ 1. **Verification is #1** - Build feedback loops into every workflow
581
+ 2. **Plan before code** - Steps 1-2 are crucial; Claude jumps to coding otherwise
582
+ 3. **Opus for fewer iterations** - Stronger model = less steering = faster results
583
+ 4. **Hooks for automation** - PostToolUse formatting catches the last 10%
584
+ 5. **Shared knowledge compounds** - Team CLAUDE.md updated multiple times weekly
585
+
586
+ ### From Ralph Wiggum
587
+
588
+ 1. **One item per loop** - Focus prevents derailing
589
+ 2. **Subagents for context management** - Don't pollute primary context
590
+ 3. **Backpressure is essential** - Tests/types/linters reject invalid code
591
+ 4. **No placeholders** - Full implementations or nothing
592
+ 5. **Self-learning** - Allow Ralph to update AGENT.md
593
+ 6. **Commit on success** - Tests pass → commit → push → tag
594
+ 7. **Loop back is everything** - Design ways for self-evaluation
595
+
596
+ ### Integration Philosophy
597
+
598
+ The key insight is that **verification feedback loops** are the highest-leverage improvement. Everything else is optimization, but verification is transformational.
599
+
600
+ Anvil already has strong workflow commands (`/explore`, `/spec`, `/plan`). What's missing is:
601
+ 1. **Automated iteration on failure** (Ralph Wiggum pattern)
602
+ 2. **Stop hooks that gate on verification** (Boris's approach)
603
+ 3. **PostToolUse formatting** (catch last 10%)
604
+ 4. **Atomic action commands** (inner loop efficiency)
605
+
606
+ ---
607
+
608
+ *This research document should be used as the foundation for SPEC-ANV-verification-loops.md and related implementation work.*