anvil-dev-framework 0.1.6
- package/README.md +719 -0
- package/VERSION +1 -0
- package/docs/ANVIL-REPO-IMPLEMENTATION-PLAN.md +441 -0
- package/docs/FIRST-SKILL-TUTORIAL.md +408 -0
- package/docs/INSTALLATION-RETRO-NOTES.md +458 -0
- package/docs/INSTALLATION.md +984 -0
- package/docs/anvil-hud.md +469 -0
- package/docs/anvil-init.md +255 -0
- package/docs/anvil-state.md +210 -0
- package/docs/boris-cherny-ralph-wiggum-insights.md +608 -0
- package/docs/command-reference.md +2022 -0
- package/docs/hooks-tts.md +368 -0
- package/docs/implementation-guide.md +810 -0
- package/docs/linear-github-integration.md +247 -0
- package/docs/local-issues.md +677 -0
- package/docs/patterns/README.md +419 -0
- package/docs/planning-responsibilities.md +139 -0
- package/docs/session-workflow.md +573 -0
- package/docs/simplification-plan-template.md +297 -0
- package/docs/simplification-principles.md +129 -0
- package/docs/specifications/CCS-RALPH-INTEGRATION-DESIGN.md +633 -0
- package/docs/specifications/CCS-RESEARCH-REPORT.md +169 -0
- package/docs/specifications/PLAN-ANV-verification-ralph-wiggum.md +403 -0
- package/docs/specifications/PLAN-parallel-tracks-anvil-memory-ccs.md +494 -0
- package/docs/specifications/SPEC-ANV-VRW/component-01-verify.md +208 -0
- package/docs/specifications/SPEC-ANV-VRW/component-02-stop-gate.md +226 -0
- package/docs/specifications/SPEC-ANV-VRW/component-03-posttooluse.md +209 -0
- package/docs/specifications/SPEC-ANV-VRW/component-04-ralph-wiggum.md +604 -0
- package/docs/specifications/SPEC-ANV-VRW/component-05-atomic-actions.md +311 -0
- package/docs/specifications/SPEC-ANV-VRW/component-06-verify-subagent.md +264 -0
- package/docs/specifications/SPEC-ANV-VRW/component-07-claude-md.md +363 -0
- package/docs/specifications/SPEC-ANV-VRW/index.md +182 -0
- package/docs/specifications/SPEC-ANV-anvil-memory.md +573 -0
- package/docs/specifications/SPEC-ANV-context-checkpoints.md +781 -0
- package/docs/specifications/SPEC-ANV-verification-ralph-wiggum.md +789 -0
- package/docs/sync.md +122 -0
- package/global/CLAUDE.md +140 -0
- package/global/agents/verify-app.md +164 -0
- package/global/commands/anvil-settings.md +527 -0
- package/global/commands/anvil-sync.md +121 -0
- package/global/commands/change.md +197 -0
- package/global/commands/clarify.md +252 -0
- package/global/commands/cleanup.md +292 -0
- package/global/commands/commit-push-pr.md +207 -0
- package/global/commands/decay-review.md +127 -0
- package/global/commands/discover.md +158 -0
- package/global/commands/doc-coverage.md +122 -0
- package/global/commands/evidence.md +307 -0
- package/global/commands/explore.md +121 -0
- package/global/commands/force-exit.md +135 -0
- package/global/commands/handoff.md +191 -0
- package/global/commands/healthcheck.md +302 -0
- package/global/commands/hud.md +84 -0
- package/global/commands/insights.md +319 -0
- package/global/commands/linear-setup.md +184 -0
- package/global/commands/lint-fix.md +198 -0
- package/global/commands/orient.md +510 -0
- package/global/commands/plan.md +228 -0
- package/global/commands/ralph.md +346 -0
- package/global/commands/ready.md +182 -0
- package/global/commands/release.md +305 -0
- package/global/commands/retro.md +96 -0
- package/global/commands/shard.md +166 -0
- package/global/commands/spec.md +227 -0
- package/global/commands/sprint.md +184 -0
- package/global/commands/tasks.md +228 -0
- package/global/commands/test-and-commit.md +151 -0
- package/global/commands/validate.md +132 -0
- package/global/commands/verify.md +251 -0
- package/global/commands/weekly-review.md +156 -0
- package/global/hooks/__pycache__/ralph_context_monitor.cpython-314.pyc +0 -0
- package/global/hooks/__pycache__/statusline_agent_sync.cpython-314.pyc +0 -0
- package/global/hooks/anvil_memory_observe.ts +322 -0
- package/global/hooks/anvil_memory_session.ts +166 -0
- package/global/hooks/anvil_memory_stop.ts +187 -0
- package/global/hooks/parse_transcript.py +116 -0
- package/global/hooks/post_merge_cleanup.sh +132 -0
- package/global/hooks/post_tool_format.sh +215 -0
- package/global/hooks/ralph_context_monitor.py +240 -0
- package/global/hooks/ralph_stop.sh +502 -0
- package/global/hooks/statusline.sh +1110 -0
- package/global/hooks/statusline_agent_sync.py +224 -0
- package/global/hooks/stop_gate.sh +250 -0
- package/global/lib/.claude/anvil-state.json +21 -0
- package/global/lib/__pycache__/agent_registry.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/claim_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/coderabbit_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/config_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/coordination_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/doc_coverage_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/gate_logger.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/github_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/hygiene_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/issue_models.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/issue_provider.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/linear_data_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/linear_provider.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/local_provider.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/quality_service.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/ralph_state.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/state_manager.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/transcript_parser.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/verification_runner.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/verify_iteration.cpython-314.pyc +0 -0
- package/global/lib/__pycache__/verify_subagent.cpython-314.pyc +0 -0
- package/global/lib/agent_registry.py +995 -0
- package/global/lib/anvil-state.sh +435 -0
- package/global/lib/claim_service.py +515 -0
- package/global/lib/coderabbit_service.py +314 -0
- package/global/lib/config_service.py +423 -0
- package/global/lib/coordination_service.py +331 -0
- package/global/lib/doc_coverage_service.py +1305 -0
- package/global/lib/gate_logger.py +316 -0
- package/global/lib/github_service.py +310 -0
- package/global/lib/handoff_generator.py +775 -0
- package/global/lib/hygiene_service.py +712 -0
- package/global/lib/issue_models.py +257 -0
- package/global/lib/issue_provider.py +339 -0
- package/global/lib/linear_data_service.py +210 -0
- package/global/lib/linear_provider.py +987 -0
- package/global/lib/linear_provider.py.backup +671 -0
- package/global/lib/local_provider.py +486 -0
- package/global/lib/orient_fast.py +457 -0
- package/global/lib/quality_service.py +470 -0
- package/global/lib/ralph_prompt_generator.py +563 -0
- package/global/lib/ralph_state.py +1202 -0
- package/global/lib/state_manager.py +417 -0
- package/global/lib/transcript_parser.py +597 -0
- package/global/lib/verification_runner.py +557 -0
- package/global/lib/verify_iteration.py +490 -0
- package/global/lib/verify_subagent.py +250 -0
- package/global/skills/README.md +155 -0
- package/global/skills/quality-gates/SKILL.md +252 -0
- package/global/skills/skill-template/SKILL.md +109 -0
- package/global/skills/testing-strategies/SKILL.md +337 -0
- package/global/templates/CHANGE-template.md +105 -0
- package/global/templates/HANDOFF-template.md +63 -0
- package/global/templates/PLAN-template.md +111 -0
- package/global/templates/SPEC-template.md +93 -0
- package/global/templates/ralph/PROMPT.md.template +89 -0
- package/global/templates/ralph/fix_plan.md.template +31 -0
- package/global/templates/ralph/progress.txt.template +23 -0
- package/global/tests/__pycache__/test_doc_coverage.cpython-314.pyc +0 -0
- package/global/tests/test_doc_coverage.py +520 -0
- package/global/tests/test_issue_models.py +299 -0
- package/global/tests/test_local_provider.py +323 -0
- package/global/tools/README.md +178 -0
- package/global/tools/__pycache__/anvil-hud.cpython-314.pyc +0 -0
- package/global/tools/anvil-hud.py +3622 -0
- package/global/tools/anvil-hud.py.bak +3318 -0
- package/global/tools/anvil-issue.py +432 -0
- package/global/tools/anvil-memory/CLAUDE.md +49 -0
- package/global/tools/anvil-memory/README.md +42 -0
- package/global/tools/anvil-memory/bun.lock +25 -0
- package/global/tools/anvil-memory/bunfig.toml +9 -0
- package/global/tools/anvil-memory/package.json +23 -0
- package/global/tools/anvil-memory/src/__tests__/ccs/context-monitor.test.ts +535 -0
- package/global/tools/anvil-memory/src/__tests__/ccs/edge-cases.test.ts +645 -0
- package/global/tools/anvil-memory/src/__tests__/ccs/fixtures.ts +363 -0
- package/global/tools/anvil-memory/src/__tests__/ccs/index.ts +8 -0
- package/global/tools/anvil-memory/src/__tests__/ccs/integration.test.ts +417 -0
- package/global/tools/anvil-memory/src/__tests__/ccs/prompt-generator.test.ts +571 -0
- package/global/tools/anvil-memory/src/__tests__/ccs/ralph-stop.test.ts +440 -0
- package/global/tools/anvil-memory/src/__tests__/ccs/test-utils.ts +252 -0
- package/global/tools/anvil-memory/src/__tests__/commands.test.ts +657 -0
- package/global/tools/anvil-memory/src/__tests__/db.test.ts +641 -0
- package/global/tools/anvil-memory/src/__tests__/hooks.test.ts +272 -0
- package/global/tools/anvil-memory/src/__tests__/performance.test.ts +427 -0
- package/global/tools/anvil-memory/src/__tests__/test-utils.ts +113 -0
- package/global/tools/anvil-memory/src/commands/checkpoint.ts +197 -0
- package/global/tools/anvil-memory/src/commands/get.ts +115 -0
- package/global/tools/anvil-memory/src/commands/init.ts +94 -0
- package/global/tools/anvil-memory/src/commands/observe.ts +163 -0
- package/global/tools/anvil-memory/src/commands/search.ts +112 -0
- package/global/tools/anvil-memory/src/db.ts +638 -0
- package/global/tools/anvil-memory/src/index.ts +205 -0
- package/global/tools/anvil-memory/src/types.ts +122 -0
- package/global/tools/anvil-memory/tsconfig.json +29 -0
- package/global/tools/ralph-loop.sh +359 -0
- package/package.json +45 -0
- package/scripts/anvil +822 -0
- package/scripts/extract_patterns.py +222 -0
- package/scripts/init-project.sh +541 -0
- package/scripts/install.sh +229 -0
- package/scripts/postinstall.js +41 -0
- package/scripts/rollback.sh +188 -0
- package/scripts/sync.sh +623 -0
- package/scripts/test-statusline.sh +248 -0
- package/scripts/update_claude_md.py +224 -0
- package/scripts/verify.sh +255 -0
@@ -0,0 +1,608 @@
# Boris Cherny & Ralph Wiggum: Claude Code Best Practices Research

> Comprehensive analysis of Boris Cherny's (Claude Code creator) workflow and Geoffrey Huntley's Ralph Wiggum technique for integration into the Anvil framework.

**Research Date**: 2026-01-04
**Sources**:
- [Boris Cherny's Twitter Thread](https://twitter-thread.com/t/2007179832300581177)
- [DEV Community Breakdown](https://dev.to/sivarampg/how-the-creator-of-claude-code-uses-claude-code-a-complete-breakdown-4f07)
- [Anthropic: Claude Code Best Practices](https://www.anthropic.com/engineering/claude-code-best-practices)
- [nibzard: Boris Cherny's Guide](https://www.nibzard.com/claude-code/)
- [Geoffrey Huntley: Ralph Wiggum as a Software Engineer](https://ghuntley.com/ralph-wiggum/)
- [Ralph Wiggum Plugin](https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum)

---

## Table of Contents

- [Part 1: Boris Cherny's Claude Code Setup](#part-1-boris-chernys-claude-code-setup)
- [Part 2: Ralph Wiggum Technique](#part-2-ralph-wiggum-technique)
- [Part 3: Gap Analysis - Anvil vs Best Practices](#part-3-gap-analysis---anvil-vs-best-practices)
- [Part 4: Implementation Recommendations](#part-4-implementation-recommendations)

---

## Part 1: Boris Cherny's Claude Code Setup

Boris Cherny created Claude Code as a side project in September 2024. His setup is "surprisingly vanilla": he says Claude Code works great out of the box.

### Core Principles

| Principle | Details |
|-----------|---------|
| **Verification is #1** | "Give Claude a way to verify its work. It will 2-3x the quality of the final result" |
| **Opus over Sonnet** | Fewer iterations with a stronger model beat many rapid iterations with a weaker one |
| **Plan Mode First** | "Invest time upfront in planning, and save way more time in execution" |
| **Inner Loop Commands** | Every repeated workflow becomes a slash command |

### Parallel Session Architecture

Boris runs **15+ concurrent Claude sessions**:
- 5 terminal instances with numbered tabs and system notifications
- 5-10 web sessions on claude.ai/code
- iOS app sessions started from phone, monitored on desktop
- Cross-platform handoff using `&` (background to web) and `--teleport <session-id>`

### Model Selection

> "Sonnet (fast) → 5 iterations → Total time: 5 minutes
> Opus (slow) → 1 iteration → Total time: 2 minutes"

**Rationale**: Fewer iterations with a stronger model beat many rapid iterations with a weaker one. Opus provides better autonomous tool selection without constant steering.

### Results

In thirty days, Boris landed **259 PRs** (497 commits, 40k lines added, 38k lines removed). Every single line was written by Claude Code + Opus 4.5.

### CLAUDE.md Structure

Boris maintains a team-shared CLAUDE.md file:
- **Checked into git**
- **Updated multiple times weekly** by team members
- When Claude makes errors, team adds entries so "Claude knows not to do it next time"
- Captures tribal knowledge not in official docs

**File Hierarchy**:
```
/<enterprise root>/CLAUDE.md  → Organizational policies
~/.claude/CLAUDE.md           → User-global context
<project>/CLAUDE.md           → Project-specific (checked into git)
<project>/CLAUDE.local.md     → Local overrides (not in git)
```

**Best Practice**: Keep to 100-200 lines maximum. If larger, move details into per-folder CLAUDE.md files.

### Automated Knowledge Extraction

Boris's team has a GitHub Action triggered by `@.claude` PR comments that:
1. Analyzes pull requests
2. Extracts patterns, anti-patterns, conventions
3. Auto-commits CLAUDE.md updates
4. Implements "compounding engineering" where knowledge accumulates

### Slash Commands

Pre-configured workflows stored in `.claude/commands/`:

**`/commit-push-pr`**:
- Inline bash executes `git status`, `git branch`, `git log -5`
- Stages and commits with conventional format
- Creates PR with detailed description
- Requests appropriate reviews

**`/test-and-commit`**:
- Runs `npm test`, `npm run typecheck`, `npm run lint`
- Only commits if all checks pass
- Reports failures without committing
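
The gating behaviour of `/test-and-commit` can be sketched as a small shell helper. This is an illustration only, not the contents of Boris's command file: the function names `all_checks_pass` and `test_and_commit` are hypothetical, and the check commands are whatever the caller passes in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run each check command in order and stop at the first failure.
# Usage: all_checks_pass "npm test" "npm run typecheck" "npm run lint"
all_checks_pass() {
  local check
  for check in "$@"; do
    # Each check goes through `bash -c` so it can be a full command line.
    if ! bash -c "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  return 0
}

# Hypothetical wrapper: commit only when every check passes.
# Usage: test_and_commit "<commit message>" "<check 1>" "<check 2>" ...
test_and_commit() {
  local message="$1"
  shift
  if all_checks_pass "$@"; then
    git add -A && git commit -m "$message"
  else
    echo "Checks failed; nothing committed." >&2
    return 1
  fi
}
```

A call such as `test_and_commit "fix: handle empty input" "npm test" "npm run typecheck" "npm run lint"` would then mirror the three bullets above: run the checks, commit on success, report on failure.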

### Subagents

Domain-specific AI personalities with:
- Separate context windows (isolated from main)
- Custom system prompts
- Limited tool access

**Examples**:
- `code-simplifier` - Reduces complexity, extracts repeated logic
- `verify-app` - Automated verification post-completion

### Hooks

**PostToolUse Hook for Formatting**:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npx prettier --write \"$CLAUDE_FILE_PATH\""
          }
        ]
      }
    ]
  }
}
```

"Last 10% to avoid formatting errors" - catches issues before CI.

### Stop Hook for Verification Gate

```bash
#!/bin/bash
# Stop hook: let Claude stop only when the test suite passes.
if npm test; then
  echo "✓ All tests passing"
  exit 0
else
  # Exit code 2 is the blocking code; stderr is fed back to Claude.
  echo "✗ Tests failing - continue working" >&2
  exit 2  # Block exit, force continuation
fi
```

Exit code 2 blocks Claude from stopping and feeds stderr back to it, forcing iteration until tests pass. Other non-zero exit codes are reported as errors but do not block.

### Verification Methods

Boris emphasizes verification as "probably the most important thing":

| Type | Implementation |
|------|----------------|
| **Command-line** | `npm test`, `npm run typecheck`, `npm run lint` |
| **Test Suite** | Unit, integration, e2e tests with >80% coverage |
| **Browser** | Chrome Extension opens localhost, tests forms, validates UX |
| **API** | curl tests for creation, validation, auth flows |
| **Pre-commit** | Hooks run lint, typecheck, test before each commit |

### Workflow Patterns

**Explore, Plan, Code, Commit**:
1. Request Claude read relevant files without writing code initially
2. Ask Claude to create a detailed plan using thinking mode
3. Have Claude implement solutions while verifying reasonableness
4. Request commits and documentation updates

> "Steps #1-#2 are crucial—without them, Claude tends to jump straight to coding."

**Test-Driven Development (TDD)**:
1. Request test cases based on input/output pairs
2. Confirm tests fail initially
3. Commit tests before implementation
4. Have Claude code to pass tests with multiple iterations
5. Verify implementation doesn't overfit
6. Commit final code

**Visual Iteration**:
1. Provide browser screenshots via Puppeteer MCP or drag-and-drop
2. Supply design mocks as reference points
3. Request Claude iterate 2-3 times for refinement
4. Commit when satisfied

### Permission Management

Instead of `--dangerously-skip-permissions`, use `/permissions` to pre-allow safe commands:
- git operations (status, diff, commit, push)
- npm scripts (test, build, lint)
- File operations (cat, ls, grep)

Stored in `.claude/settings.json` for team access.

---

## Part 2: Ralph Wiggum Technique

Geoffrey Huntley's Ralph Wiggum is a technique for long-running, unattended Claude Code execution. Named after the Simpsons character, it embodies persistent iteration despite setbacks.

### Core Philosophy

> "The technique is deterministically bad in an undeterministic world."

Ralph is designed to handle the non-determinism of LLMs through iteration. It requires:
- Faith in eventual consistency
- Belief that most issues can be resolved through more loops
- Patience - "Ralph will test you"

### The Basic Loop

In its purest form, Ralph is a Bash loop:

```bash
while :; do cat PROMPT.md | npx --yes @sourcegraph/amp ; done
```

Ralph can be done with any tool that doesn't cap tool calls and usage.

### Key Insight: One Item Per Loop

> "To get good outcomes with Ralph, you need to ask Ralph to do one thing per loop. **Only one thing.**"

You may relax this as the project progresses, but if Ralph goes off the rails, narrow it back to just one item.

### Context Window Management

> "The name of the game is that you only have approximately 170k of context window to work with. So it's essential to use as little of it as possible. The more you use the context window, the worse the outcomes you'll get."

**Solution**: Don't spend the primary context window on expensive work. Spawn subagents instead.

- Primary context window = **scheduler**
- Subagents = **workers** doing expensive operations
- Use up to **500 parallel subagents** for file searching/research
- Use only **1 subagent** for build/tests (avoid backpressure)
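
The scheduler/worker split above can be approximated outside the agent with a concurrency-limited fan-out. The sketch below only illustrates the shape of the idea: `fan_out` is a hypothetical helper, and `echo` stands in for whatever actually launches a subagent.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one task per line on stdin and run `worker <task>`
# with at most `max_parallel` workers alive at the same time.
# The worker must be an executable on PATH (xargs cannot call shell functions).
fan_out() {
  local max_parallel="$1"
  local worker="$2"
  # NUL-delimit the tasks so a task containing spaces stays a single argument.
  tr '\n' '\0' | xargs -0 -P "$max_parallel" -n 1 "$worker"
}

# Research-style work: wide fan-out (output order is not guaranteed).
printf '%s\n' "search auth" "search billing" "search logging" | fan_out 3 echo

# Build/test-style work: a single worker, so results come back one at a time.
printf '%s\n' "run tests" | fan_out 1 echo
```

The only knob is the first argument: a large value for cheap, independent research tasks and `1` for the build and test step.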

### The Three Phases

#### Phase 1: Requirements

- Have a long conversation about requirements
- Don't ask the agent to implement - just discuss what you're about to implement
- Once agent understands the task, write specifications out
- One file per topic in the specifications folder

#### Phase 2: TODO (Planning)

- Study specs to learn about requirements
- Manually allocate context window with discussion about the activity
- Goal: shape the context window, not do implementation
- Target the job-to-be-done with up to 100 subagents
- Write out an `IMPLEMENTATION_PLAN.md`

#### Phase 3: Implementation (Incremental Loop)

- Study specs and IMPLEMENTATION_PLAN.md
- Pick most important item to address
- Update implementation plan once tests pass
- Use as many subagents as needed for research
- Only use **one subagent** when doing tests/builds
- Commit on success

### Key Files

| File | Purpose |
|------|---------|
| `PROMPT.md` | Main prompt fed each loop |
| `AGENT.md` | How Ralph should compile/run (self-improvement allowed) |
| `fix_plan.md` | Living TODO list sorted by priority |
| `specs/` | Specifications folder (one file per topic) |
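
Because `fix_plan.md` is kept sorted by priority, choosing the single item for the next loop can be as simple as taking the first bullet. The sketch below assumes a plain bullet list with one item per line; that layout and the helper name `next_fix_plan_item` are assumptions, not something the sources specify.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first bullet of a priority-sorted plan file.
# Assumes one item per line, each starting with "- " or "* ".
next_fix_plan_item() {
  local plan_file="$1"
  # Strip the bullet marker, keep only matching lines, take the first one.
  sed -n 's/^[-*] \(.*\)$/\1/p' "$plan_file" | head -n 1
}

# Demo against a throwaway plan file.
plan="$(mktemp)"
printf '%s\n' '# fix_plan' '- Implement lexer error recovery' '- Add parser tests' > "$plan"
next_fix_plan_item "$plan"
rm -f "$plan"
```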

### Deterministic Stack Allocation

> "Deterministically allocate the stack the same way every loop."

Every loop should:
1. Study specs/* to learn about specifications
2. Note that the source code is in src/
3. Study fix_plan.md
4. Then do the task

This ensures consistent context regardless of iteration number.
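
One way to keep that preamble identical on every iteration is to generate the prompt from a fixed template and vary only the task. The sketch below is illustrative: `build_prompt` is a hypothetical name and the exact wording of the four steps is paraphrased from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit the same four-step preamble every loop,
# with only the final task line changing.
build_prompt() {
  local task="$1"
  cat <<EOF
1. Study specs/* to learn about the specifications.
2. The source code is in src/.
3. Study fix_plan.md.
4. Then do this one task: ${task}
EOF
}

build_prompt "Implement the highest-priority item in fix_plan.md"
```

Keeping the preamble in one function means a change to the loading order is made once and applies to every later loop.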

### Backpressure

Backpressure refers to mechanisms that reject invalid code:
- Type systems
- Test suites
- Linters
- Security scanners
- Static analyzers

> "After implementing functionality or resolving problems, run the tests for that unit of code that was improved."

For dynamically typed languages, wiring in a static analyzer (e.g., Pyrefly for Python) is strongly recommended.

### No Cheating (Anti-Placeholder Pattern)

Claude has an inherent bias toward minimal or placeholder implementations. Counter this with an explicit instruction:

> "DO NOT IMPLEMENT PLACEHOLDER OR SIMPLE IMPLEMENTATIONS. WE WANT FULL IMPLEMENTATIONS. **DO IT OR I WILL YELL AT YOU**"

If Ralph ignores this, you can always run more Ralphs to identify placeholders and transform them into a todo list.
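
A cheap first pass at finding placeholders does not need an agent at all. The sketch below greps a source tree for a few marker strings; the pattern list and the name `find_placeholders` are assumptions chosen for illustration, and a real project would tune them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list likely placeholder implementations as path:line:text.
# The marker list is an assumption; extend it for your codebase.
find_placeholders() {
  local dir="${1:-src}"
  # -r recurse, -n show line numbers, -i ignore case, -E extended regex.
  # `|| true` keeps the exit status at 0 when nothing matches.
  grep -rniE 'TODO|FIXME|placeholder|not implemented' "$dir" || true
}

# Usage: find_placeholders src > placeholder_report.txt
```

The resulting lines can be pasted into `fix_plan.md` as candidate items for later loops.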

### Don't Assume It's Not Implemented

> "Before making changes search codebase (don't assume not implemented) using subagents. Think hard."

A common failure is when the LLM runs ripgrep and incorrectly concludes code hasn't been implemented. Erect a sign for Ralph: don't make assumptions.

### Self-Learning

> "When you learn something new about how to run the compiler or examples make sure you update @AGENT.md using a subagent but keep it brief."

Allow Ralph to take himself to "university" - permit self-improvement of the AGENT.md file.

### Commit on Success

> "When the tests pass update the fix_plan.md, then add changed code and @fix_plan.md with 'git add -A' via bash then do a 'git commit' with a message that describes the changes. After the commit do a 'git push'."

> "As soon as there are no build or test errors create a git tag. Start at 0.0.0 and increment patch by 1."
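
The tagging rule in the second quote is small enough to write down directly. In the sketch below, `next_patch_tag` is a hypothetical helper that takes the latest tag (or an empty string when there is none) and prints the tag to create next; it assumes plain `MAJOR.MINOR.PATCH` tags with no prefix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute the next tag under "start at 0.0.0, bump patch by 1".
# Assumes tags look like MAJOR.MINOR.PATCH with no "v" prefix.
next_patch_tag() {
  local latest="${1:-}"
  local re='^([0-9]+)\.([0-9]+)\.([0-9]+)$'
  if [[ -z "$latest" ]]; then
    echo "0.0.0"            # no tag yet: start the series
  elif [[ "$latest" =~ $re ]]; then
    # 10# forces base 10 so a patch like "08" is not read as octal.
    echo "${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.$(( 10#${BASH_REMATCH[3]} + 1 ))"
  else
    echo "unrecognized tag: $latest" >&2
    return 1
  fi
}

next_patch_tag ""        # first successful build
next_patch_tag "0.0.7"   # a later successful build
```

In a loop script the argument would typically come from `git describe --tags --abbrev=0`, with the result passed to `git tag`.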

### Loop Back is Everything

> "You want to program in ways where Ralph can loop himself back into the LLM for evaluation. This is incredibly important."

Examples:
- Add additional logging
- Compile the application and look at LLVM IR representation
- Run the app and check output

### The TODO List

```
study specs/* to learn about the compiler specifications and fix_plan.md to understand plan so far.

The source code of the compiler is in src/*

First task is to study @fix_plan.md (it may be incorrect) and is to use up to 500 subagents to study existing source code in src/ and compare it against the compiler specifications. From that create/update a @fix_plan.md which is a bullet point list sorted in priority of the items which have yet to be implemented.

Consider searching for TODO, minimal implementations and placeholders. Study @fix_plan.md to determine starting point for research and keep it up to date with items considered complete/incomplete.
```

### Results

- **Y Combinator hackathon**: Generated 6 repositories overnight
- **Real contract**: $50k USD contract delivered, tested, reviewed for **$297** in API costs
- **CURSED**: Built an entire programming language over 3 months using this technique

### Three States of Ralph

> "Ralph has three states. Under baked, baked, or baked with unspecified latent behaviours (which are sometimes quite nice!)"

### When to Use Ralph

**Good for**:
- Greenfield projects
- Well-defined tasks with clear success criteria
- Tasks requiring iteration (getting tests to pass)
- Tasks with automatic verification

**Not good for**:
- Existing codebases (Ralph works best for bootstrapping)
- Tasks requiring human judgment
- One-shot operations
- Tasks with unclear success criteria

---

## Part 3: Gap Analysis - Anvil vs Best Practices

### Verification Loops

| Aspect | Boris's Approach | Anvil Current | Gap |
|--------|------------------|---------------|-----|
| Feedback loop | Chrome extension, test suites, API tests | `/validate`, `/evidence` capture | No **iteration on failure** |
| Stop hook gate | Blocks exit until tests pass | No stop hook verification | **Missing** |
| Retry mechanism | Ralph Wiggum pattern | None | **Missing** |

### CLAUDE.md & Knowledge

| Aspect | Boris's Approach | Anvil Current | Gap |
|--------|------------------|---------------|-----|
| Shared CLAUDE.md | Team-updated, git-tracked | Project `.claude/CLAUDE.md` | **Aligned** |
| Auto-extraction | GitHub Action on PRs | `/insights` generates patches | **Manual application** |
| Pattern capture | PR-triggered | Session-based via `/retro` | Different trigger |

### Slash Commands

| Aspect | Boris's Approach | Anvil Current | Gap |
|--------|------------------|---------------|-----|
| Workflow commands | `/commit-push-pr`, `/test-and-commit` | 23 workflow commands | Missing **atomic actions** |
| Inner loops | Codified for daily tasks | Comprehensive but workflow-oriented | Could add action commands |

### Subagents

| Aspect | Boris's Approach | Anvil Current | Gap |
|--------|------------------|---------------|-----|
| code-simplifier | Reduces complexity | None | **Missing** |
| verify-app | Post-completion verification | None | **Missing** |
| Parallel research | Up to 500 subagents | Task tool with various agents | Could add Ralph-style limits |

### Hooks

| Aspect | Boris's Approach | Anvil Current | Gap |
|--------|------------------|---------------|-----|
| PostToolUse formatting | prettier/eslint on Edit/Write | None | **Missing** |
| Stop verification | Tests must pass to exit | None | **Missing** |
| Session tracking | StatusLine | StatusLine hook | **Aligned** |

### Ralph Wiggum Integration

| Aspect | Ralph Technique | Anvil Current | Gap |
|--------|-----------------|---------------|-----|
| Loop mechanism | Stop hook re-feeds prompt | No loop mechanism | **Missing** |
| One item per loop | Single task focus | Multi-task capable | Philosophy difference |
| fix_plan.md | Living TODO list | TodoWrite tool | Different format |
| AGENT.md | Self-improvement allowed | CLAUDE.md static | Could enable |
| Completion promise | `<promise>COMPLETE</promise>` | None | **Missing** |

---
|
|
420
|
+
|
|
421
|
+
## Part 4: Implementation Recommendations
|
|
422
|
+
|
|
423
|
+
### Priority 1: Verification Feedback Loop (HIGH IMPACT)
|
|
424
|
+
|
|
425
|
+
**Why**: Boris says this is "probably the most important thing" that "2-3x the quality."
|
|
426
|
+
|
|
427
|
+
**Implementation**:
|
|
428
|
+
|
|
429
|
+
1. **New `/verify` Command**
   ```markdown
   # /verify - Verification Feedback Loop

   1. Run test suite, lint, typecheck
   2. If failures:
      - Parse error output into structured fields: file, line, error type, message
      - Return to Claude: "Fix these issues and run /verify again"
      - Track iteration count
   3. If passing:
      - Output: "✅ Verification passed"
      - Save evidence to .claude/evidence/

   Options:
     --loop     # Keep iterating until pass (max 10)
     --strict   # Block session end until pass
   ```

2. **Stop Hook Verification Gate**
   ```bash
   # global/hooks/stop_verification.sh
   # Runs the project's checks when the session tries to stop.
   if [ "$ANVIL_VERIFY_ON_STOP" = "true" ]; then
     if ! { npm test && npm run lint && npm run typecheck; }; then
       # Claude Code treats exit code 2 as a blocking error and feeds stderr back to Claude.
       echo "⚠️ VERIFICATION FAILED - Fix issues before ending" >&2
       exit 2
     fi
   fi
   ```

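The "parse error output" step of `/verify` could be sketched as follows. This is a minimal sketch, assuming the checkers emit compiler-style `file:line:col: type: message` lines. Real test runners and linters use other formats, and `parse_errors` and `verify_once` are hypothetical helper names, not existing Anvil commands.

```bash
# Hypothetical helpers for /verify; the names and the input format are assumptions.

# Convert "file:line:col: type: message" lines into tab-separated
# "file<TAB>line<TAB>type<TAB>message" records; all other lines are ignored.
parse_errors() {
  awk -F: 'NF >= 5 && $2 ~ /^[0-9]+$/ {
    kind = $4
    msg = $5
    for (i = 6; i <= NF; i++) msg = msg ":" $i   # keep colons inside the message
    sub(/^ +/, "", kind)
    sub(/^ +/, "", msg)
    printf "%s\t%s\t%s\t%s\n", $1, $2, kind, msg
  }'
}

# Run one check command. On success print a pass message; on failure print
# the structured errors on stdout, a hint on stderr, and return 1.
verify_once() {
  local output
  if output=$("$@" 2>&1); then
    echo "Verification passed"
    return 0
  fi
  printf '%s\n' "$output" | parse_errors
  echo "Fix these issues and run /verify again" >&2
  return 1
}
```

A call such as `verify_once npm run lint` would then return either the pass message or one record per reported problem, which is the shape the `/verify` description asks to hand back to Claude.
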
### Priority 2: PostToolUse Formatting Hook

**Why**: Catches the "last 10%" of formatting issues before CI.

**Implementation**:
```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "jq -r '.tool_input.file_path' | { read -r f; prettier --write \"$f\" 2>/dev/null; eslint --fix \"$f\" 2>/dev/null || true; }"
      }]
    }]
  }
}
```

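If the one-line command grows, the formatting step could move into a script. The sketch below shows one way to pick formatters by file extension. The extension lists and the function names `formatters_for` and `format_file` are assumptions made for illustration, not part of Anvil or Claude Code.

```bash
# Hypothetical formatter dispatch for a PostToolUse hook script.

# Print the space-separated formatter names that apply to a path
# (prints nothing for unrecognized file types).
formatters_for() {
  case "$1" in
    *.js|*.jsx|*.ts|*.tsx) echo "prettier eslint" ;;
    *.json|*.md|*.css|*.yml|*.yaml) echo "prettier" ;;
    *) : ;;
  esac
}

# Run each applicable formatter; failures are swallowed so that a
# formatting problem never blocks the edit that triggered the hook.
format_file() {
  local file="$1" tool
  for tool in $(formatters_for "$file"); do
    case "$tool" in
      prettier) prettier --write "$file" >/dev/null 2>&1 || true ;;
      eslint)   eslint --fix "$file" >/dev/null 2>&1 || true ;;
    esac
  done
}
```

In a hook script, the edited path would be read from the JSON that Claude Code passes on stdin, for example `format_file "$(jq -r '.tool_input.file_path')"`, assuming `jq` is installed.
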
### Priority 3: Ralph Wiggum Mode

**Why**: Enables hands-off, long-running execution for well-defined tasks.

**Implementation**:

1. **New `/ralph` Command (or Plugin)**
   ```markdown
   # /ralph - Start Ralph Wiggum Loop

   /ralph "<prompt>" --max-iterations 50 --completion-promise "COMPLETE"

   Behavior:
   1. Sets up stop hook to intercept exit
   2. Re-feeds prompt until completion promise or max iterations
   3. Tracks iteration count
   4. On completion: reports summary
   ```

2. **Ralph-Compatible File Structure**
   ```
   .claude/
   ├── ralph/
   │   ├── PROMPT.md      # Current task prompt
   │   ├── AGENT.md       # Self-improvement file
   │   └── fix_plan.md    # Living TODO list
   ```

3. **Stop Hook for Ralph**
   ```bash
   # Check for completion promise in last output
   if grep -q "<promise>COMPLETE</promise>" /tmp/claude_last_output; then
     exit 0  # Allow exit
   fi

   # Unset counters default to 0, so the loop ends instead of erroring.
   if [ "${RALPH_ITERATIONS:-0}" -lt "${RALPH_MAX_ITERATIONS:-0}" ]; then
     # Re-feed prompt on stderr; exit code 2 blocks the stop and returns stderr to Claude
     cat .claude/ralph/PROMPT.md >&2
     exit 2  # Block exit, continue loop
   fi
   ```

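Each hook invocation runs as a separate process, so an iteration count held in an environment variable would not survive from one stop to the next. A minimal sketch of the loop decision with the count persisted to a file is shown below. The function name `ralph_next_action`, the counter file, and the `stop:`/`continue:` result strings are all hypothetical.

```bash
# Hypothetical loop decision for the Ralph stop hook.
# Usage: ralph_next_action <last-output-file> <counter-file> <max-iterations>
# Prints "stop:complete", "stop:max-iterations", or "continue:<n>".
ralph_next_action() {
  local output_file="$1" counter_file="$2" max="$3" count

  # The completion promise always wins, regardless of the iteration count.
  if grep -q '<promise>COMPLETE</promise>' "$output_file" 2>/dev/null; then
    echo "stop:complete"
    return 0
  fi

  # A missing or empty counter file counts as zero iterations so far.
  count=$(cat "$counter_file" 2>/dev/null || echo 0)
  count=${count:-0}

  if [ "$count" -ge "$max" ]; then
    echo "stop:max-iterations"
    return 0
  fi

  # Record the new iteration before asking the hook to continue the loop.
  count=$((count + 1))
  echo "$count" > "$counter_file"
  echo "continue:$count"
}
```

A stop hook could call this and block the exit only when the result starts with `continue:`, which keeps the `--max-iterations` limit effective across hook runs.
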
### Priority 4: Atomic Action Commands

**Why**: Boris has these for daily inner loops.

**Implementation**:
- `/commit-push-pr` - Stage, commit, push, create PR
- `/test-and-commit` - Run tests, only commit if passing
- `/format` - Run formatters on changed files

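The gating behind `/test-and-commit` can be sketched in a few lines of shell. This is only an illustration under assumptions: the function name `test_and_commit` and the `GIT_BIN` override (used here to dry-run with `echo`) are hypothetical, and a real command would likely also handle an empty staging area.

```bash
# Hypothetical sketch of /test-and-commit: commit only when the tests pass.
# Usage: test_and_commit "<commit message>" <test command> [args...]
test_and_commit() {
  local message="$1" git="${GIT_BIN:-git}"
  shift

  # Run the test command; a non-zero exit status aborts before any git call.
  if ! "$@"; then
    echo "Tests failed; nothing committed" >&2
    return 1
  fi

  # Stage everything and commit with the supplied message.
  "$git" add -A && "$git" commit -m "$message"
}
```

For example, `test_and_commit "Fix parser" npm test` would commit only after `npm test` succeeds, while `GIT_BIN=echo test_and_commit "Fix parser" npm test` prints the git arguments instead of running git.
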
### Priority 5: Verification Subagent

**Why**: Boris uses `verify-app` for post-completion checks.

**Implementation**:
```yaml
# .claude/agents/verify-app.yaml
name: verify-app
description: Post-implementation verification agent
tools: [Bash, Read, Grep, Glob]
system_prompt: |
  You are a verification specialist. Your job is to:
  1. Run the test suite and report results
  2. Run lint and typecheck
  3. Verify no regressions
  4. Report pass/fail with specific errors

  You do NOT fix issues - you only verify and report.
```

### Priority 6: Automated CLAUDE.md Updates

**Why**: Boris's team has a GitHub Action for this.

**Implementation**:
```yaml
# .github/workflows/claude-knowledge.yml
name: Extract Claude Knowledge
on:
  pull_request:
    types: [closed]

jobs:
  extract:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Extract patterns
        run: |
          # Analyze PR diff for patterns
          # Update CLAUDE.md with findings
          # Commit if changes made
```

---

## Summary: Key Takeaways

### From Boris Cherny

1. **Verification is #1** - Build feedback loops into every workflow
2. **Plan before code** - Steps 1-2 are crucial; Claude jumps to coding otherwise
3. **Opus for fewer iterations** - Stronger model = less steering = faster results
4. **Hooks for automation** - PostToolUse formatting catches the last 10%
5. **Shared knowledge compounds** - Team CLAUDE.md updated multiple times weekly

### From Ralph Wiggum

1. **One item per loop** - Focus prevents derailing
2. **Subagents for context management** - Don't pollute primary context
3. **Backpressure is essential** - Tests/types/linters reject invalid code
4. **No placeholders** - Full implementations or nothing
5. **Self-learning** - Allow Ralph to update AGENT.md
6. **Commit on success** - Tests pass → commit → push → tag
7. **Loop back is everything** - Design ways for self-evaluation

### Integration Philosophy

The key insight is that **verification feedback loops** are the highest-leverage improvement. Everything else is optimization, but verification is transformational.

Anvil already has strong workflow commands (`/explore`, `/spec`, `/plan`). What's missing is:
1. **Automated iteration on failure** (Ralph Wiggum pattern)
2. **Stop hooks that gate on verification** (Boris's approach)
3. **PostToolUse formatting** (catch last 10%)
4. **Atomic action commands** (inner loop efficiency)

---

*This research document should be used as the foundation for SPEC-ANV-verification-loops.md and related implementation work.*