@ai-dev-methodologies/rlp-desk 0.0.1
- package/LICENSE +21 -0
- package/README.md +182 -0
- package/docs/architecture.md +129 -0
- package/docs/getting-started.md +153 -0
- package/docs/protocol-reference.md +220 -0
- package/examples/calculator/.claude/ralph-desk/plans/prd-loop-test.md +37 -0
- package/examples/calculator/.claude/ralph-desk/plans/test-spec-loop-test.md +29 -0
- package/examples/calculator/.claude/ralph-desk/prompts/loop-test.verifier.prompt.md +29 -0
- package/examples/calculator/.claude/ralph-desk/prompts/loop-test.worker.prompt.md +32 -0
- package/install.sh +53 -0
- package/package.json +40 -0
- package/scripts/postinstall.js +48 -0
- package/scripts/uninstall.js +45 -0
- package/src/commands/rlp-desk.md +164 -0
- package/src/governance.md +170 -0
- package/src/scripts/init_ralph_desk.zsh +245 -0
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 ai-dev-methodologies
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
ADDED
|
@@ -0,0 +1,182 @@
|
|
|
1
|
+
# RLP Desk
|
|
2
|
+
|
|
3
|
+
> Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification.
|
|
4
|
+
|
|
5
|
+
RLP Desk brings [Geoffrey Huntley's Ralph Loop](https://ghuntley.com/ralph/) philosophy to Claude Code. Inspired by [OpenAI Codex's long-horizon tasks](https://developers.openai.com/blog/run-long-horizon-tasks-with-codex/) and [design-desk](https://github.com/derrickchoi-openai/design-desk), it orchestrates fresh-context workers and verifiers through Claude Code's `Agent()` tool.
|
|
6
|
+
|
|
7
|
+
**Key insight**: Each iteration starts fresh. No accumulated context drift. The filesystem is the only memory.
|
|
8
|
+
|
|
9
|
+
```
|
|
10
|
+
[Your Session = LEADER]
|
|
11
|
+
│
|
|
12
|
+
Agent()├──▶ [Worker (fresh context)]
|
|
13
|
+
│ └── reads PRD + memory → implements → updates memory
|
|
14
|
+
│
|
|
15
|
+
Agent()└──▶ [Verifier (fresh context)]
|
|
16
|
+
└── reads done-claim → runs checks → writes verdict
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
## Quick Start
|
|
20
|
+
|
|
21
|
+
### 1. Install
|
|
22
|
+
|
|
23
|
+
```bash
|
|
24
|
+
npm install -g @ai-dev-methodologies/rlp-desk
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
Or without npm:
|
|
28
|
+
|
|
29
|
+
```bash
|
|
30
|
+
curl -sSL https://raw.githubusercontent.com/ai-dev-methodologies/rlp-desk/main/install.sh | bash
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
### 2. Brainstorm
|
|
34
|
+
|
|
35
|
+
In your project directory, start a Claude Code session:
|
|
36
|
+
|
|
37
|
+
```
|
|
38
|
+
/rlp-desk brainstorm "implement a Python calculator with tests"
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
This interactively defines the contract: slug, objective, user stories, verification commands, and iteration settings.
|
|
42
|
+
|
|
43
|
+
### 3. Run
|
|
44
|
+
|
|
45
|
+
```
|
|
46
|
+
/rlp-desk run calculator
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
The leader loop runs autonomously — spawning workers, verifying results, and tracking progress until completion or a circuit breaker triggers.
|
|
50
|
+
|
|
51
|
+
## Why?
|
|
52
|
+
|
|
53
|
+
### The Context Problem
|
|
54
|
+
|
|
55
|
+
LLM conversations accumulate context. Long sessions drift, hallucinate, and forget earlier decisions. The Ralph Loop solves this by treating **context as a disposable resource**:
|
|
56
|
+
|
|
57
|
+
- Each worker gets a **fresh context** — no prior conversation, no accumulated confusion
|
|
58
|
+
- **Filesystem = memory** — PRDs, campaign memory, and context files are the only state
|
|
59
|
+
- **Independent verification** — a separate fresh-context verifier checks the worker's claims against real evidence
|
|
60
|
+
|
|
61
|
+
### Lineage
|
|
62
|
+
|
|
63
|
+
| Concept | Source |
|
|
64
|
+
|---------|--------|
|
|
65
|
+
| Fresh context per iteration | [Ralph Loop](https://ghuntley.com/ralph/) ([guide](https://www.aihero.dev/getting-started-with-ralph), [tips](https://www.aihero.dev/tips-for-ai-coding-with-ralph-wiggum)) |
|
|
66
|
+
| Long-horizon autonomous tasks | [OpenAI Codex](https://developers.openai.com/blog/run-long-horizon-tasks-with-codex/) |
|
|
67
|
+
| Desk-based orchestration | [design-desk](https://github.com/derrickchoi-openai/design-desk) |
|
|
68
|
+
| Agent() subprocess model | Claude Code native |
|
|
69
|
+
|
|
70
|
+
## How It Works
|
|
71
|
+
|
|
72
|
+
### Three Roles
|
|
73
|
+
|
|
74
|
+
| Role | Runs In | Responsibility |
|
|
75
|
+
|------|---------|----------------|
|
|
76
|
+
| **Leader** | Your current session | Orchestrates the loop, reads memory, selects models, writes sentinels |
|
|
77
|
+
| **Worker** | Fresh `Agent()` context | Executes one bounded action per iteration, updates memory |
|
|
78
|
+
| **Verifier** | Fresh `Agent()` context | Independently verifies worker claims with fresh evidence |
|
|
79
|
+
|
|
80
|
+
### The Loop
|
|
81
|
+
|
|
82
|
+
```
|
|
83
|
+
for iteration in 1..max_iter:
|
|
84
|
+
|
|
85
|
+
1. Check sentinels (complete? blocked?)
|
|
86
|
+
2. Read campaign memory → get next iteration contract
|
|
87
|
+
3. Select model (haiku/sonnet/opus based on complexity)
|
|
88
|
+
4. Build worker prompt → dispatch via Agent()
|
|
89
|
+
5. Worker executes one bounded action, updates memory
|
|
90
|
+
6. If worker claims done → dispatch Verifier via Agent()
|
|
91
|
+
7. Verifier runs fresh checks → pass/fail/blocked
|
|
92
|
+
8. Update status, report to user, continue or stop
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
### Circuit Breakers
|
|
96
|
+
|
|
97
|
+
| Condition | Action |
|
|
98
|
+
|-----------|--------|
|
|
99
|
+
| Context unchanged for 3 iterations | BLOCKED |
|
|
100
|
+
| Same error repeated twice | Upgrade model, retry once, then BLOCKED |
|
|
101
|
+
| Max iterations reached | TIMEOUT |
|
|
102
|
+
|
|
103
|
+
### Model Routing
|
|
104
|
+
|
|
105
|
+
| Scenario | Model |
|
|
106
|
+
|----------|-------|
|
|
107
|
+
| Simple, single-file changes | `haiku` |
|
|
108
|
+
| Standard work (default) | `sonnet` |
|
|
109
|
+
| Architecture changes, multi-file, prior failure | `opus` |
|
|
110
|
+
| Standard verification | `sonnet` |
|
|
111
|
+
| Security/critical logic verification | `opus` |
|
|
112
|
+
|
|
113
|
+
## Commands
|
|
114
|
+
|
|
115
|
+
```
|
|
116
|
+
/rlp-desk brainstorm <description> Plan before init (interactive)
|
|
117
|
+
/rlp-desk init <slug> [objective] Create project scaffold
|
|
118
|
+
/rlp-desk run <slug> [--opts] Run the loop (this session = leader)
|
|
119
|
+
/rlp-desk status <slug> Show loop status
|
|
120
|
+
/rlp-desk logs <slug> [N] Show iteration logs
|
|
121
|
+
/rlp-desk clean <slug> Reset for re-run
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
### Run Options
|
|
125
|
+
|
|
126
|
+
| Flag | Default | Description |
|
|
127
|
+
|------|---------|-------------|
|
|
128
|
+
| `--max-iter N` | 100 | Maximum iterations before timeout |
|
|
129
|
+
| `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
|
|
130
|
+
| `--verifier-model MODEL` | sonnet | Verifier model (haiku/sonnet/opus) |
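As a rough illustration of the flags in the table above, the following Python sketch parses them with `argparse`. The `parse_run_args` helper and the parser itself are hypothetical and not part of the package, which exposes `run` as a Claude Code slash command; only the flag names, defaults, and model names are taken from the table.

```python
import argparse

# Model names as listed in the Run Options table.
MODELS = ("haiku", "sonnet", "opus")


def parse_run_args(argv):
    """Hypothetical helper: parse `run <slug> [--opts]` with the documented defaults."""
    parser = argparse.ArgumentParser(prog="rlp-desk run")
    parser.add_argument("slug")
    parser.add_argument("--max-iter", type=int, default=100)
    parser.add_argument("--worker-model", choices=MODELS, default="sonnet")
    parser.add_argument("--verifier-model", choices=MODELS, default="sonnet")
    return parser.parse_args(argv)


opts = parse_run_args(["loop-test", "--max-iter", "10", "--worker-model", "opus"])
print(opts.slug, opts.max_iter, opts.worker_model, opts.verifier_model)
# prints: loop-test 10 opus sonnet
```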
|
|
131
|
+
|
|
132
|
+
## Project Structure
|
|
133
|
+
|
|
134
|
+
After `init`, your project gets this scaffold:
|
|
135
|
+
|
|
136
|
+
```
|
|
137
|
+
your-project/
|
|
138
|
+
└── .claude/ralph-desk/
|
|
139
|
+
├── prompts/
|
|
140
|
+
│ ├── <slug>.worker.prompt.md
|
|
141
|
+
│ └── <slug>.verifier.prompt.md
|
|
142
|
+
├── context/
|
|
143
|
+
│ └── <slug>-latest.md
|
|
144
|
+
├── memos/
|
|
145
|
+
│ └── <slug>-memory.md
|
|
146
|
+
├── plans/
|
|
147
|
+
│ ├── prd-<slug>.md
|
|
148
|
+
│ └── test-spec-<slug>.md
|
|
149
|
+
└── logs/<slug>/
|
|
150
|
+
└── status.json
|
|
151
|
+
```
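To make the layout above concrete, here is a minimal Python sketch that creates the same directories and empty placeholder files. It is not the package's scaffold generator; `create_scaffold` is a hypothetical name, and leaving `status.json` for the run to write is an assumption of this sketch.

```python
from pathlib import Path


def create_scaffold(project_root, slug):
    """Hypothetical helper: create the directory tree shown above with empty files."""
    desk = Path(project_root) / ".claude" / "ralph-desk"
    files = [
        desk / "prompts" / f"{slug}.worker.prompt.md",
        desk / "prompts" / f"{slug}.verifier.prompt.md",
        desk / "context" / f"{slug}-latest.md",
        desk / "memos" / f"{slug}-memory.md",
        desk / "plans" / f"prd-{slug}.md",
        desk / "plans" / f"test-spec-{slug}.md",
    ]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # empty placeholder; real content is out of scope here
    # Assumption: status.json is written later by the run, so only the log directory is made.
    (desk / "logs" / slug).mkdir(parents=True, exist_ok=True)
    return desk
```

Calling `create_scaffold(".", "loop-test")` twice is harmless, because both `mkdir` and `touch` are told to accept existing paths.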
|
|
152
|
+
|
|
153
|
+
## Example: Calculator
|
|
154
|
+
|
|
155
|
+
See [`examples/calculator/`](examples/calculator/) for a complete example that implements a Python calculator module with tests using the RLP Desk loop.
|
|
156
|
+
|
|
157
|
+
The example demonstrates:
|
|
158
|
+
- A PRD with two user stories (calculator functions + pytest tests)
|
|
159
|
+
- Test specification with verification commands
|
|
160
|
+
- Worker and verifier prompts configured for the task
|
|
161
|
+
|
|
162
|
+
To try it yourself:
|
|
163
|
+
|
|
164
|
+
```
|
|
165
|
+
mkdir my-calc && cd my-calc
|
|
166
|
+
/rlp-desk brainstorm "Python calculator with add, subtract, multiply, divide + pytest tests"
|
|
167
|
+
/rlp-desk run loop-test
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
## Documentation
|
|
171
|
+
|
|
172
|
+
- [Architecture](docs/architecture.md) — Design philosophy and the Agent() approach
|
|
173
|
+
- [Getting Started](docs/getting-started.md) — Step-by-step tutorial with the calculator example
|
|
174
|
+
- [Protocol Reference](docs/protocol-reference.md) — Full protocol specification
|
|
175
|
+
|
|
176
|
+
## Contributing
|
|
177
|
+
|
|
178
|
+
See [CONTRIBUTING.md](.github/CONTRIBUTING.md).
|
|
179
|
+
|
|
180
|
+
## License
|
|
181
|
+
|
|
182
|
+
[MIT](LICENSE)
|
package/docs/architecture.md
ADDED
|
@@ -0,0 +1,129 @@
|
|
|
1
|
+
# Architecture
|
|
2
|
+
|
|
3
|
+
## Design Philosophy
|
|
4
|
+
|
|
5
|
+
RLP Desk is built on a single conviction: **context is a liability, not an asset**.
|
|
6
|
+
|
|
7
|
+
In long-running LLM sessions, accumulated context causes drift, hallucination, and forgotten decisions. RLP Desk eliminates this by treating each iteration as a fresh start, with the filesystem as the sole source of truth.
|
|
8
|
+
|
|
9
|
+
### Why Fresh Context Matters
|
|
10
|
+
|
|
11
|
+
Traditional approaches:
|
|
12
|
+
```
|
|
13
|
+
Session start → Task 1 → Task 2 → ... → Task N
|
|
14
|
+
↑ context accumulates, quality degrades
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
RLP Desk approach:
|
|
18
|
+
```
|
|
19
|
+
Leader ──Agent()──▶ Worker 1 (fresh) ──▶ writes to filesystem
|
|
20
|
+
──Agent()──▶ Worker 2 (fresh) ──▶ reads filesystem, continues
|
|
21
|
+
──Agent()──▶ Worker 3 (fresh) ──▶ reads filesystem, continues
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
Each worker reads the same filesystem state that any human could inspect. No hidden context. No accumulated confusion.
|
|
25
|
+
|
|
26
|
+
## The Agent() Approach
|
|
27
|
+
|
|
28
|
+
Claude Code's `Agent()` tool spawns a subprocess — a completely new context window with no knowledge of the parent conversation. RLP Desk exploits this property:
|
|
29
|
+
|
|
30
|
+
```python
|
|
31
|
+
# Each call = new process = fresh context = no prior conversation
|
|
32
|
+
Agent(
|
|
33
|
+
subagent_type="executor", # Worker or Verifier
|
|
34
|
+
model="sonnet", # Model selection per iteration
|
|
35
|
+
prompt=full_prompt_text, # Everything the agent needs
|
|
36
|
+
mode="bypassPermissions" # Autonomous execution
|
|
37
|
+
)
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
The Agent returns synchronously. No polling, no signal files, no tmux. The Leader simply reads the filesystem after each Agent completes.
|
|
41
|
+
|
|
42
|
+
### Why Agent() Over Other Approaches
|
|
43
|
+
|
|
44
|
+
| Approach | Problem |
|
|
45
|
+
|----------|---------|
|
|
46
|
+
| Single long session | Context drift, token limits |
|
|
47
|
+
| tmux + polling | Complex, brittle, race conditions |
|
|
48
|
+
| Signal files + sleep loops | Fragile timing, wasted compute |
|
|
49
|
+
| **Agent() subprocess** | **Clean, synchronous, guaranteed fresh context** |
|
|
50
|
+
|
|
51
|
+
## Three-Role Architecture
|
|
52
|
+
|
|
53
|
+
### Leader (Your Session)
|
|
54
|
+
|
|
55
|
+
The Leader is the currently running Claude Code session. It:
|
|
56
|
+
|
|
57
|
+
- Reads campaign memory to understand current state
|
|
58
|
+
- Decides which model to use for the next iteration
|
|
59
|
+
- Builds prompts by combining base prompts with iteration-specific context
|
|
60
|
+
- Dispatches Workers and Verifiers via `Agent()`
|
|
61
|
+
- Writes sentinel files (COMPLETE/BLOCKED) based on results
|
|
62
|
+
- Tracks circuit breaker conditions
|
|
63
|
+
|
|
64
|
+
The Leader **never writes code**. It orchestrates.
|
|
65
|
+
|
|
66
|
+
### Worker (Fresh Context)
|
|
67
|
+
|
|
68
|
+
Each Worker:
|
|
69
|
+
|
|
70
|
+
- Receives a complete prompt with everything it needs (PRD, memory, context, task)
|
|
71
|
+
- Executes exactly **one bounded action** (e.g., implement one user story)
|
|
72
|
+
- Updates the filesystem:
|
|
73
|
+
- `context/<slug>-latest.md` — current frontier
|
|
74
|
+
- `memos/<slug>-memory.md` — campaign memory for the next worker
|
|
75
|
+
- `memos/<slug>-done-claim.json` — if claiming all work is complete
|
|
76
|
+
- Exits
|
|
77
|
+
|
|
78
|
+
The Worker has no memory of previous iterations. It relies entirely on what prior Workers wrote to the filesystem.
|
|
79
|
+
|
|
80
|
+
### Verifier (Fresh Context)
|
|
81
|
+
|
|
82
|
+
The Verifier exists because **Worker claims are not trustworthy**. A Worker may claim "all tests pass" without actually running them.
|
|
83
|
+
|
|
84
|
+
Each Verifier:
|
|
85
|
+
|
|
86
|
+
- Reads the PRD, test spec, and the Worker's done-claim
|
|
87
|
+
- Runs verification commands **from scratch** (build, test, lint)
|
|
88
|
+
- Checks each acceptance criterion against fresh evidence
|
|
89
|
+
- Writes a verdict: pass, fail, or blocked
|
|
90
|
+
- **Never modifies code**
|
|
91
|
+
|
|
92
|
+
## Filesystem as Memory
|
|
93
|
+
|
|
94
|
+
```
|
|
95
|
+
.claude/ralph-desk/
|
|
96
|
+
├── plans/ # Contracts (PRD, test spec) — written once, rarely modified
|
|
97
|
+
├── prompts/ # Base prompts — templates for Worker/Verifier
|
|
98
|
+
├── context/ # Current frontier — Worker updates each iteration
|
|
99
|
+
├── memos/ # Runtime state — memory, claims, verdicts, sentinels
|
|
100
|
+
└── logs/ # Audit trail — every prompt sent, every status change
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
### State Lifecycle
|
|
104
|
+
|
|
105
|
+
```
|
|
106
|
+
plans/prd-*.md Written at init, stable reference
|
|
107
|
+
plans/test-spec-*.md Written at init, stable reference
|
|
108
|
+
context/*-latest.md Updated by Worker each iteration
|
|
109
|
+
memos/*-memory.md Rewritten by Worker each iteration
|
|
110
|
+
memos/*-done-claim.json Created by Worker, cleaned by Leader
|
|
111
|
+
memos/*-verify-verdict.json Created by Verifier, cleaned by Leader
|
|
112
|
+
memos/*-complete.md Written once by Leader (terminal)
|
|
113
|
+
memos/*-blocked.md Written once by Leader (terminal)
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
## Model Routing Strategy
|
|
117
|
+
|
|
118
|
+
Not every task needs the most powerful model. RLP Desk routes based on complexity:
|
|
119
|
+
|
|
120
|
+
```
|
|
121
|
+
Simple fix (typo, config) → haiku (fast, cheap)
|
|
122
|
+
Standard implementation → sonnet (balanced)
|
|
123
|
+
Architecture / debugging → opus (thorough)
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
The Leader adapts dynamically:
|
|
127
|
+
- Previous iteration failed → upgrade model
|
|
128
|
+
- Simple repetitive task → downgrade model
|
|
129
|
+
- User explicitly specified → respect the choice
|
package/docs/getting-started.md
ADDED
|
@@ -0,0 +1,153 @@
|
|
|
1
|
+
# Getting Started
|
|
2
|
+
|
|
3
|
+
This guide walks you through your first RLP Desk loop using a simple Python calculator example.
|
|
4
|
+
|
|
5
|
+
## Prerequisites
|
|
6
|
+
|
|
7
|
+
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) installed and authenticated
|
|
8
|
+
- A terminal with bash or zsh
|
|
9
|
+
|
|
10
|
+
## Step 1: Install RLP Desk
|
|
11
|
+
|
|
12
|
+
```bash
|
|
13
|
+
npm install -g @ai-dev-methodologies/rlp-desk
|
|
14
|
+
```
|
|
15
|
+
|
|
16
|
+
Or without npm:
|
|
17
|
+
|
|
18
|
+
```bash
|
|
19
|
+
curl -sSL https://raw.githubusercontent.com/ai-dev-methodologies/rlp-desk/main/install.sh | bash
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
This installs three files:
|
|
23
|
+
- `~/.claude/commands/rlp-desk.md` — the slash command
|
|
24
|
+
- `~/.claude/ralph-desk/init_ralph_desk.zsh` — the scaffold generator
|
|
25
|
+
- `~/.claude/ralph-desk/governance.md` — the protocol document
|
|
26
|
+
|
|
27
|
+
## Step 2: Create a Project
|
|
28
|
+
|
|
29
|
+
```bash
|
|
30
|
+
mkdir calculator-demo && cd calculator-demo
|
|
31
|
+
git init
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
## Step 3: Brainstorm
|
|
35
|
+
|
|
36
|
+
Open Claude Code and run:
|
|
37
|
+
|
|
38
|
+
```
|
|
39
|
+
/rlp-desk brainstorm "Python calculator module with add, subtract, multiply, divide functions and pytest tests"
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
The brainstorm phase interactively determines:
|
|
43
|
+
|
|
44
|
+
| Item | Example |
|
|
45
|
+
|------|---------|
|
|
46
|
+
| **Slug** | `loop-test` |
|
|
47
|
+
| **Objective** | Implement calc.py + test_calc.py |
|
|
48
|
+
| **User Stories** | US-001: calculator functions, US-002: pytest tests |
|
|
49
|
+
| **Iteration Unit** | One user story per iteration |
|
|
50
|
+
| **Verification** | `python3 -m pytest test_calc.py -v` |
|
|
51
|
+
| **Models** | Worker: sonnet, Verifier: sonnet |
|
|
52
|
+
| **Max Iterations** | 10 |
|
|
53
|
+
|
|
54
|
+
On approval, brainstorm offers to run `init` automatically.
|
|
55
|
+
|
|
56
|
+
## Step 4: Initialize (if not done in brainstorm)
|
|
57
|
+
|
|
58
|
+
```
|
|
59
|
+
/rlp-desk init loop-test "Python calculator with tests"
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
This creates the scaffold:
|
|
63
|
+
|
|
64
|
+
```
|
|
65
|
+
.claude/ralph-desk/
|
|
66
|
+
├── prompts/
|
|
67
|
+
│ ├── loop-test.worker.prompt.md
|
|
68
|
+
│ └── loop-test.verifier.prompt.md
|
|
69
|
+
├── context/
|
|
70
|
+
│ └── loop-test-latest.md
|
|
71
|
+
├── memos/
|
|
72
|
+
│ └── loop-test-memory.md
|
|
73
|
+
├── plans/
|
|
74
|
+
│ ├── prd-loop-test.md
|
|
75
|
+
│ └── test-spec-loop-test.md
|
|
76
|
+
└── logs/loop-test/
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
## Step 5: Customize the PRD
|
|
80
|
+
|
|
81
|
+
Edit `.claude/ralph-desk/plans/prd-loop-test.md` to define your user stories and acceptance criteria. See [`examples/calculator/`](../examples/calculator/.claude/ralph-desk/plans/prd-loop-test.md) for a complete example.
|
|
82
|
+
|
|
83
|
+
Key sections:
|
|
84
|
+
- **User Stories** with specific, testable acceptance criteria
|
|
85
|
+
- **Technical Constraints** (e.g., "Python 3 + pytest only")
|
|
86
|
+
- **Done When** conditions
|
|
87
|
+
|
|
88
|
+
## Step 6: Define the Test Spec
|
|
89
|
+
|
|
90
|
+
Edit `.claude/ralph-desk/plans/test-spec-loop-test.md` to specify verification commands:
|
|
91
|
+
|
|
92
|
+
```markdown
|
|
93
|
+
## Verification Commands
|
|
94
|
+
### Test
|
|
95
|
+
python3 -m pytest test_calc.py -v
|
|
96
|
+
|
|
97
|
+
## Criteria → Verification Mapping
|
|
98
|
+
| Criterion | Method | Command |
|
|
99
|
+
|-----------|--------|---------|
|
|
100
|
+
| calc.py exists | automated | test -f calc.py |
|
|
101
|
+
| All tests pass | automated | python3 -m pytest test_calc.py -v |
|
|
102
|
+
```
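The criteria mapping above could be executed along the lines of this Python sketch, which runs each command and records the exit status as evidence. `run_checks` is a hypothetical helper, not something the package ships; the result keys (`criterion`, `met`, `evidence`) are chosen to resemble the verdict format used elsewhere in these docs.

```python
import shlex
import subprocess


def run_checks(criteria, cwd="."):
    """Hypothetical helper: run (criterion, command) pairs and collect evidence."""
    results = []
    for criterion, command in criteria:
        try:
            proc = subprocess.run(
                shlex.split(command), cwd=cwd, capture_output=True, text=True
            )
            met = proc.returncode == 0
            evidence = f"{command} → exit {proc.returncode}"
        except FileNotFoundError:
            # The executable itself is missing, so the criterion cannot be met.
            met = False
            evidence = f"{command} → command not found"
        results.append({"criterion": criterion, "met": met, "evidence": evidence})
    return results
```

A call such as `run_checks([("calc.py exists", "test -f calc.py")])` would then yield one result per table row. Commands are split with `shlex` rather than passed to a shell, so shell features like pipes are deliberately unsupported in this sketch.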
|
|
103
|
+
|
|
104
|
+
## Step 7: Run the Loop
|
|
105
|
+
|
|
106
|
+
```
|
|
107
|
+
/rlp-desk run loop-test
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
What happens:
|
|
111
|
+
|
|
112
|
+
1. **Iteration 1**: Worker reads the PRD, implements `calc.py` (US-001), updates memory
|
|
113
|
+
2. **Iteration 2**: Worker reads memory, implements `test_calc.py` (US-002), writes done-claim
|
|
114
|
+
3. **Iteration 3**: Verifier runs all checks, writes pass verdict
|
|
115
|
+
4. Leader writes COMPLETE sentinel, reports success
|
|
116
|
+
|
|
117
|
+
You'll see status updates after each iteration:
|
|
118
|
+
|
|
119
|
+
```
|
|
120
|
+
Iteration 1 | Worker (sonnet) | US-001 complete, continuing
|
|
121
|
+
Iteration 2 | Worker (sonnet) | All stories done, requesting verification
|
|
122
|
+
Iteration 3 | Verifier (sonnet) | PASS — all criteria met
|
|
123
|
+
✓ COMPLETE
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
## Step 8: Check Status
|
|
127
|
+
|
|
128
|
+
At any point during or after a run:
|
|
129
|
+
|
|
130
|
+
```
|
|
131
|
+
/rlp-desk status loop-test
|
|
132
|
+
/rlp-desk logs loop-test # latest iteration
|
|
133
|
+
/rlp-desk logs loop-test 2 # specific iteration
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
## Step 9: Re-run (if needed)
|
|
137
|
+
|
|
138
|
+
If you want to run the loop again:
|
|
139
|
+
|
|
140
|
+
```
|
|
141
|
+
/rlp-desk clean loop-test
|
|
142
|
+
/rlp-desk run loop-test
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
`clean` removes runtime artifacts (sentinels, claims, verdicts) but preserves the PRD, test spec, and prompts.
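A minimal Python sketch of that behaviour, assuming the runtime artifacts are the done-claim, verdict, and sentinel files under `memos/` named in the protocol's signal contracts. The `clean` function and `RUNTIME_SUFFIXES` below are hypothetical; whether the real command also resets memory or logs is not specified here.

```python
from pathlib import Path

# Assumed runtime artifact suffixes: claim, verdict, and the two sentinels.
RUNTIME_SUFFIXES = (
    "-done-claim.json",
    "-verify-verdict.json",
    "-complete.md",
    "-blocked.md",
)


def clean(desk_dir, slug):
    """Hypothetical helper: delete runtime artifacts, leave plans and prompts alone."""
    memos = Path(desk_dir) / "memos"
    removed = []
    for suffix in RUNTIME_SUFFIXES:
        path = memos / f"{slug}{suffix}"
        if path.exists():
            path.unlink()
            removed.append(path.name)
    return removed
```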
|
|
146
|
+
|
|
147
|
+
## Tips
|
|
148
|
+
|
|
149
|
+
- **Start small**: One or two user stories for your first loop
|
|
150
|
+
- **Be specific in acceptance criteria**: "function returns float" is testable; "function works well" is not
|
|
151
|
+
- **Include verification commands**: The verifier needs concrete commands to run
|
|
152
|
+
- **One story per iteration**: Each worker should do one bounded action
|
|
153
|
+
- **Check logs when stuck**: `logs/<slug>/iter-NNN.worker-prompt.md` shows exactly what the worker received
|
package/docs/protocol-reference.md
ADDED
|
@@ -0,0 +1,220 @@
|
|
|
1
|
+
# Protocol Reference
|
|
2
|
+
|
|
3
|
+
Complete specification of the RLP Desk leader loop protocol, signal contracts, circuit breakers, and model routing.
|
|
4
|
+
|
|
5
|
+
## Leader Loop Protocol
|
|
6
|
+
|
|
7
|
+
```
|
|
8
|
+
for iteration in 1..max_iter:
|
|
9
|
+
|
|
10
|
+
① Check sentinels
|
|
11
|
+
- <slug>-complete.md exists → stop (success)
|
|
12
|
+
- <slug>-blocked.md exists → stop (failure)
|
|
13
|
+
|
|
14
|
+
② Read memory.md
|
|
15
|
+
- Parse "Stop Status" section → continue/verify/blocked
|
|
16
|
+
- Parse "Next Iteration Contract" → task for this iteration
|
|
17
|
+
|
|
18
|
+
③ Select model
|
|
19
|
+
- Apply model routing rules (see below)
|
|
20
|
+
- Check circuit breaker conditions
|
|
21
|
+
|
|
22
|
+
④ Build Worker prompt
|
|
23
|
+
- Read base prompt from prompts/<slug>.worker.prompt.md
|
|
24
|
+
- Append iteration number + contract from memory
|
|
25
|
+
- Write audit copy to logs/<slug>/iter-NNN.worker-prompt.md
|
|
26
|
+
|
|
27
|
+
⑤ Execute Worker
|
|
28
|
+
Agent(subagent_type="executor", model=selected, prompt=prompt)
|
|
29
|
+
- Synchronous return — wait for completion
|
|
30
|
+
- Each Agent() = fresh context (new subprocess)
|
|
31
|
+
|
|
32
|
+
⑥ Read memory.md again (Worker updated it)
|
|
33
|
+
- stop=continue → go to ⑧
|
|
34
|
+
- stop=verify → go to ⑦
|
|
35
|
+
- stop=blocked → write BLOCKED sentinel, stop
|
|
36
|
+
|
|
37
|
+
⑦ Execute Verifier
|
|
38
|
+
- Build verifier prompt → write to logs/<slug>/iter-NNN.verifier-prompt.md
|
|
39
|
+
- Agent(subagent_type="executor", model=selected, prompt=prompt)
|
|
40
|
+
- Read verify-verdict.json:
|
|
41
|
+
• verdict=pass + recommended=complete → write COMPLETE sentinel, stop
|
|
42
|
+
• verdict=fail + recommended=continue → go to ⑧
|
|
43
|
+
• verdict=blocked → write BLOCKED sentinel, stop
|
|
44
|
+
|
|
45
|
+
⑧ Update status.json, report to user, clean runtime files, next iteration
|
|
46
|
+
```
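The control flow above can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's implementation: `leader_loop` is a hypothetical name, the `run_worker` and `run_verifier` callables stand in for `Agent()` dispatches, and `read_stop_status` stands in for parsing the Worker's memory file. Model selection, prompt building, and status updates are elided.

```python
from pathlib import Path


def leader_loop(memos, slug, max_iter, run_worker, run_verifier, read_stop_status):
    """Hypothetical sketch: return 'complete', 'blocked', or 'timeout'."""
    memos = Path(memos)
    complete = memos / f"{slug}-complete.md"
    blocked = memos / f"{slug}-blocked.md"
    for iteration in range(1, max_iter + 1):
        # Step 1: an existing sentinel ends the loop immediately.
        if complete.exists():
            return "complete"
        if blocked.exists():
            return "blocked"
        # Steps 2-5: routing and prompt building elided; dispatch the Worker.
        run_worker(iteration)
        # Step 6: the Worker has rewritten memory, so read its stop status.
        stop = read_stop_status()
        if stop == "blocked":
            blocked.write_text(f"Blocked by worker in iteration {iteration}\n")
            return "blocked"
        if stop == "verify":
            # Step 7: independent verification; the dict mirrors verify-verdict.json.
            verdict = run_verifier(iteration)
            state = verdict.get("verdict")
            if state == "pass" and verdict.get("recommended_state_transition") == "complete":
                complete.write_text(f"Verified in iteration {iteration}\n")
                return "complete"
            if state == "blocked":
                blocked.write_text(f"Blocked by verifier in iteration {iteration}\n")
                return "blocked"
        # Step 8: status update and cleanup elided; fall through to the next iteration.
    return "timeout"
```

Passing the Worker and Verifier in as callables keeps the sketch testable without Claude Code; only the Leader writes the sentinel files, matching the rule stated in this reference.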
|
|
47
|
+
|
|
48
|
+
## Signal Contracts
|
|
49
|
+
|
|
50
|
+
### Campaign Memory (`<slug>-memory.md`)
|
|
51
|
+
|
|
52
|
+
Written by the Worker at the end of each iteration. Must contain:
|
|
53
|
+
|
|
54
|
+
```markdown
|
|
55
|
+
# <slug> - Campaign Memory
|
|
56
|
+
|
|
57
|
+
## Stop Status
|
|
58
|
+
continue | verify | blocked
|
|
59
|
+
|
|
60
|
+
## Objective
|
|
61
|
+
<original objective>
|
|
62
|
+
|
|
63
|
+
## Current State
|
|
64
|
+
Iteration N - <description>
|
|
65
|
+
|
|
66
|
+
## Next Iteration Contract
|
|
67
|
+
<specific task for the next worker>
|
|
68
|
+
|
|
69
|
+
## Patterns Discovered
|
|
70
|
+
## Learnings
|
|
71
|
+
## Evidence Chain
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
The Leader reads **Stop Status** and **Next Iteration Contract** to decide what happens next.
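One way to read those two sections is sketched below in Python. It assumes the memory file uses exactly the `## ` headings from the template above; `parse_memory` and `read_contract` are hypothetical helpers, not part of the package.

```python
import re


def parse_memory(text):
    """Hypothetical helper: split campaign memory into {heading: body} by '## ' headings."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def read_contract(text):
    """Return (stop_status, next_iteration_contract), rejecting unknown statuses."""
    sections = parse_memory(text)
    status = sections.get("Stop Status", "").lower()
    if status not in {"continue", "verify", "blocked"}:
        raise ValueError(f"unexpected Stop Status: {status!r}")
    return status, sections.get("Next Iteration Contract", "")
```

Raising on an unknown status is a design choice of this sketch: a malformed memory file stops the loop loudly instead of being treated as `continue`.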
|
|
75
|
+
|
|
76
|
+
### Done Claim (`<slug>-done-claim.json`)
|
|
77
|
+
|
|
78
|
+
Written by the Worker when claiming all work is complete:
|
|
79
|
+
|
|
80
|
+
```json
|
|
81
|
+
{
|
|
82
|
+
"iteration": 3,
|
|
83
|
+
"claimed_at_utc": "2025-01-15T10:30:00Z",
|
|
84
|
+
"summary": "All user stories implemented and tests passing",
|
|
85
|
+
"stories_completed": ["US-001", "US-002"],
|
|
86
|
+
"evidence": {
|
|
87
|
+
"test_output": "8 passed in 0.05s",
|
|
88
|
+
"files_created": ["calc.py", "test_calc.py"]
|
|
89
|
+
}
|
|
90
|
+
}
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
### Verify Verdict (`<slug>-verify-verdict.json`)
|
|
94
|
+
|
|
95
|
+
Written by the Verifier after independent verification:
|
|
96
|
+
|
|
97
|
+
```json
|
|
98
|
+
{
|
|
99
|
+
"verdict": "pass|fail|blocked",
|
|
100
|
+
"verified_at_utc": "2025-01-15T10:35:00Z",
|
|
101
|
+
"summary": "All criteria verified with fresh evidence",
|
|
102
|
+
"criteria_results": [
|
|
103
|
+
{
|
|
104
|
+
"criterion": "US-001 AC1: calc.py exists",
|
|
105
|
+
"met": true,
|
|
106
|
+
"evidence": "test -f calc.py → exit 0"
|
|
107
|
+
}
|
|
108
|
+
],
|
|
109
|
+
"missing_evidence": [],
|
|
110
|
+
"issues": [],
|
|
111
|
+
"recommended_state_transition": "complete|continue|blocked",
|
|
112
|
+
"next_iteration_contract": "Fix failing test for divide by zero",
|
|
113
|
+
"evidence_paths": ["test_calc.py::test_divide_by_zero"]
|
|
114
|
+
}
|
|
115
|
+
```
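The verdict handling described in the leader loop could look like the following Python sketch. `next_action` and `unmet_criteria` are hypothetical helpers and the returned action names are placeholders; only the field names and the three listed verdict combinations come from this reference.

```python
import json


def next_action(verdict_json):
    """Hypothetical helper: map a verify-verdict document to the Leader's next step."""
    verdict = json.loads(verdict_json)
    state = verdict["verdict"]
    recommended = verdict.get("recommended_state_transition")
    if state == "pass" and recommended == "complete":
        return "write_complete_sentinel"
    if state == "blocked":
        return "write_blocked_sentinel"
    if state == "fail" and recommended == "continue":
        return "continue"
    # Combinations the protocol does not list are surfaced rather than guessed at.
    raise ValueError(f"unhandled verdict: {state!r} / {recommended!r}")


def unmet_criteria(verdict_json):
    """List the criteria the Verifier marked as not met."""
    verdict = json.loads(verdict_json)
    return [c["criterion"] for c in verdict.get("criteria_results", []) if not c["met"]]
```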
|
|
116
|
+
|
|
117
|
+
### Sentinels
|
|
118
|
+
|
|
119
|
+
Leader-only files that terminate the loop:
|
|
120
|
+
|
|
121
|
+
| File | Meaning | Written When |
|
|
122
|
+
|------|---------|--------------|
|
|
123
|
+
| `<slug>-complete.md` | Loop succeeded | Verifier passes all criteria |
|
|
124
|
+
| `<slug>-blocked.md` | Loop cannot continue | Autonomous blocker or circuit breaker |
|
|
125
|
+
|
|
126
|
+
**Only the Leader writes sentinels.** Workers and Verifiers never touch them.
|
|
127
|
+
|
|
128
|
+
## Context File (`<slug>-latest.md`)
|
|
129
|
+
|
|
130
|
+
Updated by the Worker each iteration to reflect the current frontier:
|
|
131
|
+
|
|
132
|
+
```markdown
|
|
133
|
+
# <slug> - Latest Context
|
|
134
|
+
|
|
135
|
+
## Current Frontier
|
|
136
|
+
### Completed
|
|
137
|
+
- US-001: calculator functions implemented
|
|
138
|
+
### In Progress
|
|
139
|
+
- US-002: pytest tests
|
|
140
|
+
### Next
|
|
141
|
+
- Run verification
|
|
142
|
+
|
|
143
|
+
## Key Decisions
|
|
144
|
+
- Using ValueError for divide-by-zero (not ZeroDivisionError)
|
|
145
|
+
|
|
146
|
+
## Known Issues
|
|
147
|
+
## Files Changed This Iteration
|
|
148
|
+
- calc.py (created)
|
|
149
|
+
## Verification Status
|
|
150
|
+
- python3 -m pytest → not yet run
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
## Circuit Breakers
|
|
154
|
+
|
|
155
|
+
| Condition | Detection | Action |
|
|
156
|
+
|-----------|-----------|--------|
|
|
157
|
+
| Stale context | `context/<slug>-latest.md` hash unchanged for 3 consecutive iterations | Write BLOCKED sentinel |
|
|
158
|
+
| Repeated error | Worker produces the same error message 2 iterations in a row | Upgrade model, retry once; still failing → BLOCKED |
|
|
159
|
+
| Timeout | Iteration count reaches `max_iter` | Write TIMEOUT status, report to user |
|
|
160
|
+
|
|
161
|
+
### Stale Context Detection
|
|
162
|
+
|
|
163
|
+
The Leader computes a hash (or diff) of `context/<slug>-latest.md` before and after each Worker runs. If the content doesn't change for 3 consecutive iterations, the Leader treats the Worker as stuck and writes the BLOCKED sentinel.
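A hash-based version of this check might look like the Python sketch below. `StaleContextBreaker` is a hypothetical class, and SHA-256 is an arbitrary choice since the text only says "a hash (or diff)". Seed it with the context file as it stands before the first Worker, then call `observe` after each iteration.

```python
import hashlib


class StaleContextBreaker:
    """Hypothetical helper: trips after `limit` consecutive unchanged observations."""

    def __init__(self, limit=3):
        self.limit = limit
        self.last_digest = None
        self.unchanged = 0

    def observe(self, context_text):
        """Record the context file's content; return True when the breaker trips."""
        digest = hashlib.sha256(context_text.encode("utf-8")).hexdigest()
        if digest == self.last_digest:
            self.unchanged += 1
        else:
            # Any change resets the counter and becomes the new baseline.
            self.unchanged = 0
            self.last_digest = digest
        return self.unchanged >= self.limit
```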
|
|
164
|
+
|
|
165
|
+
### Error Escalation
|
|
166
|
+
|
|
167
|
+
```
|
|
168
|
+
Error in iteration N (sonnet) → retry with opus in iteration N+1
|
|
169
|
+
Same error in iteration N+1 (opus) → BLOCKED
|
|
170
|
+
```
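The escalation diagram above can be expressed as a small decision function. This Python sketch follows the diagram literally (first error triggers an upgrade, the same error after the upgrade blocks); `escalate` and its return values are hypothetical names, and a real Leader may count repeats differently.

```python
def escalate(current_error, previous_error, already_upgraded):
    """Hypothetical helper: decide what to do after an iteration.

    current_error / previous_error are error messages, or None when there was no error.
    """
    if current_error is None:
        return "continue"
    if already_upgraded and current_error == previous_error:
        # Same error survived the stronger model: stop rather than loop forever.
        return "blocked"
    return "upgrade_and_retry"
```

For the diagram's scenario, `escalate("E", None, False)` gives `"upgrade_and_retry"` in iteration N, and `escalate("E", "E", True)` gives `"blocked"` in iteration N+1.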
|
|
171
|
+
|
|
172
|
+
## Model Routing
|
|
173
|
+
|
|
174
|
+
### Selection Matrix
|
|
175
|
+
|
|
176
|
+
| Scenario | Model | Rationale |
|
|
177
|
+
|----------|-------|-----------|
|
|
178
|
+
| Single file, clear change | `haiku` | Fast, sufficient |
|
|
179
|
+
| Standard implementation | `sonnet` | Balanced (default) |
|
|
180
|
+
| Multi-file, architecture | `opus` | Needs broad understanding |
|
|
181
|
+
| Previous iteration failed | upgrade | A more capable model may succeed |
|
|
182
|
+
| Verification (standard) | `sonnet` | Sufficient for running checks |
|
|
183
|
+
| Verification (security) | `opus` | Critical logic needs thoroughness |
|
|
184
|
+
|
|
185
|
+
### Dynamic Adaptation
|
|
186
|
+
|
|
187
|
+
The Leader reassesses the model every iteration:
|
|
188
|
+
|
|
189
|
+
1. Read memory for previous iteration outcome
|
|
190
|
+
2. If failed → upgrade one level (haiku → sonnet → opus)
|
|
191
|
+
3. If simple/repetitive → consider downgrade
|
|
192
|
+
4. User override via `--worker-model` / `--verifier-model` takes precedence
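The four rules above can be sketched as a single Python function. `next_model`, `LADDER`, and the parameter names are hypothetical; how the Leader actually judges a task "simple" is not specified in this reference, so it is passed in as a flag.

```python
# Upgrade order as given in rule 2.
LADDER = ["haiku", "sonnet", "opus"]


def next_model(current, previous_failed=False, simple_task=False, user_override=None):
    """Hypothetical helper: pick the model for the next iteration."""
    if user_override is not None:
        # Rule 4: an explicit --worker-model / --verifier-model always wins.
        return user_override
    index = LADDER.index(current)
    if previous_failed:
        # Rule 2: move up one level, staying at opus once the top is reached.
        return LADDER[min(index + 1, len(LADDER) - 1)]
    if simple_task:
        # Rule 3: this sketch always downgrades; the text only says "consider".
        return LADDER[max(index - 1, 0)]
    return current
```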
|
|
193
|
+
|
|
194
|
+
## Status File (`status.json`)
|
|
195
|
+
|
|
196
|
+
Updated by the Leader after each iteration:
|
|
197
|
+
|
|
198
|
+
```json
|
|
199
|
+
{
|
|
200
|
+
"slug": "loop-test",
|
|
201
|
+
"iteration": 2,
|
|
202
|
+
"max_iter": 100,
|
|
203
|
+
"phase": "worker|verifier|complete|blocked|timeout",
|
|
204
|
+
"worker_model": "sonnet",
|
|
205
|
+
"verifier_model": "sonnet",
|
|
206
|
+
"last_result": "continue|verify|pass|fail|blocked",
|
|
207
|
+
"updated_at_utc": "2025-01-15T10:30:00Z"
|
|
208
|
+
}
|
|
209
|
+
```
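Writing this file could be sketched in Python as below. `write_status` is a hypothetical helper; the write-to-temp-then-rename step is a design choice of this sketch (so that a concurrent `status` reader never sees a half-written file), not something the protocol requires.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_status(log_dir, **fields):
    """Hypothetical helper: write status.json atomically and stamp it with UTC time."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    status = dict(fields)
    status["updated_at_utc"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(status, handle, indent=2)
    os.replace(tmp_path, log_dir / "status.json")
    return status
```

A call such as `write_status("logs/loop-test", slug="loop-test", iteration=2, phase="worker")` produces a file with the same keys as the example above.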
|
|
210
|
+
|
|
211
|
+
## Slash Command Reference
|
|
212
|
+
|
|
213
|
+
| Command | Arguments | Description |
|
|
214
|
+
|---------|-----------|-------------|
|
|
215
|
+
| `brainstorm` | `<description>` | Interactive planning before init |
|
|
216
|
+
| `init` | `<slug> [objective]` | Create project scaffold |
|
|
217
|
+
| `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M]` | Run the leader loop |
|
|
218
|
+
| `status` | `<slug>` | Display current loop status |
|
|
219
|
+
| `logs` | `<slug> [N]` | Show iteration logs |
|
|
220
|
+
| `clean` | `<slug>` | Remove runtime artifacts for re-run |
|