@ai-dev-methodologies/rlp-desk 0.0.1 → 0.1.0
- package/README.md +72 -8
- package/docs/architecture.md +34 -8
- package/docs/getting-started.md +2 -2
- package/docs/protocol-reference.md +267 -14
- package/examples/calculator/.claude/ralph-desk/context/loop-test-latest.md +12 -0
- package/examples/calculator/.claude/ralph-desk/logs/loop-test/iter-001.worker-output.log +0 -0
- package/examples/calculator/.claude/ralph-desk/logs/loop-test/iter-001.worker-prompt.md +38 -0
- package/examples/calculator/.claude/ralph-desk/logs/loop-test/iter-001.worker-trigger.sh +28 -0
- package/examples/calculator/.claude/ralph-desk/logs/loop-test/session-config.json +25 -0
- package/examples/calculator/.claude/ralph-desk/logs/loop-test/status.json +10 -0
- package/examples/calculator/.claude/ralph-desk/logs/loop-test/worker-heartbeat.json +1 -0
- package/examples/calculator/.claude/ralph-desk/memos/loop-test-memory.md +17 -0
- package/examples/calculator/.claude/ralph-desk/prompts/loop-test.worker.prompt.md +1 -1
- package/install.sh +14 -0
- package/package.json +1 -1
- package/scripts/postinstall.js +17 -1
- package/scripts/uninstall.js +1 -0
- package/src/commands/rlp-desk.md +112 -21
- package/src/governance.md +92 -7
- package/src/scripts/init_ralph_desk.zsh +51 -30
- package/src/scripts/run_ralph_desk.zsh +1259 -0
package/README.md
CHANGED
@@ -30,15 +30,20 @@ Or without npm:
 curl -sSL https://raw.githubusercontent.com/ai-dev-methodologies/rlp-desk/main/install.sh | bash
 ```

-### 2. Brainstorm
+### 2. Brainstorm (recommended)

-
+**Always start with brainstorm.** It interactively walks you through the project contract:

 ```
 /rlp-desk brainstorm "implement a Python calculator with tests"
 ```

-
+You'll be asked to confirm each item:
+- **Slug** — project identifier
+- **User Stories** — discrete, testable units of work
+- **Iteration Unit** — one story per iteration (incremental) or all at once (fast)
+- **Verification Commands** — how to check the work
+- **Models** — which Claude model for Worker/Verifier

 ### 3. Run
@@ -107,8 +112,8 @@ for iteration in 1..max_iter:
 | Simple, single-file changes | `haiku` |
 | Standard work (default) | `sonnet` |
 | Architecture changes, multi-file, prior failure | `opus` |
-|
-|
+| Verification (default) | `opus` |
+| Lightweight verification | `sonnet` |

 ## Commands
@@ -118,7 +123,7 @@ for iteration in 1..max_iter:
 /rlp-desk run <slug> [--opts] Run the loop (this session = leader)
 /rlp-desk status <slug> Show loop status
 /rlp-desk logs <slug> [N] Show iteration logs
-/rlp-desk clean <slug>
+/rlp-desk clean <slug> [--kill-session] Reset for re-run
 ```

 ### Run Options
@@ -127,7 +132,66 @@ for iteration in 1..max_iter:
 |------|---------|-------------|
 | `--max-iter N` | 100 | Maximum iterations before timeout |
 | `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
-| `--verifier-model MODEL` |
+| `--verifier-model MODEL` | opus | Verifier model (haiku/sonnet/opus) |
+| `--mode agent\|tmux` | agent | Execution mode (see below) |
+
+## Execution Modes
+
+RLP Desk supports two execution modes. Both honor the same governance protocol.
+
+### Environment Compatibility
+
+| Environment | Agent Mode | Tmux Mode |
+|-------------|-----------|-----------|
+| Claude Code (any terminal) | **Works** | Requires tmux |
+| Inside tmux session | **Works** | **Works** — panes split in current window |
+| Outside tmux session | **Works** | **Rejected** — "start tmux first" |
+
+### Agent Mode (default) — "Smart Mode"
+
+```
+/rlp-desk run calculator
+```
+
+The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via `Agent()`. The Leader is an LLM that dynamically routes models and reasons about context.
+
+- Works anywhere — no tmux required
+- Dynamic model routing — Leader upgrades models on failure
+- Fix Loop — extracts verifier issues and feeds them back to the next worker
+- Best for interactive development
+
+### Tmux Mode — "Lean Mode"
+
+```
+/rlp-desk run calculator --mode tmux
+```
+
+**Requires running inside a tmux session.** A shell script takes over as Leader, splitting your current window into three panes. Workers run interactive `claude` sessions — you can watch them work in real-time.
+
+```
++---------------------+---------------------+
+| Your pane (Leader)  | Worker pane         |
+| shell loop running  | claude TUI running  |
+| polls signal files  | you see it working  |
+|                     +---------------------+
+|                     | Verifier pane       |
+|                     | claude TUI running  |
+|                     | (only when needed)  |
++---------------------+---------------------+
+```
+
+- Real-time visibility — watch Worker/Verifier execute live
+- Zero-token orchestration — shell loop, not LLM
+- Automatic cleanup — panes removed on completion
+- Best for long campaigns and observability
+
+Prerequisites: `tmux` and `jq` must be installed.
+
+To clean up tmux artifacts:
+
+```
+/rlp-desk clean calculator --kill-session
+```

 ## Project Structure
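The Environment Compatibility table in this hunk amounts to a small precheck before a run starts. The following is a minimal Python sketch of that decision, not the package's own zsh logic. It assumes a tmux session can be detected through the `TMUX` environment variable (which tmux sets for processes inside a session) and that prerequisites are looked up on `PATH`; the helper name `check_mode` and its return shape are hypothetical.

```python
import shutil


def check_mode(mode, env, which=shutil.which):
    """Return (ok, reason) for the requested execution mode.

    Hypothetical helper that mirrors the compatibility table; it is not
    taken from run_ralph_desk.zsh.
    """
    if mode == "agent":
        return True, "agent mode works anywhere"
    if mode != "tmux":
        return False, f"unknown mode: {mode}"
    # Tmux mode needs both tools installed (README prerequisites).
    missing = [tool for tool in ("tmux", "jq") if which(tool) is None]
    if missing:
        return False, "missing prerequisites: " + ", ".join(missing)
    # Assumption: tmux exports TMUX inside a session, so an empty value
    # means we are outside one and the run should be rejected.
    if not env.get("TMUX"):
        return False, "start tmux first"
    return True, "tmux mode: panes split in current window"
```

Calling `check_mode("tmux", dict(os.environ))` from a plain terminal would be expected to return the "start tmux first" rejection, matching the last row of the table.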
@@ -169,7 +233,7 @@ mkdir my-calc && cd my-calc

 ## Documentation

-- [Architecture](docs/architecture.md) — Design philosophy
+- [Architecture](docs/architecture.md) — Design philosophy, Agent() and tmux execution modes
 - [Getting Started](docs/getting-started.md) — Step-by-step tutorial with the calculator example
 - [Protocol Reference](docs/protocol-reference.md) — Full protocol specification
package/docs/architecture.md
CHANGED
@@ -39,14 +39,40 @@ Agent(

 The Agent returns synchronously. No polling, no signal files, no tmux. The Leader simply reads the filesystem after each Agent completes.

-###
-
-
-
-|
-
-|
-| **
+### Two Execution Modes
+
+RLP Desk supports two modes for running the Leader loop. Both honor the same governance protocol (section 7). Choose based on your use case.
+
+| Mode | Leader | Model Routing | Session Required | Best For |
+|------|--------|---------------|------------------|----------|
+| **Agent() — "Smart mode"** (default) | LLM (current session) | Dynamic — Leader reasons about which model to use each iteration | Active Claude Code session | Interactive development, complex routing decisions |
+| **Tmux — "Lean mode"** | Shell script (`run_ralph_desk.zsh`) | Static — set via `WORKER_MODEL`/`VERIFIER_MODEL` env vars | None (runs detached) | Long campaigns, CI, observability, zero-token orchestration |
+
+**Agent() mode** is synchronous and simple: each `Agent()` call blocks until the subprocess finishes, then the Leader reads the filesystem. No polling, no signal files, no tmux.
+
+**Tmux mode** trades dynamic routing for visibility and independence. The shell Leader writes prompts to files, sends short trigger commands via `tmux send-keys`, and polls structured JSON signal files (`iter-signal.json`, `verify-verdict.json`) for control flow. It uses proven [omc-teams](https://github.com/anthropics/omc-teams) tmux patterns — write-then-notify, pane ID stability, copy-mode guards, heartbeat monitoring — for reliable, race-free orchestration.
+
+The tmux script is a second implementation of the governance protocol. Traceability is maintained via governance.md section 7 step-number comments throughout the script.
+
+#### Tmux Architecture
+
+```
+[tmux session: rlp-desk-<slug>-<timestamp>]
++-------------------------------------+
+| Leader pane (shell loop)            |
+| - writes prompts to files           |
+| - sends short triggers via send-keys|
+| - polls iter-signal.json via jq     |
+| - monitors heartbeat files          |
+| - writes sentinels                  |
++------------------+------------------+
+| Worker pane      | Verifier pane    |
+| bash trigger.sh  | bash trigger.sh  |
+| -> claude -p ... | -> claude -p ... |
+| heartbeat writer | heartbeat writer |
+| (fresh context)  | (fresh context)  |
++------------------+------------------+
+```

 ## Three-Role Architecture
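The polling half of the tmux-mode description (the Leader waits for a JSON signal file to appear, then reads it) can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than the shell Leader's actual code, which uses `jq`. The function name `wait_for_signal`, the poll interval, and the choice to retry on a partially written file are all assumptions.

```python
import json
import time
from pathlib import Path


def wait_for_signal(path, timeout_s, poll_s=1.0):
    """Poll until a JSON signal file exists and parses, or the timeout passes.

    Returns the parsed object, or None on timeout. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout_s
    signal_file = Path(path)
    while time.monotonic() < deadline:
        if signal_file.exists():
            try:
                return json.loads(signal_file.read_text())
            except json.JSONDecodeError:
                # Assumption: the writer may be mid-write; try again next poll.
                pass
        time.sleep(poll_s)
    return None
```

Returning `None` on timeout leaves the caller free to decide what a missing signal means, for example recording a per-iteration timeout.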
package/docs/getting-started.md
CHANGED
@@ -48,7 +48,7 @@ The brainstorm phase interactively determines:
 | **User Stories** | US-001: calculator functions, US-002: pytest tests |
 | **Iteration Unit** | One user story per iteration |
 | **Verification** | `python3 -m pytest test_calc.py -v` |
-| **Models** | Worker: sonnet, Verifier:
+| **Models** | Worker: sonnet, Verifier: opus |
 | **Max Iterations** | 10 |

 On approval, brainstorm offers to run `init` automatically.
@@ -119,7 +119,7 @@ You'll see status updates after each iteration:
 ```
 Iteration 1 | Worker (sonnet) | US-001 complete, continuing
 Iteration 2 | Worker (sonnet) | All stories done, requesting verification
-Iteration 3 | Verifier (
+Iteration 3 | Verifier (opus) | PASS — all criteria met
 ✓ COMPLETE
 ```
package/docs/protocol-reference.md
CHANGED
@@ -11,9 +11,16 @@ for iteration in 1..max_iter:
 - <slug>-complete.md exists → stop (success)
 - <slug>-blocked.md exists → stop (failure)

+①½ Prep-stage cleanup (before each iteration)
+- Delete <slug>-done-claim.json if exists [leader-measured]
+- Delete <slug>-verify-verdict.json if exists [leader-measured]
+(Ensures stale runtime files from a previous run cannot mislead the loop)
+
 ② Read memory.md
 - Parse "Stop Status" section → continue/verify/blocked
 - Parse "Next Iteration Contract" → task for this iteration
+• Also read "Completed Stories" → track what has been verified
+• Also read "Key Decisions" → architectural choices already settled

 ③ Select model
 - Apply model routing rules (see below)
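The prep-stage cleanup step added in this hunk is simple enough to sketch directly. The Python below is a hedged illustration, not the Leader's real implementation. It assumes both runtime files live in one directory (the worker prompt elsewhere in this diff places the done-claim under `memos/`; the location of the verdict file is an assumption), and the function name `prep_stage_cleanup` is hypothetical.

```python
from pathlib import Path


def prep_stage_cleanup(memos_dir, slug):
    """Delete stale done-claim and verify-verdict files before an iteration.

    Returns the names of the files that were removed. Hypothetical helper;
    the shared directory for both files is an assumption.
    """
    removed = []
    for suffix in ("done-claim.json", "verify-verdict.json"):
        target = Path(memos_dir) / f"{slug}-{suffix}"
        if target.exists():
            target.unlink()
            removed.append(target.name)
    return removed
```

Only the two signal files named in the step are touched; campaign memory and other files in the directory are left alone.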
@@ -42,7 +49,8 @@ for iteration in 1..max_iter:
 • verdict=fail + recommended=continue → go to ⑧
 • verdict=blocked → write BLOCKED sentinel, stop

-⑧
+⑧ Write iter-NNN.result.md (see Result Log below)
+Update status.json, report to user, next iteration
 ```

 ## Signal Contracts
@@ -63,15 +71,66 @@ continue | verify | blocked
 ## Current State
 Iteration N - <description>

+## Completed Stories
+- US-001: Calculator add/subtract implemented [interface: `add(a, b) -> float`]
+- US-002: pytest suite — 8 tests passing
+
 ## Next Iteration Contract
-
+**Story**: US-003 — Edge case handling
+**Task**: Handle divide-by-zero in calc.py.
+1. Raise ValueError with message "division by zero"
+2. Add test_divide_by_zero to test_calc.py
+
+**Criteria**:
+- `pytest` exits 0
+- `grep "ValueError" calc.py` matches
+
+## Key Decisions
+- Iteration 2: Chose ValueError over ZeroDivisionError — matches project error style.
+- Iteration 3: Skipped type hints — out of scope per PRD Non-Goals.

 ## Patterns Discovered
 ## Learnings
 ## Evidence Chain
 ```

-The Leader reads
+The Leader reads:
+- **Stop Status** and **Next Iteration Contract** to decide what happens next.
+- **Completed Stories** to track verified work without re-reading full history.
+- **Key Decisions** to carry forward settled architectural choices.
+
+All sections use plain Markdown. No YAML.
+
+### Iteration Signal (`<slug>-iter-signal.json`)
+
+Written by the Worker at the end of every iteration. Provides a structured JSON signal for the Leader to detect iteration completion without parsing markdown.
+
+```json
+{
+  "iteration": 3,
+  "status": "continue|verify|blocked",
+  "summary": "Completed US-001, other stories remain",
+  "timestamp": "2025-01-15T10:30:00Z"
+}
+```
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `iteration` | number | Current iteration number |
+| `status` | string | One of: `continue`, `verify`, `blocked` |
+| `summary` | string | Brief description of what was accomplished |
+| `timestamp` | string | ISO 8601 UTC timestamp |
+
+**Status values:**
+- `continue` -- Current action done but more work remains. Leader proceeds to next iteration.
+- `verify` -- All work complete and done-claim written. Leader dispatches Verifier.
+- `blocked` -- Autonomous blocker encountered. Leader writes BLOCKED sentinel.
+
+**Usage by mode:**
+- **Tmux mode:** The shell Leader polls for this file's existence after dispatching the Worker. Once it appears, the Leader reads the `status` field via `jq` to decide the next step. This is the primary control-flow mechanism in tmux mode.
+- **Agent() mode:** The Leader MAY read this file as a structured alternative to parsing `memory.md`'s Stop Status section. Agent() mode primarily uses memory.md, so iter-signal.json is supplementary.
+
+**Worker obligation:** The Worker MUST write this file at the end of every iteration, regardless of execution mode. This ensures both Agent() and tmux modes can use the same Worker prompt templates.

 ### Done Claim (`<slug>-done-claim.json`)
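The Iteration Signal contract added in this hunk (four fields, three status values, one Leader action per status) maps naturally onto a small validator. The sketch below is in Python for illustration only; the tmux Leader reads the file with `jq`. The names `NEXT_STEP` and `read_iter_signal` are hypothetical, and the action strings paraphrase the "Status values" list.

```python
import json

# Leader action for each documented status value.
NEXT_STEP = {
    "continue": "dispatch next Worker iteration",
    "verify": "dispatch Verifier",
    "blocked": "write BLOCKED sentinel",
}

REQUIRED_FIELDS = ("iteration", "status", "summary", "timestamp")


def read_iter_signal(text):
    """Parse iter-signal JSON text and return (signal, next_step).

    Raises ValueError when a documented field is missing or the status is
    not one of the three documented values. Hypothetical helper.
    """
    signal = json.loads(text)
    missing = [key for key in REQUIRED_FIELDS if key not in signal]
    if missing:
        raise ValueError(f"iter-signal missing fields: {missing}")
    if signal["status"] not in NEXT_STEP:
        raise ValueError(f"unknown status: {signal['status']!r}")
    return signal, NEXT_STEP[signal["status"]]
```

Rejecting unknown statuses early keeps a malformed Worker signal from being silently treated as `continue`.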
@@ -92,11 +151,15 @@ Written by the Worker when claiming all work is complete:

 ### Verify Verdict (`<slug>-verify-verdict.json`)

-Written by the Verifier after independent verification
+Written by the Verifier after independent verification.
+
+**Tmux mode polling:** In tmux mode, after dispatching the Verifier, the shell Leader polls for the existence of `verify-verdict.json` (same pattern as `iter-signal.json`). Once it appears, the Leader reads the `verdict` and `recommended_state_transition` fields via `jq` to decide whether to write a COMPLETE sentinel, continue iterating, or write a BLOCKED sentinel.
+
+**Schema:**

 ```json
 {
-  "verdict": "pass|fail|
+  "verdict": "pass|fail|request_info",
   "verified_at_utc": "2025-01-15T10:35:00Z",
   "summary": "All criteria verified with fresh evidence",
   "criteria_results": [
@@ -107,13 +170,76 @@ Written by the Verifier after independent verification:
     }
   ],
   "missing_evidence": [],
-  "issues": [
+  "issues": [
+    {
+      "criterion": "US-002 AC1",
+      "description": "Test file missing",
+      "severity": "critical|major|minor",
+      "fix_hint": "(suggestion, non-authoritative) Add test_calc.py"
+    }
+  ],
   "recommended_state_transition": "complete|continue|blocked",
   "next_iteration_contract": "Fix failing test for divide by zero",
   "evidence_paths": ["test_calc.py::test_divide_by_zero"]
 }
 ```

+**Verdict values:**
+- `pass`: all criteria met — Leader may write COMPLETE sentinel
+- `fail`: one or more criteria not met — Leader reads issues, builds next contract
+- `request_info`: Verifier cannot determine pass/fail without more information — summary contains specific questions; Leader decides outcome and may relay questions to Worker
+
+**Issues severity:**
+- `critical`: blocking — must be fixed before COMPLETE
+- `major`: significant gap in acceptance criteria
+- `minor`: cosmetic or non-blocking concern
+
+**Verifier scope:**
+- Identify changed files via `git diff --name-only` — read those files and their direct imports only
+- Campaign Memory (`<slug>-memory.md`) is for orientation only — not the source of truth for verification
+- Delegate deterministic checks (type hints, linting, security) to tools defined in test-spec
+- Focus on: AC verification, semantic review, smoke tests
+- Do NOT use `fail` when uncertain — use `request_info` with specific questions instead
+
+### Fix Loop Protocol
+
+When the Verifier returns `fail`, the Leader executes the Fix Loop before dispatching the next Worker:
+
+#### Flow
+
+```
+Verifier fail
+→ Leader reads verify-verdict.json issues
+→ Sort issues by severity: critical → major → minor
+→ Build structured fix contract (see format below)
+→ Increment consecutive_failures in status.json
+→ Dispatch Worker with fix contract as Next Iteration Contract
+```
+
+#### Fix Contract Format
+
+```markdown
+## Next Iteration Contract
+**Mode**: fix
+**Verifier verdict reference**: iter-NNN
+
+**Issues to fix** (severity-sorted):
+1. [critical] US-002 AC3: <description>
+   - fix_hint: (suggestion, non-authoritative) <hint text>
+2. [major] US-001 AC1: <description>
+3. [minor] US-003 AC2: <description>
+
+**Traceability rule**: Only changes that resolve a listed issue are allowed (traceability enforcement).
+Every change must be justified by the issue it addresses.
+```
+
+#### Rules
+
+- `fix_hint` is optional. When present it is labeled `(suggestion, non-authoritative)` — the Worker may choose a different approach.
+- **traceability**: the Worker must not introduce changes beyond what is needed to resolve the listed issues.
+- The Leader increments `consecutive_failures` in `status.json` after each `fail` verdict, and resets it to 0 after any `pass`.
+- The Leader (not the Worker) owns the `consecutive_failures` counter.
+
 ### Sentinels

 Leader-only files that terminate the loop:
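The Fix Loop flow in this hunk (read the issues, sort by severity, emit a fix contract) can be sketched as a pure function. This Python version is illustrative and makes a few assumptions: issues carry the `criterion`, `description`, `severity`, and optional `fix_hint` fields shown in the verdict schema, unknown severities sort last, and the helper name `build_fix_contract` is hypothetical. Incrementing `consecutive_failures` is left to the caller.

```python
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


def build_fix_contract(verdict, verdict_ref):
    """Render a 'Next Iteration Contract' in fix mode from a fail verdict.

    Hypothetical helper following the Fix Contract Format; sorted() is
    stable, so issues of equal severity keep the Verifier's order.
    """
    issues = sorted(
        verdict.get("issues", []),
        key=lambda issue: SEVERITY_ORDER.get(issue["severity"], len(SEVERITY_ORDER)),
    )
    lines = [
        "## Next Iteration Contract",
        "**Mode**: fix",
        f"**Verifier verdict reference**: {verdict_ref}",
        "",
        "**Issues to fix** (severity-sorted):",
    ]
    for number, issue in enumerate(issues, start=1):
        lines.append(
            f"{number}. [{issue['severity']}] {issue['criterion']}: {issue['description']}"
        )
        if issue.get("fix_hint"):  # fix_hint is optional per the Rules section
            lines.append(f"   - fix_hint: {issue['fix_hint']}")
    lines += [
        "",
        "**Traceability rule**: Only changes that resolve a listed issue are allowed (traceability enforcement).",
        "Every change must be justified by the issue it addresses.",
    ]
    return "\n".join(lines)
```

Keeping the builder free of file I/O makes it easy to check the severity ordering in isolation before wiring it into a Leader loop.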
@@ -155,7 +281,8 @@ Updated by the Worker each iteration to reflect the current frontier:
 | Condition | Detection | Action |
 |-----------|-----------|--------|
 | Stale context | `context-latest.md` hash unchanged for 3 consecutive iterations | Write BLOCKED sentinel |
-| Repeated
+| Repeated criterion failure | Same acceptance criterion fails in 2 consecutive Verifier verdicts | Upgrade model, retry once; still failing → BLOCKED |
+| Persistent diverse failures | 3 consecutive **fail** verdicts on 3 unique acceptance criterion IDs | Upgrade to opus, retry once; still failing → BLOCKED |
 | Timeout | Iteration count reaches `max_iter` | Write TIMEOUT status, report to user |

 ### Stale Context Detection
@@ -165,10 +292,20 @@ The Leader computes a hash (or diff) of `context-latest.md` before and after eac
 ### Error Escalation

 ```
-
-Same
+Same acceptance criterion fails iteration N (sonnet) → retry with opus in iteration N+1
+Same acceptance criterion still fails iteration N+1 (opus) → BLOCKED
 ```

+"Same error" is defined as: **the same acceptance criterion ID appears in the `issues` list of two consecutive Verifier `fail` verdicts.** A `request_info` verdict does not break or contribute to this chain — only `fail` verdicts are counted.
+
+### Consecutive Failures Counter
+
+The Leader maintains `consecutive_failures` in `status.json`. This counter:
+- Increments by 1 after each Verifier `fail` verdict
+- Resets to 0 after any Verifier `pass` verdict
+- **Unchanged** by `request_info` verdicts (neither increments nor resets)
+- Triggers the 3-consecutive-diverse-failures CB when it reaches 3 and the 3 most recent `fail` verdicts each have a unique criterion ID
+
 ## Model Routing

 ### Selection Matrix
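The counter rules and the two failure-based circuit breakers in this hunk can be modeled as a small state update. The Python sketch below is one possible reading, not the protocol's definitive logic. In particular, it interprets "each have a unique criterion ID" as the three most recent `fail` verdicts having non-empty, pairwise disjoint criterion sets; that interpretation, the state key `recent_fail_criteria`, and the function name `update_breaker` are assumptions.

```python
def update_breaker(state, verdict, criteria=()):
    """Apply one Verifier verdict; return (new_state, tripped_breaker_or_None).

    state: {"consecutive_failures": int, "recent_fail_criteria": [set, ...]}
    Hypothetical helper; the state layout is an assumption.
    """
    if verdict == "request_info":
        # Neither increments nor resets, and does not break a failure chain.
        return state, None
    if verdict == "pass":
        return {"consecutive_failures": 0, "recent_fail_criteria": []}, None
    if verdict != "fail":
        raise ValueError(f"unknown verdict: {verdict!r}")

    history = (state["recent_fail_criteria"] + [set(criteria)])[-3:]
    new_state = {
        "consecutive_failures": state["consecutive_failures"] + 1,
        "recent_fail_criteria": history,
    }
    tripped = None
    if len(history) >= 2 and history[-1] & history[-2]:
        # Same criterion ID in two consecutive fail verdicts.
        tripped = "repeated_criterion"
    elif (
        new_state["consecutive_failures"] >= 3
        and len(history) == 3
        and all(history)
        and not (history[0] & history[1] or history[1] & history[2] or history[0] & history[2])
    ):
        # Assumed reading: three fail verdicts with pairwise disjoint criteria.
        tripped = "diverse_failures"
    return new_state, tripped
```

What happens after a breaker trips (model upgrade, one retry, then BLOCKED) is described in the circuit-breaker table and is deliberately left outside this function.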
@@ -179,8 +316,8 @@ Same error in iteration N+1 (opus) → BLOCKED
 | Standard implementation | `sonnet` | Balanced (default) |
 | Multi-file, architecture | `opus` | Needs broad understanding |
 | Previous iteration failed | upgrade | Harder model may succeed |
-| Verification (
-| Verification (
+| Verification (default) | `opus` | Independent verification requires thoroughness |
+| Verification (lightweight) | `sonnet` | Simple, well-defined checks only |

 ### Dynamic Adaptation
@@ -191,6 +328,29 @@ The Leader reassesses the model every iteration:
 3. If simple/repetitive → consider downgrade
 4. User override via `--worker-model` / `--verifier-model` takes precedence

+## Result Log (`iter-NNN.result.md`)
+
+Written by the Leader after each iteration completes (step ⑧). Stored in `logs/<slug>/`.
+
+```markdown
+# Iteration NNN Result
+
+## Result Status
+pass | fail | continue [leader-measured]
+
+## Files Changed
+(output of `git diff --stat HEAD~1 HEAD`) [git-measured]
+
+## Summary
+<1–2 sentence summary of what the Worker did this iteration>
+
+## Verifier Verdict
+pass | fail | blocked | (not run) [leader-measured]
+```
+
+- `[leader-measured]`: value determined by the Leader reading memory/verdict files.
+- `[git-measured]`: value determined by running `git diff --stat` — not from Worker's claim.
+
 ## Status File (`status.json`)

 Updated by the Leader after each iteration:
@@ -202,19 +362,112 @@ Updated by the Leader after each iteration:
   "max_iter": 100,
   "phase": "worker|verifier|complete|blocked|timeout",
   "worker_model": "sonnet",
-  "verifier_model": "
+  "verifier_model": "opus",
   "last_result": "continue|verify|pass|fail|blocked",
+  "consecutive_failures": 0,
   "updated_at_utc": "2025-01-15T10:30:00Z"
 }
 ```

+- `consecutive_failures`: number of consecutive Verifier `fail` verdicts since the last `pass`. Reset to 0 on any `pass`. Unchanged by `request_info`. Used by the Circuit Breaker (see above).
+- `last_failing_criteria`: (optional) array of criterion IDs from recent `fail` verdicts, used by Leader to detect same-criterion and diverse-failure CB patterns. Leaders may add additional tracking fields as needed.
+
+## Project Plans Files
+
+The `plans/` directory holds documents that define the project's acceptance criteria and verification approach:
+
+| File | Required | Description |
+|------|----------|-------------|
+| `plans/prd-<slug>.md` | Yes | Product Requirements Document — user stories, acceptance criteria, non-goals |
+| `plans/test-spec-<slug>.md` | Yes | Test specification — verification commands, criteria-to-test mapping |
+| `plans/quality-spec-<slug>.md` | Optional | Additional quality constraints (coding standards, performance budgets, security requirements). Not generated by `init` — create manually when needed. |
+
+The `quality-spec` file is not generated by `init`. Create it manually when a project requires additional quality constraints beyond the acceptance criteria in the PRD.
+
 ## Slash Command Reference

 | Command | Arguments | Description |
 |---------|-----------|-------------|
 | `brainstorm` | `<description>` | Interactive planning before init |
 | `init` | `<slug> [objective]` | Create project scaffold |
-| `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M]` | Run the leader loop |
+| `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M] [--mode agent\|tmux]` | Run the leader loop |
 | `status` | `<slug>` | Display current loop status |
 | `logs` | `<slug> [N]` | Show iteration logs |
-| `clean` | `<slug
+| `clean` | `<slug> [--kill-session]` | Remove runtime artifacts for re-run |
+
+### `--mode` Flag
+
+The `run` command accepts `--mode agent|tmux` (default: `agent`).
+
+- **`--mode agent`** (default): The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via `Agent()`. Synchronous, no tmux required.
+- **`--mode tmux`**: Validates the scaffold, checks prerequisites (`tmux`, `jq`), then launches `run_ralph_desk.zsh` as the Leader. The LLM session exits after launching the script. The shell script runs independently in a tmux session.
+
+### `--kill-session` Flag
+
+The `clean` command accepts `--kill-session` to kill any tmux sessions matching the slug pattern (`rlp-desk-<slug>-*`) in addition to removing runtime files.
+
+## Tmux Mode Specifics
+
+This section documents the tmux-specific patterns used by `run_ralph_desk.zsh`. These apply only when running with `--mode tmux`.
+
+### Write-Then-Notify
+
+The single most important pattern. **Never** send data (prompts, large strings) through `tmux send-keys` directly.
+
+1. Write the prompt to a file: `logs/<slug>/iter-NNN.worker-prompt.md`
+2. Write a trigger script to a file: `logs/<slug>/iter-NNN.worker-trigger.sh`
+3. Send only a short command via `send-keys`: `bash /path/to/trigger.sh`
+
+The trigger script reads the prompt file and invokes `claude -p "$(cat /path/to/prompt.md)" --model <model> --dangerously-skip-permissions`.
+
+### Signal File Polling
+
+In tmux mode, the shell Leader cannot call `Agent()` synchronously. Instead, it polls for signal files:
+
+| Signal | Written By | Polled By Leader | Purpose |
+|--------|-----------|------------------|---------|
+| `<slug>-iter-signal.json` | Worker | After dispatching Worker | Detect Worker iteration completion |
+| `<slug>-verify-verdict.json` | Verifier | After dispatching Verifier | Detect Verifier completion |
+
+The Leader reads these files with `jq` to extract status/verdict fields for control-flow decisions.
+
+### Heartbeat Monitoring
+
+Each trigger script writes a heartbeat file (`worker-heartbeat.json` or `verifier-heartbeat.json`) in a background loop. The Leader periodically checks the heartbeat's timestamp to detect stale processes (no update within `HEARTBEAT_STALE_THRESHOLD` seconds).
+
+### Idle Pane Nudging
+
+If a pane produces no output for `IDLE_NUDGE_THRESHOLD` seconds, the Leader sends a nudge (an Enter keystroke) to prompt activity. After `MAX_NUDGES` attempts without progress, the Leader treats the pane as stuck.
+
+### Exponential Backoff Restarts
+
+If a Worker or Verifier process crashes, the Leader restarts it with exponential backoff: 5s, 10s, 20s, 60s. After `MAX_RESTARTS` consecutive failures, the Leader writes a BLOCKED sentinel.
+
+### Per-Iteration Timeout
+
+Each iteration has a configurable timeout (`ITER_TIMEOUT`, default 600s). If a Worker does not produce an `iter-signal.json` within this period, the Leader kills the process and records the timeout.
+
+### Static Model Routing
+
+Unlike Agent() mode where the LLM Leader dynamically selects models, tmux mode uses static model routing via environment variables:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `WORKER_MODEL` | `sonnet` | Model for Worker invocations |
+| `VERIFIER_MODEL` | `opus` | Model for Verifier invocations |
+
+### Session Config
+
+Session metadata is stored in `logs/<slug>/session-config.json`:
+
+```json
+{
+  "session_name": "rlp-desk-<slug>-20260318-143000",
+  "leader_pane": "%0",
+  "worker_pane": "%1",
+  "verifier_pane": "%2",
+  "created_at": "2026-03-18T14:30:00Z"
+}
+```
+
+This file is used by the `status` and `clean` commands to find and interact with the running tmux session.
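The Heartbeat Monitoring section in this hunk describes a staleness check on the heartbeat timestamp. A minimal Python sketch of that check follows. It assumes the heartbeat payload has a `ts` field in the `YYYY-MM-DDTHH:MM:SSZ` form written by the generated trigger script in this diff. The threshold is passed in explicitly because the default of `HEARTBEAT_STALE_THRESHOLD` is not stated here, and the function name `heartbeat_is_stale` is hypothetical.

```python
import json
from datetime import datetime, timezone


def heartbeat_is_stale(heartbeat_json, now, threshold_s):
    """Return True when the heartbeat is older than threshold_s seconds.

    `now` must be a timezone-aware UTC datetime. Hypothetical helper; the
    real Leader performs this check in zsh.
    """
    beat = json.loads(heartbeat_json)
    beat_time = datetime.strptime(beat["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return (now - beat_time).total_seconds() > threshold_s
```

Passing `now` as an argument instead of reading the clock inside the function keeps the check deterministic when it is exercised in tests.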
File without changes
@@ -0,0 +1,38 @@
+Execute the plan for loop-test.
+
+Required reads every iteration:
+- PRD: .claude/ralph-desk/plans/prd-loop-test.md
+- Test Spec: .claude/ralph-desk/plans/test-spec-loop-test.md
+- Campaign Memory: .claude/ralph-desk/memos/loop-test-memory.md
+- Latest Context: .claude/ralph-desk/context/loop-test-latest.md
+
+CRITICAL RULE: Work on only ONE User Story per iteration.
+- Check campaign memory's "Next Iteration Contract" first and do that.
+- Do not touch already-completed stories.
+
+Iteration rules:
+- Use fresh context only; do NOT depend on prior chat history.
+- Execute exactly ONE bounded next action (ONE user story).
+- Refresh context file with the current frontier.
+- Rewrite campaign memory in full.
+
+MANDATORY: When done, write the following signal file:
+- Path: .claude/ralph-desk/memos/loop-test-iter-signal.json
+- Format: {"iteration": N, "status": "continue|verify|blocked", "summary": "what was done", "timestamp": "ISO"}
+- Status values:
+  - "continue" = current story done but other stories remain
+  - "verify" = all stories complete + done-claim written
+  - "blocked" = autonomous blocker
+
+Stop behavior:
+- Current story done but other stories remain → memory stop=continue, signal status=continue
+- All stories complete + all tests pass → write done-claim JSON (.claude/ralph-desk/memos/loop-test-done-claim.json) + signal status=verify
+- Autonomous blocker → write blocked.md + signal status=blocked
+
+Objective: Implement a Python calculator module: calc.py (4 functions + type hints + ValueError) + test_calc.py (pytest, 8+ tests, all passed)
+
+---
+## Iteration Context
+- **Iteration**: 1
+- **Memory Stop Status**: continue
+- **Next Iteration Contract**: Start from the beginning: read PRD and implement US-001 (calc.py with 4 functions).
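The MANDATORY section of this worker prompt specifies the signal file's format. As an illustration of what a conforming payload looks like, here is a hedged Python sketch that builds one. The Worker in this project is a `claude` session rather than a Python program, so this is only a model of the expected output; the function name `make_iter_signal` is hypothetical, and the timestamp layout follows the ISO 8601 UTC examples elsewhere in this diff.

```python
import json
from datetime import datetime, timezone

VALID_STATUSES = ("continue", "verify", "blocked")


def make_iter_signal(iteration, status, summary, now=None):
    """Return the iter-signal JSON text for one finished iteration.

    Hypothetical helper; `now` defaults to the current UTC time.
    """
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}, got {status!r}")
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "iteration": iteration,
            "status": status,
            "summary": summary,
            "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    )
```

The resulting text would be written to the path named in the prompt, `.claude/ralph-desk/memos/loop-test-iter-signal.json` for this example campaign.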
@@ -0,0 +1,28 @@
+#!/bin/zsh
+# Trigger for iteration 1 worker - generated by run_ralph_desk.zsh
+# DO NOT use exec here -- it breaks heartbeat cleanup
+
+HEARTBEAT_FILE="/Users/kyjin/dev/own/ai-dev-methodologies/rlp-desk/examples/calculator/.claude/ralph-desk/logs/loop-test/worker-heartbeat.json"
+
+# Background heartbeat writer (omc-teams pattern)
+(
+  while true; do
+    echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","pid":'"$$"'}' > "${HEARTBEAT_FILE}.tmp.$$"
+    mv "${HEARTBEAT_FILE}.tmp.$$" "$HEARTBEAT_FILE"
+    sleep 15
+  done
+) &
+HEARTBEAT_PID=$!
+
+# Run claude with fresh context (governance.md s7 step 5)
+claude -p "$(cat /Users/kyjin/dev/own/ai-dev-methodologies/rlp-desk/examples/calculator/.claude/ralph-desk/logs/loop-test/iter-001.worker-prompt.md)" \
+  --model sonnet \
+  --dangerously-skip-permissions \
+  --output-format text \
+  2>&1 | tee /Users/kyjin/dev/own/ai-dev-methodologies/rlp-desk/examples/calculator/.claude/ralph-desk/logs/loop-test/iter-001.worker-output.log
+
+# Cleanup heartbeat writer
+kill $HEARTBEAT_PID 2>/dev/null
+wait $HEARTBEAT_PID 2>/dev/null
+echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","status":"exited"}' > "${HEARTBEAT_FILE}.tmp.$$"
+mv "${HEARTBEAT_FILE}.tmp.$$" "$HEARTBEAT_FILE"
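The trigger script above writes each heartbeat to a temporary file and then renames it over the real one, so a reader never sees a half-written file. The same write-then-rename idea can be sketched in Python as follows. This is an analogue for illustration, not part of the package; `write_heartbeat` is a hypothetical name, and `os.replace` is used because it performs an atomic rename when source and destination are on the same filesystem.

```python
import json
import os
from datetime import datetime, timezone


def write_heartbeat(path, pid=None, now=None):
    """Atomically write a heartbeat JSON file and return the payload.

    Hypothetical Python analogue of the shell heartbeat writer.
    """
    now = now or datetime.now(timezone.utc)
    payload = {
        "ts": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "pid": pid if pid is not None else os.getpid(),
    }
    tmp_path = f"{path}.tmp.{os.getpid()}"
    with open(tmp_path, "w") as handle:
        json.dump(payload, handle)
    # Rename over the target so readers see either the old or the new file.
    os.replace(tmp_path, path)
    return payload
```

Including the process ID in the temporary file name, as the shell script does with `$$`, keeps two writers from clobbering each other's temp files.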