@ai-dev-methodologies/rlp-desk 0.1.2 → 0.2.0
- package/README.md +98 -0
- package/docs/protocol-reference.md +90 -3
- package/package.json +1 -1
- package/src/commands/rlp-desk.md +97 -10
- package/src/governance.md +87 -10
- package/src/scripts/init_ralph_desk.zsh +22 -6
- package/src/scripts/run_ralph_desk.zsh +729 -100
package/README.md
CHANGED
|
@@ -134,6 +134,12 @@ for iteration in 1..max_iter:
|
|
|
134
134
|
| `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
|
|
135
135
|
| `--verifier-model MODEL` | opus | Verifier model (haiku/sonnet/opus) |
|
|
136
136
|
| `--mode agent\|tmux` | agent | Execution mode (see below) |
|
|
137
|
+
| `--worker-engine claude\|codex` | claude | Engine for Worker (claude uses Agent(), codex uses Bash CLI) |
|
|
138
|
+
| `--verifier-engine claude\|codex` | claude | Engine for Verifier |
|
|
139
|
+
| `--codex-model MODEL` | gpt-5.4 | Model passed to the Codex CLI (when engine=codex) |
|
|
140
|
+
| `--codex-reasoning low\|medium\|high` | high | Reasoning effort for Codex |
|
|
141
|
+
| `--verify-mode per-us\|batch` | per-us | Verification strategy (see below) |
|
|
142
|
+
| `--verify-consensus` | off | Cross-engine consensus verification (see below) |
|
|
137
143
|
|
|
138
144
|
## Execution Modes
|
|
139
145
|
|
|
@@ -193,6 +199,98 @@ To clean up tmux artifacts:
|
|
|
193
199
|
/rlp-desk clean calculator --kill-session
|
|
194
200
|
```
|
|
195
201
|
|
|
202
|
+
## Engine Support
|
|
203
|
+
|
|
204
|
+
RLP Desk supports two execution engines for Worker and Verifier. **Claude is the default.** Codex is opt-in.
|
|
205
|
+
|
|
206
|
+
### Claude (default)
|
|
207
|
+
|
|
208
|
+
```
|
|
209
|
+
/rlp-desk run calculator
|
|
210
|
+
/rlp-desk run calculator --worker-engine claude --verifier-engine claude
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
Uses Claude Code's `Agent()` tool (agent mode) or `claude -p` CLI (tmux mode). Supports dynamic model routing (haiku/sonnet/opus).
|
|
214
|
+
|
|
215
|
+
### Codex (opt-in)
|
|
216
|
+
|
|
217
|
+
```bash
|
|
218
|
+
# Install codex CLI first
|
|
219
|
+
npm install -g @openai/codex
|
|
220
|
+
|
|
221
|
+
# Run with codex worker
|
|
222
|
+
/rlp-desk run calculator --worker-engine codex
|
|
223
|
+
|
|
224
|
+
# Customize model and reasoning effort
|
|
225
|
+
/rlp-desk run calculator --worker-engine codex --codex-model gpt-5.4 --codex-reasoning high
|
|
226
|
+
|
|
227
|
+
# Mix engines: codex worker, claude verifier
|
|
228
|
+
/rlp-desk run calculator --worker-engine codex --verifier-engine claude
|
|
229
|
+
```
|
|
230
|
+
|
|
231
|
+
Uses the `codex` CLI via `Bash()` (agent mode) or as an interactive TUI (tmux mode). The `codex` binary is only required when an engine is set to `codex`.
|
|
232
|
+
|
|
233
|
+
| Engine | Agent Mode | Tmux Mode | Dynamic Routing |
|
|
234
|
+
|--------|-----------|-----------|-----------------|
|
|
235
|
+
| claude | `Agent()` tool | `claude -p` CLI | Yes (haiku/sonnet/opus) |
|
|
236
|
+
| codex | `Bash("codex ...")` | `codex` TUI | No (static model) |
|
|
237
|
+
|
|
238
|
+
## Verification Modes
|
|
239
|
+
|
|
240
|
+
RLP Desk supports two verification strategies. **Per-US is the default.**
|
|
241
|
+
|
|
242
|
+
### Per-US Verification (default)
|
|
243
|
+
|
|
244
|
+
```
|
|
245
|
+
/rlp-desk run calculator
|
|
246
|
+
/rlp-desk run calculator --verify-mode per-us
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
Each user story is verified independently after completion, then a final full verification runs after all stories pass:
|
|
250
|
+
|
|
251
|
+
```
|
|
252
|
+
Worker: US-001 → Verifier: US-001 AC only → pass
|
|
253
|
+
Worker: US-002 → Verifier: US-002 AC only → pass
|
|
254
|
+
Worker: US-003 → Verifier: US-003 AC only → pass
|
|
255
|
+
Final full verify: ALL AC → pass → COMPLETE
|
|
256
|
+
```
|
|
257
|
+
|
|
258
|
+
Benefits:
|
|
259
|
+
- Catch issues early, before later stories build on broken foundations
|
|
260
|
+
- Smaller verification scope = faster, more accurate checks
|
|
261
|
+
- Failed verification retries only the specific US
|
|
262
|
+
|
|
263
|
+
### Batch Verification
|
|
264
|
+
|
|
265
|
+
```
|
|
266
|
+
/rlp-desk run calculator --verify-mode batch
|
|
267
|
+
```
|
|
268
|
+
|
|
269
|
+
Legacy behavior: Worker completes all stories, then a single verification checks all acceptance criteria at once.
|
|
270
|
+
|
|
271
|
+
### Cross-Engine Consensus Verification
|
|
272
|
+
|
|
273
|
+
```
|
|
274
|
+
/rlp-desk run calculator --verify-consensus
|
|
275
|
+
```
|
|
276
|
+
|
|
277
|
+
When enabled, **both claude and codex verify independently**. Both must pass for verification to succeed.
|
|
278
|
+
|
|
279
|
+
```
|
|
280
|
+
Worker completes US → Claude verifies → Codex verifies
|
|
281
|
+
Both pass → proceed
|
|
282
|
+
Either fails → combined fix contract → Worker retry
|
|
283
|
+
3 rounds without consensus → BLOCKED
|
|
284
|
+
```
|
|
285
|
+
|
|
286
|
+
Consensus can be combined with per-US mode for maximum rigor:
|
|
287
|
+
|
|
288
|
+
```
|
|
289
|
+
/rlp-desk run calculator --verify-mode per-us --verify-consensus
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
Prerequisites: Both `claude` and `codex` CLIs must be installed.
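The prerequisite rules above (the `codex` binary is needed only when an engine is set to `codex`, and consensus needs both CLIs) can be sketched as a small check. This is a minimal illustration, not code from the package: `required_clis` and its argument order are hypothetical names chosen for this example.

```bash
# Hypothetical helper (not part of rlp-desk): list the CLIs a run would need.
# Args: worker engine, verifier engine, consensus flag (1 = --verify-consensus).
required_clis() {
  local worker=$1 verifier=$2 consensus=${3:-0} out="" e
  for e in claude codex; do
    # An engine's CLI is needed if consensus is on or either role uses it.
    if [ "$consensus" = 1 ] || [ "$worker" = "$e" ] || [ "$verifier" = "$e" ]; then
      out="${out:+$out }$e"
    fi
  done
  echo "$out"
}

# Example: codex worker, claude verifier, no consensus.
required_clis codex claude 0
```

A caller could loop over the result and test each name with `command -v` before starting a run.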
|
|
293
|
+
|
|
196
294
|
## Project Structure
|
|
197
295
|
|
|
198
296
|
After `init`, your project gets this scaffold:
|
package/docs/protocol-reference.md
CHANGED
|
@@ -109,6 +109,7 @@ Written by the Worker at the end of every iteration. Provides a structured JSON
|
|
|
109
109
|
{
|
|
110
110
|
"iteration": 3,
|
|
111
111
|
"status": "continue|verify|blocked",
|
|
112
|
+
"us_id": "US-001",
|
|
112
113
|
"summary": "Completed US-001, other stories remain",
|
|
113
114
|
"timestamp": "2025-01-15T10:30:00Z"
|
|
114
115
|
}
|
|
@@ -118,12 +119,13 @@ Written by the Worker at the end of every iteration. Provides a structured JSON
|
|
|
118
119
|
|-------|------|-------------|
|
|
119
120
|
| `iteration` | number | Current iteration number |
|
|
120
121
|
| `status` | string | One of: `continue`, `verify`, `blocked` |
|
|
122
|
+
| `us_id` | string\|null | US being verified: `"US-001"`, `"ALL"` (final full verify), or null (batch mode) |
|
|
121
123
|
| `summary` | string | Brief description of what was accomplished |
|
|
122
124
|
| `timestamp` | string | ISO 8601 UTC timestamp |
|
|
123
125
|
|
|
124
126
|
**Status values:**
|
|
125
127
|
- `continue` -- Current action done but more work remains. Leader proceeds to next iteration.
|
|
126
|
-
- `verify` --
|
|
128
|
+
- `verify` -- Current US complete (per-US mode) or all work complete (batch mode). Leader dispatches Verifier scoped to `us_id`.
|
|
127
129
|
- `blocked` -- Autonomous blocker encountered. Leader writes BLOCKED sentinel.
|
|
128
130
|
|
|
129
131
|
**Usage by mode:**
|
|
@@ -160,6 +162,7 @@ Written by the Verifier after independent verification.
|
|
|
160
162
|
```json
|
|
161
163
|
{
|
|
162
164
|
"verdict": "pass|fail|request_info",
|
|
165
|
+
"us_id": "US-001",
|
|
163
166
|
"verified_at_utc": "2025-01-15T10:35:00Z",
|
|
164
167
|
"summary": "All criteria verified with fresh evidence",
|
|
165
168
|
"criteria_results": [
|
|
@@ -185,7 +188,7 @@ Written by the Verifier after independent verification.
|
|
|
185
188
|
```
|
|
186
189
|
|
|
187
190
|
**Verdict values:**
|
|
188
|
-
- `pass`: all criteria met — Leader may write COMPLETE sentinel
|
|
191
|
+
- `pass`: all criteria met — Leader may write COMPLETE sentinel (or add US to `verified_us` in per-US mode)
|
|
189
192
|
- `fail`: one or more criteria not met — Leader reads issues, builds next contract
|
|
190
193
|
- `request_info`: Verifier cannot determine pass/fail without more information — summary contains specific questions; Leader decides outcome and may relay questions to Worker
|
|
191
194
|
|
|
@@ -363,14 +366,28 @@ Updated by the Leader after each iteration:
|
|
|
363
366
|
"phase": "worker|verifier|complete|blocked|timeout",
|
|
364
367
|
"worker_model": "sonnet",
|
|
365
368
|
"verifier_model": "opus",
|
|
369
|
+
"worker_engine": "claude",
|
|
370
|
+
"verifier_engine": "claude",
|
|
371
|
+
"verify_mode": "per-us",
|
|
372
|
+
"verify_consensus": 0,
|
|
366
373
|
"last_result": "continue|verify|pass|fail|blocked",
|
|
367
374
|
"consecutive_failures": 0,
|
|
375
|
+
"verified_us": ["US-001", "US-002"],
|
|
376
|
+
"consensus_round": 0,
|
|
377
|
+
"claude_verdict": "",
|
|
378
|
+
"codex_verdict": "",
|
|
368
379
|
"updated_at_utc": "2025-01-15T10:30:00Z"
|
|
369
380
|
}
|
|
370
381
|
```
|
|
371
382
|
|
|
372
383
|
- `consecutive_failures`: number of consecutive Verifier `fail` verdicts since the last `pass`. Reset to 0 on any `pass`. Unchanged by `request_info`. Used by the Circuit Breaker (see above).
|
|
373
384
|
- `last_failing_criteria`: (optional) array of criterion IDs from recent `fail` verdicts, used by Leader to detect same-criterion and diverse-failure CB patterns. Leaders may add additional tracking fields as needed.
|
|
385
|
+
- `verified_us`: array of US IDs that have individually passed verification (per-US mode only). Empty in batch mode.
|
|
386
|
+
- `verify_mode`: `per-us` or `batch`. Controls the verification strategy.
|
|
387
|
+
- `verify_consensus`: `0` or `1`. Whether cross-engine consensus verification is enabled.
|
|
388
|
+
- `consensus_round`: current consensus round for the active US (resets per US). Only present when `verify_consensus=1`.
|
|
389
|
+
- `claude_verdict`: latest claude verifier verdict. Only present when `verify_consensus=1`.
|
|
390
|
+
- `codex_verdict`: latest codex verifier verdict. Only present when `verify_consensus=1`.
|
|
374
391
|
|
|
375
392
|
## Project Plans Files
|
|
376
393
|
|
|
@@ -390,7 +407,7 @@ The `quality-spec` file is not generated by `init`. Create it manually when a pr
|
|
|
390
407
|
|---------|-----------|-------------|
|
|
391
408
|
| `brainstorm` | `<description>` | Interactive planning before init |
|
|
392
409
|
| `init` | `<slug> [objective]` | Create project scaffold |
|
|
393
|
-
| `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M] [--mode agent\|tmux]` | Run the leader loop |
|
|
410
|
+
| `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M] [--mode agent\|tmux] [--worker-engine claude\|codex] [--verifier-engine claude\|codex] [--codex-model MODEL] [--codex-reasoning low\|medium\|high] [--verify-mode per-us\|batch] [--verify-consensus]` | Run the leader loop |
|
|
394
411
|
| `status` | `<slug>` | Display current loop status |
|
|
395
412
|
| `logs` | `<slug> [N]` | Show iteration logs |
|
|
396
413
|
| `clean` | `<slug> [--kill-session]` | Remove runtime artifacts for re-run |
|
|
@@ -402,6 +419,76 @@ The `run` command accepts `--mode agent|tmux` (default: `agent`).
|
|
|
402
419
|
- **`--mode agent`** (default): The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via `Agent()`. Synchronous, no tmux required.
|
|
403
420
|
- **`--mode tmux`**: Validates the scaffold, checks prerequisites (`tmux`, `jq`), then launches `run_ralph_desk.zsh` as the Leader. The LLM session exits after launching the script. The shell script runs independently in a tmux session.
|
|
404
421
|
|
|
422
|
+
### Engine Options
|
|
423
|
+
|
|
424
|
+
The `run` command accepts engine flags to control which CLI executes Worker and Verifier prompts. **Claude is the default engine.**
|
|
425
|
+
|
|
426
|
+
| Flag | Default | Description |
|
|
427
|
+
|------|---------|-------------|
|
|
428
|
+
| `--worker-engine claude\|codex` | `claude` | Engine for Worker |
|
|
429
|
+
| `--verifier-engine claude\|codex` | `claude` | Engine for Verifier |
|
|
430
|
+
| `--codex-model MODEL` | `gpt-5.4` | Model passed to the `codex` CLI (when engine=codex) |
|
|
431
|
+
| `--codex-reasoning low\|medium\|high` | `high` | Reasoning effort for the `codex` CLI |
|
|
432
|
+
|
|
433
|
+
**Claude engine** (default): uses `Agent()` in agent mode, `claude -p` with `--dangerously-skip-permissions` in tmux mode.
|
|
434
|
+
|
|
435
|
+
**Codex engine** (opt-in): uses `Bash("codex ...")` in agent mode, interactive `codex` TUI in tmux mode. The `codex` binary must be installed separately (`npm install -g @openai/codex`) and is only required when an engine is set to `codex`.
|
|
436
|
+
|
|
437
|
+
Engine flags are passed to tmux mode via environment variables: `WORKER_ENGINE`, `VERIFIER_ENGINE`, `CODEX_MODEL`, `CODEX_REASONING`.
|
|
438
|
+
|
|
439
|
+
### Verify Mode Options
|
|
440
|
+
|
|
441
|
+
The `run` command accepts `--verify-mode` to control the verification strategy. **Per-US is the default.**
|
|
442
|
+
|
|
443
|
+
| Flag | Default | Description |
|
|
444
|
+
|------|---------|-------------|
|
|
445
|
+
| `--verify-mode per-us\|batch` | `per-us` | Verification strategy |
|
|
446
|
+
| `--verify-consensus` | off | Cross-engine consensus verification |
|
|
447
|
+
|
|
448
|
+
**Per-US mode** (default): After each user story is completed, the Verifier checks only that story's acceptance criteria. After all stories individually pass, a final full verify checks all AC. The Leader tracks `verified_us` in `status.json`.
|
|
449
|
+
|
|
450
|
+
**Batch mode**: Legacy behavior. Worker completes all stories, then a single verification checks all AC at once.
|
|
451
|
+
|
|
452
|
+
**Consensus verification** (`--verify-consensus`): After the primary verifier runs, a second verifier runs with the alternate engine (claude or codex). Both must pass. If either fails, combined issues form a fix contract. Max 3 consensus rounds per US before BLOCKED. Requires both `claude` and `codex` CLIs.
|
|
453
|
+
|
|
454
|
+
Verify mode flags are passed to tmux mode via environment variables: `VERIFY_MODE`, `VERIFY_CONSENSUS`.
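As an illustration of how the engine and verify flags could map onto those environment variables, here is a minimal sketch. It assumes the defaults listed in the tables above. The function name `run_env` and its one-line output format are hypothetical and are not taken from `run_ralph_desk.zsh`.

```bash
# Hypothetical helper (not part of rlp-desk): map run flags to the
# environment variables named in the text, applying the documented defaults.
run_env() {
  local worker_engine=claude verifier_engine=claude
  local codex_model=gpt-5.4 codex_reasoning=high
  local verify_mode=per-us verify_consensus=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --worker-engine|--verifier-engine|--codex-model|--codex-reasoning|--verify-mode)
        # Value-taking flags: stop if the value is missing.
        [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case "$1" in
          --worker-engine)   worker_engine=$2 ;;
          --verifier-engine) verifier_engine=$2 ;;
          --codex-model)     codex_model=$2 ;;
          --codex-reasoning) codex_reasoning=$2 ;;
          --verify-mode)     verify_mode=$2 ;;
        esac
        shift 2 ;;
      --verify-consensus) verify_consensus=1; shift ;;
      *) shift ;;  # slug and any flags this sketch does not model
    esac
  done
  printf 'WORKER_ENGINE=%s VERIFIER_ENGINE=%s CODEX_MODEL=%s CODEX_REASONING=%s VERIFY_MODE=%s VERIFY_CONSENSUS=%s\n' \
    "$worker_engine" "$verifier_engine" "$codex_model" "$codex_reasoning" "$verify_mode" "$verify_consensus"
}

# Example: codex worker with consensus verification, everything else default.
run_env calculator --worker-engine codex --verify-consensus
```

The sketch does not validate flag values (for example, that `--verify-mode` is `per-us` or `batch`); a real launcher would likely reject unknown values before starting the runner.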
|
|
455
|
+
|
|
456
|
+
### Iteration Signal (`us_id` field)
|
|
457
|
+
|
|
458
|
+
When using per-US verification, the Worker includes a `us_id` field in `iter-signal.json`:
|
|
459
|
+
|
|
460
|
+
```json
|
|
461
|
+
{
|
|
462
|
+
"iteration": 3,
|
|
463
|
+
"status": "verify",
|
|
464
|
+
"us_id": "US-001",
|
|
465
|
+
"summary": "Completed US-001",
|
|
466
|
+
"timestamp": "2025-01-15T10:30:00Z"
|
|
467
|
+
}
|
|
468
|
+
```
|
|
469
|
+
|
|
470
|
+
| Value | Meaning |
|
|
471
|
+
|-------|---------|
|
|
472
|
+
| `"US-001"` (specific) | Verify only this story's AC |
|
|
473
|
+
| `"ALL"` | Final full verify — check all AC |
|
|
474
|
+
| absent/null | Legacy batch mode — check all AC |
|
|
475
|
+
|
|
476
|
+
The Verifier's verdict JSON also includes a `us_id` field to confirm which scope was verified.
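The `us_id` table can be read as a three-way dispatch on the verifier's scope. The sketch below assumes the value has already been extracted from `iter-signal.json` (for example with `jq`, which the tmux mode lists as a prerequisite). `verifier_scope` is a hypothetical name, and the batch-mode wording is an assumption.

```bash
# Hypothetical helper (not part of rlp-desk): turn a us_id value into the
# scope instruction placed in the verifier prompt.
verifier_scope() {
  local us_id=${1:-}
  case "$us_id" in
    ""|null) echo "Verify ALL acceptance criteria (batch mode)" ;;         # absent/null
    ALL)     echo "Verify ALL acceptance criteria (final full verify)" ;;  # final pass
    *)       echo "Verify ONLY the acceptance criteria for $us_id" ;;      # one story
  esac
}

# Example: scope for a single story.
verifier_scope US-001
```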
|
|
477
|
+
|
|
478
|
+
### Consensus Verdict Fields
|
|
479
|
+
|
|
480
|
+
When `--verify-consensus` is enabled, `status.json` includes additional fields:
|
|
481
|
+
|
|
482
|
+
```json
|
|
483
|
+
{
|
|
484
|
+
"consensus_round": 1,
|
|
485
|
+
"claude_verdict": "pass",
|
|
486
|
+
"codex_verdict": "pass"
|
|
487
|
+
}
|
|
488
|
+
```
|
|
489
|
+
|
|
490
|
+
Individual engine verdicts are saved as `verify-verdict-claude.json` and `verify-verdict-codex.json` in the logs directory.
|
|
491
|
+
|
|
405
492
|
### `--kill-session` Flag
|
|
406
493
|
|
|
407
494
|
The `clean` command accepts `--kill-session` to kill any tmux sessions matching the slug pattern (`rlp-desk-<slug>-*`) in addition to removing runtime files.
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@ai-dev-methodologies/rlp-desk",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.2.0",
|
|
4
4
|
"description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
|
|
5
5
|
"scripts": {
|
|
6
6
|
"postinstall": "node scripts/postinstall.js",
|
package/src/commands/rlp-desk.md
CHANGED
|
@@ -31,7 +31,10 @@ Ask about these items one by one (or in small groups):
|
|
|
31
31
|
5. **Verification Commands** — build, test, lint commands
|
|
32
32
|
6. **Completion / Blocked Criteria**
|
|
33
33
|
7. **Worker / Verifier Model** — haiku, sonnet, opus. Suggest defaults (worker: sonnet, verifier: opus), ask if OK.
|
|
34
|
-
8. **
|
|
34
|
+
8. **Engine** — claude (default) or codex for Worker/Verifier. Ask: "Use claude (default) or codex for Worker/Verifier?" If codex: ask for model (default: gpt-5.4) and reasoning effort (default: high).
|
|
35
|
+
9. **Verify Mode** — per-us (default) or batch. Ask: "Verify after each user story (per-us, recommended) or only after all stories are done (batch)?" Default recommendation: per-us for 2+ stories.
|
|
36
|
+
10. **Verify Consensus** — Ask: "Use cross-engine consensus verification? (Both claude and codex verify independently, both must pass.) Requires codex CLI." Default: no.
|
|
37
|
+
11. **Max Iterations** — suggest based on story count, ask if OK.
|
|
35
38
|
|
|
36
39
|
After all items are confirmed, present the full contract summary.
|
|
37
40
|
On approval, offer to run `init`.
|
|
@@ -56,6 +59,14 @@ Options (parse from `$ARGUMENTS`):
|
|
|
56
59
|
- `--max-iter N` (default: 100)
|
|
57
60
|
- `--worker-model MODEL` (default: sonnet)
|
|
58
61
|
- `--verifier-model MODEL` (default: opus)
|
|
62
|
+
- `--worker-engine claude|codex` (default: `claude`) — engine for Worker
|
|
63
|
+
- `--verifier-engine claude|codex` (default: `claude`) — engine for Verifier
|
|
64
|
+
- `--codex-model MODEL` (default: `gpt-5.4`) — model passed to codex CLI
|
|
65
|
+
- `--codex-reasoning low|medium|high` (default: `high`) — reasoning effort for codex
|
|
66
|
+
- `--verify-mode per-us|batch` (default: `per-us`) — verification strategy
|
|
67
|
+
- `per-us`: verify after each US, then final full verify of all AC
|
|
68
|
+
- `batch`: verify only after all US done (legacy behavior)
|
|
69
|
+
- `--verify-consensus` — enable cross-engine consensus verification (both claude and codex verify independently; both must pass)
|
|
59
70
|
- `--debug` — enable debug logging (tmux mode only, writes to logs/<slug>/debug.log)
|
|
60
71
|
|
|
61
72
|
### Mode Selection
|
|
@@ -77,13 +88,22 @@ ROOT="$PWD" \
|
|
|
77
88
|
MAX_ITER=<--max-iter value> \
|
|
78
89
|
WORKER_MODEL=<--worker-model value> \
|
|
79
90
|
VERIFIER_MODEL=<--verifier-model value> \
|
|
91
|
+
WORKER_ENGINE=<--worker-engine value, default: claude> \
|
|
92
|
+
VERIFIER_ENGINE=<--verifier-engine value, default: claude> \
|
|
93
|
+
CODEX_MODEL=<--codex-model value, default: gpt-5.4> \
|
|
94
|
+
CODEX_REASONING=<--codex-reasoning value, default: high> \
|
|
95
|
+
VERIFY_MODE=<--verify-mode value, default: per-us> \
|
|
96
|
+
VERIFY_CONSENSUS=<1 if --verify-consensus, else 0> \
|
|
80
97
|
DEBUG=<1 if --debug, else 0> \
|
|
81
98
|
zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
|
|
82
99
|
```
|
|
83
100
|
6. **If the script exits with error (exit code 1)** — report the error to the user and STOP. Do NOT attempt to work around it. Do NOT create tmux sessions yourself. Do NOT re-launch the script in a different way. Just tell the user what went wrong and suggest using Agent mode instead.
|
|
84
101
|
7. **If successful** — tell the user the tmux session has been started. The shell script takes over as the deterministic Leader. No Agent() calls are made in tmux mode.
|
|
85
102
|
|
|
86
|
-
**IMPORTANT:**
|
|
103
|
+
**IMPORTANT RULES:**
|
|
104
|
+
- Tmux mode requires the user to already be inside a tmux session. If the runner script exits with an error because `$TMUX` is not set, do NOT try to create a tmux session yourself. Tell the user: "Start tmux first, then retry."
|
|
105
|
+
- Do NOT run the script in the background (`&`, `run_in_background`). The script must run in the foreground so that the Worker/Verifier panes remain visible to the user in real time.
|
|
106
|
+
- Do NOT kill panes after completion. Panes stay alive for inspection. User cleans up with `/rlp-desk clean <slug> --kill-session`.
|
|
87
107
|
|
|
88
108
|
#### Agent Mode (`--mode agent` or default)
|
|
89
109
|
|
|
@@ -124,7 +144,9 @@ rm -f .claude/ralph-desk/memos/<slug>-verify-verdict.json
|
|
|
124
144
|
- Combine with iteration number + memory contract
|
|
125
145
|
- Write to `.claude/ralph-desk/logs/<slug>/iter-NNN.worker-prompt.md` (audit trail)
|
|
126
146
|
|
|
127
|
-
**⑤ Execute Worker
|
|
147
|
+
**⑤ Execute Worker**
|
|
148
|
+
|
|
149
|
+
If `--worker-engine claude` (default):
|
|
128
150
|
```
|
|
129
151
|
Agent(
|
|
130
152
|
description="rlp-desk worker iter-NNN",
|
|
@@ -137,24 +159,69 @@ Agent(
|
|
|
137
159
|
- Agent returns synchronously. No polling needed.
|
|
138
160
|
- Each Agent() = fresh context. Guaranteed.
|
|
139
161
|
|
|
162
|
+
If `--worker-engine codex`:
|
|
163
|
+
```
|
|
164
|
+
Bash("codex exec -m <codex_model> -c model_reasoning_effort=<codex_reasoning> <full worker prompt text>")
|
|
165
|
+
```
|
|
166
|
+
- Codex runs as a subprocess via Bash(), not Agent().
|
|
167
|
+
- Each Bash() call = fresh context for codex.
|
|
168
|
+
|
|
140
169
|
**⑥ Read memory.md again** (Worker updated it)
|
|
141
170
|
- `stop=continue` → go to ⑧
|
|
142
171
|
- `stop=verify` → go to ⑦
|
|
143
172
|
- `stop=blocked` → write BLOCKED sentinel, stop
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
173
|
+
- Also read `iter-signal.json` for `us_id` field (which US was just completed)
|
|
174
|
+
|
|
175
|
+
**⑦ Execute Verifier**
|
|
176
|
+
|
|
177
|
+
**Per-US mode** (default, `--verify-mode per-us`):
|
|
178
|
+
- Read `us_id` from `iter-signal.json` (e.g., "US-001" or "ALL")
|
|
179
|
+
- Build verifier prompt scoped to `us_id`:
|
|
180
|
+
- If `us_id` is a specific story: "Verify ONLY the acceptance criteria for {us_id}"
|
|
181
|
+
- If `us_id` is "ALL": "Verify ALL acceptance criteria (final full verify)"
|
|
182
|
+
- Write to `iter-NNN.verifier-prompt.md`
|
|
183
|
+
- Track verified US in `status.json` field `verified_us` (array)
|
|
184
|
+
- After verifier passes a specific US:
|
|
185
|
+
- Add that US to `verified_us` in status.json
|
|
186
|
+
- If more US remain → Worker does next US → verify → ...
|
|
187
|
+
- If all US individually passed → signal final full verify (us_id=ALL)
|
|
188
|
+
- After final full verify passes → COMPLETE
|
|
189
|
+
|
|
190
|
+
**Batch mode** (`--verify-mode batch`):
|
|
191
|
+
- Legacy behavior: verify only when Worker signals all work is done
|
|
192
|
+
- Verifier checks all AC at once
|
|
193
|
+
|
|
194
|
+
**⑦a Dispatch Verifier**
|
|
195
|
+
|
|
196
|
+
If `--verifier-engine claude` (default):
|
|
147
197
|
```
|
|
148
198
|
Agent(
|
|
149
|
-
description="rlp-desk verifier iter-NNN",
|
|
199
|
+
description="rlp-desk verifier iter-NNN (us_id)",
|
|
150
200
|
subagent_type="executor",
|
|
151
201
|
model=<verifier_model>,
|
|
152
202
|
mode="bypassPermissions",
|
|
153
|
-
prompt=<full verifier prompt text>
|
|
203
|
+
prompt=<full verifier prompt text with US scope>
|
|
154
204
|
)
|
|
155
205
|
```
|
|
156
|
-
|
|
206
|
+
|
|
207
|
+
If `--verifier-engine codex`:
|
|
208
|
+
```
|
|
209
|
+
Bash("codex exec -m <codex_model> -c model_reasoning_effort=<codex_reasoning> <full verifier prompt text>")
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
**⑦b Consensus Verification** (when `--verify-consensus` is enabled):
|
|
213
|
+
After the primary verifier runs, run a second verifier with the OTHER engine:
|
|
214
|
+
- If primary engine is claude → run codex verifier
|
|
215
|
+
- If primary engine is codex → run claude verifier
|
|
216
|
+
- Each verifier writes its own `verify-verdict.json`, which the Leader renames to `verify-verdict-claude.json` or `verify-verdict-codex.json` according to the engine
|
|
217
|
+
- **Both pass** → proceed (next US or COMPLETE)
|
|
218
|
+
- **Either fails** → combine issues from both verdicts into a single fix contract → Worker retry
|
|
219
|
+
- Max 3 consensus rounds per US. After 3 rounds → BLOCKED.
|
|
220
|
+
|
|
221
|
+
**⑦c Read verdict(s)**
|
|
222
|
+
- Read `verify-verdict.json` (or both `-claude.json` and `-codex.json` if consensus):
|
|
157
223
|
- `pass` + `complete` → write COMPLETE sentinel, report done!
|
|
224
|
+
- `pass` + specific US → add to `verified_us`, Worker does next US
|
|
158
225
|
- `fail` + `continue` → **run Fix Loop** (governance.md §7½):
|
|
159
226
|
1. Read `issues` array, sort by severity (`critical` → `major` → `minor`)
|
|
160
227
|
2. Build structured fix contract with traceability rule
|
|
@@ -180,6 +247,13 @@ Agent(
|
|
|
180
247
|
|
|
181
248
|
Track `consecutive_failures` in `status.json` (increment on `fail`, reset on `pass`, unchanged by `request_info`). Only **fail** verdicts count for CB chains — `request_info` does not break or contribute.
|
|
182
249
|
|
|
250
|
+
Track `verified_us` (array of US IDs that passed verification) in `status.json` when using `--verify-mode per-us`.
|
|
251
|
+
|
|
252
|
+
When `--verify-consensus` is enabled, also track in `status.json`:
|
|
253
|
+
- `consensus_round`: current consensus round for this US (resets per US)
|
|
254
|
+
- `claude_verdict`: latest claude verifier verdict for this US
|
|
255
|
+
- `codex_verdict`: latest codex verifier verdict for this US
|
|
256
|
+
|
|
183
257
|
### Important Rules
|
|
184
258
|
- Each Agent() = new process = fresh context
|
|
185
259
|
- YOU track iteration count
|
|
@@ -216,10 +290,23 @@ tmux list-sessions -F '#{session_name}' 2>/dev/null | grep "^rlp-desk-<slug>-" |
|
|
|
216
290
|
```
|
|
217
291
|
/rlp-desk brainstorm <description> Plan before init (interactive)
|
|
218
292
|
/rlp-desk init <slug> [objective] Create project scaffold
|
|
219
|
-
/rlp-desk run <slug> [
|
|
293
|
+
/rlp-desk run <slug> [options] Run loop (agent=LLM leader, tmux=shell leader)
|
|
220
294
|
/rlp-desk status <slug> Show loop status
|
|
221
295
|
/rlp-desk logs <slug> [N] Show iteration log
|
|
222
296
|
/rlp-desk clean <slug> [--kill-session] Reset for re-run (--kill-session kills tmux)
|
|
297
|
+
|
|
298
|
+
Run options:
|
|
299
|
+
--mode agent|tmux Execution mode (default: agent)
|
|
300
|
+
--max-iter N Max iterations (default: 100)
|
|
301
|
+
--worker-model MODEL Worker model (default: sonnet)
|
|
302
|
+
--verifier-model MODEL Verifier model (default: opus)
|
|
303
|
+
--worker-engine claude|codex Worker engine (default: claude)
|
|
304
|
+
--verifier-engine claude|codex Verifier engine (default: claude)
|
|
305
|
+
--codex-model MODEL Codex model (default: gpt-5.4)
|
|
306
|
+
--codex-reasoning LEVEL Codex reasoning (default: high)
|
|
307
|
+
--verify-mode per-us|batch Verification strategy (default: per-us)
|
|
308
|
+
--verify-consensus Cross-engine consensus verification
|
|
309
|
+
--debug Debug logging (tmux mode only)
|
|
223
310
|
```
|
|
224
311
|
|
|
225
312
|
## Architecture
|
package/src/governance.md
CHANGED
|
@@ -12,7 +12,7 @@ The Leader orchestrates, while Worker/Verifier run in isolated fresh contexts ev
|
|
|
12
12
|
- **Worker claim ≠ complete**: A Worker's DONE is merely a claim. The Verifier must independently verify before it's confirmed.
|
|
13
13
|
- **Verifier is independent**: The Verifier judges based on evidence alone, without knowledge of the Worker's reasoning process.
|
|
14
14
|
- **Sentinels are Leader-owned**: Only the Leader writes COMPLETE/BLOCKED sentinels.
|
|
15
|
-
- **
|
|
15
|
+
- **Supported engines**: claude (default; models: haiku, sonnet, opus) and codex (opt-in via `--worker-engine codex` / `--verifier-engine codex`).
|
|
16
16
|
|
|
17
17
|
## 2. Roles
|
|
18
18
|
|
|
@@ -43,7 +43,9 @@ The Leader orchestrates, while Worker/Verifier run in isolated fresh contexts ev
|
|
|
43
43
|
RUNNING → DONE_CLAIMED → VERIFYING → COMPLETE | CONTINUE | BLOCKED
|
|
44
44
|
```
|
|
45
45
|
|
|
46
|
-
## 4. Model Routing
|
|
46
|
+
## 4. Model Routing
|
|
47
|
+
|
|
48
|
+
### Claude (default engine)
|
|
47
49
|
|
|
48
50
|
| Role | Default Model | Override Criteria |
|
|
49
51
|
|------|---------------|-------------------|
|
|
@@ -58,12 +60,21 @@ The Leader decides each iteration. Decision criteria:
|
|
|
58
60
|
- Simple repetitive task → downgrade model
|
|
59
61
|
- User explicitly specified → use as given
|
|
60
62
|
|
|
63
|
+
### Codex (opt-in engine)
|
|
64
|
+
|
|
65
|
+
| Option | Default | Description |
|
|
66
|
+
|--------|---------|-------------|
|
|
67
|
+
| `--codex-model` | `gpt-5.4` | Model passed to the `codex` CLI |
|
|
68
|
+
| `--codex-reasoning` | `high` | Reasoning effort: `low`, `medium`, or `high` |
|
|
69
|
+
|
|
70
|
+
Model routing is static when using codex: the same model and reasoning effort apply to both Worker and Verifier. There is no dynamic upgrade path. Claude is the default engine; codex is explicitly opt-in.
|
|
71
|
+
|
|
61
72
|
## 5a. Execution: Agent() Approach (default) — "Smart Mode"
|
|
62
73
|
|
|
63
74
|
All environments (Claude Code, OpenCode) use the same Agent tool.
|
|
64
75
|
|
|
65
76
|
```
|
|
66
|
-
# Worker
|
|
77
|
+
# Worker (claude engine, default)
|
|
67
78
|
Agent(
|
|
68
79
|
subagent_type="executor",
|
|
69
80
|
model="sonnet",
|
|
@@ -71,7 +82,7 @@ Agent(
|
|
|
71
82
|
mode="bypassPermissions"
|
|
72
83
|
)
|
|
73
84
|
|
|
74
|
-
# Verifier
|
|
85
|
+
# Verifier (claude engine, default)
|
|
75
86
|
Agent(
|
|
76
87
|
subagent_type="executor",
|
|
77
88
|
model="sonnet",
|
|
@@ -80,6 +91,15 @@ Agent(
|
|
|
80
91
|
)
|
|
81
92
|
```
|
|
82
93
|
|
|
94
|
+
If `--worker-engine codex` or `--verifier-engine codex` (opt-in):
|
|
95
|
+
```
|
|
96
|
+
# Worker or Verifier (codex engine)
|
|
97
|
+
Bash("codex -m <codex_model> -c model_reasoning_effort=<codex_reasoning> --dangerously-bypass-approvals-and-sandbox <prompt>")
|
|
98
|
+
```
|
|
99
|
+
- Codex runs as a subprocess via `Bash()`, not `Agent()` — the Agent tool is Claude-specific.
|
|
100
|
+
- Each `Bash()` call = fresh context for codex.
|
|
101
|
+
- Claude is the default engine. Codex is explicitly opt-in.
|
|
102
|
+
|
|
83
103
|
Characteristics:
|
|
84
104
|
- Each call = fresh context (new subprocess)
|
|
85
105
|
- Synchronous return. No polling or signal files needed.
|
|
@@ -106,14 +126,25 @@ The tmux runner (`run_ralph_desk.zsh`) creates a tmux session with three panes:
|
|
|
106
126
|
- **Worker pane** — receives `claude -p` invocations via trigger scripts
|
|
107
127
|
- **Verifier pane** — receives `claude -p` invocations via trigger scripts
|
|
108
128
|
|
|
109
|
-
|
|
129
|
+
By default, `claude` CLI calls use `--dangerously-skip-permissions`:
|
|
110
130
|
```bash
|
|
131
|
+
# claude engine (default)
|
|
111
132
|
claude -p "$(cat /path/to/prompt.md)" \
|
|
112
133
|
--model sonnet \
|
|
113
134
|
--dangerously-skip-permissions
|
|
114
135
|
```
|
|
115
136
|
|
|
116
|
-
|
|
137
|
+
When `WORKER_ENGINE=codex` or `VERIFIER_ENGINE=codex`, the `codex` CLI is used instead:
|
|
138
|
+
```bash
|
|
139
|
+
# codex engine (opt-in)
|
|
140
|
+
codex -m gpt-5.4 \
|
|
141
|
+
-c model_reasoning_effort="high" \
|
|
142
|
+
--dangerously-bypass-approvals-and-sandbox \
|
|
143
|
+
"$(cat /path/to/prompt.md)"
|
|
144
|
+
```
|
|
145
|
+
The codex CLI is only required when an engine is set to `codex`. Claude remains the default engine throughout.
|
|
146
|
+
|
|
147
|
+
**Security implication:** Both `--dangerously-skip-permissions` (claude) and `--dangerously-bypass-approvals-and-sandbox` (codex) allow the CLI to execute code without user confirmation. The tmux runner requires this because there is no interactive user to approve each action. Only run tmux mode in trusted environments with trusted prompts.
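To make the per-engine difference concrete, the following sketch prints (but does not run) the command line for one engine, using only the flags shown in the two examples above. The function name `engine_command` and its argument order are hypothetical. The actual trigger scripts in `run_ralph_desk.zsh` may assemble the command differently.

```bash
# Hypothetical helper (not part of rlp-desk): print, without executing,
# the CLI invocation for one engine as shown in the examples above.
# Args: engine, model, reasoning effort (codex only), prompt file path.
engine_command() {
  local engine=$1 model=$2 reasoning=$3 prompt_file=$4
  case "$engine" in
    claude)
      printf 'claude -p "$(cat %s)" --model %s --dangerously-skip-permissions\n' \
        "$prompt_file" "$model" ;;
    codex)
      printf 'codex -m %s -c model_reasoning_effort="%s" --dangerously-bypass-approvals-and-sandbox "$(cat %s)"\n' \
        "$model" "$reasoning" "$prompt_file" ;;
    *)
      echo "unknown engine: $engine" >&2
      return 1 ;;
  esac
}

# Example: show what a codex Worker invocation would look like.
engine_command codex gpt-5.4 high /path/to/prompt.md
```

Printing the command instead of running it keeps the sketch safe to try, since both real invocations skip confirmation prompts.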
|
|
117
148
|
|
|
118
149
|
Characteristics:
|
|
119
150
|
- Leader is a shell script, not an LLM — zero tokens consumed for orchestration.
|
|
@@ -193,17 +224,19 @@ for iteration in 1..max_iter:
|
|
|
193
224
|
|
|
194
225
|
⑥ Read memory.md again → check Worker's updated state
|
|
195
226
|
- "continue" → go to ⑧
|
|
196
|
-
- "verify" → go to ⑦
|
|
227
|
+
- "verify" → go to ⑦ (also read iter-signal.json for us_id)
|
|
197
228
|
- "blocked" → write BLOCKED sentinel, stop
|
|
198
229
|
Note: In tmux mode, the Leader polls `<slug>-iter-signal.json` instead of
|
|
199
230
|
parsing memory.md. In Agent() mode, the Leader MAY read iter-signal.json
|
|
200
231
|
as a structured alternative to parsing the Stop Status from memory.md.
|
|
201
232
|
|
|
202
|
-
⑦ Execute Verifier
|
|
203
|
-
- Build prompt
|
|
233
|
+
⑦ Execute Verifier (see §7a for per-US and §7b for consensus details)
|
|
234
|
+
- Build prompt (scoped to us_id if per-us mode) → log
|
|
204
235
|
- Agent(subagent_type="executor", model=selected, prompt=prompt)
|
|
236
|
+
- If --verify-consensus: run second verifier with alternate engine (see §7b)
|
|
205
237
|
- Read verify-verdict.json:
|
|
206
|
-
• pass +
|
|
238
|
+
• pass + specific US → add to verified_us, Worker does next US
|
|
239
|
+
• pass + us_id=ALL or complete → write COMPLETE sentinel, stop
|
|
207
240
|
• fail + continue → go to ⑧
|
|
208
241
|
• blocked → write BLOCKED sentinel, stop
|
|
209
242
|
|
|
@@ -211,6 +244,50 @@ for iteration in 1..max_iter:
|
|
|
211
244
|
Update status.json, report to user, continue to next iteration
|
|
212
245
|
```
|
|
213
246
|
|
|
247
|
+
## 7a. Per-US Verification
|
|
248
|
+
|
|
249
|
+
By default (`--verify-mode per-us`), each user story is verified independently before proceeding to the next:
|
|
250
|
+
|
|
251
|
+
```
|
|
252
|
+
Worker completes US-001 → signal verify (us_id: "US-001")
|
|
253
|
+
→ Verifier checks ONLY US-001 AC → pass
|
|
254
|
+
→ Worker completes US-002 → signal verify (us_id: "US-002")
|
|
255
|
+
→ Verifier checks ONLY US-002 AC → pass
|
|
256
|
+
→ ...
|
|
257
|
+
→ All US individually pass → signal verify (us_id: "ALL")
|
|
258
|
+
→ Verifier runs FINAL FULL VERIFY (all AC) → pass → COMPLETE
|
|
259
|
+
```
|
|
260
|
+
|
|
261
|
+
**Key rules:**
|
|
262
|
+
- Worker signals `verify` after each US with `us_id` set in `iter-signal.json`
|
|
263
|
+
- Verifier checks only the scoped US acceptance criteria (or all if us_id=ALL)
|
|
264
|
+
- Leader tracks `verified_us` array in `status.json`
|
|
265
|
+
- If a per-US verify fails, the Worker retries that specific US (fix loop)
|
|
266
|
+
- Final full verify ensures nothing was broken by later changes
|
|
267
|
+
|
|
268
|
+
**Batch mode** (`--verify-mode batch`) preserves legacy behavior: Worker signals `verify` only after all work is done, and the Verifier checks all AC at once.
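The per-US sequencing above reduces to one decision: which scope comes next, given the stories in the plan and the contents of `verified_us`. A minimal sketch follows, assuming both are passed as space-separated lists. `next_scope` is a hypothetical name, and the real Leader may track this differently.

```bash
# Hypothetical helper (not part of rlp-desk): pick the next verification scope.
# $1 = all story IDs, $2 = IDs already in verified_us (both space-separated).
next_scope() {
  local stories=$1 verified=$2 us
  for us in $stories; do
    case " $verified " in
      *" $us "*) ;;                 # already verified, keep looking
      *) echo "$us"; return 0 ;;    # first story still unverified
    esac
  done
  echo ALL                          # every story passed: final full verify
}

# Example: US-001 is verified, so US-002 is next.
next_scope "US-001 US-002 US-003" "US-001"
```

The loop relies on word splitting of an unquoted variable, which is bash behavior. Under zsh defaults the first argument would need `${=stories}` or an array instead.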
|
|
269
|
+
|
|
270
|
+
## 7b. Cross-Engine Consensus Verification
|
|
271
|
+
|
|
272
|
+
When `--verify-consensus` is enabled, after the primary verifier runs, a second verifier runs with the alternate engine:
|
|
273
|
+
|
|
274
|
+
```
|
|
275
|
+
Worker completes US → signal verify
|
|
276
|
+
→ Claude Verifier runs (checks AC)
|
|
277
|
+
→ Codex Verifier runs (checks AC)
|
|
278
|
+
→ Both pass → proceed (next US or COMPLETE)
|
|
279
|
+
→ Either fails → combined issues → fix contract → Worker retry
|
|
280
|
+
→ Max 3 consensus rounds per US → BLOCKED if both verifiers have still not passed
|
|
281
|
+
```
|
|
282
|
+
|
|
283
|
+
**Key rules:**
|
|
284
|
+
- Both claude and codex CLI must be installed
|
|
285
|
+
- Verifiers run sequentially in the same Verifier pane (tmux) or as sequential calls (Agent mode)
|
|
286
|
+
- Verdicts are saved as `verify-verdict-claude.json` and `verify-verdict-codex.json`
|
|
287
|
+
- Combined fix contracts include issues from both engines
|
|
288
|
+
- `status.json` includes `consensus_round`, `claude_verdict`, and `codex_verdict` fields
|
|
289
|
+
- Consensus can be combined with per-US verification (each US gets consensus-verified)
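The consensus rules can be summarized as a small decision function. This sketch assumes `round` is the 1-based number of the round that just finished and that any verdict other than `pass` (including `request_info`) counts as not passing. Both are assumptions, and `consensus_outcome` is a hypothetical name.

```bash
# Hypothetical helper (not part of rlp-desk): decide what follows a
# consensus round. Args: claude verdict, codex verdict, round just
# completed (1-based), optional max rounds (default 3).
consensus_outcome() {
  local claude_verdict=$1 codex_verdict=$2 round=$3 max_rounds=${4:-3}
  if [ "$claude_verdict" = pass ] && [ "$codex_verdict" = pass ]; then
    echo proceed      # both engines agree the criteria are met
  elif [ "$round" -ge "$max_rounds" ]; then
    echo blocked      # round limit reached without both passing
  else
    echo retry        # combine issues into a fix contract for the Worker
  fi
}

# Example: codex fails in round 1, so the Worker retries.
consensus_outcome pass fail 1
```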
|
|
290
|
+
|
|
214
291
|
## 7½. Fix Loop Protocol
|
|
215
292
|
|
|
216
293
|
When the Verifier returns `fail`, the Leader runs the Fix Loop before issuing the next Worker contract:
|