@ai-dev-methodologies/rlp-desk 0.3.6 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +145 -69
- package/docs/blueprints/blueprint-v0.4-evolution.md +347 -0
- package/docs/plans/cozy-gliding-trinket.md +53 -0
- package/docs/plans/toasty-whistling-diffie-agent-a6814625642e956da.md +201 -0
- package/docs/plans/toasty-whistling-diffie.md +117 -0
- package/docs/prompts/ralplan-codex-review.md +55 -0
- package/install.sh +5 -0
- package/package.json +1 -1
- package/scripts/postinstall.js +5 -0
- package/scripts/uninstall.js +1 -0
- package/src/commands/rlp-desk.md +252 -70
- package/src/governance.md +63 -28
- package/src/model-upgrade-table.md +50 -0
- package/src/scripts/init_ralph_desk.zsh +329 -13
- package/src/scripts/lib_ralph_desk.zsh +837 -0
- package/src/scripts/run_ralph_desk.zsh +978 -482
package/src/commands/rlp-desk.md
CHANGED
|
@@ -41,13 +41,44 @@ Ask about these items one by one (or in small groups):
|
|
|
41
41
|
- Default recommendation: one US per iteration for 3+ stories.
|
|
42
42
|
5. **Verification Commands** — build, test, lint commands
|
|
43
43
|
6. **Completion / Blocked Criteria**
|
|
44
|
-
7. **Worker / Verifier Model** —
|
|
44
|
+
7. **Worker / Verifier Model** — Evaluate PRD complexity using 5 factors (overall = highest factor), then recommend model.
|
|
45
|
+
|
|
46
|
+
**Complexity Evaluation Table**:
|
|
47
|
+
|
|
48
|
+
| Factor | LOW | MEDIUM | HIGH | CRITICAL |
|
|
49
|
+
|--------|-----|--------|------|----------|
|
|
50
|
+
| US count | 1-2 | 3-5 | 6-10 | 10+ |
|
|
51
|
+
| File change scope | single | 2-5 files | 6+ files | cross-repo |
|
|
52
|
+
| Logic complexity | simple | conditionals | algorithms | security |
|
|
53
|
+
| External dependencies | none | 1-2 | 3+ | distributed |
|
|
54
|
+
| Existing code impact | new only | modify | refactor | architecture |
|
|
55
|
+
|
|
56
|
+
**Model mapping** (Worker / Verifier):
|
|
57
|
+
- LOW → haiku / sonnet
|
|
58
|
+
- MEDIUM → sonnet / opus
|
|
59
|
+
- HIGH → opus / opus
|
|
60
|
+
- CRITICAL → opus / opus + require human review
|
|
61
|
+
|
|
62
|
+
Present complexity score with evidence to the user, e.g.: "I rate this MEDIUM because: US count=4 (MEDIUM), file scope=2 (MEDIUM), logic=conditionals (MEDIUM), deps=none (LOW), impact=modify (MEDIUM). Highest=MEDIUM → I suggest Worker: sonnet, Verifier: opus."
|
|
63
|
+
|
|
45
64
|
8. **Engine & Model** — For each role (Worker, Verifier):
|
|
46
65
|
- Engine: claude (default) or codex
|
|
47
66
|
- If claude: suggest model (haiku/sonnet/opus) based on task complexity
|
|
48
67
|
- If codex: suggest model (default: gpt-5.4) and reasoning effort (low/medium/high)
|
|
49
|
-
|
|
50
|
-
|
|
68
|
+
|
|
69
|
+
**Codex Detection** — check if codex CLI is installed (`command -v codex`):
|
|
70
|
+
|
|
71
|
+
**If codex IS installed** — recommend cross-engine Worker:
|
|
72
|
+
- Suggest: `--worker-model gpt-5.4:high --verify-consensus` (cross-engine + consensus)
|
|
73
|
+
- Alternative: `--worker-model gpt-5.3-codex-spark:high` (spark preset — note: 100k output token limit per request, best for smaller scope PRDs)
|
|
74
|
+
- Say: "Codex is installed. I recommend it as Worker for cost savings (codex tokens are cheaper than claude tokens for bulk iteration) and cross-engine blind-spot coverage (claude Verifier catches issues codex Worker misses)."
|
|
75
|
+
|
|
76
|
+
**If codex is NOT installed** — recommend claude-only + install suggestion:
|
|
77
|
+
- Defaulting to claude-only Worker (sonnet).
|
|
78
|
+
- Say: "Codex is not installed. Defaulting to claude-only Worker. Note: without a second engine, your Verifier shares the same perspective as the Worker — there is a risk of blind spots where both Worker and Verifier miss the same issue. To unlock cross-engine coverage: `npm install -g @openai/codex`"
|
|
79
|
+
|
|
80
|
+
AI should recommend: "For this task complexity, I suggest Worker: sonnet, Verifier: opus"
|
|
81
|
+
If codex selected: "For codex Worker, I suggest gpt-5.4 with high reasoning"
|
|
51
82
|
9. **Verify Mode** — per-us (default) or batch. Ask: "Verify after each user story (per-us, recommended) or only after all stories are done (batch)?" Default recommendation: per-us for 2+ stories.
|
|
52
83
|
10. **Verify Consensus** — Ask: "Use cross-engine consensus verification? (Both claude and codex verify independently, both must pass.) Requires codex CLI." Default: no.
|
|
53
84
|
11. **Consensus Scope** — If consensus enabled, ask: "Consensus on every verify (all, default) or only on final verify (final-only)?" Default: all.
|
|
@@ -62,7 +93,11 @@ After all items are confirmed:
|
|
|
62
93
|
Present the score table to the user before proceeding.
|
|
63
94
|
2. Present the full contract summary.
|
|
64
95
|
3. **Self-Verification** — Ask: "Enable self-verification? Worker records step-by-step evidence, Verifier cross-validates process. Recommended for MEDIUM+ risk." Default: yes for HIGH/CRITICAL, no for LOW/MEDIUM.
|
|
65
|
-
4.
|
|
96
|
+
4. **Re-execution check**: After slug is confirmed, check if `.claude/ralph-desk/plans/prd-<slug>.md` already exists. If a PRD already exists for this slug, ask: "A PRD already exists for this slug. Improve the existing PRD or start fresh (delete and recreate)?"
|
|
97
|
+
- "improve" → pass `--mode improve` to init
|
|
98
|
+
- "start fresh" → pass `--mode fresh` to init
|
|
99
|
+
- If no PRD exists: standard first-run (no --mode needed)
|
|
100
|
+
5. On approval, offer to run `init`.
|
|
66
101
|
|
|
67
102
|
Do NOT create files during brainstorm.
|
|
68
103
|
Do NOT auto-decide iteration unit — the user MUST explicitly choose.
|
|
@@ -71,7 +106,7 @@ Do NOT auto-decide iteration unit — the user MUST explicitly choose.
|
|
|
71
106
|
|
|
72
107
|
## `init <slug> [objective]`
|
|
73
108
|
|
|
74
|
-
Run: `~/.claude/ralph-desk/init_ralph_desk.zsh <slug> "<objective>"`
|
|
109
|
+
Run: `~/.claude/ralph-desk/init_ralph_desk.zsh <slug> "<objective>" [--mode fresh|improve]`
|
|
75
110
|
If brainstorm was done, auto-fill PRD and test-spec with the results.
|
|
76
111
|
|
|
77
112
|
**After init completes, STOP. Do NOT auto-run the loop.**
|
|
@@ -79,33 +114,63 @@ If brainstorm was done, auto-fill PRD and test-spec with the results.
|
|
|
79
114
|
Tell the user:
|
|
80
115
|
1. The scaffold has been created — list the generated files
|
|
81
116
|
2. Ask them to review/edit the PRD and test-spec if needed
|
|
82
|
-
3. Present run options with explanations and ONE recommendation. The user MUST copy and paste the command themselves
|
|
117
|
+
3. Present run options with explanations and ONE recommendation. The user MUST copy and paste the command themselves.
|
|
83
118
|
|
|
84
|
-
|
|
85
|
-
Available run commands (copy the one you want):
|
|
119
|
+
Check if codex CLI is installed: run `command -v codex` in shell or check if the binary exists.
|
|
86
120
|
|
|
87
|
-
|
|
88
|
-
/rlp-desk run <slug> --debug
|
|
121
|
+
**If codex IS installed** — show cross-engine presets first:
|
|
89
122
|
|
|
90
|
-
|
|
91
|
-
|
|
123
|
+
```
|
|
124
|
+
Available run commands (copy the one you want):
|
|
92
125
|
|
|
93
|
-
#
|
|
94
|
-
/rlp-desk run <slug> --
|
|
126
|
+
# Recommended: cross-engine + final-consensus (cost savings + blind-spot coverage):
|
|
127
|
+
/rlp-desk run <actual-slug> --worker-model gpt-5.4:high --final-consensus --debug
|
|
95
128
|
|
|
96
|
-
#
|
|
97
|
-
/rlp-desk run <slug> --
|
|
129
|
+
# Spark Pro preset (fast codex worker, lower cost):
|
|
130
|
+
/rlp-desk run <actual-slug> --worker-model gpt-5.3-codex-spark:high --debug
|
|
98
131
|
|
|
99
|
-
#
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
#
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
#
|
|
106
|
-
# --
|
|
107
|
-
# --
|
|
108
|
-
|
|
132
|
+
# Claude-only:
|
|
133
|
+
/rlp-desk run <actual-slug> --debug
|
|
134
|
+
|
|
135
|
+
# Basic agent:
|
|
136
|
+
/rlp-desk run <actual-slug>
|
|
137
|
+
|
|
138
|
+
# Full options reference:
|
|
139
|
+
# --mode agent|tmux (default: agent)
|
|
140
|
+
# --worker-model MODEL haiku|sonnet|opus or gpt-5.4:low|medium|high (default: sonnet)
|
|
141
|
+
# --verifier-model MODEL haiku|sonnet|opus (default: opus)
|
|
142
|
+
# --verify-consensus both claude+codex must pass
|
|
143
|
+
# --verify-mode per-us|batch (default: per-us)
|
|
144
|
+
# --max-iter N (default: 100)
|
|
145
|
+
# --debug enable debug logging
|
|
146
|
+
# --with-self-verification post-campaign analysis report
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
**If codex is NOT installed** — show claude-only presets + install recommendation:
|
|
150
|
+
|
|
151
|
+
```
|
|
152
|
+
Available run commands (copy the one you want):
|
|
153
|
+
|
|
154
|
+
# Recommended: tmux mode + claude-only (real-time visibility):
|
|
155
|
+
/rlp-desk run <actual-slug> --mode tmux --debug
|
|
156
|
+
|
|
157
|
+
# Agent mode:
|
|
158
|
+
/rlp-desk run <actual-slug> --debug
|
|
159
|
+
|
|
160
|
+
# Install codex for cost savings + cross-engine blind-spot coverage:
|
|
161
|
+
npm install -g @openai/codex
|
|
162
|
+
|
|
163
|
+
# Full options reference:
|
|
164
|
+
# --mode agent|tmux (default: agent)
|
|
165
|
+
# --worker-model MODEL haiku|sonnet|opus (default: sonnet)
|
|
166
|
+
# --verifier-model MODEL haiku|sonnet|opus (default: opus)
|
|
167
|
+
# --verify-mode per-us|batch (default: per-us)
|
|
168
|
+
# --max-iter N (default: 100)
|
|
169
|
+
# --debug enable debug logging
|
|
170
|
+
# --with-self-verification post-campaign analysis report
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
Replace `<actual-slug>` with the real slug from this init (e.g. `auth-refactor`).
|
|
109
174
|
|
|
110
175
|
**CRITICAL: Do NOT offer to run for the user. Do NOT ask "shall I run?" or offer to execute. The user MUST type the run command themselves. Just present the options, recommend one, and STOP.**
|
|
111
176
|
|
|
@@ -133,9 +198,22 @@ Options (parse from `$ARGUMENTS`):
|
|
|
133
198
|
- `--consensus-scope all|final-only` — when consensus runs (default: `all`)
|
|
134
199
|
- `all`: consensus runs on every verify (current behavior)
|
|
135
200
|
- `final-only`: consensus only on final ALL verify
|
|
136
|
-
- `--
|
|
201
|
+
- `--cb-threshold N` — circuit breaker threshold: consecutive failures before BLOCKED (default: 3). When `--verify-consensus` is active, effective threshold is automatically doubled (e.g., default becomes 6).
|
|
202
|
+
- `--consensus-fail-fast` — skip second verifier if first verifier fails (saves time/tokens in consensus mode)
|
|
203
|
+
- `--iter-timeout N` — per-iteration timeout in seconds (default: 600). Enforced in tmux mode only. Agent mode: not enforced (Agent() has no timeout API).
|
|
204
|
+
- `--debug` — enable debug logging (writes to ~/.claude/ralph-desk/analytics/<slug>/debug.log)
|
|
137
205
|
- `--with-self-verification` — enable campaign-level self-verification analysis. After COMPLETE, Leader analyzes all iteration records (done-claims + verdicts) and generates a campaign self-verification summary with patterns and recommendations for next planning cycle. (Note: execution_steps and reasoning are ALWAYS recorded per governance §1f — this flag adds post-campaign analysis.)
|
|
138
206
|
|
|
207
|
+
### Analytics Directory (`~/.claude/ralph-desk/analytics/<slug>/`)
|
|
208
|
+
When `--debug` or `--with-self-verification` is active, analytics data is written to a user-level directory for cross-project aggregation. Contents:
|
|
209
|
+
- `metadata.json` — campaign metadata: slug, project_root, campaign_status, start_time, end_time
|
|
210
|
+
- `debug.log` — debug output (versioned: `debug-v{N}.log` on re-execution)
|
|
211
|
+
- `campaign.jsonl` — per-iteration structured data (versioned: `campaign-v{N}.jsonl` on re-execution). Schema: iter, us_id, worker_model, worker_engine, verifier_engine, claude_verdict, codex_verdict, consensus, duration_worker_s, duration_verifier_s, project_root, slug, timestamp
|
|
212
|
+
- `self-verification-data.json` — cumulative SV records (agent-mode only, when `--with-self-verification`)
|
|
213
|
+
- `self-verification-report-NNN.md` — versioned SV reports (when `--with-self-verification`)
|
|
214
|
+
|
|
215
|
+
Cross-project aggregation: scan `~/.claude/ralph-desk/analytics/` and read each slug's `metadata.json` to discover project_root, campaign_status, and timestamps. Slug directories use `<slug>--<root_hash>` format to prevent collision across projects.
|
|
216
|
+
|
|
139
217
|
### Mode Selection
|
|
140
218
|
|
|
141
219
|
Parse the `--mode` flag. If absent or `agent`, use the Agent() path below. If `tmux`, use the Tmux path.
|
|
@@ -164,7 +242,10 @@ VERIFIER_CODEX_REASONING=<--verifier-codex-reasoning value, default: high> \
|
|
|
164
242
|
VERIFY_MODE=<--verify-mode value, default: per-us> \
|
|
165
243
|
VERIFY_CONSENSUS=<1 if --verify-consensus, else 0> \
|
|
166
244
|
CONSENSUS_SCOPE=<--consensus-scope value, default: all> \
|
|
245
|
+
CB_THRESHOLD=<--cb-threshold value, default: 3> \
|
|
246
|
+
ITER_TIMEOUT=<--iter-timeout value, default: 600> \
|
|
167
247
|
DEBUG=<1 if --debug, else 0> \
|
|
248
|
+
WITH_SELF_VERIFICATION=<1 if --with-self-verification, else 0> \
|
|
168
249
|
zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
|
|
169
250
|
```
|
|
170
251
|
6. **If the script exits with error (exit code 1)** — report the error to the user and STOP. Do NOT attempt to work around it. Do NOT create tmux sessions yourself. Do NOT re-launch the script in a different way. Just tell the user what went wrong and suggest using Agent mode instead.
|
|
@@ -172,17 +253,31 @@ DEBUG=<1 if --debug, else 0> \
|
|
|
172
253
|
|
|
173
254
|
**IMPORTANT RULES:**
|
|
174
255
|
- Tmux mode requires the user to already be inside a tmux session. If the runner script rejects because $TMUX is not set, do NOT try to create a tmux session yourself. Tell the user: "Start tmux first, then retry."
|
|
175
|
-
-
|
|
256
|
+
- MUST launch the runner with `run_in_background: true` so `/rlp-desk` returns control immediately while preserving live tmux visibility.
|
|
257
|
+
- Run-in-background is used so the shell can keep the command visible and keep the pane layout stable for status checks and completion flow.
|
|
176
258
|
- Do NOT kill panes after completion. Panes stay alive for inspection. User cleans up with `/rlp-desk clean <slug> --kill-session`.
|
|
259
|
+
- `--with-self-verification` is accepted in tmux mode. After campaign completion, `run_ralph_desk.zsh` spawns `claude CLI` to generate the SV report from campaign artifacts (done-claims, verify-verdicts, campaign-report). SV reports are written to `~/.claude/ralph-desk/analytics/<slug>/`. Requires `claude` CLI available in PATH; if not found, an error is appended to the campaign report.
|
|
260
|
+
|
|
261
|
+
**tmux UX model (5 items):**
|
|
262
|
+
- The session returns immediately after launch (`run_in_background: true`) so the command returns control to the parent CLI.
|
|
263
|
+
- Worker/Verifier panes remain visible to the user during execution.
|
|
264
|
+
- Users check progress with the **status command**: `/rlp-desk status <slug>`.
|
|
265
|
+
- On completion, the command returns a completion notification before the loop ends.
|
|
266
|
+
- Agent mode remains unchanged, and no tmux-specific behavior is mixed into Agent mode.
|
|
177
267
|
|
|
178
268
|
#### Agent Mode (`--mode agent` or default)
|
|
179
269
|
|
|
180
270
|
### Preparation
|
|
181
271
|
1. Validate scaffold: `.claude/ralph-desk/prompts/<slug>.worker.prompt.md` etc.
|
|
182
|
-
2.
|
|
183
|
-
3.
|
|
184
|
-
4.
|
|
185
|
-
5.
|
|
272
|
+
2. **Codex CLI pre-validation**: If `--verify-consensus` is enabled OR `--worker-engine codex` / `--verifier-engine codex` is set, check that `codex` CLI exists in PATH. If codex CLI not found → STOP immediately, print install instructions (`npm install -g @openai/codex`), do not start the loop.
|
|
273
|
+
3. Check sentinels (complete/blocked). Found → tell user `/rlp-desk clean <slug>`.
|
|
274
|
+
4. Clean previous `done-claim.json`, `verify-verdict.json`.
|
|
275
|
+
5. **Always**: write baseline log entry to `.claude/ralph-desk/logs/<slug>/baseline.log`: `[timestamp] iter=0 phase=start slug=<slug> worker_model=<model> verifier_model=<model>`. Baseline.log captures 1 line per iteration for lightweight post-mortem (always-on, no flag needed).
|
|
276
|
+
6. If `--debug`: also create/clear `~/.claude/ralph-desk/analytics/<slug>/debug.log`. Define a helper: to "debug_log" means append a timestamped line to this file via `Bash("echo \"[$(date '+%Y-%m-%d %H:%M:%S')] $msg\" >> ~/.claude/ralph-desk/analytics/<slug>/debug.log")`. When `--debug` is active, debug.log contains all baseline.log fields plus detailed phase logs.
|
|
277
|
+
- **4-category log system**: all debug_log entries use exactly one of: `[GOV]` (governance checks: IL enforcement, CB triggers, scope lock, verdict evaluation), `[DECIDE]` (leader decisions: model selection, fix contracts, escalation), `[OPTION]` (configuration snapshot at loop start: thresholds, modes, models), `[FLOW]` (execution progress: worker/verifier dispatch, signal reads, phase transitions)
|
|
278
|
+
- **Re-execution versioning**: If `debug.log` already exists at `--debug` start, rename it to `debug-v{N}.log` (N = next available integer ≥ 1) before creating a fresh `debug.log`.
|
|
279
|
+
- **baseline.log lifecycle**: baseline.log is deleted on re-execution (when `init --mode improve` or `init --mode fresh` is run).
|
|
280
|
+
7. Capture baseline commit: `Bash("git rev-parse HEAD 2>/dev/null || echo none")` → store as `BASELINE_COMMIT`. Include in the first `status.json` write as `baseline_commit` field.
|
|
186
281
|
|
|
187
282
|
### Leader Loop
|
|
188
283
|
|
|
@@ -194,7 +289,10 @@ DEBUG=<1 if --debug, else 0> \
|
|
|
194
289
|
- **After every step result, IMMEDIATELY start the next step's tool call in the SAME response.** For example, after reading the verdict (⑦c), report via Bash("echo") AND start ⑧'s tool calls in one response.
|
|
195
290
|
- If you output "Iter 1 complete, moving to iter 2" as plain text without a tool call, the turn terminates and the loop breaks. This is a platform constraint, not a compliance issue — no amount of "DO NOT STOP" text can override it.
|
|
196
291
|
|
|
197
|
-
If `--debug`, at loop start debug_log
|
|
292
|
+
If `--debug`, at loop start debug_log the following (3 [OPTION] entries):
|
|
293
|
+
- `[OPTION] slug=<slug> max_iter=<N> verify_mode=<mode> consensus=<0|1> consensus_scope=<scope>`
|
|
294
|
+
- `[OPTION] cb_threshold=<N> effective_cb_threshold=<N>`
|
|
295
|
+
- `[OPTION] worker_engine=<engine> worker_model=<model> verifier_engine=<engine> verifier_model=<model>`
|
|
198
296
|
|
|
199
297
|
For each iteration (1 to max_iter):
|
|
200
298
|
|
|
@@ -213,17 +311,22 @@ rm -f .claude/ralph-desk/memos/<slug>-verify-verdict.json
|
|
|
213
311
|
**② Read memory.md** → Stop Status, Next Iteration Contract
|
|
214
312
|
- Also read **Completed Stories** → verified work so far
|
|
215
313
|
- Also read **Key Decisions** → settled architectural choices
|
|
216
|
-
- If `--debug`: debug_log `[
|
|
314
|
+
- If `--debug`: debug_log `[FLOW] iter=N phase=read_memory stop_status=<status> contract="<summary>"`
|
|
217
315
|
|
|
218
316
|
**③ Decide model** (§4 of governance.md)
|
|
219
317
|
- Previous iteration failed → upgrade model
|
|
220
318
|
- Simple task → downgrade
|
|
221
319
|
- User specified → use that
|
|
222
|
-
- If `--debug`: debug_log `[
|
|
320
|
+
- If `--debug`: debug_log `[DECIDE] iter=N phase=model_select worker_model=<model> reason=<reason>`
|
|
223
321
|
|
|
224
322
|
**④ Build worker prompt (Prompt Assembly Protocol)**
|
|
225
323
|
1. Capture `WORKING_DIR` once: use `$PWD` from when `/rlp-desk run` was invoked. Store for all prompt construction.
|
|
226
324
|
2. Read `.claude/ralph-desk/prompts/<slug>.worker.prompt.md` — use its content **verbatim**. Do NOT rewrite, paraphrase, or regenerate paths. The prompt file contains correct absolute paths from init.
|
|
325
|
+
2a. **Per-US PRD injection** (when targeting a specific `us_id`, not "ALL"):
|
|
326
|
+
- Check if `.claude/ralph-desk/plans/prd-<slug>-{us_id}.md` exists (created by init split)
|
|
327
|
+
- If yes: in the assembled prompt text, replace the full PRD reference (`prd-<slug>.md`) with the per-US file path (`prd-<slug>-{us_id}.md`) — so Worker reads only the relevant US section
|
|
328
|
+
- If no per-US file: fall back to full PRD (`prd-<slug>.md`) with no change needed
|
|
329
|
+
- Note: this absolute-path substitution is permitted — only absolute→relative rewrites are forbidden.
|
|
227
330
|
3. Prepend meta comment: `## WORKING_DIR: {absolute path}` — Worker must use this as its working directory.
|
|
228
331
|
4. Append iteration number + memory contract.
|
|
229
332
|
5. Write to `.claude/ralph-desk/logs/<slug>/iter-NNN.worker-prompt.md` (audit trail).
|
|
@@ -232,11 +335,11 @@ rm -f .claude/ralph-desk/memos/<slug>-verify-verdict.json
|
|
|
232
335
|
|
|
233
336
|
**④½ Contract review** (agent mode only)
|
|
234
337
|
- Before dispatching Worker, spawn a lightweight review: "Is this iteration contract sufficient to achieve the US's AC? Any missing steps?"
|
|
235
|
-
- If `--debug`: debug_log `[
|
|
236
|
-
- In tmux mode: skip (shell leader cannot reason). Log: `[
|
|
338
|
+
- If `--debug`: debug_log `[GOV] iter=N phase=contract_review scope_lock=<us_id|null> ac_count=<N> result=<ok|issues>`
|
|
339
|
+
- In tmux mode: skip (shell leader cannot reason). Log: `[FLOW] iter=N phase=contract_review skipped=tmux_mode`
|
|
237
340
|
|
|
238
341
|
**⑤ Execute Worker**
|
|
239
|
-
- If `--debug`: debug_log `[
|
|
342
|
+
- If `--debug`: debug_log `[FLOW] iter=N phase=worker engine=<engine> model=<model> dispatched=true`
|
|
240
343
|
|
|
241
344
|
If `--worker-engine claude` (default):
|
|
242
345
|
```
|
|
@@ -258,14 +361,13 @@ Bash("codex exec --model <worker_codex_model> --reasoning-effort <worker_codex_r
|
|
|
258
361
|
- Codex runs as a subprocess via Bash(), not Agent().
|
|
259
362
|
- Each Bash() call = fresh context for codex.
|
|
260
363
|
|
|
261
|
-
- If `--debug`: debug_log `[EXEC] iter=N phase=worker_done engine=<engine>`
|
|
262
364
|
|
|
263
365
|
**⑥ Read memory.md again** (Worker updated it)
|
|
264
366
|
- `stop=continue` → go to ⑧
|
|
265
367
|
- `stop=verify` → go to ⑦
|
|
266
368
|
- `stop=blocked` → write BLOCKED sentinel, stop
|
|
267
369
|
- Also read `iter-signal.json` for `us_id` field (which US was just completed)
|
|
268
|
-
- If `--debug`: debug_log `[
|
|
370
|
+
- If `--debug`: debug_log `[FLOW] iter=N phase=worker_done_signal engine=<engine> status=<stop_status> us_id=<us_id>`
|
|
269
371
|
|
|
270
372
|
**CRITICAL: Immediately proceed to ⑦. Do NOT pause, do NOT ask the user, do NOT wait for confirmation. The loop is autonomous.**
|
|
271
373
|
|
|
@@ -282,7 +384,8 @@ Bash("codex exec --model <worker_codex_model> --reasoning-effort <worker_codex_r
|
|
|
282
384
|
- Add that US to `verified_us` in status.json
|
|
283
385
|
- If more US remain → Worker does next US → verify → ...
|
|
284
386
|
- If all US individually passed → signal final full verify (us_id=ALL)
|
|
285
|
-
-
|
|
387
|
+
- **Sequential final verify** (timeout prevention): Instead of one big ALL verify, loop through each US individually with scoped verifier. After all per-US pass, run the project's test suite as a cross-US integration check. Only COMPLETE if both per-US checks and integration check pass.
|
|
388
|
+
- After sequential final verify passes → COMPLETE
|
|
286
389
|
|
|
287
390
|
**Batch mode** (`--verify-mode batch`):
|
|
288
391
|
- Legacy behavior: verify only when Worker signals all work is done
|
|
@@ -291,7 +394,7 @@ Bash("codex exec --model <worker_codex_model> --reasoning-effort <worker_codex_r
|
|
|
291
394
|
**⑦a Dispatch Verifier**
|
|
292
395
|
- Note: Verifier ALWAYS records reasoning in verify-verdict.json per governance §1f. No flag needed.
|
|
293
396
|
- **Prompt Assembly Protocol (same as ④)**: Read verifier prompt file verbatim. Prepend `## WORKING_DIR: {absolute path}`. Do NOT rewrite paths.
|
|
294
|
-
- If `--debug`: debug_log `[
|
|
397
|
+
- If `--debug`: debug_log `[FLOW] iter=N phase=verifier engine=<engine> model=<model> scope=<us_id> dispatched=true`
|
|
295
398
|
|
|
296
399
|
If `--verifier-engine claude` (default):
|
|
297
400
|
```
|
|
@@ -316,7 +419,7 @@ After the primary verifier runs, run a second verifier with the OTHER engine:
|
|
|
316
419
|
- Both produce `verify-verdict.json` (Leader renames to `verify-verdict-claude.json` and `verify-verdict-codex.json`)
|
|
317
420
|
- **Both pass** → proceed (next US or COMPLETE)
|
|
318
421
|
- **Either fails** → combine issues from both verdicts into a single fix contract → Worker retry
|
|
319
|
-
- Max
|
|
422
|
+
- Max 6 consensus rounds per US. After 6 rounds → BLOCKED.
|
|
320
423
|
|
|
321
424
|
**NO ENGINE PRIORITY (ABSOLUTE):** There is no primary or secondary engine. Claude and Codex have EQUAL weight. If one passes and the other fails, the verdict is FAIL — always. The Leader MUST NOT override, prioritize, or dismiss either engine's verdict. "Claude priority", "primary engine override", "infrastructure failure" (when a valid verdict file exists), or any similar rationalization = governance violation. Infrastructure failure means ONLY: CLI crash (exit ≠ 0), timeout, or verdict file not generated.
|
|
322
425
|
|
|
@@ -330,22 +433,26 @@ After the primary verifier runs, run a second verifier with the OTHER engine:
|
|
|
330
433
|
3. Include `fix_hint` values labeled `(suggestion, non-authoritative)` if present
|
|
331
434
|
4. Include impacted tests from test-spec (so Worker can run them before and after the fix)
|
|
332
435
|
5. Increment `consecutive_failures` in `status.json`
|
|
333
|
-
6. If `consecutive_failures >=
|
|
436
|
+
6. If `consecutive_failures >= cb_threshold` for same US → **Architecture Escalation** (governance §7¾): stop fixing, report to user
|
|
437
|
+
- If `--debug`: debug_log `[GOV] iter=N phase=CB_trigger consecutive_failures=<N> us_id=<us_id> action=architecture_escalation`
|
|
334
438
|
7. Go to ⑧ with fix contract as next Worker contract
|
|
335
439
|
- `request_info` → Leader reads Verifier's questions, decides outcome (or relays to Worker in next contract) → go to ⑧
|
|
336
440
|
- `blocked` → write BLOCKED sentinel, stop
|
|
337
|
-
- If `--debug`: debug_log `[
|
|
338
|
-
- If `--debug`: debug_log `[
|
|
339
|
-
|
|
340
|
-
|
|
341
|
-
|
|
441
|
+
- If `--debug`: debug_log `[GOV] iter=N phase=verdict engine=<engine> verdict=<pass|fail|request_info> us_id=<us_id> L1=<status> L2=<status> L3=<status> L4=<status>`
|
|
442
|
+
- If `--debug`: debug_log `[GOV] iter=N phase=sufficiency test_count=<N> ac_count=<N> ratio=<N> verdict=<pass|fail>`
|
|
443
|
+
|
|
444
|
+
**⑦d Archive iteration artifacts** (always — independent of --debug)
|
|
445
|
+
After reading the verdict, archive to `logs/<slug>/`:
|
|
446
|
+
- `iter-NNN-done-claim.json` ← copy from `memos/<slug>-done-claim.json`
|
|
447
|
+
- `iter-NNN-verify-verdict.json` ← copy from `memos/<slug>-verify-verdict.json`
|
|
448
|
+
(Preserved across clean; data source for campaign report generation and SV analysis.)
|
|
342
449
|
|
|
343
450
|
**CRITICAL: Immediately proceed to ⑧. Do NOT pause, do NOT ask the user. Continue the loop.**
|
|
344
451
|
|
|
345
452
|
**⑧ Write result log and report to user, continue loop**
|
|
346
453
|
- Write `logs/<slug>/iter-NNN.result.md`:
|
|
347
454
|
- Result status `[leader-measured]`
|
|
348
|
-
- Files changed via `git diff --stat HEAD
|
|
455
|
+
- Files changed: cumulative working tree state via `git diff --stat HEAD` `[git-measured]` (note: cumulative in tmux mode, not per-iteration delta)
|
|
349
456
|
- Verifier verdict `[leader-measured]`
|
|
350
457
|
- **Record cost & performance per iteration**:
|
|
351
458
|
- Agent mode: record `total_tokens` and `duration_ms` from Agent() return metadata for both Worker and Verifier
|
|
@@ -354,18 +461,19 @@ After the primary verifier runs, run a second verifier with the OTHER engine:
|
|
|
354
461
|
- Write `status.json`
|
|
355
462
|
- Report via tool call: `Bash("echo 'Iter N | US-NNN | verdict | model | next_action'")` — NEVER plain text. This keeps the turn alive for the next iteration.
|
|
356
463
|
- **Always**: append to baseline.log: `[timestamp] iter=N verdict=<pass|fail|continue> us=<us_id> model=<worker_model>`
|
|
357
|
-
-
|
|
464
|
+
- **Always**: append JSONL to `~/.claude/ralph-desk/analytics/<slug>/campaign.jsonl`: `{"iter":N,"us_id":"US-NNN","verdict":"pass|fail","worker_model":"...","worker_engine":"...","verifier_model":"...","verifier_engine":"...","duration_worker_s":N,"duration_verifier_s":N,"timestamp":"ISO8601"}`
|
|
465
|
+
- If `--debug`: debug_log `[FLOW] iter=N phase=result status=<result> consecutive_failures=<N> verified_us=<list>`
|
|
358
466
|
|
|
359
467
|
At loop end (COMPLETE, BLOCKED, or TIMEOUT):
|
|
360
|
-
- If `--debug`: debug_log `[
|
|
468
|
+
- If `--debug`: debug_log `[FLOW] result=<COMPLETE|BLOCKED|TIMEOUT> iterations=<N> verified_us=<list>`
|
|
361
469
|
|
|
362
470
|
**⑨ Campaign Self-Verification** (when `--with-self-verification` is enabled):
|
|
363
471
|
|
|
364
472
|
After the loop ends, the Leader performs post-campaign analysis:
|
|
365
473
|
|
|
366
474
|
1. **Collect data**: Read all archived `iter-NNN.result.md`, done-claim.json (with execution_steps), and verify-verdict.json (with reasoning) from `logs/<slug>/`
|
|
367
|
-
2. **Write cumulative data**:
|
|
368
|
-
3. **Generate versioned report**:
|
|
475
|
+
2. **Write cumulative data**: `~/.claude/ralph-desk/analytics/<slug>/self-verification-data.json` — normalized iteration records (agent-mode only artifact)
|
|
476
|
+
3. **Generate versioned report**: `~/.claude/ralph-desk/analytics/<slug>/self-verification-report-NNN.md` (NNN = auto-increment from existing reports)
|
|
369
477
|
4. **Report to user**: Display the full report content
|
|
370
478
|
|
|
371
479
|
Report template (10 sections):
|
|
@@ -421,6 +529,23 @@ PRD, and test-spec. Information from source code inspection that is not in these
|
|
|
421
529
|
or explicitly marked as "[source-inspection]" with justification.
|
|
422
530
|
```
|
|
423
531
|
|
|
532
|
+
**⑩ Campaign Report** (always — independent of `--debug` and `--with-self-verification`)
|
|
533
|
+
|
|
534
|
+
After the loop ends (COMPLETE, BLOCKED, or TIMEOUT), generate `logs/<slug>/campaign-report.md`:
|
|
535
|
+
|
|
536
|
+
1. If `campaign-report.md` already exists, rename it to `campaign-report-v{N}.md` (N = next available integer ≥ 1) before writing new.
|
|
537
|
+
2. Generate report with 8 required sections:
|
|
538
|
+
- **Objective**: From PRD
|
|
539
|
+
- **Execution Summary**: Iterations run, terminal state (COMPLETE/BLOCKED/TIMEOUT), elapsed time
|
|
540
|
+
- **US Status**: Each US with final verified/failed/pending status (from `status.json`)
|
|
541
|
+
- **Verification Results**: Per-US and final verify outcomes (from archived iter artifacts)
|
|
542
|
+
- **Issues Encountered**: Fix contracts and failure verdicts from campaign
|
|
543
|
+
- **Cost & Performance**: Per-iter token/duration data from `status.json`
|
|
544
|
+
- **SV Summary**: If `--with-self-verification` ran, pointer to SV report file; otherwise "N/A — --with-self-verification not enabled"
|
|
545
|
+
- **Files Changed**: `git diff --stat <baseline_commit>` (working tree vs baseline, includes uncommitted changes and untracked files). Note: may include pre-existing uncommitted changes if the campaign started in a dirty worktree.
|
|
546
|
+
3. Data sources: `status.json` (baseline_commit, per-iter data), archived `iter-NNN-done-claim.json` / `iter-NNN-verify-verdict.json`, PRD, git diff.
|
|
547
|
+
4. If `--with-self-verification` was enabled: ⑨ SV report runs first, then ⑩ Campaign Report (which includes the SV Summary section pointing to the SV report file).
|
|
548
|
+
|
|
424
549
|
### Circuit Breaker
|
|
425
550
|
- context-latest.md unchanged 3 iterations → BLOCKED
|
|
426
551
|
- Same acceptance criterion fails 2 consecutive iterations → upgrade model, retry once, then BLOCKED
|
|
@@ -441,11 +566,27 @@ When `--verify-consensus` is enabled, also track in `status.json`:
|
|
|
441
566
|
- YOU track iteration count
|
|
442
567
|
- Write `status.json` after each iteration
|
|
443
568
|
- Worker claim ≠ complete. Only YOU write COMPLETE sentinel after verifier pass.
|
|
569
|
+
- **NEVER modify rlp-desk infrastructure files** (`~/.claude/ralph-desk/*`, `~/.claude/commands/rlp-desk.md`). If you or a Worker/Verifier discovers a bug in rlp-desk itself, write BLOCKED sentinel with reason `"rlp-desk bug: <description>"` and STOP. Do NOT attempt to fix rlp-desk — report the bug to the user.
|
|
444
570
|
|
|
445
571
|
---
|
|
446
572
|
|
|
447
573
|
## `status <slug>`
|
|
448
|
-
Read `.claude/ralph-desk/logs/<slug>/status.json` and display
|
|
574
|
+
Read `.claude/ralph-desk/logs/<slug>/runtime/status.json` and display a detailed report:
|
|
575
|
+
|
|
576
|
+
```
|
|
577
|
+
Campaign: <slug>
|
|
578
|
+
Iteration: <iteration> / <max_iter>
|
|
579
|
+
Phase: <phase> | Last Result: <last_result>
|
|
580
|
+
Worker Model: <worker_model> (<worker_engine>) | Verifier Model: <verifier_model> (<verifier_engine>)
|
|
581
|
+
Verify Mode: <verify_mode> | Consensus: <verify_consensus>
|
|
582
|
+
Consecutive Failures: <consecutive_failures>
|
|
583
|
+
Verified US: <verified_us array, comma-separated>
|
|
584
|
+
Updated: <updated_at_utc> (elapsed: now - updated_at)
|
|
585
|
+
```
|
|
586
|
+
|
|
587
|
+
If `status.json` does not exist, display "No active campaign for <slug>."
|
|
588
|
+
If the campaign has a `complete` or `blocked` sentinel, show that status prominently.
|
|
589
|
+
Read the last `verify-verdict.json` to show the most recent verdict summary and any failure issues.
|
|
449
590
|
|
|
450
591
|
## `logs <slug> [N]`
|
|
451
592
|
- No N: show latest `iter-*.worker-prompt.md` summary
|
|
@@ -459,25 +600,64 @@ Remove:
|
|
|
459
600
|
- `.claude/ralph-desk/memos/<slug>-verify-verdict.json`
|
|
460
601
|
- `.claude/ralph-desk/memos/<slug>-iter-signal.json`
|
|
461
602
|
- `.claude/ralph-desk/logs/<slug>/circuit-breaker.json`
|
|
462
|
-
- `.claude/ralph-desk/logs/<slug>/session-config.json`
|
|
463
|
-
- `.claude/ralph-desk/logs/<slug>/worker-heartbeat.json`
|
|
464
|
-
- `.claude/ralph-desk/logs/<slug>/verifier-heartbeat.json`
|
|
603
|
+
- `.claude/ralph-desk/logs/<slug>/runtime/session-config.json`
|
|
604
|
+
- `.claude/ralph-desk/logs/<slug>/runtime/worker-heartbeat.json`
|
|
605
|
+
- `.claude/ralph-desk/logs/<slug>/runtime/verifier-heartbeat.json`
|
|
465
606
|
- `.claude/ralph-desk/memos/<slug>-escalation.md`
|
|
466
|
-
Note: `
|
|
607
|
+
Note: `campaign-report.md`, `campaign-report-v{N}.md`, `iter-NNN-done-claim.json`, and `iter-NNN-verify-verdict.json` are intentionally preserved across clean for historical comparison. Analytics files (`debug.log`, `campaign.jsonl`, `self-verification-data.json`, `self-verification-report-NNN.md`) at `~/.claude/ralph-desk/analytics/<slug>/` are NOT affected by project-level clean.
|
|
467
608
|
|
|
468
|
-
If `--kill-session` is passed, clean up
|
|
609
|
+
If `--kill-session` is passed, clean up Worker/Verifier tmux panes using session-config.json:
|
|
469
610
|
```bash
|
|
470
|
-
#
|
|
471
|
-
|
|
611
|
+
# Read pane IDs from session-config.json (safe — targets only Worker/Verifier panes)
|
|
612
|
+
SESSION_CONFIG=".claude/ralph-desk/logs/<slug>/runtime/session-config.json"
|
|
613
|
+
if [ -f "$SESSION_CONFIG" ] && command -v jq &>/dev/null; then
|
|
614
|
+
WORKER_PANE=$(jq -r '.panes.worker // empty' "$SESSION_CONFIG")
|
|
615
|
+
VERIFIER_PANE=$(jq -r '.panes.verifier // empty' "$SESSION_CONFIG")
|
|
616
|
+
|
|
617
|
+
for pane_id in "$WORKER_PANE" "$VERIFIER_PANE"; do
|
|
618
|
+
if [ -n "$pane_id" ]; then
|
|
619
|
+
tmux send-keys -t "$pane_id" C-c 2>/dev/null
|
|
620
|
+
tmux send-keys -t "$pane_id" "/exit" Enter 2>/dev/null
|
|
621
|
+
fi
|
|
622
|
+
done
|
|
623
|
+
sleep 2
|
|
624
|
+
for pane_id in "$WORKER_PANE" "$VERIFIER_PANE"; do
|
|
625
|
+
if [ -n "$pane_id" ]; then
|
|
626
|
+
tmux kill-pane -t "$pane_id" 2>/dev/null
|
|
627
|
+
fi
|
|
628
|
+
done
|
|
629
|
+
else
|
|
630
|
+
echo "WARNING: session-config.json not found or jq not installed."
|
|
631
|
+
echo "Cannot safely identify Worker/Verifier panes. Kill them manually."
|
|
632
|
+
fi
|
|
633
|
+
```
|
|
634
|
+
**CRITICAL: NEVER use `grep -i 'claude\|codex'` to find panes to kill.** The user's own Claude Code session matches those patterns. Always use the specific pane IDs from session-config.json.
|
|
635
|
+
|
|
636
|
+
## `analytics [slug]`
|
|
637
|
+
|
|
638
|
+
Cross-project analytics dashboard. Scans `~/.claude/ralph-desk/analytics/` for all campaign data.
|
|
639
|
+
|
|
640
|
+
- No slug: show summary across all projects (total campaigns, pass/fail rate, average iterations, total cost)
|
|
641
|
+
- With slug: show detailed analytics for that project (per-US pass rate, model upgrade frequency, iteration distribution, cost per US)
|
|
472
642
|
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
476
|
-
|
|
477
|
-
done
|
|
643
|
+
Data sources:
|
|
644
|
+
- `campaign.jsonl` — per-iteration structured records
|
|
645
|
+
- `metadata.json` — project root, campaign status, timestamps
|
|
646
|
+
- `self-verification-data.json` — campaign-level quality metrics
|
|
478
647
|
|
|
479
|
-
|
|
480
|
-
|
|
648
|
+
## `resume <slug>`
|
|
649
|
+
|
|
650
|
+
Resume a previously interrupted campaign. Equivalent to `run <slug>` but explicitly restores state:
|
|
651
|
+
|
|
652
|
+
1. Read `.claude/ralph-desk/logs/<slug>/runtime/status.json` for `verified_us`, `iteration`, `consecutive_failures`
|
|
653
|
+
2. Read `.claude/ralph-desk/memos/<slug>-memory.md` for completed stories and next iteration contract
|
|
654
|
+
3. Check for sentinels (`complete.md`, `blocked.md`) — if present, inform user and stop
|
|
655
|
+
4. If no sentinels, invoke `run <slug>` with the same options from the previous session (stored in status.json fields: `worker_model`, `verifier_model`, `verify_mode`, `verify_consensus`)
|
|
656
|
+
5. The runner automatically restores `verified_us` from memory or status.json on startup
|
|
657
|
+
|
|
658
|
+
Example:
|
|
659
|
+
```
|
|
660
|
+
/rlp-desk resume my-feature
|
|
481
661
|
```
|
|
482
662
|
|
|
483
663
|
## No args or `help`
|
|
@@ -503,7 +683,9 @@ Run options:
|
|
|
503
683
|
--verify-mode per-us|batch Verification strategy (default: per-us)
|
|
504
684
|
--verify-consensus Cross-engine consensus verification
|
|
505
685
|
--consensus-scope SCOPE When consensus runs: all|final-only (default: all)
|
|
506
|
-
--
|
|
686
|
+
--cb-threshold N CB threshold: consecutive failures before BLOCKED (default: 3)
|
|
687
|
+
--iter-timeout N Per-iteration timeout in seconds, tmux mode only (default: 600)
|
|
688
|
+
--debug Debug logging (~/.claude/ralph-desk/analytics/<slug>/debug.log)
|
|
507
689
|
--with-self-verification Campaign self-verification analysis (post-loop report)
|
|
508
690
|
```
|
|
509
691
|
|