@ai-dev-methodologies/rlp-desk 0.3.6 → 0.5.0
- package/README.md +145 -69
- package/docs/blueprints/blueprint-v0.4-evolution.md +347 -0
- package/docs/plans/cozy-gliding-trinket.md +53 -0
- package/docs/plans/toasty-whistling-diffie-agent-a6814625642e956da.md +201 -0
- package/docs/plans/toasty-whistling-diffie.md +117 -0
- package/docs/prompts/ralplan-codex-review.md +55 -0
- package/install.sh +5 -0
- package/package.json +1 -1
- package/scripts/postinstall.js +5 -0
- package/scripts/uninstall.js +1 -0
- package/src/commands/rlp-desk.md +252 -70
- package/src/governance.md +63 -28
- package/src/model-upgrade-table.md +50 -0
- package/src/scripts/init_ralph_desk.zsh +329 -13
- package/src/scripts/lib_ralph_desk.zsh +837 -0
- package/src/scripts/run_ralph_desk.zsh +978 -482
package/src/governance.md
CHANGED
|
@@ -11,6 +11,7 @@ The Leader orchestrates, while Worker/Verifier run in isolated fresh contexts ev
|
|
|
11
11
|
- **Filesystem = memory**: State exists only on the filesystem (PRD, memory, context, memos).
|
|
12
12
|
- **Worker claim ≠ complete**: A Worker's DONE is merely a claim. The Verifier must independently verify before it's confirmed.
|
|
13
13
|
- **Worker scope is bounded**: Worker implements only the contracted US per iteration (Scope Lock). Out-of-scope changes are flagged by the Verifier.
|
|
14
|
+
- **Worker must NEVER modify Claude Code settings** (settings.json, settings.local.json). Permission prompts must be reported as blocked, not bypassed by editing settings.
|
|
14
15
|
- **Verifier is independent**: The Verifier judges based on evidence alone, without knowledge of the Worker's reasoning process.
|
|
15
16
|
- **Sentinels are Leader-owned**: Only the Leader writes COMPLETE/BLOCKED sentinels.
|
|
16
17
|
- **Supported engines**: claude (default; models: haiku, sonnet, opus) and codex (opt-in via `--worker-engine codex` / `--verifier-engine codex`).
|
|
@@ -199,14 +200,27 @@ This is the default behavior, not an optional flag. Without it, IL-1 (Evidence M
|
|
|
199
200
|
### Worker: execution_steps in done-claim.json
|
|
200
201
|
Worker records what was done, in what order, with command evidence in `done-claim.json`:
|
|
201
202
|
- Each step includes: what action, which AC, command executed, exit code, summary
|
|
202
|
-
- Step types: `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`
|
|
203
|
+
- Step types: `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
|
|
203
204
|
- This proves the Worker followed test-first approach and did not skip steps
|
|
205
|
+
- **Existing implementation rule**: When code already exists from a prior iteration/campaign, Worker MAY use `verify_existing` instead of `write_test → verify_red → implement → verify_green`. `verify_existing` requires: run all existing tests, record exit codes, confirm all AC are covered by passing tests. Worker MUST NOT skip recording evidence — `verify_existing` is evidence that existing code satisfies AC, not a shortcut to skip verification.
|
|
204
206
|
|
|
205
207
|
### Verifier: reasoning in verify-verdict.json
|
|
206
208
|
Verifier records WHY each judgment was made in `verify-verdict.json`:
|
|
207
209
|
- Each check includes: what was checked, decision (pass/fail), and the specific evidence basis
|
|
208
|
-
- Checks include: IL-1 Evidence Gate, Layer Enforcement, Test Sufficiency, Anti-Gaming, Worker Process Audit
|
|
210
|
+
- Checks include: IL-1 Evidence Gate, Layer Enforcement, Test Sufficiency, Anti-Gaming, Worker Process Audit, Test Coverage Audit
|
|
209
211
|
- This proves the Verifier actually performed each check rather than rubber-stamping
|
|
212
|
+
- **Test Coverage Audit (mandatory)**: Verifier MUST check that tests cover ALL code paths, not just happy paths. Specifically:
|
|
213
|
+
- Every branch in `case` statements must have a test (e.g., all model types in get_next_model)
|
|
214
|
+
- Every engine/model combination must be tested (claude, codex 5.4, spark — not just 1-2)
|
|
215
|
+
- Every ceiling/boundary must be tested (not just "opus ceiling" — also spark ceiling, 5.4 ceiling)
|
|
216
|
+
- If Worker's tests cover only 2 of 3 engine paths, verdict MUST be fail with "test coverage gap" issue
|
|
217
|
+
- "Tests pass" is NOT sufficient — "tests cover all code paths" is required
|
|
218
|
+
- **Integration Test (mandatory when functions call other functions)**: Verifier MUST check that function call chains produce correct end-to-end results, not just that each function works in isolation. Specifically:
|
|
219
|
+
- If function A's output is function B's input, there MUST be a test that runs A→B together and verifies the result
|
|
220
|
+
- Example: `get_model_string()` returns "gpt-5.3-codex-spark:medium" — `get_next_model()` must accept that exact value and return the correct upgrade. A test must verify this chain.
|
|
221
|
+
- Unit tests (extract_fn + isolated run) are necessary but NOT sufficient for refactored code
|
|
222
|
+
- Structural tests (grep for function existence) are necessary but NOT sufficient
|
|
223
|
+
- "All unit tests pass" does NOT prove the system works — integration tests prove it
|
|
210
224
|
|
|
211
225
|
### Why This Is Default (Not Optional)
|
|
212
226
|
- IL-1 says "no claims without evidence" — this applies to Worker AND Verifier
|
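The Integration Test rule above asks for a test that runs A→B together. A minimal sketch of such a chain test follows. The `demo_*` functions are hypothetical stand-ins written for this illustration. The real `get_model_string` and `get_next_model` are in `lib_ralph_desk.zsh`, which this excerpt does not show, so their actual signatures and behavior may differ.

```shell
# Hypothetical stand-ins for illustration only. The real get_model_string and
# get_next_model live in lib_ralph_desk.zsh and may behave differently.
demo_get_model_string() {
  # $1 = model name, $2 = reasoning effort; emits "model:effort"
  printf '%s:%s\n' "$1" "$2"
}

demo_get_next_model() {
  # $1 = "model:effort"; emits the next effort step, holding at the xhigh ceiling
  local model effort next
  model="${1%:*}"
  effort="${1##*:}"
  case "$effort" in
    low)        next=medium ;;
    medium)     next=high ;;
    high|xhigh) next=xhigh ;;
    *) echo "unknown effort: $effort" >&2; return 1 ;;
  esac
  printf '%s:%s\n' "$model" "$next"
}

demo_chain_test() {
  # Integration check: feed A's exact output into B and assert on the end result
  local current upgraded
  current="$(demo_get_model_string gpt-5.3-codex-spark medium)" || return 1
  upgraded="$(demo_get_next_model "$current")" || return 1
  [ "$upgraded" = "gpt-5.3-codex-spark:high" ]
}
```

Calling `demo_chain_test` succeeds only when the string produced by the first function is accepted unchanged by the second, which is the property that isolated unit tests on each function cannot establish.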
|
@@ -373,28 +387,38 @@ Characteristics:
|
|
|
373
387
|
```
|
|
374
388
|
.claude/ralph-desk/
|
|
375
389
|
├── prompts/
|
|
376
|
-
│ ├── <slug>.worker.prompt.md # Worker base prompt
|
|
377
|
-
│ └── <slug>.verifier.prompt.md # Verifier base prompt
|
|
390
|
+
│ ├── <slug>.worker.prompt.md # Worker base prompt (regenerated on re-execution)
|
|
391
|
+
│ └── <slug>.verifier.prompt.md # Verifier base prompt (regenerated on re-execution)
|
|
378
392
|
├── context/
|
|
379
|
-
│ └── <slug>-latest.md # Current frontier
|
|
393
|
+
│ └── <slug>-latest.md # Current frontier; Worker updates (reset to template on re-execution)
|
|
380
394
|
├── memos/
|
|
381
|
-
│ ├── <slug>-memory.md # Campaign memory
|
|
382
|
-
│ ├── <slug>-done-claim.json # Worker's completion claim (runtime)
|
|
383
|
-
│ ├── <slug>-iter-signal.json # Worker's iteration signal (runtime)
|
|
384
|
-
│ ├── <slug>-verify-verdict.json # Verifier's verdict (runtime)
|
|
385
|
-
│ ├── <slug>-escalation.md
|
|
386
|
-
│ ├── <slug>-complete.md # SENTINEL (Leader only)
|
|
387
|
-
│ └── <slug>-blocked.md # SENTINEL (Leader only)
|
|
395
|
+
│ ├── <slug>-memory.md # Campaign memory; Worker updates (reset to template on re-execution)
|
|
396
|
+
│ ├── <slug>-done-claim.json # Worker's completion claim (runtime; deleted on re-execution)
|
|
397
|
+
│ ├── <slug>-iter-signal.json # Worker's iteration signal (runtime; deleted on re-execution)
|
|
398
|
+
│ ├── <slug>-verify-verdict.json # Verifier's verdict (runtime; deleted on re-execution)
|
|
399
|
+
│ ├── <slug>-escalation.md # Architecture escalation report (tmux mode, §7¾; deleted on re-execution)
|
|
400
|
+
│ ├── <slug>-complete.md # SENTINEL (Leader only; deleted on re-execution)
|
|
401
|
+
│ └── <slug>-blocked.md # SENTINEL (Leader only; deleted on re-execution)
|
|
388
402
|
├── plans/
|
|
389
|
-
│ ├── prd-<slug>.md # PRD (
|
|
390
|
-
│ └── test-spec-<slug>.md # Verification criteria
|
|
391
|
-
└── logs/<slug>/
|
|
392
|
-
├──
|
|
393
|
-
├── iter-NNN.
|
|
394
|
-
├── iter-NNN.
|
|
395
|
-
├──
|
|
396
|
-
├──
|
|
397
|
-
|
|
403
|
+
│ ├── prd-<slug>.md # PRD (in-place: --mode improve | deleted: --mode fresh)
|
|
404
|
+
│ └── test-spec-<slug>.md # Verification criteria (regenerated: --mode fresh | preserved in-place: --mode improve)
|
|
405
|
+
└── logs/<slug>/ # Project-level operational data
|
|
406
|
+
├── campaign-report.md # Campaign summary (versioned: campaign-report-v{N}.md on re-execution)
|
|
407
|
+
├── iter-NNN.worker-prompt.md # Audit trail prompt copy (deleted on re-execution)
|
|
408
|
+
├── iter-NNN.verifier-prompt.md # Audit trail prompt copy (deleted on re-execution)
|
|
409
|
+
├── iter-NNN.result.md # Iteration result (deleted on re-execution)
|
|
410
|
+
├── iter-NNN-done-claim.json # Archived done-claim per iteration (deleted on re-execution)
|
|
411
|
+
├── iter-NNN-verify-verdict.json # Archived verdict per iteration (deleted on re-execution)
|
|
412
|
+
├── status.json # Leader's loop state (deleted on re-execution)
|
|
413
|
+
├── baseline.log # Baseline capture (deleted on re-execution)
|
|
414
|
+
└── cost-log.jsonl # Per-iteration cost log (deleted on re-execution)
|
|
415
|
+
|
|
416
|
+
~/.claude/ralph-desk/analytics/<slug>--<root_hash>/ # User-level cross-project analytics
|
|
417
|
+
├── metadata.json # Campaign metadata (slug, project_root, status, times)
|
|
418
|
+
├── debug.log # Debug output (versioned: debug-v{N}.log on re-execution)
|
|
419
|
+
├── campaign.jsonl # Per-iteration structured data (versioned: campaign-v{N}.jsonl)
|
|
420
|
+
├── self-verification-data.json # Cumulative SV data (agent-mode only, --with-self-verification)
|
|
421
|
+
└── self-verification-report-NNN.md # Versioned SV report (--with-self-verification; NNN auto-increment)
|
|
398
422
|
```
|
|
399
423
|
|
|
400
424
|
## 7. Leader Loop Protocol
|
|
@@ -443,8 +467,19 @@ for iteration in 1..max_iter:
|
|
|
443
467
|
• fail + continue → go to ⑧
|
|
444
468
|
• blocked → write BLOCKED sentinel, stop
|
|
445
469
|
|
|
470
|
+
⑦d Archive iteration artifacts (after verdict read, before next prep)
|
|
471
|
+
- Archive done-claim.json → logs/<slug>/iter-NNN-done-claim.json
|
|
472
|
+
- Archive verify-verdict.json → logs/<slug>/iter-NNN-verify-verdict.json
|
|
473
|
+
(Preserved across clean; data source for Campaign Report and SV analysis)
|
|
474
|
+
|
|
446
475
|
⑧ Write iter-NNN.result.md to logs/<slug>/ (result status + git diff --stat)
|
|
447
476
|
Update status.json, report to user, continue to next iteration
|
|
477
|
+
|
|
478
|
+
After loop end (COMPLETE, BLOCKED, TIMEOUT):
|
|
479
|
+
⑧½ Campaign Report (always — independent of --debug)
|
|
480
|
+
- Generate logs/<slug>/campaign-report.md with 8 sections
|
|
481
|
+
- Version existing report to campaign-report-v{N}.md before writing new
|
|
482
|
+
- Data: status.json (baseline_commit, per-iter), archived iter artifacts, PRD, git diff
|
|
448
483
|
```
|
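Step ⑦d in the loop above can be sketched as a small copy routine. This is an illustration under assumptions, not the package's implementation: the function name `demo_archive_iter` is hypothetical, and the three-digit zero padding for `NNN` is assumed from the `iter-NNN` naming rather than confirmed by the source.

```shell
# Hypothetical sketch of the archive step; not the package's actual code.
demo_archive_iter() {
  # $1 = desk directory, $2 = slug, $3 = iteration number (plain integer)
  local desk slug iter nnn logs name src
  desk="$1"; slug="$2"; iter="$3"
  nnn="$(printf '%03d' "$iter")"        # assumed padding: 7 -> 007
  logs="$desk/logs/$slug"
  mkdir -p "$logs" || return 1
  for name in done-claim verify-verdict; do
    src="$desk/memos/${slug}-${name}.json"
    # Copy rather than move, so the runtime file stays in place for the Leader
    if [ -f "$src" ]; then
      cp "$src" "$logs/iter-${nnn}-${name}.json" || return 1
    fi
  done
}
```

A missing source file is skipped silently here. Whether the real Leader treats a missing verdict at this point as an error is not stated in the excerpt.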
|
449
484
|
|
|
450
485
|
## 7a. Per-US Verification
|
|
@@ -480,7 +515,7 @@ Worker completes US → signal verify
|
|
|
480
515
|
→ Codex Verifier runs (checks AC)
|
|
481
516
|
→ Both pass → proceed (next US or COMPLETE)
|
|
482
517
|
→ Either fails → combined issues → fix contract → Worker retry
|
|
483
|
-
→ Max
|
|
518
|
+
→ Max 6 consensus rounds per US → BLOCKED if still disagreeing
|
|
484
519
|
|
|
485
520
|
**NO ENGINE PRIORITY:** Claude and Codex have equal weight. If one passes and the other fails, the verdict is FAIL. No engine may be prioritized or dismissed. Infrastructure failure = CLI crash, timeout, or verdict file not generated — NOT a valid verdict with verdict=fail.
|
|
486
521
|
```
|
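The no-priority rule above reduces to a small decision function. The sketch below is hypothetical: `demo_consensus` and the `infrastructure_failure` label are names chosen for this illustration, and the source does not say what the Leader does after an infrastructure failure, only that it is not a `fail` verdict.

```shell
# Hypothetical sketch of the consensus rule; names are illustrative only.
demo_consensus() {
  # $1 = Claude verdict, $2 = Codex verdict ("pass", "fail", or anything else
  # when the CLI crashed, timed out, or produced no verdict file)
  local v
  for v in "$1" "$2"; do
    case "$v" in
      pass|fail) ;;                                  # a valid verdict
      *) echo infrastructure_failure; return 0 ;;    # not a verdict at all
    esac
  done
  # Equal weight: only a double pass counts as pass
  if [ "$1" = pass ] && [ "$2" = pass ]; then
    echo pass
  else
    echo fail
  fi
}
```

Checking for invalid verdicts before combining them keeps a crashed engine from being counted as either agreement or disagreement.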
|
@@ -518,12 +553,12 @@ Every change must be justified by the issue it addresses.
|
|
|
518
553
|
|
|
519
554
|
## 7¾. Architecture Escalation
|
|
520
555
|
|
|
521
|
-
Note: Circuit Breaker (§8) fires first at 2 consecutive failures (model upgrade + retry). If the retry also fails (
|
|
556
|
+
Note: Circuit Breaker (§8) fires first at 2 consecutive failures (model upgrade + retry — Path A: Agent-mode only; in tmux mode the shell CB triggers directly without model upgrade). If the retry also fails (`cb_threshold` reached), Architecture Escalation applies. The CB retry counts toward the consecutive_failures counter.
|
|
522
557
|
|
|
523
|
-
If
|
|
558
|
+
If `cb_threshold` or more consecutive fix attempts fail for the same US:
|
|
524
559
|
|
|
525
560
|
1. **STOP fixing symptoms** — the problem is likely architectural, not a bug.
|
|
526
|
-
2. **Leader reports to user**: "
|
|
561
|
+
2. **Leader reports to user**: "`cb_threshold` consecutive fix attempts failed for US-{id}. This suggests an architectural issue, not a simple bug."
|
|
527
562
|
3. **Include in report**:
|
|
528
563
|
- What was attempted in each fix
|
|
529
564
|
- What specifically kept failing
|
|
@@ -538,14 +573,14 @@ In tmux mode: Leader writes `<slug>-escalation.md` with the report and sets BLOC
|
|
|
538
573
|
| Condition | Verdict |
|
|
539
574
|
|-----------|---------|
|
|
540
575
|
| context-latest.md unchanged for 3 consecutive iterations | BLOCKED |
|
|
541
|
-
| Same acceptance criterion fails 2 consecutive iterations | Upgrade model, retry once; if still failing → Architecture Escalation (§7¾) → BLOCKED |
|
|
542
|
-
|
|
|
576
|
+
| Same acceptance criterion fails 2 consecutive iterations | Upgrade model, retry once (Agent mode only; tmux: same model retry); if still failing → Architecture Escalation (§7¾) → BLOCKED |
|
|
577
|
+
| `cb_threshold` consecutive **fail** verdicts on `cb_threshold` unique criterion IDs | Upgrade to opus, retry once; if still failing → BLOCKED (adjustable via `--cb-threshold`) |
|
|
543
578
|
| max_iter reached | TIMEOUT (report to user) |
|
|
544
579
|
|
|
545
580
|
The Leader tracks `consecutive_failures` in `status.json`:
|
|
546
581
|
- Increments on `fail`, resets on `pass`, **unchanged by `request_info`**.
|
|
547
582
|
- "Same error" = same acceptance criterion ID in two consecutive **fail** verdicts (`request_info` does not break or contribute to this chain).
|
|
548
|
-
- "Diverse failures" =
|
|
583
|
+
- "Diverse failures" = `cb_threshold` most recent `fail` verdicts each have a unique criterion ID.
|
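The counter rules above can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes the caller passes the criterion IDs of the most recent `fail` verdicts in order. How the real Leader stores these in `status.json` is not shown in this excerpt.

```shell
# Hypothetical helpers illustrating the consecutive_failures rules.
demo_update_failures() {
  # $1 = current counter, $2 = verdict; emits the new counter
  case "$2" in
    fail)         echo $(( $1 + 1 )) ;;   # increments on fail
    pass)         echo 0 ;;               # resets on pass
    request_info) echo "$1" ;;            # unchanged by request_info
    *) return 1 ;;
  esac
}

demo_is_diverse() {
  # $1 = cb_threshold, remaining args = criterion IDs of the most recent
  # fail verdicts; succeeds when exactly cb_threshold IDs are all unique
  local threshold unique
  threshold="$1"; shift
  [ "$#" -eq "$threshold" ] || return 1
  unique="$(printf '%s\n' "$@" | sort -u | wc -l | tr -d ' ')"
  [ "$unique" -eq "$threshold" ]
}
```

For example, `demo_is_diverse 3 AC1 AC2 AC3` succeeds, while `demo_is_diverse 3 AC1 AC2 AC1` does not because one criterion repeats.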
|
549
584
|
|
|
550
585
|
## 9. Change Policy
|
|
551
586
|
|
package/src/model-upgrade-table.md
ADDED
|
@@ -0,0 +1,50 @@
|
|
|
1
|
+
# Model Upgrade Table
|
|
2
|
+
|
|
3
|
+
Progressive Worker model upgrade on consecutive failure per US.
|
|
4
|
+
CB default: 6. Override: `--cb-threshold N`. Worker only — Verifier fixed at campaign start.
|
|
5
|
+
|
|
6
|
+
## Rules
|
|
7
|
+
- Each row = 2-attempt window (same model for 2 consecutive fails)
|
|
8
|
+
- Ceiling reached → repeat same model until CB
|
|
9
|
+
- CB < table columns → BLOCKED at that column
|
|
10
|
+
- CB > 6 → repeat ceiling model beyond column 6
|
|
11
|
+
|
|
12
|
+
## GPT Pro (spark — separate token limit)
|
|
13
|
+
|
|
14
|
+
| Complexity | 1-2 | 3-4 | 5-6 | 7+ |
|
|
15
|
+
|------------|-----|-----|-----|-----|
|
|
16
|
+
| LOW | spark:low | spark:medium | spark:high | BLOCKED |
|
|
17
|
+
| MEDIUM | spark:medium | spark:high | spark:xhigh | BLOCKED |
|
|
18
|
+
| HIGH | spark:high | spark:xhigh | spark:xhigh | BLOCKED |
|
|
19
|
+
| CRITICAL | spark:xhigh | spark:xhigh | spark:xhigh | BLOCKED |
|
|
20
|
+
|
|
21
|
+
## Non-Pro (gpt-5.4)
|
|
22
|
+
|
|
23
|
+
| Complexity | 1-2 | 3-4 | 5-6 | 7+ |
|
|
24
|
+
|------------|-----|-----|-----|-----|
|
|
25
|
+
| LOW | 5.4:low | 5.4:medium | 5.4:high | BLOCKED |
|
|
26
|
+
| MEDIUM | 5.4:medium | 5.4:high | 5.4:xhigh | BLOCKED |
|
|
27
|
+
| HIGH | 5.4:high | 5.4:xhigh | 5.4:xhigh | BLOCKED |
|
|
28
|
+
| CRITICAL | 5.4:xhigh | 5.4:xhigh | 5.4:xhigh | BLOCKED |
|
|
29
|
+
|
|
30
|
+
## Claude-only
|
|
31
|
+
|
|
32
|
+
| Complexity | 1-2 | 3-4 | 5-6 | 7+ |
|
|
33
|
+
|------------|-----|-----|-----|-----|
|
|
34
|
+
| LOW | haiku | sonnet | opus | BLOCKED |
|
|
35
|
+
| MEDIUM | sonnet | opus | opus | BLOCKED |
|
|
36
|
+
| HIGH | sonnet | opus | opus | BLOCKED |
|
|
37
|
+
| CRITICAL | opus | opus | opus | BLOCKED |
|
|
38
|
+
|
|
39
|
+
## Complexity Evaluation (brainstorm determines this)
|
|
40
|
+
|
|
41
|
+
| Factor | LOW | MEDIUM | HIGH | CRITICAL |
|
|
42
|
+
|--------|-----|--------|------|----------|
|
|
43
|
+
| US count | 1-2 | 3-5 | 6-10 | 11+ |
|
|
44
|
+
| File scope | single | 2-5 | 6+ | cross-repo |
|
|
45
|
+
| Logic | simple CRUD | conditionals | algorithms | security/crypto |
|
|
46
|
+
| Dependencies | none | 1-2 | 3+ API/DB | distributed |
|
|
47
|
+
| Code impact | new only | modify existing | refactor | architecture change |
|
|
48
|
+
|
|
49
|
+
Overall complexity = highest factor level.
|
|
50
|
+
Campaign starting model = lowest US risk level (progressive upgrade handles harder US).
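The Claude-only table and the rules above can be read as a lookup from complexity and attempt number to a model. The sketch below is one possible reading, not the package's code: `demo_claude_model` is a hypothetical name, and it assumes the CB threshold counts attempts, so that any attempt beyond the threshold is BLOCKED and a threshold above 6 keeps repeating the ceiling column.

```shell
# Hypothetical lookup over the Claude-only table; one reading of the rules.
demo_claude_model() {
  # $1 = complexity (LOW|MEDIUM|HIGH|CRITICAL), $2 = attempt (1-based),
  # $3 = CB threshold (defaults to 6, as in the table header text)
  local complexity attempt cb col
  complexity="$1"; attempt="$2"; cb="${3:-6}"
  if [ "$attempt" -gt "$cb" ]; then
    echo BLOCKED
    return 0
  fi
  col=$(( (attempt + 1) / 2 ))       # each column is a 2-attempt window
  if [ "$col" -gt 3 ]; then col=3; fi  # beyond the table: repeat the ceiling
  case "$complexity:$col" in
    LOW:1)                 echo haiku ;;
    LOW:2|MEDIUM:1|HIGH:1) echo sonnet ;;
    LOW:3|MEDIUM:2|MEDIUM:3|HIGH:2|HIGH:3|CRITICAL:*) echo opus ;;
    *) return 1 ;;
  esac
}
```

With the default threshold, `demo_claude_model LOW 4` yields `sonnet` and attempt 7 yields `BLOCKED`. With a threshold of 8, attempt 8 stays on `opus`.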
|
package/src/scripts/init_ralph_desk.zsh
CHANGED
|
@@ -8,22 +8,280 @@ set -euo pipefail
|
|
|
8
8
|
# Creates project-local scaffold in: .claude/ralph-desk/
|
|
9
9
|
#
|
|
10
10
|
# Usage:
|
|
11
|
-
# ~/.claude/ralph-desk/init_ralph_desk.zsh <slug> [objective]
|
|
11
|
+
# ~/.claude/ralph-desk/init_ralph_desk.zsh <slug> [objective] [--mode fresh|improve]
|
|
12
12
|
# =============================================================================
|
|
13
13
|
|
|
14
|
-
SLUG="${1:?Usage: $0 <slug> [objective]}"
|
|
15
|
-
|
|
14
|
+
SLUG="${1:?Usage: $0 <slug> [objective] [--mode fresh|improve]}"
|
|
15
|
+
MODE=""
|
|
16
|
+
OBJECTIVE="TBD - fill in the objective"
|
|
17
|
+
|
|
18
|
+
# Parse remaining arguments: --mode fresh|improve + optional positional objective
|
|
19
|
+
shift
|
|
20
|
+
while [[ $# -gt 0 ]]; do
|
|
21
|
+
case "$1" in
|
|
22
|
+
--mode)
|
|
23
|
+
MODE="${2:?--mode requires an argument: fresh|improve}"
|
|
24
|
+
shift 2
|
|
25
|
+
;;
|
|
26
|
+
--mode=*)
|
|
27
|
+
MODE="${1#--mode=}"
|
|
28
|
+
shift
|
|
29
|
+
;;
|
|
30
|
+
*)
|
|
31
|
+
OBJECTIVE="$1"
|
|
32
|
+
shift
|
|
33
|
+
;;
|
|
34
|
+
esac
|
|
35
|
+
done
|
|
36
|
+
|
|
16
37
|
ROOT="${ROOT:-$PWD}"
|
|
17
38
|
DESK="$ROOT/.claude/ralph-desk"
|
|
18
39
|
RUNNER_DIR="$(cd "$(dirname "$0")" && pwd)"
|
|
19
40
|
|
|
41
|
+
# --- Re-execution versioning helpers ---
|
|
42
|
+
# Handles ONLY debug.log and campaign-report.md versioning.
|
|
43
|
+
# SV reports use their own -NNN auto-increment pattern and are NOT handled here.
|
|
44
|
+
|
|
45
|
+
detect_next_version() {
|
|
46
|
+
local file_path="$1"
|
|
47
|
+
local dir base ext n=1
|
|
48
|
+
dir="$(dirname "$file_path")"
|
|
49
|
+
base="$(basename "$file_path")"
|
|
50
|
+
if [[ "$base" == *.* ]]; then
|
|
51
|
+
ext=".${base##*.}"
|
|
52
|
+
base="${base%.*}"
|
|
53
|
+
else
|
|
54
|
+
ext=""
|
|
55
|
+
fi
|
|
56
|
+
while [[ -f "$dir/${base}-v${n}${ext}" ]]; do
|
|
57
|
+
(( n++ ))
|
|
58
|
+
done
|
|
59
|
+
echo "$n"
|
|
60
|
+
}
|
|
61
|
+
|
|
62
|
+
version_file() {
|
|
63
|
+
local file_path="$1"
|
|
64
|
+
if [[ -f "$file_path" ]]; then
|
|
65
|
+
local n dir base ext
|
|
66
|
+
n="$(detect_next_version "$file_path")"
|
|
67
|
+
dir="$(dirname "$file_path")"
|
|
68
|
+
base="$(basename "$file_path")"
|
|
69
|
+
if [[ "$base" == *.* ]]; then
|
|
70
|
+
ext=".${base##*.}"
|
|
71
|
+
base="${base%.*}"
|
|
72
|
+
else
|
|
73
|
+
ext=""
|
|
74
|
+
fi
|
|
75
|
+
mv "$file_path" "$dir/${base}-v${n}${ext}"
|
|
76
|
+
echo " Versioned: $(basename "$file_path") → ${base}-v${n}${ext}"
|
|
77
|
+
fi
|
|
78
|
+
# Non-existent files silently skipped (no error)
|
|
79
|
+
}
|
|
80
|
+
|
|
81
|
+
# --- PRD/test-spec per-US splitting helpers ---
|
|
82
|
+
|
|
83
|
+
split_prd_by_us() {
|
|
84
|
+
local prd_file="$1"
|
|
85
|
+
local slug="$2"
|
|
86
|
+
local plans_dir
|
|
87
|
+
plans_dir="$(dirname "$prd_file")"
|
|
88
|
+
|
|
89
|
+
[[ -f "$prd_file" ]] || return 0
|
|
90
|
+
|
|
91
|
+
local us_count
|
|
92
|
+
us_count=$(grep -c "^### US-" "$prd_file" 2>/dev/null) || us_count=0
|
|
93
|
+
if [[ "$us_count" -eq 0 ]]; then
|
|
94
|
+
echo " WARNING: No US markers (### US-NNN:) found in PRD — falling back to full PRD injection" >&2
|
|
95
|
+
# Clean up any stale per-US split files from previous runs to prevent stale artifacts
|
|
96
|
+
local stale_count=0
|
|
97
|
+
for stale in "$plans_dir"/prd-"$slug"-US-*.md(N); do
|
|
98
|
+
rm "$stale"; stale_count=$(( stale_count + 1 ))
|
|
99
|
+
done
|
|
100
|
+
[[ $stale_count -gt 0 ]] && echo " Cleaned $stale_count stale prd per-US file(s)"
|
|
101
|
+
return 0
|
|
102
|
+
fi
|
|
103
|
+
|
|
104
|
+
awk -v dir="$plans_dir" -v slug="$slug" '
|
|
105
|
+
/^### US-[0-9]+:/ {
|
|
106
|
+
if (out != "") close(out)
|
|
107
|
+
match($0, /US-[0-9]+/)
|
|
108
|
+
us_id = substr($0, RSTART, RLENGTH)
|
|
109
|
+
out = dir "/prd-" slug "-" us_id ".md"
|
|
110
|
+
}
|
|
111
|
+
out != "" { print > out }
|
|
112
|
+
' "$prd_file"
|
|
113
|
+
|
|
114
|
+
local count
|
|
115
|
+
count=$(ls "$plans_dir"/prd-"$slug"-US-*.md 2>/dev/null | wc -l | tr -d ' ')
|
|
116
|
+
echo " Split PRD: $count per-US files"
|
|
117
|
+
}
|
|
118
|
+
|
|
119
|
+
split_test_spec_by_us() {
|
|
120
|
+
local ts_file="$1"
|
|
121
|
+
local slug="$2"
|
|
122
|
+
local plans_dir
|
|
123
|
+
plans_dir="$(dirname "$ts_file")"
|
|
124
|
+
|
|
125
|
+
[[ -f "$ts_file" ]] || return 0
|
|
126
|
+
|
|
127
|
+
local us_count
|
|
128
|
+
us_count=$(grep -c "^## US-" "$ts_file" 2>/dev/null) || us_count=0
|
|
129
|
+
if [[ "$us_count" -eq 0 ]]; then
|
|
130
|
+
echo " WARNING: No US section markers (## US-NNN:) in test-spec — skipping split" >&2
|
|
131
|
+
# Clean up any stale per-US test-spec files from previous runs
|
|
132
|
+
for stale in "$plans_dir"/test-spec-"$slug"-US-*.md(N); do
|
|
133
|
+
rm "$stale"
|
|
134
|
+
done
|
|
135
|
+
return 0
|
|
136
|
+
fi
|
|
137
|
+
|
|
138
|
+
# Extract global header (everything before first ## US- section, e.g. Verification Commands)
|
|
139
|
+
local header_tmp="${plans_dir}/test-spec-${slug}-header.tmp.$$"
|
|
140
|
+
awk '/^## US-[0-9]+:/{exit} {print}' "$ts_file" > "$header_tmp"
|
|
141
|
+
|
|
142
|
+
awk -v dir="$plans_dir" -v slug="$slug" '
|
|
143
|
+
/^## US-[0-9]+:/ {
|
|
144
|
+
if (out != "") close(out)
|
|
145
|
+
match($0, /US-[0-9]+/)
|
|
146
|
+
us_id = substr($0, RSTART, RLENGTH)
|
|
147
|
+
out = dir "/test-spec-" slug "-" us_id ".md"
|
|
148
|
+
}
|
|
149
|
+
out != "" { print > out }
|
|
150
|
+
' "$ts_file"
|
|
151
|
+
|
|
152
|
+
# Prepend global header (Verification Commands etc.) to each split file
|
|
153
|
+
for split_file in "$plans_dir"/test-spec-"$slug"-US-*.md; do
|
|
154
|
+
[[ -f "$split_file" ]] || continue
|
|
155
|
+
local tmp="${split_file}.tmp.$$"
|
|
156
|
+
cat "$header_tmp" "$split_file" > "$tmp" && mv "$tmp" "$split_file"
|
|
157
|
+
done
|
|
158
|
+
rm -f "$header_tmp"
|
|
159
|
+
|
|
160
|
+
local count
|
|
161
|
+
count=$(ls "$plans_dir"/test-spec-"$slug"-US-*.md 2>/dev/null | wc -l | tr -d ' ')
|
|
162
|
+
echo " Split test-spec: $count per-US files (with global header)"
|
|
163
|
+
}
|
|
164
|
+
|
|
165
|
+
# --- Run command presets ---
|
|
166
|
+
# Detects codex CLI availability and shows appropriate run command presets.
|
|
167
|
+
# AC1: codex installed → cross-engine preset first, spark Pro, claude-only, basic
|
|
168
|
+
# AC2: codex not installed → tmux + claude-only first, install recommendation
|
|
169
|
+
# AC3: full options reference with defaults always shown
|
|
170
|
+
print_run_presets() {
|
|
171
|
+
local slug="$1"
|
|
172
|
+
local codex_available=0
|
|
173
|
+
command -v codex &>/dev/null && codex_available=1
|
|
174
|
+
|
|
175
|
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
|
176
|
+
echo "Available run commands (copy the one you want):"
|
|
177
|
+
echo ""
|
|
178
|
+
if [[ $codex_available -eq 1 ]]; then
|
|
179
|
+
echo "# Recommended: cross-engine + final-consensus (cost savings + blind-spot coverage):"
|
|
180
|
+
echo "/rlp-desk run $slug --worker-model gpt-5.4:high --final-consensus --debug"
|
|
181
|
+
echo ""
|
|
182
|
+
echo "# Spark Pro preset (fast codex worker, lower cost):"
|
|
183
|
+
echo "/rlp-desk run $slug --worker-model gpt-5.3-codex-spark:high --debug"
|
|
184
|
+
echo ""
|
|
185
|
+
echo "# Claude-only:"
|
|
186
|
+
echo "/rlp-desk run $slug --debug"
|
|
187
|
+
echo ""
|
|
188
|
+
echo "# Basic agent:"
|
|
189
|
+
echo "/rlp-desk run $slug"
|
|
190
|
+
else
|
|
191
|
+
echo "# Recommended: tmux mode + claude-only (real-time visibility):"
|
|
192
|
+
echo "/rlp-desk run $slug --mode tmux --debug"
|
|
193
|
+
echo ""
|
|
194
|
+
echo "# Agent mode:"
|
|
195
|
+
echo "/rlp-desk run $slug --debug"
|
|
196
|
+
echo ""
|
|
197
|
+
echo "# Install codex for cost savings + cross-engine blind-spot coverage:"
|
|
198
|
+
echo "npm install -g @openai/codex"
|
|
199
|
+
fi
|
|
200
|
+
echo ""
|
|
201
|
+
echo "# Full options reference:"
|
|
202
|
+
echo "# --mode agent|tmux (default: agent)"
|
|
203
|
+
echo "# --worker-model MODEL haiku|sonnet|opus or gpt-5.4:low|medium|high (default: sonnet)"
|
|
204
|
+
echo "# --verifier-model MODEL haiku|sonnet|opus (default: opus)"
|
|
205
|
+
echo "# --verify-consensus both claude+codex must pass"
|
|
206
|
+
echo "# --verify-mode per-us|batch (default: per-us)"
|
|
207
|
+
echo "# --max-iter N (default: 100)"
|
|
208
|
+
echo "# --debug enable debug logging"
|
|
209
|
+
echo "# --with-self-verification post-campaign analysis report"
|
|
210
|
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
|
211
|
+
}
|
|
212
|
+
|
|
20
213
|
echo "Initializing Ralph Desk: $SLUG"
|
|
21
214
|
echo " Root: $ROOT"
|
|
22
215
|
echo " Desk: $DESK"
|
|
216
|
+
[[ -n "$MODE" ]] && echo " Mode: $MODE"
|
|
23
217
|
echo ""
|
|
24
218
|
|
|
25
219
|
mkdir -p "$DESK/prompts" "$DESK/context" "$DESK/memos" "$DESK/plans" "$DESK/logs/$SLUG"
|
|
26
220
|
|
|
221
|
+
# --- Re-execution lifecycle (--mode handling) ---
|
|
222
|
+
PRD_FILE="$DESK/plans/prd-$SLUG.md"
|
|
223
|
+
LOGS_DIR="$DESK/logs/$SLUG"
|
|
224
|
+
|
|
225
|
+
if [[ -n "$MODE" ]]; then
|
|
226
|
+
echo "Re-execution mode: --mode $MODE"
|
|
227
|
+
echo ""
|
|
228
|
+
|
|
229
|
+
DELETED_COUNT=0
|
|
230
|
+
|
|
231
|
+
# Version debug.log and campaign-report.md (NOT self-verification-report — uses -NNN)
|
|
232
|
+
version_file "$LOGS_DIR/debug.log"
|
|
233
|
+
version_file "$LOGS_DIR/campaign-report.md"
|
|
234
|
+
|
|
235
|
+
# Delete iter-* artifacts (archived done-claims, verdicts, prompt logs, results)
|
|
236
|
+
for f in "$LOGS_DIR"/iter-*(N); do
|
|
237
|
+
[[ -f "$f" ]] && { rm "$f"; (( ++DELETED_COUNT )); }
|
|
238
|
+
done
|
|
239
|
+
|
|
240
|
+
# Delete runtime memos
|
|
241
|
+
for f in \
|
|
242
|
+
"$DESK/memos/$SLUG-done-claim.json" \
|
|
243
|
+
"$DESK/memos/$SLUG-iter-signal.json" \
|
|
244
|
+
"$DESK/memos/$SLUG-verify-verdict.json" \
|
|
245
|
+
"$DESK/memos/$SLUG-complete.md" \
|
|
246
|
+
"$DESK/memos/$SLUG-blocked.md"; do
|
|
247
|
+
[[ -f "$f" ]] && { rm "$f"; (( ++DELETED_COUNT )); }
|
|
248
|
+
done
|
|
249
|
+
|
|
250
|
+
# Delete status.json, baseline.log, cost-log.jsonl
|
|
251
|
+
for f in "$LOGS_DIR/runtime/status.json" "$LOGS_DIR/status.json" "$LOGS_DIR/baseline.log" "$LOGS_DIR/cost-log.jsonl"; do
|
|
252
|
+
[[ -f "$f" ]] && { rm "$f"; (( ++DELETED_COUNT )); }
|
|
253
|
+
done
|
|
254
|
+
|
|
255
|
+
# Delete test-spec only for fresh re-execution mode; improve preserves custom edits
|
|
256
|
+
# and reruns split logic on the existing file.
|
|
257
|
+
for f in \
|
|
258
|
+
"$DESK/plans/test-spec-$SLUG.md" \
|
|
259
|
+
"$DESK/prompts/$SLUG.worker.prompt.md" \
|
|
260
|
+
"$DESK/prompts/$SLUG.verifier.prompt.md"; do
|
|
261
|
+
[[ -f "$f" ]] &&
|
|
262
|
+
if [[ "$MODE" == "fresh" ]] || [[ "$f" != "$DESK/plans/test-spec-$SLUG.md" ]]; then
|
|
263
|
+
rm "$f"; (( ++DELETED_COUNT ));
|
|
264
|
+
fi
|
|
265
|
+
done
|
|
266
|
+
|
|
267
|
+
# Reset memory and context to fresh templates (rm here; scaffold below regenerates them)
|
|
268
|
+
rm -f "$DESK/memos/$SLUG-memory.md" "$DESK/context/$SLUG-latest.md"
|
|
269
|
+
|
|
270
|
+
# PRD handling: --mode fresh deletes PRD; --mode improve preserves PRD in-place
|
|
271
|
+
if [[ "$MODE" == "fresh" ]]; then
|
|
272
|
+
[[ -f "$PRD_FILE" ]] && { rm "$PRD_FILE"; (( ++DELETED_COUNT )); echo " Deleted: prd-$SLUG.md (--mode fresh: PRD deleted for fresh start)"; }
|
|
273
|
+
fi
|
|
274
|
+
|
|
275
|
+
# Re-execution summary
|
|
276
|
+
echo " Re-execution summary:"
|
|
277
|
+
if [[ "$MODE" == "improve" ]]; then
|
|
278
|
+
echo " Preserved: prd-$SLUG.md (--mode improve: PRD kept in-place)"
|
|
279
|
+
fi
|
|
280
|
+
echo " Deleted: $DELETED_COUNT runtime artifacts"
|
|
281
|
+
echo " Reset: memory.md + context.md (regenerating from templates)"
|
|
282
|
+
echo ""
|
|
283
|
+
fi
|
|
284
|
+
|
|
27
285
|
# --- Worker Prompt ---
|
|
28
286
|
F="$DESK/prompts/$SLUG.worker.prompt.md"
|
|
29
287
|
if [[ ! -f "$F" ]]; then
|
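The re-execution lifecycle in the hunk above versions `debug.log` and `campaign-report.md` through `detect_next_version` and `version_file`. The naming scheme those helpers follow can be sketched in portable shell as below. `demo_versioned_name` is a hypothetical name for this illustration. It only computes the target path and does not move any file.

```shell
# Hypothetical, portable sketch of the "-vN" naming scheme; computes a path only.
demo_versioned_name() {
  # $1 = file path; emits the first unused "<base>-vN<ext>" path beside it
  local path dir base ext n
  path="$1"
  dir="$(dirname "$path")"
  base="$(basename "$path")"
  case "$base" in
    *.*) ext=".${base##*.}"; base="${base%.*}" ;;  # split off the last extension
    *)   ext="" ;;
  esac
  n=1
  while [ -f "$dir/${base}-v${n}${ext}" ]; do
    n=$(( n + 1 ))
  done
  printf '%s\n' "$dir/${base}-v${n}${ext}"
}
```

Because the search starts at 1 and stops at the first gap, deleting an older version such as `campaign-report-v1.md` would cause the next run to reuse that number rather than continue the sequence.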
|
@@ -64,6 +322,8 @@ Read these files in order:
|
|
|
64
322
|
- Do not say "I'm confident" — confidence is not evidence.
|
|
65
323
|
- Do not say "existing code has no tests" — you are improving it, add tests.
|
|
66
324
|
- Do not write code before tests — if you did, delete it and start with tests.
|
|
325
|
+
- **NEVER modify rlp-desk infrastructure files** (~/.claude/ralph-desk/*, ~/.claude/commands/rlp-desk.md). If you discover a bug in rlp-desk itself, report it in done-claim.json with {"status": "blocked", "reason": "rlp-desk bug: <description>"} and signal blocked. Do NOT attempt to fix rlp-desk — it is the orchestration tool, not your project code.
|
|
326
|
+
- **NEVER modify Claude Code settings** (~/.claude/settings.json, .claude/settings.local.json, or any settings files). Do NOT add permissions, change models, or alter configuration. If a permission prompt blocks you, report it as blocked — do NOT try to edit settings to bypass it.
|
|
67
327
|
|
|
68
328
|
## Iteration rules
|
|
69
329
|
- Use fresh context only; do NOT depend on prior chat history.
|
|
@@ -169,6 +429,15 @@ Check the iter-signal.json "us_id" field:
|
|
|
169
429
|
- Test-specific logic: no environment-detection patterns
|
|
170
430
|
- "Code inspection" claims: Worker must run actual commands
|
|
171
431
|
- Tautological tests: expected values that mirror implementation logic
|
|
432
|
+
10¼. **Anti-Rubber-Stamp Self-Check**:
|
|
433
|
+
- If your verdict history shows a 100% pass rate, re-examine your last verdict with increased scrutiny — a 100% pass rate is a red flag for insufficient rigor
|
|
434
|
+
- When issuing a PASS despite concerns, attach an explicit warning that notes any concerning patterns (e.g., low test diversity, marginal coverage) even if the work technically passes
|
|
435
|
+
- Never issue a silent PASS — every pass verdict must cite specific evidence for each AC checked
|
|
436
|
+
10½. **Worker Process Audit**:
|
|
437
|
+
- Test-first compliance: done-claim execution_steps must show write_test step before implement step for each AC
|
|
438
|
+
- RED phase evidence: at least one verify_red step with exit_code=1 per AC (proves tests were written before passing)
|
|
439
|
+
- Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "already manually tested", "partial check")
|
|
440
|
+
- Step completeness: each AC should have write_test → verify_red → implement → verify_green sequence in execution_steps
|
|
172
441
|
11. **Reproducibility check**: verify lock file committed, clean install succeeds, security scan passes, env vars documented (per test-spec Reproducibility Gate). Skip if test-spec says "N/A."
|
|
173
442
|
12. Write verdict JSON to: $DESK/memos/$SLUG-verify-verdict.json
|
|
174
443
|
|
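The step-completeness check in the Worker Process Audit above amounts to an ordering test over the recorded step types. A minimal sketch, assuming the Verifier has already extracted the step types for one AC from `done-claim.json` in recorded order. The function name is hypothetical, and the sketch deliberately ignores the `verify_existing` exception described in governance.md.

```shell
# Hypothetical ordering check for one AC's execution_steps.
demo_steps_test_first() {
  # args = step types in recorded order; succeeds only when
  # write_test -> verify_red -> implement -> verify_green appear in that order
  local stage step
  stage=0   # 0 need write_test, 1 need verify_red, 2 need implement, 3 need verify_green
  for step in "$@"; do
    case "$stage:$step" in
      0:write_test)   stage=1 ;;
      1:verify_red)   stage=2 ;;
      2:implement)    stage=3 ;;
      3:verify_green) stage=4 ;;
      0:implement|1:implement) return 1 ;;  # code written before a red test was recorded
    esac
  done
  [ "$stage" -eq 4 ]
}
```

Other step types such as `refactor` or `commit` are passed over, so `demo_steps_test_first write_test verify_red implement verify_green refactor commit` succeeds, while a sequence that starts with `implement` fails immediately.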
|
@@ -185,7 +454,8 @@ Verdict JSON:
     {"check": "IL-1 Evidence Gate", "decision": "pass|fail", "basis": "what command was run, what output confirmed the decision"},
     {"check": "Layer Enforcement", "decision": "pass|fail", "basis": "which layers checked, any TODO found"},
     {"check": "Test Sufficiency", "decision": "pass|fail", "basis": "test count per AC, category coverage"},
-    {"check": "Anti-Gaming", "decision": "pass|fail", "basis": "what was checked, any suspicious patterns"}
+    {"check": "Anti-Gaming", "decision": "pass|fail", "basis": "what was checked, any suspicious patterns"},
+    {"check": "Worker Process Audit", "decision": "pass|fail", "basis": "test-first followed: verify_red present per AC, no forbidden shortcuts in claims, execution_steps complete"}
   ],
   "layer_status": {"L1":"pass|fail|todo|na","L2":"pass|fail|todo|na","L3":"pass|fail|todo|na","L4":"pass|fail|todo|na"},
   "test_quality": {"test_count":0,"ac_count":0,"sufficiency":"pass|fail","anti_patterns_found":[]},
@@ -298,6 +568,9 @@ EOF
   echo " + $F"
 else echo " · $F"; fi
 
+# Split PRD into per-US files (no-op with warning if no US markers)
+split_prd_by_us "$DESK/plans/prd-$SLUG.md" "$SLUG"
+
 # --- Test Spec ---
 F="$DESK/plans/test-spec-$SLUG.md"
 if [[ ! -f "$F" ]]; then
@@ -451,6 +724,9 @@ EOF
   echo " + $F"
 else echo " · $F"; fi
 
+# Split test-spec into per-US files (no-op with warning if no US section markers)
+split_test_spec_by_us "$DESK/plans/test-spec-$SLUG.md" "$SLUG"
+
 # --- .gitignore for runtime artifacts ---
 GITIGNORE="$ROOT/.gitignore"
 MARKER="# RLP Desk runtime artifacts"
@@ -473,6 +749,53 @@ GIEOF
   echo " + .gitignore (created with rlp-desk rules)"
 fi
 
+# --- Claude Code sensitive-file permissions for .claude/ralph-desk/ ---
+# Worker/Verifier need Read/Edit/Write access to .claude/ralph-desk/ files.
+# --dangerously-skip-permissions does NOT cover "sensitive file" access for .claude/ paths.
+# Without these, every file operation triggers an interactive permission prompt that blocks automation.
+SETTINGS_FILE="$ROOT/.claude/settings.local.json"
+PERM_MARKER="Read(.claude/ralph-desk/**)"
+
+if [[ -f "$SETTINGS_FILE" ]] && grep -qF "$PERM_MARKER" "$SETTINGS_FILE" 2>/dev/null; then
+  echo " · .claude/settings.local.json (rlp-desk permissions already present)"
+else
+  PERMS='["Read(.claude/ralph-desk/**)", "Edit(.claude/ralph-desk/**)", "Write(.claude/ralph-desk/**)"]'
+
+  if [[ -f "$SETTINGS_FILE" ]]; then
+    if command -v jq &>/dev/null; then
+      jq --argjson perms "$PERMS" '
+        .permissions //= {} |
+        .permissions.allow //= [] |
+        .permissions.allow += ($perms - .permissions.allow)
+      ' "$SETTINGS_FILE" > "${SETTINGS_FILE}.tmp" && mv "${SETTINGS_FILE}.tmp" "$SETTINGS_FILE"
+      echo " + .claude/settings.local.json (rlp-desk permissions merged)"
+    else
+      echo " ⚠ jq not found. Add to .claude/settings.local.json manually:"
+      echo " permissions.allow: Read/Edit/Write(.claude/ralph-desk/**)"
+    fi
+  else
+    mkdir -p "$(dirname "$SETTINGS_FILE")"
+    cat > "$SETTINGS_FILE" <<'SETEOF'
+{
+  "permissions": {
+    "allow": [
+      "Read(.claude/ralph-desk/**)",
+      "Edit(.claude/ralph-desk/**)",
+      "Write(.claude/ralph-desk/**)"
+    ]
+  }
+}
+SETEOF
+    echo " + .claude/settings.local.json (created with rlp-desk permissions)"
+  fi
+  echo ""
+  echo " NOTE: Added Read/Edit/Write permissions for .claude/ralph-desk/ to"
+  echo " .claude/settings.local.json (local, not committed to git)."
+  echo " This prevents Worker/Verifier from being blocked by Claude Code's"
+  echo " sensitive-file prompts during automated loop execution."
+  echo " See: https://github.com/ai-dev-methodologies/rlp-desk#project-structure"
+fi
+
 # --- Post-init validation gate ---
 INIT_FAIL=0
 for REQUIRED_FILE in \
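The jq filter in this hunk is an idempotent merge: array subtraction drops any permission already present before the remainder is appended, so existing entries and their order are preserved. A small sketch to try that behavior in isolation follows. It assumes `jq` is installed. `merge_perms` is a hypothetical wrapper name, not a function in the package, and the `Bash(ls)` entry mentioned below is an invented pre-existing rule used only for illustration.

```shell
# Hypothetical wrapper around the same jq filter used in the hunk above.
# $1: path to a settings JSON file, rewritten in place via a temp file.
merge_perms() {
  settings_file="$1"
  perms='["Read(.claude/ralph-desk/**)", "Edit(.claude/ralph-desk/**)", "Write(.claude/ralph-desk/**)"]'
  jq --argjson perms "$perms" '
    .permissions //= {} |
    .permissions.allow //= [] |
    .permissions.allow += ($perms - .permissions.allow)
  ' "$settings_file" > "${settings_file}.tmp" && mv "${settings_file}.tmp" "$settings_file"
}
```

Running `merge_perms` twice on a file that already allows `Bash(ls)` and the `Read(...)` rule should leave `allow` as `Bash(ls)`, `Read`, `Edit`, `Write`, with no duplicates. A file containing only `{}` ends up with just the three rlp-desk entries.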
@@ -501,13 +824,6 @@ echo ""
 echo "Next:"
 echo " 1. Edit PRD: $DESK/plans/prd-$SLUG.md"
 echo " 2. Edit test spec: $DESK/plans/test-spec-$SLUG.md"
-echo " 3. Run:"
+echo " 3. Run (copy a command below):"
 echo ""
-
-echo " PROMPT_FILE=$DESK/prompts/$SLUG.worker.prompt.md \\"
-echo " VERIFIER_PROMPT_FILE=$DESK/prompts/$SLUG.verifier.prompt.md \\"
-echo " CONTEXT_FILE=$DESK/context/$SLUG-latest.md \\"
-echo " EXTRA_REQUIRED_FILES=$DESK/plans/prd-$SLUG.md:$DESK/plans/test-spec-$SLUG.md:$DESK/memos/$SLUG-memory.md \\"
-echo " MAX_ITER=20 \\"
-echo " $RUNNER_DIR/run_ralph_desk.zsh"
-echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+print_run_presets "$SLUG"