@ai-dev-methodologies/rlp-desk 0.3.6 → 0.4.0
- package/docs/blueprints/blueprint-v0.4-evolution.md +347 -0
- package/docs/prompts/ralplan-codex-review.md +55 -0
- package/package.json +1 -1
- package/src/commands/rlp-desk.md +62 -22
- package/src/governance.md +39 -22
- package/src/scripts/init_ralph_desk.zsh +139 -4
- package/src/scripts/run_ralph_desk.zsh +358 -80
package/src/governance.md
CHANGED
@@ -373,28 +373,34 @@ Characteristics:
 ```
 .claude/ralph-desk/
 ├── prompts/
-│   ├── <slug>.worker.prompt.md # Worker base prompt
-│   └── <slug>.verifier.prompt.md # Verifier base prompt
+│   ├── <slug>.worker.prompt.md # Worker base prompt (regenerated on re-execution)
+│   └── <slug>.verifier.prompt.md # Verifier base prompt (regenerated on re-execution)
 ├── context/
-│   └── <slug>-latest.md # Current frontier
+│   └── <slug>-latest.md # Current frontier; Worker updates (reset to template on re-execution)
 ├── memos/
-│   ├── <slug>-memory.md # Campaign memory
-│   ├── <slug>-done-claim.json # Worker's completion claim (runtime)
-│   ├── <slug>-iter-signal.json # Worker's iteration signal (runtime)
-│   ├── <slug>-verify-verdict.json # Verifier's verdict (runtime)
-│   ├── <slug>-escalation.md
-│   ├── <slug>-complete.md # SENTINEL (Leader only)
-│   └── <slug>-blocked.md # SENTINEL (Leader only)
+│   ├── <slug>-memory.md # Campaign memory; Worker updates (reset to template on re-execution)
+│   ├── <slug>-done-claim.json # Worker's completion claim (runtime; deleted on re-execution)
+│   ├── <slug>-iter-signal.json # Worker's iteration signal (runtime; deleted on re-execution)
+│   ├── <slug>-verify-verdict.json # Verifier's verdict (runtime; deleted on re-execution)
+│   ├── <slug>-escalation.md # Architecture escalation report (tmux mode, §7¾; deleted on re-execution)
+│   ├── <slug>-complete.md # SENTINEL (Leader only; deleted on re-execution)
+│   └── <slug>-blocked.md # SENTINEL (Leader only; deleted on re-execution)
 ├── plans/
-│   ├── prd-<slug>.md # PRD (
-│   └── test-spec-<slug>.md # Verification criteria
+│   ├── prd-<slug>.md # PRD (in-place: --mode improve | deleted: --mode fresh)
+│   └── test-spec-<slug>.md # Verification criteria (regenerated on re-execution)
 └── logs/<slug>/
-    ├──
-    ├──
-    ├── iter-NNN.
-    ├──
-    ├──
-
+    ├── debug.log # Debug output (versioned: debug-v{N}.log on re-execution)
+    ├── campaign-report.md # Campaign summary (versioned: campaign-report-v{N}.md on re-execution)
+    ├── iter-NNN.worker-prompt.md # Audit trail prompt copy (deleted on re-execution)
+    ├── iter-NNN.verifier-prompt.md # Audit trail prompt copy (deleted on re-execution)
+    ├── iter-NNN.result.md # Iteration result (deleted on re-execution)
+    ├── iter-NNN-done-claim.json # Archived done-claim per iteration (deleted on re-execution)
+    ├── iter-NNN-verify-verdict.json # Archived verdict per iteration (deleted on re-execution)
+    ├── self-verification-data.json # Cumulative SV data (--with-self-verification; deleted on re-execution)
+    ├── self-verification-report-NNN.md # Versioned SV report (-NNN auto-increment; NOT versioned via version_file)
+    ├── status.json # Leader's loop state (deleted on re-execution)
+    ├── baseline.log # Baseline capture (deleted on re-execution)
+    └── cost-log.jsonl # Per-iteration cost log (deleted on re-execution)
 ```

 ## 7. Leader Loop Protocol
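The layout above keeps per-iteration copies of the Worker's done-claim and the Verifier's verdict under `logs/<slug>/`. A minimal sketch of what such an archive step could look like, assuming a plain copy (not a move) and a three-digit zero-padded iteration number; the function name `archive_iteration` and both assumptions are illustrative and are not taken from `run_ralph_desk.zsh`:

```shell
# Hypothetical helper: copy runtime memos into per-iteration archives.
# Assumes $1 is the .claude/ralph-desk directory and NNN is zero-padded to 3 digits.
archive_iteration() {
  local desk="$1" slug="$2" iter="$3"
  local nnn logs kind src
  nnn="$(printf '%03d' "$iter")"
  logs="$desk/logs/$slug"
  mkdir -p "$logs"
  for kind in done-claim verify-verdict; do
    src="$desk/memos/$slug-$kind.json"
    # Skip silently when the runtime file was never written.
    if [ -f "$src" ]; then
      cp "$src" "$logs/iter-$nnn-$kind.json"
    fi
  done
}
```

Calling `archive_iteration "$DESK" "$SLUG" 7` would, under these assumptions, leave the runtime memo in place and add `iter-007-done-claim.json` next to the other iteration logs.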
@@ -443,8 +449,19 @@ for iteration in 1..max_iter:
      • fail + continue → go to ⑧
      • blocked → write BLOCKED sentinel, stop

+  ⑦d Archive iteration artifacts (after verdict read, before next prep)
+     - Archive done-claim.json → logs/<slug>/iter-NNN-done-claim.json
+     - Archive verify-verdict.json → logs/<slug>/iter-NNN-verify-verdict.json
+     (Preserved across clean; data source for Campaign Report and SV analysis)
+
   ⑧ Write iter-NNN.result.md to logs/<slug>/ (result status + git diff --stat)
      Update status.json, report to user, continue to next iteration
+
+After loop end (COMPLETE, BLOCKED, TIMEOUT):
+  ⑧½ Campaign Report (always — independent of --debug)
+     - Generate logs/<slug>/campaign-report.md with 8 sections
+     - Version existing report to campaign-report-v{N}.md before writing new
+     - Data: status.json (baseline_commit, per-iter), archived iter artifacts, PRD, git diff
 ```

 ## 7a. Per-US Verification
@@ -480,7 +497,7 @@ Worker completes US → signal verify
   → Codex Verifier runs (checks AC)
   → Both pass → proceed (next US or COMPLETE)
   → Either fails → combined issues → fix contract → Worker retry
-  → Max
+  → Max 6 consensus rounds per US → BLOCKED if still disagreeing

 **NO ENGINE PRIORITY:** Claude and Codex have equal weight. If one passes and the other fails, the verdict is FAIL. No engine may be prioritized or dismissed. Infrastructure failure = CLI crash, timeout, or verdict file not generated — NOT a valid verdict with verdict=fail.
 ```
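The NO ENGINE PRIORITY rule above amounts to a logical AND over the two verdicts, with infrastructure failures kept apart from a genuine `fail`. A sketch under those assumptions; the function name `combine_verdicts` and the `infra` token are hypothetical and do not appear in the package:

```shell
# Hypothetical helper: merge two engine verdicts with equal weight.
# Each argument is "pass", "fail", or empty (no verdict file was produced).
combine_verdicts() {
  local claude="$1" codex="$2"
  if [ -z "$claude" ] || [ -z "$codex" ]; then
    echo "infra"   # CLI crash, timeout, or missing verdict file: not a verdict
  elif [ "$claude" = "pass" ] && [ "$codex" = "pass" ]; then
    echo "pass"
  else
    echo "fail"    # one failing engine is enough; neither engine is prioritized
  fi
}
```

A caller enforcing the cap described in the hunk would invoke this once per consensus round and mark the US as BLOCKED if the sixth round still does not return `pass`.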
@@ -518,12 +535,12 @@ Every change must be justified by the issue it addresses.

 ## 7¾. Architecture Escalation

-Note: Circuit Breaker (§8) fires first at 2 consecutive failures (model upgrade + retry). If the retry also fails (
+Note: Circuit Breaker (§8) fires first at 2 consecutive failures (model upgrade + retry — Path A: Agent-mode only; in tmux mode the shell CB triggers directly without model upgrade). If the retry also fails (`cb_threshold` reached), Architecture Escalation applies. The CB retry counts toward the consecutive_failures counter.

-If
+If `cb_threshold` or more consecutive fix attempts fail for the same US:

 1. **STOP fixing symptoms** — the problem is likely architectural, not a bug.
-2. **Leader reports to user**: "
+2. **Leader reports to user**: "`cb_threshold` consecutive fix attempts failed for US-{id}. This suggests an architectural issue, not a simple bug."
 3. **Include in report**:
    - What was attempted in each fix
    - What specifically kept failing
package/src/scripts/init_ralph_desk.zsh
CHANGED
@@ -8,22 +8,150 @@ set -euo pipefail
 # Creates project-local scaffold in: .claude/ralph-desk/
 #
 # Usage:
-# ~/.claude/ralph-desk/init_ralph_desk.zsh <slug> [objective]
+# ~/.claude/ralph-desk/init_ralph_desk.zsh <slug> [objective] [--mode fresh|improve]
 # =============================================================================

-SLUG="${1:?Usage: $0 <slug> [objective]}"
-
+SLUG="${1:?Usage: $0 <slug> [objective] [--mode fresh|improve]}"
+MODE=""
+OBJECTIVE="TBD - fill in the objective"
+
+# Parse remaining arguments: --mode fresh|improve + optional positional objective
+shift
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --mode)
+      MODE="${2:?--mode requires an argument: fresh|improve}"
+      shift 2
+      ;;
+    --mode=*)
+      MODE="${1#--mode=}"
+      shift
+      ;;
+    *)
+      OBJECTIVE="$1"
+      shift
+      ;;
+  esac
+done
+
 ROOT="${ROOT:-$PWD}"
 DESK="$ROOT/.claude/ralph-desk"
 RUNNER_DIR="$(cd "$(dirname "$0")" && pwd)"

+# --- Re-execution versioning helpers ---
+# Handles ONLY debug.log and campaign-report.md versioning.
+# SV reports use their own -NNN auto-increment pattern and are NOT handled here.
+
+detect_next_version() {
+  local file_path="$1"
+  local dir base ext n=1
+  dir="$(dirname "$file_path")"
+  base="$(basename "$file_path")"
+  if [[ "$base" == *.* ]]; then
+    ext=".${base##*.}"
+    base="${base%.*}"
+  else
+    ext=""
+  fi
+  while [[ -f "$dir/${base}-v${n}${ext}" ]]; do
+    (( n++ ))
+  done
+  echo "$n"
+}
+
+version_file() {
+  local file_path="$1"
+  if [[ -f "$file_path" ]]; then
+    local n dir base ext
+    n="$(detect_next_version "$file_path")"
+    dir="$(dirname "$file_path")"
+    base="$(basename "$file_path")"
+    if [[ "$base" == *.* ]]; then
+      ext=".${base##*.}"
+      base="${base%.*}"
+    else
+      ext=""
+    fi
+    mv "$file_path" "$dir/${base}-v${n}${ext}"
+    echo " Versioned: $(basename "$file_path") → ${base}-v${n}${ext}"
+  fi
+  # Non-existent files silently skipped (no error)
+}
+
 echo "Initializing Ralph Desk: $SLUG"
 echo " Root: $ROOT"
 echo " Desk: $DESK"
+[[ -n "$MODE" ]] && echo " Mode: $MODE"
 echo ""

 mkdir -p "$DESK/prompts" "$DESK/context" "$DESK/memos" "$DESK/plans" "$DESK/logs/$SLUG"

+# --- Re-execution lifecycle (--mode handling) ---
+PRD_FILE="$DESK/plans/prd-$SLUG.md"
+LOGS_DIR="$DESK/logs/$SLUG"
+
+if [[ -n "$MODE" ]] && [[ -f "$PRD_FILE" ]]; then
+  echo "Re-execution mode: --mode $MODE"
+  echo ""
+
+  DELETED_COUNT=0
+
+  # Version debug.log and campaign-report.md (NOT self-verification-report — uses -NNN)
+  version_file "$LOGS_DIR/debug.log"
+  version_file "$LOGS_DIR/campaign-report.md"
+
+  # Delete iter-* artifacts (archived done-claims, verdicts, prompt logs, results)
+  for f in "$LOGS_DIR"/iter-*(N); do
+    [[ -f "$f" ]] && { rm "$f"; (( ++DELETED_COUNT )); }
+  done
+
+  # Delete runtime memos
+  for f in \
+    "$DESK/memos/$SLUG-done-claim.json" \
+    "$DESK/memos/$SLUG-iter-signal.json" \
+    "$DESK/memos/$SLUG-verify-verdict.json" \
+    "$DESK/memos/$SLUG-complete.md" \
+    "$DESK/memos/$SLUG-blocked.md"; do
+    [[ -f "$f" ]] && { rm "$f"; (( ++DELETED_COUNT )); }
+  done
+
+  # Delete status.json, baseline.log, cost-log.jsonl
+  for f in "$LOGS_DIR/status.json" "$LOGS_DIR/baseline.log" "$LOGS_DIR/cost-log.jsonl"; do
+    [[ -f "$f" ]] && { rm "$f"; (( ++DELETED_COUNT )); }
+  done
+
+  # Delete test-spec and current slug's prompts (will be regenerated below)
+  for f in \
+    "$DESK/plans/test-spec-$SLUG.md" \
+    "$DESK/prompts/$SLUG.worker.prompt.md" \
+    "$DESK/prompts/$SLUG.verifier.prompt.md"; do
+    [[ -f "$f" ]] && { rm "$f"; (( ++DELETED_COUNT )); }
+  done
+
+  # Reset memory and context to fresh templates (rm here; scaffold below regenerates them)
+  rm -f "$DESK/memos/$SLUG-memory.md" "$DESK/context/$SLUG-latest.md"
+
+  # PRD handling: --mode fresh deletes PRD; --mode improve preserves PRD in-place
+  if [[ "$MODE" == "fresh" ]]; then
+    [[ -f "$PRD_FILE" ]] && { rm "$PRD_FILE"; (( ++DELETED_COUNT )); echo " Deleted: prd-$SLUG.md (--mode fresh: PRD deleted for fresh start)"; }
+  fi
+
+  # Re-execution summary
+  echo " Re-execution summary:"
+  if [[ "$MODE" == "improve" ]]; then
+    echo " Preserved: prd-$SLUG.md (--mode improve: PRD kept in-place)"
+  fi
+  echo " Deleted: $DELETED_COUNT runtime artifacts"
+  echo " Reset: memory.md + context.md (regenerating from templates)"
+  echo ""
+
+elif [[ -n "$MODE" ]] && [[ ! -f "$PRD_FILE" ]]; then
+  # Note: --mode provided but no PRD found for this slug — treating as first run
+  echo " Note: --mode $MODE provided but no PRD found for '$SLUG' — treating as first run."
+  echo ""
+  MODE=""
+fi
+
 # --- Worker Prompt ---
 F="$DESK/prompts/$SLUG.worker.prompt.md"
 if [[ ! -f "$F" ]]; then
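The versioning helpers in this hunk pick the lowest unused `-v{N}` suffix, so repeated re-executions yield `debug-v1.log`, `debug-v2.log`, and so on. The sketch below restates that naming rule in portable shell so it can be tried in a scratch directory; `next_versioned_name` is a hypothetical name and the body is a simplified paraphrase, not the shipped zsh code.

```shell
# Hypothetical restatement of the -v{N} naming rule (simplified, portable shell).
next_versioned_name() {
  local file_path="$1" dir base ext n=1
  dir="$(dirname "$file_path")"
  base="$(basename "$file_path")"
  case "$base" in
    *.*) ext=".${base##*.}"; base="${base%.*}" ;;
    *)   ext="" ;;
  esac
  # Walk upward until a free slot is found; a gap left by a deleted version is reused.
  while [ -f "$dir/${base}-v${n}${ext}" ]; do
    n=$((n + 1))
  done
  echo "$dir/${base}-v${n}${ext}"
}
```

One consequence of scanning from 1 is that the suffix does not always reflect chronological order: if `campaign-report-v1.md` has been removed by hand while `campaign-report-v2.md` still exists, the next versioned copy takes the `-v1` name again.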
@@ -64,6 +192,7 @@ Read these files in order:
 - Do not say "I'm confident" — confidence is not evidence.
 - Do not say "existing code has no tests" — you are improving it, add tests.
 - Do not write code before tests — if you did, delete it and start with tests.
+- **NEVER modify rlp-desk infrastructure files** (~/.claude/ralph-desk/*, ~/.claude/commands/rlp-desk.md). If you discover a bug in rlp-desk itself, report it in done-claim.json with {"status": "blocked", "reason": "rlp-desk bug: <description>"} and signal blocked. Do NOT attempt to fix rlp-desk — it is the orchestration tool, not your project code.

 ## Iteration rules
 - Use fresh context only; do NOT depend on prior chat history.
@@ -169,6 +298,11 @@ Check the iter-signal.json "us_id" field:
    - Test-specific logic: no environment-detection patterns
    - "Code inspection" claims: Worker must run actual commands
    - Tautological tests: expected values that mirror implementation logic
+10½. **Worker Process Audit**:
+   - Test-first compliance: done-claim execution_steps must show write_test step before implement step for each AC
+   - RED phase evidence: at least one verify_red step with exit_code=1 per AC (proves tests were written before passing)
+   - Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "already manually tested", "partial check")
+   - Step completeness: each AC should have write_test → verify_red → implement → verify_green sequence in execution_steps
 11. **Reproducibility check**: verify lock file committed, clean install succeeds, security scan passes, env vars documented (per test-spec Reproducibility Gate). Skip if test-spec says "N/A."
 12. Write verdict JSON to: $DESK/memos/$SLUG-verify-verdict.json

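The step-completeness rule in item 10½ asks for a `write_test → verify_red → implement → verify_green` sequence per AC. The done-claim schema is not shown in this diff, so the sketch below assumes the step names for one AC have already been extracted into a plain argument list; `steps_in_order` is a hypothetical helper, not part of the Verifier prompt.

```shell
# Hypothetical check: the four required steps must appear in this order
# (other steps may be interleaved). Arguments: step names for a single AC.
steps_in_order() {
  local want="write_test verify_red implement verify_green"
  local step next
  for step in "$@"; do
    next="${want%% *}"                # first step still awaited
    if [ "$step" = "$next" ]; then
      case "$want" in
        *" "*) want="${want#* }" ;;   # drop the matched step
        *)     want="" ;;             # last required step matched
      esac
    fi
    if [ -z "$want" ]; then
      return 0
    fi
  done
  return 1
}
```

This covers ordering only. The RED-phase rule (a `verify_red` step with `exit_code=1`) and the forbidden-phrase scan would need the actual done-claim fields, which this sketch does not guess at.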
@@ -185,7 +319,8 @@ Verdict JSON:
     {"check": "IL-1 Evidence Gate", "decision": "pass|fail", "basis": "what command was run, what output confirmed the decision"},
     {"check": "Layer Enforcement", "decision": "pass|fail", "basis": "which layers checked, any TODO found"},
     {"check": "Test Sufficiency", "decision": "pass|fail", "basis": "test count per AC, category coverage"},
-    {"check": "Anti-Gaming", "decision": "pass|fail", "basis": "what was checked, any suspicious patterns"}
+    {"check": "Anti-Gaming", "decision": "pass|fail", "basis": "what was checked, any suspicious patterns"},
+    {"check": "Worker Process Audit", "decision": "pass|fail", "basis": "test-first followed: verify_red present per AC, no forbidden shortcuts in claims, execution_steps complete"}
   ],
   "layer_status": {"L1":"pass|fail|todo|na","L2":"pass|fail|todo|na","L3":"pass|fail|todo|na","L4":"pass|fail|todo|na"},
   "test_quality": {"test_count":0,"ac_count":0,"sufficiency":"pass|fail","anti_patterns_found":[]},