@ai-dev-methodologies/rlp-desk 0.7.5 → 0.9.0
- package/README.md +58 -0
- package/docs/blueprints/blueprint-pivot-step.md +137 -0
- package/docs/plans/validated-snacking-crayon.md +407 -0
- package/package.json +5 -2
- package/scripts/postinstall.js +91 -51
- package/scripts/uninstall.js +18 -9
- package/src/commands/rlp-desk.md +10 -3
- package/src/governance.md +2 -1
- package/src/node/cli/command-builder.mjs +96 -0
- package/src/node/init/campaign-initializer.mjs +235 -0
- package/src/node/polling/signal-poller.mjs +106 -0
- package/src/node/prompts/prompt-assembler.mjs +213 -0
- package/src/node/reporting/campaign-reporting.mjs +257 -0
- package/src/node/run.mjs +234 -0
- package/src/node/runner/campaign-main-loop.mjs +624 -0
- package/src/node/shared/fs.mjs +23 -0
- package/src/node/shared/paths.mjs +28 -0
- package/src/node/tmux/pane-manager.mjs +77 -0
- package/docs/blueprints/blueprint-v0.4-evolution.md +0 -347
- package/docs/prompts/ralplan-codex-review.md +0 -55
- package/docs/superpowers/plans/2026-04-06-worker-verifier-prompt-restructure.md +0 -179
- package/src/scripts/init_ralph_desk.zsh +0 -885
- package/src/scripts/lib_ralph_desk.zsh +0 -904
- package/src/scripts/run_ralph_desk.zsh +0 -2750
package/README.md
CHANGED
|
@@ -399,6 +399,64 @@ Per-US catches issues early before later stories build on broken foundations.
|
|
|
399
399
|
|
|
400
400
|
Worker completes all stories, then a single verification checks all AC at once. Final verify still applies.
|
|
401
401
|
|
|
402
|
+
## Autonomous Mode
|
|
403
|
+
|
|
404
|
+
By default, Worker and Verifier stop and ask for human input when they encounter document conflicts (e.g., PRD says one thing, test-spec says another) or ambiguous instructions. This breaks unattended execution.
|
|
405
|
+
|
|
406
|
+
**`--autonomous`** enables fully unattended campaigns:
|
|
407
|
+
|
|
408
|
+
```bash
|
|
409
|
+
/rlp-desk run my-feature --mode tmux --worker-model gpt-5.4:medium --autonomous --debug
|
|
410
|
+
```
|
|
411
|
+
|
|
412
|
+
### How it works
|
|
413
|
+
|
|
414
|
+
When `--autonomous` is active:
|
|
415
|
+
|
|
416
|
+
1. **PRD is the single source of truth.** Resolution priority: `PRD > test-spec > context > memory`
|
|
417
|
+
2. **No stopping for questions.** Worker and Verifier make autonomous decisions based on the priority chain
|
|
418
|
+
3. **All conflicts are logged.** Every decision is recorded in `conflict-log.jsonl` for post-campaign review
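
The priority chain above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, and treating sources outside the documented chain (such as `worker-prompt`) as lowest priority is an assumption, not something the README states.

```javascript
// Priority order copied from the README text: PRD > test-spec > context > memory.
const PRIORITY = ["prd", "test-spec", "context", "memory"];

// Lower rank wins. Unknown sources rank last (assumption, see lead-in).
function rank(source) {
  const index = PRIORITY.indexOf(source);
  return index === -1 ? PRIORITY.length : index;
}

// Returns the source whose content should be followed when two documents disagree.
// On a tie (for example two unknown sources) the first argument is returned.
function resolveConflict(sourceA, sourceB) {
  return rank(sourceA) <= rank(sourceB) ? sourceA : sourceB;
}
```

With this sketch, `resolveConflict("test-spec", "prd")` picks `"prd"`, which mirrors the "Followed PRD as source of truth" resolution in the conflict log example.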
|
|
419
|
+
|
|
420
|
+
### Conflict log
|
|
421
|
+
|
|
422
|
+
Each conflict is logged as a JSONL entry in `logs/<slug>/conflict-log.jsonl`:
|
|
423
|
+
|
|
424
|
+
```json
|
|
425
|
+
{
|
|
426
|
+
"iteration": 1,
|
|
427
|
+
"us_id": "US-001",
|
|
428
|
+
"source_a": "worker-prompt",
|
|
429
|
+
"source_b": "prd",
|
|
430
|
+
"conflict": "US-00 is required by the iteration prompt but is not defined as a PRD user story.",
|
|
431
|
+
"resolution": "Followed PRD as source of truth."
|
|
432
|
+
}
|
|
433
|
+
```
|
|
434
|
+
|
|
435
|
+
### When to use
|
|
436
|
+
|
|
437
|
+
- **Long-running campaigns** that run overnight or while you're away
|
|
438
|
+
- **High-iteration tasks** where stopping for every ambiguity wastes hours
|
|
439
|
+
- **Well-defined PRDs** where the PRD is comprehensive and authoritative
|
|
440
|
+
|
|
441
|
+
### When NOT to use
|
|
442
|
+
|
|
443
|
+
- **Exploratory work** where you want to review each decision
|
|
444
|
+
- **Ambiguous PRDs** where conflicts indicate real design gaps that need human judgment
|
|
445
|
+
- **First run of a new project** — run without `--autonomous` first to catch PRD issues interactively
|
|
446
|
+
|
|
447
|
+
### Post-campaign review
|
|
448
|
+
|
|
449
|
+
After the campaign, review the conflict log to identify systemic issues:
|
|
450
|
+
|
|
451
|
+
```bash
|
|
452
|
+
jq . .claude/ralph-desk/logs/<slug>/conflict-log.jsonl
|
|
453
|
+
```
|
|
454
|
+
|
|
455
|
+
Common patterns:
|
|
456
|
+
- **Repeated PRD vs test-spec conflicts** — test-spec needs updating to match PRD
|
|
457
|
+
- **Scope lock vs fix contract conflicts** — governance rules may need tuning
|
|
458
|
+
- **Missing PRD definitions** — Worker created stories not in the PRD (add them or tighten the brainstorm)
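
To spot the repeated patterns listed above, the log can be grouped by source pair. The following is a rough sketch, not part of the package: `summarizeConflicts` is a hypothetical name, and it assumes every non-blank line is a JSON object with the `source_a` and `source_b` fields shown in the example entry.

```javascript
// Counts conflicts per "source_a vs source_b" pair from JSONL text
// (one JSON object per line) and returns the pairs, most frequent first.
function summarizeConflicts(jsonlText) {
  const counts = new Map();
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines, e.g. the trailing newline
    const entry = JSON.parse(line);
    const key = `${entry.source_a} vs ${entry.source_b}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

A pair that dominates the result (for instance many `prd vs test-spec` entries) would suggest the test-spec needs updating, as described in the list above.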
|
|
459
|
+
|
|
402
460
|
## Project Structure
|
|
403
461
|
|
|
404
462
|
After `init`, your project gets this scaffold:
|
|
@@ -0,0 +1,137 @@
|
|
|
1
|
+
# Blueprint: Pivot Step (③½)
|
|
2
|
+
|
|
3
|
+
> Status: TODO — not yet implemented. Document for future development.
|
|
4
|
+
|
|
5
|
+
## Summary
|
|
6
|
+
|
|
7
|
+
Insert a Pivot Review step (③½) into the Leader loop, after step ③ and before the Worker prompt is built (④) and the Worker runs (⑤). It internalizes the core thinking framework from gstack's `plan-ceo-review` (premise challenge, forced alternatives, scope decisions) without depending on external skills.
|
|
8
|
+
|
|
9
|
+
## Problem
|
|
10
|
+
|
|
11
|
+
When a Worker repeatedly fails on the same US, the fix loop retries the same approach with progressively stronger models. This works for implementation bugs but fails for **wrong approach** problems. The current CB threshold → BLOCKED pattern wastes iterations before admitting the approach is wrong.
|
|
12
|
+
|
|
13
|
+
## Proposed Solution
|
|
14
|
+
|
|
15
|
+
### New CLI Flags
|
|
16
|
+
|
|
17
|
+
```
|
|
18
|
+
--pivot-mode off|every|on-fail (default: off)
|
|
19
|
+
--pivot-model MODEL (default: opus)
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
- `off`: no pivot review (current behavior)
|
|
23
|
+
- `every`: pivot review after every Worker iteration
|
|
24
|
+
- `on-fail`: pivot review only after Verifier fail verdict
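
The three modes could be evaluated with a helper along these lines. Since the blueprint is marked TODO, nothing here reflects shipped code: `shouldRunPivot` and the `lastVerdict` values (`"pass"`, `"fail"`, or `null` when no verdict exists yet) are assumptions made for illustration.

```javascript
// Allowed values of the proposed --pivot-mode flag.
const PIVOT_MODES = ["off", "every", "on-fail"];

// Decides whether a pivot review should run for the upcoming iteration.
// lastVerdict is the previous Verifier verdict (hypothetical representation).
function shouldRunPivot(pivotMode, lastVerdict) {
  if (!PIVOT_MODES.includes(pivotMode)) {
    throw new Error(`Unknown --pivot-mode value: ${pivotMode}`);
  }
  if (pivotMode === "every") return true;
  if (pivotMode === "on-fail") return lastVerdict === "fail";
  return false; // "off": current behavior, no pivot review
}
```

Rejecting unknown values early keeps a typo such as `--pivot-mode onfail` from silently behaving like `off`.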
|
|
25
|
+
|
|
26
|
+
### Leader Loop Change
|
|
27
|
+
|
|
28
|
+
```
|
|
29
|
+
Current: ① → ② → ③ → ④ → ⑤ worker → ⑥ signal → ⑦ verifier → ⑧ result
|
|
30
|
+
Proposed: ① → ② → ③ → ③½ PIVOT → ④ → ⑤ worker → ⑥ signal → ⑦ verifier → ⑧ result
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
Pivot runs BEFORE Worker — it decides direction, then Worker executes that direction.
|
|
34
|
+
|
|
35
|
+
### Tmux Pane Layout (3 panes)
|
|
36
|
+
|
|
37
|
+
```
|
|
38
|
+
+------------------+------------------+------------------+
|
|
39
|
+
| Worker pane | Pivot pane | Verifier pane |
|
|
40
|
+
| claude/codex | claude (opus) | claude/codex |
|
|
41
|
+
| implements code | direction review | verifies result |
|
|
42
|
+
+------------------+------------------+------------------+
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
Pivot pane is reused each iteration (not persistent). Leader launches pivot → waits for memory update → launches Worker in Worker pane.
|
|
46
|
+
|
|
47
|
+
### ③½ Pivot Review Step
|
|
48
|
+
|
|
49
|
+
**Agent mode:**
|
|
50
|
+
```
|
|
51
|
+
Agent(
|
|
52
|
+
description="rlp-desk pivot review iter-NNN",
|
|
53
|
+
model=<pivot_model>,
|
|
54
|
+
mode="bypassPermissions",
|
|
55
|
+
prompt=<pivot_prompt>
|
|
56
|
+
)
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
**Tmux mode:**
|
|
60
|
+
- Dedicated pivot pane (3rd pane)
|
|
61
|
+
- `DISABLE_OMC=1 claude --model opus --mcp-config '{"mcpServers":{}}' --strict-mcp-config -p "$(cat pivot-prompt.md)"`
|
|
62
|
+
- After pivot completes, verify memory updated → build Worker prompt (④) → launch Worker (⑤)
|
|
63
|
+
|
|
64
|
+
### Pivot Review Responsibilities
|
|
65
|
+
|
|
66
|
+
1. **Analyze iteration result** — what did the Worker actually produce?
|
|
67
|
+
2. **Premise challenge** — is the current approach correct? What assumptions are we making?
|
|
68
|
+
3. **Forced alternatives** — propose minimum 2 alternative approaches
|
|
69
|
+
4. **Scope decision** — EXPAND (add scope), HOLD (keep current), REDUCE (simplify)
|
|
70
|
+
5. **Update campaign memory** — rewrite Next Iteration Contract if approach changes
|
|
71
|
+
6. **Record rejected directions** — prevent future iterations from revisiting dead ends
|
|
72
|
+
|
|
73
|
+
### Pivot Prompt Template (internalized from plan-ceo-review)
|
|
74
|
+
|
|
75
|
+
```markdown
|
|
76
|
+
# Pivot Review — Iteration {N}
|
|
77
|
+
|
|
78
|
+
## Context
|
|
79
|
+
- Campaign: {slug}
|
|
80
|
+
- Current US: {us_id}
|
|
81
|
+
- Worker result: {done-claim summary}
|
|
82
|
+
- Consecutive failures on this US: {N}
|
|
83
|
+
- Previous pivot decisions: {from memory}
|
|
84
|
+
|
|
85
|
+
## Your Task
|
|
86
|
+
|
|
87
|
+
### 1. Premise Check
|
|
88
|
+
For each premise below, state whether evidence supports or contradicts it:
|
|
89
|
+
{list premises from PRD/memory}
|
|
90
|
+
|
|
91
|
+
### 2. Forced Alternatives
|
|
92
|
+
Propose at least 2 alternative approaches to the current US.
|
|
93
|
+
For each: summary, effort (S/M/L), risk, key tradeoff.
|
|
94
|
+
|
|
95
|
+
### 3. Scope Decision
|
|
96
|
+
Choose ONE: EXPAND | HOLD | REDUCE
|
|
97
|
+
Justify with evidence from this iteration.
|
|
98
|
+
|
|
99
|
+
### 4. Next Iteration Contract
|
|
100
|
+
If HOLD: refine the current contract with specific fixes.
|
|
101
|
+
If EXPAND/REDUCE: rewrite the contract for the new approach.
|
|
102
|
+
|
|
103
|
+
### 5. Rejected Directions
|
|
104
|
+
List approaches that should NOT be attempted again, with reason.
|
|
105
|
+
|
|
106
|
+
## Output
|
|
107
|
+
Update campaign memory at: {memory_path}
|
|
108
|
+
- Update "Next Iteration Contract" section
|
|
109
|
+
- Add to "Key Decisions" section
|
|
110
|
+
- Add to "Rejected Directions" section (if any)
|
|
111
|
+
```
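
Filling the `{placeholder}` fields of the template above could look like the sketch below. `fillTemplate` is a hypothetical helper; leaving unknown placeholders untouched is a design assumption so that missing values stay visible in the rendered prompt. Note that the template uses `{N}` for both the iteration number and the failure count, so a real implementation would presumably need distinct names.

```javascript
// Replaces every {name} in the template with values[name] when that key exists.
// Placeholders without a matching key are left as-is.
function fillTemplate(template, values) {
  return template.replace(/\{([^{}]+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : match
  );
}
```

For example, `fillTemplate("- Campaign: {slug}", { slug: "my-feature" })` yields `- Campaign: my-feature`.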
|
|
112
|
+
|
|
113
|
+
## Expected Benefits
|
|
114
|
+
|
|
115
|
+
- **Breaks fix loops** — "same approach, stronger model" → "different approach"
|
|
116
|
+
- **Research campaigns** — natural direction pivots without manual intervention
|
|
117
|
+
- **Reuses proven framework** — plan-ceo-review's premise challenge + forced alternatives
|
|
118
|
+
- **Both modes** — works in tmux and agent mode
|
|
119
|
+
|
|
120
|
+
## Implementation Notes
|
|
121
|
+
|
|
122
|
+
- `PIVOT_MODE` variable in `run_ralph_desk.zsh` (pattern: same as `AUTONOMOUS_MODE`)
|
|
123
|
+
- CLI parser: `--pivot-mode`, `--pivot-model` (pattern: same as other model flags)
|
|
124
|
+
- `write_pivot_prompt()` function in `run_ralph_desk.zsh` (pattern: same as `write_worker_trigger`)
|
|
125
|
+
- Pivot review output → campaign memory update (same file, different section)
|
|
126
|
+
- Status.json: add `pivot_decisions` array for tracking
|
|
127
|
+
- Analytics: `campaign.jsonl` add `pivot_action` field per iteration
|
|
128
|
+
|
|
129
|
+
## Dependencies
|
|
130
|
+
|
|
131
|
+
- Requires `--autonomous` mode (pivot review must not stop for questions)
|
|
132
|
+
- Works with any Worker engine (Claude or Codex)
|
|
133
|
+
- Does not require gstack installation
|
|
134
|
+
|
|
135
|
+
## Priority
|
|
136
|
+
|
|
137
|
+
Medium — implement after v1.0 Node.js rewrite is stable. Current CB threshold + model upgrade handles most cases. Pivot step is for research/exploration campaigns where approach flexibility matters.
|
|
@@ -0,0 +1,407 @@
|
|
|
1
|
+
# Plan: Worker Planning, Preset Sync, Brainstorm Exploration, Memory Bridge & Coding Principles
|
|
2
|
+
|
|
3
|
+
## Context
|
|
4
|
+
|
|
5
|
+
Apply five improvements to rlp-desk's Worker/Verifier prompts and the brainstorm/init flow.
|
|
6
|
+
As a follow-up to the existing iron law policy framework, this embeds proven patterns into the Worker/Verifier fresh context.
|
|
7
|
+
|
|
8
|
+
**Problems:**
|
|
9
|
+
1. `print_run_presets()` is out of sync with the rlp-desk.md option interface (stale flags, wrong defaults)
|
|
10
|
+
2. The Worker jumps straight into TDD as soon as it has read the files (no planning step)
|
|
11
|
+
3. Brainstorm proposes user stories without looking at the code
|
|
12
|
+
4. Brainstorm results are not saved to campaign memory (the first Worker has to rediscover them)
|
|
13
|
+
5. Worker/Verifier run without coding-principle guidelines (they cannot rely on the global CLAUDE.md)
|
|
14
|
+
|
|
15
|
+
**Branch:** `improve/worker-planning-and-preset-sync`
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## Changes
|
|
20
|
+
|
|
21
|
+
### Change 1: Fix Run Preset Desync
|
|
22
|
+
**File:** `src/scripts/init_ralph_desk.zsh` lines 197-238
|
|
23
|
+
|
|
24
|
+
Rewrite `print_run_presets()` to match `src/commands/rlp-desk.md` lines 142-200.
|
|
25
|
+
|
|
26
|
+
**Desync table:**
|
|
27
|
+
|
|
28
|
+
| current (init_ralph_desk.zsh) | canonical (rlp-desk.md) |
|
|
29
|
+
|---|---|
|
|
30
|
+
| `--final-consensus` (line 207) | `--consensus final-only` |
|
|
31
|
+
| `gpt-5.3-codex-spark:high` (line 210) | `spark:high` |
|
|
32
|
+
| `--verify-consensus` (line 232) | `--consensus off\|all\|final-only` |
|
|
33
|
+
| worker default `sonnet` (line 230) | `haiku` |
|
|
34
|
+
| verifier default `opus` (line 231) | per-US `sonnet`, final `opus` |
|
|
35
|
+
| Missing `--mode tmux` in recommended | Present |
|
|
36
|
+
| Missing 6 options | `--lock-worker-model`, `--consensus-model`, `--final-consensus-model`, `--cb-threshold`, `--iter-timeout`, `--final-verifier-model` |
|
|
37
|
+
|
|
38
|
+
**Action:** Replace lines 197-238 with function that mirrors rlp-desk.md lines 142-200.
|
|
39
|
+
|
|
40
|
+
### Change 2: Add Worker Planning Step
|
|
41
|
+
**Files:**
|
|
42
|
+
- `src/scripts/init_ralph_desk.zsh` Worker prompt — insert between line 316 and line 318
|
|
43
|
+
- `src/governance.md` line 217 — add `plan` to step types
|
|
44
|
+
- `src/scripts/init_ralph_desk.zsh` Verifier prompt — add audit after line 478
|
|
45
|
+
|
|
46
|
+
**Insert after line 316 ("Execute the plan for $SLUG."), before line 318 ("## Before you start"):**
|
|
47
|
+
|
|
48
|
+
```
|
|
49
|
+
## Planning (before writing any code)
|
|
50
|
+
After reading all files, BEFORE writing any test or code:
|
|
51
|
+
1. List the specific files you will create or modify
|
|
52
|
+
2. For each AC in the contract, state your approach in 1 sentence
|
|
53
|
+
3. Identify ordering constraints (which AC depends on which)
|
|
54
|
+
4. Record as first execution_step: {"step": "plan", "ac_id": "all", "command": null, "exit_code": null, "summary": "Plan: [files], [approach], [order]"}
|
|
55
|
+
Keep planning lightweight — 1-2 sentences per AC, not a detailed analysis.
|
|
56
|
+
If the plan reveals the contract is unclear or infeasible, signal "blocked" immediately.
|
|
57
|
+
```
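
The `plan` execution step described in item 4 of the prompt text could be built as follows. This is a sketch under stated assumptions: `buildPlanStep` and its parameters are hypothetical, and only the field names and fixed values (`step`, `ac_id`, `command`, `exit_code`, `summary`) come from the plan.

```javascript
// Builds the first execution_step record for a Worker's done-claim.
// files: array of paths to create or modify; approach and order: short strings.
function buildPlanStep(files, approach, order) {
  return {
    step: "plan",
    ac_id: "all",
    command: null,   // planning runs no command
    exit_code: null, // so there is no exit code either
    summary: `Plan: [${files.join(", ")}], ${approach}, ${order}`,
  };
}
```

Keeping `command` and `exit_code` as explicit `null` values gives the plan step the same shape as the other step types.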
|
|
58
|
+
|
|
59
|
+
**governance.md line 217:** Change from:
|
|
60
|
+
```
|
|
61
|
+
- Step types: `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
|
|
62
|
+
```
|
|
63
|
+
to:
|
|
64
|
+
```
|
|
65
|
+
- Step types: `plan`, `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
**Verifier prompt after line 478 (Worker Process Audit):** Add:
|
|
69
|
+
```
|
|
70
|
+
- Planning step presence: done-claim execution_steps should include a `plan` step as the first entry. If missing, record in reasoning as {"check": "Planning Step", "decision": "info", "basis": "plan step present/absent"} — informational only (does not affect pass/fail verdict)
|
|
71
|
+
```
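
The informational audit above could be expressed as a small check. `auditPlanStep` is a hypothetical name, and recording the result in both the present and absent cases (rather than only when the step is missing) is an assumption made to keep the sketch simple.

```javascript
// Checks whether the first execution step is a "plan" step.
// The decision is always "info": the result never changes the pass/fail verdict.
function auditPlanStep(executionSteps) {
  const first = executionSteps[0];
  const present = first !== undefined && first.step === "plan";
  return {
    check: "Planning Step",
    decision: "info",
    basis: present ? "plan step present" : "plan step absent",
  };
}
```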
|
|
72
|
+
|
|
73
|
+
### Change 3: Brainstorm Exploration Phase
|
|
74
|
+
**File:** `src/commands/rlp-desk.md` — insert between line 25 and line 26
|
|
75
|
+
|
|
76
|
+
**Insert after line 25 ("2. **Objective**") and before line 26 ("3. **User Stories**"):**
|
|
77
|
+
|
|
78
|
+
```
|
|
79
|
+
2.5. **Codebase Exploration** — Before proposing user stories, examine the project:
|
|
80
|
+
- Read the project's entry points, key modules, and test structure
|
|
81
|
+
- Identify architectural patterns in use (frameworks, conventions, test setup)
|
|
82
|
+
- Note constraints the Worker will encounter (dependencies, build system, existing code style)
|
|
83
|
+
- Present findings: "I explored the codebase and found: [patterns], [constraints], [existing tests]. This informs the US breakdown below."
|
|
84
|
+
- If the project is new/empty, skip this step and note "greenfield project."
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
### Change 4: Memory Bridge
|
|
88
|
+
**Files:**
|
|
89
|
+
- `src/commands/rlp-desk.md` line 131
|
|
90
|
+
- `src/scripts/init_ralph_desk.zsh` lines 578-580 (campaign memory template)
|
|
91
|
+
- `src/scripts/init_ralph_desk.zsh` line 355 area (Worker prompt iteration rules)
|
|
92
|
+
|
|
93
|
+
**rlp-desk.md line 131:** Change from:
|
|
94
|
+
```
|
|
95
|
+
If brainstorm was done, auto-fill PRD and test-spec with the results.
|
|
96
|
+
```
|
|
97
|
+
to:
|
|
98
|
+
```
|
|
99
|
+
If brainstorm was done, auto-fill:
|
|
100
|
+
- PRD and test-spec with the brainstorm results
|
|
101
|
+
- Campaign memory "Key Decisions" with architectural decisions from brainstorm
|
|
102
|
+
- Campaign memory "Patterns Discovered" with codebase exploration findings (from step 2.5)
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
**init_ralph_desk.zsh lines 578-580:** Change from:
|
|
106
|
+
```
|
|
107
|
+
## Key Decisions
|
|
108
|
+
|
|
109
|
+
## Patterns Discovered
|
|
110
|
+
```
|
|
111
|
+
to:
|
|
112
|
+
```
|
|
113
|
+
## Key Decisions
|
|
114
|
+
(seeded from brainstorm — do not erase, only append)
|
|
115
|
+
|
|
116
|
+
## Patterns Discovered
|
|
117
|
+
(seeded from brainstorm codebase exploration — do not erase, only append)
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
**init_ralph_desk.zsh Worker prompt, after line 355 ("- Rewrite campaign memory in full."):** Add:
|
|
121
|
+
```
|
|
122
|
+
- When rewriting campaign memory, PRESERVE the Key Decisions and Patterns Discovered sections from prior iterations — append new entries, do not erase existing ones.
|
|
123
|
+
```
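
The append-only rule for the Key Decisions and Patterns Discovered sections can be illustrated with a merge over section entries. This is a hypothetical sketch: the plan enforces the rule through prompt text, not code, and representing each section as an array of entry strings is an assumption.

```javascript
// Merges a rewritten section into the previous one without losing entries.
// Existing entries keep their order; new entries are appended once each.
function mergeAppendOnly(previousEntries, rewrittenEntries) {
  const merged = [...previousEntries];
  for (const entry of rewrittenEntries) {
    if (!merged.includes(entry)) merged.push(entry);
  }
  return merged;
}
```

Even if a Worker rewrites the section as empty, `mergeAppendOnly(previous, [])` returns every previous entry, which is the behavior the preservation instruction asks for.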
|
|
124
|
+
|
|
125
|
+
### Change 5: Coding Principles (Karpathy Guidelines)
|
|
126
|
+
**Files:**
|
|
127
|
+
- `src/scripts/init_ralph_desk.zsh` Worker prompt — insert after line 316, before Change 2's Planning section
|
|
128
|
+
- `src/scripts/init_ralph_desk.zsh` Verifier prompt — insert after line 429
|
|
129
|
+
|
|
130
|
+
**Worker prompt — insert after line 316 ("Execute the plan for $SLUG."), as first section:**
|
|
131
|
+
|
|
132
|
+
```
|
|
133
|
+
## Coding Principles (applies to ALL work in this iteration)
|
|
134
|
+
|
|
135
|
+
1. Think Before Coding
|
|
136
|
+
Don't assume. Don't hide confusion. Surface tradeoffs.
|
|
137
|
+
- State assumptions explicitly. If uncertain, signal blocked with your options
|
|
138
|
+
listed — do not guess.
|
|
139
|
+
- If multiple interpretations exist, present them in blocked signal — do not
|
|
140
|
+
pick silently.
|
|
141
|
+
- If a simpler approach exists, note it in your plan.
|
|
142
|
+
- If something important is unclear, stop and name what is confusing.
|
|
143
|
+
|
|
144
|
+
2. Simplicity First
|
|
145
|
+
Minimum code that solves the problem. Nothing speculative.
|
|
146
|
+
- No features beyond what was asked.
|
|
147
|
+
- No abstractions for single-use code.
|
|
148
|
+
- No configurability that was not specified.
|
|
149
|
+
- No defensive handling for implausible scenarios unless the context requires it.
|
|
150
|
+
- If 200 lines could be 50, rewrite it.
|
|
151
|
+
Ask: "Would a strong senior engineer call this overcomplicated?" If yes, simplify.
|
|
152
|
+
|
|
153
|
+
3. Surgical Changes
|
|
154
|
+
Touch only what you must. Clean up only your own mess.
|
|
155
|
+
- Do not improve adjacent code, comments, or formatting unless required by the task.
|
|
156
|
+
- Do not refactor unrelated code.
|
|
157
|
+
- Match the local style unless there is a compelling reason not to.
|
|
158
|
+
- If unrelated dead code is noticed, mention it in done-claim — do not delete it.
|
|
159
|
+
- Remove imports, variables, or functions that YOUR changes made unused.
|
|
160
|
+
- Do not remove pre-existing dead code.
|
|
161
|
+
Test: every changed line should trace directly to the contract.
|
|
162
|
+
|
|
163
|
+
4. Goal-Driven Execution
|
|
164
|
+
Define success criteria. Loop until verified.
|
|
165
|
+
These principles are enforced by the TDD Mandate and Planning step below.
|
|
166
|
+
If success criteria for any AC are unclear, signal blocked.
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
**Verifier prompt — insert after line 429 ("Independent verifier for Ralph Desk: $SLUG"), before line 431 ("## Iron Law"):**
|
|
170
|
+
|
|
171
|
+
```
|
|
172
|
+
## Verification Principles
|
|
173
|
+
|
|
174
|
+
1. Think Before Judging
|
|
175
|
+
Don't assume. Don't default to PASS or FAIL without evidence.
|
|
176
|
+
- State your assumptions about what PASS looks like for each AC before
|
|
177
|
+
checking evidence.
|
|
178
|
+
- If evidence is ambiguous or incomplete, say what is unclear and why —
|
|
179
|
+
do not default to either verdict.
|
|
180
|
+
- If multiple interpretations of an AC exist, flag it as a spec issue.
|
|
181
|
+
|
|
182
|
+
2. Goal-Driven Verification
|
|
183
|
+
Define the specific evidence required for PASS before you start checking.
|
|
184
|
+
- For each AC, state: "PASS requires [specific evidence]."
|
|
185
|
+
- Verify against that criteria, not against a general impression of code quality.
|
|
186
|
+
- If success criteria are unclear, note it in reasoning — do not invent criteria.
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
---
|
|
190
|
+
|
|
191
|
+
## Implementation Sequence
|
|
192
|
+
|
|
193
|
+
| Wave | Changes | Files | Risk |
|
|
194
|
+
|------|---------|-------|------|
|
|
195
|
+
| 1 | Change 1 (run preset desync) | init_ralph_desk.zsh | LOW |
|
|
196
|
+
| 2 | Change 5 (coding principles) | init_ralph_desk.zsh | LOW |
|
|
197
|
+
| 2 | Change 2 (planning step) | init_ralph_desk.zsh + governance.md | LOW-MED |
|
|
198
|
+
| 3 | Change 3 (brainstorm exploration) | rlp-desk.md | LOW |
|
|
199
|
+
| 3 | Change 4 (memory bridge) | rlp-desk.md + init_ralph_desk.zsh | MEDIUM |
|
|
200
|
+
|
|
201
|
+
**Order rationale:**
|
|
202
|
+
- Wave 1: Standalone bugfix, no dependencies
|
|
203
|
+
- Wave 2: Coding Principles first (top of prompt), then Planning step (uses principles). Both in init_ralph_desk.zsh Worker prompt.
|
|
204
|
+
- Wave 3: rlp-desk.md changes. Change 4 depends on Change 3 (exploration produces findings that get seeded).
|
|
205
|
+
|
|
206
|
+
---
|
|
207
|
+
|
|
208
|
+
## TDD Verification Plan
|
|
209
|
+
|
|
210
|
+
Each change has tests written FIRST, verified to fail, then implementation, then re-verify.
|
|
211
|
+
|
|
212
|
+
### Test Script: `tests/test_template_generation.sh`
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
#!/bin/bash
|
|
216
|
+
# TDD tests for template generation changes
|
|
217
|
+
# Run: bash tests/test_template_generation.sh
|
|
218
|
+
set -euo pipefail
|
|
219
|
+
|
|
220
|
+
SCRIPT="src/scripts/init_ralph_desk.zsh"
|
|
221
|
+
CMD="src/commands/rlp-desk.md"
|
|
222
|
+
GOV="src/governance.md"
|
|
223
|
+
PASS=0; FAIL=0; TOTAL=0
|
|
224
|
+
|
|
225
|
+
assert_contains() {
|
|
226
|
+
local file="$1" pattern="$2" label="$3"
|
|
227
|
+
TOTAL=$((TOTAL+1))
|
|
228
|
+
if grep -q "$pattern" "$file" 2>/dev/null; then
|
|
229
|
+
echo " PASS: $label"; PASS=$((PASS+1))
|
|
230
|
+
else
|
|
231
|
+
echo " FAIL: $label (pattern not found: $pattern)"; FAIL=$((FAIL+1))
|
|
232
|
+
fi
|
|
233
|
+
}
|
|
234
|
+
|
|
235
|
+
assert_not_contains() {
|
|
236
|
+
local file="$1" pattern="$2" label="$3"
|
|
237
|
+
TOTAL=$((TOTAL+1))
|
|
238
|
+
if grep -q "$pattern" "$file" 2>/dev/null; then
|
|
239
|
+
echo " FAIL: $label (stale pattern still present: $pattern)"; FAIL=$((FAIL+1))
|
|
240
|
+
else
|
|
241
|
+
echo " PASS: $label"; PASS=$((PASS+1))
|
|
242
|
+
fi
|
|
243
|
+
}
|
|
244
|
+
|
|
245
|
+
echo "=== Change 1: Run Preset Desync ==="
|
|
246
|
+
assert_not_contains "$SCRIPT" "\-\-final-consensus" "C1: no --final-consensus"
|
|
247
|
+
assert_not_contains "$SCRIPT" "gpt-5.3-codex-spark" "C1: no gpt-5.3-codex-spark"
|
|
248
|
+
assert_not_contains "$SCRIPT" "\-\-verify-consensus" "C1: no --verify-consensus"
|
|
249
|
+
assert_contains "$SCRIPT" "\-\-consensus final-only" "C1: --consensus final-only present"
|
|
250
|
+
assert_contains "$SCRIPT" "spark:high" "C1: spark:high present"
|
|
251
|
+
assert_contains "$SCRIPT" "default: haiku" "C1: worker default haiku"
|
|
252
|
+
assert_contains "$SCRIPT" "\-\-lock-worker-model" "C1: --lock-worker-model in options"
|
|
253
|
+
assert_contains "$SCRIPT" "\-\-cb-threshold" "C1: --cb-threshold in options"
|
|
254
|
+
assert_contains "$SCRIPT" "\-\-iter-timeout" "C1: --iter-timeout in options"
|
|
255
|
+
assert_contains "$SCRIPT" "\-\-consensus-model" "C1: --consensus-model in options"
|
|
256
|
+
assert_contains "$SCRIPT" "\-\-mode tmux" "C1: --mode tmux in recommended"
|
|
257
|
+
|
|
258
|
+
echo ""
|
|
259
|
+
echo "=== Change 2: Worker Planning Step ==="
|
|
260
|
+
assert_contains "$SCRIPT" "## Planning" "C2: Planning section in Worker prompt"
|
|
261
|
+
assert_contains "$SCRIPT" "step.*plan.*ac_id.*all" "C2: plan execution_step format"
|
|
262
|
+
assert_contains "$SCRIPT" "Keep planning lightweight" "C2: lightweight constraint"
|
|
263
|
+
assert_contains "$GOV" "plan.*write_test.*verify_red" "C2: plan in §1f step types"
|
|
264
|
+
assert_contains "$SCRIPT" "Planning Step.*decision.*info" "C2: Verifier plan audit"
|
|
265
|
+
|
|
266
|
+
echo ""
|
|
267
|
+
echo "=== Change 3: Brainstorm Exploration ==="
|
|
268
|
+
assert_contains "$CMD" "Codebase Exploration" "C3: exploration step present"
|
|
269
|
+
assert_contains "$CMD" "greenfield project" "C3: greenfield skip path"
|
|
270
|
+
assert_contains "$CMD" "entry points.*key modules" "C3: exploration instructions"
|
|
271
|
+
|
|
272
|
+
echo ""
|
|
273
|
+
echo "=== Change 4: Memory Bridge ==="
|
|
274
|
+
assert_contains "$CMD" "Campaign memory.*Key Decisions" "C4: init seeds memory instruction"
|
|
275
|
+
assert_contains "$SCRIPT" "seeded from brainstorm" "C4: seed markers in template"
|
|
276
|
+
assert_contains "$SCRIPT" "PRESERVE the Key Decisions" "C4: Worker preservation instruction"
|
|
277
|
+
|
|
278
|
+
echo ""
|
|
279
|
+
echo "=== Change 5: Coding Principles ==="
|
|
280
|
+
assert_contains "$SCRIPT" "## Coding Principles" "C5: Worker coding principles section"
|
|
281
|
+
assert_contains "$SCRIPT" "Think Before Coding" "C5: principle 1 in Worker"
|
|
282
|
+
assert_contains "$SCRIPT" "Simplicity First" "C5: principle 2 in Worker"
|
|
283
|
+
assert_contains "$SCRIPT" "Surgical Changes" "C5: principle 3 in Worker"
|
|
284
|
+
assert_contains "$SCRIPT" "Goal-Driven Execution" "C5: principle 4 in Worker"
|
|
285
|
+
assert_contains "$SCRIPT" "## Verification Principles" "C5: Verifier principles section"
|
|
286
|
+
assert_contains "$SCRIPT" "Think Before Judging" "C5: Verifier principle 1"
|
|
287
|
+
assert_contains "$SCRIPT" "Goal-Driven Verification" "C5: Verifier principle 2"
|
|
288
|
+
|
|
289
|
+
echo ""
|
|
290
|
+
echo "=== RESULTS ==="
|
|
291
|
+
echo "PASS: $PASS / $TOTAL"
|
|
292
|
+
echo "FAIL: $FAIL / $TOTAL"
|
|
293
|
+
[ $FAIL -eq 0 ] && echo "ALL TESTS PASSED" || echo "SOME TESTS FAILED"
|
|
294
|
+
exit $FAIL
|
|
295
|
+
```
|
|
296
|
+
|
|
297
|
+
### TDD Flow Per Wave
|
|
298
|
+
|
|
299
|
+
**Wave 1 (Change 1):**
|
|
300
|
+
1. Write test → run → expect 11 FAIL (stale patterns present, new patterns absent)
|
|
301
|
+
2. Implement Change 1
|
|
302
|
+
3. Run test → expect 11 PASS
|
|
303
|
+
4. `zsh -n src/scripts/init_ralph_desk.zsh` (syntax check)
|
|
304
|
+
|
|
305
|
+
**Wave 2 (Changes 5, 2):**
|
|
306
|
+
1. Run test → expect Change 5 (8 tests) + Change 2 (5 tests) = 13 FAIL
|
|
307
|
+
2. Implement Change 5 (Worker + Verifier principles)
|
|
308
|
+
3. Run test → expect Change 5 PASS, Change 2 still FAIL
|
|
309
|
+
4. Implement Change 2 (Planning step + governance + Verifier audit)
|
|
310
|
+
5. Run test → expect all PASS
|
|
311
|
+
6. `zsh -n src/scripts/init_ralph_desk.zsh` (syntax check)
|
|
312
|
+
|
|
313
|
+
**Wave 3 (Changes 3, 4):**
|
|
314
|
+
1. Run test → expect Change 3 (3 tests) + Change 4 (3 tests) = 6 FAIL
|
|
315
|
+
2. Implement Change 3 (brainstorm exploration)
|
|
316
|
+
3. Run test → expect Change 3 PASS, Change 4 still FAIL
|
|
317
|
+
4. Implement Change 4 (memory bridge — rlp-desk.md + init)
|
|
318
|
+
5. Run test → expect all PASS
|
|
319
|
+
|
|
320
|
+
### Artifact-Based End-to-End Verification
|
|
321
|
+
|
|
322
|
+
After all waves, run init on a test slug and verify generated artifacts:
|
|
323
|
+
|
|
324
|
+
```bash
|
|
325
|
+
# E2E: generate artifacts and verify
|
|
326
|
+
TEST_SLUG="test-karpathy-e2e"
|
|
327
|
+
TEST_DIR=$(mktemp -d)
|
|
328
|
+
cd "$TEST_DIR" && git init && mkdir -p .claude/ralph-desk
|
|
329
|
+
|
|
330
|
+
bash /path/to/src/scripts/init_ralph_desk.zsh "$TEST_SLUG" "test objective"
|
|
331
|
+
|
|
332
|
+
# Check Worker prompt
|
|
333
|
+
grep -q "## Coding Principles" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
|
|
334
|
+
grep -q "## Planning" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
|
|
335
|
+
grep -q "Think Before Coding" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
|
|
336
|
+
grep -q "PRESERVE the Key Decisions" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
|
|
337
|
+
|
|
338
|
+
# Check Verifier prompt
|
|
339
|
+
grep -q "## Verification Principles" .claude/ralph-desk/prompts/$TEST_SLUG.verifier.prompt.md
|
|
340
|
+
grep -q "Think Before Judging" .claude/ralph-desk/prompts/$TEST_SLUG.verifier.prompt.md
|
|
341
|
+
|
|
342
|
+
# Check campaign memory
|
|
343
|
+
grep -q "seeded from brainstorm" .claude/ralph-desk/memos/$TEST_SLUG-memory.md
|
|
344
|
+
|
|
345
|
+
# Check run presets (capture init output)
|
|
346
|
+
# ... verify --consensus, spark:high, haiku defaults appear
|
|
347
|
+
|
|
348
|
+
rm -rf "$TEST_DIR"
|
|
349
|
+
```
|
|
350
|
+
|
|
351
|
+
---
|
|
352
|
+
|
|
353
|
+
## Self-Verification Gate (CLAUDE.md mandatory)
|
|
354
|
+
|
|
355
|
+
3 scenarios required because `governance.md`, `rlp-desk.md`, `init_ralph_desk.zsh` all change.
|
|
356
|
+
|
|
357
|
+
**Scenario 1: LOW risk — greenfield campaign, brainstorm skipped**
|
|
358
|
+
- Init with test slug, no brainstorm
|
|
359
|
+
- Verify: Worker prompt has Coding Principles + Planning section, run presets correct, campaign memory has default template (seed markers present but empty), Verifier has Verification Principles
|
|
360
|
+
- Layers: L1 (grep tests) + L3 (E2E artifact check)
|
|
361
|
+
|
|
362
|
+
**Scenario 2: MEDIUM risk — full brainstorm flow**
|
|
363
|
+
- Brainstorm + init with codex installed
|
|
364
|
+
- Verify: exploration step in brainstorm, init seeds memory, Worker preserves seeds, run presets show cross-engine commands, Verifier audits plan step
|
|
365
|
+
- Layers: L1 + L2 (real integration) + L3
|
|
366
|
+
|
|
367
|
+
**Scenario 3: CRITICAL risk — governance change verification**
|
|
368
|
+
- Verify governance §1f has `plan` in step types
|
|
369
|
+
- Simulate: Worker without plan step → Verifier records `info` (not fail)
|
|
370
|
+
- Simulate: Worker erases Key Decisions → next Worker loses context
|
|
371
|
+
- Layers: L1 + L2 + L3 + governance compliance
|
|
372
|
+
|
|
373
|
+
---
|
|
374
|
+
|
|
375
|
+
## Post-Commit Checklist
|
|
376
|
+
|
|
377
|
+
1. Local file sync (ALL distributable files):
|
|
378
|
+
```bash
|
|
379
|
+
cp src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
|
|
380
|
+
cp src/governance.md ~/.claude/ralph-desk/governance.md
|
|
381
|
+
cp src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
|
|
382
|
+
cp src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
|
|
383
|
+
cp src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
|
|
384
|
+
cp README.md ~/.claude/ralph-desk/README.md
|
|
385
|
+
```
|
|
386
|
+
|
|
387
|
+
2. Verify sync:
|
|
388
|
+
```bash
|
|
389
|
+
diff -q src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
|
|
390
|
+
diff -q src/governance.md ~/.claude/ralph-desk/governance.md
|
|
391
|
+
diff -q src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
|
|
392
|
+
diff -q src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
|
|
393
|
+
diff -q src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
|
|
394
|
+
diff -q README.md ~/.claude/ralph-desk/README.md
|
|
395
|
+
```
|
|
396
|
+
All must produce no output.
|
|
397
|
+
|
|
398
|
+
---
|
|
399
|
+
|
|
400
|
+
## Critical Files
|
|
401
|
+
|
|
402
|
+
| File | Changes |
|
|
403
|
+
|------|---------|
|
|
404
|
+
| `src/scripts/init_ralph_desk.zsh` | C1 (lines 197-238), C2 (lines 316-318, 478), C4 (lines 355, 578-580), C5 (lines 316, 429) |
|
|
405
|
+
| `src/commands/rlp-desk.md` | C3 (lines 25-26), C4 (line 131) |
|
|
406
|
+
| `src/governance.md` | C2 (line 217) |
|
|
407
|
+
| `tests/test_template_generation.sh` | New — TDD test script |
|
package/package.json
CHANGED
|
@@ -1,13 +1,16 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@ai-dev-methodologies/rlp-desk",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.9.0",
|
|
4
4
|
"description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
|
|
5
5
|
"scripts": {
|
|
6
6
|
"postinstall": "node scripts/postinstall.js",
|
|
7
7
|
"uninstall": "node scripts/uninstall.js"
|
|
8
8
|
},
|
|
9
9
|
"files": [
|
|
10
|
-
"src/",
|
|
10
|
+
"src/commands/",
|
|
11
|
+
"src/node/",
|
|
12
|
+
"src/governance.md",
|
|
13
|
+
"src/model-upgrade-table.md",
|
|
11
14
|
"scripts/",
|
|
12
15
|
"docs/",
|
|
13
16
|
"examples/",
|