@ai-dev-methodologies/rlp-desk 0.7.5 → 0.9.0

package/README.md CHANGED
@@ -399,6 +399,64 @@ Per-US catches issues early before later stories build on broken foundations.
399
399
 
400
400
  Worker completes all stories, then a single verification checks all AC at once. Final verify still applies.
401
401
 
402
+ ## Autonomous Mode
403
+
404
+ By default, Worker and Verifier stop and ask for human input when they encounter document conflicts (e.g., PRD says one thing, test-spec says another) or ambiguous instructions. This breaks unattended execution.
405
+
406
+ **`--autonomous`** enables fully unattended campaigns:
407
+
408
+ ```bash
409
+ /rlp-desk run my-feature --mode tmux --worker-model gpt-5.4:medium --autonomous --debug
410
+ ```
411
+
412
+ ### How it works
413
+
414
+ When `--autonomous` is active:
415
+
416
+ 1. **PRD is the single source of truth.** Resolution priority: `PRD > test-spec > context > memory`
417
+ 2. **No stopping for questions.** Worker and Verifier make autonomous decisions based on the priority chain
418
+ 3. **All conflicts are logged.** Every decision is recorded in `conflict-log.jsonl` for post-campaign review
419
+
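The priority chain in item 1 can be read as a simple ranking. The sketch below is illustrative only: `source_rank` and `resolve_conflict` are hypothetical helper names, not functions shipped by rlp-desk, and it assumes sources are identified by the lowercase strings `prd`, `test-spec`, `context`, and `memory`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the resolution priority: PRD > test-spec > context > memory.
# A lower rank wins; unknown sources rank last.
source_rank() {
  case "$1" in
    prd)       echo 0 ;;
    test-spec) echo 1 ;;
    context)   echo 2 ;;
    memory)    echo 3 ;;
    *)         echo 9 ;;
  esac
}

# Print whichever of the two conflicting sources should be followed.
resolve_conflict() {
  local a="$1" b="$2"
  if [ "$(source_rank "$a")" -le "$(source_rank "$b")" ]; then
    echo "$a"
  else
    echo "$b"
  fi
}

resolve_conflict test-spec prd      # prints: prd
resolve_conflict memory context     # prints: context
```

Encoding the chain as ranks keeps the comparison to one integer test, so extending the chain would only mean adding a case.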
420
+ ### Conflict log
421
+
422
+ Each conflict is logged as a JSONL entry in `logs/<slug>/conflict-log.jsonl`:
423
+
424
+ ```json
425
+ {
426
+ "iteration": 1,
427
+ "us_id": "US-001",
428
+ "source_a": "worker-prompt",
429
+ "source_b": "prd",
430
+ "conflict": "US-00 is required by the iteration prompt but is not defined as a PRD user story.",
431
+ "resolution": "Followed PRD as source of truth."
432
+ }
433
+ ```
434
+
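Because the file is JSONL, the pretty-printed entry above would occupy a single line on disk. The following is a minimal sketch of how such a line could be appended from a shell script. `json_escape` and `log_conflict` are hypothetical names introduced for illustration, and the escaping covers only backslashes and double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Control characters such as newlines are not handled in this sketch.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Append one single-line JSON object per conflict (JSONL).
# Args: log_file iteration us_id source_a source_b conflict resolution
log_conflict() {
  local log_file="$1"
  mkdir -p "$(dirname "$log_file")"
  printf '{"iteration":%d,"us_id":"%s","source_a":"%s","source_b":"%s","conflict":"%s","resolution":"%s"}\n' \
    "$2" "$(json_escape "$3")" "$(json_escape "$4")" "$(json_escape "$5")" \
    "$(json_escape "$6")" "$(json_escape "$7")" >> "$log_file"
}
```

A call might look like `log_conflict "logs/my-feature/conflict-log.jsonl" 1 US-001 worker-prompt prd "..." "Followed PRD as source of truth."`, where the path and slug are placeholders.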
435
+ ### When to use
436
+
437
+ - **Long-running campaigns** that run overnight or while you're away
438
+ - **High-iteration tasks** where stopping for every ambiguity wastes hours
439
+ - **Well-defined PRDs** where the PRD is comprehensive and authoritative
440
+
441
+ ### When NOT to use
442
+
443
+ - **Exploratory work** where you want to review each decision
444
+ - **Ambiguous PRDs** where conflicts indicate real design gaps that need human judgment
445
+ - **First run of a new project** — run without `--autonomous` first to catch PRD issues interactively
446
+
447
+ ### Post-campaign review
448
+
449
+ After the campaign, review the conflict log to identify systemic issues:
450
+
451
+ ```bash
452
+ jq . .claude/ralph-desk/logs/<slug>/conflict-log.jsonl
453
+ ```
454
+
455
+ Common patterns:
456
+ - **Repeated PRD vs test-spec conflicts** — test-spec needs updating to match PRD
457
+ - **Scope lock vs fix contract conflicts** — governance rules may need tuning
458
+ - **Missing PRD definitions** — Worker created stories not in the PRD (add them or tighten the brainstorm)
459
+
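To spot the repeated patterns listed above, it can help to count conflicts per source pair. This is a rough sketch, not part of rlp-desk: `summarize_conflicts` is a hypothetical name, and the `sed` expression assumes one compact JSON object per line with `source_a` appearing before `source_b` and no escaped quotes in either value. A `jq`-based version would be more robust.

```bash
#!/usr/bin/env bash
# Count conflicts per (source_a, source_b) pair in a JSONL conflict log.
# Prints "<count> <source_a> vs <source_b>", most frequent pair first.
summarize_conflicts() {
  sed -n 's/.*"source_a":[[:space:]]*"\([^"]*\)".*"source_b":[[:space:]]*"\([^"]*\)".*/\1 vs \2/p' "$1" \
    | sort | uniq -c | sort -rn
}
```

A pair that dominates the output (for example `prd vs test-spec`) points at the document that most likely needs updating.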
402
460
  ## Project Structure
403
461
 
404
462
  After `init`, your project gets this scaffold:
@@ -0,0 +1,137 @@
1
+ # Blueprint: Pivot Step (③½)
2
+
3
+ > Status: TODO — not yet implemented. Document for future development.
4
+
5
+ ## Summary
6
+
7
+ Insert a Pivot Review step (③½) before the Worker (⑤) in the Leader loop. It internalizes the core thinking framework from gstack's `plan-ceo-review` (premise challenge, forced alternatives, scope decisions) without depending on external skills.
8
+
9
+ ## Problem
10
+
11
+ When a Worker repeatedly fails on the same US, the fix loop retries the same approach with progressively stronger models. This works for implementation bugs but fails for **wrong approach** problems. The current CB threshold → BLOCKED pattern wastes iterations before admitting the approach is wrong.
12
+
13
+ ## Proposed Solution
14
+
15
+ ### New CLI Flags
16
+
17
+ ```
18
+ --pivot-mode off|every|on-fail (default: off)
19
+ --pivot-model MODEL (default: opus)
20
+ ```
21
+
22
+ - `off`: no pivot review (current behavior)
23
+ - `every`: pivot review after every Worker iteration
24
+ - `on-fail`: pivot review only after Verifier fail verdict
25
+
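Since the blueprint is not yet implemented, the following is only a sketch of how the two flags could be parsed and validated. `parse_pivot_flags` is a hypothetical function name; the variable names and defaults mirror the flag descriptions above, and all other flags are simply skipped.

```bash
#!/usr/bin/env bash
# Hypothetical parser for the proposed flags; defaults follow the blueprint.
PIVOT_MODE="off"
PIVOT_MODEL="opus"

parse_pivot_flags() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --pivot-mode)
        case "${2:-}" in
          off|every|on-fail) PIVOT_MODE="$2" ;;
          *) echo "invalid --pivot-mode: ${2:-<missing>}" >&2; return 1 ;;
        esac
        shift 2 ;;
      --pivot-model)
        [ -n "${2:-}" ] || { echo "--pivot-model needs a value" >&2; return 1; }
        PIVOT_MODEL="$2"
        shift 2 ;;
      *) shift ;;   # other flags are ignored in this sketch
    esac
  done
}
```

Rejecting unknown mode values at parse time keeps a typo such as `--pivot-mode onfail` from silently behaving like `off`.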
26
+ ### Leader Loop Change
27
+
28
+ ```
29
+ Current: ① → ② → ③ → ④ → ⑤ worker → ⑥ signal → ⑦ verifier → ⑧ result
30
+ Proposed: ① → ② → ③ → ③½ PIVOT → ④ → ⑤ worker → ⑥ signal → ⑦ verifier → ⑧ result
31
+ ```
32
+
33
+ Pivot runs BEFORE Worker — it decides direction, then Worker executes that direction.
34
+
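The gate that decides whether step ③½ runs in a given iteration could look roughly like this. It is a sketch under assumptions: `should_run_pivot` is a hypothetical name, the verdict strings `pass` and `fail` are invented for illustration, and the blueprint does not say whether `every` should also fire on the very first iteration (this sketch lets it).

```bash
#!/usr/bin/env bash
# Decide whether to run the pivot review before launching the Worker.
# Args: pivot_mode last_verdict
#   last_verdict is assumed to be "pass", "fail", or "" when no iteration has run yet.
should_run_pivot() {
  case "$1" in
    every)   return 0 ;;
    on-fail) [ "$2" = "fail" ] ;;
    *)       return 1 ;;   # "off" and anything unrecognized
  esac
}
```

The Leader would call it as `if should_run_pivot "$PIVOT_MODE" "$LAST_VERDICT"; then ...; fi`, where both variable names are placeholders.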
35
+ ### Tmux Pane Layout (3 panes)
36
+
37
+ ```
38
+ +------------------+------------------+------------------+
39
+ | Worker pane | Pivot pane | Verifier pane |
40
+ | claude/codex | claude (opus) | claude/codex |
41
+ | implements code | direction review | verifies result |
42
+ +------------------+------------------+------------------+
43
+ ```
44
+
45
+ Pivot pane is reused each iteration (not persistent). Leader launches pivot → waits for memory update → launches Worker in Worker pane.
46
+
47
+ ### ③½ Pivot Review Step
48
+
49
+ **Agent mode:**
50
+ ```
51
+ Agent(
52
+ description="rlp-desk pivot review iter-NNN",
53
+ model=<pivot_model>,
54
+ mode="bypassPermissions",
55
+ prompt=<pivot_prompt>
56
+ )
57
+ ```
58
+
59
+ **Tmux mode:**
60
+ - Dedicated pivot pane (3rd pane)
61
+ - `DISABLE_OMC=1 claude --model opus --mcp-config '{"mcpServers":{}}' --strict-mcp-config -p "$(cat pivot-prompt.md)"`
62
+ - After pivot completes, verify memory updated → build Worker prompt (④) → launch Worker (⑤)
63
+
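The "verify memory updated" step in Tmux mode could be approximated by fingerprinting the campaign memory file before the pivot starts and polling until the fingerprint changes. This is a sketch only: `memory_fingerprint` and `wait_for_memory_update` are hypothetical names, and the one-second polling interval is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Print a checksum of the memory file (prints nothing if it does not exist yet).
memory_fingerprint() {
  if [ -f "$1" ]; then cksum < "$1"; fi
}

# Poll until the fingerprint differs from the one taken before the pivot, or time out.
# Args: memory_path fingerprint_before timeout_seconds (default 60)
wait_for_memory_update() {
  local path="$1" before="$2" timeout="${3:-60}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ "$(memory_fingerprint "$path")" != "$before" ]; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

Comparing content checksums rather than modification times avoids treating a rewrite with identical content as an update.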
64
+ ### Pivot Review Responsibilities
65
+
66
+ 1. **Analyze iteration result** — what did the Worker actually produce?
67
+ 2. **Premise challenge** — is the current approach correct? What assumptions are we making?
68
+ 3. **Forced alternatives** — propose minimum 2 alternative approaches
69
+ 4. **Scope decision** — EXPAND (add scope), HOLD (keep current), REDUCE (simplify)
70
+ 5. **Update campaign memory** — rewrite Next Iteration Contract if approach changes
71
+ 6. **Record rejected directions** — prevent future iterations from revisiting dead ends
72
+
73
+ ### Pivot Prompt Template (internalized from plan-ceo-review)
74
+
75
+ ```markdown
76
+ # Pivot Review — Iteration {N}
77
+
78
+ ## Context
79
+ - Campaign: {slug}
80
+ - Current US: {us_id}
81
+ - Worker result: {done-claim summary}
82
+ - Consecutive failures on this US: {N}
83
+ - Previous pivot decisions: {from memory}
84
+
85
+ ## Your Task
86
+
87
+ ### 1. Premise Check
88
+ For each premise below, state whether evidence supports or contradicts it:
89
+ {list premises from PRD/memory}
90
+
91
+ ### 2. Forced Alternatives
92
+ Propose at least 2 alternative approaches to the current US.
93
+ For each: summary, effort (S/M/L), risk, key tradeoff.
94
+
95
+ ### 3. Scope Decision
96
+ Choose ONE: EXPAND | HOLD | REDUCE
97
+ Justify with evidence from this iteration.
98
+
99
+ ### 4. Next Iteration Contract
100
+ If HOLD: refine the current contract with specific fixes.
101
+ If EXPAND/REDUCE: rewrite the contract for the new approach.
102
+
103
+ ### 5. Rejected Directions
104
+ List approaches that should NOT be attempted again, with reason.
105
+
106
+ ## Output
107
+ Update campaign memory at: {memory_path}
108
+ - Update "Next Iteration Contract" section
109
+ - Add to "Key Decisions" section
110
+ - Add to "Rejected Directions" section (if any)
111
+ ```
112
+
113
+ ## Expected Benefits
114
+
115
+ - **Breaks fix loops** — "same approach, stronger model" → "different approach"
116
+ - **Research campaigns** — natural direction pivots without manual intervention
117
+ - **Reuses proven framework** — plan-ceo-review's premise challenge + forced alternatives
118
+ - **Both modes** — works in tmux and agent mode
119
+
120
+ ## Implementation Notes
121
+
122
+ - `PIVOT_MODE` variable in `run_ralph_desk.zsh` (pattern: same as `AUTONOMOUS_MODE`)
123
+ - CLI parser: `--pivot-mode`, `--pivot-model` (pattern: same as other model flags)
124
+ - `write_pivot_prompt()` function in `run_ralph_desk.zsh` (pattern: same as `write_worker_trigger`)
125
+ - Pivot review output → campaign memory update (same file, different section)
126
+ - Status.json: add `pivot_decisions` array for tracking
127
+ - Analytics: `campaign.jsonl` add `pivot_action` field per iteration
128
+
129
+ ## Dependencies
130
+
131
+ - Requires `--autonomous` mode (pivot review must not stop for questions)
132
+ - Works with any Worker engine (Claude or Codex)
133
+ - Does not require gstack installation
134
+
135
+ ## Priority
136
+
137
+ Medium — implement after v1.0 Node.js rewrite is stable. Current CB threshold + model upgrade handles most cases. Pivot step is for research/exploration campaigns where approach flexibility matters.
@@ -0,0 +1,407 @@
1
+ # Plan: Worker Planning, Preset Sync, Brainstorm Exploration, Memory Bridge & Coding Principles
2
+
3
+ ## Context
4
+
5
+ Apply five improvements to rlp-desk's Worker/Verifier prompts and the brainstorm/init flow.
6
+ As a follow-up to the existing iron law policy framework, this embeds proven patterns into the Worker/Verifier fresh context.
7
+
8
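+ **Problems:**
9
+ 1. `print_run_presets()` is out of sync with the rlp-desk.md option interface (stale flags, wrong defaults)
10
+ 2. The Worker jumps straight into TDD as soon as it has read the files (no planning step)
11
+ 3. Brainstorm proposes user stories without looking at the code
12
+ 4. Brainstorm results are not kept in campaign memory (the first Worker has to rediscover them)
13
+ 5. Worker/Verifier operate without coding-principle guidelines (they cannot rely on the global CLAUDE.md)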
14
+
15
+ **Branch:** `improve/worker-planning-and-preset-sync`
16
+
17
+ ---
18
+
19
+ ## Changes
20
+
21
+ ### Change 1: Fix Run Preset Desync
22
+ **File:** `src/scripts/init_ralph_desk.zsh` lines 197-238
23
+
24
+ Rewrite `print_run_presets()` to match `src/commands/rlp-desk.md` lines 142-200.
25
+
26
+ **Desync table:**
27
+
28
+ | current (init_ralph_desk.zsh) | canonical (rlp-desk.md) |
29
+ |---|---|
30
+ | `--final-consensus` (line 207) | `--consensus final-only` |
31
+ | `gpt-5.3-codex-spark:high` (line 210) | `spark:high` |
32
+ | `--verify-consensus` (line 232) | `--consensus off\|all\|final-only` |
33
+ | worker default `sonnet` (line 230) | `haiku` |
34
+ | verifier default `opus` (line 231) | per-US `sonnet`, final `opus` |
35
+ | Missing `--mode tmux` in recommended | Present |
36
+ | Missing 6 options | `--lock-worker-model`, `--consensus-model`, `--final-consensus-model`, `--cb-threshold`, `--iter-timeout`, `--final-verifier-model` |
37
+
38
+ **Action:** Replace lines 197-238 with function that mirrors rlp-desk.md lines 142-200.
39
+
40
+ ### Change 2: Add Worker Planning Step
41
+ **Files:**
42
+ - `src/scripts/init_ralph_desk.zsh` Worker prompt — insert between line 316 and line 318
43
+ - `src/governance.md` line 217 — add `plan` to step types
44
+ - `src/scripts/init_ralph_desk.zsh` Verifier prompt — add audit after line 478
45
+
46
+ **Insert after line 316 ("Execute the plan for $SLUG."), before line 318 ("## Before you start"):**
47
+
48
+ ```
49
+ ## Planning (before writing any code)
50
+ After reading all files, BEFORE writing any test or code:
51
+ 1. List the specific files you will create or modify
52
+ 2. For each AC in the contract, state your approach in 1 sentence
53
+ 3. Identify ordering constraints (which AC depends on which)
54
+ 4. Record as first execution_step: {"step": "plan", "ac_id": "all", "command": null, "exit_code": null, "summary": "Plan: [files], [approach], [order]"}
55
+ Keep planning lightweight — 1-2 sentences per AC, not a detailed analysis.
56
+ If the plan reveals the contract is unclear or infeasible, signal "blocked" immediately.
57
+ ```
58
+
59
+ **governance.md line 217:** Change from:
60
+ ```
61
+ - Step types: `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
62
+ ```
63
+ to:
64
+ ```
65
+ - Step types: `plan`, `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
66
+ ```
67
+
68
+ **Verifier prompt after line 478 (Worker Process Audit):** Add:
69
+ ```
70
+ - Planning step presence: done-claim execution_steps should include a `plan` step as the first entry. If missing, record in reasoning as {"check": "Planning Step", "decision": "info", "basis": "plan step present/absent"} — informational only (does not affect pass/fail verdict)
71
+ ```
72
+
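The informational audit above could be prototyped as follows. This is a sketch under assumptions: `audit_plan_step` is a hypothetical name, the done-claim is assumed to be a JSON file whose `execution_steps` entries each carry a `"step"` field, and the first `"step"` key in the file is assumed to belong to the first entry. A real implementation would parse the JSON properly.

```bash
#!/usr/bin/env bash
# Emit the informational audit record for the planning-step check.
# Arg: path to the done-claim JSON file (format assumed, see lead-in).
audit_plan_step() {
  local first
  # Take the value of the first "step" key found in the file.
  first=$(grep -o '"step"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
            | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/') || true
  if [ "$first" = "plan" ]; then
    echo '{"check": "Planning Step", "decision": "info", "basis": "plan step present"}'
  else
    echo '{"check": "Planning Step", "decision": "info", "basis": "plan step absent"}'
  fi
}
```

Both branches emit `"decision": "info"`, matching the rule that the check never affects the pass/fail verdict.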
73
+ ### Change 3: Brainstorm Exploration Phase
74
+ **File:** `src/commands/rlp-desk.md` — insert between line 25 and line 26
75
+
76
+ **Insert after line 25 ("2. **Objective**") and before line 26 ("3. **User Stories**"):**
77
+
78
+ ```
79
+ 2.5. **Codebase Exploration** — Before proposing user stories, examine the project:
80
+ - Read the project's entry points, key modules, and test structure
81
+ - Identify architectural patterns in use (frameworks, conventions, test setup)
82
+ - Note constraints the Worker will encounter (dependencies, build system, existing code style)
83
+ - Present findings: "I explored the codebase and found: [patterns], [constraints], [existing tests]. This informs the US breakdown below."
84
+ - If the project is new/empty, skip this step and note "greenfield project."
85
+ ```
86
+
87
+ ### Change 4: Memory Bridge
88
+ **Files:**
89
+ - `src/commands/rlp-desk.md` line 131
90
+ - `src/scripts/init_ralph_desk.zsh` lines 578-580 (campaign memory template)
91
+ - `src/scripts/init_ralph_desk.zsh` line 355 area (Worker prompt iteration rules)
92
+
93
+ **rlp-desk.md line 131:** Change from:
94
+ ```
95
+ If brainstorm was done, auto-fill PRD and test-spec with the results.
96
+ ```
97
+ to:
98
+ ```
99
+ If brainstorm was done, auto-fill:
100
+ - PRD and test-spec with the brainstorm results
101
+ - Campaign memory "Key Decisions" with architectural decisions from brainstorm
102
+ - Campaign memory "Patterns Discovered" with codebase exploration findings (from step 2.5)
103
+ ```
104
+
105
+ **init_ralph_desk.zsh lines 578-580:** Change from:
106
+ ```
107
+ ## Key Decisions
108
+
109
+ ## Patterns Discovered
110
+ ```
111
+ to:
112
+ ```
113
+ ## Key Decisions
114
+ (seeded from brainstorm — do not erase, only append)
115
+
116
+ ## Patterns Discovered
117
+ (seeded from brainstorm codebase exploration — do not erase, only append)
118
+ ```
119
+
120
+ **init_ralph_desk.zsh Worker prompt, after line 355 ("- Rewrite campaign memory in full."):** Add:
121
+ ```
122
+ - When rewriting campaign memory, PRESERVE the Key Decisions and Patterns Discovered sections from prior iterations — append new entries, do not erase existing ones.
123
+ ```
124
+
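The append-only rule for the two seeded sections can be checked mechanically by comparing the memory file before and after a Worker rewrite. The sketch below is illustrative: `section_body` and `section_preserved` are hypothetical names, and it assumes sections are delimited by `## ` headings exactly as in the template above.

```bash
#!/usr/bin/env bash
# Print the non-blank lines of a "## <title>" section, up to the next "## " heading.
# Args: file title
section_body() {
  awk -v h="## $2" '
    $0 == h      { in_sec = 1; next }
    /^## /       { in_sec = 0 }
    in_sec && NF { print }
  ' "$1"
}

# Succeed only if every line of the old section still appears in the new one.
# Args: old_file new_file title
section_preserved() {
  local old="$1" new="$2" title="$3" line new_body
  new_body=$(section_body "$new" "$title")
  while IFS= read -r line; do
    grep -qxF -- "$line" <<< "$new_body" || return 1
  done < <(section_body "$old" "$title")
  return 0
}
```

For example, `section_preserved memory.before.md memory.after.md "Key Decisions"` (file names are placeholders) fails as soon as a previously recorded decision is missing.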
125
+ ### Change 5: Coding Principles (Karpathy Guidelines)
126
+ **Files:**
127
+ - `src/scripts/init_ralph_desk.zsh` Worker prompt — insert after line 316, before Change 2's Planning section
128
+ - `src/scripts/init_ralph_desk.zsh` Verifier prompt — insert after line 429
129
+
130
+ **Worker prompt — insert after line 316 ("Execute the plan for $SLUG."), as first section:**
131
+
132
+ ```
133
+ ## Coding Principles (applies to ALL work in this iteration)
134
+
135
+ 1. Think Before Coding
136
+ Don't assume. Don't hide confusion. Surface tradeoffs.
137
+ - State assumptions explicitly. If uncertain, signal blocked with your options
138
+ listed — do not guess.
139
+ - If multiple interpretations exist, present them in blocked signal — do not
140
+ pick silently.
141
+ - If a simpler approach exists, note it in your plan.
142
+ - If something important is unclear, stop and name what is confusing.
143
+
144
+ 2. Simplicity First
145
+ Minimum code that solves the problem. Nothing speculative.
146
+ - No features beyond what was asked.
147
+ - No abstractions for single-use code.
148
+ - No configurability that was not specified.
149
+ - No defensive handling for implausible scenarios unless the context requires it.
150
+ - If 200 lines could be 50, rewrite it.
151
+ Ask: "Would a strong senior engineer call this overcomplicated?" If yes, simplify.
152
+
153
+ 3. Surgical Changes
154
+ Touch only what you must. Clean up only your own mess.
155
+ - Do not improve adjacent code, comments, or formatting unless required by the task.
156
+ - Do not refactor unrelated code.
157
+ - Match the local style unless there is a compelling reason not to.
158
+ - If unrelated dead code is noticed, mention it in done-claim — do not delete it.
159
+ - Remove imports, variables, or functions that YOUR changes made unused.
160
+ - Do not remove pre-existing dead code.
161
+ Test: every changed line should trace directly to the contract.
162
+
163
+ 4. Goal-Driven Execution
164
+ Define success criteria. Loop until verified.
165
+ These principles are enforced by the TDD Mandate and Planning step below.
166
+ If success criteria for any AC are unclear, signal blocked.
167
+ ```
168
+
169
+ **Verifier prompt — insert after line 429 ("Independent verifier for Ralph Desk: $SLUG"), before line 431 ("## Iron Law"):**
170
+
171
+ ```
172
+ ## Verification Principles
173
+
174
+ 1. Think Before Judging
175
+ Don't assume. Don't default to PASS or FAIL without evidence.
176
+ - State your assumptions about what PASS looks like for each AC before
177
+ checking evidence.
178
+ - If evidence is ambiguous or incomplete, say what is unclear and why —
179
+ do not default to either verdict.
180
+ - If multiple interpretations of an AC exist, flag it as a spec issue.
181
+
182
+ 2. Goal-Driven Verification
183
+ Define the specific evidence required for PASS before you start checking.
184
+ - For each AC, state: "PASS requires [specific evidence]."
185
+ - Verify against that criteria, not against a general impression of code quality.
186
+ - If success criteria are unclear, note it in reasoning — do not invent criteria.
187
+ ```
188
+
189
+ ---
190
+
191
+ ## Implementation Sequence
192
+
193
+ | Wave | Changes | Files | Risk |
194
+ |------|---------|-------|------|
195
+ | 1 | Change 1 (run preset desync) | init_ralph_desk.zsh | LOW |
196
+ | 2 | Change 5 (coding principles) | init_ralph_desk.zsh | LOW |
197
+ | 2 | Change 2 (planning step) | init_ralph_desk.zsh + governance.md | LOW-MED |
198
+ | 3 | Change 3 (brainstorm exploration) | rlp-desk.md | LOW |
199
+ | 3 | Change 4 (memory bridge) | rlp-desk.md + init_ralph_desk.zsh | MEDIUM |
200
+
201
+ **Order rationale:**
202
+ - Wave 1: Standalone bugfix, no dependencies
203
+ - Wave 2: Coding Principles first (top of prompt), then Planning step (uses principles). Both in init_ralph_desk.zsh Worker prompt.
204
+ - Wave 3: rlp-desk.md changes. Change 4 depends on Change 3 (exploration produces findings that get seeded).
205
+
206
+ ---
207
+
208
+ ## TDD Verification Plan
209
+
210
+ Each change has tests written FIRST, verified to fail, then implementation, then re-verify.
211
+
212
+ ### Test Script: `tests/test_template_generation.sh`
213
+
214
+ ```bash
215
+ #!/bin/bash
216
+ # TDD tests for template generation changes
217
+ # Run: bash tests/test_template_generation.sh
218
+ set -euo pipefail
219
+
220
+ SCRIPT="src/scripts/init_ralph_desk.zsh"
221
+ CMD="src/commands/rlp-desk.md"
222
+ GOV="src/governance.md"
223
+ PASS=0; FAIL=0; TOTAL=0
224
+
225
+ assert_contains() {
226
+ local file="$1" pattern="$2" label="$3"
227
+ TOTAL=$((TOTAL+1))
228
+ if grep -q "$pattern" "$file" 2>/dev/null; then
229
+ echo " PASS: $label"; PASS=$((PASS+1))
230
+ else
231
+ echo " FAIL: $label (pattern not found: $pattern)"; FAIL=$((FAIL+1))
232
+ fi
233
+ }
234
+
235
+ assert_not_contains() {
236
+ local file="$1" pattern="$2" label="$3"
237
+ TOTAL=$((TOTAL+1))
238
+ if grep -q "$pattern" "$file" 2>/dev/null; then
239
+ echo " FAIL: $label (stale pattern still present: $pattern)"; FAIL=$((FAIL+1))
240
+ else
241
+ echo " PASS: $label"; PASS=$((PASS+1))
242
+ fi
243
+ }
244
+
245
+ echo "=== Change 1: Run Preset Desync ==="
246
+ assert_not_contains "$SCRIPT" "\-\-final-consensus" "C1: no --final-consensus"
247
+ assert_not_contains "$SCRIPT" "gpt-5.3-codex-spark" "C1: no gpt-5.3-codex-spark"
248
+ assert_not_contains "$SCRIPT" "\-\-verify-consensus" "C1: no --verify-consensus"
249
+ assert_contains "$SCRIPT" "\-\-consensus final-only" "C1: --consensus final-only present"
250
+ assert_contains "$SCRIPT" "spark:high" "C1: spark:high present"
251
+ assert_contains "$SCRIPT" "default: haiku" "C1: worker default haiku"
252
+ assert_contains "$SCRIPT" "\-\-lock-worker-model" "C1: --lock-worker-model in options"
253
+ assert_contains "$SCRIPT" "\-\-cb-threshold" "C1: --cb-threshold in options"
254
+ assert_contains "$SCRIPT" "\-\-iter-timeout" "C1: --iter-timeout in options"
255
+ assert_contains "$SCRIPT" "\-\-consensus-model" "C1: --consensus-model in options"
256
+ assert_contains "$SCRIPT" "\-\-mode tmux" "C1: --mode tmux in recommended"
257
+
258
+ echo ""
259
+ echo "=== Change 2: Worker Planning Step ==="
260
+ assert_contains "$SCRIPT" "## Planning" "C2: Planning section in Worker prompt"
261
+ assert_contains "$SCRIPT" "step.*plan.*ac_id.*all" "C2: plan execution_step format"
262
+ assert_contains "$SCRIPT" "Keep planning lightweight" "C2: lightweight constraint"
263
+ assert_contains "$GOV" "plan.*write_test.*verify_red" "C2: plan in §1f step types"
264
+ assert_contains "$SCRIPT" "Planning Step.*decision.*info" "C2: Verifier plan audit"
265
+
266
+ echo ""
267
+ echo "=== Change 3: Brainstorm Exploration ==="
268
+ assert_contains "$CMD" "Codebase Exploration" "C3: exploration step present"
269
+ assert_contains "$CMD" "greenfield project" "C3: greenfield skip path"
270
+ assert_contains "$CMD" "entry points.*key modules" "C3: exploration instructions"
271
+
272
+ echo ""
273
+ echo "=== Change 4: Memory Bridge ==="
274
+ assert_contains "$CMD" "Campaign memory.*Key Decisions" "C4: init seeds memory instruction"
275
+ assert_contains "$SCRIPT" "seeded from brainstorm" "C4: seed markers in template"
276
+ assert_contains "$SCRIPT" "PRESERVE the Key Decisions" "C4: Worker preservation instruction"
277
+
278
+ echo ""
279
+ echo "=== Change 5: Coding Principles ==="
280
+ assert_contains "$SCRIPT" "## Coding Principles" "C5: Worker coding principles section"
281
+ assert_contains "$SCRIPT" "Think Before Coding" "C5: principle 1 in Worker"
282
+ assert_contains "$SCRIPT" "Simplicity First" "C5: principle 2 in Worker"
283
+ assert_contains "$SCRIPT" "Surgical Changes" "C5: principle 3 in Worker"
284
+ assert_contains "$SCRIPT" "Goal-Driven Execution" "C5: principle 4 in Worker"
285
+ assert_contains "$SCRIPT" "## Verification Principles" "C5: Verifier principles section"
286
+ assert_contains "$SCRIPT" "Think Before Judging" "C5: Verifier principle 1"
287
+ assert_contains "$SCRIPT" "Goal-Driven Verification" "C5: Verifier principle 2"
288
+
289
+ echo ""
290
+ echo "=== RESULTS ==="
291
+ echo "PASS: $PASS / $TOTAL"
292
+ echo "FAIL: $FAIL / $TOTAL"
293
+ [ $FAIL -eq 0 ] && echo "ALL TESTS PASSED" || echo "SOME TESTS FAILED"
294
+ exit $FAIL
295
+ ```
296
+
297
+ ### TDD Flow Per Wave
298
+
299
+ **Wave 1 (Change 1):**
300
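+ 1. Write test → run → expect up to 11 FAIL (stale patterns present, new patterns absent). The `spark:high` check may already pass, because the stale `gpt-5.3-codex-spark:high` string contains it.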
301
+ 2. Implement Change 1
302
+ 3. Run test → expect 11 PASS
303
+ 4. `zsh -n src/scripts/init_ralph_desk.zsh` (syntax check)
304
+
305
+ **Wave 2 (Changes 5, 2):**
306
+ 1. Run test → expect Change 5 (8 tests) + Change 2 (5 tests) = 13 FAIL
307
+ 2. Implement Change 5 (Worker + Verifier principles)
308
+ 3. Run test → expect Change 5 PASS, Change 2 still FAIL
309
+ 4. Implement Change 2 (Planning step + governance + Verifier audit)
310
+ 5. Run test → expect all PASS
311
+ 6. `zsh -n src/scripts/init_ralph_desk.zsh` (syntax check)
312
+
313
+ **Wave 3 (Changes 3, 4):**
314
+ 1. Run test → expect Change 3 (3 tests) + Change 4 (3 tests) = 6 FAIL
315
+ 2. Implement Change 3 (brainstorm exploration)
316
+ 3. Run test → expect Change 3 PASS, Change 4 still FAIL
317
+ 4. Implement Change 4 (memory bridge — rlp-desk.md + init)
318
+ 5. Run test → expect all PASS
319
+
320
+ ### Artifact-Based End-to-End Verification
321
+
322
+ After all waves, run init on a test slug and verify generated artifacts:
323
+
324
+ ```bash
325
+ # E2E: generate artifacts and verify
326
+ TEST_SLUG="test-karpathy-e2e"
327
+ TEST_DIR=$(mktemp -d)
328
+ cd "$TEST_DIR" && git init && mkdir -p .claude/ralph-desk
329
+
330
+ zsh /path/to/src/scripts/init_ralph_desk.zsh "$TEST_SLUG" "test objective"
331
+
332
+ # Check Worker prompt
333
+ grep -q "## Coding Principles" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
334
+ grep -q "## Planning" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
335
+ grep -q "Think Before Coding" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
336
+ grep -q "PRESERVE the Key Decisions" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
337
+
338
+ # Check Verifier prompt
339
+ grep -q "## Verification Principles" .claude/ralph-desk/prompts/$TEST_SLUG.verifier.prompt.md
340
+ grep -q "Think Before Judging" .claude/ralph-desk/prompts/$TEST_SLUG.verifier.prompt.md
341
+
342
+ # Check campaign memory
343
+ grep -q "seeded from brainstorm" .claude/ralph-desk/memos/$TEST_SLUG-memory.md
344
+
345
+ # Check run presets (capture init output)
346
+ # ... verify --consensus, spark:high, haiku defaults appear
347
+
348
+ rm -rf "$TEST_DIR"
349
+ ```
350
+
351
+ ---
352
+
353
+ ## Self-Verification Gate (CLAUDE.md mandatory)
354
+
355
+ Three scenarios are required because `governance.md`, `rlp-desk.md`, and `init_ralph_desk.zsh` all change.
356
+
357
+ **Scenario 1: LOW risk — greenfield campaign, brainstorm skipped**
358
+ - Init with test slug, no brainstorm
359
+ - Verify: Worker prompt has Coding Principles + Planning section, run presets correct, campaign memory has default template (seed markers present but empty), Verifier has Verification Principles
360
+ - Layers: L1 (grep tests) + L3 (E2E artifact check)
361
+
362
+ **Scenario 2: MEDIUM risk — full brainstorm flow**
363
+ - Brainstorm + init with codex installed
364
+ - Verify: exploration step in brainstorm, init seeds memory, Worker preserves seeds, run presets show cross-engine commands, Verifier audits plan step
365
+ - Layers: L1 + L2 (real integration) + L3
366
+
367
+ **Scenario 3: CRITICAL risk — governance change verification**
368
+ - Verify governance §1f has `plan` in step types
369
+ - Simulate: Worker without plan step → Verifier records `info` (not fail)
370
+ - Simulate: Worker erases Key Decisions → next Worker loses context
371
+ - Layers: L1 + L2 + L3 + governance compliance
372
+
373
+ ---
374
+
375
+ ## Post-Commit Checklist
376
+
377
+ 1. Local file sync (ALL distributable files):
378
+ ```bash
379
+ cp src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
380
+ cp src/governance.md ~/.claude/ralph-desk/governance.md
381
+ cp src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
382
+ cp src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
383
+ cp src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
384
+ cp README.md ~/.claude/ralph-desk/README.md
385
+ ```
386
+
387
+ 2. Verify sync:
388
+ ```bash
389
+ diff -q src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
390
+ diff -q src/governance.md ~/.claude/ralph-desk/governance.md
391
+ diff -q src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
392
+ diff -q src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
393
+ diff -q src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
394
+ diff -q README.md ~/.claude/ralph-desk/README.md
395
+ ```
396
+ All must produce no output.
397
+
398
+ ---
399
+
400
+ ## Critical Files
401
+
402
+ | File | Changes |
403
+ |------|---------|
404
+ | `src/scripts/init_ralph_desk.zsh` | C1 (lines 197-238), C2 (lines 316-318, 478), C4 (lines 355, 578-580), C5 (lines 316, 429) |
405
+ | `src/commands/rlp-desk.md` | C3 (lines 25-26), C4 (line 131) |
406
+ | `src/governance.md` | C2 (line 217) |
407
+ | `tests/test_template_generation.sh` | New — TDD test script |
package/package.json CHANGED
@@ -1,13 +1,16 @@
1
1
  {
2
2
  "name": "@ai-dev-methodologies/rlp-desk",
3
- "version": "0.7.5",
3
+ "version": "0.9.0",
4
4
  "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
5
5
  "scripts": {
6
6
  "postinstall": "node scripts/postinstall.js",
7
7
  "uninstall": "node scripts/uninstall.js"
8
8
  },
9
9
  "files": [
10
- "src/",
10
+ "src/commands/",
11
+ "src/node/",
12
+ "src/governance.md",
13
+ "src/model-upgrade-table.md",
11
14
  "scripts/",
12
15
  "docs/",
13
16
  "examples/",