@ai-dev-methodologies/rlp-desk 0.4.0 → 0.5.0

package/README.md CHANGED
@@ -99,6 +99,22 @@ for iteration in 1..max_iter:
  8. Update status, report to user, continue or stop
  ```

+ ### Live PRD Update
+
+ The Leader computes an `md5` hash of `prd-<slug>.md` at startup and again at each iteration.
+
+ When the hash changes, it:
+
+ - Logs `prd_changed=true` with `prd_hash`, the previous and new US counts, and `new_us`
+ - Splits the PRD into per-US files (`prd-<slug>-US-<id>.md`)
+ - Splits the test-spec into per-US files (`test-spec-<slug>-US-<id>.md`)
+ - Updates the in-memory PRD US list used for per-US dispatch
+ - Adds `NOTE: PRD was updated since last iteration. New/changed US may exist.` to the Worker prompt
+
+ If the hash is unchanged, it logs `prd_changed=false` and does not re-split.
+
+ If the PRD file is missing, the check degrades gracefully: the campaign loop continues instead of failing.
+

  ### Verification Policy (v0.3.0)

  RLP Desk enforces a comprehensive verification policy defined in `governance.md`:
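Here is a minimal Bash sketch of the hash check described under Live PRD Update. It is for illustration only. rlp-desk's runner is written in zsh, and its own helpers are not reproduced here. The names `prd_hash`, `count_us`, and `check_prd` are hypothetical. Counting stories by `## US-<id>` headings is an assumption about the PRD format.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Live PRD Update check (not rlp-desk's code).

# Print the md5 of a file; print nothing if the file is missing.
prd_hash() {
  local file="$1"
  [ -f "$file" ] || return 0
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$file" | awk '{print $1}'   # GNU coreutils
  else
    md5 -q "$file"                      # macOS
  fi
}

# Count user stories, ASSUMING each US starts with a heading like "## US-001".
count_us() {
  [ -f "$1" ] || { echo 0; return 0; }
  grep -c '^#* *US-[0-9]' "$1" || true   # grep -c prints 0 (and exits 1) on no match
}

# check_prd FILE PREV_HASH PREV_COUNT
# Logs the outcome; returns 0 only when the PRD changed (caller would re-split).
check_prd() {
  local file="$1" prev_hash="$2" prev_count="$3" new_hash new_count
  new_hash=$(prd_hash "$file")
  if [ -z "$new_hash" ]; then
    echo "prd_changed=false (PRD missing, continuing)"   # degrade gracefully
    return 1
  fi
  if [ "$new_hash" = "$prev_hash" ]; then
    echo "prd_changed=false"                             # no re-split
    return 1
  fi
  new_count=$(count_us "$file")
  echo "prd_changed=true prd_hash=$new_hash us_count=$prev_count->$new_count"
  return 0
}
```

In the loop, this would be called once per iteration with the previous hash and count. A `prd_changed=true` result is the point where the per-US re-split and the Worker-prompt NOTE would follow.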
@@ -133,15 +149,75 @@ RLP Desk enforces a comprehensive verification policy defined in `governance.md`
  | 3 consecutive failures | Architecture Escalation (§7¾) → report to user |
  | Max iterations reached | TIMEOUT |

- ### Model Routing
+ ### Verification Strategy (v0.5)
+
+ **Core principle: Worker and Verifier use different AI engines whenever possible.**
+
+ - Per-US: lightweight verification after each user story (catches issues early)
+ - Final: top-tier consensus gate before COMPLETE (quality guarantee)
+ - Progressive upgrade: models auto-upgrade after consecutive failures (2-attempt windows)
+ - Verifier minimum: claude sonnet (haiku is never used as a Verifier)
+
+ #### 1. Claude-only (codex not installed)
+
+ The Verifier is never weaker than the Worker and is usually one tier above it. Because both run on the same engine, they share blind spots; install codex for cross-engine detection.
+
+ | Risk | Worker | Per-US Verifier | Worker upgrade path | Verifier upgrade path |
+ |------|--------|-----------------|--------------------|-----------------------|
+ | LOW | haiku | sonnet | sonnet → opus | sonnet → opus |
+ | MEDIUM | sonnet | sonnet | opus | sonnet → opus |
+ | HIGH | sonnet | opus | opus | opus (ceiling) |
+ | CRITICAL | opus | opus ⚠ | (ceiling) | (ceiling) |
+
+ Final: **opus solo** (a ⚠ same-engine warning is displayed)
+
+ #### 2. Cross-engine: GPT Pro (spark + 5.4)
+
+ Spark is speed-optimized for coding and serves as the Worker for LOW through HIGH risk; 5.4 is the Worker for CRITICAL.
+
+ | Risk | Worker (codex) | Per-US Verifier (claude) | Worker upgrade path | Verifier upgrade path |
+ |------|---------------|--------------------------|--------------------|-----------------------|
+ | LOW | spark medium | sonnet | spark high → xhigh | sonnet → opus |
+ | MEDIUM | spark high | sonnet | spark xhigh → 5.4 medium | sonnet → opus |
+ | HIGH | spark xhigh | opus | 5.4 high → 5.4 xhigh | opus (ceiling) |
+ | CRITICAL | 5.4 high | opus | 5.4 xhigh | opus (ceiling) |
+
+ Final: **opus + 5.4 high** (both must PASS)
+
+ #### 3. Cross-engine: Non-Pro (5.4 only)
+
+ | Risk | Worker (codex) | Per-US Verifier (claude) | Worker upgrade path | Verifier upgrade path |
+ |------|---------------|--------------------------|--------------------|-----------------------|
+ | LOW | 5.4 low | sonnet | 5.4 medium → high | sonnet → opus |
+ | MEDIUM | 5.4 medium | sonnet | 5.4 high → xhigh | sonnet → opus |
+ | HIGH | 5.4 high | opus | 5.4 xhigh | opus (ceiling) |
+ | CRITICAL | 5.4 xhigh | opus | (ceiling) | opus (ceiling) |
+
+ Final: **opus + 5.4 high** (both must PASS)
+
+ #### Final Verify
+
+ | Environment | Engine 1 | Engine 2 | Rule |
+ |-------------|----------|----------|------|
+ | Claude-only | opus | — | Solo ⚠ |
+ | Cross-engine | opus | 5.4 high | Both must PASS → COMPLETE |
+
+ #### Progressive Upgrade (Worker Only)
+
+ The Worker model auto-upgrades after consecutive failures on the same US. The Verifier model is fixed at campaign start. The circuit-breaker (CB) threshold defaults to 6.
+
+ ```
+ fail 1-2: keep current model (2-attempt window)
+ fail 3-4: upgrade 1 step (e.g., haiku → sonnet)
+ fail 5-6: upgrade 2 steps (e.g., haiku → opus)
+ fail 7+: ceiling reached → BLOCKED
+ ```

- | Scenario | Model |
- |----------|-------|
- | Simple, single-file changes | `haiku` |
- | Standard work (default) | `sonnet` |
- | Architecture changes, multi-file, prior failure | `opus` |
- | Verification (default) | `opus` |
- | Lightweight verification | `sonnet` |
+ See `src/model-upgrade-table.md` for full upgrade paths per engine and complexity level.
+
+ #### Sequential Final Verify
+
+ Once every US has passed individually, the final ALL verify runs **sequentially per US** instead of as one large check, which prevents Verifier timeouts on large PRDs. After all per-US checks pass, the project's test suite runs once as a cross-US integration check.

  ## Commands

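The failure windows in the Progressive Upgrade block reduce to a small lookup. This Bash sketch assumes the Claude-only ladder haiku → sonnet → opus and the default circuit-breaker threshold of 6. `upgrade_steps` and `upgrade_claude_model` are hypothetical names. The codex ladders in the other tables would need their own mapping.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Worker-only progressive upgrade schedule.

# upgrade_steps FAILS [CB]: steps above the starting model, or BLOCKED.
upgrade_steps() {
  local fails="$1" cb="${2:-6}"                    # CB default: 6
  if   [ "$fails" -gt "$cb" ]; then echo BLOCKED  # fail 7+ with the default CB
  elif [ "$fails" -ge 5 ];     then echo 2        # fail 5-6
  elif [ "$fails" -ge 3 ];     then echo 1        # fail 3-4
  else                              echo 0        # fail 1-2: 2-attempt window
  fi
}

# upgrade_claude_model START STEPS: walk haiku -> sonnet -> opus, capped at opus.
upgrade_claude_model() {
  local start="$1" steps="$2" idx
  case "$start" in
    haiku)  idx=0 ;;
    sonnet) idx=1 ;;
    opus)   idx=2 ;;
    *)      return 1 ;;                   # not a Claude model
  esac
  idx=$(( idx + steps ))
  if [ "$idx" -gt 2 ]; then idx=2; fi     # opus is the ceiling
  case "$idx" in
    0) echo haiku ;;
    1) echo sonnet ;;
    2) echo opus ;;
  esac
}
```

This sketch counts steps from the starting model rather than from the current one. That matches the examples in the block, where five failures take haiku straight to opus. With `--lock-worker-model`, the lookup would simply be skipped.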
@@ -159,18 +235,29 @@ RLP Desk enforces a comprehensive verification policy defined in `governance.md`
  | Flag | Default | Description |
  |------|---------|-------------|
  | `--max-iter N` | 100 | Maximum iterations before timeout |
- | `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
- | `--verifier-model MODEL` | opus | Verifier model (haiku/sonnet/opus) |
  | `--mode agent\|tmux` | agent | Execution mode (see below) |
- | `--worker-engine claude\|codex` | claude | Engine for Worker (claude uses Agent(), codex uses Bash CLI) |
- | `--verifier-engine claude\|codex` | claude | Engine for Verifier |
- | `--codex-model MODEL` | gpt-5.4 | Model passed to the Codex CLI (when engine=codex) |
- | `--codex-reasoning low\|medium\|high` | high | Reasoning effort for Codex |
+ | `--worker-model MODEL` | sonnet | Claude Worker model (haiku/sonnet/opus) |
+ | `--worker-engine claude\|codex` | claude | Worker engine |
+ | `--verifier-model MODEL` | auto | Auto-selected: +1 tier (same engine) or cross-engine |
+ | `--verifier-engine claude\|codex` | auto | Opposite of the Worker engine if codex is available |
+ | `--codex-model MODEL` | gpt-5.4 | Codex model (spark requires GPT Pro) |
+ | `--codex-reasoning LEVEL` | medium | Codex reasoning effort: low/medium/high/xhigh |
  | `--verify-mode per-us\|batch` | per-us | Verification strategy (see below) |
- | `--verify-consensus` | off | Cross-engine consensus verification (see below) |
+ | `--lock-worker-model` | off | Disable progressive Worker model upgrade on failure |
  | `--debug` | off | Debug logging to `logs/<slug>/debug.log` |
  | `--with-self-verification` | off | Campaign-level post-loop analysis report |

+ ### Init Presets
+
+ After `brainstorm`, `init` detects your environment and presents run-command presets:
+
+ - **Codex detected** → recommends cross-engine mode (`--worker-model gpt-5.4:high --verify-consensus`)
+ - **GPT Pro (spark)** → offers a spark preset (`--worker-model gpt-5.3-codex-spark:high`)
+ - **Claude-only** → defaults to `--worker-model sonnet` with an opus Verifier
+ - **Basic** → minimal flags for quick iteration
+
+ The brainstorm phase evaluates complexity (US count, file scope, logic, dependencies, code impact) and recommends a starting model. You can override any recommendation.
+

  ## Execution Modes

  RLP Desk supports two execution modes. Both honor the same governance protocol.
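One reading of the `auto` defaults in the flag table, combined with the risk tables, works like this. The Verifier engine is the opposite of the Worker engine when codex is installed. In all three tables, the per-US claude Verifier depends only on risk: sonnet for LOW and MEDIUM, opus for HIGH and CRITICAL. This Bash sketch encodes that reading. The function names are hypothetical, and so is the optional `yes`/`no` argument that forces codex availability (useful for testing).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of --verifier-engine auto / --verifier-model auto.

# pick_verifier_engine WORKER_ENGINE [HAVE_CODEX yes|no]
pick_verifier_engine() {
  local worker="$1" have_codex="${2:-}"
  if [ -z "$have_codex" ]; then            # detect the codex CLI on PATH
    if command -v codex >/dev/null 2>&1; then have_codex=yes; else have_codex=no; fi
  fi
  if [ "$have_codex" = yes ] && [ "$worker" = claude ]; then
    echo codex                             # opposite of the Worker engine
  else
    echo claude                            # codex Worker, or Claude-only setup
  fi
}

# pick_verifier_model RISK: the per-US claude Verifier column of the tables.
pick_verifier_model() {
  case "$1" in
    LOW|MEDIUM)    echo sonnet ;;          # never below sonnet
    HIGH|CRITICAL) echo opus ;;
    *)             return 1 ;;             # unknown risk level
  esac
}
```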
@@ -277,28 +364,18 @@ Uses the `codex` CLI via `Bash()` (agent mode) or as an interactive TUI (tmux mo

  ## Verification Modes

- RLP Desk supports two verification strategies. **Per-US is the default.**
-
  ### Per-US Verification (default)

- ```
- /rlp-desk run calculator
- /rlp-desk run calculator --verify-mode per-us
- ```
-
- Each user story is verified independently after completion, then a final full verification runs after all stories pass:
+ Each user story is verified independently, then a final full verification runs:

  ```
- Worker: US-001 → Verifier: US-001 AC only → pass
- Worker: US-002 → Verifier: US-002 AC only → pass
- Worker: US-003 → Verifier: US-003 AC only → pass
- Final full verify: ALL AC → pass → COMPLETE
+ Worker: US-001 → Verifier(per-US): US-001 only → pass
+ Worker: US-002 → Verifier(per-US): US-002 only → pass
+ ...
+ Final Verify (cross-engine): opus + 5.4 high both pass → COMPLETE
  ```

- Benefits:
- - Catch issues early, before later stories build on broken foundations
- - Smaller verification scope = faster, more accurate checks
- - Failed verification retries only the specific US
+ Per-US verification catches issues early, before later stories build on broken foundations.

  ### Batch Verification

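The per-US flow and the Sequential Final Verify rule can be sketched as a single loop. In this Bash sketch, `verify_us ENGINE US_ID` and `run_test_suite` are hypothetical hooks that the caller must define; they are not rlp-desk functions. Every listed engine must PASS each US, which mirrors the cross-engine "both must PASS" rule. The test suite then runs once at the end as the cross-US integration check.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: sequential per-US final verify + one integration run.
# Requires caller-defined hooks:
#   verify_us ENGINE US_ID  -> exit 0 on PASS
#   run_test_suite          -> exit 0 when the project's tests pass

# final_verify "ENGINES" US_ID...
final_verify() {
  local engines="$1" us engine
  shift
  for us in "$@"; do                       # one US at a time avoids a single huge check
    for engine in $engines; do             # word-split on purpose: each engine must PASS
      if ! verify_us "$engine" "$us"; then
        echo "FAIL $us ($engine)"
        return 1
      fi
    done
  done
  if ! run_test_suite; then                # cross-US regressions surface here
    echo "FAIL integration"
    return 1
  fi
  echo COMPLETE
}
```

For a Claude-only setup, the engine list would be just `opus`. For cross-engine, it would name both final verifiers.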
@@ -306,30 +383,7 @@ Benefits:
  /rlp-desk run calculator --verify-mode batch
  ```

- Legacy behavior: Worker completes all stories, then a single verification checks all acceptance criteria at once.
-
- ### Cross-Engine Consensus Verification
-
- ```
- /rlp-desk run calculator --verify-consensus
- ```
-
- When enabled, **both claude and codex verify independently**. Both must pass for verification to succeed.
-
- ```
- Worker completes US → Claude verifies → Codex verifies
- Both pass → proceed
- Either fails → combined fix contract → Worker retry
- 3 rounds without consensus → BLOCKED
- ```
-
- Consensus can be combined with per-US mode for maximum rigor:
-
- ```
- /rlp-desk run calculator --verify-mode per-us --verify-consensus
- ```
-
- Prerequisites: Both `claude` and `codex` CLIs must be installed.
+ The Worker completes all stories, then a single verification checks all acceptance criteria (AC) at once. The final verify still applies.

  ## Project Structure

@@ -337,20 +391,42 @@ After `init`, your project gets this scaffold:

  ```
  your-project/
- └── .claude/ralph-desk/
-     ├── prompts/
-     │   ├── <slug>.worker.prompt.md
-     │   └── <slug>.verifier.prompt.md
-     ├── context/
-     │   └── <slug>-latest.md
-     ├── memos/
-     │   └── <slug>-memory.md
-     ├── plans/
-     │   ├── prd-<slug>.md
-     │   └── test-spec-<slug>.md
-     └── logs/<slug>/
-         └── status.json
- ```
+ ├── .claude/
+ │   ├── settings.local.json   # rlp-desk permissions (auto-added by init)
+ │   └── ralph-desk/
+ │       ├── prompts/
+ │       │   ├── <slug>.worker.prompt.md
+ │       │   └── <slug>.verifier.prompt.md
+ │       ├── context/
+ │       │   └── <slug>-latest.md
+ │       ├── memos/
+ │       │   └── <slug>-memory.md
+ │       ├── plans/
+ │       │   ├── prd-<slug>.md
+ │       │   └── test-spec-<slug>.md
+ │       └── logs/<slug>/
+ │           └── status.json
+ ```
+
+ ### Local Settings
+
+ `init` automatically adds the following permissions to `.claude/settings.local.json`:
+
+ ```json
+ {
+   "permissions": {
+     "allow": [
+       "Read(.claude/ralph-desk/**)",
+       "Edit(.claude/ralph-desk/**)",
+       "Write(.claude/ralph-desk/**)"
+     ]
+   }
+ }
+ ```
+
+ **Why:** Claude Code treats `.claude/` files as sensitive and prompts for confirmation on each access, even with `--dangerously-skip-permissions`. Without these permissions, Worker and Verifier agents are blocked by interactive prompts during automated loop execution.
+
+ **Note:** `settings.local.json` is local to your machine and is not committed to git. If the file already exists, permissions are merged without overwriting your existing settings.

  ## Example: Calculator

@@ -0,0 +1,53 @@
+ # Plan: Verify the Refactoring in a Live Run + Restart v05-remaining
+
+ ## Context
+ Engine path refactoring Phases 0-7 are complete (38 structural TDD tests pass).
+ However, the refactoring has **not yet been verified in an actual tmux run**. We need to confirm that it works correctly in a real campaign.
+
+ ## Verification Steps
+
+ ### Step 1: Clean up zombie runners and sentinels
+ ```bash
+ # Kill leftover runner processes; the [k] bracket keeps the pattern
+ # from matching the command line of the shell running this snippet
+ pkill -f 'run_ralph_des[k]' 2>/dev/null || true
+
+ # Kill every pane in the current window except %360, the pane to keep
+ # (pane ids are session-specific; -x spares only %360 itself, not %3601 etc.)
+ for p in $(tmux list-panes -F '#{pane_id}' | grep -vx '%360'); do
+   tmux kill-pane -t "$p" 2>/dev/null
+ done
+
+ # Remove sentinel/signal files and the session config from the previous run
+ rm -f .claude/ralph-desk/memos/v05-remaining-blocked.md
+ rm -f .claude/ralph-desk/memos/v05-remaining-complete.md
+ rm -f .claude/ralph-desk/memos/v05-remaining-done-claim.json
+ rm -f .claude/ralph-desk/memos/v05-remaining-verify-verdict.json
+ rm -f .claude/ralph-desk/memos/v05-remaining-iter-signal.json
+ rm -f .claude/ralph-desk/logs/v05-remaining/session-config.json
+ ```
+
+ ### Step 2: Run the v05-remaining campaign (spark worker)
+ ```bash
+ # codex spark Worker, claude sonnet Verifier, per-US verification, CB threshold 6
+ LOOP_NAME="v05-remaining" ROOT="$PWD" MAX_ITER=15 \
+   WORKER_MODEL=gpt-5.3-codex-spark WORKER_ENGINE=codex \
+   WORKER_CODEX_MODEL=gpt-5.3-codex-spark WORKER_CODEX_REASONING=medium \
+   VERIFIER_MODEL=sonnet VERIFIER_ENGINE=claude \
+   VERIFY_MODE=per-us VERIFY_CONSENSUS=0 CB_THRESHOLD=6 \
+   ITER_TIMEOUT=600 DEBUG=1 WITH_SELF_VERIFICATION=1 \
+   zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
+ ```
+ (run_in_background=true)
+
+ ### Step 3: Verification checklist
+ - [ ] Three panes are created (leader + worker + verifier)
+ - [ ] `codex exec` runs in the Worker pane (bash trigger, no false dead-pane detection)
+ - [ ] After the Worker finishes, the heartbeat reports exited → the signal is auto-generated
+ - [ ] The Verifier (sonnet) starts normally and writes a verdict
+ - [ ] Progress reaches US-002 or later (US-001 was already verified)
+ - [ ] No zombie runners remain (check with `ps`)
+
+ ### Step 4: On failure
+ - Codex Worker fails to start → inspect the trigger script and test running it manually
+ - Verifier timeout → tail the runner log and check pane state
+ - BLOCKED → analyze the cause recorded in the sentinel, fix it, and retry
+
+ ### Step 5: On success
+ - Monitor campaign progress (check status)
+ - Wait for completion or hand off to the next session
+
+ ## Files
+ - `src/scripts/run_ralph_desk.zsh`: the refactored runner
+ - `~/.claude/ralph-desk/run_ralph_desk.zsh`: the locally synced copy
+ - `.claude/ralph-desk/logs/v05-remaining/`: campaign artifacts
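Step 1 kills zombie runners by hand. The remaining-work plan (item A18) notes that the runner now has lockfile logic to prevent duplicate runs. That implementation is not shown here, so the following is only a generic Bash sketch of a PID lockfile, with hypothetical names. `mkdir` creates the lock atomically. A lock whose recorded PID is no longer alive is treated as stale and reclaimed.

```bash
#!/usr/bin/env bash
# Generic PID-lock sketch (hypothetical; not rlp-desk's lockfile code).
# Assumes all runners run as the same user, so kill -0 can probe them.

# acquire_lock DIR: returns 0 if this shell now owns the lock, 1 otherwise.
acquire_lock() {
  local lock="$1" pid
  if mkdir "$lock" 2>/dev/null; then        # mkdir is atomic: only one caller wins
    echo "$$" > "$lock/pid"
    return 0
  fi
  pid=$(cat "$lock/pid" 2>/dev/null || true)
  if [ -z "$pid" ] || kill -0 "$pid" 2>/dev/null; then
    return 1                                # held (or still being set up) by a live runner
  fi
  rm -rf "$lock"                            # stale lock left by a dead runner
  mkdir "$lock" 2>/dev/null || return 1     # another runner won the race
  echo "$$" > "$lock/pid"
}

release_lock() { rm -rf "$1"; }
```

Treating a lock with no PID file as held errs on the side of refusing a duplicate run. The cost is that a runner crashing between `mkdir` and writing its PID would leave a lock to remove by hand.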
@@ -0,0 +1,201 @@
+ # Architect Review: v0.6 Refactoring Plan (RALPLAN Consensus)
+
+ **Verdict: ITERATE** — The plan is directionally sound but has two concrete issues that must be resolved before execution.
+
+ ---
+
+ ## Summary
+
+ The Planner's Option C (extract `lib_ralph_desk.zsh` as a shared business-logic module) is architecturally correct and the rejection of TeamCreate is well-reasoned. However, the plan underestimates two zsh-specific risks in the extraction and contains a gap in the final-verify-split proposal. I recommend proceeding with Option C after addressing the issues below.
+
+ ---
+
+ ## Analysis
+
+ ### 1. Steelman Antithesis: The Strongest Case Against Option C
+
+ **The best argument against Option C is not that TeamCreate is better — it is that the extraction creates a maintenance burden for zero immediate user value.**
+
+ Consider: Agent() mode (rlp-desk.md) is an LLM template, not a shell script. It does not call `get_next_model()`, `check_model_upgrade()`, `write_worker_trigger()`, or any zsh function. The Agent mode Leader is Claude Code itself, interpreting markdown instructions. There is no code to share between the two modes because one mode is shell and the other is natural language.
+
+ The Planner claims "~1,900 lines are business logic" shareable between modes. But examining the actual functions:
+
+ - `write_worker_trigger()` (lines 1162-1297): Constructs shell trigger scripts with heredocs embedding `$CLAUDE_BIN`, `$CODEX_BIN`, heartbeat PIDs — entirely tmux-specific.
+ - `write_verifier_trigger()` (lines 1299-1389): Same pattern — generates shell trigger scripts for tmux panes.
+ - `poll_for_signal()` (lines 1955-2104): Polls tmux panes, monitors heartbeats, nudges idle panes, auto-approves permission prompts via `tmux send-keys` — 100% tmux plumbing.
+ - `run_single_verifier()` (lines 2276-2372): Manages tmux pane lifecycle (kill, split, reset), then launches into pane — tmux-specific.
+ - `run_consensus_verification()` (lines 2393-2539): Calls `run_single_verifier()` — inherits tmux dependency.
+ - `cleanup()` (lines 1807-1948): Kills tmux panes, generates campaign report — tmux lifecycle.
+ - `main()` (lines 2561-3126): The entire main loop — creates tmux sessions, polls panes, manages pane lifecycle.
+
+ The genuinely **mode-independent** functions are a smaller set than claimed:
+
+ | Function | Lines | Truly Shareable? |
+ |----------|-------|-----------------|
+ | `log()` / `log_debug()` / `log_error()` | 152-165 | Yes |
+ | `parse_model_flag()` | 173-192 | Yes |
+ | `get_model_string()` | 219-229 | Yes |
+ | `get_next_model()` | 440-469 | Yes |
+ | `check_model_upgrade()` | 475-527 | Yes |
+ | `atomic_write()` | 531-536 | Yes |
+ | `validate_scaffold()` | 635-669 | Yes |
+ | `update_status()` | 1391-1432 | Yes |
+ | `write_result_log()` | 1435-1471 | Yes |
+ | `archive_iter_artifacts()` | 1474-1484 | Yes |
+ | `write_cost_log()` | 1487-1524 | Yes |
+ | `write_campaign_jsonl()` | 1527-1558 | Yes |
+ | `generate_campaign_report()` | 1561-1706 | Yes |
+ | `generate_sv_report()` | 1708-1779 | Yes |
+ | `compute_prd_hash()` | 2111-2121 | Yes |
+ | `count_prd_us()` | 2123-2133 | Yes |
+ | `split_prd_by_us()` | 2135-2158 | Yes |
+ | `split_test_spec_by_us()` | 2160-2193 | Yes |
+ | `check_prd_update()` | 2195-2232 | Yes |
+ | `compute_context_hash()` | 2234-2250 | Yes |
+ | `check_stale_context()` | 2252-2274 | Yes |
+ | `inject_per_us_prd()` | 1144-1157 | Yes |
+
+ This is roughly 700-800 lines of genuinely shareable logic (the ranges in the table total about 720 lines), not 1,900. The rest is deeply intertwined with tmux pane management. The ~1,900-line business-logic estimate needs recalibration.
+
+ **But here is why the antithesis ultimately fails:** Even if Agent() mode cannot directly `source` these functions (it is an LLM, not a shell), extracting them still has value:
+ 1. **Testing**: The `extract_fn()` test pattern (used across all 35 test files) extracts functions from `run_ralph_desk.zsh` by awk-ing function boundaries. A dedicated `lib_ralph_desk.zsh` would make tests cleaner — `source lib_ralph_desk.zsh` instead of fragile awk extraction.
+ 2. **Readability**: A single 3,184-line file is hard to navigate.
+ 3. **Future extensibility**: If a third orchestration mode appears (e.g., Docker, SSH), the shared lib is ready.
+
+ **Synthesis**: Option C is correct, but the extraction scope should be the ~800 lines of genuinely mode-independent logic, not the inflated ~1,900 line estimate. The tmux-entangled functions stay in `run_ralph_desk.zsh`.
+
+ ### 2. Tradeoff Tension: "Simplify" vs. "Preserve"
+
+ The plan says it preserves both Agent() and tmux modes while "simplifying" via extraction. But there is a fundamental tension:
+
+ **Agent() mode is an LLM interpreting markdown. Tmux mode is a shell script.** They do not share code. They share *concepts* (the governance protocol). The governance.md document IS the shared abstraction — it already serves as the "lib" for Agent mode.
+
+ Extracting shell functions into `lib_ralph_desk.zsh` simplifies tmux mode's file organization, but does nothing to reduce the conceptual duplication between the modes. Every governance rule appears in three places:
+ 1. `governance.md` (the canonical spec)
+ 2. `rlp-desk.md` (Agent mode instructions, lines 296-555)
+ 3. `run_ralph_desk.zsh` (tmux mode implementation)
+
+ The lib extraction does not reduce this triple-statement problem. If the user later changes the circuit breaker threshold logic, they must still update all three files.
+
+ **This is not a blocking issue** — it is a tension to acknowledge in documentation. The plan should explicitly state: "lib extraction reduces file-level complexity but does not reduce specification duplication. governance.md remains the single source of truth; both modes implement it independently."
+
+ ### 3. Architecture Soundness: zsh-specific `source` Pitfalls
+
+ The plan calls the extraction "purely mechanical (move functions, add source statement)." This is dangerously optimistic for zsh. Two concrete risks:
+
+ **Risk A: Global variable scoping across `source` boundaries.**
+
+ `run_ralph_desk.zsh` uses three `typeset -A` associative arrays at file scope (lines 118-120):
+ ```zsh
+ typeset -A LAST_PANE_CONTENT
+ typeset -A PANE_IDLE_SINCE
+ typeset -A WORKER_RESTARTS
+ ```
+
+ These are tmux-specific and would stay in `run_ralph_desk.zsh`. But 30+ other global variables (lines 47-143) — `SLUG`, `WORKER_MODEL`, `ITERATION`, `VERIFIED_US`, `CONSECUTIVE_FAILURES`, etc. — are read and mutated by functions throughout the file. After extraction:
+
+ - `lib_ralph_desk.zsh` functions (e.g., `check_model_upgrade()` at line 475) mutate globals like `_SAME_US_FAIL_COUNT`, `_LAST_FAILED_US`, `_MODEL_UPGRADED`, `WORKER_MODEL`, `WORKER_CODEX_MODEL`, `WORKER_CODEX_REASONING`.
+ - These globals are defined in `run_ralph_desk.zsh` before `source lib_ralph_desk.zsh`.
+ - In zsh, `source` shares the caller's scope — globals survive across source boundaries. **This works.**
+ - But `typeset` inside a function creates a **local** variable in zsh, just as `declare` does inside a bash function; only at top level do both create globals. If any extracted function uses `typeset` internally, it creates a local shadow rather than mutating the global. The current code already behaves this way, so it is not a new problem, but the extractor must verify that no `typeset` statements are accidentally introduced during the move.
+
+ **Risk B: `local` vs. global mutation in extracted functions.**
+
+ `check_model_upgrade()` (lines 475-527) directly mutates globals: `_SAME_US_FAIL_COUNT`, `_LAST_FAILED_US`, `_MODEL_UPGRADED`, `_ORIGINAL_WORKER_MODEL`, `WORKER_MODEL`, `WORKER_CODEX_MODEL`, `WORKER_CODEX_REASONING`. After moving to `lib_ralph_desk.zsh`, these mutations will still work because zsh functions see the calling scope's globals. **But**: if someone later wraps the `source` call inside a function (e.g., `load_lib()`), the scoping changes — `typeset -A` in the sourced file would become local to `load_lib()`. The source statement must remain at the file's top level.
+
+ **Mitigation**: Add a comment near the top of `run_ralph_desk.zsh`: `# IMPORTANT: source lib_ralph_desk.zsh at file scope, NOT inside a function.`
+
+ ### 4. Risk the Planner Missed: Test Breakage Pattern
+
+ All 35 test files use the `extract_fn()` pattern (confirmed at `tests/test_engine_refactor.sh:12-14`, `tests/test_us009_api_retry_guard.sh:11-31`, `tests/test_us004_progressive_upgrade.sh:17-20`):
+
+ ```bash
+ RUN="${RUN:-src/scripts/run_ralph_desk.zsh}"
+ extract_fn() {
+   awk -v fn="$1" '$0 ~ "^"fn"\\(\\)" { p=1 } p { print } p && /^}/ { p=0 }' "$RUN"
+ }
+ ```
+
+ After extraction, functions that move to `lib_ralph_desk.zsh` will no longer be found by `extract_fn()` because `$RUN` still points to `run_ralph_desk.zsh`. The plan says "171 tests continue working with updated paths" — this requires either:
+
+ **Option 1**: Update `$RUN` in each test to `$LIB` for functions in the lib (changes to 35 files).
+ **Option 2**: Have `run_ralph_desk.zsh` physically `source` the lib, so extracting from the combined output works. But `extract_fn()` runs awk on a **file**, not on the runtime-sourced combination.
+ **Option 3**: Add a `LIB="${LIB:-src/scripts/lib_ralph_desk.zsh}"` variable in each test and update `extract_fn()` to search both files.
+
+ The Planner did not specify which approach. This is a concrete implementation detail that affects all test files and must be decided before execution. Option 3 is recommended — it is backward-compatible and minimal.
+
+ ### 5. Final Verify Split: Sequential Per-US
+
+ The proposal to split the final ALL verify into sequential per-US checks is sound in principle — it reuses the proven per-US mechanism and avoids the monolithic timeout problem. However:
+
+ **Gap: Cross-US integration is the entire point of the final verify.**
+
+ The governance spec (`governance.md` lines 184-187) explicitly states:
+ > Checkpoint 2: Release Readiness (us_id=ALL) — Scope: all AC + L2 integration (if applicable) + L3 E2E Simulation + L4 deploy (if applicable)
+
+ The final ALL verify exists to catch **cross-US regressions** — e.g., US-003's changes broke US-001's tests. Sequential per-US re-verification catches per-US regressions but may miss **system-level integration** issues that only manifest when all changes interact.
+
+ **Mitigation**: The sequential per-US checks should be followed by a lightweight integration check: run the full test suite once (not per-US scoped). If the full suite passes, COMPLETE. If it fails, the failure is already scoped to specific tests that can be debugged. This is cheap (one test run) and preserves the cross-US safety net.
+
+ ### 6. Merge Strategy: Squash Merge of 77 Commits
+
+ Squash merge is correct for this case:
+ - 77 commits include campaign iteration artifacts (iter01, iter02, ..., iter14), done-claim corrections, and verification handoffs — these are process noise, not meaningful history.
+ - The feature branch is `feature/v0.4.1-tmux-sv-report` — a single feature.
+ - Squash produces one clean commit on main with a clear message.
+
+ **One caution**: Verify that `git diff main...HEAD` shows only the intended changes before squashing. Campaign-generated test artifacts or temporary files should not be included.
+
+ ---
+
+ ## Root Cause
+
+ The plan's core weakness is not its direction (Option C is correct) but its estimation of extraction scope. The "1,900 lines of business logic" figure conflates tmux-entangled orchestration logic with genuinely mode-independent utility functions. This overestimate could lead to an extraction that either (a) tries to extract tmux-dependent code and breaks it, or (b) discovers mid-implementation that the extraction is smaller than planned and loses momentum.
+
+ ---
+
+ ## Recommendations
+
+ 1. **Recalibrate extraction scope** — LOW effort, HIGH impact. The lib should contain ~800 lines of genuinely mode-independent functions (logging, model management, scaffold validation, reporting, PRD/context utilities), not the full 1,900 claimed. Functions that call `tmux` commands or reference pane IDs stay in `run_ralph_desk.zsh`.
+
+ 2. **Decide test migration strategy** — LOW effort, HIGH impact. Before extraction, decide on Option 3 (dual-file `extract_fn`) and document it. This prevents 35 test files from breaking.
+
+ 3. **Add a source-scope guard comment** — TRIVIAL effort, MEDIUM impact. `# IMPORTANT: source at file scope, NOT inside a function` at the top of both files. Prevents future scoping bugs.
+
+ 4. **Add integration check to final verify split** — LOW effort, HIGH impact. After sequential per-US re-checks, run the full test suite once as a cross-US safety net.
+
+ 5. **Proceed with Phase 0 (npm publish v0.5) first** — as planned. Ship what exists before refactoring.
+
+ ---
+
+ ## Consensus Addendum
+
+ ### Antithesis (steelman)
+ The strongest argument against Option C: Agent() mode is an LLM interpreting markdown — it will never `source lib_ralph_desk.zsh`. The extraction creates a cleaner tmux codebase but does NOT create a "shared module used by both modes." The "hybrid" framing is misleading. What this actually is: a tmux-mode-internal refactoring that splits one 3,184-line file into two files. That is still valuable, but the value proposition should be stated honestly.
+
+ ### Tradeoff tension
+ **File organization simplicity vs. specification duplication**: Extracting a lib simplifies the file structure but does nothing about the triple-statement problem (governance.md + rlp-desk.md + run_ralph_desk.zsh). Every governance change still requires updating three artifacts. The real "shared module" is governance.md itself — both modes implement it from the spec. Until the architecture evolves to make governance.md machine-executable (not just human-readable), this duplication persists regardless of how many .zsh files exist.
+
+ ### Synthesis
+ Accept Option C but reframe it: "tmux-mode internal refactoring" rather than "hybrid shared module." This honest framing prevents scope creep (trying to make Agent mode consume the lib) and focuses the extraction on the right ~800 lines. The long-term path to true mode unification would be making governance.md a structured schema that both modes consume programmatically — but that is v0.7+ territory, not v0.6.
+
+ ### Principle violations
+ - **Estimation accuracy**: The 1,900-line extraction claim does not survive code inspection. The real shareable set is ~800 lines. This is a planning accuracy issue, not a direction issue.
+ - **Test impact omission**: The plan claims "171 tests continue working with updated paths" but does not specify the mechanism. The `extract_fn()` pattern hardcodes `$RUN` pointing to one file; extraction breaks this.
+
+ ---
+
+ ## References
+
+ - `src/scripts/run_ralph_desk.zsh:118-120` — `typeset -A` associative arrays (tmux-specific global state)
+ - `src/scripts/run_ralph_desk.zsh:440-469` — `get_next_model()` (genuinely shareable business logic)
+ - `src/scripts/run_ralph_desk.zsh:475-527` — `check_model_upgrade()` (shareable but mutates 7 globals)
+ - `src/scripts/run_ralph_desk.zsh:1162-1297` — `write_worker_trigger()` (tmux-entangled, NOT shareable)
+ - `src/scripts/run_ralph_desk.zsh:1955-2104` — `poll_for_signal()` (100% tmux plumbing)
+ - `src/scripts/run_ralph_desk.zsh:2276-2372` — `run_single_verifier()` (tmux pane lifecycle)
+ - `src/scripts/run_ralph_desk.zsh:2561-3126` — `main()` (tmux session management + main loop)
+ - `src/governance.md:184-187` — Checkpoint 2 Release Readiness scope (cross-US integration)
+ - `src/governance.md:300-374` — Agent mode (§5a) and Tmux mode (§5b) architecture
+ - `src/commands/rlp-desk.md:296-460` — Agent mode Leader loop (LLM instructions, not shell code)
+ - `tests/test_engine_refactor.sh:6-14` — `extract_fn()` pattern with `$RUN` hardcoded
+ - `tests/test_us009_api_retry_guard.sh:4-31` — Same pattern with more complex harness
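Option 3 from the review can be sketched by looping the existing `extract_fn()` awk pattern over both files. This Bash sketch reuses the `LIB` default proposed in Option 3. Skipping a missing file is an assumption, so that the helper keeps working before the lib exists.

```bash
#!/usr/bin/env bash
# Sketch of Option 3: search both the runner and the extracted lib.
RUN="${RUN:-src/scripts/run_ralph_desk.zsh}"
LIB="${LIB:-src/scripts/lib_ralph_desk.zsh}"

# extract_fn NAME: print the body of NAME() from whichever file defines it.
extract_fn() {
  local f
  for f in "$RUN" "$LIB"; do
    [ -f "$f" ] || continue               # lib may not exist yet (pre-extraction)
    awk -v fn="$1" '$0 ~ "^"fn"\\(\\)" { p=1 } p { print } p && /^}/ { p=0 }' "$f"
  done
}
```

Tests would keep calling `extract_fn <name>` unchanged; only the helper and the extra `LIB` default change. If a function is accidentally defined in both files during the migration, this version prints both bodies, which makes the duplication visible.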
@@ -0,0 +1,117 @@
1
+ # Remaining Work Plan — rlp-desk Post v0.5
2
+
3
+ **Created**: 2026-03-30
4
+ **Updated**: 2026-03-30 (Phase 1-4 구현 완료, 커밋 대기)
5
+ **Branch**: main (v0.5.0, 커밋 대기 변경 +89 lines)
6
+
7
+ ---
8
+
9
+ ## Context
10
+
11
+ v0.5.0 코드는 main에 머지 + push 완료. npm publish와 gh release만 남음. lib_ralph_desk.zsh 추출 완료 (internal refactoring, semver 변경 불필요). 이 계획은 master issue list의 미해결 항목 전체를 다룸.
12
+
13
+ ---
14
+
15
+ ## Phase 0: npm publish v0.5.0 (보류 — 유저 요청 시)
16
+
17
+ 1. `gh release create v0.5.0` (user-facing release notes)
18
+ 2. `npm publish`
19
+ 3. Local file sync 확인
20
+
21
+ ---
22
+
23
+ ## Phase 1: 검증 필요 항목 (구현됨, 실전 테스트 미완)
24
+
25
+ ### A14/A15: init --mode improve (preserve test-spec + clean up sentinel)
26
+ - **Status**: implemented in the v05 campaign; `test_a14a15_init_improve.sh` exists
27
+ - **Needed**: confirm the behavior by manually testing a real `--mode improve` scenario
28
+ - **Files**: `src/scripts/init_ralph_desk.zsh`
29
+
30
+ ### A18: zombie runner lockfile
31
+ - **Status**: lockfile logic implemented (8 references in `run_ralph_desk.zsh`)
32
+ - **Needed**: verify in a real campaign that duplicate runs are prevented
33
+ - **Files**: `src/scripts/run_ralph_desk.zsh`
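To make the A18 check concrete, here is a minimal sketch of a PID lockfile guard, the kind of mechanism a zombie-runner check typically relies on. It illustrates one possible approach, not the actual lockfile logic in `run_ralph_desk.zsh`. `acquire_lock` and `release_lock` are hypothetical names.

```shell
# Hypothetical sketch of a PID lockfile guard (not the actual runner code).
acquire_lock() {
  lockfile=$1
  if [ -f "$lockfile" ]; then
    old_pid=$(cat "$lockfile" 2>/dev/null)
    # kill -0 only tests whether the recorded process still exists.
    if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
      echo "another runner is active (pid $old_pid)" >&2
      return 1
    fi
    # Stale lock left by a runner that died without cleaning up.
    rm -f "$lockfile"
  fi
  # set -C (noclobber) makes the redirect fail if another process
  # created the lockfile between the check above and this write.
  ( set -C; echo "$$" > "$lockfile" ) 2>/dev/null
}

# Remove the lockfile only if this shell owns it.
release_lock() {
  lockfile=$1
  [ "$(cat "$lockfile" 2>/dev/null)" = "$$" ] && rm -f "$lockfile"
}
```

A runner would call `acquire_lock "$LOCKFILE" || exit 1` at startup and register `trap 'release_lock "$LOCKFILE"' EXIT`. The real-campaign check is then to start a second runner while the first is still live and confirm that the second one refuses to start.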
34
+
35
+ ---
36
+
37
+ ## Phase 2: HIGH-priority issues
38
+
39
+ ### A10: "edit its own settings" permission prompt blocks the Worker
40
+ - **Problem**: when Claude Code modifies its own settings, a permission prompt appears → the Worker is blocked
41
+ - **Root cause**: the prompt cannot be bypassed even with `--dangerously-skip-permissions`
42
+ - **Approach**: wait for a fix on the Claude Code side, or strengthen the Worker prompt rule that forbids modifying settings
43
+ - **Files**: `src/commands/rlp-desk.md` (Worker prompt), `src/governance.md`
44
+ - **Size**: SMALL (prompt changes only)
45
+
46
+ ### C4: Detailed /rlp-desk status report
47
+ - **Problem**: the current status output is sparse; it does not show the current US, the attempt count, the failure cause, or which agent failed
48
+ - **Approach**: the fields already exist in `status.json` → parse and display them in the `rlp-desk.md` status subcommand
49
+ - **Files**: `src/commands/rlp-desk.md` (status section)
50
+ - **TDD**: new `tests/test_status_detail.sh`
51
+ - **Size**: MEDIUM
52
+
53
+ ### B3/B4: Runtime per-US document splitting
54
+ - **Problem**: `init` already splits the PRD and test-spec, but the logic that injects only the current US into the Worker prompt during a run is incomplete
55
+ - **Approach**: in `write_worker_trigger()`, when per-US PRD/test-spec files exist, inject only those files
56
+ - **Files**: `src/scripts/run_ralph_desk.zsh` (`write_worker_trigger`), `src/scripts/lib_ralph_desk.zsh` (need to confirm whether `inject_per_us_prd` already exists)
57
+ - **TDD**: extend the existing `test_us002_perus_inject.sh`
58
+ - **Size**: MEDIUM
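The B3/B4 approach reduces to a fallback rule: use the per-US file when it exists, otherwise fall back to the full document. Below is a minimal sketch of that rule. It assumes per-US files follow a `<kind>-<slug>-US-<id>.md` pattern next to `<kind>-<slug>.md`. `per_us_doc` and `write_us_context` are hypothetical helpers; whether `inject_per_us_prd` already implements this still needs to be confirmed.

```shell
# Hypothetical helper: print the per-US document path if the split exists,
# else the full document path; return 1 if neither file exists.
per_us_doc() {
  desk_dir=$1 kind=$2 slug=$3 us_id=$4   # kind is "prd" or "test-spec"
  split="$desk_dir/$kind-$slug-$us_id.md"
  full="$desk_dir/$kind-$slug.md"
  if [ -f "$split" ]; then
    printf '%s\n' "$split"
  elif [ -f "$full" ]; then
    printf '%s\n' "$full"
  else
    return 1
  fi
}

# Append the selected PRD and test-spec content to a Worker prompt file.
write_us_context() {
  prompt=$1 desk_dir=$2 slug=$3 us_id=$4
  for kind in prd test-spec; do
    doc=$(per_us_doc "$desk_dir" "$kind" "$slug" "$us_id") || continue
    printf '\n--- %s ---\n' "$(basename "$doc")" >> "$prompt"
    cat "$doc" >> "$prompt"
  done
}
```

Falling back to the full document keeps campaigns that were initialized before splitting existed working unchanged.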
59
+
60
+ ---
61
+
62
+ ## Phase 3: MEDIUM-priority issues
63
+
64
+ ### A16: Conflict when running in the tmux foreground
65
+ - **Problem**: running `run_ralph_desk.zsh` in the foreground conflicts with the Claude Code pane
66
+ - **Approach**: state in `rlp-desk.md` that `run_in_background` is mandatory, and warn when foreground execution is detected
67
+ - **Files**: `src/commands/rlp-desk.md`, `src/scripts/run_ralph_desk.zsh`
68
+ - **Size**: SMALL
69
+
70
+ ### D1/D2: rlp-desk resume subcommand
71
+ - **Problem**: when a campaign is restarted after an interruption, `verified_us` is not restored
72
+ - **Approach**: restore `verified_us` by reading it from `status.json`, and add a `resume` subcommand
73
+ - **Files**: `src/commands/rlp-desk.md` (resume section), `src/scripts/run_ralph_desk.zsh` (`--resume` flag)
74
+ - **TDD**: new `tests/test_resume.sh`
75
+ - **Size**: MEDIUM
76
+
77
+ ---
78
+
79
+ ## Phase 4: LOW priority / Backlog
80
+
81
+ ### A5: Pane pollution after a rate limit (✅ implemented, uncommitted)
82
+ - When `poll_for_signal` detects "queued messages", it automatically sends C-c and then `/exit` to the pane
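The A5 recovery step can be sketched by splitting detection from the reset. Detection reads captured pane text on stdin, so it can be unit-tested without tmux. This is a minimal sketch, not the code in `poll_for_signal`. The function names and the one-second pause between keystrokes are assumptions. `tmux capture-pane -p` and `tmux send-keys` are standard tmux commands.

```shell
# Hypothetical sketch: succeed if the captured pane text (stdin)
# contains the "queued messages" marker.
pane_has_queued_messages() {
  grep -qF 'queued messages'
}

# Hypothetical reset: interrupt the pane, then ask the CLI to exit.
reset_polluted_pane() {
  pane=$1
  if tmux capture-pane -p -t "$pane" | pane_has_queued_messages; then
    tmux send-keys -t "$pane" C-c
    sleep 1   # assumed settle time before sending /exit
    tmux send-keys -t "$pane" '/exit' Enter
  fi
}
```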
83
+
84
+ ### C3: Agent mode campaign.jsonl (✅ implemented, uncommitted)
85
+ - Added an instruction to APPEND to `campaign.jsonl` in section ⑧ of `rlp-desk.md`
86
+
87
+ ### F8: --consensus-fail-fast (✅ implemented, uncommitted)
88
+ - `CONSENSUS_FAIL_FAST` environment variable, plus logic that skips codex when claude fails
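Below is a minimal sketch of the F8 branching. `run_claude_verifier` and `run_codex_verifier` are hypothetical stand-ins for the real verifier launchers. Treating `CONSENSUS_FAIL_FAST=1` as "enabled" is an assumption about the flag's value.

```shell
# Hypothetical sketch: the verifier functions are stand-ins supplied by the caller.
run_consensus() {
  claude_rc=0
  run_claude_verifier || claude_rc=1
  # With fail-fast on, a claude failure already decides the consensus.
  if [ "$claude_rc" -ne 0 ] && [ "${CONSENSUS_FAIL_FAST:-0}" = "1" ]; then
    echo "consensus: claude failed, skipping codex (fail-fast)" >&2
    return 1
  fi
  codex_rc=0
  run_codex_verifier || codex_rc=1
  # Consensus passes only if both engines passed.
  [ "$claude_rc" -eq 0 ] && [ "$codex_rc" -eq 0 ]
}
```

Skipping codex only shortens the failing path. When claude passes, consensus still requires codex to pass as well.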
89
+
90
+ ### F9: rlp-desk analytics subcommand (✅ stub added, uncommitted)
91
+ - Documented the analytics subcommand in `rlp-desk.md` (the actual behavior comes from the Agent mode LLM interpreting these instructions)
92
+
93
+ ### A17: Refactor the logs/ directory structure (❌ not started)
94
+ - **Size**: LARGE (dozens of path references must change)
95
+ - **Deferred to the next session**
96
+
97
+ ---
98
+
99
+ ## Execution order (recommended)
100
+
101
+ ```
102
+ Phase 0: npm publish (when the user requests it)
103
+ Phase 1: A14/A15 + A18 real-campaign verification (manual testing, no code changes)
104
+ Phase 2: C4 → B3/B4 → A10 (in this order; each item is independent)
105
+ Phase 3: A16 → D1/D2
106
+ Phase 4: Backlog (as needed)
107
+ ```
108
+
109
+ C4, B3/B4, and A10 in Phase 2 are independent of one another, so they can be worked on in parallel.
110
+
111
+ ---
112
+
113
+ ## Verification
114
+
115
+ - After each phase: `for f in tests/test_*.sh; do bash "$f" || exit 1; done`
116
+ - New features: TDD (write the test first, confirm RED, implement, confirm GREEN)
117
+ - CLAUDE.md rules: get user approval before committing and before `npm publish`
@@ -4,7 +4,7 @@
4
4
  /ralplan {{OBJECTIVE}}
5
5
  {{SCOPE}}
6
6
  Run codex cross-validation after consensus. Repeat revise -> consensus -> codex until 0 issues.
7
- If source documents are insufficient, identify gaps before proceeding.
7
+ If the code, requirements, or source documents are insufficient or unclear, identify the gaps before proceeding.
8
8
  ```
9
9
 
10
10
  ---
package/install.sh CHANGED
@@ -40,6 +40,11 @@ echo " Downloading tmux runner script..."
40
40
  curl -sSL "$REPO_URL/src/scripts/run_ralph_desk.zsh" -o "$DESK_DIR/run_ralph_desk.zsh"
41
41
  chmod +x "$DESK_DIR/run_ralph_desk.zsh"
42
42
 
43
+ # Download shared business logic library
44
+ echo " Downloading shared library..."
45
+ curl -sSL "$REPO_URL/src/scripts/lib_ralph_desk.zsh" -o "$DESK_DIR/lib_ralph_desk.zsh"
46
+ chmod +x "$DESK_DIR/lib_ralph_desk.zsh"
47
+
43
48
  # Download governance protocol
44
49
  echo " Downloading governance protocol..."
45
50
  curl -sSL "$REPO_URL/src/governance.md" -o "$DESK_DIR/governance.md"