npm - @ai-dev-methodologies/rlp-desk - Versions diffs - 0.4.0 → 0.5.0 - Mend

@ai-dev-methodologies/rlp-desk 0.4.0 → 0.5.0

Files changed (15)

package/README.md +145 -69
package/docs/plans/cozy-gliding-trinket.md +53 -0
package/docs/plans/toasty-whistling-diffie-agent-a6814625642e956da.md +201 -0
package/docs/plans/toasty-whistling-diffie.md +117 -0
package/docs/prompts/ralplan-codex-review.md +1 -1
package/install.sh +5 -0
package/package.json +1 -1
package/scripts/postinstall.js +5 -0
package/scripts/uninstall.js +1 -0
package/src/commands/rlp-desk.md +193 -51
package/src/governance.md +27 -9
package/src/model-upgrade-table.md +50 -0
package/src/scripts/init_ralph_desk.zsh +200 -19
package/src/scripts/lib_ralph_desk.zsh +837 -0
package/src/scripts/run_ralph_desk.zsh +812 -594

package/README.md CHANGED

@@ -99,6 +99,22 @@ for iteration in 1..max_iter:
   8. Update status, report to user, continue or stop
 ```
+### Live PRD Update
+The Leader computes a hash for `prd-<slug>.md` at startup and again at each iteration using `md5`.
+When the hash changes, it:
+- Logs `prd_changed=true` with `prd_hash`, previous/new US counts, and `new_us`
+- Splits the PRD into per-US files (`prd-<slug>-US-<id>.md`)
+- Splits the test-spec into per-US files (`test-spec-<slug>-US-<id>.md`)
+- Updates the in-memory PRD US list used for per-US dispatch
+- Adds `NOTE: PRD was updated since last iteration. New/changed US may exist.` to the Worker prompt
+If the PRD hash is unchanged, `prd_changed=false` is logged and no re-split is triggered.
+If the PRD file is missing, the process degrades gracefully and continues without failing the campaign loop.
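The hash comparison described in this hunk can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `prd_hash` and `check_prd_changed` are hypothetical names, the fallback from `md5` to `md5sum` or `cksum` is an assumption for portability, and treating a missing PRD as "unchanged" is only one possible reading of "degrades gracefully".

```shell
# Hypothetical helper: print a content hash for a file, or nothing if it is missing.
prd_hash() {
  local file="$1"
  [ -f "$file" ] || return 0                  # missing PRD: degrade gracefully (assumption)
  if command -v md5 >/dev/null 2>&1; then
    md5 -q "$file"                            # macOS/BSD md5
  elif command -v md5sum >/dev/null 2>&1; then
    md5sum "$file" | awk '{print $1}'         # GNU coreutils
  else
    cksum "$file" | awk '{print $1}'          # POSIX fallback (assumption)
  fi
}

# Hypothetical helper: compare the current hash with the previous one
# and print a log line in the prd_changed=true|false style.
check_prd_changed() {
  local file="$1" previous="$2" current
  current="$(prd_hash "$file")"
  if [ -n "$current" ] && [ "$current" != "$previous" ]; then
    echo "prd_changed=true prd_hash=$current"
  else
    echo "prd_changed=false"
  fi
}
```

A caller would store the hash printed at startup and pass it back in as the second argument on each iteration, triggering the re-split only when the line starts with `prd_changed=true`.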
 ### Verification Policy (v0.3.0)
 RLP Desk enforces a comprehensive verification policy defined in `governance.md`:
@@ -133,15 +149,75 @@ RLP Desk enforces a comprehensive verification policy defined in `governance.md`
 | 3 consecutive failures | Architecture Escalation (§7¾) → report to user |
 | Max iterations reached | TIMEOUT |
-### Model Routing
+### Verification Strategy (v0.5)
+**Core principle: Worker and Verifier use different AI engines whenever possible.**
+- Per-US: lightweight verification after each user story (catches issues early)
+- Final: top-tier consensus gate before COMPLETE (quality guarantee)
+- Progressive upgrade: auto-upgrade models on consecutive failure (2-attempt windows)
+- Verifier minimum: claude sonnet (haiku cannot verify)
+#### 1. Claude-only (codex not installed)
+Verifier is always +1 tier above Worker. Same-engine shares blind spots — install codex for improved detection.
+| Risk | Worker | Per-US Verifier | Worker upgrade path | Verifier upgrade path |
+|------|--------|-----------------|--------------------|-----------------------|
+| LOW | haiku | sonnet | sonnet → opus | sonnet → opus |
+| MEDIUM | sonnet | sonnet | opus | sonnet → opus |
+| HIGH | sonnet | opus | opus | opus (ceiling) |
+| CRITICAL | opus | opus ⚠ | (ceiling) | (ceiling) |
+Final: **opus solo** ⚠ same-engine warning displayed
+#### 2. Cross-engine: GPT Pro (spark + 5.4)
+Spark is speed-optimized for coding. Use as Worker for LOW-HIGH; 5.4 for CRITICAL.
+| Risk | Worker (codex) | Per-US Verifier (claude) | Worker upgrade path | Verifier upgrade path |
+|------|---------------|--------------------------|--------------------|-----------------------|
+| LOW | spark medium | sonnet | spark high → xhigh | sonnet → opus |
+| MEDIUM | spark high | sonnet | spark xhigh → 5.4 medium | sonnet → opus |
+| HIGH | spark xhigh | opus | 5.4 high → 5.4 xhigh | opus (ceiling) |
+| CRITICAL | 5.4 high | opus | 5.4 xhigh | opus (ceiling) |
+Final: **opus + 5.4 high** (both must PASS)
+#### 3. Cross-engine: Non-Pro (5.4 only)
+| Risk | Worker (codex) | Per-US Verifier (claude) | Worker upgrade path | Verifier upgrade path |
+|------|---------------|--------------------------|--------------------|-----------------------|
+| LOW | 5.4 low | sonnet | 5.4 medium → high | sonnet → opus |
+| MEDIUM | 5.4 medium | sonnet | 5.4 high → xhigh | sonnet → opus |
+| HIGH | 5.4 high | opus | 5.4 xhigh | opus (ceiling) |
+| CRITICAL | 5.4 xhigh | opus | (ceiling) | opus (ceiling) |
+Final: **opus + 5.4 high** (both must PASS)
+#### Final Verify
+| Environment | Engine 1 | Engine 2 | Rule |
+|-------------|----------|----------|------|
+| Claude-only | opus | — | Solo ⚠ |
+| Cross-engine | opus | 5.4 high | Both must PASS → COMPLETE |
+#### Progressive Upgrade (Worker Only)
+Worker auto-upgrades on consecutive same-US failure. Verifier is fixed at campaign start. CB default: 6.
+```
+fail 1-2: keep current model (2-attempt window)
+fail 3-4: upgrade 1 step (e.g., haiku → sonnet)
+fail 5-6: upgrade 2 steps (e.g., haiku → opus)
+fail 7+:  ceiling reached → BLOCKED
+```
-| Scenario | Model |
-|----------|-------|
-| Simple, single-file changes | `haiku` |
-| Standard work (default) | `sonnet` |
-| Architecture changes, multi-file, prior failure | `opus` |
-| Verification (default) | `opus` |
-| Lightweight verification | `sonnet` |
+See `src/model-upgrade-table.md` for full upgrade paths per engine and complexity level.
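The failure ladder in this section can be read as a small lookup. The sketch below is illustrative only: `upgrade_steps` and `model_for_failures` are hypothetical names, the three-model ladder is the Claude-only example from the README text, and clamping at the top of the ladder (rather than blocking early) is an assumption.

```shell
# Hypothetical helper: map a consecutive same-US failure count to ladder steps.
upgrade_steps() {
  local fails="$1"
  if   [ "$fails" -le 2 ]; then echo 0        # fail 1-2: keep current model
  elif [ "$fails" -le 4 ]; then echo 1        # fail 3-4: upgrade 1 step
  elif [ "$fails" -le 6 ]; then echo 2        # fail 5-6: upgrade 2 steps
  else echo BLOCKED                           # fail 7+: ceiling reached
  fi
}

# Hypothetical helper: pick the model for a failure count, given the start model.
# Unknown start models are treated as the bottom of the ladder (assumption).
model_for_failures() {
  local start="$1" fails="$2" steps idx=0 i=0 m
  steps="$(upgrade_steps "$fails")"
  if [ "$steps" = BLOCKED ]; then
    echo BLOCKED
    return 0
  fi
  set -- haiku sonnet opus                    # Claude-only example ladder
  for m in "$@"; do
    if [ "$m" = "$start" ]; then idx=$i; fi
    i=$((i + 1))
  done
  idx=$((idx + steps))
  if [ "$idx" -gt 2 ]; then idx=2; fi         # clamp at the ceiling (assumption)
  shift "$idx"
  echo "$1"
}
```

For example, `model_for_failures haiku 3` selects one step above `haiku`, while any count of 7 or more reports `BLOCKED` regardless of the start model.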
+#### Sequential Final Verify
+When all US pass individually, the final ALL verify runs **sequentially per-US** instead of one big check. This prevents verifier timeout on large PRDs. After all per-US checks pass, the project's test suite runs once as a cross-US integration check.
 ## Commands
@@ -159,18 +235,29 @@ RLP Desk enforces a comprehensive verification policy defined in `governance.md`
 | Flag | Default | Description |
 |------|---------|-------------|
 | `--max-iter N` | 100 | Maximum iterations before timeout |
-| `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
-| `--verifier-model MODEL` | opus | Verifier model (haiku/sonnet/opus) |
 | `--mode agent\|tmux` | agent | Execution mode (see below) |
-| `--worker-engine claude\|codex` | claude | Engine for Worker (claude uses Agent(), codex uses Bash CLI) |
-| `--verifier-engine claude\|codex` | claude | Engine for Verifier |
-| `--codex-model MODEL` | gpt-5.4 | Model passed to the Codex CLI (when engine=codex) |
-| `--codex-reasoning low\|medium\|high` | high | Reasoning effort for Codex |
+| `--worker-model MODEL` | sonnet | Claude worker model (haiku/sonnet/opus) |
+| `--worker-engine claude\|codex` | claude | Worker engine |
+| `--verifier-model MODEL` | auto | Auto-selected: +1 tier (same-engine) or cross-engine |
+| `--verifier-engine claude\|codex` | auto | Opposite of worker engine if codex available |
+| `--codex-model MODEL` | gpt-5.4 | Codex model (spark requires GPT Pro) |
+| `--codex-reasoning LEVEL` | medium | low/medium/high/xhigh |
 | `--verify-mode per-us\|batch` | per-us | Verification strategy (see below) |
-| `--verify-consensus` | off | Cross-engine consensus verification (see below) |
+| `--lock-worker-model` | off | Disable progressive model upgrade on failure |
 | `--debug` | off | Debug logging to `logs/<slug>/debug.log` |
 | `--with-self-verification` | off | Campaign-level post-loop analysis report |
+### Init Presets
+After `brainstorm`, `init` detects your environment and presents run command presets:
+- **Codex detected** → recommends cross-engine mode (`--worker-model gpt-5.4:high --verify-consensus`)
+- **GPT Pro (spark)** → offers spark preset (`--worker-model gpt-5.3-codex-spark:high`)
+- **Claude-only** → defaults to `--worker-model sonnet` with opus verifier
+- **Basic** → minimal flags for quick iteration
+The brainstorm phase evaluates complexity (US count, file scope, logic, dependencies, code impact) and recommends a starting model. You can override any recommendation.
 ## Execution Modes
 RLP Desk supports two execution modes. Both honor the same governance protocol.
@@ -277,28 +364,18 @@ Uses the `codex` CLI via `Bash()` (agent mode) or as an interactive TUI (tmux mo
 ## Verification Modes
-RLP Desk supports two verification strategies. **Per-US is the default.**
 ### Per-US Verification (default)
-```
-/rlp-desk run calculator
-/rlp-desk run calculator --verify-mode per-us
-```
-Each user story is verified independently after completion, then a final full verification runs after all stories pass:
+Each user story is verified independently, then a final full verification runs:
 ```
-Worker: US-001 → Verifier: US-001 AC only → pass
-Worker: US-002 → Verifier: US-002 AC only → pass
-Worker: US-003 → Verifier: US-003 AC only → pass
-Final full verify: ALL AC → pass → COMPLETE
+Worker: US-001 → Verifier(per-US): US-001 only → pass
+Worker: US-002 → Verifier(per-US): US-002 only → pass
+...
+Final Verify: opus + 5.4 high → both pass → COMPLETE
 ```
-Benefits:
-- Catch issues early, before later stories build on broken foundations
-- Smaller verification scope = faster, more accurate checks
-- Failed verification retries only the specific US
+Per-US catches issues early before later stories build on broken foundations.
 ### Batch Verification
@@ -306,30 +383,7 @@ Benefits:
 /rlp-desk run calculator --verify-mode batch
 ```
-Legacy behavior: Worker completes all stories, then a single verification checks all acceptance criteria at once.
-### Cross-Engine Consensus Verification
-```
-/rlp-desk run calculator --verify-consensus
-```
-When enabled, **both claude and codex verify independently**. Both must pass for verification to succeed.
-```
-Worker completes US → Claude verifies → Codex verifies
-  Both pass → proceed
-  Either fails → combined fix contract → Worker retry
-  3 rounds without consensus → BLOCKED
-```
-Consensus can be combined with per-US mode for maximum rigor:
-```
-/rlp-desk run calculator --verify-mode per-us --verify-consensus
-```
-Prerequisites: Both `claude` and `codex` CLIs must be installed.
+Worker completes all stories, then a single verification checks all AC at once. Final verify still applies.
 ## Project Structure
@@ -337,20 +391,42 @@ After `init`, your project gets this scaffold:
 ```
 your-project/
-└── .claude/ralph-desk/
-    ├── prompts/
-    │   ├── <slug>.worker.prompt.md
-    │   └── <slug>.verifier.prompt.md
-    ├── context/
-    │   └── <slug>-latest.md
-    ├── memos/
-    │   └── <slug>-memory.md
-    ├── plans/
-    │   ├── prd-<slug>.md
-    │   └── test-spec-<slug>.md
-    └── logs/<slug>/
-        └── status.json
-```
+├── .claude/
+│   ├── settings.local.json          # rlp-desk permissions (auto-added by init)
+│   └── ralph-desk/
+│       ├── prompts/
+│       │   ├── <slug>.worker.prompt.md
+│       │   └── <slug>.verifier.prompt.md
+│       ├── context/
+│       │   └── <slug>-latest.md
+│       ├── memos/
+│       │   └── <slug>-memory.md
+│       ├── plans/
+│       │   ├── prd-<slug>.md
+│       │   └── test-spec-<slug>.md
+│       └── logs/<slug>/
+│           └── status.json
+```
+### Local Settings
+`init` automatically adds the following permissions to `.claude/settings.local.json`:
+```json
+{
+  "permissions": {
+    "allow": [
+      "Read(.claude/ralph-desk/**)",
+      "Edit(.claude/ralph-desk/**)",
+      "Write(.claude/ralph-desk/**)"
+    ]
+  }
+}
+```
+**Why:** Claude Code treats `.claude/` files as sensitive and prompts for confirmation on each access, even with `--dangerously-skip-permissions`. Without these permissions, Worker and Verifier agents are blocked by interactive prompts during automated loop execution.
+**Note:** `settings.local.json` is local to your machine and is not committed to git. If the file already exists, permissions are merged without overwriting your existing settings.
 ## Example: Calculator

package/docs/plans/cozy-gliding-trinket.md ADDED

@@ -0,0 +1,53 @@
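+# Plan: Verify the refactoring in a live run + restart v05-remaining
+## Context
+Engine path refactoring Phases 0-7 are complete (38 TDD structural tests pass).
+However, the refactoring has **not been verified in an actual tmux run**. We need to confirm that it works correctly in a real campaign.
+## Verification Steps
+### Step 1: Clean up zombie runners and sentinels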
+```bash
+ps aux | grep run_ralph_desk | grep -v grep | awk '{print $2}' | xargs kill 2>/dev/null
+for p in $(tmux list-panes -F '#{pane_id}' | grep -v '%360'); do tmux kill-pane -t "$p" 2>/dev/null; done
+rm -f .claude/ralph-desk/memos/v05-remaining-blocked.md
+rm -f .claude/ralph-desk/memos/v05-remaining-complete.md
+rm -f .claude/ralph-desk/memos/v05-remaining-done-claim.json
+rm -f .claude/ralph-desk/memos/v05-remaining-verify-verdict.json
+rm -f .claude/ralph-desk/memos/v05-remaining-iter-signal.json
+rm -f .claude/ralph-desk/logs/v05-remaining/session-config.json
+```
+### Step 2: Run the v05-remaining campaign (spark worker)
+```bash
+LOOP_NAME="v05-remaining" ROOT="$PWD" MAX_ITER=15 \
+WORKER_MODEL=gpt-5.3-codex-spark WORKER_ENGINE=codex \
+WORKER_CODEX_MODEL=gpt-5.3-codex-spark WORKER_CODEX_REASONING=medium \
+VERIFIER_MODEL=sonnet VERIFIER_ENGINE=claude \
+VERIFY_MODE=per-us VERIFY_CONSENSUS=0 CB_THRESHOLD=6 \
+ITER_TIMEOUT=600 DEBUG=1 WITH_SELF_VERIFICATION=1 \
+  zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
+```
+(run_in_background=true)
+### Step 3: Verification checklist
+- [ ] Three panes are created (leader + worker + verifier)
+- [ ] codex exec runs in the Worker pane (bash trigger, no false dead-pane detection)
+- [ ] After the Worker finishes, heartbeat reports exited → signal is auto-generated
+- [ ] Verifier (sonnet) starts normally and writes a verdict
+- [ ] Progress reaches US-002 or later (US-001 was already verified earlier)
+- [ ] No zombie runners (checked with ps)
+### Step 4: If something fails
+- codex worker fails to start → inspect the trigger script contents and test it by running it manually
+- verifier timeout → tail the runner log and check the pane state
+- BLOCKED → analyze the sentinel for the cause, fix it, then retry
+### Step 5: On success
+- Monitor campaign progress (check status)
+- Wait for completion or hand off to the next session
+## Files
+- `src/scripts/run_ralph_desk.zsh`: the refactored runner
+- `~/.claude/ralph-desk/run_ralph_desk.zsh`: the locally synced copy
+- `.claude/ralph-desk/logs/v05-remaining/`: campaign artifacts

package/docs/plans/toasty-whistling-diffie-agent-a6814625642e956da.md ADDED

@@ -0,0 +1,201 @@
+# Architect Review: v0.6 Refactoring Plan (RALPLAN Consensus)
+**Verdict: ITERATE** — The plan is directionally sound but has two concrete issues that must be resolved before execution.
+---
+## Summary
+The Planner's Option C (extract `lib_ralph_desk.zsh` as a shared business-logic module) is architecturally correct and the rejection of TeamCreate is well-reasoned. However, the plan underestimates two zsh-specific risks in the extraction and contains a gap in the final-verify-split proposal. I recommend proceeding with Option C after addressing the issues below.
+---
+## Analysis
+### 1. Steelman Antithesis: The Strongest Case Against Option C
+**The best argument against Option C is not that TeamCreate is better — it is that the extraction creates a maintenance burden for zero immediate user value.**
+Consider: Agent() mode (rlp-desk.md) is an LLM template, not a shell script. It does not call `get_next_model()`, `check_model_upgrade()`, `write_worker_trigger()`, or any zsh function. The Agent mode Leader is Claude Code itself, interpreting markdown instructions. There is no code to share between the two modes because one mode is shell and the other is natural language.
+The Planner claims "~1,900 lines are business logic" shareable between modes. But examining the actual functions:
+- `write_worker_trigger()` (lines 1162-1297): Constructs shell trigger scripts with heredocs embedding `$CLAUDE_BIN`, `$CODEX_BIN`, heartbeat PIDs — entirely tmux-specific.
+- `write_verifier_trigger()` (lines 1299-1389): Same pattern — generates shell trigger scripts for tmux panes.
+- `poll_for_signal()` (lines 1955-2104): Polls tmux panes, monitors heartbeats, nudges idle panes, auto-approves permission prompts via `tmux send-keys` — 100% tmux plumbing.
+- `run_single_verifier()` (lines 2276-2372): Manages tmux pane lifecycle (kill, split, reset), then launches into pane — tmux-specific.
+- `run_consensus_verification()` (lines 2393-2539): Calls `run_single_verifier()` — inherits tmux dependency.
+- `cleanup()` (lines 1807-1948): Kills tmux panes, generates campaign report — tmux lifecycle.
+- `main()` (lines 2561-3126): The entire main loop — creates tmux sessions, polls panes, manages pane lifecycle.
+The genuinely **mode-independent** functions are a smaller set than claimed:
+| Function | Lines | Truly Shareable? |
+|----------|-------|-----------------|
+| `log()` / `log_debug()` / `log_error()` | 152-165 | Yes |
+| `parse_model_flag()` | 173-192 | Yes |
+| `get_model_string()` | 219-229 | Yes |
+| `get_next_model()` | 440-469 | Yes |
+| `check_model_upgrade()` | 475-527 | Yes |
+| `atomic_write()` | 531-536 | Yes |
+| `validate_scaffold()` | 635-669 | Yes |
+| `update_status()` | 1391-1432 | Yes |
+| `write_result_log()` | 1435-1471 | Yes |
+| `archive_iter_artifacts()` | 1474-1484 | Yes |
+| `write_cost_log()` | 1487-1524 | Yes |
+| `write_campaign_jsonl()` | 1527-1558 | Yes |
+| `generate_campaign_report()` | 1561-1706 | Yes |
+| `generate_sv_report()` | 1708-1779 | Yes |
+| `compute_prd_hash()` | 2111-2121 | Yes |
+| `count_prd_us()` | 2123-2133 | Yes |
+| `split_prd_by_us()` | 2135-2158 | Yes |
+| `split_test_spec_by_us()` | 2160-2193 | Yes |
+| `check_prd_update()` | 2195-2232 | Yes |
+| `compute_context_hash()` | 2234-2250 | Yes |
+| `check_stale_context()` | 2252-2274 | Yes |
+| `inject_per_us_prd()` | 1144-1157 | Yes |
+This is roughly 700-800 lines of genuinely shareable logic, not 1,900. The rest is deeply intertwined with tmux pane management. The "~1,900 lines of business logic" claim needs recalibration.
+**But here is why the antithesis ultimately fails:** Even if Agent() mode cannot directly `source` these functions (it is an LLM, not a shell), extracting them still has value:
+1. **Testing**: The `extract_fn()` test pattern (used across all 35 test files) extracts functions from `run_ralph_desk.zsh` by awk-ing function boundaries. A dedicated `lib_ralph_desk.zsh` would make tests cleaner — `source lib_ralph_desk.zsh` instead of fragile awk extraction.
+2. **Readability**: 3,184 lines in one file is objectively hard to navigate.
+3. **Future extensibility**: If a third orchestration mode appears (e.g., Docker, SSH), the shared lib is ready.
+**Synthesis**: Option C is correct, but the extraction scope should be the ~800 lines of genuinely mode-independent logic, not the inflated ~1,900 line estimate. The tmux-entangled functions stay in `run_ralph_desk.zsh`.
+### 2. Tradeoff Tension: "Simplify" vs. "Preserve"
+The plan says it preserves both Agent() and tmux modes while "simplifying" via extraction. But there is a fundamental tension:
+**Agent() mode is an LLM interpreting markdown. Tmux mode is a shell script.** They do not share code. They share *concepts* (the governance protocol). The governance.md document IS the shared abstraction — it already serves as the "lib" for Agent mode.
+Extracting shell functions into `lib_ralph_desk.zsh` simplifies tmux mode's file organization, but does nothing to reduce the conceptual duplication between the modes. Every governance rule appears in three places:
+1. `governance.md` (the canonical spec)
+2. `rlp-desk.md` (Agent mode instructions, lines 296-555)
+3. `run_ralph_desk.zsh` (tmux mode implementation)
+The lib extraction does not reduce this triple-statement problem. If the user later changes the circuit breaker threshold logic, they must still update all three files.
+**This is not a blocking issue** — it is a tension to acknowledge in documentation. The plan should explicitly state: "lib extraction reduces file-level complexity but does not reduce specification duplication. governance.md remains the single source of truth; both modes implement it independently."
+### 3. Architecture Soundness: zsh-specific `source` Pitfalls
+The plan calls the extraction "purely mechanical (move functions, add source statement)." This is dangerously optimistic for zsh. Two concrete risks:
+**Risk A: Global variable scoping across `source` boundaries.**
+`run_ralph_desk.zsh` uses three `typeset -A` associative arrays at file scope (lines 118-120):
+```
+typeset -A LAST_PANE_CONTENT
+typeset -A PANE_IDLE_SINCE
+typeset -A WORKER_RESTARTS
+```
+These are tmux-specific and would stay in `run_ralph_desk.zsh`. But 30+ other global variables (lines 47-143) — `SLUG`, `WORKER_MODEL`, `ITERATION`, `VERIFIED_US`, `CONSECUTIVE_FAILURES`, etc. — are read and mutated by functions throughout the file. After extraction:
+- `lib_ralph_desk.zsh` functions (e.g., `check_model_upgrade()` at line 475) mutate globals like `_SAME_US_FAIL_COUNT`, `_LAST_FAILED_US`, `_MODEL_UPGRADED`, `WORKER_MODEL`, `WORKER_CODEX_MODEL`, `WORKER_CODEX_REASONING`.
+- These globals are defined in `run_ralph_desk.zsh` before `source lib_ralph_desk.zsh`.
+- In zsh, `source` shares the caller's scope — globals survive across source boundaries. **This works.**
+- But `typeset` inside a function creates a **local** variable in zsh (the same rule as `declare` in bash: local inside a function, global at top level). If any extracted function uses `typeset` internally, it creates a local shadow, not a global mutation. This is already the case in the current code so it is not a new problem, but the extractor must verify no `typeset` statements are accidentally introduced during the move.
+**Risk B: `local` vs. global mutation in extracted functions.**
+`check_model_upgrade()` (line 475-527) directly mutates globals: `_SAME_US_FAIL_COUNT`, `_LAST_FAILED_US`, `_MODEL_UPGRADED`, `_ORIGINAL_WORKER_MODEL`, `WORKER_MODEL`, `WORKER_CODEX_MODEL`, `WORKER_CODEX_REASONING`. After moving to `lib_ralph_desk.zsh`, these mutations will still work because zsh functions see the calling scope's globals. **But**: if someone later wraps the `source` call inside a function (e.g., `load_lib()`), the scoping changes — `typeset -A` in the sourced file would become local to `load_lib()`. The source statement must remain at the file's top level.
+**Mitigation**: Add a comment in `run_ralph_desk.zsh` line 1 area: `# IMPORTANT: source lib_ralph_desk.zsh at file scope, NOT inside a function.`
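The scoping pitfall from Risk B can be reproduced in a few lines. This is a hedged demonstration rather than code from the package: the temporary library file, `LIB_STATE`, and `load_lib` are hypothetical stand-ins, and the snippet assumes a shell that provides `typeset` (zsh or bash).

```shell
# Write a tiny stand-in library that declares state with typeset.
lib="$(mktemp)"
printf 'typeset LIB_STATE=loaded\n' > "$lib"

# Anti-pattern: sourcing inside a function makes the typeset declaration
# local to that function, so it disappears when the function returns.
load_lib() { . "$lib"; }
load_lib
inside_result="${LIB_STATE:-unset}"

# Recommended: source at file scope so the declaration stays visible.
. "$lib"
toplevel_result="${LIB_STATE:-unset}"

rm -f "$lib"
echo "after load_lib: $inside_result"
echo "after file-scope source: $toplevel_result"
```

The first result shows the variable missing after the wrapper function returns, which is the failure mode the guard comment is meant to prevent.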
+### 4. Risk the Planner Missed: Test Breakage Pattern
+All 35 test files use the `extract_fn()` pattern (confirmed at `tests/test_engine_refactor.sh:12-14`, `tests/test_us009_api_retry_guard.sh:11-31`, `tests/test_us004_progressive_upgrade.sh:17-20`):
+```bash
+RUN="${RUN:-src/scripts/run_ralph_desk.zsh}"
+extract_fn() {
+  awk -v fn="$1" '$0 ~ "^"fn"\\(\\)" { p=1 } p { print } p && /^}/ { p=0 }' "$RUN"
+}
+```
+After extraction, functions that move to `lib_ralph_desk.zsh` will no longer be found by `extract_fn()` because `$RUN` still points to `run_ralph_desk.zsh`. The plan says "171 tests continue working with updated paths" — this requires either:
+**Option 1**: Update `$RUN` in each test to `$LIB` for functions in the lib (changes to 35 files).
+**Option 2**: Have `run_ralph_desk.zsh` physically `source` the lib, so extracting from the combined output works. But `extract_fn()` runs awk on a **file**, not on the runtime-sourced combination.
+**Option 3**: Add a `LIB="${LIB:-src/scripts/lib_ralph_desk.zsh}"` variable in each test and update `extract_fn()` to search both files.
+The Planner did not specify which approach. This is a concrete implementation detail that affects all test files and must be decided before execution. Option 3 is recommended — it is backward-compatible and minimal.
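Option 3 could look roughly like the sketch below. It keeps the awk expression from the existing `extract_fn()` and only adds a second search location. The search order (library first, then runner) and the non-zero return when nothing matches are assumptions, not decisions recorded in the plan.

```shell
RUN="${RUN:-src/scripts/run_ralph_desk.zsh}"
LIB="${LIB:-src/scripts/lib_ralph_desk.zsh}"

# Sketch of a dual-file extract_fn: print the body of function $1 from
# whichever file defines it, checking the library before the runner.
extract_fn() {
  local file
  for file in "$LIB" "$RUN"; do
    [ -f "$file" ] || continue
    # Only run awk on a file that actually defines the function.
    if grep -q "^$1()" "$file"; then
      awk -v fn="$1" '$0 ~ "^"fn"\\(\\)" { p=1 } p { print } p && /^}/ { p=0 }' "$file"
      return 0
    fi
  done
  return 1                                    # not found in either file (assumption)
}
```

Existing tests that set only `$RUN` keep working, because functions that stayed in the runner are still found there. Tests for moved functions need no per-test path change as long as the default `$LIB` path resolves.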
+### 5. Final Verify Split: Sequential Per-US
+The proposal to split the final ALL verify into sequential per-US checks is sound in principle — it reuses the proven per-US mechanism and avoids the monolithic timeout problem. However:
+**Gap: Cross-US integration is the entire point of the final verify.**
+The governance spec (`governance.md` lines 184-187) explicitly states:
+> Checkpoint 2: Release Readiness (us_id=ALL) — Scope: all AC + L2 integration (if applicable) + L3 E2E Simulation + L4 deploy (if applicable)
+The final ALL verify exists to catch **cross-US regressions** — e.g., US-003's changes broke US-001's tests. Sequential per-US re-verification catches per-US regressions but may miss **system-level integration** issues that only manifest when all changes interact.
+**Mitigation**: The sequential per-US checks should be followed by a lightweight integration check: run the full test suite once (not per-US scoped). If the full suite passes, COMPLETE. If it fails, the failure is already scoped to specific tests that can be debugged. This is cheap (one test run) and preserves the cross-US safety net.
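The mitigation amounts to a two-stage gate: per-US checks first, then one unscoped suite run. A minimal sketch under stated assumptions follows. `verify_us`, `run_full_suite`, and `final_verify` are hypothetical placeholders, and the real runner would dispatch a verifier and the project's own test command in their place.

```shell
# Hypothetical hooks: stand-ins for the real verifier dispatch and test suite.
verify_us()      { echo "verify $1"; }
run_full_suite() { echo "full suite"; }

# Sketch: re-verify each US in order, then run the whole suite once
# as the cross-US integration check.
final_verify() {
  local us
  for us in "$@"; do
    if ! verify_us "$us"; then
      echo "FAIL $us"
      return 1
    fi
  done
  if ! run_full_suite; then
    echo "FAIL integration"
    return 1
  fi
  echo "COMPLETE"
}
```

Calling `final_verify US-001 US-002` stops at the first failing story, so a later integration failure is known to come from cross-US interaction rather than from a single story.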
+### 6. Merge Strategy: Squash Merge of 77 Commits
+Squash merge is correct for this case:
+- 77 commits include campaign iteration artifacts (iter01, iter02, ..., iter14), done-claim corrections, and verification handoffs — these are process noise, not meaningful history.
+- The feature branch is `feature/v0.4.1-tmux-sv-report` — a single feature.
+- Squash produces one clean commit on main with a clear message.
+**One caution**: Verify that `git diff main...HEAD` shows only the intended changes before squashing. Campaign-generated test artifacts or temporary files should not be included.
+---
+## Root Cause
+The plan's core weakness is not its direction (Option C is correct) but its estimation of extraction scope. The "1,900 lines of business logic" figure conflates tmux-entangled orchestration logic with genuinely mode-independent utility functions. This overestimate could lead to an extraction that either (a) tries to extract tmux-dependent code and breaks it, or (b) discovers mid-implementation that the extraction is smaller than planned and loses momentum.
+---
+## Recommendations
+1. **Recalibrate extraction scope** — LOW effort, HIGH impact. The lib should contain ~800 lines of genuinely mode-independent functions (logging, model management, scaffold validation, reporting, PRD/context utilities), not the full 1,900 claimed. Functions that call `tmux` commands or reference pane IDs stay in `run_ralph_desk.zsh`.
+2. **Decide test migration strategy** — LOW effort, HIGH impact. Before extraction, decide on Option 3 (dual-file `extract_fn`) and document it. This prevents 35 test files from breaking.
+3. **Add a source-scope guard comment** — TRIVIAL effort, MEDIUM impact. `# IMPORTANT: source at file scope, NOT inside a function` at the top of both files. Prevents future scoping bugs.
+4. **Add integration check to final verify split** — LOW effort, HIGH impact. After sequential per-US re-checks, run the full test suite once as a cross-US safety net.
+5. **Proceed with Phase 0 (npm publish v0.5) first** — as planned. Ship what exists before refactoring.
+---
+## Consensus Addendum
+### Antithesis (steelman)
+The strongest argument against Option C: Agent() mode is an LLM interpreting markdown — it will never `source lib_ralph_desk.zsh`. The extraction creates a cleaner tmux codebase but does NOT create a "shared module used by both modes." The "hybrid" framing is misleading. What this actually is: a tmux-mode-internal refactoring that splits one 3,184-line file into two files. That is still valuable, but the value proposition should be stated honestly.
+### Tradeoff tension
+**File organization simplicity vs. specification duplication**: Extracting a lib simplifies the file structure but does nothing about the triple-statement problem (governance.md + rlp-desk.md + run_ralph_desk.zsh). Every governance change still requires updating three artifacts. The real "shared module" is governance.md itself — both modes implement it from the spec. Until the architecture evolves to make governance.md machine-executable (not just human-readable), this duplication persists regardless of how many .zsh files exist.
+### Synthesis
+Accept Option C but reframe it: "tmux-mode internal refactoring" rather than "hybrid shared module." This honest framing prevents scope creep (trying to make Agent mode consume the lib) and focuses the extraction on the right ~800 lines. The long-term path to true mode unification would be making governance.md a structured schema that both modes consume programmatically — but that is v0.7+ territory, not v0.6.
+### Principle violations
+- **Estimation accuracy**: The 1,900-line extraction claim does not survive code inspection. The real shareable set is ~800 lines. This is a planning accuracy issue, not a direction issue.
+- **Test impact omission**: The plan claims "171 tests continue working with updated paths" but does not specify the mechanism. The `extract_fn()` pattern hardcodes `$RUN` pointing to one file; extraction breaks this.
+---
+## References
+- `src/scripts/run_ralph_desk.zsh:118-120` — `typeset -A` associative arrays (tmux-specific global state)
+- `src/scripts/run_ralph_desk.zsh:440-469` — `get_next_model()` (genuinely shareable business logic)
+- `src/scripts/run_ralph_desk.zsh:475-527` — `check_model_upgrade()` (shareable but mutates 7 globals)
+- `src/scripts/run_ralph_desk.zsh:1162-1297` — `write_worker_trigger()` (tmux-entangled, NOT shareable)
+- `src/scripts/run_ralph_desk.zsh:1955-2104` — `poll_for_signal()` (100% tmux plumbing)
+- `src/scripts/run_ralph_desk.zsh:2276-2372` — `run_single_verifier()` (tmux pane lifecycle)
+- `src/scripts/run_ralph_desk.zsh:2561-3126` — `main()` (tmux session management + main loop)
+- `src/governance.md:184-187` — Checkpoint 2 Release Readiness scope (cross-US integration)
+- `src/governance.md:300-374` — Agent mode (§5a) and Tmux mode (§5b) architecture
+- `src/commands/rlp-desk.md:296-460` — Agent mode Leader loop (LLM instructions, not shell code)
+- `tests/test_engine_refactor.sh:6-14` — `extract_fn()` pattern with `$RUN` hardcoded
+- `tests/test_us009_api_retry_guard.sh:4-31` — Same pattern with more complex harness

package/docs/plans/toasty-whistling-diffie.md ADDED

@@ -0,0 +1,117 @@
+# Remaining Work Plan — rlp-desk Post v0.5
+**Created**: 2026-03-30
+**Updated**: 2026-03-30 (Phases 1-4 implemented, awaiting commit)
+**Branch**: main (v0.5.0, uncommitted changes: +89 lines)
+---
+## Context
+The v0.5.0 code has been merged into main and pushed. Only npm publish and the gh release remain. The lib_ralph_desk.zsh extraction is complete (internal refactoring, no semver change needed). This plan covers all unresolved items on the master issue list.
+---
+## Phase 0: npm publish v0.5.0 (on hold until the user requests it)
+1. `gh release create v0.5.0` (user-facing release notes)
+2. `npm publish`
+3. Confirm local file sync
+---
+## Phase 1: Items needing verification (implemented, not yet tested in real use)
+### A14/A15: init --mode improve (preserve test-spec + clean up sentinels)
+- **Status**: implemented in the v05 campaign; test_a14a15_init_improve.sh exists
+- **Needed**: confirm behavior by manually testing a real `--mode improve` scenario
+- **Files**: `src/scripts/init_ralph_desk.zsh`
+### A18: zombie runner lockfile
+- **Status**: lockfile logic implemented (8 references in run_ralph_desk.zsh)
+- **Needed**: verify in a real campaign that duplicate runs are prevented
+- **Files**: `src/scripts/run_ralph_desk.zsh`
+---
+## Phase 2: HIGH-priority issues
+### A10: "edit its own settings" permission prompt blocking
+- **Problem**: when Claude Code edits its own settings, a permission prompt appears → the Worker is blocked
+- **Root cause**: cannot be bypassed even with `--dangerously-skip-permissions`
+- **Approach**: wait for a fix on the Claude Code side, or strengthen the Worker prompt rule that forbids editing settings
+- **Files**: `src/commands/rlp-desk.md` (Worker prompt), `src/governance.md`
+- **Size**: SMALL (prompt changes only)
+### C4: detailed /rlp-desk status report
+- **Problem**: status output is currently sparse: the current US, attempt count, failure cause, and failing party are not shown
+- **Approach**: the fields already exist in status.json → parse and display them in the status subcommand of rlp-desk.md
+- **Files**: `src/commands/rlp-desk.md` (status section)
+- **TDD**: `tests/test_status_detail.sh` (new)
+- **Size**: MEDIUM
+### B3/B4: runtime per-US document splitting
+- **Problem**: PRD/test-spec splitting at init is done, but the logic that injects only the relevant US into the Worker prompt during a run is incomplete
+- **Approach**: in write_worker_trigger(), when per-US PRD/test-spec files exist, inject only those files
+- **Files**: `src/scripts/run_ralph_desk.zsh` (write_worker_trigger), `src/scripts/lib_ralph_desk.zsh` (need to confirm that inject_per_us_prd already exists)
+- **TDD**: extend the existing test_us002_perus_inject.sh
+- **Size**: MEDIUM
+---
+## Phase 3: MEDIUM-priority issues
+### A16: tmux foreground execution conflict
+- **Problem**: running run_ralph_desk.zsh in the foreground conflicts with the Claude Code pane
+- **Approach**: state in rlp-desk.md that run_in_background is required, and warn when foreground execution is detected
+- **Files**: `src/commands/rlp-desk.md`, `src/scripts/run_ralph_desk.zsh`
+- **Size**: SMALL
+### D1/D2: rlp-desk resume subcommand
+- **Problem**: verified_us is not restored when a campaign is restarted after an interruption
+- **Approach**: read verified_us from status.json to restore it, and add a resume subcommand
+- **Files**: `src/commands/rlp-desk.md` (resume section), `src/scripts/run_ralph_desk.zsh` (--resume flag)
+- **TDD**: `tests/test_resume.sh` (new)
+- **Size**: MEDIUM
+---
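+## Phase 4: LOW priority / Backlog
+### A5: Pane contamination after a rate limit (✅ implemented, uncommitted)
+- When poll_for_signal detects "queued messages", it automatically sends C-c and /exit to the pane
+### C3: Agent mode campaign.jsonl (✅ implemented, uncommitted)
+- Added a campaign.jsonl APPEND instruction to section ⑧ of rlp-desk.md
+### F8: --consensus-fail-fast (✅ implemented, uncommitted)
+- CONSENSUS_FAIL_FAST environment variable plus logic to skip codex when claude fails
+### F9: rlp-desk analytics subcommand (✅ stub added, uncommitted)
+- Documented the analytics subcommand in rlp-desk.md (the actual implementation is interpreted by the Agent mode LLM)
+### A17: logs/ directory structure refactoring (❌ not started)
+- **Size**: LARGE (dozens of path references to change)
+- **Deferred to the next session**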
+---
+## Execution order (recommended)
+```
+Phase 0: npm publish (on user request)
+Phase 1: A14/A15 + A18 real-world verification (manual testing, no code changes)
+Phase 2: C4 → B3/B4 → A10 (in order, each independent)
+Phase 3: A16 → D1/D2
+Phase 4: Backlog (as needed)
+```
+C4, B3/B4, and A10 in Phase 2 are independent, so they can run in parallel.
+---
+## Verification
+- After each Phase completes: `for f in tests/test_*.sh; do bash "$f" || exit 1; done`
+- New features: TDD (test first, confirm RED, implement, confirm GREEN)
+- CLAUDE.md rules: user approval before committing, user approval before npm publish

package/docs/prompts/ralplan-codex-review.md CHANGED

@@ -4,7 +4,7 @@
 /ralplan {{OBJECTIVE}}
 {{SCOPE}}
 Run codex cross-validation after consensus. Repeat revise -> consensus -> codex until 0 issues.
-If source documents are insufficient, identify gaps before proceeding.
+If the code, requirements, or source documents are insufficient or unclear, identify the gaps before proceeding.
 ```
 ---

package/install.sh CHANGED

@@ -40,6 +40,11 @@ echo "  Downloading tmux runner script..."
 curl -sSL "$REPO_URL/src/scripts/run_ralph_desk.zsh" -o "$DESK_DIR/run_ralph_desk.zsh"
 chmod +x "$DESK_DIR/run_ralph_desk.zsh"
+# Download shared business logic library
+echo "  Downloading shared library..."
+curl -sSL "$REPO_URL/src/scripts/lib_ralph_desk.zsh" -o "$DESK_DIR/lib_ralph_desk.zsh"
+chmod +x "$DESK_DIR/lib_ralph_desk.zsh"
 # Download governance protocol
 echo "  Downloading governance protocol..."
 curl -sSL "$REPO_URL/src/governance.md" -o "$DESK_DIR/governance.md"