@ai-dev-methodologies/rlp-desk 0.10.0 → 0.11.0

@@ -0,0 +1,84 @@
# Blueprint: Self-Verification Architecture Rethink (§7¾ escalation)

**Status**: backlog (escalation candidate, not yet planned for implementation)
**Filed by**: RC-1 patch (fix/rc1-tmux-sv-skip-and-rc2-prd-cross-us-lint, 2026-04-25)
**Escalation trigger**: governance §7¾ Architecture Escalation

## Why this is here

RC-1 (the SV report hanging for 5 minutes under tmux) was fixed by **disabling** `generate_sv_report`
in tmux runners and recording the disable on three channels (session-config,
metadata.json, debug log) so traceability is preserved. That patch is transparent
about the skip, but it does not answer the deeper question:

> Should the Self-Verification report be produced by spawning `claude --print`
> at all? And if not, what should replace it?

The current shell implementation (`src/scripts/lib_ralph_desk.zsh:603-668`)
spawns `claude --print` inside a tmux pane with no usable TTY or stdin, so the
call blocks until a 300-second watchdog fires. Even in Agent mode the cost surface is large:
a synchronous LLM call, no caching, no determinism, and no obvious way to
run SV against historical campaigns offline.

Per governance §7¾, this is an **architecture question**, not a patch
target: open it as a blueprint, gather options, and decide before touching the
shell function again.

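To make the hang mechanism concrete, here is a minimal Node sketch (not the current zsh code) of a bounded, TTY-free spawn, which is roughly the shape option 4 below would need. Stdin is set to `"ignore"` so the child reads EOF from `/dev/null` instead of waiting on a terminal, and `spawnSync`'s `timeout` turns a hang into an explicit skip reason. The helper name `runBoundedSpawn` and its reason strings are hypothetical.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: run a command with stdin closed and a hard timeout,
// returning stdout on success or a machine-readable skip reason otherwise.
function runBoundedSpawn(cmd, args, timeoutMs) {
  const result = spawnSync(cmd, args, {
    stdio: ["ignore", "pipe", "pipe"], // stdin is /dev/null: reads hit EOF instead of blocking
    timeout: timeoutMs,                // child gets SIGTERM after timeoutMs
    encoding: "utf8",
  });
  if (result.error && result.error.code === "ETIMEDOUT") {
    return { ok: false, reason: "timeout" };
  }
  if (result.error) {
    return { ok: false, reason: `spawn_error:${result.error.code}` };
  }
  if (result.status !== 0) {
    return { ok: false, reason: `exit_${result.status}` };
  }
  return { ok: true, stdout: result.stdout };
}
```

A call might look like `runBoundedSpawn("claude", ["--print", "--output-format", "json", prompt], 120000)`. Whether the Claude CLI then exits cleanly without a TTY is exactly the open question RC-1 raised, so treat this as a test harness for option 4, not a fix.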
## Options to evaluate

1. **Static SV (no LLM call).**
   - Replace the LLM-generated report with a deterministic Node module that
     reads `iter-*.json`, `cost-log.jsonl`, `verify-verdict.json`, etc., and
     emits the 10-section report from templates and counts.
   - Pros: no spawn, sub-second runtime, runs offline against archived
     campaigns, cacheable.
   - Cons: loses qualitative analysis ("Worker over-engineered", "Verifier
     rubber-stamped"). Not all 10 sections are reducible to counts.
2. **Hybrid: static skeleton + on-demand LLM section.**
   - The static skeleton always runs at campaign end. The qualitative sections
     ("Worker Process Quality", "Verifier Judgment Quality", "Patterns:
     Strengths & Weaknesses", "Recommendations for Next Cycle", "Blind Spots")
     become an opt-in `rlp-desk sv enrich <slug>` command run from a real TTY
     (Agent mode or an interactive shell).
   - Pros: best of both. The tmux runner gets a useful baseline report; the user
     can opt into the LLM pass when convenient.
   - Cons: two code paths to maintain.
3. **Move SV out of the runner entirely.**
   - rlp-desk produces only the artifacts (as it already does). SV becomes a
     dedicated `rlp-desk sv <slug>` command that the user invokes from any
     working session.
   - Pros: cleanest separation. The runner stays small; SV evolves independently.
   - Cons: discoverability, since users may forget to run it. Requires a small UX
     affordance, such as a "run sv against this slug" hint in the campaign report.
4. **Keep the LLM spawn but fix it for tmux.**
   - Find a non-interactive, no-TTY invocation of the Claude CLI that streams
     stdout and exits cleanly, possibly via `--output-format json` plus
     explicit stdin closure.
   - Pros: smallest surface change.
   - Cons: depends on Claude CLI semantics that may shift between releases.
     RC-1 already showed how brittle this is.

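Options 1 and 2 share a deterministic core. The sketch below shows how the metric sections reduce to counts while the five qualitative sections become placeholders pointing at the proposed `rlp-desk sv enrich <slug>` command. It assumes simplified record shapes rather than the real `iter-*.json` and `verify-verdict.json` schemas; the function name `buildStaticSv` and the input field names are hypothetical.

```javascript
// The five qualitative section titles listed under option 2.
const QUALITATIVE_SECTIONS = [
  "Worker Process Quality",
  "Verifier Judgment Quality",
  "Patterns: Strengths & Weaknesses",
  "Recommendations for Next Cycle",
  "Blind Spots",
];

// Hypothetical input shapes (real artifact fields may differ):
//   iterations: [{ iteration, us_id, status }]
//   verdicts:   [{ us_id, verdict }] where verdict is "pass" or "fail"
function buildStaticSv(slug, iterations, verdicts) {
  const byStatus = {};
  for (const it of iterations) {
    byStatus[it.status] = (byStatus[it.status] || 0) + 1;
  }
  const stories = [...new Set(iterations.map((it) => it.us_id))].sort();
  const passed = verdicts.filter((v) => v.verdict === "pass").length;

  const lines = [`# Self-Verification (static): ${slug}`, "", "## Metrics"];
  lines.push(`- Iterations: ${iterations.length}`);
  for (const status of Object.keys(byStatus).sort()) {
    lines.push(`- status=${status}: ${byStatus[status]}`);
  }
  lines.push(`- User stories touched: ${stories.join(", ") || "none"}`);
  lines.push(`- Verifier pass rate: ${passed}/${verdicts.length}`);
  // Qualitative sections stay as placeholders until an opt-in LLM pass fills them.
  for (const title of QUALITATIVE_SECTIONS) {
    lines.push("", `## ${title}`, `_Not generated. Run \`rlp-desk sv enrich ${slug}\` from a real TTY._`);
  }
  return lines.join("\n");
}
```

Because the output depends only on its inputs, two runs over the same archived campaign produce identical reports, which is what makes option 1 cacheable and usable offline.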
## Open questions

- Which sections of the current 10-section SV do humans actually consume?
  (Metric sections vs. qualitative sections vs. blind spots; gather telemetry.)
- Can the verifier model itself (already running every
  iteration) emit a campaign-level verdict at the end, instead of requiring a
  separate spawn?
- Should SV also use cross-engine consensus, or is single-model SV enough?

## Decision pending

Open. The current RC-1 patch is the **interim** answer: SV is disabled in
tmux, traceability is preserved through `with_self_verification_requested` and
`sv_skipped_reason`, and the user sees an explicit "requested but skipped"
banner in the Campaign Report. This blueprint will be promoted to a plan when
someone takes ownership of the architecture question above.

## References

- `src/scripts/lib_ralph_desk.zsh:603-668`: current `generate_sv_report` implementation.
- `src/scripts/run_ralph_desk.zsh:719-727, 758-768, 2118-2125, 2151`: RC-1 patch.
- `src/node/reporting/campaign-reporting.mjs:412-591`: Node-side SV summary
  builder (already pure file I/O, so a promising starting point for options 1 and 2).
- governance §7¾ Architecture Escalation: the escalation policy this entry follows.
@@ -0,0 +1,154 @@
# Multi-Mission Orchestration Patterns

rlp-desk runs **one mission per `run_ralph_desk.zsh` invocation**. The runner is
intentionally single-purpose: it loads a slug's PRD, executes the per-US loop,
writes a sentinel (`COMPLETE` or `BLOCKED`), and exits. Anything that needs to
coordinate **multiple missions** in sequence (for example, a flywheel that
runs `axis-A → axis-B → measurement → improve`) is the responsibility of a
**wrapper script** owned by the consumer.

This document explains the contract between rlp-desk and a wrapper, with a
small worked example.

## Why no built-in chain

Hard-coding mission sequences inside rlp-desk would couple the runner to a
particular project's idea of "what comes next." Different consumers have
different goals (fixed-length improvement campaigns, indefinite uptime, mission
graphs that branch on metrics). A wrapper layer keeps rlp-desk focused on the
single-mission contract while letting each consumer encode its own policy.

## Per-mission outputs the wrapper can read

After every campaign, rlp-desk writes the following artifacts under
`<ROOT>/.claude/ralph-desk/`:

| Path | Purpose |
|---|---|
| `memos/<slug>-blocked.md` | Sentinel for a BLOCKED outcome. The first line is `BLOCKED: <us_id>`; the second line is `Reason: <verdict reason>` (governance §1f BLOCKED Surfacing). |
| `memos/<slug>-complete.md` | Sentinel for a COMPLETE outcome. |
| `memos/<slug>-iter-signal.json` | Last worker signal (status, us_id, summary). |
| `memos/<slug>-memory.md` | Campaign memory accumulated across iterations. |
| `memos/<slug>-flywheel-signal.json` | Written when the flywheel ran; records the direction it picked. May contain a `next_mission_candidate` field that the wrapper can use to decide what to launch next. |
| `logs/<slug>/metadata.json` | One-line summary of the campaign config, including `with_self_verification`, `with_self_verification_requested`, and `sv_skipped_reason` (RC-1). |

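A wrapper written in Node could read the BLOCKED sentinel's two-line header as sketched below. The function name is hypothetical, and the sketch relies only on the first-line/second-line format stated in the table; any further lines in the file are ignored.

```javascript
// Parse the text of memos/<slug>-blocked.md using the documented header:
//   line 1: "BLOCKED: <us_id>"
//   line 2: "Reason: <verdict reason>"
// Returns null if line 1 does not match, so callers can fall back to printing the raw file.
function parseBlockedSentinel(text) {
  const [first = "", second = ""] = text.split(/\r?\n/);
  const us = first.match(/^BLOCKED:\s*(\S+)\s*$/);
  if (!us) return null;
  const reason = second.match(/^Reason:\s*(.*)$/);
  return { usId: us[1], reason: reason ? reason[1].trim() : null };
}
```

In practice the text would come from `fs.readFileSync(path, "utf8")`; a `null` result is a signal to surface the whole file to a human rather than guess.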

**Exit-code contract (per entry point; read carefully):**

| Entry point | Exit 0 | Exit 2 | Exit 1 |
|---|---|---|---|
| `src/node/run.mjs` (Node, agent mode) | Clean COMPLETE | BLOCKED (verifier or model-upgrade-exhausted); reason on stderr | Unhandled error (e.g. unknown flag) |
| `src/scripts/init_ralph_desk.zsh` | Scaffold OK | PRD cross-US lint reject (per-us mode); violations on stderr | Scaffold incomplete or input error |
| `src/scripts/run_ralph_desk.zsh` (zsh, tmux mode) | Clean exit (the sentinel decides COMPLETE vs. BLOCKED) | Not used | Any failure path; wrappers must inspect the sentinel files to tell COMPLETE from BLOCKED |

The Node entry point reports the BLOCKED reason on **stderr** with **exit code 2**.
The zsh runner does **not** use distinct exit codes for COMPLETE vs. BLOCKED;
inspect the sentinel files (`memos/<slug>-{complete,blocked}.md`) instead.
PRD lint rejection (init exit 2) is the only place the zsh side uses exit 2.

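The table reduces to a small decision function. This is a hypothetical sketch: the entry-point labels (`"node-run"`, `"zsh-init"`, `"zsh-run"`) and the returned state names are invented for illustration, and the sentinel flags are assumed to come from checking whether `memos/<slug>-{complete,blocked}.md` exist.

```javascript
// Hypothetical wrapper helper encoding the exit-code table.
// sentinels: { complete: boolean, blocked: boolean }
function terminalState(entryPoint, exitCode, sentinels = {}) {
  switch (entryPoint) {
    case "node-run": // src/node/run.mjs: the exit code is authoritative
      if (exitCode === 0) return "COMPLETE";
      if (exitCode === 2) return "BLOCKED";
      return "ERROR";
    case "zsh-init": // src/scripts/init_ralph_desk.zsh
      if (exitCode === 0) return "SCAFFOLDED";
      if (exitCode === 2) return "PRD_LINT_REJECTED";
      return "ERROR";
    case "zsh-run": // src/scripts/run_ralph_desk.zsh: only the sentinel decides
      if (sentinels.complete) return "COMPLETE";
      if (sentinels.blocked) return "BLOCKED";
      return "ERROR";
    default:
      throw new Error(`unknown entry point: ${entryPoint}`);
  }
}
```

For the zsh runner the exit code is deliberately ignored: a run that exits 0 without writing either sentinel still counts as an error.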
## Minimal wrapper recipe (zsh)

The recipe below iterates over a fixed mission list, launches each mission in turn, and stops on
the first BLOCKED. It demonstrates the contract without prescribing a policy.

```zsh
#!/usr/bin/env zsh
set -u
set -o pipefail

ROOT="${ROOT:-$PWD}"
DESK="$ROOT/.claude/ralph-desk"
MISSIONS=(
  "axis-1-baseline"
  "axis-2-improve"
  "axis-3-measurement"
)

for SLUG in "${MISSIONS[@]}"; do
  # Skip missions that already finished (idempotent re-runs).
  if [[ -f "$DESK/memos/$SLUG-complete.md" ]]; then
    print "Skipping $SLUG — already COMPLETE"
    continue
  fi
  if [[ -f "$DESK/memos/$SLUG-blocked.md" ]]; then
    print "Stopping chain — $SLUG is BLOCKED:"
    cat "$DESK/memos/$SLUG-blocked.md"
    exit 2
  fi

  print "Launching $SLUG"
  ROOT="$ROOT" \
  WORKER_MODEL="${WORKER_MODEL:-gpt-5.5:medium}" \
  VERIFIER_MODEL="${VERIFIER_MODEL:-opus}" \
  VERIFY_MODE="${VERIFY_MODE:-per-us}" \
  CB_THRESHOLD="${CB_THRESHOLD:-6}" \
    zsh ~/.claude/ralph-desk/run_ralph_desk.zsh "$SLUG"
  rc=$?

  # zsh runner: rc tells you "did the script crash", not COMPLETE vs BLOCKED.
  # Read the sentinel for the actual terminal state.
  if [[ -f "$DESK/memos/$SLUG-complete.md" ]]; then
    print "$SLUG completed cleanly"
  elif [[ -f "$DESK/memos/$SLUG-blocked.md" ]]; then
    print "$SLUG blocked — stopping chain"
    cat "$DESK/memos/$SLUG-blocked.md"
    exit 2
  else
    print "$SLUG ended without sentinel (rc=$rc) — stopping chain"
    # Never report success when no sentinel was written, even if rc was 0.
    exit $(( rc == 0 ? 1 : rc ))
  fi
done
```

Three design notes:

- The wrapper checks the **sentinel files first**, both before and after
  invoking the runner. This makes re-runs idempotent (finished missions are
  skipped) and compensates for the zsh runner's lack of a distinct
  COMPLETE-vs-BLOCKED exit code.
- For the **Node runner** (`src/node/run.mjs`, agent mode) the recipe can be
  simpler: switch on `$rc` directly, because Node reserves exit 2 for
  blocked outcomes. The exit-code table above lists which entry point uses
  which convention.
- `init_ralph_desk.zsh`'s exit 2 (PRD lint reject) is the wrapper's only
  pre-launch fail-fast signal. Treat it like a blocked sentinel: the
  campaign will not start until the PRD is fixed.

## Flywheel-driven dynamic chain (optional)

**Emit side (rlp-desk responsibility)**: when a mission runs the flywheel
review (`--flywheel on-fail`), the flywheel agent's signal JSON
(`memos/<slug>-flywheel-signal.json`) MAY include an optional
`next_mission_candidate` field: `null` for "no recommendation", or a slug
string meaning "the consumer should chain this slug next." The Node leader
propagates this field into `status.json` (`status.next_mission_candidate`),
so wrappers can poll either file. The flywheel prompt template
(the `init_ralph_desk.zsh` flywheel heredoc) and `governance.md` §7 ⑥½ both
document the field. The field is OPTIONAL and its absence is treated as `null`,
which keeps prior flywheel signals backward-compatible.

**Consumer side (wrapper responsibility)**: pick the next slug from that
field instead of from a fixed list:

```zsh
NEXT_SLUG=$(jq -r '.next_mission_candidate // empty' \
  "$DESK/memos/$SLUG-flywheel-signal.json" 2>/dev/null)
if [[ -n "$NEXT_SLUG" ]]; then
  # Recurse or push onto the queue. Apply your own policy:
  # - de-dupe against an already-launched set,
  # - cap chain length to avoid runaway loops,
  # - require `axis-history.json` distance to avoid revisiting.
  print "Flywheel suggests next mission: $NEXT_SLUG"
fi
```

`next_mission_candidate` is advisory only. Wrapper authors should still apply
guardrails (max chain length, distance-from-history checks, manual approval
gates) before acting on it.

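One way to encode two of those guardrails in a Node wrapper is sketched below: de-duplication against already-launched slugs and a chain-length cap. The distance-from-history check against `axis-history.json` and any manual approval gate are left out because their shape is consumer-specific. `pickNextMission` and its reason strings are hypothetical.

```javascript
// Hypothetical policy for consuming next_mission_candidate.
// signal:   parsed flywheel-signal.json (or status.json), may be null/undefined
// launched: slugs already run in this chain, in order
function pickNextMission(signal, launched, maxChainLength) {
  const candidate = signal ? signal.next_mission_candidate : null;
  // Absent and null both mean "no recommendation" (backward-compatible signals).
  if (typeof candidate !== "string" || candidate === "") {
    return { next: null, why: "no_recommendation" };
  }
  if (launched.includes(candidate)) {
    return { next: null, why: "already_launched" };
  }
  if (launched.length >= maxChainLength) {
    return { next: null, why: "chain_cap_reached" };
  }
  return { next: candidate, why: "accepted" };
}
```

Returning a reason alongside `null` lets the wrapper log why a chain stopped, which matters when the flywheel keeps recommending a slug that policy refuses.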
## Non-goals (explicitly)

- A built-in `rlp-desk auto-chain --slug-prefix … --max-missions N` command is
  **not** in scope. It would reintroduce the coupling this design avoids.
  If you want one, build it as a small wrapper and share it with the community.
- rlp-desk does not validate mission ordering or dependency graphs. The wrapper
  owns this policy.
@@ -0,0 +1,352 @@
# rlp-desk 0.11: Handoff Final 7-fix bundle (ralplan v3)

> v3 changes: absorbs the Architect executor follow-ups NEW-1 (bash→zsh fixture invocation) and NEW-2 (broadened early-exit grep).
>
> v2 changes (Architect + Critic codex iteration): PR A/B split decision, R7 schema fallback, R8 helper-side guard, R9 reason canonicalization + edge cases, R10 normalized US extractor + quarantine (not rm), R11 early-exit grep inventory + trap, and a patch that makes the self-verification assertions mechanical.

## Context

Seven defects identified from timestamp evidence in the consumer's Final Handoff (`coordination/handoffs/2026-04-25-rlp-desk-final-status-and-handoff.md`):

| ID | Severity | Defect | Root file |
|---|---|---|---|
| P0-D | HIGH | A4 fallback rate of 83% (worker iter-signal missing) | `run_ralph_desk.zsh:1587-1595`, `:526-546` |
| P1-F | MEDIUM | test-spec "≥3 tests/AC" rule (IL-4) is self-contradictory | `init_ralph_desk.zsh` test-spec generation + ingest |
| P1-G | MEDIUM | No `verify_partial` signal vocabulary | `init_ralph_desk.zsh:448-454` Signal rules + verifier |
| P1-H | MEDIUM | memory.md/latest.md not updated on blocked exit | Worker prompt blocked-exit hygiene |
| P2-I | MEDIUM | Blocks are not counted as failures, so a contract defect looped silently for 12 iterations | `run_ralph_desk.zsh:2659` (new consecutive_blocks) |
| P2-J | MEDIUM | Cross-mission us_id leak in the final ALL verify | `run_ralph_desk.zsh:2198/2425-2429` US_LIST scope |
| P2-K | LOW | cost-log is empty (tmux mode) | `lib_ralph_desk.zsh:367` write_cost_log call coverage |

## PR split decision (v2)

Following the Architect's recommendation, we adopt a **two-PR split: PR-A (protocol) + PR-B (runtime)**. Even if the user explicitly asks for a "single PR", the split is needed because of the R7 schema collision (risk of a silent fallback interacting with R3).

- **PR-A (protocol/contract)**: R5 + R6 + R7 + governance §1f/§7f/§7g + us017/us018/us019
- **PR-B (runtime/state)**: R8 + R9 + R10 + R11 + governance §8/§7a + us020/us021/us022/us023
- Both PRs include the self-verification mapping scenarios, each evaluating only its own fixes. The final self-verification (7/7) runs separately after PR-B is merged.

However, if the user directly asks for a single PR again, proceed with a single PR but strengthen the self-verification scenarios (per-row mechanical assertions become mandatory).

## RALPLAN-DR

**Principles** (4):
1. **Fail loud, not silent**: A4 fallback, block-as-success, cross-mission leak, and cost-log silence are all silent-failure patterns.
2. **Backward-compat first**: state explicitly how existing wrappers handle the new `verify_partial` status, including the malformed case. Evolve the test-spec lint in stages, from warn to strict.
3. **Minimal blast radius**: PR split plus a separate helper per fix. Each fix's regression has its own us_test.
4. **Self-verification mechanical**: prove with grep and exit codes that change X was actually triggered in self-verification scenario Y.

**Decision Drivers**:
1. Stop consumer wrappers from hitting the same patterns again (83% A4 fallback, cross-mission leak, contract-defect silent loop).
2. After a 7-mission autonomous run, the debug.log [FLOW] events carry meaningful summaries and the audit log is auditable.
3. An empty cost-log can be classified as "broken logging", which keeps the audit pipeline trustworthy.

**Viable Options comparison**:

(The option comparison is the same as in v1, plus the patches absorbed from the Critic's ITERATE verdict.)

- **R7 verify_partial schema malformed handling (Architect issue #2)**: if the status is `verify_partial` but `verified_acs` is missing or an empty array, downgrade to `status='blocked'`, `reason='verify_partial_malformed'`. This blocks a violation of the worker-autonomy rules.
- **R8 helper-side guard (Critic R8 + Architect issue #3)**: the Verifier's mtime check alone is not enough. Add a hygiene check to `write_blocked_sentinel` itself: if memory.md/latest.md were last modified well before the sentinel write (more than 300 seconds earlier, i.e. the worker skipped the hygiene update), automatically attach `meta.blocked_hygiene_violated=true` to the sentinel JSON. Even if the worker forgets, the verifier notices immediately.
- **R9 reason canonicalization (Architect issue #3)**: a `_canonical_block_reason()` helper strips hygiene wrapper prefixes ("hygiene_violated:", "wrapped:") before comparing reasons, so an R8 hygiene_violated prefix cannot bypass the R9 counter.
- **R9 edge cases (Critic R9)**: first-iteration blocks, and mission-setup blocks classified with the `infra_failure` reason category, do not increment the counter (aborting the mission would be inappropriate). They are explicitly exempt.
- **R10 normalized extractor + quarantine (Architect issue #4 + Critic R10)**: instead of `grep -qE "^## $stale_us[: ]"`, extract IDs with a normalized pattern, `awk '/^##[[:space:]]+(US-[0-9]+)([[:space:]:-]|$)/'`, to handle PRD heading variations. Instead of `rm -f`, `mv` the file to `.sisyphus/quarantine/` (no silent destructive deletes).
- **R11 trap-based final write (Architect issue #6 + Critic R11)**: drop the init placeholder. Add a zsh `trap 'write_cost_log "$ITERATION" || true' EXIT`, and guarantee coverage with an early-exit-path grep-inventory regression.
- **Self-verification per-row functions (Architect issue #5 + Critic Self-V)**: instead of one monolithic script, 7 functions (`test_r5_a4_audit_triggered`, …), each with pre/post counters plus a grep that proves the changed function was called.

---

## Resolution plan (v2 patches highlighted)

### Fix R5 (P0-D): A4 fallback tracking + stronger worker prompt

**Targets**:
1. `src/scripts/run_ralph_desk.zsh:1587-1595` + `:526-546`: when the A4 fallback fires, append an audit log entry to `a4-fallback-audit.jsonl`.
2. `src/scripts/init_ralph_desk.zsh` worker prompt: add "Step N+1 (mandatory)" and state the penalty for an auto-generated summary.
3. Verifier prompt: when an A4 fallback summary is detected, set `verdict.meta.iter_signal_quality='auto_generated'`.
4. governance §1f: A4 ratio recommendation (per mission < 10%).

**Verification (us017), absorbing the Critic R5 patch**:
- AC1: an `a4-fallback-audit.jsonl` entry is written (zsh fixture)
- **AC1+ (Critic R5)**: `pre_count=$(wc -l < a4-fallback-audit.jsonl)`, trigger the fixture, assert `post_count > pre_count`, and check that the ratio calculation is correct.
- AC2: the worker prompt contains "Step N+1" and "iter-signal.json with SPECIFIC summary" (grep)
- AC3: governance §1f contains the "A4 ratio < 10%" recommendation text and the measurement method
- AC4 (new): the Verifier prompt contains the "auto_generated" detection sentence and names the meta field

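For the governance §1f target, the per-mission A4 ratio can be computed from the audit log as distinct fallback iterations divided by total iterations. The sketch below assumes a hypothetical audit entry shape (`event`, `us_id`, `iteration`); the real schema is whatever us017 ends up writing, and `a4FallbackRatio` is an illustrative name.

```javascript
// Hypothetical audit line shape: {"event":"a4_fallback","us_id":"US-001","iteration":3}
function a4FallbackRatio(auditJsonl, totalIterations) {
  const fallbackIterations = new Set(); // count each iteration once, even if logged twice
  for (const line of auditJsonl.split("\n")) {
    if (line.trim() === "") continue;
    const entry = JSON.parse(line);
    if (entry.event === "a4_fallback") fallbackIterations.add(entry.iteration);
  }
  const ratio = totalIterations > 0 ? fallbackIterations.size / totalIterations : 0;
  // governance §1f target: strictly below 10% per mission
  return { fallbacks: fallbackIterations.size, ratio, withinTarget: ratio < 0.1 };
}
```

Note the strict comparison: exactly 10% (2 of 20 iterations) does not meet the `< 10%` target.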
### Fix R6 (P1-F): test-spec ≥3/AC enforcement (warn by default, strict opt-in)

**Targets**:
1. `src/scripts/init_ralph_desk.zsh`: a `_lint_test_density()` helper that
   - extracts the PRD AC count (per US, regex `^- AC[0-9]+:`),
   - extracts the test-spec test count (per US, counting `^### Test ` or `^\*\*T-` headers),
   - when ratio < 3: WARN (default) → log_warn + audit + **a summary at the end of init's exit output (Critic R6 patch)**; STRICT (`--test-density-strict`) → exit 1.
2. `src/scripts/run_ralph_desk.zsh` + `src/node/run.mjs`: `--test-density-strict` flag stub.
3. governance §7f: Test Density Enforcement (WARN + STRICT decision tree).
4. Worker prompt: strengthen the "≥3 tests/AC (happy + negative + boundary) required" rule.

**Verification (us018), absorbing the Critic R6 patch**:
- AC1: `--test-density-strict` flag parsing (zsh + Node)
- AC2: WARN default: on a ratio<3 fixture, init exits 0, writes an audit log entry, **and the last line of stderr/stdout contains the message "Test density warning: US-XXX has N tests for M ACs (ratio=N/M < 3)"**
- AC3: STRICT: on a ratio<3 fixture, init exits 1 with the same message
- AC4: governance §7f text is consistent (decision tree, no downgrade)

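A rough JavaScript model of `_lint_test_density()` can help predict which fixtures should warn. It reuses the plan's two regexes and its warning format, but it assumes each user story in both the PRD and the test-spec starts with a `## US-NNN` heading; the plan only says counts are per US, so that sectioning rule is an assumption, and the function names are hypothetical.

```javascript
// Split markdown into per-US line arrays, assuming each story starts with "## US-NNN".
function sectionsByUs(markdown) {
  const sections = {};
  let current = null;
  for (const line of markdown.split("\n")) {
    const m = line.match(/^##\s+(US-\d+)/); // "### Test ..." does not match: third char is '#'
    if (m) {
      current = m[1];
      sections[current] = sections[current] || [];
    } else if (current) {
      sections[current].push(line);
    }
  }
  return sections;
}

// Returns one warning string per US whose tests/AC ratio is below minRatio.
function lintTestDensity(prdText, testSpecText, minRatio = 3) {
  const prd = sectionsByUs(prdText);
  const spec = sectionsByUs(testSpecText);
  const warnings = [];
  for (const [us, lines] of Object.entries(prd)) {
    const acs = lines.filter((l) => /^- AC[0-9]+:/.test(l)).length;
    const tests = (spec[us] || []).filter((l) => /^### Test |^\*\*T-/.test(l)).length;
    if (acs > 0 && tests / acs < minRatio) {
      warnings.push(`Test density warning: ${us} has ${tests} tests for ${acs} ACs (ratio=${tests}/${acs} < ${minRatio})`);
    }
  }
  return warnings;
}
```

In WARN mode these strings would only be printed; in STRICT mode a non-empty result would map to init's exit 1.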
### Fix R7 (P1-G): verify_partial signal vocabulary

**Targets (Critic R7 + Architect issue #2 patches)**:
1. `src/scripts/init_ralph_desk.zsh:448` Signal rules: define verify_partial and its required fields.
2. The `build_verifier_prompt` function in `src/scripts/init_ralph_desk.zsh` (or the equivalent prompt heredoc): add this exact sentence:
   ```
   If signal status=verify_partial, evaluate ONLY verified_acs. Treat deferred_acs as out-of-scope (not fail).
   ```
3. Signal parsing in `src/node/runner/campaign-main-loop.mjs`: when the status is verify_partial and `verified_acs` is missing or an empty array:
   ```js
   // Malformed verify_partial: downgrade to a blocked sentinel instead of verifying nothing.
   const verified = signal.verified_acs;
   if (signalStatus === 'verify_partial' && (!Array.isArray(verified) || verified.length === 0)) {
     await writeSentinel(blockedSentinel, 'blocked', usId, 'verify_partial_malformed', {
       reason_category: 'mission_abort',
       recoverable: true,
       suggested_action: 'retry_after_fix',
     });
     continue; // skip verification for this iteration
   }
   ```
4. `src/scripts/run_ralph_desk.zsh:1313+`: equivalent verify_partial handling (zsh-side fallback).
5. New governance §7g: Signal Vocabulary Extension, including the malformed downgrade.

**Verification (us019)**:
- AC1: Signal rules grep for verify_partial + verified_acs/deferred_acs/defer_reason
- AC2: governance §7g is consistent and states the malformed downgrade
- AC3: for verify_partial, the Node parser passes only verified_acs to the verifier prompt (behavioural fixture)
- AC4: the zsh parser recognizes verify_partial
- **AC5 (Architect issue #2)**: malformed fixture (verify_partial + verified_acs=[]) → blocked sentinel written with reason='verify_partial_malformed' and reason_category='mission_abort'
- **AC6 (Critic R7)**: the exact sentence is present in the Verifier prompt (grep)

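The verifier's scoping rule and the malformed downgrade can be expressed as one pure function, which makes AC3 and AC5 easy to unit-test. This hypothetical sketch also reports ACs that appear in neither list (`unaccounted`). The plan does not address that case, so flagging those ACs, rather than silently excusing them, is an assumption in keeping with the fail-loud principle.

```javascript
// Hypothetical helper: decide which ACs the verifier evaluates for one iter-signal.
function verifierScope(signal, allAcs) {
  if (signal.status !== "verify_partial") {
    return { status: signal.status, evaluate: allAcs, outOfScope: [], unaccounted: [] };
  }
  const verified = signal.verified_acs;
  if (!Array.isArray(verified) || verified.length === 0) {
    // Malformed: downgrade to blocked rather than verifying nothing.
    return { status: "blocked", reason: "verify_partial_malformed", evaluate: [], outOfScope: [], unaccounted: [] };
  }
  const deferred = Array.isArray(signal.deferred_acs) ? signal.deferred_acs : [];
  return {
    status: "verify_partial",
    evaluate: allAcs.filter((ac) => verified.includes(ac)),                        // "evaluate ONLY verified_acs"
    outOfScope: allAcs.filter((ac) => deferred.includes(ac) && !verified.includes(ac)), // deferred: not a fail
    unaccounted: allAcs.filter((ac) => !verified.includes(ac) && !deferred.includes(ac)),
  };
}
```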
### Fix R8 (P1-H): Blocked exit hygiene + helper-side guard

**Targets (Critic R8 + Architect issue #3 patches)**:
1. `src/scripts/init_ralph_desk.zsh` worker prompt: a Blocked exit hygiene section:
   > "On blocked exit (status=blocked): BEFORE writing iter-signal.json, ALWAYS append to memory.md § Blocking History `{iter, us, reason, suggested_repair}` AND update latest.md § Known Issues."
2. **`src/scripts/lib_ralph_desk.zsh:write_blocked_sentinel` (Critic R8 patch)**: a hygiene check right before the sentinel is written:
   ```zsh
   local hygiene_violated=false
   local mem_file="$DESK/memos/$SLUG-memory.md"
   local lat_file="$DESK/context/$SLUG-latest.md"
   local now_ts=$(date +%s)
   local f f_mtime
   for f in "$mem_file" "$lat_file"; do
     if [[ -f "$f" ]]; then
       # Try GNU stat first: on Linux, `stat -f` reports filesystem status, so the
       # BSD/macOS form `stat -f %m` must be the fallback, not the first attempt.
       f_mtime=$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f" 2>/dev/null || echo 0)
       if (( now_ts - f_mtime > 300 )); then
         hygiene_violated=true
         break
       fi
     fi
   done
   ```
   Then automatically attach `meta.blocked_hygiene_violated=$hygiene_violated` to the JSON sidecar.
3. `_checkBlockedHygiene()` helper in `src/node/runner/campaign-main-loop.mjs`: the equivalent check on blocked writes, plus an analytics event.
4. governance §1f: add "5th channel: memory.md/latest.md hygiene update" (4 channels → 5 channels).

**Verification (us020)**:
- AC1: worker prompt grep for "Blocked exit hygiene" + "memory.md" + "latest.md"
- AC2: governance §1f grep for "5th channel" + "memory.md/latest.md hygiene"
- AC3: Node helper `_checkBlockedHygiene` is defined (grep)
- AC4: behavioural: fixture with a stale memory.md (mtime > 5 min ago) → the blocked sentinel's JSON sidecar has meta.blocked_hygiene_violated=true
- **AC5 (Critic R8)**: grep that `write_blocked_sentinel` in lib_ralph_desk.zsh attaches hygiene_violated automatically, plus a behavioural fixture

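A Node counterpart to the zsh check, in the spirit of the planned `_checkBlockedHygiene()`, could look like the sketch below. The name `checkBlockedHygiene`, its return shape, and the injectable `nowMs` parameter are hypothetical; the 300-second window and the skipping of missing files mirror the zsh snippet.

```javascript
const fs = require("fs");

// Flags a blocked exit when memory.md or latest.md exists but was last modified
// more than maxAgeSec before nowMs. Missing files are skipped, as in the zsh version.
function checkBlockedHygiene(files, nowMs = Date.now(), maxAgeSec = 300) {
  const stale = [];
  for (const file of files) {
    let stat;
    try {
      stat = fs.statSync(file);
    } catch (err) {
      if (err.code === "ENOENT") continue; // absent file: not a hygiene violation
      throw err;
    }
    if ((nowMs - stat.mtimeMs) / 1000 > maxAgeSec) stale.push(file);
  }
  return { blocked_hygiene_violated: stale.length > 0, stale };
}
```

Taking `nowMs` as a parameter keeps AC4's "mtime > 5 min ago" fixture deterministic: the test can backdate a file with `fs.utimesSync` instead of sleeping.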
### Fix R9 (P2-I): consecutive_blocks counter + canonicalization + edge cases

**Targets (Critic R9 + Architect issue #3 patches)**:
1. Variables in `src/scripts/run_ralph_desk.zsh`:
   ```zsh
   CONSECUTIVE_BLOCKS=0
   LAST_BLOCK_REASON=""
   BLOCK_CB_THRESHOLD="${BLOCK_CB_THRESHOLD:-3}"
   ```
2. **`_canonical_block_reason()` helper (Architect issue #3)**:
   ```zsh
   _canonical_block_reason() {
     local raw="$1"
     # Strip wrapper prefixes, then cap at 80 bytes.
     # print -r: zsh's echo interprets backslash escapes by default.
     print -r -- "$raw" | sed -E 's/^(hygiene_violated:|wrapped:)//' | head -c 80
   }
   ```
3. **Edge case exemption (Critic R9)**: blocks in the `infra_failure` category, or on the first iteration, do not increment the counter:
   ```zsh
   if [[ "$reason_category" == "infra_failure" ]] || (( ITERATION <= 1 )); then
     # Exempt from consecutive_blocks
     LAST_BLOCK_REASON=""
     CONSECUTIVE_BLOCKS=0
   else
     local canonical=$(_canonical_block_reason "$reason")
     if [[ "$canonical" == "$LAST_BLOCK_REASON" ]]; then
       CONSECUTIVE_BLOCKS=$((CONSECUTIVE_BLOCKS + 1))
     else
       CONSECUTIVE_BLOCKS=1
       LAST_BLOCK_REASON="$canonical"
     fi
     if (( CONSECUTIVE_BLOCKS >= BLOCK_CB_THRESHOLD )); then
       # Build the JSON with jq so quotes or backslashes in the reason cannot corrupt it.
       jq -nc --argjson count "$CONSECUTIVE_BLOCKS" --arg last "$LAST_BLOCK_REASON" \
         '{reason: "consecutive_blocks", count: $count, last_reason: $last}' \
         | atomic_write "$DESK/.sisyphus/mission-abort.json"
       exit 1
     fi
   fi
   ```
4. `src/node/runner/campaign-main-loop.mjs`: equivalent logic (state.consecutive_blocks + last_block_reason + canonicalReason).
5. governance §8: document consecutive_blocks, canonicalization, and the exemptions.

**Verification (us021)**:
- AC1: BLOCK_CB_THRESHOLD variable defined (default 3)
- AC2: zsh same-reason counter logic
- AC3: governance §8 text is consistent
- AC4: behavioural: mission-abort.json is created after 3 BLOCKs with the same reason
- **AC5 (Architect issue #3)**: `_canonical_block_reason` helper defined + hygiene_violated prefix stripping verified
- **AC6 (Critic R9)**: first-iteration exemption fixture (ITERATION=1, reason="setup_fail") → CONSECUTIVE_BLOCKS stays 0
- **AC7 (Critic R9)**: infra_failure category exemption fixture → CONSECUTIVE_BLOCKS stays 0

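The counter, canonicalization, and exemptions can be modelled as a pure state transition, which is one way the Node-side `state.consecutive_blocks` logic might be unit-tested. The function names and state field names are hypothetical. One detail differs from the zsh helper: `head -c 80` truncates at 80 bytes, while `String.prototype.slice` counts UTF-16 code units, so non-ASCII reasons could canonicalize differently in the two runners.

```javascript
// Strip one wrapper prefix, then cap the length (mirrors _canonical_block_reason).
function canonicalBlockReason(raw) {
  return String(raw).replace(/^(hygiene_violated:|wrapped:)/, "").slice(0, 80);
}

// Hypothetical transition: returns the next counter state and whether to abort.
// state: { consecutiveBlocks, lastBlockReason }
function recordBlock(state, { reason, category, iteration }, threshold = 3) {
  if (category === "infra_failure" || iteration <= 1) {
    // Exempt: reset, as the zsh snippet does.
    return { state: { consecutiveBlocks: 0, lastBlockReason: "" }, abort: false };
  }
  const canonical = canonicalBlockReason(reason);
  const consecutiveBlocks = canonical === state.lastBlockReason ? state.consecutiveBlocks + 1 : 1;
  return {
    state: { consecutiveBlocks, lastBlockReason: canonical },
    abort: consecutiveBlocks >= threshold,
  };
}
```

Because `hygiene_violated:` and `wrapped:` prefixes collapse to the same canonical reason, an R8-tagged block still counts toward the R9 threshold, which is the bypass AC5 guards against.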
### Fix R10 (P2-J): Cross-mission us_id leak + normalized extractor + quarantine

**Targets (Critic R10 + Architect issue #4 patches)**:
1. `src/scripts/init_ralph_desk.zsh` mission init: detect and scrub a stale us_id:
   ```zsh
   if [[ -f "$SIGNAL_FILE" ]]; then
     stale_us=$(jq -r '.us_id // empty' "$SIGNAL_FILE" 2>/dev/null)
     if [[ -n "$stale_us" && "$stale_us" != "ALL" ]]; then
       # Critic R10: normalized US extractor (portable across GNU and BSD tools)
       prd_us_list=$(grep -oE '^##[[:space:]]+US-[0-9]+([[:space:]:-]|$)' "$PRD_FILE" 2>/dev/null | grep -oE 'US-[0-9]+' | sort -u)
       if ! echo "$prd_us_list" | grep -qx "$stale_us"; then
         # Architect issue #4: quarantine, not rm
         mkdir -p "$DESK/.sisyphus/quarantine"
         mv "$SIGNAL_FILE" "$DESK/.sisyphus/quarantine/iter-signal.$(date +%s).json"
         log " Cross-mission stale us_id ($stale_us) — quarantined to .sisyphus/quarantine/"
       fi
     fi
   fi
   ```
   The gawk form `awk 'match($0, /^##[[:space:]]+(US-[0-9]+)([[:space:]:-]|$)/, m) { print m[1] }'` is not used because BSD awk does not support the 3-argument `match()`. Use `grep -oE` plus post-processing as above, or an awk `match()` + `RSTART`/`RLENGTH` + `substr()` pattern.
2. `src/scripts/run_ralph_desk.zsh:2425-2429` final ALL verify scope: trust only US_LIST (if signal_us_id is not in US_LIST, ignore it and warn).
3. `src/node/runner/campaign-main-loop.mjs`: equivalent handling.
4. governance §7a: document the cross-mission us_id leak defense and the quarantine path.

**Verification (us022)**:
- AC1: stale us_id detection at init + quarantine helper (grep + behavioural)
- AC2: the zsh runner's final ALL verify trusts US_LIST
- AC3: governance §7a text is consistent and names the quarantine path
- AC4: behavioural: fixture mission PRD (US-001 to US-003) + stale signal us_id=US-005 → SIGNAL_FILE moved to quarantine, and the file exists in .sisyphus/quarantine/
- **AC5 (Architect issue #4)**: no `rm -f` (`grep -n "rm -f.*SIGNAL_FILE" src/scripts/init_ralph_desk.zsh` returns 0 matches)
- **AC6 (Critic R10)**: PRD heading variation fixture (`## US-005 -`, `## US-005:`, `## US-005`) → all recognized correctly (0 false positives)

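For the Node-side equivalent handling, the same heading normalization might look like this in JavaScript. `extractUsIds` and `isStaleSignal` are hypothetical names; the accepted heading forms are the three AC6 variations, and, like the grep pattern, `### US-…` sub-headings and IDs glued to other characters are rejected.

```javascript
// Accepts "## US-005", "## US-005:", "## US-005 -" and "## US-005 Title";
// rejects "### US-005" (not a level-2 heading) and "## US-0051x".
function extractUsIds(prdText) {
  const ids = new Set();
  const heading = /^##[ \t]+(US-[0-9]+)(?=[\s:-]|$)/gm; // \s also covers \r in CRLF files
  let m;
  while ((m = heading.exec(prdText)) !== null) ids.add(m[1]);
  return [...ids].sort();
}

// A signal is stale when it names a specific story that this mission's PRD lacks.
function isStaleSignal(signalUsId, prdText) {
  if (!signalUsId || signalUsId === "ALL") return false;
  return !extractUsIds(prdText).includes(signalUsId);
}
```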
### Fix R11 (P2-K): Non-empty cost log + trap-based final write + early-exit inventory

**Targets (Critic R11 + Architect issue #6 patches)**:
1. `src/scripts/lib_ralph_desk.zsh:367` write_cost_log: add a note field ('no_actual_usage_recorded' when bytes=0).
2. **`src/scripts/run_ralph_desk.zsh` (Architect issue #6)**: register a trap on entry to the main loop:
   ```zsh
   # Register at script top level: in zsh, an EXIT trap set inside a function
   # runs when that function returns, not when the script exits.
   trap '_emit_final_cost_log' EXIT
   _emit_final_cost_log() {
     [[ -n "${ITERATION:-}" ]] && [[ "${COST_LOG_FINAL_WRITTEN:-0}" -eq 0 ]] && {
       write_cost_log "$ITERATION" 2>/dev/null || true
       COST_LOG_FINAL_WRITTEN=1
     }
   }
   ```
3. **Early-exit path inventory (Critic R11 + Architect NEW-2)**: the us023 regression verifies that every path found by the following broadened grep is covered by the trap:
   ```bash
   grep -nE '^[[:space:]]*(exit\b|return\b|die\b)' src/scripts/run_ralph_desk.zsh src/scripts/lib_ralph_desk.zsh | grep -v '^[^:]*:[^:]*:.*\${' > early_exits.txt
   ```
   The pattern only matches statements at the start of a line, so inline forms such as `cmd || exit 1` are not captured, and the second grep drops every line containing `${`; review the inventory by hand for those cases. If `lib_ralph_desk.zsh` defines a `die` wrapper function, explicitly analyze whether it bypasses the trap and include it in the regression.
4. (Init placeholder removed, Architect issue #6): only the normal path is reinforced, so an empty cost-log is still detected as "broken logging".
5. governance §7 Cost Tracking: document the tmux estimated path and the trap.

**Verification (us023)**:
- AC1: write_cost_log has a note field ('no_actual_usage_recorded' when bytes=0)
- AC2: the zsh runner contains `trap '_emit_final_cost_log' EXIT` (grep)
- AC3: behavioural: cost-log.jsonl is non-empty after write_cost_log is called
- **AC4 (Critic R11)**: early-exit grep inventory + verification that every path is covered by the trap (grep every `exit N` or `return N` location in the scripts and compare against when the trap fires)
- **AC5 (Architect issue #6)**: no init placeholder code remains (grep `placeholder.*cost-log` = 0)

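Node has no EXIT trap, but a `process.on("exit", …)` handler plays a similar role, with the constraint that it must be synchronous. This hypothetical sketch mirrors `_emit_final_cost_log`: a once-only guard, write errors swallowed like `|| true`, and the `no_actual_usage_recorded` note when no bytes were recorded. The entry fields other than `note`, and the function name, are assumptions.

```javascript
const fs = require("fs");

// Returns a function that appends one final cost-log line at most once.
function makeFinalCostLogWriter(logPath, getIteration, getBytes) {
  let written = false; // analogue of COST_LOG_FINAL_WRITTEN
  return function emitFinalCostLog() {
    if (written) return false;
    written = true;
    const bytes = getBytes();
    const entry = { iteration: getIteration(), bytes, final: true };
    if (bytes === 0) entry.note = "no_actual_usage_recorded";
    try {
      // Synchronous on purpose: async work does not finish inside an "exit" handler.
      fs.appendFileSync(logPath, JSON.stringify(entry) + "\n");
      return true;
    } catch {
      return false; // like `|| true`: never let logging change the exit path
    }
  };
}
```

A runner would register it with `process.on("exit", makeFinalCostLogWriter(logPath, getIteration, getBytes))`. Like the zsh trap, it cannot help when the process is killed outright by SIGKILL, and Node also skips `exit` handlers when a signal such as SIGTERM terminates the process without a signal handler installed.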

---

## Self-verification scenarios: mechanical, per row (v2)

`tests/test_self_verification_0_11_handoff.sh`: 7 functions, each proving its fix with pre/post state and grep:

```bash
# pass/fail, DESK, SLUG and LOGS_DIR are provided by the test harness.
test_r5_a4_audit_triggered() {
  local audit="$LOGS_DIR/a4-fallback-audit.jsonl"
  local pre=0 post=0
  [[ -f "$audit" ]] && pre=$(( $(wc -l < "$audit") ))
  # Trigger: simulate a done-claim without an iter-signal
  echo '{"us_id":"US-001","status":"complete"}' > "$DESK/memos/${SLUG}-done-claim.json"
  rm -f "$DESK/memos/${SLUG}-iter-signal.json"
  # NEW-1 (Architect): zsh fixture invocation (run_ralph_desk.zsh is zsh, NOT bash).
  # us017 MUST extract the A4 fallback into a callable helper in lib_ralph_desk.zsh
  # so it can be sourced cleanly. Until then, use zsh -c with explicit exports,
  # so the variables are still set when the helper runs.
  zsh -c "export DESK='$DESK' SLUG='$SLUG' ITERATION=1 LOGS_DIR='$LOGS_DIR'; source src/scripts/lib_ralph_desk.zsh; _emit_a4_fallback_audit US-001 1" 2>/dev/null
  [[ -f "$audit" ]] && post=$(( $(wc -l < "$audit") ))
  [[ "$post" -gt "$pre" ]] || { fail "R5 A4 audit not triggered (pre=$pre post=$post)"; return 1; }
  # Mechanical: grep that the patched code path was exercised
  grep -q "a4_fallback" "$audit" || { fail "R5 audit entry missing"; return 1; }
  pass "R5 A4 fallback audit triggered ($pre→$post)"
}

test_r6_test_density_warn() {
  # Fixture: PRD with 3 ACs, test-spec with 1 test
  local init_output init_rc
  init_output=$(zsh src/scripts/init_ralph_desk.zsh --slug test-r6 --prd fixtures/r6-bad-prd.md 2>&1)
  init_rc=$?
  [[ "$init_rc" -eq 0 ]] || { fail "R6 WARN mode should exit 0 (rc=$init_rc)"; return 1; }
  echo "$init_output" | tail -n 1 | grep -q "Test density warning" || { fail "R6 last output line missing warning"; return 1; }
  pass "R6 test density warning emitted (init exit 0)"
}

# ... R7 to R11 follow the same pattern: each function (1) captures the pre-state, (2) invokes the changed code directly, (3) verifies the post-state with grep
```

| Fix | Scenario | Mechanical proof |
|---|---|---|
| R5 P0-D | Write a done-claim with no iter-signal → A4 fallback fires | Compare `wc -l a4-fallback-audit.jsonl` pre/post + grep the entry |
| R6 P1-F | Fixture: 3 ACs with a 1-test test-spec | Grep the "Test density warning" line in the init output |
| R7 P1-G | iter-signal status=verify_partial fixtures (well-formed + malformed) | Grep the verifier prompt for `evaluate ONLY verified_acs` + malformed → blocked sentinel with meta.reason='verify_partial_malformed' |
| R8 P1-H | Blocked sentinel + memory.md unchanged for 5+ min | Extract `meta.blocked_hygiene_violated=true` from the sentinel JSON sidecar with jq |
| R9 P2-I | Same reason BLOCKed 3 times + canonicalization + edge cases | mission-abort.json exists + jq `.count==3` + first-iteration exemption fixture keeps CONSECUTIVE_BLOCKS=0 |
| R10 P2-J | PRD US-001 to US-003 + stale signal us_id=US-005 + heading variations | `.sisyphus/quarantine/iter-signal.*.json` exists + original SIGNAL_FILE absent + all 3 variations recognized |
| R11 P2-K | tmux-mode 5-iteration run + early-exit fixture | `cost-log.jsonl` has ≥ 5 rows, all with a note field + trap firing verified |

**Pass criterion**: 7/7 mechanical proofs, plus grep confirmation that each fix actually invoked its changed function/file (prevents tautological tests).

---

## Files to change

```
src/scripts/init_ralph_desk.zsh # R5(worker + verifier prompts), R6(test density lint + flag), R7(Signal rules + verifier prompt), R8(blocked exit hygiene), R10(stale us_id quarantine)
src/scripts/run_ralph_desk.zsh # R5(A4 audit), R6(--test-density-strict), R7(verify_partial parsing), R9(consecutive_blocks + canonical + exempt), R10(US_LIST scope), R11(trap)
src/scripts/lib_ralph_desk.zsh # R5(_emit_a4_fallback_audit helper), R8(write_blocked_sentinel hygiene_violated), R11(write_cost_log note + bytes=0 path)
src/node/run.mjs # R6(--test-density-strict stub)
src/node/runner/campaign-main-loop.mjs # R7(verify_partial parser + malformed downgrade), R8(_checkBlockedHygiene), R9(consecutive_blocks state), R10(stale us_id scrub)
src/governance.md # R5(§1f A4 metric), R6(§7f Test Density), R7(§7g Signal Vocabulary + malformed), R8(§1f 5th channel), R9(§8 cb + canonicalization + exempt), R10(§7a quarantine), R11(§7 Cost Tracking trap)

[Tests]
tests/test_us017_a4_fallback_audit.sh
tests/test_us018_test_density.sh
tests/test_us019_verify_partial.sh
tests/test_us020_blocked_hygiene.sh
tests/test_us021_consecutive_blocks.sh
tests/test_us022_cross_mission_us_leak.sh
tests/test_us023_cost_log_nonempty.sh
tests/test_self_verification_0_11_handoff.sh # mechanical per-row
```

## Verification (Self-Verification Gate)

1. **LOW**: `zsh -n` + `node --check` (~10s)
2. **MEDIUM**: the 7 new regressions, us017 to us023 (~3min)
3. **CRITICAL**: no regressions in us001/us007/us012/us013/us014/us015/us016 (~3min)
4. **Self-verification mapping scenario**: `test_self_verification_0_11_handoff.sh` with 7/7 mechanical proofs

## Single-PR decision (if the user requests it)

If the user rejects the PR split and explicitly asks for a single PR:
- R5+R6+R7 (protocol) + R8+R9+R10+R11 (runtime) go into one PR
- the self-verification scenarios cover both areas, so the guarantees still hold
- however, reaching codex review iteration 5+ automatically triggers a fallback to the split

## ADR (brief)

- **Decision**: 7 fixes. v2 patches: PR split recommended (single PR if the user asks), R7 schema fallback (verify_partial_malformed downgrade), R8 helper-side hygiene check, R9 canonical reason + edge-case exemptions, R10 normalized extractor + quarantine, R11 trap-based final write + early-exit inventory, mechanical per-row self-verification.
- **Drivers**: make silent failures visible + backward-compat + minimal blast radius + mechanical self-verification.
- **Alternatives considered**: the per-R comparison table from v1, plus the new v2 patches.
- **Consequences**: merge PR-A first and let it soak, then PR-B (recommended). A single PR is also possible. Worker prompts grow slightly longer. Many test-spec WARNs may appear (strictness is tightened gradually).
- **Follow-ups**: make test-density STRICT the default (v0.12+), automatically prioritize retrying verify_partial deferred_acs, and hard-fail on any A4 fallback (0% target).