@ai-dev-methodologies/rlp-desk 0.14.5 → 0.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,234 @@
1
+ # Bug Report #7 — Post-Sentinel Process Race Fix
2
+
3
+ ## Context
4
+
5
+ Race window measured by a BOS user on the 19th launch:
6
+ - The iter-1 verifier rewrote `verify-verdict.json` **1m 43s** after the verdict was detected (evidence: file mtime).
7
+ - iter-1 verifier follow-up activity after the verdict: **2m 1s**.
8
+ - iter-1 verifier and iter-2 worker running concurrently: roughly **2 min**.
9
+
10
+ Bug report:
11
+ `/Users/kyjin/dev/doul/bos/docs/exec-plans/active/2026-05-06-rlp-desk-bug-report-7-post-sentinel-process-race.md`
12
+
13
+ ### Root cause
14
+
15
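+ The Leader advances to the next iter as soon as it finds `iter-signal.json` / `verify-verdict.json`, but the Worker/Verifier process (claude/codex TUI) that wrote that sentinel is **never explicitly terminated**. The tmux pane stays alive, and the TUI returns to its idle prompt and then runs its own self-review, which rewrites the sentinel, pollutes the working tree, and wastes tokens.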
16
+
17
+ ### Affected modes (important)
18
+
19
+ **Both** `--mode tmux` (zsh runner) and `--mode agent` (Node leader) are affected. The Node leader also runs the worker/verifier inside real tmux panes via `defaultSendKeys`/`defaultCreatePane` (`src/node/tmux/pane-manager.mjs`) (`src/node/runner/campaign-main-loop.mjs:1077-1080`, `1116-1133`). The initial hypothesis that agent mode is immune was wrong.
20
+
21
+ ### Asymmetry (current state)
22
+
23
+ | Path | Worker post-processing | Verifier post-processing |
24
+ |---|---|---|
25
+ | Node leader | None | None |
26
+ | zsh runner | Cleanup at the start of the next iter (`run_ralph_desk.zsh:2948-2956`); race window 5s+ | Cleanup right before dispatch (`3160-3180`); protected within the same iter, but unprotected after the final iter ends and against cross-iter races |
27
+
28
+ ---
29
+
30
+ ## Approach (Fix-Q + Fix-R, minimal surgical combination)
31
+
32
+ | Fix | Effect | Adopted |
33
+ |---|---|---|
34
+ | **Q** On sentinel detect, immediately send Ctrl+C to the producing pane → process exits | Directly closes the race within ~1s | **YES (primary)** |
35
+ | **R** chmod the sentinel file to 0444 to block rewrites | Freezes the mtime even if Q is late or fails | **YES (defense-in-depth)** |
36
+ | S Full refactor of the pane lifecycle | Effective, but the surface is too large. Partially covered by the existing prep cleanup (zsh 2948-2956). Violates the Karpathy "surgical changes" principle | NO |
37
+ | T 30s post-sentinel safety-net timeout | Redundant: Q is fail-open and the next iter's prep cleanup is the backup | NO |
38
+
39
+ Rationale:
40
+ - Q kills the producer within ~1s, cutting off the root cause. It mirrors an existing pattern exactly (zsh `run_ralph_desk.zsh:2384-2397`: double Ctrl+C + `wait_for_pane_ready`).
41
+ - R tolerates chmod failures (EPERM/ENOTSUP are ignored, following the `tryLockFile` precedent in `scripts/postinstall.js:104`). It degrades gracefully in environments where chmod is a no-op, such as WSL1/NTFS/tmpfs.
42
+ - Dropping S/T keeps the review surface minimal.
43
+
44
+ ---
45
+
46
+ ## Concrete code changes
47
+
48
+ ### Node leader
49
+
50
+ #### 1. `src/node/tmux/pane-manager.mjs`: add helpers (after line 77)
51
+
52
+ New exports:
53
+ - `sendRawKey(paneId, key)`: `runTmux(['send-keys', '-t', paneId, key])`. Kept separate from `sendKeys` (which sends literal text via `-l --`); it is for raw keys such as C-c.
54
+ - `killPaneProcess(paneId, { sendRawKey, waitForExit, gracePeriodMs=800, exitTimeoutMs=5000, log })`:
55
+ 1. `sendRawKey('C-c')` → `await sleep(gracePeriodMs)` → `sendRawKey('C-c')` (double press, mirroring zsh `375-376`).
56
+ 2. `await waitForExit(paneId, { timeoutMs: exitTimeoutMs }).catch(log)` — fail-open.
57
+ 3. A TmuxError from sending the raw key itself is also caught and logged (safe against an already-dead pane).
58
+
59
+ The existing `waitForProcessExit` (line 55) is reused as-is.
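The double-Ctrl+C reap described above can be sketched as follows. This is a minimal, hypothetical version that assumes the injected `sendRawKey(paneId, key)` and `waitForExit(paneId, { timeoutMs })` shapes from the plan plus a local `sleep` helper; it is not the shipped `pane-manager.mjs` code. Both phases are fail-open: errors are logged, never rethrown.

```js
// Hypothetical sketch of killPaneProcess: double Ctrl+C, then wait for exit.
// Dependencies are injected as in the plan, so no tmux is needed to run it.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function killPaneProcess(paneId, {
  sendRawKey,
  waitForExit,
  gracePeriodMs = 800,
  exitTimeoutMs = 5000,
  log = console.error,
} = {}) {
  try {
    await sendRawKey(paneId, 'C-c'); // first press: interrupt (typical TUI behavior, assumed)
    await sleep(gracePeriodMs);
    await sendRawKey(paneId, 'C-c'); // second press: ask the TUI to exit (assumed)
  } catch (err) {
    // Already-dead pane or tmux error: log and continue (fail-open).
    log(`[bug7] sendRawKey failed for ${paneId}: ${err?.message ?? err}`);
  }
  try {
    await waitForExit(paneId, { timeoutMs: exitTimeoutMs });
  } catch (err) {
    // Timeout or TmuxError while waiting: log and continue (fail-open).
    log(`[bug7] waitForExit failed for ${paneId}: ${err?.message ?? err}`);
  }
}
```

If the first key press fails (for example, the pane is already gone), this sketch skips the second press but still runs the exit wait, so the caller always gets a resolved promise.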
60
+
61
+ #### 2. `src/node/shared/fs.mjs`: add helpers (after line 61)
62
+
63
+ - `lockSentinelFile(filePath, { log })`: `fs.chmod(filePath, 0o444)`; on error, log a warning only once. Mirrors the `tryLockFile` precedent (`scripts/postinstall.js:104`).
64
+ - `unlockSentinelFile(filePath)`: `fs.chmod(filePath, 0o644)`; failures are ignored. Called right before iter cleanup.
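A minimal sketch of the two chmod helpers, assuming `node:fs/promises` and a module-level flag for the warn-once behavior (the flag name `lockWarned` is illustrative). Locking logs at most one warning across all calls and never throws; unlocking swallows every error.

```js
const fs = require('node:fs/promises');

// Illustrative module-level flag: lockSentinelFile warns at most once.
let lockWarned = false;

// Make the sentinel read-only (0444) so in-place rewrites by non-root writers fail.
async function lockSentinelFile(filePath, { log = console.error } = {}) {
  try {
    await fs.chmod(filePath, 0o444);
  } catch (err) {
    // EPERM, ENOTSUP, ENOENT, ...: never throw, warn once.
    if (!lockWarned) {
      lockWarned = true;
      log(`[bug7] could not lock ${filePath}: ${err.code ?? err.message}`);
    }
  }
}

// Restore owner write (0644) before cleanup; failures are ignored.
async function unlockSentinelFile(filePath) {
  await fs.chmod(filePath, 0o644).catch(() => {});
}
```

When asserting the read-only mode in JavaScript, parenthesize the mask: `===` binds tighter than `&`, so `mode & 0o222 === 0` evaluates to `mode & false`, which is always `0`.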
65
+
66
+ #### 3. `src/node/runner/campaign-main-loop.mjs`: wiring + call sites
67
+
68
+ Add DI slots (line 1077-1080):
69
+ ```js
70
+ const sendRawKey = options.sendRawKey ?? defaultSendRawKey;
71
+ const waitForProcessExit = options.waitForProcessExit ?? defaultWaitForProcessExit;
72
+ const killPaneProcess = options.killPaneProcess ?? defaultKillPaneProcess;
73
+ const lockSentinel = options.lockSentinelFile ?? defaultLockSentinelFile;
74
+ ```
75
+
76
+ Internal wrapper:
77
+ ```js
78
+ async function reapProducer(paneId, sentinelFile) {
79
+ await killPaneProcess(paneId, { sendRawKey, waitForExit: waitForProcessExit, log: console.error }).catch(console.error); // fail-open even for an injected helper that throws (Bug-7-C)
80
+ if (sentinelFile) await lockSentinel(sentinelFile, { log: console.error });
81
+ }
82
+ ```
83
+
84
+ Call sites (right after a successful poll that passes `validateArtifact`):
85
+
86
+ | Site | Line | Call |
87
+ |---|---|---|
88
+ | Flywheel poll | after 1267-1277 (before 1285) | `reapProducer(state.flywheel_pane_id ?? state.verifier_pane_id, paths.flywheelSignalFile)` |
89
+ | Guard poll | after 1305-1315 (before 1323) | `reapProducer(guardPaneId, paths.flywheelGuardVerdictFile)` |
90
+ | Worker poll | after 1422-1432 (before 1456) | `reapProducer(state.worker_pane_id, paths.signalFile)` |
91
+ | Verifier poll | after 1489-1513 (before 1522) | `reapProducer(state.verifier_pane_id, paths.verdictFile)` |
92
+ | Final per-US verifier (`runFinalSequentialVerify`) | after 890-894 (before 896) | `reapProducer(verifierPaneId, paths.verdictFile)`; add `reapProducer` to the `runFinalSequentialVerify` signature and pass it from the caller (1185-1194) |
93
+
94
+ iter cleanup unlock: call `unlockSentinelFile` right before each `fs.unlink(...)`:
95
+ - L1291 (`flywheelSignalFile`)
96
+ - L1328 (`flywheelGuardVerdictFile`)
97
+ - Top of the loop (right after 1145): defensive unlock of the Worker `signalFile` / Verifier `verdictFile` (in case the next iter's producer overwrites them via atomic rename)
98
+
99
+ ### zsh runner
100
+
101
+ #### 4. `src/scripts/lib_ralph_desk.zsh`: add helpers (after `atomic_write`, line 245)
102
+
103
+ ```zsh
104
+ _kill_pane_process() {
105
+ local pane_id="$1" role="${2:-producer}"
106
+ log_debug "[bug7] kill_pane_process pane=$pane_id role=$role"
107
+ tmux send-keys -t "$pane_id" C-c 2>/dev/null
108
+ sleep 0.5
109
+ tmux send-keys -t "$pane_id" C-c 2>/dev/null
110
+ sleep 1
111
+ wait_for_pane_ready "$pane_id" 5 2>/dev/null || true
112
+ }
113
+
114
+ _lock_sentinel() {
115
+ local file="$1"
116
+ [[ -f "$file" ]] || return 0
117
+ chmod 0444 "$file" 2>/dev/null || true
118
+ }
119
+
120
+ _unlock_sentinel() {
121
+ local file="$1"
122
+ [[ -f "$file" ]] || return 0
123
+ chmod 0644 "$file" 2>/dev/null || true
124
+ }
125
+ ```
126
+
127
+ #### 5. `src/scripts/run_ralph_desk.zsh`: call sites
128
+
129
+ | Site | Line | Call |
130
+ |---|---|---|
131
+ | Right after a successful Worker poll | 3003 (inside the `worker_poll_done=1` branch, after `log_debug`) | `_kill_pane_process "$WORKER_PANE" "worker"; _lock_sentinel "$SIGNAL_FILE"` |
132
+ | Right after a successful Verifier poll (main path) | after 3202, before 3215 (`ITER_VERIFIER_END`) | `_kill_pane_process "$VERIFIER_PANE" "verifier"; _lock_sentinel "$VERDICT_FILE"` |
133
+ | Final-verify per-US (`run_sequential_final_verify`) | after 2524, before entering the next iter | `_kill_pane_process "$VERIFIER_PANE" "verifier-final"; _lock_sentinel "$VERDICT_FILE"` |
134
+ | Codex grace path | `dispatch_verifier_per_us` (right after the grace period ends at 2420, before the `cp` at 2471) | `_kill_pane_process "$VERIFIER_PANE" "verifier-${suffix}"; _lock_sentinel "$VERDICT_FILE"` |
135
+ | Consensus path | Right after each successful `poll_for_signal` in `run_consensus_verification` | Same pattern |
136
+
137
+ prep cleanup unlock: right before the cleanup at line 2948-2956:
138
+ ```zsh
139
+ _unlock_sentinel "$SIGNAL_FILE"; _unlock_sentinel "$VERDICT_FILE"
140
+ rm -f "$SIGNAL_FILE" "$DONE_CLAIM_FILE" "$VERDICT_FILE" 2>/dev/null
141
+ ```
142
+
143
+ ---
144
+
145
+ ## Files to modify
146
+
147
+ | File | Change |
148
+ |---|---|
149
+ | `src/node/tmux/pane-manager.mjs` | Add `sendRawKey`, `killPaneProcess` exports |
150
+ | `src/node/shared/fs.mjs` | Add `lockSentinelFile`, `unlockSentinelFile` exports |
151
+ | `src/node/runner/campaign-main-loop.mjs` | DI + `reapProducer` + 5 call sites + iter cleanup unlock |
152
+ | `src/scripts/lib_ralph_desk.zsh` | Add `_kill_pane_process`, `_lock_sentinel`, `_unlock_sentinel` |
153
+ | `src/scripts/run_ralph_desk.zsh` | 4-5 call sites + prep cleanup unlock |
154
+ | `tests/node/us006-campaign-main-loop.test.mjs` | Add `killPaneProcess`/`lockSentinelFile` recorders to `createTmuxFakes()` + 3 Bug-7 tests |
155
+ | `tests/node/test-kill-pane-process.test.mjs` | NEW: helper unit tests |
156
+ | `tests/node/test-lock-sentinel-file.test.mjs` | NEW: chmod unit tests |
157
+ | `tests/test-bug7-post-sentinel-race.sh` | NEW: real-tmux integration test (mirrors the Bug #6 pattern) |
158
+
159
+ Ship as a single PR (the helpers are no-ops without call sites, so the review surface is small).
160
+
161
+ ---
162
+
163
+ ## Reused functions (reference)
164
+
165
+ - Node: `pane-manager.mjs:50` `sendKeys`, `pane-manager.mjs:55` `waitForProcessExit` (5s timeout, shell detection)
166
+ - Node: `shared/fs.mjs:6-23` `writeFileAtomic`, `42-61` `writeSentinelExclusive`
167
+ - Node: `scripts/postinstall.js:104` `tryLockFile` (chmod 0o444 precedent)
168
+ - zsh: `lib_ralph_desk.zsh:240-245` `atomic_write`, `1075-1137` `wait_for_pane_ready`
169
+ - zsh: `run_ralph_desk.zsh:2384-2397` proven verifier-cleanup pattern (Ctrl+C + /exit + wait); `375-376/529-530` double Ctrl+C pattern
170
+
171
+ ---
172
+
173
+ ## Testing strategy
174
+
175
+ ### Unit tests (Node)
176
+
177
+ `tests/node/test-kill-pane-process.test.mjs` (NEW):
178
+ - AC1 happy path: order is C-c → sleep → C-c → waitForExit (verified with a fake recorder).
179
+ - AC2 fail-open: the helper resolves when `waitForExit` throws a TmuxError.
180
+ - AC3 dead pane: the helper resolves when `sendRawKey` throws.
181
+ - AC4 grace: `gracePeriodMs` is honored (verified with a fake clock or a timing tolerance).
182
+
183
+ `tests/node/test-lock-sentinel-file.test.mjs` (NEW):
184
+ - AC1: after lock, `(mode & 0o222) === 0` (skipped on filesystems that ignore chmod).
185
+ - AC2: locking a nonexistent path does not throw.
186
+ - AC3: the file is writable after unlock.
187
+
188
+ ### Integration tests (Node)
189
+
190
+ Extend `tests/node/us006-campaign-main-loop.test.mjs`:
191
+ 1. **Bug-7-A**: after Worker `pollForSignal` succeeds, assert that `killPaneProcess('%worker')` and then `lockSentinelFile(signalFile)` are called before the next `dispatchVerifier`.
192
+ 2. **Bug-7-B**: after a Verifier pass verdict, `killPaneProcess('%verifier')` + `lockSentinelFile(verdictFile)` are called before the next iter's `dispatchWorker`.
193
+ 3. **Bug-7-C**: `run()` completes normally even when `killPaneProcess` throws.
194
+
195
+ Add fake `killPaneProcess`/`lockSentinelFile` recorders to `createTmuxFakes()` (line 83), keeping the existing 30+ tests compatible.
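The call-order assertions for Bug-7-A and Bug-7-C can be illustrated with a small recorder. Everything here is a hypothetical stand-in (`createReapFakes`, `makeReapProducer`, `workerHandoff`), not the real `createTmuxFakes()` or campaign loop; it assumes a `reapProducer` that catches a throwing kill, which is what Bug-7-C requires of `run()`.

```js
// Hypothetical recorder fakes: every call is appended to `calls` in order.
function createReapFakes({ killThrows = false } = {}) {
  const calls = [];
  return {
    calls,
    async killPaneProcess(paneId) {
      calls.push(['killPaneProcess', paneId]);
      if (killThrows) throw new Error('pane gone');
    },
    async lockSentinelFile(file) {
      calls.push(['lockSentinelFile', file]);
    },
    async dispatchVerifier() {
      calls.push(['dispatchVerifier']);
    },
  };
}

// Stand-in for the plan's reapProducer wrapper, built from injected helpers.
function makeReapProducer({ killPaneProcess, lockSentinelFile, log = () => {} }) {
  return async function reapProducer(paneId, sentinelFile) {
    // Fail-open so a throwing kill cannot abort the campaign loop (Bug-7-C).
    await killPaneProcess(paneId).catch(log);
    if (sentinelFile) await lockSentinelFile(sentinelFile, { log });
  };
}

// One simulated Worker → Verifier handoff: reap first, then dispatch.
async function workerHandoff(fakes) {
  const reapProducer = makeReapProducer(fakes);
  await reapProducer('%worker', 'iter-signal.json');
  await fakes.dispatchVerifier();
}
```

Asserting on the whole `calls` array, rather than on individual call counts, is what makes the ordering (reap before dispatch) a regression guard.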
196
+
197
+ ### Integration tests (zsh)
198
+
199
+ `tests/test-bug7-post-sentinel-race.sh` (NEW, mirrors the `test-bug6-worker-idle-false-positive.sh` pattern):
200
+ - Scenario 1: start `sleep 600` in a tmux session and call `_kill_pane_process` → `pane_current_command` returns to zsh/bash within 2s.
201
+ - Scenario 2: `_lock_sentinel` → assert mode 0444 → `_unlock_sentinel` → assert writable → `rm -f` succeeds.
202
+ - Scenario 3 (gated by REAL_E2E): 1-iter campaign + stub claude (writes the sentinel, then sleeps 120s) → after 10s, the verdict file's mtime delta == 0.
203
+
204
+ ### Self-Verification scenarios (CLAUDE.md gate, 3 required)
205
+
206
+ This modifies `src/scripts/run_ralph_desk.zsh`, so the risk is MEDIUM-HIGH:
207
+ - **LOW**: helper unit tests + existing Node/zsh regression tests pass.
208
+ - **MEDIUM**: a real 1-iter campaign. Capture `pane_current_command` at the Worker → Verifier transition and verify it returns to a shell within 2s. Verify that the verdict file's mtime stays frozen.
209
+ - **CRITICAL**: a 2-iter campaign (verify→fail→verify→pass). The iter-N+1 worker dispatch must happen only after the iter-N verifier's `pane_current_command == zsh` is confirmed; capture timestamped logs. Run under both `--mode agent` and `--mode tmux`.
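The CRITICAL scenario's gate (dispatch only after the previous pane is back at a shell) amounts to polling `#{pane_current_command}`. A sketch, assuming a 2s budget and a 100ms poll interval; the probe is injectable so the loop can be tested without tmux, and `waitForShell` is a hypothetical name, not the project's `waitForProcessExit`.

```js
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);
const SHELLS = new Set(['zsh', 'bash', 'sh']);

// Default probe: ask tmux which command currently owns the pane.
async function tmuxPaneCommand(paneId) {
  const { stdout } = await execFileAsync('tmux', [
    'display-message', '-p', '-t', paneId, '#{pane_current_command}',
  ]);
  return stdout.trim();
}

// Poll until the pane is back at a shell or the time budget runs out.
async function waitForShell(paneId, { probe = tmuxPaneCommand, timeoutMs = 2000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const cmd = await probe(paneId).catch(() => ''); // probe errors count as "not ready"
    if (SHELLS.has(cmd)) return { ok: true, cmd };
    if (Date.now() >= deadline) return { ok: false, cmd };
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A leader following this sketch would `await waitForShell(previousPaneId)` and dispatch the next role only when `ok` is true; logging `Date.now()` around the call gives the timestamps the scenario asks for.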
210
+
211
+ ---
212
+
213
+ ## Verification end-to-end
214
+
215
+ 1. **Unit**: `node --test tests/node/test-kill-pane-process.test.mjs tests/node/test-lock-sentinel-file.test.mjs` passes.
216
+ 2. **Integration (Node)**: `node --test tests/node/us006-campaign-main-loop.test.mjs` passes; the call-order assertions serve as a regression guard.
217
+ 3. **Live tmux**: within 2s of calling `_kill_pane_process`, `tmux display-message -p -t "$pane" '#{pane_current_command}'` returns `zsh`/`bash`.
218
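+ 4. **mtime frozen**: read `stat -f %m verify-verdict.json` (BSD/macOS `stat`; GNU coreutils uses `stat -c %Y`) at detect time and again at +10s; the delta must be 0. This directly tests against the 1m 43s rewrite recorded in the bug report.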
219
+ 5. **Pane output**: `tmux capture-pane -p` shows no new `Worked for Xm Ys` / `esc to interrupt` markers.
220
+ 6. **Both modes**: run the smoke test separately under `--mode tmux` (zsh runner) and `--mode agent` (Node leader); verify both return to a shell within 4s.
221
+ 7. **Repro scenario**: run one campaign under the same conditions as the 19th launch (claude opus 1m worker + gpt-5.5:high codex verifier), then compare the leader log against file mtimes; expect zero races.
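Check 4 can also be scripted in Node instead of `stat`. A minimal sketch, assuming `fs.stat().mtimeMs` is an acceptable stand-in for `stat -f %m` (it reports milliseconds rather than whole seconds, subject to filesystem timestamp resolution); `mtimeDelta` is a hypothetical helper.

```js
const fs = require('node:fs/promises');

// Hypothetical helper: mtime delta (ms) of a file across a wait window.
async function mtimeDelta(filePath, waitMs = 10_000) {
  const before = (await fs.stat(filePath)).mtimeMs;
  await new Promise((resolve) => setTimeout(resolve, waitMs));
  const after = (await fs.stat(filePath)).mtimeMs;
  return after - before;
}
```

Called right when the verdict is detected, `await mtimeDelta('verify-verdict.json')` should return `0` once the fix is in place; any positive value means something touched the sentinel during the window.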
222
+
223
+ ---
224
+
225
+ ## Risk / mitigation
226
+
227
+ | Risk | Likelihood | Mitigation |
228
+ |---|---|---|
229
+ | C-c interrupts the producer mid-write of an artifact | LOW: the sentinel is already on disk at detect time | The `MalformedArtifactError` path handles partial writes |
230
+ | chmod 0444 blocks the next iter's cleanup `unlink` | LOW | `_unlock_sentinel` / `unlockSentinelFile` runs right before unlink. On most Unix filesystems unlink permission is governed by the directory's permissions, so a 0444 file can still be unlinked |
231
+ | Producer rewrites the sentinel via atomic rename (bypassing chmod) | POSSIBLE | Q (kill) terminates the producer within ~1s, shrinking the rewrite window from 2 min to ~1s. The leader has also already consumed the sentinel in-band |
232
+ | `killPaneProcess` throws on a dead pane | POSSIBLE | Caught inside the helper; unit tests AC2/AC3 guard against regression |
233
+ | chmod 0444 is a silent no-op (WSL1/NTFS/tmpfs) | OBSERVED (postinstall.js precedent) | Log a warning only once. Q (kill) is the primary defense, so this degrades gracefully |
234
+ | Regression in existing us006 tests | MEDIUM | Add fake helper recorders to `createTmuxFakes()`; existing callers get them injected automatically |
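Two rows above hinge on POSIX semantics: `unlink` and `rename` need write permission on the containing directory, not on the file. The sketch below demonstrates both on a 0444 sentinel (POSIX filesystem assumed; Windows treats read-only files differently). It also shows why R is only defense-in-depth: after the rename, the path names a new file that no longer carries the 0444 mode. `demoReadOnlySentinel` is a hypothetical name.

```js
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

async function demoReadOnlySentinel() {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'bug7-risk-'));
  const sentinel = path.join(dir, 'verify-verdict.json');
  await fs.writeFile(sentinel, '{"verdict":"pass"}');
  await fs.chmod(sentinel, 0o444); // Fix-R style lock

  // A producer that writes a temp file and renames it over the sentinel
  // only needs write permission on the directory, so the lock is bypassed.
  const tmp = `${sentinel}.tmp`;
  await fs.writeFile(tmp, '{"verdict":"fail"}');
  await fs.rename(tmp, sentinel);
  const afterRename = JSON.parse(await fs.readFile(sentinel, 'utf8'));

  // unlink is likewise governed by the directory's permissions.
  await fs.unlink(sentinel);
  const stillThere = await fs.access(sentinel).then(() => true, () => false);

  await fs.rm(dir, { recursive: true, force: true });
  return { afterRename, stillThere };
}
```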
@@ -0,0 +1,93 @@
1
+ # Signal Protocol — current contract + alternatives
2
+
3
+ **Spec version:** `signal-protocol-v1`
4
+ **Source consensus:** ralplan iter 6 — Architect synthesis, Critic codex APPROVED (P0=0, P1=0)
5
+ **Audience:** maintainers evaluating whether to adopt mailbox-dir, daemon, or in-process IPC alternatives.
6
+
7
+ ---
8
+
9
+ ## 1. Current Contract
10
+
11
+ rlp-desk routes Worker → Verifier handoff through a **single sentinel file per role per iteration**. The contract has four invariants:
12
+
13
+ 1. **Sentinel = artifact.** Every transition step (`verify`, `verdict`, `flywheel`, `flywheel-guard`) is encoded as a JSON file at a deterministic path under `.rlp-desk/memos/`. The Leader polls the path with `fs.access` + atomic JSON-parse; any partial write is rejected (`jq -e .` gate, see `tests/test-bug7-poll-partial-write.sh`).
14
+ 2. **`reapProducer` = lifecycle.** Once the Leader accepts a sentinel (validateArtifact passes), it MUST kill the producing TUI pane and chmod-lock the file. Skipping the reap leaves a self-reviewing claude/codex pane that overwrites the artifact mid-poll (Bug #7).
15
+ 3. **Strict ordering: detect → reap → wait shell → next dispatch.** The Leader does NOT dispatch the next role (Verifier after Worker, next-iter Worker after Verifier) until the producing pane's `pane_current_command` has returned to `zsh|bash|sh`. AC-H1 of PR-0b-narrow strengthens this with `waitForProcessExit`.
16
+ 4. **First-writer-wins for terminal sentinels.** `blocked.md` and `complete.md` are written via `O_EXCL` (`writeSentinelExclusive`); concurrent error paths cannot trample the canonical exit reason.
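Invariants 1 and 4 can be sketched with plain `node:fs/promises` calls. This is an illustration under stated assumptions, not the project's poller or `writeSentinelExclusive`: a sentinel that does not yet parse as JSON is treated as "not ready" so polling continues, and the `'wx'` flag (open with `O_CREAT | O_EXCL`) provides first-writer-wins. Both function names are hypothetical.

```js
const fs = require('node:fs/promises');

// Returns the parsed sentinel, or null if it is absent or not yet valid JSON.
async function tryReadSentinel(filePath) {
  let raw;
  try {
    raw = await fs.readFile(filePath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return null; // not written yet
    throw err;
  }
  try {
    return JSON.parse(raw);
  } catch {
    return null; // partial write: keep polling
  }
}

// First-writer-wins: 'wx' fails with EEXIST if the file already exists.
async function writeTerminalExclusive(filePath, text) {
  try {
    await fs.writeFile(filePath, text, { flag: 'wx' });
    return true; // we wrote the canonical exit reason
  } catch (err) {
    if (err.code === 'EEXIST') return false; // someone else got there first
    throw err;
  }
}
```

Returning `false` instead of throwing on `EEXIST` lets a concurrent error path notice it lost the race without trampling the reason already on disk.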
17
+
18
+ The same contract is implemented twice (`src/node/runner/campaign-main-loop.mjs` for `--mode agent`, `src/scripts/run_ralph_desk.zsh` for `--mode tmux`) with bit-for-bit parity on `(reason_text, reason_category, failure_category)` — verified by `tests/test-bug8-refuse-synthesis.sh` Scenario 4.
19
+
20
+ ---
21
+
22
+ ## 2. omc-teams Comparison (mailbox dir, daemon-backed CLI)
23
+
24
+ [omc-teams](https://github.com/oh-my-claudecode) delivers multi-agent coordination over a **daemon-backed CLI** (`omc team api ...`). Producers append to a per-team mailbox directory; consumers tail it. The reliability contract is enforced by the daemon process, not by file polling.
25
+
26
+ **What omc-teams gives you:**
27
+
28
+ - Crash-safe append-only message log (no truncated JSON window).
29
+ - Per-team subscription with backpressure.
30
+ - Cross-process delivery guarantees (daemon survives subprocess restart).
31
+
32
+ **What's load-bearing in the reliability gain — and what's not:**
33
+
34
+ The reliability gain is the **daemon**, not the mailbox dir. A bare file-mailbox (without daemon) inherits the same partial-write and self-review failure modes that rlp-desk's sentinel path already guards against, plus a new failure mode: a Worker prompt that misbehaves and dumps multiple JSON files into the mailbox (no single-writer invariant). Architect findings recorded in ralplan iter 6:
35
+
36
+ > Mailbox-dir without a daemon = same polling reliability as the sentinel approach + worker-prompt failure-mode increase. Adopting it as an intermediate step is strictly worse than the current contract.
37
+
38
+ So if rlp-desk wants the actual omc-teams reliability profile, it must adopt the **daemon**, not just the directory layout. That is the `Track B` work, not a sentinel rewrite.
39
+
40
+ ---
41
+
42
+ ## 3. claude code `/team` Comparison (in-process TeamCreate + SendMessage)
43
+
44
+ The Claude Code SDK exposes `TeamCreate` + `SendMessage` for in-process subagent coordination. This is fundamentally different:
45
+
46
+ | Property | rlp-desk sentinel | claude `/team` |
47
+ |---|---|---|
48
+ | Process model | Standalone tmux runner | Single-process subagent tree |
49
+ | IPC channel | Filesystem | In-memory message bus |
50
+ | Failure mode | Pane death, partial write | Subagent throw |
51
+ | Lifetime | Survives leader exit | Dies with parent |
52
+
53
+ `/team` is **not applicable** to a standalone tmux runner. rlp-desk explicitly supports the use case where the Leader can crash, the user can detach the tmux session, and a fresh Leader process can resume against the on-disk sentinel state. `/team` cannot be paused, snapshotted, or resumed across processes — by design.
54
+
55
+ ---
56
+
57
+ ## 4. Why rlp-desk does NOT adopt mailbox-dir
58
+
59
+ Architect/Critic codex consensus iter 6 rejected swapping the sentinel contract for a mailbox-dir for three concrete reasons:
60
+
61
+ 1. **No reliability gain without the daemon.** Section 2 above. The daemon is the load-bearing piece; the directory is a side-effect of the daemon's protocol.
62
+ 2. **Increased Worker-prompt failure surface.** Today the Worker is held to a single-writer contract: it MUST write `iter-signal.json` exactly once. A mailbox flips this to "append any number of messages and the daemon picks the latest" — a much weaker prompt-side invariant that empirically breaks under the kind of multi-pass self-review failures that Bug #7 was created to fix.
63
+ 3. **Migration cost without commensurate benefit.** Two implementations (Node + zsh), Self-Verification Gate matrix (LOW/MEDIUM/CRITICAL × `--mode tmux/agent`), backwards compatibility for in-flight campaigns, and downstream wrapper tools (analytics, blueprints, Test Spec) all assume the sentinel contract. Replacing it is a multi-PR migration with no incremental win until the daemon ships.
64
+
65
+ The bug-fix track (Bug #6 worker-dead, Bug #7 post-sentinel-race, Bug #8 refuse-synthesize) closes the actual reliability gaps inside the sentinel contract and is strictly cheaper than the mailbox migration.
66
+
67
+ ---
68
+
69
+ ## 5. Track B Roadmap — daemon-backed `rlp-desk team api`
70
+
71
+ When the project is ready to adopt the omc-teams reliability profile, the migration looks like this:
72
+
73
+ **Track B — Phase 1 (PoC, separate ralplan):**
74
+ - New CLI: `rlp-desk team api start|stop|status|send|recv`
75
+ - Daemon process (`rlp-desk-teamd`) owns a per-campaign mailbox under `~/.rlp-desk/team/{slug}/`.
76
+ - Leader and Workers route through the CLI; no direct file polling.
77
+ - File-system fallback retained for the migration window — daemon down ⇒ degrade to sentinel mode.
78
+
79
+ **Track B — Phase 2 (cutover):**
80
+ - Sentinel reads behind a feature flag (`RLP_TEAM_API=1`).
81
+ - Self-Verification Gate matrix extended: each scenario runs once per backend (sentinel + team-api).
82
+ - Wrapper tools (analytics, blueprints) updated to consume the new event stream.
83
+
84
+ **Track B — Phase 3 (deprecation):**
85
+ - Sentinel path removed from runtime once team-api has burned in for ≥1 release.
86
+ - Documentation rolled forward; `signal-protocol-v1` archived.
87
+
88
+ Dependencies:
89
+ - Daemon implementation (~600 LoC Node, drawing on Bun's IPC primitives or plain `node:net`).
90
+ - Integration test harness for daemon crash recovery.
91
+ - Self-Verification Gate parity matrix (Node × zsh × team-api).
92
+
93
+ This track is **explicitly out of scope** for the Bug #6/#7/#8 plan v6. It is captured here so future maintainers do not interpret "rlp-desk does not use a mailbox" as an oversight — it is a deliberate architectural decision with a known successor path.
package/install.sh CHANGED
@@ -115,6 +115,8 @@ fetch "$REPO_URL/docs/rlp-desk/getting-started.md" "$DESK_DIR/docs/rlp-desk/gett
115
115
  fetch "$REPO_URL/docs/rlp-desk/protocol-reference.md" "$DESK_DIR/docs/rlp-desk/protocol-reference.md"
116
116
  fetch "$REPO_URL/docs/rlp-desk/TODO-verification-next.md" "$DESK_DIR/docs/rlp-desk/TODO-verification-next.md"
117
117
  fetch "$REPO_URL/docs/rlp-desk/multi-mission-orchestration.md" "$DESK_DIR/docs/rlp-desk/multi-mission-orchestration.md"
118
+ # Plan v6 PR-0a: signal protocol documentation
119
+ fetch "$REPO_URL/docs/rlp-desk/signal-protocol.md" "$DESK_DIR/docs/rlp-desk/signal-protocol.md"
118
120
  # Dev meta docs (v5.7 §4.15: under docs/rlp-desk/ to avoid mixing with user docs)
119
121
  fetch "$REPO_URL/docs/rlp-desk/internal/verification-policy-gap-analysis.md" "$DESK_DIR/docs/rlp-desk/internal/verification-policy-gap-analysis.md"
120
122
  fetch "$REPO_URL/docs/rlp-desk/internal/verification-strategy-research.md" "$DESK_DIR/docs/rlp-desk/internal/verification-strategy-research.md"
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ai-dev-methodologies/rlp-desk",
3
- "version": "0.14.5",
3
+ "version": "0.15.0",
4
4
  "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
5
5
  "scripts": {
6
6
  "postinstall": "node scripts/postinstall.js",
@@ -33,6 +33,8 @@ const runtimeSources = [
33
33
  ["docs/rlp-desk/protocol-reference.md", path.join(docsDir, "rlp-desk", "protocol-reference.md")],
34
34
  ["docs/rlp-desk/TODO-verification-next.md", path.join(docsDir, "rlp-desk", "TODO-verification-next.md")],
35
35
  ["docs/rlp-desk/multi-mission-orchestration.md", path.join(docsDir, "rlp-desk", "multi-mission-orchestration.md")],
36
+ // Plan v6 PR-0a: signal protocol documentation (Architect/Critic codex iter 6).
37
+ ["docs/rlp-desk/signal-protocol.md", path.join(docsDir, "rlp-desk", "signal-protocol.md")],
36
38
  ];
37
39
  // v0.14.0: legacy-deletion list cleared. The Node-canonical era (v5.7+)
38
40
  // removed zsh after install; v0.14.0 reverts that — the zsh runner is the
@@ -89,6 +89,14 @@ Ask about these items one by one (or in small groups):
89
89
  - **gpt-5.5:medium** — default recommendation (full context window, progressive upgrade handles harder US)
90
90
  - **spark:high** — only when US is small enough for spark's 100k context (single-file, AC count <= 4, simple logic). Do NOT use as primary recommendation — spark context window is too small for most tasks
91
91
 
92
+ **Context window behavior (claude models — v0.14.6+)**:
93
+ - All claude models default to **200K**. `sonnet` and `opus` aliases both run at the standard window.
94
+ - To request 1M, append the explicit `[1m]` suffix on the full model id:
95
+ - `claude-opus-4-7[1m]` — 1M attempted via `ANTHROPIC_BETA=context-1m-2025-08-07`. Works on most Claude Max accounts.
96
+ - `claude-sonnet-4-6[1m]` — 1M attempted, **but** requires the Anthropic "Extra usage" toggle at https://claude.ai/settings/usage. Without that toggle the worker fails at the first API call with `Extra usage is required for 1M context`.
97
+ - rlp-desk does NOT pre-check entitlement — the explicit `[1m]` is honored as-is. If the API rejects it, you will see the error immediately and can re-run with the standard alias or the opus 1M form.
98
+ - **Default recommendation when 1M is genuinely needed:** prefer `claude-opus-4-7[1m]` over `claude-sonnet-4-6[1m]` because opus 1M does not require a separate entitlement toggle.
99
+
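The opt-in rule above reduces to a suffix check on the model id. The sketch below mirrors the `wantsOneMillionContext()` predicate shipped in this release; `claudeEnvPrefix` is a hypothetical helper that omits the shell quoting the real `buildClaudeCmd` applies.

```js
// Beta header value used by this release for the 1M window.
const ONE_MILLION_BETA = 'context-1m-2025-08-07';

// Only an explicit, case-insensitive '[1m]' suffix opts in.
function wantsOneMillionContext(model) {
  if (!model) return false;
  return String(model).toLowerCase().endsWith('[1m]');
}

// Hypothetical: the env prefix a claude invocation would get for `model`.
function claudeEnvPrefix(model) {
  const parts = ['DISABLE_OMC=1'];
  if (wantsOneMillionContext(model)) parts.push(`ANTHROPIC_BETA=${ONE_MILLION_BETA}`);
  return parts;
}
```

So `opus`, `sonnet`, and `claude-opus-4-7` all get only `DISABLE_OMC=1`, while `claude-opus-4-7[1m]` and `claude-sonnet-4-6[1m]` also get the beta header; whether the API then grants 1M is decided server-side.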
92
100
  Present complexity score with evidence to the user, e.g.: "I rate this MEDIUM because: US count=4 (MEDIUM), file scope=2 (MEDIUM), logic=conditionals (MEDIUM), deps=none (LOW), impact=modify (MEDIUM). Highest=MEDIUM."
93
101
 
94
102
  **If codex IS installed** — say: "Codex is installed. I recommend cross-engine Worker for cost savings (Pro token pool separation) and cross-engine blind-spot coverage (claude Verifier catches issues codex Worker misses)."
@@ -1,5 +1,5 @@
1
1
  import { shellQuote } from '../util/shell-quote.mjs';
2
- import { OPUS_1M_BETA, isOpusModel } from '../constants.mjs';
2
+ import { ONE_MILLION_BETA, wantsOneMillionContext } from '../constants.mjs';
3
3
 
4
4
  const CLAUDE_BIN = 'claude';
5
5
  const CODEX_BIN = 'codex';
@@ -32,12 +32,14 @@ function assertTuiMode(mode, builderName) {
32
32
  export function buildClaudeCmd(mode, model, options = {}) {
33
33
  assertTuiMode(mode, 'buildClaudeCmd');
34
34
 
35
- // v5.7 §4.9: auto-enable 1M-token context for Opus models. Long campaigns
36
- // no longer silently truncate at 200K. Header is benign for non-Opus calls
37
- // but we omit it there to keep the cmdline tidy.
35
+ // v0.14.6: 1M context is opt-in only via the explicit '[1m]' suffix.
36
+ // opus / sonnet / claude-opus-4-7 (no suffix) all run at the standard
37
+ // 200K context. Adding '[1m]' on either opus or sonnet model id injects
38
+ // the ANTHROPIC_BETA header and attempts the 1M window — sonnet[1m] still
39
+ // requires Anthropic "Extra usage" entitlement at the API layer.
38
40
  const parts = ['DISABLE_OMC=1'];
39
- if (isOpusModel(model)) {
40
- parts.push(`ANTHROPIC_BETA=${shellQuote(OPUS_1M_BETA)}`);
41
+ if (wantsOneMillionContext(model)) {
42
+ parts.push(`ANTHROPIC_BETA=${shellQuote(ONE_MILLION_BETA)}`);
41
43
  }
42
44
  parts.push(
43
45
  CLAUDE_BIN,
@@ -1,19 +1,21 @@
1
1
  // Shared runtime constants. Single-source for cross-module values.
2
2
 
3
- // Anthropic Claude API beta header that activates the 1M-token context window
4
- // for Opus models. Auto-prepended to every claude CLI invocation that uses
5
- // --model opus so long campaigns no longer silently truncate at 200K.
3
+ // Anthropic Claude API beta header for the 1M-token context window. Injected
4
+ // only when the user explicitly opts in via the '[1m]' suffix on the model
5
+ // id see wantsOneMillionContext() below.
6
6
  //
7
7
  // Docs: https://docs.anthropic.com/en/docs/build-with-claude/context-windows
8
8
  // (search "1M context") — header rotates with each beta phase.
9
- export const OPUS_1M_BETA = 'context-1m-2025-08-07';
9
+ export const ONE_MILLION_BETA = 'context-1m-2025-08-07';
10
10
 
11
- // Model id that triggers Opus 1M auto-enable. Plain string match against the
12
- // --model value (post-shellQuote stripping). Bracketed form
13
- // 'claude-opus-4-7[1m]' is also Opus and benefits from this; pattern match
14
- // covers both.
15
- export function isOpusModel(model) {
11
+ // v0.14.6: 1M context is opt-in only via the explicit '[1m]' suffix on the
12
+ // model id. Previously rlp-desk auto-injected ANTHROPIC_BETA for any opus
13
+ // model; in practice that produced surprising results (opus alias still
14
+ // reported a 200K window in real CLI calls, and sonnet[1m] requires a
15
+ // separate "Extra usage" entitlement). New rule: user is the source of
16
+ // truth. Type the suffix to opt in; otherwise both opus and sonnet run at
17
+ // the standard 200K context.
18
+ export function wantsOneMillionContext(model) {
16
19
  if (!model) return false;
17
- const m = String(model).toLowerCase();
18
- return m === 'opus' || m.startsWith('claude-opus-');
20
+ return String(model).toLowerCase().endsWith('[1m]');
19
21
  }
@@ -7,10 +7,15 @@ import { promisify } from 'node:util';
7
7
 
8
8
  import { buildClaudeCmd, buildCodexCmd, parseModelFlag } from '../cli/command-builder.mjs';
9
9
  import { shellQuote } from '../util/shell-quote.mjs';
10
- import { OPUS_1M_BETA, isOpusModel } from '../constants.mjs';
10
+ import { ONE_MILLION_BETA, wantsOneMillionContext } from '../constants.mjs';
11
11
  import { initCampaign } from '../init/campaign-initializer.mjs';
12
12
  import { LEGACY_DESK_REL, resolveDeskRoot } from '../util/desk-root.mjs';
13
- import { writeSentinelExclusive } from '../shared/fs.mjs';
13
+ import {
14
+ lockSentinelFile as defaultLockSentinelFile,
15
+ stampAckField as defaultStampAckField,
16
+ unlockSentinelFile,
17
+ writeSentinelExclusive,
18
+ } from '../shared/fs.mjs';
14
19
  import {
15
20
  TimeoutError,
16
21
  WorkerExitedError,
@@ -29,7 +34,10 @@ import {
29
34
  } from '../reporting/campaign-reporting.mjs';
30
35
  import {
31
36
  createPane as defaultCreatePane,
37
+ killPaneProcess as defaultKillPaneProcess,
32
38
  sendKeys as defaultSendKeys,
39
+ sendRawKey as defaultSendRawKey,
40
+ waitForProcessExit as defaultWaitForProcessExit,
33
41
  } from '../tmux/pane-manager.mjs';
34
42
 
35
43
  const execFileAsync = promisify(execFile);
@@ -128,6 +136,39 @@ function buildPaths(rootDir, slug, env = process.env) {
128
136
  };
129
137
  }
130
138
 
139
+ // Bug #8 PR-B: default git working-tree probe. Inline (~20 LoC) — no new
140
+ // module per Architect/Critic codex iter 6 consensus. Tests inject a stub via
141
+ // run() option `checkWorkingTree`.
142
+ // - returns { ok: false, error } when git rev-parse fails (not a repo, etc).
143
+ // - returns { ok: true, dirty: bool, dirtyFiles[] } otherwise.
144
+ // - dirtyFiles are raw `git status --porcelain` lines (caller truncates).
145
+ async function _defaultCheckWorkingTree(rootDir) {
146
+ try {
147
+ const { stdout: top } = await execFileAsync('git', ['-C', rootDir, 'rev-parse', '--show-toplevel']);
148
+ const trimmed = top.trim();
149
+ // macOS `/var` resolves to `/private/var`; symlinks elsewhere too. Compare
150
+ // canonical realpaths via fs.realpath so the comparison does not fire on
151
+ // symlink-equivalent paths.
152
+ const [topCanon, rootCanon] = await Promise.all([
153
+ fs.realpath(trimmed).catch(() => trimmed),
154
+ fs.realpath(rootDir).catch(() => rootDir),
155
+ ]);
156
+ if (topCanon !== rootCanon) {
157
+ // Worker is in a sub-tree, not the campaign root. Refuse to classify.
158
+ return { ok: false, error: `git toplevel ${trimmed} != ${rootDir}` };
159
+ }
160
+ } catch (err) {
161
+ return { ok: false, error: err?.message ?? String(err) };
162
+ }
163
+ try {
164
+ const { stdout } = await execFileAsync('git', ['-C', rootDir, 'status', '--porcelain']);
165
+ const lines = stdout.split('\n').filter(Boolean);
166
+ return { ok: true, dirty: lines.length > 0, dirtyFiles: lines };
167
+ } catch (err) {
168
+ return { ok: false, error: err?.message ?? String(err) };
169
+ }
170
+ }
171
+
131
172
  async function exists(targetPath) {
132
173
  try {
133
174
  await fs.access(targetPath);
@@ -534,6 +575,12 @@ export const BLOCK_TAGS = Object.freeze({
534
575
  MALFORMED_ARTIFACT: 'malformed_artifact',
535
576
  // Backstop (run() try/finally)
536
577
  LEADER_EXITED_WITHOUT_TERMINAL_STATE: 'leader_exited_without_terminal_state',
578
+ // Bug #8 (Plan v6 PR-B): refuse to synthesize verify signal when codex
579
+ // worker exited without committing. Three new tags route through
580
+ // _handlePollFailure with reasonOverride/categoryOverride.
581
+ CODEX_EXIT_NO_DONE_CLAIM: 'codex_exit_no_done_claim',
582
+ GIT_STATE_UNVERIFIABLE: 'git_state_unverifiable',
583
+ WORKER_INCOMPLETE_UNCOMMITTED: 'worker_incomplete_uncommitted',
537
584
  });
538
585
 
539
586
  // P1-D Failure Taxonomy classifier. governance §1f locks the reason_category
@@ -619,6 +666,32 @@ function _classifyBlock(source, { verdict, state, slug } = {}) {
619
666
  action = 'investigate_leader_logs';
620
667
  failureCategory = 'leader_exited_without_terminal_state';
621
668
  break;
669
+ // Bug #8 PR-B — codex worker exited but did not write done-claim. Refuse
670
+ // to synthesize a verify signal; surface as infra_failure so wrapper does
671
+ // not retry blindly.
672
+ case BLOCK_TAGS.CODEX_EXIT_NO_DONE_CLAIM:
673
+ category = 'infra_failure';
674
+ recoverable = false;
675
+ action = 'investigate_pane_logs';
676
+ failureCategory = 'codex_exit_no_done_claim';
677
+ break;
678
+ // Bug #8 PR-B — git status could not be resolved (not a repo, git binary
679
+ // missing, etc). Without git we cannot prove the working tree is clean,
680
+ // so refuse to synthesize.
681
+ case BLOCK_TAGS.GIT_STATE_UNVERIFIABLE:
682
+ category = 'infra_failure';
683
+ recoverable = false;
684
+ action = 'investigate_git_state';
685
+ failureCategory = 'git_state_unverifiable';
686
+ break;
687
+ // Bug #8 PR-B — worker said it was done (done-claim present) but the tree
688
+ // is dirty. Recoverable: next iteration's worker can finish committing.
689
+ case BLOCK_TAGS.WORKER_INCOMPLETE_UNCOMMITTED:
690
+ category = 'metric_failure';
691
+ recoverable = true;
692
+ action = 'retry_after_fix';
693
+ failureCategory = 'worker_incomplete_uncommitted';
694
+ break;
622
695
  default:
623
696
  category = 'metric_failure';
624
697
  recoverable = false;
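The three Bug #8 cases added to `_classifyBlock` differ only in their data, so they can be read as a lookup table. The sketch below restates them that way for illustration only. The switch in the package remains authoritative, and `classifyBug8Tag` is a hypothetical name.

```javascript
// Illustrative lookup table for the three Bug #8 tags handled in _classifyBlock.
const BUG8_CLASSIFICATION = new Map([
  ['codex_exit_no_done_claim', { category: 'infra_failure', recoverable: false, action: 'investigate_pane_logs' }],
  ['git_state_unverifiable', { category: 'infra_failure', recoverable: false, action: 'investigate_git_state' }],
  ['worker_incomplete_uncommitted', { category: 'metric_failure', recoverable: true, action: 'retry_after_fix' }],
]);

// Returns null for tags outside Bug #8 so a caller can fall back to its switch.
function classifyBug8Tag(tag) {
  const entry = BUG8_CLASSIFICATION.get(tag);
  if (!entry) return null;
  // In all three cases failureCategory is the tag string itself.
  return { ...entry, failureCategory: tag };
}

console.log(classifyBug8Tag('worker_incomplete_uncommitted'));
```

Only the dirty-tree case is recoverable, because the next worker iteration can still commit. A missing done-claim or an unreadable git state instead needs someone to inspect the pane logs or the repository.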
@@ -650,9 +723,41 @@ async function _handlePollFailure(error, ctx) {
650
723
  options,
651
724
  role, // 'worker' | 'verifier' | 'final_verifier' | 'flywheel' | 'guard'
652
725
  usIdOverride,
726
+ // Bug #8 PR-B: when the caller has already classified the failure (e.g.
727
+ // codex done-claim/git gate), forward an explicit BLOCK_TAGS value as
728
+ // categoryOverride and a reason string. Named `categoryOverride` per
729
+ // Plan v6 PRD (it overrides the tag→reason_category mapping). Existing 5
730
+ // callers omit both and the legacy error→tag mapping below runs unchanged.
731
+ categoryOverride,
732
+ reasonOverride,
653
733
  } = ctx;
654
734
  const usId = usIdOverride ?? state.current_us;
655
735
 
736
+ if (categoryOverride) {
737
+ state.phase = 'blocked';
738
+ const classification = _classifyBlock(categoryOverride, { state, slug });
739
+ const reasonText = reasonOverride ?? `${role} blocked: ${categoryOverride}`;
740
+ await writeSentinel(paths.blockedSentinel, 'blocked', usId, reasonText, classification, paths);
741
+ await writeStatus(paths, state, options.onStatusChange, options.now);
742
+ await generateCampaignReport({
743
+ slug,
744
+ reportFile: paths.reportFile,
745
+ prdFile: paths.prdFile,
746
+ statusFile: paths.statusFile,
747
+ analyticsFile: paths.analyticsFile,
748
+ now: resolveNow(options.now),
749
+ blockedReason: reasonText,
750
+ blockedCategory: classification.reason_category,
751
+ });
752
+ return {
753
+ status: 'blocked',
754
+ usId,
755
+ reason: reasonText,
756
+ category: classification.reason_category,
757
+ statusFile: paths.statusFile,
758
+ };
759
+ }
760
+
656
761
  let tag;
657
762
  let reason;
658
763
  if (error instanceof WorkerExitedError) {
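With `categoryOverride`, a caller that has already decided the outcome skips the error-to-tag mapping entirely. Below is a stripped-down sketch of that short-circuit. It assumes injected stand-ins (`classify`, `writeSentinel`, `writeStatus`, `legacy`) for `_classifyBlock`, the sentinel/status writers, and the unchanged legacy path. Report generation is omitted, and `handlePollFailureSketch` is a hypothetical name.

```javascript
// Hypothetical sketch of the categoryOverride short-circuit in _handlePollFailure.
async function handlePollFailureSketch(error, ctx, deps) {
  const { role, state, usIdOverride, categoryOverride, reasonOverride } = ctx;
  const usId = usIdOverride ?? state.current_us;
  if (categoryOverride) {
    // The caller already classified the failure: no error -> tag mapping.
    state.phase = 'blocked';
    const classification = deps.classify(categoryOverride);
    const reason = reasonOverride ?? `${role} blocked: ${categoryOverride}`;
    await deps.writeSentinel('blocked', usId, reason, classification);
    await deps.writeStatus(state);
    return { status: 'blocked', usId, reason, category: classification.reason_category };
  }
  // Legacy callers pass neither field and keep the original mapping.
  return deps.legacy(error, ctx);
}
```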
@@ -872,6 +977,10 @@ async function runFinalSequentialVerify({
872
977
  pollForSignal,
873
978
  runIntegrationCheck,
874
979
  iterTimeoutMs,
980
+ // Bug #7 Fix-Q/R: optional reaper. Passed from _runCampaignBody so each
981
+ // per-US verdict kills the verifier TUI before the next per-US dispatch
982
+ // reuses the same pane. No-op when undefined (legacy/test callers).
983
+ reapProducer,
875
984
  }) {
876
985
  const verifierModel = state.final_verifier_model;
877
986
 
@@ -893,6 +1002,10 @@ async function runFinalSequentialVerify({
893
1002
  timeoutMs: iterTimeoutMs,
894
1003
  });
895
1004
 
1005
+ if (typeof reapProducer === 'function') {
1006
+ await reapProducer(verifierPaneId, paths.verdictFile);
1007
+ }
1008
+
896
1009
  if (verdict.verdict !== 'pass') {
897
1010
  return {
898
1011
  status: 'continue',
@@ -933,9 +1046,11 @@ async function runFinalSequentialVerify({
933
1046
  const HOME_DESK_DIR = path.join(os.homedir(), '.claude', 'ralph-desk');
934
1047
 
935
1048
  function buildAutonomousClaudeCmd({ promptFile, model, rootDir, homeDeskDir = HOME_DESK_DIR }) {
936
- // §4.9: ANTHROPIC_BETA prefix for Opus 1M context.
937
- const betaPrefix = isOpusModel(model)
938
- ? `ANTHROPIC_BETA=${shellQuote(OPUS_1M_BETA)} `
1049
+ // v0.14.6: ANTHROPIC_BETA prefix injected only when the model id ends
1050
+ // with explicit '[1m]' suffix. opus / sonnet / claude-opus-4-7 (no
1051
+ // suffix) all run at the standard 200K context.
1052
+ const betaPrefix = wantsOneMillionContext(model)
1053
+ ? `ANTHROPIC_BETA=${shellQuote(ONE_MILLION_BETA)} `
939
1054
  : '';
940
1055
  // §4.11.a: --add-dir whitelist (home rlp-desk + campaign cwd) for true autonomy.
941
1056
  const addDirParts = [];
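The new `wantsOneMillionContext` check is described as matching only an explicit trailing `[1m]` suffix, which is the same rule as the zsh `*\[1m\]` case pattern. The definitions of `wantsOneMillionContext`, `ONE_MILLION_BETA` and `shellQuote` are not part of this diff, so the versions below are hypothetical re-implementations. The beta string is the one hard-coded in `build_claude_cmd`.

```javascript
// Hypothetical re-implementations; the package's own definitions are not in this diff.
const ONE_MILLION_BETA = 'context-1m-2025-08-07'; // value hard-coded in build_claude_cmd

// True only for an explicit trailing "[1m]", mirroring zsh's `*\[1m\]` pattern.
function wantsOneMillionContext(model) {
  return typeof model === 'string' && model.endsWith('[1m]');
}

// POSIX single-quoting: wrap in '...' and rewrite each ' as '\''.
function shellQuote(value) {
  return `'${String(value).replace(/'/g, `'\\''`)}'`;
}

function betaPrefix(model) {
  return wantsOneMillionContext(model)
    ? `ANTHROPIC_BETA=${shellQuote(ONE_MILLION_BETA)} `
    : '';
}

for (const model of ['claude-opus-4-7[1m]', 'claude-opus-4-7', 'opus']) {
  console.log(model, JSON.stringify(betaPrefix(model)));
}
```

Under this rule, plain `opus` and `claude-opus-4-7` no longer receive the header. That is the behavioural change from the old `opus|claude-opus-*` match.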
@@ -1076,6 +1191,46 @@ async function _runCampaignBody(slug, options, paths, rootDir) {
1076
1191
  const createPane = options.createPane ?? defaultCreatePane;
1077
1192
  const createSession = options.createSession ?? defaultCreateSession;
1078
1193
  const pollForSignal = options.pollForSignal ?? defaultPollForSignal;
1194
+ // Bug #7 Fix-Q/R: post-sentinel reaper. Producer (claude/codex TUI) must be
1195
+ // interrupted the moment leader has consumed the sentinel; otherwise the
1196
+ // pane lingers in idle prompt and self-reviews for ~2min. lockSentinel
1197
+ // freezes the sentinel (chmod 0o444) as defense-in-depth. All five helpers are injectable so
1198
+ // existing tests with fake sendKeys keep working (us006 createTmuxFakes).
1199
+ const sendRawKey = options.sendRawKey ?? defaultSendRawKey;
1200
+ const waitForProcessExit = options.waitForProcessExit ?? defaultWaitForProcessExit;
1201
+ const killPaneProcess = options.killPaneProcess ?? defaultKillPaneProcess;
1202
+ const lockSentinel = options.lockSentinelFile ?? defaultLockSentinelFile;
1203
+ const stampAckField = options.stampAckField ?? defaultStampAckField;
1204
+ const reapProducer = async (paneId, sentinelFile) => {
1205
+ if (!paneId) return;
1206
+ await killPaneProcess(paneId, {
1207
+ sendRawKey,
1208
+ waitForExit: waitForProcessExit,
1209
+ log: (msg) => console.error(msg),
1210
+ });
1211
+ // PR-0b-narrow AC-H1: after killPaneProcess, wait for the producing
1212
+ // process to actually exit before continuing. waitForProcessExit returns
1213
+ // when pane_current_command resolves to a shell (zsh/bash/sh). Wrapped
1214
+ // in try/catch — failure here is non-fatal but emits a log entry.
1215
+ try {
1216
+ await waitForProcessExit(paneId, { timeoutMs: 5000 });
1217
+ } catch (err) {
1218
+ console.error(`[handshake] waitForProcessExit failed on ${paneId} (${err?.message ?? err}); continuing`);
1219
+ }
1220
+ if (sentinelFile) {
1221
+ await lockSentinel(sentinelFile, { log: (msg) => console.error(msg) });
1222
+ // PR-0b-narrow AC-H2: stamp the leader_ack audit field. Best-effort,
1223
+ // does not block subsequent dispatch.
1224
+ await stampAckField(sentinelFile, {
1225
+ acked_by: 'leader',
1226
+ acked_at: new Date(resolveNow(options.now)).toISOString(),
1227
+ ack_pane_state: 'shell',
1228
+ }, { log: (msg) => console.error(msg) });
1229
+ }
1230
+ };
1231
+ // Bug #8 PR-B: working-tree probe injected (or default execFile git).
1232
+ // Returns { ok: boolean, dirty?: boolean, dirtyFiles?: string[], error?: string }.
1233
+ const checkWorkingTree = options.checkWorkingTree ?? _defaultCheckWorkingTree;
1079
1234
  const runIntegrationCheck = options.runIntegrationCheck ?? (async () => ({ exitCode: 0, summary: 'integration skipped' }));
1080
1235
  const maxIterations = options.maxIterations ?? 100;
1081
1236
  // v5.7 §4.19: campaign-level pollForSignal timeout (Node leader fix).
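`reapProducer` fixes an order. It interrupts the pane, waits (non-fatally) for a shell, locks the sentinel, and only then stamps `leader_ack`. The sketch below reproduces that order with recording fakes so it can run without tmux. `makeReaper` is a hypothetical factory, and the injected names mirror the options read in `_runCampaignBody`.

```javascript
// Hypothetical factory reproducing reapProducer's ordering with injected deps.
function makeReaper({
  killPaneProcess,
  waitForProcessExit,
  lockSentinel,
  stampAckField,
  now = () => Date.now(),
  log = () => {},
}) {
  return async function reapProducer(paneId, sentinelFile) {
    if (!paneId) return; // no pane: nothing is interrupted, locked, or stamped
    await killPaneProcess(paneId);
    try {
      await waitForProcessExit(paneId, { timeoutMs: 5000 });
    } catch (err) {
      log(`waitForProcessExit failed on ${paneId}: ${err?.message ?? err}; continuing`);
    }
    if (sentinelFile) {
      await lockSentinel(sentinelFile); // chmod 0o444 first
      await stampAckField(sentinelFile, { // then the audit-only leader_ack stamp
        acked_by: 'leader',
        acked_at: new Date(now()).toISOString(),
        ack_pane_state: 'shell',
      });
    }
  };
}

const trace = [];
const reap = makeReaper({
  killPaneProcess: async (pane) => trace.push(`kill ${pane}`),
  waitForProcessExit: async () => { throw new Error('timeout'); },
  lockSentinel: async (file) => trace.push(`lock ${file}`),
  stampAckField: async (file, ack) => trace.push(`stamp ${ack.acked_at}`),
  now: () => 0,
});
reap('%3', 'verify-verdict.json').then(() => console.log(trace));
```

Two properties follow directly from the code. A failed wait does not skip the lock, and a missing pane id returns before the sentinel is locked or stamped.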
@@ -1141,6 +1296,11 @@ async function _runCampaignBody(slug, options, paths, rootDir) {
1141
1296
  let _laneSnapshot = await _snapshotLaneMtimes(paths);
1142
1297
 
1143
1298
  while (state.iteration <= maxIterations) {
1299
+ // Bug #7 Fix-R defensive unlock: a 0o444 sentinel left from the previous
1300
+ // iteration must not block the next producer's atomic-rename write.
1301
+ // Idempotent: missing-file calls are no-ops.
1302
+ await unlockSentinelFile(paths.signalFile);
1303
+ await unlockSentinelFile(paths.verdictFile);
1144
1304
  // Audit drift from the prior iteration before doing anything new.
1145
1305
  const _laneSnapshotAfter = await _snapshotLaneMtimes(paths);
1146
1306
  const _laneViolations = await _checkLaneViolations(paths, _laneSnapshot, _laneSnapshotAfter, state, options);
@@ -1189,6 +1349,7 @@ async function _runCampaignBody(slug, options, paths, rootDir) {
1189
1349
  pollForSignal,
1190
1350
  runIntegrationCheck,
1191
1351
  iterTimeoutMs,
1352
+ reapProducer,
1192
1353
  });
1193
1354
  } catch (error) {
1194
1355
  // v5.7 §4.25 — uniform poll-failure handling for final verifier.
@@ -1280,12 +1441,17 @@ async function _runCampaignBody(slug, options, paths, rootDir) {
1280
1441
  });
1281
1442
  }
1282
1443
 
1444
+ // Bug #7 Fix-Q/R: reap flywheel pane before consuming the signal.
1445
+ await reapProducer(state.flywheel_pane_id ?? state.verifier_pane_id, paths.flywheelSignalFile);
1446
+
1283
1447
  state.last_flywheel_decision = flywheelSignal.decision;
1284
1448
  // P0-A multi-mission orchestration: optionally captured from flywheel signal.
1285
1449
  // null when the flywheel did not suggest a next mission. Consumer wrappers
1286
1450
  // poll status.next_mission_candidate to chain missions without code edits.
1287
1451
  // See docs/multi-mission-orchestration.md.
1288
1452
  state.next_mission_candidate = flywheelSignal.next_mission_candidate ?? null;
1453
+ // Bug #7 Fix-R cleanup: restore 0o644 before unlink (belt-and-braces: on POSIX,
+ // unlink depends on directory permissions, not the file's 0o444 mode).
1454
+ await unlockSentinelFile(paths.flywheelSignalFile);
1289
1455
  await fs.unlink(paths.flywheelSignalFile).catch(() => {});
1290
1456
 
1291
1457
  // Flywheel Guard (independent validation of flywheel decision)
@@ -1318,11 +1484,15 @@ async function _runCampaignBody(slug, options, paths, rootDir) {
1318
1484
  });
1319
1485
  }
1320
1486
 
1487
+ // Bug #7 Fix-Q/R: reap guard pane before mutating state.
1488
+ await reapProducer(guardPaneId, paths.flywheelGuardVerdictFile);
1489
+
1321
1490
  if (!state.flywheel_guard_count[state.current_us]) {
1322
1491
  state.flywheel_guard_count[state.current_us] = 0;
1323
1492
  }
1324
1493
  state.flywheel_guard_count[state.current_us] += 1;
1325
1494
 
1495
+ await unlockSentinelFile(paths.flywheelGuardVerdictFile);
1326
1496
  await fs.unlink(paths.flywheelGuardVerdictFile).catch(() => {});
1327
1497
 
1328
1498
  if (guardVerdict.verdict === 'inconclusive') {
@@ -1430,8 +1600,43 @@ async function _runCampaignBody(slug, options, paths, rootDir) {
1430
1600
  });
1431
1601
  } catch (error) {
1432
1602
  if (error instanceof TimeoutError && parseModelFlag(state.worker_model).engine === 'codex') {
1433
- // v5.7 codex CLI exits cleanly after writing signal; if pollForSignal
1434
- // timed out for codex, synthesize a verify signal so the loop continues.
1603
+ // Bug #8 PR-B 4-way gate: refuse to synthesize verify signal when
1604
+ // codex worker exited without committing real work.
1605
+ // 1. done-claim absent → BLOCKED infra_failure
1606
+ // 2. git unverifiable → BLOCKED infra_failure
1607
+ // 3. done-claim + dirty tree → BLOCKED metric_failure
1608
+ // 4. done-claim + clean tree → synthesize verify (legacy path)
1609
+ const doneClaimExists = await exists(paths.doneClaimFile);
1610
+ if (!doneClaimExists) {
1611
+ return _handlePollFailure(error, {
1612
+ paths, state, slug, options,
1613
+ role: 'worker',
1614
+ categoryOverride: BLOCK_TAGS.CODEX_EXIT_NO_DONE_CLAIM,
1615
+ reasonOverride:
1616
+ 'codex worker exited (timeout) without writing done-claim; refusing to synthesize verify signal',
1617
+ });
1618
+ }
1619
+ const tree = await checkWorkingTree(rootDir);
1620
+ if (!tree.ok) {
1621
+ return _handlePollFailure(error, {
1622
+ paths, state, slug, options,
1623
+ role: 'worker',
1624
+ categoryOverride: BLOCK_TAGS.GIT_STATE_UNVERIFIABLE,
1625
+ reasonOverride:
1626
+ `git status unverifiable (${tree.error ?? 'unknown'}); refusing to synthesize verify signal`,
1627
+ });
1628
+ }
1629
+ if (tree.dirty) {
1630
+ const sample = (tree.dirtyFiles ?? []).slice(0, 5).join(', ');
1631
+ return _handlePollFailure(error, {
1632
+ paths, state, slug, options,
1633
+ role: 'worker',
1634
+ categoryOverride: BLOCK_TAGS.WORKER_INCOMPLETE_UNCOMMITTED,
1635
+ reasonOverride:
1636
+ `worker_incomplete_uncommitted: done-claim present but tree dirty (${sample || 'no file list'})`,
1637
+ });
1638
+ }
1639
+ // Clean tree — preserve the legacy synthesize behaviour.
1435
1640
  signal = {
1436
1641
  iteration: state.iteration,
1437
1642
  status: 'verify',
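The codex timeout branch is a four-way decision whose order matters. The done-claim is checked before git, so a worker that never claimed completion is reported as `codex_exit_no_done_claim` even if the tree is also dirty. Below is a pure sketch of that decision, assuming the two probes (`exists(paths.doneClaimFile)` and `checkWorkingTree(rootDir)`) have already produced their results. The real code skips the git probe entirely when the done-claim is missing, whereas here `tree` is simply ignored in that case. `decideCodexTimeout` is a hypothetical name.

```javascript
// Pure restatement of the Bug #8 gate; tags are the BLOCK_TAGS string values.
function decideCodexTimeout({ doneClaimExists, tree }) {
  // 1. No done-claim: the worker never said it finished.
  if (!doneClaimExists) return { action: 'block', tag: 'codex_exit_no_done_claim' };
  // 2. git could not be queried, so a clean tree cannot be proven.
  if (!tree.ok) return { action: 'block', tag: 'git_state_unverifiable' };
  // 3. Claimed done but left uncommitted changes.
  if (tree.dirty) return { action: 'block', tag: 'worker_incomplete_uncommitted' };
  // 4. Claimed done and the tree is clean: keep the legacy synthesized signal.
  return { action: 'synthesize_verify' };
}

console.log(decideCodexTimeout({ doneClaimExists: true, tree: { ok: true, dirty: false } }));
```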
@@ -1448,6 +1653,11 @@ async function _runCampaignBody(slug, options, paths, rootDir) {
1448
1653
  }
1449
1654
  }
1450
1655
 
1656
+ // Bug #7 Fix-Q/R: reap the worker pane the instant we accept the signal so
1657
+ // claude/codex cannot self-review and rewrite iter-signal.json. Runs even
1658
+ // for the codex-fallback synthesized signal (no-op on a dead pane).
1659
+ await reapProducer(state.worker_pane_id, paths.signalFile);
1660
+
1451
1661
  // US-019 R7 P1-G: verify_partial malformed downgrade.
1452
1662
  // verify_partial requires verified_acs[] to be a non-empty array. Otherwise the verifier
1453
1663
  // has nothing to evaluate and we must treat the signal as broken contract → blocked.
@@ -1517,6 +1727,11 @@ async function _runCampaignBody(slug, options, paths, rootDir) {
1517
1727
  });
1518
1728
  }
1519
1729
 
1730
+ // Bug #7 Fix-Q/R: reap verifier pane immediately after accepting the
1731
+ // verdict — without this the codex/claude TUI keeps running for ~2min and
1732
+ // can rewrite verify-verdict.json (mtime drift observed in 19th launch).
1733
+ await reapProducer(state.verifier_pane_id, paths.verdictFile);
1734
+
1520
1735
  if (verdict.verdict === 'pass') {
1521
1736
  state.consecutive_failures = 0;
1522
1737
  if (!state.verified_us.includes(usId)) {
@@ -59,3 +59,86 @@ export async function writeSentinelExclusive(targetPath, content) {
59
59
  }
60
60
  return { wrote: true };
61
61
  }
62
+
63
+ // Bug #7 Fix-R: best-effort chmod 0o444 to freeze a sentinel file once the
64
+ // leader has accepted it. Mirror of scripts/postinstall.js tryLockFile (L104).
65
+ // Some filesystems reject or silently ignore chmod (e.g. WSL1 on NTFS); on error we log once and
66
+ // continue. Q (process kill) is the primary defense; R is defense-in-depth.
67
+ let _sentinelLockWarningEmitted = false;
68
+ export async function lockSentinelFile(filePath, { log = (msg) => console.error(msg) } = {}) {
69
+ try {
70
+ await fs.chmod(filePath, 0o444);
71
+ } catch (err) {
72
+ if (err && err.code === 'ENOENT') {
73
+ // File missing is not an error — sentinel may have been consumed and
74
+ // unlinked by a concurrent path. Idempotent no-op.
75
+ return;
76
+ }
77
+ if (!_sentinelLockWarningEmitted) {
78
+ log(`[bug7] chmod 0444 on ${filePath} failed (${err?.code ?? 'unknown'}); post-sentinel write-protection unavailable on this FS.`);
79
+ _sentinelLockWarningEmitted = true;
80
+ }
81
+ }
82
+ }
83
+
84
+ // Pair to lockSentinelFile. Called before fs.unlink in iter-cleanup paths so
85
+ // subsequent in-place writes never hit EACCES (rename-over writes depend on
+ // directory permissions, not the destination's mode).
86
+ // Idempotent — missing file or already-writable is fine.
87
+ export async function unlockSentinelFile(filePath) {
88
+ try {
89
+ await fs.chmod(filePath, 0o644);
90
+ } catch {
91
+ // best-effort; cleanup proceeds regardless.
92
+ }
93
+ }
94
+
95
+ // PR-0b-narrow (Plan v6) — stamp leader handshake ack onto an already-locked
96
+ // sentinel. Best-effort, audit-only: the contract is "if we can write, do; if
97
+ // not, swallow". Callers must NOT depend on the ack landing for hard ordering
98
+ // semantics (use waitForProcessExit + the chmod 0o444 lock for that). The
99
+ // resulting `content.leader_ack` is auxiliary metadata so post-mortem audits
100
+ // can prove which Leader iteration consumed which sentinel.
101
+ //
102
+ // Sequence (mirrored in src/scripts/lib_ralph_desk.zsh::_stamp_ack_field):
103
+ // 1. chmod 0o644 (so we can write — sentinel was locked by lockSentinelFile)
104
+ // 2. JSON.parse
105
+ // 3. merge ack as content.leader_ack
106
+ // 4. atomic write
107
+ // 5. chmod 0o444 (re-lock)
108
+ //
109
+ // Every step is wrapped in try/catch and failures are swallowed (read/parse and write failures are logged first). Failure
110
+ // modes that we deliberately swallow:
111
+ // - File missing (sentinel was unlinked by a concurrent path).
112
+ // - Malformed JSON (race with a partial-write window — Bug #7 already gates
113
+ // this on the read side, but stampAckField may still observe it during
114
+ // transitional iterations).
115
+ // - chmod ENOTSUP / WSL1 / NTFS (recorded in Bug #7 fixes).
116
+ export async function stampAckField(filePath, ack, { log = (msg) => console.error(msg) } = {}) {
117
+ try {
118
+ await fs.chmod(filePath, 0o644);
119
+ } catch (err) {
120
+ if (err && err.code === 'ENOENT') return; // sentinel gone — nothing to stamp
121
+ // chmod failure is non-fatal — try the write anyway in case the FS already allows it
122
+ }
123
+ let content;
124
+ try {
125
+ const raw = await fs.readFile(filePath, 'utf8');
126
+ content = JSON.parse(raw);
127
+ } catch (err) {
128
+ log(`[stamp-ack] read/parse failed for ${filePath} (${err?.code ?? err?.message ?? 'unknown'}); ack dropped (audit-only)`);
129
+ // Re-lock if possible — best-effort.
130
+ try { await fs.chmod(filePath, 0o444); } catch {}
131
+ return;
132
+ }
133
+ if (!content || typeof content !== 'object') {
134
+ try { await fs.chmod(filePath, 0o444); } catch {}
135
+ return;
136
+ }
137
+ content.leader_ack = ack;
138
+ try {
139
+ await fs.writeFile(filePath, `${JSON.stringify(content, null, 2)}\n`, 'utf8');
140
+ } catch (err) {
141
+ log(`[stamp-ack] write failed for ${filePath} (${err?.code ?? err?.message ?? 'unknown'}); ack dropped`);
142
+ }
143
+ try { await fs.chmod(filePath, 0o444); } catch {}
144
+ }
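Fix-R is labelled defense-in-depth, and a small experiment shows why. On a POSIX filesystem, a 0o444 mode stops a non-root process from rewriting the file in place. Renaming a temp file over the sentinel, or unlinking it, is instead governed by the containing directory's permissions. The probe below is a hypothetical helper that assumes a POSIX temp directory. Root bypasses permission checks, so the in-place check is skipped for uid 0.

```javascript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Hypothetical POSIX-only probe of what a 0o444 sentinel lock does and does not block.
function probeLock() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sentinel-lock-'));
  const sentinel = path.join(dir, 'verify-verdict.json');
  try {
    fs.writeFileSync(sentinel, '{"verdict":"pass"}\n');
    fs.chmodSync(sentinel, 0o444); // what lockSentinelFile does

    // 1. In-place rewrite (open for writing with truncation): refused for non-root.
    let inPlaceBlocked = null; // null = not measured (root, or no getuid on this OS)
    if (typeof process.getuid === 'function' && process.getuid() !== 0) {
      try {
        fs.writeFileSync(sentinel, '{"verdict":"fail"}\n');
        inPlaceBlocked = false;
      } catch (err) {
        inPlaceBlocked = err.code === 'EACCES';
      }
    }

    // 2. Rename-over: needs write permission on the directory, not on the file.
    const tmp = `${sentinel}.tmp`;
    fs.writeFileSync(tmp, '{"verdict":"fail"}\n');
    fs.renameSync(tmp, sentinel);
    const verdictAfterRename = JSON.parse(fs.readFileSync(sentinel, 'utf8')).verdict;

    // 3. Unlink of a re-locked file: also governed by the directory.
    fs.chmodSync(sentinel, 0o444);
    fs.unlinkSync(sentinel);
    const unlinkedWhileLocked = !fs.existsSync(sentinel);

    return { inPlaceBlocked, verdictAfterRename, unlinkedWhileLocked };
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

console.log(probeLock());
```

The lock therefore catches a producer that rewrites the sentinel in place, while a writer that uses temp-and-rename still gets through. The process kill in Fix-Q remains the primary defense.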
@@ -52,6 +52,12 @@ export async function sendKeys(paneId, command) {
52
52
  await runTmux(['send-keys', '-t', paneId, 'Enter'], { paneId });
53
53
  }
54
54
 
55
+ // Bug #7 Fix-Q: send a raw tmux key (e.g. C-c) without the `-l --` literal-text
56
+ // flag. Distinct from sendKeys() so callers can interrupt a running TUI.
57
+ export async function sendRawKey(paneId, key) {
58
+ await runTmux(['send-keys', '-t', paneId, key], { paneId });
59
+ }
60
+
55
61
  export async function waitForProcessExit(
56
62
  paneId,
57
63
  { pollIntervalMs = 100, timeoutMs = 5000 } = {},
@@ -75,3 +81,36 @@ export async function waitForProcessExit(
75
81
  paneId,
76
82
  });
77
83
  }
84
+
85
+ // Bug #7 Fix-Q: terminate the TUI process producing a sentinel file the moment
86
+ // the leader has accepted it. Without this, claude/codex returns to its idle
87
+ // prompt and continues self-review for 1-2 minutes, racing the next iteration.
88
+ // Mirror of zsh pattern at run_ralph_desk.zsh:2384-2397, 375-376, 529-530.
89
+ // Fail-open: pane may already be dead from prior teardown, or waitForExit may
90
+ // time out — neither aborts the iteration.
91
+ export async function killPaneProcess(
92
+ paneId,
93
+ {
94
+ sendRawKey: sendRawKeyImpl = sendRawKey,
95
+ waitForExit = waitForProcessExit,
96
+ gracePeriodMs = 800,
97
+ exitTimeoutMs = 5000,
98
+ log = () => {},
99
+ } = {},
100
+ ) {
101
+ const safeSend = async (key) => {
102
+ try {
103
+ await sendRawKeyImpl(paneId, key);
104
+ } catch (err) {
105
+ log(`[bug7] killPaneProcess sendRawKey ${key} failed for ${paneId}: ${err?.message ?? err}`);
106
+ }
107
+ };
108
+ await safeSend('C-c');
109
+ await new Promise((resolve) => setTimeout(resolve, gracePeriodMs));
110
+ await safeSend('C-c');
111
+ try {
112
+ await waitForExit(paneId, { timeoutMs: exitTimeoutMs });
113
+ } catch (err) {
114
+ log(`[bug7] killPaneProcess waitForExit failed for ${paneId}: ${err?.message ?? err}`);
115
+ }
116
+ }
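`killPaneProcess` is fail-open: a failed keystroke or a wait timeout is logged and the iteration continues. The sketch below replays that sequence against fakes that fail at every step, with the grace period set to 0 so it runs instantly. `killPaneProcessSketch` is a hypothetical copy of the pattern, not the exported function.

```javascript
// Hypothetical copy of the killPaneProcess pattern: every failure is logged,
// none is thrown, so the caller's iteration always continues.
async function killPaneProcessSketch(paneId, {
  sendRawKey,
  waitForExit,
  gracePeriodMs = 800,
  exitTimeoutMs = 5000,
  log = () => {},
} = {}) {
  const safeSend = async (key) => {
    try {
      await sendRawKey(paneId, key);
    } catch (err) {
      log(`sendRawKey ${key} failed for ${paneId}: ${err?.message ?? err}`);
    }
  };
  await safeSend('C-c');
  // Give the TUI a moment to react before the second interrupt.
  await new Promise((resolve) => setTimeout(resolve, gracePeriodMs));
  await safeSend('C-c');
  try {
    await waitForExit(paneId, { timeoutMs: exitTimeoutMs });
  } catch (err) {
    log(`waitForExit failed for ${paneId}: ${err?.message ?? err}`);
  }
}

// A pane that is already gone: the call still resolves, after logging three failures.
killPaneProcessSketch('%9', {
  sendRawKey: async () => { throw new Error("can't find pane"); },
  waitForExit: async () => { throw new Error('timeout'); },
  gracePeriodMs: 0,
  log: (msg) => console.error(msg),
}).then(() => console.log('iteration continues'));
```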
@@ -46,17 +46,19 @@ build_claude_cmd() {
46
46
  # Defends against bracketed model ids like 'claude-opus-4-7[1m]' (zsh char-class glob),
47
47
  # spaces, embedded quotes, etc. Plain "$model" would let zsh expand brackets as glob.
48
48
  #
49
- # v5.7 §4.9: auto-enable Opus 1M context window via ANTHROPIC_BETA env. Mirror
50
- # of src/node/constants.mjs OPUS_1M_BETA. Update both on header rotation.
51
- local _opus_beta=""
49
+ # v0.14.6: ANTHROPIC_BETA injected only when the model id ends with the
50
+ # explicit '[1m]' suffix. opus / sonnet / claude-opus-4-7 (no suffix) all
51
+ # run at the standard 200K context. Mirror of src/node/constants.mjs
52
+ # ONE_MILLION_BETA + wantsOneMillionContext(). Update both on rotation.
53
+ local _onem_beta=""
52
54
  case "$model" in
53
- opus|claude-opus-*) _opus_beta="ANTHROPIC_BETA='context-1m-2025-08-07' " ;;
55
+ *\[1m\]) _onem_beta="ANTHROPIC_BETA='context-1m-2025-08-07' " ;;
54
56
  esac
55
57
  # v5.7 §4.11.a: --add-dir whitelist for autonomous mode. ROOT (campaign cwd)
56
58
  # plus home rlp-desk tree authorized for read/write without TUI prompts.
57
59
  local _home_desk="$HOME/.claude/ralph-desk"
58
60
  local _add_dirs="--add-dir ${(qq)_home_desk} --add-dir ${(qq)ROOT}"
59
- local base="DISABLE_OMC=1 ${_opus_beta}$CLAUDE_BIN --model ${(qq)model} --mcp-config '{\"mcpServers\":{}}' --strict-mcp-config --dangerously-skip-permissions ${_add_dirs}"
61
+ local base="DISABLE_OMC=1 ${_onem_beta}$CLAUDE_BIN --model ${(qq)model} --mcp-config '{\"mcpServers\":{}}' --strict-mcp-config --dangerously-skip-permissions ${_add_dirs}"
60
62
  if [[ -n "$effort" ]]; then
61
63
  base="$base --effort $effort"
62
64
  fi
@@ -242,6 +244,74 @@ atomic_write() {
242
244
  mv "$tmp" "$target"
243
245
  }
244
246
 
247
+ # =============================================================================
248
+ # Bug #7 Fix-Q/R: Post-sentinel pane reaper + sentinel write-lock
249
+ # =============================================================================
250
+ # Without explicit teardown the claude/codex TUI returns to its idle prompt and
251
+ # self-reviews for ~2min after writing iter-signal.json or verify-verdict.json.
252
+ # Observed: verdict mtime drift 1m43s post-detect; iter-N verifier overlapped
253
+ # iter-N+1 worker for 2min. _kill_pane_process closes the race; _lock_sentinel
254
+ # is defense-in-depth that freezes the file mtime. Mirror of run_ralph_desk.zsh
255
+ # verifier-cleanup pattern at L2384-2397 (Ctrl+C + /exit + wait_for_pane_ready),
+ # minus the /exit step: this helper only sends Ctrl+C twice and then waits.
256
+ # Both helpers are fail-open: pane may already be dead, FS may ignore chmod.
257
+ _kill_pane_process() {
258
+ local pane_id="$1"
259
+ local role="${2:-producer}"
260
+ [[ -n "$pane_id" ]] || return 0
261
+ if typeset -f log_debug >/dev/null 2>&1; then
262
+ log_debug "[bug7] kill_pane_process pane=$pane_id role=$role"
263
+ fi
264
+ tmux send-keys -t "$pane_id" C-c 2>/dev/null
265
+ sleep 0.5
266
+ tmux send-keys -t "$pane_id" C-c 2>/dev/null
267
+ sleep 1
268
+ if typeset -f wait_for_pane_ready >/dev/null 2>&1; then
269
+ wait_for_pane_ready "$pane_id" 5 2>/dev/null || true
270
+ fi
271
+ return 0
272
+ }
273
+
274
+ _lock_sentinel() {
275
+ local file="$1"
276
+ [[ -n "$file" && -f "$file" ]] || return 0
277
+ chmod 0444 "$file" 2>/dev/null || true
278
+ return 0
279
+ }
280
+
281
+ _unlock_sentinel() {
282
+ local file="$1"
283
+ [[ -n "$file" && -f "$file" ]] || return 0
284
+ chmod 0644 "$file" 2>/dev/null || true
285
+ return 0
286
+ }
287
+
288
+ # PR-0b-narrow (Plan v6) — stamp leader handshake ack onto the sentinel.
289
+ # Mirror of src/node/shared/fs.mjs::stampAckField. Best-effort, audit-only:
290
+ # any failure is silently swallowed. Sequence:
291
+ # 1. chmod 0644 (so jq + mv can write)
292
+ # 2. jq merge .leader_ack
293
+ # 3. atomic rename via tmp file
294
+ # 4. chmod 0444 (re-lock)
295
+ # Tolerant of jq absence (graceful degrade — no stamp, no error).
296
+ _stamp_ack_field() {
297
+ local file="$1"
298
+ [[ -n "$file" && -f "$file" ]] || return 0
299
+ command -v jq >/dev/null 2>&1 || return 0
300
+ local now_iso
301
+ now_iso=$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo "")
302
+ local tmp="${file}.ack.tmp"
303
+ chmod 0644 "$file" 2>/dev/null || true
304
+ if jq --arg ts "$now_iso" \
305
+ '. + {leader_ack: {acked_by: "leader", acked_at: $ts, ack_pane_state: "shell"}}' \
306
+ "$file" > "$tmp" 2>/dev/null; then
307
+ mv "$tmp" "$file" 2>/dev/null || rm -f "$tmp" 2>/dev/null
308
+ else
309
+ rm -f "$tmp" 2>/dev/null
310
+ fi
311
+ chmod 0444 "$file" 2>/dev/null || true
312
+ return 0
313
+ }
314
+
245
315
  # =============================================================================
246
316
  # Scaffold Validation
247
317
  # =============================================================================
@@ -635,27 +635,82 @@ launch_verifier_claude() {
635
635
  # On exit: check done-claim, auto-generate iter-signal.
636
636
  # Args: $1=iteration $2=signal_file
637
637
  # Returns: 0 (signal generated), 1 (BLOCKED: the Bug #8 gate already wrote the sentinel)
638
+ # Bug #8 PR-B (codex critic P1.2 fix): shared 4-way gate used by both
639
+ # handle_worker_exit_codex and the inline-polling A4 path. Returns:
640
+ # 0 = synthesize allowed (caller writes signal_file + emits audit)
641
+ # 1 = BLOCKED (this function already wrote sentinel + emitted audit)
642
+ # Args: $1=iter $2=us_id $3=audit_clean_code (e.g. codex_exit_with_done_claim
643
+ # or inline_polling_a4_clean; currently unused, since callers emit the clean-path audit)
644
+ _bug8_check_synth_allowed() {
645
+ local iter="$1"
646
+ local us_id="${2:-${CURRENT_US:-ALL}}"
647
+ local audit_clean="$3"
648
+
649
+ # Gate 1: done-claim must exist.
650
+ if [[ ! -f "$DONE_CLAIM_FILE" ]]; then
651
+ log_error " Bug #8: no done-claim. Refusing to synthesize verify signal."
652
+ log_debug "[GOV] iter=$iter bug8=block_codex_exit_no_done_claim"
653
+ write_blocked_sentinel \
654
+ "Codex worker exited without writing done-claim (refusing to synthesize verify signal)" \
655
+ "$us_id" \
656
+ "infra_failure"
657
+ _emit_a4_fallback_audit "$us_id" "$iter" "blocked_codex_exit_no_done_claim"
658
+ return 1
659
+ fi
660
+
661
+ # Gate 2: git toplevel must equal $ROOT (canonicalized — macOS resolves
662
+ # /var → /private/var, NTFS may have 8.3 short paths; compare realpaths).
663
+ local _bug8_top _bug8_top_canon _bug8_root_canon
664
+ _bug8_top=$(git -C "$ROOT" rev-parse --show-toplevel 2>/dev/null)
665
+ _bug8_top_canon=$(cd "$_bug8_top" 2>/dev/null && pwd -P 2>/dev/null)
666
+ _bug8_root_canon=$(cd "$ROOT" 2>/dev/null && pwd -P 2>/dev/null)
667
+ if [[ -z "$_bug8_top" || "$_bug8_top_canon" != "$_bug8_root_canon" ]]; then
668
+ log_error " Bug #8: git unverifiable at \$ROOT=$ROOT (toplevel='$_bug8_top'). Refusing synthesis."
669
+ log_debug "[GOV] iter=$iter bug8=block_git_unverifiable root=$ROOT toplevel=$_bug8_top"
670
+ write_blocked_sentinel \
671
+ "git status unverifiable at $ROOT (toplevel='$_bug8_top'); refusing to synthesize verify signal" \
672
+ "$us_id" \
673
+ "infra_failure"
674
+ _emit_a4_fallback_audit "$us_id" "$iter" "blocked_git_unverifiable"
675
+ return 1
676
+ fi
677
+
678
+ # Gate 3: tree must be clean.
679
+ local _bug8_dirty
680
+ _bug8_dirty=$(git -C "$ROOT" status --porcelain 2>/dev/null)
681
+ if [[ -n "$_bug8_dirty" ]]; then
682
+ local _bug8_first5
683
+ _bug8_first5=$(printf '%s\n' "$_bug8_dirty" | head -n 5 | tr '\n' '|' | sed 's/|$//')
684
+ log_error " Bug #8: done-claim present but tree dirty. Refusing synthesis. dirty: $_bug8_first5"
685
+ log_debug "[GOV] iter=$iter bug8=block_dirty_tree us_id=$us_id dirty='$_bug8_first5'"
686
+ write_blocked_sentinel \
687
+ "worker_incomplete_uncommitted: done-claim present but tree dirty ($_bug8_first5)" \
688
+ "$us_id" \
689
+ "metric_failure"
690
+ _emit_a4_fallback_audit "$us_id" "$iter" "blocked_dirty_tree"
691
+ return 1
692
+ fi
693
+
694
+ # All gates passed — synthesize allowed.
695
+ return 0
696
+ }
697
+
638
698
  handle_worker_exit_codex() {
639
699
  local iter="$1"
640
700
  local signal_file="$2"
641
701
 
642
- log " Codex worker process exited. Checking for done-claim..."
643
- if [[ -f "$DONE_CLAIM_FILE" ]]; then
644
- local dc_us_id
645
- dc_us_id=$(jq -r '.us_id // "unknown"' "$DONE_CLAIM_FILE" 2>/dev/null)
646
- log " Codex worker completed with done-claim (us_id=$dc_us_id). Auto-generating signal."
647
- echo '{"iteration":'"$iter"',"status":"verify","us_id":"'"$dc_us_id"'","summary":"auto-generated after codex exit","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
648
- _emit_a4_fallback_audit "$dc_us_id" "$iter" "codex_exit_with_done_claim"
649
- else
650
- log " WARNING: Codex worker exited without done-claim. Generating verify signal for current US."
651
- local current_us
652
- current_us=$(jq -r '.us_id // "US-001"' "$DESK/memos/${SLUG}-iter-signal.json" 2>/dev/null || echo "US-001")
653
- local mem_us
654
- mem_us=$(sed -n 's/.*Next.*US-\([0-9]*\).*/US-\1/p' "$DESK/memos/${SLUG}-memory.md" 2>/dev/null | head -1)
655
- [[ -n "$mem_us" ]] && current_us="$mem_us"
656
- echo '{"iteration":'"$iter"',"status":"verify","us_id":"'"$current_us"'","summary":"auto-generated after codex exit (no done-claim)","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
657
- _emit_a4_fallback_audit "$current_us" "$iter" "codex_exit_no_done_claim"
702
+ log " Codex worker process exited. Checking for done-claim + clean tree..."
703
+
704
+ if ! _bug8_check_synth_allowed "$iter" "${CURRENT_US:-ALL}" "codex_exit_with_done_claim"; then
705
+ return 1
658
706
  fi
707
+
708
+ # All 3 gates passed: done-claim present, git OK, tree clean → synthesize.
709
+ local dc_us_id
710
+ dc_us_id=$(jq -r '.us_id // "unknown"' "$DONE_CLAIM_FILE" 2>/dev/null)
711
+ log " Codex worker completed with done-claim (us_id=$dc_us_id) and clean tree. Auto-generating signal."
712
+ echo '{"iteration":'"$iter"',"status":"verify","us_id":"'"$dc_us_id"'","summary":"auto-generated after codex exit (clean tree)","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
713
+ _emit_a4_fallback_audit "$dc_us_id" "$iter" "codex_exit_with_done_claim_clean"
659
714
  return 0
660
715
  }
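Gate 2 compares `git rev-parse --show-toplevel` with `$ROOT` only after canonicalising both through `pwd -P`. Without that step, a symlinked path would read as a mismatch; macOS's `/var`, which points to `/private/var`, is one example. Below is a Node counterpart of that comparison, sketched with `fs.realpathSync`. `sameCanonicalDir` is a hypothetical helper, and it only covers symlink resolution.

```javascript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Hypothetical helper: compare two directories after resolving symlinks,
// the Node analogue of `cd "$dir" && pwd -P` on both sides.
function sameCanonicalDir(a, b) {
  try {
    return fs.realpathSync(a) === fs.realpathSync(b);
  } catch {
    // Unresolvable path (missing, permission denied): report a mismatch,
    // which the gate would treat as "git unverifiable".
    return false;
  }
}

// os.tmpdir() on macOS typically sits under /var/folders, a symlinked prefix.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'canon-'));
console.log(sameCanonicalDir(dir, fs.realpathSync(dir)));
fs.rmSync(dir, { recursive: true, force: true });
```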
661
716
 
@@ -2176,8 +2231,22 @@ poll_for_signal() {
2176
2231
 
2177
2232
  # Check if signal file appeared
2178
2233
  if [[ -f "$signal_file" ]]; then
2179
- log " Signal file detected: $signal_file"
2180
- return 0 # success
2234
+ # Bug #7-extra (BOS 2026-05-06): file existence is NOT enough. Worker
2235
+ # (claude opus) writes via Claude Code's Write tool, which is not
2236
+ # guaranteed atomic — the file can appear with empty / partial JSON
2237
+ # before the write completes. Verifier was being dispatched against a
2238
+ # half-written iter-signal.json. Validate that the file parses as JSON whose
2238
+ # (last) value is neither null nor false (`jq -e .`) before accepting; any
2240
+ # failure simply continues polling (next tick re-reads). Note: `jq
2241
+ # empty` was rejected because it accepts an EMPTY file as "zero
2242
+ # documents" — the exact race window we need to reject.
2243
+ if jq -e . "$signal_file" >/dev/null 2>&1; then
2244
+ log " Signal file detected: $signal_file"
2245
+ return 0 # success
2246
+ fi
2247
+ # Empty / truncated / mid-write JSON. Stay in the polling loop and let
2248
+ # the next tick re-read once the writer has finished.
2249
+ log_debug "[bug7-extra] $role signal file present but JSON not yet valid — continue polling"
2181
2250
  fi
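The poller now accepts `iter-signal.json` only once `jq -e .` succeeds. Below is a JavaScript analogue of that readiness test, assuming the leader has already read the file into a string. `isSignalReady` is a hypothetical name. It is slightly stricter than jq, because `JSON.parse` rejects a file holding several concatenated JSON values, which `jq -e .` would accept.

```javascript
// Hypothetical readiness test for a sentinel that may still be mid-write.
function isSignalReady(raw) {
  // Empty file: the writer has created it but not flushed any bytes yet.
  if (raw.trim() === '') return false;
  try {
    const value = JSON.parse(raw);
    // jq -e exits non-zero when the (last) output is null or false.
    return value !== null && value !== false;
  } catch {
    return false; // truncated or partial JSON: poll again on the next tick
  }
}

console.log(isSignalReady('{"iteration":2,"status":"verify"}'));
```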
2182
2251
 
2183
2252
  # A4 fallback: done-claim exists but no signal → Worker forgot iter-signal
@@ -2216,11 +2285,24 @@ poll_for_signal() {
  local dc_us_id
  dc_us_id=$(jq -r '.us_id // "unknown"' "$DONE_CLAIM_FILE" 2>/dev/null)
  if [[ -n "$dc_us_id" && "$dc_us_id" != "null" ]]; then
- log " WARNING: done-claim exists for $dc_us_id but no iter-signal. Auto-generating signal (A4 fallback)."
- log_debug "[GOV] iter=$ITERATION done_claim_without_signal=true us_id=$dc_us_id action=auto_generate_signal"
- echo '{"iteration":'"$ITERATION"',"status":"verify","us_id":"'"$dc_us_id"'","summary":"auto-generated by A4 fallback (done-claim without signal)","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
- _emit_a4_fallback_audit "$dc_us_id" "$ITERATION" "inline_polling_a4"
- return 0
+ # Bug #8 PR-B: defer to shared 4-way gate (codex critic P1.2).
+ # _bug8_check_synth_allowed handles done-claim/git/dirty-tree gates
+ # uniformly across handle_worker_exit_codex AND this inline path so
+ # both codex-exit and inline-polling A4 enforce the same contract.
+ if _bug8_check_synth_allowed "$ITERATION" "$dc_us_id" "inline_polling_a4_clean"; then
+ log " WARNING: done-claim exists for $dc_us_id but no iter-signal. Tree clean — auto-generating signal (A4 fallback)."
+ log_debug "[GOV] iter=$ITERATION done_claim_without_signal=true us_id=$dc_us_id action=auto_generate_signal"
+ echo '{"iteration":'"$ITERATION"',"status":"verify","us_id":"'"$dc_us_id"'","summary":"auto-generated by A4 fallback (done-claim + clean tree)","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
+ _emit_a4_fallback_audit "$dc_us_id" "$ITERATION" "inline_polling_a4_clean"
+ return 0
+ else
+ # Bug #8 PR-B (codex critic round-2 P2): hard-stop rc=2 so the
+ # main worker loop (L3119) treats this BLOCKED as terminal,
+ # matching the handle_worker_exit_codex blocked path. rc=1 is
+ # ambiguous — caller may interpret it as a recoverable poll
+ # failure and re-loop while the BLOCKED sentinel is on disk.
+ return 2
+ fi
  fi
  fi
  fi
@@ -2271,8 +2353,16 @@ poll_for_signal() {
  fi
  # Dispatch to engine-specific exit handler
  if [[ "$WORKER_ENGINE" = "codex" && "$role" != *erifier* ]]; then
- handle_worker_exit_codex "$ITERATION" "$signal_file"
- return 0
+ # Bug #8 PR-B: handle_worker_exit_codex now returns 1 when it has
+ # written a BLOCKED sentinel (no done-claim, dirty tree, git
+ # unverifiable). Propagate the return so main loop stops, instead
+ # of swallowing it with `return 0` and continuing as if the poll
+ # had succeeded.
+ if handle_worker_exit_codex "$ITERATION" "$signal_file"; then
+ return 0
+ else
+ return 2
+ fi
  fi
  # Claude path (or verifier of any engine)
  if handle_worker_exit_claude "$pane_id" "$ITERATION" "$trigger_file"; then
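The last two hunks give `poll_for_signal` a three-way return contract: 0 means the signal was accepted, 2 means a BLOCKED sentinel is on disk and the run must stop, and a bare rc=1 is the ambiguous case the comments warn about. The sketch below models only that contract. `fake_status`, `dispatch_codex_exit`, and `classify_poll` are hypothetical stand-ins, not functions from `run_ralph_desk.zsh`, and treating other non-zero codes as retryable is an assumption of the sketch. It relies on the shell rule that inside the `else` branch of `if cmd; then ...; else ...; fi`, `$?` still holds the exit status of `cmd`, which is how the main loop reads `worker_poll_rc`.

```shell
# Hypothetical stand-in for a handler or poll step: returns the code it is given.
fake_status() {
  return "$1"
}

# Mirrors the codex dispatch change: a failing handler (rc=1, BLOCKED
# sentinel written) is mapped to rc=2 instead of being swallowed by an
# unconditional `return 0`.
dispatch_codex_exit() {
  if fake_status "$1"; then
    return 0
  else
    return 2
  fi
}

# Mirrors the caller side: run the given command, then branch on its status.
# In the else branch, $? is still the status of the `if` condition.
classify_poll() {
  if "$@"; then
    echo "continue"
  else
    local rc=$?
    if (( rc == 2 )); then
      echo "stop"      # terminal: BLOCKED sentinel is on disk
    else
      echo "retry"     # assumption: any other code is recoverable
    fi
  fi
}

classify_poll fake_status 0           # continue
classify_poll fake_status 1           # retry: a bare rc=1 looks recoverable
classify_poll dispatch_codex_exit 1   # stop: rc=1 is promoted to rc=2
```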
@@ -2467,8 +2557,16 @@ run_single_verifier() {
  fi
  fi
 
+ # Bug #7 Fix-Q/R: reap verifier pane the moment we accept the verdict so
+ # codex/claude cannot keep self-reviewing and rewrite verify-verdict.json.
+ # Lock applied AFTER cp so the archived snapshot is also frozen at intent.
+ _kill_pane_process "$VERIFIER_PANE" "verifier-${suffix}"
+
  # Copy verdict to destination
  cp "$VERDICT_FILE" "$verdict_dest"
+ _lock_sentinel "$VERDICT_FILE"
+ # PR-0b-narrow: stamp leader handshake ack on the verdict (audit-only).
+ _stamp_ack_field "$VERDICT_FILE"
  log " Verifier$suffix verdict saved to $verdict_dest"
  return 0
  }
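`_lock_sentinel` and `_unlock_sentinel` are called in this diff, but their bodies are not part of these hunks. Assuming they implement Fix-R as the bug report describes (a `chmod 0444` on the sentinel, reverted before the next iteration's cleanup), a minimal sketch could look like this. The lowercase function names are hypothetical stand-ins, and `find -perm 0444` is used only as a portable way to test for an exact mode.

```shell
# Hypothetical stand-ins for _lock_sentinel / _unlock_sentinel.
# Assumption: the lock is a plain chmod to read-only (0444).
lock_sentinel() {
  [[ -f "$1" ]] && chmod 0444 "$1"
  return 0               # a missing sentinel is not an error
}

unlock_sentinel() {
  [[ -f "$1" ]] && chmod u+w "$1"
  return 0
}

# True when the file's mode is exactly r--r--r--.
is_locked() {
  [[ -n "$(find "$1" -perm 0444 2>/dev/null)" ]]
}

demo_dir=$(mktemp -d)
verdict_file="$demo_dir/verify-verdict.json"
printf '{"verdict":"pass"}\n' > "$verdict_file"

lock_sentinel "$verdict_file"
is_locked "$verdict_file" && echo "locked"
unlock_sentinel "$verdict_file"
is_locked "$verdict_file" || echo "unlocked"
rm -f "$verdict_file" && [[ ! -e "$verdict_file" ]] && echo "removed"
rmdir "$demo_dir"
```

File mode bits are checked only when a file is opened, so a lock like this does not stop a process that already holds a write descriptor, a process running as root, or a writer that replaces the file via rename. That is consistent with the bug report treating Fix-R as defense-in-depth behind the Fix-Q reaper.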
@@ -2528,6 +2626,14 @@ run_sequential_final_verify() {
  return 1
  fi
 
+ # Bug #7 Fix-Q/R: reap verifier pane between per-US final verifications so
+ # the previous codex/claude TUI cannot continue running while the next per-
+ # US verifier dispatch reuses the same pane.
+ _kill_pane_process "$VERIFIER_PANE" "verifier-final"
+ _lock_sentinel "$VERDICT_FILE"
+ # PR-0b-narrow: stamp leader handshake ack on the verdict (audit-only).
+ _stamp_ack_field "$VERDICT_FILE"
+
  # Check verdict
  local verdict
  verdict=$(jq -r '.verdict' "$VERDICT_FILE" 2>/dev/null)
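`_kill_pane_process` is also only called in this diff. The bug report describes Fix-Q as sending Ctrl+C to the producing pane as soon as the sentinel is detected. Below is a best-effort sketch of that idea using `tmux send-keys`. It assumes the pane id is a valid tmux target and that a single `C-c` stops the TUI; the real helper may do more. `TMUX_CMD` is a hypothetical hook added only so the logic can be exercised without a tmux server.

```shell
# Hypothetical Fix-Q reaper: interrupt whatever is running in the pane that
# wrote the sentinel. Best effort: a dead pane must never fail the leader.
kill_pane_process() {
  local pane="$1" label="$2"
  if [[ -z "$pane" ]]; then
    echo "[reap] $label: no pane id, skipping" >&2
    return 0
  fi
  # TMUX_CMD is an illustration-only override; it defaults to the real tmux.
  "${TMUX_CMD:-tmux}" send-keys -t "$pane" C-c 2>/dev/null || true
  echo "[reap] $label: requested C-c for $pane" >&2
  return 0
}

# Dry run with a stub in place of tmux: prints the command it would run.
print_tmux() { echo "tmux $*"; }
TMUX_CMD=print_tmux kill_pane_process "%3" "verifier-final"
```

One reason to interrupt the process rather than kill the pane itself is that, per the final-verify comment, the next per-US verifier dispatch reuses the same pane.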
@@ -2940,6 +3046,10 @@ main() {
  fi
 
  # --- governance.md s7 step 8 (cleanup): Clean previous iteration signals ---
+ # Bug #7 Fix-R cleanup: unlock 0o444 sentinels written by the previous
+ # iteration's reaper before rm so cleanup does not log permission noise.
+ _unlock_sentinel "$SIGNAL_FILE"
+ _unlock_sentinel "$VERDICT_FILE"
  rm -f "$SIGNAL_FILE" "$DONE_CLAIM_FILE" "$VERDICT_FILE" 2>/dev/null
  rm -f "$WORKER_HEARTBEAT" "$VERIFIER_HEARTBEAT" 2>/dev/null
 
@@ -3003,6 +3113,12 @@ main() {
  if poll_for_signal "$SIGNAL_FILE" "$WORKER_HEARTBEAT" "$WORKER_PANE" "$worker_launch" "Worker"; then
  worker_poll_done=1
  log_debug "[FLOW] iter=$ITERATION poll_signal_received=true"
+ # Bug #7 Fix-Q/R: reap worker pane immediately so claude/codex cannot
+ # self-review and rewrite iter-signal.json (1m43s drift observed).
+ _kill_pane_process "$WORKER_PANE" "worker"
+ _lock_sentinel "$SIGNAL_FILE"
+ # PR-0b-narrow: stamp leader handshake ack on the iter-signal (audit-only).
+ _stamp_ack_field "$SIGNAL_FILE"
  else
  worker_poll_rc=$?
  if (( worker_poll_rc == 2 )); then
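In the worker path, `_stamp_ack_field` runs on `$SIGNAL_FILE` right after `_lock_sentinel`, so however the stamp is written has to cope with the 0444 mode. For a non-root user, an in-place `>` redirect into the locked file is refused, which is the case the lock is meant to catch. The sketch below uses no helper from this diff. It shows that replacing a file by writing a temp file and renaming it over the target still succeeds when the target is 0444, because rename checks only the directory's write permission, and that the replacement carries the temp file's mode rather than the lock. Whether `_stamp_ack_field` uses this pattern is not visible here; `replace_via_rename` is a hypothetical name.

```shell
# Hypothetical helper: write new content to a temp file, then rename it over
# the target. rename needs write access to the directory, not to the file.
replace_via_rename() {
  local target="$1" tmp="$1.tmp.$$"
  printf '%s\n' "$2" > "$tmp" && mv -f "$tmp" "$target"
}

demo_dir=$(mktemp -d)
sig="$demo_dir/iter-signal.json"
printf '{"iteration":1,"status":"verify"}\n' > "$sig"
chmod 0444 "$sig"                                   # Fix-R style lock

replace_via_rename "$sig" '{"iteration":1,"status":"verify","ack":true}'
cat "$sig"                                          # new content is in place
if [[ -n "$(find "$sig" -perm 0444)" ]]; then
  echo "mode: still 0444"
else
  echo "mode: lock lost (file now has the temp file's mode)"
fi
rm -rf "$demo_dir"
```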
@@ -3210,6 +3326,12 @@ main() {
  update_status "blocked" "verifier_dead"
  return 1
  fi
+ # Bug #7 Fix-Q/R: reap verifier pane immediately so codex cannot
+ # rewrite verify-verdict.json post-detect (mtime drift fix).
+ _kill_pane_process "$VERIFIER_PANE" "verifier"
+ _lock_sentinel "$VERDICT_FILE"
+ # PR-0b-narrow: stamp leader handshake ack on the verdict (audit-only).
+ _stamp_ack_field "$VERDICT_FILE"
  fi
 
  # AC1: capture verifier end timestamp