@ai-dev-methodologies/rlp-desk 0.15.0 → 0.15.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,49 @@
1
+ # Bug Report Overhaul — P2/P3 Backlog
2
+
3
+ > Companion to `bug-report-overhaul-v1.md` (PR-A/B/C plan).
4
+ > User stop-rule: ralplan iterates only until P0+P1 = 0; P2 and below are captured here, NOT blockers.
5
+ > Re-prioritize from this file in a future ralplan when the operator-minutes-saved metric from PR-A/B/C lands.
6
+
7
+ ---
8
+
9
+ ## P2 — should fix in a follow-up PR after PR-A/B/C land
10
+
11
+ ### From v0 plan (Option C/D, deferred features)
12
+
13
+ - **Heartbeat-warning sidecar (Option B from v0)** — emit `<slug>-warning.{md,json}` when heartbeat anomaly crosses 50% of `iter-timeout`. Lets operator pre-empt a BLOCKED before the 30-min wall hits. Decoupled from this PR set because (a) report-quality is the dominant pain (D1), and (b) warning sidecar adds a second sentinel surface that risks false-positive fatigue. Revisit after PR-A/B land and we measure how many BLOCKEDs would have been pre-empted.
14
+ - **GitHub Issues integration (Option D from v0)** — POST blocked context to a configured GitHub repo issue. Requires per-repo authn story (token storage, network retry, rate-limits) — violates principle 3 in the current PR set. Re-evaluate after a credible authn proposal exists.
15
+ - **Pattern-learning loop** — mine `~/.claude/ralph-desk/analytics/*/bug-reports/` for emerging clusters. Auto-extends `docs/bug-patterns.json` with new candidate signatures for human review.
16
+ - **Cross-campaign bug-report dashboard in `/rlp-desk analytics`** — surface patterns across projects.
17
+ - **Auto-suggest "this looks like Bug #N — try fix-X" inline in CLI output** — operationalize PR-C's `pattern_match` data with an inline suggestion. Held back so the deterministic Jaccard implementation can be calibrated against real campaign data first.
18
+ - **Operator-CLI `/rlp-desk recover <slug> --to verify`** — write the manual recovery artifacts (`iter-signal.json`, `done-claim.json`, `status.json` patch) deterministically. Currently a hand-rolled `jq` pipeline per Bug #10 §7 workaround.
19
+
20
+ ### From Codex Critic Round 2 (BACKLOG)
21
+
22
+ - **[P2-1]** PR-A `_validateOperatorRecoveryArtifacts` return shape — current pseudo-code mixes `if (valid)` (boolean coercion) with `valid.reason` (object access). Resolve at implementation time to either `{ ok: bool, reason: string }` (object) or pure boolean + separate side-channel for the warning text. Affects the audit log line shape.
23
+ - **[P2-2]** PR-A test summary in §5 says "5 ACs (R1–R5)" but §8 added AC-R6 (`_skipNextWorkerDispatch` cleared after one use). Update §5 to "6 ACs (R1–R6)" for consistency before PR-A merges.
24
+
25
+ ### From Codex Critic Round 3 (BACKLOG)
26
+
27
+ - **[P2-3]** §9 step 5 banner-aware diff command only covers `run_ralph_desk.zsh`. PR-A and PR-B both also touch `lib_ralph_desk.zsh`. Add a matching `diff <(cat src/scripts/lib_ralph_desk.zsh) <(tail -n +N ~/.claude/ralph-desk/scripts/lib_ralph_desk.zsh)` step in the implementation runbook (verify the right `tail -n +N` offset at impl time — `lib_*.zsh` is sourced and may have no shebang). Extend to `init_ralph_desk.zsh` if PR-B touches it.
28
+
29
+ ## P3 — nice-to-have polish
30
+
31
+ ### From Codex Critic Round 2
32
+
33
+ - **[P3-1]** Option C/D/E rejection rationale in v1 §4 says "Same as v0" — acceptable because v0 is co-located, but inline one-sentence rationale would make the v1 plan self-contained for future readers who do not have the v0 file.
34
+
35
+ ### From Architect Round 1 (residual notes)
36
+
37
+ - Validate the `bug-patterns.json` Jaccard threshold (0.7) against actual past blocks once we have ≥20 historical reports — current threshold is hand-picked. Likely needs a small calibration script in `scripts/`.
38
+ - Consider whether `bug-reports/` should ship in the npm tarball default `.gitignore` of newly initialized projects — currently the schema doc only recommends operators add it themselves.
39
+
40
+ ---
41
+
42
+ ## Promotion criteria (when to re-ralplan one of these)
43
+
44
+ A backlog item moves back into a planner draft when **any** of these is true:
45
+
46
+ 1. PR-A/B/C lands and we measure ≥3 BLOCKEDs where the deferred item would have moved D1 by ≥10 minutes (e.g. heartbeat warning would have pre-empted a 30-min wait).
47
+ 2. Operator hand-files ≥2 bug reports about the same backlog gap (signal that the deferral was wrong).
48
+ 3. The `bug-patterns.json` seed becomes too large for human authoring (≥30 entries) — triggers the pattern-learning loop item.
49
+ 4. A user explicitly asks for one (e.g. operator-CLI `/rlp-desk recover` once they fatigue of jq pipelines).
@@ -0,0 +1,238 @@
1
+ # Bug Report Mechanism Overhaul — v0 (RALPLAN-DR Planner Draft)
2
+
3
+ > **Status**: Planner draft awaiting Architect → Codex Critic.
4
+ > **Mode**: deliberate (auto-enabled — touches governance, runner, slash command, test infra).
5
+ > **Stop rule**: iterate until codex critic returns 0 P0 + 0 P1. P2 → backlog.
6
+ > **Critic instruction**: *approve unless P0 or P1 found.*
7
+
8
+ ---
9
+
10
+ ## 1. Problem statement
11
+
12
+ 10 hand-written 200-line bug reports (`Bug #1`–`Bug #10`, BOS dev `2026-05-01..05-07`) point at one root frustration: **bugs are endless and each one costs 30+ min of operator time to package** before the rlp-desk side can even start triage. Examples:
13
+
14
+ | Pain | Evidence |
15
+ |---|---|
16
+ | Manual context capture | Each report re-collects: env, version, command, status snapshot, pane logs, settings, gitignore — all already on disk |
17
+ | No similarity search | Bug #6/#7/#8 are all "worker hang variants"; operator re-discovers the cluster each time |
18
+ | Recovery is broken | Bug #10 — leader resets `phase=worker` ignoring operator's `phase=verify` manual recovery; operator's iter-signal/done-claim files deleted |
19
+ | Reactive only | Bugs surface only after full BLOCKED (~30 min poll timeout); no early warning on heartbeat anomalies |
20
+ | No deterministic repro pack | rlp-desk side has to chase BOS for missing context (logs, env, version) → fix latency multiplier |
21
+
22
+ The blocked-sentinel JSON (`schema_version: 2.0`) already classifies (`reason_category` / `recoverable` / `suggested_action`) but stops at the campaign boundary — it does not become a *bug report*. That gap is the target.
23
+
24
+ ---
25
+
26
+ ## 2. Principles (5)
27
+
28
+ 1. **Capture-by-default, not by-request.** When the campaign blocks, the operator should not have to gather anything that already exists on disk.
29
+ 2. **One canonical schema, two consumers.** A single `bug-report.json` feeds both BOS-side templates and rlp-desk-side triage; no divergent representations.
30
+ 3. **Surgical diffs over new infra.** Extend the existing `blocked.{md,json}` writer + `/rlp-desk` subcommand surface; do not introduce a new daemon, queue, or service.
31
+ 4. **Recovery must be idempotent.** Manual recovery of a BLOCKED campaign must not be silently overwritten on relaunch (Bug #10 contract).
32
+ 5. **Earlier is cheaper.** A heartbeat-anomaly *warning* costs nothing; a 30-min BLOCKED poll-timeout is the most expensive form of feedback.
33
+
34
+ ---
35
+
36
+ ## 3. Decision drivers (top 3)
37
+
38
+ | # | Driver | Why it dominates |
39
+ |---|---|---|
40
+ | D1 | **Operator minutes per BLOCKED → first actionable report** | Today: 30+ min hand-writing + log collection. Target: ≤2 min (review + 1-line headline edit). Drives the "auto-bundle" choice below. |
41
+ | D2 | **Cluster recognition (avoid duplicate `Bug #N` for same root cause)** | 5 of 10 reports cluster around "worker hang on sentinel" or "verifier post-sentinel race". Without similarity hinting we keep paying triage cost N times. |
42
+ | D3 | **Zero regression on `--mode tmux` 19th launch** | Per `docs/plans/native-agent-revert.md`, the production tmux path is mid-flight. Any change must be additive there; default behavior unchanged. |
43
+
44
+ ---
45
+
46
+ ## 4. Viable options
47
+
48
+ ### Option A — **Bundle-first**: `/rlp-desk report <slug>` + auto-emit on BLOCKED *(recommended)*
49
+
50
+ Add a single subcommand and one auto-trigger. Mechanics:
51
+
52
+ - **Trigger**: every `_handlePollFailure` / `_emitBlockedSentinel` call already in `campaign-main-loop.mjs` and `write_blocked_sentinel` in `run_ralph_desk.zsh` ALSO writes `bug-reports/<slug>-<UTCISO>.json` + `<...>.md` (template-rendered).
53
+ - **Schema**: extends current blocked-sentinel JSON v2.0 with: `repro.command`, `repro.env_snapshot`, `repro.git_head_sha`, `pane_tail.{worker,verifier}` (last 200 lines, redacted), `recent_iter_artifacts[]` (last 3 iterations' done-claim/verdict paths), `pattern_match.{candidate_bug_ids[], score}` (similarity vs known reports).
54
+ - **Subcommand**: `/rlp-desk report <slug>` to (a) regenerate from saved campaign state, (b) attach a custom headline, (c) print the markdown to stdout for paste-into-issue-tracker.
55
+ - **Pattern match**: deterministic — hashed signature on `{reason_category, failure_category, suggested_action, top-level pane stem}` against a `docs/bug-patterns.json` lookup (seeded with #1–#10).
56
+ - **Bug #10 fix**: leader on relaunch honors `status.phase == "verify"` + valid manual artifacts (validated against schema) and skips worker dispatch. Same surgical injection point used by P1-D classifier.
57
+
58
+ **Pros**: low-surface; reuses `_classifyBlock`, `writeSentinelExclusive`, `~/.claude/ralph-desk/analytics`; produces the same output regardless of whether BLOCKED came from `--mode tmux` or `--mode native`/`--mode agent`. Operator's job collapses to "edit headline".
59
+
60
+ **Cons**: pattern-match is naive (string-stem); will need iteration. Pane-tail capture risks PII/secret leak — must redact (governance §1f already has redaction precedent).
61
+
62
+ ### Option B — **Heartbeat-first**: pre-BLOCKED early warning channel
63
+
64
+ Introduce a `<slug>-warning.{md,json}` sidecar emitted whenever a heartbeat anomaly crosses a soft threshold (50% of `iter-timeout`, no progress). Operator can opt into pre-empting the BLOCKED with `/rlp-desk warn <slug>` before the 30-min wall hits.
65
+
66
+ **Pros**: shortens the perceived "bug is endless" tail by surfacing earlier.
67
+
68
+ **Cons**: orthogonal to the *report quality* problem. Adds a second sentinel surface; risks false positives that train operators to ignore. Does not solve D1 (hand-writing) or D2 (clusters).
69
+
70
+ ### Option C — **External tracker integration** (GitHub Issues auto-file)
71
+
72
+ Instead of file artifacts, POST blocked context to a configured GitHub repo issue.
73
+
74
+ **Pros**: makes rlp-desk-side triage visible without operator handoff.
75
+
76
+ **Cons**: violates principle 3 (new infra: secrets, network, retry, rate-limits). Couples to an external service. Out-of-scope per ABSOLUTE rule "NEVER push to remote without explicit user approval" — would need a per-campaign auth path. **Invalidated.**
77
+
78
+ ### Option D — **Status-quo + better doc template**
79
+
80
+ Just publish a clearer template under `docs/rlp-desk/bug-report-template.md` and call it done.
81
+
82
+ **Pros**: zero code change.
83
+
84
+ **Cons**: does not move D1 (operator minutes) or D2 (clusters) at all. Bug #10 (Recovery breakage) untouched. **Invalidated.**
85
+
86
+ ### Why A wins (with optional B as future-PR)
87
+
88
+ A directly addresses D1 (auto-bundle), D2 (pattern_match seeded with #1–#10), and Bug #10 (relaunch hygiene). B is complementary but orthogonal — defer to a separate PR after A lands and we measure operator minutes-saved. C/D fail principle 3 / D1 respectively.
89
+
90
+ ---
91
+
92
+ ## 5. Scope (this PR)
93
+
94
+ ### P0 — must land
95
+
96
+ 1. **`bug-report.json` writer** — extend `_emitBlockedSentinel` (Node, `campaign-main-loop.mjs:923-968`) and `write_blocked_sentinel` (zsh, `lib_ralph_desk.zsh`) to also emit a per-block bug-report under `.rlp-desk/bug-reports/<slug>-<iso>.{json,md}`. Schema documented at `docs/rlp-desk/bug-report-schema.md`.
97
+ 2. **Bug #10 relaunch hygiene** — in launch-time entry of `campaign-main-loop.mjs` (currently forces `phase=worker`+`iter=1`), branch on `status.phase == 'verify'` and validate operator-written `iter-signal.json` + `done-claim.json` against existing artifact-validators. If valid → skip worker dispatch, enter verifier directly. If invalid → log warning + fall through to current behavior.
98
+ 3. **Redaction pass** — `pane_tail` and `env_snapshot` go through a deny-list (governance §1f redaction precedent: any `/(api[_-]?key|token|secret|password|bearer|authorization)/i` line replaced with `<REDACTED>`).
99
+
100
+ ### P1 — must land
101
+
102
+ 4. **`/rlp-desk report <slug>` subcommand** — added to `src/commands/rlp-desk.md` per current command-handler patterns; reads the latest blocked-sentinel JSON + most recent `bug-reports/*.json`, prints the markdown render to stdout. Optional `--headline "..."` flag rewrites the title line in-place. No auto-publish to any remote.
103
+ 5. **`pattern_match` seed** — `docs/bug-patterns.json` shipped with deterministic signatures for Bug #1–#10 (manually authored from BOS reports). Bug-report writer fills `pattern_match.candidate_bug_ids[]` + `score` (Jaccard on `{reason_category, failure_category, top-level pane-tail token bag}`).
104
+ 6. **Self-Verification gate compliance** — `src/scripts/run_ralph_desk.zsh` is touched → CLAUDE.md mandates 3 self-verification scenarios (LOW + MEDIUM + CRITICAL). Spelled out in §10.
105
+
106
+ ### P2+ → `docs/plans/bug-report-overhaul-backlog.md` (separate file, not this PR)
107
+
108
+ - Heartbeat-warning sidecar (Option B).
109
+ - GitHub Issues integration (Option C, after authn story).
110
+ - Pattern-learning loop that mines `~/.claude/ralph-desk/analytics/*/bug-reports/` for emerging clusters.
111
+ - Cross-campaign bug-report dashboard in `/rlp-desk analytics`.
112
+ - Auto-suggest "this looks like Bug #N — try fix-X" inline in CLI output (today: `pattern_match` is data-only).
113
+
114
+ ---
115
+
116
+ ## 6. Files to modify
117
+
118
+ | File | Change | Risk |
119
+ |---|---|---|
120
+ | `src/node/runner/campaign-main-loop.mjs` | Extend `_emitBlockedSentinel` to call new `writeBugReport` helper; add Bug #10 relaunch-phase-honor branch in `_runCampaignBody` entry | MED |
121
+ | `src/node/shared/bug-report.mjs` (NEW) | `writeBugReport({slug, classification, reason, paths, env, paneTails, recentArtifacts})` + redaction + pattern-match | LOW (new isolated module) |
122
+ | `src/scripts/lib_ralph_desk.zsh` | New `_write_bug_report` helper called from `write_blocked_sentinel` | MED |
123
+ | `src/scripts/run_ralph_desk.zsh` | Wire `_write_bug_report` after each `write_blocked_sentinel` site (≈10 sites, all already in one taxonomy) | MED |
124
+ | `src/commands/rlp-desk.md` | Add `## report <slug>` section; document `bug-reports/` directory + schema link; add `/rlp-desk report` to help block | LOW |
125
+ | `src/governance.md` | Add §1g "Bug Report Capture" — invariant: every BLOCKED writes a bug-report; redaction rules; relaunch hygiene contract (Bug #10) | LOW (additive) |
126
+ | `docs/rlp-desk/bug-report-schema.md` (NEW) | JSON schema doc + worked example | LOW |
127
+ | `docs/bug-patterns.json` (NEW) | Seed with #1–#10 signatures | LOW |
128
+ | `tests/node/test-bug-report-writer.test.mjs` (NEW) | Schema, redaction, pattern-match unit tests | LOW |
129
+ | `tests/node/test-relaunch-phase-verify-hygiene.test.mjs` (NEW) | Bug #10 fix unit + integration | MED |
130
+ | `tests/test-bug-report-zsh-emit.sh` (NEW) | zsh side bug-report emit verification | MED |
131
+
132
+ Total: 5 modified + 6 new. No deletions. Single PR — review surface bounded.
133
+
134
+ ---
135
+
136
+ ## 7. Pre-mortem (deliberate mode — 3 scenarios)
137
+
138
+ ### S1 — Pane-tail leaks a secret into a committed bug-report
139
+
140
+ A worker pane prints `Authorization: Bearer eyJ...` from a vendor SDK debug log. `pane_tail` captures it; operator commits the bug-report markdown without re-reading.
141
+
142
+ **Mitigation**: redaction deny-list runs on the JSON writer side (not at view-time); deny-list is unit-tested in `test-bug-report-writer.test.mjs` with a fuzz-style fixture (10+ secret-shaped strings). Markdown render reads from JSON post-redaction, so it can never out-leak. Additional belt: `bug-reports/` is added to a sample `.gitignore` snippet in the schema doc; we do not auto-add to user repo `.gitignore`.
143
+
144
+ **Residual risk**: vendor-specific secret formats not in deny-list. Acceptable: schema doc tells operator to scan before committing, and pattern_match leaves an `unredacted_count` audit field that flags how many lines hit the deny-list (operator can sanity-check).
145
+
146
+ ### S2 — Bug #10 fix accidentally honors a stale `phase=verify` from a CRASHED leader and re-enters verifier on garbage state
147
+
148
+ If the leader crashed mid-worker after writing `phase=verify` (race), relaunch could enter verifier with an inconsistent on-disk state.
149
+
150
+ **Mitigation**: validation gate is strict — both `iter-signal.json` AND `done-claim.json` must (a) exist, (b) have `us_id` matching `status.target_us`, (c) have `iteration` matching `status.iteration`, (d) have `iter_signal_quality == 'specific'`, (e) be newer than the most recent `worker-prompt.md` mtime. Failure of ANY check → fall through to current behavior + log "phase=verify ignored: <reason>". This preserves backward compat and matches `_checkBlockedHygiene` precedent.
151
+
152
+ **Residual risk**: a clever filesystem race can still pass all five checks. We accept this — the existing "every relaunch resets to worker" behavior is itself a bug (#10), and Option C's stricter "operator must pass `--resume-from-verify` flag" is named in P2 backlog as an opt-in escalation if operators report false positives.
153
+
154
+ ### S3 — `pattern_match` false-positive trains operators to dismiss real bugs
155
+
156
+ Two unrelated `infra_failure` blocks both score ≥0.8 against the same Bug #N signature; operator stops reading.
157
+
158
+ **Mitigation**: `pattern_match` is **data-only** in P1 — no inline CLI suggestion. Score + candidate IDs are written to JSON; markdown render places them in a "Possible related bugs" footer with explicit "score: 0.83 — review before assuming match". Auto-suggest is deferred to P2 backlog precisely because we have not validated the signature space yet. Also: ship with **deterministic** Jaccard over a small token bag, not ML — failures are inspectable.
159
+
160
+ **Residual risk**: low — operator opt-in to act on `pattern_match`.
161
+
162
+ ---
163
+
164
+ ## 8. Expanded test plan (deliberate mode)
165
+
166
+ ### Unit (Node)
167
+
168
+ `tests/node/test-bug-report-writer.test.mjs`:
169
+
170
+ - AC-W1: schema fields all present + types match `docs/rlp-desk/bug-report-schema.md`.
171
+ - AC-W2: redaction — 12 secret-shaped fixtures (Bearer token, AWS key, GH PAT, OpenAI key, generic `password=...`, etc.) all replaced by `<REDACTED>`; `meta.redacted_line_count` reflects count.
172
+ - AC-W3: pane-tail truncates at 200 lines; preserves last lines (most recent diagnostic value).
173
+ - AC-W4: `pattern_match` against seeded `docs/bug-patterns.json` — synthetic block matching Bug #6 signature returns `score >= 0.7` + correct `candidate_bug_ids`.
174
+ - AC-W5: idempotent — second call with same `(slug, classification, iso)` is a no-op (uses `writeSentinelExclusive` semantics).
175
+
176
+ `tests/node/test-relaunch-phase-verify-hygiene.test.mjs`:
177
+
178
+ - AC-R1: status.phase=verify + valid artifacts → verifier-only entry (no worker dispatch).
179
+ - AC-R2: status.phase=verify + missing `done-claim.json` → fall through to worker, log warning.
180
+ - AC-R3: status.phase=verify + `us_id` mismatch → fall through, warning.
181
+ - AC-R4: status.phase=verify + `iter-signal.json` older than worker-prompt.md → fall through, warning.
182
+ - AC-R5: status.phase=verify + `iter_signal_quality != 'specific'` → fall through, warning.
183
+
184
+ ### Integration (Node)
185
+
186
+ `tests/node/us006-campaign-main-loop.test.mjs` extension:
187
+
188
+ - AC-I1: BLOCKED via `flywheel_inconclusive` → bug-report file written to `.rlp-desk/bug-reports/`; JSON parses; `reason_category == 'mission_abort'`.
189
+ - AC-I2: BLOCKED via `worker_exited` → bug-report `pattern_match.candidate_bug_ids` includes `Bug-7` (worker pane death lineage).
190
+ - AC-I3: relaunch with valid `phase=verify` artifacts → no `iter-002.worker-prompt.md` created; verifier dispatched directly.
191
+
192
+ ### Integration (zsh)
193
+
194
+ `tests/test-bug-report-zsh-emit.sh` (NEW, mirrors `test-bug7-post-sentinel-race.sh` style):
195
+
196
+ - Sc-1: stub `dispatch_worker` exits 1 → `write_blocked_sentinel` runs → `<slug>-<iso>.json` exists in `bug-reports/` + parses with `jq`.
197
+ - Sc-2: redaction — pre-injected pane log with `Bearer X` → `jq .pane_tail.worker` does not contain `Bearer X`.
198
+
199
+ ### Self-Verification scenarios (CLAUDE.md gate, MANDATORY since `run_ralph_desk.zsh` is touched)
200
+
201
+ - **LOW**: redaction unit fixture passes; existing zsh + Node regression tests green.
202
+ - **MEDIUM**: real campaign with stub worker that fails → bug-report appears; markdown render contains all required sections; operator can `cat` it; `pattern_match` populated.
203
+ - **CRITICAL**: 2-iter campaign with deliberate BLOCKED at iter-1, then operator manual recovery (write iter-signal/done-claim by hand, set `phase=verify`), relaunch → verifier-only path runs (no worker iter-2 dispatch); Bug #10 reproduction scenario reversed; verdict accepted; `complete.md` written.
204
+
205
+ All 3 must PASS before commit. If any FAIL: fix root cause, re-run failing scenario, then re-verify all 3.
206
+
207
+ ---
208
+
209
+ ## 9. Verification end-to-end
210
+
211
+ 1. `node --test 'tests/node/*.test.mjs'` — all green; new tests visible.
212
+ 2. `bash tests/test-bug-report-zsh-emit.sh` — green.
213
+ 3. `bash tests/test-bug7-post-sentinel-race.sh` + `bash tests/test-bug7-poll-partial-write.sh` — unchanged green (no Bug #7 regression).
214
+ 4. CLAUDE.md self-verification gate × 3 (above) — all PASS.
215
+ 5. Manual: trigger BLOCKED in a sandbox campaign; verify `.rlp-desk/bug-reports/<slug>-<iso>.md` is human-readable + has `Possible related bugs` footer.
216
+ 6. Banner-aware diff `src/` ⇆ `~/.claude/ralph-desk/` after `node scripts/postinstall.js`.
217
+
218
+ ---
219
+
220
+ ## 10. ADR (preview — final once Critic approves)
221
+
222
+ - **Decision**: Adopt Option A (bundle-first, auto-emit on BLOCKED) for v0.16.0; defer heartbeat-warning (B) and external-tracker (C) to backlog.
223
+ - **Drivers**: D1 operator-minutes, D2 cluster-recognition, D3 zero `--mode tmux` regression.
224
+ - **Alternatives considered**: B (orthogonal — does not solve D1/D2), C (violates principle 3, requires authn/network), D (does not move any driver).
225
+ - **Why chosen**: A reuses `_classifyBlock` + `writeSentinelExclusive`; surgical-diff principle satisfied; pattern_match seeded from real history.
226
+ - **Consequences**: BLOCKED writes additional artifact (`bug-reports/<slug>-<iso>.{json,md}`); operator workflow shifts from "hand-write 200 lines" to "review + edit headline"; `bug-patterns.json` becomes a living artifact maintained alongside reports.
227
+ - **Follow-ups**: Backlog file lists P2+ items. Heartbeat warning revisited after we measure operator minutes-saved on first 3 BLOCKED post-land.
228
+
229
+ ---
230
+
231
+ ## 11. Round-by-round resolution log
232
+
233
+ | Round | Reviewer | Verdict | Findings closed |
234
+ |---|---|---|---|
235
+ | 0 | — | Planner v0 | initial draft |
236
+ | 1 | Architect | _pending_ | _to fill_ |
237
+ | 2 | Codex Critic | _pending_ | _to fill_ |
238
+ | ... | | | |
@@ -0,0 +1,319 @@
1
+ # Bug Report Mechanism Overhaul — v1 (Architect-revised)
2
+
3
+ > **Status**: Planner v1 awaiting Codex Critic.
4
+ > **Mode**: deliberate.
5
+ > **Stop rule**: iterate until codex critic returns 0 P0 + 0 P1. P2 → backlog.
6
+ > **Critic instruction**: *approve unless P0 or P1 found.*
7
+ > **Changes from v0**: Architect ITERATE feedback applied — split into 3 sequenced PRs, exact file:line for Bug #10, `pattern_match` seed → P2 backlog, explicit `native-agent-revert` dependency, governance §1g rationale.
8
+
9
+ ---
10
+
11
+ ## 1. Problem statement (unchanged from v0)
12
+
13
+ 10 hand-written 200-line bug reports (`Bug #1`–`Bug #10`, BOS dev `2026-05-01..05-07`) point at one root frustration: **bugs are endless and each one costs 30+ min of operator time to package** before the rlp-desk side can even start triage. Two distinct cost lines:
14
+
15
+ | Pain | Evidence | Cost line |
16
+ |---|---|---|
17
+ | **Recovery friction** (lose work on relaunch) | Bug #10 100%-reproducible: leader resets `phase=worker`, deletes operator-written iter-signal/done-claim | per BLOCKED, lost 30+ min of manual work |
18
+ | **Capture friction** (hand-write report) | All 10 reports re-collect env, version, command, pane logs, settings, gitignore — already on disk | per BLOCKED, 30+ min hand-writing |
19
+ | Cluster blindness | Bug #6/#7/#8 are all "worker hang variants"; cluster re-discovered each time | amortized over many BLOCKED |
20
+ | Reactive only | Bugs surface only after 30-min poll timeout | per BLOCKED, 30 min wall-clock |
21
+
22
+ The blocked-sentinel JSON (`schema_version: 2.0`) already classifies (`reason_category` / `recoverable` / `suggested_action`) but stops at the campaign boundary — it does not become a *bug report*. That gap is the target.
23
+
24
+ ---
25
+
26
+ ## 2. Principles (5)
27
+
28
+ 1. **Capture-by-default, not by-request.** When the campaign blocks, the operator should not have to gather anything that already exists on disk.
29
+ 2. **One canonical schema, two consumers.** A single `bug-report.json` feeds both BOS-side templates and rlp-desk-side triage; no divergent representations.
30
+ 3. **Surgical diffs over new infra.** Extend the existing `blocked.{md,json}` writer + `/rlp-desk` subcommand surface; do not introduce a new daemon, queue, or service. **Sequenced PRs > one big PR.**
31
+ 4. **Recovery must be idempotent.** Manual recovery of a BLOCKED campaign must not be silently overwritten on relaunch (Bug #10 contract).
32
+ 5. **Earlier is cheaper.** A heartbeat-anomaly *warning* costs nothing; a 30-min BLOCKED poll-timeout is the most expensive form of feedback.
33
+
34
+ ---
35
+
36
+ ## 3. Decision drivers (top 3)
37
+
38
+ | # | Driver | Why it dominates |
39
+ |---|---|---|
40
+ | D1 | **Operator minutes per BLOCKED → first actionable report** | Today: 30+ min recovery + 30+ min hand-writing = 60+ min. Target: ≤2 min recovery (just relaunch) + ≤2 min report review. |
41
+ | D2 | **Cluster recognition (avoid duplicate `Bug #N` for same root cause)** | 5 of 10 reports cluster around "worker hang on sentinel" or "verifier post-sentinel race". Without similarity hinting we keep paying triage cost N times. |
42
+ | D3 | **Zero regression on `--mode tmux` 19th launch** + **zero merge collision with `feat/native-agent-revert`** | Per `docs/plans/native-agent-revert.md:7`, the production tmux path is mid-flight. PR-A (Bug #10 hygiene) and PR-B (bundler) BOTH must wait for `native-agent-revert` to land. Documented as hard dependency below. |
43
+
44
+ **Architect-flagged ranking shift**: the per-BLOCKED cost of *recovery loss* (Bug #10) > per-BLOCKED cost of *hand-writing*. PR-A (Bug #10) lands first because it has higher per-event leverage AND smaller surface.
45
+
46
+ ---
47
+
48
+ ## 4. Viable options
49
+
50
+ ### Option A — **Sequenced bundle**: PR-A relaunch hygiene, PR-B bundler, PR-C governance/patterns *(recommended)*
51
+
52
+ Three sequenced PRs landed in order. Each has a single, narrow purpose; review surface bounded; dependencies explicit.
53
+
54
+ **Pros**: Surgical-diff principle satisfied; merge-conflict risk with `native-agent-revert` minimized; each PR reverts cleanly; per-PR self-verification scenarios stay scoped.
55
+
56
+ **Cons**: more wall-clock to land the full vision (3 land-cycles instead of 1). Acceptable: PR-A alone moves D1 by ~50% on its own.
57
+
58
+ ### Option B — **One mega-PR** *(rejected per Architect)*
59
+
60
+ The v0 plan, kept here for the record. **Invalidated** because (a) merge collision with `native-agent-revert` is near-certain on `campaign-main-loop.mjs` + `governance.md` + `commands/rlp-desk.md` (3 of the 5 modified files overlap), (b) self-verification scope balloons (one PR triggers MEDIUM+CRITICAL on too many subsystems), (c) Bug #10 fix is delayed by bundler's review surface.
61
+
62
+ ### Option C — **Heartbeat-first (early warning)** *(deferred to backlog)*
63
+
64
+ Same as v0 — orthogonal to the report-quality problem; defer to backlog.
65
+
66
+ ### Option D — **External tracker integration** *(rejected, principle 3 violation)*
67
+
68
+ Same as v0.
69
+
70
+ ### Option E — **Doc template only** *(rejected, does not move D1/D2)*
71
+
72
+ Same as v0.
73
+
74
+ ### Why A wins
75
+
76
+ A directly addresses D1 (PR-A: recovery, PR-B: capture), D2 (PR-C: pattern_match operationalized), and D3 (PRs sequenced AFTER `native-agent-revert`). C is complementary; B/D/E fail principles or drivers.
77
+
78
+ ---
79
+
80
+ ## 5. Scope
81
+
82
+ ### Hard dependency (all PRs)
83
+
84
+ **`feat/native-agent-revert` (P0+P1) MUST merge to `main` before PR-A starts.** Source: `docs/plans/native-agent-revert.md:7`. Documented in §3 D3. No PR-A/B/C work begins on shared files until that merge is verified by `git log main --oneline | head -3` showing the native-revert P0+P1 commits.
85
+
86
+ ### PR-A — **Bug #10 relaunch hygiene** (lands first; ~1 file modified, ~1 test added; ~50-line surgical diff)
87
+
88
+ **P0**:
89
+
90
+ 1. **Inject phase=verify honor branch** at `src/node/runner/campaign-main-loop.mjs`. Two surgical edits:
91
+ - Verified ground-truth (read 2026-05-07): `readCurrentState` (`:364-389`) at `:371` already preserves `status.phase`. The bug is that the main loop unconditionally writes `state.phase = 'worker'` at `:1575` (every iteration, top of body) before `dispatchWorker`. Architect-flagged misattribution in v0 corrected.
92
+ - Edit 1: after `readCurrentState` (`:1256`), BEFORE the main `while` loop, add a single-iteration "phase=verify recovery branch":
93
+ ```
94
+ if (state.phase === 'verify' && state.iteration > 0) {
95
+ const valid = await _validateOperatorRecoveryArtifacts(paths, state); // 5 checks (see §7 S2)
96
+ if (valid) {
97
+ _logRecovery('Resuming verify phase — operator manual recovery detected');
98
+ state.phase = 'verify'; // explicit re-affirm
99
+ state._skipNextWorkerDispatch = true; // consumed at :1575
100
+ } else {
101
+ _logRecovery(`phase=verify ignored: ${valid.reason}`);
102
+ // fall through; default behavior (worker dispatch) remains
103
+ }
104
+ }
105
+ ```
106
+ - Edit 2: at `:1575`, guard the unconditional reset:
107
+ ```
108
+ if (!state._skipNextWorkerDispatch) {
109
+ state.phase = 'worker';
110
+ await writeStatus(...);
111
+ await dispatchWorker(...);
112
+ } else {
113
+ state._skipNextWorkerDispatch = false; // one-shot
114
+ }
115
+ ```
116
+ - `_validateOperatorRecoveryArtifacts` is a new internal helper in the same file (~25 lines): exists+parses `iter-signal.json` AND `done-claim.json`; `us_id == state.current_us`; `iteration == state.iteration`; `iter_signal_quality == 'specific'`; both files newer than the most recent `iter-NNN.worker-prompt.md` mtime.
117
+ 2. **Mirror the same guard in `src/scripts/run_ralph_desk.zsh`** for `--mode tmux`. **Verified injection points (read 2026-05-07, P1-1 corrected from v1)**: the iteration-body reset is **not** in any `start_iteration()` function — it lives at the top of the iteration body. Three concrete sites:
118
+ - `:3053` — `rm -f "$SIGNAL_FILE" "$DONE_CLAIM_FILE" "$VERDICT_FILE"` **deletes operator-written recovery artifacts**. PR-A guard MUST wrap this with: skip the rm when `LAST_PHASE == "verify"` AND validator passes.
119
+ - `:3084` — `update_status "worker" "running"` forces phase=worker. PR-A guard: skip when phase=verify+valid.
120
+ - `:3087` (and following dispatch block down to ~`:3110`) — worker launch. PR-A guard: when phase=verify+valid, jump straight to verifier dispatch (mirrors the per-US verifier dispatch already in the file; reuse `dispatch_verifier_per_us`).
121
+ - 5-check validator added to `lib_ralph_desk.zsh` as `_validate_operator_recovery_artifacts` (LAST_PHASE read from earlier `read_status` call site, audited at impl time).
122
+
123
+ **Tests**:
124
+
125
+ - `tests/node/test-relaunch-phase-verify-hygiene.test.mjs` (NEW) — 5 ACs (R1–R5 in §8).
126
+ - `tests/test-bug10-zsh-relaunch-hygiene.sh` (NEW) — zsh side, mirrors `test-bug7-post-sentinel-race.sh` style.
127
+
128
+ **No governance change in PR-A.** No new sentinel writer. Just the guard + helper + tests.
129
+
130
+ ### PR-B — **Bug-report bundler** (lands second; ~3 modified + 4 new; bundler module + subcommand)
131
+
132
+ **P0**:
133
+
134
+ 3. **`bug-report.json` writer** — new `src/node/shared/bug-report.mjs` exposes `writeBugReport({slug, classification, reason, paths, env, paneTails, recentArtifacts, now})`. Called from `_emitBlockedSentinel` in `campaign-main-loop.mjs:920-968` AFTER the existing JSON sidecar write succeeds. Idempotent via `writeSentinelExclusive` semantics (per-block `<iso>` filename).
135
+ 4. **Mirror in zsh**: `_write_bug_report` in `lib_ralph_desk.zsh`, called from `write_blocked_sentinel`. Exact call-site count audited at implementation time (current grep shows ~10 invocations, all in one taxonomy).
136
+ 5. **Redaction pass** — deny-list applied before write (12 secret-shape regex from §8 AC-W2); `meta.redacted_line_count` exposed for operator audit. Markdown render reads from JSON post-redaction.
137
+ 6. **`pattern_match` field reserved-but-empty** — schema includes it; bundler writes `{ candidate_bug_ids: [], score: null, source: "deferred-to-PR-C" }`. No `docs/bug-patterns.json` shipped in PR-B.
138
+
139
+ **P1**:
140
+
141
+ 7. **`/rlp-desk report <slug>` subcommand** — new section in `src/commands/rlp-desk.md`; reads latest `bug-reports/<slug>-*.json`, prints markdown to stdout. Optional `--headline "..."` flag. **No remote publish.**
142
+ 8. **Schema doc** — `docs/rlp-desk/bug-report-schema.md` (NEW): JSON schema + worked example + .gitignore snippet recommendation.
143
+
144
+ **Tests**:
145
+
146
+ - `tests/node/test-bug-report-writer.test.mjs` (NEW) — 5 ACs (W1–W5 in §8).
147
+ - `tests/test-bug-report-zsh-emit.sh` (NEW).
148
+ - `tests/node/us006-campaign-main-loop.test.mjs` extension — 2 integration ACs (I1–I2 in §8).
149
+
150
+ ### PR-C — **Pattern operationalization + governance §1g** (lands third; pattern data + governance ride-along justified)
151
+
152
+ **P1**:
153
+
154
+ 9. **`docs/bug-patterns.json`** (NEW) — seed signatures for Bug #1–#10, hand-authored from BOS reports. Schema: `{bug_id, signature: {reason_category, failure_category, pane_token_bag[]}, fix_pr_url}`.
155
+ 10. **Pattern-match implementation** — `bug-report.mjs` Jaccard implementation populates `pattern_match.candidate_bug_ids[]` + `score` (P2 cap: ≥0.7 = "candidate", < 0.7 = empty list). Output remains data-only — no inline CLI suggestion. v0 §7 S3 mitigation preserved.
156
+ 11. **Governance §1g "Bug Report Capture"** — additive section; documents (a) every BLOCKED writes a bug-report invariant, (b) redaction rules + audit field, (c) PR-A relaunch hygiene contract (operator's recovery is honored). **Ride-along rationale**: §1g formalizes the contract that PR-A and PR-B implement; landing them as separate PRs without governance text leaves the invariants implicit. Per CLAUDE.md, governance changes require `ralplan + codex review` — this very plan satisfies that for §1g; PR-C's review surface is *only* the §1g text + pattern data + Jaccard ≤80 LOC implementation. Bounded.
157
+
158
+ **Tests**:
159
+
160
+ - `tests/node/test-bug-report-pattern-match.test.mjs` (NEW) — Jaccard determinism, threshold, regression on synthetic Bug #6 fixture. AC-W4 from §8 (relocated from PR-B).
161
+
162
+ ### P2+ → `docs/plans/bug-report-overhaul-backlog.md` (separate file, NOT this PR-A/B/C set)
163
+
164
+ - Heartbeat-warning sidecar (Option C).
165
+ - GitHub Issues integration (Option D, after authn story).
166
+ - Pattern-learning loop that mines `~/.claude/ralph-desk/analytics/*/bug-reports/` for emerging clusters.
167
+ - Cross-campaign bug-report dashboard in `/rlp-desk analytics`.
168
+ - Auto-suggest "this looks like Bug #N — try fix-X" inline in CLI output (today: data-only).
169
+ - Operator-CLI `/rlp-desk recover <slug> --to verify` to write the manual recovery artifacts deterministically (currently a hand-rolled jq pipeline).
170
+
171
+ ---
172
+
173
+ ## 6. Files to modify (summary across PR-A/B/C)
174
+
175
+ | PR | File | Change | Risk |
176
+ |---|---|---|---|
177
+ | A | `src/node/runner/campaign-main-loop.mjs` | Phase-verify recovery branch (after `:1256`); guard at `:1575`; `_validateOperatorRecoveryArtifacts` helper | **MED** (control-flow change in main loop) |
178
+ | A | `src/scripts/lib_ralph_desk.zsh` | `_validate_operator_recovery_artifacts` helper | LOW |
179
+ | A | `src/scripts/run_ralph_desk.zsh` | Phase-verify recovery branch wrapping the iteration-body sites at `:3053` (rm guard), `:3084` (update_status guard), `:3087` (worker dispatch skip → verifier dispatch jump). No `start_iteration` function exists — iteration body starts at the top of the main while loop after the cleanup block. | **MED** |
180
+ | A | `tests/node/test-relaunch-phase-verify-hygiene.test.mjs` | NEW — 5 ACs | LOW |
181
+ | A | `tests/test-bug10-zsh-relaunch-hygiene.sh` | NEW | LOW |
182
+ | B | `src/node/shared/bug-report.mjs` | NEW module — writer + redaction; `pattern_match` reserved-empty | LOW |
183
+ | B | `src/node/runner/campaign-main-loop.mjs` | Call `writeBugReport` from `_emitBlockedSentinel` (post existing JSON write) | LOW |
184
+ | B | `src/scripts/lib_ralph_desk.zsh` | `_write_bug_report` helper | MED |
185
+ | B | `src/scripts/run_ralph_desk.zsh` | Wire `_write_bug_report` after `write_blocked_sentinel` sites | MED |
186
+ | B | `src/commands/rlp-desk.md` | Add `## report <slug>` + help-block entry | LOW |
187
+ | B | `docs/rlp-desk/bug-report-schema.md` | NEW | LOW |
188
+ | B | `tests/node/test-bug-report-writer.test.mjs` | NEW | LOW |
189
+ | B | `tests/test-bug-report-zsh-emit.sh` | NEW | LOW |
190
+ | B | `tests/node/us006-campaign-main-loop.test.mjs` | +2 integration ACs | LOW |
191
+ | C | `docs/bug-patterns.json` | NEW seed | LOW |
192
+ | C | `src/node/shared/bug-report.mjs` | Jaccard implementation (≤80 LOC); replaces reserved-empty logic | LOW |
193
+ | C | `src/governance.md` | Additive §1g | LOW (additive) |
194
+ | C | `tests/node/test-bug-report-pattern-match.test.mjs` | NEW | LOW |
195
+
196
+ **PR-A**: 3 modified + 2 new = 5 files.
197
+ **PR-B**: 4 modified + 4 new = 8 files.
198
+ **PR-C**: 2 modified + 2 new = 4 files.
199
+
200
+ Each PR's review surface bounded; merge-conflict surface with `native-agent-revert` is empty (PRs sequenced AFTER it lands).
201
+
202
+ ---
203
+
204
+ ## 7. Pre-mortem (deliberate mode — 3 scenarios; updated for v1)
205
+
206
+ ### S1 — Pane-tail leaks a secret into a committed bug-report (PR-B risk)
207
+
208
+ (unchanged from v0; mitigation in PR-B test AC-W2; `meta.redacted_line_count` audit field; schema doc recommends `.gitignore` snippet but does not auto-modify user repo)
209
+
210
+ ### S2 — Bug #10 fix accidentally honors a stale `phase=verify` from a CRASHED leader (PR-A risk)
211
+
212
+ The 5-validation gate (exists × 2, us_id match, iteration match, `iter_signal_quality=='specific'`, mtimes-newer-than-worker-prompt) blocks the most likely race. PR-A also adds a `_logRecovery` audit line for every relaunch outcome (honored / ignored + reason) so operators can confirm in `/rlp-desk logs <slug>`.
213
+
214
+ **Residual risk**: a clever filesystem race can pass all five checks. Backlog item: `/rlp-desk recover <slug> --to verify` opt-in flag (P2). Until then, the validator's strictness (any miss → fall through to current behavior) makes the failure mode "no improvement" not "regression".
215
+
216
+ ### S3 — `pattern_match` false-positive trains operators to dismiss real bugs (PR-C risk)
217
+
218
+ In PR-B, `pattern_match` is empty/reserved — no risk. In PR-C, threshold 0.7 + Jaccard determinism + data-only output. Operator sees `score: 0.83 — review before assuming match`. Auto-suggest deferred to P2.
219
+
220
+ **Architect-flagged risk added (S4)**: PR-A's `state._skipNextWorkerDispatch` is a one-shot mutation on a shared object. **Mitigation**: explicitly cleared inside the guard branch (Edit 2 above); PR-A includes a unit test that runs 2 consecutive iterations and asserts the worker IS dispatched on iter-2.
221
+
222
+ ---
223
+
224
+ ## 8. Expanded test plan (deliberate mode)
225
+
226
+ ### PR-A unit (Node) — `tests/node/test-relaunch-phase-verify-hygiene.test.mjs`
227
+
228
+ - AC-R1: status.phase=verify + valid artifacts → verifier-only entry (no worker dispatch).
229
+ - AC-R2: status.phase=verify + missing `done-claim.json` → fall through to worker, log warning.
230
+ - AC-R3: status.phase=verify + `us_id` mismatch → fall through, warning.
231
+ - AC-R4: status.phase=verify + `iter-signal.json` older than worker-prompt.md → fall through, warning.
232
+ - AC-R5: status.phase=verify + `iter_signal_quality != 'specific'` → fall through, warning.
233
+ - AC-R6 (Architect S4): `_skipNextWorkerDispatch` cleared after one use; iter-2 worker dispatched normally.
234
+
235
+ ### PR-B unit (Node) — `tests/node/test-bug-report-writer.test.mjs`
236
+
237
+ - AC-W1: schema fields all present + types match `docs/rlp-desk/bug-report-schema.md`.
238
+ - AC-W2: redaction — 12 secret-shape fixtures all replaced by `<REDACTED>`; `meta.redacted_line_count` reflects count.
239
+ - AC-W3: pane-tail truncates at 200 lines; preserves last lines.
240
+ - AC-W5: idempotent — second call with same `(slug, iso)` is a no-op.
241
+ - (AC-W4 relocated to PR-C — Jaccard pattern match.)
242
+
243
+ ### PR-B integration (Node) — `tests/node/us006-campaign-main-loop.test.mjs` extension
244
+
245
+ - AC-I1: BLOCKED via `flywheel_inconclusive` → bug-report file written; JSON parses; `reason_category == 'mission_abort'`.
246
+ - AC-I2: BLOCKED via `worker_exited` → bug-report `pattern_match` field exists with `{candidate_bug_ids: [], score: null, source: "deferred-to-PR-C"}`.
247
+
248
+ ### PR-C unit (Node) — `tests/node/test-bug-report-pattern-match.test.mjs`
249
+
250
+ - AC-W4: pattern_match against seeded `docs/bug-patterns.json` — synthetic block matching Bug #6 signature returns `score >= 0.7` + correct `candidate_bug_ids`.
251
+ - AC-W4b: synthetic block with no matches → `candidate_bug_ids: []`, `score < 0.7`.
252
+
253
+ ### Integration (zsh)
254
+
255
+ - `tests/test-bug10-zsh-relaunch-hygiene.sh` (PR-A).
256
+ - `tests/test-bug-report-zsh-emit.sh` (PR-B). Sc-1 + Sc-2 from v0.
257
+
258
+ ### Self-Verification scenarios (CLAUDE.md gate)
259
+
260
+ **PR-A** (touches `run_ralph_desk.zsh` → CLAUDE.md mandates 3 SV scenarios):
261
+
262
+ - LOW: 5-AC unit suite green; existing zsh + Node regression green.
263
+ - MEDIUM: real campaign brought to BLOCKED via stub failure; operator runs the documented recovery flow (jq patches `phase=verify`, writes manual artifacts); relaunch → verifier-only path runs (no `iter-002.worker-prompt.md` created); verdict accepted.
264
+ - CRITICAL: same as MEDIUM but also assert the `_skipNextWorkerDispatch` flag does not survive into iter-3; `/rlp-desk logs` shows `Resuming verify phase` audit line.
265
+
266
+ **PR-B** (touches `run_ralph_desk.zsh` again → 3 SV scenarios):
267
+
268
+ - LOW: redaction unit fixture passes.
269
+ - MEDIUM: real campaign with stub worker fails → `bug-reports/<slug>-<iso>.{json,md}` appears; markdown render contains all required sections; `/rlp-desk report <slug>` prints same markdown.
270
+ - CRITICAL: redaction smoke — pane log pre-injected with `Bearer X` and `OpenAI-API-Key: sk-...`; bug-report JSON does not contain those substrings; `meta.redacted_line_count >= 2`.
271
+
272
+ **PR-C** (governance + patterns; no runtime code path → CLAUDE.md gate ride-along scenarios):
273
+
274
+ - LOW: Jaccard unit suite green.
275
+ - MEDIUM: synthetic block matching Bug #6 → bug-report `pattern_match.candidate_bug_ids` includes `Bug-6`.
276
+ - CRITICAL: ralplan + codex review of governance §1g additions reaches 0 issues.
277
+
278
+ ---
279
+
280
+ ## 9. Verification end-to-end (per PR)
281
+
282
+ (Each PR verified independently before next PR starts.)
283
+
284
+ 1. `node --test 'tests/node/*.test.mjs'` all green; new PR-specific tests visible.
285
+ 2. zsh integration test for that PR green.
286
+ 3. Bug #7 regression suite (`test-bug7-post-sentinel-race.sh`, `test-bug7-poll-partial-write.sh`) unchanged green.
287
+ 4. CLAUDE.md SV gate × 3 for that PR — all PASS.
288
+ 5. **Local sync verification (P1-2 corrected from v1)** — depends on which file types changed:
289
+ - **Node files only** (PR-C): `node scripts/postinstall.js` + banner-aware diff `src/` ⇆ `~/.claude/ralph-desk/` per CLAUDE.md.
290
+ - **zsh wrappers changed** (PR-A and PR-B both touch `src/scripts/{run,lib}_ralph_desk.zsh`): `node scripts/postinstall.js` does NOT install the legacy zsh wrappers (CLAUDE.md `Local File Sync` §: "synced ONLY via `bash install.sh` curl path"). Required additional steps:
291
+ - `bash install.sh` from a clean shell to drive the curl-path installer.
292
+ - Verify zsh wrappers landed: `ls -la ~/.claude/ralph-desk/install.sh ~/.claude/ralph-desk/scripts/{init,run,lib}_ralph_desk.zsh`. Each must be banner-headed (`# DO NOT EDIT ...` line 2 after the shebang) and `chmod 0o444`.
293
+ - Banner-aware diff: `diff <(cat src/scripts/run_ralph_desk.zsh) <(tail -n +3 ~/.claude/ralph-desk/scripts/run_ralph_desk.zsh)` (skip shebang + banner).
294
+ - Document which install channel was exercised in the SV scenario notes so future audits can replay.
295
+ 6. Manual sandbox campaign trigger (PR-A: deliberate BLOCKED + recovery; PR-B: BLOCKED + read bug-report markdown; PR-C: pattern_match populated).
296
+
297
+ ---
298
+
299
+ ## 10. ADR (preview — final once Critic approves)
300
+
301
+ - **Decision**: Adopt Option A (sequenced PR-A→PR-B→PR-C) for v0.16.x; defer heartbeat-warning, external tracker, and operator recovery CLI to backlog.
302
+ - **Drivers**: D1 operator-minutes, D2 cluster-recognition, D3 zero `--mode tmux` regression + zero `native-agent-revert` collision.
303
+ - **Alternatives considered**: B (one mega-PR) rejected per Architect for merge collision + SV scope balloon; C (heartbeat) orthogonal — backlog; D (GitHub Issues) violates principle 3; E (doc only) does not move D1/D2.
304
+ - **Why chosen**: PR-A fixes the highest per-event cost (recovery loss). PR-B captures-by-default with redaction. PR-C operationalizes patterns and formalizes governance §1g. Each PR has bounded review surface and clean revert.
305
+ - **Consequences**: BLOCKED writes additional artifacts (`bug-reports/<slug>-<iso>.{json,md}`); operator recovery is honored; `bug-patterns.json` becomes a living artifact; governance gains §1g formal contract.
306
+ - **Follow-ups**: Backlog file lists P2+ items. Heartbeat warning revisited after we measure operator minutes-saved on first 3 BLOCKED post-PR-A land.
307
+
308
+ ---
309
+
310
+ ## 11. Round-by-round resolution log
311
+
312
+ | Round | Reviewer | Verdict | Findings closed |
313
+ |---|---|---|---|
314
+ | 0 | — | Planner v0 | initial draft |
315
+ | 1 | Architect (Claude) | ITERATE | (1) split into PRs (2) cite file:line (3) pattern_match → backlog/PR-C (4) native-revert dependency (5) governance §1g rationale |
316
+ | 2 | Codex Critic | ITERATE — 0 P0, 2 P1 | P1-1 zsh sites corrected to `:3053/:3084/:3087` (no `start_iteration` function); P1-2 sync gap closed (zsh = `bash install.sh` channel + banner+chmod verify). BACKLOG: P2-1/P2-2/P3-1 captured below. |
317
+ | 3 | Codex Critic | **APPROVE** — 0 P0, 0 P1 | Round 2 P1-1/P1-2 confirmed closed by ground-truth checks (zsh sites at `:3053/:3084/:3087` + CLAUDE.md `bash install.sh` channel). New P2 (lib_ralph_desk.zsh diff) → backlog. |
318
+
319
+ **Loop terminated**: Codex Critic returned APPROVE at Round 3. P0+P1 findings = 0. Per user stop rule, ralplan exits. P2/P3 captured in `bug-report-overhaul-backlog.md`.