@ai-dev-methodologies/rlp-desk 0.10.1 → 0.11.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/docs/blueprints/sv-architecture-rethink.md +84 -0
- package/docs/multi-mission-orchestration.md +154 -0
- package/docs/plans/rlp-desk-0.11-handoff-7fixes.md +352 -0
- package/docs/plans/rlp-desk-elegant-papert-agent-a8cd695ffca2a3ad8.md +84 -0
- package/docs/plans/rlp-desk-elegant-papert.md +270 -0
- package/docs/protocol-reference.md +82 -0
- package/package.json +1 -1
- package/src/commands/rlp-desk.md +5 -0
- package/src/governance.md +160 -0
- package/src/node/reporting/campaign-reporting.mjs +4 -0
- package/src/node/run.mjs +23 -1
- package/src/node/runner/campaign-main-loop.mjs +284 -9

@@ -0,0 +1,84 @@

# Blueprint: Self-Verification Architecture Rethink (§7¾ escalation)

**Status**: backlog (escalation candidate, not yet planned for implementation)
**Filed by**: RC-1 patch (fix/rc1-tmux-sv-skip-and-rc2-prd-cross-us-lint, 2026-04-25)
**Escalation trigger**: governance §7¾ Architecture Escalation

## Why this is here

RC-1 (the tmux SV report hanging for 5 minutes) was fixed by **disabling**
`generate_sv_report` in tmux runners. The disable is recorded on three channels
(session-config, metadata.json, debug log) so that traceability is preserved.
That patch is honest, but it does not answer the deeper question:

> Should the Self-Verification report be produced by spawning `claude --print`
> at all? And if not, what should replace it?

The current shell implementation (`src/scripts/lib_ralph_desk.zsh:603-668`)
spawns `claude --print` from a tmux pane that has no usable TTY or stdin. It
then waits on a 300-second watchdog. Even in Agent mode the cost surface is
large:

- a synchronous LLM call,
- no caching,
- no determinism,
- no obvious way to run SV offline against historical campaigns.

Per governance §7¾, this is an **architecture question**, not a patch target.
Open it as a blueprint, gather options, and decide before touching the shell
function again.

## Options to evaluate

1. **Static SV (no LLM call).**
   - Replace the LLM-generated report with a deterministic Node module. It
     reads `iter-*.json`, `cost-log.jsonl`, `verify-verdict.json`, etc., and
     emits the 10-section report from templates + counts.
   - Pros: no spawn, sub-second runtime, runs offline against archived
     campaigns, cacheable.
   - Cons: loses qualitative analysis ("Worker over-engineered" / "Verifier
     rubber-stamped"). Not all 10 sections reduce to counts.
2. **Hybrid: static skeleton + on-demand LLM section.**
   - The static skeleton always runs at campaign end.
   - The qualitative sections become an opt-in `rlp-desk sv enrich <slug>`
     command, run from a real TTY (Agent mode or an interactive shell). Those
     sections are "Worker Process Quality", "Verifier Judgment Quality",
     "Patterns: Strengths & Weaknesses", "Recommendations for Next Cycle" and
     "Blind Spots".
   - Pros: best of both. The tmux runner gets a useful baseline report, and the
     user can opt into the LLM pass when convenient.
   - Cons: two code paths to maintain.
3. **Move SV out of the runner entirely.**
   - rlp-desk produces only the artifacts (as it already does). SV becomes a
     dedicated `rlp-desk sv <slug>` command that the user invokes from any
     working session.
   - Pros: cleanest separation. The runner stays small; SV evolves
     independently.
   - Cons: discoverability, since users may forget to run it. Requires a small
     UX hint in the campaign report ("you should run sv against this slug").
4. **Keep the LLM spawn but fix it for tmux.**
   - Find a non-interactive, no-TTY invocation of the Claude CLI that streams
     stdout and exits cleanly. This could use `--output-format json` plus
     explicit stdin closure.
   - Pros: smallest surface change.
   - Cons: depends on Claude CLI semantics that may shift between releases.
     RC-1 already showed how brittle this is.
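
Below is a minimal POSIX sh sketch of a deterministic skeleton (option 1, or the static half of option 2). It makes several assumptions:

- `iter-*.json`, `cost-log.jsonl` and `verify-verdict.json` are assumed to sit together in one campaign directory. This layout is hypothetical.
- The function name `sv_static_skeleton` and the output headings are illustrative, not part of rlp-desk.
- It only counts artifacts, which is exactly the limitation named in option 1's Cons bullet.

```sh
# Hypothetical sketch: build a static SV skeleton from campaign artifacts.
# Assumes all artifacts live directly in the directory passed as $1.
sv_static_skeleton() {
  local dir=$1 iters cost verdict
  # Count per-iteration records with find rather than a shell glob
  # (an unmatched glob is an error in zsh).
  iters=$(find "$dir" -maxdepth 1 -type f -name 'iter-*.json' | wc -l)
  if [ -f "$dir/cost-log.jsonl" ]; then
    cost=$(wc -l < "$dir/cost-log.jsonl")
  else
    cost=0
  fi
  if [ -f "$dir/verify-verdict.json" ]; then verdict=present; else verdict=missing; fi
  # $((...)) strips the leading spaces that some wc implementations emit.
  printf '%s\n' "## Metrics (static)" \
    "- Iterations recorded: $((iters))" \
    "- Cost-log entries: $((cost))" \
    "- verify-verdict.json: $verdict" \
    "## Qualitative sections" \
    "- Not generated: needs an LLM pass (see option 2)"
}
```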

## Open questions

- Which sections of the current 10-section SV report do humans actually
  consume? (Metric sections vs. qualitative sections vs. blind spots; gather
  telemetry.)
- Could the verifier model itself (already running per iteration) emit a
  "campaign-level" verdict at the end, instead of a separate spawn?
- Should SV also use cross-engine consensus, or is single-model SV enough?

## Decision pending

Open. The current RC-1 patch is the **interim** answer:

- SV is disabled in tmux.
- Traceability is preserved through `with_self_verification_requested` +
  `sv_skipped_reason`.
- The user gets an honest "requested but skipped" banner in the Campaign
  Report.

This blueprint will be promoted to a plan once someone takes ownership of the
architecture question above.
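
The sketch below shows how a report step could turn those two fields into the "requested but skipped" banner. It makes several assumptions:

- `metadata.json` is the one-line JSON object described in the multi-mission orchestration doc.
- `with_self_verification_requested` is a JSON boolean.
- `sv_skipped_reason` is a plain string without escaped quotes.
- The function name and banner wording are illustrative.

A real implementation would more robustly use `jq` or the Node reporter instead of `sed`.

```sh
# Hypothetical sketch: print a "requested but skipped" banner from a
# one-line metadata.json. Prints nothing when SV was not requested or
# when no skip reason string is recorded (e.g. the field is null).
sv_skip_banner() {
  local meta=$1 requested reason
  requested=$(sed -n 's/.*"with_self_verification_requested"[[:space:]]*:[[:space:]]*\([a-z]*\).*/\1/p' "$meta")
  reason=$(sed -n 's/.*"sv_skipped_reason"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$meta")
  if [ "$requested" = true ] && [ -n "$reason" ]; then
    printf 'Self-verification was requested but skipped: %s\n' "$reason"
  fi
}
```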

## References

- `src/scripts/lib_ralph_desk.zsh:603-668`: current `generate_sv_report` implementation.
- `src/scripts/run_ralph_desk.zsh:719-727, 758-768, 2118-2125, 2151`: RC-1 patch.
- `src/node/reporting/campaign-reporting.mjs:412-591`: Node-side SV summary
  builder. It is already pure file I/O, which makes it a promising starting
  point for options 1 and 2.
- governance §7¾ Architecture Escalation: the escalation policy this entry uses.

@@ -0,0 +1,154 @@

# Multi-Mission Orchestration Patterns

rlp-desk runs **one mission per `run_ralph_desk.zsh` invocation**. The runner is
intentionally single-purpose: it loads a slug's PRD, executes the per-US loop,
writes a sentinel (`COMPLETE` or `BLOCKED`), and exits.

Anything that needs to coordinate **multiple missions** in sequence is the
responsibility of a **wrapper script** owned by the consumer. An example is a
flywheel that runs `axis-A → axis-B → measurement → improve`.

This document explains the contract between rlp-desk and a wrapper, with a
small worked example.

## Why no built-in chain

Hard-coding mission sequences inside rlp-desk would couple the runner to one
project's idea of "what comes next." Different consumers have different goals
(fixed-length improvement campaigns, indefinite uptime, mission graphs that
branch on metrics). A wrapper layer keeps rlp-desk focused on the
single-mission contract while letting each consumer encode its own policy.

## Per-mission outputs the wrapper can read

After every campaign, rlp-desk writes the following artifacts under
`<ROOT>/.claude/ralph-desk/`:

| Path | Purpose |
|---|---|
| `memos/<slug>-blocked.md` | Sentinel for a BLOCKED outcome. The first line is `BLOCKED: <us_id>`; the second line is `Reason: <verdict reason>` (governance §1f BLOCKED Surfacing). |
| `memos/<slug>-complete.md` | Sentinel for a COMPLETE outcome. |
| `memos/<slug>-iter-signal.json` | Last worker signal (status, us_id, summary). |
| `memos/<slug>-memory.md` | Campaign memory accumulated across iterations. |
| `memos/<slug>-flywheel-signal.json` | When the flywheel ran, the direction it picked. May contain a `next_mission_candidate` field that the wrapper can use to decide what to launch next. |
| `logs/<slug>/metadata.json` | One-line summary of the campaign config, including `with_self_verification`, `with_self_verification_requested`, and `sv_skipped_reason` (RC-1). |
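
The BLOCKED sentinel has a fixed two-line header, so a wrapper can pull out the US id and reason without any JSON tooling. Below is a minimal POSIX sh sketch. `read_blocked_sentinel` is a hypothetical helper name, and it relies only on the `BLOCKED: <us_id>` / `Reason: <verdict reason>` layout stated in the table.

```sh
# Hypothetical helper: print "<us_id><TAB><reason>" from a BLOCKED sentinel.
# Returns 1 if line 1 is not a "BLOCKED: <us_id>" header.
read_blocked_sentinel() {
  local file=$1 us reason
  us=$(sed -n '1s/^BLOCKED: //p' "$file")
  reason=$(sed -n '2s/^Reason: //p' "$file")
  [ -n "$us" ] || return 1
  printf '%s\t%s\n' "$us" "$reason"
}
```

For example, `read_blocked_sentinel "$DESK/memos/$SLUG-blocked.md" | cut -f2` would print only the reason.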

**Exit-code contract (per entry point; read carefully):**

| Entry point | Exit 0 | Exit 2 | Exit 1 |
|---|---|---|---|
| `src/node/run.mjs` (Node, agent mode) | clean COMPLETE | BLOCKED (verifier or model-upgrade-exhausted), reason on stderr | unhandled error (e.g. unknown flag) |
| `src/scripts/init_ralph_desk.zsh` | scaffold OK | PRD cross-US lint reject (per-us mode), violations on stderr | scaffold incomplete or input error |
| `src/scripts/run_ralph_desk.zsh` (zsh, tmux mode) | clean exit (the sentinel decides COMPLETE/BLOCKED) | not used | any failure path; wrappers must inspect the sentinel files to tell COMPLETE from BLOCKED |

The Node entry point surfaces the BLOCKED reason on **stderr** with **exit code 2**.
The zsh runner does **not** use distinct exit codes for COMPLETE vs. BLOCKED;
inspect the sentinel files (`memos/<slug>-{complete,blocked}.md`) instead.
On the zsh side, PRD lint rejection (init exit 2) is the only path that uses
exit 2.
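
The table reduces to one small decision function, sketched below in POSIX sh. The helper name and the `node`/`init`/`zsh` labels are hypothetical. For the zsh runner it ignores the exit code and reads the sentinels, as the paragraph above recommends.

```sh
# Hypothetical helper: map an entry point's exit code (and, for the zsh
# runner, the sentinel files) to a terminal state.
# Usage: terminal_state <node|init|zsh> <rc> <desk_dir> <slug>
terminal_state() {
  local entry=$1 rc=$2 desk=$3 slug=$4
  case $entry in
    node)   # src/node/run.mjs: the exit code is authoritative
      case $rc in 0) echo COMPLETE ;; 2) echo BLOCKED ;; *) echo ERROR ;; esac ;;
    init)   # init_ralph_desk.zsh: exit 2 means PRD lint reject
      case $rc in 0) echo SCAFFOLD_OK ;; 2) echo LINT_REJECT ;; *) echo ERROR ;; esac ;;
    zsh)    # run_ralph_desk.zsh: only the sentinel decides COMPLETE vs BLOCKED
      if [ -f "$desk/memos/${slug}-complete.md" ]; then echo COMPLETE
      elif [ -f "$desk/memos/${slug}-blocked.md" ]; then echo BLOCKED
      else echo ERROR
      fi ;;
    *) echo UNKNOWN; return 1 ;;
  esac
}
```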

## Minimal wrapper recipe (zsh)

The recipe below iterates over a fixed mission list, launches each mission in
turn, and stops at the first BLOCKED. It demonstrates the contract without
prescribing a policy.

```zsh
#!/usr/bin/env zsh
set -u
set -o pipefail

ROOT="${ROOT:-$PWD}"
DESK="$ROOT/.claude/ralph-desk"
MISSIONS=(
  "axis-1-baseline"
  "axis-2-improve"
  "axis-3-measurement"
)

for SLUG in "${MISSIONS[@]}"; do
  # Skip missions that already finished (idempotent re-runs).
  if [[ -f "$DESK/memos/$SLUG-complete.md" ]]; then
    print "Skipping $SLUG — already COMPLETE"
    continue
  fi
  if [[ -f "$DESK/memos/$SLUG-blocked.md" ]]; then
    print "Stopping chain — $SLUG is BLOCKED:"
    cat "$DESK/memos/$SLUG-blocked.md"
    exit 2
  fi

  print "Launching $SLUG"
  ROOT="$ROOT" \
  WORKER_MODEL="${WORKER_MODEL:-gpt-5.5:medium}" \
  VERIFIER_MODEL="${VERIFIER_MODEL:-opus}" \
  VERIFY_MODE="${VERIFY_MODE:-per-us}" \
  CB_THRESHOLD="${CB_THRESHOLD:-6}" \
    zsh ~/.claude/ralph-desk/run_ralph_desk.zsh "$SLUG"
  rc=$?

  # zsh runner: rc tells you "did the script crash", not COMPLETE vs BLOCKED.
  # Read the sentinel for the actual terminal state.
  if [[ -f "$DESK/memos/$SLUG-complete.md" ]]; then
    print "$SLUG completed cleanly"
  elif [[ -f "$DESK/memos/$SLUG-blocked.md" ]]; then
    print "$SLUG blocked — stopping chain"
    cat "$DESK/memos/$SLUG-blocked.md"
    exit 2
  else
    print "$SLUG ended without sentinel (rc=$rc) — stopping chain"
    # A missing sentinel is a failure even when the runner exited 0.
    (( rc != 0 )) && exit "$rc"
    exit 1
  fi
done
```

Three design notes:

- The wrapper checks the **sentinel files first**, both before and after
  invoking the runner. This makes re-runs idempotent (finished missions are
  skipped). It also compensates for the zsh runner's lack of distinct
  COMPLETE/BLOCKED exit codes.
- For the **Node runner** (`src/node/run.mjs`, agent mode) the recipe can be
  simpler: switch on `$rc` directly, because Node uses exit 2 specifically for
  blocked outcomes. The exit-code table above lists which entry point uses
  which convention.
- `init_ralph_desk.zsh`'s exit 2 (PRD lint reject) is the wrapper's only
  pre-launch fail-fast signal. Treat it the same as a blocked sentinel: the
  campaign will not start until the PRD is fixed.

## Flywheel-driven dynamic chain (optional)

**Emit side (rlp-desk responsibility)**: when a mission runs the flywheel
review (`--flywheel on-fail`), the flywheel agent's signal JSON
(`memos/<slug>-flywheel-signal.json`) MAY include an optional
`next_mission_candidate` field. The value is `null` for "no recommendation",
or a slug string meaning "the consumer should chain this slug next."

The Node leader propagates this field into `status.json`
(`status.next_mission_candidate`), so wrappers can poll either file. The
flywheel prompt template (the flywheel heredoc in `init_ralph_desk.zsh`) and
`governance.md` §7 ⑥½ both document the field.

The field is OPTIONAL, and its absence is treated as `null`. This keeps prior
flywheel signals backward-compatible.

**Consumer side (wrapper responsibility)**: pick the next slug from that
field instead of from a fixed list:

```zsh
NEXT_SLUG=$(jq -r '.next_mission_candidate // empty' \
  "$DESK/memos/$SLUG-flywheel-signal.json" 2>/dev/null)
if [[ -n "$NEXT_SLUG" ]]; then
  # Recurse or push onto the queue. Apply your own policy:
  #   - de-dupe against an already-launched set,
  #   - cap chain length to avoid runaway loops,
  #   - require `axis-history.json` distance to avoid revisiting.
  :  # placeholder so the block parses; replace with your queueing logic
fi
```

`next_mission_candidate` is advisory only. Wrapper authors should still apply
guardrails before acting on it: maximum chain length, distance-from-history
checks, manual approval gates.
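
The sketch below, in POSIX sh, shows one way to apply those guardrails before launching `$NEXT_SLUG`. The helper name, the space-separated `launched` list, and the allowed slug character set are assumptions for illustration. The character check matters because the slug comes from agent-written JSON and ends up in file paths.

```sh
# Hypothetical guardrail: succeed only if NEXT may be chained.
# Usage: should_chain <next_slug> "<space-separated launched slugs>" <depth> <max_depth>
should_chain() {
  local next=$1 launched=$2 depth=$3 max=$4
  [ -n "$next" ] || return 1                # null/absent candidate
  case $next in
    *[!A-Za-z0-9._-]*|.*) return 1 ;;       # reject slashes, spaces, leading dots
  esac
  [ "$depth" -lt "$max" ] || return 1       # cap chain length
  case " $launched " in
    *" $next "*) return 1 ;;                # de-dupe: already launched
  esac
  return 0
}
```

A wrapper would call it just before launching, e.g. `should_chain "$NEXT_SLUG" "$LAUNCHED" "$DEPTH" 5 || exit 0`. Here `LAUNCHED` and `DEPTH` are the wrapper's own bookkeeping.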

## Non-goals (explicitly)

- A built-in `rlp-desk auto-chain --slug-prefix … --max-missions N` command is
  **not** in scope. It would reintroduce the coupling we are trying to avoid.
  If you want one, build it as a small wrapper and share it with the community.
- rlp-desk does not validate mission ordering or dependency graphs. The wrapper
  owns this policy.

@@ -0,0 +1,352 @@

# rlp-desk 0.11: Final Handoff 7-fix bundle (ralplan v3)

> v3 changes: absorbed the Architect executor follow-ups NEW-1 (bash→zsh fixture invocation) and NEW-2 (broadened early-exit grep).
> v2 changes (Architect + Critic codex iteration): PR A/B split decision, R7 schema fallback, R8 helper-side guard, R9 reason canonicalization + edge cases, R10 normalized US extractor + quarantine (not rm), R11 early-exit grep inventory + trap, and a patch making self-verification assert mechanically.

## Context

Seven defects, identified from timestamp evidence in the consumer's Final Handoff (`coordination/handoffs/2026-04-25-rlp-desk-final-status-and-handoff.md`):

| ID | Severity | Defect | Root file |
|---|---|---|---|
| P0-D | HIGH | A4 fallback fires 83% of the time (worker iter-signal missing) | `run_ralph_desk.zsh:1587-1595`, `:526-546` |
| P1-F | MEDIUM | test-spec ≥3 tests/AC: IL-4 self-contradiction | `init_ralph_desk.zsh` test-spec gen + ingest |
| P1-G | MEDIUM | No partial_verify signal vocabulary | `init_ralph_desk.zsh:448-454` Signal rules + verifier |
| P1-H | MEDIUM | memory.md/latest.md not updated on blocked | worker prompt blocked-exit hygiene |
| P2-I | MEDIUM | block ≠ failure, so a contract defect loops silently for 12 iterations | `run_ralph_desk.zsh:2659` new consecutive_blocks |
| P2-J | MEDIUM | cross-mission us_id leak in final ALL verify | `run_ralph_desk.zsh:2198/2425-2429` US_LIST scope |
| P2-K | LOW | cost-log empty (tmux mode) | `lib_ralph_desk.zsh:367` write_cost_log call coverage |

## PR split decision (v2)

Following the Architect's recommendation, the plan adopts a **2-PR split: PR-A (protocol) + PR-B (runtime)**. The split is needed even if the user explicitly asked for a "single PR", because of the R7 schema collision (risk of a silent fallback with R3).

- **PR-A (protocol/contract)**: R5 + R6 + R7 + governance §1f/§7f/§7g + us017/us018/us019
- **PR-B (runtime/state)**: R8 + R9 + R10 + R11 + governance §8/§7a + us020/us021/us022/us023
- Both PRs include the self-verification mapping scenarios, and each PR evaluates only its own fixes. The final self-verification (7/7) runs separately after PR-B is merged.

However, if the user directly re-requests a "single PR", proceed with a single PR but strengthen the self-verification scenarios (per-row mechanical assertions become mandatory).

## RALPLAN-DR

**Principles** (4):
1. **Fail loud, not silent**: the A4 fallback, block-as-success, cross-mission leak, and cost-log silence are all silent-failure patterns.
2. **Backward-compat first**: spell out how existing wrappers treat a malformed instance of the new `verify_partial` status. The test-spec lint evolves in stages, from warn to strict.
3. **Minimal blast radius**: PR split + a separate helper per fix. Each fix's regressions are covered by an independent us_test.
4. **Self-verification is mechanical**: prove with grep + exit codes that change X was actually triggered in self-verification scenario Y.

**Decision Drivers**:
1. Stop consumer wrappers from hitting the same patterns again (83% A4 fallback, cross-mission leak, contract defects looping silently).
2. After a 7-mission autonomous run, the debug.log [FLOW] events carry meaningful summaries and the audit log is auditable.
3. An empty cost-log can be classified as "broken logging", which makes the audit pipeline trustworthy.

**Viable Options comparison**:

(The option comparison is the same as in v1, with patches added to absorb the Critic's ITERATE feedback.)

- **R7 verify_partial schema malformed handling (Architect issue #2)**:
  - Trigger: the status is `verify_partial` but `verified_acs` is missing or an empty array.
  - Action: downgrade to `status='blocked'`, `reason='verify_partial_malformed'`.
  - Effect: blocks a worker-autonomy violation.
- **R8 helper-side guard (Critic R8 + Architect issue #3)**: the Verifier's mtime check alone is not enough, so add a hygiene check inside `write_blocked_sentinel` itself.
  - Trigger: memory.md/latest.md have an mtime older than the sentinel write time, i.e. the worker skipped the hygiene update.
  - Action: automatically attach `meta.blocked_hygiene_violated=true` to the sentinel JSON.
  - Effect: even if the worker forgets, the verifier notices immediately.
- **R9 reason canonicalization (Architect issue #3)**: a `_canonical_block_reason()` helper strips the hygiene wrapper prefixes ("hygiene_violated:", "wrapped:") before comparison. This stops R8's hygiene_violated wrapping from bypassing the R9 counter.
- **R9 edge cases (Critic R9)**: two kinds of block do not increment the counter, because aborting the mission would be inappropriate. These are explicitly exempt:
  - first-iteration blocks,
  - blocks classified with the `infra_failure` reason (such as mission-setup blocks).
- **R10 normalized extractor + quarantine (Architect issue #4 + Critic R10)**:
  - Replace `grep -qE "^## $stale_us[: ]"` with normalized extraction via `awk '/^##[[:space:]]+(US-[0-9]+)([[:space:]:-]|$)/'`, to handle PRD heading variations.
  - Replace `rm -f` with `mv` to `.sisyphus/quarantine/`, to avoid silent destructive deletes.
- **R11 trap-based final write (Architect issue #6 + Critic R11)**:
  - Drop the init placeholder.
  - Add a zsh `trap 'write_cost_log "$ITERATION" || true' EXIT`.
  - Guarantee coverage with an early-exit-path grep inventory regression.
- **Self-verification per-row functions (Architect issue #5 + Critic Self-V)**: replace a single monolithic script with 7 functions (`test_r5_a4_audit_triggered`, …). Each function uses pre/post counters plus grep to prove that the changed function was called.

---

## Remediation plan (v2 patches highlighted)

### Fix R5 (P0-D): A4 fallback tracking + stronger worker prompt

**Targets**:
1. `src/scripts/run_ralph_desk.zsh:1587-1595` + `:526-546`: when the A4 fallback fires, append an audit log entry (`a4-fallback-audit.jsonl`).
2. `src/scripts/init_ralph_desk.zsh` worker prompt: add "Step N+1 (mandatory)" and state the penalty for an auto-generated summary.
3. Verifier prompt: when an A4 fallback summary is detected, set verdict.meta.iter_signal_quality='auto_generated'.
4. governance §1f: recommended A4 ratio (< 10% per mission).

**Verification (us017), absorbing the Critic R5 patch**:
- AC1: an a4-fallback-audit.jsonl entry is written (zsh fixture)
- **AC1+ (Critic R5)**: pre_count=$(wc -l < a4-fallback-audit.jsonl), trigger the fixture, then assert post_count > pre_count and that the ratio is computed correctly.
- AC2: worker prompt grep finds "Step N+1" and "iter-signal.json with SPECIFIC summary"
- AC3: governance §1f contains the "A4 ratio < 10%" recommendation text and states how it is measured
- AC4 (new): the Verifier prompt contains the "auto_generated" detection sentence and names the meta field
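
Below is a minimal POSIX sh sketch of the audit append (target 1) and the ratio that AC1+ and governance §1f refer to.

- The JSON field names and helper names are hypothetical.
- Only the `a4_fallback` marker matches what the self-verification scenario later in this plan greps for.
- The ratio is the integer percentage of iterations that hit the fallback, to compare against the recommended 10% ceiling.

```sh
# Hypothetical: append one A4-fallback audit record (JSON Lines).
a4_audit_append() {
  local log=$1 us_id=$2 iter=$3
  printf '{"event":"a4_fallback","us_id":"%s","iteration":%d,"ts":%s}\n' \
    "$us_id" "$iter" "$(date +%s)" >> "$log"
}

# Hypothetical: integer percentage of iterations that fell back to A4.
a4_ratio_pct() {
  local log=$1 iterations=$2 n=0
  if [ -f "$log" ]; then
    n=$(grep -c '"event":"a4_fallback"' "$log")
  fi
  if [ "$iterations" -le 0 ]; then echo 0; return 0; fi
  echo $(( n * 100 / iterations ))
}
```

A governance check could then flag a mission whenever `a4_ratio_pct` returns 10 or more.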

### Fix R6 (P1-F): test-spec ≥3/AC enforcement (warn by default + strict opt-in)

**Targets**:
1. `src/scripts/init_ralph_desk.zsh`: a `_lint_test_density()` helper that:
   - extracts the PRD AC count (per US, regex `^- AC[0-9]+:`)
   - extracts the test-spec test count (per US, counting `^### Test ` or `^\*\*T-` headers)
   - when ratio < 3:
     - WARN (default): log_warn + audit + **show a summary at the end of the init exit message (Critic R6 patch)**.
     - STRICT (`--test-density-strict`): exit 1.
2. `src/scripts/run_ralph_desk.zsh` + `src/node/run.mjs`: `--test-density-strict` flag stub.
3. governance §7f: Test Density Enforcement (WARN + STRICT decision tree).
4. Worker prompt: strengthen "≥3 tests per AC (happy + negative + boundary) is mandatory".

**Verification (us018), absorbing the Critic R6 patch**:
- AC1: `--test-density-strict` flag parsing (zsh + Node)
- AC2: WARN default: on a ratio<3 fixture, init exits 0 and writes an audit log entry. **The last line of stderr/stdout must contain the message "Test density warning: US-XXX has N tests for M ACs (ratio=N/M < 3)".**
- AC3: STRICT: on a ratio<3 fixture, init exits 1 and prints the same message
- AC4: governance §7f text is consistent (decision tree, no downgrade)
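
Below is a sketch of the per-US density check in POSIX sh + awk.

- It assumes, hypothetically, that the test-spec groups tests under the same `## US-NNN` headings as the PRD.
- The AC and test patterns come from the target list above.
- It prints the AC2 warning line for each offending US and returns 1 if any US has fewer than 3 tests per AC. The WARN vs. STRICT decision is left to the caller.
- `lint_test_density` is an illustrative name, not the planned `_lint_test_density()` itself.

```sh
# Hypothetical sketch of the per-US test-density lint.
# Usage: lint_test_density <prd.md> <test-spec.md>
lint_test_density() {
  awk '
    # A "## US-NNN" heading opens a section in either file.
    /^##[ \t]+US-[0-9]+/ {
      match($0, /US-[0-9]+/); us = substr($0, RSTART, RLENGTH)
      if (FILENAME == ARGV[1] && !(us in seen)) { seen[us] = 1; order[++n] = us }
      next
    }
    FNR == 1 { us = "" }  # a new file starts outside any section
    FILENAME == ARGV[1] && us != "" && /^- AC[0-9]+:/ { acs[us]++ }
    FILENAME == ARGV[2] && us != "" && (/^### Test / || /^\*\*T-/) { tests[us]++ }
    END {
      bad = 0
      for (i = 1; i <= n; i++) {
        u = order[i]; a = acs[u] + 0; t = tests[u] + 0
        # ratio = t / a; "t < 3 * a" avoids floating point.
        if (a > 0 && t < 3 * a) {
          printf "Test density warning: %s has %d tests for %d ACs (ratio=%d/%d < 3)\n", u, t, a, t, a
          bad = 1
        }
      }
      exit bad
    }
  ' "$1" "$2"
}
```

In WARN mode the caller would print the output and continue. With `--test-density-strict` it would `exit 1` whenever the function returns 1.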

### Fix R7 (P1-G): verify_partial signal vocabulary

**Targets (Critic R7 + Architect issue #2 patches)**:
1. `src/scripts/init_ralph_desk.zsh:448` Signal rules: add verify_partial and its required fields.
2. `src/scripts/init_ralph_desk.zsh` `build_verifier_prompt` function (or the equivalent prompt heredoc): add this exact sentence:
   ```
   If signal status=verify_partial, evaluate ONLY verified_acs. Treat deferred_acs as out-of-scope (not fail).
   ```
3. `src/node/runner/campaign-main-loop.mjs` signal parsing: handle verify_partial with a missing or empty verified_acs array like this:
   ```js
   if (signalStatus === 'verify_partial' && (!Array.isArray(signal.verified_acs) || signal.verified_acs.length === 0)) {
     // Downgrade to blocked: a partial verification must name at least one verified AC.
     await writeSentinel(blockedSentinel, 'blocked', usId, 'verify_partial_malformed', { reason_category: 'mission_abort', recoverable: true, suggested_action: 'retry_after_fix' });
     continue;
   }
   ```
4. `src/scripts/run_ralph_desk.zsh:1313+`: equivalent verify_partial handling (zsh-side fallback).
5. governance §7g (new): Signal Vocabulary Extension, including the malformed downgrade.

**Verification (us019)**:
- AC1: Signal rules grep finds verify_partial + verified_acs/deferred_acs/defer_reason
- AC2: governance §7g is consistent and states the malformed downgrade
- AC3: for verify_partial, the Node parser passes only verified_acs to the verifier prompt (behavioural fixture)
- AC4: the zsh parser recognizes verify_partial
- **AC5 (Architect issue #2)**: malformed fixture (verify_partial + verified_acs=[]) → blocked sentinel written + reason='verify_partial_malformed' + reason_category='mission_abort'
- **AC6 (Critic R7)**: the exact sentence is present in the Verifier prompt (grep)

### Fix R8 (P1-H): Blocked exit hygiene + helper-side guard

**Targets (Critic R8 + Architect issue #3 patches)**:
1. `src/scripts/init_ralph_desk.zsh` worker prompt: a Blocked exit hygiene section:
   > "On blocked exit (status=blocked): BEFORE writing iter-signal.json, ALWAYS append to memory.md § Blocking History `{iter, us, reason, suggested_repair}` AND update latest.md § Known Issues."
2. **`src/scripts/lib_ralph_desk.zsh:write_blocked_sentinel` (Critic R8 patch)**: run a hygiene check immediately before writing the sentinel:
   ```zsh
   local hygiene_violated=false
   local mem_file="$DESK/memos/$SLUG-memory.md"
   local lat_file="$DESK/context/$SLUG-latest.md"
   local now_ts=$(date +%s)
   local f f_mtime
   for f in "$mem_file" "$lat_file"; do
     if [[ -f "$f" ]]; then
       # Probe GNU stat first: GNU `stat -f` reports filesystem status, so on
       # Linux `stat -f %m` may not fail over to the GNU form.
       f_mtime=$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f" 2>/dev/null || echo 0)
       # Not touched in the last 300 s: the worker skipped the hygiene update.
       if (( now_ts - f_mtime > 300 )); then
         hygiene_violated=true
         break
       fi
     fi
   done
   ```
   Automatically attach `meta.blocked_hygiene_violated=$hygiene_violated` to the JSON sidecar.
3. `src/node/runner/campaign-main-loop.mjs` `_checkBlockedHygiene()` helper: the equivalent check when writing a blocked sentinel, plus an analytics event.
4. governance §1f: add "5th channel: memory.md/latest.md hygiene update" (4 channels → 5 channels).

**Verification (us020)**:
- AC1: worker prompt grep finds "Blocked exit hygiene" + "memory.md" + "latest.md"
- AC2: governance §1f grep finds "5th channel" + "memory.md/latest.md hygiene"
- AC3: the Node helper `_checkBlockedHygiene` is defined (grep)
- AC4: behavioural fixture: stale memory.md (mtime more than 5 min ago) → the blocked sentinel's JSON sidecar has meta.blocked_hygiene_violated=true
- **AC5 (Critic R8)**: grep shows that write_blocked_sentinel in lib_ralph_desk.zsh auto-attaches hygiene_violated, plus a behavioural fixture
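
The mtime probe and the AC4 fixture can be exercised without a real campaign. Below is a POSIX sh sketch.

- `file_mtime` and `is_stale` are hypothetical helper names.
- The 300-second threshold mirrors the snippet above.
- A missing file is not treated as stale, matching the `[[ -f "$f" ]]` guard.

```sh
# Hypothetical helper: file mtime in epoch seconds (GNU stat, then BSD stat).
file_mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null || echo 0
}

# Hypothetical helper: succeed if FILE exists and is older than MAX_AGE seconds.
# A missing file is not stale (same as the [[ -f "$f" ]] guard above).
is_stale() {
  local f=$1 max_age=$2 now
  [ -f "$f" ] || return 1
  now=$(date +%s)
  [ $(( now - $(file_mtime "$f") )) -gt "$max_age" ]
}
```

For the AC4 fixture, `touch -t 200001010000 memory.md` backdates the file via POSIX `touch -t`, instead of waiting five minutes. After that, `is_stale memory.md 300` succeeds.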

### Fix R9 (P2-I): consecutive_blocks counter + canonicalization + edge cases

**Targets (Critic R9 + Architect issue #3 patches)**:
1. `src/scripts/run_ralph_desk.zsh` variables:
   ```zsh
   CONSECUTIVE_BLOCKS=0
   LAST_BLOCK_REASON=""
   BLOCK_CB_THRESHOLD="${BLOCK_CB_THRESHOLD:-3}"
   ```
2. **`_canonical_block_reason()` helper (Architect issue #3)**:
   ```zsh
   _canonical_block_reason() {
     local raw="$1"
     # Strip wrapper prefixes so wrapped and unwrapped reasons compare equal.
     # print -r keeps backslashes in the reason literal (zsh echo would expand them).
     print -r -- "$raw" | sed -E 's/^(hygiene_violated:|wrapped:)//' | head -c 80
   }
   ```
3. **Edge case exemption (Critic R9)**: a block in the `infra_failure` category, or a first-iteration block, does not increment the counter:
   ```zsh
   if [[ "$reason_category" == "infra_failure" ]] || (( ITERATION <= 1 )); then
     # Exempt from consecutive_blocks
     LAST_BLOCK_REASON=""
     CONSECUTIVE_BLOCKS=0
   else
     local canonical=$(_canonical_block_reason "$reason")
     if [[ "$canonical" == "$LAST_BLOCK_REASON" ]]; then
       CONSECUTIVE_BLOCKS=$((CONSECUTIVE_BLOCKS + 1))
     else
       CONSECUTIVE_BLOCKS=1
       LAST_BLOCK_REASON="$canonical"
     fi
     if (( CONSECUTIVE_BLOCKS >= BLOCK_CB_THRESHOLD )); then
       echo '{"reason":"consecutive_blocks","count":'"$CONSECUTIVE_BLOCKS"',"last_reason":"'"$LAST_BLOCK_REASON"'"}' | atomic_write "$DESK/.sisyphus/mission-abort.json"
       exit 1
     fi
   fi
   ```
4. `src/node/runner/campaign-main-loop.mjs`: equivalent logic (state.consecutive_blocks + last_block_reason + canonicalReason).
5. governance §8: document consecutive_blocks, canonicalization, and the exemptions.

**Verification (us021)**:
- AC1: BLOCK_CB_THRESHOLD variable defined (default 3)
- AC2: zsh same-reason counter logic
- AC3: governance §8 text is consistent
- AC4: behavioural: mission-abort.json is created after 3 blocks with the same reason
- **AC5 (Architect issue #3)**: the `_canonical_block_reason` helper is defined, and hygiene_violated prefix stripping is verified
- **AC6 (Critic R9)**: first-iteration exemption fixture (ITERATION=1, reason="setup_fail") → CONSECUTIVE_BLOCKS stays 0
- **AC7 (Critic R9)**: infra_failure category exemption fixture → CONSECUTIVE_BLOCKS stays 0
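
The abort write above has one gap: `last_reason` is spliced into the JSON unescaped. A reason containing a double quote or backslash therefore produces an invalid `mission-abort.json`. Below is a POSIX sh sketch of an escaped variant.

- `json_escape` and `abort_payload` are hypothetical names.
- The escaping covers only backslashes and double quotes. It assumes reasons are single-line, since newlines and other control characters are not handled.

```sh
# Hypothetical helper: escape backslashes, then double quotes, for a JSON string.
# Newlines and other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the mission-abort payload with an escaped reason.
# Usage: abort_payload <count> <last_reason>
abort_payload() {
  printf '{"reason":"consecutive_blocks","count":%d,"last_reason":"%s"}\n' \
    "$1" "$(json_escape "$2")"
}
```

In the runner this would be used as `abort_payload "$CONSECUTIVE_BLOCKS" "$LAST_BLOCK_REASON" | atomic_write "$DESK/.sisyphus/mission-abort.json"`.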

### Fix R10 (P2-J): Cross-mission us_id leak + normalized extractor + quarantine

**Targets (Critic R10 + Architect issue #4 patches)**:
1. `src/scripts/init_ralph_desk.zsh` mission init: detect and scrub a stale us_id:
   ```zsh
   if [[ -f "$SIGNAL_FILE" ]]; then
     stale_us=$(jq -r '.us_id // empty' "$SIGNAL_FILE" 2>/dev/null)
     if [[ -n "$stale_us" && "$stale_us" != "ALL" ]]; then
       # Critic R10: normalized US extractor (3-argument match() is a gawk extension)
       prd_us_list=$(awk 'match($0, /^##[[:space:]]+(US-[0-9]+)([[:space:]:-]|$)/, m) { print m[1] }' "$PRD_FILE" 2>/dev/null | sort -u)
       # -x -F: compare whole lines literally, not as a regex
       if ! echo "$prd_us_list" | grep -qxF "$stale_us"; then
         # Architect issue #4: quarantine, not rm
         mkdir -p "$DESK/.sisyphus/quarantine"
         mv "$SIGNAL_FILE" "$DESK/.sisyphus/quarantine/iter-signal.$(date +%s).json"
         log "  Cross-mission stale us_id ($stale_us) — quarantined to .sisyphus/quarantine/"
       fi
     fi
   fi
   ```
   BSD awk does not support 3-argument `match()`. Use the `match()` + `RSTART`/`RLENGTH` + `substr()` pattern instead, or `grep -oE` plus post-processing:
   ```zsh
   prd_us_list=$(grep -oE '^##[[:space:]]+US-[0-9]+([[:space:]:-]|$)' "$PRD_FILE" 2>/dev/null | grep -oE 'US-[0-9]+' | sort -u)
   ```
2. `src/scripts/run_ralph_desk.zsh:2425-2429` final ALL verify scope: trust only US_LIST. If signal_us_id is not in US_LIST, ignore it and warn.
3. `src/node/runner/campaign-main-loop.mjs`: equivalent handling.
4. governance §7a: document the cross-mission us_id leak defense and the quarantine path.

**Verification (us022)**:
- AC1: init-stage stale us_id detection + quarantine helper (grep + behavioural)
- AC2: the zsh runner's final ALL verify trusts US_LIST
- AC3: governance §7a text is consistent and names the quarantine path
- AC4: behavioural: fixture mission PRD (US-001 to US-003) + stale signal us_id=US-005 → SIGNAL_FILE is moved to quarantine, and a file exists in .sisyphus/quarantine/
- **AC5 (Architect issue #4)**: no `rm -f` is used (`grep -n "rm -f.*SIGNAL_FILE" src/scripts/init_ralph_desk.zsh` returns 0 matches)
- **AC6 (Critic R10)**: PRD heading variation fixture (`## US-005 -`, `## US-005:`, `## US-005`) → all are recognized correctly (0 false positives)
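
The portable `grep -oE` extractor can be checked directly against the AC6 heading variants. Below is a POSIX sh sketch.

- `prd_us_ids` and `is_stale_us` are hypothetical wrappers around the one-liner above.
- The fixture also includes a level-3 heading, which must not be counted.

```sh
# Hypothetical wrapper around the portable extractor from target 1.
prd_us_ids() {
  grep -oE '^##[[:space:]]+US-[0-9]+([[:space:]:-]|$)' "$1" 2>/dev/null |
    grep -oE 'US-[0-9]+' | sort -u
}

# Hypothetical: succeed if US_ID is NOT a "## US-NNN" heading in PRD.
# Usage: is_stale_us <us_id> <prd.md>
is_stale_us() {
  prd_us_ids "$2" | grep -qxF "$1" && return 1
  return 0
}
```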

### Fix R11 (P2-K): Non-empty cost log + trap-based final write + early-exit inventory

**Targets (Critic R11 + Architect issue #6 patches)**:
1. `src/scripts/lib_ralph_desk.zsh:367` write_cost_log: add a note field ('no_actual_usage_recorded' when bytes=0).
2. **`src/scripts/run_ralph_desk.zsh` (Architect issue #6)**: register a trap right after entering the main loop:
   ```zsh
   trap '_emit_final_cost_log' EXIT
   _emit_final_cost_log() {
     # Write the final entry at most once, and only once an iteration has started.
     [[ -n "${ITERATION:-}" ]] && [[ "${COST_LOG_FINAL_WRITTEN:-0}" -eq 0 ]] && {
       write_cost_log "$ITERATION" 2>/dev/null || true
       COST_LOG_FINAL_WRITTEN=1
     }
   }
   ```
3. **Early-exit path inventory (Critic R11 + Architect NEW-2)**: the us023 regression verifies that every path found by the following broadened grep is covered by the trap:
   ```bash
   grep -nE '^[[:space:]]*(exit\b|return\b|die\b)' src/scripts/run_ralph_desk.zsh src/scripts/lib_ralph_desk.zsh | grep -v '^[^:]*:[^:]*:.*\${' > early_exits.txt
   ```
   If a `die` wrapper function is defined in `lib_ralph_desk.zsh`, explicitly analyze whether it bypasses the trap and include it in the regression.
4. (Init placeholder removed, per Architect issue #6): reinforce only the normal path, so that an empty cost-log is still detected as "broken logging".
5. governance §7 Cost Tracking: document the tmux estimated path and the trap.

**Verification (us023)**:
- AC1: write_cost_log has a note field ('no_actual_usage_recorded' when bytes=0)
- AC2: the zsh runner contains `trap '_emit_final_cost_log' EXIT` (grep)
- AC3: behavioural: cost-log.jsonl is non-empty after calling write_cost_log
- **AC4 (Critic R11)**: early-exit grep inventory + verification that every path is covered by the trap. Grep every `exit N` or `return N` location in the script and compare it with when the trap fires.
- **AC5 (Architect issue #6)**: no init placeholder code remains (grep `placeholder.*cost-log` returns 0 matches)
|
|
262
|
+
|
|
263
|
+
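One way to compare exit locations against when the trap fires (AC4) is to flag `exit` lines that appear before the `trap ... EXIT` registration. The hypothetical `uncovered_exits` helper below is a heuristic sketch. Line order only approximates execution order: an `exit` inside a function defined above the trap but called after it is still covered, so hits are candidates for review, not verdicts. `return` is left out because it usually ends a function, not the script.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic for AC4: list exits that precede the EXIT trap registration.
uncovered_exits() {
  local script="$1" trap_line
  # Line number of the first `trap ... EXIT` registration (empty if none).
  trap_line="$(grep -nE '^[[:space:]]*trap[[:space:]].*[[:space:]]EXIT([[:space:]]|$)' "$script" | head -n 1 | cut -d: -f1)"
  if [[ -z "$trap_line" ]]; then
    # No trap at all: treat every exit as uncovered.
    trap_line=$(( $(wc -l < "$script") + 1 ))
  fi
  # Same line-start convention as the inventory grep above.
  grep -nE '^[[:space:]]*exit([[:space:];]|$)' "$script" |
    while IFS=: read -r lineno _; do
      if (( lineno < trap_line )); then
        echo "$script:$lineno"
      fi
    done
}
```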
---

## Self-verification scenarios: mechanical per-row (v2)

`tests/test_self_verification_0_11_handoff.sh`: 7 functions, each capturing pre/post state and proving the result with grep:

```bash
test_r5_a4_audit_triggered() {
  local audit="$LOGS_DIR/a4-fallback-audit.jsonl"
  local pre post
  # 2>/dev/null comes before the input redirection so a missing file is silenced too
  pre=$(wc -l 2>/dev/null < "$audit" || echo 0)
  # Trigger: simulate a done-claim without an iter-signal
  echo '{"us_id":"US-001","status":"complete"}' > "$DESK/memos/${SLUG}-done-claim.json"
  rm -f "$DESK/memos/${SLUG}-iter-signal.json"
  # NEW-1 (Architect): zsh fixture invocation (run_ralph_desk.zsh is zsh, NOT bash)
  # us017 implementation MUST extract the A4 fallback into a callable helper in lib_ralph_desk.zsh
  # so it can be sourced cleanly. Until then, use zsh -c with the variables passed through the
  # environment: zsh does not keep `VAR=x source file` prefix assignments after source returns.
  DESK="$DESK" SLUG="$SLUG" ITERATION=1 LOGS_DIR="$LOGS_DIR" \
    zsh -c 'source src/scripts/lib_ralph_desk.zsh && _emit_a4_fallback_audit US-001 1' 2>/dev/null
  post=$(wc -l 2>/dev/null < "$audit" || echo 0)
  [[ "$post" -gt "$pre" ]] || { fail "R5 A4 audit not triggered (pre=$pre post=$post)"; return 1; }
  # Mechanical: grep that the patched code path was exercised
  grep -q "a4_fallback" "$audit" || { fail "R5 audit entry missing"; return 1; }
  pass "R5 A4 fallback audit triggered ($pre→$post)"
}

test_r6_test_density_warn() {
  # Fixture: PRD with 3 ACs, test-spec with 1 test
  local stderr_capture
  # 2>&1 >/dev/null keeps only stderr, so the check really proves a stderr emission
  stderr_capture=$(src/scripts/init_ralph_desk.zsh --slug test-r6 --prd fixtures/r6-bad-prd.md 2>&1 >/dev/null)
  echo "$stderr_capture" | grep -q "Test density warning" || { fail "R6 init exit message missing warning"; return 1; }
  pass "R6 test density warning emitted to stderr"
}

# ... R7~R11 follow the same pattern: each function (1) captures the pre-state, (2) invokes the changed code directly, (3) verifies the post-state with grep
```

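The functions above assume that `pass` and `fail` helpers exist. A minimal harness that also enforces the 7/7 criterion could look like the sketch below. It is run as `run_all test_r5_a4_audit_triggered test_r6_test_density_warn ...` and succeeds only if every listed function passed. `run_all`, `PASSED` and `FAILED` are hypothetical names, not part of the existing test suite.

```bash
#!/usr/bin/env bash
# Hypothetical minimal harness for the self-verification functions.
PASSED=0
FAILED=0

pass() { PASSED=$((PASSED + 1)); echo "PASS: $*"; }
fail() { FAILED=$((FAILED + 1)); echo "FAIL: $*" >&2; }

# Run each named test function; succeed only if every one of them passed.
run_all() {
  local t
  PASSED=0
  FAILED=0
  for t in "$@"; do
    "$t" || true   # a failing test must not stop the remaining ones
  done
  echo "$PASSED/$# mechanical proofs passed"
  [[ "$FAILED" -eq 0 && "$PASSED" -eq "$#" ]]
}
```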
| Fix | Scenario | Mechanical proof |
|---|---|---|
| R5 P0-D | done-claim written + iter-signal missing → A4 fallback fires | compare `wc -l a4-fallback-audit.jsonl` pre/post + grep for the entry |
| R6 P1-F | test-spec fixture with 3 ACs + 1 test | grep stderr for the "Test density warning" line |
| R7 P1-G | iter-signal status=verify_partial fixtures (well-formed + malformed) | grep the verifier prompt for `verified_acs only`; malformed → blocked sentinel with meta.reason='verify_partial_malformed' |
| R8 P1-H | blocked sentinel + memory.md unchanged for 5+ min | jq extracts `meta.blocked_hygiene_violated=true` from the sentinel JSON sidecar |
| R9 P2-I | same reason BLOCKed 3 times + canonicalization + edge cases | mission-abort.json exists + jq `.count==3` + first-iter exempt fixture verifies CONSECUTIVE_BLOCKS=0 |
| R10 P2-J | PRD US-001~003 + stale signal us_id=US-005 + heading variations | `.sisyphus/quarantine/iter-signal.*.json` exists + original SIGNAL_FILE absent + all 3 variation fixtures recognized |
| R11 P2-K | tmux-mode 5-iteration run + early-exit fixture | `cost-log.jsonl` has ≥ 5 rows, every row has a note field, and the trap fire is verified |

**Pass criterion**: all 7 mechanical proofs pass (7/7), and grep confirms that each fix actually invoked the changed function/file (guards against tautological tests).

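R9's rule can be modelled as follows: the same reason BLOCKed 3 times in a row aborts the mission, with canonicalization and a first-iteration exemption. The canonicalization rule (lowercase, drop digits, quotes and backslashes, squeeze whitespace) is an illustrative assumption, not the project's. So are the exemption semantics, the `mission-abort.json` shape beyond `.count`, and the helper names `canonical_reason`, `record_block` and `record_pass`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of R9: consecutive identical BLOCKs with a canonical reason.
ABORT_FILE="${ABORT_FILE:-./mission-abort.json}"
CONSECUTIVE_BLOCKS=0
LAST_BLOCK_REASON=""

# Assumed canonical form: lowercase, no digits/quotes/backslashes, single spaces, trimmed.
canonical_reason() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '0-9"\\' |
    tr -s '[:space:]' ' ' | sed -E 's/^ //; s/ $//'
}

# A non-BLOCK iteration resets the streak.
record_pass() { CONSECUTIVE_BLOCKS=0; LAST_BLOCK_REASON=""; }

# Returns 2 (and writes the abort file) on the 3rd identical BLOCK in a row.
record_block() {
  local iteration="$1" reason
  reason="$(canonical_reason "$2")"
  if [[ "$iteration" -le 1 ]]; then
    record_pass   # assumed first-iteration exemption
    return 0
  fi
  if [[ "$reason" == "$LAST_BLOCK_REASON" ]]; then
    CONSECUTIVE_BLOCKS=$((CONSECUTIVE_BLOCKS + 1))
  else
    LAST_BLOCK_REASON="$reason"
    CONSECUTIVE_BLOCKS=1
  fi
  if [[ "$CONSECUTIVE_BLOCKS" -ge 3 ]]; then
    printf '{"reason":"%s","count":%d}\n' "$reason" "$CONSECUTIVE_BLOCKS" > "$ABORT_FILE"
    return 2
  fi
  return 0
}
```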
---

## Files to change

```
src/scripts/init_ralph_desk.zsh # R5(worker prompt), R6(test density lint + flag), R7(Signal rules + verifier prompt), R8(blocked exit hygiene), R10(stale us_id quarantine)
src/scripts/run_ralph_desk.zsh # R5(A4 audit), R6(--test-density-strict), R7(verify_partial parsing), R9(consecutive_blocks + canonical + exempt), R10(US_LIST scope), R11(trap)
src/scripts/lib_ralph_desk.zsh # R8(write_blocked_sentinel hygiene_violated), R11(write_cost_log note + bytes=0 path)
src/node/run.mjs # R6(--test-density-strict stub)
src/node/runner/campaign-main-loop.mjs # R7(verify_partial parser + malformed downgrade), R8(_checkBlockedHygiene), R9(consecutive_blocks state), R10(stale us_id scrub)
src/governance.md # R5(§1f A4 metric), R6(§7f Test Density), R7(§7g Signal Vocabulary + malformed), R8(§1f 5th channel), R9(§8 cb + canonicalization + exempt), R10(§7a quarantine)

[Tests]
tests/test_us017_a4_fallback_audit.sh
tests/test_us018_test_density.sh
tests/test_us019_verify_partial.sh
tests/test_us020_blocked_hygiene.sh
tests/test_us021_consecutive_blocks.sh
tests/test_us022_cross_mission_us_leak.sh
tests/test_us023_cost_log_nonempty.sh
tests/test_self_verification_0_11_handoff.sh # mechanical per-row
```

## Verification (Self-Verification Gate)

1. **LOW**: `zsh -n` + `node --check` (~10s)
2. **MEDIUM**: the 7 new regressions us017~us023 (~3 min)
3. **CRITICAL**: the us001/us007/us012/us013/us014/us015/us016 regressions all still pass (~3 min)
4. **Self-verification mapping scenario**: `test_self_verification_0_11_handoff.sh` passes all 7/7 mechanical proofs

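The LOW tier is a plain syntax pass. The hypothetical `syntax_gate` helper below runs one checker over many files and reports every failure rather than stopping at the first. It could be invoked as `syntax_gate "zsh -n" src/scripts/*.zsh && syntax_gate "node --check" src/node/run.mjs src/node/runner/campaign-main-loop.mjs`. The helper name and its calling convention are assumptions; only `zsh -n` and `node --check` come from the gate above.

```bash
#!/usr/bin/env bash
# Hypothetical LOW-tier helper: run one syntax checker over many files and
# report every failure instead of stopping at the first one.
syntax_gate() {
  local checker="$1" f failed=0
  shift
  for f in "$@"; do
    # $checker is intentionally unquoted so "zsh -n" splits into command + flag.
    if ! $checker "$f" >/dev/null 2>&1; then
      echo "syntax check failed: $checker $f" >&2
      failed=$((failed + 1))
    fi
  done
  [[ "$failed" -eq 0 ]]
}
```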
## Single-PR path (only when the user explicitly asks)

If the user rejects the PR split and explicitly asks for a single PR:
- R5+R6+R7 (protocol) and R8+R9+R10+R11 (runtime) ship in one PR
- the self-verification scenarios cover both areas, so the guarantee still holds
- however, if codex review reaches iteration 5 or later, the split fallback is triggered automatically

## ADR (brief)

- **Decision**: 7 fixes. v2 patches: recommend a PR split (single PR only if the user explicitly asks), R7 schema fallback (downgrade to verify_partial_malformed), R8 helper-side hygiene check, R9 canonical reason + edge-case exemptions, R10 normalized extractor + quarantine, R11 trap-based final write + early-exit inventory, mechanical per-row self-verification.
- **Drivers**: surfacing silent failures, backward compatibility, minimal blast radius, mechanical self-verification.
- **Alternatives considered**: the v1 table for each R, plus the new v2 patches.
- **Consequences**: merge PR-A first and let it soak, then PR-B (recommended). A single PR is also possible. The worker prompt gets slightly longer. test-spec may raise many WARNs (tighten to strict gradually).
- **Follow-ups**: make test-density STRICT the default (v0.12+), automatically prioritize retrying verify_partial deferred_acs, hard fail when the A4 fallback rate is 0%.