@ai-dev-methodologies/rlp-desk 0.4.0 → 0.5.1
- package/README.md +145 -69
- package/docs/plans/cozy-gliding-trinket.md +53 -0
- package/docs/plans/keen-sauteeing-snowflake.md +245 -0
- package/docs/plans/toasty-whistling-diffie-agent-a6814625642e956da.md +201 -0
- package/docs/plans/toasty-whistling-diffie.md +117 -0
- package/docs/prompts/ralplan-codex-review.md +1 -1
- package/install.sh +5 -0
- package/package.json +1 -1
- package/scripts/postinstall.js +5 -0
- package/scripts/uninstall.js +1 -0
- package/src/commands/rlp-desk.md +193 -51
- package/src/governance.md +28 -10
- package/src/model-upgrade-table.md +50 -0
- package/src/scripts/init_ralph_desk.zsh +200 -19
- package/src/scripts/lib_ralph_desk.zsh +838 -0
- package/src/scripts/run_ralph_desk.zsh +821 -608
package/README.md
CHANGED
@@ -99,6 +99,22 @@ for iteration in 1..max_iter:
 8. Update status, report to user, continue or stop
 ```
 
+### Live PRD Update
+
+The Leader computes a hash for `prd-<slug>.md` at startup and again at each iteration using `md5`.
+
+When the hash changes, it:
+
+- Logs `prd_changed=true` with `prd_hash`, previous/new US counts, and `new_us`
+- Splits the PRD into per-US files (`prd-<slug>-US-<id>.md`)
+- Splits the test-spec into per-US files (`test-spec-<slug>-US-<id>.md`)
+- Updates the in-memory PRD US list used for per-US dispatch
+- Adds `NOTE: PRD was updated since last iteration. New/changed US may exist.` to the Worker prompt
+
+If the PRD hash is unchanged, `prd_changed=false` is logged and no re-split is triggered.
+
+If the PRD file is missing, the process degrades gracefully and continues without failing the campaign loop.
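The change detection described in this hunk can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function names `prd_hash` and `check_prd_change` are hypothetical, and the fallback to `md5sum` or `cksum` is an assumption added so the sketch runs on systems without the BSD `md5` command. Only the use of `md5` and the `prd_changed` / `prd_hash` log keys come from the README.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from rlp-desk.

# Print a digest of the file, or an empty string when the file is missing.
prd_hash() {
  local file="$1"
  [ -f "$file" ] || { printf '\n'; return 0; }   # missing PRD: degrade gracefully
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$file" | awk '{print $1}'            # GNU coreutils
  elif command -v md5 >/dev/null 2>&1; then
    md5 -q "$file"                               # macOS / BSD
  else
    cksum "$file" | awk '{print $1}'             # not MD5; last-resort change detector
  fi
}

# Compare the previously stored hash with the current one and emit a log line.
check_prd_change() {
  local file="$1" prev="$2" cur
  cur="$(prd_hash "$file")"
  if [ -n "$cur" ] && [ "$cur" != "$prev" ]; then
    echo "prd_changed=true prd_hash=$cur"
  else
    echo "prd_changed=false prd_hash=$cur"
  fi
}

# Usage (hypothetical path):
#   prev="$(prd_hash plans/prd-demo.md)"        # at startup
#   check_prd_change plans/prd-demo.md "$prev"  # at each iteration
```

A real Leader would also re-split the per-US files and update its US list when the first branch fires; those steps are omitted here.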
+
 ### Verification Policy (v0.3.0)
 
 RLP Desk enforces a comprehensive verification policy defined in `governance.md`:
@@ -133,15 +149,75 @@ RLP Desk enforces a comprehensive verification policy defined in `governance.md`
 | 3 consecutive failures | Architecture Escalation (§7¾) → report to user |
 | Max iterations reached | TIMEOUT |
 
-###
+### Verification Strategy (v0.5)
+
+**Core principle: Worker and Verifier use different AI engines whenever possible.**
+
+- Per-US: lightweight verification after each user story (catches issues early)
+- Final: top-tier consensus gate before COMPLETE (quality guarantee)
+- Progressive upgrade: auto-upgrade models on consecutive failure (2-attempt windows)
+- Verifier minimum: claude sonnet (haiku cannot verify)
+
+#### 1. Claude-only (codex not installed)
+
+Verifier is always +1 tier above Worker. Same-engine shares blind spots — install codex for improved detection.
+
+| Risk | Worker | Per-US Verifier | Worker upgrade path | Verifier upgrade path |
+|------|--------|-----------------|--------------------|-----------------------|
+| LOW | haiku | sonnet | sonnet → opus | sonnet → opus |
+| MEDIUM | sonnet | sonnet | opus | sonnet → opus |
+| HIGH | sonnet | opus | opus | opus (ceiling) |
+| CRITICAL | opus | opus ⚠ | (ceiling) | (ceiling) |
+
+Final: **opus solo** ⚠ same-engine warning displayed
+
+#### 2. Cross-engine: GPT Pro (spark + 5.4)
+
+Spark is speed-optimized for coding. Use as Worker for LOW-HIGH; 5.4 for CRITICAL.
+
+| Risk | Worker (codex) | Per-US Verifier (claude) | Worker upgrade path | Verifier upgrade path |
+|------|---------------|--------------------------|--------------------|-----------------------|
+| LOW | spark medium | sonnet | spark high → xhigh | sonnet → opus |
+| MEDIUM | spark high | sonnet | spark xhigh → 5.4 medium | sonnet → opus |
+| HIGH | spark xhigh | opus | 5.4 high → 5.4 xhigh | opus (ceiling) |
+| CRITICAL | 5.4 high | opus | 5.4 xhigh | opus (ceiling) |
+
+Final: **opus + 5.4 high** (both must PASS)
+
+#### 3. Cross-engine: Non-Pro (5.4 only)
+
+| Risk | Worker (codex) | Per-US Verifier (claude) | Worker upgrade path | Verifier upgrade path |
+|------|---------------|--------------------------|--------------------|-----------------------|
+| LOW | 5.4 low | sonnet | 5.4 medium → high | sonnet → opus |
+| MEDIUM | 5.4 medium | sonnet | 5.4 high → xhigh | sonnet → opus |
+| HIGH | 5.4 high | opus | 5.4 xhigh | opus (ceiling) |
+| CRITICAL | 5.4 xhigh | opus | (ceiling) | opus (ceiling) |
+
+Final: **opus + 5.4 high** (both must PASS)
+
+#### Final Verify
+
+| Environment | Engine 1 | Engine 2 | Rule |
+|-------------|----------|----------|------|
+| Claude-only | opus | — | Solo ⚠ |
+| Cross-engine | opus | 5.4 high | Both must PASS → COMPLETE |
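The Final Verify rule in the table above amounts to a small decision function. The sketch below is illustrative only: `final_verdict`, the environment labels `claude-only` / `cross-engine`, and the `FAIL` output are assumptions made for this example, not names taken from the package.

```shell
#!/usr/bin/env bash
# Sketch only: final_verdict and its argument values are hypothetical.

# final_verdict ENV V1 [V2]
# Prints COMPLETE only when every engine required by the environment passed.
final_verdict() {
  local env="$1" v1="$2" v2="${3:-}"
  case "$env" in
    claude-only)
      # Solo opus gate (the README flags this as a same-engine warning case).
      if [ "$v1" = "PASS" ]; then echo COMPLETE; else echo FAIL; fi
      ;;
    cross-engine)
      # Both engines must PASS.
      if [ "$v1" = "PASS" ] && [ "$v2" = "PASS" ]; then echo COMPLETE; else echo FAIL; fi
      ;;
    *)
      echo "unknown environment: $env" >&2
      return 2
      ;;
  esac
}
```

For example, `final_verdict cross-engine PASS FAIL` prints `FAIL`, because a single dissenting engine blocks completion.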
+
+#### Progressive Upgrade (Worker Only)
+
+Worker auto-upgrades on consecutive same-US failure. Verifier is fixed at campaign start. CB default: 6.
+
+```
+fail 1-2: keep current model (2-attempt window)
+fail 3-4: upgrade 1 step (e.g., haiku → sonnet)
+fail 5-6: upgrade 2 steps (e.g., haiku → opus)
+fail 7+: ceiling reached → BLOCKED
+```
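One way to read the ladder above is as a mapping from the consecutive-failure count to a position on an upgrade path. The following is a sketch under that reading; `upgrade_steps_for_failures` and `model_after_failures` are hypothetical names, and clamping at the last model when the path is shorter than two steps is an assumption based on the "(ceiling)" entries in the tables.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from rlp-desk.

# Map a consecutive same-US failure count to the number of upgrade steps.
upgrade_steps_for_failures() {
  local fails="$1"
  if [ "$fails" -le 2 ]; then echo 0        # 2-attempt window on the current model
  elif [ "$fails" -le 4 ]; then echo 1
  elif [ "$fails" -le 6 ]; then echo 2
  else echo BLOCKED                         # fail 7+: circuit breaker territory
  fi
}

# model_after_failures FAILS MODEL...
# Walk the upgrade path (starting model first), clamping at the last entry.
model_after_failures() {
  local fails="$1"; shift
  local steps i=0 model=""
  steps="$(upgrade_steps_for_failures "$fails")"
  if [ "$steps" = "BLOCKED" ]; then echo BLOCKED; return 0; fi
  for model in "$@"; do
    if [ "$i" -ge "$steps" ]; then break; fi
    i=$(( i + 1 ))
  done
  echo "$model"
}
```

Calling `model_after_failures 3 haiku sonnet opus` selects `sonnet`, matching the "fail 3-4: upgrade 1 step" row.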
 
-
-
-
-
-
-| Verification (default) | `opus` |
-| Lightweight verification | `sonnet` |
+See `src/model-upgrade-table.md` for full upgrade paths per engine and complexity level.
+
+#### Sequential Final Verify
+
+When all US pass individually, the final ALL verify runs **sequentially per-US** instead of one big check. This prevents verifier timeout on large PRDs. After all per-US checks pass, the project's test suite runs once as a cross-US integration check.
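The sequential final verify described above can be pictured as a loop followed by a single integration run. This is only a sketch: `verify_us` and `run_test_suite` are hypothetical hooks standing in for whatever the runner really invokes, and the messages are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch only: verify_us and run_test_suite are placeholder hooks.

# Placeholder per-US check; a real one would dispatch a verifier for that US.
verify_us() { echo "verifying $1" >&2; }

# Placeholder for the project's own test suite.
run_test_suite() { true; }

# sequential_final_verify US-ID...
# Check each US on its own, stop at the first failure, then run the suite once.
sequential_final_verify() {
  local us
  for us in "$@"; do
    if ! verify_us "$us"; then
      echo "final verify failed at $us"
      return 1
    fi
  done
  if ! run_test_suite; then
    echo "integration suite failed"
    return 1
  fi
  echo "COMPLETE"
}
```

Keeping each check scoped to one US bounds the verifier's work per call, which is the timeout concern the README mentions; the single suite run afterwards covers interactions between stories.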
 
 ## Commands
 
@@ -159,18 +235,29 @@ RLP Desk enforces a comprehensive verification policy defined in `governance.md`
 | Flag | Default | Description |
 |------|---------|-------------|
 | `--max-iter N` | 100 | Maximum iterations before timeout |
-| `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
-| `--verifier-model MODEL` | opus | Verifier model (haiku/sonnet/opus) |
 | `--mode agent\|tmux` | agent | Execution mode (see below) |
-| `--worker-
-| `--
-| `--
-| `--
+| `--worker-model MODEL` | sonnet | Claude worker model (haiku/sonnet/opus) |
+| `--worker-engine claude\|codex` | claude | Worker engine |
+| `--verifier-model MODEL` | auto | Auto-selected: +1 tier (same-engine) or cross-engine |
+| `--verifier-engine claude\|codex` | auto | Opposite of worker engine if codex available |
+| `--codex-model MODEL` | gpt-5.4 | Codex model (spark requires GPT Pro) |
+| `--codex-reasoning LEVEL` | medium | low/medium/high/xhigh |
 | `--verify-mode per-us\|batch` | per-us | Verification strategy (see below) |
-| `--
+| `--lock-worker-model` | off | Disable progressive model upgrade on failure |
 | `--debug` | off | Debug logging to `logs/<slug>/debug.log` |
 | `--with-self-verification` | off | Campaign-level post-loop analysis report |
 
+### Init Presets
+
+After `brainstorm`, `init` detects your environment and presents run command presets:
+
+- **Codex detected** → recommends cross-engine mode (`--worker-model gpt-5.4:high --verify-consensus`)
+- **GPT Pro (spark)** → offers spark preset (`--worker-model gpt-5.3-codex-spark:high`)
+- **Claude-only** → defaults to `--worker-model sonnet` with opus verifier
+- **Basic** → minimal flags for quick iteration
+
+The brainstorm phase evaluates complexity (US count, file scope, logic, dependencies, code impact) and recommends a starting model. You can override any recommendation.
+
 ## Execution Modes
 
 RLP Desk supports two execution modes. Both honor the same governance protocol.
@@ -277,28 +364,18 @@ Uses the `codex` CLI via `Bash()` (agent mode) or as an interactive TUI (tmux mo
 
 ## Verification Modes
 
-RLP Desk supports two verification strategies. **Per-US is the default.**
-
 ### Per-US Verification (default)
 
-
-/rlp-desk run calculator
-/rlp-desk run calculator --verify-mode per-us
-```
-
-Each user story is verified independently after completion, then a final full verification runs after all stories pass:
+Each user story is verified independently, then a final full verification runs:
 
 ```
-Worker: US-001 → Verifier: US-001
-Worker: US-002 → Verifier: US-002
-
-Final
+Worker: US-001 → Verifier(per-US): US-001 only → pass
+Worker: US-002 → Verifier(per-US): US-002 only → pass
+...
+Final Verify: opus + 5.4 high → both pass → COMPLETE
 ```
 
-
-- Catch issues early, before later stories build on broken foundations
-- Smaller verification scope = faster, more accurate checks
-- Failed verification retries only the specific US
+Per-US catches issues early before later stories build on broken foundations.
 
 ### Batch Verification
 
@@ -306,30 +383,7 @@ Benefits:
 /rlp-desk run calculator --verify-mode batch
 ```
 
-
-
-### Cross-Engine Consensus Verification
-
-```
-/rlp-desk run calculator --verify-consensus
-```
-
-When enabled, **both claude and codex verify independently**. Both must pass for verification to succeed.
-
-```
-Worker completes US → Claude verifies → Codex verifies
-Both pass → proceed
-Either fails → combined fix contract → Worker retry
-3 rounds without consensus → BLOCKED
-```
-
-Consensus can be combined with per-US mode for maximum rigor:
-
-```
-/rlp-desk run calculator --verify-mode per-us --verify-consensus
-```
-
-Prerequisites: Both `claude` and `codex` CLIs must be installed.
+Worker completes all stories, then a single verification checks all AC at once. Final verify still applies.
 
 ## Project Structure
 
@@ -337,20 +391,42 @@ After `init`, your project gets this scaffold:
 
 ```
 your-project/
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+├── .claude/
+│   ├── settings.local.json    # rlp-desk permissions (auto-added by init)
+│   └── ralph-desk/
+│       ├── prompts/
+│       │   ├── <slug>.worker.prompt.md
+│       │   └── <slug>.verifier.prompt.md
+│       ├── context/
+│       │   └── <slug>-latest.md
+│       ├── memos/
+│       │   └── <slug>-memory.md
+│       ├── plans/
+│       │   ├── prd-<slug>.md
+│       │   └── test-spec-<slug>.md
+│       └── logs/<slug>/
+│           └── status.json
+```
+
+### Local Settings
+
+`init` automatically adds the following permissions to `.claude/settings.local.json`:
+
+```json
+{
+  "permissions": {
+    "allow": [
+      "Read(.claude/ralph-desk/**)",
+      "Edit(.claude/ralph-desk/**)",
+      "Write(.claude/ralph-desk/**)"
+    ]
+  }
+}
+```
+
+**Why:** Claude Code treats `.claude/` files as sensitive and prompts for confirmation on each access, even with `--dangerously-skip-permissions`. Without these permissions, Worker and Verifier agents are blocked by interactive prompts during automated loop execution.
+
+**Note:** `settings.local.json` is local to your machine and is not committed to git. If the file already exists, permissions are merged without overwriting your existing settings.
 
 ## Example: Calculator
 
@@ -0,0 +1,53 @@
+# Plan: Verify the refactoring in a real run + restart v05-remaining
+
+## Context
+Engine path refactoring Phases 0-7 are complete (38 TDD structural tests pass).
+However, **no real tmux run has been verified yet**. We need to confirm that the refactoring works in an actual campaign.
+
+## Verification order
+
+### Step 1: Clean up zombie runners + sentinels
+```bash
+ps aux | grep run_ralph_desk | grep -v grep | awk '{print $2}' | xargs kill 2>/dev/null
+for p in $(tmux list-panes -F '#{pane_id}' | grep -v '%360'); do tmux kill-pane -t "$p" 2>/dev/null; done
+rm -f .claude/ralph-desk/memos/v05-remaining-blocked.md
+rm -f .claude/ralph-desk/memos/v05-remaining-complete.md
+rm -f .claude/ralph-desk/memos/v05-remaining-done-claim.json
+rm -f .claude/ralph-desk/memos/v05-remaining-verify-verdict.json
+rm -f .claude/ralph-desk/memos/v05-remaining-iter-signal.json
+rm -f .claude/ralph-desk/logs/v05-remaining/session-config.json
+```
+
+### Step 2: Run the v05-remaining campaign (spark worker)
+```bash
+LOOP_NAME="v05-remaining" ROOT="$PWD" MAX_ITER=15 \
+WORKER_MODEL=gpt-5.3-codex-spark WORKER_ENGINE=codex \
+WORKER_CODEX_MODEL=gpt-5.3-codex-spark WORKER_CODEX_REASONING=medium \
+VERIFIER_MODEL=sonnet VERIFIER_ENGINE=claude \
+VERIFY_MODE=per-us VERIFY_CONSENSUS=0 CB_THRESHOLD=6 \
+ITER_TIMEOUT=600 DEBUG=1 WITH_SELF_VERIFICATION=1 \
+zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
+```
+(run_in_background=true)
+
+### Step 3: Verification checklist
+- [ ] 3 panes are created (leader + worker + verifier)
+- [ ] codex exec runs in the Worker pane (bash trigger, no dead-pane false positive)
+- [ ] After the Worker finishes, heartbeat exited → signal auto-generate
+- [ ] Verifier (sonnet) starts normally and writes a verdict
+- [ ] Progress reaches US-002 or later (US-001 was already verified earlier)
+- [ ] No zombie runners remain (check with ps)
+
+### Step 4: If it fails
+- codex worker fails to start → inspect the trigger script contents + test a manual run
+- verifier timeout → tail the runner log + check pane state
+- BLOCKED → analyze the sentinel cause, fix it, and retry
+
+### Step 5: If it succeeds
+- Monitor campaign progress (check status)
+- Wait for completion or hand off to the next session
+
+## Files
+- `src/scripts/run_ralph_desk.zsh`: the refactored runner
+- `~/.claude/ralph-desk/run_ralph_desk.zsh`: the locally synced copy
+- `.claude/ralph-desk/logs/v05-remaining/`: campaign artifacts
@@ -0,0 +1,245 @@
+# CB consistency fix + always generate analytics logs
+
+## Context
+
+1. With CB_THRESHOLD=3 the model upgrade path (steps 3-4) can never run; this is a design flaw
+2. campaign.jsonl and metadata.json are `--debug`-only, so a default run leaves no analytics data
+3. campaign.jsonl lacks fields needed for analysis (consecutive_failures, model_upgraded, etc.)
+
+---
+
+## Change 1: CB_THRESHOLD default 3 → 6
+
+**File**: `src/scripts/run_ralph_desk.zsh:75`
+
+```diff
+- CB_THRESHOLD="${CB_THRESHOLD:-3}"
++ CB_THRESHOLD="${CB_THRESHOLD:-6}"
+```
+
+**File**: `src/governance.md` §8: reflect the CB default of 6
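To illustrate Change 1, the sketch below shows the `${VAR:-default}` expansion from the diff together with the consensus doubling that test T2 in this plan expects (6 × 2 = 12). The helper name `effective_cb_threshold` is hypothetical; only `CB_THRESHOLD`, `VERIFY_CONSENSUS`, and `EFFECTIVE_CB_THRESHOLD` appear in the plan, and how the runner actually derives the effective value is not shown in this diff.

```shell
#!/usr/bin/env bash
# Sketch only: effective_cb_threshold is a hypothetical helper.

# Default of 6 unless the caller exports CB_THRESHOLD (as in the plan's diff).
CB_THRESHOLD="${CB_THRESHOLD:-6}"
VERIFY_CONSENSUS="${VERIFY_CONSENSUS:-0}"

# effective_cb_threshold BASE CONSENSUS_FLAG
# Doubles the base threshold when consensus verification is on (per test T2).
effective_cb_threshold() {
  local base="$1" consensus="$2"
  if [ "$consensus" -eq 1 ]; then
    echo $(( base * 2 ))
  else
    echo "$base"
  fi
}

EFFECTIVE_CB_THRESHOLD="$(effective_cb_threshold "$CB_THRESHOLD" "$VERIFY_CONSENSUS")"
```

With the old default of 3, the breaker would trip before the "fail 3-4" and "fail 5-6" upgrade windows could be used, which appears to be the design flaw the plan describes.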
+
+---
+
+## Change 2: Always generate campaign.jsonl and metadata.json (remove debug gating)
+
+### 2a. Always create the analytics directory
+
+**File**: `src/scripts/run_ralph_desk.zsh` (~L1890)
+
+```diff
+- # --- Analytics directory: create only when --debug or --with-self-verification ---
+- if (( DEBUG )) || (( WITH_SELF_VERIFICATION )); then
+-   mkdir -p "$ANALYTICS_DIR" 2>/dev/null
+- fi
++ # --- Analytics directory: always create (campaign.jsonl + metadata.json are always-on) ---
++ mkdir -p "$ANALYTICS_DIR" 2>/dev/null
+```
+
+### 2b. Always write metadata.json
+
+**File**: `src/scripts/run_ralph_desk.zsh` (~L1915-1940)
+
+```diff
+- # --- metadata.json: write at campaign start ---
+- if (( DEBUG )) || (( WITH_SELF_VERIFICATION )); then
+-   jq -n \
+-   ...
+- fi
++ # --- metadata.json: always write at campaign start (cross-project identification) ---
++ jq -n \
++ ...
++ --arg project_name "$(basename "$ROOT")" \
++ ...
+```
+
+Add a `project_name` field to metadata.json (basename of ROOT).
+
+### 2c. Remove debug gating from write_campaign_jsonl()
+
+**File**: `src/scripts/lib_ralph_desk.zsh:356`
+
+```diff
+- write_campaign_jsonl() {
+-   if (( ! DEBUG )) && (( ! WITH_SELF_VERIFICATION )); then return 0; fi
++ write_campaign_jsonl() {
+```
+
+### 2d. Add analytics fields to campaign.jsonl records
+
+**File**: `src/scripts/lib_ralph_desk.zsh` write_campaign_jsonl()
+
+Fields to add:
+- `consecutive_failures`: current consecutive-failure count
+- `model_upgraded`: whether a model upgrade happened in this iteration (0/1)
+- `fix_contract`: whether a fix contract from the previous iteration exists (0/1)
+
+```diff
+  '{iter: $iter, us_id: $us_id, worker_model: $worker_model, ...}'
++ add --argjson consecutive_failures "$CONSECUTIVE_FAILURES"
++ --argjson model_upgraded "$_MODEL_UPGRADED"
+```
+
+### 2e. Always apply campaign.jsonl versioning as well
+
+**File**: `src/scripts/run_ralph_desk.zsh` (~L1904-1913)
+
+```diff
+- # --- campaign.jsonl versioning (in analytics dir, after mkdir) ---
+- if (( DEBUG )) || (( WITH_SELF_VERIFICATION )); then
+-   if [[ -f "$CAMPAIGN_JSONL" ]]; then
++ # --- campaign.jsonl versioning (always-on) ---
++ if [[ -f "$CAMPAIGN_JSONL" ]]; then
+```
+
+debug.log versioning stays gated on `--debug` (debug.log itself is debug-only).
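The versioning step in 2e is only partially visible in the diff, so the following is a guess at its shape rather than the actual implementation. It moves an existing `campaign.jsonl` aside before a new run. The `-v1` suffix matches test T9 in this plan; continuing with `-v2`, `-v3` for later runs is an assumption, as is the function name `rotate_campaign_jsonl`.

```shell
#!/usr/bin/env bash
# Sketch only: rotate_campaign_jsonl is hypothetical; the -vN numbering past
# -v1 is assumed, since the plan's test only checks the first rotation.

# rotate_campaign_jsonl PATH
# If PATH exists, rename it to the first unused "<base>-vN.jsonl" name.
rotate_campaign_jsonl() {
  local file="$1" base n=1
  [ -f "$file" ] || return 0          # nothing to rotate on a first run
  base="${file%.jsonl}"               # strip the extension, as in test T9
  while [ -e "${base}-v${n}.jsonl" ]; do
    n=$(( n + 1 ))
  done
  mv "$file" "${base}-v${n}.jsonl"
}
```

Because the function no longer sits behind a `DEBUG` or `WITH_SELF_VERIFICATION` check, a plain re-run would keep earlier campaign records instead of appending to or overwriting them.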
+
+---
+
+## Change 3: Fix the `--with-self-verification` SV report path bug
+
+**Current bug:** SV report path mismatch
+- **Write** (`lib_ralph_desk.zsh:553-556`): `$LOGS_DIR/self-verification-report-NNN.md` (project-local)
+- **Read** (`lib_ralph_desk.zsh:434`): `$ANALYTICS_DIR/self-verification-report-*.md` (home)
+- Because the write and read locations differ, campaign-report.md fails to reference the SV report
+
+**Fix:** unify the read path on `$LOGS_DIR` (project-local is correct, since the report is the result of analyzing iteration artifacts)
+
+**File**: `src/scripts/lib_ralph_desk.zsh:434`
+
+```diff
+- sv_report=$(ls -t "$ANALYTICS_DIR"/self-verification-report-*.md 2>/dev/null | head -1)
++ sv_report=$(ls -t "$LOGS_DIR"/self-verification-report-*.md 2>/dev/null | head -1)
+```
+
+**Also clean up the governance §6 docs:**
+- `self-verification-report-NNN.md` → state explicitly that it lives in project-local `logs/<slug>/`
+- `self-verification-data.json` → not generated by the code (specified only in governance §6, agent-mode only). Add an "(agent-mode only)" note to the docs
+
+### `--with-self-verification` file locations
+
+| File | Location | Created when | Purpose |
+|------|------|-----------|------|
+| `self-verification-report-NNN.md` | project-local `logs/<slug>/` | `--with-self-verification` | Narrative report produced by the claude CLI from iteration artifacts |
+| `self-verification-data.json` | home `analytics/<slug>--<hash>/` | agent-mode + `--with-self-verification` | Structured SV data (not generated in tmux mode) |
+
+---
+
+## What does not change
+
+| Item | Location | Reason |
+|------|------|------|
+| all `iter-NNN.*` | project-local | tied directly to project code/git |
+| `campaign-report.md` | project-local | references git diff stat |
+| `cost-log.jsonl` | project-local | referenced by campaign-report |
+| `runtime/` | project-local | live operational data |
+| `baseline.log` | project-local | project baseline |
+| `SV report` | project-local | result of analyzing iteration artifacts |
+| `debug.log` | home, `--debug` only | verbose, only when needed |
+
+---
+
+## Files to modify
+
+| File | Change |
+|------|------|
+| `src/scripts/run_ralph_desk.zsh` | CB default 6, always create the analytics dir, always write metadata.json + project_name, always apply campaign.jsonl versioning |
+| `src/scripts/lib_ralph_desk.zsh` | remove debug gating from write_campaign_jsonl() + add fields, fix the SV report read path |
+| `src/governance.md` | §8 CB default 6, §6 file structure + SV location cleanup |
+
+---
+
+## Verification (TDD)
+
+Every change proceeds in the order **write the test first → confirm RED → implement → confirm GREEN**.
+
+Test file: `tests/test_cb_and_analytics.sh`
+
+### Test list (in RED → GREEN order)
+
+**Change 1: CB_THRESHOLD**
+```bash
+# T1: the CB default is 6
+test_cb_default_is_6() {
+  source src/scripts/run_ralph_desk.zsh --dry-run 2>/dev/null  # or grep
+  assert CB_THRESHOLD == 6
+}
+
+# T2: in consensus mode the effective CB is 12 (6*2)
+test_cb_consensus_doubles_to_12() {
+  VERIFY_CONSENSUS=1 source ...
+  assert EFFECTIVE_CB_THRESHOLD == 12
+}
+```
+
+**Change 2: always generate campaign.jsonl**
+```bash
+# T3: the analytics directory is created without --debug
+test_analytics_dir_created_without_debug() {
+  DEBUG=0 WITH_SELF_VERIFICATION=0
+  # run init section → assert mkdir -p "$ANALYTICS_DIR" called
+  assert -d "$ANALYTICS_DIR"
+}
+
+# T4: metadata.json is created without --debug
+test_metadata_written_without_debug() {
+  DEBUG=0 WITH_SELF_VERIFICATION=0
+  # run metadata write section
+  assert -f "$METADATA_FILE"
+}
+
+# T5: metadata.json has a project_name field
+test_metadata_has_project_name() {
+  assert jq -r '.project_name' "$METADATA_FILE" != "null"
+}
+
+# T6: write_campaign_jsonl() writes without --debug
+test_campaign_jsonl_written_without_debug() {
+  DEBUG=0 WITH_SELF_VERIFICATION=0
+  write_campaign_jsonl 1 "US-001" "pass"
+  assert -f "$CAMPAIGN_JSONL"
+}
+
+# T7: campaign.jsonl records have a consecutive_failures field
+test_campaign_jsonl_has_consecutive_failures() {
+  CONSECUTIVE_FAILURES=2
+  write_campaign_jsonl 1 "US-001" "fail"
+  assert jq -r '.consecutive_failures' last_line == 2
+}
+
+# T8: campaign.jsonl records have a model_upgraded field
+test_campaign_jsonl_has_model_upgraded() {
+  _MODEL_UPGRADED=1
+  write_campaign_jsonl 1 "US-001" "fail"
+  assert jq -r '.model_upgraded' last_line == 1
+}
+
+# T9: campaign.jsonl is versioned on re-run (without --debug)
+test_campaign_jsonl_versioned_without_debug() {
+  echo '{}' > "$CAMPAIGN_JSONL"
+  # run versioning section
+  assert -f "${CAMPAIGN_JSONL%.jsonl}-v1.jsonl"
+}
+```
+
+**Change 3: SV report path fix**
+```bash
+# T10: generate_campaign_report() finds the SV report in $LOGS_DIR
+test_sv_report_read_from_logs_dir() {
+  WITH_SELF_VERIFICATION=1
+  touch "$LOGS_DIR/self-verification-report-001.md"
+  # call generate_campaign_report
+  # confirm campaign-report.md references the SV report under $LOGS_DIR
+  assert grep "self-verification-report" "$LOGS_DIR/campaign-report.md"
+}
+```
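The test list above is written as pseudocode (`assert CB_THRESHOLD == 6` is not a real shell command). A minimal sketch of what a runnable helper for such tests might look like follows. Both `assert_eq` and `cb_default_from_script` are hypothetical names; the second implements the "or grep" alternative hinted at in T1 by reading the default out of the script text instead of sourcing it.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative and are not part of the
# package's test suite.

# assert_eq ACTUAL EXPECTED [LABEL]
# Prints PASS/FAIL and returns non-zero on mismatch.
assert_eq() {
  local actual="$1" expected="$2" label="${3:-assert_eq}"
  if [ "$actual" = "$expected" ]; then
    echo "PASS: $label"
  else
    echo "FAIL: $label (expected '$expected', got '$actual')"
    return 1
  fi
}

# cb_default_from_script PATH
# Extract N from a line of the form CB_THRESHOLD="${CB_THRESHOLD:-N}".
cb_default_from_script() {
  sed -n 's/^ *CB_THRESHOLD="\${CB_THRESHOLD:-\([0-9][0-9]*\)}".*/\1/p' "$1" | head -n 1
}

# Usage (path as given in the plan):
#   assert_eq "$(cb_default_from_script src/scripts/run_ralph_desk.zsh)" 6 "T1 CB default"
```

Reading the default with `sed` avoids executing the runner during a unit test, which matters because sourcing the script would start its side effects.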
+
+### Execution order
+
+1. Write the test file (`tests/test_cb_and_analytics.sh`)
+2. Confirm everything is RED (all tests fail)
+3. Implement Change 1 → T1, T2 GREEN
+4. Implement Change 2 → T3-T9 GREEN
+5. Implement Change 3 → T10 GREEN
+6. Update the governance docs
+7. Check the existing tests for regressions: `bash tests/test_us005_tmux_docs.sh`