@ai-dev-methodologies/rlp-desk 0.5.0 → 0.5.1
|
@@ -0,0 +1,245 @@
|
|
|
1
|
+
# CB consistency fix + always-on analytics logs
|
|
2
|
+
|
|
3
|
+
## Context
|
|
4
|
+
|
|
5
|
+
1. With CB_THRESHOLD=3, the model upgrade path (stages 3-4) can never run; this is a design flaw.
|
|
6
|
+
2. campaign.jsonl and metadata.json are written only with `--debug`, so a default run produces no analytics data.
|
|
7
|
+
3. campaign.jsonl lacks fields needed for analysis (consecutive_failures, model_upgraded, etc.).
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## Change 1: CB_THRESHOLD default 3 → 6
|
|
12
|
+
|
|
13
|
+
**File**: `src/scripts/run_ralph_desk.zsh:75`
|
|
14
|
+
|
|
15
|
+
```diff
|
|
16
|
+
- CB_THRESHOLD="${CB_THRESHOLD:-3}"
|
|
17
|
+
+ CB_THRESHOLD="${CB_THRESHOLD:-6}"
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
**File**: `src/governance.md` §8: reflect the CB default of 6
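A minimal sketch of how the new default interacts with consensus mode, based on the `CB_THRESHOLD` and `EFFECTIVE_CB_THRESHOLD` lines in this release. The helper name `resolve_cb_threshold` is hypothetical, and the non-consensus branch (effective threshold equal to `CB_THRESHOLD`) is an assumption, since the diff only shows the doubling branch.

```bash
#!/usr/bin/env bash
# Hypothetical helper; rlp-desk sets these variables inline rather than in a function.
resolve_cb_threshold() {
  local cb="${CB_THRESHOLD:-6}"            # default raised from 3 to 6 in 0.5.1
  if [[ "${VERIFY_CONSENSUS:-0}" = "1" ]]; then
    echo $(( cb * 2 ))                      # consensus mode doubles the threshold
  else
    echo "$cb"                              # assumption: unchanged outside consensus mode
  fi
}

unset CB_THRESHOLD VERIFY_CONSENSUS
default_cb="$(resolve_cb_threshold)"
consensus_cb="$(VERIFY_CONSENSUS=1 resolve_cb_threshold)"
override_cb="$(CB_THRESHOLD=4 resolve_cb_threshold)"
echo "default=$default_cb consensus=$consensus_cb override=$override_cb"
```

With no overrides in the environment this prints `default=6 consensus=12 override=4`.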
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Change 2: Always create campaign.jsonl and metadata.json (remove debug gating)
|
|
25
|
+
|
|
26
|
+
### 2a. Always create the analytics directory
|
|
27
|
+
|
|
28
|
+
**File**: `src/scripts/run_ralph_desk.zsh` (~L1890)
|
|
29
|
+
|
|
30
|
+
```diff
|
|
31
|
+
- # --- Analytics directory: create only when --debug or --with-self-verification ---
|
|
32
|
+
- if (( DEBUG )) || (( WITH_SELF_VERIFICATION )); then
|
|
33
|
+
- mkdir -p "$ANALYTICS_DIR" 2>/dev/null
|
|
34
|
+
- fi
|
|
35
|
+
+ # --- Analytics directory: always create (campaign.jsonl + metadata.json are always-on) ---
|
|
36
|
+
+ mkdir -p "$ANALYTICS_DIR" 2>/dev/null
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
### 2b. Always write metadata.json
|
|
40
|
+
|
|
41
|
+
**File**: `src/scripts/run_ralph_desk.zsh` (~L1915-1940)
|
|
42
|
+
|
|
43
|
+
```diff
|
|
44
|
+
- # --- metadata.json: write at campaign start ---
|
|
45
|
+
- if (( DEBUG )) || (( WITH_SELF_VERIFICATION )); then
|
|
46
|
+
- jq -n \
|
|
47
|
+
- ...
|
|
48
|
+
- fi
|
|
49
|
+
+ # --- metadata.json: always write at campaign start (cross-project identification) ---
|
|
50
|
+
+ jq -n \
|
|
51
|
+
+ ...
|
|
52
|
+
+ --arg project_name "$(basename "$ROOT")" \
|
|
53
|
+
+ ...
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
Add a `project_name` field to metadata.json (the basename of ROOT).
|
|
57
|
+
|
|
58
|
+
### 2c. Remove debug gating from write_campaign_jsonl()
|
|
59
|
+
|
|
60
|
+
**File**: `src/scripts/lib_ralph_desk.zsh:356`
|
|
61
|
+
|
|
62
|
+
```diff
|
|
63
|
+
- write_campaign_jsonl() {
|
|
64
|
+
- if (( ! DEBUG )) && (( ! WITH_SELF_VERIFICATION )); then return 0; fi
|
|
65
|
+
+ write_campaign_jsonl() {
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
### 2d. Add analysis fields to campaign.jsonl records
|
|
69
|
+
|
|
70
|
+
**File**: `src/scripts/lib_ralph_desk.zsh` write_campaign_jsonl()
|
|
71
|
+
|
|
72
|
+
Added fields:
|
|
73
|
+
- `consecutive_failures`: current consecutive failure count
|
|
74
|
+
- `model_upgraded`: whether a model upgrade occurred in this iteration (0/1)
|
|
75
|
+
- `fix_contract`: whether a fix contract from the previous iteration exists (0/1)
|
|
76
|
+
|
|
77
|
+
```diff
|
|
78
|
+
'{iter: $iter, us_id: $us_id, worker_model: $worker_model, ...}'
|
|
79
|
+
+ add: --argjson consecutive_failures "$CONSECUTIVE_FAILURES"
|
|
80
|
+
+ --argjson model_upgraded "$_MODEL_UPGRADED"
|
|
81
|
+
```
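The two `--argjson` additions can be tried in isolation with a trimmed-down record. This is a sketch, not the package's `write_campaign_jsonl()`: `build_record` is a hypothetical name, the `:-0` fallbacks and the `-c` (compact, one record per line) flag are assumptions, and most of the real fields are left out. It needs `jq` on the PATH.

```bash
#!/usr/bin/env bash
# Hypothetical, reduced version of the record built by write_campaign_jsonl().
build_record() {
  local iter="$1" us_id="$2"
  jq -c -n \
    --argjson iter "$iter" \
    --arg us_id "$us_id" \
    --argjson consecutive_failures "${CONSECUTIVE_FAILURES:-0}" \
    --argjson model_upgraded "${_MODEL_UPGRADED:-0}" \
    '{iter: $iter, us_id: $us_id, consecutive_failures: $consecutive_failures, model_upgraded: $model_upgraded}'
}

record=""
if command -v jq >/dev/null 2>&1; then
  CONSECUTIVE_FAILURES=2   # demo values, not taken from a real campaign
  _MODEL_UPGRADED=1
  record="$(build_record 1 "US-001")"
  echo "$record"
else
  echo "jq not found; skipping the demo" >&2
fi
```

Because `--argjson` parses its value as JSON, the counters land in the record as numbers rather than strings, so a later filter such as `select(.consecutive_failures >= 3)` compares numerically.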
|
|
82
|
+
|
|
83
|
+
### 2e. Always apply campaign.jsonl versioning as well
|
|
84
|
+
|
|
85
|
+
**File**: `src/scripts/run_ralph_desk.zsh` (~L1904-1913)
|
|
86
|
+
|
|
87
|
+
```diff
|
|
88
|
+
- # --- campaign.jsonl versioning (in analytics dir, after mkdir) ---
|
|
89
|
+
- if (( DEBUG )) || (( WITH_SELF_VERIFICATION )); then
|
|
90
|
+
- if [[ -f "$CAMPAIGN_JSONL" ]]; then
|
|
91
|
+
+ # --- campaign.jsonl versioning (always-on) ---
|
|
92
|
+
+ if [[ -f "$CAMPAIGN_JSONL" ]]; then
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
debug.log versioning keeps its `--debug` gating, since debug.log itself is debug-only.
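The campaign.jsonl versioning step can be exercised on its own. The sketch below wraps the loop from this release in a hypothetical `rotate_campaign_jsonl` function and runs it twice against a temporary directory; the function name and the demo paths are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the versioning loop shown in the diff.
rotate_campaign_jsonl() {
  local file="$1" n=1
  [[ -f "$file" ]] || return 0              # nothing to rotate on a first run
  while [[ -f "${file%.jsonl}-v${n}.jsonl" ]]; do
    n=$(( n + 1 ))                           # skip suffixes that are already taken
  done
  mv "$file" "${file%.jsonl}-v${n}.jsonl"
}

demo_dir="$(mktemp -d)"
CAMPAIGN_JSONL="$demo_dir/campaign.jsonl"

echo '{"iter":1}' > "$CAMPAIGN_JSONL"
rotate_campaign_jsonl "$CAMPAIGN_JSONL"      # moved to campaign-v1.jsonl
echo '{"iter":1}' > "$CAMPAIGN_JSONL"
rotate_campaign_jsonl "$CAMPAIGN_JSONL"      # moved to campaign-v2.jsonl
ls "$demo_dir"
```

Each re-run moves the previous campaign.jsonl to the first unused `-vN` suffix, so earlier campaigns are preserved rather than appended to.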
|
|
96
|
+
|
|
97
|
+
---
|
|
98
|
+
|
|
99
|
+
## Change 3: Fix the `--with-self-verification` SV report path bug
|
|
100
|
+
|
|
101
|
+
**Current bug:** SV report path mismatch
|
|
102
|
+
- **Write** (`lib_ralph_desk.zsh:553-556`): `$LOGS_DIR/self-verification-report-NNN.md` (project-local)
|
|
103
|
+
- **Read** (`lib_ralph_desk.zsh:434`): `$ANALYTICS_DIR/self-verification-report-*.md` (home)
|
|
104
|
+
- Because the write and read locations differ, campaign-report.md fails to reference the SV report
|
|
105
|
+
|
|
106
|
+
**Fix:** Unify the read path on `$LOGS_DIR` (project-local is correct, since the report is the product of analyzing iteration artifacts)
|
|
107
|
+
|
|
108
|
+
**File**: `src/scripts/lib_ralph_desk.zsh:434`
|
|
109
|
+
|
|
110
|
+
```diff
|
|
111
|
+
- sv_report=$(ls -t "$ANALYTICS_DIR"/self-verification-report-*.md 2>/dev/null | head -1)
|
|
112
|
+
+ sv_report=$(ls -t "$LOGS_DIR"/self-verification-report-*.md 2>/dev/null | head -1)
|
|
113
|
+
```
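To see why the read path matters, the sketch below runs the same `ls -t ... | head -1` lookup against two temporary directories standing in for `$LOGS_DIR` and `$ANALYTICS_DIR`. `latest_sv_report` is a hypothetical helper, and the directories and timestamps exist only for the demo.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the lookup: newest SV report in a directory.
latest_sv_report() {
  local dir="$1"
  ls -t "$dir"/self-verification-report-*.md 2>/dev/null | head -1
}

LOGS_DIR="$(mktemp -d)"        # stands in for the project-local logs/<slug>/
ANALYTICS_DIR="$(mktemp -d)"   # stands in for the home analytics directory

# Reports are written under LOGS_DIR only, as in the write path above.
touch -t 202401010000 "$LOGS_DIR/self-verification-report-001.md"
touch -t 202401020000 "$LOGS_DIR/self-verification-report-002.md"

from_logs="$(latest_sv_report "$LOGS_DIR")"
from_analytics="$(latest_sv_report "$ANALYTICS_DIR")"
echo "logs:      ${from_logs:-<none>}"
echo "analytics: ${from_analytics:-<none>}"
```

The lookup under the analytics directory comes back empty, which matches the bug described above, while the lookup under the logs directory returns the most recently modified report.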
|
|
114
|
+
|
|
115
|
+
**Also clean up the governance §6 documentation:**
|
|
116
|
+
- `self-verification-report-NNN.md` → document it under project-local `logs/<slug>/`
|
|
117
|
+
- `self-verification-data.json` → not generated by the code (specified only in governance §6, agent-mode only). Add an "(agent-mode only)" note in the document
|
|
118
|
+
|
|
119
|
+
### `--with-self-verification` file locations
|
|
120
|
+
|
|
121
|
+
| File | Location | Created when | Purpose |
|
|
122
|
+
|------|------|-----------|------|
|
|
123
|
+
| `self-verification-report-NNN.md` | project-local `logs/<slug>/` | `--with-self-verification` | Narrative report produced by analyzing iteration artifacts with the claude CLI |
|
|
124
|
+
| `self-verification-data.json` | home `analytics/<slug>--<hash>/` | agent-mode + `--with-self-verification` | Structured SV data (not generated in tmux mode) |
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## What does not change
|
|
129
|
+
|
|
130
|
+
| Item | Location | Reason |
|
|
131
|
+
|------|------|------|
|
|
132
|
+
| All `iter-NNN.*` files | project-local | Tied directly to project code/git |
|
|
133
|
+
| `campaign-report.md` | project-local | References git diff stat |
|
|
134
|
+
| `cost-log.jsonl` | project-local | Referenced by campaign-report |
|
|
135
|
+
| `runtime/` | project-local | Live operational data |
|
|
136
|
+
| `baseline.log` | project-local | Project baseline |
|
|
137
|
+
| `SV report` | project-local | Output of iteration artifact analysis |
|
|
138
|
+
| `debug.log` | home, `--debug` only | Verbose; only when needed |
|
|
139
|
+
|
|
140
|
+
---
|
|
141
|
+
|
|
142
|
+
## Files to modify
|
|
143
|
+
|
|
144
|
+
| File | Change |
|
|
145
|
+
|------|------|
|
|
146
|
+
| `src/scripts/run_ralph_desk.zsh` | CB default 6, always create analytics dir, always write metadata.json + project_name, always apply campaign.jsonl versioning |
|
|
147
|
+
| `src/scripts/lib_ralph_desk.zsh` | Remove debug gating from write_campaign_jsonl() + add fields, fix SV report read path |
|
|
148
|
+
| `src/governance.md` | §8 CB default 6, §6 file structure + SV location cleanup |
|
|
149
|
+
|
|
150
|
+
---
|
|
151
|
+
|
|
152
|
+
## Verification (TDD)
|
|
153
|
+
|
|
154
|
+
Every change follows the order **write the test first → confirm RED → implement → confirm GREEN**.
|
|
155
|
+
|
|
156
|
+
Test file: `tests/test_cb_and_analytics.sh`
|
|
157
|
+
|
|
158
|
+
### Test list (in RED → GREEN order)
|
|
159
|
+
|
|
160
|
+
**Change 1: CB_THRESHOLD**
|
|
161
|
+
```bash
|
|
162
|
+
# T1: verify the CB default is 6
|
|
163
|
+
test_cb_default_is_6() {
|
|
164
|
+
source src/scripts/run_ralph_desk.zsh --dry-run 2>/dev/null # or grep
|
|
165
|
+
assert CB_THRESHOLD == 6
|
|
166
|
+
}
|
|
167
|
+
|
|
168
|
+
# T2: verify the effective CB is 12 (6*2) in consensus mode
|
|
169
|
+
test_cb_consensus_doubles_to_12() {
|
|
170
|
+
VERIFY_CONSENSUS=1 source ...
|
|
171
|
+
assert EFFECTIVE_CB_THRESHOLD == 12
|
|
172
|
+
}
|
|
173
|
+
```
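The `assert` calls in these tests are pseudocode. One way to make T1 concrete, following the grep alternative mentioned in T1, is to read the default out of the script text instead of sourcing it. The `assert_eq` and `cb_default_from` helpers below are hypothetical, and a temporary file stands in for `src/scripts/run_ralph_desk.zsh`.

```bash
#!/usr/bin/env bash
# Hypothetical assert helper; the plan does not say which one the test file uses.
assert_eq() {
  local expected="$1" actual="$2" label="$3"
  if [[ "$expected" == "$actual" ]]; then
    echo "PASS: $label"
  else
    echo "FAIL: $label (expected '$expected', got '$actual')"
    return 1
  fi
}

# Extract the numeric default from a line of the form CB_THRESHOLD="${CB_THRESHOLD:-N}".
cb_default_from() {
  sed -n 's/^CB_THRESHOLD="\${CB_THRESHOLD:-\([0-9][0-9]*\)}".*/\1/p' "$1" | head -1
}

script="$(mktemp)"   # stand-in for the real runner script
printf '%s\n' 'CB_THRESHOLD="${CB_THRESHOLD:-6}"  # consecutive failures before BLOCKED' > "$script"

result="$(assert_eq 6 "$(cb_default_from "$script")" "T1 CB default is 6")"
echo "$result"
```

Reading the default from the text avoids running the runner's `main` during a unit test, at the cost of coupling the test to the exact form of the assignment line.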
|
|
174
|
+
|
|
175
|
+
**Change 2: always create campaign.jsonl**
|
|
176
|
+
```bash
|
|
177
|
+
# T3: verify the analytics directory is created without --debug
|
|
178
|
+
test_analytics_dir_created_without_debug() {
|
|
179
|
+
DEBUG=0 WITH_SELF_VERIFICATION=0
|
|
180
|
+
# run init section → assert mkdir -p "$ANALYTICS_DIR" called
|
|
181
|
+
assert -d "$ANALYTICS_DIR"
|
|
182
|
+
}
|
|
183
|
+
|
|
184
|
+
# T4: verify metadata.json is created without --debug
|
|
185
|
+
test_metadata_written_without_debug() {
|
|
186
|
+
DEBUG=0 WITH_SELF_VERIFICATION=0
|
|
187
|
+
# run metadata write section
|
|
188
|
+
assert -f "$METADATA_FILE"
|
|
189
|
+
}
|
|
190
|
+
|
|
191
|
+
# T5: metadata.json has a project_name field
|
|
192
|
+
test_metadata_has_project_name() {
|
|
193
|
+
assert jq -r '.project_name' "$METADATA_FILE" != "null"
|
|
194
|
+
}
|
|
195
|
+
|
|
196
|
+
# T6: verify write_campaign_jsonl() writes without --debug
|
|
197
|
+
test_campaign_jsonl_written_without_debug() {
|
|
198
|
+
DEBUG=0 WITH_SELF_VERIFICATION=0
|
|
199
|
+
write_campaign_jsonl 1 "US-001" "pass"
|
|
200
|
+
assert -f "$CAMPAIGN_JSONL"
|
|
201
|
+
}
|
|
202
|
+
|
|
203
|
+
# T7: campaign.jsonl records have a consecutive_failures field
|
|
204
|
+
test_campaign_jsonl_has_consecutive_failures() {
|
|
205
|
+
CONSECUTIVE_FAILURES=2
|
|
206
|
+
write_campaign_jsonl 1 "US-001" "fail"
|
|
207
|
+
assert jq -r '.consecutive_failures' last_line == 2
|
|
208
|
+
}
|
|
209
|
+
|
|
210
|
+
# T8: campaign.jsonl records have a model_upgraded field
|
|
211
|
+
test_campaign_jsonl_has_model_upgraded() {
|
|
212
|
+
_MODEL_UPGRADED=1
|
|
213
|
+
write_campaign_jsonl 1 "US-001" "fail"
|
|
214
|
+
assert jq -r '.model_upgraded' last_line == 1
|
|
215
|
+
}
|
|
216
|
+
|
|
217
|
+
# T9: campaign.jsonl is versioned on re-run (without --debug)
|
|
218
|
+
test_campaign_jsonl_versioned_without_debug() {
|
|
219
|
+
echo '{}' > "$CAMPAIGN_JSONL"
|
|
220
|
+
# run versioning section
|
|
221
|
+
assert -f "${CAMPAIGN_JSONL%.jsonl}-v1.jsonl"
|
|
222
|
+
}
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
**Change 3: fix the SV report path**
|
|
226
|
+
```bash
|
|
227
|
+
# T10: verify generate_campaign_report() looks for the SV report in $LOGS_DIR
|
|
228
|
+
test_sv_report_read_from_logs_dir() {
|
|
229
|
+
WITH_SELF_VERIFICATION=1
|
|
230
|
+
touch "$LOGS_DIR/self-verification-report-001.md"
|
|
231
|
+
# call generate_campaign_report
|
|
232
|
+
# verify campaign-report.md references the SV report under $LOGS_DIR
|
|
233
|
+
assert grep "self-verification-report" "$LOGS_DIR/campaign-report.md"
|
|
234
|
+
}
|
|
235
|
+
```
|
|
236
|
+
|
|
237
|
+
### Execution order
|
|
238
|
+
|
|
239
|
+
1. Write the test file (`tests/test_cb_and_analytics.sh`)
|
|
240
|
+
2. Confirm everything is RED (all tests fail)
|
|
241
|
+
3. Implement Change 1 → T1, T2 GREEN
|
|
242
|
+
4. Implement Change 2 → T3-T9 GREEN
|
|
243
|
+
5. Implement Change 3 → T10 GREEN
|
|
244
|
+
6. Update the governance documentation
|
|
245
|
+
7. Check existing tests for regressions: `bash tests/test_us005_tmux_docs.sh`
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@ai-dev-methodologies/rlp-desk",
|
|
3
|
-
"version": "0.5.
|
|
3
|
+
"version": "0.5.1",
|
|
4
4
|
"description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
|
|
5
5
|
"scripts": {
|
|
6
6
|
"postinstall": "node scripts/postinstall.js",
|
package/src/governance.md
CHANGED
|
@@ -185,7 +185,7 @@ Verification occurs at two boundaries, not as a single final event.
|
|
|
185
185
|
### Checkpoint 2: Release Readiness (us_id=ALL)
|
|
186
186
|
- Trigger: all individual US pass Checkpoint 1 → Worker signals verify with us_id = "ALL"
|
|
187
187
|
- Scope: all AC + L2 integration (if applicable) + L3 E2E Simulation + L4 deploy (if applicable) + mutation score (if CRITICAL, when mutation testing tool is configured in test-spec)
|
|
188
|
-
- On fail: fix loop; escalation to user if
|
|
188
|
+
- On fail: fix loop; escalation to user if 6 consecutive failures (default cb_threshold)
|
|
189
189
|
|
|
190
190
|
### Relationship to Existing Flow
|
|
191
191
|
- Checkpoint 1 = existing per-US verify (§7a). No change.
|
|
@@ -574,7 +574,7 @@ In tmux mode: Leader writes `<slug>-escalation.md` with the report and sets BLOC
|
|
|
574
574
|
|-----------|---------|
|
|
575
575
|
| context-latest.md unchanged for 3 consecutive iterations | BLOCKED |
|
|
576
576
|
| Same acceptance criterion fails 2 consecutive iterations | Upgrade model, retry once (Agent mode only; tmux: same model retry); if still failing → Architecture Escalation (§7¾) → BLOCKED |
|
|
577
|
-
| `cb_threshold` consecutive **fail** verdicts on `cb_threshold` unique criterion IDs | Upgrade to opus, retry once; if still failing → BLOCKED (adjustable via `--cb-threshold`) |
|
|
577
|
+
| `cb_threshold` (default: 6) consecutive **fail** verdicts on `cb_threshold` unique criterion IDs | Upgrade to opus, retry once; if still failing → BLOCKED (adjustable via `--cb-threshold`) |
|
|
578
578
|
| max_iter reached | TIMEOUT (report to user) |
|
|
579
579
|
|
|
580
580
|
The Leader tracks `consecutive_failures` in `status.json`:
|
|
@@ -351,9 +351,8 @@ write_cost_log() {
|
|
|
351
351
|
echo '{"iteration":'"$iter"',"estimated_tokens":'"$estimated_tokens"',"token_source":"estimated","prompt_bytes":'"$prompt_bytes"',"claim_bytes":'"$claim_bytes"',"verdict_bytes":'"$verdict_bytes"',"worker_start_time":"'"$worker_start_time"'","worker_end_time":"'"$worker_end_time"'","worker_duration_s":'"$worker_duration_s"',"verifier_start_time":"'"$verifier_start_time"'","verifier_end_time":"'"$verifier_end_time"'","verifier_duration_s":'"$verifier_duration_s"''"$consensus_fields"',"timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' >> "$COST_LOG"
|
|
352
352
|
}
|
|
353
353
|
|
|
354
|
-
# --- Analytics: write per-iteration structured data to campaign.jsonl ---
|
|
354
|
+
# --- Analytics: write per-iteration structured data to campaign.jsonl (always-on) ---
|
|
355
355
|
write_campaign_jsonl() {
|
|
356
|
-
if (( ! DEBUG )) && (( ! WITH_SELF_VERIFICATION )); then return 0; fi
|
|
357
356
|
local iter="$1"
|
|
358
357
|
local us_id="${2:-unknown}"
|
|
359
358
|
local verdict="${3:-unknown}"
|
|
@@ -376,12 +375,14 @@ write_campaign_jsonl() {
|
|
|
376
375
|
--arg claude_verdict "${CLAUDE_VERDICT:-$verdict}" \
|
|
377
376
|
--arg codex_verdict "${CODEX_VERDICT:-N/A}" \
|
|
378
377
|
--argjson consensus "$VERIFY_CONSENSUS" \
|
|
378
|
+
--argjson consecutive_failures "$CONSECUTIVE_FAILURES" \
|
|
379
|
+
--argjson model_upgraded "${_MODEL_UPGRADED:-0}" \
|
|
379
380
|
--argjson duration_worker_s "$worker_duration_s" \
|
|
380
381
|
--argjson duration_verifier_s "$verifier_duration_s" \
|
|
381
382
|
--arg project_root "$ROOT" \
|
|
382
383
|
--arg slug "$SLUG" \
|
|
383
384
|
--arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
|
|
384
|
-
'{iter: $iter, us_id: $us_id, worker_model: $worker_model, worker_engine: $worker_engine, verifier_engine: $verifier_engine, claude_verdict: $claude_verdict, codex_verdict: $codex_verdict, consensus: $consensus, duration_worker_s: $duration_worker_s, duration_verifier_s: $duration_verifier_s, project_root: $project_root, slug: $slug, timestamp: $timestamp}' \
|
|
385
|
+
'{iter: $iter, us_id: $us_id, worker_model: $worker_model, worker_engine: $worker_engine, verifier_engine: $verifier_engine, claude_verdict: $claude_verdict, codex_verdict: $codex_verdict, consensus: $consensus, consecutive_failures: $consecutive_failures, model_upgraded: $model_upgraded, duration_worker_s: $duration_worker_s, duration_verifier_s: $duration_verifier_s, project_root: $project_root, slug: $slug, timestamp: $timestamp}' \
|
|
385
386
|
>> "$CAMPAIGN_JSONL"
|
|
386
387
|
}
|
|
387
388
|
|
|
@@ -431,7 +432,7 @@ ${untracked}"
|
|
|
431
432
|
local sv_summary=""
|
|
432
433
|
if (( WITH_SELF_VERIFICATION )); then
|
|
433
434
|
local sv_report
|
|
434
|
-
sv_report=$(ls -t "$
|
|
435
|
+
sv_report=$(ls -t "$LOGS_DIR"/self-verification-report-*.md 2>/dev/null | head -1)
|
|
435
436
|
if [[ -n "$sv_report" ]]; then
|
|
436
437
|
sv_summary="See: $sv_report"
|
|
437
438
|
else
|
|
@@ -72,7 +72,7 @@ VERIFY_CONSENSUS="${VERIFY_CONSENSUS:-0}" # 0|1
|
|
|
72
72
|
FINAL_CONSENSUS="${FINAL_CONSENSUS:-0}" # 0|1 — consensus for final ALL verify only (independent of VERIFY_CONSENSUS)
|
|
73
73
|
CONSENSUS_SCOPE="${CONSENSUS_SCOPE:-all}" # all|final-only
|
|
74
74
|
CONSENSUS_FAIL_FAST="${CONSENSUS_FAIL_FAST:-0}" # 0|1 — skip second verifier if first fails
|
|
75
|
-
CB_THRESHOLD="${CB_THRESHOLD:-
|
|
75
|
+
CB_THRESHOLD="${CB_THRESHOLD:-6}" # consecutive failures before BLOCKED (default: 6)
|
|
76
76
|
# Effective CB threshold: doubled when consensus mode active (AC2 auto-double)
|
|
77
77
|
if [[ "${VERIFY_CONSENSUS:-0}" = "1" ]]; then
|
|
78
78
|
EFFECTIVE_CB_THRESHOLD=$(( CB_THRESHOLD * 2 ))
|
|
@@ -192,7 +192,7 @@ launch_worker_codex() {
|
|
|
192
192
|
|
|
193
193
|
log " Launching Worker codex via trigger script in pane $pane_id..."
|
|
194
194
|
paste_to_pane "$pane_id" "bash $trigger_file"
|
|
195
|
-
tmux send-keys -t "$pane_id"
|
|
195
|
+
tmux send-keys -t "$pane_id" C-m
|
|
196
196
|
log_debug "Worker codex trigger sent: $trigger_file"
|
|
197
197
|
sleep 3 # brief wait for codex to start
|
|
198
198
|
return 0
|
|
@@ -211,7 +211,7 @@ launch_worker_claude() {
|
|
|
211
211
|
|
|
212
212
|
log " Launching Worker claude in pane $pane_id..."
|
|
213
213
|
paste_to_pane "$pane_id" "$worker_launch"
|
|
214
|
-
tmux send-keys -t "$pane_id"
|
|
214
|
+
tmux send-keys -t "$pane_id" C-m
|
|
215
215
|
|
|
216
216
|
# Wait for claude TUI to be ready
|
|
217
217
|
if ! wait_for_pane_ready "$pane_id" 30; then
|
|
@@ -223,7 +223,7 @@ launch_worker_claude() {
|
|
|
223
223
|
sleep 3
|
|
224
224
|
local worker_instruction="Read and execute the instructions in $prompt_file"
|
|
225
225
|
paste_to_pane "$pane_id" "$worker_instruction"
|
|
226
|
-
tmux send-keys -t "$pane_id"
|
|
226
|
+
tmux send-keys -t "$pane_id" C-m
|
|
227
227
|
log_debug "Worker instruction sent directly (${#worker_instruction} chars)"
|
|
228
228
|
|
|
229
229
|
# 15-iteration submit loop — verify claude started working
|
|
@@ -244,7 +244,7 @@ launch_worker_claude() {
|
|
|
244
244
|
sleep 0.2
|
|
245
245
|
paste_to_pane "$pane_id" "$worker_instruction"
|
|
246
246
|
sleep 0.15
|
|
247
|
-
tmux send-keys -t "$pane_id"
|
|
247
|
+
tmux send-keys -t "$pane_id" C-m
|
|
248
248
|
sleep 1
|
|
249
249
|
fi
|
|
250
250
|
tmux send-keys -t "$pane_id" C-m 2>/dev/null
|
|
@@ -259,15 +259,15 @@ launch_worker_claude() {
|
|
|
259
259
|
log_debug "[GOV] iter=$iter worker_instruction_failed=true attempts=15 action=restart_claude"
|
|
260
260
|
tmux send-keys -t "$pane_id" C-c 2>/dev/null
|
|
261
261
|
sleep 0.5
|
|
262
|
-
tmux send-keys -t "$pane_id" "/exit"
|
|
262
|
+
tmux send-keys -t "$pane_id" "/exit" C-m 2>/dev/null
|
|
263
263
|
sleep 2
|
|
264
264
|
wait_for_pane_ready "$pane_id" 10 2>/dev/null || true
|
|
265
265
|
paste_to_pane "$pane_id" "$worker_launch"
|
|
266
|
-
tmux send-keys -t "$pane_id"
|
|
266
|
+
tmux send-keys -t "$pane_id" C-m
|
|
267
267
|
if wait_for_pane_ready "$pane_id" 30; then
|
|
268
268
|
sleep 3
|
|
269
269
|
paste_to_pane "$pane_id" "$worker_instruction"
|
|
270
|
-
tmux send-keys -t "$pane_id"
|
|
270
|
+
tmux send-keys -t "$pane_id" C-m
|
|
271
271
|
log " Worker restarted and instruction re-sent"
|
|
272
272
|
log_debug "[FLOW] iter=$iter worker_restart_recovery=success"
|
|
273
273
|
else
|
|
@@ -290,7 +290,7 @@ launch_verifier_codex() {
|
|
|
290
290
|
|
|
291
291
|
log " Launching Verifier codex in pane $pane_id..."
|
|
292
292
|
paste_to_pane "$pane_id" "$verifier_launch"
|
|
293
|
-
tmux send-keys -t "$pane_id"
|
|
293
|
+
tmux send-keys -t "$pane_id" C-m
|
|
294
294
|
sleep 3
|
|
295
295
|
return 0
|
|
296
296
|
}
|
|
@@ -306,7 +306,7 @@ launch_verifier_claude() {
|
|
|
306
306
|
|
|
307
307
|
log " Launching Verifier claude in pane $pane_id..."
|
|
308
308
|
paste_to_pane "$pane_id" "$verifier_launch"
|
|
309
|
-
tmux send-keys -t "$pane_id"
|
|
309
|
+
tmux send-keys -t "$pane_id" C-m
|
|
310
310
|
|
|
311
311
|
if ! wait_for_pane_ready "$pane_id" 30; then
|
|
312
312
|
log_error "Verifier failed to start"
|
|
@@ -316,7 +316,7 @@ launch_verifier_claude() {
|
|
|
316
316
|
sleep 3
|
|
317
317
|
local verifier_instruction="Read and execute the instructions in $prompt_file"
|
|
318
318
|
paste_to_pane "$pane_id" "$verifier_instruction"
|
|
319
|
-
tmux send-keys -t "$pane_id"
|
|
319
|
+
tmux send-keys -t "$pane_id" C-m
|
|
320
320
|
log_debug "Verifier instruction sent directly"
|
|
321
321
|
|
|
322
322
|
# Submit loop — verify verifier started working
|
|
@@ -334,7 +334,7 @@ launch_verifier_claude() {
|
|
|
334
334
|
tmux send-keys -t "$pane_id" C-u 2>/dev/null
|
|
335
335
|
sleep 0.1
|
|
336
336
|
paste_to_pane "$pane_id" "$verifier_instruction"
|
|
337
|
-
tmux send-keys -t "$pane_id"
|
|
337
|
+
tmux send-keys -t "$pane_id" C-m
|
|
338
338
|
fi
|
|
339
339
|
tmux send-keys -t "$pane_id" C-m 2>/dev/null
|
|
340
340
|
sleep 0.3
|
|
@@ -663,13 +663,13 @@ safe_send_keys() {
|
|
|
663
663
|
# Auto-approve permission prompts ("Do you want to create/overwrite X?")
|
|
664
664
|
if echo "$initial_capture" | grep -q "Do you want to" 2>/dev/null; then
|
|
665
665
|
log_debug " Permission prompt detected, auto-approving"
|
|
666
|
-
tmux send-keys -t "$pane_id"
|
|
666
|
+
tmux send-keys -t "$pane_id" C-m
|
|
667
667
|
sleep 0.3
|
|
668
668
|
fi
|
|
669
669
|
# Auto-dismiss codex update prompt (select Skip)
|
|
670
670
|
if echo "$initial_capture" | grep -qi "new version\|update.*codex\|codex.*update" 2>/dev/null; then
|
|
671
671
|
log_debug " Codex update prompt detected, selecting Skip"
|
|
672
|
-
tmux send-keys -t "$pane_id" "2"
|
|
672
|
+
tmux send-keys -t "$pane_id" "2" C-m
|
|
673
673
|
sleep 0.2
|
|
674
674
|
fi
|
|
675
675
|
# Send text via buffer paste (reliable for long strings)
|
|
@@ -761,9 +761,9 @@ wait_for_pane_ready() {
|
|
|
761
761
|
# Auto-dismiss trust prompt (tmux pattern: paneHasTrustPrompt)
|
|
762
762
|
if echo "$captured" | grep -q "Do you trust" 2>/dev/null; then
|
|
763
763
|
log " Trust prompt detected, auto-dismissing..."
|
|
764
|
-
tmux send-keys -t "$pane_id"
|
|
764
|
+
tmux send-keys -t "$pane_id" C-m
|
|
765
765
|
sleep 0.12
|
|
766
|
-
tmux send-keys -t "$pane_id"
|
|
766
|
+
tmux send-keys -t "$pane_id" C-m
|
|
767
767
|
sleep 2
|
|
768
768
|
continue
|
|
769
769
|
fi
|
|
@@ -771,7 +771,7 @@ wait_for_pane_ready() {
|
|
|
771
771
|
# Auto-approve permission prompts ("Do you want to create/overwrite X?")
|
|
772
772
|
if echo "$captured" | grep -q "Do you want to" 2>/dev/null; then
|
|
773
773
|
log " Permission prompt detected, auto-approving..."
|
|
774
|
-
tmux send-keys -t "$pane_id"
|
|
774
|
+
tmux send-keys -t "$pane_id" C-m
|
|
775
775
|
sleep 0.5
|
|
776
776
|
continue
|
|
777
777
|
fi
|
|
@@ -779,7 +779,7 @@ wait_for_pane_ready() {
|
|
|
779
779
|
# Auto-dismiss codex update prompt (select Skip = option 2)
|
|
780
780
|
if echo "$captured" | grep -qi "new version\|update.*codex\|codex.*update" 2>/dev/null; then
|
|
781
781
|
log " Codex update prompt detected, selecting Skip..."
|
|
782
|
-
tmux send-keys -t "$pane_id" "2"
|
|
782
|
+
tmux send-keys -t "$pane_id" "2" C-m
|
|
783
783
|
sleep 0.5
|
|
784
784
|
continue
|
|
785
785
|
fi
|
|
@@ -917,7 +917,7 @@ restart_worker() {
|
|
|
917
917
|
|
|
918
918
|
# Kill existing claude, wait for shell prompt
|
|
919
919
|
tmux send-keys -t "$pane_id" C-c 2>/dev/null
|
|
920
|
-
tmux send-keys -t "$pane_id" "/exit"
|
|
920
|
+
tmux send-keys -t "$pane_id" "/exit" C-m 2>/dev/null
|
|
921
921
|
sleep 2
|
|
922
922
|
|
|
923
923
|
# Re-launch worker (tmux interactive pattern)
|
|
@@ -1205,11 +1205,11 @@ cleanup() {
|
|
|
1205
1205
|
log_debug "cleanup: WORKER_PANE=${WORKER_PANE:-unset} VERIFIER_PANE=${VERIFIER_PANE:-unset}"
|
|
1206
1206
|
if [[ -n "${WORKER_PANE:-}" ]]; then
|
|
1207
1207
|
tmux send-keys -t "$WORKER_PANE" C-c 2>/dev/null
|
|
1208
|
-
tmux send-keys -t "$WORKER_PANE" "/exit"
|
|
1208
|
+
tmux send-keys -t "$WORKER_PANE" "/exit" C-m 2>/dev/null
|
|
1209
1209
|
fi
|
|
1210
1210
|
if [[ -n "${VERIFIER_PANE:-}" ]]; then
|
|
1211
1211
|
tmux send-keys -t "$VERIFIER_PANE" C-c 2>/dev/null
|
|
1212
|
-
tmux send-keys -t "$VERIFIER_PANE" "/exit"
|
|
1212
|
+
tmux send-keys -t "$VERIFIER_PANE" "/exit" C-m 2>/dev/null
|
|
1213
1213
|
fi
|
|
1214
1214
|
sleep 2
|
|
1215
1215
|
# Kill panes on completion
|
|
@@ -1410,7 +1410,7 @@ poll_for_signal() {
|
|
|
1410
1410
|
log " A5: Rate-limited pane shows 'queued messages' — restarting $role pane"
|
|
1411
1411
|
log_debug "[GOV] iter=$ITERATION phase=rate_limit_pane_restart role=$role reason=queued_messages"
|
|
1412
1412
|
tmux send-keys -t "$pane_id" C-c 2>/dev/null; sleep 0.5
|
|
1413
|
-
tmux send-keys -t "$pane_id" "/exit"
|
|
1413
|
+
tmux send-keys -t "$pane_id" "/exit" C-m 2>/dev/null; sleep 2
|
|
1414
1414
|
wait_for_pane_ready "$pane_id" 10 2>/dev/null || true
|
|
1415
1415
|
fi
|
|
1416
1416
|
sleep "$_API_RETRY_INTERVAL_S"
|
|
@@ -1487,7 +1487,7 @@ poll_for_signal() {
|
|
|
1487
1487
|
if echo "$poll_capture" | grep -q "Do you want to" 2>/dev/null; then
|
|
1488
1488
|
log " Permission prompt detected during poll, auto-approving..."
|
|
1489
1489
|
log_debug "[FLOW] iter=$ITERATION permission_prompt_auto_approved=true"
|
|
1490
|
-
tmux send-keys -t "$pane_id"
|
|
1490
|
+
tmux send-keys -t "$pane_id" C-m
|
|
1491
1491
|
sleep 0.5
|
|
1492
1492
|
fi
|
|
1493
1493
|
|
|
@@ -1529,12 +1529,12 @@ run_single_verifier() {
|
|
|
1529
1529
|
log_debug "[GOV] iter=$iter pane_dead=true pane_id=$VERIFIER_PANE cmd=$verifier_cmd action=reset_shell"
|
|
1530
1530
|
tmux send-keys -t "$VERIFIER_PANE" C-c C-u 2>/dev/null
|
|
1531
1531
|
sleep 0.2
|
|
1532
|
-
tmux send-keys -t "$VERIFIER_PANE" "clear"
|
|
1532
|
+
tmux send-keys -t "$VERIFIER_PANE" "clear" C-m 2>/dev/null
|
|
1533
1533
|
sleep 0.3
|
|
1534
1534
|
elif [[ "$verifier_cmd" == "node" || "$verifier_cmd" == "claude" || "$verifier_cmd" == "codex" ]]; then
|
|
1535
1535
|
tmux send-keys -t "$VERIFIER_PANE" C-c 2>/dev/null
|
|
1536
1536
|
sleep 0.5
|
|
1537
|
-
tmux send-keys -t "$VERIFIER_PANE" "/exit"
|
|
1537
|
+
tmux send-keys -t "$VERIFIER_PANE" "/exit" C-m 2>/dev/null
|
|
1538
1538
|
sleep 2
|
|
1539
1539
|
fi
|
|
1540
1540
|
# Always ensure clean shell state before launching new verifier
|
|
@@ -1628,7 +1628,7 @@ run_sequential_final_verify() {
|
|
|
1628
1628
|
verifier_cmd=$(tmux display-message -p -t "$VERIFIER_PANE" '#{pane_current_command}' 2>/dev/null)
|
|
1629
1629
|
if [[ "$verifier_cmd" == "node" || "$verifier_cmd" == "claude" || "$verifier_cmd" == "codex" ]]; then
|
|
1630
1630
|
tmux send-keys -t "$VERIFIER_PANE" C-c 2>/dev/null; sleep 0.5
|
|
1631
|
-
tmux send-keys -t "$VERIFIER_PANE" "/exit"
|
|
1631
|
+
tmux send-keys -t "$VERIFIER_PANE" "/exit" C-m 2>/dev/null; sleep 2
|
|
1632
1632
|
fi
|
|
1633
1633
|
wait_for_pane_ready "$VERIFIER_PANE" 10 2>/dev/null || true
|
|
1634
1634
|
|
|
@@ -1887,12 +1887,10 @@ main() {
|
|
|
1887
1887
|
trap cleanup EXIT INT TERM
|
|
1888
1888
|
mkdir -p "$LOGS_DIR" "$RUNTIME_DIR" 2>/dev/null
|
|
1889
1889
|
|
|
1890
|
-
# --- Analytics directory: create
|
|
1891
|
-
|
|
1892
|
-
mkdir -p "$ANALYTICS_DIR" 2>/dev/null
|
|
1893
|
-
fi
|
|
1890
|
+
# --- Analytics directory: always create (campaign.jsonl + metadata.json are always-on) ---
|
|
1891
|
+
mkdir -p "$ANALYTICS_DIR" 2>/dev/null
|
|
1894
1892
|
|
|
1895
|
-
# --- debug.log versioning (in analytics dir) ---
|
|
1893
|
+
# --- debug.log versioning (in analytics dir, --debug only) ---
|
|
1896
1894
|
if (( DEBUG )) && [[ -f "$DEBUG_LOG" ]]; then
|
|
1897
1895
|
local dbg_n=1
|
|
1898
1896
|
while [[ -f "${DEBUG_LOG%.log}-v${dbg_n}.log" ]]; do
|
|
@@ -1901,33 +1899,30 @@ main() {
|
|
|
1901
1899
|
mv "$DEBUG_LOG" "${DEBUG_LOG%.log}-v${dbg_n}.log"
|
|
1902
1900
|
fi
|
|
1903
1901
|
|
|
1904
|
-
# --- campaign.jsonl versioning (
|
|
1905
|
-
if
|
|
1906
|
-
|
|
1907
|
-
|
|
1908
|
-
|
|
1909
|
-
|
|
1910
|
-
|
|
1911
|
-
mv "$CAMPAIGN_JSONL" "${CAMPAIGN_JSONL%.jsonl}-v${cj_n}.jsonl"
|
|
1912
|
-
fi
|
|
1902
|
+
# --- campaign.jsonl versioning (always-on) ---
|
|
1903
|
+
if [[ -f "$CAMPAIGN_JSONL" ]]; then
|
|
1904
|
+
local cj_n=1
|
|
1905
|
+
while [[ -f "${CAMPAIGN_JSONL%.jsonl}-v${cj_n}.jsonl" ]]; do
|
|
1906
|
+
(( cj_n++ ))
|
|
1907
|
+
done
|
|
1908
|
+
mv "$CAMPAIGN_JSONL" "${CAMPAIGN_JSONL%.jsonl}-v${cj_n}.jsonl"
|
|
1913
1909
|
fi
|
|
1914
1910
|
|
|
1915
|
-
# --- metadata.json: write at campaign start ---
|
|
1916
|
-
|
|
1917
|
-
|
|
1918
|
-
|
|
1919
|
-
|
|
1920
|
-
|
|
1921
|
-
|
|
1922
|
-
|
|
1923
|
-
|
|
1924
|
-
|
|
1925
|
-
|
|
1926
|
-
|
|
1927
|
-
|
|
1928
|
-
|
|
1929
|
-
|
|
1930
|
-
fi
|
|
1911
|
+
# --- metadata.json: always write at campaign start (cross-project identification) ---
|
|
1912
|
+
jq -n \
|
|
1913
|
+
--arg slug "$SLUG" \
|
|
1914
|
+
--arg project_root "$ROOT" \
|
|
1915
|
+
--arg project_name "$(basename "$ROOT")" \
|
|
1916
|
+
--arg campaign_status "running" \
|
|
1917
|
+
--arg start_time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
|
|
1918
|
+
--arg end_time "" \
|
|
1919
|
+
--arg worker_model "$WORKER_MODEL" \
|
|
1920
|
+
--arg verifier_model "$VERIFIER_MODEL" \
|
|
1921
|
+
--argjson debug "$DEBUG" \
|
|
1922
|
+
--argjson with_sv "$WITH_SELF_VERIFICATION" \
|
|
1923
|
+
--argjson consensus "$VERIFY_CONSENSUS" \
|
|
1924
|
+
'{slug: $slug, project_root: $project_root, project_name: $project_name, campaign_status: $campaign_status, start_time: $start_time, end_time: $end_time, worker_model: $worker_model, verifier_model: $verifier_model, debug: $debug, with_self_verification: $with_sv, consensus: $consensus}' \
|
|
1925
|
+
> "$METADATA_FILE"
|
|
1931
1926
|
|
|
1932
1927
|
# --- Startup ---
|
|
1933
1928
|
log "Ralph Desk Tmux Runner starting..."
|
|
@@ -2065,7 +2060,7 @@ main() {
|
|
|
2065
2060
|
# Send C-c first (in case claude is mid-task), then /exit
|
|
2066
2061
|
tmux send-keys -t "$WORKER_PANE" C-c 2>/dev/null
|
|
2067
2062
|
sleep 1
|
|
2068
|
-
tmux send-keys -t "$WORKER_PANE" "/exit"
|
|
2063
|
+
tmux send-keys -t "$WORKER_PANE" "/exit" C-m 2>/dev/null
|
|
2069
2064
|
sleep 2
|
|
2070
2065
|
# Wait for shell prompt before proceeding
|
|
2071
2066
|
wait_for_pane_ready "$WORKER_PANE" 10 2>/dev/null || true
|
|
@@ -2261,12 +2256,12 @@ main() {
|
|
|
2261
2256
|
log_debug "[GOV] iter=$ITERATION pane_dead=true pane_id=$VERIFIER_PANE cmd=$verifier_cmd action=reset_shell"
|
|
2262
2257
|
tmux send-keys -t "$VERIFIER_PANE" C-c C-u 2>/dev/null
|
|
2263
2258
|
sleep 0.2
|
|
2264
|
-
tmux send-keys -t "$VERIFIER_PANE" "clear"
|
|
2259
|
+
tmux send-keys -t "$VERIFIER_PANE" "clear" C-m 2>/dev/null
|
|
2265
2260
|
sleep 0.3
|
|
2266
2261
|
elif [[ "$verifier_cmd" == "node" || "$verifier_cmd" == "claude" || "$verifier_cmd" == "codex" ]]; then
|
|
2267
2262
|
tmux send-keys -t "$VERIFIER_PANE" C-c 2>/dev/null
|
|
2268
2263
|
sleep 0.5
|
|
2269
|
-
tmux send-keys -t "$VERIFIER_PANE" "/exit"
|
|
2264
|
+
tmux send-keys -t "$VERIFIER_PANE" "/exit" C-m 2>/dev/null
|
|
2270
2265
|
sleep 2
|
|
2271
2266
|
fi
|
|
2272
2267
|
wait_for_pane_ready "$VERIFIER_PANE" 10 2>/dev/null || true
|