npm - @ai-dev-methodologies/rlp-desk - Versions diffs - 0.5.3 → 0.6.0 - Mend

@ai-dev-methodologies/rlp-desk 0.5.3 → 0.6.0

Files changed (7)

package/docs/plans/mutable-booping-corbato.md +163 -0
package/package.json +1 -1
package/src/commands/rlp-desk.md +12 -0
package/src/model-upgrade-table.md +9 -9
package/src/scripts/init_ralph_desk.zsh +61 -2
package/src/scripts/lib_ralph_desk.zsh +6 -11
package/src/scripts/run_ralph_desk.zsh +173 -37

package/docs/plans/mutable-booping-corbato.md ADDED

@@ -0,0 +1,163 @@
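+# Plan: rlp-desk Batch Mode + Operational Context Improvements
+## Context
+Two structural problems were found in a real campaign (`prod-local-parity`, spark:high):
+1. **Endless FAIL in batch mode**: With 5 or more US, the Worker completes only some of them → the Verifier verifies everything → FAIL → progress is ignored → CB BLOCKED. `VERIFIED_US` tracking exists only in per-us mode, not in batch mode.
+2. **No support for server projects**: The Worker does not restart the server after modifying code, does not know the server port, and runs no health check. This is not the spark model's fault; it is **a design flaw in which rlp-desk does not carry operational context into the brainstorm/prompt**.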
+---
+## P0: Batch Mode Partial Progress Tracking
+### Files to Modify
+- `src/scripts/run_ralph_desk.zsh`
+- `src/commands/rlp-desk.md` (agent mode ⑦c)
+### Changes
+#### 1. Track VERIFIED_US in batch mode too (run_ralph_desk.zsh)
+- PASS verdict handling (L2423): remove the `per-us` condition → in batch mode too, add `signal_us_id` to `VERIFIED_US` when it names an individual US
+- FAIL verdict handling (L2445): parse `per_us_results` from the verdict JSON → add each US with `met=true` to `VERIFIED_US`
+- status.json update: record the `verified_us` array in batch mode as well
+#### 2. Pass VERIFIED_US to the Verifier prompt (run_ralph_desk.zsh L1225-1232)
+- Change the `if [[ "$VERIFY_MODE" = "per-us"` condition to `if [[ -n "$VERIFIED_US"`
+- Instruct the batch-mode verifier as well to "skip US that are already verified"
+#### 3. Fix Contract Scope Narrowing (run_ralph_desk.zsh L2461-2473)
+- On FAIL: extract the US that passed from the verdict → write "US-001~004 verified. Continue from US-005." into the fix contract
+- When assembling the Worker prompt, consult `VERIFIED_US` and pass the narrowed scope
+#### 4. Partial reset of consecutive_failures (run_ralph_desk.zsh L2447)
+- If any US newly passed (`VERIFIED_US` grew) → reset `CONSECUTIVE_FAILURES=0`
+- If the state is unchanged with no progress → increment as before
+#### 5. Make per_us_results mandatory in the Verifier verdict
+- Add the output format to the Verifier prompt template (init_ralph_desk.zsh L384-474):
+  ```json
+  {
+    "verdict": "fail",
+    "per_us_results": { "US-001": "pass", "US-005": "fail" },
+    "issues": [...]
+  }
+  ```
+- Instruct the Verifier to include per_us_results in both batch and per-us modes
+---
+## P1: Brainstorm Operational Context + Worker System Prompt
+### Files to Modify
+- `src/commands/rlp-desk.md` (brainstorm section)
+- `src/scripts/init_ralph_desk.zsh` (Worker/Verifier prompt template)
+### Changes
+#### 1. Brainstorm: collect Operational Context (rlp-desk.md L24-93)
+The brainstorm currently collects 11 items; **add a 12th item**:
+```
+12. **Operational Context** (if applicable):
+    - Does this project require a running server/service? (y/n)
+    - Server start command (e.g., `npm run dev`, `python manage.py runserver`)
+    - Server port (e.g., 7001)
+    - Health check URL (e.g., `http://localhost:7001/health`)
+    - Other runtime dependencies (e.g., database, Redis)
+```
+The brainstorm auto-detects `scripts.dev`/`scripts.start` in `package.json`, `Makefile`, `docker-compose.yml`, and similar files in the project directory and suggests values.
+#### 2. Brainstorm: guidance for including an Operational Step when generating US
+Add to the US/AC writing guide (rlp-desk.md L26-38):
+```
+- If the project has operational context (server, DB, etc.):
+  - Each US that modifies server code MUST include AC:
+    "Given server is running, When code is modified, Then server is restarted and responds on health check URL"
+  - Do NOT assume Worker will restart server on its own — spell it out in AC
+```
+#### 3. Init: inject Operational Rules into the Worker prompt (init_ralph_desk.zsh L285-380)
+Inject the operational context collected in the brainstorm into the Worker prompt template:
+```markdown
+## Operational Context
+- **Server Command**: `npm run dev`
+- **Server Port**: 7001
+- **Health Check**: `http://localhost:7001/health`
+### Operational Rules (always apply)
+- After modifying server/application code, restart the server: `[server_cmd]`
+- Before signaling done, verify server responds: `curl -s [health_url] || fail`
+- Do NOT modify dependency files (package.json, requirements.txt, etc.) unless the AC explicitly requires it
+- Do NOT run package install commands (npm install, pip install, etc.) unless the AC explicitly requires it
+```
+For projects without operational context (code changes only), omit this section.
+#### 4. Init: add an Operational Check to the Verifier prompt as well
+In the Verifier prompt template (init_ralph_desk.zsh L384-474):
+```markdown
+## Operational Verification (if server context provided)
+- Verify server is running on expected port before checking ACs
+- If server is down, verdict=FAIL with issue: "server not running"
+```
+#### 5. --server-cmd / --server-port CLI options (run_ralph_desk.zsh)
+Init writes the values collected in the brainstorm into the prompt, but they can also be overridden at run time:
+- `--server-cmd "npm run dev"` → overrides the server command in the Worker prompt
+- `--server-port 7001` → overrides the port in the Worker prompt
+- At run time, a health check at the start of each iteration (optional, `--server-health-check` flag)
+---
+## Verification Plan
+### P0 Tests
+```bash
+# Batch partial progress unit tests
+zsh tests/test_batch_partial_progress.sh
+# Scenario: batch FAIL verdict includes per_us_results → confirm VERIFIED_US tracking
+# Scenario: confirm consecutive_failures resets when a new US passes
+# Scenario: confirm the verifier prompt includes VERIFIED_US (batch mode)
+```
+### P1 Tests
+```bash
+# Operational context unit tests
+zsh tests/test_operational_context.sh
+# Scenario: confirm --server-cmd option parsing
+# Scenario: confirm operational rules are injected into the Worker prompt
+# Scenario: confirm the section is omitted for projects without operational context
+```
+### Self-Verification (required by CLAUDE.md)
+Run self-verification on the changed src files with 3 scenarios (LOW/MEDIUM/CRITICAL).
+### E2E
+Test with real campaigns:
+1. batch mode + 10 US → confirm partial progress tracking
+2. server project + spark:high → confirm the server restart is performed
+---
+## File Map
+| File | P0 | P1 |
+|------|----|----|
+| `src/scripts/run_ralph_desk.zsh` | VERIFIED_US batch tracking, fix contract narrowing, CF reset | --server-cmd/port options |
+| `src/scripts/lib_ralph_desk.zsh` | - | - |
+| `src/scripts/init_ralph_desk.zsh` | - | Inject operational context into Worker/Verifier prompts |
+| `src/commands/rlp-desk.md` | agent mode ⑦c batch logic | brainstorm item 12, US guide |
+| `src/governance.md` | - | - |
+---
+## Scope / Non-Goals
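+- Per-model guardrails (a spark-only prohibition list) → **not doing**. Solved through the brainstorm/prompt structure
+- Removing batch mode entirely → **not doing**. It is fixed so that it remains usable
+- auto-detect project type → only user confirmation in the brainstorm plus file-based suggestions. Not full automation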

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ai-dev-methodologies/rlp-desk",
-  "version": "0.5.3",
+  "version": "0.6.0",
   "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
   "scripts": {
     "postinstall": "node scripts/postinstall.js",

package/src/commands/rlp-desk.md CHANGED

@@ -91,6 +91,18 @@ Ask about these items one by one (or in small groups):
 9. **Verify Mode** — per-us (default) or batch. Ask: "Verify after each user story (per-us, recommended) or only after all stories are done (batch)?" Default recommendation: per-us for 2+ stories.
 10. **Consensus** — Ask: "Use cross-engine consensus? off (single engine), final-only (cross-engine on final verify only), or all (cross-engine on every verify). Requires codex CLI." Default: off. Recommended: final-only when codex is installed.
 11. **Max Iterations** — suggest based on story count, ask if OK.
+12. **Operational Context** — Auto-detect: scan project root for `package.json` (scripts.dev/start), `Makefile`, `docker-compose.yml`, `manage.py`. If detected, ask:
+   - "Does this project require a running server/service during development?" (y/n)
+   - If yes: "Server start command?" (pre-fill from detected scripts, e.g., `npm run dev`)
+   - "Server port?" (e.g., 7001)
+   - "Health check URL?" (e.g., `http://localhost:7001/health`) — optional
+   - Pass to init: `--server-cmd "CMD" --server-port PORT --server-health URL`
+   - If no server needed: skip. Init generates prompts without operational context.
+   **US generation guidance when server context is present:**
+   - Each US that modifies server/application code SHOULD include an AC or note:
+     "Given server is running, When code is modified, Then server is restarted and health check passes"
+   - Do NOT assume the Worker model will restart servers on its own — spell it out in the AC or rely on the operational rules injected by init.
 After all items are confirmed:
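The auto-detect step in item 12 can be pictured as a plain file scan. The sketch below is only an illustration: `detect_server_hints` is a hypothetical helper, not a function from the package, and it checks only the four file names that item 12 lists. Pre-filling the start command from `scripts.dev`/`scripts.start` is left out.

```shell
# Sketch only: list the files in a project root that hint at a server.
# detect_server_hints is a hypothetical name; the file list comes from item 12.
detect_server_hints() {
  hint_root="${1:-.}"
  for hint_file in package.json Makefile docker-compose.yml manage.py; do
    # Print each marker file that exists, one per line, in a fixed order.
    if [ -f "$hint_root/$hint_file" ]; then
      echo "$hint_file"
    fi
  done
}
```

A brainstorm flow could ask the operational-context questions only when this prints at least one file name, and skip them otherwise.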

package/src/model-upgrade-table.md CHANGED

@@ -9,23 +9,23 @@ CB default: 6. Override: `--cb-threshold N`. Worker only — Verifier fixed at c
 - CB < table columns → BLOCKED at that column
 - CB > 6 → repeat ceiling model beyond column 6
-## GPT Pro (spark — separate token limit)
+## GPT Pro (gpt-5.3-codex-spark — separate token limit)
 | Complexity | 1-2 | 3-4 | 5-6 | 7+ |
 |------------|-----|-----|-----|-----|
-| LOW | spark:low | spark:medium | spark:high | BLOCKED |
-| MEDIUM | spark:medium | spark:high | spark:xhigh | BLOCKED |
-| HIGH | spark:high | spark:xhigh | spark:xhigh | BLOCKED |
-| CRITICAL | spark:xhigh | spark:xhigh | spark:xhigh | BLOCKED |
+| LOW | gpt-5.3-codex-spark:low | gpt-5.3-codex-spark:medium | gpt-5.3-codex-spark:high | BLOCKED |
+| MEDIUM | gpt-5.3-codex-spark:medium | gpt-5.3-codex-spark:high | gpt-5.3-codex-spark:xhigh | BLOCKED |
+| HIGH | gpt-5.3-codex-spark:high | gpt-5.3-codex-spark:xhigh | gpt-5.3-codex-spark:xhigh | BLOCKED |
+| CRITICAL | gpt-5.3-codex-spark:xhigh | gpt-5.3-codex-spark:xhigh | gpt-5.3-codex-spark:xhigh | BLOCKED |
 ## Non-Pro (gpt-5.4)
 | Complexity | 1-2 | 3-4 | 5-6 | 7+ |
 |------------|-----|-----|-----|-----|
-| LOW | 5.4:low | 5.4:medium | 5.4:high | BLOCKED |
-| MEDIUM | 5.4:medium | 5.4:high | 5.4:xhigh | BLOCKED |
-| HIGH | 5.4:high | 5.4:xhigh | 5.4:xhigh | BLOCKED |
-| CRITICAL | 5.4:xhigh | 5.4:xhigh | 5.4:xhigh | BLOCKED |
+| LOW | gpt-5.4:low | gpt-5.4:medium | gpt-5.4:high | BLOCKED |
+| MEDIUM | gpt-5.4:medium | gpt-5.4:high | gpt-5.4:xhigh | BLOCKED |
+| HIGH | gpt-5.4:high | gpt-5.4:xhigh | gpt-5.4:xhigh | BLOCKED |
+| CRITICAL | gpt-5.4:xhigh | gpt-5.4:xhigh | gpt-5.4:xhigh | BLOCKED |
 ## Claude-only
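The reasoning-effort cells of the GPT Pro and Non-Pro tables in this hunk follow a simple rule: each complexity level starts one step higher, each column raises the effort by one step, and the value is capped at `xhigh`. The sketch below reproduces those cells. `reasoning_for` is a hypothetical helper, and reading the numeric columns as attempt buckets (1-2, 3-4, 5-6, 7+) is an assumption, since the hunk does not label them.

```shell
# Sketch only: reproduce the reasoning-effort cells of the two codex tables.
# Assumption: the second argument is an attempt number that falls into
# the column buckets 1-2, 3-4, 5-6, 7+.
reasoning_for() {
  rf_complexity="$1"
  rf_attempt="$2"
  case "$rf_complexity" in
    LOW)      rf_base=0 ;;
    MEDIUM)   rf_base=1 ;;
    HIGH)     rf_base=2 ;;
    CRITICAL) rf_base=3 ;;
    *) echo "unknown complexity: $rf_complexity" >&2; return 1 ;;
  esac
  # The 7+ column is BLOCKED for every row.
  if [ "$rf_attempt" -ge 7 ]; then
    echo "BLOCKED"
    return 0
  fi
  # Two attempts per column: 1-2 -> 0, 3-4 -> 1, 5-6 -> 2.
  rf_level=$(( rf_base + (rf_attempt - 1) / 2 ))
  if [ "$rf_level" -gt 3 ]; then
    rf_level=3
  fi
  case "$rf_level" in
    0) echo "low" ;;
    1) echo "medium" ;;
    2) echo "high" ;;
    3) echo "xhigh" ;;
  esac
}
```

For example, `reasoning_for MEDIUM 4` prints `high`, which matches the MEDIUM row under the 3-4 column; prefixing the result with `gpt-5.4:` or `gpt-5.3-codex-spark:` gives the full cell.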

package/src/scripts/init_ralph_desk.zsh CHANGED

@@ -11,11 +11,14 @@ set -euo pipefail
 #   ~/.claude/ralph-desk/init_ralph_desk.zsh <slug> [objective] [--mode fresh|improve]
 # =============================================================================
-SLUG="${1:?Usage: $0 <slug> [objective] [--mode fresh|improve]}"
+SLUG="${1:?Usage: $0 <slug> [objective] [--mode fresh|improve] [--server-cmd CMD] [--server-port PORT] [--server-health URL]}"
 MODE=""
 OBJECTIVE="TBD - fill in the objective"
+SERVER_CMD=""
+SERVER_PORT=""
+SERVER_HEALTH=""
-# Parse remaining arguments: --mode fresh|improve + optional positional objective
+# Parse remaining arguments
 shift
 while [[ $# -gt 0 ]]; do
   case "$1" in
@@ -27,6 +30,30 @@ while [[ $# -gt 0 ]]; do
       MODE="${1#--mode=}"
       shift
       ;;
+    --server-cmd)
+      SERVER_CMD="${2:?--server-cmd requires a command}"
+      shift 2
+      ;;
+    --server-cmd=*)
+      SERVER_CMD="${1#--server-cmd=}"
+      shift
+      ;;
+    --server-port)
+      SERVER_PORT="${2:?--server-port requires a port number}"
+      shift 2
+      ;;
+    --server-port=*)
+      SERVER_PORT="${1#--server-port=}"
+      shift
+      ;;
+    --server-health)
+      SERVER_HEALTH="${2:?--server-health requires a URL}"
+      shift 2
+      ;;
+    --server-health=*)
+      SERVER_HEALTH="${1#--server-health=}"
+      shift
+      ;;
     *)
       OBJECTIVE="$1"
       shift
@@ -378,6 +405,24 @@ execution_steps MUST be a JSON array of objects (not a dict with string keys). E
 ## Objective
 $OBJECTIVE
 EOF
+  # Inject operational context if server options provided
+  if [[ -n "$SERVER_CMD" || -n "$SERVER_PORT" ]]; then
+    cat >> "$F" <<OPCTX
+## Operational Context
+$([ -n "$SERVER_CMD" ] && echo "- **Server Start Command**: \`$SERVER_CMD\`")
+$([ -n "$SERVER_PORT" ] && echo "- **Server Port**: $SERVER_PORT")
+$([ -n "$SERVER_HEALTH" ] && echo "- **Health Check URL**: $SERVER_HEALTH")
+### Operational Rules (always apply when server context is present)
+- After modifying server/application code, restart the server$([ -n "$SERVER_CMD" ] && echo ": \`$SERVER_CMD\`")
+- Before signaling done, verify the server responds$(if [ -n "$SERVER_HEALTH" ]; then echo ": \`curl -sf $SERVER_HEALTH\`"; elif [ -n "$SERVER_PORT" ]; then echo ": \`curl -sf http://localhost:$SERVER_PORT/\`"; fi)
+- Do NOT modify dependency files (package.json, requirements.txt, etc.) unless the AC explicitly requires it
+- Do NOT run package install commands (npm install, pip install, etc.) unless the AC explicitly requires it
+OPCTX
+  fi
   echo "  + $F"
 else echo "  · $F"; fi
@@ -447,6 +492,7 @@ Verdict JSON:
   "us_id": "US-NNN or ALL (matches the scope you verified)",
   "verified_at_utc": "ISO timestamp",
   "summary": "...",
+  "per_us_results": {"US-001": "pass|fail|not_started", "US-002": "pass|fail|not_started"},
   "criteria_results": [{"criterion":"...","met":true/false,"evidence":"..."}],
   "missing_evidence": [],
   "issues": [{"id":"...","severity":"critical|major|minor","description":"...","fix_hint":"(suggestion, non-authoritative)"}],
@@ -471,7 +517,20 @@ Rules:
 - Deterministic checks (type hints, linting, security) delegate to test-spec tools; focus on AC verification + semantic review + smoke test.
 - Do NOT modify code or write sentinel files.
 - If Worker claims "inspection" or "review" for an AC that requires an automated command, verdict = FAIL.
+- **ALWAYS include per_us_results** in verdict JSON — map each US to "pass", "fail", or "not_started". This is required for partial progress tracking in both batch and per-us modes.
 EOF
+  # Inject operational verification if server options provided
+  if [[ -n "$SERVER_CMD" || -n "$SERVER_PORT" ]]; then
+    cat >> "$F" <<OPVER
+## Operational Verification (server context present)
+- Before verifying ACs, check that the server is running$([ -n "$SERVER_PORT" ] && echo " on port $SERVER_PORT")$([ -n "$SERVER_HEALTH" ] && echo ": \`curl -sf $SERVER_HEALTH\`")
+- If the server is not running, verdict = FAIL with issue: "server not running on expected port"
+- If Worker modified server code but did not restart the server, verdict = FAIL with issue: "server not restarted after code change"
+OPVER
+  fi
   echo "  + $F"
 else echo "  · $F"; fi
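In a shell AND-OR list, `&&` and `||` have equal precedence and associate left to right, so `a && b || c && d` still runs `d` whenever `b` succeeds. Choosing between the health URL and the port fallback is therefore easier to get right with `if`/`elif`. A minimal sketch, assuming a hypothetical helper `health_check_hint` whose two arguments mirror `SERVER_HEALTH` and `SERVER_PORT`:

```shell
# Sketch only: pick exactly one verification command for the prompt text.
# health_check_hint is a hypothetical name, not part of init_ralph_desk.zsh.
health_check_hint() {
  hc_health="$1"
  hc_port="$2"
  if [ -n "$hc_health" ]; then
    # An explicit health URL always wins.
    echo "curl -sf $hc_health"
  elif [ -n "$hc_port" ]; then
    # Fall back to the root path on the given port.
    echo "curl -sf http://localhost:$hc_port/"
  fi
  # With neither value set, nothing is printed.
}
```

Note that both injected sections in this file are guarded by `SERVER_CMD` or `SERVER_PORT` being non-empty, so passing `--server-health` on its own adds no operational section.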

package/src/scripts/lib_ralph_desk.zsh CHANGED

@@ -31,7 +31,7 @@ log_error() {
 # parse_model_flag() — parse unified --worker-model / --verifier-model value
 # Colon format (model:reasoning) → codex engine; plain name → claude engine.
-# Spark alias: any model name containing "spark" is normalized to "spark".
+# Spark alias: bare "spark" is expanded to full model ID "gpt-5.3-codex-spark".
 # Usage:  parse_model_flag <value> <role>
 # Output (stdout): "engine model [reasoning]"  e.g. "codex gpt-5.4 medium" | "claude sonnet"
 # Returns: 0 on success, 1 on invalid format (error written to stderr)
@@ -47,8 +47,8 @@ parse_model_flag() {
   if (( colon_count == 1 )); then
     local model="${value%%:*}"
     local reasoning="${value##*:}"
-    if [[ "$model" == *"spark"* ]]; then
-      model="spark"
+    if [[ "$model" == "spark" ]]; then
+      model="gpt-5.3-codex-spark"
     fi
     echo "codex $model $reasoning"
   else
@@ -76,7 +76,7 @@ get_model_string() {
 # get_next_model() — return next model in Worker upgrade path, or empty at ceiling
 # Usage: get_next_model <model_str>
 #   claude: "haiku"|"sonnet"|"opus"
-#   codex:  "gpt-5.4:medium"|"gpt-5.4:high"|"gpt-5.4:xhigh"|"spark:medium"|...
+#   codex:  "gpt-5.4:medium"|"gpt-5.4:high"|"gpt-5.4:xhigh"|"gpt-5.3-codex-spark:medium"|...
 # Output: next model string, or empty string if at ceiling
 get_next_model() {
   local current="$1"
@@ -85,16 +85,11 @@ get_next_model() {
     haiku)          echo "sonnet"         ;;
     sonnet)         echo "opus"           ;;
     opus)           echo ""               ;;
-    # Codex GPT Pro upgrade path (short aliases)
-    spark:low)      echo "spark:medium"   ;;
-    spark:medium)   echo "spark:high"     ;;
-    spark:high)     echo "spark:xhigh"    ;;
-    spark:xhigh)    echo ""               ;;  # spark ceiling
-    # Codex GPT Pro upgrade path (full model names)
+    # Codex GPT Pro (spark) upgrade path
     gpt-5.3-codex-spark:low)    echo "gpt-5.3-codex-spark:medium" ;;
     gpt-5.3-codex-spark:medium) echo "gpt-5.3-codex-spark:high"   ;;
     gpt-5.3-codex-spark:high)   echo "gpt-5.3-codex-spark:xhigh"  ;;
-    gpt-5.3-codex-spark:xhigh)  echo ""                           ;;  # spark ceiling (full name)
+    gpt-5.3-codex-spark:xhigh)  echo ""                           ;;  # spark ceiling
     # Codex Non-Pro upgrade path
     gpt-5.4:low)    echo "gpt-5.4:medium" ;;
     gpt-5.4:medium) echo "gpt-5.4:high"   ;;
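The alias change in this file narrows the match from "any name containing spark" to the bare name `spark`, which is then expanded to the full model ID. The sketch below illustrates that rule. `parse_model_sketch` is a hypothetical stand-in for `parse_model_flag`, and rejecting values with more than one colon is an assumption, since the hunk shows only the single-colon branch.

```shell
# Sketch only: colon format -> codex engine, plain name -> claude engine.
# parse_model_sketch is a hypothetical stand-in, not the package function.
parse_model_sketch() {
  pm_value="$1"
  case "$pm_value" in
    *:*:*)
      # Assumption: more than one colon is treated as invalid.
      echo "invalid model format: $pm_value" >&2
      return 1
      ;;
    *:*)
      pm_model="${pm_value%%:*}"
      pm_reasoning="${pm_value##*:}"
      # Only the bare alias is expanded; other names pass through unchanged.
      if [ "$pm_model" = "spark" ]; then
        pm_model="gpt-5.3-codex-spark"
      fi
      echo "codex $pm_model $pm_reasoning"
      ;;
    *)
      echo "claude $pm_value"
      ;;
  esac
}
```

Under this rule `spark:high` and `gpt-5.3-codex-spark:high` resolve to the same model string, which is why a single upgrade path keyed on the full name is enough in `get_next_model`.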

package/src/scripts/run_ralph_desk.zsh CHANGED

@@ -58,10 +58,30 @@ IDLE_NUDGE_THRESHOLD="${IDLE_NUDGE_THRESHOLD:-30}"
 MAX_NUDGES="${MAX_NUDGES:-3}"
 WITH_SELF_VERIFICATION="${WITH_SELF_VERIFICATION:-0}"
-# --- Engine Selection ---
-WORKER_ENGINE="${WORKER_ENGINE:-claude}"    # claude|codex
-VERIFIER_ENGINE="${VERIFIER_ENGINE:-claude}"  # claude|codex
-FINAL_VERIFIER_ENGINE="${FINAL_VERIFIER_ENGINE:-claude}"  # claude|codex (derived from FINAL_VERIFIER_MODEL)
+# --- Engine Selection (auto-detect from model format: name=claude, name:reasoning=codex) ---
+# If model contains ":", it's codex format — auto-set engine and split model/reasoning
+_auto_detect_engine() {
+  local model_var="$1" engine_var="$2" codex_model_var="$3" codex_reasoning_var="$4"
+  local model_val="${(P)model_var}"
+  if [[ "$model_val" == *:* ]]; then
+    local model_part="${model_val%%:*}"
+    local reasoning_part="${model_val##*:}"
+    [[ "$model_part" == "spark" ]] && model_part="gpt-5.3-codex-spark"
+    eval "$engine_var=codex"
+    eval "$model_var=$model_part"
+    [[ -n "$codex_model_var" ]] && eval "$codex_model_var=$model_part"
+    [[ -n "$codex_reasoning_var" ]] && eval "$codex_reasoning_var=$reasoning_part"
+  fi
+}
+WORKER_ENGINE="${WORKER_ENGINE:-claude}"
+VERIFIER_ENGINE="${VERIFIER_ENGINE:-claude}"
+FINAL_VERIFIER_ENGINE="${FINAL_VERIFIER_ENGINE:-claude}"
+# Auto-detect engine from model format for env var path (CLI path uses parse_model_flag)
+_auto_detect_engine WORKER_MODEL WORKER_ENGINE WORKER_CODEX_MODEL WORKER_CODEX_REASONING
+_auto_detect_engine VERIFIER_MODEL VERIFIER_ENGINE VERIFIER_CODEX_MODEL VERIFIER_CODEX_REASONING
+_auto_detect_engine FINAL_VERIFIER_MODEL FINAL_VERIFIER_ENGINE "" ""
 WORKER_CODEX_MODEL="${WORKER_CODEX_MODEL:-gpt-5.4}"
 WORKER_CODEX_REASONING="${WORKER_CODEX_REASONING:-high}"   # low|medium|high
 VERIFIER_CODEX_MODEL="${VERIFIER_CODEX_MODEL:-gpt-5.4}"
@@ -191,19 +211,55 @@ check_dead_pane() {
   return 1  # alive
 }
-# launch_worker_codex() — launch codex Worker via trigger script (non-interactive exec)
-# Args: $1=pane_id  $2=trigger_file  $3=iteration
-# Returns: 0 always (codex failures detected by poll_for_signal)
+# launch_worker_codex() — launch codex Worker TUI, send instruction, verify submission
+# Matches launch_worker_claude() pattern for consistent tmux-visible execution.
+# Args: $1=pane_id  $2=prompt_file  $3=iteration  $4=worker_launch_cmd
+# Returns: 0 on success, 1 on fatal failure
 launch_worker_codex() {
   local pane_id="$1"
-  local trigger_file="$2"
+  local prompt_file="$2"
   local iter="$3"
+  local worker_launch="$4"
+  log "  Launching Worker codex TUI in pane $pane_id..."
+  paste_to_pane "$pane_id" "$worker_launch"
+  tmux send-keys -t "$pane_id" C-m
-  log "  Launching Worker codex via trigger script in pane $pane_id..."
-  paste_to_pane "$pane_id" "bash $trigger_file"
+  # Wait for codex TUI to be ready
+  if ! wait_for_pane_ready "$pane_id" 30; then
+    log_error "Worker codex failed to start"
+    return 1
+  fi
+  # Send instruction to codex TUI
+  sleep 3
+  local worker_instruction="Read and execute the instructions in $prompt_file"
+  paste_to_pane "$pane_id" "$worker_instruction"
   tmux send-keys -t "$pane_id" C-m
-  log_debug "Worker codex trigger sent: $trigger_file"
-  sleep 3  # brief wait for codex to start
+  log_debug "Worker codex instruction sent (${#worker_instruction} chars)"
+  # Submit loop — verify codex started working
+  local submit_attempts=0
+  while (( submit_attempts < 15 )); do
+    sleep 2
+    local pane_check
+    pane_check=$(tmux capture-pane -t "$pane_id" -p 2>/dev/null)
+    if echo "$pane_check" | grep -qi "working\|thinking\|Exploring\|Running\|reading\|searching\|editing\|writing" 2>/dev/null; then
+      log_debug "Worker codex started working after $((submit_attempts + 1)) checks"
+      break
+    fi
+    if (( submit_attempts == 8 )); then
+      log_debug "Adaptive instruction retry: clearing line and re-typing"
+      tmux send-keys -t "$pane_id" C-u 2>/dev/null
+      sleep 0.1
+      paste_to_pane "$pane_id" "$worker_instruction"
+      tmux send-keys -t "$pane_id" C-m
+    fi
+    tmux send-keys -t "$pane_id" C-m 2>/dev/null
+    sleep 0.3
+    tmux send-keys -t "$pane_id" C-m 2>/dev/null
+    (( submit_attempts++ ))
+  done
   return 0
 }
@@ -288,19 +344,53 @@ launch_worker_claude() {
   return 0
 }
-# launch_verifier_codex() — launch codex Verifier in pane (non-interactive)
+# launch_verifier_codex() — launch codex Verifier TUI, send instruction, verify submission
+# Matches launch_verifier_claude() pattern for consistent tmux-visible execution.
 # Args: $1=pane_id  $2=prompt_file  $3=iteration  $4=launch_cmd
-# Returns: 0 always
+# Returns: 0 on success
 launch_verifier_codex() {
   local pane_id="$1"
   local prompt_file="$2"
   local iter="$3"
   local verifier_launch="$4"
-  log "  Launching Verifier codex in pane $pane_id..."
+  log "  Launching Verifier codex TUI in pane $pane_id..."
   paste_to_pane "$pane_id" "$verifier_launch"
   tmux send-keys -t "$pane_id" C-m
+  if ! wait_for_pane_ready "$pane_id" 30; then
+    log_error "Verifier codex failed to start"
+    return 1
+  fi
   sleep 3
+  local verifier_instruction="Read and execute the instructions in $prompt_file"
+  paste_to_pane "$pane_id" "$verifier_instruction"
+  tmux send-keys -t "$pane_id" C-m
+  log_debug "Verifier codex instruction sent"
+  # Submit loop — verify codex started working
+  local submit_attempts=0
+  while (( submit_attempts < 15 )); do
+    sleep 2
+    local vs_check
+    vs_check=$(tmux capture-pane -t "$pane_id" -p 2>/dev/null)
+    if echo "$vs_check" | grep -qi "working\|thinking\|Exploring\|Running\|reading\|searching\|editing\|writing" 2>/dev/null; then
+      log_debug "Verifier codex started working after $((submit_attempts + 1)) checks"
+      break
+    fi
+    if (( submit_attempts == 8 )); then
+      log_debug "Adaptive instruction retry: clearing line and re-typing"
+      tmux send-keys -t "$pane_id" C-u 2>/dev/null
+      sleep 0.1
+      paste_to_pane "$pane_id" "$verifier_instruction"
+      tmux send-keys -t "$pane_id" C-m
+    fi
+    tmux send-keys -t "$pane_id" C-m 2>/dev/null
+    sleep 0.3
+    tmux send-keys -t "$pane_id" C-m 2>/dev/null
+    (( submit_attempts++ ))
+  done
   return 0
 }
@@ -366,7 +456,7 @@ handle_worker_exit_codex() {
     local dc_us_id
     dc_us_id=$(jq -r '.us_id // "unknown"' "$DONE_CLAIM_FILE" 2>/dev/null)
     log "  Codex worker completed with done-claim (us_id=$dc_us_id). Auto-generating signal."
-    echo '{"iteration":'"$iter"',"status":"verify","us_id":"'"$dc_us_id"'","summary":"auto-generated after codex exec exit","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
+    echo '{"iteration":'"$iter"',"status":"verify","us_id":"'"$dc_us_id"'","summary":"auto-generated after codex exit","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
   else
     log "  WARNING: Codex worker exited without done-claim. Generating verify signal for current US."
     local current_us
@@ -374,7 +464,7 @@ handle_worker_exit_codex() {
     local mem_us
     mem_us=$(sed -n 's/.*Next.*US-\([0-9]*\).*/US-\1/p' "$DESK/memos/${SLUG}-memory.md" 2>/dev/null | head -1)
     [[ -n "$mem_us" ]] && current_us="$mem_us"
-    echo '{"iteration":'"$iter"',"status":"verify","us_id":"'"$current_us"'","summary":"auto-generated after codex exec exit (no done-claim)","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
+    echo '{"iteration":'"$iter"',"status":"verify","us_id":"'"$current_us"'","summary":"auto-generated after codex exit (no done-claim)","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' > "$signal_file"
   fi
   return 0
 }
@@ -1048,23 +1138,31 @@ write_worker_trigger() {
     elif [[ "$VERIFY_MODE" = "batch" ]]; then
       echo ""
       echo "---"
-      echo "## BATCH MODE OVERRIDE"
-      echo "Ignore any per-US signal instructions above. In batch mode:"
-      echo "- Implement ALL user stories in this iteration"
-      echo '- Signal verify with us_id="ALL" only when ALL stories are complete'
-      echo "- Do NOT signal verify after individual stories"
+      if [[ -n "$VERIFIED_US" ]]; then
+        echo "## BATCH MODE — CONTINUE FROM PARTIAL PROGRESS"
+        echo "The following US have already been verified: **$VERIFIED_US**"
+        echo "- Do NOT re-implement these — they are done."
+        echo "- Focus ONLY on the remaining unverified user stories."
+        echo '- Signal verify with us_id="ALL" when the remaining stories are complete.'
+      else
+        echo "## BATCH MODE OVERRIDE"
+        echo "Ignore any per-US signal instructions above. In batch mode:"
+        echo "- Implement ALL user stories in this iteration"
+        echo '- Signal verify with us_id="ALL" only when ALL stories are complete'
+        echo "- Do NOT signal verify after individual stories"
+      fi
     fi
   } | atomic_write "$prompt_file"
   # Write trigger script (DO NOT use exec -- breaks heartbeat cleanup)
   # Engine-specific launch command (expanded at write time)
   if [[ "$WORKER_ENGINE" = "codex" ]]; then
-    local engine_cmd="${CODEX_BIN:-codex} exec \\
+    local engine_cmd="${CODEX_BIN:-codex} \\
   -m $WORKER_CODEX_MODEL \\
   -c model_reasoning_effort=\"$WORKER_CODEX_REASONING\" \\
   --dangerously-bypass-approvals-and-sandbox \\
   \"\$(cat $prompt_file)\""
-    local engine_comment="# Run codex exec with fresh context (no pipe — codex requires terminal)"
+    local engine_comment="# Run codex with fresh context (fallback trigger — TUI primary launch via launch_worker_codex)"
   else
     local engine_cmd="$CLAUDE_BIN -p \"\$(cat $prompt_file)\" \\
   --model $WORKER_MODEL \\
@@ -1132,13 +1230,15 @@ write_verifier_trigger() {
     echo "- **Iteration**: $iter"
     echo "- **Done Claim**: $DONE_CLAIM_FILE"
     echo "- **Verify Mode**: $VERIFY_MODE"
-    if [[ "$VERIFY_MODE" = "per-us" && -n "$us_id" ]]; then
+    if [[ -n "$us_id" ]]; then
       if [[ "$us_id" = "ALL" ]]; then
-        echo "- **Scope**: FINAL FULL VERIFY — check ALL acceptance criteria from the PRD"
-        echo "- **Previously verified US**: $VERIFIED_US"
+        echo "- **Scope**: FULL VERIFY — check ALL acceptance criteria from the PRD"
       else
         echo "- **Scope**: Verify ONLY the acceptance criteria for **${us_id}**"
+      fi
+      if [[ -n "$VERIFIED_US" ]]; then
         echo "- **Previously verified US**: $VERIFIED_US"
+        echo "- **Note**: Skip re-verifying the above US. Focus on unverified stories."
       fi
     fi
   } | atomic_write "$prompt_file"
@@ -1557,9 +1657,9 @@ run_single_verifier() {
   # Launch verifier — dispatch to engine-specific function
   local verifier_launch
   if [[ "$engine" = "codex" ]]; then
-    verifier_launch="${CODEX_BIN:-codex} exec \"\$(cat $prompt_file)\" -m $VERIFIER_CODEX_MODEL -c model_reasoning_effort=\"$VERIFIER_CODEX_REASONING\" --dangerously-bypass-approvals-and-sandbox"
+    verifier_launch="${CODEX_BIN:-codex} -m $VERIFIER_CODEX_MODEL -c model_reasoning_effort=\"$VERIFIER_CODEX_REASONING\" --dangerously-bypass-approvals-and-sandbox"
     launch_verifier_codex "$VERIFIER_PANE" "$prompt_file" "$iter" "$verifier_launch"
-    log_debug "Verifier$suffix codex exec dispatched"
+    log_debug "Verifier$suffix codex TUI dispatched"
   else
     verifier_launch="$CLAUDE_BIN --model $model --dangerously-skip-permissions"
     if ! launch_verifier_claude "$VERIFIER_PANE" "$prompt_file" "$iter" "$verifier_launch"; then
@@ -1572,7 +1672,7 @@ run_single_verifier() {
   # Poll for verdict
   if [[ "$engine" = "codex" ]]; then
     # Codex exec: simple file poll (non-interactive, no heartbeat/nudge needed)
-    log "  Polling for verify-verdict.json ($suffix, codex exec)..."
+    log "  Polling for verify-verdict.json ($suffix, codex TUI)..."
     local codex_poll_start
     codex_poll_start=$(date +%s)
     while true; do
@@ -1916,7 +2016,7 @@ main() {
     --arg verifier_model "$VERIFIER_MODEL" \
     --argjson debug "$DEBUG" \
     --argjson with_sv "$WITH_SELF_VERIFICATION" \
-    --argjson consensus "$VERIFY_CONSENSUS" \
+    --argjson consensus "${VERIFY_CONSENSUS:-0}" \
     '{slug: $slug, project_root: $project_root, project_name: $project_name, campaign_status: $campaign_status, start_time: $start_time, end_time: $end_time, worker_model: $worker_model, verifier_model: $verifier_model, debug: $debug, with_self_verification: $with_sv, consensus: $consensus}' \
     > "$METADATA_FILE"
@@ -1960,7 +2060,7 @@ main() {
       log_debug "[OPTION] expected_flow=worker(all)->verify(ALL)->COMPLETE"
     fi
-    if [[ "$VERIFY_CONSENSUS" = "1" ]]; then
+    if [[ "${VERIFY_CONSENSUS:-0}" = "1" ]]; then
       log_debug "[OPTION] consensus_flow=each_verify_runs_claude+codex_both_must_pass"
     fi
   fi
@@ -2084,9 +2184,12 @@ main() {
     local worker_launch
     if [[ "$WORKER_ENGINE" = "codex" ]]; then
-      local worker_trigger="$LOGS_DIR/iter-$(printf '%03d' $ITERATION).worker-trigger.sh"
-      worker_launch="bash $worker_trigger"
-      launch_worker_codex "$WORKER_PANE" "$worker_trigger" "$ITERATION"
+      worker_launch="${CODEX_BIN:-codex} -m $WORKER_CODEX_MODEL -c model_reasoning_effort=\"$WORKER_CODEX_REASONING\" --dangerously-bypass-approvals-and-sandbox"
+      if ! launch_worker_codex "$WORKER_PANE" "$worker_prompt" "$ITERATION" "$worker_launch"; then
+        write_blocked_sentinel "Worker codex failed to start in pane"
+        update_status "blocked" "worker_start_failed"
+        return 1
+      fi
     else
       worker_launch="$CLAUDE_BIN --model $WORKER_MODEL --dangerously-skip-permissions"
       if ! launch_worker_claude "$WORKER_PANE" "$worker_prompt" "$ITERATION" "$worker_launch"; then
@@ -2326,8 +2429,8 @@ main() {
               _MODEL_UPGRADED=0
             fi
-            # --- Per-US tracking ---
-            if [[ "$VERIFY_MODE" = "per-us" && -n "$signal_us_id" && "$signal_us_id" != "ALL" ]]; then
+            # --- Verified US tracking (both per-us and batch modes) ---
+            if [[ -n "$signal_us_id" && "$signal_us_id" != "ALL" ]]; then
               # Add this US to verified list
               if [[ -n "$VERIFIED_US" ]]; then
                 VERIFIED_US="${VERIFIED_US},${signal_us_id}"
@@ -2351,6 +2454,32 @@ main() {
             ;;
           fail)
             # --- governance.md s7½: Fix Loop (adapted for tmux lean mode) ---
+            # Parse per_us_results from verdict to track partial progress (batch + per-us)
+            local _prev_verified="$VERIFIED_US"
+            if jq -e '.per_us_results' "$VERDICT_FILE" &>/dev/null; then
+              local _newly_passed
+              _newly_passed=$(jq -r '.per_us_results | to_entries[] | select(.value == "pass") | .key' "$VERDICT_FILE" 2>/dev/null)
+              for _pus in $(echo "$_newly_passed"); do
+                if ! echo ",$VERIFIED_US," | grep -q ",$_pus,"; then
+                  if [[ -n "$VERIFIED_US" ]]; then
+                    VERIFIED_US="${VERIFIED_US},${_pus}"
+                  else
+                    VERIFIED_US="$_pus"
+                  fi
+                  log "  Partial progress: $_pus passed (overall FAIL). Verified so far: $VERIFIED_US"
+                fi
+              done
+              log_debug "[FLOW] iter=$ITERATION partial_progress prev=$_prev_verified now=$VERIFIED_US"
+            fi
+            # Partial progress resets consecutive failures (progress was made)
+            if [[ "$VERIFIED_US" != "$_prev_verified" ]]; then
+              CONSECUTIVE_FAILURES=0
+              log "  Progress detected — consecutive_failures reset to 0"
+              log_debug "[GOV] iter=$ITERATION consecutive_failures_reset=partial_progress"
+            fi
             (( CONSECUTIVE_FAILURES++ ))
             record_us_failure "${signal_us_id:-unknown}"
             check_model_upgrade "${signal_us_id:-unknown}"
@@ -2369,6 +2498,13 @@ main() {
             {
               echo "# Fix Contract (from Verifier iteration $ITERATION)"
               echo ""
+              if [[ -n "$VERIFIED_US" ]]; then
+                echo "## Verified US (do NOT re-implement these)"
+                echo "$VERIFIED_US" | tr ',' '\n' | sed 's/^/- /'
+                echo ""
+                echo "**Focus ONLY on unverified user stories. The above are already verified.**"
+                echo ""
+              fi
               echo "## Summary"
               echo "$verdict_summary_fail"
               echo ""
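The partial-progress handling in the `fail)` branch has two parts: merging newly passed US IDs into the comma-separated `VERIFIED_US` list, and deciding what happens to the failure counter. In the hunk, the reset to 0 is followed by the unconditional `(( CONSECUTIVE_FAILURES++ ))`, so a failing iteration with progress ends at 1, while the plan describes a reset on progress and an increment otherwise. Which of the two is intended is not stated in the diff. The sketch below follows the plan's wording; `merge_verified` and `next_failure_count` are hypothetical helpers, not functions from the package.

```shell
# Sketch only: append US IDs that are not yet in a comma-separated list.
# merge_verified is a hypothetical helper; $1 is the current list,
# the remaining arguments are the IDs reported as "pass".
merge_verified() {
  mv_list="$1"
  shift
  for mv_us in "$@"; do
    case ",$mv_list," in
      *",$mv_us,"*)
        ;;  # already recorded, keep the list unchanged
      *)
        if [ -n "$mv_list" ]; then
          mv_list="$mv_list,$mv_us"
        else
          mv_list="$mv_us"
        fi
        ;;
    esac
  done
  echo "$mv_list"
}

# Sketch only: reset on progress, otherwise increment (the plan's wording).
next_failure_count() {
  nf_count="$1"
  nf_before="$2"
  nf_after="$3"
  if [ "$nf_before" != "$nf_after" ]; then
    echo 0
  else
    echo $(( nf_count + 1 ))
  fi
}
```

A caller would pass the keys selected from `per_us_results` as the extra arguments to `merge_verified`, then compare the list before and after to choose the new counter value.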