npm - okstra - Versions diffs - 0.41.0 → 0.43.0 - Mend

okstra 0.41.0 → 0.43.0

Files changed (17)

package/docs/kr/architecture.md +5 -4
package/docs/kr/cli.md +1 -1
package/package.json +1 -1
package/runtime/BUILD.json +2 -2
package/runtime/agents/SKILL.md +2 -1
package/runtime/agents/workers/codex-worker.md +1 -1
package/runtime/agents/workers/gemini-worker.md +1 -1
package/runtime/bin/okstra-codex-exec.sh +51 -46
package/runtime/bin/okstra-gemini-exec.sh +34 -33
package/runtime/bin/okstra-trace-cleanup.sh +111 -69
package/runtime/prompts/profiles/_common-contract.md +19 -8
package/runtime/prompts/profiles/_implementation-executor.md +1 -0
package/runtime/prompts/profiles/_implementation-verifier.md +13 -2
package/runtime/prompts/profiles/final-verification.md +1 -0
package/runtime/python/okstra_ctl/qa_commands.py +5 -1
package/runtime/skills/okstra-report-writer/SKILL.md +2 -0
package/runtime/templates/reports/settings.template.json +1 -1

package/docs/kr/architecture.md CHANGED

@@ -239,6 +239,7 @@ Store task identity, paths, and workflow state in per-process environment variables
 - The default worker roles in the standard workflow are `Claude worker`, `Codex worker`, and `Report writer worker`; `Gemini worker` is optional and is included only when named explicitly via `--workers` or in a profile.
 - Claude reads the task bundle, then assigns worker roles and makes the final judgment.
 - The okstra Claude assets installed in the user's home (`~/.claude/skills`, `~/.claude/agents`) steer Claude to try Agent Teams first and to use the sequential/background fallback only when a team cannot be formed.
+- **Team lifecycle**: in Phase 3 the lead creates the team with `TeamCreate(team_name: "okstra-<task-key>")` and dispatches the workers as its members. At run end (**after** Phase 7 token aggregation; automatic, no prompt) the lead sends each member in the team config a graceful shutdown via `SendMessage({type: "shutdown_request"})`, then disbands the team with `TeamDelete`. `TeamDelete` fails while any active member remains, so it is called only after shutdown is confirmed; it removes only `~/.claude/teams/<team>/` and `~/.claude/tasks/<team>/` and preserves the `~/.claude/projects/` jsonl files that are the token-aggregation source. Without teardown, worker teammates keep accumulating in the FleetView roster (see *Run-end team teardown* in `prompts/profiles/_common-contract.md`). In the no-`team_name` fallback there is no team, so this step is silently skipped.
 ## Claude prompt contract
@@ -909,10 +910,10 @@ Phase 7 step 1.5 takes a single final-report MD as input and determines the two views
 ### Live-log mirror (codex / gemini wrapper)
 - On every dispatch, `scripts/okstra-codex-exec.sh` and `scripts/okstra-gemini-exec.sh` create a `<prompt>.log` sidecar next to the prompt path and mirror stdout into it (`tee`, with the exit code preserved via `PIPESTATUS[0]`). stderr is appended to the same file (preserving the subagent stderr-capture contract), and the file is truncated on each dispatch. Because the calling subagent polls `BashOutput` only every 60s, this resolves the problem that users could not detect a stalled state during long-running runs (a large-codebase scan in analysis, cargo / pytest in implementation).
-- In a lead environment where `$TMUX` is set, the wrapper automatically splits a sibling pane and runs `tail -F <log-path>` in it. The trace pane title is `<cli>-<role>-<pid>-trace[from=<caller-pane-id>]` (e.g. `codex-worker-93421-trace[from=%5]`, `gemini-executor-93422-trace[from=%5]`); at the same moment the caller (worker) pane title is set to `<cli>-<role>-<pid>`. `<pid>` is the wrapper's own PID, so two or more workers of the same role spawned concurrently stay distinguishable, and the caller pane id embedded in the trace title keeps the trace ↔ worker mapping intact even when the worker pane title is overwritten externally (the Claude Code TUI keeps emitting its own pane title via the OSC 2 escape). The caller pane id comes from `$TMUX_PANE` first and falls back to `tmux display-message -p '#{pane_id}'` (active pane) when that is empty, so the caller pane is still found correctly when `$TMUX_PANE` has been stripped, as in the Claude Code Bash tool environment. The trace pane split is also anchored explicitly to the caller pane with `-t`. role is the wrapper's optional 5th positional argument and defaults to `worker` when omitted. A caller that wants a different label (e.g. `executor`) must pass it as the 5th argument. The caller pane title from just before wrapper entry is captured and restored in the EXIT trap, so no stale title is left between dispatches. Focus returns to the caller pane, and the pane is kept after the CLI exits so it can be scrolled back. Every failure path (`$TMUX` unset, split failure, an old tmux version, and so on) degrades silently.
-- **Automatic cleanup on Claude `/exit`**: the trace pane's `tail -F` is a child of the tmux shell and would survive Claude exiting; to prevent that, the wrapper appends each spawned pane id to a registry keyed by the caller's `$TMUX_PANE` (`${TMPDIR:-/tmp}/okstra-trace-panes/<caller-pane>.list`). `hooks.SessionEnd` in `templates/reports/settings.template.json` calls `$HOME/.okstra/bin/okstra-trace-cleanup.sh`, which reads only its own caller-pane registry and runs `tmux kill-pane`. Because the scope is per caller pane, multiple Claude instances in the same tmux session do not kill each other's trace panes. A missing tmux or a stale pane id degrades silently.
-- **Automatic cleanup on phase transition, including worker-agent panes**: `okstra-trace-cleanup.sh` closes not only the trace panes (registry) but also the worker-agent panes occupied by dispatched subagents (titles `claude-worker` / `codex-worker` / `gemini-worker` / `report-writer-worker`), identified by a title allowlist within the lead's session (`tmux list-panes -s`). The lead's own pane (`$TMUX_PANE`) is never killed, even if its title matches. Immediately before dispatching the workers of a new phase (just before the `PROGRESS: phase-5.5-convergence` / `phase-6-synthesis` markers), the lead calls this script with no arguments to clean up the previous phase's panes without prompting.
-- **User confirmation at phase end**: at the final end of the run (the last step), the lead prints the remaining okstra panes (worker-agent + trace) with `okstra-trace-cleanup.sh --list`, asks the user to choose between "close all" and "leave as is", and acts on the answer (the *Phase wrap-up* item in `prompts/profiles/_common-contract.md`). Where `$TMUX_PANE` is unset, the whole step is silently skipped. `--list` mode kills nothing and prints only `<pane_id>\t<pane_title>`, so the user can see what would be closed.
+- In a lead environment where tmux is reachable, the wrapper automatically splits a sibling pane and runs `tail -F <log-path>` in it. The trace pane title is the caller (worker) pane title with `-tail` appended, `<cli>-<role>-<pid>-tail` (e.g. `codex-worker-93421-tail`); at the same moment the caller (worker) pane title is set to `<cli>-<role>-<pid>`. `<pid>` is the wrapper's own PID, so two or more workers of the same role spawned concurrently stay distinguishable, and the operator can pair them visually as `<caller> ↔ <caller>-tail`. **Caller pane resolution**: the Claude Code Bash tool now removes both `$TMUX` and `$TMUX_PANE` from the environment, so the wrapper does not rely on env variables. It (1) derives `<RUN_DIR>` from the prompt path (= `dirname(dirname(prompt_path))`, paths.py SSOT) and (2) reads `<RUN_DIR>/state/lead-pane.id`, which the lead records once from its own foreground pane, as the split anchor (trustworthy even for background dispatch; unlike an active-pane guess, it stays safe when the user moves between panes). If the recorded file is missing or the pane is stale, it falls back to `tmux display-message -p '#{pane_id}'` (active pane). The trace pane split is anchored explicitly to that caller pane with `-t`. role is the wrapper's optional 5th positional argument and defaults to `worker` when omitted. The caller pane title is captured and restored in the EXIT trap, so no stale title is left between dispatches. Focus returns to the caller pane, and the pane is kept after the CLI exits so it can be scrolled back. Every failure path (tmux unreachable, split failure, an old tmux version, and so on) degrades silently.
+- **Cleanup via run-scoped tagging**: the trace pane's `tail -F` is a child of the tmux shell, so it survives Claude exiting. The wrapper tags each pane it spawns with `tmux set-option -p @okstra_trace_run=<RUN_DIR>`, and `okstra-trace-cleanup.sh` discovers panes server-wide by that tag in `tmux list-panes -a` and runs `tmux kill-pane` on them. This works without tmux env variables or a pane-id registry, and because the tag is run-scoped it does not kill the trace panes of other okstra runs executing concurrently. Cleanup has two entry shapes: the lead calls it with `--run-dir <RUN_DIR>` (cleans that run's trace + worker-agent panes), or `hooks.SessionEnd` in `templates/reports/settings.template.json` calls it with `--reap` (bulk-cleans trace panes whose tag lies under `$CLAUDE_PROJECT_DIR/.okstra/`; intended for exit time, when no single run-dir applies). A missing tmux or a stale pane id degrades silently.
+- **Automatic cleanup on phase transition, including worker-agent panes**: `okstra-trace-cleanup.sh --run-dir <RUN_DIR>` closes not only the tagged trace panes but also the worker-agent panes occupied by dispatched subagents (titles `claude-worker` / `codex-worker` / `gemini-worker` / `report-writer-worker`), identified by a title allowlist within the lead's session (`tmux list-panes -s -t <lead-pane>`); worker-agent panes are owned by the harness and cannot be tagged. Both the session scope and the exclusion of the lead's own pane are determined by `<RUN_DIR>/state/lead-pane.id`, and the lead's own pane is never killed, even if its title matches. Immediately before dispatching the workers of a new phase (just before the `PROGRESS: phase-5.5-convergence` / `phase-6-synthesis` markers), the lead calls this script with `--run-dir` to clean up the previous phase's panes without prompting.
+- **User confirmation at phase end**: at the final end of the run (the last step), the lead prints the remaining okstra panes (worker-agent + trace) with `okstra-trace-cleanup.sh --list --run-dir <RUN_DIR>`, asks the user to choose between "close all" and "leave as is", and acts on the answer (the *Phase wrap-up* item in `prompts/profiles/_common-contract.md`). Where `<RUN_DIR>/state/lead-pane.id` is empty (i.e. outside tmux), the whole step is silently skipped. `--list` mode kills nothing and prints only `<pane_id>\t<pane_title>`, so the user can see what would be closed.
 - For on-disk accumulation, the `okstra-logs` skill inventories usage read-only and proposes cleanup commands (the user runs them by copy-paste).
 ### Linked-worktree `.git/` write permission (codex / gemini)

package/docs/kr/cli.md CHANGED

@@ -591,4 +591,4 @@ chmod +x ~/.local/bin/okstra-ctl
 ### Live-log sidecar
-On every dispatch, the codex / gemini wrapper creates a `runs/<task-type>/prompts/<worker>-prompt-<phase>-<seq>.log` sidecar and mirrors stdout / stderr into it. When the lead is started inside tmux, the wrapper automatically splits off a `tail -F` pane (trace pane title: `<cli>-<role>-<pid>-trace`, caller (worker) pane title: `<cli>-<role>-<pid>`; the wrapper PID distinguishes concurrent dispatches of the same role). The split trace pane is registered in a registry keyed by the caller's `$TMUX_PANE`, and on Claude `/exit` the `SessionEnd` hook cleans it up automatically via `okstra-trace-cleanup.sh`. The same script also cleans up dispatched worker-agent panes (titles `claude-worker` / `codex-worker` / `gemini-worker` / `report-writer-worker`) within the lead's session (excluding the lead's own pane), and the lead calls it just before dispatching a new phase to clean up the previous phase's okstra panes automatically. The `okstra-logs` skill proposes the usage inventory and the `find … -delete` cleanup command read-only. For the detailed wiring, see the *Live-log mirror* section of [`docs/kr/architecture.md`](architecture.md).
+On every dispatch, the codex / gemini wrapper creates a `runs/<task-type>/prompts/<worker>-prompt-<phase>-<seq>.log` sidecar and mirrors stdout / stderr into it. When the lead is started inside tmux, the wrapper automatically splits off a `tail -F` pane (trace pane title: `<cli>-<role>-<pid>-tail`, caller (worker) pane title: `<cli>-<role>-<pid>`; the wrapper PID distinguishes concurrent dispatches of the same role). The split trace pane is tagged with the `@okstra_trace_run=<RUN_DIR>` pane user-option, and on Claude `/exit` the `SessionEnd` hook cleans it up automatically via `okstra-trace-cleanup.sh --reap` (scoped to `$CLAUDE_PROJECT_DIR/.okstra/`). When the lead calls the same script with `--run-dir <RUN_DIR>`, it cleans up that run's trace panes together with the dispatched worker-agent panes (titles `claude-worker` / `codex-worker` / `gemini-worker` / `report-writer-worker`) within the lead's session (excluding the lead's own pane), and the lead calls it just before dispatching a new phase to clean up the previous phase's okstra panes automatically. The `okstra-logs` skill proposes the usage inventory and the `find … -delete` cleanup command read-only. For the detailed wiring, see the *Live-log mirror* section of [`docs/kr/architecture.md`](architecture.md).
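
The tag-based discovery this hunk describes can be sketched as a filter over `tmux list-panes -a` output. This is a minimal illustration under stated assumptions, not the shipped `okstra-trace-cleanup.sh`: the function names `list_tagged_panes` and `panes_for_run` are hypothetical, and referencing the tag as `#{@okstra_trace_run}` in a format string assumes a tmux recent enough to support pane-level user options.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: select the panes tagged with one okstra run dir.

# Emit "<pane_id> <tag>" for every pane on the tmux server.
# Prints nothing when tmux is missing or no server is running.
list_tagged_panes() {
  tmux list-panes -a -F '#{pane_id} #{@okstra_trace_run}' 2>/dev/null || true
}

# Read "<pane_id> <tag>" lines on stdin and print the pane ids whose tag
# equals the given run dir exactly. Untagged panes have an empty tag and
# are skipped. The tag is the last field, so run dirs with spaces survive.
panes_for_run() {
  local run_dir="$1" pane_id tag
  while read -r pane_id tag; do
    if [ -n "$tag" ] && [ "$tag" = "$run_dir" ]; then
      printf '%s\n' "$pane_id"
    fi
  done
  return 0
}
```

A cleanup pass could then pipe `list_tagged_panes | panes_for_run "$run_dir"` into a loop that calls `tmux kill-pane -t` on each id, tolerating failures so that a stale id never blocks exit.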

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "okstra",
-  "version": "0.41.0",
+  "version": "0.43.0",
   "description": "Multi-agent cross-verification orchestrator runtime + Claude Code skills.",
   "license": "MIT",
   "author": "devonshin",

package/runtime/BUILD.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "package": "0.41.0",
-  "builtAt": "2026-06-02T15:52:15.071Z",
+  "package": "0.43.0",
+  "builtAt": "2026-06-04T04:59:06.499Z",
   "repoRoot": "/home/runner/work/okstra/okstra"
 }

package/runtime/agents/SKILL.md CHANGED

@@ -43,7 +43,7 @@ This SKILL.md is the operating contract and phase index. Detailed procedures liv
 | 5. Fallback | Sequential/background dispatch when Teams unavailable | `okstra-team-contract` |
 | 5.5 Convergence | Cross-verify findings across workers | `okstra-convergence` |
 | 6. Synthesis | Dispatch Report writer worker, review draft. **For `implementation-planning`: then run the Phase 6 plan-body verification sub-step (see Phase 6 section below).** | `okstra-report-writer` + `okstra-convergence` (sub-step) |
-| 7. Persist | Run token-usage collector, update manifests | `okstra-report-writer` |
+| 7. Persist | Run token-usage collector, update manifests, then disband the worker team (shutdown teammates + `TeamDelete`, after collection) | `okstra-report-writer` + `_common-contract.md` "Run-end team teardown" |
 ## Core operating contract
@@ -94,6 +94,7 @@ Required checkpoints:
 - `PROGRESS: phase-5.5-convergence round=<N> queue=<count>` — at the start of each convergence round (Phase 5.5).
 - `PROGRESS: phase-6-synthesis dispatching report-writer-worker` — at the start of Phase 6.
 - `PROGRESS: phase-7-persist updating manifests` — at the start of Phase 7.
+- `PROGRESS: phase-7-teardown disbanding team` — after token-usage collection, immediately before shutting down worker teammates + `TeamDelete` (Teams mode only; see `_common-contract.md` "Run-end team teardown"). Skipped in the no-`team_name` fallback.
 - `PROGRESS: complete final-report=<relative-path>` — final summary line, after all persistence.
 These lines are the only structured signal the user has during a long run. Do NOT replace them with prose ("Now I'm starting Phase 2..."), do NOT skip a checkpoint because "the previous message already said that", and do NOT batch multiple checkpoints into one. Each line stands alone so the user (or any operator scraping stdout) can timestamp it externally.
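
Because each `PROGRESS:` checkpoint stands alone on its own line, an operator can timestamp the checkpoints externally with a small filter. The sketch below is illustrative only; the function name `stamp_progress` is hypothetical and nothing like it is claimed to ship with okstra.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep only PROGRESS checkpoints from a run's stdout
# and prefix each one with a UTC timestamp.
stamp_progress() {
  local line
  while IFS= read -r line; do
    case "$line" in
      # Match lines that begin with the literal marker; drop everything else.
      PROGRESS:*) printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line" ;;
    esac
  done
}
```

Piping the lead's output through `stamp_progress` yields one timestamped line per checkpoint, which only works if checkpoints are never batched or replaced with prose.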

package/runtime/agents/workers/codex-worker.md CHANGED

@@ -30,7 +30,7 @@ You are a Codex worker agent. Your job is to execute the OpenAI Codex CLI and re
 $HOME/.okstra/bin/okstra-codex-exec.sh "<absolute-project-root>" "<assigned-model-execution-value>" "<absolute-prompt-history-path>" [<absolute-worktree-path>] [<role>]
 ```
-The fifth argument `<role>` is folded into both the caller (worker) pane title `codex-<role>-<pid>` and the sibling trace-pane title `codex-<role>-<pid>-trace` (`<pid>` = the wrapper's PID, present so concurrent dispatches of the same role can be told apart). Pass the literal string `worker` for every dispatch from this subagent. The wrapper defaults to `worker` when the argument is omitted, but pass it explicitly so the dispatch is self-describing.
+The fifth argument `<role>` is folded into both the caller (worker) pane title `codex-<role>-<pid>` and the sibling trace-pane title `codex-<role>-<pid>-tail` (`<pid>` = the wrapper's PID, present so concurrent dispatches of the same role can be told apart). Pass the literal string `worker` for every dispatch from this subagent. The wrapper defaults to `worker` when the argument is omitted, but pass it explicitly so the dispatch is self-describing.
 The fourth argument is **mandatory for implementation phase** and optional otherwise. It must be the literal `EXECUTOR_WORKTREE_PATH` recorded in the run context; the wrapper forwards it to codex as `--add-dir`, which grants the codex sandbox write access to the worktree (where all implementation-phase mutations occur). Without it, codex's `workspace-write` sandbox is anchored only at `<project-root>` and rejects every Edit/Write that targets the worktree (EPERM), which is the failure pattern that originally motivated this argument.
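
The title scheme that the fifth argument feeds into can be sketched as follows. The helper name `okstra_pane_titles` is hypothetical; the real wrapper builds the two labels inline from `$role` and its own PID (`$$`).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive the caller and trace pane titles.
# Arguments: <cli> [<role>] [<pid>]; role defaults to "worker" and pid to
# the current shell's PID, mirroring the wrapper's documented defaults.
okstra_pane_titles() {
  local cli="$1" role="${2:-worker}" pid="${3:-$$}"
  local pane_label="${cli}-${role}-${pid}"
  # First line: caller (worker) pane title. Second line: trace pane title.
  printf '%s\n%s\n' "$pane_label" "${pane_label}-tail"
}
```

For example, `okstra_pane_titles codex worker 93421` prints `codex-worker-93421` followed by `codex-worker-93421-tail`, the pairing an operator uses to match a trace pane to its caller.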

package/runtime/agents/workers/gemini-worker.md CHANGED

@@ -30,7 +30,7 @@ You are a Gemini worker agent. Your job is to execute the Google Gemini CLI and
 $HOME/.okstra/bin/okstra-gemini-exec.sh "<absolute-project-root>" "<assigned-model-execution-value>" "<absolute-prompt-history-path>" [<absolute-worktree-path>] [<role>]
 ```
-The fifth argument `<role>` is folded into both the caller (worker) pane title `gemini-<role>-<pid>` and the sibling trace-pane title `gemini-<role>-<pid>-trace` (`<pid>` = the wrapper's PID, present so concurrent dispatches of the same role can be told apart). Pass the literal string `worker` for every dispatch from this subagent. The wrapper defaults to `worker` when the argument is omitted, but pass it explicitly so the dispatch is self-describing.
+The fifth argument `<role>` is folded into both the caller (worker) pane title `gemini-<role>-<pid>` and the sibling trace-pane title `gemini-<role>-<pid>-tail` (`<pid>` = the wrapper's PID, present so concurrent dispatches of the same role can be told apart). Pass the literal string `worker` for every dispatch from this subagent. The wrapper defaults to `worker` when the argument is omitted, but pass it explicitly so the dispatch is self-describing.
 The fourth argument is **mandatory for implementation phase** and optional otherwise. It must be the literal `EXECUTOR_WORKTREE_PATH` recorded in the run context; the wrapper appends it to gemini's `--include-directories` list so the model can both read and operate on the worktree alongside project-root.

package/runtime/bin/okstra-codex-exec.sh CHANGED

@@ -187,28 +187,40 @@ python3 "$script_dir/okstra-wrapper-status.py" \
   init "$status_path" "$(basename "$0")" "$role" "$$" "$started_ts" "$log_path" \
   >>"$log_path" 2>&1 || true
-# Resolve caller pane id robustly. tmux normally exports both `$TMUX` and
-# `$TMUX_PANE` to processes started inside a pane, but Claude Code's Bash
-# tool can drop `$TMUX_PANE` while preserving `$TMUX` — which would
-# silently skip the caller-pane rename below AND let `tmux split-window`
-# attach the trace pane to whatever tmux currently considers active
-# (not necessarily Claude's pane). When the wrapper is launched from
-# Claude Code, the Claude session's pane IS the active pane at this
-# moment, so falling back to `display-message -p '#{pane_id}'` recovers
-# the correct id.
+# Derive the okstra run dir from the prompt path. paths.py is the SSOT:
+# dispatched prompts live at `<RUN_DIR>/prompts/<cli>-worker-prompt<NNN>.md`,
+# so the run dir is two levels up. Used to (a) read the lead pane the lead
+# recorded in its own foreground pane and (b) tag the trace pane so cleanup
+# can find exactly this run's panes without any tmux env var. Empty if the
+# derivation fails — every dependent step below then degrades to a no-op.
+run_dir="$(cd "$(dirname "$prompt_path")/.." 2>/dev/null && pwd -P || true)"
+lead_pane_file="${run_dir:+$run_dir/state/lead-pane.id}"
+# Resolve the pane to anchor the trace split to. Claude Code's Bash tool now
+# strips BOTH `$TMUX` and `$TMUX_PANE`, and this wrapper frequently runs
+# backgrounded — so the bare active-pane probe can land on whatever pane the
+# user happens to be looking at now, not Claude's. Prefer the lead pane the
+# lead captured ONCE in its own foreground pane (reliable, see
+# `_common-contract.md`); fall back to `$TMUX_PANE`, then the active-pane
+# probe. A stale recorded id (pane since closed) is rejected via a liveness
+# check so we never anchor the split to a dead pane.
 caller_pane="${TMUX_PANE:-}"
-if [[ -z "$caller_pane" && -n "${TMUX:-}" ]]; then
+if [[ -z "$caller_pane" && -n "$lead_pane_file" && -r "$lead_pane_file" ]]; then
+  cand="$(head -n1 "$lead_pane_file" 2>/dev/null || true)"
+  if [[ -n "$cand" ]] && tmux display-message -p -t "$cand" '#{pane_id}' >/dev/null 2>&1; then
+    caller_pane="$cand"
+  fi
+fi
+if [[ -z "$caller_pane" ]]; then
   caller_pane=$(tmux display-message -p '#{pane_id}' 2>/dev/null || true)
 fi
 # Pane titles: worker (caller) pane gets `codex-<role>-<pid>`; the sibling
-# trace pane appends `-trace[from=<caller-pane-id>]`. The wrapper PID
-# disambiguates concurrent dispatches of the same role; the embedded
-# caller pane id keeps the trace ↔ worker mapping visible even if the
-# worker pane's title is later overwritten by the parent process (e.g.
-# Claude Code's TUI emitting OSC 2 escape sequences on its own pane).
+# trace pane is that same caller title with a `-tail` suffix, so the
+# operator can visually pair `<caller> ↔ <caller>-tail`. The wrapper PID in
+# the caller title disambiguates concurrent dispatches of the same role.
 pane_label="codex-${role}-$$"
-trace_label="${pane_label}-trace[from=${caller_pane:-?}]"
+trace_label="${pane_label}-tail"
 # Capture the caller pane's current title so the EXIT trap can restore it
 # once the wrapper returns. Empty when not in tmux or capture fails — the
@@ -243,42 +255,35 @@ fi
 # for the wrapper to exit. This fires in every phase the wrapper is invoked
 # from (analysis, error-analysis, implementation-planning, implementation,
 # …) — long-running codex dispatches are not implementation-specific. The
-# new pane carries the title `codex-<role>-<pid>-trace[from=<caller-pane>]`
-# so the operator can map trace ↔ worker by pane id even when the worker
-# pane title is later overwritten by Claude Code. The split is explicitly
-# anchored to the caller pane (`-t "$caller_pane"`) to avoid attaching to
-# tmux's idle active pane when `$TMUX_PANE` was missing. `role` is the
-# optional 5th positional arg (defaults to `worker`); callers that
-# dispatch a different role (e.g. `executor`) must pass it explicitly.
-# The `<pid>` suffix is the wrapper's PID and disambiguates concurrent
-# dispatches of the same role. The pane uses `tail -F` (follow-by-name)
-# so it survives any truncation a re-dispatch performs on the same log
-# path. Failures are tolerated silently: missing $TMUX, a tmux that
-# refuses to split (size constraints, locked client), or a stale socket
-# all degrade to "log file is still on disk; the operator can tail it
-# manually from any terminal." The wrapper does NOT switch focus to the
+# new pane carries the title `codex-<role>-<pid>-tail` so the operator can
+# pair it with its caller pane (`codex-<role>-<pid>`). The split is
+# explicitly anchored to the caller pane (`-t "$caller_pane"`) to avoid
+# attaching to tmux's idle active pane. `role` is the optional 5th
+# positional arg (defaults to `worker`); callers that dispatch a different
+# role (e.g. `executor`) must pass it explicitly. The `<pid>` suffix is the
+# wrapper's PID and disambiguates concurrent dispatches of the same role.
+# The pane uses `tail -F` (follow-by-name) so it survives any truncation a
+# re-dispatch performs on the same log path. We gate on a resolved
+# `$caller_pane` (non-empty only when tmux is reachable) rather than the
+# now-stripped `$TMUX`. Failures are tolerated silently: no tmux, a tmux
+# that refuses to split (size constraints, locked client), or a stale
+# socket all degrade to "log file is still on disk; the operator can tail
+# it manually from any terminal." The wrapper does NOT switch focus to the
 # new pane — control returns to the caller's pane via `tmux last-pane`.
-if [[ -n "${TMUX:-}" ]]; then
-  split_args=(-h -P -F '#{pane_id}' -c "$(dirname "$log_path")")
-  if [[ -n "$caller_pane" ]]; then
-    split_args+=(-t "$caller_pane")
-  fi
+if [[ -n "$caller_pane" ]]; then
+  split_args=(-h -P -F '#{pane_id}' -c "$(dirname "$log_path")" -t "$caller_pane")
   trace_pane=$(tmux split-window "${split_args[@]}" \
     "tail -F $(printf '%q' "$log_path")" 2>/dev/null || true)
   if [[ -n "$trace_pane" ]]; then
     tmux select-pane -t "$trace_pane" -T "$trace_label" 2>/dev/null || true
+    # Tag the spawned pane with THIS run's dir so `okstra-trace-cleanup.sh
+    # --run-dir <RUN_DIR>` (see that script + `_common-contract.md`) can find
+    # and close exactly this run's trace panes — discovered server-wide by
+    # tag, needing no tmux env var, no pane-id registry, and no active-pane
+    # assumption. The run-scoped tag also stops concurrent okstra runs from
+    # stomping each other's trace panes.
+    [[ -n "$run_dir" ]] && tmux set-option -p -t "$trace_pane" @okstra_trace_run "$run_dir" 2>/dev/null || true
     tmux last-pane 2>/dev/null || true
-    # Register the spawned pane so the `SessionEnd` hook (see
-    # `okstra-trace-cleanup.sh`) can kill it when the caller's Claude
-    # session exits. Scope by `$caller_pane` — the pane Claude itself is
-    # attached to — so concurrent Claude instances in the same tmux
-    # session do not stomp each other's trace panes.
-    if [[ -n "$caller_pane" ]]; then
-      registry_dir="${TMPDIR:-/tmp}/okstra-trace-panes"
-      mkdir -p "$registry_dir" 2>/dev/null || true
-      safe_pane="${caller_pane//[^A-Za-z0-9]/_}"
-      printf '%s\n' "$trace_pane" >> "$registry_dir/${safe_pane}.list" 2>/dev/null || true
-    fi
   fi
 fi
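
The run-dir derivation used in this file can be exercised in isolation. The sketch below wraps the same `cd … && pwd -P` idiom in a hypothetical function, `derive_run_dir`, assuming (as the diff's comment states) that prompts live at `<RUN_DIR>/prompts/<file>`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve <RUN_DIR> from a dispatched prompt path.
derive_run_dir() {
  local prompt_path="$1"
  # dirname(prompt_path) is <RUN_DIR>/prompts, so its parent is <RUN_DIR>.
  # `pwd -P` prints the physical path with symlinks resolved. When the
  # directory does not exist, nothing is printed and the call still succeeds,
  # so dependent steps can treat an empty result as "skip".
  (cd "$(dirname "$prompt_path")/.." 2>/dev/null && pwd -P) || true
}
```

Returning an empty string instead of failing matches the wrapper's silent-degrade model: a caller can guard each dependent step with a non-empty check on the result.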

package/runtime/bin/okstra-gemini-exec.sh CHANGED

@@ -136,24 +136,32 @@ python3 "$script_dir/okstra-wrapper-status.py" \
   init "$status_path" "$(basename "$0")" "$role" "$$" "$started_ts" "$log_path" \
   >>"$log_path" 2>&1 || true
-# Resolve caller pane id robustly. See `okstra-codex-exec.sh` for the full
-# rationale — kept in lock-step: tmux normally exports both `$TMUX` and
-# `$TMUX_PANE`, but Claude Code's Bash tool can drop `$TMUX_PANE` while
-# preserving `$TMUX`, which silently skips the caller-pane rename and
-# lets `tmux split-window` attach to whatever tmux considers active.
+# Resolve the run dir and the trace-split anchor pane. See
+# `okstra-codex-exec.sh` for the full rationale — kept in lock-step: derive
+# `<RUN_DIR>` from the prompt path (paths.py SSOT) to read the lead-recorded
+# pane and to tag the trace pane; prefer that lead pane over the unreliable
+# active-pane probe (this wrapper runs backgrounded and `$TMUX`/`$TMUX_PANE`
+# are stripped).
+run_dir="$(cd "$(dirname "$prompt_path")/.." 2>/dev/null && pwd -P || true)"
+lead_pane_file="${run_dir:+$run_dir/state/lead-pane.id}"
 caller_pane="${TMUX_PANE:-}"
-if [[ -z "$caller_pane" && -n "${TMUX:-}" ]]; then
+if [[ -z "$caller_pane" && -n "$lead_pane_file" && -r "$lead_pane_file" ]]; then
+  cand="$(head -n1 "$lead_pane_file" 2>/dev/null || true)"
+  if [[ -n "$cand" ]] && tmux display-message -p -t "$cand" '#{pane_id}' >/dev/null 2>&1; then
+    caller_pane="$cand"
+  fi
+fi
+if [[ -z "$caller_pane" ]]; then
   caller_pane=$(tmux display-message -p '#{pane_id}' 2>/dev/null || true)
 fi
 # Pane titles: worker (caller) pane gets `gemini-<role>-<pid>`; the sibling
-# trace pane appends `-trace[from=<caller-pane-id>]`. The wrapper PID
-# disambiguates concurrent dispatches of the same role; the embedded
-# caller pane id keeps the trace ↔ worker mapping visible even if the
-# worker pane's title is later overwritten by the parent process (e.g.
-# Claude Code's TUI emitting OSC 2 escape sequences on its own pane).
+# trace pane is that same caller title with a `-tail` suffix, so the
+# operator can visually pair `<caller> ↔ <caller>-tail`. The wrapper PID in
+# the caller title disambiguates concurrent dispatches of the same role.
 pane_label="gemini-${role}-$$"
-trace_label="${pane_label}-trace[from=${caller_pane:-?}]"
+trace_label="${pane_label}-tail"
 # Capture the caller pane's current title so the EXIT trap can restore it
 # once the wrapper returns. Empty when not in tmux or capture fails — the
@@ -186,33 +194,26 @@ fi
 # When a tmux session is reachable, split a sibling pane tailing the log so
 # the operator can watch progress live. This fires in every phase the
 # wrapper is invoked from — long-running gemini dispatches are not
-# implementation-specific. Title `gemini-<role>-<pid>-trace[from=<caller-pane>]`
-# so the operator can map trace ↔ worker by pane id even when the worker
-# pane title is later overwritten by Claude Code. The split is explicitly
-# anchored to the caller pane to avoid attaching to tmux's idle active
-# pane when `$TMUX_PANE` was missing. `role` is the optional 5th
-# positional arg (defaults to `worker`); callers that dispatch a
-# different role must pass it explicitly. The `<pid>` suffix is the
-# wrapper's PID and disambiguates concurrent dispatches of the same role.
-# See the codex wrapper for the full design rationale and the
+# implementation-specific. Title `gemini-<role>-<pid>-tail` so the operator
+# can pair it with its caller pane (`gemini-<role>-<pid>`). The split is
+# explicitly anchored to the caller pane to avoid attaching to tmux's idle
+# active pane. `role` is the optional 5th positional arg (defaults to
+# `worker`); callers that dispatch a different role must pass it explicitly.
+# The `<pid>` suffix is the wrapper's PID and disambiguates concurrent
+# dispatches of the same role. We gate on a resolved `$caller_pane`
+# (non-empty only when tmux is reachable) rather than the now-stripped
+# `$TMUX`. See the codex wrapper for the full design rationale and the
 # silent-degrade failure model.
-if [[ -n "${TMUX:-}" ]]; then
-  split_args=(-h -P -F '#{pane_id}' -c "$(dirname "$log_path")")
-  if [[ -n "$caller_pane" ]]; then
-    split_args+=(-t "$caller_pane")
-  fi
+if [[ -n "$caller_pane" ]]; then
+  split_args=(-h -P -F '#{pane_id}' -c "$(dirname "$log_path")" -t "$caller_pane")
   trace_pane=$(tmux split-window "${split_args[@]}" \
     "tail -F $(printf '%q' "$log_path")" 2>/dev/null || true)
   if [[ -n "$trace_pane" ]]; then
     tmux select-pane -t "$trace_pane" -T "$trace_label" 2>/dev/null || true
+    # Tag with this run's dir for `okstra-trace-cleanup.sh --run-dir`. See
+    # `okstra-codex-exec.sh` for the rationale — kept in lock-step.
+    [[ -n "$run_dir" ]] && tmux set-option -p -t "$trace_pane" @okstra_trace_run "$run_dir" 2>/dev/null || true
     tmux last-pane 2>/dev/null || true
-    # See `okstra-codex-exec.sh` for the registry rationale — kept in lock-step.
-    if [[ -n "$caller_pane" ]]; then
-      registry_dir="${TMPDIR:-/tmp}/okstra-trace-panes"
-      mkdir -p "$registry_dir" 2>/dev/null || true
-      safe_pane="${caller_pane//[^A-Za-z0-9]/_}"
-      printf '%s\n' "$trace_pane" >> "$registry_dir/${safe_pane}.list" 2>/dev/null || true
-    fi
   fi
 fi
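
The anchor-pane precedence shared by both wrappers can be sketched as a standalone function. This is an illustration, not the shipped code: `pane_is_alive` and `resolve_caller_pane` are hypothetical names. It follows the order the code in this diff applies (`$TMUX_PANE` if still set, then the recorded lead pane if it passes a liveness check, then the active-pane probe), which differs slightly from the order the accompanying comment in the codex wrapper describes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the anchor-pane precedence; not the shipped wrapper.

# Succeeds when tmux can still resolve the given pane id.
pane_is_alive() {
  tmux display-message -p -t "$1" '#{pane_id}' >/dev/null 2>&1
}

# Print the pane to anchor the trace split to, or nothing when none is found.
# Argument: the run dir (may be empty when derivation failed).
resolve_caller_pane() {
  local run_dir="$1" pane="${TMUX_PANE:-}" cand=""
  local lead_pane_file="${run_dir:+$run_dir/state/lead-pane.id}"
  # Step 2: the pane id the lead recorded once in its foreground pane.
  if [ -z "$pane" ] && [ -n "$lead_pane_file" ] && [ -r "$lead_pane_file" ]; then
    cand="$(head -n1 "$lead_pane_file" 2>/dev/null || true)"
    if [ -n "$cand" ] && pane_is_alive "$cand"; then
      pane="$cand"
    fi
  fi
  # Step 3: last resort, whatever pane tmux currently considers active.
  if [ -z "$pane" ]; then
    pane="$(tmux display-message -p '#{pane_id}' 2>/dev/null || true)"
  fi
  printf '%s' "$pane"
}
```

An empty result means tmux is unreachable, and the wrapper's `if [[ -n "$caller_pane" ]]` gate then skips the split entirely.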

package/runtime/bin/okstra-trace-cleanup.sh CHANGED

@@ -1,93 +1,136 @@
 #!/usr/bin/env bash
 #
-# okstra-trace-cleanup.sh — manage tmux panes created during an okstra run for
-# the current Claude Code (lead) session.
+# okstra-trace-cleanup.sh — close tmux panes created during okstra runs.
 #
-# Two pane sources are cleaned for the lead's session:
-#   1. Trace panes — `tail -F` siblings spawned by the codex/gemini wrappers
-#      (`okstra-codex-exec.sh`, `okstra-gemini-exec.sh`). Tracked in a registry
-#      keyed by the lead's `$TMUX_PANE`.
-#   2. Worker-agent panes — panes the harness gives to dispatched okstra
-#      subagents (`claude-worker`, `codex-worker`, `gemini-worker`,
-#      `report-writer-worker`). Not registered anywhere by okstra; identified by
-#      a title allowlist within the lead's tmux session.
+# Trace panes are `tail -F` siblings spawned by the codex/gemini wrappers
+# (`okstra-codex-exec.sh`, `okstra-gemini-exec.sh`). Each wrapper tags the pane
+# it spawns with a pane-level user option `@okstra_trace_run=<RUN_DIR>`, so the
+# panes are found server-wide by tag — no tmux env var and no pane-id registry
+# are needed, and the run-scoped tag keeps concurrent okstra runs from closing
+# each other's panes.
 #
-# The lead's own pane (`$TMUX_PANE`) is NEVER killed, even if its title matches
-# the allowlist. The scan is scoped to the lead's session (`list-panes -s`),
-# never the whole server (`-a`).
+# Two invocation shapes:
 #
-# Two modes:
-#   (default)  kill every okstra pane (sources 1+2) and remove the registry
-#              file. Used by the `SessionEnd` hook, by the lead's per-phase
-#              auto-clean, and by the lead's end-of-run prompt (yes-branch).
-#   --list     print one line per okstra pane (`<pane_id>\t<pane_title>`) so the
-#              lead can show the user what would be closed. Empty stdout when
-#              nothing is tracked.
+#   --run-dir <RUN_DIR>   Used by the LEAD between phases and at wrap-up. Closes
+#                         (a) trace panes tagged with this run's dir and
+#                         (b) worker-agent panes the harness gives to dispatched
+#                         subagents (`claude-worker` / `codex-worker` /
+#                         `gemini-worker` / `report-writer-worker`), identified
+#                         by a title allowlist scoped to the LEAD's session. The
+#                         lead pane is read from `<RUN_DIR>/state/lead-pane.id`
+#                         (recorded once by the lead in its own foreground pane —
+#                         reliable even though Claude Code's Bash tool strips
+#                         `$TMUX`/`$TMUX_PANE`); it scopes the title scan and is
+#                         NEVER killed.
 #
-# Failures are tolerated silently — a stale pane id, missing $TMUX, or a locked
-# tmux client must never prevent Claude from exiting cleanly.
+#   --reap                Used by the `SessionEnd` hook, where no single run-dir
+#                         applies. Closes every trace pane whose tag points under
+#                         `$CLAUDE_PROJECT_DIR/.okstra/` (or every tagged trace
+#                         pane if that env var is unset). Harness-owned
+#                         worker-agent panes are left to the harness.
+#
+# `--list` (alias `--dry-run`) prints `<pane_id>\t<pane_title>` per pane instead
+# of killing — only meaningful with `--run-dir`.
+#
+# Failures are tolerated silently — a stale pane id, no tmux, or a locked tmux
+# client must never prevent Claude from exiting cleanly.
 set -u
-MODE="kill"
-case "${1:-}" in
-  "")        MODE="kill" ;;
-  --list)    MODE="list" ;;
-  --dry-run) MODE="list" ;;  # alias
-  -h|--help)
-    cat <<'USAGE'
-usage: okstra-trace-cleanup.sh [--list]
+MODE="kill"   # kill | list
+REAP=0
+run_dir=""
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --list|--dry-run) MODE="list" ;;
+    --reap)           REAP=1 ;;
+    --run-dir)        shift; run_dir="${1-}" ;;
+    --run-dir=*)      run_dir="${1#--run-dir=}" ;;
+    -h|--help)
+      cat <<'USAGE'
+usage: okstra-trace-cleanup.sh (--run-dir <RUN_DIR> [--list] | --reap)
-  (no args)   kill every okstra pane for $TMUX_PANE (trace + worker-agent);
-              remove the trace registry file.
-  --list      print "<pane_id>\t<pane_title>" per okstra pane; no kill.
+  --run-dir   okstra run directory; closes that run's trace + worker-agent panes.
+  --list      with --run-dir: print "<pane_id>\t<pane_title>" per pane; no kill.
   --dry-run   alias for --list.
+  --reap      close every okstra trace pane under $CLAUDE_PROJECT_DIR/.okstra
+              (SessionEnd hook; no single run-dir applies).
 USAGE
-    exit 0 ;;
-  *)
-    printf 'okstra-trace-cleanup.sh: unknown option: %s\n' "$1" >&2
-    exit 2 ;;
-esac
+      exit 0 ;;
+    *)
+      printf 'okstra-trace-cleanup.sh: unknown option: %s\n' "$1" >&2
+      exit 2 ;;
+  esac
+  shift
+done
-# No tmux pane context → nothing to clean / list.
-if [[ -z "${TMUX_PANE:-}" ]]; then
-  exit 0
+if [[ "$REAP" -eq 0 && -z "$run_dir" ]]; then
+  printf 'okstra-trace-cleanup.sh: --run-dir <RUN_DIR> (or --reap) is required\n' >&2
+  exit 2
 fi
-registry_dir="${TMPDIR:-/tmp}/okstra-trace-panes"
-safe_pane="${TMUX_PANE//[^A-Za-z0-9]/_}"
-registry_file="$registry_dir/${safe_pane}.list"
+# Canonicalize paths used in tag string-compares. The wrappers tag panes with
+# `pwd -P` (symlink-resolved), so the scope paths must be resolved the same way
+# — else a symlinked component (e.g. macOS /tmp -> /private/tmp) makes the
+# compare miss. Fall back to the literal value if the dir does not resolve.
+_resolve() { (cd "$1" 2>/dev/null && pwd -P) || printf '%s' "$1"; }
+[[ -n "$run_dir" ]] && run_dir="$(_resolve "$run_dir")"
+project_dir=""
+[[ -n "${CLAUDE_PROJECT_DIR:-}" ]] && project_dir="$(_resolve "$CLAUDE_PROJECT_DIR")"
+# Lead pane. For a run, prefer the value the lead recorded in its own foreground
+# pane; fall back to the active-pane probe. Rejected if the recorded pane is
+# gone. For --reap there is no run state — probe the active pane, used only to
+# avoid killing whatever pane the reap runs from.
+lead_pane=""
+if [[ "$REAP" -eq 0 ]]; then
+  lead_pane_file="$run_dir/state/lead-pane.id"
+  [[ -r "$lead_pane_file" ]] && lead_pane="$(head -n1 "$lead_pane_file" 2>/dev/null || true)"
+fi
+if [[ -z "$lead_pane" ]] || ! tmux display-message -p -t "$lead_pane" '#{pane_id}' >/dev/null 2>&1; then
+  lead_pane="$(tmux display-message -p '#{pane_id}' 2>/dev/null || true)"
+fi
+# Does a trace pane's tag belong to the set we are closing?
+_tag_in_scope() {
+  local tag="$1"
+  if [[ "$REAP" -eq 1 ]]; then
+    [[ -z "$tag" ]] && return 1
+    [[ -n "$project_dir" ]] && { [[ "$tag" == "$project_dir/"* ]]; return; }
+    return 0   # no project scope available → reap every tagged trace pane
+  fi
+  [[ "$tag" == "$run_dir" ]]
+}
-# Collect okstra pane ids for the lead session: registered trace panes ∪
-# title-allowlisted worker-agent panes, always excluding the lead pane itself.
 collect_okstra_panes() {
   local -a panes=()
-  local pid title
+  local pid tag title
+  # (1) Trace panes tagged in scope — found server-wide by tag, so no tmux env
+  # var or pane-id registry is needed.
+  while IFS=$'\t' read -r pid tag; do
+    [[ -n "$pid" ]] || continue
+    [[ "$pid" == "$lead_pane" ]] && continue
+    _tag_in_scope "$tag" && panes+=("$pid")
+  done < <(tmux list-panes -a -F '#{pane_id}'$'\t''#{@okstra_trace_run}' 2>/dev/null || true)
-  # (1) Registered trace panes — scoped to THIS lead's registry only, so
-  # concurrent Claude instances do not stomp each other's trace panes.
-  if [[ -f "$registry_file" ]]; then
-    while IFS= read -r pid; do
+  # (2) Title-allowlisted worker-agent panes in the lead's session. Only for a
+  # run (reap leaves these harness-owned panes to the harness). `list-panes -s
+  # -t <pane>` resolves the session containing that pane, so the scan never
+  # reaches other sessions (no `-a`). Skipped when the lead pane is unknown.
+  if [[ "$REAP" -eq 0 && -n "$lead_pane" ]]; then
+    while IFS=$'\t' read -r pid title; do
       [[ -n "$pid" ]] || continue
-      [[ "$pid" == "$TMUX_PANE" ]] && continue
-      panes+=("$pid")
-    done < "$registry_file"
+      [[ "$pid" == "$lead_pane" ]] && continue
+      case "$title" in
+        *claude-worker*|*codex-worker*|*gemini-worker*|*report-writer-worker*)
+          panes+=("$pid") ;;
+      esac
+    done < <(tmux list-panes -s -t "$lead_pane" \
+               -F '#{pane_id}'$'\t''#{pane_title}' 2>/dev/null || true)
   fi
-  # (2) Title-allowlisted worker-agent panes in the lead's session.
-  # `list-panes -s -t <pane>` resolves the session containing that pane, so the
-  # scan never reaches other sessions (no `-a`).
-  while IFS=$'\t' read -r pid title; do
-    [[ -n "$pid" ]] || continue
-    [[ "$pid" == "$TMUX_PANE" ]] && continue
-    case "$title" in
-      *claude-worker*|*codex-worker*|*gemini-worker*|*report-writer-worker*)
-        panes+=("$pid") ;;
-    esac
-  done < <(tmux list-panes -s -t "$TMUX_PANE" \
-             -F '#{pane_id}'$'\t''#{pane_title}' 2>/dev/null || true)
-  # Dedupe — a live trace pane can match both the registry and the title scan.
+  # Dedupe — a live trace pane can match both the tag scan and the title scan.
   if (( ${#panes[@]} )); then
     printf '%s\n' "${panes[@]}" | awk 'NF && !seen[$0]++'
   fi
@@ -109,5 +152,4 @@ while IFS= read -r pane_id; do
   tmux kill-pane -t "$pane_id" 2>/dev/null || true
 done < <(collect_okstra_panes)
-rm -f "$registry_file" 2>/dev/null || true
 exit 0
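
The scoping decision made by `_tag_in_scope` in the script above can be restated outside the shell. This is a minimal Python sketch that mirrors the function body as shown in the hunk, assuming the tag and directory strings are already symlink-resolved; the name `tag_in_scope` and its parameters are illustrative and are not part of okstra.

```python
from typing import Optional


def tag_in_scope(tag: str, *, reap: bool, run_dir: str = "",
                 project_dir: Optional[str] = None) -> bool:
    """Illustrative mirror of the shell `_tag_in_scope` decision."""
    if reap:
        if not tag:
            # A pane without the @okstra_trace_run tag is never reaped.
            return False
        if project_dir:
            # Prefix match with a trailing slash, like `"$project_dir/"*`.
            return tag.startswith(project_dir + "/")
        # No project scope available: every tagged trace pane is in scope.
        return True
    # --run-dir mode: the tag must equal this run's directory exactly.
    # (The script exits earlier if neither --run-dir nor --reap is given,
    # so run_dir is expected to be non-empty on this path.)
    return tag == run_dir
```

The exact-match branch is what keeps two concurrent runs from closing each other's trace panes, while the prefix branch lets the `SessionEnd` hook sweep every run under one project.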

package/runtime/prompts/profiles/_common-contract.md CHANGED

@@ -29,22 +29,33 @@ profile document.
   - This rule does NOT relax any phase-specific Forbidden actions list; safety rules in the per-profile document remain in force regardless of the user's authority.
 - Anti-escalation rule (shared):
   - treating "다음 단계 진행해" or equivalent user phrases as authorisation to start a *different* lifecycle phase is forbidden. The next phase begins only in a separate okstra run launched with the new `--task-type`. Per-profile documents may further restrict this within their own scope.
+- Run-start pane recording (shared — runs ONCE at run start, before the FIRST worker dispatch):
+  - The wrappers anchor each trace pane to the lead's pane and the cleanup scopes the worker-agent scan to it, but Claude Code's Bash tool strips `$TMUX`/`$TMUX_PANE`, so the lead MUST record its own pane explicitly. Because the lead runs this in its OWN foreground pane, the active pane IS the lead's — reliable, unlike a backgrounded wrapper's later probe.
+  - The lead MUST run once, at run start: `mkdir -p "<RUN_DIR>/state" && tmux display-message -p '#{pane_id}' > "<RUN_DIR>/state/lead-pane.id" 2>/dev/null || true` (substitute the run's absolute `RUN_DIR`). Outside tmux this writes nothing and every pane step below silently no-ops — that empty/absent file is the single signal that the lead is not in tmux.
 - Phase-start pane reset (shared — runs BEFORE dispatching each new worker batch):
-  - okstra creates two kinds of tmux pane per run: (a) **worker-agent panes** the harness gives to dispatched subagents (titled `claude-worker` / `codex-worker` / `gemini-worker` / `report-writer-worker`), and (b) **trace panes** the codex/gemini wrappers spawn (`<cli>-<role>-<pid>-trace`). Both accumulate across internal phases because each new phase dispatches a fresh worker batch and the prior panes are never reclaimed.
-  - When `$TMUX_PANE` is set, the lead MUST run `$HOME/.okstra/bin/okstra-trace-cleanup.sh` (no args) **immediately before** dispatching the next phase's workers — i.e. just before emitting each `PROGRESS: phase-5.5-convergence round=<N>` marker and just before `PROGRESS: phase-6-synthesis dispatching report-writer-worker`. This closes every prior-phase okstra pane (worker-agent + trace) for the lead session, while NEVER killing the lead's own pane.
-  - This is **automatic and silent** — NO user prompt. Report it in one short line (e.g. `이전 phase okstra pane 3개 정리`) and proceed. It is silent-skipped when `$TMUX_PANE` is unset; the lead MUST NOT fabricate a synthetic pane list in that case.
+  - okstra creates two kinds of tmux pane per run: (a) **worker-agent panes** the harness gives to dispatched subagents (titled `claude-worker` / `codex-worker` / `gemini-worker` / `report-writer-worker`), and (b) **trace panes** the codex/gemini wrappers spawn (`<cli>-<role>-<pid>-tail`). Both accumulate across internal phases because each new phase dispatches a fresh worker batch and the prior panes are never reclaimed.
+  - When `<RUN_DIR>/state/lead-pane.id` is non-empty (the lead is in tmux), the lead MUST run `$HOME/.okstra/bin/okstra-trace-cleanup.sh --run-dir "<RUN_DIR>"` **immediately before** dispatching the next phase's workers — i.e. just before emitting each `PROGRESS: phase-5.5-convergence round=<N>` marker and just before `PROGRESS: phase-6-synthesis dispatching report-writer-worker`. This closes every prior-phase okstra pane (worker-agent + trace) for this run, while NEVER killing the lead's own pane.
+  - This is **automatic and silent** — NO user prompt. Report it in one short line (e.g. `이전 phase okstra pane 3개 정리`) and proceed. It is silent-skipped when the lead is not in tmux; the lead MUST NOT fabricate a synthetic pane list in that case.
+- Run-end team teardown (shared — runs AFTER Phase 7 persistence/token collection, BEFORE the pane disposition step below):
+  - The lead created the worker team in Phase 3 (`TeamCreate(team_name: "okstra-<task-key>")`). Worker teammates are NOT reclaimed on their own — without an explicit teardown they linger in the FleetView roster across this and later runs in the session. The lead MUST release them once the run's work is done.
+  - This step is **automatic and silent** — NO user prompt (workers are idle sessions that have already delivered their results; there is nothing for the user to preserve). It runs only when team-state's `teamCreate.status == "ok"` (Teams mode was actually used); in the no-`team_name` fallback there is no team to delete, so silent-skip.
+  - Sequence (token-usage collection MUST already be complete — `TeamDelete` removes `~/.claude/teams/<team>/` + `~/.claude/tasks/<team>/` but NOT the `~/.claude/projects/` jsonls Phase 7 reads, yet the read MUST precede teardown):
+    1. Read `~/.claude/teams/okstra-<task-key>/config.json` and, for every `members` entry whose name is not the lead, `SendMessage(to: <name>, message: { type: "shutdown_request" })` to terminate it gracefully.
+    2. Wait for the shutdown confirmations / idle notifications from all addressed teammates.
+    3. Call `TeamDelete()`. If it errors with an active-members message, a teammate has not finished shutting down — wait briefly and retry `TeamDelete()` once.
+  - Report it in one short line (e.g. `worker 6명 종료 + 팀 해제`) and proceed. Emit `PROGRESS: phase-7-teardown disbanding team` immediately before step 1.
 - Phase wrap-up — okstra pane disposition (shared, MUST be the *last* step before returning control to the user):
-  - At run end the only residual okstra panes are the LAST phase's (e.g. the `report-writer-worker` agent pane and any codex/gemini trace pane). `okstra-trace-cleanup.sh --list` returns one tab-separated `<pane_id>\t<pane_title>` line per residual okstra pane (worker-agent + trace) for this lead session.
-  - When `$TMUX_PANE` is set, after the final-report file has been written and the routing recommendation has been issued, the lead MUST run `$HOME/.okstra/bin/okstra-trace-cleanup.sh --list` exactly once. The output lists every residual okstra pane (worker-agent + trace) for this Claude session, never the lead's own pane.
+  - At run end the only residual okstra panes are the LAST phase's (e.g. the `report-writer-worker` agent pane and any codex/gemini trace pane). `okstra-trace-cleanup.sh --list --run-dir "<RUN_DIR>"` returns one tab-separated `<pane_id>\t<pane_title>` line per residual okstra pane (worker-agent + trace) for this run.
+  - When `<RUN_DIR>/state/lead-pane.id` is non-empty, after the final-report file has been written and the routing recommendation has been issued, the lead MUST run `$HOME/.okstra/bin/okstra-trace-cleanup.sh --list --run-dir "<RUN_DIR>"` exactly once. The output lists every residual okstra pane (worker-agent + trace) for this run, never the lead's own pane.
   - If the list is empty, skip the question — there is nothing to ask about (the phase-start resets above usually already cleared prior phases).
   - Otherwise the lead MUST present the user with a strict binary choice **before** declaring the phase complete. Use one prompt of this shape (Korean preferred, English acceptable if the rest of the run is in English):
     > 현재 phase 종료 시점입니다. 다음 okstra pane 이 열려 있습니다 — 닫을까요?
     > <인용된 `--list` 출력>
     > (예) 모두 닫기 / (아니오) 그대로 두기
-  - On `예` / `y` / `close` → run `$HOME/.okstra/bin/okstra-trace-cleanup.sh` (no args) and report the kill count back in one sentence.
-  - On `아니오` / `n` / `keep` → leave the panes intact; remind the user that they will be cleaned up automatically when Claude `/exit` fires the `SessionEnd` hook.
+  - On `예` / `y` / `close` → run `$HOME/.okstra/bin/okstra-trace-cleanup.sh --run-dir "<RUN_DIR>"` and report the kill count back in one sentence.
+  - On `아니오` / `n` / `keep` → leave the panes intact; remind the user that they will be cleaned up automatically when Claude `/exit` fires the `SessionEnd` hook (`--reap`).
   - The question MUST be a clean yes/no — do NOT offer "close some / keep some" partial answers, do NOT propose alternatives like "close only codex panes". The whole-set decision keeps the wrap-up predictable.
-  - This step is mandatory for every phase (`requirements-discovery`, `error-analysis`, `implementation-planning`, `implementation`, `final-verification`, `release-handoff`). It is silent-skipped when `$TMUX_PANE` is unset (lead running outside tmux); the lead MUST NOT fabricate a synthetic pane list in that case.
+  - This step is mandatory for every phase (`requirements-discovery`, `error-analysis`, `implementation-planning`, `implementation`, `final-verification`, `release-handoff`). It is silent-skipped when `<RUN_DIR>/state/lead-pane.id` is empty/absent (lead running outside tmux); the lead MUST NOT fabricate a synthetic pane list in that case.
 - Brief handoff contract (shared — applies whenever the run consumes a task brief produced by `okstra-brief`):
   - the brief is a **pre-discovery artifact**: it converts a domain-reporter's words (non-expert *or* developer) into expert-consumable form so this and later phases can run with zero fill-in questions to the operator. The brief is **not** authoritative on solution decisions; it is authoritative on the reporter's intent.
   - **Reporter confirmation precondition (BLOCKING)**: the brief's frontmatter carries `reporter-confirmations: <complete | partial | pending | skipped>` set by `okstra-brief` Step 6.5. Every phase that consumes the brief MUST read this field before doing analysis. The handling matrix is:
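
The three-step "Run-end team teardown" sequence added earlier in this hunk (shutdown requests, wait, `TeamDelete` with one retry) can be sketched as ordinary control flow. `SendMessage` and `TeamDelete` are harness tools, not Python functions, so this sketch takes hypothetical callables in their place. It also assumes that `members` is a list of objects with a `name` key and that an active-members failure surfaces as `RuntimeError`; neither is confirmed by the diff.

```python
import time
from typing import Callable, Iterable, List


def teardown_team(members: Iterable[dict], lead_name: str,
                  send_shutdown: Callable[[str], None],
                  wait_for_idle: Callable[[List[str]], None],
                  team_delete: Callable[[], None],
                  retry_delay_s: float = 1.0) -> int:
    """Shut down every non-lead teammate, then delete the team.

    Returns the number of teammates that were asked to shut down.
    All callables are hypothetical stand-ins for the harness tools.
    """
    targets = [m["name"] for m in members if m["name"] != lead_name]
    for name in targets:
        send_shutdown(name)        # step 1: graceful shutdown request
    wait_for_idle(targets)         # step 2: wait for confirmations
    try:
        team_delete()              # step 3: delete the team
    except RuntimeError:
        # Assumed error type: a teammate is still shutting down.
        time.sleep(retry_delay_s)
        team_delete()              # retry exactly once, as the contract says
    return len(targets)
```

A second failure is deliberately left to propagate, since the contract allows only one retry.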

package/runtime/prompts/profiles/_implementation-executor.md CHANGED

@@ -26,6 +26,7 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
   - Order of operations per plan step: (1) write/extend the test that captures the step's acceptance criterion and confirm it fails for the right reason, (2) commit the failing test (`test(<scope>): ...`), (3) implement the minimum change to make it pass, (4) commit the implementation (`feat|fix(<scope>): ...`), (5) refactor without changing behaviour and commit separately if any cleanup is made (`refactor(<scope>): ...`). The failing-then-passing transition between steps (2) and (4) is the `TDD evidence` required by the final report.
   - Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
   - When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
+- **DB / IO / SQL changes require real execution — mock-only is NOT validation evidence:** when this run's diff touches DB/IO/SQL (ORM / query-builder code — sequelize / typeorm / prisma / knex / raw SQL — `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string), a mocked unit test cannot observe the SQL the query builder actually emits — a mocked suite once passed while `count({ col: 'FontFamily.fontFamily' })` threw `Unknown column` on the real DB. The executor MUST run the change against a real (or faithful-replica) datastore — the `db-test` validation step (plan `validation` db step, else `project.json.qaCommands.db-test`), targeting a **local / replica** DB — and cite its exact command + exit code in the final report's `Validation evidence`. If no real DB / `db-test` command is reachable, do NOT claim the change verified: label the DB portion `정적 분석상 …, 미검증(실행 안 함)` in the report, surface it in the routing recommendation, and never downplay the real run as "too heavy". `git push` stays forbidden (universal list); the unverified DB state is carried forward so `final-verification` cannot accept it and `release-handoff` cannot push.
 - re-read the approved plan end-to-end and parse the `## 4.5 Stage Map`. Determine **start stage**:
   - if `--stage <N>` is supplied, use N. Otherwise auto = the lowest stage number whose `depends-on` are all recorded as `status:done` in `runs/<plan-key>/consumers.jsonl` AND that itself has no `status:done` row. Multiple stages may match — two parallel `implementation` runs may pick different ones and proceed concurrently.
   - load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(start_stage)` and inject them into the executor's working context as "runtime carry-in". For `depends-on (none)` stages, no sidecar load — task-brief only.
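
The start-stage rule quoted in the context lines above (explicit `--stage <N>`, otherwise the lowest stage whose dependencies are all done and which is not itself done) reduces to a small selection function. This is a sketch under assumptions: the Stage Map is taken as already parsed into a `stage -> depends-on` mapping, and `done` is the set of stages with a `status:done` row. The function names and data shapes are illustrative only.

```python
from typing import Dict, List, Optional, Set


def ready_stages(depends_on: Dict[int, List[int]], done: Set[int]) -> List[int]:
    """Stages not yet done whose dependencies are all done, ascending."""
    return sorted(
        stage for stage, deps in depends_on.items()
        if stage not in done and all(dep in done for dep in deps)
    )


def pick_start_stage(depends_on: Dict[int, List[int]], done: Set[int],
                     requested: Optional[int] = None) -> Optional[int]:
    """Return the requested stage if given, else the lowest ready stage."""
    if requested is not None:
        return requested
    ready = ready_stages(depends_on, done)
    return ready[0] if ready else None
```

Because `ready_stages` can return more than one entry, two parallel `implementation` runs may legitimately pick different stages, which matches the note in the profile text.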

package/runtime/prompts/profiles/_implementation-verifier.md CHANGED

@@ -30,7 +30,8 @@ Verifier obtains the QA command set from exactly two declared sources, in order
        "lint":      [{ "label": "cargo clippy", "cmd": "cargo clippy --all-targets -- -D warnings", "language": "rust" }],
        "format":    [{ "label": "cargo fmt",    "cmd": "cargo fmt --check",                          "language": "rust" }],
        "typecheck": [{ "label": "tsc",          "cmd": "pnpm exec tsc --noEmit",                     "language": "ts"   }],
-       "test":      [{ "label": "cargo test",   "cmd": "cargo test --workspace --locked",            "language": "rust" }]
+       "test":      [{ "label": "cargo test",   "cmd": "cargo test --workspace --locked",            "language": "rust" }],
+       "db-test":   [{ "label": "db integ",     "cmd": "pnpm test:db",                               "language": "ts"   }]
      }
    }
    ```
@@ -42,7 +43,7 @@ Tier 1 commands run verbatim first. Then every Tier 2 entry runs once. Each comm
 ### Missing-tier handling
-If a tier is empty or absent, verifier records the single line `qa-command not configured: <category>` per missing category (`lint` / `format` / `typecheck` / `test`) in the worker result and proceeds — silent omission is a contract violation. Verifier MUST NOT auto-detect or invent a command in this case; the user/operator must declare it in `project.json.qaCommands` or in the plan.
+If a tier is empty or absent, verifier records the single line `qa-command not configured: <category>` per missing category (`lint` / `format` / `typecheck` / `test`; and `db-test` **only when the diff touches DB/IO/SQL**, where a missing `db-test` is escalated to a blocking finding per the DB real-execution gate below — not a passive note) in the worker result and proceeds — silent omission is a contract violation. Verifier MUST NOT auto-detect or invent a command in this case; the user/operator must declare it in `project.json.qaCommands` or in the plan.
 ### `cmd` field deny-list (Tier 2 validation)
@@ -74,6 +75,16 @@ Re-running commands proves the diff *builds and passes*; it does NOT prove the d
 - **Advisory findings (recorded as recommendations; verdict MAY still PASS):** function >50 effective lines, a single body mixing read+write stages, weak readability, a missing-but-non-critical outcome assertion. These land in the verifier result as `should-fix` / `nit` recommendations, not as a `FAIL`.
 - **Output.** Every finding — blocking or advisory — is a structured item in the verifier's worker result (`path:line`, rule, severity, suggested fix) so it carries into Phase 5.5 convergence and the final report. A blocking hit sets the verifier verdict to `FAIL` with the rule cited, using the same verdict machinery as the Discrepancy rule above. `Claude lead` MUST NOT silently downgrade a cited blocking finding to advisory during synthesis; an override requires a concrete cited reason, exactly as for the Discrepancy rule.
+### DB / IO / SQL change — real-execution gate (mock-only acceptance forbidden)
+A mocked unit test cannot observe the SQL a query builder actually emits — `count({ col: 'FontFamily.fontFamily' })` passes a mocked suite yet throws `Unknown column` on a real database. For this class of change a green mock-only suite is therefore NOT evidence; only a run against a real (or faithful-replica) datastore is. This gate is the verifier's enforcement of that rule.
+- **Trigger.** Fires when `git diff <base>...HEAD` touches DB/IO/SQL: ORM / query-builder code (sequelize / typeorm / prisma / knex / raw SQL), `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string.
+- **Requirement when fired.** The verifier MUST reproduce a real-DB execution: run the `db-test` tier (Tier 1 = plan `validation` db step; else Tier 2 = `project.json.qaCommands.db-test`) against a **local / replica** datastore (same engine + schema — never shared / staging / prod, consistent with the verifier forbidden-actions list) and record its exact command + exit code. A mock, an in-memory shim that does not parse real SQL, or static reasoning does NOT satisfy this.
+- **No `db-test` command available → blocking, not a passive skip.** If neither tier declares a `db-test` command, the verifier records the blocking finding `db-test not configured — DB change unverified (mock-only)` and sets the verdict to `FAIL`; it MUST NOT emit only the passive `qa-command not configured` note and pass. Recommended fix: declare a `db-test` command in `project.json.qaCommands` or the plan's validation set.
+- **Mock-only evidence → unverified.** If the diff's only DB coverage is mocked, the verifier labels the DB portion `정적 분석상 …, 미검증(실행 안 함)` (never `검증됨`), records it as a blocking finding, and sets `FAIL`. Never downplay the real run as "too heavy / static proof suffices".
+- **Surface it at every layer.** The finding is copied verbatim into the verifier result and MUST survive into the final report's `## 1.` and Verdict Card, so the user sees the DB-unverified state continuously — it is the load-bearing reason a downstream `final-verification` cannot reach `accepted` and `release-handoff` cannot push.
 ## All-verifier-failure policy
 If every verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) ends with a non-result terminal status (`timeout`, `error`, `not-run`) — i.e. zero independent verdicts were produced — the run MUST end with status `blocked` and route to a follow-up `error-analysis` run. `Claude lead` MUST NOT substitute its own verdict in place of the missing verifier outputs; synthesis requires at least one independent verifier's verdict. If one or more verifiers fail but at least one returns a verdict, the run proceeds with the surviving verdict(s) and the final report MUST explicitly notate which verifiers were unavailable, with the captured error / timeout evidence per failed verifier.
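
The path-based part of the DB / IO / SQL gate's **Trigger** rule above can be approximated with filename matching. This is a rough sketch, not the verifier's actual logic. The `*.repository.*`, `migrations/**` and `*.sql` patterns come from the profile text, while the model/entity patterns are an assumption because the diff does not name a convention for those files. Detecting a "changed query string" needs the diff content and is out of scope here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable

# Patterns named in the profile text.
NAMED_PATTERNS = ("*.repository.*", "*.sql")
# Assumption: a common naming convention for model/entity files.
ASSUMED_MODEL_PATTERNS = ("*.model.*", "*.entity.*")


def touches_db(changed_paths: Iterable[str]) -> bool:
    """True if any changed path looks like a DB/IO/SQL file."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # `migrations/**`: any file below a directory named `migrations`.
        if "migrations" in path.parts[:-1]:
            return True
        patterns = NAMED_PATTERNS + ASSUMED_MODEL_PATTERNS
        if any(fnmatchcase(path.name, pat) for pat in patterns):
            return True
    return False
```

A path list such as the output of `git diff --name-only <base>...HEAD` would be a natural input.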

package/runtime/prompts/profiles/final-verification.md CHANGED

@@ -14,6 +14,7 @@
     - delivered artifacts match recorded expected values in `reference-expectations` (config files, deployment manifests, other recorded expected states); when reference-expectations are absent, record it as missing information rather than assuming a match
     - test & validation suite pass status — independently re-run the read-only two-tier command set (Tier 1 = brief/approved-plan `validation`, Tier 2 = `project.json` `qaCommands`) and confirm each passes on the verified head, citing exact command + exit code
     - test correctness — delivered tests actually assert the intended behaviour: no gutted/weakened assertions, no tautological or always-passing tests, no tests exercising only mocks; new behaviour has matching coverage
+    - DB / IO / SQL real-execution evidence — when the diff touches DB/IO/SQL (ORM / query-builder, `*.repository.*`, model / `migrations/**` / `*.sql`, or changed query strings), Validation Evidence MUST cite a real (or faithful-replica) DB execution — the `db-test` command + exit code — not a mock-only suite, because a mocked suite cannot observe the SQL actually emitted (`count({ col: 'FontFamily.fontFamily' })` passed mocks yet threw `Unknown column` on the real DB). A DB-touching change whose only evidence is mocked, or for which no `db-test` ran, is an **Acceptance Blocker** (`major`+): record it, and since `accepted` requires zero blockers the verdict becomes `conditional-accept` / `blocked`. This is the gate that stops an unverified DB change from reaching `release-handoff` and being pushed.
     - no new defects introduced — the diff does not break previously-working behaviour and adds no new bug (logic/off-by-one, null/empty handling, resource leaks, broken error paths)
     - scope conformance — the delivered diff stays within the approved plan's scope; flag out-of-scope edits, unrelated file changes, leftover debug/commented-out code, and unintended deletions
   - Residual-tracked — note as Residual Risk unless severe enough to block:

package/runtime/python/okstra_ctl/qa_commands.py CHANGED

@@ -20,7 +20,11 @@ import re
 from typing import Iterable
 # Category whitelist. Unknown categories are rejected because they are most likely typos.
-ALLOWED_CATEGORIES: tuple[str, ...] = ("lint", "format", "typecheck", "test")
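+# `db-test` is a dedicated category for real-DB (or faithful-replica) execution tests of DB/IO/SQL changes.
+# It is kept separate from `test` because a mocked unit test cannot observe the SQL the query builder actually emits.
+# The DB real-execution gate of the implementation verifier / final-verification requires this category
+# (or the plan validation's db step) whenever the diff touches the DB.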
+ALLOWED_CATEGORIES: tuple[str, ...] = ("lint", "format", "typecheck", "test", "db-test")
 # Tokens that cause mutation or update a lockfile. Each token is detected either in the
 # plain whitespace split of the `cmd` string or by a partial-match pattern (prefix/suffix sensitive).
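
The effect of widening `ALLOWED_CATEGORIES` can be illustrated with a small whitelist check. Only the tuple itself comes from the diff. The helper `unknown_categories` is hypothetical and may not resemble how `qa_commands.py` actually validates its input.

```python
from typing import Dict, List

# Copied from the `+` line of the hunk above.
ALLOWED_CATEGORIES = ("lint", "format", "typecheck", "test", "db-test")


def unknown_categories(qa_commands: Dict[str, list]) -> List[str]:
    """Return category keys that are not whitelisted, sorted (hypothetical helper)."""
    return sorted(key for key in qa_commands if key not in ALLOWED_CATEGORIES)
```

With the new tuple a `db-test` key passes, while a misspelling such as `db-tset` would still be reported.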

package/runtime/skills/okstra-report-writer/SKILL.md CHANGED

@@ -127,6 +127,8 @@ The four steps below MUST execute in this exact order. Reordering them is the re
 The status file is written after step 3 completes.
+**Run-end team teardown follows this whole sequence.** Token-usage collection (step 1) reads the worker session jsonls, so the lead MUST NOT disband the team until every step above is done. Only then does the lead shut down worker teammates + `TeamDelete` per `_common-contract.md` "Run-end team teardown" (Teams mode only; silent-skip in the no-`team_name` fallback).
 ## Final Report Structure
 The final report follows the structure encoded in `schemas/final-report-v1.0.schema.json`. The schema is the single source of truth for section names, row shapes, enum values, and task-type-conditional blocks. The Jinja2 template `templates/reports/final-report.template.md` produces the human-readable form from any data.json that validates against the schema. The structure description below is a reading guide for writers; the schema is the binding contract.

package/runtime/templates/reports/settings.template.json CHANGED

@@ -153,7 +153,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": "$HOME/.okstra/bin/okstra-trace-cleanup.sh"
+            "command": "$HOME/.okstra/bin/okstra-trace-cleanup.sh --reap"
           }
         ]
       }