okstra 0.68.0 → 0.70.0

Files changed (37)
  1. package/bin/okstra +18 -0
  2. package/docs/kr/architecture.md +1 -0
  3. package/docs/kr/cli.md +2 -1
  4. package/docs/superpowers/plans/2026-06-11-wizard-whole-task-final-verification.md +526 -0
  5. package/docs/superpowers/specs/2026-06-11-wizard-whole-task-final-verification-design.md +89 -0
  6. package/package.json +1 -1
  7. package/runtime/BUILD.json +2 -2
  8. package/runtime/agents/SKILL.md +3 -3
  9. package/runtime/agents/workers/claude-worker.md +1 -1
  10. package/runtime/agents/workers/codex-worker.md +3 -3
  11. package/runtime/agents/workers/gemini-worker.md +3 -3
  12. package/runtime/agents/workers/report-writer-worker.md +2 -2
  13. package/runtime/prompts/launch.template.md +2 -2
  14. package/runtime/prompts/profiles/_implementation-deliverable.md +1 -0
  15. package/runtime/prompts/profiles/_implementation-executor.md +3 -1
  16. package/runtime/prompts/profiles/_implementation-verifier.md +1 -0
  17. package/runtime/prompts/profiles/improvement-discovery.md +1 -1
  18. package/runtime/prompts/wizard/prompts.ko.json +8 -4
  19. package/runtime/python/okstra_ctl/conformance.py +17 -0
  20. package/runtime/python/okstra_ctl/paths.py +7 -4
  21. package/runtime/python/okstra_ctl/render.py +10 -3
  22. package/runtime/python/okstra_ctl/run.py +97 -20
  23. package/runtime/python/okstra_ctl/wizard.py +140 -38
  24. package/runtime/python/okstra_ctl/worktree.py +18 -0
  25. package/runtime/python/okstra_token_usage/collect.py +27 -0
  26. package/runtime/skills/okstra-convergence/SKILL.md +3 -3
  27. package/runtime/skills/okstra-inspect/SKILL.md +1 -1
  28. package/runtime/skills/okstra-report-writer/SKILL.md +6 -6
  29. package/runtime/skills/okstra-team-contract/SKILL.md +5 -5
  30. package/runtime/validators/validate-run.py +2 -2
  31. package/src/_python-helper.mjs +52 -0
  32. package/src/error-log.mjs +19 -0
  33. package/src/inject-report-index.mjs +22 -0
  34. package/src/render-final-report.mjs +22 -0
  35. package/src/render-views.mjs +9 -48
  36. package/src/spawn-followups.mjs +23 -0
  37. package/src/token-usage.mjs +3 -34
@@ -0,0 +1,89 @@
1
+ # Design: exposing whole-task final-verification in the wizard
2
+
3
+ - Written: 2026-06-11
4
+ - Status: design approved (user approval 2026-06-11)
5
+ - Predecessors: [final-verification-whole-task-gate-design.md](2026-06-06-final-verification-whole-task-gate-design.md), commit `54b9482` (simplified the wizard to stage-worktree-based verification)
6
+
7
+ ## 1. Background / Problem
8
+
9
+ `final-verification` has two verification modes: (A) whole-task mode (run once after every stage is merged) and (B) single-stage mode (only that stage, in an isolated worktree). Both modes and the automatic target resolution are already implemented in the prepare layer ([final-verification-whole-task-gate-design.md](2026-06-06-final-verification-whole-task-gate-design.md)).
10
+
11
+ While simplifying the single-stage wizard UX (skipping `base-ref`/branch-confirm, forcing an explicit stage), commit `54b9482` **closed off the only way to select whole-task mode from the wizard**:
12
+
13
+ - [`_stage_auto_allowed`](../../../scripts/okstra_ctl/wizard.py:778) returns `True` only for `implementation`, so the final-verification stage picker never shows a whole-task option.
14
+ - [`_submit_stage_pick`](../../../scripts/okstra_ctl/wizard.py:1539) and [`render_args`](../../../scripts/okstra_ctl/wizard.py:2799) reject final-verification combined with a non-explicit stage.
15
+
16
+ Result: whole-task final-verification is reachable only through the CLI, `okstra.sh --stage auto` (or by omitting the stage), and okstra-run wizard users cannot reach it. Because prepare already supports whole-task mode ([run.py:1829](../../../scripts/okstra_ctl/run.py:1829), [run.py:1533](../../../scripts/okstra_ctl/run.py:1533)), **the wizard layer is the only blocker**.
17
+
18
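+ ## 2. Goals / Non-goals
19
+
20
+ Goals:
21
+ - Let users **select whole-task verification as an explicit item** in the final-verification stage picker of the okstra-run wizard.
22
+ - **Show each stage's done status** in the picker, so that when whole-task verification is unavailable the user immediately sees which stage is incomplete.
23
+ - **Leave the prepare contract (CLI `--stage`) unchanged**: the wizard emits the existing whole-task trigger (an empty stage).
24
+
25
+ Non-goals:
26
+ - Reusing the meaning of the `implementation` `auto` token ("lowest ready stage") in final-verification. `auto` appears nowhere in picker labels or internal values.
27
+ - Wizard-side pre-checks of the merged/clean/active preconditions; these are delegated to prepare's PrepareError gate (§4).
28
+ - Automatic stage merging (merging stays a manual user step).
29
+ - Changes to prepare, `_reserve_final_verification_target`, or the target-resolution logic.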
30
+
31
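+ ## 3. Exposure: an explicit stage-picker item
32
+
33
+ When the task type is final-verification, [`_build_stage_pick`](../../../scripts/okstra_ctl/wizard.py:1494):
34
+
35
+ 1. Appends a **done marker** to each stage item's label. Example: `1: <title> [done]` / `2: <title> [미완]` ("incomplete").
36
+ 2. Adds a `전체 task 검증` ("whole-task verification") item at the top of the picker if **every** Stage Map stage is done. If even one stage is not done, the item is not shown; the per-stage `[미완]` markers remain visible, so it is self-evident why whole-task verification is unavailable.
37
+
38
+ This structure satisfies the requirement to "show which stages are not done" while also revealing, on the same screen, whether whole-task mode can be selected.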
39
+
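The picker rule in §3 can be sketched roughly as follows. This is an illustration only, not the actual `_build_stage_pick`, which this diff does not show: the function name `build_final_verification_options`, the `WHOLE_TASK` constant, and the `(value, label)` tuple shape are all assumptions. The label strings are the ones the spec quotes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sentinel; the spec only offers "whole-task" as an example value.
WHOLE_TASK = "whole-task"


def build_final_verification_options(
    stage_titles: Dict[int, str], done_stages: Set[int]
) -> List[Tuple[str, str]]:
    """Return (value, label) pairs for the final-verification stage picker."""
    options: List[Tuple[str, str]] = []
    for number in sorted(stage_titles):
        # Every stage carries a marker: "[done]" or "[미완]" (incomplete).
        mark = "[done]" if number in done_stages else "[미완]"
        options.append((str(number), f"{number}: {stage_titles[number]} {mark}"))
    # The whole-task item appears only when every Stage Map stage is done.
    if stage_titles and set(stage_titles) <= done_stages:
        options.insert(0, (WHOLE_TASK, "전체 task 검증"))
    return options
```

With stages `{1, 2}` and only stage 1 done, the result contains two stage rows and no whole-task row, so the `[미완]` marker on stage 2 is the only explanation the user needs.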
40
+ ## 4. Data source for done status: no git calls
41
+
42
+ The wizard reads only the `status:done` rows of `consumers.jsonl`, the same SSOT that prepare uses. The wizard does not inspect git state (merged/clean).
43
+
44
+ - Deriving `plan_run_root`: `Path(approved_plan_path).resolve().parents[1]` (same rule as [run.py:1525](../../../scripts/okstra_ctl/run.py:1525)).
45
+ - Done rows: `backfill_done_from_carry(plan_run_root)` → `read_consumers(plan_run_root)` → `latest_done_by_stage(rows)` ([consumers.py:24](../../../scripts/okstra_ctl/consumers.py:24), [consumers.py:37](../../../scripts/okstra_ctl/consumers.py:37), [consumers.py:182](../../../scripts/okstra_ctl/consumers.py:182)).
46
+ - Stage Map: the existing `_build_stage_pick` already parses the approved plan with `_parse_stage_map` ([wizard.py:1499](../../../scripts/okstra_ctl/wizard.py:1499)). Its set of stage numbers is compared against the done set.
47
+
48
+ Division of responsibility:
49
+ - **Wizard**: displays done status up front (file reads only) and gates whether the whole-task item is shown.
50
+ - **prepare**: performs the final enforcement of merged (`head_commit` is an ancestor of the task worktree HEAD), clean, and task-key worktree active. If any precondition fails, the existing PrepareError reports which stage is unmerged ([run.py:587](../../../scripts/okstra_ctl/run.py:587), [run.py:594](../../../scripts/okstra_ctl/run.py:594), [worktree.py:655](../../../scripts/okstra_ctl/worktree.py:655)).
51
+
52
+ Because the wizard never calls git, building the picker stays cheap, and precondition enforcement lives in one place (prepare).
53
+
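A file-only done lookup along the lines of §4 might look like the sketch below. It is a stand-in, not the real `read_consumers` / `latest_done_by_stage` helpers: the row field names `stage` and `status`, the helper names, and the idea that the caller passes the `consumers.jsonl` path directly are all assumptions, since the diff does not show the file's schema or location.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def plan_run_root_for(approved_plan_path: str) -> Path:
    # Rule quoted in the spec: two directory levels above the approved plan file.
    return Path(approved_plan_path).resolve().parents[1]


def done_stage_numbers(consumers_file: Path) -> Set[int]:
    """Collect stage numbers that have a done row (field names are assumed)."""
    done: Set[int] = set()
    if not consumers_file.exists():
        return done
    for line in consumers_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        row = json.loads(line)
        if row.get("status") == "done" and "stage" in row:
            done.add(int(row["stage"]))
    return done


def incomplete_stages(stage_numbers: Iterable[int], done: Set[int]) -> List[int]:
    """Stage Map stages that have no done row yet, in ascending order."""
    return sorted(set(stage_numbers) - done)
```

Nothing here shells out to git, which is the point of the section: merged/clean/active checks stay in prepare.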
54
+ ## 5. prepare contract: emit an empty stage (contract unchanged)
55
+
56
+ The wizard's internal representation of a whole-task selection is kept separate from the value passed to prepare:
57
+
58
+ - Inside the wizard: `selected_stage` holds an explicit sentinel (for example `"whole-task"`).
59
+ - [`render_args`](../../../scripts/okstra_ctl/wizard.py:2792): converts this sentinel to an **empty stage (`""`)** before passing it to prepare.
60
+
61
+ prepare already interprets an empty or `auto` stage as whole-task (the else branch of `if inp.stage and inp.stage != "auto"`, [run.py:1829](../../../scripts/okstra_ctl/run.py:1829) · [run.py:1533](../../../scripts/okstra_ctl/run.py:1533)). The CLI `--stage` contract, the validator, and the prepare branches therefore need no changes at all, and `auto` is never exposed on any surface.
62
+
63
+ The base-ref / branch-confirm skip stays as it is today ([`_base_ref_required`](../../../scripts/okstra_ctl/wizard.py:766), [`_branch_confirm_required`](../../../scripts/okstra_ctl/wizard.py:774)): for both whole-task and single-stage modes prepare resolves the base automatically, so the wizard has no need to ask for it.
64
+
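The sentinel-to-empty-stage conversion from §5 could be sketched like this. The names `stage_arg_for_prepare` and `confirmation_label` are hypothetical, `ValueError` stands in for the project's `WizardError`, and the assumption that a non-numeric stage is still rejected for final-verification is inferred from the spec's "single-stage keeps the explicit number" wording rather than confirmed by the code.

```python
# Hypothetical sentinel; the spec only offers "whole-task" as an example value.
WHOLE_TASK = "whole-task"


def stage_arg_for_prepare(task_type: str, selected_stage: str) -> str:
    """Map the wizard's internal stage selection to the value handed to prepare."""
    if task_type == "final-verification":
        if selected_stage == WHOLE_TASK:
            # An empty stage is the existing whole-task trigger in prepare.
            return ""
        if not selected_stage.isdigit():
            # Assumed: empty / "auto" stay rejected on the single-stage path.
            raise ValueError(
                f"final-verification needs an explicit stage, got {selected_stage!r}"
            )
    return selected_stage


def confirmation_label(selected_stage: str) -> str:
    """Show the sentinel as a human-readable label instead of the raw value."""
    return "전체 task" if selected_stage == WHOLE_TASK else selected_stage
```

Keeping the sentinel inside the wizard means the string `auto` never has to appear in a final-verification label or argument, while prepare keeps receiving a value it already understands.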
65
+ ## 6. Gate function adjustments
66
+
67
+ | Function | Current | Change |
68
+ |---|---|---|
69
+ | [`_stage_auto_allowed`](../../../scripts/okstra_ctl/wizard.py:778) | `task_type == "implementation"` | For final-verification, allow the "whole task" (`전체 task`) item to be shown on the condition that **all stages are done**. (Its meaning differs from the `implementation` `auto`, so either redefine the function's name/intent or add a final-verification-specific helper.) |
70
+ | [`_submit_stage_pick`](../../../scripts/okstra_ctl/wizard.py:1539) | Special-cases only `auto`; rejects final-verification + auto | Accept the "whole task" sentinel |
71
+ | [`render_args`](../../../scripts/okstra_ctl/wizard.py:2799) | final-verification + (empty/auto) → `WizardError` | Sentinel → empty stage conversion (§5); single-stage keeps emitting the explicit number |
72
+ | [`confirmation_block`](../../../scripts/okstra_ctl/wizard.py:2872) | Prints the stage value as-is | Prints the sentinel as a human-readable label ("whole task", `전체 task`) |
73
+
74
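+ ## 7. Prompts (prompts/wizard/prompts.ko.json)
75
+
76
+ Add the whole-task item label and the per-stage done/incomplete marker strings to the `stage_pick` prompt. The `auto` label removed by `54b9482` is not restored; the new label is a "whole-task verification" (`전체 task 검증`) style string, not `auto`.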
77
+
78
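+ ## 8. Tests
79
+
80
+ - `tests/test_wizard_stage_pick.py`: final-verification + all stages done → the "whole task" item is shown / some stages incomplete → the item is hidden and those stages carry the `[미완]` marker.
81
+ - `tests/test_wizard_final_verification_stage.py`: selecting "whole task" → `render_args` emits an empty stage; selecting a single stage → emits the explicit number.
82
+ - `tests/test_okstra_ctl_wizard.py`: the base-ref/branch-confirm steps are still skipped on the whole-task path (regression check that current behavior is unchanged).
83
+ - On the prepare side, the existing cases in `tests/test_final_verification_target.py` are sufficient (the prepare contract is unchanged).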
84
+
85
+ ## 9. Unchanged behavior (regression guards)
86
+
87
+ - prepare branches, `_reserve_final_verification_target`, target resolution: no change.
88
+ - CLI `okstra.sh --stage`: no change (empty stage = whole-task is the existing contract).
89
+ - Single-stage wizard UX (introduced by `54b9482`): no change.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "okstra",
3
- "version": "0.68.0",
3
+ "version": "0.70.0",
4
4
  "description": "Multi-agent cross-verification orchestrator runtime + Claude Code skills.",
5
5
  "license": "MIT",
6
6
  "author": "devonshin",
package/runtime/BUILD.json CHANGED
@@ -1,5 +1,5 @@
1
1
  {
2
- "package": "0.68.0",
3
- "builtAt": "2026-06-10T14:01:48.019Z",
2
+ "package": "0.70.0",
3
+ "builtAt": "2026-06-10T17:00:34.215Z",
4
4
  "repoRoot": "/home/runner/work/okstra/okstra"
5
5
  }
package/runtime/agents/SKILL.md CHANGED
@@ -246,13 +246,13 @@ After each worker terminates, BEFORE classifying its terminal status, verify the
246
246
  After each worker terminates (any terminal status), if its errors sidecar exists, dump it to the run error log using the same resolved paths from the launch prompt:
247
247
 
248
248
  ```bash
249
- python3 scripts/okstra-error-log.py append-from-worker \
249
+ okstra error-log append-from-worker \
250
250
  --sidecar <absolute-sidecar-path-from-launch-prompt> \
251
251
  --out <absolute-errors-log-path-from-launch-prompt> \
252
252
  --task-key <taskKey> --agent <agent> --agent-role <role> --model <model>
253
253
  ```
254
254
 
255
- For Codex/Gemini wrappers: if the CLI returns non-zero, times out, or hits a rate limit, immediately call `okstra-error-log.py append-observed --error-type cli-failure ...` with the captured exit code, duration, message, and stderr excerpt. The wrapper subagent does this from inside its own Bash tool — Lead does NOT need to re-record. Token usage is NOT available from Agent tool results in real time; it is collected post-hoc at the start of Phase 7.
255
+ For Codex/Gemini wrappers: if the CLI returns non-zero, times out, or hits a rate limit, immediately call `okstra error-log append-observed --error-type cli-failure ...` with the captured exit code, duration, message, and stderr excerpt. The wrapper subagent does this from inside its own Bash tool — Lead does NOT need to re-record. Token usage is NOT available from Agent tool results in real time; it is collected post-hoc at the start of Phase 7.
256
256
 
257
257
  ## Phase 5.5: Convergence loop
258
258
 
@@ -271,7 +271,7 @@ When `task-manifest.json` does not set `convergence.maxRounds`, lead MUST resolv
271
271
 
272
272
  **Confirmed findings are pruned from the queue.** Findings classified as `full-consensus`, `partial-consensus`, or `worker-unique` MUST NOT appear in any subsequent round's reverify prompt for any worker. `contested` is a final classification assigned only when the last executed round completes and the queue is still non-empty — it is NEVER an intermediate queue label.
273
273
 
274
- If any re-verification batch yields a `verification-error` terminal status, or a worker result fails the contract, Lead MUST record one event per violation via `python3 scripts/okstra-error-log.py append-observed --error-type contract-violation --agent <offending-agent> ...`. Use `agent: "claude-lead"` only when the violation is detected internally without a specific worker.
274
+ If any re-verification batch yields a `verification-error` terminal status, or a worker result fails the contract, Lead MUST record one event per violation via `okstra error-log append-observed --error-type contract-violation --agent <offending-agent> ...`. Use `agent: "claude-lead"` only when the violation is detected internally without a specific worker.
275
275
 
276
276
  If convergence is disabled, proceed directly to Phase 6 with the raw worker results.
277
277
 
package/runtime/agents/workers/claude-worker.md CHANGED
@@ -104,7 +104,7 @@ If you find yourself thinking "let me double-check section 3" or "I should read
104
104
 
105
105
  ## Error reporting
106
106
 
107
- This agent is responsible for recording its own tool failures via `scripts/okstra-error-log.py`:
107
+ This agent is responsible for recording its own tool failures via `okstra error-log`:
108
108
 
109
109
  **Path extraction (BLOCKING).** Before recording anything, extract the absolute sidecar path from the lead's dispatch prompt body:
110
110
 
package/runtime/agents/workers/codex-worker.md CHANGED
@@ -103,7 +103,7 @@ The wrapper exists because Claude Code's Bash permission matcher rejects simple-
103
103
 
104
104
  c. **Result-file existence check (exit 0 only).** If `exit_code == 0` BUT no file exists at the extracted Result Path, the Codex CLI returned 0 without producing the analysis artifact. Observed failure mode: the CLI streams analysis prose on stdout, hits its token budget or a sandbox EPERM mid-`Write`, and exits 0 with the artifact never persisted. Forwarding the partial stdout silently degrades lead synthesis (the case that motivated this rule), so this path is required.
105
105
  1. Capture the final ~10 lines of the wrapper's live log for diagnostics — single Bash call: `tail -n 10 "${prompt_path%.md}.log"` (substitute the literal absolute prompt-history path; the wrapper writes the log next to it per the §"trace pane" comment in `okstra-codex-exec.sh`). Write the captured lines to a temp file (e.g. `<errors-sidecar-dir>/codex-result-missing-tail.txt`) so `--stderr-excerpt-file` can reference it.
106
- 2. Record a `cli-failure` event directly to the run-level error log via the exact `okstra-error-log.py append-observed` template in §"Error reporting" — substitute `--exit-code 0`, `--duration-ms <observed-ms>`, `--message "okstra-codex-exec.sh exited 0 but no result file at <abs-path>"`, and `--stderr-excerpt-file <temp-tail-path>`.
106
+ 2. Record a `cli-failure` event directly to the run-level error log via the exact `okstra error-log append-observed` template in §"Error reporting" — substitute `--exit-code 0`, `--duration-ms <observed-ms>`, `--message "okstra-codex-exec.sh exited 0 but no result file at <abs-path>"`, and `--stderr-excerpt-file <temp-tail-path>`.
107
107
  3. Return `CODEX_RESULT_MISSING: codex exited 0 but result file absent at <abs-path>` instead of the raw stdout. The lead is responsible for deciding redispatch per `okstra-team-contract` "Lead Redispatch Policy on Result-Missing".
108
108
 
109
109
  d. **Normal return.** Otherwise (`exit_code == 0` AND result file exists), return the wrapper's accumulated stdout from `BashOutput`, prefixed by exactly one model-identity line copied verbatim from the `**Model:** Codex worker, <execution-value>` line in the lead prompt (per Worker Preamble → "Return message to the lead"):
@@ -175,7 +175,7 @@ This contract mirrors the `okstra-team-contract` skill's Worker Output Contract
175
175
  ## Error reporting
176
176
 
177
177
  The wrapper agent (this Codex worker subagent) is responsible for recording
178
- two kinds of errors via `scripts/okstra-error-log.py`:
178
+ two kinds of errors via `okstra error-log`:
179
179
 
180
180
  **Path extraction (BLOCKING).** Before recording anything, extract the
181
181
  following two absolute paths verbatim from the lead's dispatch prompt body:
@@ -207,7 +207,7 @@ and the run-level error log staying empty.
207
207
  the dispatched `bash_id`:
208
208
 
209
209
  ```bash
210
- python3 scripts/okstra-error-log.py append-observed \
210
+ okstra error-log append-observed \
211
211
  --out "<absolute-errors-log-path-from-lead-prompt>" \
212
212
  --task-key "<task-key>" \
213
213
  --phase "<phase>" \
package/runtime/agents/workers/gemini-worker.md CHANGED
@@ -103,7 +103,7 @@ The wrapper exists because Claude Code's Bash permission matcher rejects simple-
103
103
 
104
104
  c. **Result-file existence check (exit 0 only).** If `exit_code == 0` BUT no file exists at the extracted Result Path, the Gemini CLI returned 0 without producing the analysis artifact. Observed failure mode: the CLI streams analysis prose on stdout, hits its token budget or a sandbox EPERM mid-`Write`, and exits 0 with the artifact never persisted. Forwarding the partial stdout silently degrades lead synthesis (the case that motivated this rule), so this path is required.
105
105
  1. Capture the final ~10 lines of the wrapper's live log for diagnostics — single Bash call: `tail -n 10 "${prompt_path%.md}.log"` (substitute the literal absolute prompt-history path; the wrapper writes the log next to it per the §"trace pane" comment in `okstra-gemini-exec.sh`). Write the captured lines to a temp file (e.g. `<errors-sidecar-dir>/gemini-result-missing-tail.txt`) so `--stderr-excerpt-file` can reference it.
106
- 2. Record a `cli-failure` event directly to the run-level error log via the exact `okstra-error-log.py append-observed` template in §"Error reporting" — substitute `--exit-code 0`, `--duration-ms <observed-ms>`, `--message "okstra-gemini-exec.sh exited 0 but no result file at <abs-path>"`, and `--stderr-excerpt-file <temp-tail-path>`.
106
+ 2. Record a `cli-failure` event directly to the run-level error log via the exact `okstra error-log append-observed` template in §"Error reporting" — substitute `--exit-code 0`, `--duration-ms <observed-ms>`, `--message "okstra-gemini-exec.sh exited 0 but no result file at <abs-path>"`, and `--stderr-excerpt-file <temp-tail-path>`.
107
107
  3. Return `GEMINI_RESULT_MISSING: gemini exited 0 but result file absent at <abs-path>` instead of the raw stdout. The lead is responsible for deciding redispatch per `okstra-team-contract` "Lead Redispatch Policy on Result-Missing".
108
108
 
109
109
  d. **Normal return.** Otherwise (`exit_code == 0` AND result file exists), return the wrapper's accumulated stdout from `BashOutput`, prefixed by exactly one model-identity line copied verbatim from the `**Model:** Gemini worker, <execution-value>` line in the lead prompt (per Worker Preamble → "Return message to the lead"):
@@ -175,7 +175,7 @@ This contract mirrors the `okstra-team-contract` skill's Worker Output Contract
175
175
  ## Error reporting
176
176
 
177
177
  The wrapper agent (this Gemini worker subagent) is responsible for recording
178
- two kinds of errors via `scripts/okstra-error-log.py`:
178
+ two kinds of errors via `okstra error-log`:
179
179
 
180
180
  **Path extraction (BLOCKING).** Before recording anything, extract the
181
181
  following two absolute paths verbatim from the lead's dispatch prompt body:
@@ -207,7 +207,7 @@ and the run-level error log staying empty.
207
207
  the dispatched `bash_id`:
208
208
 
209
209
  ```bash
210
- python3 scripts/okstra-error-log.py append-observed \
210
+ okstra error-log append-observed \
211
211
  --out "<absolute-errors-log-path-from-lead-prompt>" \
212
212
  --task-key "<task-key>" \
213
213
  --phase "<phase>" \
package/runtime/agents/workers/report-writer-worker.md CHANGED
@@ -101,9 +101,9 @@ Rules (the schema enforces most of these — they are listed here so you know *w
101
101
  - Cite file paths and line numbers in every `evidence.primary[].source` / `consensus[].evidence` cell.
102
102
  - Preserve every analysis worker's ticket tagging — every row's `ticketId` field carries the ticket key or the task-fallback. For single-ticket runs, set `ticketCoverage` to `{"singleTicket": "<ticket>"}`. For runs that do not require ticket tagging (`release-handoff`, `final-verification`), set `ticketCoverage` to `{"omit": true}`.
103
103
  - For `implementation-planning`, populate `implementationPlanning.requirementCoverage` with one row per concrete requirement from the brief / packet, using IDs `R-001`, `R-002`, ... in source order. `coveredBy` MUST name the specific Option Candidate plus Stage/Step that satisfies the requirement. Use `status: "covered"` only when the report's plan actually covers it; otherwise use `gap` or `blocked C-NNN` and ensure the corresponding `Clarification Items` row blocks approval. Do not collapse this into `ticketCoverage`; ticket coverage is not requirement coverage.
104
- - When the `Task Type` is `improvement-discovery`, populate `## 5.9 Improvement Candidates` with the 10-column schema enforced by `validators/validate-improvement-report.py`. Source the row IDs (`I-NNN`), lens whitelist, and Source workers patterns from `scripts/okstra_ctl/improvement_lenses.py` — do NOT introduce new lens names or worker prefixes. `improvement-discovery` is NOT in the data.json schema enum, so author its markdown directly (not via `okstra-render-final-report.py`). Immediately after writing the markdown, run (`Bash`): `python3 scripts/okstra-inject-report-index.py <markdown path> --report-language <en|ko>`. That adds the top-of-report Index plus `I-NNN` / `C-NNN` scroll anchors; the run validator fails the report when the Index anchor is absent.
104
+ - When the `Task Type` is `improvement-discovery`, populate `## 5.9 Improvement Candidates` with the 10-column schema enforced by `validators/validate-improvement-report.py`. Source the row IDs (`I-NNN`), lens whitelist, and Source workers patterns from `scripts/okstra_ctl/improvement_lenses.py` — do NOT introduce new lens names or worker prefixes. `improvement-discovery` is NOT in the data.json schema enum, so author its markdown directly (not via `okstra-render-final-report.py`). Immediately after writing the markdown, run (`Bash`): `okstra inject-report-index <markdown path> --report-language <en|ko>`. That adds the top-of-report Index plus `I-NNN` / `C-NNN` scroll anchors; the run validator fails the report when the Index anchor is absent.
105
105
 
106
- Write the data.json with your `Write` tool against the absolute `Result Path`. Then invoke the renderer (`Bash`): `python3 scripts/okstra-render-final-report.py <data.json path>`. Confirm both files exist and respond with a short status line prefixed by your model identity, copied verbatim from the `**Model:** Report writer worker, <modelExecutionValue>` line in your dispatch prompt (per Worker Preamble → "Return message to the lead"):
106
+ Write the data.json (and the audit sidecar `.md`) with your `Write` tool; that is the canonical authoring path, and okstra ships no hook that blocks `.md` writes (its only settings hook is the `SessionEnd` trace-cleanup; the coding-preflight hook emits reminders but never blocks). A Bash heredoc is acceptable ONLY when a specific `Write` call is genuinely rejected by the host environment, and it MUST produce byte-identical content — do not reach for it pre-emptively. Then invoke the renderer (`Bash`): `okstra render-final-report <data.json path>`. Confirm both files exist and respond with a short status line prefixed by your model identity, copied verbatim from the `**Model:** Report writer worker, <modelExecutionValue>` line in your dispatch prompt (per Worker Preamble → "Return message to the lead"):
107
107
 
108
108
  ```
109
109
  **Model:** Report writer worker, <modelExecutionValue>
package/runtime/prompts/launch.template.md CHANGED
@@ -67,8 +67,8 @@ Emit one `PROGRESS: <phase-id> <verb-phrase>` line as plain user-facing text at
67
67
  - When dispatching any worker you MUST inject **two header lines** into the dispatch prompt body so the worker subagent can record errors without guessing paths:
68
68
  - `**Errors log path:** <absolute run-level errors log path>`
69
69
  - `**Errors sidecar path:** <absolute per-worker sidecar path matching the dispatched worker>`
70
- - These lines are the canonical contract — worker subagents extract them verbatim and pass them to `okstra-error-log.py append-observed --out ...` (run-level cli-failure / contract-violation events) and to their internal sidecar writes (worker-reported tool-failure events) respectively.
71
- - After each worker terminates, dump its sidecar into the run-level errors log via `python3 scripts/okstra-error-log.py append-from-worker --sidecar <sidecar-path> --out <run-errors-log-path> --task-key {{TASK_KEY}} --agent <worker-id> --agent-role worker --model <assigned-model-execution-value>` (per `okstra-team-contract` Worker Output Contract).
70
+ - These lines are the canonical contract — worker subagents extract them verbatim and pass them to `okstra error-log append-observed --out ...` (run-level cli-failure / contract-violation events) and to their internal sidecar writes (worker-reported tool-failure events) respectively.
71
+ - After each worker terminates, dump its sidecar into the run-level errors log via `okstra error-log append-from-worker --sidecar <sidecar-path> --out <run-errors-log-path> --task-key {{TASK_KEY}} --agent <worker-id> --agent-role worker --model <assigned-model-execution-value>` (per `okstra-team-contract` Worker Output Contract).
72
72
 
73
73
  ## Executor Worktree
74
74
 
package/runtime/prompts/profiles/_implementation-deliverable.md CHANGED
@@ -44,6 +44,7 @@ are collected and convergence finished. Phase 1-5 do not need it.
44
44
  git diff <base>..HEAD | grep -E '^\+[^+].*\b(TBD|TODO|FIXME|XXX|implement later|handle edge cases|similar to|placeholder)\b' || echo 'clean'
45
45
  ```
46
46
  Only newly-added lines (those starting with `+` and not part of the `+++` header) are inspected. If output is anything other than `clean`, the run MUST either remove the placeholders before finalising or record an explicit justification per occurrence in the final report.
47
+ 7. **Stage-foreign literal scrub** — when the report-writer modelled this stage's `data.json` on another stage's report (a common shortcut for structural consistency), stage-specific literals get copied verbatim and silently misattribute this run. Confirm every branch name, commit SHA, stageKey, and stage number in the report resolves to **this** run's stage `<N>` — its worktree branch is `<prefix>-<task-id>-s<N>`, its stageKey `<task-id>-stage-<N>`. Sweep the report for any `-s<M>` / `stage-<M>` / `Stage <M>` where `M ≠ N` and for SHAs not in this run's `Commit list`; each hit is a copy-from-other-stage defect to correct before finalising.
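The stage-number part of the sweep in step 7 could be automated with something like the sketch below. It is not part of okstra: the function name `foreign_stage_literals` and the regular expression are assumptions built from the three literal shapes the rule names (`-s<M>`, `stage-<M>`, `Stage <M>`), and the SHA check against the run's `Commit list` is left out.

```python
import re
from typing import List, Tuple

# The three literal shapes named by the rule: "-s<M>", "stage-<M>", "Stage <M>".
_STAGE_LITERAL = re.compile(r"-s(\d+)\b|\bstage-(\d+)\b|\bStage (\d+)\b")


def foreign_stage_literals(report_text: str, stage: int) -> List[Tuple[int, str]]:
    """Return (line_number, literal) for each stage reference where M != stage."""
    hits: List[Tuple[int, str]] = []
    for line_number, line in enumerate(report_text.splitlines(), start=1):
        for match in _STAGE_LITERAL.finditer(line):
            # Exactly one of the three capture groups is set per match.
            number = next(group for group in match.groups() if group is not None)
            if int(number) != stage:
                hits.append((line_number, match.group(0)))
    return hits
```

Each hit is only a candidate: a report may legitimately mention another stage (for example as a dependency), so the result is a list to review, not an automatic failure.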
47
48
 
48
49
  ## Lead post-stage persistence (BLOCKING — runs after the Executor emits `### Stage Carry Evidence`)
49
50
 
package/runtime/prompts/profiles/_implementation-executor.md CHANGED
@@ -24,12 +24,13 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
24
24
  - **Language-agnostic principles that ALWAYS bind (the TDD loop below MUST satisfy them):** (1) no self-mocking of the SUT — stub/spy only injected collaborators, never the subject's own methods; (2) behavioral assertions on outcomes (return value, state, persisted rows, events, boundary calls) — never `toHaveBeenCalled*` on an internal helper as the only/primary assertion; (3) truthful names — a `get*` / `find*` that writes/inserts, or a name encoding the caller's use-case (`*ForInit`) or hiding a domain rule (`findValid*`), is a defect; (4) single-purpose functions ≤50 effective lines, plain-English readability.
25
25
  - **Graceful degradation (codex / gemini executor runtimes, or any runtime where the `~/.claude/skills/okstra-coding-preflight/` files are absent or unreadable):** do NOT skip the gate — apply the agnostic principles above plus the project's own `CLAUDE.md` / `CONTRIBUTING` / formatter+lint config, and record `coding-conventions: skill-unavailable → applied <project rules + agnostic principles>` in the final report. Never claim a skill read that did not happen.
26
26
  - **CLI executor transcription (BLOCKING when the executor provider is `codex` or `gemini`):** the executor CLI process does NOT share the lead's context — a gate that stays in lead memory never reaches it. The lead MUST copy this entire "Coding-conventions preflight" bullet tree (file-read instructions, project review rule packs, agnostic principles, graceful degradation) verbatim into the dispatched executor prompt body. Enforcement: the CLI wrapper agents refuse an implementation-Executor dispatch whose persisted prompt lacks the literal heading `Coding-conventions preflight`, returning `<SENTINEL_PREFIX>_PREFLIGHT_MISSING` (see `agents/workers/_cli-wrapper-template.md` → Prompt Composition).
27
+ - **Non-interactive auto-execution (BLOCKING when the executor provider is `codex` or `gemini`).** A CLI executor runs head-less (`codex exec` / gemini equivalent) — there is no human at the keyboard. Skills loaded during the run (tdd, coding-preflight, and others) contain "get user approval", "state your plan to the user and wait", or "ask before proceeding" gates written for interactive sessions; in this run those gates are **already satisfied** by the upstream `implementation-planning` approval (the plan this stage executes was human-approved). The executor MUST NOT stop to request approval, MUST NOT end its turn after only producing a plan, and MUST carry the stage through end-to-end — RED → GREEN → refactor → per-step commits → `### Stage Carry Evidence`. The ONLY skill step to skip is the interactive user-approval prompt itself; every other skill rule (TDD discipline, conventions, real-IO isolation) still binds. The lead MUST transcribe this bullet verbatim into the dispatched CLI executor prompt (same reason as the preflight transcription rule above — the CLI process does not share lead context). Stopping early for approval in a head-less run is the observed empty-exit failure (exit 0, no diff): treat it as `contract-violated`.
27
28
  - **Mandatory TDD loop**: BEFORE the first `Edit` or `Write` call, the executor MUST apply a red-green-refactor loop for every code change in this run. This is required; skipping it is a `contract-violated` outcome. This governs HOW each step is executed (failing test first → minimal implementation → refactor); it does not override the approved plan's WHAT/file scope.
28
29
  - Order of operations per plan step: (1) write/extend the test that captures the step's acceptance criterion and confirm it fails for the right reason, (2) commit the failing test (`test(<scope>): ...`), (3) implement the minimum change to make it pass, (4) commit the implementation (`feat|fix(<scope>): ...`), (5) refactor without changing behaviour and commit separately if any cleanup is made (`refactor(<scope>): ...`). The failing-then-passing transition between steps (2) and (4) is the `TDD evidence` required by the final report.
29
30
  - Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
30
31
  - When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
31
32
  - **DB / IO / SQL changes require real execution — mock-only is NOT validation evidence:** when this run's diff touches DB/IO/SQL (ORM / query-builder code — sequelize / typeorm / prisma / knex / raw SQL — `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string), a mocked unit test cannot observe the SQL the query builder actually emits — a mocked suite once passed while `count({ col: 'FontFamily.fontFamily' })` threw `Unknown column` on the real DB. The executor MUST run the change against a real (or faithful-replica) datastore — the `db-test` validation step (plan `validation` db step, else `project.json.qaCommands.db-test`), targeting a **local / replica** DB — and cite its exact command + exit code in the final report's `Validation evidence`. If no real DB / `db-test` command is reachable, do NOT claim the change verified: label the DB portion `정적 분석상 …, 미검증(실행 안 함)` in the report, surface it in the routing recommendation, and never downplay the real run as "too heavy". `git push` stays forbidden (universal list); the unverified DB state is carried forward so `final-verification` cannot accept it and `release-handoff` cannot push.
32
- - **Real-IO test isolation (BLOCKING).** A test that exercises a **real** datastore, HTTP endpoint, external service, message queue, or filesystem — a live DB connection / DSN, a real `fetch` / `axios` / `http` request, an actual S3 / queue client, anything the project's normal CI test suite cannot run because that backend is absent — MUST be written under the task's qa directory `<task_root>/qa/` (the `TASK_QA_PATH` token; same directory that holds the Tier 3 conformance manifest). It MUST NOT be written into the project source test tree — `src/**`, `test/**`, `tests/**`, `**/__test__/**`, `**/__tests__/**`, `*.spec.*`, `*.test.*`, or anywhere the project's lint/test globs collect. Two reasons: (a) the project's CI / normal suite has no real DB or network, so a real-IO test placed in source silently breaks the pipeline; (b) it is an okstra verification artifact, and the artifact-home rule confines okstra outputs to `.okstra/`. **The dividing line is the IO, not the intent:** a unit test that stubs/spies only *injected collaborators* (mock — no real socket, no real DB handle) is a TDD red-green artifact and stays in source; the moment a test opens a real connection or makes a real network call it belongs in qa. A stage's real-IO requirement check is a Tier 3 conformance script under `<task_root>/qa/` (declared via the implementation-planning conformance entry) — never smuggle real IO into a `*.spec.*` in source to make it run "as a unit test". The `db-test` real-execution gate above is satisfied by the conformance/db-test path against the replica, NOT by adding a live-DB `*.spec.*` to the project suite.
33
+ - **Real-IO test isolation (BLOCKING).** A test that exercises a **real** datastore, HTTP endpoint, external service, message queue, or filesystem — a live DB connection / DSN, a real `fetch` / `axios` / `http` request, an actual S3 / queue client, anything the project's normal CI test suite cannot run because that backend is absent — MUST be written under the task's qa directory `<task_root>/qa/` (the `TASK_QA_PATH` token; same directory that holds the Tier 3 conformance manifest). It MUST NOT be written into the project source test tree — `src/**`, `test/**`, `tests/**`, `**/__test__/**`, `**/__tests__/**`, `*.spec.*`, `*.test.*`, or anywhere the project's lint/test globs collect. Two reasons: (a) the project's CI / normal suite has no real DB or network, so a real-IO test placed in source silently breaks the pipeline; (b) it is an okstra verification artifact, and the artifact-home rule confines okstra outputs to `.okstra/`. **The dividing line is the IO, not the intent:** a unit test that stubs/spies only *injected collaborators* (mock — no real socket, no real DB handle) is a TDD red-green artifact and stays in source; the moment a test opens a real connection or makes a real network call it belongs in qa. A stage's real-IO requirement check is a Tier 3 conformance script under `<task_root>/qa/` (declared via the implementation-planning conformance entry) — never smuggle real IO into a `*.spec.*` in source to make it run "as a unit test". The `db-test` real-execution gate above is satisfied by the conformance/db-test path against the replica, NOT by adding a live-DB `*.spec.*` to the project suite. **These qa artifacts stay untracked — never commit them.** `.okstra/**` is gitignored (the artifact-home rule); conformance scripts and their results are *executed* and recorded in the carry sidecar / verifier result, never written into git history. A committed `.okstra/qa` file is a stage-branch defect that leaks okstra internals into the eventual PR (see the `git add` rules below).
33
34
  - re-read the approved plan end-to-end and parse the `## 5.5 Stage Map`. Read the **Stage** injected in the launch prompt (`Stage for this implementation run`): the single stage number this run owns. The runtime already selected and reserved this stage (one run = one stage) — do NOT recompute the start stage from `consumers.jsonl`.
34
35
  - load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(this stage)` and inject them into the executor's working context as "runtime carry-in". For a `depends-on (none)` stage, no sidecar load — task-brief only.
35
36
  - this stage's `depends-on` are all already `status:done`. Its file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path are the authoritative scope.
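The carry-in rule in this hunk (load `carry/stage-<i>.json` for every dependency, nothing for a `depends-on (none)` stage) can be sketched roughly as follows. This is a minimal illustration, not okstra's implementation: `load_carry_in` and its parameters are hypothetical names, the sidecar content is treated as opaque JSON, and failing on a missing sidecar is an assumption of the sketch.

```python
import json
from pathlib import Path


def load_carry_in(carry_dir: Path, depends_on: list[int]) -> dict[int, object]:
    """Load stage-<i>.json for each dependency (hypothetical helper).

    An empty depends_on list yields no sidecars, i.e. task-brief only.
    """
    carry: dict[int, object] = {}
    for i in depends_on:
        sidecar = carry_dir / f"stage-{i}.json"
        # Assumption of this sketch: a declared dependency without a
        # sidecar is an error rather than something to skip silently.
        if not sidecar.is_file():
            raise FileNotFoundError(f"missing carry sidecar: {sidecar}")
        carry[i] = json.loads(sidecar.read_text(encoding="utf-8"))
    return carry
```

Here `carry_dir` would stand for the `runs/<plan-key>/carry` directory named in the rule.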
@@ -58,6 +59,7 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
58
59
  - read-only inspection commands: `git status`, `git diff`, `git log`, `grep`, `rg`, `find`, `cat`, `ls`, file Read tools
59
60
  - build, lint, type-check, and test commands (`npm test`, `pytest`, `go build`, `cargo test`, `bash -n`, etc.)
60
61
  - **local git operations only**: `git add`, `git commit`. Prefer small commits keyed to plan steps.
62
+ - **No okstra artifacts in commits (BLOCKING).** Never use `git add -f`. Before every `git commit`, run `git diff --cached --name-only` and confirm it contains zero `.okstra/` paths (and zero `.project-docs/` paths when the legacy symlink is present). `.okstra/**` is gitignored; force-staging it onto the stage branch is the one way these verification artifacts reach the upstream PR. Conformance/qa evidence belongs in the carry sidecar and verifier result — committing it is never correct, even when a step's instructions seem to ask for it.
61
63
  - **Commit message format (mandatory)**: every commit message MUST follow Conventional Commits — `<type>(<scope>): <subject>` for the first line, optional body separated by a blank line, optional footer. Constraints:
62
64
  - `<type>` MUST be one of: `feat` / `fix` / `perf` / `revert` / `deps` / `docs` / `refactor` / `build` / `ci` / `chore` / `test`. When the repo is `release-please`-managed, this aligns the commit with a configured changelog section.
63
65
  - `<scope>` SHOULD be the plan step identifier or the primary module touched (e.g. `feat(report-writer): ...`). Omit the parentheses only when no meaningful scope applies.
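As an illustration of the first-line format above, a small checker might look like the sketch below. It covers only the constraints visible in this hunk (the allowed `<type>` list, the optional `<scope>`, and the blank line before a body); `is_valid_commit_message` is a hypothetical name and is not part of okstra.

```python
import re

# Allowed types, copied from the rule above.
ALLOWED_TYPES = ("feat", "fix", "perf", "revert", "deps", "docs",
                 "refactor", "build", "ci", "chore", "test")

# <type>(<scope>): <subject>, with the parenthesised scope optional.
FIRST_LINE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()]+)\))?: (?P<subject>\S.*)$"
)


def is_valid_commit_message(message: str) -> bool:
    """Check the first line and the blank-line separator before a body."""
    lines = message.split("\n")
    if not FIRST_LINE.match(lines[0]):
        return False
    # An optional body must be separated from the first line by a blank line.
    return len(lines) == 1 or lines[1] == ""
```

For example, `feat(report-writer): add index anchors` passes, while `feature(x): ...` fails because `feature` is not in the allowed list.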
@@ -96,6 +96,7 @@ Re-running commands proves the diff *builds and passes*; it does NOT prove the d
96
96
  - **Tautological delegation assertion:** a test asserts the SUT result equals a direct call to the same pure helper/collaborator that the SUT delegates to, instead of asserting an independent literal value or observable state.
97
97
  - **Untruthful name:** a read-named function (`get*` / `find*` / `load*`) that writes/inserts/mutates; an adapter or repository name encoding the caller's use-case (`*ForInit`) or hiding a domain rule (`findValid*` / `findActive*`).
98
98
  - **Hexagonal (only when the overlay is loaded):** business logic inside a port body; an adapter method that is not pure I/O (post-fetch JS filtering on domain state, domain-rule evaluation); a domain object declared outside the `domain/` boundary.
99
+ - **okstra artifact committed to the branch:** any path in the `git diff --name-only <base>...HEAD` enumeration that lives under `.okstra/` (or `.project-docs/` when the legacy symlink is present). `.okstra/**` is gitignored, so a committed okstra file means the executor force-staged it (`git add -f`) — leaking verification artifacts (qa scripts, conformance results) into the eventual PR. Cite the path; recommend `git rm --cached <path>` to untrack it while keeping the file on disk. Conformance/qa evidence belongs in the carry sidecar / verifier result, never in git history.
99
100
  - **Real-IO test in source tree:** a changed/added test under the project source test tree — `src/**`, `test/**`, `tests/**`, `**/__test__/**`, `**/__tests__/**`, `*.spec.*`, `*.test.*` — that opens a **real** DB connection / DSN, makes a real `fetch` / `axios` / `http` request, or otherwise hits real external IO without mocking the injected collaborator (a live handle, not a stub/spy). Real-IO tests MUST live under `<task_root>/qa/` per the executor's *Real-IO test isolation* rule — a live-IO test in source silently breaks the project's CI suite and violates the artifact-home rule. Cite the test file + the real-IO line; recommend moving it to `<task_root>/qa/` (or declaring it as a Tier 3 conformance script). Mock-only unit tests in source are NOT a hit.
100
101
  - **Advisory findings (recorded as recommendations; verdict MAY still PASS):** function >50 effective lines, a single body mixing read+write stages, weak readability, a missing-but-non-critical outcome assertion, newly orphaned private/public code that is safe to remove but not on a critical path, or weak-but-not-misleading names. These land in the verifier result as `should-fix` / `nit` recommendations, not as a `FAIL`.
101
102
  - **Output.** Every finding — blocking or advisory — is a structured item in the verifier's worker result (`path:line`, rule, severity, suggested fix) so it carries into Phase 5.5 convergence and the final report. A blocking hit sets the verifier verdict to `FAIL` with the rule cited, using the same verdict machinery as the Discrepancy rule above. `Claude lead` MUST NOT silently downgrade a cited blocking finding to advisory during synthesis; an override requires a concrete cited reason, exactly as for the Discrepancy rule.
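The okstra-artifact check described in this hunk reduces to filtering a list of changed paths. A minimal sketch, assuming the paths are repo-relative as printed by `git diff --name-only <base>...HEAD` (or `git diff --cached --name-only` on the executor side); `okstra_artifact_paths` and `legacy_symlink` are hypothetical names:

```python
from pathlib import PurePosixPath


def okstra_artifact_paths(changed: list[str],
                          legacy_symlink: bool = False) -> list[str]:
    """Return the changed paths that live under an okstra artifact root."""
    roots = {".okstra"}
    if legacy_symlink:
        # Only relevant when the legacy .project-docs symlink is present.
        roots.add(".project-docs")
    hits = []
    for path in changed:
        parts = PurePosixPath(path).parts
        # Compare the first path component so `.okstra-notes/` is not a hit.
        if parts and parts[0] in roots:
            hits.append(path)
    return hits
```

A non-empty result would correspond to the blocking finding; each hit is a candidate for `git rm --cached <path>`.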
@@ -33,7 +33,7 @@
33
33
  - `Consensus` cells in `## 5.9 Improvement Candidates` use the table enum exactly: `full`, `partial`, `contested`, `worker-unique`. Map convergence's `full-consensus` / `partial-consensus` labels to `full` / `partial` before writing the table.
34
34
  - `## 7. Final Verdict` Verdict Token ∈ {`candidates-ready`, `no-candidates`, `blocked`}; Direction `routing`; Next Step "사용자에게 후보 K개 선택 의뢰 (## 5.9 표 참조)"
35
35
  - `## 3. Recommended Next Steps` first entry summarises per-candidate routing and proposes new task-key names of the form `<task-group>/imp-<Cand-ID>`
36
- - this report is authored free-form (improvement-discovery is not in the data.json schema enum); after the markdown is written, the report-writer runs `scripts/okstra-inject-report-index.py <report.md> --report-language <en|ko>` to add the top-of-report Index + `I-NNN`/`C-NNN` scroll anchors. The run validator fails the report when the Index anchor is missing.
36
+ - this report is authored free-form (improvement-discovery is not in the data.json schema enum); after the markdown is written, the report-writer runs `okstra inject-report-index <report.md> --report-language <en|ko>` to add the top-of-report Index + `I-NNN`/`C-NNN` scroll anchors. The run validator fails the report when the Index anchor is missing.
37
37
  - Clarification request policy (phase-specific addenda — shared policy is in `_common-contract.md`):
38
38
  - if scan-scope or priority-lenses cannot be made concrete during Phase 1.5, end the run with Verdict Token `blocked`, populate `## 1. Clarification Items` with `Blocks=next-phase` rows, and do not run worker dispatch
39
39
  - every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell
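The `Consensus` cell rule in this hunk (map convergence labels onto the table enum before writing) could be sketched as below. The enum values and the two mapped labels come from the rule itself; `consensus_cell` is a hypothetical helper, and rejecting unknown labels is an assumption of the sketch.

```python
# Table enum and label mapping, as stated in the rule above.
TABLE_ENUM = {"full", "partial", "contested", "worker-unique"}
LABEL_MAP = {"full-consensus": "full", "partial-consensus": "partial"}


def consensus_cell(label: str) -> str:
    """Normalise a convergence label to a table enum value."""
    value = LABEL_MAP.get(label, label)
    if value not in TABLE_ENUM:
        # Assumption: an unknown label is an error, not a pass-through.
        raise ValueError(f"not a Consensus enum value: {label!r}")
    return value
```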
@@ -168,7 +168,7 @@
168
168
  "echo_template": "approved-plan: {value}"
169
169
  },
170
170
  "approve_plan_confirm": {
171
- "label": "이 플랜으로 implementation 을 진행할까요?\n {path}\n· 예 — 진행합니다. 플랜이 아직 승인 전이면 지금 data.json(정본) + 리포트를 함께 approved 로 처리한 뒤 진행합니다. (markdown 만 손으로 고치면 일관성 검증에서 거부되므로 이 경로로 승인하세요.)\n· 아니오 — 진행하지 않습니다.",
171
+ "label": "이 플랜으로 진행할까요?\n {path}\n· 예 — 진행합니다. 플랜이 아직 승인 전이면 지금 data.json(정본) + 리포트를 함께 approved 로 처리한 뒤 진행합니다. (markdown 만 손으로 고치면 일관성 검증에서 거부되므로 이 경로로 승인하세요.)\n· 아니오 — 진행하지 않습니다.",
172
172
  "echo_template": "approve-plan: {value}",
173
173
  "options": {
174
174
  "yes": "예 — 승인하고 진행",
@@ -179,16 +179,19 @@
179
179
  "approved": "approved-plan: {path} (승인·진행 확인됨)"
180
180
  },
181
181
  "errors": {
182
- "declined": "진행을 선택하지 않으면 implementation 시작할 수 없습니다. 진행(예)하거나 위저드를 종료하세요.",
182
+ "declined": "진행을 선택하지 않으면 다음 단계로 넘어갈 수 없습니다. 진행(예)하거나 위저드를 종료하세요.",
183
183
  "still_unapproved": "approve-plan: 승인 처리 후에도 승인 상태가 아닙니다 (data.json/markdown 불일치): {path}"
184
184
  }
185
185
  },
186
186
  "stage_pick": {
187
187
  "label": "stage 범위를 선택하세요. auto 는 전체 task(모든 stage)를, 특정 번호는 해당 stage 만 대상으로 합니다.",
188
+ "label_final_verification": "검증할 implementation stage 를 선택하세요.",
188
189
  "echo_template": "stage: {value}",
189
190
  "options": {
190
191
  "auto": "auto (다음 미완료 stage)",
191
- "auto_final_verification": "auto (전체 task 모든 stage 머지 후 한 번)"
192
+ "whole_task": "전체 task 검증 (모든 stage)",
193
+ "done_mark": "[done]",
194
+ "undone_mark": "[미완]"
192
195
  }
193
196
  },
194
197
  "directive_pick": {
@@ -367,6 +370,7 @@
367
370
  },
368
371
  "confirmation": {
369
372
  "header": "선택 확인:",
370
- "workers_implementation_default": " workers : (프로필 기본 — executor + verifier 2 + report-writer)"
373
+ "workers_implementation_default": " workers : (프로필 기본 — executor + verifier 2 + report-writer)",
374
+ "stage_whole_task": "전체 task"
371
375
  }
372
376
  }
@@ -258,6 +258,23 @@ def apply_qa_waiver(manifest: object, stage_key: str, reason: str, *, at: str,
258
258
  return False
259
259
 
260
260
 
261
+ def clear_qa_waiver(manifest: object, stage_key: str) -> bool:
262
+ """Remove the `waiver` from the stage_key entry (in place). Return True if it was removed.
263
+
264
+ When a new run of a stage starts, a waiver left on that stage's entry by an
265
+ earlier run (e.g. an all-gate run that pre-waived future stages) is stale.
266
+ Left in place, it makes the verifier skip conformance and masks the stage,
267
+ so remove it and let this run actually verify. When the user explicitly
268
+ waives the same stage for this run (--qa-waiver), the caller filters that case out and keeps it."""
269
+ entries = manifest.get("entries") if isinstance(manifest, dict) else None
270
+ if not isinstance(entries, list):
271
+ return False
272
+ for entry in entries:
273
+ if isinstance(entry, dict) and entry.get("stageKey") == stage_key:
274
+ return entry.pop("waiver", None) is not None
275
+ return False
276
+
277
+
261
278
  def manifest_required_surfaces(manifest: object) -> set[str]:
262
279
  """Union of `requires` across every manifest entry: the set of declared surfaces."""
263
280
  entries = manifest.get("entries") if isinstance(manifest, dict) else None
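A short usage sketch for `clear_qa_waiver` from the hunk above. The function body is copied from the diff so the example runs on its own; the manifest contents are hypothetical, and only the `entries`, `stageKey`, and `waiver` keys are taken from the source.

```python
def clear_qa_waiver(manifest: object, stage_key: str) -> bool:
    """Copied from the hunk above so this example is self-contained."""
    entries = manifest.get("entries") if isinstance(manifest, dict) else None
    if not isinstance(entries, list):
        return False
    for entry in entries:
        if isinstance(entry, dict) and entry.get("stageKey") == stage_key:
            return entry.pop("waiver", None) is not None
    return False


# Hypothetical manifest; the shape of the waiver value is an assumption.
manifest = {
    "entries": [
        {"stageKey": "demo-stage-1", "waiver": {"reason": "pre-waived"}},
        {"stageKey": "demo-stage-2"},
    ]
}

first = clear_qa_waiver(manifest, "demo-stage-1")   # waiver present: removed
second = clear_qa_waiver(manifest, "demo-stage-1")  # nothing left to remove
third = clear_qa_waiver(manifest, "demo-stage-9")   # no matching entry
```

Because the return value reports whether anything changed, a caller can skip rewriting the manifest file when the result is `False`.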
@@ -124,14 +124,17 @@ def compute_run_paths(
124
124
  timeline_file = history_dir / "timeline.json"
125
125
 
126
126
  run_dir = runs_dir / task_type_segment
127
- # implementation stage isolation: each stage's run artifacts live in a
128
- # dedicated `stage-<N>` subtree (mirrors the per-stage worktree) so two
129
- # concurrent `implementation` runs never share reports/state/worker-results.
127
+ # Stage isolation: each stage's run artifacts live in a dedicated
128
+ # `stage-<N>` subtree (mirrors the per-stage worktree) so two concurrent
129
+ # runs of the same task-key never share reports/state/worker-results.
130
+ # Applies to `implementation` and single-stage `final-verification`
131
+ # (whole-task final-verification has stage=None and stays flat).
130
132
  # consumers.jsonl + the worktree registry stay at the task-type level (the
131
133
  # shared stage ledger / occupancy SSOT); they are computed OUTSIDE this
132
134
  # function and are intentionally NOT stage-scoped. Other task-types have no
133
135
  # stage concept, so their run_dir is unchanged.
134
- if task_type_segment == "implementation" and stage is not None:
136
+ if (task_type_segment in ("implementation", "final-verification")
137
+ and stage is not None):
135
138
  run_dir = run_dir / f"stage-{int(stage)}"
136
139
  run_manifests = run_dir / "manifests"
137
140
  run_state = run_dir / "state"
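The effect of the changed condition on the run directory can be seen in isolation with a small sketch. `stage_run_dir` is a hypothetical stand-in for the relevant lines of `compute_run_paths`; only the condition and the `stage-<N>` naming are taken from the diff.

```python
from pathlib import Path

# Task types whose runs are namespaced per stage after this change.
STAGE_SCOPED = ("implementation", "final-verification")


def stage_run_dir(runs_dir: Path, task_type_segment: str, stage) -> Path:
    """Return the run directory, adding stage-<N> only where it applies."""
    run_dir = runs_dir / task_type_segment
    if task_type_segment in STAGE_SCOPED and stage is not None:
        run_dir = run_dir / f"stage-{int(stage)}"
    return run_dir
```

Whole-task `final-verification` passes `stage=None`, so its directory stays flat, and task types outside `STAGE_SCOPED` ignore the stage entirely.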
@@ -1681,11 +1681,18 @@ def inject_lead_prompt_computed_tokens(ctx: dict) -> None:
1681
1681
  )
1682
1682
  else:
1683
1683
  team_name = f'okstra-{ctx.get("TASK_KEY", "")}'
1684
- stage = str(ctx.get("EFFECTIVE_STAGES", "") or "").strip()
1685
- if task_type == "implementation" and stage:
1684
+ impl_stage = str(ctx.get("EFFECTIVE_STAGES", "") or "").strip()
1685
+ fv_stage = str(ctx.get("RUN_STAGE", "") or "").strip()
1686
+ if task_type == "implementation" and impl_stage:
1686
1687
  # A stage-isolated run gets a per-stage team, so its name never collides
1687
1688
  # with a team left by another stage of the same task (mirrors the worktree branch `-s<N>` suffix).
1688
- team_name = f"{team_name}-s{stage}"
1689
+ team_name = f"{team_name}-s{impl_stage}"
1690
+ elif task_type == "final-verification" and fv_stage:
1691
+ # Single-stage final-verification also gets a per-stage team. The `-fv-`
1692
+ # infix also distinguishes it from the same stage's implementation team
1693
+ # (`-s<N>`), since the two can be alive at once. Whole-task verification
1694
+ # (empty RUN_STAGE) keeps the default name.
1695
+ team_name = f"{team_name}-fv-s{fv_stage}"
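The naming scheme in this branch can be summarised with a small standalone sketch. `team_name_for` is a hypothetical function covering only the `else:` branch shown in the hunk; the `impl_stage` and `fv_stage` arguments stand for the stripped `EFFECTIVE_STAGES` and `RUN_STAGE` context values.

```python
def team_name_for(task_key: str, task_type: str,
                  impl_stage: str = "", fv_stage: str = "") -> str:
    """Derive the team name for the branch shown in the diff."""
    name = f"okstra-{task_key}"
    if task_type == "implementation" and impl_stage:
        return f"{name}-s{impl_stage}"
    if task_type == "final-verification" and fv_stage:
        # The -fv- infix keeps this apart from the same stage's
        # implementation team.
        return f"{name}-fv-s{fv_stage}"
    return name
```

With a task key of `demo`, stage 2 yields `okstra-demo-s2` for implementation and `okstra-demo-fv-s2` for single-stage final-verification, so the two teams can coexist.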
1689
1696
  team_creation_gate_block = (
1690
1697
  "## Team Creation Gate (BLOCKING)\n"
1691
1698
  "\n"
@@ -81,7 +81,11 @@ from .workers import (
81
81
  )
82
82
  from .workflow import compute_workflow_state
83
83
  from .locks import worktree_provision_mutex
84
- from .worktree import provision_task_worktree
84
+ from .worktree import (
85
+ WorktreeProvision,
86
+ okstra_clean_gate_excludes,
87
+ provision_task_worktree,
88
+ )
85
89
 
86
90
  # Frontmatter approval-flag matcher.
87
91
  #
@@ -1151,6 +1155,46 @@ def _apply_qa_waiver_if_requested(inp: "PrepareInputs", project_root: Path) -> N
1151
1155
  manifest_path.write_text(json.dumps(manifest, indent=2, ensure_ascii=False) + "\n")
1152
1156
 
1153
1157
 
1158
+ def _clear_stale_stage_waiver(inp: "PrepareInputs", project_root: Path, stage: int) -> None:
1159
+ """A fresh `implementation` run of stage N must not inherit a waiver left on
1160
+ its conformance entry by an earlier run (e.g. an all-gate run that pre-waived
1161
+ future stages, or an abandoned attempt). A stale waiver makes the verifier
1162
+ skip Tier 3 conformance and silently mask this stage, so clear it — UNLESS
1163
+ the user re-waived this exact stage for this run via `--qa-waiver` (already
1164
+ applied upstream in `_apply_qa_waiver_if_requested`)."""
1165
+ from .conformance import clear_qa_waiver, parse_qa_waiver_arg
1166
+ from .paths import task_dir
1167
+ manifest_path = (
1168
+ task_dir(project_root, inp.task_group, inp.task_id)
1169
+ / "qa" / "conformance-manifest.json"
1170
+ )
1171
+ if not manifest_path.is_file():
1172
+ return
1173
+ manifest = json.loads(manifest_path.read_text())
1174
+ entries = manifest.get("entries") if isinstance(manifest, dict) else None
1175
+ if not isinstance(entries, list):
1176
+ return
1177
+ # The manifest stageKey is `<task-id>-stage-<N>` authored by planning; match
1178
+ # on the `-stage-<N>` suffix so we do not assume the task-id's exact form.
1179
+ suffix = f"-stage-{stage}"
1180
+ stage_key = next(
1181
+ (e["stageKey"] for e in entries
1182
+ if isinstance(e, dict) and isinstance(e.get("stageKey"), str)
1183
+ and e["stageKey"].endswith(suffix)),
1184
+ None,
1185
+ )
1186
+ if stage_key is None:
1187
+ return
1188
+ if inp.qa_waiver:
1189
+ parsed = parse_qa_waiver_arg(inp.qa_waiver)
1190
+ if parsed is not None and parsed[0] == stage_key:
1191
+ return # user intentionally waived this stage for this run
1192
+ if clear_qa_waiver(manifest, stage_key):
1193
+ manifest_path.write_text(
1194
+ json.dumps(manifest, indent=2, ensure_ascii=False) + "\n"
1195
+ )
1196
+
1197
+
1154
1198
  def _register_and_check_project(project_root: Path, inp: PrepareInputs) -> None:
1155
1199
  """project.json self-registration + (implementation only) qaCommands gate validation."""
1156
1200
  from okstra_project import ResolverError
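The suffix match used by `_clear_stale_stage_waiver` is worth checking against multi-digit stages. The sketch below lifts the `next(...)` expression into a hypothetical `find_stage_key` helper; the entry values are made up for illustration.

```python
def find_stage_key(entries: list, stage: int):
    """Return the first stageKey ending in -stage-<N>, or None."""
    suffix = f"-stage-{stage}"
    return next(
        (e["stageKey"] for e in entries
         if isinstance(e, dict) and isinstance(e.get("stageKey"), str)
         and e["stageKey"].endswith(suffix)),
        None,
    )


# Hypothetical entries, including a non-dict item that must be skipped.
entries = [
    {"stageKey": "task-a-stage-11"},
    {"stageKey": "task-a-stage-1"},
    "junk",
]
```

Because the suffix includes the leading `-stage-`, stage 1 does not match `task-a-stage-11`: that key ends in `stage-11`, not `-stage-1`.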
@@ -1493,10 +1537,22 @@ def _is_ancestor(cwd, commit, head) -> bool:
1493
1537
 
1494
1538
 
1495
1539
  def _is_dirty_excluding_okstra(cwd) -> bool:
1496
- out = _git_out(cwd, "status", "--short", "--", ".", ":(exclude).okstra")
1540
+ excludes = [f":(exclude){p}" for p in okstra_clean_gate_excludes(Path(cwd))]
1541
+ out = _git_out(cwd, "status", "--short", "--", ".", *excludes)
1497
1542
  return bool(out.strip())
1498
1543
 
1499
1544
 
1545
+ def _single_stage_final_verification_worktree(inp: "PrepareInputs") -> WorktreeProvision:
1546
+ """Placeholder until the selected stage registry row is resolved."""
1547
+ return WorktreeProvision(
1548
+ status="deferred-final-verification",
1549
+ note=(
1550
+ "final-verification single-stage uses the selected implementation "
1551
+ "stage worktree from the registry"
1552
+ ),
1553
+ )
1554
+
1555
+
1500
1556
  def _reserve_final_verification_target(
1501
1557
  inp: "PrepareInputs", ctx: dict, ctx_stage_map: list,
1502
1558
  ) -> None:
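The change to `_is_dirty_excluding_okstra` builds one `:(exclude)` pathspec per excluded path instead of hard-coding `.okstra`. A minimal sketch of the argument construction, where `status_args` is a hypothetical helper and the list passed in stands for whatever `okstra_clean_gate_excludes` returns (its exact return value is not shown in this diff):

```python
def status_args(exclude_paths: list[str]) -> list[str]:
    """Build `git status` arguments that ignore the given paths."""
    # `:(exclude)<path>` is git's pathspec magic for excluding a path.
    excludes = [f":(exclude){p}" for p in exclude_paths]
    return ["status", "--short", "--", ".", *excludes]
```

With `[".okstra"]` this reproduces the arguments of the removed line; additional entries simply append further exclude pathspecs.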
@@ -1519,6 +1575,7 @@ def _reserve_final_verification_target(
1519
1575
  row = _reg.get_stage_row(inp.project_id, inp.task_group, inp.task_id, n)
1520
1576
  wt_path = (row or {}).get("worktree_path", "")
1521
1577
  stage_base = (row or {}).get("base_ref", "")
1578
+ stage_branch = (row or {}).get("branch", "")
1522
1579
  head = _git_out(wt_path, "rev-parse", "HEAD") if wt_path else ""
1523
1580
  target = _resolve_single_stage_target(
1524
1581
  requested_stage=inp.stage, done_rows=done_rows,
@@ -1527,6 +1584,12 @@ def _reserve_final_verification_target(
1527
1584
  stage_dirty=_is_dirty_excluding_okstra(wt_path) if wt_path else False,
1528
1585
  )
1529
1586
  ctx["EXECUTOR_WORKTREE_PATH"] = wt_path
1587
+ ctx["EXECUTOR_WORKTREE_BRANCH"] = stage_branch
1588
+ ctx["EXECUTOR_WORKTREE_BASE_REF"] = stage_base
1589
+ ctx["EXECUTOR_WORKTREE_STATUS"] = "reused-stage"
1590
+ ctx["EXECUTOR_WORKTREE_NOTE"] = (
1591
+ f"final-verification uses implementation stage {n} worktree"
1592
+ )
1530
1593
  else:
1531
1594
  wt_path = ctx["EXECUTOR_WORKTREE_PATH"]
1532
1595
  anchor = _reg.get_implementation_base(
@@ -1803,38 +1866,52 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
1803
1866
  with worktree_provision_mutex(
1804
1867
  okstra_home(), inp.project_id, task_group_segment, task_id_segment,
1805
1868
  ):
1806
- try:
1807
- worktree = provision_task_worktree(
1808
- task_type=inp.task_type,
1809
- project_root=project_root,
1810
- project_id=inp.project_id,
1811
- task_group_segment=task_group_segment,
1812
- task_id_segment=task_id_segment,
1813
- work_category=inp.work_category,
1814
- base_ref=inp.base_ref,
1815
- require_base_ref=True,
1816
- )
1817
- except RuntimeError as exc:
1818
- raise PrepareError(
1819
- f"task worktree provisioning failed: {exc}"
1820
- ) from exc
1869
+ if inp.task_type == "final-verification" and inp.stage and inp.stage != "auto":
1870
+ worktree = _single_stage_final_verification_worktree(inp)
1871
+ # Single-stage final-verification namespaces its run path under
1872
+ # runs/final-verification/stage-<N> (same isolation as
1873
+ # implementation) so concurrent per-stage verifications never
1874
+ # share state/reports/worker-results.
1875
+ fv_stage_arg = int(inp.stage)
1876
+ else:
1877
+ fv_stage_arg = None
1878
+ try:
1879
+ worktree = provision_task_worktree(
1880
+ task_type=inp.task_type,
1881
+ project_root=project_root,
1882
+ project_id=inp.project_id,
1883
+ task_group_segment=task_group_segment,
1884
+ task_id_segment=task_id_segment,
1885
+ work_category=inp.work_category,
1886
+ base_ref=inp.base_ref,
1887
+ require_base_ref=True,
1888
+ )
1889
+ except RuntimeError as exc:
1890
+ raise PrepareError(
1891
+ f"task worktree provisioning failed: {exc}"
1892
+ ) from exc
1821
1893
 
1822
1894
  # ---- implementation stage selection (path-independent) ----
1823
1895
  # Resolve + provision the stage BEFORE run-path compute so RUN_DIR
1824
1896
  # lands in runs/implementation/stage-<N>. The registry stage-key is
1825
1897
  # reserved exactly once here (inside provision_stage_worktree), and
1826
1898
  # the surrounding mutex makes the registry read in stage selection
1827
- # and that reserve atomic. Non-implementation task-types skip this
1828
- # entirely: stage_arg stays None, so paths stay identical.
1899
+ # and that reserve atomic. Other task-types skip this selection;
1900
+ # single-stage final-verification threads its explicit stage via
1901
+ # fv_stage_arg, everything else keeps stage_arg=None (flat paths).
1829
1902
  if inp.task_type == "implementation":
1830
1903
  impl_stage_selection = _select_and_provision_implementation_stage(
1831
1904
  inp, ctx_stage_map, task_group_segment, task_id_segment,
1832
1905
  task_key, worktree.status,
1833
1906
  )
1834
1907
  stage_arg = impl_stage_selection.stage
1908
+ # Drop any stale waiver on this stage so the run actually verifies
1909
+ # conformance (kept inside the per-task-key mutex so concurrent
1910
+ # same-task runs don't race the manifest write).
1911
+ _clear_stale_stage_waiver(inp, project_root, impl_stage_selection.stage)
1835
1912
  else:
1836
1913
  impl_stage_selection = None
1837
- stage_arg = None
1914
+ stage_arg = fv_stage_arg
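How `fv_stage_arg` is derived from the requested stage can be sketched on its own. `resolve_fv_stage` is a hypothetical helper that mirrors the condition in the hunk above; it does not cover the `implementation` path, where the stage comes from the stage selection instead.

```python
def resolve_fv_stage(task_type: str, stage):
    """Return the explicit stage for single-stage final-verification."""
    if task_type == "final-verification" and stage and stage != "auto":
        return int(stage)
    # Whole-task verification ("auto" or no stage) and other task types
    # keep stage=None, which leaves their run paths flat.
    return None
```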
1838
1915
 
1839
1916
  ctx = compute_and_write_run_context(
1840
1917
  workspace_root=workspace_root, project_root=project_root,