npm - okstra - Versions diffs - 0.45.1 → 0.47.0 - Mend

okstra 0.45.1 → 0.47.0

Files changed (21)

package/docs/superpowers/specs/2026-06-04-stage-splitting-cost-aware-design.md ADDED

@@ -0,0 +1,98 @@
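+# Cost-aware redesign of the stage-splitting criteria: design
+- Date written: 2026-06-04
+- Scope: make the stage-splitting criteria of `implementation-planning` and the run execution unit of `implementation` **cost-aware**. A stage remains the unit of planning, PR, and verification evidence, but the run is split out as a separate unit so that the fixed cost of the cross-verification machinery (verifier, convergence, report, teardown) is not multiplied by the number of stages.
+- Out of scope
+  - No new task-type is added. The output structures of `requirements-discovery` / `error-analysis` / `final-verification` / `release-handoff` are unchanged.
+  - The sidecar carry-in JSON schema (`runs/<impl-key>/carry/stage-<N>.json`) itself is unchanged; sidecars are still emitted per stage.
+  - Multilingual support / i18n.
+- Relationship: this document replaces **§2.3 "Maximise parallelisation first (first-class partition criterion)" (lines 36–44)** of [`2026-05-20-implementation-planning-multi-stage-design.md`](2026-05-20-implementation-planning-multi-stage-design.md). The `step ≤6 cap` (lines 46–50), the carry-in model (§2.4), and the data model (§3) of that document remain as they are. During implementation, §2.3 of the old spec is rewritten to match this decision.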
+## 1. Motivation: a one-line fix swallows an entire set of fixed costs
+Measured case: a one-line fix in `fontradar-v2-api` (one `await` removed, +1/−3) was split into 3 stages, and Stage 1 alone carried a full cross-verification set. The cause is not the code but a structural flaw in the splitting criteria.
+1. **Fixed cost is per run.** One `implementation` run always attaches 1 executor + 2 verifiers (claude + codex) + a report-writer ([`prompts/profiles/implementation.md:4`](../../../prompts/profiles/implementation.md)). Phase 5.5 convergence and Phase 7 teardown come on top. This cost is fixed, independent of the size of the change.
+2. **That fixed cost is multiplied by the stage count.** `1 run = exactly 1 stage` is currently enforced ([`prompts/profiles/_implementation-executor.md:41`](../../../prompts/profiles/_implementation-executor.md)). So stage count = fixed-cost multiplier.
+3. **The splitting criteria have no lower bound.** The old spec's only criteria are an upper bound (≤6 steps per stage) and "maximise parallelisable stages" (first-class) ([`2026-05-20-...-design.md:36`](2026-05-20-implementation-planning-multi-stage-design.md), [`prompts/profiles/implementation-planning.md:68`](../../../prompts/profiles/implementation-planning.md)). There is no "merge if too small" rule.
+4. **Internal contradiction in the old spec.** §2.2 says "steps within one stage are mutually independent → independent work goes in the same stage" ([line 31](2026-05-20-implementation-planning-multi-stage-design.md)), while §2.3 says "split so that several independent stages emerge" ([line 42](2026-05-20-implementation-planning-multi-stage-design.md)). Whether to group or split independent work reads in opposite directions.
+5. **Faulty analogy.** §2.3 described the step ≤6 cap as "in the same spirit as a 50-line function cap" ([line 48](2026-05-20-implementation-planning-multi-stage-design.md)). Splitting a function is free, but splitting an okstra stage buys a new multi-agent run set. The rule was transplanted across different cost structures.
+## 2. Core principles
+### 2.1 Separating stage and run
+| Unit | Meaning | Boundary criterion |
+|---|---|---|
+| **stage** | Unit of planning, PR, and verification evidence (sidecar) | `depends-on` dependency + effective step ≤6 cap |
+| **run** | Unit of the cross-verification machinery (verifier, convergence, report, teardown) = one set of fixed costs | ready-set + run step budget |
+This breaks the old spec's `1 run = 1 stage` equation. One run may own several stages as a batch.
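+### 2.2 Planner: cohesion anchor = file/module proximity, limit = ≤6 cap
+The **anchor for grouping is shared file/module proximity**. The rule is not "anything independent and of suitable size may be grouped"; it is **group work that touches the same file, directory, or module.** Only then are the diff, PR, and rollback units semantically cohesive.
+- **Default: work touching the same file/module is grouped into the same stage.** This adopts §2.2 (independent = same stage) and discards the old §2.3 (split for parallelisation), resolving the contradiction.
+- Conditions for **splitting** a stage (any one of three):
+  - (a) a real `depends-on` dependency exists (one step's output is consumed by another step), or
+  - (b) the combined effective step count exceeds 6, or
+  - (c) **the touched file sets are disjoint** (independent work with no shared file/module is not forced into one stage).
+- **Parallelisation is not a reason to split.** If several stages produced by the cap happen to be `depends-on (none)`, two runs can pick them up and proceed in parallel. That is a side effect, not grounds for creating more stages.
+- The `step ≤6 cap` (old §2.3, lines 46–50) is kept as is.
+**Parallel-safety invariant (the enforcement teeth of the file-proximity criterion):** the predicted file sets in the `Stage Exit Contract` of any two stages that are both `depends-on (none)` **must be disjoint.** Otherwise two parallel runs edit the same file concurrently and conflict. Two pieces of work touching the same file must either (i) be grouped into the same stage or (ii) be ordered with `depends-on`.
+Effects:
+- Independent bugs touching the same file/module → merged into 1 stage (within the combined ≤6 limit).
+- Independent bugs touching different modules → separate stages (cohesion preserved). The fixed cost is absorbed by the run batch of §2.3, so cost is not multiplied as the stage count grows.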
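+### 2.3 Executor: "ready-set + run step budget batch"
+- At dispatch time, one run absorbs stages whose `depends-on` are all `status:done` and that are unclaimed, **until the combined effective step count reaches the run budget**.
+- **Run step budget = 8** (larger than the stage cap of 6). A conservative value that lets a one-line leftover stage be absorbed next to a larger stage while keeping stage boundaries and verification precision nearly intact.
+- The single start-stage selection ([`_implementation-executor.md:30`](../../../prompts/profiles/_implementation-executor.md)) is extended to ready-set selection. Selection algorithm:
+  1. Ready set = stages whose `depends-on` are all done and that have no `started`/`done` row in `consumers.jsonl`.
+  2. Sort the ready set by ascending stage number.
+  3. Fill from the front while the cumulative effective step count does not exceed 8. If a single stage exceeds 8 (impossible given the cap of 6), take that stage alone.
+  4. At least one stage is guaranteed (an empty batch is forbidden).
+- Claiming: the `consumers.jsonl` reverse-link and `~/.okstra/worktrees/registry.json` atomically (flock) reserve the **set of stage-keys** per run. A `started` row is appended for every stage in the batch at once; on completion, a `done` row + `carry_path` is appended for each stage.
+- **Cohesion already lives at the stage level.** The run batch is purely a cost mechanism: the cohesion anchor (file proximity) was already applied in §2.2 when stages were formed, and the PR is divided into stage sections, so even when a batch combines stages from different modules, review and rollback cohesion is preserved at stage boundaries. One run executes its stages serially in stage order in a single worktree, so there are no conflicts inside a batch. Thanks to the parallel-safety invariant of §2.2, stages picked up by different runs do not overlap in files either.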
+### 2.4 Verification and deliverable units
+- verifier, convergence, report-writer, and Phase 7 run **once per run over the batch's entire diff.** ← the core of the actual savings.
+- For each stage, when `Stage Validation post` passes, the executor **emits the `carry/stage-N.json` sidecar per stage** ([`_implementation-executor.md:43`](../../../prompts/profiles/_implementation-executor.md)). The carry-in contract is preserved. That is, within a batch the stages still execute in order, with post validation + sidecar emission at each stage boundary, and a single run-level cross-verification at the end.
+- **One PR per run.** Title `Stages <X>–<Y>: <run summary>`, with the body divided into stage sections plus Previous/Next run links. one-PR-per-stage ([`_implementation-executor.md:45`](../../../prompts/profiles/_implementation-executor.md)) is switched to one-PR-per-run.
+## 3. Files touched
+| File | Change |
+|---|---|
+| This document | New design (replaces old §2.3) |
+| [`2026-05-20-...-design.md:36`](2026-05-20-implementation-planning-multi-stage-design.md) | Delete the §2.3 parallelisation-first rule, add a "run unit" row to the §2.1 table, add run batch and run-level verification sections, reference this document |
+| [`prompts/profiles/implementation-planning.md:68`](../../../prompts/profiles/implementation-planning.md) | "Parallelisation-first rule (1st-class)" → "cohesion anchor = file/module proximity, the cap is the only splitter, parallelisation is not a reason to split" |
+| [`prompts/profiles/implementation-planning.md:96`](../../../prompts/profiles/implementation-planning.md) | In the Stage Map self-check: (i) "re-partition if a depends-on can be removed" → "splitting for the sake of parallelisation is forbidden", (ii) add a parallel-safety invariant self-check ("predicted file sets of `depends-on (none)` stages are disjoint") |
+| [`prompts/profiles/_implementation-executor.md:30`](../../../prompts/profiles/_implementation-executor.md) | single start-stage selection → ready-set batch selection (§2.3 algorithm) |
+| [`prompts/profiles/_implementation-executor.md:41`](../../../prompts/profiles/_implementation-executor.md) | "owns exactly one stage" → "owns a ready-set batch (within the run budget of 8)" |
+| [`prompts/profiles/_implementation-executor.md:43-49`](../../../prompts/profiles/_implementation-executor.md) | Emit sidecar/consumers rows per stage within the batch; one-PR-per-stage → one-PR-per-run |
+| [`prompts/profiles/implementation.md:4`](../../../prompts/profiles/implementation.md) | (if needed) State "verification and report run once per run" in the Phase description |
+| [`validators/validate-implementation-plan-stages.py`](../../../validators/validate-implementation-plan-stages.py) | Keep the existing ≤6 cap ([line 140](../../../validators/validate-implementation-plan-stages.py)), step-count cell match ([line 143](../../../validators/validate-implementation-plan-stages.py)), and depends-on DAG ([line 149](../../../validators/validate-implementation-plan-stages.py)) checks. Add **new check S9**: extract paths from the "추가/변경된 파일 (예측)" ("added/modified files (predicted)") line of the `Stage Exit Contract` and reject when the file sets of a pair of stages that are both `depends-on (none)` overlap (parallel-safety invariant). It is prediction-based and best-effort; the backstop for real conflicts is worktree seriality + the flock registry |
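+## 4. Out of scope / future
+- The run budget of 8 starts as a constant. A user-facing flag (`--run-step-budget`) is YAGNI and is not introduced until there is a real need.
+- Explicit `--stages 1,2,3` batch selection is not introduced separately; the automatic ready-set of this design is sufficient.
+- **Parallel-run started-exclusion is out of scope.** Ready-set selection does not exclude stages that another run has `started`; two parallel runs pick the same ready-set batch (existing behaviour, documented by [`tests/test_e2e_multi_stage_q1_q9.py::test_q7`](../../../tests/test_e2e_multi_stage_q1_q9.py)). The flock on consumers.jsonl only prevents file corruption and does not guarantee batch separation. This does not affect the user's sequential phase-continuation usage, and the real conflict backstop is the file-disjointness invariant of §2.2 + worktree seriality. Introducing started-exclusion is left as separate work.
+- **Python is the SOT (source of truth) for stage selection.** `_resolve_effective_stages` ([`scripts/okstra_ctl/run.py`](../../../scripts/okstra_ctl/run.py)) selects the batch, `prepare_task_bundle` records a `started` row per stage, and the result is injected into the lead prompt via `{{STAGE_BATCH_DIRECTIVE}}`. The former dual path (drift), in which the lead computed the start stage itself from consumers.jsonl, has been removed.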
+## 5. How it is verified (enforcement)
+| Contract | Where enforced |
+|---|---|
+| effective step ≤6 per stage | [`validators/validate-implementation-plan-stages.py`](../../../validators/validate-implementation-plan-stages.py) (existing, kept) |
+| cohesion anchor = file/module proximity | planner self-check prose ([`implementation-planning.md:96`](../../../prompts/profiles/implementation-planning.md)). A qualitative criterion, so a validator cannot enforce it; the invariant below is the proxy signal for its violation |
+| parallel-safety invariant (`depends-on (none)` stage files disjoint) | new validator check S9 (best-effort, based on predicted paths). The real backstop is worktree seriality + the flock registry |
+| run batch combined step ≤8 | The executor cuts by cumulative count during ready-set selection and records the batch composition in the run start log. (Not a planner artifact, so not subject to the plan validator) |
+| sidecar emission per stage | [`_implementation-executor.md:43`](../../../prompts/profiles/_implementation-executor.md) BLOCKING rule kept |
+| run-level verification once | lead orchestration that bundles Phase 5.5/6 entry into once per run (sidecar rule) |
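
The four-step selection algorithm in §2.3 of the spec above can be sketched roughly as follows. This is an illustration only, not the package's `_resolve_effective_stages`: the `Stage` type, the `select_batch` name, and the `done` / `claimed` arguments are hypothetical, and stopping at the first stage that does not fit (rather than skipping it and trying later ones) is an assumption about how "fill from the front" is meant.

```python
from dataclasses import dataclass

RUN_STEP_BUDGET = 8  # run step budget stated in §2.3 of the spec


@dataclass(frozen=True)
class Stage:
    """Hypothetical in-memory view of one Stage Map row."""
    number: int
    steps: int                            # effective step count (spec caps it at 6)
    depends_on: frozenset = frozenset()   # stage numbers this stage depends on


def select_batch(stages, done, claimed, budget=RUN_STEP_BUDGET):
    """Return the stage numbers a single run would absorb, ascending."""
    # Step 1: ready = every dependency is done and the stage has no started/done row.
    ready = [
        s for s in stages
        if s.depends_on <= done and s.number not in done and s.number not in claimed
    ]
    # Step 2: ascending stage number.
    ready.sort(key=lambda s: s.number)

    batch, total = [], 0
    for stage in ready:
        # Step 3: stop once the next stage would push the total past the budget.
        # Step 4: the first ready stage is always taken, so the batch is never empty
        # as long as at least one stage is ready.
        if batch and total + stage.steps > budget:
            break
        batch.append(stage.number)
        total += stage.steps
    return batch
```

With stages of 6, 1, and 3 steps all ready, the run takes the first two (7 steps) and leaves the third for a later run. Passing an empty `claimed` set mirrors the §4 note that exclusion of stages `started` by another run is out of scope.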

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "okstra",
-  "version": "0.45.1",
+  "version": "0.47.0",
   "description": "Multi-agent cross-verification orchestrator runtime + Claude Code skills.",
   "license": "MIT",
   "author": "devonshin",

package/runtime/BUILD.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "package": "0.45.1",
-  "builtAt": "2026-06-04T08:12:55.290Z",
+  "package": "0.47.0",
+  "builtAt": "2026-06-04T12:46:31.759Z",
   "repoRoot": "/home/runner/work/okstra/okstra"
 }

package/runtime/agents/SKILL.md CHANGED

@@ -250,7 +250,8 @@ Convergence is enabled by default. Configure via task-manifest.json:
 - `convergence.enabled`: true/false (default: true)
 - `convergence.maxRounds`: 1–3 — **phase-aware default**: `1` for `requirements-discovery`, `2` for all other task types
-- `convergence.verificationMode`: `"lightweight"` | `"full-reanalysis"` (default: `"lightweight"`)
+- `convergence.verificationMode`: `"lightweight"` | `"full-reanalysis"` (default: `"lightweight"`; the adversarial phases below force `"full-reanalysis"`)
+- `convergence.adversarial`: true/false — **phase-aware default**: `true` for `requirements-discovery` / `error-analysis`, `false` otherwise. When `true`, Phase 5.5 runs in adversarial mode (verifiers refute findings; burden of proof on the claim). See [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Adversarial Verification Mode".
 When `task-manifest.json` does not set `convergence.maxRounds`, lead MUST resolve the effective value via the phase-aware default above before entering Phase 5.5, and record the resolved value in the convergence state artifact at `config.effectiveMaxRounds`.
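
As a rough illustration of the phase-aware defaults described in this hunk, the sketch below merges optional manifest overrides over the defaults. The function name `resolve_convergence` and its arguments are hypothetical, and letting adversarial mode override an explicit `verificationMode` is an assumption drawn from the word "force" above, not confirmed behaviour of the runtime.

```python
ADVERSARIAL_PHASES = {"requirements-discovery", "error-analysis"}


def resolve_convergence(task_type, manifest_convergence=None):
    """Merge task-manifest overrides over the phase-aware defaults."""
    overrides = manifest_convergence or {}

    # Phase-aware defaults apply only when the manifest leaves the key unset.
    adversarial = overrides.get("adversarial", task_type in ADVERSARIAL_PHASES)
    max_rounds = overrides.get(
        "maxRounds", 1 if task_type == "requirements-discovery" else 2
    )

    # Assumption: adversarial mode always wins over an explicit verificationMode.
    if adversarial:
        mode = "full-reanalysis"
    else:
        mode = overrides.get("verificationMode", "lightweight")

    return {
        "enabled": overrides.get("enabled", True),
        "adversarial": adversarial,
        "maxRounds": max_rounds,
        "verificationMode": mode,
    }
```

For example, `resolve_convergence("implementation", {"maxRounds": 3})` keeps the lightweight mode and only raises the round limit.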

package/runtime/prompts/launch.template.md CHANGED

@@ -15,6 +15,7 @@ Emit one `PROGRESS: <phase-id> <verb-phrase>` line as plain user-facing text at
 - Forbidden actions in this phase:
 {{PHASE_FORBIDDEN_ACTIONS}}
 - This run executes `{{WORKFLOW_CURRENT_PHASE}}` only. Do not start `{{WORKFLOW_NEXT_RECOMMENDED_PHASE}}` or any later phase inside this run, even if the user says "다음 단계 진행해" or similar.
+{{STAGE_BATCH_DIRECTIVE}}
 - Phase advancement requires a new okstra invocation launched with `--task-type {{WORKFLOW_NEXT_RECOMMENDED_PHASE}}` after this run's final report is written and approved. The lead must not write source code, run builds/migrations/deployments, or otherwise produce artifacts of a different phase from inside this run.
 - See `Lifecycle Phase Boundaries` in the okstra skill (`agents/SKILL.md`) for the canonical rules and the phase-transition checklist.

package/runtime/prompts/profiles/_common-contract.md CHANGED

@@ -14,7 +14,7 @@ profile document.
 - Worker interaction model (shared — read before inferring behaviour from the roster):
   - the per-profile `Required workers:` block is a **roster**, not a behaviour contract. Each role's interaction mode changes across operating phases of the same run.
   - **Phase 4 / 5 (independent analysis)**: analyser workers (`claude`, `codex`, `gemini` when opted in) produce findings independently and have no access to one another's outputs. `report-writer` does not analyse.
-  - **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`).
+  - **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`). For `requirements-discovery` and `error-analysis` this phase runs in **adversarial mode** (`convergence.adversarial=true`): verifiers try to refute each finding against its cited evidence and the burden of proof sits on the claim — see that skill's §"Adversarial Verification Mode".
   - Do NOT conclude "no peer review happens" from the roster alone — every profile that lists ≥2 analyser workers runs convergence by default (`convergence.enabled=true` in `task-manifest.json`).
 - Tooling — read-only MCP availability (shared):
   - MCP is not implicit okstra context. Query an MCP server only when the task brief explicitly lists it as source material for this run. Any MCP-derived finding MUST cite server, table, and the SELECT used. MCP MUST NEVER be used as a write path — schema/data mutations go through repository migration files reviewed by humans.

package/runtime/prompts/profiles/_implementation-deliverable.md CHANGED

@@ -48,6 +48,7 @@ are collected and convergence finished. Phase 1-5 do not need it.
 ## Lead post-stage persistence (BLOCKING — runs after the Executor emits `### Stage Carry Evidence`)
 - Parse the executor's `### Stage Carry Evidence` JSON block. If absent or unparsable, end with status `contract-violated` and route to a follow-up `error-analysis`.
-- Write the JSON verbatim to `runs/<impl-task-key>/carry/stage-<N>.json`. Refuse to overwrite an existing file (one stage = one sidecar; re-runs are out of scope for this version).
-- Append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, and the SHA of HEAD. Use the okstra runtime's `consumers_mutex` helper (NOT a raw filesystem write) to honour the lock.
-- Quote both files' new contents (the sidecar JSON in full, the new consumers row by itself) in the final report's `Stage sidecar evidence` deliverable section.
+- For EACH stage in this run's batch: write its JSON verbatim to `runs/<impl-task-key>/carry/stage-<N>.json`. Refuse to overwrite an existing file (one stage = one sidecar; re-runs are out of scope for this version).
+- For EACH stage in this run's batch: append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, and the SHA of HEAD. Use the okstra runtime's `consumers_mutex` helper (NOT a raw filesystem write) to honour the lock.
+- The verifier round, Phase 5.5 convergence, and this Phase 6 report run **once per run** over the batch's combined diff — NOT per stage. The single final report covers every batched stage, with a per-stage subsection.
+- Quote every batched stage's new contents (each sidecar JSON in full, each new consumers row by itself) in the final report's `Stage sidecar evidence` deliverable section.

package/runtime/prompts/profiles/_implementation-executor.md CHANGED

@@ -27,10 +27,9 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
   - Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
   - When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
 - **DB / IO / SQL changes require real execution — mock-only is NOT validation evidence:** when this run's diff touches DB/IO/SQL (ORM / query-builder code — sequelize / typeorm / prisma / knex / raw SQL — `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string), a mocked unit test cannot observe the SQL the query builder actually emits — a mocked suite once passed while `count({ col: 'FontFamily.fontFamily' })` threw `Unknown column` on the real DB. The executor MUST run the change against a real (or faithful-replica) datastore — the `db-test` validation step (plan `validation` db step, else `project.json.qaCommands.db-test`), targeting a **local / replica** DB — and cite its exact command + exit code in the final report's `Validation evidence`. If no real DB / `db-test` command is reachable, do NOT claim the change verified: label the DB portion `정적 분석상 …, 미검증(실행 안 함)` in the report, surface it in the routing recommendation, and never downplay the real run as "too heavy". `git push` stays forbidden (universal list); the unverified DB state is carried forward so `final-verification` cannot accept it and `release-handoff` cannot push.
-- re-read the approved plan end-to-end and parse the `## 4.5 Stage Map`. Determine **start stage**:
-  - if `--stage <N>` is supplied, use N. Otherwise auto = the lowest stage number whose `depends-on` are all recorded as `status:done` in `runs/<plan-key>/consumers.jsonl` AND that itself has no `status:done` row. Multiple stages may match — two parallel `implementation` runs may pick different ones and proceed concurrently.
-  - load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(start_stage)` and inject them into the executor's working context as "runtime carry-in". For `depends-on (none)` stages, no sidecar load — task-brief only.
-  - extract the **start stage's** file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path. These — not the whole plan — are the authoritative scope for this run.
+- re-read the approved plan end-to-end and parse the `## 4.5 Stage Map`. Read the **Stage batch** injected in the launch prompt (`Stage batch for this implementation run`): it lists the stage numbers this run owns, ascending. The runtime already selected and reserved this batch — do NOT recompute the start stage from `consumers.jsonl`.
+  - for each stage in the batch, load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(stage)` and inject them into the executor's working context as "runtime carry-in". For `depends-on (none)` stages, no sidecar load — task-brief only.
+  - the batch's stages are mutually independent (each one's `depends-on` are all already `status:done`, never another batch member), so execute them in ascending order; each stage's file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path are the authoritative scope for that stage.
 - inspect the current state of every file the plan names; if any file has changed materially since the plan was written, stop and route to a new `implementation-planning` run instead of editing speculatively
 - "materially changed" means: the function, class, section, or behaviour the plan targets has been edited, renamed, moved, removed, or otherwise altered in a way that invalidates the plan's reasoning. Cosmetic edits (whitespace, comment-only changes, unrelated function modifications elsewhere in the same file) do NOT trigger a re-plan; cite the diff (`git log --oneline <plan-created-at>..HEAD -- <file>`) in the final report and proceed.
 - distinguish the two file-scope rules (they are not in conflict):
@@ -38,15 +37,14 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
   - **out-of-plan rule** (Allowed actions section below): if a step *requires touching a file NOT in the plan list*, that is permitted with `Out-of-plan edits` justification. This handles honest scope discovery during execution.
 - confirm the test/build commands referenced in the plan still exist and run from a clean state
-## Stage execution contract (this run owns exactly one stage of the plan)
+## Stage execution contract (this run owns the injected stage batch)
-- **Sidecar evidence writer (BLOCKING).** When the start stage's Stage Validation `post` commands all succeed, the Executor MUST emit a JSON object matching the schema in `docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md` §3.2 and the lead MUST persist it to `runs/<impl-task-key>/carry/stage-<N>.json`. The file MUST NOT exist before the run starts (overwrite is refused — see `--force-stage` non-goal).
-- **Reverse link (BLOCKING).** Before the first Edit/Write, append a `status:"started"` row to `runs/<plan-task-key>/consumers.jsonl` (lock via the okstra runtime). On stage completion, append a `status:"done"` row with `carry_path` populated.
-- **One-PR-per-stage.** This run creates exactly one PR titled `Stage <N>: <stage title>`. The PR body MUST include:
-  - `## Stage` — number and title (from Stage Map row).
-  - `## Carry-In summary` — depends-on list + cited identifiers/SHAs from each loaded sidecar (omit when depends-on is empty).
-  - `## Next stage` — next stage number/title or `(last stage)`.
-  Stage PRs link back to each other in their bodies (`Previous: #<n>, Next: #<m>` lines) so a reviewer can navigate the chain.
+- **Sidecar evidence writer (BLOCKING, per stage).** For each stage in the batch, when that stage's Stage Validation `post` commands all succeed, the Executor MUST emit a JSON object matching the schema in `docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md` §3.2 and the lead MUST persist it to `runs/<impl-task-key>/carry/stage-<N>.json`. Each file MUST NOT exist before the run starts (overwrite is refused — see `--force-stage` non-goal).
+- **Reverse link (BLOCKING, per stage).** The runtime already appended a `status:"started"` row per batch stage before this run began. On each stage's completion, append a `status:"done"` row with `carry_path` populated for that stage number.
+- **One-PR-per-run.** This run creates exactly one PR titled `Stages <first>–<last>: <run summary>` (or `Stage <N>: <title>` when the batch is a single stage). The PR body MUST include:
+  - `## Stage <N>` — one section per batched stage: number, title (from Stage Map row), touched files, and validation result.
+  - `## Carry-In summary` — per stage, depends-on list + cited identifiers/SHAs from each loaded sidecar (omit when depends-on is empty).
+  - `## Previous run` / `## Next run` — links so a reviewer can navigate the run chain.
 ## Allowed actions during the run

package/runtime/prompts/profiles/error-analysis.md CHANGED

@@ -30,6 +30,8 @@
   - every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell; rows that lack a recommendation are rejected as half-formed.
   - **Codebase-first ambiguity resolution (defect rule)**: any ambiguity about repro, file behavior, or symbol semantics that can be answered by `Read` / `Grep` / log inspection MUST be resolved that way and recorded with file:line (or log-line) evidence. Writing a clarification row for something the codebase or shipped logs already answer is a defect of this phase.
   - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <reporter-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only the reporter can answer this" (reporter-side data, business priority, environment they observed). A row with `none` that *could* have been answered by code or logs is a defect.
+- Cross-verification mode:
+  - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each root-cause / reproduction claim by directly re-inspecting the cited code, logs, or config; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
 - Non-goals:
   - implementation details unless they are necessary to validate the cause
   - **source code edits, builds, migrations, or deployments** — this run produces evidence and cause analysis only; the fix belongs to a later `implementation-planning` run followed by an `implementation` run

package/runtime/prompts/profiles/implementation-planning.md CHANGED

@@ -65,7 +65,8 @@
     - `### Stepwise Execution Order` — bite-sized table with `step | action | files | command | expected`. **Effective row count ≤ 6** (excluding header / divider / blank). Each step is one action completable in 2–5 minutes; for code steps include actual code or diff sketch; prefer TDD ordering (failing test → implementation → green → commit).
     - `### Stage Exit Contract` — predicted added/modified files, newly exposed identifiers/types/endpoints, downstream-usable resources.
     - `### Stage Validation` — pre / mid / post exact commands or observable outcomes for this stage only.
-  - **Parallelisation-first rule (1st-class):** the writer MUST prefer the partition that maximises the number of `depends-on (none)` stages. Given two partitions with equal total step count, the one with fewer `depends-on` edges wins. Conservative `let's serialise to be safe` groupings are forbidden — each `depends-on` link is justified by a concrete data/contract dependency, not a vague risk concern.
+  - **Cohesion-first partition rule (1st-class):** the grouping anchor is **shared file/module proximity** — steps touching the same file/directory/module go in the same stage so the diff, PR, and rollback unit are semantically cohesive. A stage is split ONLY when (a) a real `depends-on` data/contract dependency exists, (b) effective steps would exceed 6, or (c) the file sets are disjoint (unrelated work touching no shared file is not crammed together). Maximising the number of parallel stages is NOT a reason to split — parallelism is an emergent property of independent stages, never a partitioning goal.
+  - **Parallel-safety invariant (BLOCKING):** any two stages that are both `depends-on (none)` MUST predict disjoint file sets in their `Stage Exit Contract`. Two parallel `implementation` runs would otherwise edit the same file concurrently. Work touching a shared file must either go in one stage or be ordered with `depends-on`. Enforced by `validators/validate-implementation-plan-stages.py` check S9.
   - **Stage exit contract is the carry surface:** keep it as narrow as possible. Wider surface = more downstream coupling.
   - dependency / migration risk assessment (ordering constraints, data backfills, feature-flag prerequisites, repo-internal sequencing)
   - validation checklist (pre / mid / post) — each item is an exact command or observable outcome
@@ -93,4 +94,4 @@
   4. **Ambiguity check** — any requirement that could be read two ways must be made explicit or moved to the `## 5. Clarification Items` table as a `Blocks=approval` row.
   5. **Scope check** — if the recommended plan now spans multiple independent subsystems, recommend splitting into separate planning runs rather than shipping an oversized plan.
   6. **Plan-body verification reconciliation (BLOCKING for implementation-planning).** Inspect the `### 4.5.9 Plan Body Verification` verdict table. For every plan-item row classified as `majority-disagree → C-<N>`, the corresponding `C-<N>` row MUST exist in `## 5. Clarification Items` with `Kind` chosen per the standard policy and `Blocks=approval`. Do NOT create a parallel `### 4.5.x Open Questions` block — the unified table is the single home. Conversely, the `Classification` column's `C-<N>` reference and the `## 5. Clarification Items` `ID` column MUST match 1:1; an orphan on either side is a contract violation. For `partial-consensus` and `worker-unique` plan-items, the dissenting opinion lives in §4.5.9 `Dissent log` and is NOT promoted to §5.
-  7. **Stage Map self-check** — for every stage, count the effective rows of its `Stepwise Execution Order` table by hand; reject the draft if any stage exceeds 6. Walk the `depends-on` graph and confirm it is a DAG (no cycle, no self-reference). For each `depends-on` link, ask "can this be removed by re-partitioning?" — if yes, re-partition and re-count.
+  7. **Stage Map self-check** — for every stage, count the effective rows of its `Stepwise Execution Order` table by hand; reject the draft if any stage exceeds 6. Walk the `depends-on` graph and confirm it is a DAG (no cycle, no self-reference). For each `depends-on` link, confirm it encodes a real data/contract dependency — do NOT add links to serialise unrelated work, and do NOT split a stage merely to create more parallel stages. **Parallel-safety:** for every pair of `depends-on (none)` stages, confirm their `Stage Exit Contract` predicted file sets are disjoint; if they share a file, merge them or add a `depends-on` link (validator S9 rejects overlap).
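
The parallel-safety invariant named in this hunk (validator check S9) amounts to a pairwise disjointness test over the `depends-on (none)` stages. A minimal sketch follows, assuming the Stage Map has already been parsed into a plain mapping. The function name and data shape are hypothetical, and the real validator's Markdown parsing is not reproduced here.

```python
from itertools import combinations


def find_parallel_overlaps(stages):
    """Report root-stage pairs whose predicted file sets intersect.

    `stages` maps stage number -> {"depends_on": set, "files": set}, where
    "files" holds the paths predicted in that stage's Stage Exit Contract.
    Returns a list of (stage_a, stage_b, shared_files) tuples.
    """
    # Only stages with no dependencies can be picked up by parallel runs.
    roots = sorted(n for n, s in stages.items() if not s["depends_on"])

    overlaps = []
    for a, b in combinations(roots, 2):
        shared = stages[a]["files"] & stages[b]["files"]
        if shared:
            overlaps.append((a, b, sorted(shared)))
    return overlaps
```

An empty result means the plan passes this check. A non-empty one points at the pairs to merge into one stage or to order with a `depends-on` link.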

package/runtime/prompts/profiles/implementation.md CHANGED

@@ -1,6 +1,7 @@
 # Implementation Profile
 - Purpose: realise the approved `implementation-planning` deliverable as actual source changes, with cross-model verification, while keeping the run reversible
+- **Run-level fixed cost:** the verifier set, Phase 5.5 convergence, and the Phase 6 report-writer run exactly once per run, over the combined diff of all stages in this run's batch — never once per stage.
 - Required workers:
   - claude
   - codex

package/runtime/prompts/profiles/requirements-discovery.md CHANGED

@@ -51,6 +51,8 @@
   - every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell; rows that lack a recommendation are rejected as half-formed.
   - **Codebase-first ambiguity resolution (defect rule)**: any ambiguity that can be answered by `Read` / `Grep` / file inspection MUST be resolved that way and recorded with file:line evidence. Writing a clarification row for something the codebase already answers is a defect of this phase.
   - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <human-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only a human can answer this" (reporter intent, business priority, external authority). A row with `none` that *could* have been answered by the codebase is a defect.
+- Cross-verification mode:
+  - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker's finding by directly re-inspecting the cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
 - Non-goals:
   - full implementation design unless it is required to decide the next phase
   - **source code edits, plan authoring, builds, or deployments** — this run only classifies the work and routes it; deeper analysis and planning belong to subsequent phases
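The adversarial rule stated in the added lines ("a single evidence-backed refutation prevents a finding from reaching consensus") can be illustrated with a tiny predicate. This is a sketch only: the `Finding` and `Refutation` types are hypothetical, and treating a refutation without evidence as non-blocking is an assumption, since the profile text only describes the evidence-backed case.

```python
from dataclasses import dataclass, field


@dataclass
class Refutation:
    verifier: str
    evidence: str = ""  # e.g. "path:line"; empty means no evidence was cited


@dataclass
class Finding:
    claim: str
    refutations: list = field(default_factory=list)


def reaches_consensus(finding: Finding) -> bool:
    """A finding survives only if no refutation carries evidence.

    Assumption: refutations with no cited evidence do not block consensus.
    """
    return not any(r.evidence.strip() for r in finding.refutations)
```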

package/runtime/python/okstra_ctl/render.py CHANGED

@@ -903,6 +903,8 @@ def _build_convergence_block(ctx: dict) -> dict:
     - `enabled` default True
     - `maxRounds` default 1 for `requirements-discovery`, 2 otherwise
     - `verificationMode` default "lightweight"
+    - `adversarial` default True for `requirements-discovery` / `error-analysis`
+      (forces `verificationMode` to "full-reanalysis"), False otherwise
     - `planBodyVerification` is implementation-planning specific; the key is
       always emitted (dead-letter on other phases) so the schema stays stable.
@@ -912,12 +914,15 @@ def _build_convergence_block(ctx: dict) -> dict:
     """
     task_type = ctx.get("TASK_TYPE", "")
     default_max_rounds = 1 if task_type == "requirements-discovery" else 2
+    adversarial_phases = {"requirements-discovery", "error-analysis"}
+    is_adversarial = task_type in adversarial_phases
     raw_plan_verify = (ctx.get("OKSTRA_PLAN_VERIFICATION", "") or "").strip().lower()
     plan_verify_enabled = raw_plan_verify != "false"
     return {
         "enabled": True,
+        "adversarial": is_adversarial,
         "maxRounds": default_max_rounds,
-        "verificationMode": "lightweight",
+        "verificationMode": "full-reanalysis" if is_adversarial else "lightweight",
         "planBodyVerification": {
             "enabled": plan_verify_enabled,
             "maxRounds": 1,
@@ -1514,11 +1519,11 @@ def inject_lead_prompt_computed_tokens(ctx: dict) -> None:
 def apply_lead_prompt_defaults(ctx: dict) -> None:
     """Apply default values for optional lead-prompt ctx fields.
-    Sets four optional tokens that the lead prompt template references but
+    Sets the optional tokens that the lead prompt template references but
     which callers may legitimately leave unset (e.g., no validation has run
-    yet, no related tasks were declared). Caller-supplied values are
-    preserved via `setdefault` / `if-not-in` semantics — this function only
-    fills gaps, never overwrites.
+    yet, no related tasks were declared, the run is not an implementation
+    batch). Caller-supplied values are preserved via `setdefault` / `if-not-in`
+    semantics — this function only fills gaps, never overwrites.
     Companion to `inject_lead_prompt_computed_tokens` (which always
     overwrites with deterministically-derived values). The two functions
@@ -1528,6 +1533,9 @@ def apply_lead_prompt_defaults(ctx: dict) -> None:
     ctx.setdefault("VALIDATION_STATUS", "not-run")
     ctx.setdefault("RELATED_TASKS_BULLETS", "- None recorded")
     ctx.setdefault("RELATED_TASKS_INLINE", "None")
+    # Empty for non-implementation runs; the implementation prepare path
+    # overwrites it with the resolved stage-batch directive.
+    ctx.setdefault("STAGE_BATCH_DIRECTIVE", "")
     ctx.setdefault(
         "WORKER_PROMPT_PREAMBLE_PATH",
         str(Path.home() / ".okstra" / "templates" / "worker-prompt-preamble.md"),
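To see how the new `adversarial` flag interacts with `maxRounds` and `verificationMode`, the defaults from the `_build_convergence_block` hunk can be restated as a standalone function. This is a simplified sketch: the name `convergence_defaults` is hypothetical, and the `planBodyVerification` sub-block is left out because the diff only shows part of it.

```python
ADVERSARIAL_PHASES = {"requirements-discovery", "error-analysis"}


def convergence_defaults(task_type: str) -> dict:
    """Standalone restatement of the defaults shown in the diff (sketch)."""
    adversarial = task_type in ADVERSARIAL_PHASES
    return {
        "enabled": True,
        "adversarial": adversarial,
        "maxRounds": 1 if task_type == "requirements-discovery" else 2,
        # Adversarial phases force the heavier verification mode.
        "verificationMode": "full-reanalysis" if adversarial else "lightweight",
    }


for phase in ("requirements-discovery", "error-analysis", "implementation"):
    print(phase, convergence_defaults(phase))
```

Under these defaults `error-analysis` is the only phase that combines adversarial verification with two rounds; `requirements-discovery` is adversarial but stays at one round.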

package/runtime/python/okstra_ctl/run.py CHANGED

@@ -208,42 +208,58 @@ def _validate_stage_structure(plan_path: str) -> None:
         )
-def _resolve_effective_stage(
+RUN_STEP_BUDGET = 8
+def _resolve_effective_stages(
     stages: list,
     done_stages: set,
     requested: str,
-) -> int:
-    """Return the stage number to execute.
+    budget: int = RUN_STEP_BUDGET,
+) -> list:
+    """Return the ordered list of stage numbers this run executes.
+    `requested` is "auto" or a decimal string. For "auto" the run batches all
+    ready stages (depends-on all done, itself not done) in stage-number order up
+    to `budget` effective steps — but always at least one. A numeric request is a
+    single forced stage. Raises PrepareError on rejection cases."""
+    if requested != "auto":
+        try:
+            n = int(requested)
+        except ValueError:
+            raise PrepareError(
+                f"--stage must be 'auto' or an integer, got {requested!r}"
+            )
+        target = next((s for s in stages if s["stage_number"] == n), None)
+        if target is None:
+            raise PrepareError(
+                f"--stage {n} not in Stage Map "
+                f"(have {[s['stage_number'] for s in stages]})"
+            )
+        if n in done_stages:
+            raise PrepareError(
+                f"--stage {n} already completed (consumers.jsonl status:done exists)"
+            )
+        return [n]
-    `requested` is either "auto" or a decimal string.
-    Raises PrepareError on all rejection cases.
-    """
-    if requested == "auto":
-        for s in stages:
-            if s["stage_number"] in done_stages:
-                continue
-            if all(d in done_stages for d in s["depends_on"]):
-                return s["stage_number"]
+    ready = [
+        s for s in stages
+        if s["stage_number"] not in done_stages
+        and all(d in done_stages for d in s["depends_on"])
+    ]
+    if not ready:
         raise PrepareError(
             "no stage is ready: every remaining stage has unsatisfied depends-on"
         )
-    try:
-        n = int(requested)
-    except ValueError:
-        raise PrepareError(
-            f"--stage must be 'auto' or an integer, got {requested!r}"
-        )
-    target = next((s for s in stages if s["stage_number"] == n), None)
-    if target is None:
-        raise PrepareError(
-            f"--stage {n} not in Stage Map "
-            f"(have {[s['stage_number'] for s in stages]})"
-        )
-    if n in done_stages:
-        raise PrepareError(
-            f"--stage {n} already completed (consumers.jsonl status:done exists)"
-        )
-    return n
+    batch: list = []
+    total = 0
+    for s in ready:
+        sc = s.get("step_count", 0) or 0
+        if batch and total + sc > budget:
+            break
+        batch.append(s["stage_number"])
+        total += sc
+    return batch
 def _parse_stage_map_into_ctx(plan_path: str) -> list:
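The `auto` branch of `_resolve_effective_stages` can be exercised in isolation. The sketch below copies the batching loop from the hunk above into a standalone function; the name `batch_ready_stages`, the sample Stage Map, and the use of `ValueError` in place of `PrepareError` are assumptions made so the example runs on its own.

```python
RUN_STEP_BUDGET = 8  # same default as in the diff


def batch_ready_stages(stages, done_stages, budget=RUN_STEP_BUDGET):
    """Return the ready stages that fit the step budget, in stage order."""
    ready = [
        s for s in stages
        if s["stage_number"] not in done_stages
        and all(d in done_stages for d in s["depends_on"])
    ]
    if not ready:
        # Stand-in for PrepareError, which lives in the real package.
        raise ValueError("no stage is ready")
    batch, total = [], 0
    for s in ready:
        sc = s.get("step_count", 0) or 0
        # `batch and ...` lets the first stage through even if it alone
        # exceeds the budget, so a run always gets at least one stage.
        if batch and total + sc > budget:
            break
        batch.append(s["stage_number"])
        total += sc
    return batch


# Hypothetical Stage Map used only for illustration.
stage_map = [
    {"stage_number": 1, "step_count": 3, "depends_on": []},
    {"stage_number": 2, "step_count": 4, "depends_on": []},
    {"stage_number": 3, "step_count": 2, "depends_on": []},
    {"stage_number": 4, "step_count": 5, "depends_on": [1]},
]

print(batch_ready_stages(stage_map, set()))   # first run
print(batch_ready_stages(stage_map, {1, 2}))  # after stages 1 and 2 are done
```

Two properties follow from the loop as written. Because readiness is computed against `done_stages` before the run starts, a stage and its dependency never land in the same batch (stage 4 waits for a later run). And because the loop uses `break` rather than `continue`, batching stops at the first stage that overflows the budget, so a smaller ready stage further down the list is not pulled forward.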
@@ -842,31 +858,42 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
     })
     if inp.task_type == "implementation":
         ctx["parsed_stage_map"] = ctx_stage_map
-        # Resolve effective stage and append `started` row to consumers.jsonl
+        # Resolve the ready-set batch and append a `started` row per batched stage.
         from .consumers import read_consumers, append_consumer
         import datetime as _dt
         plan_run_root = Path(inp.approved_plan_path).resolve().parents[1]
         consumed = read_consumers(plan_run_root)
         done_stages = {r["stage"] for r in consumed if r.get("status") == "done"}
-        effective = _resolve_effective_stage(
+        effective = _resolve_effective_stages(
             ctx["parsed_stage_map"], done_stages, inp.stage
         )
-        ctx["effective_stage"] = effective
-        inp.stage = str(effective)
-        print(f"selected stage: {inp.stage}", file=sys.stdout)
+        ctx["effective_stages"] = effective
+        csv = ",".join(str(n) for n in effective)
+        ctx["EFFECTIVE_STAGES"] = csv
+        ctx["STAGE_BATCH_DIRECTIVE"] = (
+            f"- **Stage batch for this implementation run:** `{csv}` "
+            "(comma-separated stage numbers, ascending). Execute exactly these "
+            "Stage Map stages in this order — this is the authoritative scope. "
+            "Do NOT recompute the start stage from `consumers.jsonl`; the runtime "
+            "already selected and reserved this batch."
+        )
+        inp.stage = csv
+        print(f"selected stages: {csv}", file=sys.stdout)
         head_proc = _subprocess.run(
             ["git", "rev-parse", "HEAD"],
             cwd=inp.project_root, capture_output=True, text=True,
         )
         head_sha = head_proc.stdout.strip() if head_proc.returncode == 0 else ""
-        append_consumer(
-            plan_run_root,
-            impl_task_key=ctx["TASK_KEY"],
-            stage=effective,
-            status="started",
-            started_at=_dt.datetime.now(_dt.timezone.utc).isoformat(),
-            head_commit=head_sha,
-        )
+        now = _dt.datetime.now(_dt.timezone.utc).isoformat()
+        for stage_n in effective:
+            append_consumer(
+                plan_run_root,
+                impl_task_key=ctx["TASK_KEY"],
+                stage=stage_n,
+                status="started",
+                started_at=now,
+                head_commit=head_sha,
+            )
     # ---- prepare directories + cleanup ----
     _ensure_task_directories(ctx)
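The reservation step above writes one `started` row per batched stage, all sharing a single timestamp and HEAD commit. A minimal sketch of that pattern follows. It does not use the package's `read_consumers` / `append_consumer` helpers, whose implementations are not shown in this diff: the JSON Lines layout, the on-disk key names other than `stage` and `status`, and both function names are assumptions.

```python
import datetime as dt
import json
from pathlib import Path


def append_started_rows(log_path: Path, task_key: str, stages, head_commit: str) -> str:
    """Append one `started` row per stage; all rows share one timestamp."""
    now = dt.datetime.now(dt.timezone.utc).isoformat()
    with log_path.open("a", encoding="utf-8") as fh:
        for stage_n in stages:
            row = {
                "impl_task_key": task_key,  # assumed key name
                "stage": stage_n,
                "status": "started",
                "started_at": now,           # assumed key name
                "head_commit": head_commit,  # assumed key name
            }
            fh.write(json.dumps(row) + "\n")
    return now


def read_done_stages(log_path: Path) -> set:
    """Collect stage numbers that have a `status: done` row."""
    if not log_path.exists():
        return set()
    lines = log_path.read_text(encoding="utf-8").splitlines()
    rows = [json.loads(line) for line in lines if line.strip()]
    return {r["stage"] for r in rows if r.get("status") == "done"}
```

Only rows with `status == "done"` count toward `done_stages`, so under this layout a batch that was reserved but never finished would not block a later run from selecting the same stages again.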