okstra 0.45.0 → 0.46.0

Files changed (16)

package/docs/superpowers/plans/2026-06-04-stage-run-batching.md ADDED

@@ -0,0 +1,457 @@
+# Plan B: Executor-side run batching (ready-set + budget 8) implementation plan
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+**Goal:** Instead of exactly one stage per `implementation` run, execute as a batch the ready stages (those whose `depends-on` are all done) up to a combined effective step budget of 8, so that the fixed cost of cross-verification and the report scales with the number of runs rather than the number of stages (the core of the cost savings the user noticed).
+**Architecture:** Centralize stage selection authority in Python, removing the current dual path (the single int returned by `_resolve_effective_stage` plus the lead's self-computation in the prompt). `_resolve_effective_stages` returns the ready-set batch as a list; `prepare_task_bundle` writes a `started` consumer row for each stage in the batch and injects the batch into the launch prompt (the `EFFECTIVE_STAGES` CSV, rendered through `{{STAGE_BATCH_DIRECTIVE}}`). The prompt contract becomes: "execute the injected batch's stages in ascending order, emit a per-stage sidecar and consumers done row at each boundary, then run cross-verification and the report once at the end of the run, and open one PR."
+**Tech Stack:** Python 3 (`scripts/okstra_ctl/run.py`, `consumers.py`), pytest + e2e bash, jinja2 launch template, markdown profile prompts, Node build (`npm run build`).
+> **Prerequisite:** Plan A (cohesion + the S9 parallel-safety invariant) must be merged first so that file conflicts between a batch and parallel runs are structurally prevented. This plan either continues on the Plan A branch (`feat/stage-cohesion-planner`) or starts on a new branch after that branch is merged into main.
+> **Explicit non-goal:** parallel-run started-exclusion. Today `_resolve_effective_stage` does not exclude stages that another run has `started` (two parallel runs can pick the same ready set; `tests/test_e2e_multi_stage_q1_q9.py::test_q7` documents this existing behavior). Introducing batching does not widen this gap, and it does not affect the user's actual usage (sequential phase-continuation). Started-exclusion is left as separate work.
+---
+## File structure
+| File | Responsibility | Action |
+|---|---|---|
+| `scripts/okstra_ctl/run.py` | bundle preparation, stage selection, consumers started rows, ctx injection | modify |
+| `tests/test_resolve_effective_stages.py` | unit coverage for the pure selection function | new |
+| `tests/test_auto_stage_selection.py` | integrated selection behavior | modify (batch semantics) |
+| `tests/test_e2e_multi_stage_q1_q9.py` | e2e Q1–Q9 | modify (batch semantics) |
+| `prompts/launch.template.md` | lead launch prompt (jinja) | modify (inject `{{STAGE_BATCH_DIRECTIVE}}`) |
+| `prompts/profiles/_implementation-executor.md` | executor role contract | modify (batch consumption, one-PR-per-run) |
+| `prompts/profiles/_implementation-deliverable.md` | Phase 6 deliverable contract | modify (per-stage done row, single report) |
+| `prompts/profiles/implementation.md` | implementation profile | modify (state run-level verification) |
+| `docs/superpowers/specs/2026-06-04-stage-splitting-cost-aware-design.md` | new design spec | modify (reflect implementation outcome: state the started-exclusion non-goal) |
+---
+## Task 1: `_resolve_effective_stages` pure function (TDD)
+**Files:**
+- Create: `tests/test_resolve_effective_stages.py`
+- Modify: `scripts/okstra_ctl/run.py`
+Replace `_resolve_effective_stage` (single int) with `_resolve_effective_stages` (list). ready = every `depends-on` is done and the stage itself is not done. batch = ready stages accumulated in stage-number order while the cumulative step_count ≤ budget (at least one stage is always included). A numeric request yields a single-stage list.
+- [ ] **Step 1: Write the failing tests**
+Create `tests/test_resolve_effective_stages.py`:
+```python
+"""Unit coverage for run._resolve_effective_stages (ready-set batch selection)."""
+import importlib.util
+from pathlib import Path
+REPO = Path(__file__).resolve().parents[1]
+spec = importlib.util.spec_from_file_location(
+    "okstra_run", REPO / "scripts" / "okstra_ctl" / "run.py"
+)
+run = importlib.util.module_from_spec(spec)
+spec.loader.exec_module(run)
+def _stage(n, deps, steps):
+    return {"stage_number": n, "depends_on": deps, "step_count": steps}
+def test_auto_batches_independent_ready_stages_within_budget():
+    stages = [_stage(1, [], 2), _stage(2, [], 3), _stage(3, [1, 2], 2)]
+    # stages 1+2 are ready (deps empty), 2+3=5 ≤ 8 → batch [1, 2]; 3 not ready.
+    assert run._resolve_effective_stages(stages, set(), "auto", budget=8) == [1, 2]
+def test_auto_stops_at_budget():
+    stages = [_stage(1, [], 5), _stage(2, [], 5)]
+    # 5 + 5 = 10 > 8 → stage 2 would exceed the budget, so only stage 1 is batched.
+    assert run._resolve_effective_stages(stages, set(), "auto", budget=8) == [1]
+def test_auto_guarantees_at_least_one_even_over_budget():
+    stages = [_stage(1, [], 6)]
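+    # 6 ≤ 8 here, so this covers the single-stage path; the `if batch and ...`
+    # guard is what keeps the first ready stage even when it alone exceeds budget.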
+    assert run._resolve_effective_stages(stages, set(), "auto", budget=8) == [1]
+def test_auto_skips_done_and_unready():
+    stages = [_stage(1, [], 2), _stage(2, [1], 2), _stage(3, [1], 2)]
+    # stage 1 done → 2 and 3 become ready (deps satisfied), 2+2=4 ≤ 8 → [2, 3].
+    assert run._resolve_effective_stages(stages, {1}, "auto", budget=8) == [2, 3]
+def test_auto_raises_when_nothing_ready():
+    # The only remaining stage depends on stage 3, which is neither done nor
+    # present in the Stage Map, so the ready set is empty.
+    stages = [_stage(2, [3], 2)]
+    import pytest
+    with pytest.raises(run.PrepareError):
+        run._resolve_effective_stages(stages, set(), "auto", budget=8)
+def test_numeric_returns_single_stage():
+    stages = [_stage(1, [], 2), _stage(2, [], 2)]
+    assert run._resolve_effective_stages(stages, set(), "2", budget=8) == [2]
+def test_numeric_rejects_done_stage():
+    stages = [_stage(1, [], 2)]
+    import pytest
+    with pytest.raises(run.PrepareError):
+        run._resolve_effective_stages(stages, {1}, "1", budget=8)
+def test_numeric_rejects_unknown_stage():
+    stages = [_stage(1, [], 2)]
+    import pytest
+    with pytest.raises(run.PrepareError):
+        run._resolve_effective_stages(stages, set(), "9", budget=8)
+```
+- [ ] **Step 2: Run and confirm failure**
+Run: `python3 -m pytest tests/test_resolve_effective_stages.py -v`
+Expected: failure at collection or with AttributeError, because `_resolve_effective_stages` does not exist yet.
+- [ ] **Step 3: Implement the function**
+In `scripts/okstra_ctl/run.py`, replace the entire existing `_resolve_effective_stage` (lines 211–246) with:
+```python
+RUN_STEP_BUDGET = 8
+def _resolve_effective_stages(
+    stages: list,
+    done_stages: set,
+    requested: str,
+    budget: int = RUN_STEP_BUDGET,
+) -> list:
+    """Return the ordered list of stage numbers this run executes.
+    `requested` is "auto" or a decimal string. For "auto" the run batches all
+    ready stages (depends-on all done, itself not done) in stage-number order up
+    to `budget` effective steps — but always at least one. A numeric request is a
+    single forced stage. Raises PrepareError on rejection cases."""
+    if requested != "auto":
+        try:
+            n = int(requested)
+        except ValueError:
+            raise PrepareError(
+                f"--stage must be 'auto' or an integer, got {requested!r}"
+            )
+        target = next((s for s in stages if s["stage_number"] == n), None)
+        if target is None:
+            raise PrepareError(
+                f"--stage {n} not in Stage Map "
+                f"(have {[s['stage_number'] for s in stages]})"
+            )
+        if n in done_stages:
+            raise PrepareError(
+                f"--stage {n} already completed (consumers.jsonl status:done exists)"
+            )
+        return [n]
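+    # Sort explicitly so the batch is ascending even if the Stage Map is not.
+    ready = sorted(
+        (
+            s for s in stages
+            if s["stage_number"] not in done_stages
+            and all(d in done_stages for d in s["depends_on"])
+        ),
+        key=lambda s: s["stage_number"],
+    )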
+    if not ready:
+        raise PrepareError(
+            "no stage is ready: every remaining stage has unsatisfied depends-on"
+        )
+    batch: list = []
+    total = 0
+    for s in ready:
+        sc = s.get("step_count", 0) or 0
+        if batch and total + sc > budget:
+            break
+        batch.append(s["stage_number"])
+        total += sc
+    return batch
+```
+- [ ] **Step 4: Run and confirm pass**
+Run: `python3 -m pytest tests/test_resolve_effective_stages.py -v`
+Expected: 8 passed.
+- [ ] **Step 5: Commit**
+```bash
+git add scripts/okstra_ctl/run.py tests/test_resolve_effective_stages.py
+git commit -m "feat(okstra_ctl/run): batch ready stages up to a run step budget"
+```
+---
+## Task 2: Wire `prepare_task_bundle` + update integration/e2e tests for batch semantics
+**Files:**
+- Modify: `scripts/okstra_ctl/run.py:843–869` (callsite)
+- Modify: `tests/test_auto_stage_selection.py`
+- Modify: `tests/test_e2e_multi_stage_q1_q9.py`
+- [ ] **Step 1: Replace the callsite**
+Replace the block at lines 851–869 of `scripts/okstra_ctl/run.py` with:
+```python
+        effective = _resolve_effective_stages(
+            ctx["parsed_stage_map"], done_stages, inp.stage
+        )
+        ctx["effective_stages"] = effective
+        csv = ",".join(str(n) for n in effective)
+        ctx["EFFECTIVE_STAGES"] = csv
+        ctx["STAGE_BATCH_DIRECTIVE"] = (
+            f"- **Stage batch for this implementation run:** `{csv}` "
+            "(comma-separated stage numbers, ascending). Execute exactly these "
+            "Stage Map stages in this order — this is the authoritative scope. "
+            "Do NOT recompute the start stage from `consumers.jsonl`; the runtime "
+            "already selected and reserved this batch."
+        )
+        inp.stage = csv
+        print(f"selected stages: {csv}", file=sys.stdout)
+        head_proc = _subprocess.run(
+            ["git", "rev-parse", "HEAD"],
+            cwd=inp.project_root, capture_output=True, text=True,
+        )
+        head_sha = head_proc.stdout.strip() if head_proc.returncode == 0 else ""
+        now = _dt.datetime.now(_dt.timezone.utc).isoformat()
+        for stage_n in effective:
+            append_consumer(
+                plan_run_root,
+                impl_task_key=ctx["TASK_KEY"],
+                stage=stage_n,
+                status="started",
+                started_at=now,
+                head_commit=head_sha,
+            )
+```
+(So that `EFFECTIVE_STAGES` is empty on non-impl paths, guarantee a ctx default. If the ctx initialization does not already call `ctx.setdefault("EFFECTIVE_STAGES", "")`, add it just before the launch render to prevent a KeyError during jinja rendering; see Step 1b.)
+- [ ] **Step 1b: Guarantee non-impl defaults**
+In `prepare_task_bundle`, before ctx is handed to the launch render (that is, outside and above the `if inp.task_type == "implementation":` block), add these two lines:
+```python
+    ctx.setdefault("EFFECTIVE_STAGES", "")
+    ctx.setdefault("STAGE_BATCH_DIRECTIVE", "")
+```
+- [ ] **Step 2: Update `test_auto_stage_selection.py`**
+Replace the existing single-stage assertions with batch semantics. Modify the following three places.
+In `test_auto_picks_stage_1_when_none_done`, change
+```python
+    assert "selected stage: 1" in r.stdout
+```
+to
+```python
+    assert "selected stages: 1" in r.stdout
+```
+(If this test's fixture makes stages 1 and 2 both ready with a combined total ≤ 8, the output becomes `selected stages: 1,2`. Read the fixture to check whether stage 2 is ready and what the step total is; if so, use `assert "selected stages: 1,2" in r.stdout`, and if stage 2 depends on stage 1, use `selected stages: 1`.)
+In `test_consumers_started_row_appended_on_success`, change
+```python
+    started_rows = [row for row in rows if row.get("status") == "started"]
+    assert len(started_rows) == 1
+    assert started_rows[0]["stage"] == 1
+```
+according to the batch size confirmed above: leave it unchanged if the batch is stage 1 alone; if the batch is stages 1+2, change it to
+```python
+    started_rows = [row for row in rows if row.get("status") == "started"]
+    assert [row["stage"] for row in started_rows] == [1, 2]
+```
+In `test_auto_picks_stage_2_when_stage_1_done_and_stage_2_independent`, the assertion
+```python
+    assert "selected stage: 2" in r.stdout
+```
+must be adjusted: once stage 1 is done the ready set becomes {2,3...}, so check the fixture's stage 3 depends-on and use either `selected stages: 2` or `selected stages: 2,3`.
+> Implementer note: first read this file's fixture definitions (inline plan text or `tests/fixtures/plans/`) and check each stage's `depends-on` and `step-count`, then compute the expected batch by the rules above and fill in the assertions. Do not guess.
+- [ ] **Step 3: Update `test_e2e_multi_stage_q1_q9.py`**
+Assertions that break under batch semantics (per the Explore map):
+- `test_q1_one_stage_plan_happy_path` (1-stage plan): batch=[1], so `"selected stage: 1"` → `"selected stages: 1"`; `len(started)==1` stays.
+- `test_q2_three_stage_first_run_picks_stage_1`: if stages 1 and 2 of the 3-stage plan are both depends-on (none) with a combined total ≤ 8, batch=[1,2]. Change `"selected stage: 1"` to the actual batch string, `len(rows)==1` to the batch size, and `rows[0]["stage"]==1` to `[r["stage"] for r in rows]==<batch>`. (Finalize after checking the fixture.)
+- `test_q3_after_stage_1_done_picks_stage_2_and_loads_sidecar`: update `"selected stages: ..."` based on the ready set after stage 1 is done.
+- `test_q7_parallel_runs_pick_distinct_none_stages`: this test documents the absence of started-exclusion (a non-goal). Both runs still pick the same batch, but the output string changes to `selected stages:`, so `"selected stage: 1"` → `"selected stages: ..."`. Update `len(started)==2` and `all(row["stage"]==1 ...)` according to the batch size (each of the two runs starts the same batch, so the number of started rows = 2 × batch size). Finalize after checking the fixture.
+- `test_q8_partial_depends_on_blocks_higher_stage`: `"selected stage: 1"` → `"selected stages: 1"` (batch=[1] if stages 2 and 3 are not ready because of unsatisfied depends-on).
+- The S4/S5/S6/S8 rejection tests (q4, q5, q6, q9) run at the validator stage before prepare and are unaffected. Do not change them.
+> Implementer note: read the stage depends-on and step-count of the plan fixture each Q test uses, compute the batch, and fill in the exact strings and counts.
+- [ ] **Step 4: Run and confirm pass**
+Run: `python3 -m pytest tests/test_auto_stage_selection.py tests/test_e2e_multi_stage_q1_q9.py -v`
+Expected: all pass (with batch semantics applied).
+- [ ] **Step 5: Commit**
+```bash
+git add scripts/okstra_ctl/run.py tests/test_auto_stage_selection.py tests/test_e2e_multi_stage_q1_q9.py
+git commit -m "feat(okstra_ctl/run): wire ready-set batch into bundle prep and consumers"
+```
+---
+## Task 3: Prompt contract: batch consumption + one-PR-per-run + run-level verification
+**Files:**
+- Modify: `prompts/launch.template.md` (inject the stage batch via `{{STAGE_BATCH_DIRECTIVE}}`)
+- Modify: `prompts/profiles/_implementation-executor.md:30–49`
+- Modify: `prompts/profiles/_implementation-deliverable.md:51–52`
+- Modify: `prompts/profiles/implementation.md`
+Prose changes (no unit tests). Verification is build sync + human reading + e2e.
+- [ ] **Step 1: Inject the batch into the launch prompt**
+`launch.template.md` uses only `{{ }}` substitution, not `{% %}` control blocks (confirmed). So, with no conditional branch, the complete directive that Python builds as a ctx variable is rendered unconditionally; on non-impl runs `STAGE_BATCH_DIRECTIVE` is an empty string, which leaves only a blank line.
+In `prompts/launch.template.md`, add directly below the `- This run executes \`{{WORKFLOW_CURRENT_PHASE}}\` only.` line (currently line 17):
+```markdown
+{{STAGE_BATCH_DIRECTIVE}}
+```
+- [ ] **Step 2: `_implementation-executor.md` batch contract**
+Replace the "Determine **start stage**" block at lines 30–33 with batch consumption:
+```markdown
+- read the **Stage batch** injected in the launch prompt (`Stage batch for this implementation run`). It lists the stage numbers this run owns, ascending. The runtime already selected and reserved them — do NOT recompute from `consumers.jsonl`.
+  - for each stage in the batch, load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(stage)` and inject them as runtime carry-in. `depends-on (none)` stages need no sidecar — task-brief only.
+  - the batch's stages are mutually independent (each one's depends-on are all already `status:done`, never another batch member), so execute them in ascending order; each stage's file list, step order, Stage Validation, Stage Exit Contract, and rollback path are the authoritative scope for that stage.
+```
+Change the heading at line 41 from
+```markdown
+## Stage execution contract (this run owns exactly one stage of the plan)
+```
+to
+```markdown
+## Stage execution contract (this run owns the injected stage batch)
+```
+Make the sidecar/reverse-link rules at lines 43–44 apply per stage:
+```markdown
+- **Sidecar evidence writer (BLOCKING, per stage).** For each stage in the batch, when that stage's Stage Validation `post` commands all succeed, the Executor MUST emit the JSON object (schema: `docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md` §3.2) and the lead MUST persist it to `runs/<impl-task-key>/carry/stage-<N>.json`. Each file MUST NOT exist before the run starts (overwrite refused).
+- **Reverse link (BLOCKING, per stage).** The runtime already appended a `status:"started"` row per batch stage before this run began. On each stage's completion, append a `status:"done"` row with `carry_path` populated for that stage number.
+```
+Change One-PR-per-stage at lines 45–49 to one-PR-per-run:
+```markdown
+- **One-PR-per-run.** This run creates exactly one PR titled `Stages <first>–<last>: <run summary>` (or `Stage <N>: <title>` when the batch is a single stage). The PR body MUST include one `## Stage <N>` section per batched stage (title, files, validation result), and `## Previous run` / `## Next run` links so a reviewer can navigate the run chain.
+```
+- [ ] **Step 3: `_implementation-deliverable.md` done rows + single report**
+Make the sidecar/done-row wording at lines 51–52 batch-aware:
+```markdown
+- For EACH stage in this run's batch: write the carry JSON verbatim to `runs/<impl-key>/carry/stage-<N>.json` (refuse to overwrite an existing file), then append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, and the HEAD SHA for that stage.
+- The verifier round, Phase 5.5 convergence, and this Phase 6 report run **once per run** over the batch's combined diff — NOT per stage. The single final report covers every batched stage, with a per-stage subsection.
+```
+- [ ] **Step 4: State run-level verification in `implementation.md`**
+In `prompts/profiles/implementation.md`, add one line directly above the `- Required workers:` block or below the Purpose line (line 3):
+```markdown
+- **Run-level fixed cost:** the verifier set, Phase 5.5 convergence, and the Phase 6 report-writer run exactly once per run, over the combined diff of all stages in this run's batch — never once per stage.
+```
+- [ ] **Step 5: Check token consistency**
+Run: `grep -rn "owns exactly one stage\|One-PR-per-stage\|EFFECTIVE_STAGES" prompts/profiles/ prompts/launch.template.md`
+Expected: no leftover `owns exactly one stage` / `One-PR-per-stage`; `EFFECTIVE_STAGES` has the same meaning in launch + executor.
+- [ ] **Step 6: Commit**
+```bash
+git add prompts/launch.template.md prompts/profiles/_implementation-executor.md prompts/profiles/_implementation-deliverable.md prompts/profiles/implementation.md
+git commit -m "feat(prompts/implementation): consume injected stage batch, one PR and one verification per run"
+```
+---
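+## Task 4: New design spec: state the started-exclusion non-goal
+**Files:**
+- Modify: `docs/superpowers/specs/2026-06-04-stage-splitting-cost-aware-design.md`
+Reflect the facts settled during implementation (started-exclusion as a non-goal, a single selection path to eliminate drift) in the spec so that the document matches the implementation.
+- [ ] **Step 1: Add two bullets to the §4 non-goals**
+Add below the last bullet of the §4 non-goals section:
+```markdown
+- **Parallel-run started-exclusion is a non-goal.** Ready-set selection currently does not exclude stages that another run has `started` (two parallel runs can pick the same batch; this is existing behavior, documented by `tests/test_e2e_multi_stage_q1_q9.py::test_q7`). It does not affect the user's sequential phase-continuation usage, and the real conflict backstop is the file-disjointness invariant of §2.2 plus worktree serialization. Introducing started-exclusion is separate work.
+- **Stage selection has a single source of truth in Python.** `_resolve_effective_stages` (run.py) selects and reserves the batch and injects it into the lead via `{{EFFECTIVE_STAGES}}`. The former dual path (drift) in which the lead self-computed from consumers.jsonl has been removed.
+```
+- [ ] **Step 2: Commit**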
+```bash
+git add docs/superpowers/specs/2026-06-04-stage-splitting-cost-aware-design.md
+git commit -m "docs(specs): record started-exclusion non-goal and Python-SOT stage selection"
+```
+---
+## Task 5: Build sync + full regression + e2e
+**Files:** (none; build and verification only)
+- [ ] **Step 1: Sync the runtime**
+Run: `npm run build`
+Expected: exit code 0, 22/22 synced.
+- [ ] **Step 2: Stage-related + full test suite**
+Run: `python3 -m pytest tests/`
+Expected: all pass. In particular, `test_resolve_effective_stages.py`, `test_auto_stage_selection.py`, `test_e2e_multi_stage_q1_q9.py`, `test_wizard_stage_pick.py`, and `test_run_stage_arg.py` pass.
+- [ ] **Step 3: e2e scenarios (if any)**
+Run: `ls tests-e2e/ | grep -i stage`. If stage-related e2e scenarios exist, run them:
+`bash tests-e2e/<scenario>.sh`
+Expected: exit code 0. (If there are none, skip this step and record that fact.)
+- [ ] **Step 4: Workflow validator**
+Run: `bash validators/validate-workflow.sh`
+Expected: exit code 0.
+- [ ] **Step 5: CLI smoke test**
+Run: `node bin/okstra --version && node bin/okstra doctor`
+Expected: version printed + doctor diagnostics normal.
+---
+## Self-Review (author checklist, once before execution)
+- **Spec coverage:** new spec §2.1 (stage/run separation) → Task 1+3, §2.3 (ready-set, budget 8) → Task 1, §2.4 (run-level verification, per-stage sidecar) → Task 3, started-exclusion non-goal → Task 4. Plan A's S9 and cohesion work are out of scope for this plan (already done).
+- **Placeholder scan:** the Python, test, and prompt steps contain real code, strings, and commands. However, some test assertions in Task 2 depend on the fixtures' depends-on and step-count, so they are replaced by the explicit rule "the implementer reads the fixture, computes the batch, and fills it in". This is not a placeholder but a data-dependent transformation rule, and the rule itself is fully specified.
+- **Type consistency:** `_resolve_effective_stages` returns `list[int]`. The callsite uses `effective` (list), `ctx["EFFECTIVE_STAGES"]` (CSV str), `inp.stage` (CSV str), and the started-row loop. The prompt consumes the CSV through `{{STAGE_BATCH_DIRECTIVE}}`. The function name is unified as the plural `_resolve_effective_stages`; confirm with grep in Task 2 Step 1 that no calls to the old singular `_resolve_effective_stage` remain.
+- **Leftover check (recommended):** before starting Task 2, run `grep -rn "_resolve_effective_stage\b" scripts/ tests/` to confirm there are no singular-form callers other than the callsite (run.py:851).
+- **Verification limits (real DB/IO rule):** this change does not touch DB/IO directly, but e2e (`tests-e2e/`) and the lead's behavior in a real `implementation` run (the prompt contract) cannot be fully reproduced in pytest. "Verification" of the prompt contract change extends only to build sync + e2e scenarios; the lead's actual batch execution is confirmed only by observing the next real run. Until then, report it as "passing statically and in e2e, not yet observed in a real run".
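
Reviewer sketch (not part of the package): to see what the batching rule in Task 1 buys across a whole plan, the following minimal simulation replays sequential runs until every stage is done. `select_batch` and `plan_runs` are hypothetical names that restate the "auto" branch of `_resolve_effective_stages`; the example Stage Map is invented, and the simulation assumes every stage in a run completes.

```python
"""Sketch: how successive runs drain a Stage Map under the batching rule."""

RUN_STEP_BUDGET = 8  # same constant the plan introduces


def select_batch(stages, done, budget=RUN_STEP_BUDGET):
    """Pick ready stages in ascending order until the budget is reached.

    Hypothetical stand-in for the "auto" branch of _resolve_effective_stages.
    """
    ready = sorted(
        (
            s for s in stages
            if s["stage_number"] not in done
            and all(d in done for d in s["depends_on"])
        ),
        key=lambda s: s["stage_number"],
    )
    if not ready:
        raise RuntimeError("no stage is ready")
    batch, total = [], 0
    for s in ready:
        # The first ready stage is always taken; later ones must fit the budget.
        if batch and total + s["step_count"] > budget:
            break
        batch.append(s["stage_number"])
        total += s["step_count"]
    return batch


def plan_runs(stages):
    """Return the batch each sequential run would take until all stages are done."""
    done, runs = set(), []
    while len(done) < len(stages):
        batch = select_batch(stages, done)
        runs.append(batch)
        done.update(batch)  # assumption: every stage in the run completes
    return runs


# Invented example plan: two independent small stages, then two dependent ones.
stage_map = [
    {"stage_number": 1, "depends_on": [], "step_count": 2},
    {"stage_number": 2, "depends_on": [], "step_count": 3},
    {"stage_number": 3, "depends_on": [1, 2], "step_count": 6},
    {"stage_number": 4, "depends_on": [3], "step_count": 1},
]
print(plan_runs(stage_map))
```

Under these assumptions the four stages finish in three runs rather than four, because stages 1 and 2 share one run; the dependent stages still get their own runs since they only become ready after their predecessors are done.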

package/docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md CHANGED

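@@ -35,6 +35,8 @@ The planner LLM **always** produces a Stage Map + N Stage sections.
 ### 2.3 Maximize parallelization first (first-class split criterion)
+> **[Superseded 2026-06-04]** The "maximize parallelization first" rule in this section has been superseded by §2.2 of [`2026-06-04-stage-splitting-cost-aware-design.md`](2026-06-04-stage-splitting-cost-aware-design.md): "cohesion anchor = file/module proximity, the ≤6 cap is the only splitter". Parallelization is no longer a first-class split criterion. The original text below is kept for historical reference. (In this document, the `step ≤6 cap` (§2.3 lines 46–50), carry-in (§2.4), and the data model (§3) remain valid.)
 When composing stages and steps, the **top-priority criterion is to minimize dependencies and so maximize the units that can run in parallel**. If two candidate splits have the same number of steps, adopt the one with fewer `depends-on` links (that is, more parallelizable stages).
 Specific guidance: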

package/docs/superpowers/specs/2026-06-04-stage-splitting-cost-aware-design.md ADDED

@@ -0,0 +1,98 @@
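+# Cost-aware redesign of the stage split criteria: design
+- Date: 2026-06-04
+- Scope: change the stage split criteria of `implementation-planning` and the run execution unit of `implementation` to a **cost-aware** structure. A stage remains the unit of planning, PR, and verification evidence, but the run is separated as its own unit so that the fixed cost of the cross-verification machinery (verifier, convergence, report, teardown) is not multiplied by the number of stages.
+- Non-goals
+  - No new task-type. The output structure of `requirements-discovery` / `error-analysis` / `final-verification` / `release-handoff` is unchanged.
+  - The sidecar carry-in JSON schema (`runs/<impl-key>/carry/stage-<N>.json`) itself is unchanged; per-stage emission is kept.
+  - Multilingual support/i18n.
+- Relationship: this document supersedes **§2.3 "Maximize parallelization first (first-class split criterion)" (lines 36–44)** of [`2026-05-20-implementation-planning-multi-stage-design.md`](2026-05-20-implementation-planning-multi-stage-design.md). That document's `step ≤6 cap` (lines 46–50), carry-in model (§2.4), and data model (§3) remain as they are. During implementation, §2.3 of the old spec is rewritten to match this decision.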
+## 1. Motivation: a one-line fix swallows an entire set of fixed costs
+Measured: a one-line fix in `fontradar-v2-api` (removing one await, +1/−3) was split into 3 stages, and Stage 1 alone bore a full cross-verification set. The cause is not the code but a structural flaw in the split criteria.
+1. **The fixed cost is per run.** One `implementation` run always carries executor 1 + verifier 2 (claude+codex) + report-writer ([`prompts/profiles/implementation.md:4`](../../../prompts/profiles/implementation.md)). Phase 5.5 convergence and Phase 7 teardown come on top. This cost is fixed regardless of the size of the change.
+2. **That fixed cost is multiplied by the number of stages.** Today `1 run = exactly 1 stage` is enforced ([`prompts/profiles/_implementation-executor.md:41`](../../../prompts/profiles/_implementation-executor.md)). So the number of stages = the fixed-cost multiplier.
+3. **The split criteria have no lower bound.** The old spec's only split criteria are an upper bound (≤6 steps per stage) and "maximize parallelizable stages" (first-class) ([`2026-05-20-...-design.md:36`](2026-05-20-implementation-planning-multi-stage-design.md), [`prompts/profiles/implementation-planning.md:68`](../../../prompts/profiles/implementation-planning.md)). There is no "merge if too small".
+4. **Internal contradiction in the old spec.** §2.2 says "steps within a stage are mutually independent → if independent, same stage" ([line 31](2026-05-20-implementation-planning-multi-stage-design.md)), while §2.3 says "split so that several independent stages emerge" ([line 42](2026-05-20-implementation-planning-multi-stage-design.md)). Whether to bundle or split independent work reads in opposite ways.
+5. **A faulty analogy.** §2.3 described the step ≤6 cap as "in the same spirit as a 50-line function cap" ([line 48](2026-05-20-implementation-planning-multi-stage-design.md)). Splitting a function is free, but splitting an okstra stage means buying a fresh multi-agent run. The rule was transplanted even though the cost structures differ.
+## 2. Core principles
+### 2.1 Separating stage and run
+| Unit | Meaning | Boundary criterion |
+|---|---|---|
+| **stage** | unit of planning, PR, and verification evidence (sidecar) | `depends-on` dependencies + effective step ≤6 cap |
+| **run** | unit of the cross-verification machinery (verifier, convergence, report, teardown) = one set of fixed costs | ready-set + run step budget |
+This breaks the old spec's `1 run = 1 stage` equation. One run can own several stages as a batch.
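+### 2.2 Planner: cohesion anchor = file/module proximity, limit = ≤6 cap
+The **anchor for grouping is shared file/module proximity**. The rule is not "anything independent and of the right size may be grouped"; it is **group work that touches the same files, directories, or modules.** Only then are the diff, PR, and rollback units semantically cohesive.
+- **Default = work that touches the same file/module is grouped into the same stage.** This adopts §2.2 (independent = same stage) and drops the old §2.3 (split for parallelization), resolving the contradiction.
+- Conditions for **separating** stages (any one of three):
+  - (a) a real `depends-on` dependency exists (one step consumes another step's output), or
+  - (b) the combined effective steps exceed 6, or
+  - (c) **the sets of touched files are disjoint** (independent work with no shared file/module is not forced into one stage).
+- **Parallelization is not a reason to split.** If several stages created by the cap happen to be `depends-on (none)`, two runs can pick them up and proceed in parallel. That is only a side effect, not grounds for creating more stages.
+- The `step ≤6 cap` (old §2.3 lines 46–50) is kept as is.
+**Parallel-safety invariant (the enforcement teeth of the file-proximity criterion):** the predicted file sets in the `Stage Exit Contract` of two stages that are both `depends-on (none)` **must be disjoint.** Otherwise two parallel runs would edit the same file at the same time and conflict. Two pieces of work touching the same file must either (i) be grouped into the same stage or (ii) be ordered with `depends-on`.
+Effects:
+- Independent bugs touching the same file/module → merged into 1 stage (within the combined ≤6 limit).
+- Independent bugs touching different modules → separate stages (cohesion preserved). The fixed cost is absorbed by the run batch of §2.3, so more stages do not multiply the cost.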
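+### 2.3 Executor: "ready-set + run step budget batch"
+- At dispatch time, one run absorbs stages whose `depends-on` are all `status:done` and which are unclaimed, **until the combined effective steps reach the run budget**.
+- **Run step budget = 8** (larger than the stage cap of 6). A conservative value that lets a one-line leftover stage be absorbed next to a larger stage while keeping stage boundaries and verification precision almost unchanged.
+- Extend single start-stage selection ([`_implementation-executor.md:30`](../../../prompts/profiles/_implementation-executor.md)) to ready-set selection. Selection algorithm:
+  1. ready set = stages whose `depends-on` are all done and that have no `started`/`done` row in `consumers.jsonl`.
+  2. Sort ready by stage number ascending.
+  3. Take from the front as long as the cumulative effective steps do not exceed 8. If a single stage exceeds 8 (impossible given the cap of 6), take just that one stage.
+  4. At least one is guaranteed (no empty batch).
+- Claiming: the `consumers.jsonl` reverse-link and `~/.okstra/worktrees/registry.json` reserve the **set of stage-keys** per run atomically (flock). Append a `started` row for every stage in the batch at once; on completion append a `done` row + `carry_path` for each stage.
+- **Cohesion already lives at the stage level.** The run batch is purely a cost mechanism. The cohesion anchor (file proximity) was already applied in §2.2 when stages were formed, and the PR is divided into stage sections, so even if a batch groups stages from different modules, review and rollback cohesion is preserved at stage boundaries. A run executes serially in stage order in a single worktree, so there are no conflicts inside a batch. Thanks to the parallel-safety invariant of §2.2, stages picked by different runs do not overlap in files either.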
+### 2.4 Verification and deliverable units
+- verifier, convergence, report-writer, and Phase 7 run **once per run over the batch's full diff.** This is where the actual savings come from.
+- The executor emits a `carry/stage-N.json` sidecar **per stage** whenever that stage's `Stage Validation post` passes ([`_implementation-executor.md:43`](../../../prompts/profiles/_implementation-executor.md)). The carry-in contract is preserved. That is, even within a batch, stages execute in order with post validation + sidecar emission at each stage boundary, followed by one run-level cross-verification at the end.
+- **One PR per run.** Title `Stages <X>–<Y>: <run summary>`, with the body split into stage sections + Previous/Next run links. one-PR-per-stage ([`_implementation-executor.md:45`](../../../prompts/profiles/_implementation-executor.md)) becomes one-PR-per-run.
+## 3. Files touched
+| File | Change |
+|---|---|
+| This document | new design (supersedes old §2.3) |
+| [`2026-05-20-...-design.md:36`](2026-05-20-implementation-planning-multi-stage-design.md) | remove the §2.3 parallelization-first rule, add a "run unit" to the §2.1 table, add run batch and run-level verification sections, reference this document |
+| [`prompts/profiles/implementation-planning.md:68`](../../../prompts/profiles/implementation-planning.md) | "Parallelisation-first rule (1st-class)" → "cohesion anchor = file/module proximity, the cap is the only splitter, parallelization is not a reason to split" |
+| [`prompts/profiles/implementation-planning.md:96`](../../../prompts/profiles/implementation-planning.md) | in the Stage Map self-check: (i) "re-split if a depends-on can be removed" → "no splitting for the sake of parallelization", (ii) add a parallel-safety invariant self-check ("predicted file sets of `depends-on (none)` stages are disjoint") |
+| [`prompts/profiles/_implementation-executor.md:30`](../../../prompts/profiles/_implementation-executor.md) | single start-stage selection → ready-set batch selection (§2.3 algorithm) |
+| [`prompts/profiles/_implementation-executor.md:41`](../../../prompts/profiles/_implementation-executor.md) | "owns exactly one stage" → "owns a ready-set batch (within run budget 8)" |
+| [`prompts/profiles/_implementation-executor.md:43-49`](../../../prompts/profiles/_implementation-executor.md) | emit sidecar/consumers per stage within the batch; one-PR-per-stage → one-PR-per-run |
+| [`prompts/profiles/implementation.md:4`](../../../prompts/profiles/implementation.md) | (if needed) state in the Phase description that "verification and report run once per run" |
+| [`validators/validate-implementation-plan-stages.py`](../../../validators/validate-implementation-plan-stages.py) | keep the existing ≤6 cap ([line 140](../../../validators/validate-implementation-plan-stages.py)), step-count cell consistency ([line 143](../../../validators/validate-implementation-plan-stages.py)), and depends-on DAG ([line 149](../../../validators/validate-implementation-plan-stages.py)) checks. Add **new check S9**: extract paths from the "추가/변경된 파일 (예측)" ("added/changed files (predicted)") line of the `Stage Exit Contract` and reject when the file sets of a pair of stages that are both `depends-on (none)` overlap (parallel-safety invariant). It is prediction-based and best-effort; the backstop for real conflicts is worktree serialization + the flock registry |
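+## 4. Non-goals / future work
+- Run budget 8 starts as a constant. A user-facing flag (`--run-step-budget`) is YAGNI and will not be introduced until there is a real need.
+- Explicit batch selection via `--stages 1,2,3` is not introduced separately; this design's automatic ready-set is sufficient.
+- **Parallel-run started-exclusion is a non-goal.** Ready-set selection does not exclude stages that another run has `started`; two parallel runs pick the same ready-set batch (existing behavior, documented by [`tests/test_e2e_multi_stage_q1_q9.py::test_q7`](../../../tests/test_e2e_multi_stage_q1_q9.py)). The flock on consumers.jsonl only prevents file corruption and does not guarantee that batches are distinct. It does not affect the user's sequential phase-continuation usage, and the real conflict backstop is the file-disjointness invariant of §2.2 + worktree serialization. Introducing started-exclusion is left as separate work.
+- **Stage selection has a single source of truth in Python.** `_resolve_effective_stages` ([`scripts/okstra_ctl/run.py`](../../../scripts/okstra_ctl/run.py)) selects the batch, `prepare_task_bundle` records a `started` row per stage, and the batch is then injected into the lead prompt via `{{STAGE_BATCH_DIRECTIVE}}`. The former dual path (drift) in which the lead self-computed the start stage from consumers.jsonl has been removed.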
+## 5. Verification approach (enforcement)
+| Contract | Enforced by |
+|---|---|
+| effective steps per stage ≤6 | [`validators/validate-implementation-plan-stages.py`](../../../validators/validate-implementation-plan-stages.py) (kept as is) |
+| cohesion anchor = file/module proximity | planner self-check prose ([`implementation-planning.md:96`](../../../prompts/profiles/implementation-planning.md)). A qualitative criterion, so the validator cannot enforce it; the invariant below is the proxy signal for violations |
+| parallel-safety invariant (file sets of `depends-on (none)` stages are disjoint) | new validator check S9 (best-effort, based on predicted paths). The real backstop is worktree serialization + the flock registry |
+| run batch combined steps ≤8 | the executor cuts off by cumulative count at ready-set selection and records the batch composition in the run start log. (Not a planner deliverable, so not covered by the plan validator) |
+| per-stage sidecar emission | BLOCKING rule at [`_implementation-executor.md:43`](../../../prompts/profiles/_implementation-executor.md) kept |
+| run-level verification once | lead orchestration that ties Phase 5.5/6 entry to once per run (sidecar rule) |
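
Reviewer sketch (not part of the package): the S9 parallel-safety check described in §3 and §5 can be pictured as a pairwise disjointness test. This is only an illustration under stated assumptions: `find_s9_conflicts` is a hypothetical name, each stage is assumed to arrive as a dict whose `files` list was already extracted from its `Stage Exit Contract`, and "both `depends-on (none)`" is read as "both have an empty `depends_on`". The real validator's parsing and data model are not shown in this diff.

```python
from itertools import combinations


def find_s9_conflicts(stages):
    """Return (stage_a, stage_b, shared_files) for every violating pair.

    Only stages with no depends-on are compared, since those are the ones
    two parallel runs could pick up at the same time.
    """
    roots = [s for s in stages if not s["depends_on"]]
    conflicts = []
    for a, b in combinations(roots, 2):
        shared = set(a["files"]) & set(b["files"])
        if shared:
            conflicts.append((a["stage_number"], b["stage_number"], sorted(shared)))
    return conflicts


# Invented example: stages 1 and 2 are both depends-on (none) and share a file.
plan = [
    {"stage_number": 1, "depends_on": [], "files": ["src/a.py", "src/b.py"]},
    {"stage_number": 2, "depends_on": [], "files": ["src/b.py"]},
    {"stage_number": 3, "depends_on": [1], "files": ["src/a.py"]},
]
print(find_s9_conflicts(plan))
```

Stage 3 also touches `src/a.py`, but it is ordered after stage 1 by `depends-on`, which is one of the two resolutions §2.2 allows, so only the pair (1, 2) is flagged.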

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "okstra",
-  "version": "0.45.0",
+  "version": "0.46.0",
   "description": "Multi-agent cross-verification orchestrator runtime + Claude Code skills.",
   "license": "MIT",
   "author": "devonshin",

package/runtime/BUILD.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "package": "0.45.0",
-  "builtAt": "2026-06-04T06:11:59.186Z",
+  "package": "0.46.0",
+  "builtAt": "2026-06-04T11:19:42.641Z",
   "repoRoot": "/home/runner/work/okstra/okstra"
 }

package/runtime/prompts/launch.template.md CHANGED

@@ -15,6 +15,7 @@ Emit one `PROGRESS: <phase-id> <verb-phrase>` line as plain user-facing text at
 - Forbidden actions in this phase:
 {{PHASE_FORBIDDEN_ACTIONS}}
 - This run executes `{{WORKFLOW_CURRENT_PHASE}}` only. Do not start `{{WORKFLOW_NEXT_RECOMMENDED_PHASE}}` or any later phase inside this run, even if the user says "다음 단계 진행해" or similar.
+{{STAGE_BATCH_DIRECTIVE}}
 - Phase advancement requires a new okstra invocation launched with `--task-type {{WORKFLOW_NEXT_RECOMMENDED_PHASE}}` after this run's final report is written and approved. The lead must not write source code, run builds/migrations/deployments, or otherwise produce artifacts of a different phase from inside this run.
 - See `Lifecycle Phase Boundaries` in the okstra skill (`agents/SKILL.md`) for the canonical rules and the phase-transition checklist.
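
Reviewer sketch (not part of the package): the unconditional `{{STAGE_BATCH_DIRECTIVE}}` line relies on the ctx default from the plan's Step 1b. The snippet below illustrates why, using a small regex-based `render` as a hypothetical stand-in for the real template renderer (which is not shown in this diff) and an abbreviated copy of the template lines.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, ctx: dict) -> str:
    """Replace every {{NAME}} with ctx["NAME"]; a missing key raises KeyError."""
    return PLACEHOLDER.sub(lambda m: str(ctx[m.group(1)]), template)


# Abbreviated stand-in for the three template lines around the new placeholder.
template = (
    "- This run executes `{{WORKFLOW_CURRENT_PHASE}}` only.\n"
    "{{STAGE_BATCH_DIRECTIVE}}\n"
    "- Phase advancement requires a new okstra invocation.\n"
)

# Non-implementation run: stage selection never ran, so the default fills in.
ctx = {"WORKFLOW_CURRENT_PHASE": "implementation-planning"}
ctx.setdefault("STAGE_BATCH_DIRECTIVE", "")
non_impl = render(template, ctx)

# Implementation run: the runtime already built the directive for batch 1,2.
impl_ctx = {
    "WORKFLOW_CURRENT_PHASE": "implementation",
    "STAGE_BATCH_DIRECTIVE": "- **Stage batch for this implementation run:** `1,2`",
}
impl = render(template, impl_ctx)

print(non_impl)
print(impl)
```

In the non-implementation case the placeholder collapses to a blank line between the two bullets, which matches the behavior the plan describes; without the `setdefault` call this stand-in renderer would raise `KeyError`.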

package/runtime/prompts/profiles/_implementation-deliverable.md CHANGED

@@ -48,6 +48,7 @@ are collected and convergence finished. Phase 1-5 do not need it.
 ## Lead post-stage persistence (BLOCKING — runs after the Executor emits `### Stage Carry Evidence`)
 - Parse the executor's `### Stage Carry Evidence` JSON block. If absent or unparsable, end with status `contract-violated` and route to a follow-up `error-analysis`.
-- Write the JSON verbatim to `runs/<impl-task-key>/carry/stage-<N>.json`. Refuse to overwrite an existing file (one stage = one sidecar; re-runs are out of scope for this version).
-- Append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, and the SHA of HEAD. Use the okstra runtime's `consumers_mutex` helper (NOT a raw filesystem write) to honour the lock.
-- Quote both files' new contents (the sidecar JSON in full, the new consumers row by itself) in the final report's `Stage sidecar evidence` deliverable section.
+- For EACH stage in this run's batch: write its JSON verbatim to `runs/<impl-task-key>/carry/stage-<N>.json`. Refuse to overwrite an existing file (one stage = one sidecar; re-runs are out of scope for this version).
+- For EACH stage in this run's batch: append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, and the SHA of HEAD. Use the okstra runtime's `consumers_mutex` helper (NOT a raw filesystem write) to honour the lock.
+- The verifier round, Phase 5.5 convergence, and this Phase 6 report run **once per run** over the batch's combined diff — NOT per stage. The single final report covers every batched stage, with a per-stage subsection.
+- Quote every batched stage's new contents (each sidecar JSON in full, each new consumers row by itself) in the final report's `Stage sidecar evidence` deliverable section.
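
Reviewer sketch (not part of the package): the per-stage `status:"done"` rows required above could be assembled as follows. `build_done_rows` is a hypothetical helper; the `completed_at` and `carry_path` fields come from the prompt text, while `impl_task_key`, `stage`, and `head_commit` are assumed to mirror the keyword arguments of the `started` row in the plan. Locking is deliberately left out: the prompt requires the runtime's `consumers_mutex` helper for the actual append, and its interface is not shown in this diff.

```python
import datetime as dt
import json


def build_done_rows(impl_task_key, batch, head_sha, now=None):
    """Build one done row per batched stage (pure function, no file I/O)."""
    completed_at = (now or dt.datetime.now(dt.timezone.utc)).isoformat()
    return [
        {
            "impl_task_key": impl_task_key,
            "stage": n,
            "status": "done",
            "completed_at": completed_at,
            # Path layout taken from the prompt: one sidecar per stage.
            "carry_path": f"runs/{impl_task_key}/carry/stage-{n}.json",
            "head_commit": head_sha,
        }
        for n in batch
    ]


# Invented values for illustration only.
rows = build_done_rows("impl-123", [1, 2], "abc1234")
for row in rows:
    print(json.dumps(row))  # one JSON object per line, as in a .jsonl file
```

Keeping row construction separate from the locked append makes it easy to quote each new row on its own in the final report's `Stage sidecar evidence` section.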