npm - okstra - Versions diffs - 0.34.1 → 0.36.1 - Mend

okstra 0.34.1 → 0.36.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (108) hide show

package/README.kr.md +27 -19
package/README.md +27 -19
package/docs/kr/architecture.md +59 -45
package/docs/kr/cli.md +61 -18
package/docs/pr-template-usage.md +65 -0
package/docs/project-structure-overview.md +353 -354
package/docs/superpowers/plans/2026-05-12-ticket-id-in-reports.md +1 -1
package/docs/superpowers/plans/2026-05-14-convergence-queue-pruning.md +1 -1
package/docs/superpowers/plans/2026-05-17-dual-format-final-report.md +1 -1
package/docs/superpowers/plans/2026-05-20-final-report-language.md +1501 -0
package/docs/superpowers/plans/2026-05-20-implementation-planning-multi-stage.md +1267 -0
package/docs/superpowers/plans/2026-05-20-okstra-run-prompt-sot-b1.md +1007 -0
package/docs/superpowers/plans/2026-05-20-wizard-messages-json-sot.md +720 -0
package/docs/superpowers/plans/2026-05-20-wizard-prompt-json-sot-a1.md +681 -0
package/docs/superpowers/plans/2026-05-21-improvement-discovery-task-type.md +1691 -0
package/docs/superpowers/plans/2026-05-24-implementation-lead-context-slimming.md +1700 -0
package/docs/superpowers/specs/2026-05-20-final-report-language-design.md +383 -0
package/docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md +320 -0
package/docs/superpowers/specs/2026-05-20-okstra-run-prompt-sot-design.md +299 -0
package/docs/superpowers/specs/2026-05-21-improvement-discovery-task-type-design.md +335 -0
package/docs/task-process/README.md +74 -0
package/docs/task-process/common-flow.md +166 -0
package/docs/task-process/error-analysis.md +101 -0
package/docs/task-process/final-verification.md +167 -0
package/docs/task-process/implementation-planning.md +128 -0
package/docs/task-process/implementation.md +149 -0
package/docs/task-process/release-handoff.md +206 -0
package/docs/task-process/requirements-discovery.md +115 -0
package/package.json +1 -1
package/runtime/BUILD.json +2 -2
package/runtime/agents/SKILL.md +30 -7
package/runtime/agents/workers/claude-worker.md +31 -6
package/runtime/agents/workers/codex-worker.md +37 -10
package/runtime/agents/workers/gemini-worker.md +34 -7
package/runtime/agents/workers/report-writer-worker.md +19 -10
package/runtime/bin/okstra-central.sh +6 -6
package/runtime/bin/okstra-codex-exec.sh +49 -28
package/runtime/bin/okstra-gemini-exec.sh +39 -21
package/runtime/bin/okstra-render-final-report.py +13 -2
package/runtime/bin/okstra-wrapper-status.py +155 -0
package/runtime/bin/okstra.sh +2 -2
package/runtime/prompts/launch.template.md +1 -0
package/runtime/prompts/profiles/_common-contract.md +11 -6
package/runtime/prompts/profiles/_implementation-deliverable.md +53 -0
package/runtime/prompts/profiles/_implementation-executor.md +60 -0
package/runtime/prompts/profiles/_implementation-verifier.md +76 -0
package/runtime/prompts/profiles/error-analysis.md +3 -7
package/runtime/prompts/profiles/implementation-planning.md +22 -21
package/runtime/prompts/profiles/implementation.md +28 -118
package/runtime/prompts/profiles/improvement-discovery.md +42 -0
package/runtime/prompts/profiles/release-handoff.md +1 -1
package/runtime/prompts/profiles/requirements-discovery.md +8 -12
package/runtime/prompts/wizard/prompts.ko.json +230 -0
package/runtime/python/lib/okstra/cli.sh +2 -49
package/runtime/python/lib/okstra/globals.sh +21 -21
package/runtime/python/lib/okstra/interactive.sh +7 -7
package/runtime/python/okstra_ctl/clarification_items.py +3 -9
package/runtime/python/okstra_ctl/consumers.py +53 -0
package/runtime/python/okstra_ctl/final_report_schema.py +0 -7
package/runtime/python/okstra_ctl/i18n.py +73 -0
package/runtime/python/okstra_ctl/improvement_lenses.py +44 -0
package/runtime/python/okstra_ctl/index.py +1 -1
package/runtime/python/okstra_ctl/paths.py +26 -20
package/runtime/python/okstra_ctl/render.py +166 -207
package/runtime/python/okstra_ctl/render_final_report.py +53 -10
package/runtime/python/okstra_ctl/run.py +299 -108
package/runtime/python/okstra_ctl/run_context.py +22 -0
package/runtime/python/okstra_ctl/seeding.py +186 -0
package/runtime/python/okstra_ctl/session.py +65 -7
package/runtime/python/okstra_ctl/wizard.py +348 -127
package/runtime/python/okstra_ctl/workflow.py +21 -2
package/runtime/python/okstra_ctl/worktree.py +54 -1
package/runtime/python/okstra_project/resolver.py +4 -3
package/runtime/python/okstra_token_usage/report.py +2 -2
package/runtime/schemas/final-report-v1.0.schema.json +22 -16
package/runtime/skills/okstra-brief/SKILL.md +102 -218
package/runtime/skills/okstra-convergence/SKILL.md +2 -3
package/runtime/skills/okstra-inspect/SKILL.md +581 -0
package/runtime/skills/okstra-report-writer/SKILL.md +35 -15
package/runtime/skills/okstra-run/SKILL.md +8 -7
package/runtime/skills/okstra-schedule/SKILL.md +14 -157
package/runtime/skills/okstra-setup/SKILL.md +28 -1
package/runtime/skills/okstra-team-contract/SKILL.md +16 -107
package/runtime/templates/okstra.CLAUDE.md +104 -0
package/runtime/templates/reports/brief.template.md +204 -0
package/runtime/templates/reports/final-report.template.md +93 -98
package/runtime/templates/reports/i18n/en.json +135 -0
package/runtime/templates/reports/i18n/ko.json +135 -0
package/runtime/templates/reports/implementation-planning-input.template.md +18 -0
package/runtime/templates/reports/improvement-discovery-input.template.md +78 -0
package/runtime/templates/reports/schedule.template.md +12 -3
package/runtime/templates/reports/task-brief.template.md +2 -2
package/runtime/templates/worker-prompt-preamble.md +108 -0
package/runtime/validators/lib/fixtures.sh +30 -0
package/runtime/validators/lib/runners.sh +1 -1
package/runtime/validators/validate-implementation-plan-stages.py +211 -0
package/runtime/validators/validate-run.py +121 -26
package/runtime/validators/validate-workflow.sh +2 -2
package/runtime/validators/validate_improvement_report.py +275 -0
package/src/config.mjs +18 -0
package/src/install.mjs +41 -14
package/src/setup.mjs +133 -1
package/src/uninstall.mjs +27 -3
package/runtime/skills/okstra-history/SKILL.md +0 -165
package/runtime/skills/okstra-logs/SKILL.md +0 -173
package/runtime/skills/okstra-report-finder/SKILL.md +0 -111
package/runtime/skills/okstra-status/SKILL.md +0 -246
package/runtime/skills/okstra-time-summary/SKILL.md +0 -172

package/runtime/prompts/launch.template.md CHANGED Viewed

@@ -32,6 +32,7 @@ Emit one `PROGRESS: <phase-id> <verb-phrase>` line as plain user-facing text at
 - Therefore: at the start of every phase the prompts/ directory is normally empty (or contains only previously-dispatched workers' files). This is expected. Do NOT narrate it as "missing", "누락", or "not yet rendered" — it just means dispatch has not happened yet.
 - Before dispatching any required worker, **you (the lead) construct the worker prompt and persist it to the assigned absolute path using `Write`** (per `okstra-team-contract` rule 6). Only after persisting do you call the worker subagent (Agent tool / Codex / Gemini wrapper). The wrapper subagents will also re-write the same file on their end; the double-write is intentional and idempotent.
 - Do not "check if the file exists and skip dispatch" — file presence is not a signal to skip. Worker selection and skipping rules come from team-state, never from prompts/ directory contents.
+- **Worker Preamble Path (single anchor — replaces inlined `[Required reading]` and `[Error reporting]` blocks).** Every worker prompt you construct MUST include the anchor header `**Worker Preamble Path:** {{WORKER_PROMPT_PREAMBLE_PATH}}`. The file at that path is the canonical SSOT for Required Reading + Error Reporting + Output sections; workers Read it end-to-end before producing output. Do NOT re-inline those blocks into worker prompt bodies — that is the legacy ~80-line dispatch boilerplate that this anchor is designed to replace.
 ## Manifests

package/runtime/prompts/profiles/_common-contract.md CHANGED Viewed

@@ -17,8 +17,12 @@ profile document.
   - **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`).
   - Do NOT conclude "no peer review happens" from the roster alone — every profile that lists ≥2 analyser workers runs convergence by default (`convergence.enabled=true` in `task-manifest.json`).
 - Tooling — read-only MCP availability (shared):
-  - the read-only MCP servers declared in the task brief's `## Available MCP Servers` section may be queried as a read-only cross-check; that section is the canonical source of which servers and tools exist for this run, and any MCP-derived finding MUST cite server, table, and the SELECT used. MCP MUST NEVER be used as a write path — schema/data mutations go through repository migration files reviewed by humans.
-- Authority & permissions assumption (HARD RULE — applies to every okstra task-type):
+  - MCP is not implicit okstra context. Query an MCP server only when the task brief explicitly lists it as source material for this run. Any MCP-derived finding MUST cite server, table, and the SELECT used. MCP MUST NEVER be used as a write path — schema/data mutations go through repository migration files reviewed by humans.
+- Resource boundary (shared — artifact-home rule):
+  - Okstra-owned project artifacts live only under `<PROJECT_ROOT>/.project-docs/okstra/**`. Treat `project.json`, task bundles, run artifacts, `glossary.md`, and `decisions/` under that subtree as the canonical okstra memory.
+  - Treat paths outside `<PROJECT_ROOT>/.project-docs/okstra/**` as source material only when the task brief's `Source Material` or `Reporter Confirmations` explicitly cites them; they never become okstra memory.
+  - Create, modify, or delete only inside `<PROJECT_ROOT>/.project-docs/okstra/**` unless the brief verbatim requests a specific non-okstra edit. The phase performing that edit must quote the user instruction in its report. Implementation source edits also require the approved implementation plan.
+- Authority & permissions assumption (applies to every okstra task-type):
   - **Assume the user (and their team) holds full authority and every permission required for the anticipated, in-flight, or follow-up work.** Treat external approvals, third-party access grants, role/IAM permissions, organisational sign-off, legal/compliance review, vendor coordination, and "verify access exists" steps as already satisfied unless the task brief explicitly states otherwise.
   - Do NOT add such items to routing decisions, missing-materials lists, clarification questions, option trade-offs, dependency/migration risk, validation checklists, rollout plans, acceptance blockers, residual risks, release recommendations, the `## 5. Clarification Items` table, or any day/effort estimate. They are not legitimate sources of schedule extension.
   - Internal okstra phase handoffs (e.g. the `User Approval Request` block in `implementation-planning`) are unaffected — those are the user themselves approving and proceed without external coordination.
@@ -54,7 +58,7 @@ profile document.
     - `intent-check:` → `Kind=decision`, recommended answer = reporter confirmation. NEVER silently resolve an `intent-check:` by inference at this layer.
     - `terminology:` → `Kind=decision`, recommended answer = canonical term from `<PROJECT_ROOT>/.project-docs/okstra/glossary.md` (or "extend okstra glossary via brief Step 4.5").
     - `conversion-block:` → `Kind=decision`, recommended answer = "보고자에게 직접 확인". The brief is explicitly signalling that translation failed; further inference is forbidden until the reporter clarifies.
-    - `adr-candidate:` → handled by `implementation-planning`; carry forward without modification. Approved decision files land at `<PROJECT_ROOT>/.project-docs/okstra/decisions/<NNNN>-<slug>.md` (okstra-internal), never at external `<PROJECT_ROOT>/docs/adr/`.
+    - `adr-candidate:` → handled by `implementation-planning`; carry forward without modification. Approved decision files land only at `<PROJECT_ROOT>/.project-docs/okstra/decisions/<NNNN>-<slug>.md`.
     - `general:` → free-form; classify per the standard `Clarification Items` rules.
   - Any decision in this run that contradicts the brief's `Source Material` must be raised back to the reporter via a `Clarification Items` row; it must NOT be silently overridden. Disagreement with the reporter is allowed only after the row is resolved.
   - This contract is the single authority on brief consumption. Phase-specific addenda may *tighten* these rules but may not relax them.
@@ -65,7 +69,8 @@ profile document.
   - section 5 is a **single unified table** per `final-report-template.md`. Every clarification item — whether the user must attach a file, choose between options, or supply a single number/path — is one row of that table. Do not split it into sub-sections (`5.1 추가 자료 요청` / `5.2 사용자 확인 질문` / `4.5.9 Open Questions` are removed and the validator fails reports that reintroduce them), do not create a parallel table elsewhere in the report, and do not duplicate the same item into the top-of-report `User Approval Request (사용자 승인 게이트)` block or any other section.
   - each row's `Kind` column picks one of `{material, decision, data-point}`: `material` for files / snapshots / logs / screenshots the user must attach (the `User input` cell will hold a path or URL); `decision` for choices and yes/no confirmations only the user can make; `data-point` for a single number, ID, date, or short string the user can answer inline. Items that mix "yes/no + file path if yes" are one row of `Kind=material` with the combined expectation written into `Expected form`.
   - each row's `Blocks` column picks one of `{approval, next-phase, none}`. `approval` is reserved for items that gate an approval action, especially the `implementation-planning` User Approval Request; outside `implementation-planning`, unresolved brief reporter-confirmation rows use `next-phase` instead. `next-phase` blocks the next run from starting cleanly. `none` is informational/audit-only.
-  - write every entry in full, descriptive sentences that a non-developer can act on without further context. Avoid abbreviations and internal jargon. The `Statement` cell must state *what* is needed, *why* the answer / attachment changes the next step, and (for `material`) *where* the user can find it and *where* to place it. The `Expected form` cell must state the shape of the answer (예/아니오, 보기 중 하나, 숫자/날짜, 파일 경로, 짧은 서술 등); supply concrete option choices when applicable.
+  - write every entry in full, descriptive sentences that a non-developer can act on without further context. Avoid abbreviations and internal jargon. The `Statement` cell must state *what* is needed, *why* the answer / attachment changes the next step, and (for `material`) *where* the user can find it and *where* to place it. The `Expected form` cell must state the answer shape (예/아니오, 보기 중 하나, 숫자/날짜, 파일 경로, 짧은 서술 등); supply concrete option choices when applicable.
+  - if a phase requires a recommended answer, alternatives, or an evidence-check note, encode it inside the existing 8-column schema: put evidence notes in `Statement` as `Evidence checked: <path:line>` or `Evidence checked: none — <human-only reason>`, and put recommendations/options in `Expected form` as `Recommended: <answer> — <rationale>; Alternatives: <options>`. Do not add `Recommended`, `Evidence`, `Alternatives`, or `evidence-checked` columns.
   - the same `final-report.md` file is the canonical artifact carried into the next run; the user appends answers inline before rerunning. The preferred turn-around is `scripts/okstra.sh --resume-clarification --task-key <project-id>:<task-group>:<task-id>` (opens the latest report in `$EDITOR`, then auto-reruns the same phase with `--clarification-response` carry-in). The lower-level form `--clarification-response <path>` remains available for scripted runs.
   - if a clarification response was carried in for this run, render the conditional `## 0. Clarification Response Carried In From Previous Run` section (the template's `RENDER_IF` guard activates it), walk every `C-*` row of the prior report's `## 5. Clarification Items` table, reconcile each one against new evidence, and update its `Status` to `resolved` or `obsolete` before issuing the next decision/verdict. When no carry-in path was provided, omit the `## 0.` heading entirely — the validator fails reports that emit an empty Section 0 stub (e.g. "No prior clarification response was provided for this run.").
 - Verdict Card (shared — applies to every final-report regardless of profile):
@@ -78,8 +83,8 @@ profile document.
   - Reading Confirmation lines (one short line per input file confirming end-to-end reading) live in the **worker audit sidecar** at `runs/<task-type>/worker-results/<worker>-audit-<task-type>-<seq>.md`, NOT in the worker's main worker-results file. The worker-results body starts at section 1 (Findings). The validator fails worker-results files that contain a `## 0. Reading Confirmation` heading.
   - The audit sidecar carries any other meta the worker wants to log (tool-call counts, MCP query summaries, timing notes). The lead's final-report does NOT duplicate this content — it is consumed by the validator and by post-run audit tooling, not by end-user readers.
-- Markdown authoring (shared — applies to every markdown document produced by the lead or any worker, including final-reports, worker-results, briefs, and ad-hoc notes):
-  - every document must begin with an `Index` section.
+- Markdown authoring (shared — applies to markdown documents not already governed by an okstra template/schema):
+  - ad-hoc markdown documents should begin with an `Index` section. Template-governed artifacts such as final-reports, worker-results, and briefs follow their own schema first.
   - include only information necessary to fulfill the user's stated purpose and directly related requirements.
   - follow only the sections, format, tone, and scope specified by the user, plus the required `Index` section.
   - when writing task instructions or work orders, define the scope of work clearly and specifically, including deliverables, acceptance criteria, and verification steps when relevant.

package/runtime/prompts/profiles/_implementation-deliverable.md ADDED Viewed

@@ -0,0 +1,53 @@
+<!--
+Implementation profile — deliverable sidecar. Lead lazy-loads this file ONCE
+at the start of Phase 6 (report-writer dispatch), AFTER all worker results
+are collected and convergence finished. Phase 1-5 do not need it.
+-->
+# Implementation profile — Deliverable sidecar
+> **When to read**: lead reads this file ONCE at the start of Phase 6 (after Phase 5.5 convergence completes, before constructing the report-writer dispatch prompt). Carries the final-report deliverable shape, lead's post-stage persistence rules, and the self-review checklist.
+## Required deliverable shape (final report, in addition to the standard sections)
+- **Plan link & approval evidence**: path to the approved `final-report.md`, the exact quoted approval marker, AND the executed stage number / title quoted from the Stage Map row.
+- **Commit list**: each commit's SHA (or short SHA), message, and the plan step it satisfies
+- **Diff summary**: `git diff --stat <base>..HEAD` output, plus a per-file one-line summary of changes
+- **Out-of-plan edits block**: every file edited that was not in the approved plan's file list, with rationale (empty block is acceptable and preferred)
+- **Stage sidecar evidence**: the JSON payload of `runs/<impl-task-key>/carry/stage-<N>.json` is embedded verbatim in a fenced ```json``` block, AND the `consumers.jsonl` rows this run appended are quoted line-by-line, so reviewers can audit the carry surface without grepping artifact directories.
+- **Validation evidence**: actual command output (stdout/stderr) for every `pre / mid / post` validation command from the plan. Truncated output is acceptable but the command line and exit code MUST be exact. No paraphrasing of test results.
+- **TDD evidence (when applicable)**: for steps that should be TDD-ordered, show the failing-test output BEFORE the implementation commit and the passing-test output AFTER, with commit SHAs framing the transition.
+- **Verifier results**: a section per verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) containing:
+  - their independent verdict (PASS / CONCERNS / FAIL),
+  - cited diff snippets supporting the verdict,
+  - the verifier's `Read-only command log` (every command they ran with exact invocation and exit code, in execution order — copied verbatim from the worker result),
+  - **independent validation re-run results** — per plan-validation command: command line, exit code, and tail of output captured by the verifier (not the executor); any divergence from the executor's reported result MUST be called out as a `Discrepancy` line citing both sides,
+  - **style / lint / type-check results** — each check-only tool the verifier ran, its exit code, and the count of new findings attributable to lines this run introduced. When no tool is configured for a touched language, record the single line `no lint/style tool configured for <language>`,
+  - any fix recommendations the verifier declined to apply.
+  `Claude lead` synthesises a unified verdict but MUST preserve dissent — do not collapse opinions into one paragraph. If any verifier issued `FAIL` on a `Discrepancy` line, the synthesised verdict MUST be `FAIL` unless lead cites a concrete reproduction-time reason (committed flaky-test record, documented environment delta) for overriding.
+- **Rollback verification**: confirmation that the plan's rollback path is still valid after the changes. Strength of verification depends on the change category:
+  - **Pure code changes** (no persisted state, no infra mutation): a reachable revert SHA is sufficient. Record the exact `git revert <SHA>` command that would undo the change, and confirm `git rev-parse <SHA>` resolves.
+  - **Feature-flag-gated changes**: confirm the off-switch path was exercised in this run's validation evidence (i.e. one of the validation commands ran with the flag off and succeeded). A plan that ships a flag without exercising the off-path does NOT satisfy this requirement.
+  - **Schema migrations, config-format changes, or any change with persisted state**: a **dry-run of the rollback step is mandatory**, not preferred. Record the exact rollback command and its captured exit code / stdout. If the migration tool offers no dry-run mode (`--dry-run`, `--plan`, equivalent), the executor MUST refuse to claim rollback verification and instead end the run with a routing recommendation back to `implementation-planning` for a safer rollback strategy. Skipping this step on a stateful change is treated as a `contract-violated` outcome by `final-verification`.
+- **Routing recommendation for `final-verification`**: brief note on whether the changes are ready for final-verification phase or need a new error-analysis / planning loop first.
+- **Follow-up tasks (Section 7 of the final report)**: every item discovered during this run that was *not* delivered MUST appear in the final report's `## 7. Follow-up Tasks (후속 작업)` table with a concrete `Origin`, `New Task ID`, `Suggested task-type`, `Scope`, and `Reason / Why deferred`. Sources include: out-of-scope discoveries that the executor consciously chose not to fold into this run, verifier concerns the executor declined to fix in-place, scope-boundary items from the approved plan that turned out to need their own ticket, and any unresolved `## 5. Clarification Items` row carried over from the approved plan (`Status` ∈ `{open, answered}` at approval time). An empty section is acceptable but only when expressed as the single line `- 후속 작업 없음.` — silence is treated as a contract violation. Rows with `Auto-spawn? = yes` will be materialised by `scripts/okstra-spawn-followups.py` in Phase 7; rows with `Auto-spawn? = no` MUST also appear in `Section 6. Recommended Next Steps` so the user knows to act manually.
+## Self-review pass before finalising the report (`Claude lead` runs this; do not delegate to a generic subagent)
+1. **Plan coverage** — every step in the approved plan's recommended option must point to a commit (or an explicit `Skipped: <reason>` entry). List gaps.
+2. **Evidence completeness** — every `Validation evidence` and `TDD evidence` claim has the actual command line and exit code? No paraphrased "tests pass" without output?
+3. **Out-of-plan honesty** — files in the diff that are NOT in the plan list must appear in the `Out-of-plan edits` block. Cross-check with `git diff --name-only`.
+4. **Verifier dissent preserved** — if the verifiers in the resolved roster disagree, the disagreement is visible in the report? Synthesis hides nothing?
+5. **Forbidden action audit** — `git push`, publish, deploy, migration, third-party write commands: scan the run's session transcripts for any occurrence and confirm none happened.
+6. **Placeholder scan** — restrict the scan to lines this run actually introduced; pre-existing placeholders in unchanged regions of touched files are out of scope. Required command (substitute `<base>` with the parent of the first commit in this run's commit list):
+   ```
+   git diff <base>..HEAD | grep -E '^\+[^+].*\b(TBD|TODO|FIXME|XXX|implement later|handle edge cases|similar to|placeholder)\b' || echo 'clean'
+   ```
+   Only newly-added lines (those starting with `+` and not part of the `+++` header) are inspected. If output is anything other than `clean`, the run MUST either remove the placeholders before finalising or record an explicit justification per occurrence in the final report.
+## Lead post-stage persistence (BLOCKING — runs after the Executor emits `### Stage Carry Evidence`)
+- Parse the executor's `### Stage Carry Evidence` JSON block. If absent or unparsable, end with status `contract-violated` and route to a follow-up `error-analysis`.
+- Write the JSON verbatim to `runs/<impl-task-key>/carry/stage-<N>.json`. Refuse to overwrite an existing file (one stage = one sidecar; re-runs are out of scope for this version).
+- Append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, and the SHA of HEAD. Use the okstra runtime's `consumers_mutex` helper (NOT a raw filesystem write) to honour the lock.
+- Quote both files' new contents (the sidecar JSON in full, the new consumers row by itself) in the final report's `Stage sidecar evidence` deliverable section.

package/runtime/prompts/profiles/_implementation-executor.md ADDED Viewed

@@ -0,0 +1,60 @@
+<!--
+Implementation profile — executor sidecar. Lead lazy-loads this file at the
+start of Phase 5 (executor dispatch). NOT included via {{INCLUDE:}} because
+lead context should NOT carry this body during Phase 1-4. Read once, retain
+until Phase 5 ends, then drop from active context for Phase 6/7.
+-->
+# Implementation profile — Executor sidecar
+> **When to read**: lead reads this file ONCE at the start of Phase 5 (after Stage Map parse, before issuing the Executor's first `Edit` / `Write`). The body governs ONLY the Executor role's behaviour. Verifier / report-writer behaviour lives in sibling sidecars.
+## Executor role binding (carried over from the thin core)
+- The `Executor` (bound in `implementation.md` thin core) is the **only worker permitted to use Edit / Write / state-mutating Bash commands** on project files. All other workers run read-only. When the executor provider is `codex` or `gemini`, the actual file mutation happens inside the executor CLI's own auto-edit mode (e.g. `codex exec --sandbox workspace-write`, gemini's equivalent) — not through Claude-side Edit/Write tools — but the safety rules in this sidecar still apply identically.
+- Worktree cwd handling — when the thin core's Task worktree block resolves status to `created` or `reused`, the Executor MUST run every Edit / Write / build / test / commit command with the worktree path as cwd. Treat it as `project_root` for the duration of this run. Do NOT mutate the caller's original checkout. Do NOT `cd` out of the worktree to reach files; if a file outside the worktree is needed, the dependency is a planning gap — record it in `Out-of-plan edits` and continue.
+  - **How to set cwd per Bash call**: the Claude Bash tool inherits its cwd from the lead session, which is NOT the worktree. To put cwd-sensitive toolchains (`cargo`, `npm`, `pnpm`, `bun`, `pytest`, `make`, `go`) into the worktree, prefix the command with `cd {{EXECUTOR_WORKTREE_PATH}} && ` inside the same Bash invocation — e.g. `cd {{EXECUTOR_WORKTREE_PATH}} && cargo test -p foo`. **Never wrap in `bash -lc "..."` or `bash -c "..."`** — the wrapper hides the leading `cd` token from Claude Code's permission auto-allow layer (causing prompts on every call) without any safety benefit. For tools that accept an explicit working-directory flag (`git -C <path>`, `cargo --manifest-path`, `pytest --rootdir`), prefer that form over the `cd && ` chain. Edit / Write / Read tool calls already use absolute paths and need no cwd handling. The codex / gemini executor CLI wrappers (`okstra-codex-exec.sh -C`, `okstra-gemini-exec.sh --include-directories`) already inject worktree cwd at the CLI layer, so this rule applies primarily to the Claude executor.
+- **Synced okstra state directory.** At provision time `okstra-ctl` may symlink `.project-docs/` from the repo's **main worktree** into the task worktree. This is NOT an independent copy — writes through it land in the main worktree. Inside this run the executor MUST confine okstra artifact writes to its own task scope (i.e. `.project-docs/okstra/tasks/<this-task-id>/...`). Other synced directories, if present due to local configuration, are not implicit okstra context; read them only when the brief explicitly cites them as source material.
+## Pre-implementation context exploration (executor before first edit)
+- **Mandatory TDD loop**: BEFORE the first `Edit` or `Write` call, the executor MUST apply a red-green-refactor loop for every code change in this run. This is required; skipping it is a `contract-violated` outcome. This governs HOW each step is executed (failing test first → minimal implementation → refactor); it does not override the approved plan's WHAT/file scope.
+  - Order of operations per plan step: (1) write/extend the test that captures the step's acceptance criterion and confirm it fails for the right reason, (2) commit the failing test (`test(<scope>): ...`), (3) implement the minimum change to make it pass, (4) commit the implementation (`feat|fix(<scope>): ...`), (5) refactor without changing behaviour and commit separately if any cleanup is made (`refactor(<scope>): ...`). The failing-then-passing transition between steps (2) and (4) is the `TDD evidence` required by the final report.
+  - Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
+  - When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
+- re-read the approved plan end-to-end and parse the `## 4.5 Stage Map`. Determine **start stage**:
+  - if `--stage <N>` is supplied, use N. Otherwise auto = the lowest stage number whose `depends-on` are all recorded as `status:done` in `runs/<plan-key>/consumers.jsonl` AND that itself has no `status:done` row. Multiple stages may match — two parallel `implementation` runs may pick different ones and proceed concurrently.
+  - load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(start_stage)` and inject them into the executor's working context as "runtime carry-in". For `depends-on (none)` stages, no sidecar load — task-brief only.
+  - extract the **start stage's** file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path. These — not the whole plan — are the authoritative scope for this run.
+- inspect the current state of every file the plan names; if any file has changed materially since the plan was written, stop and route to a new `implementation-planning` run instead of editing speculatively
+- "materially changed" means: the function, class, section, or behaviour the plan targets has been edited, renamed, moved, removed, or otherwise altered in a way that invalidates the plan's reasoning. Cosmetic edits (whitespace, comment-only changes, unrelated function modifications elsewhere in the same file) do NOT trigger a re-plan; cite the diff (`git log --oneline <plan-created-at>..HEAD -- <file>`) in the final report and proceed.
+- distinguish the two file-scope rules (they are not in conflict):
+  - **drift rule** (this section): if a file *named in the plan* has materially drifted, refuse to edit and route back to planning. This protects trust in the approved scope.
+  - **out-of-plan rule** (Allowed actions section below): if a step *requires touching a file NOT in the plan list*, that is permitted with `Out-of-plan edits` justification. This handles honest scope discovery during execution.
+- confirm the test/build commands referenced in the plan still exist and run from a clean state
+## Stage execution contract (this run owns exactly one stage of the plan)
+- **Sidecar evidence writer (BLOCKING).** When the start stage's Stage Validation `post` commands all succeed, the Executor MUST emit a JSON object matching the schema in `docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md` §3.2 and the lead MUST persist it to `runs/<impl-task-key>/carry/stage-<N>.json`. The file MUST NOT exist before the run starts (overwrite is refused — see `--force-stage` non-goal).
+- **Reverse link (BLOCKING).** Before the first Edit/Write, append a `status:"started"` row to `runs/<plan-task-key>/consumers.jsonl` (lock via the okstra runtime). On stage completion, append a `status:"done"` row with `carry_path` populated.
+- **One-PR-per-stage.** This run creates exactly one PR titled `Stage <N>: <stage title>`. The PR body MUST include:
+  - `## Stage` — number and title (from Stage Map row).
+  - `## Carry-In summary` — depends-on list + cited identifiers/SHAs from each loaded sidecar (omit when depends-on is empty).
+  - `## Next stage` — next stage number/title or `(last stage)`.
+  Stage PRs link back to each other in their bodies (`Previous: #<n>, Next: #<m>` lines) so a reviewer can navigate the chain.
+## Allowed actions during the run
+- **Edit / Write on approved project source files**: scope is bounded first by the shared Resource boundary, then by the approved plan's file list. Editing files outside the plan's list is permitted only when strictly needed to satisfy a step, and MUST be recorded in the final report's `Out-of-plan edits` block with rationale.
+- read-only inspection commands: `git status`, `git diff`, `git log`, `grep`, `rg`, `find`, `cat`, `ls`, file Read tools
+- build, lint, type-check, and test commands (`npm test`, `pytest`, `go build`, `cargo test`, `bash -n`, etc.)
+- **local git operations only**: `git add`, `git commit`. Prefer small commits keyed to plan steps.
+- **Commit message format (mandatory)**: every commit message MUST follow Conventional Commits — `<type>(<scope>): <subject>` for the first line, optional body separated by a blank line, optional footer. Constraints:
+  - `<type>` MUST be one of: `feat` / `fix` / `perf` / `revert` / `deps` / `docs` / `refactor` / `build` / `ci` / `chore` / `test`. When the repo is `release-please`-managed, this aligns the commit with a configured changelog section.
+  - `<scope>` SHOULD be the plan step identifier or the primary module touched (e.g. `feat(report-writer): ...`). Omit the parentheses only when no meaningful scope applies.
+  - `<subject>` MUST be ≤72 characters, imperative mood (`add`, `fix`, `remove` — not `added` / `adding`), no trailing period, no emoji, no AI attribution lines (no `Co-Authored-By: Claude ...`, no `Generated with Claude Code`).
+  - Body (when present) explains *why*, not *what*; wrap at ~100 chars.
+  - Do NOT append okstra artefact paths to the commit message — no `Plan: .project-docs/okstra/...`, no `Report: ...`, no `Run: ...`, no `Task: ...` footers, and no other reference to files under `.project-docs/okstra/`. Those paths belong in the final report's `Plan link & approval evidence` section, not in git history; they rot quickly and leak internal layout into the upstream changelog.
+  - Allowed footers are limited to standard Conventional Commits trailers (`BREAKING CHANGE: ...`, `Refs: <issue/ticket-id>`, `Closes #<n>`). When citing a ticket, use the ticket id only (e.g. `Refs: DEV-9423`) — never a filesystem path.
+  - One commit MUST correspond to one plan step (or one cohesive sub-step). Do NOT bundle unrelated steps into a single commit, and do NOT split a single step across commits unless the plan explicitly sequenced it that way.
+  - The exact message used for each commit MUST be reproduced verbatim in the final report's `Commit list` so reviewers can audit it without re-running `git log`.

package/runtime/prompts/profiles/_implementation-verifier.md ADDED Viewed

@@ -0,0 +1,76 @@
+<!--
+Implementation profile — verifier sidecar. Lead lazy-loads this file ONCE
+at Phase 5, BEFORE constructing the verifier worker dispatch prompts.
+-->
+# Implementation profile — Verifier sidecar
+> **When to read**: lead reads this file ONCE in Phase 5, between executor stage completion and the first verifier dispatch. Carries the two-tier command lookup, deny-list, discrepancy rule, and verifier-specific forbidden actions.
+## Verifier roles (resolved at run-prep time)
+- The verifier slots are `Claude verifier` and `Codex verifier`, plus `Gemini verifier` **only when `gemini` is in the resolved `--workers` roster**. Every verifier in the resolved roster is dispatched regardless of which provider holds the executor role; the executor's own provider is run *separately* as a verifier (a fresh CLI session with no shared context) so that no verdict is produced from the same session that wrote the diff. Verifiers MUST NOT call Edit, Write, or any Bash command that mutates files outside the run's artifact directories. If a verifier wants a fix, it records the recommendation in its worker result; it does not apply the fix itself.
+- Session isolation — not model-variant divergence — is the primary self-review safeguard: each verifier is a separate CLI invocation with its own context window, so reusing the same model variant for executor and same-provider verifier is acceptable. Different model variants (e.g. executor=opus / Claude verifier=sonnet) remain recommended when available.
+- Phase-specific model defaults override the shared defaults: `Claude verifier`=`sonnet`, `Codex verifier`=`gpt-5.5`, `Gemini verifier`=`auto` (only when present in the roster). The `Executor`'s model is taken from the provider-specific worker model corresponding to `--executor`: claude→`--claude-model` (default `sonnet`, override to `opus` recommended when this run's executor is claude), codex→`--codex-model` (default `gpt-5.5`), gemini→`--gemini-model` (default `auto`).
+- Verifiers read from the SAME working tree path the Executor used so they observe the exact diff the Executor produced. Verifiers remain strictly read-only there.
+## Verifier QA duties (independent re-run mandate)
+Every verifier acts as a QA gate, not just a diff reviewer. Trusting the executor's reported evidence is forbidden — verifiers MUST reproduce it themselves from the same worktree path the executor used.
+### Two-tier command lookup (NO auto-detection)
+Verifier obtains the QA command set from exactly two declared sources, in order — there is **no fallback to guessing tools from manifest files**.
+1. **Tier 1 — plan validation set (task-specific):** every command listed under the approved plan's `validation` block (pre / mid / post).
+2. **Tier 2 — project baseline (`project.json.qaCommands`):** the project's standing QA baseline declared in `<PROJECT_ROOT>/.project-docs/okstra/project.json` under the `qaCommands` key. Schema (each category is an array of `{ "label", "cmd", "language"? }` objects):
+   ```json
+   {
+     "qaCommands": {
+       "lint":      [{ "label": "cargo clippy", "cmd": "cargo clippy --all-targets -- -D warnings", "language": "rust" }],
+       "format":    [{ "label": "cargo fmt",    "cmd": "cargo fmt --check",                          "language": "rust" }],
+       "typecheck": [{ "label": "tsc",          "cmd": "pnpm exec tsc --noEmit",                     "language": "ts"   }],
+       "test":      [{ "label": "cargo test",   "cmd": "cargo test --workspace --locked",            "language": "rust" }]
+     }
+   }
+   ```
+   `language` is optional; when present, verifier MAY skip categories whose `language` is not represented in this run's diff (recorded as `qa-command skipped: <label> (language=<x> not in diff)`). Absent `language` means "always run".
+### Execution rule
+Tier 1 commands run verbatim first. Then every Tier 2 entry runs once. Each command runs in the worktree cwd, and is recorded in the worker result with its exact command line, exit code, and the tail of stdout/stderr. Substituting or paraphrasing a Tier 1 command is forbidden (see Verifier-specific forbidden actions below).
+### Missing-tier handling
+If a tier is empty or absent, verifier records the single line `qa-command not configured: <category>` per missing category (`lint` / `format` / `typecheck` / `test`) in the worker result and proceeds — silent omission is a contract violation. Verifier MUST NOT auto-detect or invent a command in this case; the user/operator must declare it in `project.json.qaCommands` or in the plan.
+### `cmd` field deny-list (Tier 2 validation)
+The runtime AND the verifier MUST reject any `cmd` containing tokens that imply mutation: `--fix`, `--write`, ` -w` (gofmt write), ` -u` (jest snapshot update), `--update-snapshots`, `--snapshot-update`, `--update-goldens`, `INSTA_UPDATE=` (with any value other than `no`), `cargo insta accept`, `npm install` (without `ci`), `cargo update`, `pip install -U`, `pnpm add`, `bun add`. Encountering a denied token aborts the verifier run with `contract-violated` and the operator is asked to re-declare the command in check-only form.
+### Discrepancy rule
+If the verifier's re-run result differs from what the executor reported (a passing test fails on re-run, a clean lint surfaces warnings, an exit code mismatches), the verifier MUST issue verdict `FAIL` with the divergence cited. `Claude lead` MUST NOT silently prefer the executor's evidence over a verifier's reproduced result during synthesis; if it overrides, it MUST cite a concrete reproduction-time reason (flaky-test commit-cited, environment delta documented) — handwaving is not allowed.
+### Read-only command log (per verifier)
+The worker result MUST contain a `Read-only command log` block listing every command executed during the verifier run with its exact invocation and exit code, in execution order. No mutating command may appear in this block. This log is copied into the final report's verifier result section verbatim.
+### Verifier evidence is independent of executor evidence
+The final report keeps both — executor's `Validation evidence` AND each verifier's `Read-only command log` — so reviewers can compare them line-by-line.
+## All-verifier-failure policy
+If every verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) ends with a non-result terminal status (`timeout`, `error`, `not-run`) — i.e. zero independent verdicts were produced — the run MUST end with status `blocked` and route to a follow-up `error-analysis` run. `Claude lead` MUST NOT substitute its own verdict in place of the missing verifier outputs; synthesis requires at least one independent verifier's verdict. If one or more verifiers fail but at least one returns a verdict, the run proceeds with the surviving verdict(s) and the final report MUST explicitly notate which verifiers were unavailable, with the captured error / timeout evidence per failed verifier.
+## Verifier-specific forbidden actions (any occurrence → terminal status `contract-violated`)
+- running lint / formatter auto-fix modes during a verifier's re-run — `eslint --fix`, `prettier --write`, `ruff check --fix`, `rustfmt` (writes by default; verifiers MUST use `cargo fmt --check` or `rustfmt --check`), `gofmt -w`, `black .` (use `black --check`), `isort .` (use `isort --check-only`), or any equivalent rewrite mode
+- updating snapshots / golden fixtures during verification — `jest -u` / `--updateSnapshot`, `pytest --snapshot-update`, `INSTA_UPDATE=*` (any value other than `no`), `cargo insta accept`, `--update-goldens`, or any equivalent "make the test agree with current output" flag
+- masking test failure with selection or shell tricks during re-run — `-k <expr>` / `--ignore` / `--deselect` to skip subsets, trailing `|| true`, `set +e` followed by a manually softened comparison, redirecting non-zero exit to success. The plan's listed test command MUST run in full
+- substituting the plan's validation commands — verifier MUST run the plan's pre/mid/post validation commands verbatim; replacing them with paraphrased or "equivalent" commands is forbidden. Adding supplementary check-only lint/type-check is allowed and is logged separately in the verifier's Read-only command log
+- mutating lockfiles or dependency manifests — `npm install <pkg>`, `npm install` (without lockfile freeze; use `npm ci`), `pnpm add`, `bun add`, `cargo add`, `cargo update`, `pip install -U`, or any dependency install that is not lockfile-frozen (`--locked` / `--frozen-lockfile` / `npm ci` / `pip install --require-hashes`)
+- git state mutations — `git add`, `git commit`, `git stash`, `git checkout -- <file>`, `git restore`, `git reset`, `git rebase`, `git merge`, branch creation/deletion, tag creation. Only read-only git queries (`git status`, `git diff`, `git log`, `git show`, `git rev-parse`, `git blame`) are permitted for verifiers
+- running integration / end-to-end tests that produce non-local side effects (DB writes against a non-local datastore, external API writes, docker compose against a non-isolated environment) unless that exact command is listed in the approved plan's validation set
+- redirecting tool caches or output to paths outside the worktree — e.g. setting `CARGO_TARGET_DIR`, `PYTEST_CACHE_DIR`, `NODE_OPTIONS=--require=<external>`, or any env var that causes the verifier's command to write outside the worktree's normal build artifact paths

package/runtime/prompts/profiles/error-analysis.md CHANGED Viewed

@@ -9,11 +9,7 @@
   - gemini — when added to the roster it joins the analyser set; omitted by default
 {{INCLUDE:_common-contract.md}}
 - Brief consumption (phase-specific addendum — shared rules live in `_common-contract.md` under "Brief handoff contract"):
-  - **Precondition check (BLOCKING — runs before any analysis)**: read the brief's frontmatter `reporter-confirmations:` field and inspect every `Open Questions` row prefixed `intent-check:` / `conversion-block:` for the `[CONFIRMED …]` marker.
-    - `reporter-confirmations: complete` → proceed normally.
-    - `reporter-confirmations: partial` → proceed; treat still-unmarked `intent-check:` / `conversion-block:` rows per the `skipped` branch below.
-    - `reporter-confirmations: skipped` (or `partial` with remainder) → do NOT silently infer the missing answers. Promote each unmarked `intent-check:` / `conversion-block:` row into this run's `## 5. Clarification Items` as `Kind=decision, Blocks=next-phase`, with the recommended answer drawn from the brief's matching `intent-inference` / `conversion-block:` text and clearly labelled `보고자 직접 확인 권장`. Then proceed with the root-cause analysis using the inference as a *hypothesis* only.
-    - `reporter-confirmations: pending` (or field missing) → ABORT analysis. Write only `## 0. Reporter Confirmation Required` summarising which rows are pending and stop. The final report carries `Blocks=next-phase`.
+  - Apply the shared reporter-confirmation precondition exactly as written. In this phase, unresolved `intent-check:` / `conversion-block:` rows use `Blocks=next-phase`; any unconfirmed inference may be used as a labelled hypothesis only.
   - the reporter's symptom description in `Source Material` is the ground truth for what to reproduce. Do not paraphrase it when stating the symptom in the report; quote it.
   - any `intent-inference` augmentation that re-characterises the symptom (e.g. classifying "가끔 안 됨" as "intermittent failure on a specific code path") is a **hypothesis**, not a confirmed symptom. If `[CONFIRMED …]` appears on the matching `intent-check:` row, treat the confirmation as the symptom; otherwise, follow the precondition's `skipped` branch above and keep the inference labelled as hypothesis in the root-cause analysis.
   - `conversion-block:` rows mean the brief could not map a reporter statement to project vocabulary; never attempt to invent the missing mapping in this phase — the precondition above already handled them.
@@ -31,9 +27,9 @@
 - Clarification request policy (phase-specific addenda — shared policy is in `_common-contract.md`):
   - if any blocking uncertainty remains at the time of writing the final report, populate `## 5. Clarification Items` in `final-report-template.md` (a single unified table; `Blocks=next-phase` for items the next run cannot start without)
   - prefer plain Korean over abbreviations (e.g. write "초당 평균 요청 수" instead of "QPS", "재현 절차" instead of "repro")
-  - every clarification row carries a `Recommended` answer + one-line rationale; rows that lack a recommendation are rejected as half-formed.
+  - every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell; rows that lack a recommendation are rejected as half-formed.
   - **Codebase-first ambiguity resolution (defect rule)**: any ambiguity about repro, file behavior, or symbol semantics that can be answered by `Read` / `Grep` / log inspection MUST be resolved that way and recorded with file:line (or log-line) evidence. Writing a clarification row for something the codebase or shipped logs already answer is a defect of this phase.
-  - **`evidence-checked:` cell required**: every clarification row carries an `evidence-checked: <path:line> | none` cell. `evidence-checked: <path:line>` means the codebase / log / reproducer was inspected and the row records what was found. `evidence-checked: none` is allowed ONLY when the row's nature is "only the reporter can answer this" (reporter-side data, business priority, environment they observed); the row body must state which one in one line. A row with `evidence-checked: none` that *could* have been answered by code or logs is a defect.
+  - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <reporter-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only the reporter can answer this" (reporter-side data, business priority, environment they observed). A row with `none` that *could* have been answered by code or logs is a defect.
 - Non-goals:
   - implementation details unless they are necessary to validate the cause
   - **source code edits, builds, migrations, or deployments** — this run produces evidence and cause analysis only; the fix belongs to a later `implementation-planning` run followed by an `implementation` run

package/runtime/prompts/profiles/implementation-planning.md CHANGED Viewed

@@ -9,11 +9,7 @@
   - gemini — when added to the roster it joins the analyser set; omitted by default
 {{INCLUDE:_common-contract.md}}
 - Brief consumption (phase-specific addendum — shared rules live in `_common-contract.md` under "Brief handoff contract"):
-  - **Precondition check (BLOCKING — runs before option drafting)**: read the brief's frontmatter `reporter-confirmations:` field and inspect every `Open Questions` row prefixed `intent-check:` / `conversion-block:` for the `[CONFIRMED …]` marker.
-    - `reporter-confirmations: complete` → proceed normally.
-    - `reporter-confirmations: partial` → proceed; treat still-unmarked `intent-check:` / `conversion-block:` rows per the `skipped` branch below.
-    - `reporter-confirmations: skipped` (or `partial` with remainder) → do NOT silently infer the missing answers. Promote each unmarked `intent-check:` / `conversion-block:` row into this run's `## 5. Clarification Items` as `Kind=decision, Blocks=approval`, with the recommended answer drawn from the brief's matching `intent-inference` / `conversion-block:` text and clearly labelled `보고자 직접 확인 권장`. Then proceed; the operator cannot toggle `User Approval Request` until those rows are resolved.
-    - `reporter-confirmations: pending` (or field missing) → ABORT planning. Write only `## 0. Reporter Confirmation Required` summarising which rows are pending and stop. The final report carries `Blocks=approval`.
+  - Apply the shared reporter-confirmation precondition exactly as written. In this phase, unresolved `intent-check:` / `conversion-block:` rows use `Blocks=approval`; the operator cannot toggle `User Approval Request` until those rows are resolved.
   - never plan around an unconfirmed `intent-inference` augmentation as if it were a settled requirement. After the precondition runs, a `[CONFIRMED …]` marker on the matching `intent-check:` row is the signal that the inference can be treated as settled; otherwise it remains a `Blocks=approval` clarification item per the precondition's `skipped` branch.
   - `conversion-block:` rows are handled by the precondition; planning around an untranslated reporter phrase is forbidden until it is resolved.
 - Pre-planning context exploration (mandatory before option drafting):
@@ -22,7 +18,7 @@
   - skim recent commits touching those files (`git log -- <path>`) to surface in-flight work or contested areas
   - **codebase-first ambiguity resolution**: any ambiguity that can be answered by `Read` / `Grep` MUST be resolved that way and recorded with file:line evidence. Only ambiguities that genuinely require a human decision are escalated as `Clarification Items` rows. Writing a clarification row for something the code already answers is a defect of this phase.
   - flag any requirement that is ambiguous, contradictory, or missing success criteria — register each one as a row in the report's `## 5. Clarification Items` table with `Blocks=approval` instead of guessing
-  - read in priority order — (authoritative) `<PROJECT_ROOT>/.project-docs/okstra/glossary.md` and `<PROJECT_ROOT>/.project-docs/okstra/decisions/` titles if present; (supplementary) `<PROJECT_ROOT>/CONTEXT.md` (or `CONTEXT-MAP.md` → per-context `CONTEXT.md`) and `<PROJECT_ROOT>/docs/adr/` titles if present. Absent external files are the normal state — do not error. Treat the brief's `terminology:*` resolutions from `requirements-discovery` (if any) as authoritative; if missing, resolve any remaining fuzzy term as a `Blocks=approval` clarification row.
+  - read `<PROJECT_ROOT>/.project-docs/okstra/glossary.md` and `<PROJECT_ROOT>/.project-docs/okstra/decisions/` titles if present. Absent okstra memory files are the normal state — do not error. Treat the brief's `terminology:*` resolutions from `requirements-discovery` (if any) as authoritative; if missing, resolve any remaining fuzzy term as a `Blocks=approval` clarification row.
 - Primary focus areas:
   - requirement gaps
   - affected components and boundaries
@@ -30,7 +26,7 @@
   - implementation options and trade-offs
   - hidden dependency or migration risk
   - validation and rollout approach
-- Design principles applied when scoring options (borrowed from `brainstorming`):
+- Design principles applied when scoring options:
   - **Isolation & single responsibility**: each unit touched should have one clear purpose, well-defined interface, and be independently testable. Penalize options that widen a unit's responsibility.
   - **Files that change together live together**: split by responsibility, not by technical layer. Penalize options that scatter one logical change across unrelated layers.
   - **Follow established patterns**: in existing codebases, conform to current conventions. Targeted cleanup of a file you are already modifying is acceptable; unrelated refactors are not.
@@ -40,19 +36,19 @@
   - dependency and risk visibility
   - recommended execution order
 - Approval gate (phase-specific addendum to shared authority rule):
-  - The `User Approval Request (사용자 승인 게이트)` block at the **top of the final report** (immediately under the metadata header, before `Section 0`) is the only authorised approval gate. The user clears it either by (a) editing the single checkbox `- [ ] Approved` to `- [x] Approved` directly, or (b) invoking the next phase with `--approve` so the CLI invocation itself is treated as the approval signal and the runtime flips the checkbox on the user's behalf.
+  - The YAML frontmatter `approved: true|false` field is the only authorised approval gate. report-writer always emits `approved: false`. The user clears it either by (a) editing the frontmatter line to `approved: true` directly, or (b) invoking the next phase with `--approve` so the CLI flips the frontmatter on the user's behalf. `okstra_ctl.run._validate_approved_plan` reads this field and refuses entry until it is `true`.
 - Non-goals:
   - code-level micro-optimization unless it changes the implementation approach
   - **source code edits of any kind** — this run produces a plan document only; Edit/Write on project source files is forbidden until the plan is approved and a separate `implementation` run starts
   - executing builds, migrations, deployments, or any command that mutates project state outside the run's own artifact directories (`reports/`, `prompts/`, `state/`, `manifests/`, `worker-results/`, `status/`, `sessions/`)
   - this run stays in `implementation-planning` regardless of user phrasing — the shared anti-escalation rule applies
   - dispatching parallel sub-agents beyond the required worker roster — okstra owns worker fan-out
-  - writing artifacts to `docs/superpowers/specs/` or `docs/superpowers/plans/` — the run's `reports/` directory is the canonical location
+  - writing artifacts anywhere except `<PROJECT_ROOT>/.project-docs/okstra/` — the run's `reports/` directory is the canonical location for this phase
 - Clarification request policy (phase-specific addenda — shared policy is in `_common-contract.md`):
-  - every clarification row carries a `Recommended` answer + one-line rationale; rows that lack a recommendation are rejected as half-formed.
-  - **`evidence-checked:` cell required**: every clarification row carries an `evidence-checked: <path:line> | none` cell. `evidence-checked: <path:line>` means the codebase was inspected and the row records what was found. `evidence-checked: none` is allowed ONLY when the row's nature is "only a human can answer this" (reporter intent, business priority, organisational decision); the row body must state which one in one line. A row with `evidence-checked: none` that *could* have been answered by the codebase is a defect of this phase, restated from the pre-planning rule above.
+  - every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell; rows that lack a recommendation are rejected as half-formed.
+  - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <human-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only a human can answer this" (reporter intent, business priority, organisational decision). A row with `none` that *could* have been answered by the codebase is a defect of this phase, restated from the pre-planning rule above.
 - Section heading contract (BLOCKING — validator scans for these literal English substrings):
-  - The final report MUST include section headings containing each of the following exact strings: `Option Candidates`, `Trade-off`, `Recommended Option`, `Stepwise Execution Order`, `Dependency`, `Validation Checklist`, `Rollback`, `User Approval Request`.
+  - The final report MUST include section headings containing each of the following exact strings: `Option Candidates`, `Trade-off`, `Recommended Option`, `Stage Map`, `Stage Exit Contract`, `Stage Validation`, `Dependency`, `Validation Checklist`, `Rollback`. (Approval is no longer a body section — it is the YAML frontmatter `approved` field.)
   - Korean translations are allowed in parentheses (e.g. `### Recommended Option (권장 옵션)`), but the English keyword must be present verbatim in the heading line.
   - The shape and ordering follow `final-report-template.md` section 4.5 (`Implementation Plan Deliverables`). Do NOT translate the heading keywords — `validators/validate-run.py` does substring matching on the raw report text and 7-of-8 missing strings is a real, repeatedly observed failure mode (root cause: writer translated the headings to Korean).
 - Required deliverable shape (final report, in addition to the standard sections):
@@ -62,24 +58,28 @@
     - estimated blast radius (units, configs, deployment manifests, data migrations)
   - trade-off matrix across options (rows = options, columns at minimum: complexity, risk, reversibility, test coverage cost, rollout cost)
   - recommended option with rationale tied to the design principles above
-  - **stepwise execution order for the recommended option**, written as bite-sized tasks:
-    - each step is one action completable in roughly 2–5 minutes (e.g. "write the failing test for X", "run it to confirm it fails", "implement minimal code", "run test to confirm pass", "commit")
-    - every step names exact file paths and exact commands; for code steps, include the actual code or the diff sketch — not a description
-    - prefer TDD ordering (failing test → implementation → green → commit) when the touched area has or can have tests
+  - **Stage Map (mandatory — always emitted, even when N=1):** a table of all stages with `stage | title | depends-on | step-count | exit-contract-summary`. `depends-on` is `(none)` or a comma-separated stage number list. Stages with `depends-on (none)` can be implemented in parallel by two simultaneous `implementation` runs.
+  - **Per-stage subsections** (`## 4.5.<i> Stage <i>: <title>` for each `i`), each containing the four required subsections:
+    - `### Carry-In` — for `depends-on (none)`: task-brief only. Otherwise: each depended-on stage's static exit contract + runtime sidecar path `runs/<impl-key>/carry/stage-<i>.json` placeholder.
+    - `### Stepwise Execution Order` — bite-sized table with `step | action | files | command | expected`. **Effective row count ≤ 6** (excluding header / divider / blank). Each step is one action completable in 2–5 minutes; for code steps include actual code or diff sketch; prefer TDD ordering (failing test → implementation → green → commit).
+    - `### Stage Exit Contract` — predicted added/modified files, newly exposed identifiers/types/endpoints, downstream-usable resources.
+    - `### Stage Validation` — pre / mid / post exact commands or observable outcomes for this stage only.
+  - **Parallelisation-first rule (1st-class):** the writer MUST prefer the partition that maximises the number of `depends-on (none)` stages. Given two partitions with equal total step count, the one with fewer `depends-on` edges wins. Conservative `let's serialise to be safe` groupings are forbidden — each `depends-on` link is justified by a concrete data/contract dependency, not a vague risk concern.
+  - **Stage exit contract is the carry surface:** keep it as narrow as possible. Wider surface = more downstream coupling.
   - dependency / migration risk assessment (ordering constraints, data backfills, feature-flag prerequisites, repo-internal sequencing)
   - validation checklist (pre / mid / post) — each item is an exact command or observable outcome
   - rollback strategy — exact revert path (commits, flags, migrations) and the signal that triggers rollback
-  - explicit `User Approval Request (사용자 승인 게이트)` block placed at the **top of the report** with a single canonical checkbox marker `- [ ] Approved` (user toggles to `- [x] Approved` to authorise the next `implementation` run). The `User Approval Request` validator key-substring is satisfied by this top block — do NOT recreate a `### 4.5.8 User Approval Request` body stub (the validator now fails reports that contain one, see `validators/validate-run.py` and `templates/reports/final-report.template.md` §4.5.8).
-  - **the marker line is rendered only when the plan-body verification gate (§4.5.9) returns `passed` or `passed-with-dissent`.** When the gate returns `blocked-by-disagreement` or `aborted-non-result`, the top-of-report Approval block is rendered **without** the canonical `- [ ] Approved` bullet (the rest of the block — title, summary, audit lines — stays). The `validators/validate-run.py` `validate_phase_boundary` function enforces this exact correspondence between gate result and marker line presence.
+  - the YAML frontmatter MUST include the line `approved: false` (report-writer always emits the unflipped value). The user authorises the next `implementation` run by flipping it to `approved: true` (manual edit or `--approve` CLI). Do NOT recreate any `User Approval Request` body block — the validator fails reports that contain one (see `validators/validate-run.py` deprecated patterns).
+  - **the frontmatter `approved: false` line is rendered unconditionally; if the plan-body verification gate (§4.5.9) returns `blocked-by-disagreement` or `aborted-non-result`, the writer MUST keep `approved: false` and the validator refuses any report that ships with `approved: true` under such a gate result.**
   - every ambiguity flagged during pre-planning that the user must resolve before approval registered as a `Blocks=approval` row in the `## 5. Clarification Items` table (do NOT create a separate `Open Questions` block under `4.5.x` — the unified table is the single home)
   - **§4.5.9 Plan Body Verification (BLOCKING).** After report-writer finishes the draft, the lead MUST run a worker peer-review round on the consolidated plan body (sections 4.5.1 – 4.5.7) and populate `### 4.5.9 Plan Body Verification` in the final report. The round protocol, plan-item ID scheme (`P-Opt-*` / `P-Step-*` / `P-Dep-*` / `P-Val-*` / `P-Rb-*`), verdict semantics, gate-result classification, and dissent log format are defined in `skills/okstra-convergence/SKILL.md` "Plan-body verification mode". The four gate-result values are `passed`, `passed-with-dissent`, `blocked-by-disagreement`, `aborted-non-result`. When the gate would have been `blocked-by-disagreement` or `aborted-non-result`, the lead MUST NOT silently flip it to one of the passing values to "unblock" the run — that is a contract violation.
-  - **ADR evaluation (grill-with-docs adopted, sole owner)**: this phase is the **single owner** of ADR evaluation in the okstra lifecycle. The brief never evaluates or drafts ADRs — it only forwards `adr-candidate:*` signals. Every `adr-candidate:*` entry inherited from the brief's `Open Questions` is a mandatory evaluation target. In addition, evaluate every decision the recommended option introduces against the three ADR criteria:
+  - **Decision-record evaluation (sole owner)**: this phase is the **single owner** of decision-record evaluation in the okstra lifecycle. The brief never evaluates or drafts decision records — it only forwards `adr-candidate:*` signals. Every `adr-candidate:*` entry inherited from the brief's `Open Questions` is a mandatory evaluation target. In addition, evaluate every decision the recommended option introduces against the three criteria:
     1. **Hard to reverse** — would changing the decision later cost meaningfully more than deciding now?
     2. **Surprising without context** — would a future reader, seeing only the code, wonder "why was it built this way?"?
     3. **Real trade-off** — were there named alternatives, and was one picked for specific reasons?
     If **all three** hold, attach a decision draft as a report appendix section titled `Decision Drafts` (one decision per subsection). Each draft uses the `## Context / ## Decision / ## Consequences / ## Alternatives Considered` shape, names the alternatives that were rejected and why, and starts with `## Status: Proposed`. The next decision number is `(max existing in <PROJECT_ROOT>/.project-docs/okstra/decisions/ + 1)` zero-padded to 4 digits. If any of the three criteria is missing, do NOT raise a decision draft — instead record `skipped adr-candidate: <topic> — reason: <criterion that failed>` on one line under `Decision Drafts` so the next reader knows the candidate was evaluated and intentionally dropped.
-    The drafts are NOT written by this phase. The approved plan's stepwise execution order MUST include the step `Create <PROJECT_ROOT>/.project-docs/okstra/decisions/<NNNN>-<slug>.md from the decision draft in section X` so the `implementation` run commits the file. External `<PROJECT_ROOT>/docs/adr/` is never touched.
-  - **Domain-doc proposals**: if `CONTEXT.md` / `CONTEXT-MAP.md` needs a new term or edited definition, add the step `Update CONTEXT.md: <term> = <definition>` to the stepwise execution order. Do NOT edit the file in this phase.
+    The drafts are NOT written by this phase. The approved plan's stepwise execution order MUST include the step `Create <PROJECT_ROOT>/.project-docs/okstra/decisions/<NNNN>-<slug>.md from the decision draft in section X` so the `implementation` run commits the file inside okstra's subtree.
+  - **Glossary proposals**: if a term or definition should become okstra institutional memory, add the step `Update <PROJECT_ROOT>/.project-docs/okstra/glossary.md: <term> = <definition>` to the stepwise execution order. Use no other project-memory path.
 - No-placeholder rule (plan failures — reject any option or step that contains these):
   - "TBD", "TODO", "implement later", "fill in details", "add appropriate error handling", "handle edge cases", "write tests for the above" without actual test code
   - "similar to Option/Task N" without repeating the concrete content (readers may consume sections out of order)
@@ -92,3 +92,4 @@
   4. **Ambiguity check** — any requirement that could be read two ways must be made explicit or moved to the `## 5. Clarification Items` table as a `Blocks=approval` row.
   5. **Scope check** — if the recommended plan now spans multiple independent subsystems, recommend splitting into separate planning runs rather than shipping an oversized plan.
   6. **Plan-body verification reconciliation (BLOCKING for implementation-planning).** Inspect the `### 4.5.9 Plan Body Verification` verdict table. For every plan-item row classified as `majority-disagree → C-<N>`, the corresponding `C-<N>` row MUST exist in `## 5. Clarification Items` with `Kind` chosen per the standard policy and `Blocks=approval`. Do NOT create a parallel `### 4.5.x Open Questions` block — the unified table is the single home. Conversely, the `Classification` column's `C-<N>` reference and the `## 5. Clarification Items` `ID` column MUST match 1:1; an orphan on either side is a contract violation. For `partial-consensus` and `worker-unique` plan-items, the dissenting opinion lives in §4.5.9 `Dissent log` and is NOT promoted to §5.
+  7. **Stage Map self-check** — for every stage, count the effective rows of its `Stepwise Execution Order` table by hand; reject the draft if any stage exceeds 6. Walk the `depends-on` graph and confirm it is a DAG (no cycle, no self-reference). For each `depends-on` link, ask "can this be removed by re-partitioning?" — if yes, re-partition and re-count.