okstra 0.55.0 → 0.56.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/runtime/prompts/profiles/_implementation-verifier.md CHANGED

@@ -39,7 +39,28 @@ Verifier obtains the QA command set from exactly two declared sources, in order
 ### Execution rule
-Tier 1 commands run verbatim first. Then every Tier 2 entry runs once. Each command runs in the worktree cwd, and is recorded in the worker result with its exact command line, exit code, and the tail of stdout/stderr. Substituting or paraphrasing a Tier 1 command is forbidden (see Verifier-specific forbidden actions below).
+Tier 1 commands run verbatim first. Then every Tier 2 entry runs once. Then the Tier 3 stage conformance script (below) runs once. Each command runs in the worktree cwd, and is recorded in the worker result with its exact command line, exit code, and the tail of stdout/stderr. Substituting or paraphrasing a Tier 1 command is forbidden (see Verifier-specific forbidden actions below).
+### Tier 3 — stage conformance scripts (requirement conformance verification)
+Tiers 1·2 prove the diff *builds and passes*; Tier 3 proves the stage actually *meets the upper-level requirement* it was scoped to, by running a declared conformance script against the running state. This is a real gate — its result sidecar is the input the `validate-run.py` Tier 3 gate reads, so a missing or non-PASS result BLOCKS acceptance.
+- **Source.** The conformance manifest is `<task_root>/qa/conformance-manifest.json` (the directory is the `TASK_QA_PATH` token). This run's stage conformance entry is the manifest `entries[]` item whose `stageKey` equals this run's stageKey — `<task-id>-stage-<N>`, where `<N>` is the injected Stage number. Find that one entry; ignore the others (other stages are run by their own implementation runs or by final-verification).
+- **Exemption / waiver → do NOT run.** If the entry carries an `exemption` (or a user `waiver`), the verifier does NOT execute the script. It records the fact and the reason (`exemption.reason` / `waiver.reason` + `waiver.acknowledgedBy`) in the Read-only command log AND writes the result sidecar reflecting the skip. An `exemption` passes the gate outright; a `waiver` passes but is conditional (conformance left unverified by explicit user acknowledgement). No script runs in either case.
+- **Otherwise run `runCommand` in the worktree cwd.** Execute the entry's `runCommand` verbatim from the worktree cwd. Inject env from `<PROJECT_ROOT>/.okstra/project.json`'s `qaEnv` (replica DB DSN / app base URL / env file — declared in Phase 4e). This is a **replica / test environment only** path — never run it against shared / staging / prod, identical to the DB real-execution gate principle above.
+- **Interpret the standard interface.** Parse the process exit code together with stdout: the `QA-RESULT: PASS|FAIL` marker line (if several appear, the last one wins) and the per-requirement `REQ <id>: PASS|FAIL: <reason>` lines. If no `QA-RESULT` marker is emitted, the overall result is `MISSING` — which the gate treats as BLOCKING (the script broke the contract).
+- **Write the result sidecar (BLOCKING deliverable).** Write `<task_root>/qa/result-<stageKey>.json` as:
+  ```json
+  {
+    "stageKey": "<task-id>-stage-<N>",
+    "overall": "PASS",
+    "ranAt": "<UTC ISO8601>",
+    "requirements": { "<id>": { "status": "PASS", "reason": "<from REQ line>" } }
+  }
+  ```
+  `overall` is exactly one of `PASS` / `FAIL` / `MISSING`. This file is the input to the `validate-run.py` Tier 3 gate — if it is absent the gate reports the stage as "never ran" and BLOCKS, so writing it is mandatory whenever the script runs (and on the exemption/waiver skip path, recording the skip outcome).
+- **Read-only command log.** Record the `runCommand` exact line + its exit code in the Read-only command log. Unlike Tiers 1·2, a conformance script MAY mutate the **replica datastore** (exercising integrated state is its whole purpose) — but only the `qaEnv` replica target, never a shared/staging/prod store. The `runCommand` itself is still subject to the same source/lockfile mutation deny-list as Tier 2 (`--fix`, `npm install` without `ci`, etc.); a denied token aborts with `contract-violated`.
+- **No manifest / no entry for this stage.** If the manifest file is absent, or it has no entry whose `stageKey` matches this run's stageKey, the verifier records `conformance: no manifest entry for <stageKey>` and proceeds (forcing the *declaration* of conformance entries is the job of planning Step 11 + the `validate-run.py` diff-surface cross-check, not the verifier).
 ### Missing-tier handling
@@ -55,7 +76,7 @@ If the verifier's re-run result differs from what the executor reported (a passi
 ### Read-only command log (per verifier)
-The worker result MUST contain a `Read-only command log` block listing every command executed during the verifier run with its exact invocation and exit code, in execution order. No mutating command may appear in this block. This log is copied into the final report's verifier result section verbatim.
+The worker result MUST contain a `Read-only command log` block listing every command executed during the verifier run with its exact invocation and exit code, in execution order — including the Tier 3 conformance `runCommand` (or the exemption/waiver skip note when no script ran). No source-mutating command may appear in this block; the only permitted mutation is a Tier 3 conformance script writing to its `qaEnv` replica datastore, which is logged like any other command. This log is copied into the final report's verifier result section verbatim.
 ### Verifier evidence is independent of executor evidence
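The two Tier 3 bullets in the hunk above ("Interpret the standard interface" and "Write the result sidecar") can be sketched as follows. This is a minimal illustration rather than the package's verifier: `build_sidecar` is a hypothetical helper, the regexes are restated from the interface description, and the sketch reads stdout only because the diff does not spell out how the exit code is combined with the marker.

```python
import json
import re
from datetime import datetime, timezone

# Marker formats restated from the standard interface described above.
QA_RESULT_RE = re.compile(r"^QA-RESULT:\s*(PASS|FAIL)\s*$", re.MULTILINE)
REQ_LINE_RE = re.compile(r"^REQ\s+(\S+):\s*(PASS|FAIL):\s*(.*)$", re.MULTILINE)


def build_sidecar(stage_key: str, stdout: str) -> dict:
    """Hypothetical helper: turn script stdout into the sidecar payload."""
    markers = QA_RESULT_RE.findall(stdout)
    # If several markers appear the last one wins; none at all means MISSING.
    overall = markers[-1] if markers else "MISSING"
    requirements = {
        rid: {"status": status, "reason": reason.strip()}
        for rid, status, reason in REQ_LINE_RE.findall(stdout)
    }
    return {
        "stageKey": stage_key,
        "overall": overall,
        "ranAt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "requirements": requirements,
    }


# Sample stdout with two markers; the stage key is made up for the example.
sample_stdout = "REQ R1: PASS: row found\nQA-RESULT: FAIL\nQA-RESULT: PASS\n"
sidecar = build_sidecar("demo-task-stage-1", sample_stdout)
print(json.dumps(sidecar, indent=2))
```

A real verifier would write this payload to `<task_root>/qa/result-<stageKey>.json`; the sketch only prints it.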

package/runtime/prompts/profiles/final-verification.md CHANGED

@@ -36,6 +36,7 @@
   - **Validation Evidence**: for every requirement in the originating plan or task brief, cite the artifact (commit SHA, test output, log line, MCP SELECT result) that demonstrates coverage. Paraphrased "verified" claims without an artifact are rejected.
   - **Read-only command log**: any pre-existing test/validation command executed during this run MUST be listed with its exact command line and exit code. No mutating commands may appear here.
   - **Two-tier command lookup (shared with `implementation`):** when this phase performs its own independent re-validation, the command source is exactly the same two tiers `implementation` verifiers use — Tier 1 is the originating task brief / approved plan's `validation` set, Tier 2 is `<PROJECT_ROOT>/.okstra/project.json` under `qaCommands`. Auto-detecting tools from manifest files is forbidden; missing tiers are recorded as `qa-command not configured: <category>` and do NOT trigger a guess. The `cmd` deny-list (`--fix`, `--write`, ` -w`, ` -u`, `--snapshot-update`, `INSTA_UPDATE=<not-no>`, `cargo update`, `npm install` without `ci`, etc.) is enforced identically. NOTE: runtime fail-fast validation (`okstra_ctl.qa_commands.validate_qa_commands`) only fires at `--task-type implementation` run-prep, so this phase MUST self-check each `qaCommands` entry against the deny-list before executing it — if a denied token is present, skip the command and record it as a `Read-only command log` line `qa-command rejected (denied token: <token>): <label>`.
+  - **Tier 3 — stage conformance scripts (whole-task union):** because this phase verifies the **integrated, merged** state, it re-runs conformance against that state rather than per-stage. Read the task-level manifest `<task_root>/qa/conformance-manifest.json` (the directory is the `TASK_QA_PATH` token) and, in **whole-task scope**, run the `runCommand` of **every** `entries[]` item against the merged worktree, refreshing each `<task_root>/qa/result-<stageKey>.json` (`{ "stageKey", "overall": "PASS"|"FAIL"|"MISSING", "ranAt", "requirements" }`). In **single-stage scope**, run only the entry whose `stageKey` matches the verified stage. An entry carrying an `exemption` or user `waiver` is NOT executed — record the skip and reason; a `waiver` becomes a `conditional-accept` condition surfaced in the section 7 Verdict (conformance left unverified by user acknowledgement). Each `runCommand` runs in the worktree cwd with `qaEnv` env (replica DB DSN / app base URL / env file) — **replica / test environment only**, never shared / staging / prod, and the same source/lockfile mutation deny-list applies (a conformance script MAY mutate only its `qaEnv` replica datastore). Interpret each result from the exit code + stdout `QA-RESULT: PASS|FAIL` (last wins) and `REQ <id>: PASS|FAIL: <reason>` lines; no `QA-RESULT` marker → `MISSING`. Any entry whose result is not `PASS` (including `MISSING` or a never-run/missing sidecar) is an **Acceptance Blocker** (`major`+) — exactly like the DB real-execution gate above, since `accepted` requires zero blockers the verdict becomes `conditional-accept` / `blocked`. This is the same gate the `validate-run.py` Tier 3 check enforces on the result sidecars.
   - **Routing recommendation**: the next safe phase — one of `release-handoff`, `done`, `error-analysis`, `implementation-planning` — tied to the verdict and blocker list. `release-handoff` is allowed ONLY when the Verdict Token is `accepted`. `release-handoff` is additionally allowed ONLY when the verification scope (the `Verification scope:` line of the injected `VERIFICATION_TARGET` block, recorded as the report's `verificationScope` field) is `whole-task`; a `single-stage` run is partial and routes to `implementation` / `done` even on an `accepted` verdict.
 - Clarification request policy (phase-specific addendum — shared policy is in `_common-contract.md`):
   - populate `## 1. Clarification Items` only when a blocker hinges on information only the user can supply (deployment intent, intended target environment, business-rule interpretation); use `Blocks=next-phase` for items that gate continuing to release-handoff
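The whole-task Tier 3 rule added in this hunk (exemptions and waivers are skipped, any other entry without a `PASS` sidecar is a blocker) can be sketched as below. `summarize_conformance` and the sample stage keys are hypothetical, and the sketch deliberately stops short of choosing a Verdict Token, since that also depends on the other blockers in the report.

```python
def summarize_conformance(entries: list, sidecars: dict) -> tuple:
    """Hypothetical helper: split manifest entries into blockers and waiver conditions."""
    blockers, conditions = [], []
    for entry in entries:
        key = entry["stageKey"]
        if entry.get("exemption"):
            continue  # exempt: passes outright, script is not executed
        if entry.get("waiver"):
            conditions.append(key)  # waived: conditional, script is not executed
            continue
        overall = sidecars.get(key, {}).get("overall")  # None = sidecar missing
        if overall != "PASS":
            blockers.append((key, overall or "never ran"))
    return blockers, conditions


entries = [
    {"stageKey": "demo-stage-1"},
    {"stageKey": "demo-stage-2", "waiver": {"acknowledgedBy": "user", "reason": "replica down"}},
    {"stageKey": "demo-stage-3", "exemption": {"reason": "doc-only"}},
    {"stageKey": "demo-stage-4"},
]
sidecars = {"demo-stage-1": {"overall": "PASS"}}
blockers, conditions = summarize_conformance(entries, sidecars)
print(blockers, conditions)
```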

package/runtime/prompts/profiles/implementation-planning.md CHANGED

@@ -71,6 +71,10 @@
   - **Per-stage subsections** (`## 5.5.<i> Stage <i>: <title>` for each `i`), each containing the four required subsections:
     - `### Carry-In` — for `depends-on (none)`: task-brief only. Otherwise: each depended-on stage's static exit contract + runtime sidecar path `runs/<impl-key>/carry/stage-<i>.json` placeholder.
     - `### Stepwise Execution Order` — bite-sized table with `step | action | files | command | expected`. **Effective row count ≤ 6** (excluding header / divider / blank). Each step is one action completable in 2–5 minutes; for code steps include actual code or diff sketch. **TDD ordering is MUST, not a preference:** the **first** effective step's `action` cell MUST start with the literal `RED:` and describe the failing test that captures this stage's `Acceptance` (`expected` = FAIL); at least one later `action` cell MUST start with the literal `GREEN:` and describe the minimal implementation that makes it pass (`expected` = PASS); an optional refactor step starts with `REFACTOR:`. **Exemption:** doc-only / config-only / pure-rename stages with no observable runtime behaviour may omit RED/GREEN by declaring one line `TDD exemption: <reason>` in the stage section (mirrors the executor's per-step exemption in `_implementation-executor.md`). Validator S10c enforces RED-first + GREEN, or the exemption line.
+    - **Per-stage conformance declaration (mandatory one line, in the stage section — same placement freedom as `TDD exemption:`):** the stage MUST carry exactly one of:
+      - `Conformance tests: stage-<N> — <task_root>/qa/stage-<N>.<ext> (requires=[db|io|http|external,...])` — a Tier3 verification script that proves this stage's upstream requirements (brief / requirements-discovery / error-analysis / improvement-discovery → this stage's `Acceptance`) hold against **real** DB rows, real endpoints, or the real external API — NOT mocks. When you emit this line you MUST also (a) write the script to `<task_root>/qa/stage-<N>.<ext>` and (b) add a matching entry to `<task_root>/qa/conformance-manifest.json` with fields `stageKey` (= `<task-id>-stage-<N>`), `script`, `runCommand`, `requirementIds`, `requires` (subset of `{db, io, http, external}`), `passContract`, `exemption: null`, `waiver: null`. The script's standard interface: a `main` that exits `0`=PASS / non-zero=FAIL, and whose stdout ends with `QA-RESULT: PASS|FAIL` followed by one `REQ <id>: PASS|FAIL: <reason>` line per requirement.
+      - `Conformance exemption: <reason>` — only for stages that touch no db/io/http/external surface, or where unit tests fully cover the increment. (If the eventual `implementation` diff actually touches one of those surfaces, `validate-run.py`'s diff-surface cross-check is BLOCKING — an exemption cannot hide a real db/io/http/external change.)
+      The manifest lives at the **task level** (`<task_root>/qa/`, path token `TASK_QA_PATH`) and is shared across planning → implementation → final-verification. This declaration is enforced at three layers: `validators/validate-implementation-plan-stages.py` check **S11** forces every stage to carry one of the two lines; the manifest JSON structure is enforced by `validate_conformance_manifest` (run / validate-run); and the result gate (each script's `QA-RESULT`) is enforced by the verifier Tier3 + validate-run.
     - `### Stage Exit Contract` — predicted added/modified files, newly exposed identifiers/types/endpoints, downstream-usable resources.
     - `### Stage Validation` — pre / mid / post exact commands or observable outcomes for this stage only.
   - **Vertical-slice-first partition rule (1st-class):** the grouping anchor is a **thin end-to-end vertical slice** — one stage delivers a single user-observable increment, crossing whatever layers are needed (data → service → API → UI) to make that one increment work. File/module proximity is demoted to the **intra-slice grouping rule**: within a slice, keep steps touching the same file/directory/module together so the diff, PR, and rollback unit stay cohesive. **Horizontal layer-splitting is forbidden** — never carve "the DB layer" into one stage and "the service layer" into the next; that produces stages that ship no standalone user value. A stage is split ONLY when (a) a real `depends-on` data/contract dependency exists, (b) effective steps would exceed 6, or (c) it is a distinct vertical slice (a different user-value increment). Maximising the number of parallel stages is NOT a reason to split — parallelism is an emergent property of independent stages, never a partitioning goal.
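The script interface named in the per-stage conformance declaration above (a `main` whose exit status is `0` for PASS, and stdout carrying the `QA-RESULT` marker followed by one `REQ` line per requirement) might look like the skeleton below. Everything here is an assumption made for illustration: `check_requirements`, `render_report`, and the requirement id `R1` are hypothetical, and the placeholder check does not touch a real database or endpoint.

```python
def check_requirements() -> dict:
    """Placeholder checks (hypothetical). A real script would query the
    replica DB or call the local endpoint declared for the QA environment."""
    return {"R1": (True, "placeholder check passed")}


def render_report(results: dict) -> tuple:
    """Build the stdout lines: the QA-RESULT marker, then one REQ line each."""
    passed = all(ok for ok, _ in results.values())
    lines = [f"QA-RESULT: {'PASS' if passed else 'FAIL'}"]
    for rid, (ok, reason) in results.items():
        lines.append(f"REQ {rid}: {'PASS' if ok else 'FAIL'}: {reason}")
    return passed, lines


def main() -> int:
    passed, lines = render_report(check_requirements())
    print("\n".join(lines))
    return 0 if passed else 1  # 0 = PASS, non-zero = FAIL


exit_code = main()  # a real script would end with: raise SystemExit(main())
```

Keeping the report rendering separate from the checks makes the output contract easy to unit-test without a replica environment.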

package/runtime/python/okstra_ctl/conformance.py ADDED

@@ -0,0 +1,270 @@
+"""Stage conformance (Tier 3) manifest validation + `QA-RESULT` parser.
+The implementation/final-verification verifier runs a per-stage conformance script
+to verify conformance with upstream requirements. This module is the deterministic core of that validation and parsing.
+1. Validates the `conformance-manifest.json` structure (`validate_conformance_manifest`).
+2. Parses the `QA-RESULT` marker from script stdout (`parse_qa_result`).
+Running the script and enforcing the gate are handled by the verifier prompt and validators/validate-run.py.
+"""
+from __future__ import annotations
+import fnmatch
+import re
+from dataclasses import dataclass
+# Whitelist of capability tags to cross-check against the surfaces the diff touches.
+CAPABILITY_WHITELIST: tuple[str, ...] = ("db", "io", "http", "external")
+def _check_nonempty_str(value: object, path: str, errors: list[str]) -> bool:
+    if not isinstance(value, str) or not value.strip():
+        errors.append(f"{path} must be a non-empty string")
+        return False
+    return True
+def _check_capabilities(value: object, path: str, errors: list[str]) -> None:
+    if not isinstance(value, list):
+        errors.append(f"{path} must be an array")
+        return
+    for cap in value:
+        if cap not in CAPABILITY_WHITELIST:
+            errors.append(
+                f"{path}: unknown capability {cap!r} "
+                f"(allowed: {', '.join(CAPABILITY_WHITELIST)})"
+            )
+def _check_exemption(value: object, path: str, errors: list[str]) -> None:
+    if value is None:
+        return
+    if not isinstance(value, dict):
+        errors.append(f"{path} must be an object or null")
+        return
+    _check_nonempty_str(value.get("reason"), f"{path}.reason", errors)
+    _check_nonempty_str(value.get("declaredAt"), f"{path}.declaredAt", errors)
+def _check_waiver(value: object, path: str, errors: list[str]) -> None:
+    if value is None:
+        return
+    if not isinstance(value, dict):
+        errors.append(f"{path} must be an object or null")
+        return
+    _check_nonempty_str(value.get("acknowledgedBy"), f"{path}.acknowledgedBy", errors)
+    _check_nonempty_str(value.get("reason"), f"{path}.reason", errors)
+    _check_nonempty_str(value.get("at"), f"{path}.at", errors)
+    _check_capabilities(value.get("scope", []), f"{path}.scope", errors)
+def _check_entry(entry: object, idx: int, errors: list[str]) -> None:
+    path = f"entries[{idx}]"
+    if not isinstance(entry, dict):
+        errors.append(f"{path} must be an object")
+        return
+    _check_nonempty_str(entry.get("stageKey"), f"{path}.stageKey", errors)
+    _check_nonempty_str(entry.get("script"), f"{path}.script", errors)
+    _check_nonempty_str(entry.get("runCommand"), f"{path}.runCommand", errors)
+    _check_nonempty_str(entry.get("passContract"), f"{path}.passContract", errors)
+    req_ids = entry.get("requirementIds")
+    if (
+        not isinstance(req_ids, list)
+        or not req_ids
+        or not all(isinstance(r, str) and r.strip() for r in req_ids)
+    ):
+        errors.append(f"{path}.requirementIds must be a non-empty array of strings")
+    _check_capabilities(entry.get("requires", []), f"{path}.requires", errors)
+    _check_exemption(entry.get("exemption"), f"{path}.exemption", errors)
+    _check_waiver(entry.get("waiver"), f"{path}.waiver", errors)
+def validate_conformance_manifest(manifest: object) -> list[str]:
+    """Validate the whole conformance manifest. Returns a list of violation messages (empty means safe).
+    An absent manifest (None) is legal: a task may have no scripts, and gate
+    enforcement (the diff-surface cross-check) is decided by validators/validate-run.py.
+    """
+    if manifest is None:
+        return []
+    if not isinstance(manifest, dict):
+        return [f"conformance manifest must be an object, got {type(manifest).__name__}"]
+    entries = manifest.get("entries")
+    if not isinstance(entries, list):
+        return ["conformance manifest .entries must be an array"]
+    errors: list[str] = []
+    seen: set[str] = set()
+    for idx, entry in enumerate(entries):
+        _check_entry(entry, idx, errors)
+        key = entry.get("stageKey") if isinstance(entry, dict) else None
+        if isinstance(key, str) and key:
+            if key in seen:
+                errors.append(f"entries[{idx}].stageKey duplicate: {key!r}")
+            seen.add(key)
+    return errors
+_QA_RESULT_RE = re.compile(r"^QA-RESULT:\s*(PASS|FAIL)\s*$", re.MULTILINE)
+_REQ_LINE_RE = re.compile(r"^REQ\s+(\S+):\s*(PASS|FAIL):\s*(.*)$", re.MULTILINE)
+@dataclass
+class QaResult:
+    overall: str  # "PASS" | "FAIL" | "MISSING"
+    requirements: dict[str, dict[str, str]]  # id -> {"status": "PASS"|"FAIL", "reason": str}
+def parse_qa_result(stdout: str) -> QaResult:
+    """Parse the `QA-RESULT` marker and `REQ` lines from script stdout.
+    If there is no marker, overall='MISSING': the script broke the contract, so the gate
+    treats it as a failure. If several markers appear, the last one is used.
+    """
+    text = stdout or ""
+    markers = _QA_RESULT_RE.findall(text)
+    overall = markers[-1] if markers else "MISSING"
+    requirements: dict = {}
+    for rid, status, reason in _REQ_LINE_RE.findall(text):
+        requirements[rid] = {"status": status, "reason": reason.strip()}
+    return QaResult(overall=overall, requirements=requirements)
+@dataclass
+class ConformanceVerdict:
+    stage_key: str
+    status: str        # "PASS" | "BLOCKING" | "WAIVED" | "EXEMPT"
+    ok: bool           # whether the run may proceed (True for PASS/WAIVED/EXEMPT)
+    conditional: bool  # True only when WAIVED: conformance unverified (user-acknowledged)
+    message: str
+def decide_conformance_gate(entry: dict, result: object) -> ConformanceVerdict:
+    """Decide the gate for a single stage entry plus its run result (`QaResult | None`).
+    Priority: exemption → waiver → result evaluation. Never-ran/MISSING/FAIL is BLOCKING.
+    The shape of the exemption and waiver is already guaranteed by `validate_conformance_manifest`.
+    """
+    key = entry.get("stageKey", "<unknown>")
+    exemption = entry.get("exemption")
+    if exemption:
+        return ConformanceVerdict(
+            key, "EXEMPT", True, False,
+            f"conformance exempted: {exemption.get('reason', '')}",
+        )
+    waiver = entry.get("waiver")
+    if waiver:
+        return ConformanceVerdict(
+            key, "WAIVED", True, True,
+            f"conformance waived by {waiver.get('acknowledgedBy', '?')}: "
+            f"{waiver.get('reason', '')}",
+        )
+    overall = getattr(result, "overall", None)  # None when result is None → "never ran"
+    if overall == "PASS":
+        return ConformanceVerdict(key, "PASS", True, False, "conformance PASS")
+    if overall is None:
+        return ConformanceVerdict(
+            key, "BLOCKING", False, False,
+            "conformance script never ran (no result recorded)",
+        )
+    if overall == "MISSING":
+        return ConformanceVerdict(
+            key, "BLOCKING", False, False,
+            "conformance script ran but emitted no QA-RESULT marker",
+        )
+    return ConformanceVerdict(key, "BLOCKING", False, False, f"conformance {overall}")
+def qa_result_from_dict(data: object) -> QaResult:
+    """Restore a result sidecar (JSON dict) into a `QaResult`. Used when validate-run loads the
+    `result-stage-<N>.json` written by the Phase 3 verifier. If the shape is broken, it is
+    safely downgraded to overall='MISSING' (treated as BLOCKING)."""
+    if not isinstance(data, dict):
+        return QaResult(overall="MISSING", requirements={})
+    overall = data.get("overall")
+    if overall not in ("PASS", "FAIL", "MISSING"):
+        overall = "MISSING"
+    reqs = data.get("requirements")
+    return QaResult(overall=overall, requirements=reqs if isinstance(reqs, dict) else {})
+def evaluate_conformance(manifest: object, results_by_stage: object) -> list[ConformanceVerdict]:
+    """Return the list of gate verdicts for every entry in the manifest.
+    `results_by_stage`: stageKey -> `QaResult`. A missing key is treated as never ran (None).
+    Assumes the manifest structure was already validated with `validate_conformance_manifest` before the call.
+    """
+    entries = manifest.get("entries") if isinstance(manifest, dict) else None
+    if not isinstance(entries, list):
+        return []
+    results = results_by_stage if isinstance(results_by_stage, dict) else {}
+    verdicts: list[ConformanceVerdict] = []
+    for entry in entries:
+        if not isinstance(entry, dict):
+            continue
+        result = results.get(entry.get("stageKey"))
+        verdicts.append(decide_conformance_gate(entry, result))
+    return verdicts
+# Default path → capability surface mapping. Per-project overrides come from qaEnv.surfacePatterns
+# (Phase 4e). 'external' is hard to detect from paths, so it has no default pattern and relies on explicit declaration.
+_DEFAULT_SURFACE_PATTERNS: dict[str, tuple[str, ...]] = {
+    "db": ("*.sql", "*migration*", "*repository*", "*.entity.*", "*entities*", "*schema.prisma*"),
+    "http": ("*controller*", "*.routes.*", "*router*", "*endpoint*", "*.api.*"),
+    "io": ("*filesystem*", "*storage*", "*.fs.*"),
+}
+def detect_surfaces(file_paths: object, patterns: object = None) -> set[str]:
+    """Detect the set of capability surfaces from changed file paths (lowercased fnmatch).
+    Uses the default mapping when `patterns` is not given."""
+    table = patterns if isinstance(patterns, dict) else _DEFAULT_SURFACE_PATTERNS
+    found: set[str] = set()
+    for raw in file_paths or []:
+        if not isinstance(raw, str):
+            continue
+        path = raw.strip().lower()
+        for surface, globs in table.items():
+            if any(fnmatch.fnmatch(path, g) for g in globs):
+                found.add(surface)
+    return found
+def parse_qa_waiver_arg(arg: object) -> tuple[str, str] | None:
+    """Split a `--qa-waiver` value `<stageKey>:<reason>` into (stageKey, reason).
+    Returns None if the value is malformed or empty."""
+    if not isinstance(arg, str) or ":" not in arg:
+        return None
+    key, reason = arg.split(":", 1)
+    key, reason = key.strip(), reason.strip()
+    if not key or not reason:
+        return None
+    return key, reason
+def apply_qa_waiver(manifest: object, stage_key: str, reason: str, *, at: str,
+                    acknowledged_by: str = "user") -> bool:
+    """Fill in the `waiver` of the stage_key entry in the manifest (in place). Returns True if found.
+    User-acknowledged bypass (spec §7.2); reason is the user's instruction verbatim."""
+    entries = manifest.get("entries") if isinstance(manifest, dict) else None
+    if not isinstance(entries, list):
+        return False
+    for entry in entries:
+        if isinstance(entry, dict) and entry.get("stageKey") == stage_key:
+            entry["waiver"] = {"acknowledgedBy": acknowledged_by, "reason": reason,
+                               "scope": [], "at": at}
+            return True
+    return False
+def manifest_required_surfaces(manifest: object) -> set[str]:
+    """Union of `requires` across all manifest entries: the set of declared surfaces."""
+    entries = manifest.get("entries") if isinstance(manifest, dict) else None
+    if not isinstance(entries, list):
+        return set()
+    out: set[str] = set()
+    for entry in entries:
+        if isinstance(entry, dict) and isinstance(entry.get("requires"), list):
+            out.update(c for c in entry["requires"] if isinstance(c, str))
+    return out
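How the glob-based surface detection in `detect_surfaces` behaves on real paths is easier to see with a small standalone sketch. `surfaces_for` below restates the matching loop so it runs without the package; the sample paths and the narrower `*/server/*router*` override are hypothetical. The point it illustrates is that `fnmatch`'s `*` also spans `/`, so a pattern such as `*router*` is tested against the whole path and can match front-end files.

```python
import fnmatch

# A subset of the default patterns from the diff above.
broad = {"db": ("*.sql",), "http": ("*router*",), "io": ("*storage*",)}
# Hypothetical narrower override, in the spirit of qaEnv.surfacePatterns.
narrow = {"db": ("*.sql",), "http": ("*/server/*router*",)}


def surfaces_for(paths, table):
    """Restated matching loop: lowercase each path, test every glob."""
    found = set()
    for raw in paths:
        path = raw.strip().lower()
        for surface, globs in table.items():
            if any(fnmatch.fnmatch(path, g) for g in globs):
                found.add(surface)
    return found


changed = ["src/components/RouterLink.tsx", "src/hooks/useLocalStorage.ts"]
print(surfaces_for(changed, broad))   # both front-end files are flagged
print(surfaces_for(changed, narrow))  # the narrower globs flag neither
```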

package/runtime/python/okstra_ctl/paths.py CHANGED

@@ -117,6 +117,7 @@ def compute_run_paths(
     task_index = task_root / "task-index.md"
     instruction_set = task_root / "instruction-set"
     analysis_packet = instruction_set / "analysis-packet.md"
+    task_qa = task_root / "qa"
     runs_dir = task_root / "runs"
     history_dir = task_root / "history"
     timeline_file = history_dir / "timeline.json"
@@ -202,6 +203,7 @@ def compute_run_paths(
         "TASK_INDEX_PATH": str(task_index),
         "INSTRUCTION_SET_PATH": str(instruction_set),
         "ANALYSIS_PACKET_PATH": str(analysis_packet),
+        "TASK_QA_PATH": str(task_qa),
         "RUNS_DIR": str(runs_dir),
         "HISTORY_DIR": str(history_dir),
         "TIMELINE_PATH": str(timeline_file),

package/runtime/python/okstra_ctl/run.py CHANGED

@@ -276,6 +276,9 @@ class PrepareInputs:
     work_category: str = ""
     base_ref: str = ""
     approved_plan_path: str = ""
+    # implementation only: `--qa-waiver "<stageKey>:<reason>"` user-acknowledged bypass.
+    # At prepare time, fills entry.waiver in the task-level conformance manifest.
+    qa_waiver: str = ""
     stage: str = "auto"
     clarification_response_path: str = ""  # absolute or empty
     # release-handoff only: one-time override of the PR body template. If it is an empty string,
@@ -1092,6 +1095,28 @@ def _validate_prepare_inputs(project_root: Path, inp: PrepareInputs) -> list:
     return ctx_stage_map
+def _apply_qa_waiver_if_requested(inp: "PrepareInputs", project_root: Path) -> None:
+    """If `--qa-waiver` is given, fill in the waiver of the task-level manifest entry."""
+    if not inp.qa_waiver:
+        return
+    from .conformance import apply_qa_waiver, parse_qa_waiver_arg
+    from .paths import task_dir
+    parsed = parse_qa_waiver_arg(inp.qa_waiver)
+    if parsed is None:
+        raise PrepareError(
+            f'--qa-waiver must be "<stageKey>:<reason>", got {inp.qa_waiver!r}'
+        )
+    stage_key, reason = parsed
+    manifest_path = task_dir(project_root, inp.task_group, inp.task_id) / "qa" / "conformance-manifest.json"
+    if not manifest_path.is_file():
+        raise PrepareError(f"--qa-waiver: conformance manifest not found at {manifest_path}")
+    manifest = json.loads(manifest_path.read_text())
+    when = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+    if not apply_qa_waiver(manifest, stage_key, reason, at=when):
+        raise PrepareError(f"--qa-waiver: stageKey {stage_key!r} not in manifest {manifest_path}")
+    manifest_path.write_text(json.dumps(manifest, indent=2, ensure_ascii=False) + "\n")
 def _register_and_check_project(project_root: Path, inp: PrepareInputs) -> None:
     """project.json self-registration + qaCommands gate validation (implementation only)."""
     from okstra_project import ResolverError
@@ -1120,6 +1145,7 @@ def _register_and_check_project(project_root: Path, inp: PrepareInputs) -> None:
             qa_errors = validate_qa_commands(project_meta.get("qaCommands"))
             if qa_errors:
                 raise PrepareError(_format_qa_errors(qa_errors))
+        _apply_qa_waiver_if_requested(inp, project_root)
 def _resolve_roster(inp: PrepareInputs, profile_file: Path) -> tuple[list[str], str]:
@@ -1860,6 +1886,8 @@ def main(argv: list[str]) -> int:
     p.add_argument("--critic", default="")
     p.add_argument("--related-tasks", default="", dest="related_tasks_raw")
     p.add_argument("--approved-plan", default="", dest="approved_plan_path")
+    p.add_argument("--qa-waiver", default="", dest="qa_waiver",
+                   help='Stage conformance bypass: "<stageKey>:<reason>" (user-acknowledged; recorded in the manifest entry.waiver)')
     p.add_argument(
         "--stage", default="auto", dest="stage",
         help=(
@@ -1975,6 +2003,7 @@ def main(argv: list[str]) -> int:
         work_category=args.work_category,
         base_ref=args.base_ref,
         approved_plan_path=args.approved_plan_path,
+        qa_waiver=args.qa_waiver,
         stage=args.stage,
         clarification_response_path=clarification_abs,
         pr_template_path=args.pr_template_path,
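The effect of `--qa-waiver "<stageKey>:<reason>"` on the manifest can be sketched without the package. `split_waiver` restates the parsing rule from `parse_qa_waiver_arg` (split once, on the first colon), so a reason that itself contains colons survives intact. The stage key, the reason text, and the fixed timestamp are made up for the example; the real code stamps the current UTC time.

```python
import json


def split_waiver(arg: str):
    """Restated parsing rule: split once, on the first colon."""
    if ":" not in arg:
        return None
    key, reason = (part.strip() for part in arg.split(":", 1))
    return (key, reason) if key and reason else None


manifest = {"entries": [{"stageKey": "demo-stage-2", "waiver": None}]}
parsed = split_waiver("demo-stage-2:replica DB unreachable: VPN down")
if parsed is not None:
    stage_key, reason = parsed
    for entry in manifest["entries"]:
        if entry["stageKey"] == stage_key:
            # Same waiver shape as the diff above; "at" is a placeholder value.
            entry["waiver"] = {"acknowledgedBy": "user", "reason": reason,
                               "scope": [], "at": "2024-01-01T00:00:00Z"}
print(json.dumps(manifest, indent=2))
```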

package/runtime/skills/okstra-run/SKILL.md CHANGED

@@ -184,6 +184,18 @@ The python function underneath is mutex-protected (`~/.okstra/.locks/<task-key>.
 You can delete the literal state-file path after this point — its job is done. Invoke `rm` with the literal path (e.g. `rm /var/folders/.../okstra-wizard.AbCd.json`), not a shell variable.
+### Step 5.1 (implementation only): conformance waiver offer
+`render-bundle` accepts an optional `--qa-waiver "<stageKey>:<reason>"` flag (implementation only). It records a **user-acknowledged** waiver into the task-level conformance manifest entry (`entry.waiver`), letting the run proceed when a stage's Tier 3 conformance script genuinely cannot run (e.g. the replica DB is unreachable). The waiver records the user's reason **verbatim**.
+This is **never** a lead/worker self-exemption — only the user may waive. Offer it **only** when conformance BLOCKING is expected (the chosen stage declares a conformance entry whose script you cannot run in this environment). Surface it as a 3-option recommendation picker (per the run-prompt recommendation rule):
+1. (recommended) Run the conformance script — no waiver.
+2. Waive this stage — ask the user for the exact `<stageKey>` and reason, then pass `--qa-waiver "<stageKey>:<reason>"` to `render-bundle` (reason = the user's words, unedited).
+3. Direct input — the user types the full `<stageKey>:<reason>` value.
+When the user picks a waiver, append `--qa-waiver "<stageKey>:<reason>"` to the `render-bundle` invocation above. Omit the flag entirely otherwise (do **not** pass `--qa-waiver ""`). A malformed value or unknown `<stageKey>` aborts `render-bundle` with a `PrepareError`.
 ## Step 6: Take over as Claude lead
 Read `<INSTRUCTION_SET_PATH>/claude-execution-prompt.md` verbatim and enter `Claude lead` mode. The lead prompt now points to compact intake artifacts first (`active-run-context`, `analysis-profile.md`, and `analysis-packet.md`); full source files such as `analysis-material.md`, `reference-expectations.md`, and `final-report-template.md` are lazy/fallback inputs. Follow the rendered prompt order, do not preempt it.

package/runtime/skills/okstra-setup/SKILL.md CHANGED

@@ -181,6 +181,41 @@ The field is preserved across the runtime's auto-upserts of
 `updatedAt` are runtime-owned, so manual edits to `qaCommands`
 survive every subsequent `okstra setup` / `okstra run` invocation.
+### Step 4.6.1 (optional): `qaEnv` — Tier 3 conformance environment
+`implementation` / `final-verification` verifiers run **stage
+conformance scripts** (Tier 3) that may need to reach a database or an
+HTTP endpoint to prove the diff satisfies upstream requirements. Declare
+the environment those scripts are allowed to touch under `qaEnv`. Every
+field is optional; declare only what your conformance scripts use.
+```json
+"qaEnv": {
+  "replicaDbDsn": "<replica/test DB DSN — never shared/staging/prod>",
+  "appBaseUrl": "http://localhost:3000",
+  "envFile": ".okstra/qa.env",
+  "surfacePatterns": { "db": ["*.sql", "*repository*"], "http": ["*controller*"] }
+}
+```
+- `replicaDbDsn` — DSN the conformance script connects to. MUST be a
+  replica / disposable test DB, **never** a shared, staging, or
+  production database (conformance scripts may write).
+- `appBaseUrl` — base URL for endpoint-level conformance checks
+  (local app only).
+- `envFile` — path (under `.okstra/`) to an env file the verifier
+  sources before running conformance scripts.
+- `surfacePatterns` — per-project **override** of the diff-surface
+  cross-check map (`capability → glob list`). The validator maps each
+  changed file to a capability surface (`db` / `http` / `io`) and fails
+  the run when the diff touches a surface no stage `requires`. The
+  built-in patterns (e.g. `*router*` for `http`, `*storage*` for `io`)
+  are broad and match many front-end files, so front-end-heavy repos
+  should override with narrower globs to avoid false BLOCKING verdicts
+  (Phase 4b review note). An over-broad pattern over-blocks; an
+  over-narrow one lets an undeclared surface through — tune to the
+  repo's real db/http/io file naming.
 ## Step 4.7 (automatic): project-local Claude settings symlink
 `okstra setup` (and `okstra run` on its first invocation per project)
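The `surfacePatterns` cross-check described in Step 4.6.1 can be sketched roughly as follows. This is an illustration, not the validator's code: the function names, the use of `fnmatch.fnmatchcase` against the full relative path, and the sample file names are assumptions; the real validator may match differently (for example, against basenames only).

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set

# Pattern map taken from the qaEnv example above.
SURFACE_PATTERNS: Dict[str, List[str]] = {
    "db": ["*.sql", "*repository*"],
    "http": ["*controller*"],
}


def touched_surfaces(changed_files: Iterable[str],
                     patterns: Dict[str, List[str]]) -> Set[str]:
    """Return every capability surface whose globs match a changed file."""
    touched: Set[str] = set()
    for path in changed_files:
        for surface, globs in patterns.items():
            # Assumption: globs are tested against the whole relative path.
            if any(fnmatchcase(path, glob) for glob in globs):
                touched.add(surface)
    return touched


def undeclared_surfaces(changed_files: Iterable[str],
                        patterns: Dict[str, List[str]],
                        required: Iterable[str]) -> Set[str]:
    """Surfaces the diff touches that no stage lists under `requires`."""
    return touched_surfaces(changed_files, patterns) - set(required)


if __name__ == "__main__":
    diff = [
        "migrations/001_init.sql",      # hypothetical paths
        "src/api/user_controller.ts",
        "src/components/Button.tsx",
    ]
    print(undeclared_surfaces(diff, SURFACE_PATTERNS, required={"db"}))
```

In this sketch a non-empty result would correspond to a BLOCKING verdict. It also shows why glob breadth matters: a pattern like `*router*` would flag any front-end file with "router" in its path.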

package/runtime/validators/validate-implementation-plan-stages.py CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""S1–S10 checks for the Stage Map structure of an approved
+"""S1–S11 checks for the Stage Map structure of an approved
 implementation-planning final-report.md. Run from prepare_task_bundle
 of `implementation` task or standalone."""
@@ -40,7 +40,7 @@ class StageMeta:
 @dataclass
 class ValidationError:
-    code: str   # S1..S10
+    code: str   # S1..S11
     stage: int  # 0 = global
     message: str
@@ -168,6 +168,8 @@ def _check_each_stage_section(text: str, stages: List[StageMeta]) -> List[Valida
 SLICE_VALUE = re.compile(r"^\s*Slice value\s*:\s*(.+?)\s*$", re.M)
 ACCEPTANCE = re.compile(r"^\s*Acceptance\s*:\s*(.+?)\s*$", re.M)
 TDD_EXEMPTION = re.compile(r"^\s*TDD exemption\s*:\s*\S", re.M)
+CONFORMANCE_TESTS = re.compile(r"^\s*Conformance tests\s*:\s*\S", re.M)
+CONFORMANCE_EXEMPTION = re.compile(r"^\s*Conformance exemption\s*:\s*\S", re.M)
 def _check_slice_tdd(text: str, stages: List[StageMeta]) -> List[ValidationError]:
@@ -204,6 +206,28 @@ def _check_slice_tdd(text: str, stages: List[StageMeta]) -> List[ValidationError
     return errs
+def _check_conformance_declaration(
+    text: str, stages: List[StageMeta]
+) -> List[ValidationError]:
+    """S11: each stage declares conformance verification or explicitly exempts it.
+    S11: one of a `Conformance tests:` line (declares the Tier 3 verification
+          script) or a `Conformance exemption:` line (reason no test is needed) is required.
+    Blocks, at the planning boundary, the silent pass (DEV-9184) in which the diff
+    touched a db/io/http surface without any declaration.
+    """
+    errs: List[ValidationError] = []
+    for s in stages:
+        section = _slice_stage_section(text, s.stage_number)
+        if not (CONFORMANCE_TESTS.search(section) or CONFORMANCE_EXEMPTION.search(section)):
+            errs.append(ValidationError(
+                "S11", s.stage_number,
+                "S11: stage must declare 'Conformance tests:' (Tier 3 verification script) "
+                "or 'Conformance exemption:' (reason); see stage conformance QA design §12.2",
+            ))
+    return errs
 def _check_depends_on(stages: List[StageMeta]) -> List[ValidationError]:
     errs: List[ValidationError] = []
     valid = {s.stage_number for s in stages}
@@ -274,7 +298,7 @@ def _check_parallel_safety(text: str, stages: List[StageMeta]) -> List[Validatio
 def collect_validation_errors(text: str) -> List[ValidationError]:
-    """All S1–S10 checks against the report text; empty list means valid.
+    """All S1–S11 checks against the report text; empty list means valid.
     S1 (missing `## 5.5 Stage Map` heading) makes the rest unparseable, so it
     short-circuits. Shared by `main()` (CLI / implementation entry) and the
@@ -290,6 +314,7 @@ def collect_validation_errors(text: str) -> List[ValidationError]:
     if stages:
         errors.extend(_check_each_stage_section(text, stages))
         errors.extend(_check_slice_tdd(text, stages))
+        errors.extend(_check_conformance_declaration(text, stages))
         errors.extend(_check_depends_on(stages))
         errors.extend(_check_parallel_safety(text, stages))
     return errors
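To see how the two S11 patterns behave on a stage section, here is a small standalone sketch. The two regexes are copied from the diff above; `declares_conformance`, `STRICT_TESTS`, and the sample section text are hypothetical and not part of the package.

```python
import re

# Copied verbatim from the diff above.
CONFORMANCE_TESTS = re.compile(r"^\s*Conformance tests\s*:\s*\S", re.M)
CONFORMANCE_EXEMPTION = re.compile(r"^\s*Conformance exemption\s*:\s*\S", re.M)

# Hypothetical stricter variant (NOT in the package): only spaces and tabs
# are allowed around the colon, so the value must sit on the same line.
STRICT_TESTS = re.compile(r"^[ \t]*Conformance tests[ \t]*:[ \t]*\S", re.M)


def declares_conformance(section: str) -> bool:
    """Mirror the S11 condition: a tests line or an exemption line is present."""
    return bool(
        CONFORMANCE_TESTS.search(section)
        or CONFORMANCE_EXEMPTION.search(section)
    )


if __name__ == "__main__":
    declared = "Slice value: x\nConformance tests: qa/stage-1.sh\n"
    exempted = "Conformance exemption: docs-only change\n"
    missing = "Slice value: x\nAcceptance: y\n"
    # Empty value, but another non-empty line follows it.
    empty_then_text = "Conformance tests:\nAcceptance: ok"

    for name, text in [("declared", declared), ("exempted", exempted),
                       ("missing", missing), ("empty_then_text", empty_then_text)]:
        print(name, declares_conformance(text), bool(STRICT_TESTS.search(text)))
```

One property worth knowing: `\s` also matches newlines, so with the shipped patterns an empty `Conformance tests:` line followed by any non-empty line still counts as a declaration. The stricter variant is shown only to make that difference visible.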