npm - okstra - Versions diffs - 0.16.0 → 0.18.0 - Mend

okstra 0.16.0 → 0.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.kr.md +1 -1
package/README.md +1 -1
package/docs/kr/architecture.md +8 -8
package/package.json +1 -1
package/runtime/BUILD.json +2 -2
package/runtime/prompts/profiles/final-verification.md +1 -0
package/runtime/prompts/profiles/implementation.md +37 -1
package/runtime/python/okstra_ctl/__init__.py +8 -0
package/runtime/python/okstra_ctl/qa_commands.py +165 -0
package/runtime/python/okstra_ctl/run.py +17 -0
package/runtime/python/okstra_ctl/workers.py +3 -2
package/runtime/skills/okstra-setup/SKILL.md +48 -0

package/README.kr.md CHANGED

@@ -6,7 +6,7 @@
 ## 1. Purpose
-`okstra` is a **structured task-execution runner for cross-verifying work with a lead + worker model inside Claude Code**. The Claude lead drives phase progression and dispatches three independent analysis workers (**Claude · Codex · Gemini**) plus a report-writer dedicated to writing the final report.
+`okstra` is a **structured task-execution runner for cross-verifying work with a lead + worker model inside Claude Code**. The Claude lead drives phase progression and dispatches independent analysis workers (**Claude · Codex by default**; Gemini is added only when explicitly specified as an option) plus a report-writer dedicated to writing the final report.
 Three design principles:

package/README.md CHANGED

@@ -6,7 +6,7 @@
 ## 1. Purpose
-`okstra` is a **structured task-execution runner for Claude Code that cross-verifies work with a lead + worker model**. The Claude lead drives phase progression and dispatches three independent analysis workers — **Claude, Codex, Gemini** — plus a dedicated report-writer for the final synthesis.
+`okstra` is a **structured task-execution runner for Claude Code that cross-verifies work with a lead + worker model**. The Claude lead drives phase progression and dispatches independent analysis workers — **Claude and Codex by default** (with **Gemini** available as an opt-in extra) — plus a dedicated report-writer for the final synthesis.
 The design rests on three principles:

package/docs/kr/architecture.md CHANGED

@@ -17,7 +17,7 @@
 - **Run lifecycle**: Every execution stores a prompt snapshot, sessions/, expected-state, the final report template, the run manifest, and timeline events in a per-run directory (`runs/<timestamp>/`).
 - **Single python authority**: All prepare wiring (profile/workers/model resolution, path computation, the nine renders, central record_start) lives in the single function [`okstra_ctl.run.prepare_task_bundle()`](scripts/okstra_ctl/run.py). `okstra.sh` and the `okstra-run` skill are thin callers of that same function and do not pass state through environment variables; task identity, paths, and workflow state are all recomputed each time from the authoritative files on disk.
 - **Claude handoff (two modes)**: (a) the traditional mode, in which `okstra.sh` spawns a new `claude` process, and (b) the in-session mode, in which the `okstra-run` skill runs prepare inside the current claude session and then takes over the lead role directly. Both use the outputs of `prepare_task_bundle` (instruction-set and so on) as-is.
-- **Required team contract**: Enforces the required composition of `Claude lead` + `Claude worker` · `Codex worker` · `Gemini worker` · `Report writer worker`, and attempting Agent Teams first.
+- **Required team contract**: Enforces `Claude lead` + the default workers `Claude worker` · `Codex worker` · `Report writer worker`, and attempting Agent Teams first. `Gemini worker` is an optional worker, included only when named in `--workers` or in the profile's `- Workers:` section.
 - **User-home install + project-local task bundles**: The single command `npx okstra@latest install` installs both the runtime (`~/.okstra/{lib/python, bin, templates}`) and the skill markdown (`~/.claude/skills/<name>/SKILL.md`). In the target project, task bundles and discovery metadata are stored under `.project-docs/okstra/`, and **in addition `<PROJECT_ROOT>/.claude/settings.local.json` is provisioned as a symlink pointing to `~/.okstra/templates/settings.local.json`** (managed idempotently by `okstra setup` or `okstra-ctl` prepare; if a regular file was already there, it is backed up as `.bak.<timestamp>` and then replaced). Because this symlink is loaded automatically into the host Claude Code session and grants permission to call the codex/gemini worker wrappers, the user's global `~/.claude/settings.json` is left untouched and no separate `--settings` CLI injection is needed. (For development, `okstra-install.sh` offers a `--link` mode symlink install.)
 - **Resume and clarification**: `--task-key`, `--resume-clarification`, and `--clarification-response` support resuming the same task and answering the lead's follow-up questions.
 - **Optional integrations**: The worker error sidecar and token usage / cost accounting are provided as options.
@@ -223,7 +223,7 @@ per-process 환경 변수에 task 정체성·경로·workflow 상태를 보관
 Both modes produce the same outputs (task-manifest, run-manifest, timeline, instruction-set, central index registration), and the follow-up `okstra-ctl` commands (list / show / rerun / reconcile) behave consistently, unaware of any difference between the outputs.
 - The main Claude that receives the handoff acts as `Claude lead` and owns orchestration and final synthesis.
-- The required worker roles of the standard workflow are `Claude worker`, `Codex worker`, `Gemini worker`, and `Report writer worker`.
+- The default worker roles of the standard workflow are `Claude worker`, `Codex worker`, and `Report writer worker`; `Gemini worker` is an option included only when named in `--workers` or in the profile.
 - Claude reads the task bundle and performs the worker role assignment and the final judgment.
 - The okstra Claude assets installed in the user's home (`~/.claude/skills`, `~/.claude/agents`) steer Claude to try Agent Teams first and to use the sequential/background fallback only when a team cannot be formed.
@@ -242,11 +242,11 @@ Claude launch prompt 본문은 항상 `prompts/launch.template.md` 템플릿에
 The standard `okstra` workflow applies the team contract below uniformly across the runtime prompt, profile, manifest, and skill documents.
 - The main Claude is always `Claude lead` and operates synthesis-only.
-- The required worker roles are `Claude worker`, `Codex worker`, `Gemini worker`, and `Report writer worker`.
+- The default required worker roles are `Claude worker`, `Codex worker`, and `Report writer worker`. `Gemini worker` is an optional worker, included as required only when named in `--workers` or in the profile's `- Workers:` section.
 - `Report writer worker` focuses on structuring the report and organizing evidence, but the final synthesis owner is still `Claude lead`.
-- The default model contract is computed from the central defaults. The default fallbacks are `Claude lead`=`opus`, `Claude worker`=`sonnet`, `Codex worker`=`gpt-5.5`, `Gemini worker`=`auto`, and `Report writer worker` follows the `Claude lead` model unless separately overridden (that is, `opus` by default).
-- `Gemini worker` must always be attempted.
-- Before the final judgment, each required worker role needs a result or an explicit terminal status (`completed`, `timeout`, `error`, `not-run`).
+- The default model contract is computed from the central defaults. The default fallbacks are `Claude lead`=`opus`, `Claude worker`=`sonnet`, `Codex worker`=`gpt-5.5`, `Gemini worker`=`auto` (applied when opted in), and `Report writer worker` follows the `Claude lead` model unless separately overridden (that is, `opus` by default).
+- `Gemini worker` is optional, so it is attempted only in runs that explicitly include it.
+- Before the final judgment, each required role in the current run's worker roster needs a result or an explicit terminal status (`completed`, `timeout`, `error`, `not-run`).
 - Every attempted worker (`completed`, `timeout`, `error`) must have its assigned worker prompt history file under the current run's `prompts/`.
 - Unnamed generic parallel workers are not allowed as substitutes for a required role.
@@ -587,7 +587,7 @@ canonical metadata는 항상 `task-manifest.json`을 기준으로 확인합니
 9. Read `instruction-set/final-report-template.md`.
 10. Read the current `manifests/run-manifest-<task-type>-<seq>.json`.
 11. If needed, consult `history/timeline.json` and the results of previous runs.
-12. As `Claude lead`, set up roles based on the required workers `Claude worker`, `Codex worker`, `Gemini worker`, and `Report writer worker`.
+12. As `Claude lead`, set up roles according to the current run's worker roster (by default `Claude worker`, `Codex worker`, `Report writer worker`; `Gemini worker` only when explicitly included).
 13. Save each selected worker prompt under the current run's `prompts/` at its assigned worker prompt history path first, then dispatch the worker.
 14. Collect a result or terminal status for each required worker.
 15. Unless the brief enforces a more specific format, write the final Markdown report in the `final-report-template.md` structure.
@@ -850,7 +850,7 @@ Claude가 작성하는 최종 보고서는 brief에 더 구체적인 형식이
 - The resume helper for the current run session is generated at `runs/<task-type>/sessions/claude-resume-<task-type>-<seq>.sh`.
 - The run directory is organized into per-type subfolders such as `manifests/`, `state/`, `prompts/`, `reports/`, `status/`, `sessions/`, and `worker-results/`, and the prompt snapshot is prepared first under `prompts/`.
 - Claude creates the workers and collects their results.
-- The standard workflow uses `Claude lead` + the required workers `Claude worker`, `Codex worker`, `Gemini worker`, and `Report writer worker`.
+- The standard workflow uses `Claude lead` + the default workers `Claude worker`, `Codex worker`, and `Report writer worker`; `Gemini worker` is an option included only when explicitly named.
 - Worker models can be overridden with `--lead-model`, `--claude-model`, `--codex-model`, `--gemini-model`, and `--report-writer-model`, and the defaults are managed centrally through the `OKSTRA_DEFAULT_*` environment variables. The fallback defaults are `Claude lead`/`Report writer worker`=`opus`, `Claude worker`=`sonnet`, `Codex worker`=`gpt-5.5`, `Gemini worker`=`auto`.
 - With `--task-type implementation`, the provider that takes the Executor role is chosen with `--executor <claude|codex|gemini>` (or `OKSTRA_DEFAULT_EXECUTOR`, fallback `claude`). Only the Executor may mutate project files; the other two providers and the Executor's own provider are all dispatched as verifiers in separate CLI sessions (session separation alone preserves the self-review safeguard). The Executor's model reuses the worker model flag of the selected provider (`--claude-model` / `--codex-model` / `--gemini-model`) as-is, and the run-manifest's `teamContract.executor` block records provider / displayName / workerAgent / model.
 - Per-Executor worktree cwd injection: for the codex / gemini executors, the wrapper (`okstra-codex-exec.sh -C` / `okstra-gemini-exec.sh --include-directories`) pins cwd to the worktree at the CLI layer. The Claude executor has no per-call cwd argument on the Bash tool, so it prefixes cwd-sensitive toolchain calls (`cargo`/`npm`/`pnpm`/`bun`/`pytest`/`make`/`go`) with `cd {{EXECUTOR_WORKTREE_PATH}} && <cmd>` inside the same Bash invocation. Wrapping in `bash -lc`/`bash -c` is forbidden (it hides the leading `cd` token, so the permission auto-allow bypass fails), and when a working-directory flag exists (`git -C`, `cargo --manifest-path`, and so on) that flag takes precedence. For the detailed rules, see the *Executor Worktree* block of `prompts/profiles/implementation.md` and the Executor exception item of `agents/workers/claude-worker.md`.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "okstra",
-  "version": "0.16.0",
+  "version": "0.18.0",
   "description": "Multi-agent cross-verification orchestrator runtime + Claude Code skills.",
   "license": "MIT",
   "author": "devonshin",

package/runtime/BUILD.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "package": "0.16.0",
-  "builtAt": "2026-05-13T06:50:20.316Z",
+  "package": "0.18.0",
+  "builtAt": "2026-05-13T11:17:29.529Z",
   "repoRoot": "/home/runner/work/okstra/okstra"
 }

package/runtime/prompts/profiles/final-verification.md CHANGED

@@ -24,6 +24,7 @@
   - **Residual Risk block** (under section 4): risks that are not blockers but should be tracked, each with mitigation owner and a trigger that would escalate them to a blocker.
   - **Validation Evidence**: for every requirement in the originating plan or task brief, cite the artifact (commit SHA, test output, log line, MCP SELECT result) that demonstrates coverage. Paraphrased "verified" claims without an artifact are rejected.
   - **Read-only command log**: any pre-existing test/validation command executed during this run MUST be listed with its exact command line and exit code. No mutating commands may appear here.
+  - **Two-tier command lookup (shared with `implementation`):** when this phase performs its own independent re-validation, the command source is exactly the same two tiers `implementation` verifiers use — Tier 1 is the originating task brief / approved plan's `validation` set, Tier 2 is `<PROJECT_ROOT>/.project-docs/okstra/project.json` under `qaCommands`. Auto-detecting tools from manifest files is forbidden; missing tiers are recorded as `qa-command not configured: <category>` and do NOT trigger a guess. The `cmd` deny-list (`--fix`, `--write`, ` -w`, ` -u`, `--snapshot-update`, `INSTA_UPDATE=<not-no>`, `cargo update`, `npm install` without `ci`, etc.) is enforced identically. NOTE: runtime fail-fast validation (`okstra_ctl.qa_commands.validate_qa_commands`) only fires at `--task-type implementation` run-prep, so this phase MUST self-check each `qaCommands` entry against the deny-list before executing it — if a denied token is present, skip the command and record it as a `Read-only command log` line `qa-command rejected (denied token: <token>): <label>`.
   - **Routing recommendation**: brief note on the next safe phase (`done`, `error-analysis`, `implementation-planning`) tied to the verdict and blocker list.
 - Clarification request policy (phase-specific addendum — shared policy is in `_common-contract.md`):
   - populate section 5 only when a blocker hinges on information only the user can supply (deployment intent, intended target environment, business-rule interpretation)
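
The self-check that the added `Two-tier command lookup` line asks of this phase can be sketched roughly as follows. This is a minimal illustration, not the package's code: `DENIED`, `denied_tokens`, and `self_check` are hypothetical names, the deny-list is a small subset, and emitting one log line per denied token is an assumption (the profile only gives the line format).

```python
# Hypothetical subset of the deny-list, for illustration only.
DENIED = ("--fix", "--write", "-w", "-u", "--snapshot-update")


def denied_tokens(cmd):
    """Return the denied tokens that appear as whole whitespace-separated tokens."""
    parts = cmd.split()
    return [tok for tok in DENIED if tok in parts]


def self_check(entries):
    """Split qaCommands entries into runnable ones and rejection log lines."""
    runnable = []
    log_lines = []
    for entry in entries:
        hits = denied_tokens(entry["cmd"])
        if hits:
            # Assumption: one `Read-only command log` line per denied token.
            for tok in hits:
                log_lines.append(
                    f"qa-command rejected (denied token: {tok}): {entry['label']}"
                )
            continue  # skip the command instead of executing it
        runnable.append(entry)
    return runnable, log_lines


entries = [
    {"label": "cargo fmt", "cmd": "cargo fmt --check"},
    {"label": "eslint", "cmd": "eslint . --fix"},
]
runnable, log_lines = self_check(entries)
print([e["label"] for e in runnable])
print(log_lines)
```

Rejected entries are skipped rather than rewritten, which matches the profile's instruction to record the rejection and not to guess a check-only replacement.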

package/runtime/prompts/profiles/implementation.md CHANGED

@@ -17,6 +17,27 @@
 - Team contract (phase-specific overrides — `Claude worker` is replaced by `Executor` + verifier set in this phase):
   - **Executor role:** the `Executor` (bound above) is the **only worker permitted to use Edit / Write / state-mutating Bash commands** on project files. All other workers run read-only. When the executor provider is `codex` or `gemini`, the actual file mutation happens inside the executor CLI's own auto-edit mode (e.g. `codex exec --sandbox workspace-write`, gemini's equivalent) — not through Claude-side Edit/Write tools — but the safety rules in this profile still apply identically.
   - **Verifier roles:** the verifier slots are `Claude verifier` and `Codex verifier`, plus `Gemini verifier` **only when `gemini` is in the resolved `--workers` roster**. Every verifier in the resolved roster is dispatched regardless of which provider holds the executor role; the executor's own provider is run *separately* as a verifier (a fresh CLI session with no shared context) so that no verdict is produced from the same session that wrote the diff. Verifiers MUST NOT call Edit, Write, or any Bash command that mutates files outside the run's artifact directories. If a verifier wants a fix, it records the recommendation in its worker result; it does not apply the fix itself.
+  - **Verifier QA duties (independent re-run mandate):** every verifier acts as a QA gate, not just a diff reviewer. Trusting the executor's reported evidence is forbidden — verifiers MUST reproduce it themselves from the same worktree path the executor used.
+    - **Two-tier command lookup (NO auto-detection):** verifier obtains the QA command set from exactly two declared sources, in order — there is **no fallback to guessing tools from manifest files**.
+      1. **Tier 1 — plan validation set (task-specific):** every command listed under the approved plan's `validation` block (pre / mid / post).
+      2. **Tier 2 — project baseline (`project.json.qaCommands`):** the project's standing QA baseline declared in `<PROJECT_ROOT>/.project-docs/okstra/project.json` under the `qaCommands` key. Schema (each category is an array of `{ "label", "cmd", "language"? }` objects):
+         ```json
+         {
+           "qaCommands": {
+             "lint":      [{ "label": "cargo clippy", "cmd": "cargo clippy --all-targets -- -D warnings", "language": "rust" }],
+             "format":    [{ "label": "cargo fmt",    "cmd": "cargo fmt --check",                          "language": "rust" }],
+             "typecheck": [{ "label": "tsc",          "cmd": "pnpm exec tsc --noEmit",                     "language": "ts"   }],
+             "test":      [{ "label": "cargo test",   "cmd": "cargo test --workspace --locked",            "language": "rust" }]
+           }
+         }
+         ```
+         `language` is optional; when present, verifier MAY skip categories whose `language` is not represented in this run's diff (recorded as `qa-command skipped: <label> (language=<x> not in diff)`). Absent `language` means "always run".
+    - **Execution rule:** Tier 1 commands run verbatim first. Then every Tier 2 entry runs once. Each command runs in the worktree cwd, and is recorded in the worker result with its exact command line, exit code, and the tail of stdout/stderr. Substituting or paraphrasing a Tier 1 command is forbidden (see Forbidden actions).
+    - **Missing-tier handling:** if a tier is empty or absent, verifier records the single line `qa-command not configured: <category>` per missing category (`lint` / `format` / `typecheck` / `test`) in the worker result and proceeds — silent omission is a contract violation. Verifier MUST NOT auto-detect or invent a command in this case; the user/operator must declare it in `project.json.qaCommands` or in the plan.
+    - **`cmd` field deny-list (Tier 2 validation):** the runtime AND the verifier MUST reject any `cmd` containing tokens that imply mutation: `--fix`, `--write`, ` -w` (gofmt write), ` -u` (jest snapshot update), `--update-snapshots`, `--snapshot-update`, `--update-goldens`, `INSTA_UPDATE=` (with any value other than `no`), `cargo insta accept`, `npm install` (without `ci`), `cargo update`, `pip install -U`, `pnpm add`, `bun add`. Encountering a denied token aborts the verifier run with `contract-violated` and the operator is asked to re-declare the command in check-only form.
+    - **Discrepancy rule:** if the verifier's re-run result differs from what the executor reported (a passing test fails on re-run, a clean lint surfaces warnings, an exit code mismatches), the verifier MUST issue verdict `FAIL` with the divergence cited. `Claude lead` MUST NOT silently prefer the executor's evidence over a verifier's reproduced result during synthesis; if it overrides, it MUST cite a concrete reproduction-time reason (flaky-test commit-cited, environment delta documented) — handwaving is not allowed.
+    - **Read-only command log (per verifier):** the worker result MUST contain a `Read-only command log` block listing every command executed during the verifier run with its exact invocation and exit code, in execution order. No mutating command may appear in this block. This log is copied into the final report's verifier result section verbatim.
+    - **Verifier evidence is independent of executor evidence:** the final report keeps both — executor's `Validation evidence` AND each verifier's `Read-only command log` — so reviewers can compare them line-by-line.
   - Session isolation — not model-variant divergence — is the primary self-review safeguard: each verifier is a separate CLI invocation with its own context window, so reusing the same model variant for executor and same-provider verifier is acceptable. Different model variants (e.g. executor=opus / Claude verifier=sonnet) remain recommended when available.
   - Phase-specific model defaults override the shared defaults: `Claude verifier`=`sonnet`, `Codex verifier`=`gpt-5.5`, `Gemini verifier`=`auto` (only when present in the roster). The `Executor`'s model is taken from the provider-specific worker model corresponding to `--executor`: claude→`--claude-model` (default `sonnet`, override to `opus` recommended when this run's executor is claude), codex→`--codex-model` (default `gpt-5.5`), gemini→`--gemini-model` (default `auto`).
   - **All-verifier-failure policy**: if every verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) ends with a non-result terminal status (`timeout`, `error`, `not-run`) — i.e. zero independent verdicts were produced — the run MUST end with status `blocked` and route to a follow-up `error-analysis` run. `Claude lead` MUST NOT substitute its own verdict in place of the missing verifier outputs; synthesis requires at least one independent verifier's verdict. If one or more verifiers fail but at least one returns a verdict, the run proceeds with the surviving verdict(s) and the final report MUST explicitly notate which verifiers were unavailable, with the captured error / timeout evidence per failed verifier.
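
The two-tier lookup, missing-tier handling, and optional `language` skip described in the added verifier QA duties can be sketched as below. The function name `build_qa_queue` and its arguments are hypothetical; the profile states these rules in prose for the verifier to follow and does not show an implementation. The profile says a verifier MAY skip a category whose `language` is not in the diff; this sketch assumes it always does.

```python
from __future__ import annotations

# Categories named in the profile's missing-tier rule.
CATEGORIES = ("lint", "format", "typecheck", "test")


def build_qa_queue(plan_validation, qa_commands, diff_languages):
    """Return (commands, notes): command lines in run order, plus lines to record."""
    # Tier 1: the plan's validation commands run first and verbatim.
    commands = list(plan_validation)
    notes = []

    # Tier 2: the declared project baseline; nothing is auto-detected.
    declared = qa_commands or {}
    for category in CATEGORIES:
        entries = declared.get(category) or []
        if not entries:
            notes.append(f"qa-command not configured: {category}")
            continue
        for entry in entries:
            language = entry.get("language")
            if language is not None and language not in diff_languages:
                notes.append(
                    f"qa-command skipped: {entry['label']} "
                    f"(language={language} not in diff)"
                )
                continue
            commands.append(entry["cmd"])
    return commands, notes


qa = {
    "lint": [{"label": "cargo clippy",
              "cmd": "cargo clippy --all-targets -- -D warnings",
              "language": "rust"}],
    "typecheck": [{"label": "tsc", "cmd": "pnpm exec tsc --noEmit", "language": "ts"}],
}
commands, notes = build_qa_queue(["cargo test -p api"], qa, {"rust"})
print(commands)
print(notes)
```

A missing category produces a recorded line and nothing else, so the queue never contains a guessed command.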
@@ -79,6 +100,14 @@
   - dispatching parallel sub-agents beyond the required worker roster
   - silent scope expansion — adding files, dependencies, or features that the approved plan did not list, without recording an `Out-of-plan edits` justification
   - leaving placeholders such as TBD / TODO / "implement later" / "handle edge cases" in committed code
+  - **(verifier-specific)** running lint / formatter auto-fix modes during a verifier's re-run — `eslint --fix`, `prettier --write`, `ruff check --fix`, `rustfmt` (writes by default; verifiers MUST use `cargo fmt --check` or `rustfmt --check`), `gofmt -w`, `black .` (use `black --check`), `isort .` (use `isort --check-only`), or any equivalent rewrite mode
+  - **(verifier-specific)** updating snapshots / golden fixtures during verification — `jest -u` / `--updateSnapshot`, `pytest --snapshot-update`, `INSTA_UPDATE=*` (any value other than `no`), `cargo insta accept`, `--update-goldens`, or any equivalent "make the test agree with current output" flag
+  - **(verifier-specific)** masking test failure with selection or shell tricks during re-run — `-k <expr>` / `--ignore` / `--deselect` to skip subsets, trailing `|| true`, `set +e` followed by a manually softened comparison, redirecting non-zero exit to success. The plan's listed test command MUST run in full
+  - **(verifier-specific)** substituting the plan's validation commands — verifier MUST run the plan's pre/mid/post validation commands verbatim; replacing them with paraphrased or "equivalent" commands is forbidden. Adding supplementary check-only lint/type-check is allowed and is logged separately in the verifier's Read-only command log
+  - **(verifier-specific)** mutating lockfiles or dependency manifests — `npm install <pkg>`, `npm install` (without lockfile freeze; use `npm ci`), `pnpm add`, `bun add`, `cargo add`, `cargo update`, `pip install -U`, or any dependency install that is not lockfile-frozen (`--locked` / `--frozen-lockfile` / `npm ci` / `pip install --require-hashes`)
+  - **(verifier-specific)** git state mutations — `git add`, `git commit`, `git stash`, `git checkout -- <file>`, `git restore`, `git reset`, `git rebase`, `git merge`, branch creation/deletion, tag creation. Only read-only git queries (`git status`, `git diff`, `git log`, `git show`, `git rev-parse`, `git blame`) are permitted for verifiers
+  - **(verifier-specific)** running integration / end-to-end tests that produce non-local side effects (DB writes against a non-local datastore, external API writes, docker compose against a non-isolated environment) unless that exact command is listed in the approved plan's validation set
+  - **(verifier-specific)** redirecting tool caches or output to paths outside the worktree — e.g. setting `CARGO_TARGET_DIR`, `PYTEST_CACHE_DIR`, `NODE_OPTIONS=--require=<external>`, or any env var that causes the verifier's command to write outside the worktree's normal build artifact paths
 - Required deliverable shape (final report, in addition to the standard sections):
   - **Plan link & approval evidence**: path to the approved `final-report.md` and the exact quoted approval marker
   - **Commit list**: each commit's SHA (or short SHA), message, and the plan step it satisfies
@@ -86,7 +115,14 @@
   - **Out-of-plan edits block**: every file edited that was not in the approved plan's file list, with rationale (empty block is acceptable and preferred)
   - **Validation evidence**: actual command output (stdout/stderr) for every `pre / mid / post` validation command from the plan. Truncated output is acceptable but the command line and exit code MUST be exact. No paraphrasing of test results.
   - **TDD evidence (when applicable)**: for steps that should be TDD-ordered, show the failing-test output BEFORE the implementation commit and the passing-test output AFTER, with commit SHAs framing the transition.
-  - **Verifier results**: a section per verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) containing their independent verdict (PASS / CONCERNS / FAIL), their cited diff snippets, and any fix recommendations they declined to apply. `Claude lead` synthesises a unified verdict but MUST preserve dissent — do not collapse opinions into one paragraph.
+  - **Verifier results**: a section per verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) containing:
+    - their independent verdict (PASS / CONCERNS / FAIL),
+    - cited diff snippets supporting the verdict,
+    - the verifier's `Read-only command log` (every command they ran with exact invocation and exit code, in execution order — copied verbatim from the worker result),
+    - **independent validation re-run results** — per plan-validation command: command line, exit code, and tail of output captured by the verifier (not the executor); any divergence from the executor's reported result MUST be called out as a `Discrepancy` line citing both sides,
+    - **style / lint / type-check results** — each check-only tool the verifier ran, its exit code, and the count of new findings attributable to lines this run introduced. When no tool is configured for a touched language, record the single line `no lint/style tool configured for <language>`,
+    - any fix recommendations the verifier declined to apply.
+    `Claude lead` synthesises a unified verdict but MUST preserve dissent — do not collapse opinions into one paragraph. If any verifier issued `FAIL` on a `Discrepancy` line, the synthesised verdict MUST be `FAIL` unless lead cites a concrete reproduction-time reason (committed flaky-test record, documented environment delta) for overriding.
   - **Rollback verification**: confirmation that the plan's rollback path is still valid after the changes. Strength of verification depends on the change category:
     - **Pure code changes** (no persisted state, no infra mutation): a reachable revert SHA is sufficient. Record the exact `git revert <SHA>` command that would undo the change, and confirm `git rev-parse <SHA>` resolves.
     - **Feature-flag-gated changes**: confirm the off-switch path was exercised in this run's validation evidence (i.e. one of the validation commands ran with the flag off and succeeded). A plan that ships a flag without exercising the off-path does NOT satisfy this requirement.
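
The `Discrepancy` requirement in the Verifier results item above can be illustrated with a small comparison of executor-reported and verifier-reproduced exit codes. Everything here is a hypothetical sketch: `CommandResult`, `find_discrepancies`, and `verdict_is_forced_fail` are illustrative names, and comparing exit codes only is a simplification (the profile also counts differing lint output as a divergence).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandResult:
    cmd: str
    exit_code: int


def find_discrepancies(executor_results, verifier_results):
    """Return one Discrepancy line per command whose exit code differs on re-run."""
    reported = {r.cmd: r.exit_code for r in executor_results}
    lines = []
    for r in verifier_results:
        if r.cmd in reported and reported[r.cmd] != r.exit_code:
            lines.append(
                f"Discrepancy: {r.cmd} "
                f"(executor exit={reported[r.cmd]}, verifier exit={r.exit_code})"
            )
    return lines


def verdict_is_forced_fail(discrepancies, override_reason=None):
    """FAIL is forced unless the lead cites a concrete reproduction-time reason."""
    return bool(discrepancies) and not override_reason


executor = [CommandResult("cargo test --workspace --locked", 0)]
verifier = [CommandResult("cargo test --workspace --locked", 101)]
lines = find_discrepancies(executor, verifier)
print(lines)
print(verdict_is_forced_fail(lines))
```

Each line cites both sides, so a reviewer can trace the divergence back to the executor's `Validation evidence` and the verifier's `Read-only command log`.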

package/runtime/python/okstra_ctl/__init__.py CHANGED

@@ -28,6 +28,14 @@ from .project_meta import (
     load_project_meta,
     upsert_project_meta,
 )
+from .qa_commands import (
+    ALLOWED_CATEGORIES,
+    QaCommandsError,
+    find_denied_tokens,
+    format_errors,
+    validate_qa_cmd,
+    validate_qa_commands,
+)
 from .index import (
     _replace_or_append_active_row,
     _replace_or_append_project_row,

package/runtime/python/okstra_ctl/qa_commands.py ADDED

@@ -0,0 +1,165 @@
+"""Validation helpers for `project.json.qaCommands`.
+The verifier QA gate of the implementation phase runs the plan's `validation` set (Tier 1)
+together with `qaCommands` (Tier 2), the project-wide baseline. Tier 2 is declared
+by the user directly in `project.json`, and commands containing mutation-inducing tokens
+must be blocked up front so that the verifier does not break its read-only contract.
+This module has two responsibilities.
+1. Detect mutation-inducing tokens in a `cmd` string (`validate_qa_cmd`).
+2. Walk the whole `qaCommands` block and collect every violation (`validate_qa_commands`).
+This module is used only for runtime validation. The contract outside the runtime (the part the
+verifier self-enforces at actual execution time) is spelled out in the
+"Two-tier command lookup" paragraph of `prompts/profiles/implementation.md`.
+"""
+"""
+from __future__ import annotations
+import re
+from typing import Iterable
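+# Category whitelist. Unknown categories are likely typos, so they are rejected.
+ALLOWED_CATEGORIES: tuple[str, ...] = ("lint", "format", "typecheck", "test")
+# Tokens that cause mutation or update a lockfile. Each token is detected either in the result of
+# splitting the `cmd` string on whitespace or by a partial-match pattern (prefix/suffix sensitive).
+# Growing this one line at a time as new tools are added is the norm; no regex black magic.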
+_DENIED_LITERAL_TOKENS: tuple[str, ...] = (
+    "--fix",
+    "--write",
+    "-w",  # gofmt -w, prettier -w
+    "-u",  # jest -u
+    "--updateSnapshot",
+    "--update-snapshot",
+    "--snapshot-update",
+    "--update-goldens",
+    "--update-golden",
+)
+# Patterns that whitespace splitting cannot catch well (caught by a substring check).
+_DENIED_SUBSTRINGS: tuple[str, ...] = (
+    "cargo insta accept",
+    "cargo update",
+    "pip install -U",
+    "pip install --upgrade",
+    "pnpm add",
+    "bun add",
+    "cargo add",
+)
+def _has_npm_install_without_ci(cmd: str) -> bool:
+    """Reject `npm install` because it risks lockfile mutation; allow `npm ci`.
+    When substring matching catches `npm install`, it is always rejected unless the token
+    sequence that follows is a variant of `npm ci`.
+    """
+    # Simplification: check whether exactly `npm install` (or `npm i`) appears. `ci` is a separate
+    # subcommand, so `npm ci` does not match this regex.
+    return re.search(r"\bnpm\s+(install|i)\b", cmd) is not None
+def _has_insta_update_set(cmd: str) -> bool:
+    """Reject `INSTA_UPDATE=<value>` when value is not `no`."""
+    match = re.search(r"\bINSTA_UPDATE=([A-Za-z0-9_-]+)", cmd)
+    if match is None:
+        return False
+    return match.group(1).lower() != "no"
+def find_denied_tokens(cmd: str) -> list[str]:
+    """Return every denied token found in `cmd`. An empty list means safe."""
+    if not isinstance(cmd, str):
+        return ["<not-a-string>"]
+    found: list[str] = []
+    tokens = cmd.split()
+    for tok in _DENIED_LITERAL_TOKENS:
+        if tok in tokens:
+            found.append(tok)
+    for sub in _DENIED_SUBSTRINGS:
+        if sub in cmd:
+            found.append(sub)
+    if _has_npm_install_without_ci(cmd):
+        found.append("npm install (use 'npm ci' instead)")
+    if _has_insta_update_set(cmd):
+        found.append("INSTA_UPDATE=<not-no>")
+    return found
+class QaCommandsError(ValueError):
+    """Raised when the `qaCommands` block violates the contract."""
+def validate_qa_cmd(cmd: str, *, label: str = "<unnamed>", category: str = "<uncategorised>") -> None:
+    """Check a single `cmd` string. Raises `QaCommandsError` on a violation.
+    `label` / `category` are used only to make the error message human-readable.
+    """
+    denied = find_denied_tokens(cmd)
+    if denied:
+        joined = ", ".join(denied)
+        raise QaCommandsError(
+            f"qaCommands.{category}[{label!r}] contains mutation token(s): {joined}. "
+            f"Re-declare in check-only form."
+        )
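+def validate_qa_commands(qa: object) -> list[str]:
+    """Validate the whole `qaCommands` block. Returns a list of violation messages (empty means safe).
+    For the runtime to fail fast, promote to `PrepareError` when the return value is non-empty.
+    This function does not raise; it collects messages so the caller can report them all at once.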
+    """
+    errors: list[str] = []
+    if qa is None:
+        return errors  # Optional field: leaving it undeclared is legal.
+    if not isinstance(qa, dict):
+        return [f"qaCommands must be an object, got {type(qa).__name__}"]
+    for category, entries in qa.items():
+        if category not in ALLOWED_CATEGORIES:
+            errors.append(
+                f"qaCommands.{category}: unknown category "
+                f"(allowed: {', '.join(ALLOWED_CATEGORIES)})"
+            )
+            continue
+        if not isinstance(entries, list):
+            errors.append(
+                f"qaCommands.{category} must be an array, got {type(entries).__name__}"
+            )
+            continue
+        for idx, entry in enumerate(entries):
+            if not isinstance(entry, dict):
+                errors.append(
+                    f"qaCommands.{category}[{idx}] must be an object, got {type(entry).__name__}"
+                )
+                continue
+            label = entry.get("label")
+            cmd = entry.get("cmd")
+            if not isinstance(label, str) or not label.strip():
+                errors.append(
+                    f"qaCommands.{category}[{idx}].label must be a non-empty string"
+                )
+            if not isinstance(cmd, str) or not cmd.strip():
+                errors.append(
+                    f"qaCommands.{category}[{idx}].cmd must be a non-empty string"
+                )
+                continue
+            denied = find_denied_tokens(cmd)
+            if denied:
+                pretty_label = label if isinstance(label, str) else f"index {idx}"
+                errors.append(
+                    f"qaCommands.{category}[{pretty_label!r}] contains mutation token(s): "
+                    f"{', '.join(denied)}. Re-declare in check-only form."
+                )
+    return errors
+def format_errors(errors: Iterable[str]) -> str:
+    """Multi-line string that can be embedded as-is in `PrepareError` and the like."""
+    lines = list(errors)
+    if not lines:
+        return ""
+    head = "qaCommands validation failed:"
+    body = "\n".join(f"  - {line}" for line in lines)
+    return f"{head}\n{body}"
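The collect-then-report pattern in `qa_commands.py` can be illustrated with a short sketch. `format_errors` below repeats the logic from the diff; `PrepareError` and `raise_if_invalid` are stand-ins assumed for this example, not code from the package, and the sample messages simply follow the formats used by `validate_qa_commands`.

```python
from typing import Iterable


def format_errors(errors: Iterable[str]) -> str:
    """Same logic as the function in the diff above."""
    lines = list(errors)
    if not lines:
        return ""
    head = "qaCommands validation failed:"
    body = "\n".join(f"  - {line}" for line in lines)
    return f"{head}\n{body}"


class PrepareError(Exception):
    """Stand-in for the runtime's error type (an assumption of this sketch)."""


def raise_if_invalid(errors: list[str]) -> None:
    """Hypothetical helper: promote a non-empty error list to one exception."""
    if errors:
        raise PrepareError(format_errors(errors))


# Two messages in the shapes the validator produces.
sample_errors = [
    "qaCommands.lint must be an array, got str",
    "qaCommands.test[0].cmd must be a non-empty string",
]
report = format_errors(sample_errors)
print(report)
```

Because the validator returns a list instead of raising, the caller sees every problem in one report rather than fixing them one run at a time.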

package/runtime/python/okstra_ctl/run.py CHANGED Viewed

@@ -25,6 +25,7 @@ from datetime import datetime, timezone
 from pathlib import Path
 from okstra_project import upsert_project_json
+from .qa_commands import format_errors as _format_qa_errors, validate_qa_commands
 from .material import (
     build_analysis_material,
     related_tasks_bullets,
@@ -456,6 +457,22 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
         # is preserved by the `: {exc}` suffix and the `raise ... from exc`.
         raise PrepareError(f"project.json upsert failed for {project_root}: {exc}") from exc
+    # `qaCommands` serves only as the implementation-phase verifier's QA gate baseline,
+    # so it is validated only when entering implementation. For other task types an
+    # invalid declaration does not affect behavior, so there is no reason to fail fast.
+    if inp.task_type == "implementation":
+        project_json_path = Path(project_root) / ".project-docs" / "okstra" / "project.json"
+        if project_json_path.is_file():
+            try:
+                project_meta = json.loads(project_json_path.read_text())
+            except (OSError, json.JSONDecodeError) as exc:
+                raise PrepareError(
+                    f"project.json read failed at {project_json_path}: {exc}"
+                ) from exc
+            qa_errors = validate_qa_commands(project_meta.get("qaCommands"))
+            if qa_errors:
+                raise PrepareError(_format_qa_errors(qa_errors))
     # ---- workers resolution ----
     # release-handoff is intentionally single-lead (no worker dispatch, no
     # TeamCreate, no convergence). The profile has no `- Required workers:`

package/runtime/python/okstra_ctl/workers.py CHANGED Viewed

@@ -9,6 +9,7 @@ from __future__ import annotations
 from pathlib import Path
 ALLOWED_WORKERS = ["claude", "codex", "gemini", "report-writer"]
+DEFAULT_WORKERS = ["claude", "codex", "report-writer"]
 PROFILE_BULLET_HEADERS = {
     "- Workers:",
     "- Required workers:",
@@ -51,11 +52,11 @@ def normalize_workers(value: str) -> list[str]:
     """Normalize CSV input.
     - Strip whitespace, lowercase, and de-duplicate (first occurrence wins).
-    - Empty input defaults to all of `ALLOWED_WORKERS`.
+    - Empty input defaults to `DEFAULT_WORKERS` (gemini excluded).
     - Raises `WorkersError` if a non-allowed worker is included.
     """
     items = [v.strip().lower() for v in (value or "").split(",") if v.strip()]
-    source = items or ALLOWED_WORKERS
+    source = items or DEFAULT_WORKERS
     unknown = [v for v in source if v not in ALLOWED_WORKERS]
     if unknown:
         raise WorkersError(f"unknown workers: {','.join(unknown)}")
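The effect of the new default can be seen in a self-contained sketch of `normalize_workers`. The lines up to the `raise` follow the hunk above; the `WorkersError` base class and the final de-duplication step are assumptions, since the rest of the function lies outside this diff (the docstring only states that duplicates are removed, first occurrence first).

```python
ALLOWED_WORKERS = ["claude", "codex", "gemini", "report-writer"]
DEFAULT_WORKERS = ["claude", "codex", "report-writer"]


class WorkersError(ValueError):
    """Stand-in for the package's error type; the base class is assumed."""


def normalize_workers(value: str) -> list[str]:
    # Split the CSV, trim whitespace, lowercase, and drop empty items.
    items = [v.strip().lower() for v in (value or "").split(",") if v.strip()]
    # Empty input now falls back to the default set, which omits gemini.
    source = items or DEFAULT_WORKERS
    unknown = [v for v in source if v not in ALLOWED_WORKERS]
    if unknown:
        raise WorkersError(f"unknown workers: {','.join(unknown)}")
    # Assumed de-duplication step: dict keys preserve first-seen order.
    return list(dict.fromkeys(source))


default_set = normalize_workers("")
with_gemini = normalize_workers(" Gemini, claude ,gemini")
```

With this change Gemini runs only when it is named explicitly in the worker list, which matches the README wording about it being an opt-in extra.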

package/runtime/skills/okstra-setup/SKILL.md CHANGED Viewed

@@ -130,6 +130,54 @@ field → built-in default. Only edit when defaults don't cover the
 project's working files (e.g. additional cache or local-config dirs
 that must follow the executor into the worktree).
+## Step 4.7 (optional but recommended): declare project QA commands
+`implementation`-phase verifiers run an independent QA gate over the
+executor's diff and need a project-wide baseline of check-only
+lint / format / typecheck / test commands. okstra does NOT auto-detect
+tooling from manifest files — declare the commands explicitly in
+`project.json` under `qaCommands`. Skipping this declaration is
+allowed but the verifier will then only run the plan's per-task
+`validation` set, with `qa-command not configured: <category>`
+recorded per missing category in the final report.
+Each category is an array of `{ "label", "cmd", "language"? }`
+objects. `language` is optional; when present the verifier MAY skip
+commands whose language is not represented in this run's diff.
+```json
+{
+  "projectId": "...",
+  "projectRoot": "...",
+  "qaCommands": {
+    "lint":      [{ "label": "cargo clippy", "cmd": "cargo clippy --all-targets -- -D warnings", "language": "rust" }],
+    "format":    [{ "label": "cargo fmt",    "cmd": "cargo fmt --check",                          "language": "rust" }],
+    "typecheck": [{ "label": "tsc",          "cmd": "pnpm exec tsc --noEmit",                     "language": "ts"   }],
+    "test":      [{ "label": "cargo test",   "cmd": "cargo test --workspace --locked",            "language": "rust" }]
+  }
+}
+```
+**`cmd` deny-list (mutation guard):** the verifier rejects any `cmd`
+containing tokens that imply mutation — declare commands in their
+check-only form only. Denied tokens include:
+- `--fix` (eslint, ruff), `--write` (prettier), ` -w` (gofmt),
+- ` -u` / `--updateSnapshot` / `--snapshot-update` / `--update-goldens`,
+- `INSTA_UPDATE=` with any value other than `no`,
+- `cargo insta accept`,
+- `npm install` (use `npm ci`), `cargo update`, `pip install -U`,
+- `pnpm add`, `bun add`, `cargo add`.
+Encountering a denied token aborts the verifier with status
+`contract-violated`; re-declare the command in check-only form to
+recover (e.g. swap `prettier --write` → `prettier --check`).
+The field is preserved across the runtime's auto-upserts of
+`project.json` — only `projectId`, `projectRoot`, `createdAt`,
+`updatedAt` are runtime-owned, so manual edits to `qaCommands`
+survive every subsequent `okstra setup` / `okstra run` invocation.
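One way to check a `cmd` against the deny-list above is sketched here. This is a hypothetical illustration: the real `find_denied_tokens` is not shown in this part of the diff and may well use substring or regex matching instead (the leading spaces in ` -w` and ` -u` hint at that). The names `find_denied_tokens_sketch`, `DENIED_FLAGS`, and `DENIED_PHRASES` are invented for the example, which tokenizes the command with `shlex` rather than scanning the raw string.

```python
import shlex

# Single-word flags from the deny-list (assumed to match as whole tokens).
DENIED_FLAGS = {
    "--fix", "--write", "-w", "-u",
    "--updateSnapshot", "--snapshot-update", "--update-goldens",
}

# Multi-word commands from the deny-list, matched as consecutive tokens.
DENIED_PHRASES = [
    ("cargo", "insta", "accept"),
    ("npm", "install"),
    ("cargo", "update"),
    ("pip", "install", "-U"),
    ("pnpm", "add"),
    ("bun", "add"),
    ("cargo", "add"),
]


def find_denied_tokens_sketch(cmd: str) -> list[str]:
    """Return the denied tokens found in `cmd`, in discovery order."""
    words = shlex.split(cmd)
    found: list[str] = []
    for word in words:
        # Treat `--flag=value` the same as `--flag`.
        flag = word.split("=", 1)[0]
        if flag in DENIED_FLAGS and flag not in found:
            found.append(flag)
        # INSTA_UPDATE is allowed only with the value `no`.
        if word.startswith("INSTA_UPDATE=") and word != "INSTA_UPDATE=no":
            found.append(word)
    for phrase in DENIED_PHRASES:
        n = len(phrase)
        windows = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
        if any(window == phrase for window in windows):
            found.append(" ".join(phrase))
    return found
```

Under these assumptions `prettier --write .` is flagged while `prettier --check .` passes, which mirrors the recovery step described above. Token-based matching avoids false positives on substrings, at the cost of raising `ValueError` on commands with unbalanced quotes.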
 ## Step 4.6 (automatic): project-local Claude settings symlink
 `okstra setup` (and `okstra run` on its first invocation per project)