okstra 0.36.0 → 0.36.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. package/README.kr.md +3 -5
  2. package/README.md +3 -5
  3. package/docs/project-structure-overview.md +2 -7
  4. package/docs/superpowers/plans/2026-05-24-implementation-lead-context-slimming.md +1700 -0
  5. package/package.json +1 -1
  6. package/runtime/BUILD.json +2 -2
  7. package/runtime/agents/SKILL.md +18 -5
  8. package/runtime/agents/workers/claude-worker.md +5 -6
  9. package/runtime/agents/workers/codex-worker.md +10 -9
  10. package/runtime/agents/workers/gemini-worker.md +7 -6
  11. package/runtime/agents/workers/report-writer-worker.md +13 -11
  12. package/runtime/prompts/launch.template.md +1 -0
  13. package/runtime/prompts/profiles/_implementation-deliverable.md +53 -0
  14. package/runtime/prompts/profiles/_implementation-executor.md +60 -0
  15. package/runtime/prompts/profiles/_implementation-verifier.md +76 -0
  16. package/runtime/prompts/profiles/implementation.md +27 -134
  17. package/runtime/python/okstra_ctl/paths.py +3 -0
  18. package/runtime/python/okstra_ctl/render.py +19 -5
  19. package/runtime/python/okstra_ctl/run.py +7 -1
  20. package/runtime/python/okstra_ctl/session.py +65 -7
  21. package/runtime/skills/okstra-brief/SKILL.md +2 -211
  22. package/runtime/skills/okstra-inspect/SKILL.md +581 -0
  23. package/runtime/skills/okstra-run/SKILL.md +3 -3
  24. package/runtime/skills/okstra-schedule/SKILL.md +10 -153
  25. package/runtime/skills/okstra-setup/SKILL.md +1 -1
  26. package/runtime/skills/okstra-team-contract/SKILL.md +15 -106
  27. package/runtime/templates/reports/brief.template.md +204 -0
  28. package/runtime/templates/reports/schedule.template.md +12 -3
  29. package/runtime/templates/worker-prompt-preamble.md +108 -0
  30. package/src/uninstall.mjs +7 -3
  31. package/runtime/prompts/profiles/kr/_common-contract.md +0 -92
  32. package/runtime/prompts/profiles/kr/error-analysis.md +0 -36
  33. package/runtime/prompts/profiles/kr/final-verification.md +0 -48
  34. package/runtime/prompts/profiles/kr/implementation-planning.md +0 -90
  35. package/runtime/prompts/profiles/kr/implementation.md +0 -144
  36. package/runtime/prompts/profiles/kr/improvement-discovery.md +0 -42
  37. package/runtime/prompts/profiles/kr/release-handoff.md +0 -104
  38. package/runtime/prompts/profiles/kr/requirements-discovery.md +0 -42
  39. package/runtime/skills/okstra-history/SKILL.md +0 -165
  40. package/runtime/skills/okstra-logs/SKILL.md +0 -173
  41. package/runtime/skills/okstra-report-finder/SKILL.md +0 -111
  42. package/runtime/skills/okstra-status/SKILL.md +0 -246
  43. package/runtime/skills/okstra-time-summary/SKILL.md +0 -172
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "okstra",
3
- "version": "0.36.0",
3
+ "version": "0.36.1",
4
4
  "description": "Multi-agent cross-verification orchestrator runtime + Claude Code skills.",
5
5
  "license": "MIT",
6
6
  "author": "devonshin",
package/runtime/BUILD.json CHANGED
@@ -1,5 +1,5 @@
1
1
  {
2
- "package": "0.36.0",
3
- "builtAt": "2026-05-23T15:38:53.956Z",
2
+ "package": "0.36.1",
3
+ "builtAt": "2026-05-24T11:52:05.811Z",
4
4
  "repoRoot": "/home/runner/work/okstra/okstra"
5
5
  }
package/runtime/agents/SKILL.md CHANGED
@@ -27,11 +27,8 @@ This SKILL.md is the operating contract and phase index. Detailed procedures liv
27
27
  | [okstra-team-contract](./skills/okstra-team-contract/SKILL.md) | Phase 2–5 worker roster, model assignment rules, prompt composition (anchor headers, `[Required reading]`, `[Error reporting]`), worker output contract, terminal statuses, usage tracking |
28
28
  | [okstra-convergence](./skills/okstra-convergence/SKILL.md) | Phase 5.5 finding convergence loop, finding categories, reverify dispatch (anchor headers, required-reading suppression), convergence state schema, **plus Phase 6 plan-body verification mode (implementation-planning only)** |
29
29
  | [okstra-report-writer](./skills/okstra-report-writer/SKILL.md) | Phase 6 final-report authorship, dispatch template, resume-safe dispatch, shared-graph integrity check, Phase 7 token-usage collector |
30
- | [okstra-status](./skills/okstra-status/SKILL.md) | Project/task status views and `workStatus` updates |
31
- | [okstra-history](./skills/okstra-history/SKILL.md) | Past-run history, re-run command assembly, resume helper |
32
- | [okstra-report-finder](./skills/okstra-report-finder/SKILL.md) | Locate the final report for a given task key |
30
+ | [okstra-inspect](./skills/okstra-inspect/SKILL.md) | Unified read-side skill with sub-commands: `status` (workStatus updates), `history` (past runs, re-run, resume), `report` (find/read final-report), `time` (per-task/per-worker elapsed-time breakdown), `logs` (wrapper log sidecar inventory + cleanup suggestions) |
33
31
  | [okstra-schedule](./skills/okstra-schedule/SKILL.md) | Generate a consolidated work schedule for a task-group |
34
- | [okstra-time-summary](./skills/okstra-time-summary/SKILL.md) | Per-task / per-worker elapsed-time breakdown |
35
32
 
36
33
  ## Quick Reference
37
34
 
@@ -168,6 +165,22 @@ After context-loader completes, read **only the five mandatory files below** in
168
165
  - `instruction-set/final-report-template.md` — never read by Lead. The Report writer worker reads it as part of its own [Required reading]; Lead only references its path when dispatching.
169
166
  - `history/timeline.json` — read only on user request or when carry-in resolution requires it.
170
167
 
168
+ **Implementation profile lazy reading discipline (BLOCKING — applies only when `task_type == "implementation"`):**
169
+
170
+ The `implementation` profile's thin core (`prompts/profiles/implementation.md`) is intentionally minimal so the Phase 1 baseline stays small. Three sidecar files carry the bulk of the rules and MUST be read at the listed phase — do NOT pre-load them at Phase 1.
171
+
172
+ | Sidecar | Read at | Owned by |
173
+ |---------|---------|----------|
174
+ | `prompts/profiles/_implementation-executor.md` | **Phase 5**, after Stage Map parse, BEFORE issuing the Executor's first `Edit` / `Write` | Executor role binding, Pre-implementation context exploration, TDD loop, Stage execution contract, allowed actions, commit-message format |
175
+ | `prompts/profiles/_implementation-verifier.md` | **Phase 5**, between Executor stage completion and the first verifier dispatch | Verifier roles, Two-tier command lookup, deny-list, discrepancy rule, Read-only command log, verifier-specific forbidden actions |
176
+ | `prompts/profiles/_implementation-deliverable.md` | **Phase 6**, after Phase 5.5 convergence completes, BEFORE constructing the report-writer dispatch prompt | Required deliverable shape, Validation / TDD evidence rules, Verifier results structure, Self-review pass, Lead post-stage persistence |
177
+
178
+ **Entry guard (BLOCKING).** Before transitioning into Phase 5 or Phase 6 for an `implementation` run, lead MUST emit a single Read tool call for the sidecar(s) above whose `Read at` matches the entering phase. If lead enters the phase without that Read on record (visible in the lead session jsonl), phase entry is refused: lead writes a `contract-violation` to the run-level errors log with `--message "implementation-sidecar-not-loaded"` and stops. Re-entry requires the sidecar Read first.
179
+
180
+ The guard is not satisfied by remembering content from a prior run — each implementation run reads the sidecar fresh, because the sidecars are part of the runtime shipped via `okstra install` and may have been updated between runs.
181
+
182
+ This pattern is implementation-only. Other profiles (`requirements-discovery`, `error-analysis`, `implementation-planning`, `final-verification`, `release-handoff`) load their whole profile body at Phase 1 as before — they are short enough not to benefit from a split.
183
+
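The entry guard added above can be read as a lookup from the entering phase to the sidecars that must already have a Read on record. The following is a minimal sketch under stated assumptions, not the package's implementation: the sidecar paths and phase numbers come from the table, while the function name, the `reads_this_run` argument, and the suffix-match rule are hypothetical. It also treats both Phase 5 sidecars as due at phase entry, which is coarser than the per-step timing in the `Read at` column.

```python
from typing import Iterable, List

# Sidecar paths and phases are copied from the table in the diff.
# Everything else in this sketch is a hypothetical illustration.
SIDECARS_BY_PHASE = {
    "5": [
        "prompts/profiles/_implementation-executor.md",
        "prompts/profiles/_implementation-verifier.md",
    ],
    "6": ["prompts/profiles/_implementation-deliverable.md"],
}


def missing_sidecars(task_type: str, phase: str, reads_this_run: Iterable[str]) -> List[str]:
    """Return the sidecars that still need a Read before entering `phase`.

    `reads_this_run` is assumed to hold the file paths Read during the
    current run only, so content remembered from a prior run never counts.
    """
    if task_type != "implementation":
        # Other profiles load their whole profile body at Phase 1.
        return []
    seen = list(reads_this_run)
    return [
        sidecar
        for sidecar in SIDECARS_BY_PHASE.get(phase, [])
        # Assumption: recorded reads may be absolute, so match by suffix.
        if not any(read.endswith(sidecar) for read in seen)
    ]
```

A non-empty result corresponds to the case where the diff says lead records `implementation-sidecar-not-loaded` and stops.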
171
184
  Extract from the five mandatory files: task key, task type, work category, workflow lifecycle snapshot, selected worker roster, assigned models, worker result paths, worker prompt history paths, current run prompt directory, final report path, final status path, validator path, resume helper path, config-file references, deployment-manifest references, and their expected values or invariants.
172
185
 
173
186
  If previous run reports exist, use as historical context only. If discovery metadata or current artifacts conflict with a newer user instruction, prefer the user instruction. If `reference-expectations.md` explicitly says expectations were not provided (you can confirm this without reading the file if the brief's "Expected state" section is empty), treat that as missing information and say `I don't know` rather than inventing expected states.
@@ -299,7 +312,7 @@ Lead's responsibilities in this sub-step (in order):
299
312
 
300
313
  If `convergence.planBodyVerification.enabled == false` (set by `--no-plan-verification` or by `okstra config set plan-verification off`), the entire sub-step is skipped and the top-of-report Approval marker is rendered unconditionally (legacy behaviour). This opt-out is intended for fast iteration only and is not recommended for handoff-ready plans.
301
314
 
302
- `/okstra-status` exposes the sub-step's state as a `planVerification` sub-field of the implementation-planning phase, not as a separate lifecycle phase identifier.
315
+ The `okstra-inspect status` sub-command exposes the sub-step's state as a `planVerification` sub-field of the implementation-planning phase, not as a separate lifecycle phase identifier.
303
316
 
304
317
  ## Phase 7: Artifact persistence and validator handoff
305
318
 
package/runtime/agents/workers/claude-worker.md CHANGED
@@ -56,13 +56,12 @@ Unlike the Codex / Gemini workers, you are an in-process Claude subagent — you
56
56
 
57
57
  ## Required Reading Before Any Analysis
58
58
 
59
- Before producing any output, you MUST read every input file enumerated in the `[Required reading]` block of the lead's prompt from the very first character to the very last character. For analysis workers this includes the task brief, analysis profile, analysis material (if present), reference expectations, and the carry-in clarification response (if present). Analysis workers do NOT read `final-report-template.md` — that file is for the Report writer worker only (see `okstra-team-contract` "Audience-scoped enumeration"). Producing findings without the template is the intended contract; the report writer in Phase 6 owns final-report structure.
59
+ Before producing any output, you MUST:
60
60
 
61
- - Use a single `Read` call per file with no `offset` and no `limit`. If a file is genuinely too large for one read, page through it with explicit `offset` / `limit` calls that together cover the entire file, and record the page boundaries in your Findings.
62
- - For the carry-in clarification response, walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank; a blank `User input` with `Status=open` is itself a signal you must surface, not skip. Skimming these rows is the most common failure mode here; the fact that the file you will eventually contribute to has a structurally similar section 5 is NOT a license to skim.
63
- - Before listing any Findings, write a Reading Confirmation block to your **audit sidecar** at `runs/<task-type>/worker-results/claude-worker-audit-<task-type>-<seq>.md` (sibling to your main worker-results file — substitute `claude-worker-<task-type>-<seq>.md` → `claude-worker-audit-<task-type>-<seq>.md`). The sidecar's body begins with `# Claude Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). Do NOT include a `## 0. Reading Confirmation` heading in the main worker-results file — the validator now fails worker-results that contain one. If you cannot truthfully confirm a file end-to-end, record a `tool-failure` in the errors sidecar instead of fabricating Findings.
64
- - **Heartbeat — write the audit sidecar EARLY and APPEND per stage (BLOCKING).** Because this worker runs as an in-process Agent or a fresh-session tmux pane, the lead has no `BashOutput`-style liveness signal while waiting for your return. The audit sidecar is the only signal that survives a silent hang. Write the sidecar immediately after extracting `Project Root` and the assigned paths — BEFORE the per-file end-to-end reads — with just the heading line (`# Claude Worker Audit — <task-key>`) and one `- PROGRESS: started <ISO-8601-UTC>` line. Then APPEND one short progress line per stage as you advance: `read-<filename>`, `analysis-start`, `findings-draft-start`, `findings-draft-complete`, `write-result-start`. Each line: `- PROGRESS: <stage> <ISO-8601-UTC>`. The append cadence MUST NOT exceed 5 minutes — if a single analysis stage is taking longer, emit a `- PROGRESS: in-stage:<stage> <ISO-8601-UTC>` heartbeat. A 5-minute stale sidecar mtime is the canonical "this worker has hung" signal for the operator (the lead is blocked on the Agent call and cannot detect this itself, but a human watching via `tail -F <audit-sidecar>` from another terminal can). Sidecar write/append uses `Write` (for the initial creation) and `Edit` / heredoc `>>` for the per-stage append — heredoc append is the lighter option once the file exists.
65
- - Do not skip a file because its name suggests its content is already familiar from a prior run. Each file is canonical for the current run only.
61
+ 1. Extract the absolute path from the lead's `**Worker Preamble Path:**` anchor header and Read that file end-to-end with a single `Read` call (no `offset`, no `limit`). This is the canonical SSOT for the Required Reading + Error Reporting + Output sections contract.
62
+ 2. Read every input file the lead enumerated under `## Inputs` (or equivalent heading) in the dispatch prompt body, end-to-end, following the rules stated in the preamble. For analysis workers this is task-brief + analysis-profile + analysis-material (if present) + reference-expectations + clarification-response (if carry-in). Analysis workers do NOT read `final-report-template.md`; that file is for the report writer only.
63
+
64
+ **Heartbeat — write the audit sidecar EARLY and APPEND per stage (BLOCKING).** Because this worker runs as an in-process Agent or a fresh-session tmux pane, the lead has no `BashOutput`-style liveness signal while waiting for your return. The audit sidecar is the only signal that survives a silent hang. Write the sidecar at `runs/<task-type>/worker-results/claude-worker-audit-<task-type>-<seq>.md` immediately after extracting `Project Root` and the assigned paths — BEFORE the per-file end-to-end reads — with just the heading line (`# Claude Worker Audit — <task-key>`) and one `- PROGRESS: started <ISO-8601-UTC>` line. Then APPEND one short progress line per stage as you advance: `read-<filename>`, `analysis-start`, `findings-draft-start`, `findings-draft-complete`, `write-result-start`. The append cadence MUST NOT exceed 5 minutes — if a single analysis stage is taking longer, emit a `- PROGRESS: in-stage:<stage> <ISO-8601-UTC>` heartbeat. A 5-minute stale sidecar mtime is the canonical "this worker has hung" signal for the operator. Sidecar write/append uses `Write` (initial) and `Edit` / heredoc `>>` (per-stage append).
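To make the heartbeat rule concrete, here is a small sketch of the two pieces an operator-side check would need: formatting a `- PROGRESS:` line and deciding whether a sidecar mtime counts as stale. The function names are hypothetical, the `Z`-suffixed timestamp is one assumed rendering of `<ISO-8601-UTC>`, and treating exactly 5 minutes as still fresh is an assumption drawn from the "MUST NOT exceed 5 minutes" wording.

```python
from datetime import datetime, timezone

# The 5-minute threshold comes from the heartbeat paragraph in the diff.
STALE_AFTER_SECONDS = 5 * 60


def progress_line(stage: str, now: datetime) -> str:
    """Format one heartbeat line in the `- PROGRESS: <stage> <ISO-8601-UTC>` shape.

    `now` must be timezone-aware; it is converted to UTC before formatting.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"- PROGRESS: {stage} {stamp}"


def is_stale(sidecar_mtime: float, now_epoch: float) -> bool:
    """Return True when the sidecar has gone more than 5 minutes without an append."""
    return now_epoch - sidecar_mtime > STALE_AFTER_SECONDS
```

In practice the mtime would come from something like `os.stat(path).st_mtime` on the audit sidecar. The check is kept as a pure function so it can be exercised without touching the filesystem.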
66
65
 
67
66
  ## Worker Output Structure
68
67
 
package/runtime/agents/workers/codex-worker.md CHANGED
@@ -39,7 +39,7 @@ The wrapper internally runs:
39
39
  codex exec -C "<project-root>" [--add-dir "<worktree-path>"] --model "<model>" --sandbox workspace-write - < "<prompt-path>" 2>/dev/null
40
40
  ```
41
41
 
42
- The wrapper exists because Claude Code's Bash permission matcher rejects simple-prefix matches when the command contains stdin/stderr redirects. Calling `codex exec ... - < <path> 2>/dev/null` directly triggers a permission prompt every dispatch even when `Bash(codex exec:*)` is allowlisted. The wrapper folds the redirects inside, so the harness sees a single non-redirect command that matches `Bash($HOME/.okstra/bin/okstra-codex-exec.sh:*)`.
42
+ The wrapper exists because Claude Code's Bash permission matcher rejects simple-prefix matches when the command contains stdin/stderr redirects. Calling `codex exec ... < <path> 2>/dev/null` directly triggers a permission prompt every dispatch even when `Bash(codex exec:*)` is allowlisted. The wrapper folds the redirects inside, so the harness sees a single non-redirect command that matches `Bash($HOME/.okstra/bin/okstra-codex-exec.sh:*)`.
43
43
 
44
44
  **Do NOT use** non-existent flags like `-q` or `-a never`. **Do NOT** invoke `codex exec ... < ... 2>/dev/null` directly — always go through the wrapper.
45
45
 
@@ -68,7 +68,7 @@ The wrapper exists because Claude Code's Bash permission matcher rejects simple-
68
68
  6. Extract the assigned model execution value for `Codex worker`.
69
69
  - First, look for a `**Model:** Codex worker, <execution-value>` line in the lead prompt and use `<execution-value>`.
70
70
  - If only a display model is listed, look up the canonical execution value from the referenced task bundle metadata (`task-manifest.json` → `resultContract.requiredWorkerRoles[]` for the codex role).
71
- - If neither is available, immediately return `CODEX_MODEL_MISSING: assigned Codex model execution value was not provided`. Do NOT fall back to training-data defaults — historical codex defaults like `o4-mini` are NOT acceptable substitutes for the assigned model. Returning the sentinel is the correct behavior; the lead is responsible for fixing its prompt and redispatching.
71
+ - If no assigned model execution value can be determined, immediately return `CODEX_MODEL_MISSING: assigned Codex model execution value was not provided`. Do NOT fall back to training-data defaults — historical Codex defaults like `o4-mini` are NOT acceptable substitutes for the assigned model. Returning the sentinel is the correct behavior; the lead is responsible for fixing its prompt and redispatching.
72
72
  - This rule applies equally to convergence reverify rounds. The reverify prompt MUST carry the same `**Model:**` line as the initial run (see `okstra-convergence` skill, "Required reverify-prompt anchor headers"). If the line is absent in a reverify prompt, return `CODEX_MODEL_MISSING` rather than guessing.
73
73
 
74
74
  7. If installed, dispatch the wrapper as a **background** Bash command and poll for completion. The two-minute foreground Bash timeout is insufficient for implementation-phase Codex runs and forced workers into ad-hoc background dispatch with lost output. The polling contract below is the formal replacement.
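The model-resolution order in step 6 (explicit `**Model:**` line, then task bundle metadata, then the sentinel) can be sketched as below. This is an illustration under assumptions rather than the worker's actual logic: the function name and the `manifest_value` parameter are hypothetical, and the manifest lookup itself is left to the caller because the shape of `resultContract.requiredWorkerRoles[]` entries is not shown in this diff.

```python
import re
from typing import Optional

# Matches a line of the form: **Model:** Codex worker, <execution-value>
MODEL_LINE = re.compile(
    r"^\*\*Model:\*\*[ \t]*Codex worker,[ \t]*(\S.*?)[ \t]*$", re.MULTILINE
)

# Sentinel text is copied from the diff.
SENTINEL = "CODEX_MODEL_MISSING: assigned Codex model execution value was not provided"


def resolve_codex_model(lead_prompt: str, manifest_value: Optional[str] = None) -> str:
    """Return the assigned execution value, or the sentinel if none is found.

    `manifest_value` is a hypothetical stand-in for whatever the caller looked
    up in task-manifest.json. No training-data default is ever substituted.
    """
    match = MODEL_LINE.search(lead_prompt)
    if match:
        return match.group(1)
    if manifest_value:
        return manifest_value
    return SENTINEL
```

The same function would apply unchanged to a convergence reverify prompt, where a missing `**Model:**` line and no manifest value again yields the sentinel rather than a guess.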
@@ -77,7 +77,7 @@ The wrapper exists because Claude Code's Bash permission matcher rejects simple-
77
77
  ```bash
78
78
  $HOME/.okstra/bin/okstra-codex-exec.sh "<absolute-project-root>" "<assigned-model-execution-value>" "<absolute-prompt-history-path>" "<absolute-worktree-path>" "worker"
79
79
  ```
80
- Call `Bash` with `run_in_background: true`. Capture the returned `bash_id` (a.k.a. `shell_id`). Pass the positional arguments verbatim — do NOT use environment variables, `cd`, `&&` chains, or pipes from `cat`. Substitute the literal extracted Project Root, model execution value, prompt-history path, and worktree path. The fourth argument is **mandatory for implementation phase** (extract from `EXECUTOR_WORKTREE_PATH` in the lead prompt's run context or the `**Worktree:**` / `cwd for every mutating command:` line) and **may be omitted only for non-implementation analysis phases** that do not mutate the worktree. Omitting it during implementation will cause every Edit/Write to fail with EPERM. The wrapper handles `-C`, `--add-dir`, `--model`, `--sandbox workspace-write`, the stdin redirect from the prompt file, and stderr suppression internally. Calling `codex exec` directly (without the wrapper) is an error in this skill: the redirect tokens disqualify the prefix match against `Bash(codex exec:*)` and produce a permission prompt every dispatch.
80
+ Call `Bash` with `run_in_background: true`. Capture the returned `bash_id` (a.k.a. `shell_id`). Pass the positional arguments verbatim — do NOT use environment variables, `cd`, `&&` chains, or pipes from `cat`. Substitute the literal extracted Project Root, model execution value, prompt-history path, and worktree path. The fourth argument is **mandatory for implementation phase** (extract from `EXECUTOR_WORKTREE_PATH` in the lead prompt's run context or the `**Worktree:**` / `cwd for every mutating command:` line) and **may be omitted only for non-implementation analysis phases** that do not mutate the worktree. The wrapper handles `-C`, `--add-dir`, `--model`, `--sandbox workspace-write`, the stdin redirect from the prompt file, and stderr suppression internally. Calling `codex exec` directly (without the wrapper) is an error in this skill: the redirect tokens disqualify the prefix match against `Bash(codex exec:*)` and produce a permission prompt every dispatch.
81
81
 
82
82
  **Poll loop (BashOutput-only, 30-minute cap):**
83
83
  - Record `start_ts` at dispatch time via a single `Bash` call: `date +%s` (output captured).
@@ -132,7 +132,7 @@ This wrapper does NOT invoke MCP tools directly. MCP availability inside the Cod
132
132
  - The assigned model execution value is canonical for CLI execution. Do not substitute a different Codex model unless the task bundle explicitly changes it.
133
133
  - Pass the prompt received from Lead directly to codex after persisting the exact prompt to the assigned path.
134
134
  - Include context (code, diff, file paths) if provided.
135
- - For long prompts, the wrapper script reads from the saved project-local prompt history file via stdin redirect internally. The caller invokes the wrapper with three required positional args + the worktree path for implementation phase:
135
+ - For long prompts, dispatch through the wrapper with literal absolute paths (plus the worktree path for implementation phase):
136
136
  ```bash
137
137
  $HOME/.okstra/bin/okstra-codex-exec.sh "<literal-project-root>" "<assigned-model-execution-value>" "<literal-prompt-history-path>" "<literal-worktree-path>" "worker"
138
138
  ```
@@ -140,11 +140,12 @@ This wrapper does NOT invoke MCP tools directly. MCP availability inside the Cod
140
140
 
141
141
  ## Required Reading Before Any Analysis
142
142
 
143
- Before producing any output, you MUST ensure the underlying Codex CLI run reads every input file enumerated in the `[Required reading]` block of the lead's prompt from the very first character to the very last character. For analysis workers this includes the task brief, analysis profile, analysis material (if present), reference expectations, and the carry-in clarification response (if present). Analysis workers do NOT read `final-report-template.md` — that file is for the Report writer worker only (see `okstra-team-contract` "Audience-scoped enumeration"). Producing findings without the template is the intended contract; the report writer in Phase 6 owns final-report structure.
143
+ Before invoking the Codex CLI, you MUST:
144
144
 
145
- - The lead's prompt body, which you persist verbatim and feed into Codex via stdin, already contains the explicit list of files and the end-to-end reading rule. Do not strip or summarize that block before passing it to the CLI.
146
- - For the carry-in clarification response, the CLI must walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank; a blank `User input` with `Status=open` is itself a signal you must surface. The fact that the prior run's final report and the upcoming output share section 5 structure is NOT a license to skim.
147
- - The wrapper writes a Reading Confirmation block to the **audit sidecar** at `runs/<task-type>/worker-results/codex-worker-audit-<task-type>-<seq>.md` (sibling to the main worker-results file). The sidecar's body begins with `# Codex Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). The main Codex output MUST NOT contain a `## 0. Reading Confirmation` heading — the validator now fails worker-results that contain one. If any file was skipped, record a `tool-failure` in the errors sidecar instead of fabricating Findings.
145
+ 1. Extract the absolute path from the lead's `**Worker Preamble Path:**` anchor header and verify the CLI run will Read that file end-to-end (canonical SSOT for the Required Reading + Error Reporting + Output sections contract). The lead's prompt body, which you persist verbatim and feed into Codex via stdin, already contains this anchor; do not strip it.
146
+ 2. Verify the lead's prompt body lists the per-run input files under `## Inputs` (task-brief, analysis-profile, analysis-material if present, reference-expectations, clarification-response if carry-in). Analysis workers do NOT read `final-report-template.md`; that file is for the report writer only.
147
+
148
+ The CLI writes a Reading Confirmation block to the **audit sidecar** at `runs/<task-type>/worker-results/codex-worker-audit-<task-type>-<seq>.md`. The sidecar's body begins with `# Codex Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading. The main Codex output MUST NOT contain a `## 0. Reading Confirmation` heading — the validator fails worker-results that contain one. If any file was skipped, record a `tool-failure` in the errors sidecar instead of fabricating Findings.
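The rule that the main worker output must not carry a `## 0. Reading Confirmation` heading is easy to check mechanically. The sketch below is a hypothetical stand-in, not the package's validator: it assumes the heading is matched only when it sits alone on a line, so prose that merely mentions the heading inline does not trip the check.

```python
import re

# Heading text is copied from the diff; the line-anchored match is an assumption.
FORBIDDEN_HEADING = re.compile(r"^## 0\. Reading Confirmation[ \t]*$", re.MULTILINE)


def has_forbidden_reading_confirmation(worker_result_markdown: str) -> bool:
    """Return True if the main worker-results text contains the forbidden heading."""
    return FORBIDDEN_HEADING.search(worker_result_markdown) is not None
```

The reading confirmation itself belongs in the audit sidecar, so a `True` result here indicates content that should have been written there instead.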
148
149
 
149
150
  ## Worker Output Structure
150
151
 
@@ -228,7 +229,7 @@ pre-flight terminal status, not a runtime CLI error.
228
229
  - Ignore stderr warnings from MCP integration.
229
230
  - Return error messages as-is on failure.
230
231
  - Do not summarize or modify Codex results.
231
- - Sections 1–5 of the worker output are the common core shared with the Claude and Gemini workers — the dispatched prompt asks identical questions for all three roles, and the Codex CLI must answer all of them, not only implementation-realism findings. Your specialization (implementation realism, code-path implications, edge cases, technical trade-offs) belongs only in optional Section 6 as additive depth. A Codex result whose Findings section is populated solely with implementation-feasibility items is in breach of contract; see `skills/okstra-team-contract/SKILL.md` "Worker Output Contract".
232
+ - Sections 1–5 of the worker output are the common core shared with the Claude and Gemini workers — the dispatched prompt asks identical questions for all three roles, and the Codex CLI must answer all of them, not only implementation-feasibility findings. Your specialization (implementation realism, code-path implications, edge cases, technical trade-offs) belongs only in optional Section 6 as additive depth. A Codex result whose Findings section is populated solely with implementation-feasibility items is in breach of contract; see `skills/okstra-team-contract/SKILL.md` "Worker Output Contract".
232
233
 
233
234
  ## Stage evidence emission (BLOCKING, implementation task only)
234
235
 
package/runtime/agents/workers/gemini-worker.md CHANGED
@@ -68,7 +68,7 @@ The wrapper exists because Claude Code's Bash permission matcher rejects simple-
68
68
  6. Extract the assigned model execution value for `Gemini worker`.
69
69
  - First, use the value explicitly assigned in the lead prompt.
70
70
  - If the lead prompt only lists the display model, use the canonical execution value from the referenced task bundle metadata (`task-manifest.json` → `resultContract.requiredWorkerRoles[]` for the gemini role).
71
- - If no assigned model execution value can be determined, immediately return `GEMINI_MODEL_MISSING: assigned Gemini model execution value was not provided`. Do NOT fall back to training-data defaults — historical Gemini defaults (e.g. `gemini-1.5-flash`) are NOT acceptable substitutes for the assigned model. Returning the sentinel is the correct behavior; the lead is responsible for fixing its prompt and redispatching.
71
+ - If no assigned model execution value can be determined, immediately return `GEMINI_MODEL_MISSING: assigned Gemini model execution value was not provided`. Do NOT fall back to training-data defaults — historical Gemini defaults like `gemini-1.5-flash` are NOT acceptable substitutes for the assigned model. Returning the sentinel is the correct behavior; the lead is responsible for fixing its prompt and redispatching.
72
72
  - This rule applies equally to convergence reverify rounds. The reverify prompt MUST carry the same `**Model:**` line as the initial run (see `okstra-convergence` skill, "Required reverify-prompt anchor headers"). If the line is absent in a reverify prompt, return `GEMINI_MODEL_MISSING` rather than guessing.
73
73
 
74
74
  7. If installed, dispatch the wrapper as a **background** Bash command and poll for completion. The two-minute foreground Bash timeout is insufficient for implementation-phase Gemini runs and forced workers into ad-hoc background dispatch with lost output. The polling contract below is the formal replacement.
@@ -128,7 +128,7 @@ This wrapper does NOT invoke MCP tools directly. MCP availability inside the Gem
128
128
  ## Prompt Composition
129
129
 
130
130
  - The lead prompt must include both `**Project Root:** <absolute-path>` (at the top) and `Assigned worker prompt history path: <path>`.
131
- - Treat that path as the canonical worker prompt history artifact for the current run.
131
+ - Treat the prompt-history path as the canonical worker prompt history artifact for the current run, resolved to absolute against `Project Root` if given as relative.
132
132
  - The assigned model execution value is canonical for CLI execution. Do not substitute a different Gemini model unless the task bundle explicitly changes it.
133
133
  - Pass the prompt received from Lead directly to gemini after persisting the exact prompt to the assigned path.
134
134
  - Include context (code, diff, file paths) if provided.
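The new wording on the prompt-history path ("resolved to absolute against `Project Root` if given as relative") amounts to a plain path join. A minimal sketch, assuming POSIX-style paths and a hypothetical function name; it does not normalize `..` segments or touch the filesystem.

```python
from pathlib import PurePosixPath


def resolve_prompt_history_path(project_root: str, assigned_path: str) -> str:
    """Resolve the assigned prompt-history path against Project Root.

    An already-absolute assigned path is returned unchanged, because joining
    a pure path with an absolute right-hand side yields that side.
    """
    root = PurePosixPath(project_root)
    if not root.is_absolute():
        raise ValueError("Project Root must be an absolute path")
    return str(root / assigned_path)
```

Using `PurePosixPath` keeps the example deterministic across platforms. A real worker might prefer `pathlib.Path` so the host's own path rules apply.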
@@ -140,11 +140,12 @@ This wrapper does NOT invoke MCP tools directly. MCP availability inside the Gem
140
140
 
141
141
  ## Required Reading Before Any Analysis
142
142
 
143
- Before producing any output, you MUST ensure the underlying Gemini CLI run reads every input file enumerated in the `[Required reading]` block of the lead's prompt from the very first character to the very last character. For analysis workers this includes the task brief, analysis profile, analysis material (if present), reference expectations, and the carry-in clarification response (if present). Analysis workers do NOT read `final-report-template.md` — that file is for the Report writer worker only (see `okstra-team-contract` "Audience-scoped enumeration"). Producing findings without the template is the intended contract; the report writer in Phase 6 owns final-report structure.
143
+ Before invoking the Gemini CLI, you MUST:
144
144
 
145
- - The lead's prompt body, which you persist verbatim and feed into Gemini via stdin, already contains the explicit list of files and the end-to-end reading rule. Do not strip or summarize that block before passing it to the CLI.
146
- - For the carry-in clarification response, the CLI must walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank; a blank `User input` with `Status=open` is itself a signal you must surface. The structural similarity between the prior final report and the upcoming output is the most common reason this step gets skipped; do not repeat that.
147
- - The wrapper writes a Reading Confirmation block to the **audit sidecar** at `runs/<task-type>/worker-results/gemini-worker-audit-<task-type>-<seq>.md` (sibling to the main worker-results file). The sidecar's body begins with `# Gemini Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). The main Gemini output MUST NOT contain a `## 0. Reading Confirmation` heading — the validator now fails worker-results that contain one. If any file was skipped, record a `tool-failure` in the errors sidecar instead of fabricating Findings.
145
+ 1. Extract the absolute path from the lead's `**Worker Preamble Path:**` anchor header and verify the CLI run will Read that file end-to-end (canonical SSOT for the Required Reading + Error Reporting + Output sections contract). The lead's prompt body, which you persist verbatim and feed into Gemini via stdin, already contains this anchor; do not strip it.
146
+ 2. Verify the lead's prompt body lists the per-run input files under `## Inputs` (task-brief, analysis-profile, analysis-material if present, reference-expectations, clarification-response if carry-in). Analysis workers do NOT read `final-report-template.md`; that file is for the report writer only.
147
+
148
+ The CLI writes a Reading Confirmation block to the **audit sidecar** at `runs/<task-type>/worker-results/gemini-worker-audit-<task-type>-<seq>.md`. The sidecar's body begins with `# Gemini Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading. The main Gemini output MUST NOT contain a `## 0. Reading Confirmation` heading — the validator fails worker-results that contain one. If any file was skipped, record a `tool-failure` in the errors sidecar instead of fabricating Findings.
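The heading rule above lends itself to a mechanical check. The following is a minimal sketch only: the real okstra validator is not part of this diff, so `has_forbidden_heading` is a hypothetical name and the regular expression is an assumption about how the check could be written.

```python
import re

# Hypothetical check; the actual validator implementation is not shown in this diff.
# Matches the heading only when it occupies a whole line.
FORBIDDEN_HEADING = re.compile(r"^## 0\. Reading Confirmation\s*$", re.MULTILINE)


def has_forbidden_heading(markdown_text: str) -> bool:
    """Return True when a main worker result carries the heading that belongs in the audit sidecar."""
    return FORBIDDEN_HEADING.search(markdown_text) is not None
```

A caller would run this against the main worker-results file only; the audit sidecar is where the Reading Confirmation lines are expected to live.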
148
149
 
149
150
  ## Worker Output Structure
150
151
 
@@ -60,20 +60,22 @@ Do NOT duplicate the data.json contents here — the data.json is the canonical
60
60
 
61
61
  ## Required Reading Before Authoring
62
62
 
63
- Before writing the data.json, you MUST read every input file enumerated in the `[Required reading]` block of the lead's prompt from the very first character to the very last character. This always includes:
63
+ Before writing the data.json, you MUST:
64
64
 
65
- - `schemas/final-report-v1.0.schema.json`: the JSON Schema you must conform to. The renderer + validator both consume this.
66
- - `templates/reports/final-report.template.md`: the Jinja2 template the renderer uses. Read this to understand which data.json fields appear where in the rendered markdown, but do NOT edit it.
67
- - `templates/reports/i18n/en.json`
68
- - `templates/reports/i18n/ko.json`
65
+ 1. Extract the absolute path from the lead's `**Worker Preamble Path:**` anchor header and Read that file end-to-end (canonical SSOT for the Required Reading + Error Reporting + Anchor contract — this overrides per-spec restatements).
66
+ 2. Read every input file the lead enumerated under `## Inputs` (or equivalent heading) in the dispatch prompt body, end-to-end (single `Read` call with no `offset`/`limit`; page through with explicit offsets only when a file is too large for one read).
67
+
68
+ For the report writer specifically, the `## Inputs` list always includes:
69
+
70
+ - `schemas/final-report-v1.0.schema.json` — the JSON Schema you must conform to. The renderer + validator both consume it.
71
+ - `templates/reports/final-report.template.md` — the Jinja2 template the renderer uses. Read it to understand which data.json fields appear where in the rendered markdown; do NOT edit it.
72
+ - `templates/reports/i18n/en.json` and `templates/reports/i18n/ko.json`.
69
73
  - Every analysis worker's result file under `worker-results/`.
70
- - `state/convergence-<task-type>-<seq>.json` (if present).
74
+ - `state/convergence-<task-type>-<seq>.json` (if present). When present, reproduce its `roundHistory[]`, `round2SkippedReason`, and `finalClassificationCounts` verbatim into the final report's Section 1 Round History sub-table — do not recompute from worker results.
75
+
76
+ For the carry-in `clarification-response.md` (if present), walk every row of `## 5. Clarification Items` including rows whose `User input` cell is blank — a blank cell with `Status=open` is a signal you must surface in the conditional `## 0. Clarification Response Carried In From Previous Run` section (the template's `RENDER_IF` guard activates it when the carry-in path is non-empty). When no carry-in path was provided, OMIT the `## 0.` heading entirely — do NOT write an empty-state stub.
71
77
 
72
- - Use a single `Read` call per file with no `offset` and no `limit`. If a file is too large for one read, page through it with explicit `offset` / `limit` calls covering the full file.
73
- - For the carry-in `clarification-response.md` (if present), walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) including rows whose `User input` cell is blank — a blank cell with `Status=open` is itself a signal you must surface in the conditional `## 0. Clarification Response Carried In From Previous Run` section (the template's `RENDER_IF` guard activates it when the carry-in path is non-empty). The fact that the file you write has a structurally similar section 5 is NOT an excuse to skim. When no carry-in path was provided, OMIT the `## 0.` heading entirely — do NOT write an empty-state stub.
74
- - Open every analysis-worker result file under `worker-results/` end-to-end. Do not summarize them from convergence output alone — convergence captures classifications, not full evidence.
75
- - Write a Reading Confirmation block to your **audit sidecar** at `runs/<task-type>/worker-results/report-writer-worker-audit-<task-type>-<seq>.md` (sibling to the main worker-results file). The sidecar's body begins with `# Report Writer Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading. The main final-report and the main worker-results file MUST NOT contain a `## 0. Reading Confirmation` heading — the validator now fails reports that contain one. If you cannot truthfully confirm a file end-to-end, record a `tool-failure` in the errors sidecar instead of fabricating the report.
76
- - When the convergence-state file is present, read it fully and reproduce the `roundHistory[]` array, `round2SkippedReason`, and `finalClassificationCounts` in the final report's Section 1 Round History sub-table. Do not derive these values from worker results alone — they live in `state/convergence-<task-type>-<seq>.json`.
78
+ Write a Reading Confirmation block to your **audit sidecar** at `runs/<task-type>/worker-results/report-writer-worker-audit-<task-type>-<seq>.md`. The main final-report and the main worker-results file MUST NOT contain a `## 0. Reading Confirmation` heading. If you cannot truthfully confirm a file end-to-end, record a `tool-failure` in the errors sidecar instead of fabricating the report.
77
79
 
78
80
  ## Authoring Contract
79
81
 
@@ -32,6 +32,7 @@ Emit one `PROGRESS: <phase-id> <verb-phrase>` line as plain user-facing text at
32
32
  - Therefore: at the start of every phase the prompts/ directory is normally empty (or contains only previously-dispatched workers' files). This is expected. Do NOT narrate it as "missing", "누락", or "not yet rendered" — it just means dispatch has not happened yet.
33
33
  - Before dispatching any required worker, **you (the lead) construct the worker prompt and persist it to the assigned absolute path using `Write`** (per `okstra-team-contract` rule 6). Only after persisting do you call the worker subagent (Agent tool / Codex / Gemini wrapper). The wrapper subagents will also re-write the same file on their end; the double-write is intentional and idempotent.
34
34
  - Do not "check if the file exists and skip dispatch" — file presence is not a signal to skip. Worker selection and skipping rules come from team-state, never from prompts/ directory contents.
35
+ - **Worker Preamble Path (single anchor — replaces inlined `[Required reading]` and `[Error reporting]` blocks).** Every worker prompt you construct MUST include the anchor header `**Worker Preamble Path:** {{WORKER_PROMPT_PREAMBLE_PATH}}`. The file at that path is the canonical SSOT for Required Reading + Error Reporting + Output sections; workers Read it end-to-end before producing output. Do NOT re-inline those blocks into worker prompt bodies — that is the legacy ~80-line dispatch boilerplate that this anchor is designed to replace.
35
36
 
36
37
  ## Manifests
37
38
 
@@ -0,0 +1,53 @@
1
+ <!--
2
+ Implementation profile — deliverable sidecar. Lead lazy-loads this file ONCE
3
+ at the start of Phase 6 (report-writer dispatch), AFTER all worker results
4
+ are collected and convergence finished. Phase 1-5 do not need it.
5
+ -->
6
+
7
+ # Implementation profile — Deliverable sidecar
8
+
9
+ > **When to read**: lead reads this file ONCE at the start of Phase 6 (after Phase 5.5 convergence completes, before constructing the report-writer dispatch prompt). Carries the final-report deliverable shape, lead's post-stage persistence rules, and the self-review checklist.
10
+
11
+ ## Required deliverable shape (final report, in addition to the standard sections)
12
+
13
+ - **Plan link & approval evidence**: path to the approved `final-report.md`, the exact quoted approval marker, AND the executed stage number / title quoted from the Stage Map row.
14
+ - **Commit list**: each commit's SHA (or short SHA), message, and the plan step it satisfies
15
+ - **Diff summary**: `git diff --stat <base>..HEAD` output, plus a per-file one-line summary of changes
16
+ - **Out-of-plan edits block**: every file edited that was not in the approved plan's file list, with rationale (empty block is acceptable and preferred)
17
+ - **Stage sidecar evidence**: the JSON payload of `runs/<impl-task-key>/carry/stage-<N>.json` is embedded verbatim in a fenced ```json``` block, AND the `consumers.jsonl` rows this run appended are quoted line-by-line, so reviewers can audit the carry surface without grepping artifact directories.
18
+ - **Validation evidence**: actual command output (stdout/stderr) for every `pre / mid / post` validation command from the plan. Truncated output is acceptable but the command line and exit code MUST be exact. No paraphrasing of test results.
19
+ - **TDD evidence (when applicable)**: for steps that should be TDD-ordered, show the failing-test output BEFORE the implementation commit and the passing-test output AFTER, with commit SHAs framing the transition.
20
+ - **Verifier results**: a section per verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) containing:
21
+ - their independent verdict (PASS / CONCERNS / FAIL),
22
+ - cited diff snippets supporting the verdict,
23
+ - the verifier's `Read-only command log` (every command they ran with exact invocation and exit code, in execution order — copied verbatim from the worker result),
24
+ - **independent validation re-run results** — per plan-validation command: command line, exit code, and tail of output captured by the verifier (not the executor); any divergence from the executor's reported result MUST be called out as a `Discrepancy` line citing both sides,
25
+ - **style / lint / type-check results** — each check-only tool the verifier ran, its exit code, and the count of new findings attributable to lines this run introduced. When no tool is configured for a touched language, record the single line `no lint/style tool configured for <language>`,
26
+ - any fix recommendations the verifier declined to apply.
27
+ `Claude lead` synthesises a unified verdict but MUST preserve dissent — do not collapse opinions into one paragraph. If any verifier issued `FAIL` on a `Discrepancy` line, the synthesised verdict MUST be `FAIL` unless lead cites a concrete reproduction-time reason (committed flaky-test record, documented environment delta) for overriding.
28
+ - **Rollback verification**: confirmation that the plan's rollback path is still valid after the changes. Strength of verification depends on the change category:
29
+ - **Pure code changes** (no persisted state, no infra mutation): a reachable revert SHA is sufficient. Record the exact `git revert <SHA>` command that would undo the change, and confirm `git rev-parse <SHA>` resolves.
30
+ - **Feature-flag-gated changes**: confirm the off-switch path was exercised in this run's validation evidence (i.e. one of the validation commands ran with the flag off and succeeded). A plan that ships a flag without exercising the off-path does NOT satisfy this requirement.
31
+ - **Schema migrations, config-format changes, or any change with persisted state**: a **dry-run of the rollback step is mandatory**, not preferred. Record the exact rollback command and its captured exit code / stdout. If the migration tool offers no dry-run mode (`--dry-run`, `--plan`, equivalent), the executor MUST refuse to claim rollback verification and instead end the run with a routing recommendation back to `implementation-planning` for a safer rollback strategy. Skipping this step on a stateful change is treated as a `contract-violated` outcome by `final-verification`.
32
+ - **Routing recommendation for `final-verification`**: brief note on whether the changes are ready for final-verification phase or need a new error-analysis / planning loop first.
33
+ - **Follow-up tasks (Section 7 of the final report)**: every item discovered during this run that was *not* delivered MUST appear in the final report's `## 7. Follow-up Tasks (후속 작업)` table with a concrete `Origin`, `New Task ID`, `Suggested task-type`, `Scope`, and `Reason / Why deferred`. Sources include: out-of-scope discoveries that the executor consciously chose not to fold into this run, verifier concerns the executor declined to fix in-place, scope-boundary items from the approved plan that turned out to need their own ticket, and any unresolved `## 5. Clarification Items` row carried over from the approved plan (`Status` ∈ `{open, answered}` at approval time). An empty section is acceptable but only when expressed as the single line `- 후속 작업 없음.` — silence is treated as a contract violation. Rows with `Auto-spawn? = yes` will be materialised by `scripts/okstra-spawn-followups.py` in Phase 7; rows with `Auto-spawn? = no` MUST also appear in `Section 6. Recommended Next Steps` so the user knows to act manually.
34
+
35
+ ## Self-review pass before finalising the report (`Claude lead` runs this; do not delegate to a generic subagent)
36
+
37
+ 1. **Plan coverage** — every step in the approved plan's recommended option must point to a commit (or an explicit `Skipped: <reason>` entry). List gaps.
38
+ 2. **Evidence completeness** — every `Validation evidence` and `TDD evidence` claim has the actual command line and exit code? No paraphrased "tests pass" without output?
39
+ 3. **Out-of-plan honesty** — files in the diff that are NOT in the plan list must appear in the `Out-of-plan edits` block. Cross-check with `git diff --name-only`.
40
+ 4. **Verifier dissent preserved** — if the verifiers in the resolved roster disagree, the disagreement is visible in the report? Synthesis hides nothing?
41
+ 5. **Forbidden action audit** — `git push`, publish, deploy, migration, third-party write commands: scan the run's session transcripts for any occurrence and confirm none happened.
42
+ 6. **Placeholder scan** — restrict the scan to lines this run actually introduced; pre-existing placeholders in unchanged regions of touched files are out of scope. Required command (substitute `<base>` with the parent of the first commit in this run's commit list):
43
+ ```
44
+ git diff <base>..HEAD | grep -E '^\+[^+].*\b(TBD|TODO|FIXME|XXX|implement later|handle edge cases|similar to|placeholder)\b' || echo 'clean'
45
+ ```
46
+ Only newly-added lines (those starting with `+` and not part of the `+++` header) are inspected. If output is anything other than `clean`, the run MUST either remove the placeholders before finalising or record an explicit justification per occurrence in the final report.
47
+
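For readers who prefer to run the placeholder scan from a script, here is a rough Python equivalent of the `grep -E` expression above. It is a sketch, not part of the package: `scan_added_lines` is a hypothetical helper, and the caller is assumed to pass in the text of `git diff <base>..HEAD`.

```python
import re

# Same pattern as the grep command: an added line ("+" but not the "+++" header)
# that contains one of the placeholder words.
PLACEHOLDER = re.compile(
    r"^\+[^+].*\b(TBD|TODO|FIXME|XXX|implement later|handle edge cases|similar to|placeholder)\b"
)


def scan_added_lines(diff_text: str) -> list:
    """Return the added diff lines that still contain a placeholder word."""
    return [line for line in diff_text.splitlines() if PLACEHOLDER.search(line)]
```

An empty result corresponds to the `clean` output. Because the pattern consumes one non-`+` character right after the leading `+`, a placeholder that starts in the very first column of an added line is not matched; this sketch reproduces that behaviour rather than changing it.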
48
+ ## Lead post-stage persistence (BLOCKING — runs after the Executor emits `### Stage Carry Evidence`)
49
+
50
+ - Parse the executor's `### Stage Carry Evidence` JSON block. If absent or unparsable, end with status `contract-violated` and route to a follow-up `error-analysis`.
51
+ - Write the JSON verbatim to `runs/<impl-task-key>/carry/stage-<N>.json`. Refuse to overwrite an existing file (one stage = one sidecar; re-runs are out of scope for this version).
52
+ - Append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, and the SHA of HEAD. Use the okstra runtime's `consumers_mutex` helper (NOT a raw filesystem write) to honour the lock.
53
+ - Quote both files' new contents (the sidecar JSON in full, the new consumers row by itself) in the final report's `Stage sidecar evidence` deliverable section.
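The "refuse to overwrite" rule for the stage sidecar can be sketched with an exclusive-create open. This is an illustration under assumptions: `persist_stage_sidecar` and its parameters are hypothetical names, and the `consumers.jsonl` append is deliberately left out because it must go through the runtime's `consumers_mutex` helper, whose interface is not shown in this diff.

```python
import json
from pathlib import Path


def persist_stage_sidecar(run_dir: Path, stage: int, carry_json: str) -> Path:
    """Write the executor's Stage Carry Evidence verbatim, failing if the sidecar exists."""
    # Parse first so an unparsable block fails before anything touches disk
    # (json.JSONDecodeError is a ValueError).
    json.loads(carry_json)
    target = run_dir / "carry" / f"stage-{stage}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of overwriting: one stage, one sidecar.
    with open(target, "x", encoding="utf-8") as handle:
        handle.write(carry_json)
    return target
```

Writing the original string rather than a re-serialised object keeps the payload byte-for-byte identical to what the executor emitted, which matches the "verbatim" wording.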
@@ -0,0 +1,60 @@
1
+ <!--
2
+ Implementation profile — executor sidecar. Lead lazy-loads this file at the
3
+ start of Phase 5 (executor dispatch). NOT included via {{INCLUDE:}} because
4
+ lead context should NOT carry this body during Phase 1-4. Read once, retain
5
+ until Phase 5 ends, then drop from active context for Phase 6/7.
6
+ -->
7
+
8
+ # Implementation profile — Executor sidecar
9
+
10
+ > **When to read**: lead reads this file ONCE at the start of Phase 5 (after Stage Map parse, before issuing the Executor's first `Edit` / `Write`). The body governs ONLY the Executor role's behaviour. Verifier / report-writer behaviour lives in sibling sidecars.
11
+
12
+ ## Executor role binding (carried over from the thin core)
13
+
14
+ - The `Executor` (bound in `implementation.md` thin core) is the **only worker permitted to use Edit / Write / state-mutating Bash commands** on project files. All other workers run read-only. When the executor provider is `codex` or `gemini`, the actual file mutation happens inside the executor CLI's own auto-edit mode (e.g. `codex exec --sandbox workspace-write`, gemini's equivalent) — not through Claude-side Edit/Write tools — but the safety rules in this sidecar still apply identically.
15
+ - Worktree cwd handling — when the thin core's Task worktree block resolves status to `created` or `reused`, the Executor MUST run every Edit / Write / build / test / commit command with the worktree path as cwd. Treat it as `project_root` for the duration of this run. Do NOT mutate the caller's original checkout. Do NOT `cd` out of the worktree to reach files; if a file outside the worktree is needed, the dependency is a planning gap — record it in `Out-of-plan edits` and continue.
16
+ - **How to set cwd per Bash call**: the Claude Bash tool inherits its cwd from the lead session, which is NOT the worktree. To put cwd-sensitive toolchains (`cargo`, `npm`, `pnpm`, `bun`, `pytest`, `make`, `go`) into the worktree, prefix the command with `cd {{EXECUTOR_WORKTREE_PATH}} && ` inside the same Bash invocation — e.g. `cd {{EXECUTOR_WORKTREE_PATH}} && cargo test -p foo`. **Never wrap in `bash -lc "..."` or `bash -c "..."`** — the wrapper hides the leading `cd` token from Claude Code's permission auto-allow layer (causing prompts on every call) without any safety benefit. For tools that accept an explicit working-directory flag (`git -C <path>`, `cargo --manifest-path`, `pytest --rootdir`), prefer that form over the `cd && ` chain. Edit / Write / Read tool calls already use absolute paths and need no cwd handling. The codex / gemini executor CLI wrappers (`okstra-codex-exec.sh -C`, `okstra-gemini-exec.sh --include-directories`) already inject worktree cwd at the CLI layer, so this rule applies primarily to the Claude executor.
17
+ - **Synced okstra state directory.** At provision time `okstra-ctl` may symlink `.project-docs/` from the repo's **main worktree** into the task worktree. This is NOT an independent copy — writes through it land in the main worktree. Inside this run the executor MUST confine okstra artifact writes to its own task scope (i.e. `.project-docs/okstra/tasks/<this-task-id>/...`). Other synced directories, if present due to local configuration, are not implicit okstra context; read them only when the brief explicitly cites them as source material.
18
+
19
+ ## Pre-implementation context exploration (executor before first edit)
20
+
21
+ - **Mandatory TDD loop**: BEFORE the first `Edit` or `Write` call, the executor MUST apply a red-green-refactor loop for every code change in this run. This is required; skipping it is a `contract-violated` outcome. This governs HOW each step is executed (failing test first → minimal implementation → refactor); it does not override the approved plan's WHAT/file scope.
22
+ - Order of operations per plan step: (1) write/extend the test that captures the step's acceptance criterion and confirm it fails for the right reason, (2) commit the failing test (`test(<scope>): ...`), (3) implement the minimum change to make it pass, (4) commit the implementation (`feat|fix(<scope>): ...`), (5) refactor without changing behaviour and commit separately if any cleanup is made (`refactor(<scope>): ...`). The failing-then-passing transition between steps (2) and (4) is the `TDD evidence` required by the final report.
23
+ - Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
24
+ - When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
25
+ - re-read the approved plan end-to-end and parse the `## 4.5 Stage Map`. Determine **start stage**:
26
+ - if `--stage <N>` is supplied, use N. Otherwise auto = the lowest stage number whose `depends-on` are all recorded as `status:done` in `runs/<plan-key>/consumers.jsonl` AND that itself has no `status:done` row. Multiple stages may match — two parallel `implementation` runs may pick different ones and proceed concurrently.
27
+ - load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(start_stage)` and inject them into the executor's working context as "runtime carry-in". For `depends-on (none)` stages, no sidecar load — task-brief only.
28
+ - extract the **start stage's** file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path. These — not the whole plan — are the authoritative scope for this run.
29
+ - inspect the current state of every file the plan names; if any file has changed materially since the plan was written, stop and route to a new `implementation-planning` run instead of editing speculatively
30
+ - "materially changed" means: the function, class, section, or behaviour the plan targets has been edited, renamed, moved, removed, or otherwise altered in a way that invalidates the plan's reasoning. Cosmetic edits (whitespace, comment-only changes, unrelated function modifications elsewhere in the same file) do NOT trigger a re-plan; cite the diff (`git log --oneline <plan-created-at>..HEAD -- <file>`) in the final report and proceed.
31
+ - distinguish the two file-scope rules (they are not in conflict):
32
+ - **drift rule** (this section): if a file *named in the plan* has materially drifted, refuse to edit and route back to planning. This protects trust in the approved scope.
33
+ - **out-of-plan rule** (Allowed actions section below): if a step *requires touching a file NOT in the plan list*, that is permitted with `Out-of-plan edits` justification. This handles honest scope discovery during execution.
34
+ - confirm the test/build commands referenced in the plan still exist and run from a clean state
35
+
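The start-stage rule above (explicit `--stage` wins, otherwise the lowest stage whose dependencies are all done and which is not itself done) can be sketched as follows. The shape of the inputs is an assumption: `stage_map` is a hypothetical dict from stage number to its `depends-on` list, and each `consumers.jsonl` row is assumed to carry `stage` and `status` fields.

```python
import json


def pick_start_stage(stage_map: dict, consumers_jsonl: str, requested=None):
    """Return the stage this run should execute, or None when nothing is eligible."""
    if requested is not None:
        return requested  # an explicit --stage <N> overrides auto-selection
    done = set()
    for line in consumers_jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        if row.get("status") == "done":
            done.add(row["stage"])  # assumed field name
    eligible = [
        stage
        for stage, depends_on in stage_map.items()
        if stage not in done and all(dep in done for dep in depends_on)
    ]
    return min(eligible) if eligible else None
```

With stages 2 and 3 both depending only on a finished stage 1, this returns 2; a concurrent second run would need the `status:"started"` rows to avoid picking the same stage, which this sketch does not model.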
36
+ ## Stage execution contract (this run owns exactly one stage of the plan)
37
+
38
+ - **Sidecar evidence writer (BLOCKING).** When the start stage's Stage Validation `post` commands all succeed, the Executor MUST emit a JSON object matching the schema in `docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md` §3.2 and the lead MUST persist it to `runs/<impl-task-key>/carry/stage-<N>.json`. The file MUST NOT exist before the run starts (overwrite is refused — see `--force-stage` non-goal).
39
+ - **Reverse link (BLOCKING).** Before the first Edit/Write, append a `status:"started"` row to `runs/<plan-task-key>/consumers.jsonl` (lock via the okstra runtime). On stage completion, append a `status:"done"` row with `carry_path` populated.
40
+ - **One-PR-per-stage.** This run creates exactly one PR titled `Stage <N>: <stage title>`. The PR body MUST include:
41
+ - `## Stage` — number and title (from Stage Map row).
42
+ - `## Carry-In summary` — depends-on list + cited identifiers/SHAs from each loaded sidecar (omit when depends-on is empty).
43
+ - `## Next stage` — next stage number/title or `(last stage)`.
44
+ Stage PRs link back to each other in their bodies (`Previous: #<n>, Next: #<m>` lines) so a reviewer can navigate the chain.
45
+
46
+ ## Allowed actions during the run
47
+
48
+ - **Edit / Write on approved project source files**: scope is bounded first by the shared Resource boundary, then by the approved plan's file list. Editing files outside the plan's list is permitted only when strictly needed to satisfy a step, and MUST be recorded in the final report's `Out-of-plan edits` block with rationale.
49
+ - read-only inspection commands: `git status`, `git diff`, `git log`, `grep`, `rg`, `find`, `cat`, `ls`, file Read tools
50
+ - build, lint, type-check, and test commands (`npm test`, `pytest`, `go build`, `cargo test`, `bash -n`, etc.)
51
+ - **local git operations only**: `git add`, `git commit`. Prefer small commits keyed to plan steps.
52
+ - **Commit message format (mandatory)**: every commit message MUST follow Conventional Commits — `<type>(<scope>): <subject>` for the first line, optional body separated by a blank line, optional footer. Constraints:
53
+ - `<type>` MUST be one of: `feat` / `fix` / `perf` / `revert` / `deps` / `docs` / `refactor` / `build` / `ci` / `chore` / `test`. When the repo is `release-please`-managed, this aligns the commit with a configured changelog section.
54
+ - `<scope>` SHOULD be the plan step identifier or the primary module touched (e.g. `feat(report-writer): ...`). Omit the parentheses only when no meaningful scope applies.
55
+ - `<subject>` MUST be ≤72 characters, imperative mood (`add`, `fix`, `remove` — not `added` / `adding`), no trailing period, no emoji, no AI attribution lines (no `Co-Authored-By: Claude ...`, no `Generated with Claude Code`).
56
+ - Body (when present) explains *why*, not *what*; wrap at ~100 chars.
57
+ - Do NOT append okstra artefact paths to the commit message — no `Plan: .project-docs/okstra/...`, no `Report: ...`, no `Run: ...`, no `Task: ...` footers, and no other reference to files under `.project-docs/okstra/`. Those paths belong in the final report's `Plan link & approval evidence` section, not in git history; they rot quickly and leak internal layout into the upstream changelog.
58
+ - Allowed footers are limited to standard Conventional Commits trailers (`BREAKING CHANGE: ...`, `Refs: <issue/ticket-id>`, `Closes #<n>`). When citing a ticket, use the ticket id only (e.g. `Refs: DEV-9423`) — never a filesystem path.
59
+ - One commit MUST correspond to one plan step (or one cohesive sub-step). Do NOT bundle unrelated steps into a single commit, and do NOT split a single step across commits unless the plan explicitly sequenced it that way.
60
+ - The exact message used for each commit MUST be reproduced verbatim in the final report's `Commit list` so reviewers can audit it without re-running `git log`.
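Several of the commit-message constraints above are mechanical enough to lint. The sketch below is an assumption-laden illustration, not a tool shipped in the package: `check_commit_message` is a hypothetical name, and it does not attempt the imperative-mood, emoji, or footer-whitelist rules.

```python
import re

ALLOWED_TYPES = (
    "feat", "fix", "perf", "revert", "deps", "docs",
    "refactor", "build", "ci", "chore", "test",
)
# <type>(<scope>): <subject>, with the scope optional.
HEADER = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^()]+)\))?: (?P<subject>.+)$")


def check_commit_message(message: str) -> list:
    """Return a list of rule violations; an empty list means no problem was detected."""
    lines = message.splitlines()
    match = HEADER.match(lines[0] if lines else "")
    if match is None:
        return ["first line is not `<type>(<scope>): <subject>`"]
    problems = []
    if match["type"] not in ALLOWED_TYPES:
        problems.append("type not allowed: " + match["type"])
    if len(match["subject"]) > 72:
        problems.append("subject longer than 72 characters")
    if match["subject"].endswith("."):
        problems.append("subject has a trailing period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("body is not separated by a blank line")
    if ".project-docs/okstra/" in message:
        problems.append("okstra artefact path in commit message")
    if "Co-Authored-By: Claude" in message or "Generated with Claude Code" in message:
        problems.append("AI attribution line present")
    return problems
```

For example, `check_commit_message("feat(report-writer): add round history table")` returns an empty list, while a message ending in `Plan: .project-docs/okstra/...` is flagged.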
@@ -0,0 +1,76 @@
1
+ <!--
2
+ Implementation profile — verifier sidecar. Lead lazy-loads this file ONCE
3
+ at Phase 5, BEFORE constructing the verifier worker dispatch prompts.
4
+ -->
5
+
6
+ # Implementation profile — Verifier sidecar
7
+
8
+ > **When to read**: lead reads this file ONCE in Phase 5, between executor stage completion and the first verifier dispatch. Carries the two-tier command lookup, deny-list, discrepancy rule, and verifier-specific forbidden actions.
9
+
10
+ ## Verifier roles (resolved at run-prep time)
11
+
12
+ - The verifier slots are `Claude verifier` and `Codex verifier`, plus `Gemini verifier` **only when `gemini` is in the resolved `--workers` roster**. Every verifier in the resolved roster is dispatched regardless of which provider holds the executor role; the executor's own provider is run *separately* as a verifier (a fresh CLI session with no shared context) so that no verdict is produced from the same session that wrote the diff. Verifiers MUST NOT call Edit, Write, or any Bash command that mutates files outside the run's artifact directories. If a verifier wants a fix, it records the recommendation in its worker result; it does not apply the fix itself.
13
+ - Session isolation — not model-variant divergence — is the primary self-review safeguard: each verifier is a separate CLI invocation with its own context window, so reusing the same model variant for executor and same-provider verifier is acceptable. Different model variants (e.g. executor=opus / Claude verifier=sonnet) remain recommended when available.
14
+ - Phase-specific model defaults override the shared defaults: `Claude verifier`=`sonnet`, `Codex verifier`=`gpt-5.5`, `Gemini verifier`=`auto` (only when present in the roster). The `Executor`'s model is taken from the provider-specific worker model corresponding to `--executor`: claude→`--claude-model` (default `sonnet`, override to `opus` recommended when this run's executor is claude), codex→`--codex-model` (default `gpt-5.5`), gemini→`--gemini-model` (default `auto`).
15
+ - Verifiers read from the SAME working tree path the Executor used so they observe the exact diff the Executor produced. Verifiers remain strictly read-only there.
16
+
17
+ ## Verifier QA duties (independent re-run mandate)
18
+
19
+ Every verifier acts as a QA gate, not just a diff reviewer. Trusting the executor's reported evidence is forbidden — verifiers MUST reproduce it themselves from the same worktree path the executor used.
20
+
21
+ ### Two-tier command lookup (NO auto-detection)
22
+
23
+ Verifier obtains the QA command set from exactly two declared sources, in order — there is **no fallback to guessing tools from manifest files**.
24
+
25
+ 1. **Tier 1 — plan validation set (task-specific):** every command listed under the approved plan's `validation` block (pre / mid / post).
26
+ 2. **Tier 2 — project baseline (`project.json.qaCommands`):** the project's standing QA baseline declared in `<PROJECT_ROOT>/.project-docs/okstra/project.json` under the `qaCommands` key. Schema (each category is an array of `{ "label", "cmd", "language"? }` objects):
27
+ ```json
28
+ {
29
+   "qaCommands": {
30
+     "lint": [{ "label": "cargo clippy", "cmd": "cargo clippy --all-targets -- -D warnings", "language": "rust" }],
31
+     "format": [{ "label": "cargo fmt", "cmd": "cargo fmt --check", "language": "rust" }],
32
+     "typecheck": [{ "label": "tsc", "cmd": "pnpm exec tsc --noEmit", "language": "ts" }],
33
+     "test": [{ "label": "cargo test", "cmd": "cargo test --workspace --locked", "language": "rust" }]
34
+   }
35
+ }
36
+ ```
37
+ `language` is optional; when present, verifier MAY skip entries whose `language` is not represented in this run's diff (recorded as `qa-command skipped: <label> (language=<x> not in diff)`). Absent `language` means "always run".
38
+
39
+ ### Execution rule
40
+
41
+ Tier 1 commands run verbatim first. Then every Tier 2 entry runs once. Each command runs in the worktree cwd, and is recorded in the worker result with its exact command line, exit code, and the tail of stdout/stderr. Substituting or paraphrasing a Tier 1 command is forbidden (see Verifier-specific forbidden actions below).
42
+
43
+ ### Missing-tier handling
44
+
45
+ If a tier is empty or absent, verifier records the single line `qa-command not configured: <category>` per missing category (`lint` / `format` / `typecheck` / `test`) in the worker result and proceeds — silent omission is a contract violation. Verifier MUST NOT auto-detect or invent a command in this case; the user/operator must declare it in `project.json.qaCommands` or in the plan.
+
+ ### `cmd` field deny-list (Tier 2 validation)
+
+ The runtime AND the verifier MUST reject any `cmd` containing tokens that imply mutation: `--fix`, `--write`, ` -w` (gofmt write), ` -u` (jest snapshot update), `--update-snapshots`, `--snapshot-update`, `--update-goldens`, `INSTA_UPDATE=` (with any value other than `no`), `cargo insta accept`, `npm install` (without `ci`), `cargo update`, `pip install -U`, `pnpm add`, `bun add`. Encountering a denied token aborts the verifier run with `contract-violated` and the operator is asked to re-declare the command in check-only form.
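One way the deny-list check might look in Python, as a sketch only. It assumes token-wise matching after `shlex.split` rather than raw substring search, which avoids false positives on flags such as `--fixtures`; the package may match differently. The name `denied_token` is hypothetical.

```python
import shlex

DENIED_FLAGS = {
    "--fix", "--write", "-w", "-u",
    "--update-snapshots", "--snapshot-update", "--update-goldens",
}
DENIED_PHRASES = [
    ("cargo", "insta", "accept"),
    ("npm", "install"),  # `npm ci` is a different subcommand and passes
    ("cargo", "update"),
    ("pip", "install", "-U"),
    ("pnpm", "add"),
    ("bun", "add"),
]


def denied_token(cmd):
    """Return the first denied token or phrase in `cmd`, else None."""
    tokens = shlex.split(cmd)
    for tok in tokens:
        if tok in DENIED_FLAGS:
            return tok
        # INSTA_UPDATE is allowed only with the value `no`.
        if tok.startswith("INSTA_UPDATE=") and tok.split("=", 1)[1] != "no":
            return tok
    for phrase in DENIED_PHRASES:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                return " ".join(phrase)
    return None
```

A non-None result would correspond to aborting with `contract-violated`. Phrase matching here requires adjacent tokens, so `pip install pkg -U` would slip through; a stricter check would be needed in practice.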
+
+ ### Discrepancy rule
+
+ If the verifier's re-run result differs from what the executor reported (a passing test fails on re-run, a clean lint surfaces warnings, an exit code mismatches), the verifier MUST issue verdict `FAIL` with the divergence cited. `Claude lead` MUST NOT silently prefer the executor's evidence over a verifier's reproduced result during synthesis; if it overrides, it MUST cite a concrete reproduction-time reason (for example, a flaky test with the commit cited, or a documented environment delta). Handwaving is not allowed.
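The exit-code part of the discrepancy rule can be sketched in Python as below. The function names and the `command -> exit code` mapping shape are assumptions for illustration; divergences that do not show up in exit codes (for example new lint warnings with exit 0) are outside this sketch.

```python
def divergences(executor, verifier):
    """Compare exit codes per command line.

    Both arguments map an exact command line to its exit code. Only
    commands that both sides ran are compared.
    """
    found = []
    for cmd, reported in executor.items():
        observed = verifier.get(cmd)
        if observed is not None and observed != reported:
            found.append(
                f"{cmd}: executor exit {reported}, verifier exit {observed}"
            )
    return found


def discrepancy_verdict(executor, verifier):
    """Return ("FAIL", cited) on any divergence, else (None, [])."""
    cited = divergences(executor, verifier)
    # None means this rule alone does not force a verdict.
    return ("FAIL", cited) if cited else (None, [])
```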
+
+ ### Read-only command log (per verifier)
+
+ The worker result MUST contain a `Read-only command log` block listing every command executed during the verifier run with its exact invocation and exit code, in execution order. No mutating command may appear in this block. This log is copied into the final report's verifier result section verbatim.
+
+ ### Verifier evidence is independent of executor evidence
+
+ The final report keeps both the executor's `Validation evidence` and each verifier's `Read-only command log`, so reviewers can compare them line by line.
+
+ ## All-verifier-failure policy
+
+ If every verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) ends with a non-result terminal status (`timeout`, `error`, `not-run`), so that zero independent verdicts were produced, the run MUST end with status `blocked` and route to a follow-up `error-analysis` run. `Claude lead` MUST NOT substitute its own verdict for the missing verifier outputs; synthesis requires at least one independent verifier's verdict. If one or more verifiers fail but at least one returns a verdict, the run proceeds with the surviving verdict(s), and the final report MUST explicitly note which verifiers were unavailable, with the captured error / timeout evidence for each failed verifier.
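The policy above reduces to a small decision over the roster's terminal statuses. A hedged Python sketch follows; the function name, the returned keys, and the `proceed` label are hypothetical, and only `blocked`, `error-analysis`, and the three non-result statuses come from the text.

```python
NON_RESULT = {"timeout", "error", "not-run"}


def resolve_run(roster_status):
    """Decide the run outcome from verifier terminal statuses.

    roster_status maps a verifier name to its terminal status; any
    status outside NON_RESULT is treated as a produced verdict.
    """
    unavailable = sorted(
        name for name, status in roster_status.items()
        if status in NON_RESULT
    )
    if len(unavailable) == len(roster_status):
        # Zero independent verdicts: the lead must not fill the gap.
        return {
            "status": "blocked",
            "follow_up": "error-analysis",
            "unavailable": unavailable,
        }
    # At least one verdict survived; the report lists the rest.
    return {"status": "proceed", "follow_up": None, "unavailable": unavailable}
```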
+
+ ## Verifier-specific forbidden actions (any occurrence → terminal status `contract-violated`)
+
+ - running lint / formatter auto-fix modes during a verifier's re-run — `eslint --fix`, `prettier --write`, `ruff check --fix`, `rustfmt` (writes by default; verifiers MUST use `cargo fmt --check` or `rustfmt --check`), `gofmt -w`, `black .` (use `black --check`), `isort .` (use `isort --check-only`), or any equivalent rewrite mode
+ - updating snapshots / golden fixtures during verification — `jest -u` / `--updateSnapshot`, `pytest --snapshot-update`, `INSTA_UPDATE=*` (any value other than `no`), `cargo insta accept`, `--update-goldens`, or any equivalent "make the test agree with current output" flag
+ - masking test failure with selection or shell tricks during re-run — `-k <expr>` / `--ignore` / `--deselect` to skip subsets, trailing `|| true`, `set +e` followed by a manually softened comparison, redirecting non-zero exit to success. The plan's listed test command MUST run in full
+ - substituting the plan's validation commands — verifier MUST run the plan's pre/mid/post validation commands verbatim; replacing them with paraphrased or "equivalent" commands is forbidden. Adding supplementary check-only lint/type-check is allowed and is logged separately in the verifier's Read-only command log
+ - mutating lockfiles or dependency manifests — `npm install <pkg>`, `npm install` (without lockfile freeze; use `npm ci`), `pnpm add`, `bun add`, `cargo add`, `cargo update`, `pip install -U`, or any dependency install that is not lockfile-frozen (`--locked` / `--frozen-lockfile` / `npm ci` / `pip install --require-hashes`)
+ - git state mutations — `git add`, `git commit`, `git stash`, `git checkout -- <file>`, `git restore`, `git reset`, `git rebase`, `git merge`, branch creation/deletion, tag creation. Only read-only git queries (`git status`, `git diff`, `git log`, `git show`, `git rev-parse`, `git blame`) are permitted for verifiers
+ - running integration / end-to-end tests that produce non-local side effects (DB writes against a non-local datastore, external API writes, docker compose against a non-isolated environment) unless that exact command is listed in the approved plan's validation set
+ - redirecting tool caches or output to paths outside the worktree — e.g. setting `CARGO_TARGET_DIR`, `PYTEST_CACHE_DIR`, `NODE_OPTIONS=--require=<external>`, or any env var that causes the verifier's command to write outside the worktree's normal build artifact paths
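Of the forbidden actions listed, the git rule is the easiest to check mechanically, because the permitted queries are enumerated. A minimal Python sketch, assuming an allowlist on the git subcommand; `git_allowed` is a hypothetical helper, not part of the package.

```python
import shlex

READ_ONLY_GIT = {"status", "diff", "log", "show", "rev-parse", "blame"}


def git_allowed(cmd):
    """Return True unless `cmd` is a git call outside the read-only set.

    Non-git commands pass through; they are covered by other rules.
    """
    tokens = shlex.split(cmd)
    if not tokens or tokens[0] != "git":
        return True
    # The subcommand must come directly after `git` and be allowlisted.
    return len(tokens) > 1 and tokens[1] in READ_ONLY_GIT
```

The check is intentionally conservative: a form such as `git -C <dir> status` is rejected because the subcommand is not the second token.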