okstra 0.47.0 → 0.49.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/docs/superpowers/plans/2026-06-04-adversarial-implementation-planning.md +294 -0
- package/docs/superpowers/plans/2026-06-04-coverage-critic.md +516 -0
- package/docs/superpowers/plans/2026-06-05-acceptance-critic.md +251 -0
- package/docs/superpowers/plans/2026-06-05-compact-markdown-report-tables.md +323 -0
- package/docs/superpowers/specs/2026-06-04-adversarial-implementation-planning-design.md +90 -0
- package/docs/superpowers/specs/2026-06-04-coverage-critic-design.md +99 -0
- package/docs/superpowers/specs/2026-06-05-acceptance-critic-design.md +90 -0
- package/docs/superpowers/specs/2026-06-05-compact-markdown-report-tables-design.md +87 -0
- package/package.json +1 -1
- package/runtime/BUILD.json +2 -2
- package/runtime/agents/SKILL.md +3 -1
- package/runtime/bin/lib/okstra/tmux-pane.sh +40 -0
- package/runtime/bin/okstra-codex-exec.sh +17 -21
- package/runtime/bin/okstra-gemini-exec.sh +12 -15
- package/runtime/bin/okstra-trace-cleanup.sh +13 -1
- package/runtime/prompts/profiles/_common-contract.md +5 -5
- package/runtime/prompts/profiles/error-analysis.md +1 -0
- package/runtime/prompts/profiles/final-verification.md +2 -0
- package/runtime/prompts/profiles/implementation-planning.md +5 -1
- package/runtime/prompts/profiles/requirements-discovery.md +1 -0
- package/runtime/prompts/wizard/prompts.ko.json +13 -0
- package/runtime/python/okstra_ctl/render.py +24 -3
- package/runtime/python/okstra_ctl/report_views.py +16 -29
- package/runtime/python/okstra_ctl/run.py +12 -0
- package/runtime/python/okstra_ctl/wizard.py +72 -1
- package/runtime/skills/okstra-convergence/SKILL.md +80 -5
- package/runtime/skills/okstra-run/SKILL.md +1 -0
- package/runtime/templates/project-docs/task-index.template.md +1 -8
- package/runtime/templates/reports/final-report.template.md +24 -28
- package/runtime/templates/reports/i18n/en.json +14 -15
- package/runtime/templates/reports/i18n/ko.json +14 -15
- package/runtime/templates/reports/schedule.template.md +3 -7
@@ -14,7 +14,7 @@ profile document.
 - Worker interaction model (shared — read before inferring behaviour from the roster):
 - the per-profile `Required workers:` block is a **roster**, not a behaviour contract. Each role's interaction mode changes across operating phases of the same run.
 - **Phase 4 / 5 (independent analysis)**: analyser workers (`claude`, `codex`, `gemini` when opted in) produce findings independently and have no access to one another's outputs. `report-writer` does not analyse.
-- **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`). For `requirements-discovery` and `
+- **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`). For `requirements-discovery`, `error-analysis`, and `implementation-planning` this phase runs in **adversarial mode** (`convergence.adversarial=true`): verifiers try to refute each finding against its cited evidence and the burden of proof sits on the claim — see that skill's §"Adversarial Verification Mode".
 - Do NOT conclude "no peer review happens" from the roster alone — every profile that lists ≥2 analyser workers runs convergence by default (`convergence.enabled=true` in `task-manifest.json`).
 - Tooling — read-only MCP availability (shared):
 - MCP is not implicit okstra context. Query an MCP server only when the task brief explicitly lists it as source material for this run. Any MCP-derived finding MUST cite server, table, and the SELECT used. MCP MUST NEVER be used as a write path — schema/data mutations go through repository migration files reviewed by humans.
@@ -30,8 +30,8 @@ profile document.
 - Anti-escalation rule (shared):
 - treating "다음 단계 진행해" or equivalent user phrases as authorisation to start a *different* lifecycle phase is forbidden. The next phase begins only in a separate okstra run launched with the new `--task-type`. Per-profile documents may further restrict this within their own scope.
 - Run-start pane recording (shared — runs ONCE at run start, before the FIRST worker dispatch):
-- The wrappers anchor
-- The lead MUST run once, at run start: `mkdir -p "<RUN_DIR>/state" &&
+- The codex/gemini wrappers now self-anchor their trace pane by walking their own ancestor PIDs against tmux `pane_pid`s (see `lib/okstra/tmux-pane.sh`), so they no longer depend on this file. The lead still records its own pane id here for the cleanup steps below (which-pane-to-never-kill) and as the "am I in tmux" gate. A bare `tmux display-message -p '#{pane_id}'` is NOT reliable for this — Claude Code's Bash tool strips `$TMUX`/`$TMUX_PANE`, so that command returns the most-recently-active *client's* pane (often a different session, or a foreign pane when the lead is launched outside tmux entirely). The lead therefore records via the same ancestry resolver.
+- The lead MUST run once, at run start: `mkdir -p "<RUN_DIR>/state" && { . "$HOME/.okstra/bin/lib/okstra/tmux-pane.sh" 2>/dev/null && okstra_resolve_caller_pane; } > "<RUN_DIR>/state/lead-pane.id" 2>/dev/null || true` (substitute the run's absolute `RUN_DIR`). When the lead is not inside a tmux pane (e.g. Claude launched from the GUI app) no ancestor matches a pane, the file is empty, and every pane step below silently no-ops — that empty/absent file is the single signal that the lead is not in tmux.
 - Phase-start pane reset (shared — runs BEFORE dispatching each new worker batch):
 - okstra creates two kinds of tmux pane per run: (a) **worker-agent panes** the harness gives to dispatched subagents (titled `claude-worker` / `codex-worker` / `gemini-worker` / `report-writer-worker`), and (b) **trace panes** the codex/gemini wrappers spawn (`<cli>-<role>-<pid>-tail`). Both accumulate across internal phases because each new phase dispatches a fresh worker batch and the prior panes are never reclaimed.
 - When `<RUN_DIR>/state/lead-pane.id` is non-empty (the lead is in tmux), the lead MUST run `$HOME/.okstra/bin/okstra-trace-cleanup.sh --run-dir "<RUN_DIR>"` **immediately before** dispatching the next phase's workers — i.e. just before emitting each `PROGRESS: phase-5.5-convergence round=<N>` marker and just before `PROGRESS: phase-6-synthesis dispatching report-writer-worker`. This closes every prior-phase okstra pane (worker-agent + trace) for this run, while NEVER killing the lead's own pane.
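The ancestry walk that replaces `tmux display-message` can be sketched in Python. This is a hypothetical model, not the body of `okstra_resolve_caller_pane` (which lives in `lib/okstra/tmux-pane.sh` and is not shown in this diff): the process tree and the tmux `pane_pid` table are passed in as plain dicts instead of being read from `ps` and `tmux list-panes`, and `resolve_caller_pane` is an illustrative name.

```python
from typing import Mapping, Optional


def resolve_caller_pane(
    start_pid: int,
    parent_of: Mapping[int, int],
    pane_by_pid: Mapping[int, str],
    max_depth: int = 64,
) -> Optional[str]:
    """Return the pane id owned by start_pid or its nearest ancestor, else None.

    Hypothetical model: `parent_of` maps pid -> ppid, and `pane_by_pid` maps a
    pane's shell pid (tmux `pane_pid`) -> pane id such as "%3".
    """
    pid: Optional[int] = start_pid
    seen: set[int] = set()
    # Stop at init (pid 1), on a cycle, or after max_depth hops.
    while pid is not None and pid > 1 and pid not in seen and len(seen) < max_depth:
        if pid in pane_by_pid:
            return pane_by_pid[pid]
        seen.add(pid)
        pid = parent_of.get(pid)
    return None  # not inside tmux: the caller ends up writing an empty pane id


# Pane %3's shell is pid 100; the wrapper (400) runs under a tool shell (300) and claude (200).
parents = {400: 300, 300: 200, 200: 100, 100: 1}
panes = {100: "%3", 900: "%7"}
print(resolve_caller_pane(400, parents, panes))  # %3
print(resolve_caller_pane(500, {500: 1}, panes))  # None
```

Matching on ancestry rather than on "the active client" is what makes the result independent of `$TMUX`/`$TMUX_PANE` being stripped: the foreign pane `%7` is never returned because pid 900 is not an ancestor.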
@@ -41,8 +41,8 @@ profile document.
 - This step is **automatic and silent** — NO user prompt (workers are idle sessions that have already delivered their results; there is nothing for the user to preserve). It runs only when team-state's `teamCreate.status == "ok"` (Teams mode was actually used); in the no-`team_name` fallback there is no team to delete, so silent-skip.
 - Sequence (token-usage collection MUST already be complete — `TeamDelete` removes `~/.claude/teams/<team>/` + `~/.claude/tasks/<team>/` but NOT the `~/.claude/projects/` jsonls Phase 7 reads, yet the read MUST precede teardown):
 1. Read `~/.claude/teams/okstra-<task-key>/config.json` and, for every `members` entry whose name is not the lead, `SendMessage(to: <name>, message: { type: "shutdown_request" })` to terminate it gracefully.
-2.
-3. Call `TeamDelete()
+2. These workers already delivered their results and terminated when their `Agent()` dispatch returned (the lead's completion evidence is the returned output + the existing result/final-report file, not a teardown ack) — a terminated session emits NO shutdown confirmation. Treat `shutdown_request` as best-effort (fire-and-forget); the lead MUST NOT block waiting for acks from addressed teammates. Proceed immediately to step 3.
+3. Call `TeamDelete()` — the single synchronization point for teardown. If it errors with an active-members message, one teammate is genuinely still shutting down: wait briefly, retry `TeamDelete()` once, then proceed regardless of the result. NEVER loop or re-send `shutdown_request`; teardown must never block run completion once the work and final report already exist.
 - Report it in one short line (e.g. `worker 6명 종료 + 팀 해제`) and proceed. Emit `PROGRESS: phase-7-teardown disbanding team` immediately before step 1.
 - Phase wrap-up — okstra pane disposition (shared, MUST be the *last* step before returning control to the user):
 - At run end the only residual okstra panes are the LAST phase's (e.g. the `report-writer-worker` agent pane and any codex/gemini trace pane). `okstra-trace-cleanup.sh --list --run-dir "<RUN_DIR>"` returns one tab-separated `<pane_id>\t<pane_title>` line per residual okstra pane (worker-agent + trace) for this run.
@@ -32,6 +32,7 @@
 - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <reporter-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only the reporter can answer this" (reporter-side data, business priority, environment they observed). A row with `none` that *could* have been answered by code or logs is a defect.
 - Cross-verification mode:
 - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each root-cause / reproduction claim by directly re-inspecting the cited code, logs, or config; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
+- **Coverage critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker critic pass runs after convergence to surface missed findings; its gaps are merged only after a 1-round adversarial reverify. See `skills/okstra-convergence/SKILL.md` "Coverage critic pass".
 - Non-goals:
 - implementation details unless they are necessary to validate the cause
 - **source code edits, builds, migrations, or deployments** — this run produces evidence and cause analysis only; the fix belongs to a later `implementation-planning` run followed by an `implementation` run
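One way to read the adversarial rule (the burden of proof sits on the claim, and a single evidence-backed refutation prevents a finding from reaching consensus) is as a two-part gate per finding. The sketch assumes each verdict carries an optional evidence citation; `Verdict` and `adversarial_consensus_possible` are hypothetical names, and the full `full-consensus` / `partial-consensus` / `contested` / `worker-unique` classification in `skills/okstra-convergence/SKILL.md` is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Verdict:
    verifier: str
    kind: str  # "AGREE" | "DISAGREE" | "SUPPLEMENT"
    evidence: Optional[str] = None  # e.g. "src/job.py:42"; None if nothing was cited


def adversarial_consensus_possible(claim_evidence: Sequence[str], verdicts: Sequence[Verdict]) -> bool:
    """Hypothetical gate: can this finding still reach consensus in adversarial mode?"""
    # Burden of proof on the claim: a finding that cites nothing cannot pass.
    if not claim_evidence:
        return False
    # One evidence-backed refutation is enough to block consensus.
    return not any(v.kind == "DISAGREE" and v.evidence for v in verdicts)


votes = [Verdict("codex", "AGREE"), Verdict("gemini", "DISAGREE", "src/job.py:42")]
print(adversarial_consensus_possible(["logs/app.log:310"], votes))      # False
print(adversarial_consensus_possible(["logs/app.log:310"], votes[:1]))  # True
```

Treating an uncited `DISAGREE` as non-blocking is an interpretation, not something the profile text states; it keeps refutations subject to the same evidence bar as claims.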
@@ -44,6 +44,8 @@
 3. **Coverage check** — every requirement in the originating plan/task brief is either marked covered (with artifact) or listed as a blocker. No silent omissions.
 4. **Verifier dissent preserved** — if workers reach different verdicts, the disagreement is visible in section 1.2; synthesis hides nothing.
 5. **No source-mutation audit** — scan the run's session transcripts for Edit / Write or state-mutating Bash commands that touch paths OUTSIDE `<PROJECT_ROOT>/.okstra/**` and outside the assigned run-artifact paths. Writes to worker prompts, audit sidecars, team-state, the final-report `data.json`, and rendered reports under the run directory are allowed okstra artifacts. Any source/schema/deployment mutation means the run has crossed into implementation and MUST be re-routed; do NOT silently strip the evidence.
+- Cross-verification mode:
+- **Acceptance critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker **acceptance devil's-advocate** pass runs after convergence to surface candidate acceptance blockers the verifiers may have missed. Each candidate is verified **confirm-or-downgrade**: confirmed → an `Acceptance Blockers` row (which, since `accepted` requires zero blockers, moves the verdict to `conditional-accept` / `blocked`); unconfirmed → a `Residual Risk` row (never dropped). See `skills/okstra-convergence/SKILL.md` "Acceptance critic pass (final-verification)".
 - Non-goals:
 - proposing unrelated refactors beyond the delivered scope
 - **source code edits, follow-up bug fixes, or scope expansion** — this run renders a verdict only; defects detected here become inputs to a new `error-analysis` or `implementation-planning` run
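The confirm-or-downgrade routing for acceptance-critic candidates can be sketched as below. Assumptions: each candidate already carries the outcome of its verification as a boolean, and the report is reduced to two lists; the class and function names are hypothetical. Because `accepted` requires zero blockers, one confirmed candidate is enough to rule it out; how the verdict then splits between `conditional-accept` and `blocked` is not specified here, so the sketch only answers whether `accepted` is still possible.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    text: str
    confirmed: bool  # result of the confirm-or-downgrade verification


@dataclass
class AcceptanceRows:
    blockers: List[str] = field(default_factory=list)
    residual_risks: List[str] = field(default_factory=list)


def route_candidates(rows: AcceptanceRows, candidates: List[Candidate]) -> AcceptanceRows:
    for c in candidates:
        # Confirmed -> Acceptance Blockers; unconfirmed -> Residual Risk (never dropped).
        target = rows.blockers if c.confirmed else rows.residual_risks
        target.append(c.text)
    return rows


def accepted_possible(rows: AcceptanceRows) -> bool:
    # `accepted` requires zero blockers.
    return not rows.blockers


rows = route_candidates(
    AcceptanceRows(),
    [Candidate("migration lacks rollback", True), Candidate("flaky e2e test", False)],
)
print(rows.blockers, rows.residual_risks, accepted_possible(rows))
# ['migration lacks rollback'] ['flaky e2e test'] False
```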
@@ -37,6 +37,10 @@
 - recommended execution order
 - Approval gate (phase-specific addendum to shared authority rule):
 - The YAML frontmatter `approved: true|false` field is the only authorised approval gate. report-writer always emits `approved: false`. The user clears it either by (a) editing the frontmatter line to `approved: true` directly, or (b) invoking the next phase with `--approve` so the CLI flips the frontmatter on the user's behalf. `okstra_ctl.run._validate_approved_plan` reads this field and refuses entry until it is `true`.
+- Cross-verification mode:
+- Phase 5.5 finding convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker finding (requirement gap / risk / option) by re-inspecting its cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode".
+- §4.5.9 plan-body verification runs with an **adversarial posture** (`skills/okstra-convergence/SKILL.md` §"Adversarial plan-body posture"): verifiers open and confirm every cited path / command and put the burden of proof on the plan. The gate threshold is unchanged — a *majority* `DISAGREE` (`majority-disagree`) is still required to block approval; a single dissent does not.
+- **Coverage critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker critic pass runs after convergence to surface missed findings; its gaps are merged only after a 1-round adversarial reverify. See `skills/okstra-convergence/SKILL.md` "Coverage critic pass".
 - Non-goals:
 - code-level micro-optimization unless it changes the implementation approach
 - **source code edits of any kind** — this run produces a plan document only; Edit/Write on project source files is forbidden until the plan is approved and a separate `implementation` run starts
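A minimal sketch of the approval gate, assuming the report starts with a `---`-delimited YAML frontmatter block and that `approved:` sits on its own line. It does not reproduce `okstra_ctl.run._validate_approved_plan`; `is_plan_approved` and `flip_approved` are hypothetical helpers showing what reading the flag and the `--approve` flip involve, using regular expressions rather than a YAML parser.

```python
import re

# Frontmatter: a leading "---" line, the YAML body, and a closing "---" line.
_FRONTMATTER = re.compile(r"\A---\r?\n(.*?)\r?\n---\r?\n", re.DOTALL)
_APPROVED = re.compile(r"^approved:[ \t]*(true|false)[ \t]*$", re.MULTILINE)


def is_plan_approved(markdown: str) -> bool:
    """True only if the frontmatter contains the line `approved: true`."""
    m = _FRONTMATTER.match(markdown)
    if m is None:
        return False
    flag = _APPROVED.search(m.group(1))
    return flag is not None and flag.group(1) == "true"


def flip_approved(markdown: str) -> str:
    """Rewrite the `approved:` line to `approved: true` inside the frontmatter only."""
    m = _FRONTMATTER.match(markdown)
    if m is None:
        raise ValueError("no YAML frontmatter found")
    body = _APPROVED.sub("approved: true", m.group(1), count=1)
    return markdown[: m.start(1)] + body + markdown[m.end(1):]


plan = "---\ntitle: retry plan\napproved: false\n---\n# Plan\napproved: true\n"
print(is_plan_approved(plan))                 # False: the body line does not count
print(is_plan_approved(flip_approved(plan)))  # True
```

Scoping both the check and the rewrite to the frontmatter match is the point of the sketch: an `approved: true` string elsewhere in the body must not satisfy the gate.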
@@ -74,7 +78,7 @@
 - the YAML frontmatter MUST include the line `approved: false` (report-writer always emits the unflipped value). The user authorises the next `implementation` run by flipping it to `approved: true` (manual edit or `--approve` CLI). Do NOT recreate any `User Approval Request` body block — the validator fails reports that contain one (see `validators/validate-run.py` deprecated patterns).
 - **the frontmatter `approved: false` line is rendered unconditionally; if the plan-body verification gate (§4.5.9) returns `blocked-by-disagreement` or `aborted-non-result`, the writer MUST keep `approved: false` and the validator refuses any report that ships with `approved: true` under such a gate result.**
 - every ambiguity flagged during pre-planning that the user must resolve before approval registered as a `Blocks=approval` row in the `## 5. Clarification Items` table (do NOT create a separate `Open Questions` block under `4.5.x` — the unified table is the single home)
-- **§4.5.9 Plan Body Verification (BLOCKING).** After report-writer finishes the draft, the lead MUST run a worker peer-review round on the consolidated plan body (sections 4.5.1 – 4.5.7) and populate `### 4.5.9 Plan Body Verification` in the final report. The round protocol, plan-item ID scheme (`P-Opt-*` / `P-Step-*` / `P-Dep-*` / `P-Val-*` / `P-Rb-*`), verdict semantics, gate-result classification, and dissent log format are defined in `skills/okstra-convergence/SKILL.md` "Plan-body verification mode". The four gate-result values are `passed`, `passed-with-dissent`, `blocked-by-disagreement`, `aborted-non-result`. When the gate would have been `blocked-by-disagreement` or `aborted-non-result`, the lead MUST NOT silently flip it to one of the passing values to "unblock" the run — that is a contract violation.
+- **§4.5.9 Plan Body Verification (BLOCKING).** After report-writer finishes the draft, the lead MUST run a worker peer-review round on the consolidated plan body (sections 4.5.1 – 4.5.7) and populate `### 4.5.9 Plan Body Verification` in the final report. The round protocol, plan-item ID scheme (`P-Opt-*` / `P-Step-*` / `P-Dep-*` / `P-Val-*` / `P-Rb-*`), verdict semantics, gate-result classification, and dissent log format are defined in `skills/okstra-convergence/SKILL.md` "Plan-body verification mode". The four gate-result values are `passed`, `passed-with-dissent`, `blocked-by-disagreement`, `aborted-non-result`. When the gate would have been `blocked-by-disagreement` or `aborted-non-result`, the lead MUST NOT silently flip it to one of the passing values to "unblock" the run — that is a contract violation. When `convergence.adversarial=true` (the default for this phase), this round uses the adversarial posture — verifiers confirm cited paths/commands and the burden of proof is on the plan — but the gate threshold stays `majority-disagree` (see that skill's §"Adversarial plan-body posture").
 - **Decision-record evaluation (sole owner)**: this phase is the **single owner** of decision-record evaluation in the okstra lifecycle. The brief never evaluates or drafts decision records — it only forwards `adr-candidate:*` signals. Every `adr-candidate:*` entry inherited from the brief's `Open Questions` is a mandatory evaluation target. In addition, evaluate every decision the recommended option introduces against the three criteria:
 1. **Hard to reverse** — would changing the decision later cost meaningfully more than deciding now?
 2. **Surprising without context** — would a future reader, seeing only the code, wonder "why was it built this way?"?
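The four gate-result values and the `majority-disagree` threshold can be illustrated with a small classifier. This is an assumption-laden sketch: it treats `None` as a worker non-result, uses a strict majority (more than half) as the blocking threshold, and checks items in order so that whichever failure is found first wins. The actual classification rules live in `skills/okstra-convergence/SKILL.md` and may differ; `classify_plan_gate` is a hypothetical name.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence


def classify_plan_gate(verdicts_by_item: Mapping[str, Sequence[Optional[str]]]) -> str:
    """Hypothetical §4.5.9 gate classifier.

    Keys are plan-item ids (e.g. "P-Step-1"); values are the workers' verdicts,
    with None standing for a worker that produced no usable result.
    """
    dissent_seen = False
    for verdicts in verdicts_by_item.values():
        if not verdicts or any(v is None for v in verdicts):
            return "aborted-non-result"
        disagree = Counter(verdicts)["DISAGREE"]
        if disagree * 2 > len(verdicts):  # strict majority DISAGREE blocks approval
            return "blocked-by-disagreement"
        if disagree:
            dissent_seen = True  # a minority dissent is logged, not blocking
    return "passed-with-dissent" if dissent_seen else "passed"


print(classify_plan_gate({"P-Step-1": ["AGREE", "AGREE", "DISAGREE"]}))     # passed-with-dissent
print(classify_plan_gate({"P-Step-1": ["DISAGREE", "DISAGREE", "AGREE"]}))  # blocked-by-disagreement
```

With two verifiers, one `DISAGREE` gives `1 * 2 > 2`, which is false, so a single dissent lands in `passed-with-dissent`, matching the rule that a single dissent does not block approval.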
@@ -53,6 +53,7 @@
 - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <human-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only a human can answer this" (reporter intent, business priority, external authority). A row with `none` that *could* have been answered by the codebase is a defect.
 - Cross-verification mode:
 - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker's finding by directly re-inspecting the cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
+- **Coverage critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker critic pass runs after convergence to surface missed findings; its gaps are merged only after a 1-round adversarial reverify. See `skills/okstra-convergence/SKILL.md` "Coverage critic pass".
 - Non-goals:
 - full implementation design unless it is required to decide the next phase
 - **source code edits, plan authoring, builds, or deployments** — this run only classifies the work and routes it; deeper analysis and planning belong to subsequent phases
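The evidence-note rule is mechanical enough to check with a regular expression. The sketch below is hypothetical (the real validator is not part of this diff) and only recognises the two documented shapes: `Evidence checked: <path:line>` and the `none` form followed by a dash and a reason. Whether a `none` row *could* have been answered from the codebase needs judgement and is out of scope; `parse_evidence_note` is an illustrative name.

```python
import re
from typing import Optional, Tuple

# "none" form: the word none, an em dash (U+2014), then a reason.
# Reference form: a path, a colon, and a line number.
_EVIDENCE = re.compile(
    r"Evidence checked:\s*(?:none\s*\u2014\s*(?P<reason>\S.*?)\s*$|(?P<ref>\S+:\d+))",
    re.MULTILINE,
)


def parse_evidence_note(statement: str) -> Optional[Tuple[str, str]]:
    """Return ("ref", "<path:line>"), ("none", "<reason>"), or None if the note is missing."""
    m = _EVIDENCE.search(statement)
    if m is None:
        return None  # a Statement cell without an evidence note is a defect
    if m.group("reason") is not None:
        return ("none", m.group("reason"))
    return ("ref", m.group("ref"))


print(parse_evidence_note("Does the job retry? Evidence checked: src/retry.py:88"))
# ('ref', 'src/retry.py:88')
```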
@@ -228,6 +228,19 @@
       "_DEFAULT_SUFFIX": " (default)"
     }
   },
+  "critic_pick": {
+    "label": "추가 critic 패스를 돌릴까요? (놓친 finding/blocker 를 캐는 검증 패스 — opt-in)",
+    "echo_template": "critic: {value}",
+    "options": {
+      "off": "사용 안 함 (기본·추천)",
+      "claude": "claude critic (추천)",
+      "__free_input__": "직접 입력 (codex / gemini)"
+    }
+  },
+  "critic_text": {
+    "label": "critic provider 를 직접 입력하세요 (codex / gemini)",
+    "echo_template": "critic: {value}"
+  },
   "defaults_or_custom": {
     "label": "역할별로 어떤 모델을 쓸지 정하는 단계입니다 (참여 워커 구성을 바꾸는 게 아닙니다).\n· 기본값으로 진행 — lead·실행자/워커·report-writer 를 모두 추천 모델로 두고 바로 진행합니다.\n· 커스터마이즈 — 역할별 모델을 직접 고르고, 추가 directive·관련 task 도 지정합니다.",
     "echo_template": "customize: {value}",
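The new `critic_pick` block is consumed by `_build_critic_pick` in `wizard.py`: keys starting with `_` are skipped, and the free-input entry is re-appended last, so `codex` and `gemini` are reachable only through free input. A standalone sketch of that ordering, assuming `PICK_TYPE_CUSTOM` equals `"__free_input__"` (its value is not shown in this diff) and using English placeholder labels in place of the Korean strings; `critic_options` is a hypothetical name.

```python
import json
from typing import Dict, List, Tuple

PICK_TYPE_CUSTOM = "__free_input__"  # assumption: value not shown in this diff

# English placeholder labels standing in for the Korean strings in prompts.ko.json.
CRITIC_PICK = json.loads("""
{
  "label": "Run an extra critic pass?",
  "echo_template": "critic: {value}",
  "options": {
    "off": "off (default)",
    "claude": "claude critic",
    "__free_input__": "type it (codex / gemini)"
  }
}
""")


def critic_options(prompt: Dict) -> List[Tuple[str, str]]:
    opts = prompt["options"]
    # Underscore-prefixed keys, including the free-input key, are skipped here...
    picked = [(k, v) for k, v in opts.items() if not k.startswith("_")]
    # ...and the free-input entry is re-appended last with its own label.
    picked.append((PICK_TYPE_CUSTOM, opts.get(PICK_TYPE_CUSTOM, PICK_TYPE_CUSTOM)))
    return picked


print([k for k, _ in critic_options(CRITIC_PICK)])  # ['off', 'claude', '__free_input__']
```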
@@ -903,26 +903,47 @@ def _build_convergence_block(ctx: dict) -> dict:
     - `enabled` default True
     - `maxRounds` default 1 for `requirements-discovery`, 2 otherwise
     - `verificationMode` default "lightweight"
-    - `adversarial` default True for `requirements-discovery` / `error-analysis`
-      (forces `verificationMode` to "full-reanalysis"),
+    - `adversarial` default True for `requirements-discovery` / `error-analysis` /
+      `implementation-planning` (forces `verificationMode` to "full-reanalysis"),
+      False otherwise
     - `planBodyVerification` is implementation-planning specific; the key is
       always emitted (dead-letter on other phases) so the schema stays stable.

     ctx knobs honoured:
     - `OKSTRA_PLAN_VERIFICATION`: "true" | "false" | "" (empty → default True).
       Wired from CLI `--no-plan-verification` (sets "false").
+    - `CRITIC_CHOICE`: "" | "off" | "claude" | "codex" | "gemini" — critic
+      backing provider (enabled only for requirements-discovery / error-analysis /
+      implementation-planning / final-verification); model taken from that
+      provider's execution value.
     """
     task_type = ctx.get("TASK_TYPE", "")
     default_max_rounds = 1 if task_type == "requirements-discovery" else 2
-    adversarial_phases = {"requirements-discovery", "error-analysis"}
+    adversarial_phases = {"requirements-discovery", "error-analysis", "implementation-planning"}
     is_adversarial = task_type in adversarial_phases
     raw_plan_verify = (ctx.get("OKSTRA_PLAN_VERIFICATION", "") or "").strip().lower()
     plan_verify_enabled = raw_plan_verify != "false"
+    critic_choice = (ctx.get("CRITIC_CHOICE", "") or "").strip().lower()
+    # Independent of `adversarial_phases` above (they answer different questions and
+    # may diverge): the coverage critic is opt-in for the finding-producing phases.
+    critic_phases = {"requirements-discovery", "error-analysis", "implementation-planning", "final-verification"}
+    critic_exec_key = {
+        "claude": "CLAUDE_WORKER_MODEL_EXECUTION_VALUE",
+        "codex": "CODEX_WORKER_MODEL_EXECUTION_VALUE",
+        "gemini": "GEMINI_WORKER_MODEL_EXECUTION_VALUE",
+    }
+    critic_enabled = critic_choice in critic_exec_key and task_type in critic_phases
+    critic_block = {
+        "enabled": critic_enabled,
+        "provider": critic_choice if critic_enabled else None,
+        "modelExecutionValue": (ctx.get(critic_exec_key[critic_choice]) or None) if critic_enabled else None,
+    }
     return {
         "enabled": True,
         "adversarial": is_adversarial,
         "maxRounds": default_max_rounds,
         "verificationMode": "full-reanalysis" if is_adversarial else "lightweight",
+        "critic": critic_block,
         "planBodyVerification": {
             "enabled": plan_verify_enabled,
             "maxRounds": 1,
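The critic gating added to `_build_convergence_block` depends only on two context keys plus the chosen provider's execution value, so it can be exercised in isolation. The function below restates that logic outside the renderer (`build_critic_block` is a hypothetical name, and the model strings in the examples are placeholders) to show three consequences: `off` and unknown values disable the critic, a non-critic phase disables it even when a provider was chosen, and an empty execution value becomes `None`.

```python
from typing import Optional

CRITIC_PHASES = {"requirements-discovery", "error-analysis", "implementation-planning", "final-verification"}
CRITIC_EXEC_KEY = {
    "claude": "CLAUDE_WORKER_MODEL_EXECUTION_VALUE",
    "codex": "CODEX_WORKER_MODEL_EXECUTION_VALUE",
    "gemini": "GEMINI_WORKER_MODEL_EXECUTION_VALUE",
}


def build_critic_block(ctx: dict) -> dict:
    choice = (ctx.get("CRITIC_CHOICE", "") or "").strip().lower()
    task_type = ctx.get("TASK_TYPE", "")
    # "off", "" and anything unknown are simply absent from CRITIC_EXEC_KEY.
    enabled = choice in CRITIC_EXEC_KEY and task_type in CRITIC_PHASES
    model: Optional[str] = (ctx.get(CRITIC_EXEC_KEY[choice]) or None) if enabled else None
    return {"enabled": enabled, "provider": choice if enabled else None, "modelExecutionValue": model}


print(build_critic_block({"TASK_TYPE": "error-analysis", "CRITIC_CHOICE": " Codex ",
                          "CODEX_WORKER_MODEL_EXECUTION_VALUE": "model-a"}))
# {'enabled': True, 'provider': 'codex', 'modelExecutionValue': 'model-a'}
print(build_critic_block({"TASK_TYPE": "implementation", "CRITIC_CHOICE": "claude"}))
# {'enabled': False, 'provider': None, 'modelExecutionValue': None}
```

Because the provider lookup sits behind the `if enabled` guard, an unrecognised choice never raises `KeyError` here; rejecting bad values is left to `prepare_task_bundle`.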
@@ -407,21 +407,12 @@ class _GroupedSpec:
     user_input_col: int = -1


-_FOLLOWUP_WIDE_PREFIXES: tuple[str, ...] = ("title", "scope", "reason")
-
-
 def _grouped_table_spec(
     header_cells: list[str], section_path: list[str]
 ) -> Optional[_GroupedSpec]:
-    """
-
-
-    table (which keeps the default per-cell ``td-narrow`` rendering).
-
-    Each table is identified by stable header tokens (the i18n token/cost
-    columns are never used as anchors). ``wide_cols`` lists the long-prose
-    columns that must keep a guaranteed min-width; everything else short
-    collapses into the leading metadata cell."""
+    """Only §5 Clarification Items is grouped in the HTML view (it keeps the
+    interactive form and stays flat in the .md). All other narrative tables are
+    already rendered compactly by the template, so no grouping is applied here."""
     norm = [h.strip() for h in header_cells]

     def _spec(headline: int, wide: tuple[int, ...], **kw) -> _GroupedSpec:
@@ -429,12 +420,8 @@ def _grouped_table_spec(
         group = tuple(c for c in range(len(norm)) if c != headline and c not in wide_set)
         return _GroupedSpec(headline_col=headline, group_cols=group, wide_cols=wide, **kw)

-    #
-
-        return _spec(0, (len(norm) - 1,), kind="plain")
-
-    # §5 Clarification Items — keep the interactive form, but collapse the
-    # short ID/Kind/Status/… columns and widen Statement + User input.
+    # §5 Clarification Items — keep the interactive form, and widen the three
+    # long-prose columns (Expected form is prose too, not a code column).
     if (
         any("Clarification Items" in h for h in section_path)
         and not _section_forbids_form(section_path)
@@ -444,9 +431,15 @@ def _grouped_table_spec(
     ):
         statement_col = next(i for i, h in enumerate(norm) if h.startswith("Statement"))
         user_input_col = norm.index("User input")
+        expected_col = next(
+            (i for i, h in enumerate(norm) if h.startswith("Expected form")), -1
+        )
+        wide_cols = tuple(
+            c for c in (expected_col, statement_col, user_input_col) if c >= 0
+        )
         return _spec(
             norm.index("ID"),
-
+            wide_cols,
             kind="clarification",
             id_col=norm.index("ID"),
             kind_col=norm.index("Kind") if "Kind" in norm else -1,
@@ -455,16 +448,6 @@ def _grouped_table_spec(
             user_input_col=user_input_col,
         )

-    # §7 Follow-up Tasks — widen Title / Scope / Reason, collapse the rest.
-    if any("Follow-up Tasks" in h for h in section_path) and "ID" in norm:
-        wide = tuple(
-            i
-            for i, h in enumerate(norm)
-            if any(h.lower().startswith(p) for p in _FOLLOWUP_WIDE_PREFIXES)
-        )
-        if wide:
-            return _spec(norm.index("ID"), wide, kind="plain")
-
     return None


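With the §7 Follow-up Tasks branch removed, only the §5 table is grouped, and its wide columns are now Expected form, Statement, and User input. The sketch below recomputes the headline, grouped, and wide column indices the same way `_spec` does, for an example header row; `clarification_layout` is a hypothetical name and the exact header row is an assumption.

```python
from typing import List, Tuple


def clarification_layout(header_cells: List[str]) -> Tuple[int, Tuple[int, ...], Tuple[int, ...]]:
    """Return (headline_col, group_cols, wide_cols) for a §5 Clarification Items header row."""
    norm = [h.strip() for h in header_cells]
    statement_col = next(i for i, h in enumerate(norm) if h.startswith("Statement"))
    user_input_col = norm.index("User input")
    expected_col = next((i for i, h in enumerate(norm) if h.startswith("Expected form")), -1)
    # A missing Expected form column (-1) is dropped rather than widened.
    wide = tuple(c for c in (expected_col, statement_col, user_input_col) if c >= 0)
    wide_set = set(wide)
    headline = norm.index("ID")
    # Everything that is neither the headline nor wide collapses into the metadata cell.
    group = tuple(c for c in range(len(norm)) if c != headline and c not in wide_set)
    return headline, group, wide


header = ["ID", "Kind", "Status", "Statement", "Expected form", "Blocks", "User input"]
print(clarification_layout(header))  # (0, (1, 2, 5), (4, 3, 6))
```

Older reports without an Expected form column still work: the `-1` sentinel is filtered out, leaving Statement and User input as the only wide columns.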
@@ -768,6 +751,10 @@ def _inline(text: str) -> str:
     out = _LINK_PATTERN.sub(
         lambda m: f'<a href="{m.group(2)}">{m.group(1)}</a>', out
     )
+    # Preserve explicit <br> line breaks used inside compact meta cells (the
+    # markdown source intentionally stacks short fields with <br>). html.escape
+    # above turned them into &lt;br&gt;; restore the tag.
+    out = out.replace("&lt;br&gt;", "<br>").replace("&lt;br/&gt;", "<br>").replace("&lt;br /&gt;", "<br>")
     return out


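The `<br>` restoration only works because escaping happens first and the tag is then matched in its escaped form. A minimal sketch, assuming `html.escape` is the escaping step (the `_inline` comment refers to it, but the call itself is outside this hunk); `escape_keep_br` is a hypothetical name.

```python
import html


def escape_keep_br(text: str) -> str:
    """Escape text for HTML, then turn escaped <br> variants back into a real tag."""
    out = html.escape(text)
    for escaped in ("&lt;br&gt;", "&lt;br/&gt;", "&lt;br /&gt;"):
        out = out.replace(escaped, "<br>")
    return out


print(escape_keep_br("owner: ops<br>due: Fri & <i>tbd</i>"))
# owner: ops<br>due: Fri &amp; &lt;i&gt;tbd&lt;/i&gt;
```

Only the three exact `br` spellings are restored; every other tag stays escaped, so the allowance does not open a general HTML injection path.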
@@ -120,6 +120,7 @@ class PrepareInputs:
     gemini_model: str = ""
     report_writer_model: str = ""
     executor: str = ""
+    critic: str = ""
     related_tasks_raw: str = ""
     work_category: str = ""
     base_ref: str = ""
@@ -499,6 +500,7 @@ def _canonical_argv(inp: PrepareInputs, ctx: dict) -> list[str]:
         ("--gemini-model", inp.gemini_model or ctx.get("GEMINI_WORKER_MODEL", "")),
         ("--report-writer-model", inp.report_writer_model or ctx.get("REPORT_WRITER_MODEL", "")),
         ("--executor", inp.executor or ctx.get("EXECUTOR_PROVIDER", "")),
+        ("--critic", inp.critic or ctx.get("CRITIC_CHOICE", "")),
         ("--related-tasks", inp.related_tasks_raw),
         ("--work-category", inp.work_category),
     ]
@@ -707,6 +709,13 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
         default_display=report_writer_default, default_execution=report_writer_default,
     )

+    # ---- coverage critic choice (validated; phase-gating happens in render) ----
+    critic_choice = (inp.critic or "").strip().lower()
+    if critic_choice not in ("", "off", "claude", "codex", "gemini"):
+        raise PrepareError(
+            f"--critic must be one of: off, claude, codex, gemini (got: {critic_choice!r})"
+        )
+
     # ---- executor binding (implementation phase only; recorded universally for manifest consistency) ----
     executor_default = _default("OKSTRA_DEFAULT_EXECUTOR", "claude")
     executor_provider = (inp.executor or executor_default).strip().lower()
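The `--critic` value is normalised and validated once in `prepare_task_bundle`, while phase gating is deferred to the renderer. A sketch of the same normalisation wired to `argparse`; `normalize_critic` is a hypothetical helper, `PrepareError` is redefined locally as a stand-in for okstra's exception, and the `prog` name is made up.

```python
import argparse

VALID_CRITICS = ("", "off", "claude", "codex", "gemini")


class PrepareError(ValueError):
    """Local stand-in for okstra_ctl's PrepareError."""


def normalize_critic(raw: str) -> str:
    choice = (raw or "").strip().lower()
    if choice not in VALID_CRITICS:
        raise PrepareError(
            f"--critic must be one of: off, claude, codex, gemini (got: {choice!r})"
        )
    return choice


parser = argparse.ArgumentParser(prog="okstra-prepare")  # hypothetical prog name
parser.add_argument("--critic", default="")
args = parser.parse_args(["--critic", " Gemini "])
print(normalize_critic(args.critic))  # gemini
```

An empty string passes validation on purpose: it means "no choice made", and the renderer then leaves the critic disabled.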
@@ -842,6 +851,7 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
         "EXECUTOR_WORKER_AGENT": executor_worker_agent,
         "EXECUTOR_MODEL_DISPLAY": executor_model_meta.display,
         "EXECUTOR_MODEL_EXECUTION_VALUE": executor_model_meta.execution,
+        "CRITIC_CHOICE": critic_choice,
         "RELATED_TASKS_JSON": related_tasks_json_str,
         "RELATED_TASKS_BULLETS": bullets,
         "RELATED_TASKS_INLINE": inline,
@@ -1098,6 +1108,7 @@ def main(argv: list[str]) -> int:
     p.add_argument("--gemini-model", default="")
     p.add_argument("--report-writer-model", default="")
     p.add_argument("--executor", default="")
+    p.add_argument("--critic", default="")
     p.add_argument("--related-tasks", default="", dest="related_tasks_raw")
     p.add_argument("--approved-plan", default="", dest="approved_plan_path")
     p.add_argument(
@@ -1198,6 +1209,7 @@ def main(argv: list[str]) -> int:
         gemini_model=args.gemini_model,
         report_writer_model=args.report_writer_model,
         executor=args.executor,
+        critic=args.critic,
         related_tasks_raw=args.related_tasks_raw,
         work_category=args.work_category,
         base_ref=args.base_ref,
@@ -181,6 +181,8 @@ S_APPROVED_PLAN_PICK = "approved_plan_pick"
 S_APPROVED_PLAN = "approved_plan"
 S_STAGE_PICK = "stage_pick"
 S_EXECUTOR = "executor"
+S_CRITIC_PICK = "critic_pick"
+S_CRITIC_TEXT = "critic_text"
 S_DEFAULTS_OR_CUSTOM = "defaults_or_custom"
 S_WORKERS_OVERRIDE = "workers_override"
 S_LEAD_MODEL = "lead_model"
@@ -246,6 +248,8 @@ class WizardState:
     approved_plan_pending_text: bool = False
     selected_stage: str = "auto"
     executor: str = ""
+    critic: str = ""
+    critic_pending_text: bool = False

     # customize
     use_defaults: Optional[bool] = None
@@ -1459,6 +1463,55 @@ def _submit_pr_template_pick(state: WizardState, value: str) -> Optional[str]:
     )


+CRITIC_CHOICES = ["off", "claude", "codex", "gemini"]
+
+
+def _build_critic_pick(state: WizardState) -> Prompt:
+    t = _p(state.workspace_root, "critic_pick")
+    options: list[Option] = []
+    for k, v in t["options"].items():
+        if not k.startswith("_"):
+            options.append(_opt(k, v))
+    custom_label = t["options"].get(PICK_TYPE_CUSTOM, PICK_TYPE_CUSTOM)
+    options.append(_opt(PICK_TYPE_CUSTOM, custom_label))
+    return Prompt(
+        step=S_CRITIC_PICK, kind="pick",
+        label=t["label"],
+        options=options,
+        echo_template=t["echo_template"],
+    )
+
+
+def _submit_critic_pick(state: WizardState, value: str) -> Optional[str]:
+    if value == PICK_TYPE_CUSTOM:
+        state.critic_pending_text = True
+        return None
+    choice = (value or "").strip().lower()
+    if choice not in CRITIC_CHOICES:
+        raise WizardError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
+    state.critic = choice
+    state.critic_pending_text = False
+    return f"critic: {choice}"
+
+
+def _build_critic_text(state: WizardState) -> Prompt:
+    t = _p(state.workspace_root, "critic_text")
+    return Prompt(
+        step=S_CRITIC_TEXT, kind="text",
+        label=t["label"],
+        echo_template=t["echo_template"],
+    )
+
+
+def _submit_critic_text(state: WizardState, value: str) -> Optional[str]:
+    choice = (value or "").strip().lower()
+    if choice not in CRITIC_CHOICES:
+        raise WizardError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
+    state.critic = choice
+    state.critic_pending_text = False
+    return f"critic: {choice}"
+
+
 def _build_executor(state: WizardState) -> Prompt:
     t = _p(state.workspace_root, "executor")
     default_suffix = t["options"].get("_DEFAULT_SUFFIX", "")
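In the wizard, `critic_pick` either records a provider directly or, when the user picks the custom option, sets `critic_pending_text` so that `critic_text` runs next. The sketch below models that two-step flow with a minimal dataclass. `MiniState`, `accept_critic`, the `"_custom"` sentinel, and the local `WizardError` are assumptions standing in for `WizardState`, `PICK_TYPE_CUSTOM`, and the real error class.

```python
from dataclasses import dataclass
from typing import Optional

CRITIC_CHOICES = ["off", "claude", "codex", "gemini"]
PICK_TYPE_CUSTOM = "_custom"  # assumed sentinel; the real value is defined in wizard.py


class WizardError(ValueError):
    """Local stand-in for the wizard's error type."""


@dataclass
class MiniState:
    critic: str = ""
    critic_pending_text: bool = False


def accept_critic(state: MiniState, value: Optional[str]) -> str:
    """Shared validation used by both submit paths."""
    choice = (value or "").strip().lower()
    if choice not in CRITIC_CHOICES:
        raise WizardError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
    state.critic = choice
    state.critic_pending_text = False
    return f"critic: {choice}"


def submit_critic_pick(state: MiniState, value: str) -> Optional[str]:
    if value == PICK_TYPE_CUSTOM:
        # Defer to the free-text step; nothing is echoed yet.
        state.critic_pending_text = True
        return None
    return accept_critic(state, value)


def submit_critic_text(state: MiniState, value: str) -> Optional[str]:
    return accept_critic(state, value)


if __name__ == "__main__":
    s = MiniState()
    print(submit_critic_pick(s, PICK_TYPE_CUSTOM), s.critic_pending_text)  # None True
    print(submit_critic_text(s, " Codex "), s.critic_pending_text)  # critic: codex False
```

Unlike the diff, which repeats the validation body in `_submit_critic_pick` and `_submit_critic_text`, this sketch routes both paths through one helper; behaviour is the same, but the allowed values and error message live in one place.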
@@ -1922,6 +1975,17 @@ STEPS: list[Step] = [
                             and not s.executor),
          build=_build_executor, submit=_submit_executor,
          owns=("executor",)),
+    Step(S_CRITIC_PICK,
+         applies=lambda s: (s.task_type in ("requirements-discovery", "error-analysis", "implementation-planning", "final-verification")
+                            and not s.critic
+                            and not s.critic_pending_text
+                            and S_CRITIC_PICK not in s.answered),
+         build=_build_critic_pick, submit=_submit_critic_pick,
+         owns=("critic", "critic_pending_text")),
+    Step(S_CRITIC_TEXT,
+         applies=lambda s: (s.critic_pending_text and S_CRITIC_TEXT not in s.answered),
+         build=_build_critic_text, submit=_submit_critic_text,
+         owns=("critic", "critic_pending_text")),
     Step(S_DEFAULTS_OR_CUSTOM,
          applies=lambda s: (_identity_ready(s)
                             and s.use_defaults is None),
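The two `Step` entries gate each other through their `applies` lambdas. The sketch below restates those predicates as plain functions over a hypothetical `StepState` (a subset of `WizardState`, with `answered` assumed to be a set of step ids) to show which critic step the wizard would offer next. It also shows why a value passed through `init --critic` suppresses the pick step: a non-empty `critic` fails the `not s.critic` test.

```python
from dataclasses import dataclass, field

CRITIC_PHASES = (
    "requirements-discovery",
    "error-analysis",
    "implementation-planning",
    "final-verification",
)


@dataclass
class StepState:
    """Hypothetical subset of WizardState used by the two predicates."""
    task_type: str
    critic: str = ""
    critic_pending_text: bool = False
    answered: set = field(default_factory=set)


def critic_pick_applies(s: StepState) -> bool:
    return (s.task_type in CRITIC_PHASES
            and not s.critic
            and not s.critic_pending_text
            and "critic_pick" not in s.answered)


def critic_text_applies(s: StepState) -> bool:
    return s.critic_pending_text and "critic_text" not in s.answered


def pending_critic_steps(s: StepState) -> list:
    """Return the critic steps that would currently be offered, in STEPS order."""
    steps = []
    if critic_pick_applies(s):
        steps.append("critic_pick")
    if critic_text_applies(s):
        steps.append("critic_text")
    return steps


if __name__ == "__main__":
    print(pending_critic_steps(StepState("error-analysis")))                 # ['critic_pick']
    print(pending_critic_steps(StepState("implementation")))                 # []
    print(pending_critic_steps(StepState("error-analysis", critic="codex")))  # []
```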
@@ -2118,7 +2182,8 @@ _FIELD_DEFAULTS: dict[str, Any] = {
     "base_ref_pending_text": False, "approved_plan_path": "",
     "approved_plan_pending_text": False,
     "selected_stage": "auto",
-    "executor": "", "
+    "executor": "", "critic": "", "critic_pending_text": False,
+    "use_defaults": None, "workers_override": "",
     "lead_model": "", "claude_model": "", "codex_model": "",
     "gemini_model": "", "report_writer_model": "", "directive": "",
     "directive_pending_text": False,
@@ -2200,6 +2265,7 @@ def render_args(state: WizardState) -> dict[str, str]:
         "task-type": state.task_type,
         "task-brief": state.brief_path,
         "executor": state.executor,
+        "critic": state.critic,
         "approved-plan": state.approved_plan_path,
         "stage": (state.selected_stage or "auto") if state.task_type == "implementation" else "",
         "base-ref": base_ref,
@@ -2244,6 +2310,8 @@ def confirmation_block(state: WizardState) -> str:
     if state.report_writer_model:
         lines.append(f" report-writer : {state.report_writer_model}")
     lines.append(f" directive : {state.directive or '(none)'}")
+    if state.task_type in ("requirements-discovery", "error-analysis", "implementation-planning", "final-verification"):
+        lines.append(f" critic : {state.critic or '(off)'}")
     if state.task_type == "implementation":
         lines.append(f" approved-plan : {state.approved_plan_path}")
     if state.clarification_response_path:
@@ -2288,6 +2356,7 @@ def _cli(argv: list[str]) -> int:
     p_init.add_argument("--workspace-root", required=True)
     p_init.add_argument("--project-root", required=True)
     p_init.add_argument("--project-id", required=True)
+    p_init.add_argument("--critic", default="")

     p_step = sub.add_parser("step")
     p_step.add_argument("--state-file", required=True)
@@ -2313,6 +2382,8 @@ def _cli(argv: list[str]) -> int:
             project_root=args.project_root,
             project_id=args.project_id,
         )
+        if args.critic:
+            state.critic = args.critic
         save_state_file(state_path, state)
         first = next_prompt(state)
         print(json.dumps({"ok": True, "next": first.to_json()},
@@ -17,8 +17,11 @@ user-invocable: false
 - [Round 1-N: Re-verification Loop (queue-pruned)](#round-1-n-re-verification-loop-queue-pruned)
 - [Convergence Test](#convergence-test)
 - [Verification Mode](#verification-mode)
+- [Adversarial Verification Mode](#adversarial-verification-mode)
 - [Re-verification Agent Dispatch](#re-verification-agent-dispatch)
 - [Convergence State Artifact](#convergence-state-artifact)
+- [Coverage critic pass](#coverage-critic-pass)
+- [Acceptance critic pass (final-verification)](#acceptance-critic-pass-final-verification)
 - [Output](#output)
 - [Convergence Disabled](#convergence-disabled)
 - [Plan-body verification mode (implementation-planning only)](#plan-body-verification-mode-implementation-planning-only)
@@ -46,7 +49,7 @@ Configure this in the `convergence` block of `task-manifest.json`. If the block
 | `enabled` | `true` | If `false`, skip the convergence loop and use the existing consensus/divergence method |
 | `maxRounds` | phase-aware: `1` for `requirements-discovery`, `2` otherwise (range 1–3) | Maximum number of re-verification rounds. Discovery's routing/missing-input outputs gain little from a second round; other phases (especially `error-analysis`) keep `2`. Lead resolves the effective value when the manifest omits the key and records it in `config.maxRounds` of the convergence state artifact. |
 | `verificationMode` | `"lightweight"` | `"lightweight"` or `"full-reanalysis"` |
-| `adversarial` | phase-aware: `true` for `requirements-discovery` / `error-analysis`, `false` otherwise | When `true`, Phase 5.5 runs in **adversarial mode** (see §"Adversarial Verification Mode"): verifiers actively try to refute each finding, the burden of proof sits on the claim, and `verificationMode` is forced to `"full-reanalysis"` scoped to the finding's cited evidence. Resolved by `scripts/okstra_ctl/render.py` `_build_convergence_block` and recorded in `config.adversarial` of the convergence state artifact. |
+| `adversarial` | phase-aware: `true` for `requirements-discovery` / `error-analysis` / `implementation-planning`, `false` otherwise | When `true`, Phase 5.5 runs in **adversarial mode** (see §"Adversarial Verification Mode"): verifiers actively try to refute each finding, the burden of proof sits on the claim, and `verificationMode` is forced to `"full-reanalysis"` scoped to the finding's cited evidence. Resolved by `scripts/okstra_ctl/render.py` `_build_convergence_block` and recorded in `config.adversarial` of the convergence state artifact. |

 **Auto-disable rule (BLOCKING).** Convergence requires ≥2 analyser workers to produce a meaningful consensus tally. When the active profile's `Required workers:` block (see `prompts/profiles/*.md`) resolves to fewer than 2 analyser workers — e.g. `release-handoff` (zero analyser workers, lead-only) — the lead MUST treat `convergence.enabled` as `false` for that run regardless of manifest configuration, skip Phases 5.5 and the plan-body verification round, and record `finalState: "converged"` with `totalRounds: 0` and an explanatory note in `config` (e.g. `"autoDisabled": "fewer-than-two-analysers"`). The plan-body round inherits the same rule via its `gating=false` advisory path.

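The configuration table resolves `maxRounds` and `adversarial` from the task type when the manifest omits them, forces `full-reanalysis` in adversarial mode, and auto-disables convergence below two analysers. A hedged sketch of that resolution follows. The real logic lives in `_build_convergence_block` in `render.py`; the function name, the returned dict shape, and the assumption that an explicit manifest key overrides the phase default are illustrative only.

```python
from typing import Optional

ADVERSARIAL_DEFAULT_PHASES = frozenset({
    "requirements-discovery",
    "error-analysis",
    "implementation-planning",
})


def resolve_convergence(task_type: str, analyser_count: int,
                        manifest_block: Optional[dict] = None) -> dict:
    """Sketch: fill phase-aware defaults for omitted convergence settings."""
    block = manifest_block or {}
    config = {
        "enabled": block.get("enabled", True),
        # Discovery defaults to 1 round; every other phase defaults to 2.
        "effectiveMaxRounds": block.get(
            "maxRounds", 1 if task_type == "requirements-discovery" else 2),
        # Assumption: an explicit manifest value overrides the phase default.
        "adversarial": block.get("adversarial", task_type in ADVERSARIAL_DEFAULT_PHASES),
        "verificationMode": block.get("verificationMode", "lightweight"),
    }
    if config["adversarial"]:
        # Adversarial mode forces full-reanalysis (scoped to the cited evidence).
        config["verificationMode"] = "full-reanalysis"
    if analyser_count < 2:
        # Auto-disable rule: a consensus tally needs at least two analysers.
        config["enabled"] = False
        config["autoDisabled"] = "fewer-than-two-analysers"
    return config
```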
@@ -195,13 +198,13 @@ Disadvantages: 2–3 times the cost, increased time

 ## Adversarial Verification Mode

-Active only when `config.adversarial == true` (default for `requirements-discovery` and `
+Active only when `config.adversarial == true` (default for `requirements-discovery`, `error-analysis`, and `implementation-planning`; see §"Configuration"). When `false`, every rule in this section is inert and the collaborative behaviour documented elsewhere in this skill applies unchanged.

 In adversarial mode the verifier's job inverts: instead of confirming a peer's finding, the verifier **tries to break it**, and the burden of proof sits on the claim — a finding survives only if refutation attempts fail.

 ### Scoped full-reanalysis (BLOCKING)

-Adversarial mode forces `verificationMode = "full-reanalysis"`, but the re-analysis is **scoped to the evidence the finding under attack cites** (the file paths / line ranges / log lines in its `originEvidence`), plus the immediately surrounding context. The verifier MUST NOT re-read the whole task brief, instruction-set, or `final-report-template.md`. This keeps the documented "single largest avoidable cost in requirements-discovery and
+Adversarial mode forces `verificationMode = "full-reanalysis"`, but the re-analysis is **scoped to the evidence the finding under attack cites** (the file paths / line ranges / log lines in its `originEvidence`), plus the immediately surrounding context. The verifier MUST NOT re-read the whole task brief, instruction-set, or `final-report-template.md`. This keeps the documented "single largest avoidable cost in requirements-discovery, error-analysis, and implementation-planning" (see §"Reverify prompt: required-reading suppression") bounded while making the refutation real rather than a text-only argument.

 ### Adversarial verdict semantics

@@ -299,7 +302,7 @@ Reverify prompts MUST NOT inject the Phase 2 `[Required reading]` clause:
 - **Lightweight mode**: the clause directly contradicts the "Do NOT re-analyze the original source materials" instruction below. Including it forces workers to re-read the entire instruction-set per round per worker (3 workers × 2 rounds × 5+ files in the worst case) for no quality gain.
 - **Full-reanalysis mode**: workers DO need to re-read source materials, but only the analysis-worker file list (no `final-report-template.md`). If lead chooses to inject a reading clause here, it MUST mirror the audience-scoped enumeration in [okstra/SKILL.md](../../SKILL.md) Phase 2 (no template).

-This is the single largest avoidable cost in `requirements-discovery` and `
+This is the single largest avoidable cost in `requirements-discovery`, `error-analysis`, and `implementation-planning` runs. Treat as BLOCKING.

 ### Lightweight Re-verification Prompt

@@ -493,7 +496,7 @@ Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
 Schema rules:

 - `schemaVersion`: literal string `"1.2"` for all new runs — both adversarial and collaborative. v1.2 adds `config.adversarial` and `votes.<worker>.disagreeBasis`, written as `false` / `null` respectively on collaborative runs. Readers MUST accept `"1.0"` / `"1.1"` / `"1.2"` for historical artifacts and treat any missing field as `null`.
-- `config.adversarial`: boolean. `true` when this run used adversarial verification (default for `requirements-discovery` / `error-analysis`). When `true`, `config.verificationMode` is `"full-reanalysis"` (scoped) and every `disagree` vote carries a non-null `disagreeBasis`.
+- `config.adversarial`: boolean. `true` when this run used adversarial verification (default for `requirements-discovery` / `error-analysis` / `implementation-planning`). When `true`, `config.verificationMode` is `"full-reanalysis"` (scoped) and every `disagree` vote carries a non-null `disagreeBasis`.
 - `config.effectiveMaxRounds`: the integer the lead actually used after resolving the phase-aware default (`1` for `requirements-discovery`, `2` otherwise). MUST equal `config.maxRounds` when the manifest explicitly set it.
 - `findings[].ticketIds`: array of ticket keys from Phase 4 grouping (parsed per the Round 0 step 5 rule). MAY be empty when the discovering worker tagged the finding `unknown`.
 - `findings[].rounds[].votes.<worker>.verdict`: enum, one of `agree | disagree | supplement | verification-error`. Lower-case tokens; map upper-case AGREE/DISAGREE/SUPPLEMENT verdicts emitted by workers to their lower-case form before persisting. `verification-error` is reserved for terminal non-result dispatches (§"Worker failure handling in reverify").
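The schema rules require readers to accept all three historical versions and to treat any missing field as `null`. A minimal reader sketch under those rules follows; `read_convergence_config` is a hypothetical name, the returned field list covers only keys named in this diff, and rejecting unknown versions is an assumption the rules do not state.

```python
import json
from typing import Any

ACCEPTED_SCHEMA_VERSIONS = {"1.0", "1.1", "1.2"}


def read_convergence_config(text: str) -> dict[str, Any]:
    """Sketch: load a convergence state artifact, tolerating older schema versions."""
    artifact = json.loads(text)
    version = artifact.get("schemaVersion")
    if version not in ACCEPTED_SCHEMA_VERSIONS:
        # Assumption: versions outside the documented set are rejected.
        raise ValueError(f"unsupported convergence schemaVersion: {version!r}")
    config = artifact.get("config") or {}
    return {
        "schemaVersion": version,
        # Newer fields are absent from older artifacts, so .get() yields None (JSON null).
        "adversarial": config.get("adversarial"),
        "effectiveMaxRounds": config.get("effectiveMaxRounds"),
        "critic": config.get("critic"),
        "finalState": artifact.get("finalState"),
    }
```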
@@ -509,6 +512,66 @@ Schema rules:
 - `finalState ∈ {converged, max-rounds-reached, aborted-non-result}`. Assigned by the lead at WHILE-loop exit: `converged` when the queue is empty at the end of any round; `max-rounds-reached` when the loop exits because `roundIndex == effectiveMaxRounds` with the queue still non-empty; `aborted-non-result` when the loop exits via the Worker-failure BREAK (per the "Worker failure handling in reverify" section, rule 4). `aborted-non-result` is the new v1.1 value.
 - `totalRounds`: count of rounds actually executed (not `effectiveMaxRounds`). May be `0` when Round 0 produced no queue items (all findings reached consensus during grouping).

+## Coverage critic pass
+
+Runs only when `convergence.critic.enabled == true` (set by `--critic <provider>` or the okstra-run `critic_pick` step; default off). Applies to the three finding-producing phases (`requirements-discovery`, `error-analysis`, `implementation-planning`); for `final-verification` the critic runs in a different mode — see §"Acceptance critic pass (final-verification)". This pass targets **coverage** (missed findings), distinct from convergence which targets **agreement quality**.
+
+### When
+After Phase 5.5 finding convergence completes (findings classified) and BEFORE the Phase 6 report-writer dispatch.
+
+### Dispatch (reused worker)
+Dispatch ONE pass to the `config.critic.provider`'s existing subagent (`claude-worker` / `codex-worker` / `gemini-worker`) with `model = config.critic.modelExecutionValue` — no new agent type. If `config.critic.modelExecutionValue` is null/empty (model could not be resolved), skip the critic pass and record `critic-skipped: model-unresolved` in the convergence state rather than dispatching with no model. Result path: `runs/<task-type>/worker-results/<provider>-critic-<task-type>-<seq>.md`. The critic prompt seeds the consolidated findings and asks ONLY for coverage gaps:
+
+```
+You are the coverage critic for <task-key>. Below are the findings the workers
+already agreed on. Your ONLY job is to name what is MISSING:
+- files / directories / execution paths nobody inspected,
+- requirements or acceptance points with zero findings,
+- claims raised but never verified.
+For each gap, emit a NEW finding with evidence (file:line or the requirement quote).
+Do NOT restate an existing finding. If nothing is missing, say so explicitly.
+```
+
+### Gap verification (1 adversarial reverify round)
+Each critic gap enters the verification queue as a finding with `originWorker = "<provider>-critic"` and `source = "critic"`. The lead runs ONE adversarial reverify round (§"Adversarial Verification Mode" classifier) with the Phase 4 analysers (excluding the critic itself) as voters. Only gaps classified `full-consensus` / `partial-consensus` merge into the final report findings; `contested` / `worker-unique` gaps are treated as hallucinations and dropped (recorded in the convergence state, not promoted). If no non-critic analyser is available to vote, the gaps are surfaced as unverified `clarification` items rather than merged, and that fact is recorded.
+
+### State
+- `convergence.critic` manifest block: `{ enabled, provider, modelExecutionValue }`.
+- Convergence state artifact: critic gaps appear in `findings[]` with `source: "critic"`. Add a `config.critic` summary `{ provider, modelExecutionValue, gapsProposed, gapsMerged }`. `source` and `config.critic` are optional v1.2 fields (readers treat absence as null); no enum changes.
+
+## Acceptance critic pass (final-verification)
+
+The `final-verification` phase reuses the SAME reused-worker dispatch as §"Coverage critic pass" (provider + `config.critic.modelExecutionValue` from the `convergence.critic` block; default off; same model-unresolved skip rule). Only the prompt, the verification semantics, and the output sink differ — final-verification's findings are defects/blockers, so the critic acts as an **acceptance devil's advocate** (find reasons NOT to accept), and its candidate blockers are NEVER dropped (that would suppress real defects).
+
+### Prompt
+
+```
+You are the acceptance devil's advocate for <task-key>. The delivered work is about
+to be judged for acceptance. Your ONLY job is to find reasons it should NOT be
+accepted — surface candidate acceptance BLOCKERS the verifiers may have missed:
+- requirements / acceptance points with no covering evidence,
+- DB / IO / SQL changes lacking real-execution evidence,
+- regressions or broken error paths,
+- scope / contract violations.
+For each, emit a candidate blocker with a one-line statement, evidence (file:line /
+log / test output), and a severity (critical / major / minor). Do NOT restate an
+existing Acceptance Blocker. If you find none, say so explicitly.
+```
+
+### Verification — confirm-or-downgrade (BLOCKING)
+
+Each candidate blocker is verified by the Phase 4 analysers (excluding the critic). Do NOT use the adversarial finding classifier's "uncertain → reject" rule here.
+- **Confirmed** (an analyser reproduces it or cites supporting evidence) → promote to a `## 4 Acceptance Blockers` row (keep severity + recommended follow-up phase).
+- **Not confirmed** (cannot reproduce, or evidence is weak) → **downgrade to a Residual Risk row — never drop it.** Record the escalation trigger so the user can re-judge a high-severity-but-unconfirmed candidate.
+
+### Verdict impact
+
+Promoted blockers enter `## 4 Acceptance Blockers`; since `accepted` requires zero blockers, the verdict moves to `conditional-accept` / `blocked` automatically. The existing verdict↔blocker consistency validator (`validators/validate-run.py` `_validate_final_verification_consistency`) enforces this unchanged — no new enum or validator.
+
+### State
+
+Critic output lives at `runs/final-verification/worker-results/<provider>-critic-final-verification-<seq>.md`. The convergence state `config.critic` summary (see §"Coverage critic pass") records `mode: "acceptance-devils-advocate"`, `candidatesProposed`, `confirmedBlockers`, `downgradedToResidual` (optional v1.2 fields; readers treat absence as null).
+
 ## Output

 Information to be passed to Phase 6 after executing this skill:
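The two critic passes share one dispatch gate but route their output differently: coverage gaps survive only with consensus, while acceptance candidates are confirmed or downgraded, never dropped. A sketch of those rules follows. The function names and the outcome labels (`"merge"`, `"drop"`, `"acceptance-blocker"`, `"residual-risk"`, `"dispatch ..."`) are hypothetical; the classification names, the skip reason, and the summary keys are taken from the section text.

```python
from typing import Optional

MERGEABLE = {"full-consensus", "partial-consensus"}


def critic_dispatch_decision(critic_block: Optional[dict]) -> str:
    """Decide whether the reused-worker critic pass runs at all."""
    if not critic_block or not critic_block.get("enabled"):
        return "off"  # default: no critic was requested
    if not critic_block.get("modelExecutionValue"):
        return "critic-skipped: model-unresolved"
    return f"dispatch {critic_block['provider']}-worker"


def route_coverage_gap(classification: str, non_critic_voters: int) -> str:
    """Coverage critic: merge only consensus gaps; drop the rest as hallucinations."""
    if non_critic_voters == 0:
        return "clarification"  # surfaced as unverified, never merged
    return "merge" if classification in MERGEABLE else "drop"


def route_acceptance_candidate(confirmed: bool) -> str:
    """Acceptance critic: confirmed -> blocker; otherwise downgrade, never drop."""
    return "acceptance-blocker" if confirmed else "residual-risk"


def coverage_summary(classifications: list[str], non_critic_voters: int) -> dict:
    routes = [route_coverage_gap(c, non_critic_voters) for c in classifications]
    return {"gapsProposed": len(routes), "gapsMerged": routes.count("merge")}


def acceptance_summary(confirmations: list[bool]) -> dict:
    routes = [route_acceptance_candidate(c) for c in confirmations]
    return {
        "mode": "acceptance-devils-advocate",
        "candidatesProposed": len(routes),
        "confirmedBlockers": routes.count("acceptance-blocker"),
        "downgradedToResidual": routes.count("residual-risk"),
    }
```

The asymmetry is deliberate in the text: a dropped coverage gap only loses a speculative finding, whereas a dropped acceptance candidate could hide a real defect, so the acceptance path has no "drop" outcome.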
@@ -600,6 +663,16 @@ Worker non-result handling (`timeout`, `error`, no result file, wrapper `cli-fai

 Plan-body verification only supports **lightweight mode** (defined in §"Verification Mode" above). `full-reanalysis` is not meaningful here because the "original source materials" for a plan item are the worker's own analysis plus the lead-mediated synthesis — there is no independent ground truth to re-read. The manifest's top-level `verificationMode` is ignored for this round; lightweight is always used.

+### Adversarial plan-body posture
+
+When `config.adversarial == true` (the default for `implementation-planning`; see the top-level §"Configuration" table), the plan-body round runs with an **adversarial posture**. The classification rules and gate arithmetic in §"Round protocol" are UNCHANGED — `majority-disagree` (a *majority* of analysers DISAGREE) remains the only classification that blocks the Approval marker, and `dissent-isolated` still passes the gate. Adversarial mode changes only *how each verifier evaluates an item*:
+
+- The burden of proof sits on the plan: an item earns `AGREE` only if the verifier actively tried to break it and could not.
+- The verifier MUST open the file paths / symbols / commands the item cites and confirm they exist and are executable as written. This is the one allowed widening of the lightweight "judge from internal consistency and stated commands / paths" rule — confirming the existence of cited paths is not "re-analyzing the original requirements".
+- If a cited path / command / validation signal cannot be confirmed, the verifier responds `DISAGREE(<kind>)` with the applicable breakage kind (a–e); uncertainty resolves toward DISAGREE, not AGREE.
+
+Plan-body verification stays **lightweight** even under this posture — the `verificationMode = "full-reanalysis"` forcing in §"Adversarial Verification Mode" applies to finding convergence only (see §"Mode constraint"); the adversarial posture here only changes verifier behaviour, not the mode. This raises verification *quality* (active refutation, plan-side burden) without changing the gate *threshold* — a single dissent still does not block approval; a majority is required (deliberate design decision).
+
 ### Round protocol (single round at default `maxRounds=1`)

 1. Lead parses the report-writer draft and extracts the `P-*` plan items.
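Under the adversarial posture only the per-verifier evaluation changes; the gate still blocks the Approval marker only on `majority-disagree`. The sketch below separates the two concerns. It assumes that "majority" means strictly more than half of the verdicts, treats any smaller non-zero dissent as `dissent-isolated`, and uses a hypothetical `no-dissent` label for items with no DISAGREE votes; the authoritative classification list is the one in §"Round protocol".

```python
from typing import Optional


def adversarial_verdict(refs_confirmed: bool, breakage_kind: Optional[str]) -> str:
    """breakage_kind is one of the (a-e) kinds from the round protocol, or None."""
    if breakage_kind is None and refs_confirmed:
        # Refutation attempts failed and every cited path/command was confirmed.
        return "AGREE"
    if breakage_kind is None:
        # Unconfirmed references resolve toward DISAGREE, which needs a kind.
        raise ValueError("an unconfirmed reference still needs a breakage kind (a-e)")
    return f"DISAGREE({breakage_kind})"


def classify_item(verdicts: list[str]) -> str:
    disagree = sum(1 for v in verdicts if v.startswith("DISAGREE"))
    if 2 * disagree > len(verdicts):
        return "majority-disagree"
    if disagree:
        return "dissent-isolated"
    return "no-dissent"  # hypothetical label for the all-AGREE case


def approval_gate_passes(items: dict[str, list[str]]) -> bool:
    """Only a majority-disagree item blocks the Approval marker."""
    return all(classify_item(v) != "majority-disagree" for v in items.values())
```

With three analysers, one `DISAGREE` leaves the gate open and two close it; the adversarial posture makes `DISAGREE` easier to earn per verifier without moving that threshold.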
@@ -719,6 +792,8 @@ or worker analyses for this round.
 ...
 ```

+When `config.adversarial == true`, the lead prepends the adversarial framing from §"Adversarial plan-body posture" to the `## Instructions` block: the burden of proof is on the plan, the verifier opens and confirms every cited path / command, and an item whose cited references cannot be confirmed is answered `DISAGREE(<kind>)` rather than `AGREE`. The verdict tokens, breakage kinds (a–e), classification, and the majority gate threshold are unchanged. This prepended framing supersedes the template's "Judge solely from plan internal consistency" instruction for the adversarial round.
+
 The "Reverify prompt: required-reading suppression (BLOCKING)" rule (lightweight mode does NOT inject a `[Required reading]` clause) applies here as well.

 ### Worker non-result handling in plan-body round (BLOCKING)
@@ -160,6 +160,7 @@ okstra render-bundle \
 --task-type "<args.task-type>" \
 --task-brief "<args.task-brief>" \
 --executor "<args.executor>" \
+--critic "<args.critic>" \
 --approved-plan "<args.approved-plan>" \
 --stage "<args.stage>" \
 --base-ref "<args.base-ref>" \