okstra 0.47.0 → 0.48.0

@@ -44,6 +44,8 @@
44
44
  3. **Coverage check** — every requirement in the originating plan/task brief is either marked covered (with artifact) or listed as a blocker. No silent omissions.
45
45
  4. **Verifier dissent preserved** — if workers reach different verdicts, the disagreement is visible in section 1.2; synthesis hides nothing.
46
46
  5. **No source-mutation audit** — scan the run's session transcripts for Edit / Write or state-mutating Bash commands that touch paths OUTSIDE `<PROJECT_ROOT>/.okstra/**` and outside the assigned run-artifact paths. Writes to worker prompts, audit sidecars, team-state, the final-report `data.json`, and rendered reports under the run directory are allowed okstra artifacts. Any source/schema/deployment mutation means the run has crossed into implementation and MUST be re-routed; do NOT silently strip the evidence.
47
+ - Cross-verification mode:
48
+ - **Acceptance critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker **acceptance devil's-advocate** pass runs after convergence to surface candidate acceptance blockers the verifiers may have missed. Each candidate is verified **confirm-or-downgrade**: confirmed → an `Acceptance Blockers` row (which, since `accepted` requires zero blockers, moves the verdict to `conditional-accept` / `blocked`); unconfirmed → a `Residual Risk` row (never dropped). See `skills/okstra-convergence/SKILL.md` "Acceptance critic pass (final-verification)".
47
49
  - Non-goals:
48
50
  - proposing unrelated refactors beyond the delivered scope
49
51
  - **source code edits, follow-up bug fixes, or scope expansion** — this run renders a verdict only; defects detected here become inputs to a new `error-analysis` or `implementation-planning` run
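The confirm-or-downgrade rule described in the added "Acceptance critic" bullet can be sketched as follows. This is a minimal illustration, not okstra's implementation: the `Candidate` type, the `route_candidates` helper, and the `can_be_accepted` key are hypothetical names introduced here. Only the two sink names (`Acceptance Blockers`, `Residual Risk`) and the "accepted requires zero blockers" rule come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical shape of one critic-proposed acceptance blocker."""
    statement: str
    confirmed: bool  # outcome of the verification step (assumed to be a boolean)


def route_candidates(candidates: list[Candidate], existing_blockers: int = 0) -> dict:
    """Route every candidate to exactly one sink; nothing is dropped."""
    blockers = [c.statement for c in candidates if c.confirmed]
    residual = [c.statement for c in candidates if not c.confirmed]
    return {
        "Acceptance Blockers": blockers,
        "Residual Risk": residual,
        # `accepted` requires zero blockers; which non-accepted verdict applies
        # (conditional-accept vs blocked) is not modelled in this sketch.
        "can_be_accepted": existing_blockers + len(blockers) == 0,
    }
```

A candidate that fails confirmation still shows up under `Residual Risk`, which is what separates this pass from the coverage critic, where unverified gaps are dropped.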
@@ -37,6 +37,10 @@
37
37
  - recommended execution order
38
38
  - Approval gate (phase-specific addendum to shared authority rule):
39
39
  - The YAML frontmatter `approved: true|false` field is the only authorised approval gate. report-writer always emits `approved: false`. The user clears it either by (a) editing the frontmatter line to `approved: true` directly, or (b) invoking the next phase with `--approve` so the CLI flips the frontmatter on the user's behalf. `okstra_ctl.run._validate_approved_plan` reads this field and refuses entry until it is `true`.
40
+ - Cross-verification mode:
41
+ - Phase 5.5 finding convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker finding (requirement gap / risk / option) by re-inspecting its cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode".
42
+ - §4.5.9 plan-body verification runs with an **adversarial posture** (`skills/okstra-convergence/SKILL.md` §"Adversarial plan-body posture"): verifiers open and confirm every cited path / command and put the burden of proof on the plan. The gate threshold is unchanged — a *majority* `DISAGREE` (`majority-disagree`) is still required to block approval; a single dissent does not.
43
+ - **Coverage critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker critic pass runs after convergence to surface missed findings; its gaps are merged only after a 1-round adversarial reverify. See `skills/okstra-convergence/SKILL.md` "Coverage critic pass".
40
44
  - Non-goals:
41
45
  - code-level micro-optimization unless it changes the implementation approach
42
46
  - **source code edits of any kind** — this run produces a plan document only; Edit/Write on project source files is forbidden until the plan is approved and a separate `implementation` run starts
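The unchanged gate threshold for §4.5.9 (a majority `DISAGREE` blocks, a single dissent does not) can be illustrated with a small sketch. Several points here are assumptions rather than facts from the diff: verdicts are treated as one flat list per round, "majority" is read as strictly more than half, and a non-blocking dissent is mapped to `passed-with-dissent`. The `aborted-non-result` outcome and any per-item tallying defined in `skills/okstra-convergence/SKILL.md` are not modelled.

```python
from collections import Counter


def classify_gate(verdicts: list[str]) -> str:
    """Hypothetical reduction of the plan-body gate to a single vote tally."""
    if not verdicts:
        # The real protocol has a dedicated non-result outcome; this sketch
        # simply refuses to classify an empty round.
        raise ValueError("no verdicts to classify")
    counts = Counter(v.strip().upper() for v in verdicts)
    disagree = counts["DISAGREE"]
    if disagree * 2 > len(verdicts):  # strict majority (assumption)
        return "blocked-by-disagreement"
    if disagree > 0:
        return "passed-with-dissent"
    return "passed"
```

Under these assumptions the adversarial posture changes how verifiers reach a verdict, not how the verdicts are counted.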
@@ -74,7 +78,7 @@
74
78
  - the YAML frontmatter MUST include the line `approved: false` (report-writer always emits the unflipped value). The user authorises the next `implementation` run by flipping it to `approved: true` (manual edit or `--approve` CLI). Do NOT recreate any `User Approval Request` body block — the validator fails reports that contain one (see `validators/validate-run.py` deprecated patterns).
75
79
  - **the frontmatter `approved: false` line is rendered unconditionally; if the plan-body verification gate (§4.5.9) returns `blocked-by-disagreement` or `aborted-non-result`, the writer MUST keep `approved: false` and the validator refuses any report that ships with `approved: true` under such a gate result.**
76
80
  - every ambiguity flagged during pre-planning that the user must resolve before approval registered as a `Blocks=approval` row in the `## 5. Clarification Items` table (do NOT create a separate `Open Questions` block under `4.5.x` — the unified table is the single home)
77
- - **§4.5.9 Plan Body Verification (BLOCKING).** After report-writer finishes the draft, the lead MUST run a worker peer-review round on the consolidated plan body (sections 4.5.1 – 4.5.7) and populate `### 4.5.9 Plan Body Verification` in the final report. The round protocol, plan-item ID scheme (`P-Opt-*` / `P-Step-*` / `P-Dep-*` / `P-Val-*` / `P-Rb-*`), verdict semantics, gate-result classification, and dissent log format are defined in `skills/okstra-convergence/SKILL.md` "Plan-body verification mode". The four gate-result values are `passed`, `passed-with-dissent`, `blocked-by-disagreement`, `aborted-non-result`. When the gate would have been `blocked-by-disagreement` or `aborted-non-result`, the lead MUST NOT silently flip it to one of the passing values to "unblock" the run — that is a contract violation.
81
+ - **§4.5.9 Plan Body Verification (BLOCKING).** After report-writer finishes the draft, the lead MUST run a worker peer-review round on the consolidated plan body (sections 4.5.1 – 4.5.7) and populate `### 4.5.9 Plan Body Verification` in the final report. The round protocol, plan-item ID scheme (`P-Opt-*` / `P-Step-*` / `P-Dep-*` / `P-Val-*` / `P-Rb-*`), verdict semantics, gate-result classification, and dissent log format are defined in `skills/okstra-convergence/SKILL.md` "Plan-body verification mode". The four gate-result values are `passed`, `passed-with-dissent`, `blocked-by-disagreement`, `aborted-non-result`. When the gate would have been `blocked-by-disagreement` or `aborted-non-result`, the lead MUST NOT silently flip it to one of the passing values to "unblock" the run — that is a contract violation. When `convergence.adversarial=true` (the default for this phase), this round uses the adversarial posture — verifiers confirm cited paths/commands and the burden of proof is on the plan — but the gate threshold stays `majority-disagree` (see that skill's §"Adversarial plan-body posture").
78
82
  - **Decision-record evaluation (sole owner)**: this phase is the **single owner** of decision-record evaluation in the okstra lifecycle. The brief never evaluates or drafts decision records — it only forwards `adr-candidate:*` signals. Every `adr-candidate:*` entry inherited from the brief's `Open Questions` is a mandatory evaluation target. In addition, evaluate every decision the recommended option introduces against the three criteria:
79
83
  1. **Hard to reverse** — would changing the decision later cost meaningfully more than deciding now?
80
84
  2. **Surprising without context** — would a future reader, seeing only the code, wonder "why was it built this way?"?
@@ -53,6 +53,7 @@
53
53
  - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <human-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only a human can answer this" (reporter intent, business priority, external authority). A row with `none` that *could* have been answered by the codebase is a defect.
54
54
  - Cross-verification mode:
55
55
  - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker's finding by directly re-inspecting the cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
56
+ - **Coverage critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker critic pass runs after convergence to surface missed findings; its gaps are merged only after a 1-round adversarial reverify. See `skills/okstra-convergence/SKILL.md` "Coverage critic pass".
56
57
  - Non-goals:
57
58
  - full implementation design unless it is required to decide the next phase
58
59
  - **source code edits, plan authoring, builds, or deployments** — this run only classifies the work and routes it; deeper analysis and planning belong to subsequent phases
@@ -228,6 +228,19 @@
228
228
  "_DEFAULT_SUFFIX": " (default)"
229
229
  }
230
230
  },
231
+ "critic_pick": {
232
+ "label": "Run an additional critic pass? (an opt-in verification pass that digs for missed findings/blockers)",
233
+ "echo_template": "critic: {value}",
234
+ "options": {
235
+ "off": "Do not use (default, recommended)",
236
+ "claude": "claude critic (recommended)",
237
+ "__free_input__": "Enter manually (codex / gemini)"
238
+ }
239
+ },
240
+ "critic_text": {
241
+ "label": "Enter the critic provider manually (codex / gemini)",
242
+ "echo_template": "critic: {value}"
243
+ },
231
244
  "defaults_or_custom": {
232
245
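  "label": "This step decides which model each role uses (it does not change which workers participate).\n· Proceed with defaults: keep lead, executor/worker, and report-writer on the recommended models and start right away.\n· Customize: choose the model for each role yourself, and also set an extra directive and related tasks.",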
233
246
  "echo_template": "customize: {value}",
@@ -903,26 +903,47 @@ def _build_convergence_block(ctx: dict) -> dict:
903
903
  - `enabled` default True
904
904
  - `maxRounds` default 1 for `requirements-discovery`, 2 otherwise
905
905
  - `verificationMode` default "lightweight"
906
- - `adversarial` default True for `requirements-discovery` / `error-analysis`
907
- (forces `verificationMode` to "full-reanalysis"), False otherwise
906
+ - `adversarial` default True for `requirements-discovery` / `error-analysis` /
907
+ `implementation-planning` (forces `verificationMode` to "full-reanalysis"),
908
+ False otherwise
908
909
  - `planBodyVerification` is implementation-planning specific; the key is
909
910
  always emitted (dead-letter on other phases) so the schema stays stable.
910
911
 
911
912
  ctx knobs honoured:
912
913
  - `OKSTRA_PLAN_VERIFICATION`: "true" | "false" | "" (empty → default True).
913
914
  Wired from CLI `--no-plan-verification` (sets "false").
915
+ - `CRITIC_CHOICE`: "" | "off" | "claude" | "codex" | "gemini" — critic
916
+ backing provider (enabled only for requirements-discovery / error-analysis /
917
+ implementation-planning / final-verification); model taken from that
918
+ provider's execution value.
914
919
  """
915
920
  task_type = ctx.get("TASK_TYPE", "")
916
921
  default_max_rounds = 1 if task_type == "requirements-discovery" else 2
917
- adversarial_phases = {"requirements-discovery", "error-analysis"}
922
+ adversarial_phases = {"requirements-discovery", "error-analysis", "implementation-planning"}
918
923
  is_adversarial = task_type in adversarial_phases
919
924
  raw_plan_verify = (ctx.get("OKSTRA_PLAN_VERIFICATION", "") or "").strip().lower()
920
925
  plan_verify_enabled = raw_plan_verify != "false"
926
+ critic_choice = (ctx.get("CRITIC_CHOICE", "") or "").strip().lower()
927
+ # Independent of `adversarial_phases` above (they answer different questions and
928
+ # may diverge): the coverage critic is opt-in for the finding-producing phases.
929
+ critic_phases = {"requirements-discovery", "error-analysis", "implementation-planning", "final-verification"}
930
+ critic_exec_key = {
931
+ "claude": "CLAUDE_WORKER_MODEL_EXECUTION_VALUE",
932
+ "codex": "CODEX_WORKER_MODEL_EXECUTION_VALUE",
933
+ "gemini": "GEMINI_WORKER_MODEL_EXECUTION_VALUE",
934
+ }
935
+ critic_enabled = critic_choice in critic_exec_key and task_type in critic_phases
936
+ critic_block = {
937
+ "enabled": critic_enabled,
938
+ "provider": critic_choice if critic_enabled else None,
939
+ "modelExecutionValue": (ctx.get(critic_exec_key[critic_choice]) or None) if critic_enabled else None,
940
+ }
921
941
  return {
922
942
  "enabled": True,
923
943
  "adversarial": is_adversarial,
924
944
  "maxRounds": default_max_rounds,
925
945
  "verificationMode": "full-reanalysis" if is_adversarial else "lightweight",
946
+ "critic": critic_block,
926
947
  "planBodyVerification": {
927
948
  "enabled": plan_verify_enabled,
928
949
  "maxRounds": 1,
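To make the new defaults easier to check, here is a stand-alone reduction of the logic in the hunk above. The function name `resolve_convergence` is hypothetical and the sketch omits `maxRounds` and `planBodyVerification`. The phase sets and context keys are copied from the diff.

```python
# Phase sets and ctx keys are taken from the hunk above; the function itself
# is an illustrative reduction, not the package's `_build_convergence_block`.
ADVERSARIAL_PHASES = {"requirements-discovery", "error-analysis", "implementation-planning"}
# Kept as a separate set on purpose: the diff notes the two may diverge.
CRITIC_PHASES = {
    "requirements-discovery", "error-analysis",
    "implementation-planning", "final-verification",
}
CRITIC_EXEC_KEY = {
    "claude": "CLAUDE_WORKER_MODEL_EXECUTION_VALUE",
    "codex": "CODEX_WORKER_MODEL_EXECUTION_VALUE",
    "gemini": "GEMINI_WORKER_MODEL_EXECUTION_VALUE",
}


def resolve_convergence(ctx: dict) -> dict:
    task_type = ctx.get("TASK_TYPE", "")
    choice = (ctx.get("CRITIC_CHOICE", "") or "").strip().lower()
    adversarial = task_type in ADVERSARIAL_PHASES
    # "" and "off" are not keys of CRITIC_EXEC_KEY, so both leave the critic disabled.
    enabled = choice in CRITIC_EXEC_KEY and task_type in CRITIC_PHASES
    # The lookup only runs when `enabled` is true, so `choice` is always a valid key.
    model = (ctx.get(CRITIC_EXEC_KEY[choice]) or None) if enabled else None
    return {
        "adversarial": adversarial,
        "verificationMode": "full-reanalysis" if adversarial else "lightweight",
        "critic": {
            "enabled": enabled,
            "provider": choice if enabled else None,
            "modelExecutionValue": model,
        },
    }
```

Two consequences follow from this shape: `final-verification` can enable the critic without being adversarial, and a valid provider on a non-critic phase such as `implementation` resolves to a disabled block rather than an error.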
@@ -120,6 +120,7 @@ class PrepareInputs:
120
120
  gemini_model: str = ""
121
121
  report_writer_model: str = ""
122
122
  executor: str = ""
123
+ critic: str = ""
123
124
  related_tasks_raw: str = ""
124
125
  work_category: str = ""
125
126
  base_ref: str = ""
@@ -499,6 +500,7 @@ def _canonical_argv(inp: PrepareInputs, ctx: dict) -> list[str]:
499
500
  ("--gemini-model", inp.gemini_model or ctx.get("GEMINI_WORKER_MODEL", "")),
500
501
  ("--report-writer-model", inp.report_writer_model or ctx.get("REPORT_WRITER_MODEL", "")),
501
502
  ("--executor", inp.executor or ctx.get("EXECUTOR_PROVIDER", "")),
503
+ ("--critic", inp.critic or ctx.get("CRITIC_CHOICE", "")),
502
504
  ("--related-tasks", inp.related_tasks_raw),
503
505
  ("--work-category", inp.work_category),
504
506
  ]
@@ -707,6 +709,13 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
707
709
  default_display=report_writer_default, default_execution=report_writer_default,
708
710
  )
709
711
 
712
+ # ---- coverage critic choice (validated; phase-gating happens in render) ----
713
+ critic_choice = (inp.critic or "").strip().lower()
714
+ if critic_choice not in ("", "off", "claude", "codex", "gemini"):
715
+ raise PrepareError(
716
+ f"--critic must be one of: off, claude, codex, gemini (got: {critic_choice!r})"
717
+ )
718
+
710
719
  # ---- executor binding (implementation phase only; recorded universally for manifest consistency) ----
711
720
  executor_default = _default("OKSTRA_DEFAULT_EXECUTOR", "claude")
712
721
  executor_provider = (inp.executor or executor_default).strip().lower()
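The validation added to `prepare_task_bundle` can be exercised in isolation with a sketch like the one below. `normalize_critic` is a hypothetical helper name, and the `PrepareError` class defined here is a stand-in for the package's own exception, whose base class the diff does not show.

```python
ALLOWED_CRITIC_VALUES = ("", "off", "claude", "codex", "gemini")


class PrepareError(ValueError):
    """Stand-in for the package's PrepareError (base class assumed)."""


def normalize_critic(raw) -> str:
    """Trim and lowercase the raw --critic value, rejecting unknown providers."""
    choice = (raw or "").strip().lower()
    if choice not in ALLOWED_CRITIC_VALUES:
        raise PrepareError(
            f"--critic must be one of: off, claude, codex, gemini (got: {choice!r})"
        )
    return choice
```

As the comment in the hunk says, this step only validates the value. Whether the critic actually runs for a given phase is decided later, when the convergence block is rendered.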
@@ -842,6 +851,7 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
842
851
  "EXECUTOR_WORKER_AGENT": executor_worker_agent,
843
852
  "EXECUTOR_MODEL_DISPLAY": executor_model_meta.display,
844
853
  "EXECUTOR_MODEL_EXECUTION_VALUE": executor_model_meta.execution,
854
+ "CRITIC_CHOICE": critic_choice,
845
855
  "RELATED_TASKS_JSON": related_tasks_json_str,
846
856
  "RELATED_TASKS_BULLETS": bullets,
847
857
  "RELATED_TASKS_INLINE": inline,
@@ -1098,6 +1108,7 @@ def main(argv: list[str]) -> int:
1098
1108
  p.add_argument("--gemini-model", default="")
1099
1109
  p.add_argument("--report-writer-model", default="")
1100
1110
  p.add_argument("--executor", default="")
1111
+ p.add_argument("--critic", default="")
1101
1112
  p.add_argument("--related-tasks", default="", dest="related_tasks_raw")
1102
1113
  p.add_argument("--approved-plan", default="", dest="approved_plan_path")
1103
1114
  p.add_argument(
@@ -1198,6 +1209,7 @@ def main(argv: list[str]) -> int:
1198
1209
  gemini_model=args.gemini_model,
1199
1210
  report_writer_model=args.report_writer_model,
1200
1211
  executor=args.executor,
1212
+ critic=args.critic,
1201
1213
  related_tasks_raw=args.related_tasks_raw,
1202
1214
  work_category=args.work_category,
1203
1215
  base_ref=args.base_ref,
@@ -181,6 +181,8 @@ S_APPROVED_PLAN_PICK = "approved_plan_pick"
181
181
  S_APPROVED_PLAN = "approved_plan"
182
182
  S_STAGE_PICK = "stage_pick"
183
183
  S_EXECUTOR = "executor"
184
+ S_CRITIC_PICK = "critic_pick"
185
+ S_CRITIC_TEXT = "critic_text"
184
186
  S_DEFAULTS_OR_CUSTOM = "defaults_or_custom"
185
187
  S_WORKERS_OVERRIDE = "workers_override"
186
188
  S_LEAD_MODEL = "lead_model"
@@ -246,6 +248,8 @@ class WizardState:
246
248
  approved_plan_pending_text: bool = False
247
249
  selected_stage: str = "auto"
248
250
  executor: str = ""
251
+ critic: str = ""
252
+ critic_pending_text: bool = False
249
253
 
250
254
  # customize
251
255
  use_defaults: Optional[bool] = None
@@ -1459,6 +1463,55 @@ def _submit_pr_template_pick(state: WizardState, value: str) -> Optional[str]:
1459
1463
  )
1460
1464
 
1461
1465
 
1466
+ CRITIC_CHOICES = ["off", "claude", "codex", "gemini"]
1467
+
1468
+
1469
+ def _build_critic_pick(state: WizardState) -> Prompt:
1470
+ t = _p(state.workspace_root, "critic_pick")
1471
+ options: list[Option] = []
1472
+ for k, v in t["options"].items():
1473
+ if not k.startswith("_"):
1474
+ options.append(_opt(k, v))
1475
+ custom_label = t["options"].get(PICK_TYPE_CUSTOM, PICK_TYPE_CUSTOM)
1476
+ options.append(_opt(PICK_TYPE_CUSTOM, custom_label))
1477
+ return Prompt(
1478
+ step=S_CRITIC_PICK, kind="pick",
1479
+ label=t["label"],
1480
+ options=options,
1481
+ echo_template=t["echo_template"],
1482
+ )
1483
+
1484
+
1485
+ def _submit_critic_pick(state: WizardState, value: str) -> Optional[str]:
1486
+ if value == PICK_TYPE_CUSTOM:
1487
+ state.critic_pending_text = True
1488
+ return None
1489
+ choice = (value or "").strip().lower()
1490
+ if choice not in CRITIC_CHOICES:
1491
+ raise WizardError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
1492
+ state.critic = choice
1493
+ state.critic_pending_text = False
1494
+ return f"critic: {choice}"
1495
+
1496
+
1497
+ def _build_critic_text(state: WizardState) -> Prompt:
1498
+ t = _p(state.workspace_root, "critic_text")
1499
+ return Prompt(
1500
+ step=S_CRITIC_TEXT, kind="text",
1501
+ label=t["label"],
1502
+ echo_template=t["echo_template"],
1503
+ )
1504
+
1505
+
1506
+ def _submit_critic_text(state: WizardState, value: str) -> Optional[str]:
1507
+ choice = (value or "").strip().lower()
1508
+ if choice not in CRITIC_CHOICES:
1509
+ raise WizardError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
1510
+ state.critic = choice
1511
+ state.critic_pending_text = False
1512
+ return f"critic: {choice}"
1513
+
1514
+
1462
1515
  def _build_executor(state: WizardState) -> Prompt:
1463
1516
  t = _p(state.workspace_root, "executor")
1464
1517
  default_suffix = t["options"].get("_DEFAULT_SUFFIX", "")
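The pick step and the free-text step added above share the same validation, so the two-step flow can be sketched with one helper. This is an illustration under assumptions: `State` is a cut-down stand-in for `WizardState`, `FREE_INPUT` assumes that `PICK_TYPE_CUSTOM` equals the `__free_input__` key seen in the prompt JSON, and `ValueError` replaces `WizardError`.

```python
from dataclasses import dataclass
from typing import Optional

CRITIC_CHOICES = ["off", "claude", "codex", "gemini"]
FREE_INPUT = "__free_input__"  # assumed value of PICK_TYPE_CUSTOM


@dataclass
class State:
    """Minimal stand-in for the two critic fields on WizardState."""
    critic: str = ""
    critic_pending_text: bool = False


def _accept(state: State, value: str) -> str:
    """Shared validation used by both steps in this sketch."""
    choice = (value or "").strip().lower()
    if choice not in CRITIC_CHOICES:
        raise ValueError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
    state.critic = choice
    state.critic_pending_text = False
    return f"critic: {choice}"


def submit_pick(state: State, value: str) -> Optional[str]:
    if value == FREE_INPUT:
        # Defer to the free-text step; nothing is echoed yet.
        state.critic_pending_text = True
        return None
    return _accept(state, value)


def submit_text(state: State, value: str) -> str:
    return _accept(state, value)
```

Note that the free-text step accepts every value in `CRITIC_CHOICES`, including `off` and `claude`, even though its label only advertises `codex` and `gemini`.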
@@ -1922,6 +1975,17 @@ STEPS: list[Step] = [
1922
1975
  and not s.executor),
1923
1976
  build=_build_executor, submit=_submit_executor,
1924
1977
  owns=("executor",)),
1978
+ Step(S_CRITIC_PICK,
1979
+ applies=lambda s: (s.task_type in ("requirements-discovery", "error-analysis", "implementation-planning", "final-verification")
1980
+ and not s.critic
1981
+ and not s.critic_pending_text
1982
+ and S_CRITIC_PICK not in s.answered),
1983
+ build=_build_critic_pick, submit=_submit_critic_pick,
1984
+ owns=("critic", "critic_pending_text")),
1985
+ Step(S_CRITIC_TEXT,
1986
+ applies=lambda s: (s.critic_pending_text and S_CRITIC_TEXT not in s.answered),
1987
+ build=_build_critic_text, submit=_submit_critic_text,
1988
+ owns=("critic", "critic_pending_text")),
1925
1989
  Step(S_DEFAULTS_OR_CUSTOM,
1926
1990
  applies=lambda s: (_identity_ready(s)
1927
1991
  and s.use_defaults is None),
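The two `applies` predicates registered above decide when each critic prompt is shown. A compact way to reason about them is a single function that returns the step that would fire next. The `State` class and `next_critic_step` are hypothetical names. The conditions themselves mirror the lambdas in the hunk.

```python
from dataclasses import dataclass, field
from typing import Optional

CRITIC_PHASES = (
    "requirements-discovery", "error-analysis",
    "implementation-planning", "final-verification",
)


@dataclass
class State:
    """Cut-down stand-in for the WizardState fields the predicates read."""
    task_type: str = ""
    critic: str = ""
    critic_pending_text: bool = False
    answered: set = field(default_factory=set)


def next_critic_step(s: State) -> Optional[str]:
    """Return which critic step applies, checking in the STEPS list order."""
    if (s.task_type in CRITIC_PHASES
            and not s.critic
            and not s.critic_pending_text
            and "critic_pick" not in s.answered):
        return "critic_pick"
    if s.critic_pending_text and "critic_text" not in s.answered:
        return "critic_text"
    return None
```

Because the pick step requires an empty `critic`, a value pre-seeded through `init --critic` suppresses the prompt entirely.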
@@ -2118,7 +2182,8 @@ _FIELD_DEFAULTS: dict[str, Any] = {
2118
2182
  "base_ref_pending_text": False, "approved_plan_path": "",
2119
2183
  "approved_plan_pending_text": False,
2120
2184
  "selected_stage": "auto",
2121
- "executor": "", "use_defaults": None, "workers_override": "",
2185
+ "executor": "", "critic": "", "critic_pending_text": False,
2186
+ "use_defaults": None, "workers_override": "",
2122
2187
  "lead_model": "", "claude_model": "", "codex_model": "",
2123
2188
  "gemini_model": "", "report_writer_model": "", "directive": "",
2124
2189
  "directive_pending_text": False,
@@ -2200,6 +2265,7 @@ def render_args(state: WizardState) -> dict[str, str]:
2200
2265
  "task-type": state.task_type,
2201
2266
  "task-brief": state.brief_path,
2202
2267
  "executor": state.executor,
2268
+ "critic": state.critic,
2203
2269
  "approved-plan": state.approved_plan_path,
2204
2270
  "stage": (state.selected_stage or "auto") if state.task_type == "implementation" else "",
2205
2271
  "base-ref": base_ref,
@@ -2244,6 +2310,8 @@ def confirmation_block(state: WizardState) -> str:
2244
2310
  if state.report_writer_model:
2245
2311
  lines.append(f" report-writer : {state.report_writer_model}")
2246
2312
  lines.append(f" directive : {state.directive or '(none)'}")
2313
+ if state.task_type in ("requirements-discovery", "error-analysis", "implementation-planning", "final-verification"):
2314
+ lines.append(f" critic : {state.critic or '(off)'}")
2247
2315
  if state.task_type == "implementation":
2248
2316
  lines.append(f" approved-plan : {state.approved_plan_path}")
2249
2317
  if state.clarification_response_path:
@@ -2288,6 +2356,7 @@ def _cli(argv: list[str]) -> int:
2288
2356
  p_init.add_argument("--workspace-root", required=True)
2289
2357
  p_init.add_argument("--project-root", required=True)
2290
2358
  p_init.add_argument("--project-id", required=True)
2359
+ p_init.add_argument("--critic", default="")
2291
2360
 
2292
2361
  p_step = sub.add_parser("step")
2293
2362
  p_step.add_argument("--state-file", required=True)
@@ -2313,6 +2382,8 @@ def _cli(argv: list[str]) -> int:
2313
2382
  project_root=args.project_root,
2314
2383
  project_id=args.project_id,
2315
2384
  )
2385
+ if args.critic:
2386
+ state.critic = args.critic
2316
2387
  save_state_file(state_path, state)
2317
2388
  first = next_prompt(state)
2318
2389
  print(json.dumps({"ok": True, "next": first.to_json()},
@@ -17,8 +17,11 @@ user-invocable: false
17
17
  - [Round 1-N: Re-verification Loop (queue-pruned)](#round-1-n-re-verification-loop-queue-pruned)
18
18
  - [Convergence Test](#convergence-test)
19
19
  - [Verification Mode](#verification-mode)
20
+ - [Adversarial Verification Mode](#adversarial-verification-mode)
20
21
  - [Re-verification Agent Dispatch](#re-verification-agent-dispatch)
21
22
  - [Convergence State Artifact](#convergence-state-artifact)
23
+ - [Coverage critic pass](#coverage-critic-pass)
24
+ - [Acceptance critic pass (final-verification)](#acceptance-critic-pass-final-verification)
22
25
  - [Output](#output)
23
26
  - [Convergence Disabled](#convergence-disabled)
24
27
  - [Plan-body verification mode (implementation-planning only)](#plan-body-verification-mode-implementation-planning-only)
@@ -46,7 +49,7 @@ Configure this in the `convergence` block of `task-manifest.json`. If the block
46
49
  | `enabled` | `true` | If `false`, skip the convergence loop and use the existing consensus/divergence method |
47
50
  | `maxRounds` | phase-aware: `1` for `requirements-discovery`, `2` otherwise (range 1–3) | Maximum number of re-verification rounds. Discovery's routing/missing-input outputs gain little from a second round; other phases (especially `error-analysis`) keep `2`. Lead resolves the effective value when the manifest omits the key and records it in `config.maxRounds` of the convergence state artifact. |
48
51
  | `verificationMode` | `"lightweight"` | `"lightweight"` or `"full-reanalysis"` |
49
- | `adversarial` | phase-aware: `true` for `requirements-discovery` / `error-analysis`, `false` otherwise | When `true`, Phase 5.5 runs in **adversarial mode** (see §"Adversarial Verification Mode"): verifiers actively try to refute each finding, the burden of proof sits on the claim, and `verificationMode` is forced to `"full-reanalysis"` scoped to the finding's cited evidence. Resolved by `scripts/okstra_ctl/render.py` `_build_convergence_block` and recorded in `config.adversarial` of the convergence state artifact. |
52
+ | `adversarial` | phase-aware: `true` for `requirements-discovery` / `error-analysis` / `implementation-planning`, `false` otherwise | When `true`, Phase 5.5 runs in **adversarial mode** (see §"Adversarial Verification Mode"): verifiers actively try to refute each finding, the burden of proof sits on the claim, and `verificationMode` is forced to `"full-reanalysis"` scoped to the finding's cited evidence. Resolved by `scripts/okstra_ctl/render.py` `_build_convergence_block` and recorded in `config.adversarial` of the convergence state artifact. |
50
53
 
51
54
  **Auto-disable rule (BLOCKING).** Convergence requires ≥2 analyser workers to produce a meaningful consensus tally. When the active profile's `Required workers:` block (see `prompts/profiles/*.md`) resolves to fewer than 2 analyser workers, e.g. `release-handoff` (zero analyser workers, lead-only), the lead MUST treat `convergence.enabled` as `false` for that run regardless of manifest configuration, skip Phase 5.5 and the plan-body verification round, and record `finalState: "converged"` with `totalRounds: 0` and an explanatory note in `config` (e.g. `"autoDisabled": "fewer-than-two-analysers"`). The plan-body round inherits the same rule via its `gating=false` advisory path.
52
55
 
@@ -195,13 +198,13 @@ Disadvantages: 2–3 times the cost, increased time
195
198
 
196
199
  ## Adversarial Verification Mode
197
200
 
198
- Active only when `config.adversarial == true` (default for `requirements-discovery` and `error-analysis`; see §"Configuration"). When `false`, every rule in this section is inert and the collaborative behaviour documented elsewhere in this skill applies unchanged.
201
+ Active only when `config.adversarial == true` (default for `requirements-discovery`, `error-analysis`, and `implementation-planning`; see §"Configuration"). When `false`, every rule in this section is inert and the collaborative behaviour documented elsewhere in this skill applies unchanged.
199
202
 
200
203
  In adversarial mode the verifier's job inverts: instead of confirming a peer's finding, the verifier **tries to break it**, and the burden of proof sits on the claim — a finding survives only if refutation attempts fail.
201
204
 
202
205
  ### Scoped full-reanalysis (BLOCKING)
203
206
 
204
- Adversarial mode forces `verificationMode = "full-reanalysis"`, but the re-analysis is **scoped to the evidence the finding under attack cites** (the file paths / line ranges / log lines in its `originEvidence`), plus the immediately surrounding context. The verifier MUST NOT re-read the whole task brief, instruction-set, or `final-report-template.md`. This keeps the documented "single largest avoidable cost in requirements-discovery and error-analysis" (see §"Reverify prompt: required-reading suppression") bounded while making the refutation real rather than a text-only argument.
207
+ Adversarial mode forces `verificationMode = "full-reanalysis"`, but the re-analysis is **scoped to the evidence the finding under attack cites** (the file paths / line ranges / log lines in its `originEvidence`), plus the immediately surrounding context. The verifier MUST NOT re-read the whole task brief, instruction-set, or `final-report-template.md`. This keeps the documented "single largest avoidable cost in requirements-discovery, error-analysis, and implementation-planning" (see §"Reverify prompt: required-reading suppression") bounded while making the refutation real rather than a text-only argument.
205
208
 
206
209
  ### Adversarial verdict semantics
207
210
 
@@ -299,7 +302,7 @@ Reverify prompts MUST NOT inject the Phase 2 `[Required reading]` clause:
299
302
  - **Lightweight mode**: the clause directly contradicts the "Do NOT re-analyze the original source materials" instruction below. Including it forces workers to re-read the entire instruction-set per round per worker (3 workers × 2 rounds × 5+ files in the worst case) for no quality gain.
300
303
  - **Full-reanalysis mode**: workers DO need to re-read source materials, but only the analysis-worker file list (no `final-report-template.md`). If lead chooses to inject a reading clause here, it MUST mirror the audience-scoped enumeration in [okstra/SKILL.md](../../SKILL.md) Phase 2 (no template).
301
304
 
302
- This is the single largest avoidable cost in `requirements-discovery` and `error-analysis` runs. Treat as BLOCKING.
305
+ This is the single largest avoidable cost in `requirements-discovery`, `error-analysis`, and `implementation-planning` runs. Treat as BLOCKING.
303
306
 
304
307
  ### Lightweight Re-verification Prompt
305
308
 
@@ -493,7 +496,7 @@ Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
493
496
  Schema rules:
494
497
 
495
498
  - `schemaVersion`: literal string `"1.2"` for all new runs — both adversarial and collaborative. v1.2 adds `config.adversarial` and `votes.<worker>.disagreeBasis`, written as `false` / `null` respectively on collaborative runs. Readers MUST accept `"1.0"` / `"1.1"` / `"1.2"` for historical artifacts and treat any missing field as `null`.
496
- - `config.adversarial`: boolean. `true` when this run used adversarial verification (default for `requirements-discovery` / `error-analysis`). When `true`, `config.verificationMode` is `"full-reanalysis"` (scoped) and every `disagree` vote carries a non-null `disagreeBasis`.
499
+ - `config.adversarial`: boolean. `true` when this run used adversarial verification (default for `requirements-discovery` / `error-analysis` / `implementation-planning`). When `true`, `config.verificationMode` is `"full-reanalysis"` (scoped) and every `disagree` vote carries a non-null `disagreeBasis`.
497
500
  - `config.effectiveMaxRounds`: the integer the lead actually used after resolving the phase-aware default (`1` for `requirements-discovery`, `2` otherwise). MUST equal `config.maxRounds` when the manifest explicitly set it.
498
501
  - `findings[].ticketIds`: array of ticket keys from Phase 4 grouping (parsed per the Round 0 step 5 rule). MAY be empty when the discovering worker tagged the finding `unknown`.
499
502
  - `findings[].rounds[].votes.<worker>.verdict`: enum, one of `agree | disagree | supplement | verification-error`. Lower-case tokens; map upper-case AGREE/DISAGREE/SUPPLEMENT verdicts emitted by workers to their lower-case form before persisting. `verification-error` is reserved for terminal non-result dispatches (§"Worker failure handling in reverify").
@@ -509,6 +512,66 @@ Schema rules:
509
512
  - `finalState ∈ {converged, max-rounds-reached, aborted-non-result}`. Assigned by the lead at WHILE-loop exit: `converged` when the queue is empty at the end of any round; `max-rounds-reached` when the loop exits because `roundIndex == effectiveMaxRounds` with the queue still non-empty; `aborted-non-result` when the loop exits via the Worker-failure BREAK (per the "Worker failure handling in reverify" section, rule 4). `aborted-non-result` is the new v1.1 value.
510
513
  - `totalRounds`: count of rounds actually executed (not `effectiveMaxRounds`). May be `0` when Round 0 produced no queue items (all findings reached consensus during grouping).
511
514
 
515
+ ## Coverage critic pass
516
+
517
+ Runs only when `convergence.critic.enabled == true` (set by `--critic <provider>` or the okstra-run `critic_pick` step; default off). Applies to the three finding-producing phases (`requirements-discovery`, `error-analysis`, `implementation-planning`); for `final-verification` the critic runs in a different mode — see §"Acceptance critic pass (final-verification)". This pass targets **coverage** (missed findings), distinct from convergence which targets **agreement quality**.
518
+
519
+ ### When
520
+ After Phase 5.5 finding convergence completes (findings classified) and BEFORE the Phase 6 report-writer dispatch.
521
+
522
+ ### Dispatch (reused worker)
523
+ Dispatch ONE pass to the `config.critic.provider`'s existing subagent (`claude-worker` / `codex-worker` / `gemini-worker`) with `model = config.critic.modelExecutionValue` — no new agent type. If `config.critic.modelExecutionValue` is null/empty (model could not be resolved), skip the critic pass and record `critic-skipped: model-unresolved` in the convergence state rather than dispatching with no model. Result path: `runs/<task-type>/worker-results/<provider>-critic-<task-type>-<seq>.md`. The critic prompt seeds the consolidated findings and asks ONLY for coverage gaps:
524
+
525
+ ```
526
+ You are the coverage critic for <task-key>. Below are the findings the workers
527
+ already agreed on. Your ONLY job is to name what is MISSING:
528
+ - files / directories / execution paths nobody inspected,
529
+ - requirements or acceptance points with zero findings,
530
+ - claims raised but never verified.
531
+ For each gap, emit a NEW finding with evidence (file:line or the requirement quote).
532
+ Do NOT restate an existing finding. If nothing is missing, say so explicitly.
533
+ ```
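
The dispatch rule above (reuse the provider's worker, skip when the model is unresolved, write to the stated result path) might look like the following sketch. The function name, the returned dictionary shape, and the `f"{provider}-worker"` mapping are assumptions for illustration; only the skip note and the result-path template come from the text. The format of `<seq>` is not specified, so it is passed through as an opaque string.

```python
from typing import Any, Dict, Optional


def plan_critic_dispatch(
    provider: str,
    model_execution_value: Optional[str],
    task_type: str,
    seq: str,
) -> Dict[str, Any]:
    """Decide whether to dispatch the critic pass and where its result goes."""
    # Null/empty model: skip and record the reason instead of dispatching with no model.
    if not model_execution_value:
        return {"dispatch": False, "note": "critic-skipped: model-unresolved"}
    return {
        "dispatch": True,
        # Assumption: the provider name maps onto the existing "<provider>-worker" subagent.
        "subagent": f"{provider}-worker",
        "model": model_execution_value,
        "resultPath": (
            f"runs/{task_type}/worker-results/"
            f"{provider}-critic-{task_type}-{seq}.md"
        ),
    }
```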
534
+
535
+ ### Gap verification (1 adversarial reverify round)
536
+ Each critic gap enters the verification queue as a finding with `originWorker = "<provider>-critic"` and `source = "critic"`. The lead runs ONE adversarial reverify round (§"Adversarial Verification Mode" classifier) with the Phase 4 analysers (excluding the critic itself) as voters. Only gaps classified `full-consensus` / `partial-consensus` merge into the final report findings; `contested` / `worker-unique` gaps are treated as hallucinations and dropped (recorded in the convergence state, not promoted). If no non-critic analyser is available to vote, the gaps are surfaced as unverified `clarification` items rather than merged, and that fact is recorded.
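
As a rough sketch of the routing described above, assuming each gap already carries the classification produced by the single reverify round: the function name and the `"merge"` / `"drop"` / `"clarification"` return labels are hypothetical; the four classification names are taken from the text.

```python
from typing import Optional

MERGED = {"full-consensus", "partial-consensus"}
DROPPED = {"contested", "worker-unique"}


def route_critic_gap(classification: Optional[str], voters_available: bool) -> str:
    """Return what happens to one critic gap after the reverify round."""
    # No non-critic analyser could vote: surface as an unverified clarification item.
    if not voters_available:
        return "clarification"
    if classification in MERGED:
        return "merge"  # joins the final report findings
    if classification in DROPPED:
        return "drop"  # recorded in the convergence state, not promoted
    raise ValueError(f"unexpected classification: {classification!r}")
```

Counting the `"merge"` results against the total number of gaps would give the `gapsMerged` and `gapsProposed` values of the `config.critic` summary.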
537
+
538
+ ### State
539
+ - `convergence.critic` manifest block: `{ enabled, provider, modelExecutionValue }`.
540
+ - Convergence state artifact: critic gaps appear in `findings[]` with `source: "critic"`. Add a `config.critic` summary `{ provider, modelExecutionValue, gapsProposed, gapsMerged }`. `source` and `config.critic` are optional v1.2 fields (readers treat absence as null); no enum changes.
541
+
542
+ ## Acceptance critic pass (final-verification)
543
+
544
+ The `final-verification` phase uses the SAME reused-worker dispatch as §"Coverage critic pass" (provider + `config.critic.modelExecutionValue` from the `convergence.critic` block; default off; same model-unresolved skip rule). Only the prompt, the verification semantics, and the output sink differ: final-verification's findings are defects/blockers, so the critic acts as an **acceptance devil's advocate** (find reasons NOT to accept), and its candidate blockers are NEVER dropped, because dropping one could suppress a real defect.
545
+
546
+ ### Prompt
547
+
548
+ ```
549
+ You are the acceptance devil's advocate for <task-key>. The delivered work is about
550
+ to be judged for acceptance. Your ONLY job is to find reasons it should NOT be
551
+ accepted — surface candidate acceptance BLOCKERS the verifiers may have missed:
552
+ - requirements / acceptance points with no covering evidence,
553
+ - DB / IO / SQL changes lacking real-execution evidence,
554
+ - regressions or broken error paths,
555
+ - scope / contract violations.
556
+ For each, emit a candidate blocker with a one-line statement, evidence (file:line /
557
+ log / test output), and a severity (critical / major / minor). Do NOT restate an
558
+ existing Acceptance Blocker. If you find none, say so explicitly.
559
+ ```
560
+
561
+ ### Verification — confirm-or-downgrade (BLOCKING)
562
+
563
+ Each candidate blocker is verified by the Phase 4 analysers (excluding the critic). Do NOT use the adversarial finding classifier's "uncertain → reject" rule here.
564
+ - **Confirmed** (an analyser reproduces it or cites supporting evidence) → promote to a `## 4 Acceptance Blockers` row (keep severity + recommended follow-up phase).
565
+ - **Not confirmed** (cannot reproduce, or evidence is weak) → **downgrade to a Residual Risk row — never drop it.** Record the escalation trigger so the user can re-judge a high-severity-but-unconfirmed candidate.
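
The confirm-or-downgrade rule can be illustrated with a short sketch. The `Candidate` class and function name are hypothetical, and the `confirmed` flag stands in for whatever the analysers report; the summary keys mirror the optional `config.critic` fields named in this section. The point of the sketch is the invariant that every candidate lands in exactly one sink.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Candidate:
    statement: str
    severity: str  # "critical" | "major" | "minor"
    confirmed: bool  # True if an analyser reproduced it or cited supporting evidence


def confirm_or_downgrade(
    candidates: List[Candidate],
) -> Tuple[List[Candidate], List[Candidate], Dict[str, Union[str, int]]]:
    """Split candidates into Acceptance Blockers and Residual Risk rows."""
    blockers = [c for c in candidates if c.confirmed]
    # Unconfirmed candidates are downgraded, never dropped.
    residual = [c for c in candidates if not c.confirmed]
    summary: Dict[str, Union[str, int]] = {
        "mode": "acceptance-devils-advocate",
        "candidatesProposed": len(candidates),
        "confirmedBlockers": len(blockers),
        "downgradedToResidual": len(residual),
    }
    return blockers, residual, summary
```

By construction `confirmedBlockers + downgradedToResidual` always equals `candidatesProposed`, which is one way a reader of the state file could check that nothing was silently discarded.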
566
+
567
+ ### Verdict impact
568
+
569
+ Promoted blockers enter `## 4 Acceptance Blockers`; since `accepted` requires zero blockers, the verdict moves to `conditional-accept` / `blocked` automatically. The existing verdict↔blocker consistency validator (`validators/validate-run.py` `_validate_final_verification_consistency`) enforces this unchanged — no new enum or validator.
570
+
571
+ ### State
572
+
573
+ Critic output lives at `runs/final-verification/worker-results/<provider>-critic-final-verification-<seq>.md`. The convergence state `config.critic` summary (see §"Coverage critic pass") records `mode: "acceptance-devils-advocate"`, `candidatesProposed`, `confirmedBlockers`, `downgradedToResidual` (optional v1.2 fields; readers treat absence as null).
574
+
512
575
  ## Output
513
576
 
514
577
  Information to be passed to Phase 6 after executing this skill:
@@ -600,6 +663,16 @@ Worker non-result handling (`timeout`, `error`, no result file, wrapper `cli-fai
600
663
 
601
664
  Plan-body verification only supports **lightweight mode** (defined in §"Verification Mode" above). `full-reanalysis` is not meaningful here because the "original source materials" for a plan item are the worker's own analysis plus the lead-mediated synthesis — there is no independent ground truth to re-read. The manifest's top-level `verificationMode` is ignored for this round; lightweight is always used.
602
665
 
666
+ ### Adversarial plan-body posture
667
+
668
+ When `config.adversarial == true` (the default for `implementation-planning`; see the top-level §"Configuration" table), the plan-body round runs with an **adversarial posture**. The classification rules and gate arithmetic in §"Round protocol" are UNCHANGED — `majority-disagree` (a *majority* of analysers DISAGREE) remains the only classification that blocks the Approval marker, and `dissent-isolated` still passes the gate. Adversarial mode changes only *how each verifier evaluates an item*:
669
+
670
+ - The burden of proof sits on the plan: an item earns `AGREE` only if the verifier actively tried to break it and could not.
671
+ - The verifier MUST open the file paths / symbols / commands the item cites and confirm they exist and are executable as written. This is the one allowed widening of the lightweight "judge from internal consistency and stated commands / paths" rule — confirming the existence of cited paths is not "re-analyzing the original requirements".
672
+ - If a cited path / command / validation signal cannot be confirmed, the verifier responds `DISAGREE(<kind>)` with the applicable breakage kind (a–e); uncertainty resolves toward DISAGREE, not AGREE.
673
+
674
+ Plan-body verification stays **lightweight** even under this posture — the `verificationMode = "full-reanalysis"` forcing in §"Adversarial Verification Mode" applies to finding convergence only (see §"Mode constraint"); the adversarial posture here only changes verifier behaviour, not the mode. This raises verification *quality* (active refutation, plan-side burden) without changing the gate *threshold* — a single dissent still does not block approval; a majority is required (deliberate design decision).
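
To make the unchanged gate threshold concrete, here is a minimal sketch of the majority check for one plan item. The function name is hypothetical, and two details are assumptions because §"Round protocol" is not reproduced here: "majority" is read as strictly more than half of the verdicts, and every verdict in the list counts toward the denominator.

```python
from typing import Sequence


def blocks_approval(verdicts: Sequence[str]) -> bool:
    """Return True when a majority of analysers answered DISAGREE for one item."""
    # Verdicts look like "AGREE" or "DISAGREE(<kind>)" with a breakage kind a-e.
    disagree = sum(1 for v in verdicts if v.startswith("DISAGREE"))
    # Assumption: strict majority, so an even split does not block.
    return disagree * 2 > len(verdicts)
```

Under this reading a single `DISAGREE(a)` among three verifiers leaves the item approvable, while two of three block it. The adversarial posture changes how likely each verifier is to answer DISAGREE, not this arithmetic.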
675
+
603
676
  ### Round protocol (single round at default `maxRounds=1`)
604
677
 
605
678
  1. Lead parses the report-writer draft and extracts the `P-*` plan items.
@@ -719,6 +792,8 @@ or worker analyses for this round.
719
792
  ...
720
793
  ```
721
794
 
795
+ When `config.adversarial == true`, the lead prepends the adversarial framing from §"Adversarial plan-body posture" to the `## Instructions` block: the burden of proof is on the plan, the verifier opens and confirms every cited path / command, and an item whose cited references cannot be confirmed is answered `DISAGREE(<kind>)` rather than `AGREE`. The verdict tokens, breakage kinds (a–e), classification, and the majority gate threshold are unchanged. This prepended framing supersedes the template's "Judge solely from plan internal consistency" instruction for the adversarial round.
796
+
722
797
  The "Reverify prompt: required-reading suppression (BLOCKING)" rule (lightweight mode does NOT inject a `[Required reading]` clause) applies here as well.
723
798
 
724
799
  ### Worker non-result handling in plan-body round (BLOCKING)
@@ -160,6 +160,7 @@ okstra render-bundle \
160
160
  --task-type "<args.task-type>" \
161
161
  --task-brief "<args.task-brief>" \
162
162
  --executor "<args.executor>" \
163
+ --critic "<args.critic>" \
163
164
  --approved-plan "<args.approved-plan>" \
164
165
  --stage "<args.stage>" \
165
166
  --base-ref "<args.base-ref>" \