okstra 0.46.0 → 0.48.0

@@ -17,8 +17,11 @@ user-invocable: false
17
17
  - [Round 1-N: Re-verification Loop (queue-pruned)](#round-1-n-re-verification-loop-queue-pruned)
18
18
  - [Convergence Test](#convergence-test)
19
19
  - [Verification Mode](#verification-mode)
20
+ - [Adversarial Verification Mode](#adversarial-verification-mode)
20
21
  - [Re-verification Agent Dispatch](#re-verification-agent-dispatch)
21
22
  - [Convergence State Artifact](#convergence-state-artifact)
23
+ - [Coverage critic pass](#coverage-critic-pass)
24
+ - [Acceptance critic pass (final-verification)](#acceptance-critic-pass-final-verification)
22
25
  - [Output](#output)
23
26
  - [Convergence Disabled](#convergence-disabled)
24
27
  - [Plan-body verification mode (implementation-planning only)](#plan-body-verification-mode-implementation-planning-only)
@@ -46,6 +49,7 @@ Configure this in the `convergence` block of `task-manifest.json`. If the block
46
49
  | `enabled` | `true` | If `false`, skip the convergence loop and use the existing consensus/divergence method |
47
50
  | `maxRounds` | phase-aware: `1` for `requirements-discovery`, `2` otherwise (range 1–3) | Maximum number of re-verification rounds. Discovery's routing/missing-input outputs gain little from a second round; other phases (especially `error-analysis`) keep `2`. Lead resolves the effective value when the manifest omits the key and records it in `config.maxRounds` of the convergence state artifact. |
48
51
  | `verificationMode` | `"lightweight"` | `"lightweight"` or `"full-reanalysis"` |
52
+ | `adversarial` | phase-aware: `true` for `requirements-discovery` / `error-analysis` / `implementation-planning`, `false` otherwise | When `true`, Phase 5.5 runs in **adversarial mode** (see §"Adversarial Verification Mode"): verifiers actively try to refute each finding, the burden of proof sits on the claim, and `verificationMode` is forced to `"full-reanalysis"` scoped to the finding's cited evidence. Resolved by `scripts/okstra_ctl/render.py` `_build_convergence_block` and recorded in `config.adversarial` of the convergence state artifact. |
49
53
 
50
54
  **Auto-disable rule (BLOCKING).** Convergence requires ≥2 analyser workers to produce a meaningful consensus tally. When the active profile's `Required workers:` block (see `prompts/profiles/*.md`) resolves to fewer than 2 analyser workers, e.g. `release-handoff` (zero analyser workers, lead-only), the lead MUST treat `convergence.enabled` as `false` for that run regardless of manifest configuration, skip Phase 5.5 and the plan-body verification round, and record `finalState: "converged"` with `totalRounds: 0` and an explanatory note in `config` (e.g. `"autoDisabled": "fewer-than-two-analysers"`). The plan-body round inherits the same rule via its `gating=false` advisory path.
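The phase-aware defaults and the auto-disable rule above can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not the real `_build_convergence_block`: the function name `resolve_convergence_config`, its parameters, and the choice to treat a missing manifest block as "all defaults" are hypothetical.

```python
from typing import Optional

# Phases whose `adversarial` default is true, per the configuration table.
ADVERSARIAL_BY_DEFAULT = {"requirements-discovery", "error-analysis", "implementation-planning"}


def resolve_convergence_config(phase: str, manifest_block: Optional[dict], analyser_count: int) -> dict:
    """Hypothetical resolver for the effective convergence config of one run."""
    block = dict(manifest_block or {})  # ASSUMPTION: a missing block means "use defaults"

    # maxRounds: 1 for requirements-discovery, 2 otherwise, unless set explicitly.
    max_rounds = block.get("maxRounds")
    if max_rounds is None:
        max_rounds = 1 if phase == "requirements-discovery" else 2
    if not 1 <= max_rounds <= 3:
        raise ValueError(f"maxRounds must be in 1..3, got {max_rounds}")

    # adversarial: phase-aware default unless the manifest sets it.
    adversarial = block.get("adversarial")
    if adversarial is None:
        adversarial = phase in ADVERSARIAL_BY_DEFAULT

    # Adversarial mode forces full-reanalysis (scoped to cited evidence).
    mode = "full-reanalysis" if adversarial else block.get("verificationMode", "lightweight")

    config = {
        "enabled": block.get("enabled", True),
        "adversarial": adversarial,
        "maxRounds": max_rounds,
        "effectiveMaxRounds": max_rounds,
        "verificationMode": mode,
    }

    # Auto-disable rule: fewer than two analysers overrides the manifest.
    if analyser_count < 2:
        config["enabled"] = False
        config["autoDisabled"] = "fewer-than-two-analysers"
    return config
```

Under these assumptions a `release-handoff` run with zero analysers resolves to `enabled: false` even when the manifest sets `enabled: true`.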
51
55
 
@@ -192,6 +196,62 @@ Use the findings as a guide, but reanalyze the original code/data yourself.
192
196
  Advantages: High accuracy
193
197
  Disadvantages: 2–3 times the cost, increased time
194
198
 
199
+ ## Adversarial Verification Mode
200
+
201
+ Active only when `config.adversarial == true` (default for `requirements-discovery`, `error-analysis`, and `implementation-planning`; see §"Configuration"). When `false`, every rule in this section is inert and the collaborative behaviour documented elsewhere in this skill applies unchanged.
202
+
203
+ In adversarial mode the verifier's job inverts: instead of confirming a peer's finding, the verifier **tries to break it**, and the burden of proof sits on the claim — a finding survives only if refutation attempts fail.
204
+
205
+ ### Scoped full-reanalysis (BLOCKING)
206
+
207
+ Adversarial mode forces `verificationMode = "full-reanalysis"`, but the re-analysis is **scoped to the evidence the finding under attack cites** (the file paths / line ranges / log lines in its `originEvidence`), plus the immediately surrounding context. The verifier MUST NOT re-read the whole task brief, instruction-set, or `final-report-template.md`. This keeps the documented "single largest avoidable cost in requirements-discovery, error-analysis, and implementation-planning" (see §"Reverify prompt: required-reading suppression") bounded while making the refutation real rather than a text-only argument.
208
+
209
+ ### Adversarial verdict semantics
210
+
211
+ The persisted `verdict` enum is unchanged (`agree | disagree | supplement | verification-error`). The prompt-facing labels are adversarial and map down on persistence:
212
+
213
+ | Prompt label | Persisted `verdict` | Meaning |
214
+ |---|---|---|
215
+ | SURVIVES | `agree` | Actively tried to refute and failed — the claim withstood the attack. |
216
+ | SURVIVES-WITH-CAVEAT | `supplement` | Holds, but a scope limit / extra condition / precondition was found. |
217
+ | REFUTED | `disagree` | The claim was broken (or failed to prove itself). MUST carry a `disagreeBasis`. |
218
+
219
+ Each `disagree` vote records a new field `disagreeBasis`:
220
+
221
+ | `disagreeBasis` | Meaning |
222
+ |---|---|
223
+ | `counter-evidence` | The verifier cited contradicting evidence (`file:line` / log line) in `explanation`. A **hard refute**. |
224
+ | `burden-not-met` | The verifier re-inspected the cited evidence and could neither confirm nor refute → the claim failed to prove itself ("when uncertain, lean to rejection"). |
225
+
226
+ A `disagree` with `disagreeBasis == null` is a contract violation in adversarial mode — every refutation must state which of the two grounds it rests on. Bare "I disagree" without re-inspection is not allowed.
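One way to check the `disagreeBasis` contract on a persisted vote is sketched below. The helper name `vote_violations` and its message strings are hypothetical; the enum values and the "non-null only on an adversarial `disagree`" rule come from this document.

```python
from typing import List

VERDICTS = {"agree", "disagree", "supplement", "verification-error"}
BASES = {"counter-evidence", "burden-not-met"}


def vote_violations(vote: dict, adversarial: bool) -> List[str]:
    """Return contract violations for one persisted vote (empty list = valid)."""
    problems = []
    verdict = vote.get("verdict")
    basis = vote.get("disagreeBasis")  # an absent key is treated as null

    if verdict not in VERDICTS:
        problems.append(f"unknown verdict: {verdict!r}")
    if basis is not None and basis not in BASES:
        problems.append(f"unknown disagreeBasis: {basis!r}")
    # Every adversarial refutation must state which ground it rests on.
    if adversarial and verdict == "disagree" and basis is None:
        problems.append("disagree without disagreeBasis in adversarial mode")
    # The basis is meaningful only on an adversarial disagree.
    if basis is not None and not (adversarial and verdict == "disagree"):
        problems.append("disagreeBasis set outside an adversarial disagree")
    return problems
```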
227
+
228
+ ### Adversarial classification (replaces the §"Convergence Algorithm" per-round classifier when `adversarial == true`)
229
+
230
+ `verification-error` votes are excluded from numerator and denominator exactly as in the collaborative classifier. For each finding `F` in the queue at a round:
231
+
232
+ ```text
233
+ disagrees = [v for v in non-error votes if v.verdict == "disagree"]
234
+ hard_refutes = [v for v in disagrees if v.disagreeBasis == "counter-evidence"]
235
+ all_others_disagree = (every non-discoverer non-error vote is "disagree")
236
+
237
+ IF len(disagrees) == 0:
238
+ resolve F as "full-consensus" (or "partial-consensus" if any SUPPLEMENT/caveat)
239
+ ELIF all_others_disagree:
240
+ resolve F as "worker-unique" # only the discoverer still holds it
241
+ ELIF len(hard_refutes) >= 1:
242
+ # an evidence-backed refute exists and the roster is split → the claim is disputed
243
+ carry F forward; at the LAST executed round classify it "contested"
244
+ ELIF burden-not-met disagrees are a majority of non-error votes (per the Majority definition in the Convergence Algorithm section):
245
+ carry F forward; at the LAST executed round classify it "contested"
246
+ ELSE:
247
+ # a lone weak (burden-not-met) doubt against an otherwise-surviving claim
248
+ resolve F as "partial-consensus"
249
+ ```
250
+
251
+ `contested` remains a **final classification only** (per §"Scope and Terminology"): a disputed finding is carried forward through intermediate rounds and labelled `contested` only at the last executed round. For `requirements-discovery` (`effectiveMaxRounds = 1`) the single round IS the last round, so a split-with-hard-refute finding is labelled `contested` in that one round. The final-classifier block of §"Convergence Algorithm" is unchanged; this section only changes how each round's verdicts resolve into queue actions.
252
+
253
+ Design intent: one `counter-evidence` refute is enough to deny a claim consensus (it cannot rise above `contested` no matter how many others AGREE), while a single `burden-not-met` doubt does not by itself sink an otherwise-surviving claim — only a majority of burden-not-met doubts does. When every non-discoverer refutes (all_others_disagree), the finding is worker-unique regardless of whether those refutes were counter-evidence or burden-not-met — only the discoverer still holds it. A SUPPLEMENT/caveat with zero disagrees lands partial-consensus rather than full-consensus, because a caveat means the claim does not pass cleanly (this differs from the collaborative classifier, where SUPPLEMENT counts as full agreement).
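The per-round adversarial classifier can be made concrete as follows. This is a sketch, not the project's implementation: the function name and return shape are hypothetical, and because the Majority definition lives in the Convergence Algorithm section (not shown here), the code ASSUMES a strict majority (more than half of non-error votes).

```python
from typing import Dict, Optional, Tuple


def classify_adversarial(
    votes: Dict[str, dict], discoverer: str, is_last_round: bool
) -> Tuple[str, Optional[str]]:
    """Return (action, classification) for one finding in one round.

    action is "resolve" or "carry"; classification is None while carried.
    """
    # verification-error votes leave both numerator and denominator.
    non_error = {w: v for w, v in votes.items() if v["verdict"] != "verification-error"}
    if not non_error:
        # ASSUMPTION: the no-usable-votes case belongs to the worker-failure rules.
        raise ValueError("no usable votes for this finding")

    disagrees = [v for v in non_error.values() if v["verdict"] == "disagree"]
    hard_refutes = [v for v in disagrees if v.get("disagreeBasis") == "counter-evidence"]
    weak_doubts = [v for v in disagrees if v.get("disagreeBasis") == "burden-not-met"]
    others = [v for w, v in non_error.items() if w != discoverer]
    all_others_disagree = all(v["verdict"] == "disagree" for v in others)

    # A disputed finding is labelled "contested" only at the last executed round.
    disputed = ("resolve", "contested") if is_last_round else ("carry", None)

    if not disagrees:
        has_caveat = any(v["verdict"] == "supplement" for v in non_error.values())
        return ("resolve", "partial-consensus" if has_caveat else "full-consensus")
    if all_others_disagree:
        return ("resolve", "worker-unique")  # only the discoverer still holds it
    if hard_refutes:
        return disputed  # one evidence-backed refute denies consensus
    if 2 * len(weak_doubts) > len(non_error):  # ASSUMED strict majority
        return disputed
    return ("resolve", "partial-consensus")  # a lone weak doubt
```

With three voters, one `burden-not-met` doubt against two `agree` votes resolves to `partial-consensus`, while one `counter-evidence` refute against the same two `agree` votes is carried forward.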
254
+
195
255
  ## Re-verification Agent Dispatch
196
256
 
197
257
  ### Sponsorship Optimization
@@ -242,7 +302,7 @@ Reverify prompts MUST NOT inject the Phase 2 `[Required reading]` clause:
242
302
  - **Lightweight mode**: the clause directly contradicts the "Do NOT re-analyze the original source materials" instruction below. Including it forces workers to re-read the entire instruction-set per round per worker (3 workers × 2 rounds × 5+ files in the worst case) for no quality gain.
243
303
  - **Full-reanalysis mode**: workers DO need to re-read source materials, but only the analysis-worker file list (no `final-report-template.md`). If lead chooses to inject a reading clause here, it MUST mirror the audience-scoped enumeration in [okstra/SKILL.md](../../SKILL.md) Phase 2 (no template).
244
304
 
245
- This is the single largest avoidable cost in `requirements-discovery` and `error-analysis` runs. Treat as BLOCKING.
305
+ This is the single largest avoidable cost in `requirements-discovery`, `error-analysis`, and `implementation-planning` runs. Treat as BLOCKING.
246
306
 
247
307
  ### Lightweight Re-verification Prompt
248
308
 
@@ -282,6 +342,55 @@ For each finding, respond as:
282
342
  **Verdict**: ...
283
343
  ```
284
344
 
345
+ ### Adversarial Re-verification Prompt
346
+
347
+ Used instead of the lightweight/full-reanalysis prompt when `config.adversarial == true`. The required anchor headers (§"Required reverify-prompt anchor headers") are identical. The `[Required reading]` clause is suppressed; only the cited-evidence paths of the items under attack are injected (see §"Adversarial Verification Mode" → Scoped full-reanalysis).
348
+
349
+ ```
350
+ You are <worker-role> performing ADVERSARIAL re-verification for <task-key> (round <N>).
351
+
352
+ ## Instructions
353
+
354
+ Your job is to BREAK each finding below, not to confirm it. For EACH finding,
355
+ open the cited evidence directly and actively search for evidence that the claim
356
+ is wrong, overstated, or unproven. Then respond with exactly one verdict:
357
+
358
+ - **REFUTED**: You broke the claim. State the basis:
359
+ - counter-evidence — you found contradicting evidence (give file:line or log line), OR
360
+ - burden-not-met — you re-inspected the cited evidence and could neither confirm
361
+ nor refute it (the claim has not proven itself).
362
+ - **SURVIVES**: You actively tried to refute it and failed — the claim withstood the attack.
363
+ - **SURVIVES-WITH-CAVEAT**: It holds, but a scope limit / extra condition / missing
364
+ precondition exists (state it).
365
+
366
+ The burden of proof is on the claim. If after inspecting the cited evidence you remain
367
+ uncertain, your verdict is REFUTED with basis = burden-not-met.
368
+
369
+ Inspect ONLY the evidence each finding cites and its immediate surroundings. Do NOT
370
+ re-read the task brief, instruction-set, or report template.
371
+
372
+ ## Findings to verify
373
+
374
+ ### F-001: <one-line summary>
375
+ **Origin**: <worker role>
376
+ **Cited evidence**: <file paths, line numbers, log lines from origin worker>
377
+
378
+ ### F-002: <one-line summary>
379
+ ...
380
+
381
+ ## Response format
382
+
383
+ ### F-001
384
+ **Verdict**: REFUTED | SURVIVES | SURVIVES-WITH-CAVEAT
385
+ **Basis** (only if REFUTED): counter-evidence | burden-not-met
386
+ **Explanation**: <2-3 sentences; for counter-evidence include the file:line you found>
387
+
388
+ ### F-002
389
+ ...
390
+ ```
391
+
392
+ When persisting votes, map SURVIVES→`agree`, SURVIVES-WITH-CAVEAT→`supplement`, REFUTED→`disagree`, and copy the stated Basis into `votes.<worker>.disagreeBasis` (null for non-REFUTED verdicts).
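The label mapping above could be applied while parsing a worker's reply, roughly as in this sketch. The parser name and its regular expressions are hypothetical and assume the worker follows the response format literally; only the first line of each explanation is captured.

```python
import re
from typing import Dict

LABEL_TO_VERDICT = {
    "SURVIVES": "agree",
    "SURVIVES-WITH-CAVEAT": "supplement",
    "REFUTED": "disagree",
}
BASES = {"counter-evidence", "burden-not-met"}

_HEADER = re.compile(r"^### (F-\d+)[ \t]*$", re.MULTILINE)
_FIELD = re.compile(
    r"^\*\*(Verdict|Basis|Explanation)\*\*[^:\n]*:[ \t]*(.*)$", re.MULTILINE
)


def parse_adversarial_response(text: str) -> Dict[str, dict]:
    """Map each `### F-NNN` block of a reply to a persistable vote."""
    parts = _HEADER.split(text)  # [preamble, id1, body1, id2, body2, ...]
    votes = {}
    for finding_id, body in zip(parts[1::2], parts[2::2]):
        fields = {name.lower(): value.strip() for name, value in _FIELD.findall(body)}
        label = fields.get("verdict", "").upper()
        if label not in LABEL_TO_VERDICT:
            raise ValueError(f"{finding_id}: unrecognised verdict {label!r}")
        verdict = LABEL_TO_VERDICT[label]
        # Basis is copied only for REFUTED; it is null for every other verdict.
        basis = fields.get("basis") if verdict == "disagree" else None
        if verdict == "disagree" and basis not in BASES:
            raise ValueError(f"{finding_id}: REFUTED needs a valid basis")
        votes[finding_id] = {
            "verdict": verdict,
            "disagreeBasis": basis,
            "explanation": fields.get("explanation", ""),
        }
    return votes
```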
393
+
285
394
  ### Full Re-analysis Re-verification Prompt
286
395
 
287
396
  ```
@@ -324,10 +433,11 @@ Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
324
433
 
325
434
  ```json
326
435
  {
327
- "schemaVersion": "1.1",
436
+ "schemaVersion": "1.2",
328
437
  "taskKey": "<task-key>",
329
438
  "config": {
330
439
  "enabled": true,
440
+ "adversarial": false,
331
441
  "maxRounds": 2,
332
442
  "effectiveMaxRounds": 2,
333
443
  "verificationMode": "lightweight"
@@ -345,7 +455,7 @@ Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
345
455
  {
346
456
  "round": 1,
347
457
  "votes": {
348
- "codex-worker": { "verdict": "agree", "explanation": "<brief>" },
458
+ "codex-worker": { "verdict": "agree", "disagreeBasis": null, "explanation": "<brief>" },
349
459
  "gemini-worker": { "verdict": "supplement", "explanation": "<brief>" }
350
460
  }
351
461
  }
@@ -385,11 +495,13 @@ Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
385
495
 
386
496
  Schema rules:
387
497
 
388
- - `schemaVersion`: literal string `"1.1"` for new runs. Readers MUST accept `"1.0"` for historical artifacts and treat any missing v1.1 field as `null`.
498
+ - `schemaVersion`: literal string `"1.2"` for all new runs, both adversarial and collaborative. v1.2 adds `config.adversarial` and `votes.<worker>.disagreeBasis`, written as `false` / `null` respectively on collaborative runs. Readers MUST accept `"1.0"`, `"1.1"`, and `"1.2"` (the first two appear only on historical artifacts) and treat any missing field as `null`.
499
+ - `config.adversarial`: boolean. `true` when this run used adversarial verification (default for `requirements-discovery` / `error-analysis` / `implementation-planning`). When `true`, `config.verificationMode` is `"full-reanalysis"` (scoped) and every `disagree` vote carries a non-null `disagreeBasis`.
389
500
  - `config.effectiveMaxRounds`: the integer the lead actually used after resolving the phase-aware default (`1` for `requirements-discovery`, `2` otherwise). MUST equal `config.maxRounds` when the manifest explicitly set it.
390
501
  - `findings[].ticketIds`: array of ticket keys from Phase 4 grouping (parsed per the Round 0 step 5 rule). MAY be empty when the discovering worker tagged the finding `unknown`.
391
502
  - `findings[].rounds[].votes.<worker>.verdict`: enum, one of `agree | disagree | supplement | verification-error`. Lower-case tokens; map upper-case AGREE/DISAGREE/SUPPLEMENT verdicts emitted by workers to their lower-case form before persisting. `verification-error` is reserved for terminal non-result dispatches (§"Worker failure handling in reverify").
392
- - `findings[].classification`: enum, one of `full-consensus | partial-consensus | worker-unique | contested`. No other value is permitted in v1.1.
503
+ - `findings[].rounds[].votes.<worker>.disagreeBasis`: enum `counter-evidence | burden-not-met | null`. Non-null only when `verdict == "disagree"` AND `config.adversarial == true`; `null` (or absent, treated as null) otherwise. See §"Adversarial Verification Mode".
504
+ - `findings[].classification`: enum, one of `full-consensus | partial-consensus | worker-unique | contested`. No other value is permitted.
393
505
  - `roundHistory[].inputQueueSize`: queue size at the start of this round.
394
506
  - `roundHistory[].resolvedCount`: number of findings that exited the queue this round (sum of full+partial+worker-unique classifications produced this round).
395
507
  - `roundHistory[].carriedForwardCount`: queue size at the END of this round — the single definition. In-round insertions into the queue are forbidden, so this always equals `inputQueueSize - resolvedCount`. The pseudocode's per-item `carriedForwardCount += 1` accumulator is a counting convenience that lands on the same value; persist the post-round queue length, not the loop accumulator, if the two ever diverge.
@@ -397,9 +509,69 @@ Schema rules:
397
509
  - `roundHistory[].skippedWorkers[]`: per-worker `{worker, reason}` for workers with no items to verify OR with a non-result dispatch.
398
510
  - `round2SkippedReason`: literal enum `queue-empty | max-rounds-1 | all-reverify-non-result | not-skipped`. Always present. Use `"not-skipped"` when Round 2 actually ran. Use `"max-rounds-1"` when `effectiveMaxRounds == 1` (Round 2 was never attempted). Use `"queue-empty"` when Round 1 fully drained the queue. Use `"all-reverify-non-result"` when all Round 1 dispatches terminated as non-result.
399
511
  - `finalClassificationCounts`: post-loop counts. Required field with keys `fullConsensus`, `partialConsensus`, `contested`, `workerUnique`.
400
- - `finalState ∈ {converged, max-rounds-reached, aborted-non-result}`. Assigned by the lead at WHILE-loop exit: `converged` when the queue is empty at the end of any round; `max-rounds-reached` when the loop exits because `roundIndex == effectiveMaxRounds` with the queue still non-empty; `aborted-non-result` when the loop exits via the Worker-failure BREAK (Task 3's "Worker failure handling in reverify" rule 4). `aborted-non-result` is the new v1.1 value.
512
+ - `finalState ∈ {converged, max-rounds-reached, aborted-non-result}`. Assigned by the lead at WHILE-loop exit: `converged` when the queue is empty at the end of any round; `max-rounds-reached` when the loop exits because `roundIndex == effectiveMaxRounds` with the queue still non-empty; `aborted-non-result` when the loop exits via the Worker-failure BREAK (per the "Worker failure handling in reverify" section, rule 4). `aborted-non-result` was introduced in v1.1.
401
513
  - `totalRounds`: count of rounds actually executed (not `effectiveMaxRounds`). May be `0` when Round 0 produced no queue items (all findings reached consensus during grouping).
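A reader that follows the schema rules above might normalise historical artifacts like this. It is a sketch under assumptions: the helper names are hypothetical, and only the two v1.2 fields plus the `carriedForwardCount` identity are covered.

```python
import copy

ACCEPTED_VERSIONS = {"1.0", "1.1", "1.2"}


def normalize_artifact(raw: dict) -> dict:
    """Return a copy of a convergence state artifact with v1.2 fields filled as null."""
    version = raw.get("schemaVersion")
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unsupported schemaVersion: {version!r}")

    artifact = copy.deepcopy(raw)  # never mutate the caller's data
    artifact.setdefault("config", {}).setdefault("adversarial", None)
    for finding in artifact.get("findings", []):
        for round_entry in finding.get("rounds", []):
            for vote in round_entry.get("votes", {}).values():
                vote.setdefault("disagreeBasis", None)
    return artifact


def carried_forward_ok(entry: dict) -> bool:
    """Check the single definition: carriedForwardCount = inputQueueSize - resolvedCount."""
    return entry["carriedForwardCount"] == entry["inputQueueSize"] - entry["resolvedCount"]
```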
402
514
 
515
+ ## Coverage critic pass
516
+
517
+ Runs only when `convergence.critic.enabled == true` (set by `--critic <provider>` or the okstra-run `critic_pick` step; default off). Applies to the three finding-producing phases (`requirements-discovery`, `error-analysis`, `implementation-planning`); for `final-verification` the critic runs in a different mode — see §"Acceptance critic pass (final-verification)". This pass targets **coverage** (missed findings), distinct from convergence which targets **agreement quality**.
518
+
519
+ ### When
520
+ After Phase 5.5 finding convergence completes (findings classified) and BEFORE the Phase 6 report-writer dispatch.
521
+
522
+ ### Dispatch (reused worker)
523
+ Dispatch ONE pass to the `config.critic.provider`'s existing subagent (`claude-worker` / `codex-worker` / `gemini-worker`) with `model = config.critic.modelExecutionValue` — no new agent type. If `config.critic.modelExecutionValue` is null/empty (model could not be resolved), skip the critic pass and record `critic-skipped: model-unresolved` in the convergence state rather than dispatching with no model. Result path: `runs/<task-type>/worker-results/<provider>-critic-<task-type>-<seq>.md`. The critic prompt seeds the consolidated findings and asks ONLY for coverage gaps:
524
+
525
+ ```
526
+ You are the coverage critic for <task-key>. Below are the findings the workers
527
+ already agreed on. Your ONLY job is to name what is MISSING:
528
+ - files / directories / execution paths nobody inspected,
529
+ - requirements or acceptance points with zero findings,
530
+ - claims raised but never verified.
531
+ For each gap, emit a NEW finding with evidence (file:line or the requirement quote).
532
+ Do NOT restate an existing finding. If nothing is missing, say so explicitly.
533
+ ```
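The dispatch decision described in this subsection can be sketched as below. The function name and return shape are hypothetical, and the sketch ASSUMES that `provider` holds a bare name such as `codex`, so that the subagent is `<provider>-worker`; the skip note and the result-path pattern are taken from the text above.

```python
from typing import Optional


def plan_critic_dispatch(critic: Optional[dict], task_type: str, seq: str) -> dict:
    """Decide whether to dispatch the critic pass and where its result goes."""
    if not critic or not critic.get("enabled"):
        return {"dispatch": False, "note": None}  # critic is off by default

    model = critic.get("modelExecutionValue")
    if not model:
        # Null/empty model: skip and record the reason instead of dispatching.
        return {"dispatch": False, "note": "critic-skipped: model-unresolved"}

    provider = critic["provider"]
    return {
        "dispatch": True,
        "subagent": f"{provider}-worker",  # ASSUMED naming; reuses an existing worker
        "model": model,
        "resultPath": f"runs/{task_type}/worker-results/{provider}-critic-{task_type}-{seq}.md",
    }
```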
534
+
535
+ ### Gap verification (1 adversarial reverify round)
536
+ Each critic gap enters the verification queue as a finding with `originWorker = "<provider>-critic"` and `source = "critic"`. The lead runs ONE adversarial reverify round (§"Adversarial Verification Mode" classifier) with the Phase 4 analysers (excluding the critic itself) as voters. Only gaps classified `full-consensus` / `partial-consensus` merge into the final report findings; `contested` / `worker-unique` gaps are treated as hallucinations and dropped (recorded in the convergence state, not promoted). If no non-critic analyser is available to vote, the gaps are surfaced as unverified `clarification` items rather than merged, and that fact is recorded.
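The merge-or-drop routing for critic gaps can be illustrated with a short sketch. The function name, the `id` field, and the output keys are hypothetical; the classification values and the no-voter fallback follow the paragraph above.

```python
from typing import Dict, List

MERGEABLE = {"full-consensus", "partial-consensus"}


def route_critic_gaps(gaps: List[dict], non_critic_voters: int) -> Dict[str, List[str]]:
    """Sort critic gaps into merged, dropped, or unverified clarification items."""
    routed = {"merged": [], "dropped": [], "unverifiedClarifications": []}
    for gap in gaps:
        if non_critic_voters == 0:
            # Nobody could vote: surface as unverified rather than merge.
            routed["unverifiedClarifications"].append(gap["id"])
        elif gap["classification"] in MERGEABLE:
            routed["merged"].append(gap["id"])
        else:
            # contested / worker-unique gaps are recorded but not promoted.
            routed["dropped"].append(gap["id"])
    return routed
```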
537
+
538
+ ### State
539
+ - `convergence.critic` manifest block: `{ enabled, provider, modelExecutionValue }`.
540
+ - Convergence state artifact: critic gaps appear in `findings[]` with `source: "critic"`. Add a `config.critic` summary `{ provider, modelExecutionValue, gapsProposed, gapsMerged }`. `source` and `config.critic` are optional v1.2 fields (readers treat absence as null); no enum changes.
541
+
542
+ ## Acceptance critic pass (final-verification)
543
+
544
+ The `final-verification` phase reuses the SAME reused-worker dispatch as §"Coverage critic pass" (provider + `config.critic.modelExecutionValue` from the `convergence.critic` block; default off; same model-unresolved skip rule). Only the prompt, the verification semantics, and the output sink differ — final-verification's findings are defects/blockers, so the critic acts as an **acceptance devil's advocate** (find reasons NOT to accept), and its candidate blockers are NEVER dropped (that would suppress real defects).
545
+
546
+ ### Prompt
547
+
548
+ ```
549
+ You are the acceptance devil's advocate for <task-key>. The delivered work is about
550
+ to be judged for acceptance. Your ONLY job is to find reasons it should NOT be
551
+ accepted — surface candidate acceptance BLOCKERS the verifiers may have missed:
552
+ - requirements / acceptance points with no covering evidence,
553
+ - DB / IO / SQL changes lacking real-execution evidence,
554
+ - regressions or broken error paths,
555
+ - scope / contract violations.
556
+ For each, emit a candidate blocker with a one-line statement, evidence (file:line /
557
+ log / test output), and a severity (critical / major / minor). Do NOT restate an
558
+ existing Acceptance Blocker. If you find none, say so explicitly.
559
+ ```
560
+
561
+ ### Verification — confirm-or-downgrade (BLOCKING)
562
+
563
+ Each candidate blocker is verified by the Phase 4 analysers (excluding the critic). Do NOT use the adversarial finding classifier's "uncertain → reject" rule here.
564
+ - **Confirmed** (an analyser reproduces it or cites supporting evidence) → promote to a `## 4 Acceptance Blockers` row (keep severity + recommended follow-up phase).
565
+ - **Not confirmed** (cannot reproduce, or evidence is weak) → **downgrade to a Residual Risk row — never drop it.** Record the escalation trigger so the user can re-judge a high-severity-but-unconfirmed candidate.
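The confirm-or-downgrade rule differs from gap verification in that nothing is ever dropped. A minimal sketch, assuming each candidate carries a hypothetical boolean `confirmed` set by the analysers; the output keys `acceptanceBlockers` and `residualRisks` are also illustrative, while the summary field names match the `config.critic` summary described for this mode.

```python
from typing import Dict, List


def triage_candidates(candidates: List[dict]) -> Dict[str, object]:
    """Promote confirmed candidates to blockers; downgrade the rest to residual risks."""
    blockers: List[dict] = []
    residual: List[dict] = []
    for candidate in candidates:
        # Unconfirmed candidates are downgraded, never discarded.
        (blockers if candidate["confirmed"] else residual).append(candidate)

    summary = {
        "mode": "acceptance-devils-advocate",
        "candidatesProposed": len(candidates),
        "confirmedBlockers": len(blockers),
        "downgradedToResidual": len(residual),
    }
    return {"acceptanceBlockers": blockers, "residualRisks": residual, "summary": summary}
```

Because every candidate lands in exactly one of the two sinks, `confirmedBlockers + downgradedToResidual` always equals `candidatesProposed` in this sketch.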
566
+
567
+ ### Verdict impact
568
+
569
+ Promoted blockers enter `## 4 Acceptance Blockers`; since `accepted` requires zero blockers, the verdict moves to `conditional-accept` / `blocked` automatically. The existing verdict↔blocker consistency validator (`validators/validate-run.py` `_validate_final_verification_consistency`) enforces this unchanged — no new enum or validator.
570
+
571
+ ### State
572
+
573
+ Critic output lives at `runs/final-verification/worker-results/<provider>-critic-final-verification-<seq>.md`. The convergence state `config.critic` summary (see §"Coverage critic pass") records `mode: "acceptance-devils-advocate"`, `candidatesProposed`, `confirmedBlockers`, `downgradedToResidual` (optional v1.2 fields; readers treat absence as null).
574
+
403
575
  ## Output
404
576
 
405
577
  Information to be passed to Phase 6 after executing this skill:
@@ -491,6 +663,16 @@ Worker non-result handling (`timeout`, `error`, no result file, wrapper `cli-fai
491
663
 
492
664
  Plan-body verification only supports **lightweight mode** (defined in §"Verification Mode" above). `full-reanalysis` is not meaningful here because the "original source materials" for a plan item are the worker's own analysis plus the lead-mediated synthesis — there is no independent ground truth to re-read. The manifest's top-level `verificationMode` is ignored for this round; lightweight is always used.
493
665
 
666
+ ### Adversarial plan-body posture
667
+
668
+ When `config.adversarial == true` (the default for `implementation-planning`; see the top-level §"Configuration" table), the plan-body round runs with an **adversarial posture**. The classification rules and gate arithmetic in §"Round protocol" are UNCHANGED — `majority-disagree` (a *majority* of analysers DISAGREE) remains the only classification that blocks the Approval marker, and `dissent-isolated` still passes the gate. Adversarial mode changes only *how each verifier evaluates an item*:
669
+
670
+ - The burden of proof sits on the plan: an item earns `AGREE` only if the verifier actively tried to break it and could not.
671
+ - The verifier MUST open the file paths / symbols / commands the item cites and confirm they exist and are executable as written. This is the one allowed widening of the lightweight "judge from internal consistency and stated commands / paths" rule — confirming the existence of cited paths is not "re-analyzing the original requirements".
672
+ - If a cited path / command / validation signal cannot be confirmed, the verifier responds `DISAGREE(<kind>)` with the applicable breakage kind (a–e); uncertainty resolves toward DISAGREE, not AGREE.
673
+
674
+ Plan-body verification stays **lightweight** even under this posture — the `verificationMode = "full-reanalysis"` forcing in §"Adversarial Verification Mode" applies to finding convergence only (see §"Mode constraint"); the adversarial posture here only changes verifier behaviour, not the mode. This raises verification *quality* (active refutation, plan-side burden) without changing the gate *threshold* — a single dissent still does not block approval; a majority is required (deliberate design decision).
675
+
494
676
  ### Round protocol (single round at default `maxRounds=1`)
495
677
 
496
678
  1. Lead parses the report-writer draft and extracts the `P-*` plan items.
@@ -610,6 +792,8 @@ or worker analyses for this round.
610
792
  ...
611
793
  ```
612
794
 
795
+ When `config.adversarial == true`, the lead prepends the adversarial framing from §"Adversarial plan-body posture" to the `## Instructions` block: the burden of proof is on the plan, the verifier opens and confirms every cited path / command, and an item whose cited references cannot be confirmed is answered `DISAGREE(<kind>)` rather than `AGREE`. The verdict tokens, breakage kinds (a–e), classification, and the majority gate threshold are unchanged. This prepended framing supersedes the template's "Judge solely from plan internal consistency" instruction for the adversarial round.
796
+
613
797
  The "Reverify prompt: required-reading suppression (BLOCKING)" rule (lightweight mode does NOT inject a `[Required reading]` clause) applies here as well.
614
798
 
615
799
  ### Worker non-result handling in plan-body round (BLOCKING)
@@ -160,6 +160,7 @@ okstra render-bundle \
160
160
  --task-type "<args.task-type>" \
161
161
  --task-brief "<args.task-brief>" \
162
162
  --executor "<args.executor>" \
163
+ --critic "<args.critic>" \
163
164
  --approved-plan "<args.approved-plan>" \
164
165
  --stage "<args.stage>" \
165
166
  --base-ref "<args.base-ref>" \