npm - okstra - Versions diffs - 0.46.0 → 0.47.0 - Mend

okstra 0.46.0 → 0.47.0


Files changed (10)

package/docs/superpowers/plans/2026-06-04-adversarial-verification.md +570 -0
package/docs/superpowers/specs/2026-06-04-adversarial-verification-design.md +176 -0
package/package.json +1 -1
package/runtime/BUILD.json +2 -2
package/runtime/agents/SKILL.md +2 -1
package/runtime/prompts/profiles/_common-contract.md +1 -1
package/runtime/prompts/profiles/error-analysis.md +2 -0
package/runtime/prompts/profiles/requirements-discovery.md +2 -0
package/runtime/python/okstra_ctl/render.py +6 -1
package/runtime/skills/okstra-convergence/SKILL.md +114 -5

package/docs/superpowers/plans/2026-06-04-adversarial-verification.md ADDED

@@ -0,0 +1,570 @@
+# Adversarial Phase 5.5 Verification: Implementation Plan
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+**Goal:** Convert the Phase 5.5 convergence re-verification of the two phases `requirements-discovery` / `error-analysis` into adversarial verification, in which verifiers actively try to refute other workers' claims and the burden of proof sits on the claim.
+**Architecture:** No new agent or stage is created. An `adversarial` flag is injected phase-aware into the manifest `convergence` block (render.py), and the convergence skill defines the adversarial prompt, aggregation, and scope-limited re-inspection in its `adversarial=true` branch. The persisted verdict enum is kept, and the adversarial outcome is recorded in a new `disagreeBasis` field.
+**Tech Stack:** Python 3 (okstra_ctl, pytest), Markdown skill/prompt documents, JSON fixtures. Build: `tools/build.mjs` (`npm run build`).
+**Design rationale:** [`docs/superpowers/specs/2026-06-04-adversarial-verification-design.md`](../specs/2026-06-04-adversarial-verification-design.md)
+---
+## File structure
+| File | Responsibility | Action |
+|---|---|---|
+| [`scripts/okstra_ctl/render.py`](../../../scripts/okstra_ctl/render.py) | Phase-aware injection of `adversarial`/`verificationMode` into the manifest `convergence` block | Modify (`_build_convergence_block`, 899–926) |
+| `tests/test_render_convergence_adversarial.py` | Unit tests for the values render injects | Create |
+| [`tests/test_convergence_state_contract.py`](../../../tests/test_convergence_state_contract.py) | Enforce state schema 1.2 plus the shape of `disagreeBasis` and `config.adversarial` | Modify |
+| `tests/fixtures/convergence/adversarial-contested.json` | Fixture for the adversarial contested case | Create |
+| [`skills/okstra-convergence/SKILL.md`](../../../skills/okstra-convergence/SKILL.md) | Define adversarial-mode behaviour (prompt, aggregation, scope, schema) | Modify |
+| [`prompts/profiles/requirements-discovery.md`](../../../prompts/profiles/requirements-discovery.md) | Phase 5.5 adversarial declaration | Modify |
+| [`prompts/profiles/error-analysis.md`](../../../prompts/profiles/error-analysis.md) | Phase 5.5 adversarial declaration | Modify |
+| [`prompts/profiles/_common-contract.md`](../../../prompts/profiles/_common-contract.md) | Update the Phase 5.5 description in the Worker interaction model | Modify |
+| [`CHANGES.md`](../../../CHANGES.md) | `사용자 영향:` (user impact) entry | Modify |
+Order of work: render (code) → contract tests/fixture (data) → skill document (behaviour) → profile/contract declarations → CHANGES + build + full verification.
+---
+### Task 1: phase-aware injection of `adversarial`/`verificationMode` in render.py
+**Files:**
+- Create: `tests/test_render_convergence_adversarial.py`
+- Modify: `scripts/okstra_ctl/render.py:899-926`
+- [ ] **Step 1: Write the failing test**
+Create `tests/test_render_convergence_adversarial.py`:
```python
+"""_build_convergence_block: unit tests for the phase-aware adversarial-mode defaults.
+Only requirements-discovery / error-analysis receive adversarial=true with
+full-reanalysis; every other phase stays on collaborative lightweight. The existing
+maxRounds branching is preserved regardless of the adversarial flag.
+"""
+from __future__ import annotations
+import sys
+from pathlib import Path
+import pytest
+_REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(_REPO_ROOT / "scripts"))
+from okstra_ctl.render import _build_convergence_block  # noqa: E402
+@pytest.mark.parametrize("task_type", ["requirements-discovery", "error-analysis"])
+def test_adversarial_phases_get_adversarial_full_reanalysis(task_type):
+    block = _build_convergence_block({"TASK_TYPE": task_type})
+    assert block["adversarial"] is True
+    assert block["verificationMode"] == "full-reanalysis"
+@pytest.mark.parametrize(
+    "task_type",
+    [
+        "implementation-planning",
+        "implementation",
+        "final-verification",
+        "release-handoff",
+    ],
+)
+def test_non_adversarial_phases_stay_lightweight(task_type):
+    block = _build_convergence_block({"TASK_TYPE": task_type})
+    assert block["adversarial"] is False
+    assert block["verificationMode"] == "lightweight"
+def test_maxrounds_unchanged_by_adversarial():
+    assert _build_convergence_block({"TASK_TYPE": "requirements-discovery"})["maxRounds"] == 1
+    assert _build_convergence_block({"TASK_TYPE": "error-analysis"})["maxRounds"] == 2
+```
+- [ ] **Step 2: Confirm the test fails**
+Run: `python3 -m pytest tests/test_render_convergence_adversarial.py -v`
+Expected: FAIL with `KeyError: 'adversarial'` (the block does not have the key yet).
+- [ ] **Step 3: Minimal implementation**
+Modify the body of `_build_convergence_block` in `scripts/okstra_ctl/render.py` (currently lines 899–926). Replace the body with:
+```python
+    task_type = ctx.get("TASK_TYPE", "")
+    default_max_rounds = 1 if task_type == "requirements-discovery" else 2
+    adversarial_phases = {"requirements-discovery", "error-analysis"}
+    is_adversarial = task_type in adversarial_phases
+    raw_plan_verify = (ctx.get("OKSTRA_PLAN_VERIFICATION", "") or "").strip().lower()
+    plan_verify_enabled = raw_plan_verify != "false"
+    return {
+        "enabled": True,
+        "adversarial": is_adversarial,
+        "maxRounds": default_max_rounds,
+        "verificationMode": "full-reanalysis" if is_adversarial else "lightweight",
+        "planBodyVerification": {
+            "enabled": plan_verify_enabled,
+            "maxRounds": 1,
+            "gating": True,
+        },
+    }
+```
+Then add an entry to the defaults list in the docstring (lines 900–912), on the line after `- verificationMode "lightweight"`:
+```python
+    - `adversarial` default True for `requirements-discovery` / `error-analysis`
+      (forces `verificationMode` to "full-reanalysis"), False otherwise
+```
+- [ ] **Step 4: Confirm the tests pass**
+Run: `python3 -m pytest tests/test_render_convergence_adversarial.py -v`
+Expected: PASS (3 test functions; 7 collected cases, 6 of them parametrized).
+- [ ] **Step 5: Commit**
+```bash
+git add scripts/okstra_ctl/render.py tests/test_render_convergence_adversarial.py
+git commit -m "feat(okstra_ctl/render): inject adversarial convergence mode for discovery/error-analysis"
+```
+---
+### Task 2: convergence state contract tests + adversarial fixture
+Bump the schema to 1.2 and enforce the shape of `config.adversarial` and `votes.<worker>.disagreeBasis`. The three existing 1.1 fixtures must still pass unchanged (the new fields are optional).
+**Files:**
+- Create: `tests/fixtures/convergence/adversarial-contested.json`
+- Modify: `tests/test_convergence_state_contract.py`
+- [ ] **Step 1: Write the adversarial fixture (data that triggers a failure)**
+Create `tests/fixtures/convergence/adversarial-contested.json`. It is a requirements-discovery-style case (effectiveMaxRounds=1, so the single round is also the last round) in which one `counter-evidence` refutation from codex demotes F-001 to `contested`:
+```json
+{
+  "schemaVersion": "1.2",
+  "taskKey": "fixture/adversarial-contested",
+  "config": {
+    "enabled": true,
+    "adversarial": true,
+    "maxRounds": 1,
+    "effectiveMaxRounds": 1,
+    "verificationMode": "full-reanalysis"
+  },
+  "findings": [
+    {
+      "findingId": "F-001",
+      "summary": "Login handler skips input validation",
+      "category": "bug",
+      "ticketIds": ["AD-100"],
+      "originWorker": "claude-worker",
+      "originEvidence": "src/auth/login.ts:42",
+      "classification": "contested",
+      "rounds": [
+        {
+          "round": 1,
+          "votes": {
+            "codex-worker": {
+              "verdict": "disagree",
+              "disagreeBasis": "counter-evidence",
+              "explanation": "src/auth/login.ts:48 already runs validateBody(); the claimed gap does not exist."
+            },
+            "gemini-worker": {
+              "verdict": "agree",
+              "disagreeBasis": null,
+              "explanation": "Re-read login.ts; could not break the claim."
+            }
+          }
+        }
+      ],
+      "consensusWorkers": ["claude-worker", "gemini-worker"],
+      "dissentingWorkers": ["codex-worker"]
+    }
+  ],
+  "roundHistory": [
+    {
+      "round": 1,
+      "inputQueueSize": 1,
+      "resolvedCount": 0,
+      "carriedForwardCount": 1,
+      "dispatches": [
+        {"worker": "codex-worker", "status": "completed", "durationMs": 173004},
+        {"worker": "gemini-worker", "status": "completed", "durationMs": 188210}
+      ],
+      "skippedWorkers": [
+        {"worker": "claude-worker", "reason": "no items to verify"}
+      ]
+    }
+  ],
+  "round2SkippedReason": "max-rounds-1",
+  "finalState": "max-rounds-reached",
+  "totalRounds": 1,
+  "finalClassificationCounts": {
+    "fullConsensus": 0,
+    "partialConsensus": 0,
+    "contested": 1,
+    "workerUnique": 0
+  }
+}
+```
+- [ ] **Step 2: Confirm the existing tests break on the new fixture (red)**
+Run: `python3 -m pytest tests/test_convergence_state_contract.py -k adversarial -v`
+Expected: FAIL. `test_schema_version_is_1_1[adversarial-contested]` fails on the `"1.2" == "1.1"` assertion.
+- [ ] **Step 3: Update the contract tests to accept 1.2 and validate the adversarial shape**
+Edit `tests/test_convergence_state_contract.py`:
+(a) Change the first line of the module docstring (line 1) from `(schema v1.1)` to `(schema v1.1 / v1.2)`.
+(b) Add immediately after the `VALID_VERDICTS = {...}` definition (line 32):
+```python
+VALID_DISAGREE_BASIS = {"counter-evidence", "burden-not-met", None}
+```
+(c) Replace `test_schema_version_is_1_1` (lines 40–41) in full:
+```python
+def test_schema_version_is_supported(fixture):
+    assert fixture["schemaVersion"] in {"1.1", "1.2"}
+```
+(d) Append two new tests at the end of the file:
+```python
+def test_disagree_basis_is_enum_when_present(fixture):
+    for f in fixture["findings"]:
+        for r in f["rounds"]:
+            for vote in r["votes"].values():
+                if "disagreeBasis" in vote:
+                    assert vote["disagreeBasis"] in VALID_DISAGREE_BASIS
+def test_adversarial_disagree_carries_basis(fixture):
+    """In an adversarial run every disagree vote must cite a refutation basis."""
+    if not fixture["config"].get("adversarial"):
+        return
+    for f in fixture["findings"]:
+        for r in f["rounds"]:
+            for worker, vote in r["votes"].items():
+                if vote["verdict"] == "disagree":
+                    assert vote.get("disagreeBasis") in {"counter-evidence", "burden-not-met"}, (
+                        f"{f['findingId']} {worker}: adversarial disagree without disagreeBasis"
+                    )
+```
+- [ ] **Step 4: Confirm the full contract test suite passes**
+Run: `python3 -m pytest tests/test_convergence_state_contract.py -v`
+Expected: PASS. All three existing fixtures (1.1) and the one new fixture (1.2) pass. Because `config.get("adversarial")` is `None` (falsy) on the 1.1 fixtures, `test_adversarial_disagree_carries_basis` is a no-op for them.
+- [ ] **Step 5: Commit**
+```bash
+git add tests/test_convergence_state_contract.py tests/fixtures/convergence/adversarial-contested.json
+git commit -m "test(convergence): accept schema v1.2 with adversarial config + disagreeBasis"
+```
+---
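+### Task 3: define adversarial-mode behaviour in the convergence SKILL.md
+This task is the authoritative declaration of the adversarial *behaviour* (it cannot be enforced in code; it is a prompt instruction to the lead and workers). It edits five places. Each Edit uses existing text as its anchor.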
+**Files:**
+- Modify: `skills/okstra-convergence/SKILL.md`
+- [ ] **Step 1: Add an `adversarial` row to the Configuration table**
+Edit: insert a new row after the `| `verificationMode` | `"lightweight"` | `"lightweight"` or `"full-reanalysis"` |` row (line 48):
+```markdown
+| `adversarial` | phase-aware: `true` for `requirements-discovery` / `error-analysis`, `false` otherwise | When `true`, Phase 5.5 runs in **adversarial mode** (see §"Adversarial Verification Mode"): verifiers actively try to refute each finding, the burden of proof sits on the claim, and `verificationMode` is forced to `"full-reanalysis"` scoped to the finding's cited evidence. Resolved by `scripts/okstra_ctl/render.py` `_build_convergence_block` and recorded in `config.adversarial` of the convergence state artifact. |
+```
+- [ ] **Step 2: Add a new §"Adversarial Verification Mode" section**
+Edit: insert the new section after the end of the "Full Re-analysis (opt-in)" block in §"Verification Mode" (the `Disadvantages: 2–3 times the cost, increased time` line, line 193):
+```markdown
+## Adversarial Verification Mode
+Active only when `config.adversarial == true` (default for `requirements-discovery` and `error-analysis`; see §"Configuration"). When `false`, every rule in this section is inert and the collaborative behaviour documented elsewhere in this skill applies unchanged.
+In adversarial mode the verifier's job inverts: instead of confirming a peer's finding, the verifier **tries to break it**, and the burden of proof sits on the claim — a finding survives only if refutation attempts fail.
+### Scoped full-reanalysis (BLOCKING)
+Adversarial mode forces `verificationMode = "full-reanalysis"`, but the re-analysis is **scoped to the evidence the finding under attack cites** (the file paths / line ranges / log lines in its `originEvidence`), plus the immediately surrounding context. The verifier MUST NOT re-read the whole task brief, instruction-set, or `final-report-template.md`. This keeps the documented "single largest avoidable cost in requirements-discovery and error-analysis" (see §"Reverify prompt: required-reading suppression") bounded while making the refutation real rather than a text-only argument.
+### Adversarial verdict semantics
+The persisted `verdict` enum is unchanged (`agree | disagree | supplement | verification-error`). The prompt-facing labels are adversarial and map down on persistence:
+| Prompt label | Persisted `verdict` | Meaning |
+|---|---|---|
+| SURVIVES | `agree` | Actively tried to refute and failed — the claim withstood the attack. |
+| SURVIVES-WITH-CAVEAT | `supplement` | Holds, but a scope limit / extra condition / precondition was found. |
+| REFUTED | `disagree` | The claim was broken (or failed to prove itself). MUST carry a `disagreeBasis`. |
+Each `disagree` vote records a new field `disagreeBasis`:
+| `disagreeBasis` | Meaning |
+|---|---|
+| `counter-evidence` | The verifier cited contradicting evidence (`file:line` / log line) in `explanation`. A **hard refute**. |
+| `burden-not-met` | The verifier re-inspected the cited evidence and could neither confirm nor refute → the claim failed to prove itself ("when uncertain, lean to rejection"). |
+A `disagree` with `disagreeBasis == null` is a contract violation in adversarial mode — every refutation must state which of the two grounds it rests on. Bare "I disagree" without re-inspection is not allowed.
+### Adversarial classification (replaces the §"Convergence Algorithm" per-round classifier when `adversarial == true`)
+`verification-error` votes are excluded from numerator and denominator exactly as in the collaborative classifier. For each finding `F` in the queue at a round:
+```text
+disagrees    = [v for v in non-error votes if v.verdict == "disagree"]
+hard_refutes = [v for v in disagrees if v.disagreeBasis == "counter-evidence"]
+all_others_disagree = (every non-discoverer non-error vote is "disagree")
+IF len(disagrees) == 0:
+    resolve F as "full-consensus"   (or "partial-consensus" if any SUPPLEMENT/caveat)
+ELIF all_others_disagree:
+    resolve F as "worker-unique"    # only the discoverer still holds it
+ELIF len(hard_refutes) >= 1:
+    # an evidence-backed refute exists and the roster is split → the claim is disputed
+    carry F forward; at the LAST executed round classify it "contested"
+ELIF burden-not-met disagrees are a majority of non-error votes:
+    carry F forward; at the LAST executed round classify it "contested"
+ELSE:
+    # a lone weak (burden-not-met) doubt against an otherwise-surviving claim
+    resolve F as "partial-consensus"
+```
+`contested` remains a **final classification only** (per §"Scope and Terminology"): a disputed finding is carried forward through intermediate rounds and labelled `contested` only at the last executed round. For `requirements-discovery` (`effectiveMaxRounds = 1`) the single round IS the last round, so a split-with-hard-refute finding is labelled `contested` in that one round. The final-classifier block of §"Convergence Algorithm" is unchanged; this section only changes how each round's verdicts resolve into queue actions.
+Design intent: one `counter-evidence` refute is enough to deny a claim consensus (it cannot rise above `contested` no matter how many others AGREE), while a single `burden-not-met` doubt does not by itself sink an otherwise-surviving claim — only a majority of burden-not-met doubts does.
+```
+- [ ] **Step 3: Add the adversarial re-verification prompt**
+Edit: insert a new subsection after the point where the code fence of §"Lightweight Re-verification Prompt" ends (the ` ``` ` line that follows `**Verdict**: ...` on line 283):
+```markdown
+### Adversarial Re-verification Prompt
+Used instead of the lightweight/full-reanalysis prompt when `config.adversarial == true`. The required anchor headers (§"Required reverify-prompt anchor headers") are identical. The `[Required reading]` clause is suppressed; only the cited-evidence paths of the items under attack are injected (see §"Adversarial Verification Mode" → Scoped full-reanalysis).
+```
+You are <worker-role> performing ADVERSARIAL re-verification for <task-key> (round <N>).
+## Instructions
+Your job is to BREAK each finding below, not to confirm it. For EACH finding,
+open the cited evidence directly and actively search for evidence that the claim
+is wrong, overstated, or unproven. Then respond with exactly one verdict:
+- **REFUTED**: You broke the claim. State the basis:
+  - counter-evidence — you found contradicting evidence (give file:line or log line), OR
+  - burden-not-met — you re-inspected the cited evidence and could neither confirm
+    nor refute it (the claim has not proven itself).
+- **SURVIVES**: You actively tried to refute it and failed — the claim withstood the attack.
+- **SURVIVES-WITH-CAVEAT**: It holds, but a scope limit / extra condition / missing
+  precondition exists (state it).
+The burden of proof is on the claim. If after inspecting the cited evidence you remain
+uncertain, your verdict is REFUTED with basis = burden-not-met.
+Inspect ONLY the evidence each finding cites and its immediate surroundings. Do NOT
+re-read the task brief, instruction-set, or report template.
+## Findings to verify
+### F-001: <one-line summary>
+**Origin**: <worker role>
+**Cited evidence**: <file paths, line numbers, log lines from origin worker>
+### F-002: <one-line summary>
+...
+## Response format
+### F-001
+**Verdict**: REFUTED | SURVIVES | SURVIVES-WITH-CAVEAT
+**Basis** (only if REFUTED): counter-evidence | burden-not-met
+**Explanation**: <2-3 sentences; for counter-evidence include the file:line you found>
+### F-002
+...
+```
+When persisting votes, map SURVIVES→`agree`, SURVIVES-WITH-CAVEAT→`supplement`, REFUTED→`disagree`, and copy the stated Basis into `votes.<worker>.disagreeBasis` (null for non-REFUTED verdicts).
+```
+- [ ] **Step 4: Reflect `config.adversarial`, `disagreeBasis`, and v1.2 in the schema (State Artifact)**
+Edit (a): add `adversarial` to the `config` block of the example JSON in §"Convergence State Artifact". On the line after `"enabled": true,` (around line 330):
+```json
+    "adversarial": false,
+```
+Edit (b): show `disagreeBasis` once in the `votes` entries of the same example. Replace `"codex-worker": { "verdict": "agree", "explanation": "<brief>" },` with:
+```json
+            "codex-worker": { "verdict": "agree", "disagreeBasis": null, "explanation": "<brief>" },
+```
+Edit (c): replace the `schemaVersion` rule line in the Schema rules list (lines 386–401):
+```markdown
+- `schemaVersion`: literal string `"1.2"` for adversarial-capable runs (`"1.1"` for collaborative-only runs remains valid). Readers MUST accept `"1.0"` / `"1.1"` / `"1.2"` and treat any missing field as `null`.
+```
+Edit (d): add a new rule line **before** the `config.effectiveMaxRounds` rule line in the same list:
+```markdown
+- `config.adversarial`: boolean. `true` when this run used adversarial verification (default for `requirements-discovery` / `error-analysis`). When `true`, `config.verificationMode` is `"full-reanalysis"` (scoped) and every `disagree` vote carries a non-null `disagreeBasis`.
+```
+Edit (e): add a new rule line after the `findings[].rounds[].votes.<worker>.verdict` rule line:
+```markdown
+- `findings[].rounds[].votes.<worker>.disagreeBasis`: enum `counter-evidence | burden-not-met | null`. Non-null only when `verdict == "disagree"` AND `config.adversarial == true`; `null` (or absent, treated as null) otherwise. See §"Adversarial Verification Mode".
+```
+- [ ] **Step 5: Build and run the workflow validator to confirm the documents are consistent**
+Run: `npm run build && bash validators/validate-workflow.sh`
+Expected: build succeeds (`runtime/` synced), validator PASS.
+- [ ] **Step 6: Commit**
+```bash
+git add skills/okstra-convergence/SKILL.md runtime/
+git commit -m "feat(skills/okstra-convergence): define adversarial Phase 5.5 verification mode"
+```
+---
+### Task 4: Declare adversarial Phase 5.5 in the profiles and the common contract
+**Files:**
+- Modify: `prompts/profiles/requirements-discovery.md`
+- Modify: `prompts/profiles/error-analysis.md`
+- Modify: `prompts/profiles/_common-contract.md`
+- [ ] **Step 1: Add the declaration to the requirements-discovery profile**
+Edit `prompts/profiles/requirements-discovery.md`: insert a new item **before** the `- Non-goals:` line (line 54):
+```markdown
+- Cross-verification mode:
+  - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker's finding by directly re-inspecting the cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
+```
+- [ ] **Step 2: Add the declaration to the error-analysis profile**
+Edit `prompts/profiles/error-analysis.md`: insert the corresponding item **before** the `- Non-goals:` line (line 33):
+```markdown
+- Cross-verification mode:
+  - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each root-cause / reproduction claim by directly re-inspecting the cited code, logs, or config; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
+```
+- [ ] **Step 3: Update the Phase 5.5 description in the common contract**
+Edit `prompts/profiles/_common-contract.md`: in the Phase 5.5 item of "Worker interaction model" (line 17), append one sentence (within the same bullet) after the closing sentence `See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`).`:
+```markdown
+ For `requirements-discovery` and `error-analysis` this phase runs in **adversarial mode** (`convergence.adversarial=true`): verifiers try to refute each finding against its cited evidence and the burden of proof sits on the claim — see that skill's §"Adversarial Verification Mode".
+```
+- [ ] **Step 4: Build + verify**
+Run: `npm run build && bash validators/validate-workflow.sh`
+Expected: build succeeds, validator PASS.
+- [ ] **Step 5: Commit**
+```bash
+git add prompts/profiles/requirements-discovery.md prompts/profiles/error-analysis.md prompts/profiles/_common-contract.md runtime/
+git commit -m "feat(prompts/profiles): declare adversarial Phase 5.5 for discovery/error-analysis"
+```
+---
+### Task 5: CHANGES.md + full verification + final commit
+**Files:**
+- Modify: `CHANGES.md`
+- [ ] **Step 1: Add the CHANGES.md entry**
+Edit `CHANGES.md`: insert a new `###` block immediately after the `## 2026-06-04` header (line 5), above the existing first entry:
+```markdown
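+### feat(convergence): switch Phase 5.5 of requirements-discovery / error-analysis to adversarial verification
+- The previous Phase 5.5 re-verification was collaborative: the prompt asked for `AGREE/DISAGREE/SUPPLEMENT` (agreement being the low-cost default), and aggregation was "majority AGREE → consensus", which put the burden of proof on the refuter, so even a wrong claim survived as `full-consensus` when nobody actively refuted it. The cost of false consensus is highest in the two phases where a wrong claim misleads every following phase: routing (`requirements-discovery`) and root cause (`error-analysis`). Phase 5.5 of these two phases now runs in **adversarial mode** (`convergence.adversarial=true`): verifiers directly re-inspect the cited evidence and try to break the claim (REFUTED/SURVIVES/SURVIVES-WITH-CAVEAT), and the burden of proof sits on the claim (reject when uncertain). A single evidence-backed refutation keeps the claim from reaching consensus. The re-inspection scope is limited to the evidence files the finding cites plus their immediate surroundings to prevent a cost blow-up, and maxRounds is unchanged (req-discovery=1, error-analysis=2). The state artifact records `config.adversarial` and the refutation basis (`disagreeBasis ∈ counter-evidence|burden-not-met`) (schema v1.2). Convergence in other phases remains collaborative.
+- 사용자 영향 (user impact): takes effect after the next release + `npx -y okstra@latest install`. Cross-verification in the two phases now actively refutes worker claims, so weakly supported consensus is filtered out. The adversarial *behaviour* itself is a prompt instruction to the lead and workers (executed by the LLM), not a runtime guarantee; only the shape of the state artifact is enforced (by the contract tests). `contested` is a "disputed" classification, not a rejection, so the finding stays in the report and the reason for the demotion (counter-evidence vs. burden not met) is recorded.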
+```
+- [ ] **Step 2: Full tests + verification + build consistency**
+Run:
+```bash
+npm run build
+python3 -m pytest tests/ -q
+bash validators/validate-workflow.sh
+node bin/okstra --version
+```
+Expected: build succeeds, every pytest test passes, validator PASS, version printed.
+- [ ] **Step 3: Self-review from a reviewer's point of view (Rule 5)**
+Read the whole of `git diff main...HEAD` as a reviewer seeing it for the first time, and use `grep -rn` to check that the new identifiers (`adversarial`, `disagreeBasis`, `Adversarial Verification Mode`, `counter-evidence`, `burden-not-met`) are used consistently:
+```bash
+grep -rn "disagreeBasis\|adversarial\|Adversarial Verification Mode\|counter-evidence\|burden-not-met" skills/ prompts/ scripts/ tests/ CHANGES.md
+```
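+Expected: the tokens appear with the same meaning in render.py (injection), the skill (definition), the profiles/contract (declaration), the tests/fixture (enforcement), and CHANGES (record). No token should appear in a place where it is not defined.
+- [ ] **Step 4: Final commit**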
+```bash
+git add CHANGES.md
+git commit -m "docs(changes): log adversarial Phase 5.5 verification for discovery/error-analysis"
+```
+---
+## Self-Review (author checklist)
+**1. Spec coverage**
+- §2.1 phase-conditional mode → Task 1 (render) + Task 3 Step 1 (Configuration table).
+- §2.2 adversarial prompt → Task 3 Step 3.
+- §2.3 verdict mapping + disagreeBasis → Task 3 Step 2/4, Task 2.
+- §2.4 adversarial aggregation → Task 3 Step 2.
+- §2.5 scope-limited full-reanalysis → Task 3 Step 2 (Scoped full-reanalysis), Step 3 (prompt instruction).
+- §3.1 state schema 1.2 → Task 2 + Task 3 Step 4.
+- §3.2 render injection → Task 1.
+- §4 six changed files → all covered by Tasks 1–5.
+- §5 enforcement honesty → Task 2 (shape enforcement), CHANGES (states that the behaviour is a prompt instruction).
+- §7 acceptance criteria 1–5 → confirmed by the full verification in Task 5 Step 2.
+**2. Placeholder scan:** every code/JSON/markdown block contains real content. No TBD/TODO.
+**3. Type/identifier consistency:** `adversarial` (bool), `disagreeBasis` (enum `counter-evidence|burden-not-met|null`), `verificationMode` ("full-reanalysis"), classification values (`full-consensus|partial-consensus|contested|worker-unique`): confirmed that the same spelling is used across tasks.

package/docs/superpowers/specs/2026-06-04-adversarial-verification-design.md ADDED

@@ -0,0 +1,176 @@
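+# Phase 5.5 adversarial verification: design
+- Date written: 2026-06-04
+- Scope: convert the **Phase 5.5 convergence re-verification** of the two phases `requirements-discovery` / `error-analysis` into an **adversarial verification** structure in which verifiers actively try to refute other workers' claims and the burden of proof sits on the claim. No separate verifier agent or new stage is created; the existing convergence re-verification loop is restructured into a phase-conditional adversarial mode.
+- Out of scope
+  - No new worker/agent is added. The `Required workers:` roster of `requirements-discovery` / `error-analysis` is unchanged.
+  - The convergence behaviour of `implementation-planning` / `implementation` / `final-verification` / `release-handoff` is unchanged; they keep the current collaborative re-verification as-is.
+  - The plan-body verification (`P-*` queue) of `implementation-planning` is unchanged; this design covers only the finding queue (`F-*`).
+  - The convergence round/queue structure itself (Round 0 grouping, queue-pruned loop, Round 2 gate) is reused as-is.
+- Relationship: this document extends §"Verification Mode" and §"Lightweight Re-verification Prompt" of [`skills/okstra-convergence/SKILL.md`](../../../skills/okstra-convergence/SKILL.md) into an adversarial variant **for the two phases only**. The collaborative-mode definition remains in place for the other phases.
+## 1. Motivation: the current re-verification is collaborative and therefore produces false consensus
+The current Phase 5.5 re-verification is essentially an "agree by default" structure.
+1. **The prompt is collaborative.** The lightweight reverify prompt ([`skills/okstra-convergence/SKILL.md:247`](../../../skills/okstra-convergence/SKILL.md)) asks for `AGREE / DISAGREE / SUPPLEMENT` and has the verifier judge whether the finding "is valid based on the evidence presented". Nothing pushes the verifier to actively break the claim, so AGREE becomes the low-cost default.
+2. **Aggregation places the burden of proof on the refuter.** The aggregation rule ([`skills/okstra-convergence/SKILL.md:120`](../../../skills/okstra-convergence/SKILL.md)) is "majority AGREE → consensus". A claim is therefore demoted only when a majority actively refutes it. Even a wrong claim survives as `full-consensus` if nobody actively refutes it.
+3. **Lightweight looks only at text.** The verifier does not re-inspect the original code/logs and sees only the "evidence presented" ([`skills/okstra-convergence/SKILL.md:183`](../../../skills/okstra-convergence/SKILL.md)). An incorrect evidence citation passes through untouched.
+In particular, `requirements-discovery` (routing decisions) and `error-analysis` (root-cause analysis) are the points where **a wrong claim misleads every following phase**. The cost of false consensus is highest in these two phases. The default stance of verification is therefore flipped from "agree" to "attempt to refute".
+## 2. Core principles
+### 2.1 Phase-conditional adversarial mode
+Adversarial verification applies **only to the two phases `requirements-discovery` and `error-analysis`**. Because the convergence skill is shared by every phase, the mode branch is expressed as a new flag in the manifest `convergence` block.
+| Key | Default for the two adversarial phases | Default for other phases |
+|---|---|---|
+| `convergence.adversarial` | `true` | `false` |
+| `convergence.verificationMode` | `"full-reanalysis"` | `"lightweight"` |
+| `convergence.maxRounds` | req-discovery=`1`, error-analysis=`2` (unchanged) | unchanged |
+These defaults are injected by `_build_convergence_block` at [`scripts/okstra_ctl/render.py:899`](../../../scripts/okstra_ctl/render.py). It follows the same pattern as the existing phase-aware `maxRounds` branch (`1 if requirements-discovery else 2`). If the manifest specifies a key explicitly, that value takes precedence (turning adversarial verification on experimentally in another phase is possible through a manifest override, but is not recommended as a default).
+When `adversarial=false`, every change in this design is inactive and the current collaborative behaviour runs unchanged.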
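+### 2.2 Adversarial re-verification prompt: refutation is the job
+When `adversarial=true`, the lead uses the **adversarial prompt** instead of §"Lightweight Re-verification Prompt". Core instructions:
+- "Your job is to **break** this claim. Open the cited original evidence directly, re-inspect it, and actively look for counter-evidence that brings the claim down."
+- Verdict labels (prompt surface):
+  - **REFUTED**: the claim was refuted. A basis must be given (`disagreeBasis` below).
+  - **SURVIVES**: refutation was actively attempted but did not succeed. The claim withstood the attack.
+  - **SURVIVES-WITH-CAVEAT**: the claim holds, but a scope limit / extra condition / precondition was found.
+- **Handling uncertainty (BLOCKING):** if, even after re-inspecting the original evidence, the claim can be **neither confirmed nor refuted**, the default verdict is **REFUTED** (`disagreeBasis = burden-not-met`). The burden of proof sits on the claim, so a claim that has not proven itself does not survive.
+### 2.3 Verdict mapping: persisted enum unchanged, adversarial outcome recorded in a new field
+The `verdict` enum of the state artifact **stays** `{agree, disagree, supplement, verification-error}` (minimising changes to the contract-test enum). Prompt labels are mapped as follows when persisted:
+| Prompt label | Persisted `verdict` |
+|---|---|
+| SURVIVES | `agree` |
+| SURVIVES-WITH-CAVEAT | `supplement` |
+| REFUTED | `disagree` |
+The key adversarial information is recorded in a new vote field, **`disagreeBasis`**:
+| Value | Meaning |
+|---|---|
+| `counter-evidence` | A **hard refute** that cites counter-evidence as `file:line` (or a log line). The citation goes in `votes.<worker>.explanation`. |
+| `burden-not-met` | Re-inspected, but could neither confirm nor refute → the claim did not meet its burden of proof (= "reject when uncertain"). |
+| `null` | When the verdict is not `disagree` (= agree/supplement/verification-error). |
+With `adversarial=true`, a `disagree` verdict whose `disagreeBasis` is null is a contract violation (see §5). Every refutation in adversarial mode must therefore carry one of the two bases; a bare "I just disagree" is not allowed.
+### 2.4 Adversarial aggregation rule: move the burden of proof onto the claim
+When `adversarial=true`, the classification logic of §"Convergence Algorithm" is replaced by the following (the collaborative logic stays as-is for `adversarial=false`). For a finding `F`, `verification-error` votes are excluded from both numerator and denominator (same as today):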
+```text
+disagrees = [v for v in non-error votes if v.verdict == "disagree"]
+hard_refutes = [v for v in disagrees if v.disagreeBasis == "counter-evidence"]
+IF len(disagrees) == 0:
+    # nobody broke it → the claim withstood the attack
+    F.classification = "full-consensus"
+      (but "partial-consensus" if there is any supplement (= caveat))
+ELIF len(hard_refutes) >= 1:
+    # at least one evidence-backed refute holds → demote immediately (regardless of majority)
+    IF every non-discoverer vote is disagree:
+        F.classification = "worker-unique"      # effectively rejected
+    ELSE:
+        F.classification = "contested"
+ELSE:
+    # there are disagrees, but all are burden-not-met (zero hard refutes)
+    IF every non-discoverer vote is disagree:
+        F.classification = "worker-unique"
+    ELIF burden-not-met disagrees are a majority (more than half of non-error votes):
+        F.classification = "contested"
+    ELSE:
+        F.classification = "partial-consensus"  # a minority of weak doubts; treated as having survived
+```
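+Design intent:
+- **One `counter-evidence` refute = demotion.** This is the user's explicit requirement: "demote as soon as a single evidence-backed refute holds". Even when a majority agrees, once any verifier presents contradicting evidence by `file:line`, the claim is unconditionally demoted to `contested` or lower.
+- **`burden-not-met` demotes only as a majority.** One verifier saying "I'm not sure" does not kill a claim, but when more than half report a failure to prove it, the claim is demoted for not meeting its burden of proof. This is how "lean toward rejection when uncertain" is implemented.
+- Because the two kinds of refutation are persisted separately, the final report can trace *why* a finding was demoted (contradicting evidence found vs. failure to prove).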
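A minimal Python sketch of the §2.4 rule, not part of the shipped package. `Vote` and `classify_adversarial` are hypothetical names. The sketch assumes `votes` holds only the non-discoverer verifiers' votes and reads "majority" as a strict majority of non-error votes. The case where every vote is `verification-error` is not covered by the pseudocode, so the sketch raises instead of guessing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vote:
    """One verifier's persisted vote on a finding (hypothetical shape)."""
    worker: str
    verdict: str                          # agree | disagree | supplement | verification-error
    disagree_basis: Optional[str] = None  # counter-evidence | burden-not-met | None


def classify_adversarial(votes: List[Vote]) -> str:
    """Classify one finding from the non-discoverer votes."""
    # verification-error votes leave both numerator and denominator.
    live = [v for v in votes if v.verdict != "verification-error"]
    if not live:
        # Assumption: undefined in the pseudocode, so fail loudly.
        raise ValueError("no non-error votes to classify")

    disagrees = [v for v in live if v.verdict == "disagree"]
    hard_refutes = [v for v in disagrees if v.disagree_basis == "counter-evidence"]

    if not disagrees:
        # Nobody broke the claim; a caveat keeps it out of full consensus.
        has_caveat = any(v.verdict == "supplement" for v in live)
        return "partial-consensus" if has_caveat else "full-consensus"
    if len(disagrees) == len(live):
        # Every non-discoverer refuted it, whatever the basis.
        return "worker-unique"
    if hard_refutes:
        # One evidence-backed refute is enough to deny consensus.
        return "contested"
    if len(disagrees) * 2 > len(live):
        # Only burden-not-met doubts, but they form a strict majority.
        return "contested"
    # A minority of weak doubts: the claim is treated as having survived.
    return "partial-consensus"
```

With two verifiers the majority branch can never fire (one doubt is not a majority, two doubts are "everyone"), so it only matters for rosters with three or more verifiers.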
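+In multi-round runs (error-analysis, maxRounds=2), carry-forward between rounds and the final classification follow the current rules unchanged; the adversarial logic above is applied to each round's classification decision.
+### 2.5 Scoping full-reanalysis: preventing a cost blow-up
+The selected `verificationMode="full-reanalysis"` has verifiers re-inspect the original evidence directly. However, [`skills/okstra-convergence/SKILL.md:245`](../../../skills/okstra-convergence/SKILL.md) calls the re-read that lightweight mode avoids "the largest avoidable cost in requirements-discovery and error-analysis". Falling back to re-reading the whole instruction-set would bring that cost straight back.
+**Resolution:** The re-inspection scope of adversarial full-reanalysis is **limited to "the evidence files the finding cites, plus their immediate surroundings"**. Re-reading the whole task brief / instruction-set / `final-report-template.md` is forbidden. In other words, the verifier opens only the code and logs that the claim under attack points to and looks for contradicting evidence there.
+- The full-reanalysis branch of §"Reverify prompt: required-reading suppression (BLOCKING)" is narrowed for adversarial mode: inject **only the cited evidence paths**, not the whole analysis-worker file list.
+- maxRounds stays as it is today (req-discovery=1, error-analysis=2). One adversarial round is enough to "try to break it once", and cost is not multiplied by the number of rounds.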
+## 3. Data model
+### 3.1 Convergence state artifact (`runs/<task-type>/state/convergence-<task-type>-<seq>.json`)
+- Bump `schemaVersion` to `"1.2"`. Readers keep accepting `"1.0"` / `"1.1"` and treat missing fields as `null`.
+- New key in `config`:
+  - `adversarial`: boolean. Whether this run used adversarial mode. `true` for the two current adversarial phases.
+- New key in `findings[].rounds[].votes.<worker>`:
+  - `disagreeBasis`: enum `counter-evidence | burden-not-met | null`. Follows the rules in §2.3.
+- Existing fields (`verdict` enum, `classification` enum, `finalState`, and so on) are unchanged.
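The backward-compatibility rule in §3.1 can be sketched as a small reader, shown here as an illustration only and not taken from the package. `normalize_convergence_state` is a hypothetical helper; it assumes the artifact has already been parsed into a `dict` and that unknown schema versions should be rejected, which the spec does not state.

```python
import copy

ACCEPTED_SCHEMA_VERSIONS = {"1.0", "1.1", "1.2"}


def normalize_convergence_state(state: dict) -> dict:
    """Return a copy of a parsed state artifact with the v1.2 fields filled in as None."""
    version = state.get("schemaVersion")
    if version not in ACCEPTED_SCHEMA_VERSIONS:
        # Assumption: rejecting unknown versions is this sketch's choice.
        raise ValueError(f"unsupported schemaVersion: {version!r}")

    normalized = copy.deepcopy(state)  # leave the caller's object untouched

    # Historical artifacts have no config.adversarial: treat it as null.
    normalized.setdefault("config", {}).setdefault("adversarial", None)

    # Likewise for votes.<worker>.disagreeBasis on every round of every finding.
    for finding in normalized.get("findings", []):
        for round_entry in finding.get("rounds", []):
            for vote in round_entry.get("votes", {}).values():
                vote.setdefault("disagreeBasis", None)
    return normalized
```

Normalizing once at read time means downstream code can index `vote["disagreeBasis"]` without caring which schema version produced the file.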
+### 3.2 The manifest `convergence` block injected by render.py
+`_build_convergence_block` ([`scripts/okstra_ctl/render.py:899`](../../../scripts/okstra_ctl/render.py)) additionally decides the following:
+```python
+adversarial_phases = {"requirements-discovery", "error-analysis"}
+is_adversarial = task_type in adversarial_phases
+# ...
+"adversarial": is_adversarial,
+"verificationMode": "full-reanalysis" if is_adversarial else "lightweight",
+```
+The existing `maxRounds` branch is left as-is.
+## 4. Files to change (all source; no direct edits under `runtime/`)
+1. [`skills/okstra-convergence/SKILL.md`](../../../skills/okstra-convergence/SKILL.md)
+   - Add the `adversarial` key to the §"Configuration" table and state the default for the two phases.
+   - Add a description of adversarial mode to §"Verification Mode" (including scoped full-reanalysis).
+   - Add the `adversarial=true` aggregation branch (§2.4) to §"Convergence Algorithm". The collaborative logic stays unchanged.
+   - Add a new "Adversarial Re-verification Prompt" (§2.2) next to §"Lightweight Re-verification Prompt".
+   - Narrow the full-reanalysis branch of §"Reverify prompt: required-reading suppression" to cited-evidence-only for adversarial mode.
+   - Update the §"Convergence State Artifact" schema to 1.2: `config.adversarial`, `votes.<worker>.disagreeBasis`.
+2. [`scripts/okstra_ctl/render.py:899`](../../../scripts/okstra_ctl/render.py) `_build_convergence_block`: see §3.2.
+3. [`prompts/profiles/requirements-discovery.md`](../../../prompts/profiles/requirements-discovery.md) + [`prompts/profiles/error-analysis.md`](../../../prompts/profiles/error-analysis.md): a one-line declaration that Phase 5.5 runs adversarially (the profile is the authoritative place where behaviour is declared).
+4. [`prompts/profiles/_common-contract.md:16`](../../../prompts/profiles/_common-contract.md): add one line to the Phase 5.5 description under "Worker interaction model" stating that the two phases use adversarial peer review.
+5. [`tests/test_convergence_state_contract.py`](../../../tests/test_convergence_state_contract.py) + `tests/fixtures/convergence/`: accept `1.2`, validate the `disagreeBasis` enum, validate that `config.adversarial` exists, and add one adversarial fixture (one `counter-evidence` refute → `contested` case).
+6. [`CHANGES.md`](../../../CHANGES.md): a `사용자 영향:` (user impact) entry.
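+## 5. Enforcement: declaration versus enforcement
+An honest statement of the enforcement boundary:
+- **The adversarial *behaviour* itself (whether the lead actually attempted a refutation, whether a verifier re-inspected the evidence) cannot be enforced at runtime.** The lead and the workers are LLMs, so adversarial behaviour is induced only through the declarations and instructions in the skill and prompts. This limitation is stated in the documentation.
+- **Only the *shape* of the artifact is enforced.** `tests/test_convergence_state_contract.py` checks the fixtures for:
+  - `config.adversarial` is present as a boolean.
+  - `disagreeBasis` is within the enum `{counter-evidence, burden-not-met, null}`.
+  - In fixtures with `adversarial==true`, a `disagree` verdict implies `disagreeBasis != null`.
+- The runtime `validators/validate-run.py` does not inspect convergence state (unchanged from today). This design therefore **does not promise** enforcement of adversarial behaviour on runtime runs; the fixture contract test is the only automated verification point.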
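The three shape rules above could be checked along the following lines. This is a hedged sketch, not the contents of `tests/test_convergence_state_contract.py` (that file is not shown in this diff); `contract_violations` is a hypothetical helper that takes an already-parsed fixture `dict` and returns human-readable problems.

```python
ALLOWED_BASIS = ("counter-evidence", "burden-not-met", None)


def contract_violations(state: dict) -> list:
    """Collect shape violations of the section 5 rules for one state fixture."""
    problems = []

    adversarial = state.get("config", {}).get("adversarial")
    if not isinstance(adversarial, bool):
        problems.append("config.adversarial must be a boolean")

    for f_index, finding in enumerate(state.get("findings", [])):
        for round_entry in finding.get("rounds", []):
            for worker, vote in round_entry.get("votes", {}).items():
                where = f"findings[{f_index}] round {round_entry.get('round')} {worker}"
                basis = vote.get("disagreeBasis")  # absent is treated as null

                # Rule: disagreeBasis stays inside the enum.
                if basis not in ALLOWED_BASIS:
                    problems.append(f"{where}: invalid disagreeBasis {basis!r}")

                # Rule: in adversarial fixtures every disagree names its basis.
                if adversarial is True and vote.get("verdict") == "disagree" and basis is None:
                    problems.append(f"{where}: disagree without disagreeBasis")
    return problems
```

Returning a list rather than asserting on the first failure lets a single test run report every malformed vote in a fixture.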
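+## 6. Cost and risks
+- **Cost:** Switching to full-reanalysis raises the per-round cost relative to lightweight. The cited-evidence scoping of §2.5 prevents a blow-up, and keeping maxRounds as it is today (req-discovery=1) holds down the round multiplier.
+- **Risk: false demotion (false negative).** Adversarial mode can demote a true claim to `contested` (when a verifier presents incorrect counter-evidence). Mitigation: a `counter-evidence` refute must carry a `file:line` citation (§2.3), so the reason for demotion is recorded in the report and the user can trace and rebut it. `contested` is a "disputed" classification, not a rejection, so the finding stays in the report.
+- **Risk: overuse of burden-not-met.** If verifiers lazily answer "not sure" across the board, a majority of burden-not-met votes can demote a perfectly sound claim. Mitigation: the prompt allows burden-not-met only "after re-inspection", and a single burden-not-met does not demote (a majority is required, §2.4).
+## 7. Acceptance criteria
+1. The manifest `convergence` block for `requirements-discovery` / `error-analysis` is injected with `adversarial: true` and `verificationMode: "full-reanalysis"`. Other phases keep `adversarial: false` and `lightweight`.
+2. The convergence skill defines the adversarial prompt, adversarial aggregation, and cited-evidence-scoped re-inspection under the `adversarial=true` branch. Behaviour with `adversarial=false` is byte-for-byte identical to today.
+3. State schema 1.2 documents `config.adversarial` and `votes.<worker>.disagreeBasis`, and the contract test enforces the shape rules of §5.
+4. `python3 -m pytest tests/` and `bash validators/validate-workflow.sh` pass.
+5. Both profiles and `_common-contract.md` declare the adversarial Phase 5.5.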

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "okstra",
-  "version": "0.46.0",
+  "version": "0.47.0",
   "description": "Multi-agent cross-verification orchestrator runtime + Claude Code skills.",
   "license": "MIT",
   "author": "devonshin",

package/runtime/BUILD.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "package": "0.46.0",
-  "builtAt": "2026-06-04T11:19:42.641Z",
+  "package": "0.47.0",
+  "builtAt": "2026-06-04T12:46:31.759Z",
   "repoRoot": "/home/runner/work/okstra/okstra"
 }

package/runtime/agents/SKILL.md CHANGED

@@ -250,7 +250,8 @@ Convergence is enabled by default. Configure via task-manifest.json:
 - `convergence.enabled`: true/false (default: true)
 - `convergence.maxRounds`: 1–3 — **phase-aware default**: `1` for `requirements-discovery`, `2` for all other task types
-- `convergence.verificationMode`: `"lightweight"` | `"full-reanalysis"` (default: `"lightweight"`)
+- `convergence.verificationMode`: `"lightweight"` | `"full-reanalysis"` (default: `"lightweight"`; the adversarial phases below force `"full-reanalysis"`)
+- `convergence.adversarial`: true/false — **phase-aware default**: `true` for `requirements-discovery` / `error-analysis`, `false` otherwise. When `true`, Phase 5.5 runs in adversarial mode (verifiers refute findings; burden of proof on the claim). See [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Adversarial Verification Mode".
 When `task-manifest.json` does not set `convergence.maxRounds`, lead MUST resolve the effective value via the phase-aware default above before entering Phase 5.5, and record the resolved value in the convergence state artifact at `config.effectiveMaxRounds`.

package/runtime/prompts/profiles/_common-contract.md CHANGED

@@ -14,7 +14,7 @@ profile document.
 - Worker interaction model (shared — read before inferring behaviour from the roster):
   - the per-profile `Required workers:` block is a **roster**, not a behaviour contract. Each role's interaction mode changes across operating phases of the same run.
   - **Phase 4 / 5 (independent analysis)**: analyser workers (`claude`, `codex`, `gemini` when opted in) produce findings independently and have no access to one another's outputs. `report-writer` does not analyse.
-  - **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`).
+  - **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`). For `requirements-discovery` and `error-analysis` this phase runs in **adversarial mode** (`convergence.adversarial=true`): verifiers try to refute each finding against its cited evidence and the burden of proof sits on the claim — see that skill's §"Adversarial Verification Mode".
   - Do NOT conclude "no peer review happens" from the roster alone — every profile that lists ≥2 analyser workers runs convergence by default (`convergence.enabled=true` in `task-manifest.json`).
 - Tooling — read-only MCP availability (shared):
   - MCP is not implicit okstra context. Query an MCP server only when the task brief explicitly lists it as source material for this run. Any MCP-derived finding MUST cite server, table, and the SELECT used. MCP MUST NEVER be used as a write path — schema/data mutations go through repository migration files reviewed by humans.

package/runtime/prompts/profiles/error-analysis.md CHANGED

@@ -30,6 +30,8 @@
   - every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell; rows that lack a recommendation are rejected as half-formed.
   - **Codebase-first ambiguity resolution (defect rule)**: any ambiguity about repro, file behavior, or symbol semantics that can be answered by `Read` / `Grep` / log inspection MUST be resolved that way and recorded with file:line (or log-line) evidence. Writing a clarification row for something the codebase or shipped logs already answer is a defect of this phase.
   - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <reporter-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only the reporter can answer this" (reporter-side data, business priority, environment they observed). A row with `none` that *could* have been answered by code or logs is a defect.
+- Cross-verification mode:
+  - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each root-cause / reproduction claim by directly re-inspecting the cited code, logs, or config; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
 - Non-goals:
   - implementation details unless they are necessary to validate the cause
   - **source code edits, builds, migrations, or deployments** — this run produces evidence and cause analysis only; the fix belongs to a later `implementation-planning` run followed by an `implementation` run

package/runtime/prompts/profiles/requirements-discovery.md CHANGED

@@ -51,6 +51,8 @@
   - every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell; rows that lack a recommendation are rejected as half-formed.
   - **Codebase-first ambiguity resolution (defect rule)**: any ambiguity that can be answered by `Read` / `Grep` / file inspection MUST be resolved that way and recorded with file:line evidence. Writing a clarification row for something the codebase already answers is a defect of this phase.
   - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <human-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only a human can answer this" (reporter intent, business priority, external authority). A row with `none` that *could* have been answered by the codebase is a defect.
+- Cross-verification mode:
+  - Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker's finding by directly re-inspecting the cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
 - Non-goals:
   - full implementation design unless it is required to decide the next phase
   - **source code edits, plan authoring, builds, or deployments** — this run only classifies the work and routes it; deeper analysis and planning belong to subsequent phases

package/runtime/python/okstra_ctl/render.py CHANGED

@@ -903,6 +903,8 @@ def _build_convergence_block(ctx: dict) -> dict:
     - `enabled` default True
     - `maxRounds` default 1 for `requirements-discovery`, 2 otherwise
     - `verificationMode` default "lightweight"
+    - `adversarial` default True for `requirements-discovery` / `error-analysis`
+      (forces `verificationMode` to "full-reanalysis"), False otherwise
     - `planBodyVerification` is implementation-planning specific; the key is
       always emitted (dead-letter on other phases) so the schema stays stable.
@@ -912,12 +914,15 @@ def _build_convergence_block(ctx: dict) -> dict:
     """
     task_type = ctx.get("TASK_TYPE", "")
     default_max_rounds = 1 if task_type == "requirements-discovery" else 2
+    adversarial_phases = {"requirements-discovery", "error-analysis"}
+    is_adversarial = task_type in adversarial_phases
     raw_plan_verify = (ctx.get("OKSTRA_PLAN_VERIFICATION", "") or "").strip().lower()
     plan_verify_enabled = raw_plan_verify != "false"
     return {
         "enabled": True,
+        "adversarial": is_adversarial,
         "maxRounds": default_max_rounds,
-        "verificationMode": "lightweight",
+        "verificationMode": "full-reanalysis" if is_adversarial else "lightweight",
         "planBodyVerification": {
             "enabled": plan_verify_enabled,
             "maxRounds": 1,

package/runtime/skills/okstra-convergence/SKILL.md CHANGED

@@ -46,6 +46,7 @@ Configure this in the `convergence` block of `task-manifest.json`. If the block
 | `enabled` | `true` | If `false`, skip the convergence loop and use the existing consensus/divergence method |
 | `maxRounds` | phase-aware: `1` for `requirements-discovery`, `2` otherwise (range 1–3) | Maximum number of re-verification rounds. Discovery's routing/missing-input outputs gain little from a second round; other phases (especially `error-analysis`) keep `2`. Lead resolves the effective value when the manifest omits the key and records it in `config.maxRounds` of the convergence state artifact. |
 | `verificationMode` | `"lightweight"` | `"lightweight"` or `"full-reanalysis"` |
+| `adversarial` | phase-aware: `true` for `requirements-discovery` / `error-analysis`, `false` otherwise | When `true`, Phase 5.5 runs in **adversarial mode** (see §"Adversarial Verification Mode"): verifiers actively try to refute each finding, the burden of proof sits on the claim, and `verificationMode` is forced to `"full-reanalysis"` scoped to the finding's cited evidence. Resolved by `scripts/okstra_ctl/render.py` `_build_convergence_block` and recorded in `config.adversarial` of the convergence state artifact. |
 **Auto-disable rule (BLOCKING).** Convergence requires ≥2 analyser workers to produce a meaningful consensus tally. When the active profile's `Required workers:` block (see `prompts/profiles/*.md`) resolves to fewer than 2 analyser workers — e.g. `release-handoff` (zero analyser workers, lead-only) — the lead MUST treat `convergence.enabled` as `false` for that run regardless of manifest configuration, skip Phases 5.5 and the plan-body verification round, and record `finalState: "converged"` with `totalRounds: 0` and an explanatory note in `config` (e.g. `"autoDisabled": "fewer-than-two-analysers"`). The plan-body round inherits the same rule via its `gating=false` advisory path.
@@ -192,6 +193,62 @@ Use the findings as a guide, but reanalyze the original code/data yourself.
 Advantages: High accuracy
 Disadvantages: 2–3 times the cost, increased time
+## Adversarial Verification Mode
+Active only when `config.adversarial == true` (default for `requirements-discovery` and `error-analysis`; see §"Configuration"). When `false`, every rule in this section is inert and the collaborative behaviour documented elsewhere in this skill applies unchanged.
+In adversarial mode the verifier's job inverts: instead of confirming a peer's finding, the verifier **tries to break it**, and the burden of proof sits on the claim — a finding survives only if refutation attempts fail.
+### Scoped full-reanalysis (BLOCKING)
+Adversarial mode forces `verificationMode = "full-reanalysis"`, but the re-analysis is **scoped to the evidence the finding under attack cites** (the file paths / line ranges / log lines in its `originEvidence`), plus the immediately surrounding context. The verifier MUST NOT re-read the whole task brief, instruction-set, or `final-report-template.md`. This keeps the documented "single largest avoidable cost in requirements-discovery and error-analysis" (see §"Reverify prompt: required-reading suppression") bounded while making the refutation real rather than a text-only argument.
+### Adversarial verdict semantics
+The persisted `verdict` enum is unchanged (`agree | disagree | supplement | verification-error`). The prompt-facing labels are adversarial and map down on persistence:
+| Prompt label | Persisted `verdict` | Meaning |
+|---|---|---|
+| SURVIVES | `agree` | Actively tried to refute and failed — the claim withstood the attack. |
+| SURVIVES-WITH-CAVEAT | `supplement` | Holds, but a scope limit / extra condition / precondition was found. |
+| REFUTED | `disagree` | The claim was broken (or failed to prove itself). MUST carry a `disagreeBasis`. |
+Each `disagree` vote records a new field `disagreeBasis`:
+| `disagreeBasis` | Meaning |
+|---|---|
+| `counter-evidence` | The verifier cited contradicting evidence (`file:line` / log line) in `explanation`. A **hard refute**. |
+| `burden-not-met` | The verifier re-inspected the cited evidence and could neither confirm nor refute → the claim failed to prove itself ("when uncertain, lean to rejection"). |
+A `disagree` with `disagreeBasis == null` is a contract violation in adversarial mode — every refutation must state which of the two grounds it rests on. Bare "I disagree" without re-inspection is not allowed.
+### Adversarial classification (replaces the §"Convergence Algorithm" per-round classifier when `adversarial == true`)
+`verification-error` votes are excluded from numerator and denominator exactly as in the collaborative classifier. For each finding `F` in the queue at a round:
+```text
+disagrees    = [v for v in non-error votes if v.verdict == "disagree"]
+hard_refutes = [v for v in disagrees if v.disagreeBasis == "counter-evidence"]
+all_others_disagree = (every non-discoverer non-error vote is "disagree")
+IF len(disagrees) == 0:
+    resolve F as "full-consensus"   (or "partial-consensus" if any SUPPLEMENT/caveat)
+ELIF all_others_disagree:
+    resolve F as "worker-unique"    # only the discoverer still holds it
+ELIF len(hard_refutes) >= 1:
+    # an evidence-backed refute exists and the roster is split → the claim is disputed
+    carry F forward; at the LAST executed round classify it "contested"
+ELIF burden-not-met disagrees are a majority of non-error votes (per the Majority definition in the Convergence Algorithm section):
+    carry F forward; at the LAST executed round classify it "contested"
+ELSE:
+    # a lone weak (burden-not-met) doubt against an otherwise-surviving claim
+    resolve F as "partial-consensus"
+```
+`contested` remains a **final classification only** (per §"Scope and Terminology"): a disputed finding is carried forward through intermediate rounds and labelled `contested` only at the last executed round. For `requirements-discovery` (`effectiveMaxRounds = 1`) the single round IS the last round, so a split-with-hard-refute finding is labelled `contested` in that one round. The final-classifier block of §"Convergence Algorithm" is unchanged; this section only changes how each round's verdicts resolve into queue actions.
+Design intent: one `counter-evidence` refute is enough to deny a claim consensus (it cannot rise above `contested` no matter how many others AGREE), while a single `burden-not-met` doubt does not by itself sink an otherwise-surviving claim — only a majority of burden-not-met doubts does. When every non-discoverer refutes (all_others_disagree), the finding is worker-unique regardless of whether those refutes were counter-evidence or burden-not-met — only the discoverer still holds it. A SUPPLEMENT/caveat with zero disagrees lands partial-consensus rather than full-consensus, because a caveat means the claim does not pass cleanly (this differs from the collaborative classifier, where SUPPLEMENT counts as full agreement).
 ## Re-verification Agent Dispatch
 ### Sponsorship Optimization
@@ -282,6 +339,55 @@ For each finding, respond as:
 **Verdict**: ...
 ```
+### Adversarial Re-verification Prompt
+Used instead of the lightweight/full-reanalysis prompt when `config.adversarial == true`. The required anchor headers (§"Required reverify-prompt anchor headers") are identical. The `[Required reading]` clause is suppressed; only the cited-evidence paths of the items under attack are injected (see §"Adversarial Verification Mode" → Scoped full-reanalysis).
+```
+You are <worker-role> performing ADVERSARIAL re-verification for <task-key> (round <N>).
+## Instructions
+Your job is to BREAK each finding below, not to confirm it. For EACH finding,
+open the cited evidence directly and actively search for evidence that the claim
+is wrong, overstated, or unproven. Then respond with exactly one verdict:
+- **REFUTED**: You broke the claim. State the basis:
+  - counter-evidence — you found contradicting evidence (give file:line or log line), OR
+  - burden-not-met — you re-inspected the cited evidence and could neither confirm
+    nor refute it (the claim has not proven itself).
+- **SURVIVES**: You actively tried to refute it and failed — the claim withstood the attack.
+- **SURVIVES-WITH-CAVEAT**: It holds, but a scope limit / extra condition / missing
+  precondition exists (state it).
+The burden of proof is on the claim. If after inspecting the cited evidence you remain
+uncertain, your verdict is REFUTED with basis = burden-not-met.
+Inspect ONLY the evidence each finding cites and its immediate surroundings. Do NOT
+re-read the task brief, instruction-set, or report template.
+## Findings to verify
+### F-001: <one-line summary>
+**Origin**: <worker role>
+**Cited evidence**: <file paths, line numbers, log lines from origin worker>
+### F-002: <one-line summary>
+...
+## Response format
+### F-001
+**Verdict**: REFUTED | SURVIVES | SURVIVES-WITH-CAVEAT
+**Basis** (only if REFUTED): counter-evidence | burden-not-met
+**Explanation**: <2-3 sentences; for counter-evidence include the file:line you found>
+### F-002
+...
+```
+When persisting votes, map SURVIVES→`agree`, SURVIVES-WITH-CAVEAT→`supplement`, REFUTED→`disagree`, and copy the stated Basis into `votes.<worker>.disagreeBasis` (null for non-REFUTED verdicts).
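The persistence mapping described above can be sketched in Python as follows. This is an illustration only, not code from the package: `to_persisted_vote` is a hypothetical helper, and rejecting an unknown label or a REFUTED verdict without a valid basis is this sketch's assumption about how the lead would handle malformed worker output.

```python
from typing import Optional

LABEL_TO_VERDICT = {
    "SURVIVES": "agree",
    "SURVIVES-WITH-CAVEAT": "supplement",
    "REFUTED": "disagree",
}
VALID_BASES = ("counter-evidence", "burden-not-met")


def to_persisted_vote(label: str, basis: Optional[str], explanation: str) -> dict:
    """Map one adversarial prompt verdict to the persisted votes.<worker> object."""
    key = label.strip().upper()
    if key not in LABEL_TO_VERDICT:
        raise ValueError(f"unknown prompt label: {label!r}")
    verdict = LABEL_TO_VERDICT[key]

    if verdict == "disagree":
        # Every REFUTED vote must name one of the two grounds.
        if basis not in VALID_BASES:
            raise ValueError("REFUTED requires basis counter-evidence or burden-not-met")
        persisted_basis = basis
    else:
        # Non-REFUTED verdicts always persist a null basis.
        persisted_basis = None

    return {
        "verdict": verdict,
        "disagreeBasis": persisted_basis,
        "explanation": explanation,
    }
```

For example, `to_persisted_vote("REFUTED", "counter-evidence", "src/app.py:42 contradicts the claim")` yields a `disagree` vote whose `disagreeBasis` is `"counter-evidence"` (the path here is made up for illustration).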
 ### Full Re-analysis Re-verification Prompt
 ```
@@ -324,10 +430,11 @@ Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
 ```json
 {
-  "schemaVersion": "1.1",
+  "schemaVersion": "1.2",
   "taskKey": "<task-key>",
   "config": {
     "enabled": true,
+    "adversarial": false,
     "maxRounds": 2,
     "effectiveMaxRounds": 2,
     "verificationMode": "lightweight"
@@ -345,7 +452,7 @@ Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
         {
           "round": 1,
           "votes": {
-            "codex-worker": { "verdict": "agree", "explanation": "<brief>" },
+            "codex-worker": { "verdict": "agree", "disagreeBasis": null, "explanation": "<brief>" },
             "gemini-worker": { "verdict": "supplement", "explanation": "<brief>" }
           }
         }
@@ -385,11 +492,13 @@ Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
 Schema rules:
-- `schemaVersion`: literal string `"1.1"` for new runs. Readers MUST accept `"1.0"` for historical artifacts and treat any missing v1.1 field as `null`.
+- `schemaVersion`: literal string `"1.2"` for all new runs — both adversarial and collaborative. v1.2 adds `config.adversarial` and `votes.<worker>.disagreeBasis`, written as `false` / `null` respectively on collaborative runs. Readers MUST accept `"1.0"` / `"1.1"` / `"1.2"` for historical artifacts and treat any missing field as `null`.
+- `config.adversarial`: boolean. `true` when this run used adversarial verification (default for `requirements-discovery` / `error-analysis`). When `true`, `config.verificationMode` is `"full-reanalysis"` (scoped) and every `disagree` vote carries a non-null `disagreeBasis`.
 - `config.effectiveMaxRounds`: the integer the lead actually used after resolving the phase-aware default (`1` for `requirements-discovery`, `2` otherwise). MUST equal `config.maxRounds` when the manifest explicitly set it.
 - `findings[].ticketIds`: array of ticket keys from Phase 4 grouping (parsed per the Round 0 step 5 rule). MAY be empty when the discovering worker tagged the finding `unknown`.
 - `findings[].rounds[].votes.<worker>.verdict`: enum, one of `agree | disagree | supplement | verification-error`. Lower-case tokens; map upper-case AGREE/DISAGREE/SUPPLEMENT verdicts emitted by workers to their lower-case form before persisting. `verification-error` is reserved for terminal non-result dispatches (§"Worker failure handling in reverify").
-- `findings[].classification`: enum, one of `full-consensus | partial-consensus | worker-unique | contested`. No other value is permitted in v1.1.
+- `findings[].rounds[].votes.<worker>.disagreeBasis`: enum `counter-evidence | burden-not-met | null`. Non-null only when `verdict == "disagree"` AND `config.adversarial == true`; `null` (or absent, treated as null) otherwise. See §"Adversarial Verification Mode".
+- `findings[].classification`: enum, one of `full-consensus | partial-consensus | worker-unique | contested`. No other value is permitted.
 - `roundHistory[].inputQueueSize`: queue size at the start of this round.
 - `roundHistory[].resolvedCount`: number of findings that exited the queue this round (sum of full+partial+worker-unique classifications produced this round).
 - `roundHistory[].carriedForwardCount`: queue size at the END of this round — the single definition. In-round insertions into the queue are forbidden, so this always equals `inputQueueSize - resolvedCount`. The pseudocode's per-item `carriedForwardCount += 1` accumulator is a counting convenience that lands on the same value; persist the post-round queue length, not the loop accumulator, if the two ever diverge.
@@ -397,7 +506,7 @@ Schema rules:
 - `roundHistory[].skippedWorkers[]`: per-worker `{worker, reason}` for workers with no items to verify OR with a non-result dispatch.
 - `round2SkippedReason`: literal enum `queue-empty | max-rounds-1 | all-reverify-non-result | not-skipped`. Always present. Use `"not-skipped"` when Round 2 actually ran. Use `"max-rounds-1"` when `effectiveMaxRounds == 1` (Round 2 was never attempted). Use `"queue-empty"` when Round 1 fully drained the queue. Use `"all-reverify-non-result"` when all Round 1 dispatches terminated as non-result.
 - `finalClassificationCounts`: post-loop counts. Required field with keys `fullConsensus`, `partialConsensus`, `contested`, `workerUnique`.
-- `finalState ∈ {converged, max-rounds-reached, aborted-non-result}`. Assigned by the lead at WHILE-loop exit: `converged` when the queue is empty at the end of any round; `max-rounds-reached` when the loop exits because `roundIndex == effectiveMaxRounds` with the queue still non-empty; `aborted-non-result` when the loop exits via the Worker-failure BREAK (Task 3's "Worker failure handling in reverify" rule 4). `aborted-non-result` is the new v1.1 value.
+- `finalState ∈ {converged, max-rounds-reached, aborted-non-result}`. Assigned by the lead at WHILE-loop exit: `converged` when the queue is empty at the end of any round; `max-rounds-reached` when the loop exits because `roundIndex == effectiveMaxRounds` with the queue still non-empty; `aborted-non-result` when the loop exits via the Worker-failure BREAK (per the "Worker failure handling in reverify" section, rule 4). `aborted-non-result` is the new v1.1 value.
 - `totalRounds`: count of rounds actually executed (not `effectiveMaxRounds`). May be `0` when Round 0 produced no queue items (all findings reached consensus during grouping).
 ## Output