okstra 0.46.0 → 0.48.0
- package/docs/superpowers/plans/2026-06-04-adversarial-implementation-planning.md +294 -0
- package/docs/superpowers/plans/2026-06-04-adversarial-verification.md +570 -0
- package/docs/superpowers/plans/2026-06-04-coverage-critic.md +516 -0
- package/docs/superpowers/plans/2026-06-05-acceptance-critic.md +251 -0
- package/docs/superpowers/specs/2026-06-04-adversarial-implementation-planning-design.md +90 -0
- package/docs/superpowers/specs/2026-06-04-adversarial-verification-design.md +176 -0
- package/docs/superpowers/specs/2026-06-04-coverage-critic-design.md +99 -0
- package/docs/superpowers/specs/2026-06-05-acceptance-critic-design.md +90 -0
- package/package.json +1 -1
- package/runtime/BUILD.json +2 -2
- package/runtime/agents/SKILL.md +4 -1
- package/runtime/prompts/profiles/_common-contract.md +1 -1
- package/runtime/prompts/profiles/error-analysis.md +3 -0
- package/runtime/prompts/profiles/final-verification.md +2 -0
- package/runtime/prompts/profiles/implementation-planning.md +5 -1
- package/runtime/prompts/profiles/requirements-discovery.md +3 -0
- package/runtime/prompts/wizard/prompts.ko.json +13 -0
- package/runtime/python/okstra_ctl/render.py +27 -1
- package/runtime/python/okstra_ctl/run.py +12 -0
- package/runtime/python/okstra_ctl/wizard.py +72 -1
- package/runtime/skills/okstra-convergence/SKILL.md +190 -6
- package/runtime/skills/okstra-run/SKILL.md +1 -0
@@ -0,0 +1,99 @@
+# Coverage critic: design (sub-project B1)
+
+- Date written: 2026-06-04
+- Scope: add a **coverage critic pass** to the three phases `requirements-discovery` / `error-analysis` / `implementation-planning`, running right after Phase 5.5 convergence and right before the Phase 6 report-writer. The critic is dispatched by *reusing* an existing worker subagent ("report the angles/files/modalities nobody looked at and the requirements that have no finding"), and the coverage gaps it reports are merged into the final findings/clarifications only if they survive **one round of adversarial reverify**. Whether to use it, and with which backing provider, is chosen in the **initial okstra-run select box** (recommended options plus "enter manually"); if CLI parameters are passed, that selection step is skipped. The default is off (opt-in).
+- Out of scope
+  - The verdict devil's advocate for `final-verification` is **sub-project B2** and is outside this document. (B2 reuses the critic primitive.)
+  - No new installable agent (`critic-worker.md` / `installed-agents.json`): the existing `claude-worker`/`codex-worker`/`gemini-worker` subagent_type values are reused with the critic prompt.
+  - No change to convergence's finding/plan-body classification or gate enums: critic gaps go through the existing adversarial finding classifier as-is.
+  - No change to the *number* of workers (roster size): the critic is a separate pass and does not grow the Phase 4 analyser roster.
+- Relationship: reuses the adversarial convergence machinery ([`skills/okstra-convergence/SKILL.md`](../../../skills/okstra-convergence/SKILL.md) §"Adversarial Verification Mode") to verify critic gaps. The wizard select pattern follows the same picker convention as the existing `okstra-run` steps (`defaults_or_custom` and others).
+
+## 1. Motivation: coverage gaps are structural
+
+By design, okstra's analyser workers all answer the *same* section 1–5 questions: this is triangulation, not partition ([`skills/okstra-team-contract/SKILL.md:204`](../../../skills/okstra-team-contract/SKILL.md)). Adding more workers therefore only finds more findings *of the same kind*, while "the angle nobody looked at" stays structurally empty. Consensus quality (b) was raised by adversarial verification, but coverage (c), meaning missed findings, needs a separate mechanism. The coverage critic takes the consolidated findings as input and is dedicated solely to "what is missing", which fills this gap.
+
+## 2. Core design
+
+### 2.1 critic pass primitive
+
+Right after Phase 5.5 convergence finishes and the findings are classified, and **before** the Phase 6 report-writer dispatch, the lead runs the critic pass once.
+
+- **Reuse dispatch:** dispatch the critic prompt to the selected provider's existing subagent_type (`claude-worker` / `codex-worker` / `gemini-worker`). No new agent definition.
+- Input: task brief + summary of the Phase 5.5 consolidated findings + read access to the codebase.
+- Prompt outline: "The following findings have already reached consensus. Find files/directories/execution paths that nobody inspected, requirements/acceptance criteria that have no finding at all, and claims that were raised but that nobody verified. Report each coverage gap as a new finding with evidence (`file:line` or a quoted requirement). Do not repeat existing findings."
+- Result file: `runs/<task-type>/worker-results/<provider>-critic-<task-type>-<seq>.md`.
+- **One round of adversarial reverify (question 3 decision):** the gaps reported by the critic are put on the verification queue as new findings with `originWorker=<provider>-critic`, and the **Phase 4 analysers** (excluding the critic itself) reverify them adversarially for one round (reusing the existing §"Adversarial Verification Mode" classifier: one evidence-backed refutation → demotion). Only gaps that survive as `full-consensus`/`partial-consensus` are merged into the final report as findings; demoted gaps (`contested`/`worker-unique`) are treated as hallucinations and either discarded or recorded only as dissent.
+- **Model (question 2 decision):** the backing provider is selected with `--critic <claude|codex|gemini>`, and the model is that provider's existing `--<provider>-model` value (mirroring the executor binding pattern). There is no separate critic model flag (YAGNI).
+
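The merge rule described in §2.1 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Finding` dataclass, its field names, and `merge_critic_gaps` are hypothetical and do not come from the okstra codebase; only the four classification strings are taken from the design text.

```python
from dataclasses import dataclass

# Classification names come from the design text; everything else is hypothetical.
SURVIVING = {"full-consensus", "partial-consensus"}


@dataclass
class Finding:
    finding_id: str
    origin_worker: str
    classification: str  # outcome of the one-round adversarial reverify


def merge_critic_gaps(gaps):
    """Split reverified critic gaps into merged findings and dissent-only records."""
    merged, dissent = [], []
    for gap in gaps:
        if gap.classification in SURVIVING:
            merged.append(gap)
        else:
            # contested / worker-unique: treated as a likely hallucination
            dissent.append(gap)
    return merged, dissent


gaps = [
    Finding("G-1", "claude-critic", "full-consensus"),
    Finding("G-2", "claude-critic", "contested"),
    Finding("G-3", "claude-critic", "partial-consensus"),
]
merged, dissent = merge_critic_gaps(gaps)
print([g.finding_id for g in merged])   # ['G-1', 'G-3']
print([g.finding_id for g in dissent])  # ['G-2']
```

Whether a demoted gap is discarded or kept as dissent is left open by the design; the sketch keeps it in a separate list so either choice stays possible.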
+### 2.2 Selection UX: wizard select step + CLI bypass
+
+Whether to use the critic, and with which provider, is not an execution-parameter-only setting; it is chosen in the **initial okstra-run select box**.
+
+- New wizard step `S_CRITIC_PICK`: `_build_*`/`_submit_*` in [`scripts/okstra_ctl/wizard.py`](../../../scripts/okstra_ctl/wizard.py) + the [`prompts/wizard/prompts.ko.json`](../../../prompts/wizard/prompts.ko.json) SOT. Picker convention (1-2 recommended options + a final "enter manually"):
+  - `Do not use critic (default, recommended)` → disabled
+  - `claude critic (opus)` *(recommended)* → provider=claude, model=the claude model for that phase
+  - `Enter manually` → the user specifies the provider (and optionally the model) directly
+  - (The list may add codex/gemini options to the recommendations to match the current analyser roster, but the convention of at most 2 recommendations + enter manually = 3 options is kept.)
+- **CLI takes precedence and skips the step:** if `--critic <provider>` or `--no-critic` is passed, `S_CRITIC_PICK` is **skipped**. The non-interactive `okstra.sh` / `node bin/okstra` paths use the flags; interactive `okstra-run` uses the select box. With no flag on a non-interactive path, the default is off.
+- All three entry points (okstra-run skill / okstra.sh / node CLI) converge on `prepare_task_bundle()`, so the critic selection is serialized into the manifest there (preserving a single point of reference).
+
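The flag-over-picker precedence in §2.2 above might look roughly like the sketch below. It is not the actual `run.py` code: `build_parser`, `resolve_critic`, and the `picker_choice` argument are hypothetical names, and making `--critic` and `--no-critic` mutually exclusive is an assumption the design does not state.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="okstra-sketch")
    # Assumption: the two flags cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--critic", choices=["claude", "codex", "gemini"], default=None)
    group.add_argument("--no-critic", action="store_true")
    return parser


def resolve_critic(args, interactive, picker_choice=None):
    """Return (show_picker, provider); provider None means the critic is off."""
    if args.critic is not None:
        return False, args.critic   # flag given: skip the select step
    if args.no_critic:
        return False, None          # explicit opt-out: skip the select step
    if interactive:
        return True, picker_choice  # no flag: the select box decides
    return False, None              # non-interactive and no flag: default off


parser = build_parser()
print(resolve_critic(parser.parse_args(["--critic", "codex"]), interactive=True))
# (False, 'codex')
print(resolve_critic(parser.parse_args([]), interactive=False))
# (False, None)
```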
+### 2.3 Manifest `convergence.critic` block + render resolve
+
+[`scripts/okstra_ctl/render.py`](../../../scripts/okstra_ctl/render.py) emits a `critic` block under `convergence`:
+
+```json
+"critic": {
+  "enabled": false,
+  "provider": null,
+  "modelExecutionValue": null
+}
+```
+
+- `enabled`: `true` if the wizard selection or the `--critic` flag sets a provider, otherwise `false` (default).
+- `provider`: `claude` | `codex` | `gemini` | `null`.
+- `modelExecutionValue`: the selected provider's model (from that provider's `--<provider>-model` seed). `null` when `enabled=false`.
+
+At the end of Phase 5.5 the lead reads this block to decide whether to run the critic pass and against which provider.
+
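A minimal sketch of how the `critic` block in §2.3 above could be resolved, assuming the three B1 phases are the only applicable ones. `CRITIC_PHASES`, `build_critic_block`, the `provider_models` mapping, and the model string `example-model` are all hypothetical placeholders, not the real `render.py` names or values.

```python
import json

# Hypothetical constant: the three B1 phases named in this design.
CRITIC_PHASES = ("requirements-discovery", "error-analysis", "implementation-planning")


def build_critic_block(phase, provider, provider_models):
    """Resolve the convergence.critic block for one run.

    provider is None when neither the wizard nor --critic chose one.
    provider_models maps a provider name to its --<provider>-model value.
    """
    if provider is None or phase not in CRITIC_PHASES:
        return {"enabled": False, "provider": None, "modelExecutionValue": None}
    return {
        "enabled": True,
        "provider": provider,
        "modelExecutionValue": provider_models.get(provider),
    }


models = {"claude": "example-model"}  # placeholder value, not a real model name
print(json.dumps(build_critic_block("error-analysis", "claude", models)))
# {"enabled": true, "provider": "claude", "modelExecutionValue": "example-model"}
print(json.dumps(build_critic_block("implementation", "claude", models)))
# {"enabled": false, "provider": null, "modelExecutionValue": null}
```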
+### 2.4 Applicable phases
+
+requirements-discovery / error-analysis / implementation-planning. (final-verification is B2.) All three phases produce findings, so a coverage critic is meaningful there. It is not applied to release-handoff/implementation.
+
+## 3. Data model
+
+- **manifest:** the `convergence.critic` block from §2.3.
+- **convergence state artifact:** critic gaps go into `findings[]` with `originWorker: "<provider>-critic"` and use the existing rounds/votes/classification schema unchanged. For traceability, findings get an optional field `source: "critic"` (when absent, `null` = found by a regular worker). schemaVersion stays at `1.2` (an optional field is added; readers treat a missing field as null), with no enum change.
+- **convergence state `config`:** when the critic runs, a summary `{ provider, modelExecutionValue, gapsProposed, gapsMerged }` is recorded in `config.critic` (for auditing).
+
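As an illustration of the data model in §3 above, the sketch below reads the optional `source` field with a null default and derives the `config.critic` audit summary. The helper names, the `id` key, and the sample records are hypothetical; only `source`, `originWorker`, and the four summary keys come from the design text.

```python
def finding_source(finding):
    """Readers treat a missing optional `source` field as null (None)."""
    return finding.get("source")


def summarize_critic(findings, provider, model, merged_ids):
    """Build the config.critic audit summary from the findings list."""
    proposed = [f for f in findings if finding_source(f) == "critic"]
    return {
        "provider": provider,
        "modelExecutionValue": model,
        "gapsProposed": len(proposed),
        "gapsMerged": sum(1 for f in proposed if f["id"] in merged_ids),
    }


findings = [
    {"id": "F-1", "originWorker": "codex"},  # no `source`: regular worker finding
    {"id": "G-1", "originWorker": "claude-critic", "source": "critic"},
    {"id": "G-2", "originWorker": "claude-critic", "source": "critic"},
]
summary = summarize_critic(findings, "claude", "example-model", merged_ids={"G-1"})
print(summary["gapsProposed"], summary["gapsMerged"])  # 2 1
```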
+## 4. Changed files
+
+1. [`scripts/okstra_ctl/render.py`](../../../scripts/okstra_ctl/render.py) `_build_convergence_block`: emit the `critic` block + resolve the `--critic`/wizard selection.
+2. [`scripts/okstra_ctl/run.py`](../../../scripts/okstra_ctl/run.py): `--critic <provider>` / `--no-critic` argparse + pass through ctx + apply only to the three phases.
+3. [`scripts/okstra_ctl/wizard.py`](../../../scripts/okstra_ctl/wizard.py): `S_CRITIC_PICK` step `_build`/`_submit` + wiring into the flow (shown only when the CLI did not specify).
+4. [`prompts/wizard/prompts.ko.json`](../../../prompts/wizard/prompts.ko.json) (+ keep the English SOT in sync if one exists): `critic_pick` step label/options.
+5. [`skills/okstra-convergence/SKILL.md`](../../../skills/okstra-convergence/SKILL.md): new "Coverage critic pass" section (timing, prompt, one-round adversarial reverify, merge, `convergence.critic` schema).
+6. [`agents/SKILL.md`](../../../agents/SKILL.md): critic pass in the Phase 5.5→6 flow + the `PROGRESS: phase-5.5-critic provider=<p>` line + (where applicable) model/knob references.
+7. [`prompts/profiles/requirements-discovery.md`](../../../prompts/profiles/requirements-discovery.md) / [`error-analysis.md`](../../../prompts/profiles/error-analysis.md) / [`implementation-planning.md`](../../../prompts/profiles/implementation-planning.md): one-line coverage critic opt-in declaration.
+8. Tests: render `critic` block resolve (`--critic`/unspecified/`--no-critic`), wizard `S_CRITIC_PICK` build + submit, CLI bypass.
+9. [`CHANGES.md`](../../../CHANGES.md): user-facing impact entry.
+
+## 5. Enforcement: declaration vs. enforcement
+
+- **machine-enforced:** the shape of the `convergence.critic` block + render resolve (`--critic`/`--no-critic`/unspecified) → unit tests. The `S_CRITIC_PICK` picker options and CLI bypass → wizard tests.
+- **prompt-only (not enforceable):** whether the critic actually finds meaningful gaps, and whether the one-round adversarial reverify filters out hallucinations, are only prompt instructions to the lead/workers (LLMs), not runtime enforcement; they are steered through skill/profile declarations.
+
+## 6. Cost and risks
+
+- **Cost:** opt-in (default off). When on: 1 critic dispatch + 1 reverify round (one per analyser). Because the default is off, runs that do not select it cost 0.
+- **Risk, hallucinated gaps:** the critic may report fake gaps. Mitigation: the one-round adversarial reverify demotes gaps without evidence. Only surviving gaps become findings.
+- **Risk, duplicate findings:** the critic restates existing findings. Mitigation: the prompt explicitly says "do not repeat existing findings" + semantic grouping in the reverify step absorbs duplicates.
+- **Risk, too few reverify voters:** critic gaps are verified by the Phase 4 analysers minus the critic itself. The default roster has ≥2 analysers, so at least one can always vote. In an abnormal configuration with only one analyser, critic gaps are only surfaced as unverifiable (not merged) and that fact is recorded.
+
+## 7. Acceptance criteria
+
+1. A `S_CRITIC_PICK` select box is added to the wizard (recommendations + "enter manually") and is shown only in interactive runs where neither `--critic` nor `--no-critic` is specified. It is skipped when a flag is specified.
+2. Manifest `convergence.critic` resolves correctly from the wizard selection/flags (enabled/provider/model). Default off.
+3. The convergence skill defines the critic pass (timing, prompt, one-round adversarial reverify, merge).
+4. The profiles of the three applicable phases declare the coverage critic opt-in.
+5. `python3 -m pytest tests/` + `bash validators/validate-workflow.sh` pass.
@@ -0,0 +1,90 @@
+# Acceptance devil's-advocate critic: design (sub-project B2)
+
+- Date written: 2026-06-05
+- Scope: extend sub-project B1's critic dispatch primitive (existing worker reuse + provider/model selection + opt-in) to `final-verification`. In final-verification the critic runs in **devil's advocate** mode ("dig up reasons this should not be accepted and acceptance blockers that were missed"), and candidate blockers are verified by **confirm-or-downgrade** (confirmed → Acceptance Blocker, unconfirmed → Residual Risk, never dropped) and reflected in the verdict. The selection UX, `--critic`/`S_CRITIC_PICK`, and the `convergence.critic` block are reused from B1 as-is, and `final-verification` is added to the applicable phases.
+- Out of scope
+  - No new dispatch mechanism, new installable agent, or new selection flag: the B1 primitive is reused in full.
+  - No change to convergence's finding/plan-body classification, gate enums, or the verdict↔blocker validator: confirmed critic blockers enter the *existing* Acceptance Blockers path and the existing verdict rule (`accepted` ⇒ 0 blockers) works unchanged.
+  - No change to B1's coverage critic behaviour (3 finding phases, adversarial-drop verification): B2 only adds a *mode* dedicated to final-verification.
+- Relationship: reuses the critic primitive, the `convergence.critic` block, and the `S_CRITIC_PICK` selection created by B1 [`2026-06-04-coverage-critic-design.md`](2026-06-04-coverage-critic-design.md). Output is wired into the Acceptance Blockers/Residual Risk/Verdict Token structure of the final-verification profile ([`prompts/profiles/final-verification.md`](../../../prompts/profiles/final-verification.md)).
+
+## 1. Motivation
+
+final-verification is the last gate right before the acceptance decision. If false consensus leaks through here, defects ship as-is. However, applying adversarial verification (sub-project 0) to final-verification unchanged backfires: findings there *are* defects/blockers, so adversarially refuting findings demotes or drops "real defects that could not be reproduced" (lower defect sensitivity). The critic in final-verification should therefore not refute findings; it should be a **devil's advocate against the verdict**, an additional pass that "actively looks for reasons this work should not be accepted". This increases coverage (surfacing missed blockers) while *raising* defect sensitivity.
+
+## 2. Core design
+
+### 2.1 Reusing the critic primitive + extending the applicable phases
+
+B1's critic dispatch (the selected provider's existing subagent + provider model, opt-in) is used as-is. Only the set of applicable phases changes:
+
+- Add `final-verification` to `critic_phases` in [`scripts/okstra_ctl/render.py`](../../../scripts/okstra_ctl/render.py) `_build_convergence_block` → `convergence.critic.enabled` can also be true in final-verification.
+- Add `final-verification` to the `applies` phase tuple of `S_CRITIC_PICK` in [`scripts/okstra_ctl/wizard.py`](../../../scripts/okstra_ctl/wizard.py) and to the phase condition in summary/confirmation.
+- The selection UX (`--critic <provider>` / okstra-run select box), the `convergence.critic {enabled,provider,modelExecutionValue}` block, and model resolution are **unchanged** (same as B1).
+
+### 2.2 Critic behaviour branches by phase (the new part of B2)
+
+The convergence skill branches critic behaviour by phase:
+- `requirements-discovery` / `error-analysis` / `implementation-planning` → **coverage critic** (B1: "what is missing", adversarial-drop verification). Unchanged.
+- `final-verification` → **acceptance devil's-advocate critic** (new).
+
+**Devil's advocate prompt (final-verification):**
+```
+You are the acceptance devil's advocate for <task-key>. The delivered work is about
+to be judged for acceptance. Your ONLY job is to find reasons it should NOT be
+accepted — surface candidate acceptance BLOCKERS the verifiers may have missed:
+- requirements / acceptance points with no covering evidence,
+- DB / IO / SQL changes lacking real-execution evidence,
+- regressions or broken error paths,
+- scope/contract violations.
+For each, emit a candidate blocker with a one-line statement, evidence (file:line /
+log / test output), and a severity (critical / major / minor). Do NOT restate an
+existing Acceptance Blocker. If you find none, say so explicitly.
+```
+
+**Verification = confirm-or-downgrade (differs from B1's adversarial-drop, BLOCKING):**
+Each candidate blocker is verified by the Phase 4 analysers (excluding the critic).
+- **Confirmed** (reproduced, or evidence cited successfully) → promoted to a `## 4 Acceptance Blockers` row (severity kept, follow-up phase included).
+- **Unconfirmed** (cannot reproduce, or weak evidence) → **downgraded to Residual Risk (never dropped)**: kept as a tracked item with its trigger recorded.
+- Applying the adversarial finding classifier's "reject when uncertain" rule here is forbidden (it suppresses real defects).
+
+**Output and verdict wiring:**
+- When a confirmed candidate enters Acceptance Blockers, `accepted` requires 0 blockers ([`final-verification.md:32`](../../../prompts/profiles/final-verification.md)), so the verdict is automatically pushed to `conditional-accept` / `blocked`. This is the purpose of the devil's advocate.
+- Unconfirmed candidates go to Residual Risk: they do not block the verdict but are tracked.
+- The existing verdict↔blocker consistency validator (`validators/validate-run.py` `_validate_final_verification_consistency`) enforces this unchanged. No new enum/validator.
+
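The confirm-or-downgrade rule in §2.2 above can be sketched as below. The candidate dict shape, the `confirmed` flag, and both function names are hypothetical; the three counter keys and the mode string follow the `config.critic` summary fields this design names. The sketch deliberately does not pick between `conditional-accept` and `blocked`, since the design leaves that to the existing verdict rules.

```python
def confirm_or_downgrade(candidates):
    """Route each candidate to Acceptance Blockers or Residual Risk; never drop one."""
    blockers, residual = [], []
    for cand in candidates:
        if cand["confirmed"]:      # reproduced, or evidence cited successfully
            blockers.append(cand)
        else:                      # unconfirmed: downgraded, still tracked
            residual.append(cand)
    return blockers, residual


def accepted_allowed(blockers):
    """`accepted` requires zero Acceptance Blockers."""
    return len(blockers) == 0


candidates = [
    {"id": "B-1", "severity": "major", "confirmed": True},
    {"id": "B-2", "severity": "critical", "confirmed": False},
]
blockers, residual = confirm_or_downgrade(candidates)
summary = {
    "mode": "acceptance-devils-advocate",
    "candidatesProposed": len(candidates),
    "confirmedBlockers": len(blockers),
    "downgradedToResidual": len(residual),
}
print(accepted_allowed(blockers))  # False
print(summary["confirmedBlockers"], summary["downgradedToResidual"])  # 1 1
```

The invariant worth checking in any real implementation is that the two output lists together always contain every candidate, which is what "never dropped" means here.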
+### 2.3 Recording critic results in state
+
+- Critic candidate blockers are recorded in `runs/final-verification/worker-results/<provider>-critic-final-verification-<seq>.md`.
+- The `config.critic` summary (defined in B1) of the convergence state artifact records `mode: "acceptance-devils-advocate"`, `candidatesProposed`, `confirmedBlockers`, `downgradedToResidual` (optional v1.2 fields; readers treat missing ones as null). No enum change.
+
+## 3. Changed files
+
+1. [`scripts/okstra_ctl/render.py`](../../../scripts/okstra_ctl/render.py): add `final-verification` to `critic_phases`.
+2. [`tests/test_render_critic_block.py`](../../../tests/test_render_critic_block.py): move `final-verification` from the non-applicable to the applicable parameters (verify `enabled=True`).
+3. [`scripts/okstra_ctl/wizard.py`](../../../scripts/okstra_ctl/wizard.py): add `final-verification` to `S_CRITIC_PICK.applies` + the summary/confirmation phase condition.
+4. [`tests/test_wizard_critic_pick.py`](../../../tests/test_wizard_critic_pick.py): flip `final-verification` from "skipped" → "applies".
+5. [`skills/okstra-convergence/SKILL.md`](../../../skills/okstra-convergence/SKILL.md): add phase branching to the "Coverage critic pass" section + an "Acceptance devil's-advocate critic (final-verification)" sub-mode (prompt, confirm-or-downgrade, blocker/residual-risk output, state summary).
+6. [`prompts/profiles/final-verification.md`](../../../prompts/profiles/final-verification.md): devil's advocate critic opt-in declaration + how the output enters Acceptance Blockers/Residual Risk and affects the verdict.
+7. [`agents/SKILL.md`](../../../agents/SKILL.md): reflect that the critic pass (Phase 5.6) also applies to final-verification (phase row/PROGRESS note).
+8. (Optional) [`prompts/wizard/prompts.ko.json`](../../../prompts/wizard/prompts.ko.json): generalize the `critic_pick` label to be phase-neutral (e.g. "additional critic pass (surfaces missed findings/blockers), opt-in").
+9. [`CHANGES.md`](../../../CHANGES.md): user-facing impact entry.
+
+## 4. Enforcement: declaration vs. enforcement
+
+- **machine-enforced:** render `critic_phases` includes final-verification + wizard `applies` extension → unit/wizard tests. Verdict consistency (`accepted` ⇒ 0 blockers) once a confirmed critic blocker has entered Acceptance Blockers is enforced unchanged by the *existing* `_validate_final_verification_consistency`.
+- **prompt-only (not enforceable):** whether the devil's advocate actually finds meaningful candidates, and whether confirm-or-downgrade classifies them accurately, are prompt instructions to the lead/workers (LLMs), steered through skill/profile declarations.
+
+## 5. Cost and risks
+
+- **Cost:** opt-in (default off, same as B1). When on: 1 critic dispatch + 1 candidate verification round (one per analyser). final-verification runs that do not select it cost 0.
+- **Risk, candidate explosion:** the critic may report many weak candidates. Mitigation: confirm-or-downgrade demotes unconfirmed ones to Residual Risk, so only *confirmed* blockers block the verdict. Severity and evidence are mandatory.
+- **Risk, the goal is to prevent false passes (suppression), yet confirm-or-downgrade demotes unconfirmed candidates:** unconfirmed candidates are not dropped but kept as Residual Risk, so tracking is preserved. The bar for "confirmed" is reproduction or evidence citation, and high-severity candidates whose reproduction is uncertain are recorded as Residual Risk escalation triggers so the user can decide.
+
+## 6. Acceptance criteria
+
+1. For `final-verification`, manifest `convergence.critic` resolves from `--critic`/wizard selection and can be `enabled=true` (B1's 3 phases + final-verification = 4 applicable phases).
+2. okstra-run `S_CRITIC_PICK` is also shown in final-verification.
+3. The convergence skill defines the devil's advocate mode for final-verification (prompt, confirm-or-downgrade, blocker/residual-risk output) and clearly distinguishes it from the B1 coverage mode.
+4. The final-verification profile declares the critic opt-in and its effect on the verdict.
+5. `python3 -m pytest tests/` + `bash validators/validate-workflow.sh` pass.
package/package.json
CHANGED
package/runtime/BUILD.json
CHANGED
package/runtime/agents/SKILL.md
CHANGED
@@ -42,6 +42,7 @@ This SKILL.md is the operating contract and phase index. Detailed procedures liv
 | 4. Execution | Spawn analysis workers (Teams preferred) | `okstra-team-contract` |
 | 5. Fallback | Sequential/background dispatch when Teams unavailable | `okstra-team-contract` |
 | 5.5 Convergence | Cross-verify findings across workers | `okstra-convergence` |
+| 5.6 Critic pass | (opt-in) reused-worker critic pass: coverage gaps (discovery/error-analysis/impl-planning) or acceptance devil's-advocate (final-verification), each verified one round | `okstra-convergence` "Coverage critic pass" / "Acceptance critic pass" |
 | 6. Synthesis | Dispatch Report writer worker, review draft. **For `implementation-planning`: then run the Phase 6 plan-body verification sub-step (see Phase 6 section below).** | `okstra-report-writer` + `okstra-convergence` (sub-step) |
 | 7. Persist | Run token-usage collector, update manifests, then disband the worker team (shutdown teammates + `TeamDelete`, after collection) | `okstra-report-writer` + `_common-contract.md` "Run-end team teardown" |
 
@@ -92,6 +93,7 @@ Required checkpoints:
 - `PROGRESS: phase-4-dispatch worker=<role> model=<model>` — once per worker, immediately before the `Agent` / wrapper call.
 - `PROGRESS: phase-5-collect worker=<role> status=<terminal-status>` — once per worker, immediately after the result file is verified.
 - `PROGRESS: phase-5.5-convergence round=<N> queue=<count>` — at the start of each convergence round (Phase 5.5).
+- `PROGRESS: phase-5.6-critic provider=<provider> gaps=<n>` — when the coverage critic pass runs (Phase 5.6, opt-in). Omitted when `convergence.critic.enabled == false`.
 - `PROGRESS: phase-6-synthesis dispatching report-writer-worker` — at the start of Phase 6.
 - `PROGRESS: phase-7-persist updating manifests` — at the start of Phase 7.
 - `PROGRESS: phase-7-teardown disbanding team` — after token-usage collection, immediately before shutting down worker teammates + `TeamDelete` (Teams mode only; see `_common-contract.md` "Run-end team teardown"). Skipped in the no-`team_name` fallback.
@@ -250,7 +252,8 @@ Convergence is enabled by default. Configure via task-manifest.json:
 
 - `convergence.enabled`: true/false (default: true)
 - `convergence.maxRounds`: 1–3 — **phase-aware default**: `1` for `requirements-discovery`, `2` for all other task types
-- `convergence.verificationMode`: `"lightweight"` | `"full-reanalysis"` (default: `"lightweight"`)
+- `convergence.verificationMode`: `"lightweight"` | `"full-reanalysis"` (default: `"lightweight"`; the adversarial phases below force `"full-reanalysis"`)
+- `convergence.adversarial`: true/false — **phase-aware default**: `true` for `requirements-discovery` / `error-analysis` / `implementation-planning`, `false` otherwise. When `true`, Phase 5.5 runs in adversarial mode (verifiers refute findings; burden of proof on the claim). See [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Adversarial Verification Mode".
 
 When `task-manifest.json` does not set `convergence.maxRounds`, lead MUST resolve the effective value via the phase-aware default above before entering Phase 5.5, and record the resolved value in the convergence state artifact at `config.effectiveMaxRounds`.
 
@@ -14,7 +14,7 @@ profile document.
 - Worker interaction model (shared — read before inferring behaviour from the roster):
 - the per-profile `Required workers:` block is a **roster**, not a behaviour contract. Each role's interaction mode changes across operating phases of the same run.
 - **Phase 4 / 5 (independent analysis)**: analyser workers (`claude`, `codex`, `gemini` when opted in) produce findings independently and have no access to one another's outputs. `report-writer` does not analyse.
-- **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`).
+- **Phase 5.5 (convergence — peer review by workers)**: the lead replays each analyser's findings to the *other* analysers and collects `AGREE` / `DISAGREE` / `SUPPLEMENT` verdicts across up to `effectiveMaxRounds` rounds. Workers act as peer reviewers of each other's findings in this phase; the lead mediates but does not vote. See `skills/okstra-convergence/SKILL.md` for the round protocol, queue invariants, and final classification (`full-consensus` / `partial-consensus` / `contested` / `worker-unique`). For `requirements-discovery`, `error-analysis`, and `implementation-planning` this phase runs in **adversarial mode** (`convergence.adversarial=true`): verifiers try to refute each finding against its cited evidence and the burden of proof sits on the claim — see that skill's §"Adversarial Verification Mode".
 - Do NOT conclude "no peer review happens" from the roster alone — every profile that lists ≥2 analyser workers runs convergence by default (`convergence.enabled=true` in `task-manifest.json`).
 - Tooling — read-only MCP availability (shared):
 - MCP is not implicit okstra context. Query an MCP server only when the task brief explicitly lists it as source material for this run. Any MCP-derived finding MUST cite server, table, and the SELECT used. MCP MUST NEVER be used as a write path — schema/data mutations go through repository migration files reviewed by humans.
@@ -30,6 +30,9 @@
 - every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell; rows that lack a recommendation are rejected as half-formed.
 - **Codebase-first ambiguity resolution (defect rule)**: any ambiguity about repro, file behavior, or symbol semantics that can be answered by `Read` / `Grep` / log inspection MUST be resolved that way and recorded with file:line (or log-line) evidence. Writing a clarification row for something the codebase or shipped logs already answer is a defect of this phase.
 - **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <reporter-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only the reporter can answer this" (reporter-side data, business priority, environment they observed). A row with `none` that *could* have been answered by code or logs is a defect.
+- Cross-verification mode:
+- Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each root-cause / reproduction claim by directly re-inspecting the cited code, logs, or config; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
+- **Coverage critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker critic pass runs after convergence to surface missed findings; its gaps are merged only after a 1-round adversarial reverify. See `skills/okstra-convergence/SKILL.md` "Coverage critic pass".
 - Non-goals:
 - implementation details unless they are necessary to validate the cause
 - **source code edits, builds, migrations, or deployments** — this run produces evidence and cause analysis only; the fix belongs to a later `implementation-planning` run followed by an `implementation` run
@@ -44,6 +44,8 @@
 3. **Coverage check** — every requirement in the originating plan/task brief is either marked covered (with artifact) or listed as a blocker. No silent omissions.
 4. **Verifier dissent preserved** — if workers reach different verdicts, the disagreement is visible in section 1.2; synthesis hides nothing.
 5. **No source-mutation audit** — scan the run's session transcripts for Edit / Write or state-mutating Bash commands that touch paths OUTSIDE `<PROJECT_ROOT>/.okstra/**` and outside the assigned run-artifact paths. Writes to worker prompts, audit sidecars, team-state, the final-report `data.json`, and rendered reports under the run directory are allowed okstra artifacts. Any source/schema/deployment mutation means the run has crossed into implementation and MUST be re-routed; do NOT silently strip the evidence.
+- Cross-verification mode:
+- **Acceptance critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker **acceptance devil's-advocate** pass runs after convergence to surface candidate acceptance blockers the verifiers may have missed. Each candidate is verified **confirm-or-downgrade**: confirmed → an `Acceptance Blockers` row (which, since `accepted` requires zero blockers, moves the verdict to `conditional-accept` / `blocked`); unconfirmed → a `Residual Risk` row (never dropped). See `skills/okstra-convergence/SKILL.md` "Acceptance critic pass (final-verification)".
 - Non-goals:
 - proposing unrelated refactors beyond the delivered scope
 - **source code edits, follow-up bug fixes, or scope expansion** — this run renders a verdict only; defects detected here become inputs to a new `error-analysis` or `implementation-planning` run
@@ -37,6 +37,10 @@
 - recommended execution order
 - Approval gate (phase-specific addendum to shared authority rule):
 - The YAML frontmatter `approved: true|false` field is the only authorised approval gate. report-writer always emits `approved: false`. The user clears it either by (a) editing the frontmatter line to `approved: true` directly, or (b) invoking the next phase with `--approve` so the CLI flips the frontmatter on the user's behalf. `okstra_ctl.run._validate_approved_plan` reads this field and refuses entry until it is `true`.
+- Cross-verification mode:
+- Phase 5.5 finding convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker finding (requirement gap / risk / option) by re-inspecting its cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode".
+- §4.5.9 plan-body verification runs with an **adversarial posture** (`skills/okstra-convergence/SKILL.md` §"Adversarial plan-body posture"): verifiers open and confirm every cited path / command and put the burden of proof on the plan. The gate threshold is unchanged — a *majority* `DISAGREE` (`majority-disagree`) is still required to block approval; a single dissent does not.
+- **Coverage critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker critic pass runs after convergence to surface missed findings; its gaps are merged only after a 1-round adversarial reverify. See `skills/okstra-convergence/SKILL.md` "Coverage critic pass".
 - Non-goals:
 - code-level micro-optimization unless it changes the implementation approach
 - **source code edits of any kind** — this run produces a plan document only; Edit/Write on project source files is forbidden until the plan is approved and a separate `implementation` run starts
@@ -74,7 +78,7 @@
 - the YAML frontmatter MUST include the line `approved: false` (report-writer always emits the unflipped value). The user authorises the next `implementation` run by flipping it to `approved: true` (manual edit or `--approve` CLI). Do NOT recreate any `User Approval Request` body block — the validator fails reports that contain one (see `validators/validate-run.py` deprecated patterns).
 - **the frontmatter `approved: false` line is rendered unconditionally; if the plan-body verification gate (§4.5.9) returns `blocked-by-disagreement` or `aborted-non-result`, the writer MUST keep `approved: false` and the validator refuses any report that ships with `approved: true` under such a gate result.**
 - every ambiguity flagged during pre-planning that the user must resolve before approval registered as a `Blocks=approval` row in the `## 5. Clarification Items` table (do NOT create a separate `Open Questions` block under `4.5.x` — the unified table is the single home)
-- **§4.5.9 Plan Body Verification (BLOCKING).** After report-writer finishes the draft, the lead MUST run a worker peer-review round on the consolidated plan body (sections 4.5.1 – 4.5.7) and populate `### 4.5.9 Plan Body Verification` in the final report. The round protocol, plan-item ID scheme (`P-Opt-*` / `P-Step-*` / `P-Dep-*` / `P-Val-*` / `P-Rb-*`), verdict semantics, gate-result classification, and dissent log format are defined in `skills/okstra-convergence/SKILL.md` "Plan-body verification mode". The four gate-result values are `passed`, `passed-with-dissent`, `blocked-by-disagreement`, `aborted-non-result`. When the gate would have been `blocked-by-disagreement` or `aborted-non-result`, the lead MUST NOT silently flip it to one of the passing values to "unblock" the run — that is a contract violation.
+- **§4.5.9 Plan Body Verification (BLOCKING).** After report-writer finishes the draft, the lead MUST run a worker peer-review round on the consolidated plan body (sections 4.5.1 – 4.5.7) and populate `### 4.5.9 Plan Body Verification` in the final report. The round protocol, plan-item ID scheme (`P-Opt-*` / `P-Step-*` / `P-Dep-*` / `P-Val-*` / `P-Rb-*`), verdict semantics, gate-result classification, and dissent log format are defined in `skills/okstra-convergence/SKILL.md` "Plan-body verification mode". The four gate-result values are `passed`, `passed-with-dissent`, `blocked-by-disagreement`, `aborted-non-result`. When the gate would have been `blocked-by-disagreement` or `aborted-non-result`, the lead MUST NOT silently flip it to one of the passing values to "unblock" the run — that is a contract violation. When `convergence.adversarial=true` (the default for this phase), this round uses the adversarial posture — verifiers confirm cited paths/commands and the burden of proof is on the plan — but the gate threshold stays `majority-disagree` (see that skill's §"Adversarial plan-body posture").
 - **Decision-record evaluation (sole owner)**: this phase is the **single owner** of decision-record evaluation in the okstra lifecycle. The brief never evaluates or drafts decision records — it only forwards `adr-candidate:*` signals. Every `adr-candidate:*` entry inherited from the brief's `Open Questions` is a mandatory evaluation target. In addition, evaluate every decision the recommended option introduces against the three criteria:
|
|
79
83
|
1. **Hard to reverse** — would changing the decision later cost meaningfully more than deciding now?
|
|
80
84
|
2. **Surprising without context** — would a future reader, seeing only the code, wonder "why was it built this way?"?
|
|
@@ -51,6 +51,9 @@
|
|
|
51
51
|
- every clarification row carries a recommended answer + one-line rationale inside the `Expected form` cell; rows that lack a recommendation are rejected as half-formed.
|
|
52
52
|
- **Codebase-first ambiguity resolution (defect rule)**: any ambiguity that can be answered by `Read` / `Grep` / file inspection MUST be resolved that way and recorded with file:line evidence. Writing a clarification row for something the codebase already answers is a defect of this phase.
|
|
53
53
|
- **Evidence note required inside `Statement`**: every clarification row includes `Evidence checked: <path:line>` or `Evidence checked: none — <human-only reason>` in the `Statement` cell. `none` is allowed ONLY when the row's nature is "only a human can answer this" (reporter intent, business priority, external authority). A row with `none` that *could* have been answered by the codebase is a defect.
|
|
54
|
+
- Cross-verification mode:
|
|
55
|
+
- Phase 5.5 convergence runs in **adversarial mode** for this phase (`convergence.adversarial=true`). Verifiers actively try to refute each worker's finding by directly re-inspecting the cited evidence; the burden of proof sits on the claim. See `skills/okstra-convergence/SKILL.md` §"Adversarial Verification Mode". A single evidence-backed refutation prevents a finding from reaching consensus.
|
|
56
|
+
- **Coverage critic (opt-in)**: when `convergence.critic.enabled=true` (chosen via the okstra-run picker or `--critic`), a reused-worker critic pass runs after convergence to surface missed findings; its gaps are merged only after a 1-round adversarial reverify. See `skills/okstra-convergence/SKILL.md` "Coverage critic pass".
|
|
54
57
|
- Non-goals:
|
|
55
58
|
- full implementation design unless it is required to decide the next phase
|
|
56
59
|
- **source code edits, plan authoring, builds, or deployments** — this run only classifies the work and routes it; deeper analysis and planning belong to subsequent phases
|
|
@@ -228,6 +228,19 @@
|
|
|
228
228
|
"_DEFAULT_SUFFIX": " (default)"
|
|
229
229
|
}
|
|
230
230
|
},
|
|
231
|
+
"critic_pick": {
|
|
232
|
+
"label": "Run an additional critic pass? (a verification pass that digs up missed findings/blockers; opt-in)",
|
|
233
|
+
"echo_template": "critic: {value}",
|
|
234
|
+
"options": {
|
|
235
|
+
"off": "Do not use (default, recommended)",
|
|
236
|
+
"claude": "claude critic (recommended)",
|
|
237
|
+
"__free_input__": "Enter manually (codex / gemini)"
|
|
238
|
+
}
|
|
239
|
+
},
|
|
240
|
+
"critic_text": {
|
|
241
|
+
"label": "Enter the critic provider manually (codex / gemini)",
|
|
242
|
+
"echo_template": "critic: {value}"
|
|
243
|
+
},
|
|
231
244
|
"defaults_or_custom": {
|
|
232
245
|
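"label": "This step decides which model each role uses (it does not change which workers participate).\n· Proceed with defaults: keeps the lead, executor/workers, and report-writer on their recommended models and continues immediately.\n· Customize: pick the model for each role yourself, and also specify additional directives and related tasks.",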
|
|
233
246
|
"echo_template": "customize: {value}",
|
|
@@ -903,21 +903,47 @@ def _build_convergence_block(ctx: dict) -> dict:
|
|
|
903
903
|
- `enabled` default True
|
|
904
904
|
- `maxRounds` default 1 for `requirements-discovery`, 2 otherwise
|
|
905
905
|
- `verificationMode` default "lightweight"
|
|
906
|
+
- `adversarial` default True for `requirements-discovery` / `error-analysis` /
|
|
907
|
+
`implementation-planning` (forces `verificationMode` to "full-reanalysis"),
|
|
908
|
+
False otherwise
|
|
906
909
|
- `planBodyVerification` is implementation-planning specific; the key is
|
|
907
910
|
always emitted (dead-letter on other phases) so the schema stays stable.
|
|
908
911
|
|
|
909
912
|
ctx knobs honoured:
|
|
910
913
|
- `OKSTRA_PLAN_VERIFICATION`: "true" | "false" | "" (empty → default True).
|
|
911
914
|
Wired from CLI `--no-plan-verification` (sets "false").
|
|
915
|
+
- `CRITIC_CHOICE`: "" | "off" | "claude" | "codex" | "gemini" — critic
|
|
916
|
+
backing provider (enabled only for requirements-discovery / error-analysis /
|
|
917
|
+
implementation-planning / final-verification); model taken from that
|
|
918
|
+
provider's execution value.
|
|
912
919
|
"""
|
|
913
920
|
task_type = ctx.get("TASK_TYPE", "")
|
|
914
921
|
default_max_rounds = 1 if task_type == "requirements-discovery" else 2
|
|
922
|
+
adversarial_phases = {"requirements-discovery", "error-analysis", "implementation-planning"}
|
|
923
|
+
is_adversarial = task_type in adversarial_phases
|
|
915
924
|
raw_plan_verify = (ctx.get("OKSTRA_PLAN_VERIFICATION", "") or "").strip().lower()
|
|
916
925
|
plan_verify_enabled = raw_plan_verify != "false"
|
|
926
|
+
critic_choice = (ctx.get("CRITIC_CHOICE", "") or "").strip().lower()
|
|
927
|
+
# Independent of `adversarial_phases` above (they answer different questions and
|
|
928
|
+
# may diverge): the coverage critic is opt-in for the finding-producing phases.
|
|
929
|
+
critic_phases = {"requirements-discovery", "error-analysis", "implementation-planning", "final-verification"}
|
|
930
|
+
critic_exec_key = {
|
|
931
|
+
"claude": "CLAUDE_WORKER_MODEL_EXECUTION_VALUE",
|
|
932
|
+
"codex": "CODEX_WORKER_MODEL_EXECUTION_VALUE",
|
|
933
|
+
"gemini": "GEMINI_WORKER_MODEL_EXECUTION_VALUE",
|
|
934
|
+
}
|
|
935
|
+
critic_enabled = critic_choice in critic_exec_key and task_type in critic_phases
|
|
936
|
+
critic_block = {
|
|
937
|
+
"enabled": critic_enabled,
|
|
938
|
+
"provider": critic_choice if critic_enabled else None,
|
|
939
|
+
"modelExecutionValue": (ctx.get(critic_exec_key[critic_choice]) or None) if critic_enabled else None,
|
|
940
|
+
}
|
|
917
941
|
return {
|
|
918
942
|
"enabled": True,
|
|
943
|
+
"adversarial": is_adversarial,
|
|
919
944
|
"maxRounds": default_max_rounds,
|
|
920
|
-
"verificationMode": "lightweight",
|
|
945
|
+
"verificationMode": "full-reanalysis" if is_adversarial else "lightweight",
|
|
946
|
+
"critic": critic_block,
|
|
921
947
|
"planBodyVerification": {
|
|
922
948
|
"enabled": plan_verify_enabled,
|
|
923
949
|
"maxRounds": 1,
|
|
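To make the phase gating in the hunk above easier to follow, here is a standalone sketch of how the `critic` block resolves for a given `ctx`. It mirrors the logic shown in the diff but is not the package's code: the function name `build_critic_block` is hypothetical, and the model value `"example-model"` is a placeholder, not a real model identifier.

```python
CRITIC_PHASES = {
    "requirements-discovery",
    "error-analysis",
    "implementation-planning",
    "final-verification",
}

# Maps a provider name to the ctx key holding that provider's execution value.
CRITIC_EXEC_KEY = {
    "claude": "CLAUDE_WORKER_MODEL_EXECUTION_VALUE",
    "codex": "CODEX_WORKER_MODEL_EXECUTION_VALUE",
    "gemini": "GEMINI_WORKER_MODEL_EXECUTION_VALUE",
}


def build_critic_block(ctx: dict) -> dict:
    """Hypothetical helper: resolve the critic block from ctx alone."""
    task_type = ctx.get("TASK_TYPE", "")
    choice = (ctx.get("CRITIC_CHOICE", "") or "").strip().lower()
    # "" and "off" are not keys of CRITIC_EXEC_KEY, so both leave the critic disabled.
    enabled = choice in CRITIC_EXEC_KEY and task_type in CRITIC_PHASES
    return {
        "enabled": enabled,
        "provider": choice if enabled else None,
        "modelExecutionValue": (ctx.get(CRITIC_EXEC_KEY[choice]) or None) if enabled else None,
    }


enabled_block = build_critic_block({
    "TASK_TYPE": "error-analysis",
    "CRITIC_CHOICE": "codex",
    "CODEX_WORKER_MODEL_EXECUTION_VALUE": "example-model",  # placeholder value
})
# A valid provider on a phase outside CRITIC_PHASES stays disabled.
gated_block = build_critic_block({"TASK_TYPE": "implementation", "CRITIC_CHOICE": "claude"})
```

Note that `final-verification` appears in the critic phase set but not in `adversarial_phases`, so the critic can be enabled on a phase whose `verificationMode` remains `"lightweight"`.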
@@ -120,6 +120,7 @@ class PrepareInputs:
|
|
|
120
120
|
gemini_model: str = ""
|
|
121
121
|
report_writer_model: str = ""
|
|
122
122
|
executor: str = ""
|
|
123
|
+
critic: str = ""
|
|
123
124
|
related_tasks_raw: str = ""
|
|
124
125
|
work_category: str = ""
|
|
125
126
|
base_ref: str = ""
|
|
@@ -499,6 +500,7 @@ def _canonical_argv(inp: PrepareInputs, ctx: dict) -> list[str]:
|
|
|
499
500
|
("--gemini-model", inp.gemini_model or ctx.get("GEMINI_WORKER_MODEL", "")),
|
|
500
501
|
("--report-writer-model", inp.report_writer_model or ctx.get("REPORT_WRITER_MODEL", "")),
|
|
501
502
|
("--executor", inp.executor or ctx.get("EXECUTOR_PROVIDER", "")),
|
|
503
|
+
("--critic", inp.critic or ctx.get("CRITIC_CHOICE", "")),
|
|
502
504
|
("--related-tasks", inp.related_tasks_raw),
|
|
503
505
|
("--work-category", inp.work_category),
|
|
504
506
|
]
|
|
@@ -707,6 +709,13 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
|
|
|
707
709
|
default_display=report_writer_default, default_execution=report_writer_default,
|
|
708
710
|
)
|
|
709
711
|
|
|
712
|
+
# ---- coverage critic choice (validated; phase-gating happens in render) ----
|
|
713
|
+
critic_choice = (inp.critic or "").strip().lower()
|
|
714
|
+
if critic_choice not in ("", "off", "claude", "codex", "gemini"):
|
|
715
|
+
raise PrepareError(
|
|
716
|
+
f"--critic must be one of: off, claude, codex, gemini (got: {critic_choice!r})"
|
|
717
|
+
)
|
|
718
|
+
|
|
710
719
|
# ---- executor binding (implementation phase only; recorded universally for manifest consistency) ----
|
|
711
720
|
executor_default = _default("OKSTRA_DEFAULT_EXECUTOR", "claude")
|
|
712
721
|
executor_provider = (inp.executor or executor_default).strip().lower()
|
|
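The `--critic` check added in this hunk only validates the value; as the inline comment says, phase gating happens later in render. A minimal sketch of that validation step in isolation, assuming `PrepareError` is an ordinary exception type (the stand-in class below is hypothetical, as is the helper name `normalize_critic`):

```python
class PrepareError(ValueError):
    """Stand-in for the package's PrepareError; its real base class is not shown in the diff."""


ALLOWED_CRITIC_VALUES = ("", "off", "claude", "codex", "gemini")


def normalize_critic(raw) -> str:
    """Lower-case and trim the raw flag value, rejecting anything outside the allowed set."""
    choice = (raw or "").strip().lower()
    if choice not in ALLOWED_CRITIC_VALUES:
        raise PrepareError(
            f"--critic must be one of: off, claude, codex, gemini (got: {choice!r})"
        )
    return choice


normalized = normalize_critic("  Codex ")

try:
    normalize_critic("other-provider")
    rejected = False
except PrepareError:
    rejected = True
```

Because the empty string is accepted, omitting the flag is equivalent to leaving the critic off unless the wizard supplies a choice.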
@@ -842,6 +851,7 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
|
|
|
842
851
|
"EXECUTOR_WORKER_AGENT": executor_worker_agent,
|
|
843
852
|
"EXECUTOR_MODEL_DISPLAY": executor_model_meta.display,
|
|
844
853
|
"EXECUTOR_MODEL_EXECUTION_VALUE": executor_model_meta.execution,
|
|
854
|
+
"CRITIC_CHOICE": critic_choice,
|
|
845
855
|
"RELATED_TASKS_JSON": related_tasks_json_str,
|
|
846
856
|
"RELATED_TASKS_BULLETS": bullets,
|
|
847
857
|
"RELATED_TASKS_INLINE": inline,
|
|
@@ -1098,6 +1108,7 @@ def main(argv: list[str]) -> int:
|
|
|
1098
1108
|
p.add_argument("--gemini-model", default="")
|
|
1099
1109
|
p.add_argument("--report-writer-model", default="")
|
|
1100
1110
|
p.add_argument("--executor", default="")
|
|
1111
|
+
p.add_argument("--critic", default="")
|
|
1101
1112
|
p.add_argument("--related-tasks", default="", dest="related_tasks_raw")
|
|
1102
1113
|
p.add_argument("--approved-plan", default="", dest="approved_plan_path")
|
|
1103
1114
|
p.add_argument(
|
|
@@ -1198,6 +1209,7 @@ def main(argv: list[str]) -> int:
|
|
|
1198
1209
|
gemini_model=args.gemini_model,
|
|
1199
1210
|
report_writer_model=args.report_writer_model,
|
|
1200
1211
|
executor=args.executor,
|
|
1212
|
+
critic=args.critic,
|
|
1201
1213
|
related_tasks_raw=args.related_tasks_raw,
|
|
1202
1214
|
work_category=args.work_category,
|
|
1203
1215
|
base_ref=args.base_ref,
|
|
@@ -181,6 +181,8 @@ S_APPROVED_PLAN_PICK = "approved_plan_pick"
|
|
|
181
181
|
S_APPROVED_PLAN = "approved_plan"
|
|
182
182
|
S_STAGE_PICK = "stage_pick"
|
|
183
183
|
S_EXECUTOR = "executor"
|
|
184
|
+
S_CRITIC_PICK = "critic_pick"
|
|
185
|
+
S_CRITIC_TEXT = "critic_text"
|
|
184
186
|
S_DEFAULTS_OR_CUSTOM = "defaults_or_custom"
|
|
185
187
|
S_WORKERS_OVERRIDE = "workers_override"
|
|
186
188
|
S_LEAD_MODEL = "lead_model"
|
|
@@ -246,6 +248,8 @@ class WizardState:
|
|
|
246
248
|
approved_plan_pending_text: bool = False
|
|
247
249
|
selected_stage: str = "auto"
|
|
248
250
|
executor: str = ""
|
|
251
|
+
critic: str = ""
|
|
252
|
+
critic_pending_text: bool = False
|
|
249
253
|
|
|
250
254
|
# customize
|
|
251
255
|
use_defaults: Optional[bool] = None
|
|
@@ -1459,6 +1463,55 @@ def _submit_pr_template_pick(state: WizardState, value: str) -> Optional[str]:
|
|
|
1459
1463
|
)
|
|
1460
1464
|
|
|
1461
1465
|
|
|
1466
|
+
CRITIC_CHOICES = ["off", "claude", "codex", "gemini"]
|
|
1467
|
+
|
|
1468
|
+
|
|
1469
|
+
def _build_critic_pick(state: WizardState) -> Prompt:
|
|
1470
|
+
t = _p(state.workspace_root, "critic_pick")
|
|
1471
|
+
options: list[Option] = []
|
|
1472
|
+
for k, v in t["options"].items():
|
|
1473
|
+
if not k.startswith("_"):
|
|
1474
|
+
options.append(_opt(k, v))
|
|
1475
|
+
custom_label = t["options"].get(PICK_TYPE_CUSTOM, PICK_TYPE_CUSTOM)
|
|
1476
|
+
options.append(_opt(PICK_TYPE_CUSTOM, custom_label))
|
|
1477
|
+
return Prompt(
|
|
1478
|
+
step=S_CRITIC_PICK, kind="pick",
|
|
1479
|
+
label=t["label"],
|
|
1480
|
+
options=options,
|
|
1481
|
+
echo_template=t["echo_template"],
|
|
1482
|
+
)
|
|
1483
|
+
|
|
1484
|
+
|
|
1485
|
+
def _submit_critic_pick(state: WizardState, value: str) -> Optional[str]:
|
|
1486
|
+
if value == PICK_TYPE_CUSTOM:
|
|
1487
|
+
state.critic_pending_text = True
|
|
1488
|
+
return None
|
|
1489
|
+
choice = (value or "").strip().lower()
|
|
1490
|
+
if choice not in CRITIC_CHOICES:
|
|
1491
|
+
raise WizardError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
|
|
1492
|
+
state.critic = choice
|
|
1493
|
+
state.critic_pending_text = False
|
|
1494
|
+
return f"critic: {choice}"
|
|
1495
|
+
|
|
1496
|
+
|
|
1497
|
+
def _build_critic_text(state: WizardState) -> Prompt:
|
|
1498
|
+
t = _p(state.workspace_root, "critic_text")
|
|
1499
|
+
return Prompt(
|
|
1500
|
+
step=S_CRITIC_TEXT, kind="text",
|
|
1501
|
+
label=t["label"],
|
|
1502
|
+
echo_template=t["echo_template"],
|
|
1503
|
+
)
|
|
1504
|
+
|
|
1505
|
+
|
|
1506
|
+
def _submit_critic_text(state: WizardState, value: str) -> Optional[str]:
|
|
1507
|
+
choice = (value or "").strip().lower()
|
|
1508
|
+
if choice not in CRITIC_CHOICES:
|
|
1509
|
+
raise WizardError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
|
|
1510
|
+
state.critic = choice
|
|
1511
|
+
state.critic_pending_text = False
|
|
1512
|
+
return f"critic: {choice}"
|
|
1513
|
+
|
|
1514
|
+
|
|
1462
1515
|
def _build_executor(state: WizardState) -> Prompt:
|
|
1463
1516
|
t = _p(state.workspace_root, "executor")
|
|
1464
1517
|
default_suffix = t["options"].get("_DEFAULT_SUFFIX", "")
|
|
@@ -1922,6 +1975,17 @@ STEPS: list[Step] = [
|
|
|
1922
1975
|
and not s.executor),
|
|
1923
1976
|
build=_build_executor, submit=_submit_executor,
|
|
1924
1977
|
owns=("executor",)),
|
|
1978
|
+
Step(S_CRITIC_PICK,
|
|
1979
|
+
applies=lambda s: (s.task_type in ("requirements-discovery", "error-analysis", "implementation-planning", "final-verification")
|
|
1980
|
+
and not s.critic
|
|
1981
|
+
and not s.critic_pending_text
|
|
1982
|
+
and S_CRITIC_PICK not in s.answered),
|
|
1983
|
+
build=_build_critic_pick, submit=_submit_critic_pick,
|
|
1984
|
+
owns=("critic", "critic_pending_text")),
|
|
1985
|
+
Step(S_CRITIC_TEXT,
|
|
1986
|
+
applies=lambda s: (s.critic_pending_text and S_CRITIC_TEXT not in s.answered),
|
|
1987
|
+
build=_build_critic_text, submit=_submit_critic_text,
|
|
1988
|
+
owns=("critic", "critic_pending_text")),
|
|
1925
1989
|
Step(S_DEFAULTS_OR_CUSTOM,
|
|
1926
1990
|
applies=lambda s: (_identity_ready(s)
|
|
1927
1991
|
and s.use_defaults is None),
|
|
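The two critic steps form a small pick-then-text flow. The sketch below reproduces the `applies` predicates and submit handlers from the hunks above on a stripped-down state object, to show which prompt is offered at each point. Assumptions: `State` is a hypothetical stand-in for `WizardState`, `PICK_TYPE_CUSTOM` is taken to equal `"__free_input__"` (inferred from the prompts JSON key, not confirmed by the diff), and the sketch does not model how the real driver maintains `answered`.

```python
from dataclasses import dataclass, field

CRITIC_CHOICES = ["off", "claude", "codex", "gemini"]
CRITIC_PHASES = ("requirements-discovery", "error-analysis",
                 "implementation-planning", "final-verification")
PICK_TYPE_CUSTOM = "__free_input__"  # assumed value


@dataclass
class State:
    task_type: str
    critic: str = ""
    critic_pending_text: bool = False
    answered: set = field(default_factory=set)


def pick_applies(s: State) -> bool:
    return (s.task_type in CRITIC_PHASES and not s.critic
            and not s.critic_pending_text and "critic_pick" not in s.answered)


def text_applies(s: State) -> bool:
    return s.critic_pending_text and "critic_text" not in s.answered


def submit_choice(s: State, value: str) -> None:
    """Shared by the pick and text steps in this sketch; the custom option defers to text."""
    if value == PICK_TYPE_CUSTOM:
        s.critic_pending_text = True
        return
    choice = (value or "").strip().lower()
    if choice not in CRITIC_CHOICES:
        raise ValueError(f"critic must be one of {CRITIC_CHOICES}, got: {value!r}")
    s.critic = choice
    s.critic_pending_text = False


fresh = State(task_type="error-analysis")
offered_pick_first = pick_applies(fresh)   # picker is offered on a fresh state
submit_choice(fresh, PICK_TYPE_CUSTOM)     # user chooses free input
offered_text_next = text_applies(fresh)    # text prompt takes over
submit_choice(fresh, "Gemini")             # free-text answer is normalized

# Pre-seeding critic (as `init --critic` does) means the picker never applies.
seeded = State(task_type="error-analysis", critic="codex")
```

Since `"off"` is a non-empty string, an explicit `off` also satisfies `not s.critic == False` and suppresses the picker, which matches the intent that a CLI-supplied value skips the selection step.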
@@ -2118,7 +2182,8 @@ _FIELD_DEFAULTS: dict[str, Any] = {
|
|
|
2118
2182
|
"base_ref_pending_text": False, "approved_plan_path": "",
|
|
2119
2183
|
"approved_plan_pending_text": False,
|
|
2120
2184
|
"selected_stage": "auto",
|
|
2121
|
-
"executor": "", "
|
|
2185
|
+
"executor": "", "critic": "", "critic_pending_text": False,
|
|
2186
|
+
"use_defaults": None, "workers_override": "",
|
|
2122
2187
|
"lead_model": "", "claude_model": "", "codex_model": "",
|
|
2123
2188
|
"gemini_model": "", "report_writer_model": "", "directive": "",
|
|
2124
2189
|
"directive_pending_text": False,
|
|
@@ -2200,6 +2265,7 @@ def render_args(state: WizardState) -> dict[str, str]:
|
|
|
2200
2265
|
"task-type": state.task_type,
|
|
2201
2266
|
"task-brief": state.brief_path,
|
|
2202
2267
|
"executor": state.executor,
|
|
2268
|
+
"critic": state.critic,
|
|
2203
2269
|
"approved-plan": state.approved_plan_path,
|
|
2204
2270
|
"stage": (state.selected_stage or "auto") if state.task_type == "implementation" else "",
|
|
2205
2271
|
"base-ref": base_ref,
|
|
@@ -2244,6 +2310,8 @@ def confirmation_block(state: WizardState) -> str:
|
|
|
2244
2310
|
if state.report_writer_model:
|
|
2245
2311
|
lines.append(f" report-writer : {state.report_writer_model}")
|
|
2246
2312
|
lines.append(f" directive : {state.directive or '(none)'}")
|
|
2313
|
+
if state.task_type in ("requirements-discovery", "error-analysis", "implementation-planning", "final-verification"):
|
|
2314
|
+
lines.append(f" critic : {state.critic or '(off)'}")
|
|
2247
2315
|
if state.task_type == "implementation":
|
|
2248
2316
|
lines.append(f" approved-plan : {state.approved_plan_path}")
|
|
2249
2317
|
if state.clarification_response_path:
|
|
@@ -2288,6 +2356,7 @@ def _cli(argv: list[str]) -> int:
|
|
|
2288
2356
|
p_init.add_argument("--workspace-root", required=True)
|
|
2289
2357
|
p_init.add_argument("--project-root", required=True)
|
|
2290
2358
|
p_init.add_argument("--project-id", required=True)
|
|
2359
|
+
p_init.add_argument("--critic", default="")
|
|
2291
2360
|
|
|
2292
2361
|
p_step = sub.add_parser("step")
|
|
2293
2362
|
p_step.add_argument("--state-file", required=True)
|
|
@@ -2313,6 +2382,8 @@ def _cli(argv: list[str]) -> int:
|
|
|
2313
2382
|
project_root=args.project_root,
|
|
2314
2383
|
project_id=args.project_id,
|
|
2315
2384
|
)
|
|
2385
|
+
if args.critic:
|
|
2386
|
+
state.critic = args.critic
|
|
2316
2387
|
save_state_file(state_path, state)
|
|
2317
2388
|
first = next_prompt(state)
|
|
2318
2389
|
print(json.dumps({"ok": True, "next": first.to_json()},
|