okstra 0.20.1 → 0.21.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/README.kr.md +2 -2
  2. package/README.md +2 -2
  3. package/docs/kr/architecture.md +1 -0
  4. package/docs/kr/cli.md +1 -1
  5. package/docs/kr/performance-improvement-plan-v2.md +330 -0
  6. package/docs/kr/performance-improvement-plan.md +125 -0
  7. package/docs/project-structure-overview.md +388 -0
  8. package/docs/superpowers/plans/2026-05-14-convergence-queue-pruning.md +1568 -0
  9. package/package.json +1 -1
  10. package/runtime/BUILD.json +2 -2
  11. package/runtime/agents/SKILL.md +7 -1
  12. package/runtime/agents/workers/claude-worker.md +3 -1
  13. package/runtime/agents/workers/report-writer-worker.md +4 -0
  14. package/runtime/bin/okstra-codex-exec.sh +42 -0
  15. package/runtime/bin/okstra-gemini-exec.sh +7 -0
  16. package/runtime/bin/okstra-trace-cleanup.sh +42 -0
  17. package/runtime/prompts/profiles/final-verification.md +8 -2
  18. package/runtime/prompts/profiles/implementation-planning.md +1 -1
  19. package/runtime/prompts/profiles/release-handoff.md +26 -28
  20. package/runtime/prompts/profiles/requirements-discovery.md +1 -1
  21. package/runtime/python/okstra_ctl/render.py +78 -4
  22. package/runtime/python/okstra_ctl/run_context.py +5 -0
  23. package/runtime/python/okstra_ctl/workflow.py +8 -7
  24. package/runtime/python/okstra_ctl/worktree.py +155 -12
  25. package/runtime/skills/okstra-brief/SKILL.md +523 -0
  26. package/runtime/skills/okstra-convergence/SKILL.md +149 -37
  27. package/runtime/skills/okstra-report-writer/SKILL.md +8 -6
  28. package/runtime/templates/prd/brief.template.md +12 -0
  29. package/runtime/templates/project-docs/task-index.template.md +12 -0
  30. package/runtime/templates/reports/error-analysis-input.template.md +12 -0
  31. package/runtime/templates/reports/final-report.template.md +39 -12
  32. package/runtime/templates/reports/final-verification-input.template.md +22 -0
  33. package/runtime/templates/reports/implementation-input.template.md +12 -0
  34. package/runtime/templates/reports/implementation-planning-input.template.md +12 -0
  35. package/runtime/templates/reports/quick-input.template.md +12 -0
  36. package/runtime/templates/reports/release-handoff-input.template.md +23 -10
  37. package/runtime/templates/reports/schedule.template.md +12 -0
  38. package/runtime/templates/reports/settings.template.json +92 -30
  39. package/runtime/templates/reports/task-brief.template.md +12 -0
  40. package/src/install.mjs +1 -0
  41. package/src/uninstall.mjs +1 -0
@@ -0,0 +1,1568 @@
# Convergence Queue Pruning (P0+P1) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Implement P0 (clean up terminology and measurement criteria) and P1 (convergence re-verification queue pruning). After this, the lead no longer puts confirmed findings back into the next round's prompt. The Round 2 entry conditions and worker-failure handling are pinned down in documentation, schema, fixtures, and contract tests.

**Architecture:** Convergence behavior is a documentation contract. The lead (Claude) carries it out by following `skills/okstra-convergence/SKILL.md`, and no code reads or writes the convergence state JSON (`grep -r convergence- scripts/` returns nothing). The scope of change is therefore limited to:

- 4 seed documents
- 1 template
- 3 verification fixtures
- 1 contract test
- 1 baseline measurement helper

Worker prompt generation is also the lead's responsibility. Narrowing the verification queue in the pseudocode to "items still mixed/unresolved after Round 1" therefore reduces the number of dispatches without any code change.

**Tech Stack:** Markdown seed documents, JSON Schema (hand-rolled), pytest, jq, Python 3.

> Note: this plan covers only conclusions 1 (P0) and 2 (P1) of Section 9 in `docs/kr/performance-improvement-plan-v2.md`. The following are **not** in scope for this plan; each will get its own plan:
> - P2 (prompt diet)
> - P3 (fast-track)
> - P4 (prompt caching)
> - P5 (render parallelization)
> - P6 (incremental token-usage)

---

## File Structure

| Path | Responsibility | Change type |
|---|---|---|
| `skills/okstra-convergence/SKILL.md` | Phase 5.5 contract: terminology, Round 0/1/2 pseudocode, queue pruning rule, worker-failure handling, state schema v1.1 | Modify |
| `agents/SKILL.md` | Lifecycle phase boundaries and the Phase 5.5 entry point | Modify (narrow scope, Phase 5.5 block only) |
| `skills/okstra-report-writer/SKILL.md` | Phase 6 dispatch template; round history reflected in the final-report body structure | Modify |
| `agents/workers/report-writer-worker.md` | Add round history to the report-writer subagent's required-reading and authoring contract | Modify |
| `templates/reports/final-report.template.md` | Add a round-history sub-section under Section 1 (Cross Verification Results) | Modify |
| `tests/fixtures/convergence/early-exit.json` | Case where the queue empties in Round 1, so Round 2 is skipped | Create |
| `tests/fixtures/convergence/mixed-round2.json` | Case where an unresolved queue remains after Round 1, so Round 2 runs | Create |
| `tests/fixtures/convergence/reverify-all-failed.json` | Case where every Round 1 reverify dispatch is a terminal non-result, so Round 2 is suppressed | Create |
| `tests/test_convergence_state_contract.py` | Checks that the 3 fixtures above satisfy the schema v1.1 contract | Create |
| `scripts/okstra-convergence-stats.py` | Baseline measurement helper that aggregates dispatch count / wall-clock / worker tokens from team-state + convergence-state | Create |
| `docs/kr/performance-improvement-plan-v2.md` | Add links to this plan and its implementation status to the Section 9 conclusions | Modify |

---

## Convention Notes (common to all tasks)

- All new document and schema wording goes in the **end-user seed paths**, not a personal `.claude/`. Every change in this plan goes into the paths in the table above.
- `effectiveMaxRounds` is the value the lead records in the state artifact. When `convergence.maxRounds` in `task-manifest.json` is unset, the lead resolves it to the phase-aware default (`requirements-discovery → 1`, otherwise → 2). This wording already exists in the Phase 5.5 text of `agents/SKILL.md`, so this plan only promotes that value to a schema field.
- Enum values of `round2SkippedReason`: `queue-empty` | `max-rounds-1` | `all-reverify-non-result` | `not-skipped`.
- Keys of `finalClassificationCounts`: `fullConsensus`, `partialConsensus`, `contested`, `workerUnique`. The existing `summary` object has the same key structure, so it is kept as an alias.
- The existing array name `roundHistory[]` is kept, and only its fields are extended. Renaming it is deferred because a new name would break the mental model the lead follows.
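The phase-aware resolution of `effectiveMaxRounds` can be sketched in Python for illustration. This is not code the plan adds: the plan keeps this logic in the lead's documented contract.

Assumptions:
- The helper name `resolve_effective_max_rounds` is hypothetical.
- The loaded manifest is a plain dict shaped like `{"convergence": {"maxRounds": ...}}`.
- The 1-3 bound comes from the Phase 5.5 configuration text in Task 5.

```python
from typing import Any, Mapping

# Phase-aware default from the Convention Notes:
# requirements-discovery -> 1, every other task type -> 2.
PHASE_DEFAULTS = {"requirements-discovery": 1}
FALLBACK_DEFAULT = 2


def resolve_effective_max_rounds(task_type: str, manifest: Mapping[str, Any]) -> int:
    """Value the lead would record as config.effectiveMaxRounds (hypothetical helper)."""
    convergence = manifest.get("convergence") or {}
    explicit = convergence.get("maxRounds")
    if explicit is not None:
        # An explicit manifest value wins; Phase 5.5 allows 1-3.
        if not 1 <= explicit <= 3:
            raise ValueError(f"convergence.maxRounds must be 1-3, got {explicit}")
        return explicit
    # Unset: fall back to the phase-aware default.
    return PHASE_DEFAULTS.get(task_type, FALLBACK_DEFAULT)
```

An explicit manifest value always wins over the phase default. This matches the schema rule that `config.effectiveMaxRounds` must equal `config.maxRounds` when the manifest sets it.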
40
+
41
+ ---
42
+
43
+ ## Task 1: Convergence SKILL — 용어 분리 + `contested` 중간 상태 제거
44
+
45
+ **Files:**
46
+ - Modify: `skills/okstra-convergence/SKILL.md`
47
+ - Test: `tests/test_convergence_state_contract.py` (Task 12에서 생성. 이 단계에서는 verify only)
48
+
49
+ - [ ] **Step 1: 파일 확인**
50
+
51
+ Run: `wc -l /Volumes/Workspaces/workspace/projects/Okstra/skills/okstra-convergence/SKILL.md`
52
+ Expected: 약 298 lines (existing).
53
+
54
+ - [ ] **Step 2: `# OKSTRA Convergence` 헤더 바로 아래에 "Scope and Terminology" 섹션 삽입**
55
+
56
+ `skills/okstra-convergence/SKILL.md` 의 `## When to Use` 직전 위치(line 7~9 사이)에 아래 블록을 삽입한다.
57
+
58
+ ````markdown
59
+ ## Scope and Terminology (BLOCKING)
60
+
61
+ This skill governs **Phase 5.5 (Convergence loop)** — a *lead operating phase* inside a single okstra run, not a task-type lifecycle phase. The 6 task-type lifecycle phases (`requirements-discovery` → `error-analysis` → `implementation-planning` → `implementation` → `final-verification` → `release-handoff`, see [agents/SKILL.md](../../SKILL.md) "Lifecycle Phase Boundaries") are unchanged by this skill. The lead operating phases (Phase 1 Intake → Phase 7 Persist, see [agents/SKILL.md](../../SKILL.md) "Quick Reference") describe how the lead drives a *single* task-type run.
62
+
63
+ **`contested` is a final classification only.** It is NEVER an intermediate queue label. The verification queue carries findings that are *unique to a single worker* (entered in Round 0) or *mixed/unresolved after a re-verification round* (carried forward). The `contested` label is assigned only when the **last executed round** completes and the queue is still non-empty.
64
+
65
+ When this skill says "queue" without qualifier, it means the *verification queue*: the set of findings that are still candidates for re-verification in subsequent rounds. The queue shrinks monotonically as findings get classified as `full-consensus`, `partial-consensus`, or `worker-unique`. Findings classified into any of these three categories MUST NOT appear in any subsequent round's reverify prompt, for any worker.
66
+ ````
67
+
68
+ - [ ] **Step 3: Finding Category 표의 `contested` row description을 명시적 최종-라운드 조건으로 강화**
69
+
70
+ 기존 (line 31 부근):
71
+
72
+ ```markdown
73
+ | `contested` | No consensus reached even after max rounds; each worker's position is recorded | Required |
74
+ ```
75
+
76
+ 다음으로 교체:
77
+
78
+ ```markdown
79
+ | `contested` | Final classification only. Assigned to a finding that remains in the verification queue after the **last executed round** completes (round index = `effectiveMaxRounds`). Each worker's position across all executed rounds is recorded. NEVER used as an intermediate label. | Required |
80
+ ```
81
+
82
+ - [ ] **Step 4: Convergence Test 섹션의 final-classification 트리거를 last-executed-round로 명시**
83
+
84
+ 기존 (line 81~84 부근):
85
+
86
+ ```markdown
87
+ - If the validation queue is empty → Convergence complete (`converged`)
88
+ - Upon reaching the maximum number of rounds → Apply final classification to remaining unresolved findings:
89
+ - Majority agreement → `partial-consensus`
90
+ - Otherwise → `contested`
91
+ ```
92
+
93
+ 다음으로 교체:
94
+
95
+ ```markdown
96
+ - If the verification queue is empty at the end of any round → Convergence complete (`finalState: "converged"`), remaining rounds are not executed
97
+ - Upon completing the **last executed round** (where round index == `effectiveMaxRounds`, OR where Round 2 was suppressed per the Round 2 gate below) → Apply final classification to remaining queue items:
98
+ - Majority agreement across executed rounds → `partial-consensus`
99
+ - Otherwise → `contested`
100
+ - The final classification step never runs while the queue is still being re-verified — confirmed items always exit the queue first.
101
+ ```
102
+
103
+ - [ ] **Step 5: Commit**
104
+
105
+ ```bash
106
+ git add skills/okstra-convergence/SKILL.md
107
+ git commit -m "docs(convergence): scope terminology, contested as final-only label
108
+
109
+ Adds 'Scope and Terminology' BLOCKING section to disambiguate Phase 5.5 from
110
+ task-type lifecycle phases. Tightens 'contested' definition to a terminal
111
+ classification only — it never labels intermediate queue items. Aligns the
112
+ final-classification trigger to 'last executed round' so the queue-pruning
113
+ algorithm in subsequent tasks reads cleanly."
114
+ ```
115
+
116
+ ---
117
+
118
+ ## Task 2: Convergence SKILL — Round 0 / Round 1 / Optional Round 2 의사코드 재작성
119
+
120
+ **Files:**
121
+ - Modify: `skills/okstra-convergence/SKILL.md` (Convergence Algorithm section)
122
+
123
+ - [ ] **Step 1: Round 0 본문 보강 — queue 입력 규칙을 명시**
124
+
125
+ 기존 `### Round 0: Parse worker results` 섹션의 항목 3 (line 48~52 부근) 끝에 아래 한 줄을 추가한다(기존 4단계 grouping 규칙은 유지).
126
+
127
+ ```markdown
128
+ 6. After grouping, the verification queue contains EXACTLY the `unique`-marked findings (Step 3 case "Only one worker confirms"). `full-consensus` findings reached in Step 3 are recorded immediately in the convergence state with `classification: "full-consensus"` and DO NOT enter the queue.
129
+ ```
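The Round 0 queue-entry rule can be sketched as a small Python function. This is an illustration only.

Assumptions:
- Round 0 grouping has already produced, for each finding, the set of workers that reported it.
- `GroupedFinding` and `seed_verification_queue` are hypothetical names.
- Findings reported by more than one worker but not by every worker are handed back untouched. The existing grouping rules, which this plan keeps, decide those cases.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class GroupedFinding:
    finding_id: str
    origin_worker: str
    confirmed_by: Set[str]  # workers whose Round 0 output reports this finding
    classification: str = ""


def seed_verification_queue(
    grouped: List[GroupedFinding], workers: Set[str]
) -> Tuple[List[GroupedFinding], List[GroupedFinding], List[GroupedFinding]]:
    """Split Round 0 groups into (recorded, queue, undecided) (hypothetical helper)."""
    recorded: List[GroupedFinding] = []
    queue: List[GroupedFinding] = []
    undecided: List[GroupedFinding] = []
    for finding in grouped:
        if finding.confirmed_by >= workers:
            # Confirmed by every worker: recorded now, never enters the queue.
            finding.classification = "full-consensus"
            recorded.append(finding)
        elif len(finding.confirmed_by) == 1:
            # "unique"-marked: the only findings that enter the verification queue.
            queue.append(finding)
        else:
            # Neither case: left to the existing grouping rules.
            undecided.append(finding)
    return recorded, queue, undecided
```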
130
+
131
+ - [ ] **Step 2: Round 1-N 의사코드 교체**
132
+
133
+ 기존 `### Round 1-N: Re-verification Loop` 코드 블록 (line 56~77) 전체를 다음으로 교체.
134
+
135
+ ````markdown
136
+ ### Round 1-N: Re-verification Loop (queue-pruned)
137
+
138
+ The verification queue holds only findings that are not yet classified. Confirmed items are *removed* from the queue and never re-sent.
139
+
140
+ ```text
141
+ roundIndex = 0
142
+ WHILE roundIndex < effectiveMaxRounds AND queue is non-empty:
143
+ roundIndex += 1
144
+
145
+ # Round 2 gate (only evaluated when entering round 2 or higher)
146
+ IF roundIndex > 1 AND NOT round_gate_open(queue, last_round_dispatch_outcomes):
147
+ record round_skipped_reason in convergence state
148
+ BREAK
149
+
150
+ inputQueueSize = len(queue)
151
+ dispatches = []
152
+ skippedWorkers = []
153
+
154
+ FOR each analysis worker W (excluding report-writer-worker):
155
+ items_for_W = [f for f in queue if W != f.originWorker]
156
+ IF items_for_W is empty:
157
+ skippedWorkers.append({worker: W, reason: "no items to verify"})
158
+ CONTINUE
159
+ dispatch = send_reverify_request(W, items_for_W, roundIndex)
160
+ dispatches.append(dispatch)
161
+
162
+ IF all dispatches in this round are terminal non-result (timeout/error/no-result-file):
163
+ # Per "Worker failure handling in reverify" below — do NOT treat as DISAGREE.
164
+ record verification-error evidence on each finding in the queue for this round
165
+ record round_skipped_reason = "all-reverify-non-result" for any subsequent round
166
+ BREAK
167
+
168
+ resolvedCount = 0
169
+ carriedForwardCount = 0
170
+
171
+ FOR each finding F in queue (snapshot):
172
+ votes = aggregate_votes(F, dispatches) # AGREE / DISAGREE / SUPPLEMENT / verification-error
173
+ IF all non-error votes are AGREE or SUPPLEMENT:
174
+ F.classification = "full-consensus"
175
+ queue.remove(F); resolvedCount += 1
176
+ ELIF majority non-error votes are AGREE or SUPPLEMENT:
177
+ F.classification = "partial-consensus"
178
+ queue.remove(F); resolvedCount += 1
179
+ ELIF all non-error votes are DISAGREE:
180
+ F.classification = "worker-unique"
181
+ queue.remove(F); resolvedCount += 1
182
+ ELSE:
183
+ # mixed / insufficient non-error votes → carry forward
184
+ carriedForwardCount += 1
185
+
186
+ record roundHistory entry { round: roundIndex, inputQueueSize, resolvedCount,
187
+ carriedForwardCount, dispatches, skippedWorkers }
188
+
189
+ # Final classification — runs after the WHILE loop exits (queue empty OR roundIndex == effectiveMaxRounds OR Round 2 gate closed)
190
+ FOR each finding F still in queue:
191
+ IF majority AGREE-or-SUPPLEMENT across all executed rounds:
192
+ F.classification = "partial-consensus"
193
+ ELSE:
194
+ F.classification = "contested"
195
+ ```
196
+
197
+ The lead MUST construct the per-worker reverify prompt body from `items_for_W` only — confirmed findings from earlier rounds MUST NOT appear in the prompt, even as background. The dispatch-prompt invariant (every worker gets the same prompt content modulo their own findings) continues to apply to the per-round prompt body.
198
+ ````
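The classification and pruning rules of one round can be sketched in Python. This is a hypothetical illustration, not part of the okstra runtime; the plan states that no code reads the convergence state.

Assumptions:
- Verdicts are the lowercase strings used in the state artifact.
- Findings are identified by ID.
- The final classifier's cross-round "majority" pools all usable votes from every executed round.

A finding whose votes are all `verification-error` is carried forward rather than classified. This follows the "insufficient non-error votes → carry forward" comment in the pseudocode.

```python
from typing import Dict, List, Optional

AGREE_LIKE = {"agree", "supplement"}
ERROR = "verification-error"


def classify_round(votes: Dict[str, str]) -> Optional[str]:
    """Classify a queued finding from one round's votes (worker -> verdict).

    Returns None when the finding must be carried forward to the next round.
    """
    usable = [v for v in votes.values() if v != ERROR]  # errors are "no usable vote"
    if not usable:
        return None  # insufficient non-error votes
    agree = sum(1 for v in usable if v in AGREE_LIKE)
    if agree == len(usable):
        return "full-consensus"
    if agree * 2 > len(usable):  # strict majority
        return "partial-consensus"
    if all(v == "disagree" for v in usable):
        return "worker-unique"
    return None  # mixed: stays in the queue


def items_for_worker(queue: List[str], origin: Dict[str, str], worker: str) -> List[str]:
    """Findings worker W is asked to re-verify: still queued and not its own."""
    return [f for f in queue if origin[f] != worker]


def prune_queue(queue: List[str], round_votes: Dict[str, Dict[str, str]],
                classified: Dict[str, str]) -> Dict[str, int]:
    """Run one round's classification, removing resolved findings from `queue` in place."""
    input_size = len(queue)
    for finding_id in list(queue):  # iterate a snapshot because we mutate the queue
        label = classify_round(round_votes.get(finding_id, {}))
        if label is not None:
            classified[finding_id] = label
            queue.remove(finding_id)
    return {
        "inputQueueSize": input_size,
        "resolvedCount": input_size - len(queue),
        "carriedForwardCount": len(queue),
    }


def final_classification(rounds: List[Dict[str, str]]) -> str:
    """Label a finding still queued after the last executed round, pooling usable votes."""
    usable = [v for votes in rounds for v in votes.values() if v != ERROR]
    agree = sum(1 for v in usable if v in AGREE_LIKE)
    return "partial-consensus" if usable and agree * 2 > len(usable) else "contested"
```

Because `prune_queue` mutates the queue, a resolved finding can never reach `items_for_worker` in a later round. This mirrors the invariant that confirmed findings never re-enter a reverify prompt.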

- [ ] **Step 3: Add a "Round 2 gate" subsection**

Insert the following sub-section directly below the pseudocode added above.

````markdown
#### Round 2 gate (`round_gate_open` predicate)

`round_gate_open(queue, last_round_dispatch_outcomes)` returns `true` iff ALL three conditions hold; otherwise the lead records `round2SkippedReason` and breaks out of the loop:

| Condition | Required value | `round2SkippedReason` if not met |
|---|---|---|
| `effectiveMaxRounds >= 2` | true | `"max-rounds-1"` |
| `len(queue) > 0` after round 1 | true | `"queue-empty"` |
| At least one round-1 reverify dispatch terminated as `completed` | true | `"all-reverify-non-result"` |

When all conditions hold, the predicate returns `true` and `round2SkippedReason` is set to `"not-skipped"`. The field is mandatory on every convergence state artifact — write `"not-skipped"` rather than omitting the key.
````
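One way to express the gate as a function is sketched below. The sketch evaluates it once, right after Round 1, rather than only when the loop re-enters, so that `round2SkippedReason` is always written.

Its signature differs from the pseudocode's `round_gate_open(queue, last_round_dispatch_outcomes)`. It takes three explicit arguments:
- the resolved `effectiveMaxRounds`
- the post-Round-1 queue size
- the Round 1 dispatch statuses

When more than one condition fails, the sketch reports the first failing condition in table order. That ordering is an assumption.

```python
from typing import List, Tuple


def round_gate_open(effective_max_rounds: int, queue_size_after_round1: int,
                    round1_statuses: List[str]) -> Tuple[bool, str]:
    """Return (gate_open, round2SkippedReason), checking the gate table top to bottom."""
    if effective_max_rounds < 2:
        return False, "max-rounds-1"
    if queue_size_after_round1 == 0:
        return False, "queue-empty"
    if "completed" not in round1_statuses:
        # Every Round 1 dispatch was timeout/error/not-run: no usable votes.
        return False, "all-reverify-non-result"
    return True, "not-skipped"
```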

- [ ] **Step 4: Commit**

```bash
git add skills/okstra-convergence/SKILL.md
git commit -m "docs(convergence): queue-pruned Round 0/1/2 algorithm with explicit gate

Replaces the Round 1-N pseudocode with a queue-pruning loop: confirmed items
exit the queue immediately and never enter the next round's reverify prompt.
Adds a Round 2 gate predicate with three explicit conditions
(effectiveMaxRounds>=2, queue non-empty, at least one round-1 completed
dispatch) and a mandatory round2SkippedReason artifact field with enum
{queue-empty, max-rounds-1, all-reverify-non-result, not-skipped}."
```

---

## Task 3: Convergence SKILL - worker-failure handling spec

**Files:**
- Modify: `skills/okstra-convergence/SKILL.md` (new "Worker failure handling in reverify" subsection)

- [ ] **Step 1: Insert the new subsection directly below "Round 2 gate"**

````markdown
#### Worker failure handling in reverify (BLOCKING)

A reverify dispatch that returns a **terminal non-result** (`timeout`, `error`, no result file, or the wrapper records `cli-failure`) MUST NOT be aggregated as `DISAGREE`. Misclassifying a worker failure as DISAGREE biases the queue toward `contested`/`worker-unique` and produces meaningless final classifications.

Rules:

1. For each affected finding, append a `votes[W].verdict = "verification-error"` entry instead of `disagree`, plus the wrapper's captured exit reason in `votes[W].explanation`.
2. Record one event per failed dispatch via `python3 scripts/okstra-error-log.py append-observed --error-type cli-failure --agent <worker> ...` (the worker wrapper does this for Codex/Gemini; for Claude worker timeouts the lead does it).
3. Add an entry to the round's `skippedWorkers[]` with `{worker: <W>, reason: "dispatch-non-result", terminalStatus: <timeout|error|not-run>}`.
4. If **all** reverify dispatches in a round terminate as non-result, the round is treated as gate-closed: write `round2SkippedReason: "all-reverify-non-result"` (even if the round in question is round 1 — i.e. round 2 never runs because round 1 produced no usable votes), record one `contract-violation` event per non-result dispatch, and exit the WHILE loop.
5. Section 6 (Specialization Lens) of a worker output is OUT of convergence scope per "Convergence scope" above — its absence is NEVER a `verification-error`.

The final classifier (`FOR each finding F still in queue` block) treats `verification-error` as "no usable vote" — it counts neither toward AGREE nor toward DISAGREE.
````
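Rules 1, 3 and 4 can be sketched in Python as a bookkeeping step run after a round's dispatches return.

Assumptions:
- Each dispatch is a dict with `worker` and `status`, where `status` is one of the schema's `completed | timeout | error | not-run`.
- A missing result file has already been normalized to one of those statuses.
- The `exitReason` key and the function name are hypothetical.

The error-log call from rule 2 is deliberately left out, because it shells out to `scripts/okstra-error-log.py`.

```python
from typing import Dict, List, Optional

NON_RESULT = {"timeout", "error", "not-run"}


def record_dispatch_outcomes(
    dispatches: List[Dict[str, str]],
    queue: List[str],
    origin: Dict[str, str],
    votes: Dict[str, Dict[str, Dict[str, str]]],
    skipped_workers: List[Dict[str, str]],
) -> Optional[str]:
    """Apply rules 1, 3 and 4 for one round; return a skip reason when every dispatch failed."""
    for d in dispatches:
        if d["status"] not in NON_RESULT:
            continue  # completed dispatches contribute real verdicts elsewhere
        worker = d["worker"]
        for finding_id in queue:
            if origin[finding_id] == worker:
                continue  # a worker is never asked to re-verify its own finding
            # Rule 1: record verification-error, never "disagree".
            votes.setdefault(finding_id, {})[worker] = {
                "verdict": "verification-error",
                "explanation": d.get("exitReason", ""),
            }
        # Rule 3: one skippedWorkers[] entry per non-result dispatch.
        skipped_workers.append(
            {"worker": worker, "reason": "dispatch-non-result", "terminalStatus": d["status"]}
        )
    # Rule 4: every dispatch failed, so the gate closes for later rounds.
    if dispatches and all(d["status"] in NON_RESULT for d in dispatches):
        return "all-reverify-non-result"
    return None
```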

- [ ] **Step 2: Commit**

```bash
git add skills/okstra-convergence/SKILL.md
git commit -m "docs(convergence): separate worker-failure from DISAGREE in reverify

Codifies a 'verification-error' verdict so a dispatch that timed out or
returned no result file does not get aggregated as DISAGREE. Adds a
skippedWorkers[] entry per non-result dispatch and forces
round2SkippedReason=all-reverify-non-result when every dispatch in a round
fails. The final classifier ignores verification-error votes when counting
majority."
```

---

## Task 4: Convergence SKILL - state artifact schema v1.1

**Files:**
- Modify: `skills/okstra-convergence/SKILL.md` (Convergence State Artifact section)

- [ ] **Step 1: Replace the existing JSON example block (lines 232-283) with the v1.1 schema**

````markdown
## Convergence State Artifact

Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.

Schema version `1.1` extends `1.0` (legacy fields kept as aliases for backward-compat with already-shipped reports):

```json
{
  "schemaVersion": "1.1",
  "taskKey": "<task-key>",
  "config": {
    "enabled": true,
    "maxRounds": 2,
    "effectiveMaxRounds": 2,
    "verificationMode": "lightweight"
  },
  "findings": [
    {
      "findingId": "F-001",
      "summary": "<one-line summary>",
      "category": "<bug|risk|missing|observation|...>",
      "ticketIds": ["TICKET-123"],
      "originWorker": "claude-worker",
      "originEvidence": "<evidence text>",
      "classification": "full-consensus",
      "rounds": [
        {
          "round": 1,
          "votes": {
            "codex-worker": { "verdict": "agree", "explanation": "<brief>" },
            "gemini-worker": { "verdict": "supplement", "explanation": "<brief>" }
          }
        }
      ],
      "consensusWorkers": ["claude-worker", "codex-worker", "gemini-worker"],
      "dissentingWorkers": []
    }
  ],
  "roundHistory": [
    {
      "round": 1,
      "inputQueueSize": 3,
      "resolvedCount": 2,
      "carriedForwardCount": 1,
      "dispatches": [
        { "worker": "codex-worker", "status": "completed", "durationMs": 184221 },
        { "worker": "gemini-worker", "status": "completed", "durationMs": 201337 }
      ],
      "skippedWorkers": [
        { "worker": "claude-worker", "reason": "no items to verify" }
      ],
      "verificationsRequested": 2,
      "verificationsCompleted": 2,
      "newConsensus": 2,
      "remainingInQueue": 1,
      "earlyExit": false
    }
  ],
  "round2SkippedReason": "not-skipped",
  "finalState": "converged",
  "totalRounds": 2,
  "finalClassificationCounts": {
    "fullConsensus": 5,
    "partialConsensus": 1,
    "contested": 0,
    "workerUnique": 1
  },
  "summary": {
    "fullConsensus": 5,
    "partialConsensus": 1,
    "contested": 0,
    "workerUnique": 1
  }
}
```

Schema rules:

- `schemaVersion`: literal string `"1.1"` for new runs. Readers MUST accept `"1.0"` for historical artifacts and treat any missing v1.1 field as `null`.
- `config.effectiveMaxRounds`: the integer the lead actually used after resolving the phase-aware default (`1` for `requirements-discovery`, `2` otherwise). MUST equal `config.maxRounds` when the manifest explicitly set it.
- `roundHistory[].inputQueueSize`: queue size at the start of this round.
- `roundHistory[].resolvedCount`: number of findings that exited the queue this round (sum of full+partial+worker-unique classifications produced this round).
- `roundHistory[].carriedForwardCount`: queue size at the END of this round (must equal `inputQueueSize - resolvedCount` when there are no in-round queue insertions; in-round insertions are forbidden).
- `roundHistory[].dispatches[]`: one entry per worker that was actually dispatched in this round. `status ∈ {completed, timeout, error, not-run}`.
- `roundHistory[].skippedWorkers[]`: per-worker `{worker, reason}` for workers with no items to verify OR with a non-result dispatch.
- `roundHistory[].verificationsRequested|verificationsCompleted|newConsensus|remainingInQueue|earlyExit`: legacy v1.0 aliases. New runs SHOULD populate them so existing parsers keep working. The formulas are:
  - `verificationsRequested == len(dispatches)`
  - `verificationsCompleted == len([d for d in dispatches if d.status == "completed"])`
  - `newConsensus == resolvedCount`
  - `remainingInQueue == carriedForwardCount`
  - `earlyExit == (round < effectiveMaxRounds AND carriedForwardCount == 0)`
- `round2SkippedReason`: literal enum `queue-empty | max-rounds-1 | all-reverify-non-result | not-skipped`. Always present. Use `"not-skipped"` only when Round 2 actually ran. Otherwise record why it did not run:
  - `"max-rounds-1"` when `effectiveMaxRounds == 1`
  - `"queue-empty"` when the queue was already empty before Round 2
  - `"all-reverify-non-result"` when Round 1 produced no usable votes
- `finalClassificationCounts`: post-loop counts. New required field — must equal `summary` 1:1. `summary` is retained as the v1.0 alias.
- `finalState ∈ {converged, max-rounds-reached, aborted-non-result}`. `aborted-non-result` is the new value for the case where all reverify dispatches in a round fail.
- `totalRounds`: count of rounds actually executed (not `effectiveMaxRounds`). May be `0` when Round 0 produced no queue items (all findings reached consensus during grouping).
````
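The schema rules are regular enough to check mechanically, which is what the planned `tests/test_convergence_state_contract.py` will do. The sketch below is a hypothetical stdlib-only checker, not the test the plan will create.

It covers:
- the enum fields
- the `finalClassificationCounts == summary` rule
- the `carriedForwardCount` invariant
- the legacy-alias formulas

It skips the v1.1 checks for `schemaVersion: "1.0"` artifacts. The legacy aliases are checked only when present, since the rules say new runs SHOULD (not MUST) populate them.

```python
import json
from typing import Any, Dict, List

SKIP_REASONS = {"queue-empty", "max-rounds-1", "all-reverify-non-result", "not-skipped"}
FINAL_STATES = {"converged", "max-rounds-reached", "aborted-non-result"}
DISPATCH_STATUSES = {"completed", "timeout", "error", "not-run"}


def check_state(state: Dict[str, Any]) -> List[str]:
    """Return violations of the v1.1 schema rules; an empty list means the artifact passes."""
    version = state.get("schemaVersion")
    if version not in {"1.0", "1.1"}:
        return ["schemaVersion must be '1.0' or '1.1'"]
    errors: List[str] = []
    if version == "1.0":
        return errors  # historical artifact: missing v1.1 fields are read as null
    if state.get("round2SkippedReason") not in SKIP_REASONS:
        errors.append("round2SkippedReason missing or not in enum")
    if state.get("finalState") not in FINAL_STATES:
        errors.append("finalState not in enum")
    if state.get("finalClassificationCounts") != state.get("summary"):
        errors.append("finalClassificationCounts must equal summary 1:1")
    effective = state.get("config", {}).get("effectiveMaxRounds")
    for entry in state.get("roundHistory", []):
        n = entry.get("round")
        dispatches = entry.get("dispatches", [])
        carried = entry.get("carriedForwardCount")
        if carried != entry.get("inputQueueSize", 0) - entry.get("resolvedCount", 0):
            errors.append(f"round {n}: carriedForwardCount != inputQueueSize - resolvedCount")
        if any(d.get("status") not in DISPATCH_STATUSES for d in dispatches):
            errors.append(f"round {n}: dispatch status not in enum")
        # Legacy v1.0 aliases are optional, but must match their formulas when present.
        expected = {
            "verificationsRequested": len(dispatches),
            "verificationsCompleted": len([d for d in dispatches if d.get("status") == "completed"]),
            "newConsensus": entry.get("resolvedCount"),
            "remainingInQueue": carried,
            "earlyExit": (isinstance(n, int) and isinstance(effective, int)
                          and n < effective and carried == 0),
        }
        for key, want in expected.items():
            if key in entry and entry[key] != want:
                errors.append(f"round {n}: legacy alias {key} should be {want!r}")
    return errors


def check_file(path: str) -> List[str]:
    """Load a convergence-<task-type>-<seq>.json file and check it."""
    with open(path, encoding="utf-8") as fh:
        return check_state(json.load(fh))
```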

- [ ] **Step 2: Commit**

```bash
git add skills/okstra-convergence/SKILL.md
git commit -m "docs(convergence): schema v1.1 with effectiveMaxRounds and skip reason

Adds inputQueueSize, resolvedCount, carriedForwardCount, dispatches[], and
skippedWorkers[] per round; finalClassificationCounts, round2SkippedReason,
and config.effectiveMaxRounds at the top level. v1.0 fields are kept as
aliases so existing report parsers keep working until they migrate."
```

---

## Task 5: agents/SKILL.md - align the Phase 5.5 description

**Files:**
- Modify: `agents/SKILL.md` (Phase 5.5 section, around lines 198-212)

- [ ] **Step 1: Expand the Phase 5.5 body**

Replace the existing "Convergence is enabled by default..." block (around lines 200-210) with:

````markdown
Convergence is enabled by default. Configure via task-manifest.json:

- `convergence.enabled`: true/false (default: true)
- `convergence.maxRounds`: 1–3 — **phase-aware default**: `1` for `requirements-discovery`, `2` for all other task types
- `convergence.verificationMode`: `"lightweight"` | `"full-reanalysis"` (default: `"lightweight"`)

When `task-manifest.json` does not set `convergence.maxRounds`, lead MUST resolve the effective value via the phase-aware default above before entering Phase 5.5, and record the resolved value in the convergence state artifact at `config.effectiveMaxRounds`.

**Round 2 is gated, not unconditional.** Even when `effectiveMaxRounds == 2`, Round 2 runs only when (a) the verification queue is non-empty after Round 1, AND (b) at least one Round 1 reverify dispatch terminated as `completed`. Otherwise lead writes `round2SkippedReason` to the convergence state and proceeds to final classification. See [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Round 2 gate" for the predicate.

**Confirmed findings are pruned from the queue.** Findings classified as `full-consensus`, `partial-consensus`, or `worker-unique` MUST NOT appear in any subsequent round's reverify prompt for any worker. `contested` is a final classification assigned only when the last executed round completes and the queue is still non-empty — it is NEVER an intermediate queue label.

If any re-verification batch yields a `verification-error` terminal status, or a worker result fails the contract, Lead MUST record one event per violation via `python3 scripts/okstra-error-log.py append-observed --error-type contract-violation --agent <offending-agent> ...`. Use `agent: "claude-lead"` only when the violation is detected internally without a specific worker.

If convergence is disabled, proceed directly to Phase 6 with the raw worker results.
````

- [ ] **Step 2: Add 2 new rows to the "Common Mistakes" table (around lines 264-286)**

Add the following two rows directly above the last row of the existing table (`Skipping --substitute-final-report ...`).

```markdown
| Re-sending confirmed findings (`full-consensus`/`partial-consensus`/`worker-unique`) to a worker in Round 2 | Queue pruning rule — see [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Round 1-N: Re-verification Loop (queue-pruned)" |
| Aggregating a `timeout`/`error` reverify dispatch as `DISAGREE` | Worker failure handling — record as `verification-error` and add to `skippedWorkers[]`. See [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Worker failure handling in reverify" |
```

- [ ] **Step 3: Commit**

```bash
git add agents/SKILL.md
git commit -m "docs(agents): Phase 5.5 Round 2 gate and queue-pruning callouts

Spells out the Round 2 gate conditions (effectiveMaxRounds>=2, queue
non-empty, at least one round-1 completed dispatch), records the
queue-pruning invariant, and adds two new Common Mistakes rows for
re-sending confirmed findings and mis-aggregating worker-failure dispatches."
```

---

## Task 6: report-writer SKILL - reflect round history & skipped reason

**Files:**
- Modify: `skills/okstra-report-writer/SKILL.md` (Phase 6 dispatch template + Main Body Section)

- [ ] **Step 1: Expand item 9 of the Phase 6 dispatch template**

Existing (around line 49):

```markdown
9. The convergence classifications (Full/Partial/Contested/Worker-Unique) and pointers to all worker result files under `worker-results/`.
```

Replace with:

```markdown
9. The convergence classifications (Full/Partial/Contested/Worker-Unique), the round history table (`roundHistory[]`), the `round2SkippedReason` value, and pointers to all worker result files under `worker-results/`. The report-writer worker must reproduce a Round History sub-table in Section 1 of the final report so the reader can see which rounds executed, queue sizes, and why Round 2 was (or was not) skipped.
```

- [ ] **Step 2: Expand the Main Body Section 2 ("Cross Verification Results") description**

Existing (around lines 228-233):

```markdown
2. **Cross Verification Results** (Use 4 categories when convergence is enabled, per `okstra-convergence`)
   - Full Consensus: Findings agreed upon by all workers
   - Partial Consensus: Agreed upon by a majority of workers; dissenting opinions are specified
   - Contested: No consensus after max rounds; each worker's position specified
   - Worker-Unique: Verified only by the discoverer; verification history specified
   - In runs with convergence disabled, maintain the existing Consensus/Differences format
```

Replace with:

```markdown
2. **Cross Verification Results** (Use 4 categories when convergence is enabled, per `okstra-convergence`)
   - Round History sub-table (convergence-enabled runs only): one row per executed round with columns `Round | inputQueueSize | resolvedCount | carriedForwardCount | dispatches (worker:status:durationMs) | skippedWorkers (worker:reason)`. Add a one-line note immediately under the table with `round2SkippedReason: <value>` (always present, even when `"not-skipped"`). Pull all values verbatim from `convergence-<task-type>-<seq>.json`.
   - Full Consensus: Findings agreed upon by all workers
   - Partial Consensus: Agreed upon by a majority of workers; dissenting opinions are specified
   - Contested: No consensus after the last executed round; each worker's position specified. Empty contested list is shown as the literal line `- 합의 미달 항목 없음.`
   - Worker-Unique: Verified only by the discoverer; verification history specified
   - In runs with convergence disabled, maintain the existing Consensus/Differences format and omit the Round History sub-table.
```
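The Round History sub-table can be rendered from a loaded `convergence-<task-type>-<seq>.json` by copying values verbatim rather than recomputing them. The sketch below shows one way to do this.

Assumptions:
- `render_round_history` is a hypothetical helper; in the plan, the report-writer worker produces the table itself.
- A missing `durationMs` renders as an empty field.

```python
from typing import Any, Dict, List


def render_round_history(state: Dict[str, Any]) -> str:
    """Render the Section 1 Round History sub-table verbatim from a convergence state dict."""
    lines: List[str] = [
        "| Round | inputQueueSize | resolvedCount | carriedForwardCount "
        "| dispatches (worker:status:durationMs) | skippedWorkers (worker:reason) |",
        "|---|---|---|---|---|---|",
    ]
    for entry in state.get("roundHistory", []):
        dispatches = ", ".join(
            f"{d['worker']}:{d['status']}:{d.get('durationMs', '')}"
            for d in entry.get("dispatches", [])
        )
        skipped = ", ".join(f"{s['worker']}:{s['reason']}" for s in entry.get("skippedWorkers", []))
        lines.append(
            f"| {entry['round']} | {entry['inputQueueSize']} | {entry['resolvedCount']} "
            f"| {entry['carriedForwardCount']} | {dispatches} | {skipped} |"
        )
    # Blank line first: a plain line directly after a GFM table would become a table row.
    lines.append("")
    lines.append(f"round2SkippedReason: {state['round2SkippedReason']}")
    return "\n".join(lines)
```

The blank line before the `round2SkippedReason:` note is intentional. In GitHub Flavored Markdown, a plain line directly after a table is parsed as another table row.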
480
+
481
+ - [ ] **Step 3: Writing Guidelines 항목 강화**
482
+
483
+ 기존 "Include the convergence round history and a summary of votes by worker for each finding" 라인을 다음 두 줄로 교체:
484
+
485
+ ```markdown
486
+ - Include the convergence round history sub-table (Section 1) so the reader can audit which rounds executed and why Round 2 was skipped. Pull values verbatim from `convergence-<task-type>-<seq>.json`; do NOT recompute.
487
+ - For each finding, include a brief summary of votes per worker across executed rounds. `verification-error` votes are listed as such — never as `DISAGREE`.
488
+ ```
489
+
490
+ - [ ] **Step 4: Commit**
491
+
492
+ ```bash
493
+ git add skills/okstra-report-writer/SKILL.md
494
+ git commit -m "docs(report-writer): require Round History sub-table in final report
495
+
496
+ The report-writer prompt template now demands a Round History sub-table with
497
+ queue sizes, dispatches, and skippedWorkers, plus an explicit
498
+ round2SkippedReason line. Writing guidelines mandate verbatim copy from the
499
+ convergence state and distinguish verification-error votes from DISAGREE."
500
+ ```
501
+
502
+ ---
503
+
504
+ ## Task 7: report-writer-worker agent — required-reading + authoring contract
505
+
506
+ **Files:**
507
+ - Modify: `agents/workers/report-writer-worker.md`
508
+
509
+ - [ ] **Step 1: "Required Reading Before Authoring" 섹션에 명시적 의무 추가**
510
+
511
+ 기존 (line 44~52 부근의 첫 단락 다음)에 아래 bullet 추가:
512
+
513
+ ```markdown
514
+ - When the convergence-state file is present, read it fully and reproduce the `roundHistory[]` array, `round2SkippedReason`, and `finalClassificationCounts` in the final report's Section 1 Round History sub-table. Do not derive these values from worker results alone — they live in `state/convergence-<task-type>-<seq>.json`.
515
+ ```
516
+
517
+ - [ ] **Step 2: Reinforce the Authoring Contract hard rules**
518
+
519
+ Keep the existing "Include all four convergence categories (Full Consensus, Partial Consensus, Contested, Worker-Unique)..." line as-is, and add the following directly after it:
520
+
521
+ ```markdown
522
+ - Include a Round History sub-table in Section 1 (one row per executed round) and a `round2SkippedReason` line below it. When convergence is disabled, omit both. The values are quoted verbatim from `state/convergence-<task-type>-<seq>.json` — do not recompute.
523
+ - Treat `verification-error` votes as their own verdict. They are listed in vote summaries as `verification-error`, not folded into AGREE/DISAGREE counts.
524
+ ```
525
+
526
+ - [ ] **Step 3: Reinforce the Notes section**
527
+
528
+ Add the following directly below the existing "If the analysis workers disagree and convergence ended with `Contested` items..." line:
529
+
530
+ ```markdown
531
+ - `Contested` is a final-only classification. If you see findings labeled `Contested` in the convergence state, the lead has already exhausted re-verification — do not invent a synthesizing answer; surface each worker's position verbatim.
532
+ ```
533
+
534
+ - [ ] **Step 4: Commit**
535
+
536
+ ```bash
537
+ git add agents/workers/report-writer-worker.md
538
+ git commit -m "docs(report-writer-worker): require Round History reproduction
539
+
540
+ Forces the report-writer subagent to read the convergence state file
541
+ end-to-end, reproduce roundHistory[]/round2SkippedReason/finalClassificationCounts
542
+ in Section 1, and treat verification-error as its own verdict (not DISAGREE)."
543
+ ```
544
+
545
+ ---
546
+
547
+ ## Task 8: Add a Round History sub-section to the final-report template
548
+
549
+ **Files:**
550
+ - Modify: `templates/reports/final-report.template.md` (the Section 1 Cross Verification Results area)
551
+
552
+ - [ ] **Step 1: Check the current Section 1 structure**
553
+
554
+ Run: `grep -nE "^## 1|^### 1\." /Volumes/Workspaces/workspace/projects/Okstra/templates/reports/final-report.template.md`
555
+ Expected: lines `## 1. Cross Verification Results`, `### 1.1 Consensus`, `### 1.2 Differences`.
556
+
557
+ - [ ] **Step 2: Insert a new `### 1.0 Round History (convergence-enabled runs only)` subsection directly below the `## 1. Cross Verification Results` header (that is, above `### 1.1 Consensus`)**
558
+
559
+ ```markdown
560
+ ### 1.0 Round History (convergence-enabled runs only)
561
+
562
+ Copy the values verbatim from `state/convergence-<task-type>-<seq>.json`. In runs where convergence is disabled, delete this entire section.
563
+
564
+ | Round | inputQueueSize | resolvedCount | carriedForwardCount | dispatches (worker:status:durationMs) | skippedWorkers (worker:reason) |
565
+ |-------|----------------|---------------|----------------------|----------------------------------------|---------------------------------|
566
+ | 1 | 3 | 2 | 1 | codex-worker:completed:184221, gemini-worker:completed:201337 | claude-worker:no-items |
567
+ | 2 | 1 | 1 | 0 | claude-worker:completed:92110 | -- |
568
+
569
+ - `round2SkippedReason`: `not-skipped` ← the value is one of `queue-empty | max-rounds-1 | all-reverify-non-result | not-skipped`.
570
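+ - If zero rounds were executed (every finding reached full-consensus directly in Round 0 grouping), write a single line instead of the table: `- Round 0 grouping에서 모든 finding이 합의되어 재검증 라운드가 실행되지 않았습니다.` ("All findings reached consensus in Round 0 grouping, so no re-verification round was executed.")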
571
+
572
+ ```
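The template encodes a three-way choice for Section 1.0: omit it, write the one-line zero-round notice, or write the table. The following is a minimal sketch of that decision. It assumes that "convergence disabled" means either the state file is missing (passed as `None`) or `config.enabled` is false. `round_history_mode` and its return labels are hypothetical.

```python
from typing import Optional


def round_history_mode(state: Optional[dict]) -> str:
    """Pick the form of Section 1.0: 'omit', 'zero-round-line', or 'table'."""
    # Assumption: disabled convergence = no state file or config.enabled false.
    if state is None or not state.get("config", {}).get("enabled", False):
        return "omit"
    # No executed rounds: Round 0 grouping already resolved every finding.
    if not state.get("roundHistory"):
        return "zero-round-line"
    return "table"
```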
573
+
574
+ - [ ] **Step 3: Commit**
575
+
576
+ ```bash
577
+ git add templates/reports/final-report.template.md
578
+ git commit -m "docs(template): Section 1.0 Round History sub-table for convergence runs
579
+
580
+ Adds a 1.0 Round History table with queue sizes, dispatches, and
581
+ skippedWorkers, plus an explicit round2SkippedReason line. The block is
582
+ omitted when convergence is disabled. Aligns the template with the
583
+ queue-pruning algorithm contract."
584
+ ```
585
+
586
+ ---
587
+
588
+ ## Task 9: Fixture 1 — early convergence (Round 1 exit)
589
+
590
+ **Files:**
591
+ - Create: `tests/fixtures/convergence/early-exit.json`
592
+
593
+ - [ ] **Step 1: Prepare the directory**
594
+
595
+ ```bash
596
+ mkdir -p /Volumes/Workspaces/workspace/projects/Okstra/tests/fixtures/convergence
597
+ ```
598
+
599
+ - [ ] **Step 2: Write the file**
600
+
601
+ `tests/fixtures/convergence/early-exit.json`:
602
+
603
+ ```json
604
+ {
605
+ "schemaVersion": "1.1",
606
+ "taskKey": "fixture/early-exit",
607
+ "config": {
608
+ "enabled": true,
609
+ "maxRounds": 2,
610
+ "effectiveMaxRounds": 2,
611
+ "verificationMode": "lightweight"
612
+ },
613
+ "findings": [
614
+ {
615
+ "findingId": "F-001",
616
+ "summary": "Missing input validation on /api/login",
617
+ "category": "bug",
618
+ "ticketIds": ["EX-100"],
619
+ "originWorker": "codex-worker",
620
+ "originEvidence": "src/auth/login.ts:42",
621
+ "classification": "full-consensus",
622
+ "rounds": [
623
+ {
624
+ "round": 1,
625
+ "votes": {
626
+ "claude-worker": {"verdict": "agree", "explanation": "confirmed"},
627
+ "gemini-worker": {"verdict": "supplement", "explanation": "also missing CSRF check"}
628
+ }
629
+ }
630
+ ],
631
+ "consensusWorkers": ["codex-worker", "claude-worker", "gemini-worker"],
632
+ "dissentingWorkers": []
633
+ },
634
+ {
635
+ "findingId": "F-002",
636
+ "summary": "Stale dependency lockfile",
637
+ "category": "risk",
638
+ "ticketIds": ["EX-101"],
639
+ "originWorker": "claude-worker",
640
+ "originEvidence": "package-lock.json:1",
641
+ "classification": "full-consensus",
642
+ "rounds": [
643
+ {
644
+ "round": 1,
645
+ "votes": {
646
+ "codex-worker": {"verdict": "agree", "explanation": "confirmed"},
647
+ "gemini-worker": {"verdict": "agree", "explanation": "confirmed"}
648
+ }
649
+ }
650
+ ],
651
+ "consensusWorkers": ["claude-worker", "codex-worker", "gemini-worker"],
652
+ "dissentingWorkers": []
653
+ }
654
+ ],
655
+ "roundHistory": [
656
+ {
657
+ "round": 1,
658
+ "inputQueueSize": 2,
659
+ "resolvedCount": 2,
660
+ "carriedForwardCount": 0,
661
+ "dispatches": [
662
+ {"worker": "claude-worker", "status": "completed", "durationMs": 92110},
663
+ {"worker": "codex-worker", "status": "completed", "durationMs": 184221},
664
+ {"worker": "gemini-worker", "status": "completed", "durationMs": 201337}
665
+ ],
666
+ "skippedWorkers": [],
667
+ "verificationsRequested": 3,
668
+ "verificationsCompleted": 3,
669
+ "newConsensus": 2,
670
+ "remainingInQueue": 0,
671
+ "earlyExit": true
672
+ }
673
+ ],
674
+ "round2SkippedReason": "queue-empty",
675
+ "finalState": "converged",
676
+ "totalRounds": 1,
677
+ "finalClassificationCounts": {
678
+ "fullConsensus": 2,
679
+ "partialConsensus": 0,
680
+ "contested": 0,
681
+ "workerUnique": 0
682
+ },
683
+ "summary": {
684
+ "fullConsensus": 2,
685
+ "partialConsensus": 0,
686
+ "contested": 0,
687
+ "workerUnique": 0
688
+ }
689
+ }
690
+ ```
691
+
692
+ - [ ] **Step 3: Validate the JSON immediately**
693
+
694
+ Run: `python3 -c "import json; json.load(open('tests/fixtures/convergence/early-exit.json'))"`
695
+ Expected: exit code 0, no output.
696
+
697
+ - [ ] **Step 4: Commit**
698
+
699
+ ```bash
700
+ git add tests/fixtures/convergence/early-exit.json
701
+ git commit -m "test(convergence): fixture for Round 1 early-exit (queue-empty)
702
+
703
+ Two findings reach full-consensus in Round 1; queue empties so Round 2 is
704
+ skipped with round2SkippedReason=queue-empty. effectiveMaxRounds=2,
705
+ totalRounds=1."
706
+ ```
707
+
708
+ ---
709
+
710
+ ## Task 10: Fixture 2 — mixed/unresolved Round 2
711
+
712
+ **Files:**
713
+ - Create: `tests/fixtures/convergence/mixed-round2.json`
714
+
715
+ - [ ] **Step 1: Write the file**
716
+
717
+ `tests/fixtures/convergence/mixed-round2.json`:
718
+
719
+ ```json
720
+ {
721
+ "schemaVersion": "1.1",
722
+ "taskKey": "fixture/mixed-round2",
723
+ "config": {
724
+ "enabled": true,
725
+ "maxRounds": 2,
726
+ "effectiveMaxRounds": 2,
727
+ "verificationMode": "lightweight"
728
+ },
729
+ "findings": [
730
+ {
731
+ "findingId": "F-010",
732
+ "summary": "Race condition in session refresh",
733
+ "category": "bug",
734
+ "ticketIds": ["EX-200"],
735
+ "originWorker": "codex-worker",
736
+ "originEvidence": "src/session/refresh.ts:88",
737
+ "classification": "full-consensus",
738
+ "rounds": [
739
+ {
740
+ "round": 1,
741
+ "votes": {
742
+ "claude-worker": {"verdict": "agree", "explanation": "confirmed"},
743
+ "gemini-worker": {"verdict": "supplement", "explanation": "additional repro path"}
744
+ }
745
+ }
746
+ ],
747
+ "consensusWorkers": ["codex-worker", "claude-worker", "gemini-worker"],
748
+ "dissentingWorkers": []
749
+ },
750
+ {
751
+ "findingId": "F-011",
752
+ "summary": "Inconsistent error message on 401",
753
+ "category": "observation",
754
+ "ticketIds": ["EX-201"],
755
+ "originWorker": "claude-worker",
756
+ "originEvidence": "src/api/auth.ts:120",
757
+ "classification": "contested",
758
+ "rounds": [
759
+ {
760
+ "round": 1,
761
+ "votes": {
762
+ "codex-worker": {"verdict": "agree", "explanation": "confirmed"},
763
+ "gemini-worker": {"verdict": "disagree", "explanation": "expected behavior per spec"}
764
+ }
765
+ },
766
+ {
767
+ "round": 2,
768
+ "votes": {
769
+ "codex-worker": {"verdict": "agree", "explanation": "still confirmed"},
770
+ "gemini-worker": {"verdict": "disagree", "explanation": "see RFC 7235"}
771
+ }
772
+ }
773
+ ],
774
+ "consensusWorkers": ["claude-worker", "codex-worker"],
775
+ "dissentingWorkers": ["gemini-worker"]
776
+ }
777
+ ],
778
+ "roundHistory": [
779
+ {
780
+ "round": 1,
781
+ "inputQueueSize": 2,
782
+ "resolvedCount": 1,
783
+ "carriedForwardCount": 1,
784
+ "dispatches": [
785
+ {"worker": "claude-worker", "status": "completed", "durationMs": 88012},
786
+ {"worker": "codex-worker", "status": "completed", "durationMs": 175044},
787
+ {"worker": "gemini-worker", "status": "completed", "durationMs": 199820}
788
+ ],
789
+ "skippedWorkers": [],
790
+ "verificationsRequested": 3,
791
+ "verificationsCompleted": 3,
792
+ "newConsensus": 1,
793
+ "remainingInQueue": 1,
794
+ "earlyExit": false
795
+ },
796
+ {
797
+ "round": 2,
798
+ "inputQueueSize": 1,
799
+ "resolvedCount": 0,
800
+ "carriedForwardCount": 1,
801
+ "dispatches": [
802
+ {"worker": "codex-worker", "status": "completed", "durationMs": 110002},
803
+ {"worker": "gemini-worker", "status": "completed", "durationMs": 125110}
804
+ ],
805
+ "skippedWorkers": [
806
+ {"worker": "claude-worker", "reason": "no items to verify"}
807
+ ],
808
+ "verificationsRequested": 2,
809
+ "verificationsCompleted": 2,
810
+ "newConsensus": 0,
811
+ "remainingInQueue": 1,
812
+ "earlyExit": false
813
+ }
814
+ ],
815
+ "round2SkippedReason": "not-skipped",
816
+ "finalState": "max-rounds-reached",
817
+ "totalRounds": 2,
818
+ "finalClassificationCounts": {
819
+ "fullConsensus": 1,
820
+ "partialConsensus": 0,
821
+ "contested": 1,
822
+ "workerUnique": 0
823
+ },
824
+ "summary": {
825
+ "fullConsensus": 1,
826
+ "partialConsensus": 0,
827
+ "contested": 1,
828
+ "workerUnique": 0
829
+ }
830
+ }
831
+ ```
832
+
833
+ - [ ] **Step 2: Validate the JSON**
834
+
835
+ Run: `python3 -c "import json; json.load(open('tests/fixtures/convergence/mixed-round2.json'))"`
836
+ Expected: exit code 0.
837
+
838
+ - [ ] **Step 3: Manually verify the key invariants**
839
+
840
+ Round 2's input queue (`inputQueueSize=1`) must equal Round 1's output queue (`carriedForwardCount=1`). Also, the `rounds` array of `F-010` (confirmed as full-consensus in Round 1) must contain no round 2 entry. Confirm that the fixture reflects the invariant that a confirmed finding is never put back into the prompt.
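The invariant can also be pictured as a filter over the queue. Once a finding is resolved in some round, it is dropped from every later round, so only carried-forward items are re-prompted. The following is a minimal sketch. It assumes a hypothetical `resolved_in` map from finding ID to the round in which the finding was resolved (`None` while still open). This illustrates the idea; it is not the okstra-convergence implementation.

```python
from __future__ import annotations


def queue_for_round(
    finding_ids: list[str], resolved_in: dict[str, int | None], round_no: int
) -> list[str]:
    """Return the finding IDs to re-prompt in `round_no`.

    A finding stays queued until (and including) the round that resolves it;
    after that it never re-enters the prompt.
    """
    return [
        fid
        for fid in finding_ids
        if resolved_in.get(fid) is None or resolved_in[fid] >= round_no
    ]
```

Applied to this fixture, `F-010` is resolved in round 1 and `F-011` is never resolved. This yields a Round 1 queue of two items and a Round 2 queue of one (`F-011`), which matches `inputQueueSize` in both `roundHistory` entries.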
841
+
842
+ - [ ] **Step 4: Commit**
843
+
844
+ ```bash
845
+ git add tests/fixtures/convergence/mixed-round2.json
846
+ git commit -m "test(convergence): fixture for mixed Round 1 → unresolved Round 2
847
+
848
+ F-010 resolves in Round 1; F-011 remains in queue (claude+codex agree,
849
+ gemini disagrees), enters Round 2 which still cannot resolve it →
850
+ classified as contested. claude-worker is skipped in Round 2 because the
851
+ only queued item is its own. round2SkippedReason=not-skipped."
852
+ ```
853
+
854
+ ---
855
+
856
+ ## Task 11: Fixture 3 — worker-failure (all reverify non-result)
857
+
858
+ **Files:**
859
+ - Create: `tests/fixtures/convergence/reverify-all-failed.json`
860
+
861
+ - [ ] **Step 1: Write the file**
862
+
863
+ `tests/fixtures/convergence/reverify-all-failed.json`:
864
+
865
+ ```json
866
+ {
867
+ "schemaVersion": "1.1",
868
+ "taskKey": "fixture/reverify-all-failed",
869
+ "config": {
870
+ "enabled": true,
871
+ "maxRounds": 2,
872
+ "effectiveMaxRounds": 2,
873
+ "verificationMode": "lightweight"
874
+ },
875
+ "findings": [
876
+ {
877
+ "findingId": "F-020",
878
+ "summary": "Possible memory leak in long-running session",
879
+ "category": "risk",
880
+ "ticketIds": ["EX-300"],
881
+ "originWorker": "codex-worker",
882
+ "originEvidence": "src/session/cache.ts:200",
883
+ "classification": "contested",
884
+ "rounds": [
885
+ {
886
+ "round": 1,
887
+ "votes": {
888
+ "claude-worker": {"verdict": "verification-error", "explanation": "dispatch timeout after 900s"},
889
+ "gemini-worker": {"verdict": "verification-error", "explanation": "CLI exit 137 (OOM)"}
890
+ }
891
+ }
892
+ ],
893
+ "consensusWorkers": ["codex-worker"],
894
+ "dissentingWorkers": []
895
+ }
896
+ ],
897
+ "roundHistory": [
898
+ {
899
+ "round": 1,
900
+ "inputQueueSize": 1,
901
+ "resolvedCount": 0,
902
+ "carriedForwardCount": 1,
903
+ "dispatches": [
904
+ {"worker": "claude-worker", "status": "timeout", "durationMs": 900000},
905
+ {"worker": "gemini-worker", "status": "error", "durationMs": 14210}
906
+ ],
907
+ "skippedWorkers": [
908
+ {"worker": "claude-worker", "reason": "dispatch-non-result", "terminalStatus": "timeout"},
909
+ {"worker": "gemini-worker", "reason": "dispatch-non-result", "terminalStatus": "error"}
910
+ ],
911
+ "verificationsRequested": 2,
912
+ "verificationsCompleted": 0,
913
+ "newConsensus": 0,
914
+ "remainingInQueue": 1,
915
+ "earlyExit": false
916
+ }
917
+ ],
918
+ "round2SkippedReason": "all-reverify-non-result",
919
+ "finalState": "aborted-non-result",
920
+ "totalRounds": 1,
921
+ "finalClassificationCounts": {
922
+ "fullConsensus": 0,
923
+ "partialConsensus": 0,
924
+ "contested": 1,
925
+ "workerUnique": 0
926
+ },
927
+ "summary": {
928
+ "fullConsensus": 0,
929
+ "partialConsensus": 0,
930
+ "contested": 1,
931
+ "workerUnique": 0
932
+ }
933
+ }
934
+ ```
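To see how the four `round2SkippedReason` values relate to the rest of the state, a small classifier over `config.effectiveMaxRounds` and `roundHistory` is useful. This hypothetical sketch mirrors the consistency rules that the Task 12 contract tests assert. The precedence order is an assumption and is not taken from okstra-convergence. The zero-round case is deliberately left unmodeled.

```python
def derive_round2_skipped_reason(state: dict) -> str:
    """Hypothetical classifier for round2SkippedReason.

    Assumed precedence: max-rounds-1, not-skipped, queue-empty,
    all-reverify-non-result.
    """
    rounds = state["roundHistory"]
    if state["config"]["effectiveMaxRounds"] == 1:
        return "max-rounds-1"
    if any(r["round"] == 2 for r in rounds):
        return "not-skipped"
    if not rounds:
        # Round 0 resolved everything; this sketch does not model that case.
        raise ValueError("no executed rounds")
    last = rounds[-1]
    if last["carriedForwardCount"] == 0:
        return "queue-empty"
    dispatches = last["dispatches"]
    # Every reverify dispatch ended in a terminal non-result (timeout/error).
    if dispatches and all(d["status"] != "completed" for d in dispatches):
        return "all-reverify-non-result"
    raise ValueError("Round 2 should have run but is missing from roundHistory")
```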
935
+
936
+ - [ ] **Step 2: Validate the JSON**
937
+
938
+ Run: `python3 -c "import json; json.load(open('tests/fixtures/convergence/reverify-all-failed.json'))"`
939
+ Expected: exit code 0.
940
+
941
+ - [ ] **Step 3: Commit**
942
+
943
+ ```bash
944
+ git add tests/fixtures/convergence/reverify-all-failed.json
945
+ git commit -m "test(convergence): fixture for all-reverify-non-result Round 1 abort
946
+
947
+ Single finding remains unresolved because both reverify dispatches return
948
+ terminal non-result (timeout + error). Round 2 is suppressed with
949
+ round2SkippedReason=all-reverify-non-result; finalState=aborted-non-result.
950
+ Verdicts are recorded as verification-error, not DISAGREE."
951
+ ```
952
+
953
+ ---
954
+
955
+ ## Task 12: Convergence schema contract test (pytest)
956
+
957
+ **Files:**
958
+ - Create: `tests/test_convergence_state_contract.py`
959
+
960
+ This task writes code, so it follows TDD. It assumes the fixtures from Tasks 9–11 already exist.
961
+
962
+ - [ ] **Step 1: First test: schemaVersion check**
963
+
964
+ `tests/test_convergence_state_contract.py`:
965
+
966
+ ```python
967
+ """Contract tests for convergence-<task-type>-<seq>.json (schema v1.1).
968
+
969
+ Convergence state is a documentation contract — no production code reads it.
970
+ These tests check that fixtures shipped under tests/fixtures/convergence/
971
+ respect the v1.1 invariants documented in
972
+ skills/okstra-convergence/SKILL.md "Convergence State Artifact".
973
+ """
974
+ from __future__ import annotations
975
+
976
+ import json
977
+ from pathlib import Path
978
+
979
+ import pytest
980
+
981
+ FIXTURE_DIR = Path(__file__).parent / "fixtures" / "convergence"
982
+ ALL_FIXTURES = sorted(FIXTURE_DIR.glob("*.json"))
983
+
984
+ VALID_ROUND2_SKIP_REASONS = {
985
+ "queue-empty",
986
+ "max-rounds-1",
987
+ "all-reverify-non-result",
988
+ "not-skipped",
989
+ }
990
+ VALID_FINAL_STATES = {"converged", "max-rounds-reached", "aborted-non-result"}
991
+ VALID_CLASSIFICATIONS = {
992
+ "full-consensus",
993
+ "partial-consensus",
994
+ "contested",
995
+ "worker-unique",
996
+ }
997
+ VALID_DISPATCH_STATUSES = {"completed", "timeout", "error", "not-run"}
998
+ VALID_VERDICTS = {"agree", "disagree", "supplement", "verification-error"}
999
+
1000
+
1001
+ @pytest.fixture(params=ALL_FIXTURES, ids=lambda p: p.stem)
1002
+ def fixture(request) -> dict:
1003
+ return json.loads(request.param.read_text())
1004
+
1005
+
1006
+ def test_schema_version_is_1_1(fixture):
1007
+ assert fixture["schemaVersion"] == "1.1"
1008
+ ```
1009
+
1010
+ - [ ] **Step 2: Run the first test (either FAIL or PASS is acceptable for moving on at this step)**
1011
+
1012
+ Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_convergence_state_contract.py -v`
1013
+ Expected: 3 fixtures × 1 test = 3 PASS, because Tasks 9–11 already wrote `"schemaVersion": "1.1"`. If not, the fixtures are wrong; fix them first.
1014
+
1015
+ - [ ] **Step 3: Add config / round2SkippedReason / finalState tests**
1016
+
1017
+ Append the following to the end of `tests/test_convergence_state_contract.py`:
1018
+
1019
+ ```python
1020
+ def test_effective_max_rounds_is_one_to_three(fixture):
1021
+ e = fixture["config"]["effectiveMaxRounds"]
1022
+ assert isinstance(e, int) and 1 <= e <= 3
1023
+
1024
+
1025
+ def test_round2_skipped_reason_is_enum(fixture):
1026
+ assert fixture["round2SkippedReason"] in VALID_ROUND2_SKIP_REASONS
1027
+
1028
+
1029
+ def test_final_state_is_enum(fixture):
1030
+ assert fixture["finalState"] in VALID_FINAL_STATES
1031
+
1032
+
1033
+ def test_final_classification_counts_keys_present(fixture):
1034
+ counts = fixture["finalClassificationCounts"]
1035
+ assert set(counts.keys()) == {
1036
+ "fullConsensus",
1037
+ "partialConsensus",
1038
+ "contested",
1039
+ "workerUnique",
1040
+ }
1041
+ for v in counts.values():
1042
+ assert isinstance(v, int) and v >= 0
1043
+ ```
1044
+
1045
+ - [ ] **Step 4: Add per-round invariant tests**
1046
+
1047
+ ```python
1048
+ def test_round_arithmetic_consistent(fixture):
1049
+ """inputQueueSize == resolvedCount + carriedForwardCount."""
1050
+ for r in fixture["roundHistory"]:
1051
+ assert r["inputQueueSize"] == r["resolvedCount"] + r["carriedForwardCount"], (
1052
+ f"round {r['round']}: {r['inputQueueSize']} != {r['resolvedCount']} + {r['carriedForwardCount']}"
1053
+ )
1054
+
1055
+
1056
+ def test_round_input_matches_previous_carry_forward(fixture):
1057
+ """Round N+1 inputQueueSize equals Round N carriedForwardCount."""
1058
+ rounds = fixture["roundHistory"]
1059
+ for prev, curr in zip(rounds, rounds[1:]):
1060
+ assert curr["inputQueueSize"] == prev["carriedForwardCount"], (
1061
+ f"round {curr['round']} input {curr['inputQueueSize']} "
1062
+ f"!= round {prev['round']} carry {prev['carriedForwardCount']}"
1063
+ )
1064
+
1065
+
1066
+ def test_dispatch_statuses_are_terminal(fixture):
1067
+ for r in fixture["roundHistory"]:
1068
+ for d in r["dispatches"]:
1069
+ assert d["status"] in VALID_DISPATCH_STATUSES
1070
+ assert isinstance(d["durationMs"], int) and d["durationMs"] >= 0
1071
+
1072
+
1073
+ def test_total_rounds_matches_round_history(fixture):
1074
+ assert fixture["totalRounds"] == len(fixture["roundHistory"])
1075
+ ```
1076
+
1077
+ - [ ] **Step 5: Add classification / queue-pruning invariant tests**
1078
+
1079
+ ```python
1080
+ def test_findings_classifications_are_enum(fixture):
1081
+ for f in fixture["findings"]:
1082
+ assert f["classification"] in VALID_CLASSIFICATIONS
1083
+
1084
+
1085
+ def test_confirmed_findings_dont_reappear_after_classification_round(fixture):
1086
+ """A finding classified as full/partial/worker-unique must not have a
1087
+ `rounds` entry after the round in which it was resolved.
1088
+
1089
+ The fixture's `rounds[]` array is the per-finding vote ledger. A
1090
+ re-prompt that re-included a confirmed item would surface as extra
1091
+ rounds in this ledger. `contested` items may legitimately appear in
1092
+ every round (queue carry-forward).
1093
+ """
1094
+ for f in fixture["findings"]:
1095
+ if f["classification"] == "contested":
1096
+ continue
1097
+ if not f["rounds"]:
1098
+ continue
1099
+ # rounds[] must be strictly increasing (no duplicated or out-of-order
1100
+ # rounds), so its last entry is the resolution round and nothing can
1101
+ # follow it. The resolution round is bounded by totalRounds.
1102
+ round_ids = [r["round"] for r in f["rounds"]]
1103
+ assert round_ids == sorted(set(round_ids)), f"{f['findingId']}: {round_ids}"
1104
+ assert round_ids[-1] <= fixture["totalRounds"]
1105
+
1106
+
1107
+ def test_contested_only_when_last_executed_round_done(fixture):
1108
+ """A finding labeled `contested` must have a `rounds[]` entry whose
1109
+ round index equals `totalRounds` (the last executed round)."""
1110
+ for f in fixture["findings"]:
1111
+ if f["classification"] != "contested":
1112
+ continue
1113
+ last_voted = max(r["round"] for r in f["rounds"])
1114
+ assert last_voted == fixture["totalRounds"], (
1115
+ f"{f['findingId']} contested but last vote in round "
1116
+ f"{last_voted}, while totalRounds={fixture['totalRounds']}"
1117
+ )
1118
+
1119
+
1120
+ def test_verdicts_are_enum(fixture):
1121
+ for f in fixture["findings"]:
1122
+ for r in f["rounds"]:
1123
+ for vote in r["votes"].values():
1124
+ assert vote["verdict"] in VALID_VERDICTS
1125
+
1126
+
1127
+ def test_classification_counts_match_findings(fixture):
1128
+ counts = {"fullConsensus": 0, "partialConsensus": 0, "contested": 0, "workerUnique": 0}
1129
+ mapping = {
1130
+ "full-consensus": "fullConsensus",
1131
+ "partial-consensus": "partialConsensus",
1132
+ "contested": "contested",
1133
+ "worker-unique": "workerUnique",
1134
+ }
1135
+ for f in fixture["findings"]:
1136
+ counts[mapping[f["classification"]]] += 1
1137
+ assert counts == fixture["finalClassificationCounts"]
1138
+ assert counts == fixture["summary"] # legacy alias parity
1139
+ ```
1140
+
1141
+ - [ ] **Step 6: Add a semantic-consistency test for round2SkippedReason**
1142
+
1143
+ ```python
1144
+ def test_skip_reason_consistent_with_round_history(fixture):
1145
+ reason = fixture["round2SkippedReason"]
1146
+ rounds = fixture["roundHistory"]
1147
+ effective_max = fixture["config"]["effectiveMaxRounds"]
1148
+ last_round = rounds[-1]
1149
+
1150
+ if reason == "queue-empty":
1151
+ assert last_round["carriedForwardCount"] == 0
1152
+ assert last_round["round"] < effective_max
1153
+ elif reason == "max-rounds-1":
1154
+ assert effective_max == 1
1155
+ elif reason == "all-reverify-non-result":
1156
+ # every dispatch in the last round was non-completed
1157
+ assert all(d["status"] != "completed" for d in last_round["dispatches"])
1158
+ assert fixture["finalState"] == "aborted-non-result"
1159
+ elif reason == "not-skipped":
1160
+ # Round 2 actually executed, and the run ended normally.
1161
+ assert any(r["round"] == 2 for r in rounds)
1162
+ assert fixture["finalState"] in {"converged", "max-rounds-reached"}
1163
+ ```
1164
+
1165
+ - [ ] **Step 7: Run the full test file**
1166
+
1167
+ Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_convergence_state_contract.py -v`
1168
+ Expected: 3 fixtures × 15 tests = **45 PASS**. A failure means the invariant in either a fixture or a test is wrong. Suspect the fixture first, since that is where the schema v1.1 contract may not have been modeled accurately.
1169
+
1170
+ - [ ] **Step 8: Commit**
1171
+
1172
+ ```bash
1173
+ git add tests/test_convergence_state_contract.py
1174
+ git commit -m "test(convergence): contract tests for schema v1.1 invariants
1175
+
1176
+ Parametrized over the three convergence fixtures. Asserts: schemaVersion,
1177
+ effectiveMaxRounds range, round2SkippedReason enum, finalState enum,
1178
+ finalClassificationCounts shape, per-round arithmetic
1179
+ (inputQueueSize == resolvedCount + carriedForwardCount), round carry-forward
1180
+ chain, dispatch statuses, verdict enum, contested-only-at-last-round, and
1181
+ skip-reason / round-history consistency."
1182
+ ```
1183
+
1184
+ ---
1185
+
1186
+ ## Task 13: Baseline metrics helper (`scripts/okstra-convergence-stats.py`)
1187
+
1188
+ **Files:**
1189
+ - Create: `scripts/okstra-convergence-stats.py`
1190
+ - Test: `tests/test_okstra_convergence_stats.py`
1191
+
1192
+ A helper for measuring the P1 effect (before/after comparison), written test-first (TDD).
1193
+
1194
+ - [ ] **Step 1: Write the first failing test**
1195
+
1196
+ `tests/test_okstra_convergence_stats.py`:
1197
+
1198
+ ```python
1199
+ """Tests for scripts/okstra-convergence-stats.py — baseline metrics helper."""
1200
+ from __future__ import annotations
1201
+
1202
+ import json
1203
+ import subprocess
1204
+ import sys
1205
+ from pathlib import Path
1206
+
1207
+ REPO = Path(__file__).resolve().parents[1]
1208
+ SCRIPT = REPO / "scripts" / "okstra-convergence-stats.py"
1209
+
1210
+
1211
+ def run_stats(*args: str) -> dict:
1212
+ """Invoke the script with --json and return parsed stdout."""
1213
+ proc = subprocess.run(
1214
+ [sys.executable, str(SCRIPT), *args, "--json"],
1215
+ check=True,
1216
+ capture_output=True,
1217
+ text=True,
1218
+ )
1219
+ return json.loads(proc.stdout)
1220
+
1221
+
1222
+ def test_reads_convergence_only(tmp_path):
1223
+ """When team-state is omitted, the script still reports convergence-side metrics."""
1224
+ fx = REPO / "tests" / "fixtures" / "convergence" / "early-exit.json"
1225
+ out = run_stats("--convergence-state", str(fx))
1226
+ assert out["roundCount"] == 1
1227
+ assert out["dispatchCount"] == 3
1228
+ assert out["dispatchDurationMsTotal"] == 92110 + 184221 + 201337
1229
+ assert out["round2SkippedReason"] == "queue-empty"
1230
+ assert out["finalClassificationCounts"]["fullConsensus"] == 2
1231
+ ```
1232
+
1233
+ - [ ] **Step 2: Run the test and confirm it fails**
1234
+
1235
+ Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_okstra_convergence_stats.py -v`
1236
+ Expected: FAIL, because `scripts/okstra-convergence-stats.py` does not exist yet.
1237
+
1238
+ - [ ] **Step 3: Minimal implementation (convergence-state-only mode)**
1239
+
1240
+ `scripts/okstra-convergence-stats.py`:
1241
+
1242
+ ```python
1243
+ #!/usr/bin/env python3
1244
+ """Baseline metrics for okstra Phase 5.5 convergence.
1245
+
1246
+ Aggregates dispatch counts, wall-clock totals, and (when team-state is
1247
+ supplied) worker token usage filtered to reverify dispatches. Reads:
1248
+
1249
+ - convergence-<task-type>-<seq>.json (required)
1250
+ - team-state-<task-type>-<seq>.json (optional; for token deltas)
1251
+
1252
+ Output: human-readable table by default, JSON when --json is passed. Used
1253
+ to record before/after numbers for the P1 convergence queue-pruning change.
1254
+ """
1255
+ from __future__ import annotations
1256
+
1257
+ import argparse
1258
+ import json
1259
+ import sys
1260
+ from pathlib import Path
1261
+
1262
+
1263
+ def collect(convergence_path: Path, team_state_path: Path | None) -> dict:
1264
+ convergence = json.loads(convergence_path.read_text())
1265
+
1266
+ dispatch_count = 0
1267
+ duration_total = 0
1268
+ for r in convergence["roundHistory"]:
1269
+ for d in r["dispatches"]:
1270
+ dispatch_count += 1
1271
+ duration_total += int(d.get("durationMs", 0))
1272
+
1273
+ out = {
1274
+ "schemaVersion": convergence.get("schemaVersion"),
1275
+ "taskKey": convergence.get("taskKey"),
1276
+ "effectiveMaxRounds": convergence["config"]["effectiveMaxRounds"],
1277
+ "roundCount": len(convergence["roundHistory"]),
1278
+ "dispatchCount": dispatch_count,
1279
+ "dispatchDurationMsTotal": duration_total,
1280
+ "round2SkippedReason": convergence.get("round2SkippedReason"),
1281
+ "finalState": convergence.get("finalState"),
1282
+ "finalClassificationCounts": convergence.get(
1283
+ "finalClassificationCounts", convergence.get("summary", {})
1284
+ ),
1285
+ "reverifyTokenTotal": None,
1286
+ "reverifyCostUsdTotal": None,
1287
+ }
1288
+
1289
+ if team_state_path is not None and team_state_path.exists():
1290
+ team = json.loads(team_state_path.read_text())
1291
+ reverify_tokens = 0
1292
+ reverify_cost = 0.0
1293
+ for w in team.get("workers", []):
1294
+ usage = w.get("usage", {})
1295
+ agent_name = (w.get("agentName") or "")
1296
+ # Reverify dispatches use the `-reverify-r<N>-` slug per
1297
+ # okstra-convergence "Re-verification Agent Dispatch".
1298
+ if "-reverify-r" not in agent_name:
1299
+ continue
1300
+ reverify_tokens += int(usage.get("totalTokens", 0) or 0)
1301
+ reverify_cost += float(usage.get("estimatedCostUsd", 0) or 0)
1302
+ out["reverifyTokenTotal"] = reverify_tokens
1303
+ out["reverifyCostUsdTotal"] = round(reverify_cost, 4)
1304
+
1305
+ return out
1306
+
1307
+
1308
+ def format_human(stats: dict) -> str:
1309
+ lines = [
1310
+ f"taskKey : {stats.get('taskKey')}",
1311
+ f"schemaVersion : {stats.get('schemaVersion')}",
1312
+ f"effectiveMaxRounds : {stats['effectiveMaxRounds']}",
1313
+ f"roundCount : {stats['roundCount']}",
1314
+ f"dispatchCount : {stats['dispatchCount']}",
1315
+ f"dispatchDurationMsTotal: {stats['dispatchDurationMsTotal']}",
1316
+ f"round2SkippedReason : {stats['round2SkippedReason']}",
1317
+ f"finalState : {stats['finalState']}",
1318
+ f"finalClassificationCounts: {stats['finalClassificationCounts']}",
1319
+ ]
1320
+ if stats["reverifyTokenTotal"] is not None:
1321
+ lines.append(f"reverifyTokenTotal : {stats['reverifyTokenTotal']}")
1322
+ lines.append(f"reverifyCostUsdTotal : ${stats['reverifyCostUsdTotal']:.4f}")
1323
+ return "\n".join(lines)
1324
+
1325
+
1326
+ def main(argv: list[str]) -> int:
1327
+ p = argparse.ArgumentParser(description=__doc__)
1328
+ p.add_argument("--convergence-state", required=True, type=Path)
1329
+ p.add_argument("--team-state", type=Path, default=None)
1330
+ p.add_argument("--json", action="store_true", help="emit JSON to stdout")
1331
+ args = p.parse_args(argv)
1332
+
1333
+ if not args.convergence_state.exists():
1334
+ print(f"error: convergence state not found: {args.convergence_state}", file=sys.stderr)
1335
+ return 2
1336
+
1337
+ stats = collect(args.convergence_state, args.team_state)
1338
+
1339
+ if args.json:
1340
+ print(json.dumps(stats, ensure_ascii=False, indent=2))
1341
+ else:
1342
+ print(format_human(stats))
1343
+ return 0
1344
+
1345
+
1346
+ if __name__ == "__main__":
1347
+ sys.exit(main(sys.argv[1:]))
1348
+ ```
1349
+
1350
+ - [ ] **Step 4: Make the script executable**
1351
+
1352
+ Run: `chmod +x /Volumes/Workspaces/workspace/projects/Okstra/scripts/okstra-convergence-stats.py`
1353
+ Expected: no output.
1354
+
1355
+ - [ ] **Step 5: Re-run the test and confirm PASS**
1356
+
1357
+ Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_okstra_convergence_stats.py -v`
1358
+ Expected: 1 passed.
1359
+
1360
+ - [ ] **Step 6: Second test: reverify token aggregation when team-state is supplied**
1361
+
1362
+ Append to the end of `tests/test_okstra_convergence_stats.py`:
1363
+
1364
+ ```python
1365
+ def test_reverify_token_aggregation(tmp_path):
1366
+     convergence = tmp_path / "convergence.json"
1367
+     convergence.write_text(json.dumps({
1368
+         "schemaVersion": "1.1",
1369
+         "taskKey": "fixture/tokens",
1370
+         "config": {"enabled": True, "maxRounds": 2, "effectiveMaxRounds": 2, "verificationMode": "lightweight"},
1371
+         "findings": [],
1372
+         "roundHistory": [
1373
+             {
1374
+                 "round": 1,
1375
+                 "inputQueueSize": 0,
1376
+                 "resolvedCount": 0,
1377
+                 "carriedForwardCount": 0,
1378
+                 "dispatches": [
1379
+                     {"worker": "codex-worker", "status": "completed", "durationMs": 1000}
1380
+                 ],
1381
+                 "skippedWorkers": [],
1382
+                 "verificationsRequested": 1,
1383
+                 "verificationsCompleted": 1,
1384
+                 "newConsensus": 0,
1385
+                 "remainingInQueue": 0,
1386
+                 "earlyExit": True
1387
+             }
1388
+         ],
1389
+         "round2SkippedReason": "queue-empty",
1390
+         "finalState": "converged",
1391
+         "totalRounds": 1,
1392
+         "finalClassificationCounts": {"fullConsensus": 0, "partialConsensus": 0, "contested": 0, "workerUnique": 0},
1393
+         "summary": {"fullConsensus": 0, "partialConsensus": 0, "contested": 0, "workerUnique": 0}
1394
+     }))
1395
+
1396
+     team = tmp_path / "team-state.json"
1397
+     team.write_text(json.dumps({
1398
+         "workers": [
1399
+             {"agentName": "codex-worker-error-analysis-001", "usage": {"totalTokens": 9999, "estimatedCostUsd": 0.10}},
1400
+             {"agentName": "codex-worker-reverify-r1-error-analysis-001", "usage": {"totalTokens": 5000, "estimatedCostUsd": 0.06}},
1401
+             {"agentName": "gemini-worker-reverify-r1-error-analysis-001", "usage": {"totalTokens": 3000, "estimatedCostUsd": 0.04}}
1402
+         ]
1403
+     }))
1404
+
1405
+     out = run_stats("--convergence-state", str(convergence), "--team-state", str(team))
1406
+     assert out["reverifyTokenTotal"] == 8000
1407
+     assert out["reverifyCostUsdTotal"] == 0.10
1408
+ ```
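The aggregation this test pins down can be sketched in isolation. This is a minimal illustration, not the script's actual `collect()`. The function name `sum_reverify_usage` is hypothetical. It assumes only what the Step 8 commit message states: reverify dispatches are recognized by an `agentName` containing `-reverify-r`. That is why the first worker's 9999 tokens are excluded and the expected total is 5000 + 3000 = 8000.

```python
from typing import Any


def sum_reverify_usage(team_state: dict[str, Any]) -> tuple[int, float]:
    """Hypothetical helper: total tokens and USD cost for reverify workers only."""
    tokens = 0
    cost = 0.0
    for worker in team_state.get("workers", []):
        # Assumption: reverify dispatches are named like
        # "<worker>-reverify-r<round>-<task-type>-<seq>".
        if "-reverify-r" not in worker.get("agentName", ""):
            continue
        usage = worker.get("usage") or {}
        tokens += int(usage.get("totalTokens", 0))
        cost += float(usage.get("estimatedCostUsd", 0.0))
    # Round to 6 decimal places so float noise does not leak into reports.
    return tokens, round(cost, 6)
```

Rounding inside the helper keeps the summed cost stable. The test can still assert with a tolerance if the real script sums in a different order.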
1409
+
1410
+ - [ ] **Step 7: Re-run the tests**
1411
+
1412
+ Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_okstra_convergence_stats.py -v`
1413
+ Expected: 2 passed.
1414
+
1415
+ - [ ] **Step 8: Commit**
1416
+
1417
+ ```bash
1418
+ git add scripts/okstra-convergence-stats.py tests/test_okstra_convergence_stats.py
1419
+ git commit -m "feat(scripts): okstra-convergence-stats.py baseline metrics helper
1420
+
1421
+ Reads convergence-<task-type>-<seq>.json and (optionally) team-state to
1422
+ aggregate dispatch count, wall-clock total, and worker token/cost for
1423
+ reverify dispatches (filter on agentName containing '-reverify-r'). Used to
1424
+ record before/after numbers for the P1 queue-pruning change. Emits JSON
1425
+ when --json is passed."
1426
+ ```
1427
+
1428
+ ---
1429
+
1430
+ ## Task 14: Update the Section 9 conclusion in docs/kr/performance-improvement-plan-v2.md
1431
+
1432
+ **Files:**
1433
+ - Modify: `docs/kr/performance-improvement-plan-v2.md`
1434
+
1435
+ - [ ] **Step 1: Update the Section 9 body**
1436
+
1437
+ Existing text (lines 318-326):
1438
+
1439
+ ```markdown
1440
+ ## 9. 이번 계획의 결론
1441
+
1442
+ 현재 작업 계획은 P1을 최우선으로 둔 방향은 맞지만, 기존 표현의 "7-phase lifecycle"과 "contested-only 2라운드"는 코드와 맞지 않았다. 개선된 계획은 다음처럼 재정렬한다.
1443
+
1444
+ 1. P0로 용어와 측정 기준을 고정한다.
1445
+ 2. P1에서 convergence queue pruning을 구현한다.
1446
+ 3. P3 fast-track과 P4 prompt caching은 별도 설계/검증이 필요한 후속 작업으로 둔다.
1447
+ 4. prepare render 병렬화와 token usage 증분화는 효과가 작거나 종료 단계 비용이므로 P1 이후 병렬 보조 작업으로 처리한다.
1448
+ ```
1449
+
1450
+ Replace with:
1451
+
1452
+ ```markdown
1453
+ ## 9. 이번 계획의 결론
1454
+
1455
+ 현재 작업 계획은 P1을 최우선으로 둔 방향은 맞지만, 기존 표현의 "7-phase lifecycle"과 "contested-only 2라운드"는 코드와 맞지 않았다. 개선된 계획은 다음처럼 재정렬한다.
1456
+
1457
+ 1. P0로 용어와 측정 기준을 고정한다.
1458
+ 2. P1에서 convergence queue pruning을 구현한다.
1459
+ 3. P3 fast-track과 P4 prompt caching은 별도 설계/검증이 필요한 후속 작업으로 둔다.
1460
+ 4. prepare render 병렬화와 token usage 증분화는 효과가 작거나 종료 단계 비용이므로 P1 이후 병렬 보조 작업으로 처리한다.
1461
+
1462
+ ### 구현 plan 링크
1463
+
1464
+ - P0 + P1: `docs/superpowers/plans/2026-05-14-convergence-queue-pruning.md`
1465
+ - P2 / P3 / P4 / P5 / P6: 미작성 (각 트랙별로 별도 plan 작성 필요)
1466
+ ```
1467
+
1468
+ - [ ] **Step 2: Commit**
1469
+
1470
+ ```bash
1471
+ git add docs/kr/performance-improvement-plan-v2.md
1472
+ git commit -m "docs(kr): link P0+P1 implementation plan from improvement-plan v2
1473
+
1474
+ Section 9 conclusion now points to docs/superpowers/plans/2026-05-14-
1475
+ convergence-queue-pruning.md as the concrete plan that operationalizes the
1476
+ P0 terminology cleanup and P1 queue-pruning changes."
1477
+ ```
1478
+
1479
+ ---
1480
+
1481
+ ## Final Verification
1482
+
1483
+ After applying the whole plan, run the following commands to confirm there are no regressions.
1484
+
1485
+ - [ ] **Step A: Full pytest regression run**
1486
+
1487
+ Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest -q`
1488
+ Expected: the newly added tests (`test_convergence_state_contract.py` + `test_okstra_convergence_stats.py`) PASS, and existing tests show no regressions.
1489
+
1490
+ - [ ] **Step B: Validate all fixture JSON files in one pass**
1491
+
1492
+ Run:
1493
+ ```bash
1494
+ for f in /Volumes/Workspaces/workspace/projects/Okstra/tests/fixtures/convergence/*.json; do
1495
+ python3 -c "import json; json.load(open('$f'))" && echo "OK $f"
1496
+ done
1497
+ ```
1498
+ Expected: all 3 lines print `OK`.
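The same check can be sketched in Python if a shell loop is inconvenient, for example on Windows. This is an illustrative alternative, not part of the plan. The function name `check_fixtures` is hypothetical. Unlike the loop, it collects every failure instead of only printing a traceback for each bad file.

```python
import json
from pathlib import Path


def check_fixtures(fixture_dir: Path) -> list[tuple[Path, str]]:
    """Parse every *.json file in fixture_dir; return (path, error) for failures."""
    failures: list[tuple[Path, str]] = []
    for path in sorted(fixture_dir.glob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (OSError, json.JSONDecodeError) as exc:
            failures.append((path, str(exc)))
        else:
            # Mirror the shell loop's success line.
            print(f"OK {path}")
    return failures
```

Run `check_fixtures(Path("tests/fixtures/convergence"))` from the repository root. It prints one `OK` line per valid file, and an empty return list corresponds to the shell loop in Step B printing three `OK` lines.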
1499
+
1500
+ - [ ] **Step C: Spot-check schema doc / fixture consistency**
1501
+
1502
+ Run: `grep -c "schemaVersion.*1.1" /Volumes/Workspaces/workspace/projects/Okstra/skills/okstra-convergence/SKILL.md`
1503
+ Expected: `>= 1`.
1504
+
1505
+ Run: `grep -c "round2SkippedReason" /Volumes/Workspaces/workspace/projects/Okstra/skills/okstra-convergence/SKILL.md /Volumes/Workspaces/workspace/projects/Okstra/agents/SKILL.md`
1506
+ Expected: `>= 1` for each of the two files.
1507
+
1508
+ - [ ] **Step D: Run one baseline measurement (against an existing fixture)**
1509
+
1510
+ Run:
1511
+ ```bash
1512
+ python3 /Volumes/Workspaces/workspace/projects/Okstra/scripts/okstra-convergence-stats.py \
1513
+ --convergence-state /Volumes/Workspaces/workspace/projects/Okstra/tests/fixtures/convergence/mixed-round2.json
1514
+ ```
1515
+ Expected:
1516
+ ```
1517
+ taskKey : fixture/mixed-round2
1518
+ schemaVersion : 1.1
1519
+ effectiveMaxRounds : 2
1520
+ roundCount : 2
1521
+ dispatchCount : 5
1522
+ dispatchDurationMsTotal: 698988
1523
+ round2SkippedReason : not-skipped
1524
+ finalState : max-rounds-reached
1525
+ finalClassificationCounts: {'fullConsensus': 1, 'partialConsensus': 0, 'contested': 1, 'workerUnique': 0}
1526
+ ```
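The `roundCount`, `dispatchCount`, and `dispatchDurationMsTotal` lines can be derived from `roundHistory` alone. Below is a minimal sketch that assumes the `roundHistory[].dispatches[]` shape `{worker, status, durationMs}` listed in the Self-Review and treats a missing `durationMs` as 0. The function name `summarize_rounds` is hypothetical, and the real `collect()` may compute these values differently.

```python
from typing import Any


def summarize_rounds(state: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: aggregate dispatch metrics from a convergence state dict."""
    rounds = state.get("roundHistory", [])
    # Flatten dispatches across all rounds.
    dispatches = [d for r in rounds for d in r.get("dispatches", [])]
    return {
        "taskKey": state.get("taskKey"),
        "effectiveMaxRounds": state.get("config", {}).get("effectiveMaxRounds"),
        "roundCount": len(rounds),
        "dispatchCount": len(dispatches),
        # Plain sum of per-dispatch durations; parallel dispatches within a
        # round are NOT de-overlapped here (assumption).
        "dispatchDurationMsTotal": sum(int(d.get("durationMs", 0)) for d in dispatches),
        "round2SkippedReason": state.get("round2SkippedReason"),
        "finalState": state.get("finalState"),
    }
```

Summing durations measures total worker time rather than elapsed time. If the workers in a round run concurrently, the true wall-clock figure would be the per-round maximum instead.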
1527
+
1528
+ ---
1529
+
1530
+ ## Out of Scope (requires a separate plan)
1531
+
1532
+ This plan intentionally excludes the following items. They are handled as follow-ups under Section 9 of `docs/kr/performance-improvement-plan-v2.md`.
1533
+
1534
+ - P2 (Prompt diet): narrow the audience scope of `[Required reading]` in worker definitions. Extracting `agents/workers/_common.md` first requires verifying install/packaging compatibility.
1535
+ - P3 (Fast-track routing): a design in which `requirements-discovery` emits a routing token such as `route=lite-implementation-planning`. Requires a decision on the approval-gate policy.
1536
+ - P4 (Prompt caching): requires a preliminary spike to determine whether cache hints have any effect in the Codex/Gemini wrappers.
1537
+ - P5 (Prepare render parallelization): parallelize the independent instruction-set writes in `scripts/okstra_ctl/run.py`.
1538
+ - P6 (Incremental token usage): cache the linear JSONL scan in `scripts/okstra_token_usage/`.
1539
+
1540
+ ---
1541
+
1542
+ ## Self-Review
1543
+
1544
+ **Spec coverage:** mapping of the 6 items in the Section 7 P1 implementation checklist
1545
+
1546
+ 1. Revise the Round 1-N pseudocode to use queue pruning → Task 2
1547
+ 2. Do not use `contested` as an intermediate state → Task 1
1548
+ 3. Eight new fields in the convergence state artifact → Task 4 + Tasks 9-11 (verified with fixtures) + Task 12 (enforced by the contract test)
1549
+ 4. report-writer reflects the round history and final classification counts → Tasks 6, 7, 8
1550
+ 5. Simple early-convergence / mixed-unresolved fixtures + contract test → Tasks 9-12
1551
+ 6. Record before/after dispatch, token, and wall-clock numbers with the token usage collector → Task 13
1552
+
1553
+ Mapping of the Section 5 P0 items
1554
+
1555
+ - Keep the docs from conflating the task-type lifecycle with the lead's operating stages → Task 1 Step 2 (the "Scope and Terminology" block)
1556
+ - Specify the baseline for the new convergence state fields → Task 4
1557
+
1558
+ **Placeholder scan:** every step contains actual markdown/JSON/Python code. There are no TBD/TODO/"add appropriate error handling"-style placeholders.
1559
+
1560
+ **Type consistency:**
1561
+ - `effectiveMaxRounds` (Tasks 4, 5, 9-12, 13): integer in 1..3, nested under `config.`: consistent.
1562
+ - `round2SkippedReason` (Tasks 2, 4, 5, 7, 8, 9-12): top-level string enum `queue-empty | max-rounds-1 | all-reverify-non-result | not-skipped`: consistent.
1563
+ - `finalClassificationCounts` (Tasks 4, 7, 8, 9-12, 13): keys `fullConsensus | partialConsensus | contested | workerUnique`: consistent.
1564
+ - `roundHistory[].dispatches[]` shape `{worker, status, durationMs}` (Tasks 4, 9-12, 13): consistent.
1565
+ - `roundHistory[].skippedWorkers[]` shape `{worker, reason}`, or `{worker, reason, terminalStatus}` when a dispatch fails (Tasks 4, 11, 12): consistent (`terminalStatus` is optional).
1566
+ - Verdict enum `agree | disagree | supplement | verification-error` (Tasks 3, 11, 12): consistent.
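The shape invariants listed above are simple enough to check mechanically. The sketch below is a hypothetical, dependency-free validator (`contract_errors` is an illustrative name). It encodes only what this list states and ignores every other field. It is not the Task 12 contract test, whose contents are not shown here. It skips the verdict enum because this list does not say which field stores verdicts.

```python
from typing import Any

ROUND2_SKIPPED_REASONS = {"queue-empty", "max-rounds-1", "all-reverify-non-result", "not-skipped"}
CLASSIFICATION_KEYS = {"fullConsensus", "partialConsensus", "contested", "workerUnique"}
DISPATCH_KEYS = {"worker", "status", "durationMs"}


def contract_errors(state: dict[str, Any]) -> list[str]:
    """Return human-readable violations; an empty list means the state passes."""
    errors: list[str] = []
    cap = state.get("config", {}).get("effectiveMaxRounds")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(cap, int) or isinstance(cap, bool) or not 1 <= cap <= 3:
        errors.append(f"config.effectiveMaxRounds must be an int in 1..3, got {cap!r}")
    reason = state.get("round2SkippedReason")
    if reason not in ROUND2_SKIPPED_REASONS:
        errors.append(f"unknown round2SkippedReason {reason!r}")
    counts = state.get("finalClassificationCounts", {})
    if set(counts) != CLASSIFICATION_KEYS:
        errors.append(f"finalClassificationCounts keys mismatch: {sorted(counts)}")
    for rnd in state.get("roundHistory", []):
        for d in rnd.get("dispatches", []):
            if set(d) != DISPATCH_KEYS:
                errors.append(f"round {rnd.get('round')}: bad dispatch shape {sorted(d)}")
        for s in rnd.get("skippedWorkers", []):
            # terminalStatus is optional (present only on dispatch failure).
            if not {"worker", "reason"} <= set(s) <= {"worker", "reason", "terminalStatus"}:
                errors.append(f"round {rnd.get('round')}: bad skippedWorkers entry {sorted(s)}")
    return errors
```

Returning a list of messages instead of raising on the first problem lets one run report every drift between a fixture and the documented contract.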
1567
+
1568
+ **Execution order constraint:** Task 12 (contract test) must run after Tasks 9-11 (fixture creation). Task 13 (stats helper) depends on Tasks 9-11 because it uses their fixtures, so it also runs after them. All other tasks are independent and can run in any order.
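The ordering constraint above can be expressed as a small dependency graph and checked with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This is a sketch that assumes Tasks 1 to 14 as numbered in this plan, with only the edges stated above. Any order it yields satisfies the constraint, and tasks without edges could equally run in parallel.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that must finish before it.
# Only the constraints stated in the plan are encoded; every other
# task (1-14) has no predecessors.
FIXTURE_TASKS = {9, 10, 11}
deps: dict[int, set[int]] = {task: set() for task in range(1, 15)}
deps[12] = set(FIXTURE_TASKS)  # contract test needs the fixtures
deps[13] = set(FIXTURE_TASKS)  # stats helper uses the fixtures

# static_order() raises graphlib.CycleError if the constraints are contradictory.
order = list(TopologicalSorter(deps).static_order())
print(order)
```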