okstra 0.20.0 → 0.21.0
- package/docs/kr/architecture.md +1 -1
- package/docs/kr/performance-improvement-plan-v2.md +330 -0
- package/docs/kr/performance-improvement-plan.md +125 -0
- package/docs/project-structure-overview.md +386 -0
- package/docs/superpowers/plans/2026-05-14-convergence-queue-pruning.md +1568 -0
- package/package.json +1 -1
- package/runtime/BUILD.json +2 -2
- package/runtime/agents/SKILL.md +7 -1
- package/runtime/agents/workers/codex-worker.md +6 -4
- package/runtime/agents/workers/gemini-worker.md +6 -4
- package/runtime/agents/workers/report-writer-worker.md +4 -0
- package/runtime/bin/okstra-codex-exec.sh +36 -6
- package/runtime/bin/okstra-gemini-exec.sh +6 -8
- package/runtime/prompts/profiles/final-verification.md +8 -2
- package/runtime/prompts/profiles/implementation-planning.md +1 -1
- package/runtime/prompts/profiles/release-handoff.md +26 -28
- package/runtime/prompts/profiles/requirements-discovery.md +1 -1
- package/runtime/python/okstra_ctl/render.py +78 -4
- package/runtime/python/okstra_ctl/run.py +0 -6
- package/runtime/python/okstra_ctl/run_context.py +5 -0
- package/runtime/python/okstra_ctl/workflow.py +8 -7
- package/runtime/python/okstra_ctl/worktree.py +155 -15
- package/runtime/python/okstra_token_usage/blocks.py +0 -2
- package/runtime/python/okstra_token_usage/claude.py +0 -2
- package/runtime/skills/okstra-brief/SKILL.md +523 -0
- package/runtime/skills/okstra-convergence/SKILL.md +149 -37
- package/runtime/skills/okstra-report-writer/SKILL.md +8 -6
- package/runtime/templates/prd/brief.template.md +12 -0
- package/runtime/templates/project-docs/task-index.template.md +12 -0
- package/runtime/templates/reports/error-analysis-input.template.md +12 -0
- package/runtime/templates/reports/final-report.template.md +39 -12
- package/runtime/templates/reports/final-verification-input.template.md +22 -0
- package/runtime/templates/reports/implementation-input.template.md +12 -0
- package/runtime/templates/reports/implementation-planning-input.template.md +12 -0
- package/runtime/templates/reports/quick-input.template.md +12 -0
- package/runtime/templates/reports/release-handoff-input.template.md +23 -10
- package/runtime/templates/reports/schedule.template.md +12 -0
- package/runtime/templates/reports/settings.template.json +83 -30
- package/runtime/templates/reports/task-brief.template.md +12 -0
|
@@ -0,0 +1,1568 @@
|
|
|
1
|
+
# Convergence Queue Pruning (P0+P1) Implementation Plan
|
|
2
|
+
|
|
3
|
+
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
4
|
+
|
|
5
|
+
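**Goal:** Implement P0 (clean up terminology and measurement criteria) and P1 (pruning of the convergence re-verification queue) so that the lead does not put confirmed findings back into the next round's prompt, and pin down the Round 2 entry condition and worker-failure handling with documentation, schema, fixtures, and contract tests.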
|
|
6
|
+
|
|
7
|
+
**Architecture:** Convergence behavior is a documentation contract that the lead (Claude) carries out by following `skills/okstra-convergence/SKILL.md`. No code reads or writes the convergence state JSON (`grep -r convergence- scripts/` returns nothing). The change scope is therefore limited to 4 seed documents, 1 template, 3 verification fixtures, 1 contract test, and 1 baseline measurement helper. Worker prompt generation is also the lead's responsibility, so narrowing the verification queue in the pseudocode to "items that are still mixed/unresolved after Round 1" automatically reduces the number of dispatches.
|
|
8
|
+
|
|
9
|
+
**Tech Stack:** Markdown seed documents, JSON Schema (hand-rolled), pytest, jq, Python 3.
|
|
10
|
+
|
|
11
|
+
> Note: this plan covers only conclusions 1 (P0) and 2 (P1) from Section 9 of `docs/kr/performance-improvement-plan-v2.md`. P2 (prompt diet) / P3 (fast-track) / P4 (prompt caching) / P5 (render parallelization) / P6 (incremental token-usage) are **not** in scope for this plan; they will be written up as separate plans.
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
## File Structure
|
|
16
|
+
|
|
17
|
+
| Path | Responsibility | Change type |
|
|
18
|
+
|---|---|---|
|
|
19
|
+
| `skills/okstra-convergence/SKILL.md` | Phase 5.5 contract: terminology, Round 0/1/2 pseudocode, queue pruning rule, worker-failure handling, state schema v1.1 | Modify |
|
|
20
|
+
| `agents/SKILL.md` | Lifecycle phase boundaries and the Phase 5.5 entry point | Modify (narrow scope, Phase 5.5 block only) |
|
|
21
|
+
| `skills/okstra-report-writer/SKILL.md` | Phase 6 dispatch template; reflect round history in the final-report body structure | Modify |
|
|
22
|
+
| `agents/workers/report-writer-worker.md` | Add round history to the report-writer subagent's required-reading and authoring contract | Modify |
|
|
23
|
+
| `templates/reports/final-report.template.md` | Add a round-history sub-section under Section 1 (Cross Verification Results) | Modify |
|
|
24
|
+
| `tests/fixtures/convergence/early-exit.json` | Case where the queue is empty after Round 1 and Round 2 is skipped | Create |
|
|
25
|
+
| `tests/fixtures/convergence/mixed-round2.json` | Case where an unresolved queue remains after Round 1 and Round 2 is executed | Create |
|
|
26
|
+
| `tests/fixtures/convergence/reverify-all-failed.json` | Case where every Round 1 reverify dispatch is a terminal non-result and Round 2 is suppressed | Create |
|
|
27
|
+
| `tests/test_convergence_state_contract.py` | Checks that the 3 fixtures above satisfy the schema v1.1 contract | Create |
|
|
28
|
+
| `scripts/okstra-convergence-stats.py` | Baseline measurement helper that aggregates dispatch count / wall-clock / worker tokens from team-state + convergence-state | Create |
|
|
29
|
+
| `docs/kr/performance-improvement-plan-v2.md` | Add links to this plan and its implementation status in the Section 9 conclusions | Modify |
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## Convention Notes (common to all tasks)
|
|
34
|
+
|
|
35
|
+
- All new documents and schema definitions are placed in the **end-user seed paths** (not a personal `.claude/`). Every change in this plan goes into the paths in the table above.
|
|
36
|
+
- `effectiveMaxRounds` is the value that the lead records in the state artifact after resolving the phase-aware default (`requirements-discovery → 1`, otherwise → 2) when `convergence.maxRounds` in `task-manifest.json` is empty. It already exists in the Phase 5.5 wording of `agents/SKILL.md`, so this plan only promotes it to a schema field.
|
|
37
|
+
- Enum values of `round2SkippedReason`: `queue-empty` | `max-rounds-1` | `all-reverify-non-result` | `not-skipped`.
|
|
38
|
+
- Keys of `finalClassificationCounts`: `fullConsensus`, `partialConsensus`, `contested`, `workerUnique`. The existing `summary` object has the same key structure, so it is kept as an alias.
|
|
39
|
+
- The existing array name `roundHistory[]` is kept and the fields inside it are extended (renaming is deferred because it would break the mental model the lead follows).
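
The `effectiveMaxRounds` resolution described in the notes above can be sketched as follows. This is only an illustration under stated assumptions, not code from the package: in okstra the lead performs this step by following the document contract. The function and constant names are hypothetical, and the sketch assumes that an absent or `null` `convergence.maxRounds` counts as "empty".

```python
# Hypothetical helper: phase-aware default for convergence.maxRounds.
PHASE_DEFAULT_MAX_ROUNDS = {"requirements-discovery": 1}  # from the note above
FALLBACK_MAX_ROUNDS = 2  # "otherwise -> 2"


def resolve_effective_max_rounds(task_type: str, manifest: dict) -> int:
    """Return the explicit manifest value if set, else the phase-aware default."""
    # Assumption: a missing key or a null value both mean "not set".
    configured = (manifest.get("convergence") or {}).get("maxRounds")
    if configured is not None:
        return int(configured)
    return PHASE_DEFAULT_MAX_ROUNDS.get(task_type, FALLBACK_MAX_ROUNDS)
```

The resolved integer is what would be written to `config.effectiveMaxRounds` in the state artifact.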
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## Task 1: Convergence SKILL — separate terminology + remove the `contested` intermediate state
|
|
44
|
+
|
|
45
|
+
**Files:**
|
|
46
|
+
- Modify: `skills/okstra-convergence/SKILL.md`
|
|
47
|
+
- Test: `tests/test_convergence_state_contract.py` (created in Task 12; verify only at this stage)
|
|
48
|
+
|
|
49
|
+
- [ ] **Step 1: Check the file**
|
|
50
|
+
|
|
51
|
+
Run: `wc -l /Volumes/Workspaces/workspace/projects/Okstra/skills/okstra-convergence/SKILL.md`
|
|
52
|
+
Expected: about 298 lines (existing).
|
|
53
|
+
|
|
54
|
+
- [ ] **Step 2: Insert a "Scope and Terminology" section directly below the `# OKSTRA Convergence` header**
|
|
55
|
+
|
|
56
|
+
Insert the block below in `skills/okstra-convergence/SKILL.md`, immediately before `## When to Use` (between lines 7 and 9).
|
|
57
|
+
|
|
58
|
+
````markdown
|
|
59
|
+
## Scope and Terminology (BLOCKING)
|
|
60
|
+
|
|
61
|
+
This skill governs **Phase 5.5 (Convergence loop)** — a *lead operating phase* inside a single okstra run, not a task-type lifecycle phase. The 6 task-type lifecycle phases (`requirements-discovery` → `error-analysis` → `implementation-planning` → `implementation` → `final-verification` → `release-handoff`, see [agents/SKILL.md](../../SKILL.md) "Lifecycle Phase Boundaries") are unchanged by this skill. The lead operating phases (Phase 1 Intake → Phase 7 Persist, see [agents/SKILL.md](../../SKILL.md) "Quick Reference") describe how the lead drives a *single* task-type run.
|
|
62
|
+
|
|
63
|
+
**`contested` is a final classification only.** It is NEVER an intermediate queue label. The verification queue carries findings that are *unique to a single worker* (entered in Round 0) or *mixed/unresolved after a re-verification round* (carried forward). The `contested` label is assigned only when the **last executed round** completes and the queue is still non-empty.
|
|
64
|
+
|
|
65
|
+
When this skill says "queue" without qualifier, it means the *verification queue*: the set of findings that are still candidates for re-verification in subsequent rounds. The queue shrinks monotonically as findings get classified as `full-consensus`, `partial-consensus`, or `worker-unique`. Findings classified into any of these three categories MUST NOT appear in any subsequent round's reverify prompt, for any worker.
|
|
66
|
+
````
|
|
67
|
+
|
|
68
|
+
- [ ] **Step 3: Strengthen the `contested` row description in the Finding Category table into an explicit final-round condition**
|
|
69
|
+
|
|
70
|
+
Existing (around line 31):
|
|
71
|
+
|
|
72
|
+
```markdown
|
|
73
|
+
| `contested` | No consensus reached even after max rounds; each worker's position is recorded | Required |
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Replace with:
|
|
77
|
+
|
|
78
|
+
```markdown
|
|
79
|
+
| `contested` | Final classification only. Assigned to a finding that remains in the verification queue after the **last executed round** completes (round index = `effectiveMaxRounds`). Each worker's position across all executed rounds is recorded. NEVER used as an intermediate label. | Required |
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
- [ ] **Step 4: Make the final-classification trigger in the Convergence Test section explicitly the last executed round**
|
|
83
|
+
|
|
84
|
+
Existing (around lines 81-84):
|
|
85
|
+
|
|
86
|
+
```markdown
|
|
87
|
+
- If the validation queue is empty → Convergence complete (`converged`)
|
|
88
|
+
- Upon reaching the maximum number of rounds → Apply final classification to remaining unresolved findings:
|
|
89
|
+
- Majority agreement → `partial-consensus`
|
|
90
|
+
- Otherwise → `contested`
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
Replace with:
|
|
94
|
+
|
|
95
|
+
```markdown
|
|
96
|
+
- If the verification queue is empty at the end of any round → Convergence complete (`finalState: "converged"`), remaining rounds are not executed
|
|
97
|
+
- Upon completing the **last executed round** (where round index == `effectiveMaxRounds`, OR where Round 2 was suppressed per the Round 2 gate below) → Apply final classification to remaining queue items:
|
|
98
|
+
- Majority agreement across executed rounds → `partial-consensus`
|
|
99
|
+
- Otherwise → `contested`
|
|
100
|
+
- The final classification step never runs while the queue is still being re-verified — confirmed items always exit the queue first.
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
- [ ] **Step 5: Commit**
|
|
104
|
+
|
|
105
|
+
```bash
|
|
106
|
+
git add skills/okstra-convergence/SKILL.md
|
|
107
|
+
git commit -m "docs(convergence): scope terminology, contested as final-only label
|
|
108
|
+
|
|
109
|
+
Adds 'Scope and Terminology' BLOCKING section to disambiguate Phase 5.5 from
|
|
110
|
+
task-type lifecycle phases. Tightens 'contested' definition to a terminal
|
|
111
|
+
classification only — it never labels intermediate queue items. Aligns the
|
|
112
|
+
final-classification trigger to 'last executed round' so the queue-pruning
|
|
113
|
+
algorithm in subsequent tasks reads cleanly."
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
---
|
|
117
|
+
|
|
118
|
+
## Task 2: Convergence SKILL — rewrite the Round 0 / Round 1 / Optional Round 2 pseudocode
|
|
119
|
+
|
|
120
|
+
**Files:**
|
|
121
|
+
- Modify: `skills/okstra-convergence/SKILL.md` (Convergence Algorithm section)
|
|
122
|
+
|
|
123
|
+
- [ ] **Step 1: Reinforce the Round 0 body by stating the queue entry rule explicitly**
|
|
124
|
+
|
|
125
|
+
Append the single line below to the end of item 3 (around lines 48-52) of the existing `### Round 0: Parse worker results` section (the existing 4-step grouping rules are kept).
|
|
126
|
+
|
|
127
|
+
```markdown
|
|
128
|
+
6. After grouping, the verification queue contains EXACTLY the `unique`-marked findings (Step 3 case "Only one worker confirms"). `full-consensus` findings reached in Step 3 are recorded immediately in the convergence state with `classification: "full-consensus"` and DO NOT enter the queue.
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
- [ ] **Step 2: Replace the Round 1-N pseudocode**
|
|
132
|
+
|
|
133
|
+
Replace the entire existing `### Round 1-N: Re-verification Loop` code block (lines 56-77) with the following.
|
|
134
|
+
|
|
135
|
+
````markdown
|
|
136
|
+
### Round 1-N: Re-verification Loop (queue-pruned)
|
|
137
|
+
|
|
138
|
+
The verification queue holds only findings that are not yet classified. Confirmed items are *removed* from the queue and never re-sent.
|
|
139
|
+
|
|
140
|
+
```text
|
|
141
|
+
roundIndex = 0
|
|
142
|
+
WHILE roundIndex < effectiveMaxRounds AND queue is non-empty:
|
|
143
|
+
roundIndex += 1
|
|
144
|
+
|
|
145
|
+
# Round 2 gate (only evaluated when entering round 2 or higher)
|
|
146
|
+
IF roundIndex > 1 AND NOT round_gate_open(queue, last_round_dispatch_outcomes):
|
|
147
|
+
record round_skipped_reason in convergence state
|
|
148
|
+
BREAK
|
|
149
|
+
|
|
150
|
+
inputQueueSize = len(queue)
|
|
151
|
+
dispatches = []
|
|
152
|
+
skippedWorkers = []
|
|
153
|
+
|
|
154
|
+
FOR each analysis worker W (excluding report-writer-worker):
|
|
155
|
+
items_for_W = [f for f in queue if W != f.originWorker]
|
|
156
|
+
IF items_for_W is empty:
|
|
157
|
+
skippedWorkers.append({worker: W, reason: "no items to verify"})
|
|
158
|
+
CONTINUE
|
|
159
|
+
dispatch = send_reverify_request(W, items_for_W, roundIndex)
|
|
160
|
+
dispatches.append(dispatch)
|
|
161
|
+
|
|
162
|
+
IF all dispatches in this round are terminal non-result (timeout/error/no-result-file):
|
|
163
|
+
# Per "Worker failure handling in reverify" below — do NOT treat as DISAGREE.
|
|
164
|
+
record verification-error evidence on each finding in the queue for this round
|
|
165
|
+
record round_skipped_reason = "all-reverify-non-result" for any subsequent round
|
|
166
|
+
BREAK
|
|
167
|
+
|
|
168
|
+
resolvedCount = 0
|
|
169
|
+
carriedForwardCount = 0
|
|
170
|
+
|
|
171
|
+
FOR each finding F in queue (snapshot):
|
|
172
|
+
votes = aggregate_votes(F, dispatches) # AGREE / DISAGREE / SUPPLEMENT / verification-error
|
|
173
|
+
IF all non-error votes are AGREE or SUPPLEMENT:
|
|
174
|
+
F.classification = "full-consensus"
|
|
175
|
+
queue.remove(F); resolvedCount += 1
|
|
176
|
+
ELIF majority non-error votes are AGREE or SUPPLEMENT:
|
|
177
|
+
F.classification = "partial-consensus"
|
|
178
|
+
queue.remove(F); resolvedCount += 1
|
|
179
|
+
ELIF all non-error votes are DISAGREE:
|
|
180
|
+
F.classification = "worker-unique"
|
|
181
|
+
queue.remove(F); resolvedCount += 1
|
|
182
|
+
ELSE:
|
|
183
|
+
# mixed / insufficient non-error votes → carry forward
|
|
184
|
+
carriedForwardCount += 1
|
|
185
|
+
|
|
186
|
+
record roundHistory entry { round: roundIndex, inputQueueSize, resolvedCount,
|
|
187
|
+
carriedForwardCount, dispatches, skippedWorkers }
|
|
188
|
+
|
|
189
|
+
# Final classification — runs after the WHILE loop exits (queue empty OR roundIndex == effectiveMaxRounds OR Round 2 gate closed)
|
|
190
|
+
FOR each finding F still in queue:
|
|
191
|
+
IF majority AGREE-or-SUPPLEMENT across all executed rounds:
|
|
192
|
+
F.classification = "partial-consensus"
|
|
193
|
+
ELSE:
|
|
194
|
+
F.classification = "contested"
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
The lead MUST construct the per-worker reverify prompt body from `items_for_W` only — confirmed findings from earlier rounds MUST NOT appear in the prompt, even as background. The dispatch-prompt invariant (every worker gets the same prompt content modulo their own findings) continues to apply to the per-round prompt body.
|
|
198
|
+
````
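
The per-round classification step in the pseudocode above can be sketched in Python. This is a hedged illustration, not the lead's actual procedure (the plan states that no code reads or writes convergence state): `classify_round` and `prune_queue` are hypothetical names, verdict strings are assumed to be lowercase as in the state artifact, and a finding with no usable (non-error) vote is assumed to be carried forward rather than classified.

```python
from __future__ import annotations

from typing import Optional

POSITIVE = {"agree", "supplement"}


def classify_round(verdicts: list[str]) -> Optional[str]:
    """Classify one queued finding from a single round's verdicts.

    Returns the classification when the finding leaves the queue,
    or None when it must be carried forward to the next round.
    """
    # verification-error is "no usable vote": neither AGREE nor DISAGREE.
    usable = [v for v in verdicts if v != "verification-error"]
    if not usable:
        return None  # assumption: insufficient votes -> carry forward
    positive = sum(1 for v in usable if v in POSITIVE)
    negative = len(usable) - positive
    if negative == 0:
        return "full-consensus"
    if positive > negative:
        return "partial-consensus"
    if positive == 0:
        return "worker-unique"
    return None  # mixed result -> carry forward


def prune_queue(queue: dict[str, list[str]]) -> tuple[dict[str, str], list[str]]:
    """Split the queue into resolved findings and carried-forward finding ids."""
    resolved: dict[str, str] = {}
    carried: list[str] = []
    for finding_id, verdicts in queue.items():
        label = classify_round(verdicts)
        if label is None:
            carried.append(finding_id)
        else:
            resolved[finding_id] = label
    return resolved, carried
```

Only the ids in `carried` would be eligible for the next round's reverify prompt; everything in `resolved` has left the queue for good.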
|
|
199
|
+
|
|
200
|
+
- [ ] **Step 3: Add a "Round 2 gate" subsection**
|
|
201
|
+
|
|
202
|
+
Insert the following sub-section directly below the pseudocode added above.
|
|
203
|
+
|
|
204
|
+
````markdown
|
|
205
|
+
#### Round 2 gate (`round_gate_open` predicate)
|
|
206
|
+
|
|
207
|
+
`round_gate_open(queue, last_round_dispatch_outcomes)` returns `true` iff ALL three conditions hold; otherwise the lead records `round2SkippedReason` and breaks out of the loop:
|
|
208
|
+
|
|
209
|
+
| Condition | Required value | `round2SkippedReason` if not met |
|
|
210
|
+
|---|---|---|
|
|
211
|
+
| `effectiveMaxRounds >= 2` | true | `"max-rounds-1"` |
|
|
212
|
+
| `len(queue) > 0` after round 1 | true | `"queue-empty"` |
|
|
213
|
+
| At least one round-1 reverify dispatch terminated as `completed` | true | `"all-reverify-non-result"` |
|
|
214
|
+
|
|
215
|
+
When all conditions hold the predicate returns `true` and `round2SkippedReason` is set to `"not-skipped"`. The field is mandatory on every convergence state artifact — write `"not-skipped"` rather than omitting the key.
|
|
216
|
+
````
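
As a rough sketch, the three gate conditions in the table above map onto a small function that returns the `round2SkippedReason` value. The function names and the order in which the conditions are checked (top to bottom, as in the table) are assumptions of this example; the plan itself does not say which reason wins when several conditions fail at once.

```python
from __future__ import annotations


def round2_skipped_reason(
    effective_max_rounds: int,
    queue_size_after_round1: int,
    round1_statuses: list[str],
) -> str:
    """Return the round2SkippedReason enum value for the Round 2 gate."""
    if effective_max_rounds < 2:
        return "max-rounds-1"
    if queue_size_after_round1 == 0:
        return "queue-empty"
    # At least one round-1 reverify dispatch must have completed.
    if not any(status == "completed" for status in round1_statuses):
        return "all-reverify-non-result"
    return "not-skipped"


def round_gate_open(
    effective_max_rounds: int,
    queue_size_after_round1: int,
    round1_statuses: list[str],
) -> bool:
    """True iff all three gate conditions hold."""
    reason = round2_skipped_reason(
        effective_max_rounds, queue_size_after_round1, round1_statuses
    )
    return reason == "not-skipped"
```

Returning the reason string rather than a bare boolean keeps the value that must be written to the state artifact next to the decision that produced it.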
|
|
217
|
+
|
|
218
|
+
- [ ] **Step 4: Commit**
|
|
219
|
+
|
|
220
|
+
```bash
|
|
221
|
+
git add skills/okstra-convergence/SKILL.md
|
|
222
|
+
git commit -m "docs(convergence): queue-pruned Round 0/1/2 algorithm with explicit gate
|
|
223
|
+
|
|
224
|
+
Replaces the Round 1-N pseudocode with a queue-pruning loop: confirmed items
|
|
225
|
+
exit the queue immediately and never enter the next round's reverify prompt.
|
|
226
|
+
Adds a Round 2 gate predicate with three explicit conditions
|
|
227
|
+
(effectiveMaxRounds>=2, queue non-empty, at least one round-1 completed
|
|
228
|
+
dispatch) and a mandatory round2SkippedReason artifact field with enum
|
|
229
|
+
{queue-empty, max-rounds-1, all-reverify-non-result, not-skipped}."
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
---
|
|
233
|
+
|
|
234
|
+
## Task 3: Convergence SKILL — worker-failure handling specification
|
|
235
|
+
|
|
236
|
+
**Files:**
|
|
237
|
+
- Modify: `skills/okstra-convergence/SKILL.md` (new "Worker failure handling in reverify" subsection)
|
|
238
|
+
|
|
239
|
+
- [ ] **Step 1: Insert the new subsection directly below "Round 2 gate"**
|
|
240
|
+
|
|
241
|
+
````markdown
|
|
242
|
+
#### Worker failure handling in reverify (BLOCKING)
|
|
243
|
+
|
|
244
|
+
A reverify dispatch that returns a **terminal non-result** (`timeout`, `error`, no result file, or the wrapper records `cli-failure`) MUST NOT be aggregated as `DISAGREE`. Misclassifying a worker failure as DISAGREE biases the queue toward `contested`/`worker-unique` and produces meaningless final classifications.
|
|
245
|
+
|
|
246
|
+
Rules:
|
|
247
|
+
|
|
248
|
+
1. For each affected finding, append a `votes[W].verdict = "verification-error"` entry instead of `disagree`, plus the wrapper's captured exit reason in `votes[W].explanation`.
|
|
249
|
+
2. Record one event per failed dispatch via `python3 scripts/okstra-error-log.py append-observed --error-type cli-failure --agent <worker> ...` (the worker wrapper does this for Codex/Gemini; for Claude worker timeouts the lead does it).
|
|
250
|
+
3. Add an entry to the round's `skippedWorkers[]` with `{worker: <W>, reason: "dispatch-non-result", terminalStatus: <timeout|error|not-run>}`.
|
|
251
|
+
4. If **all** reverify dispatches in a round terminate as non-result, the round is treated as gate-closed: write `round2SkippedReason: "all-reverify-non-result"` (even if the round in question is round 1 — i.e. round 2 never runs because round 1 produced no usable votes), record one `contract-violation` event per non-result dispatch, and exit the WHILE loop.
|
|
252
|
+
5. Section 6 (Specialization Lens) of a worker output is OUT of convergence scope per "Convergence scope" above — its absence is NEVER a `verification-error`.
|
|
253
|
+
|
|
254
|
+
The final classifier (`FOR each finding F still in queue` block) treats `verification-error` as "no usable vote" — it counts neither toward AGREE nor toward DISAGREE.
|
|
255
|
+
````
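
Rules 1 and 3 above can be illustrated with a short Python sketch. It is hypothetical: `record_dispatch_outcome` is not part of the package, and it assumes that a missing result file is reported under one of the `timeout` / `error` / `not-run` statuses listed in rule 3.

```python
from __future__ import annotations

from typing import Optional

# Terminal statuses treated as "no usable vote" (from rule 3 above).
NON_RESULT_STATUSES = {"timeout", "error", "not-run"}


def record_dispatch_outcome(
    votes: dict,
    skipped_workers: list,
    worker: str,
    status: str,
    verdict: Optional[str] = None,
    explanation: str = "",
) -> None:
    """Record one worker's reverify outcome without turning failures into DISAGREE."""
    if status in NON_RESULT_STATUSES:
        # Rule 1: a failed dispatch is a verification-error, never a disagree.
        votes[worker] = {"verdict": "verification-error", "explanation": explanation}
        # Rule 3: note the worker in the round's skippedWorkers[].
        skipped_workers.append(
            {"worker": worker, "reason": "dispatch-non-result", "terminalStatus": status}
        )
    else:
        votes[worker] = {"verdict": verdict, "explanation": explanation}
```

The error-log event from rule 2 is deliberately left out here, since it goes through the separate `okstra-error-log.py` script.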
|
|
256
|
+
|
|
257
|
+
- [ ] **Step 2: Commit**
|
|
258
|
+
|
|
259
|
+
```bash
|
|
260
|
+
git add skills/okstra-convergence/SKILL.md
|
|
261
|
+
git commit -m "docs(convergence): separate worker-failure from DISAGREE in reverify
|
|
262
|
+
|
|
263
|
+
Codifies a 'verification-error' verdict so a dispatch that timed out or
|
|
264
|
+
returned no result file does not get aggregated as DISAGREE. Adds a
|
|
265
|
+
skippedWorkers[] entry per non-result dispatch and forces
|
|
266
|
+
round2SkippedReason=all-reverify-non-result when every dispatch in a round
|
|
267
|
+
fails. The final classifier ignores verification-error votes when counting
|
|
268
|
+
majority."
|
|
269
|
+
```
|
|
270
|
+
|
|
271
|
+
---
|
|
272
|
+
|
|
273
|
+
## Task 4: Convergence SKILL — state artifact schema v1.1
|
|
274
|
+
|
|
275
|
+
**Files:**
|
|
276
|
+
- Modify: `skills/okstra-convergence/SKILL.md` (Convergence State Artifact section)
|
|
277
|
+
|
|
278
|
+
- [ ] **Step 1: Replace the existing JSON example block (lines 232-283) with the v1.1 schema**
|
|
279
|
+
|
|
280
|
+
````markdown
|
|
281
|
+
## Convergence State Artifact
|
|
282
|
+
|
|
283
|
+
Save it to `runs/<task-type>/state/convergence-<task-type>-<seq>.json`.
|
|
284
|
+
|
|
285
|
+
Schema version `1.1` extends `1.0` (legacy fields kept as aliases for backward-compat with already-shipped reports):
|
|
286
|
+
|
|
287
|
+
```json
|
|
288
|
+
{
|
|
289
|
+
"schemaVersion": "1.1",
|
|
290
|
+
"taskKey": "<task-key>",
|
|
291
|
+
"config": {
|
|
292
|
+
"enabled": true,
|
|
293
|
+
"maxRounds": 2,
|
|
294
|
+
"effectiveMaxRounds": 2,
|
|
295
|
+
"verificationMode": "lightweight"
|
|
296
|
+
},
|
|
297
|
+
"findings": [
|
|
298
|
+
{
|
|
299
|
+
"findingId": "F-001",
|
|
300
|
+
"summary": "<one-line summary>",
|
|
301
|
+
"category": "<bug|risk|missing|observation|...>",
|
|
302
|
+
"ticketIds": ["TICKET-123"],
|
|
303
|
+
"originWorker": "claude-worker",
|
|
304
|
+
"originEvidence": "<evidence text>",
|
|
305
|
+
"classification": "full-consensus",
|
|
306
|
+
"rounds": [
|
|
307
|
+
{
|
|
308
|
+
"round": 1,
|
|
309
|
+
"votes": {
|
|
310
|
+
"codex-worker": { "verdict": "agree", "explanation": "<brief>" },
|
|
311
|
+
"gemini-worker": { "verdict": "supplement", "explanation": "<brief>" }
|
|
312
|
+
}
|
|
313
|
+
}
|
|
314
|
+
],
|
|
315
|
+
"consensusWorkers": ["claude-worker", "codex-worker", "gemini-worker"],
|
|
316
|
+
"dissentingWorkers": []
|
|
317
|
+
}
|
|
318
|
+
],
|
|
319
|
+
"roundHistory": [
|
|
320
|
+
{
|
|
321
|
+
"round": 1,
|
|
322
|
+
"inputQueueSize": 3,
|
|
323
|
+
"resolvedCount": 2,
|
|
324
|
+
"carriedForwardCount": 1,
|
|
325
|
+
"dispatches": [
|
|
326
|
+
{ "worker": "codex-worker", "status": "completed", "durationMs": 184221 },
|
|
327
|
+
{ "worker": "gemini-worker", "status": "completed", "durationMs": 201337 }
|
|
328
|
+
],
|
|
329
|
+
"skippedWorkers": [
|
|
330
|
+
{ "worker": "claude-worker", "reason": "no items to verify" }
|
|
331
|
+
],
|
|
332
|
+
"verificationsRequested": 2,
|
|
333
|
+
"verificationsCompleted": 2,
|
|
334
|
+
"newConsensus": 2,
|
|
335
|
+
"remainingInQueue": 1,
|
|
336
|
+
"earlyExit": false
|
|
337
|
+
}
|
|
338
|
+
],
|
|
339
|
+
"round2SkippedReason": "not-skipped",
|
|
340
|
+
"finalState": "converged",
|
|
341
|
+
"totalRounds": 2,
|
|
342
|
+
"finalClassificationCounts": {
|
|
343
|
+
"fullConsensus": 5,
|
|
344
|
+
"partialConsensus": 1,
|
|
345
|
+
"contested": 0,
|
|
346
|
+
"workerUnique": 1
|
|
347
|
+
},
|
|
348
|
+
"summary": {
|
|
349
|
+
"fullConsensus": 5,
|
|
350
|
+
"partialConsensus": 1,
|
|
351
|
+
"contested": 0,
|
|
352
|
+
"workerUnique": 1
|
|
353
|
+
}
|
|
354
|
+
}
|
|
355
|
+
```
|
|
356
|
+
|
|
357
|
+
Schema rules:
|
|
358
|
+
|
|
359
|
+
- `schemaVersion`: literal string `"1.1"` for new runs. Readers MUST accept `"1.0"` for historical artifacts and treat any missing v1.1 field as `null`.
|
|
360
|
+
- `config.effectiveMaxRounds`: the integer the lead actually used after resolving the phase-aware default (`1` for `requirements-discovery`, `2` otherwise). MUST equal `config.maxRounds` when the manifest explicitly set it.
|
|
361
|
+
- `roundHistory[].inputQueueSize`: queue size at the start of this round.
|
|
362
|
+
- `roundHistory[].resolvedCount`: number of findings that exited the queue this round (sum of full+partial+worker-unique classifications produced this round).
|
|
363
|
+
- `roundHistory[].carriedForwardCount`: queue size at the END of this round (must equal `inputQueueSize - resolvedCount` when there are no in-round queue insertions; in-round insertions are forbidden).
|
|
364
|
+
- `roundHistory[].dispatches[]`: one entry per worker that was actually dispatched in this round. `status ∈ {completed, timeout, error, not-run}`.
|
|
365
|
+
- `roundHistory[].skippedWorkers[]`: per-worker `{worker, reason}` for workers with no items to verify OR with a non-result dispatch.
|
|
366
|
+
- `roundHistory[].verificationsRequested|verificationsCompleted|newConsensus|remainingInQueue|earlyExit`: legacy v1.0 aliases. New runs SHOULD populate them so existing parsers keep working: `verificationsRequested == len(dispatches)`, `verificationsCompleted == len([d for d in dispatches if d.status == "completed"])`, `newConsensus == resolvedCount`, `remainingInQueue == carriedForwardCount`, `earlyExit == (round < effectiveMaxRounds AND carriedForwardCount == 0)`.
|
|
367
|
+
- `round2SkippedReason`: literal enum `queue-empty | max-rounds-1 | all-reverify-non-result | not-skipped`. Always present. Use `"not-skipped"` only when Round 2 actually ran; when Round 2 did not run, record the gate condition that closed it (for the `effectiveMaxRounds == 1` case the value is `"max-rounds-1"`, not `"not-skipped"`).
|
|
368
|
+
- `finalClassificationCounts`: post-loop counts. New required field — must equal `summary` 1:1. `summary` is retained as the v1.0 alias.
|
|
369
|
+
- `finalState ∈ {converged, max-rounds-reached, aborted-non-result}`. `aborted-non-result` is the new value for the case all reverify dispatches in a round fail.
|
|
370
|
+
- `totalRounds`: count of rounds actually executed (not `effectiveMaxRounds`). May be `0` when Round 0 produced no queue items (all findings reached consensus during grouping).
|
|
371
|
+
````
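
A contract test could check the per-round schema rules above roughly as follows. This is a sketch under assumptions, not the planned `tests/test_convergence_state_contract.py`: `check_round_entry` is a hypothetical helper, and it only validates a legacy alias when the key is present, since the rules say new runs SHOULD (not MUST) populate them.

```python
from __future__ import annotations


def check_round_entry(entry: dict, effective_max_rounds: int) -> list[str]:
    """Return a list of violations for one roundHistory[] entry (empty if valid)."""
    errors: list[str] = []

    # carriedForwardCount must equal inputQueueSize - resolvedCount
    # because in-round queue insertions are forbidden.
    if entry["carriedForwardCount"] != entry["inputQueueSize"] - entry["resolvedCount"]:
        errors.append("carriedForwardCount != inputQueueSize - resolvedCount")

    completed = sum(1 for d in entry["dispatches"] if d["status"] == "completed")
    expected_aliases = {
        "verificationsRequested": len(entry["dispatches"]),
        "verificationsCompleted": completed,
        "newConsensus": entry["resolvedCount"],
        "remainingInQueue": entry["carriedForwardCount"],
        "earlyExit": entry["round"] < effective_max_rounds
        and entry["carriedForwardCount"] == 0,
    }
    for key, expected in expected_aliases.items():
        if key in entry and entry[key] != expected:
            errors.append(f"{key} should be {expected!r}, got {entry[key]!r}")
    return errors
```

Run against the `roundHistory[0]` entry from the JSON example above with `effective_max_rounds=2`, this sketch reports no violations.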
|
|
372
|
+
|
|
373
|
+
- [ ] **Step 2: Commit**
|
|
374
|
+
|
|
375
|
+
```bash
|
|
376
|
+
git add skills/okstra-convergence/SKILL.md
|
|
377
|
+
git commit -m "docs(convergence): schema v1.1 with effectiveMaxRounds and skip reason
|
|
378
|
+
|
|
379
|
+
Adds inputQueueSize, resolvedCount, carriedForwardCount, dispatches[], and
|
|
380
|
+
skippedWorkers[] per round; finalClassificationCounts, round2SkippedReason,
|
|
381
|
+
and config.effectiveMaxRounds at the top level. v1.0 fields are kept as
|
|
382
|
+
aliases so existing report parsers keep working until they migrate."
|
|
383
|
+
```
|
|
384
|
+
|
|
385
|
+
---
|
|
386
|
+
|
|
387
|
+
## Task 5: agents/SKILL.md — align the Phase 5.5 description
|
|
388
|
+
|
|
389
|
+
**Files:**
|
|
390
|
+
- Modify: `agents/SKILL.md` (Phase 5.5 section at line 198~212 area)
|
|
391
|
+
|
|
392
|
+
- [ ] **Step 1: Reinforce the Phase 5.5 body**
|
|
393
|
+
|
|
394
|
+
Replace the existing block (around lines 200-210, the "Convergence is enabled by default..." block) with the following:
|
|
395
|
+
|
|
396
|
+
````markdown
|
|
397
|
+
Convergence is enabled by default. Configure via task-manifest.json:
|
|
398
|
+
|
|
399
|
+
- `convergence.enabled`: true/false (default: true)
|
|
400
|
+
- `convergence.maxRounds`: 1–3 — **phase-aware default**: `1` for `requirements-discovery`, `2` for all other task types
|
|
401
|
+
- `convergence.verificationMode`: `"lightweight"` | `"full-reanalysis"` (default: `"lightweight"`)
|
|
402
|
+
|
|
403
|
+
When `task-manifest.json` does not set `convergence.maxRounds`, lead MUST resolve the effective value via the phase-aware default above before entering Phase 5.5, and record the resolved value in the convergence state artifact at `config.effectiveMaxRounds`.
|
|
404
|
+
|
|
405
|
+
**Round 2 is gated, not unconditional.** Even when `effectiveMaxRounds == 2`, Round 2 runs only when (a) the verification queue is non-empty after Round 1, AND (b) at least one Round 1 reverify dispatch terminated as `completed`. Otherwise lead writes `round2SkippedReason` to the convergence state and proceeds to final classification. See [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Round 2 gate" for the predicate.
|
|
406
|
+
|
|
407
|
+
**Confirmed findings are pruned from the queue.** Findings classified as `full-consensus`, `partial-consensus`, or `worker-unique` MUST NOT appear in any subsequent round's reverify prompt for any worker. `contested` is a final classification assigned only when the last executed round completes and the queue is still non-empty — it is NEVER an intermediate queue label.
|
|
408
|
+
|
|
409
|
+
If any re-verification batch yields a `verification-error` terminal status, or a worker result fails the contract, Lead MUST record one event per violation via `python3 scripts/okstra-error-log.py append-observed --error-type contract-violation --agent <offending-agent> ...`. Use `agent: "claude-lead"` only when the violation is detected internally without a specific worker.
|
|
410
|
+
|
|
411
|
+
If convergence is disabled, proceed directly to Phase 6 with the raw worker results.
|
|
412
|
+
````
|
|
413
|
+
|
|
414
|
+
- [ ] **Step 2: Add 2 new rows to the "Common Mistakes" table (around lines 264-286)**
|
|
415
|
+
|
|
416
|
+
Add the following two rows directly above the last row of the existing table (`Skipping --substitute-final-report ...`).
|
|
417
|
+
|
|
418
|
+
```markdown
|
|
419
|
+
| Re-sending confirmed findings (`full-consensus`/`partial-consensus`/`worker-unique`) to a worker in Round 2 | Queue pruning rule — see [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Round 1-N: Re-verification Loop (queue-pruned)" |
|
|
420
|
+
| Aggregating a `timeout`/`error` reverify dispatch as `DISAGREE` | Worker failure handling — record as `verification-error` and add to `skippedWorkers[]`. See [okstra-convergence](./skills/okstra-convergence/SKILL.md) "Worker failure handling in reverify" |
|
|
421
|
+
```
|
|
422
|
+
|
|
423
|
+
- [ ] **Step 3: Commit**
|
|
424
|
+
|
|
425
|
+
```bash
|
|
426
|
+
git add agents/SKILL.md
|
|
427
|
+
git commit -m "docs(agents): Phase 5.5 Round 2 gate and queue-pruning callouts
|
|
428
|
+
|
|
429
|
+
Spells out the Round 2 gate conditions (effectiveMaxRounds>=2, queue
|
|
430
|
+
non-empty, at least one round-1 completed dispatch), records the
|
|
431
|
+
queue-pruning invariant, and adds two new Common Mistakes rows for
|
|
432
|
+
re-sending confirmed findings and mis-aggregating worker-failure dispatches."
|
|
433
|
+
```
|
|
434
|
+
|
|
435
|
+
---
|
|
436
|
+
|
|
437
|
+
## Task 6: report-writer SKILL — reflect round history & skipped-reason
|
|
438
|
+
|
|
439
|
+
**Files:**
|
|
440
|
+
- Modify: `skills/okstra-report-writer/SKILL.md` (Phase 6 dispatch template + Main Body Section)
|
|
441
|
+
|
|
442
|
+
- [ ] **Step 1: Reinforce item 9 of the Phase 6 dispatch template**
|
|
443
|
+
|
|
444
|
+
Existing (around line 49):
|
|
445
|
+
|
|
446
|
+
```markdown
|
|
447
|
+
9. The convergence classifications (Full/Partial/Contested/Worker-Unique) and pointers to all worker result files under `worker-results/`.
|
|
448
|
+
```
|
|
449
|
+
|
|
450
|
+
Replace with:
|
|
451
|
+
|
|
452
|
+
```markdown
|
|
453
|
+
9. The convergence classifications (Full/Partial/Contested/Worker-Unique), the round history table (`roundHistory[]`), the `round2SkippedReason` value, and pointers to all worker result files under `worker-results/`. The report-writer worker must reproduce a Round History sub-table in Section 1 of the final report so the reader can see which rounds executed, queue sizes, and why Round 2 was (or was not) skipped.
|
|
454
|
+
```
|
|
455
|
+
|
|
456
|
+
- [ ] **Step 2: Reinforce the description of Main Body Section 2 ("Cross Verification Results")**
|
|
457
|
+
|
|
458
|
+
Existing (around lines 228-233):
|
|
459
|
+
|
|
460
|
+
```markdown
|
|
461
|
+
2. **Cross Verification Results** (Use 4 categories when convergence is enabled, per `okstra-convergence`)
|
|
462
|
+
- Full Consensus: Findings agreed upon by all workers
|
|
463
|
+
- Partial Consensus: Agreed upon by a majority of workers; dissenting opinions are specified
|
|
464
|
+
- Contested: No consensus after max rounds; each worker's position specified
|
|
465
|
+
- Worker-Unique: Verified only by the discoverer; verification history specified
|
|
466
|
+
- In runs with convergence disabled, maintain the existing Consensus/Differences format
|
|
467
|
+
```
|
|
468
|
+
|
|
469
|
+
Replace with:
|
|
470
|
+
|
|
471
|
+
```markdown
|
|
472
|
+
2. **Cross Verification Results** (Use 4 categories when convergence is enabled, per `okstra-convergence`)
|
|
473
|
+
- Round History sub-table (convergence-enabled runs only): one row per executed round with columns `Round | inputQueueSize | resolvedCount | carriedForwardCount | dispatches (worker:status:durationMs) | skippedWorkers (worker:reason)`. Add a one-line note immediately under the table with `round2SkippedReason: <value>` (always present, even when `"not-skipped"`). Pull all values verbatim from `convergence-<task-type>-<seq>.json`.
|
|
474
|
+
- Full Consensus: Findings agreed upon by all workers
|
|
475
|
+
- Partial Consensus: Agreed upon by a majority of workers; dissenting opinions are specified
|
|
476
|
+
- Contested: No consensus after the last executed round; each worker's position specified. Empty contested list is shown as the literal line `- 합의 미달 항목 없음.`
|
|
477
|
+
- Worker-Unique: Verified only by the discoverer; verification history specified
|
|
478
|
+
- In runs with convergence disabled, maintain the existing Consensus/Differences format and omit the Round History sub-table.
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
- [ ] **Step 3: Strengthen the Writing Guidelines items**
|
|
482
|
+
|
|
483
|
+
Replace the existing "Include the convergence round history and a summary of votes by worker for each finding" line with the following two lines:
|
|
484
|
+
|
|
485
|
+
```markdown
|
|
486
|
+
- Include the convergence round history sub-table (Section 1) so the reader can audit which rounds executed and why Round 2 was skipped. Pull values verbatim from `convergence-<task-type>-<seq>.json`; do NOT recompute.
|
|
487
|
+
- For each finding, include a brief summary of votes per worker across executed rounds. `verification-error` votes are listed as such — never as `DISAGREE`.
|
|
488
|
+
```
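
The per-worker vote summary required by the second guideline could be produced along the lines of the following minimal sketch. It assumes the per-finding `rounds[].votes` shape used by the convergence fixtures in this plan (worker name mapped to an object with a `verdict` field); the function name `votes_per_worker` is hypothetical and not part of okstra. Verdicts are copied through unchanged, so `verification-error` is never rewritten as `disagree`.

```python
from __future__ import annotations


def votes_per_worker(finding: dict) -> dict[str, str]:
    """Return one summary string per worker, e.g. 'R1=agree, R2=disagree'."""
    per_worker: dict[str, list[str]] = {}
    # Walk rounds in ascending order so each summary reads chronologically.
    for entry in sorted(finding["rounds"], key=lambda e: e["round"]):
        for worker, vote in entry["votes"].items():
            # Copy the verdict verbatim; verification-error is not remapped.
            per_worker.setdefault(worker, []).append(
                f"R{entry['round']}={vote['verdict']}"
            )
    return {worker: ", ".join(parts) for worker, parts in per_worker.items()}
```

Keeping the verdict string untouched means the summary cannot silently fold a failed dispatch into the AGREE/DISAGREE counts.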
|
|
489
|
+
|
|
490
|
+
- [ ] **Step 4: Commit**
|
|
491
|
+
|
|
492
|
+
```bash
|
|
493
|
+
git add skills/okstra-report-writer/SKILL.md
|
|
494
|
+
git commit -m "docs(report-writer): require Round History sub-table in final report
|
|
495
|
+
|
|
496
|
+
The report-writer prompt template now demands a Round History sub-table with
|
|
497
|
+
queue sizes, dispatches, and skippedWorkers, plus an explicit
|
|
498
|
+
round2SkippedReason line. Writing guidelines mandate verbatim copy from the
|
|
499
|
+
convergence state and distinguish verification-error votes from DISAGREE."
|
|
500
|
+
```
|
|
501
|
+
|
|
502
|
+
---
|
|
503
|
+
|
|
504
|
+
## Task 7: report-writer-worker agent — required-reading + authoring contract
|
|
505
|
+
|
|
506
|
+
**Files:**
|
|
507
|
+
- Modify: `agents/workers/report-writer-worker.md`
|
|
508
|
+
|
|
509
|
+
- [ ] **Step 1: Add an explicit obligation to the "Required Reading Before Authoring" section**
|
|
510
|
+
|
|
511
|
+
Add the bullet below after the first paragraph of the existing section (around lines 44-52):
|
|
512
|
+
|
|
513
|
+
```markdown
|
|
514
|
+
- When the convergence-state file is present, read it fully and reproduce the `roundHistory[]` array, `round2SkippedReason`, and `finalClassificationCounts` in the final report's Section 1 Round History sub-table. Do not derive these values from worker results alone — they live in `state/convergence-<task-type>-<seq>.json`.
|
|
515
|
+
```
|
|
516
|
+
|
|
517
|
+
- [ ] **Step 2: Strengthen the Authoring Contract hard rules**
|
|
518
|
+
|
|
519
|
+
Keep the existing "Include all four convergence categories (Full Consensus, Partial Consensus, Contested, Worker-Unique)..." line as is, and add the following on the next line:
|
|
520
|
+
|
|
521
|
+
```markdown
|
|
522
|
+
- Include a Round History sub-table in Section 1 (one row per executed round) and a `round2SkippedReason` line below it. When convergence is disabled, omit both. The values are quoted verbatim from `state/convergence-<task-type>-<seq>.json` — do not recompute.
|
|
523
|
+
- Treat `verification-error` votes as their own verdict. They are listed in vote summaries as `verification-error`, not folded into AGREE/DISAGREE counts.
|
|
524
|
+
```
|
|
525
|
+
|
|
526
|
+
- [ ] **Step 3: Strengthen the Notes section**
|
|
527
|
+
|
|
528
|
+
Add the following directly below the existing "If the analysis workers disagree and convergence ended with `Contested` items..." line:
|
|
529
|
+
|
|
530
|
+
```markdown
|
|
531
|
+
- `Contested` is a final-only classification. If you see findings labeled `Contested` in the convergence state, the lead has already exhausted re-verification — do not invent a synthesizing answer; surface each worker's position verbatim.
|
|
532
|
+
```
|
|
533
|
+
|
|
534
|
+
- [ ] **Step 4: Commit**
|
|
535
|
+
|
|
536
|
+
```bash
|
|
537
|
+
git add agents/workers/report-writer-worker.md
|
|
538
|
+
git commit -m "docs(report-writer-worker): require Round History reproduction
|
|
539
|
+
|
|
540
|
+
Forces the report-writer subagent to read the convergence state file
|
|
541
|
+
end-to-end, reproduce roundHistory[]/round2SkippedReason/finalClassificationCounts
|
|
542
|
+
in Section 1, and treat verification-error as its own verdict (not DISAGREE)."
|
|
543
|
+
```
|
|
544
|
+
|
|
545
|
+
---
|
|
546
|
+
|
|
547
|
+
## Task 8: Add a Round History sub-section to the final-report template
|
|
548
|
+
|
|
549
|
+
**Files:**
|
|
550
|
+
- Modify: `templates/reports/final-report.template.md` (Section 1 Cross Verification Results area)
|
|
551
|
+
|
|
552
|
+
- [ ] **Step 1: Check the current Section 1 structure**
|
|
553
|
+
|
|
554
|
+
Run: `grep -n "^## 1\|^### 1\." /Volumes/Workspaces/workspace/projects/Okstra/templates/reports/final-report.template.md`
|
|
555
|
+
Expected: lines `## 1. Cross Verification Results`, `### 1.1 Consensus`, `### 1.2 Differences`.
|
|
556
|
+
|
|
557
|
+
- [ ] **Step 2: Insert a new `### 1.0 Round History (convergence-enabled runs only)` subsection directly below the `## 1. Cross Verification Results` header (that is, above `### 1.1 Consensus`)**
|
|
558
|
+
|
|
559
|
+
```markdown
|
|
560
|
+
### 1.0 Round History (convergence-enabled runs only)
|
|
561
|
+
|
|
562
|
+
Copy the values verbatim from `state/convergence-<task-type>-<seq>.json`. In runs with convergence disabled, delete this entire section.
|
|
563
|
+
|
|
564
|
+
| Round | inputQueueSize | resolvedCount | carriedForwardCount | dispatches (worker:status:durationMs) | skippedWorkers (worker:reason) |
|
|
565
|
+
|-------|----------------|---------------|----------------------|----------------------------------------|---------------------------------|
|
|
566
|
+
| 1 | 3 | 2 | 1 | codex-worker:completed:184221, gemini-worker:completed:201337 | claude-worker:no-items |
|
|
567
|
+
| 2 | 1 | 1 | 0 | claude-worker:completed:92110 | -- |
|
|
568
|
+
|
|
569
|
+
- `round2SkippedReason`: `not-skipped` ← the value is one of `queue-empty | max-rounds-1 | all-reverify-non-result | not-skipped`.
|
|
570
|
+
- If the number of executed rounds is 0 (every finding reached full-consensus directly in Round 0), write a single line instead of the table: `- Round 0 grouping에서 모든 finding이 합의되어 재검증 라운드가 실행되지 않았습니다.`
|
|
571
|
+
|
|
572
|
+
```
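
As an illustration of how the sub-table above maps onto the convergence state, the following sketch renders the table from a loaded state dict. It is a minimal sketch under assumptions, not okstra code: the function name `render_round_history`, the short divider row, and the default wording of `zero_round_line` are hypothetical, and only the fields shown in the template (`roundHistory[]`, `dispatches`, `skippedWorkers`, `round2SkippedReason`) are read.

```python
from __future__ import annotations

HEADER = (
    "| Round | inputQueueSize | resolvedCount | carriedForwardCount "
    "| dispatches (worker:status:durationMs) | skippedWorkers (worker:reason) |"
)
DIVIDER = "|---|---|---|---|---|---|"


def render_round_history(
    state: dict,
    zero_round_line: str = "- No re-verification round was executed.",
) -> str:
    """Render the Round History sub-table from a convergence state dict."""
    rounds = state["roundHistory"]
    if not rounds:
        # Zero executed rounds: emit the single replacement line, no table.
        return zero_round_line
    lines = [HEADER, DIVIDER]
    for r in rounds:
        # Values are copied verbatim from the state; nothing is recomputed.
        dispatches = ", ".join(
            f"{d['worker']}:{d['status']}:{d['durationMs']}" for d in r["dispatches"]
        )
        skipped = ", ".join(f"{s['worker']}:{s['reason']}" for s in r["skippedWorkers"])
        lines.append(
            f"| {r['round']} | {r['inputQueueSize']} | {r['resolvedCount']} "
            f"| {r['carriedForwardCount']} | {dispatches or '--'} | {skipped or '--'} |"
        )
    # The skip-reason line is always present, even for "not-skipped".
    lines += ["", f"- `round2SkippedReason`: `{state['round2SkippedReason']}`"]
    return "\n".join(lines)
```

Empty `dispatches` or `skippedWorkers` lists are shown as `--`, matching the placeholder used in the template's example row.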
|
|
573
|
+
|
|
574
|
+
- [ ] **Step 3: Commit**
|
|
575
|
+
|
|
576
|
+
```bash
|
|
577
|
+
git add templates/reports/final-report.template.md
|
|
578
|
+
git commit -m "docs(template): Section 1.0 Round History sub-table for convergence runs
|
|
579
|
+
|
|
580
|
+
Adds a 1.0 Round History table with queue sizes, dispatches, and
|
|
581
|
+
skippedWorkers, plus an explicit round2SkippedReason line. The block is
|
|
582
|
+
omitted when convergence is disabled. Aligns the template with the
|
|
583
|
+
queue-pruning algorithm contract."
|
|
584
|
+
```
|
|
585
|
+
|
|
586
|
+
---
|
|
587
|
+
|
|
588
|
+
## Task 9: Fixture 1 — early convergence (Round 1 exit)
|
|
589
|
+
|
|
590
|
+
**Files:**
|
|
591
|
+
- Create: `tests/fixtures/convergence/early-exit.json`
|
|
592
|
+
|
|
593
|
+
- [ ] **Step 1: Prepare the directory**
|
|
594
|
+
|
|
595
|
+
```bash
|
|
596
|
+
mkdir -p /Volumes/Workspaces/workspace/projects/Okstra/tests/fixtures/convergence
|
|
597
|
+
```
|
|
598
|
+
|
|
599
|
+
- [ ] **Step 2: Write the file**
|
|
600
|
+
|
|
601
|
+
`tests/fixtures/convergence/early-exit.json`:
|
|
602
|
+
|
|
603
|
+
```json
|
|
604
|
+
{
|
|
605
|
+
"schemaVersion": "1.1",
|
|
606
|
+
"taskKey": "fixture/early-exit",
|
|
607
|
+
"config": {
|
|
608
|
+
"enabled": true,
|
|
609
|
+
"maxRounds": 2,
|
|
610
|
+
"effectiveMaxRounds": 2,
|
|
611
|
+
"verificationMode": "lightweight"
|
|
612
|
+
},
|
|
613
|
+
"findings": [
|
|
614
|
+
{
|
|
615
|
+
"findingId": "F-001",
|
|
616
|
+
"summary": "Missing input validation on /api/login",
|
|
617
|
+
"category": "bug",
|
|
618
|
+
"ticketIds": ["EX-100"],
|
|
619
|
+
"originWorker": "codex-worker",
|
|
620
|
+
"originEvidence": "src/auth/login.ts:42",
|
|
621
|
+
"classification": "full-consensus",
|
|
622
|
+
"rounds": [
|
|
623
|
+
{
|
|
624
|
+
"round": 1,
|
|
625
|
+
"votes": {
|
|
626
|
+
"claude-worker": {"verdict": "agree", "explanation": "confirmed"},
|
|
627
|
+
"gemini-worker": {"verdict": "supplement", "explanation": "also missing CSRF check"}
|
|
628
|
+
}
|
|
629
|
+
}
|
|
630
|
+
],
|
|
631
|
+
"consensusWorkers": ["codex-worker", "claude-worker", "gemini-worker"],
|
|
632
|
+
"dissentingWorkers": []
|
|
633
|
+
},
|
|
634
|
+
{
|
|
635
|
+
"findingId": "F-002",
|
|
636
|
+
"summary": "Stale dependency lockfile",
|
|
637
|
+
"category": "risk",
|
|
638
|
+
"ticketIds": ["EX-101"],
|
|
639
|
+
"originWorker": "claude-worker",
|
|
640
|
+
"originEvidence": "package-lock.json:1",
|
|
641
|
+
"classification": "full-consensus",
|
|
642
|
+
"rounds": [
|
|
643
|
+
{
|
|
644
|
+
"round": 1,
|
|
645
|
+
"votes": {
|
|
646
|
+
"codex-worker": {"verdict": "agree", "explanation": "confirmed"},
|
|
647
|
+
"gemini-worker": {"verdict": "agree", "explanation": "confirmed"}
|
|
648
|
+
}
|
|
649
|
+
}
|
|
650
|
+
],
|
|
651
|
+
"consensusWorkers": ["claude-worker", "codex-worker", "gemini-worker"],
|
|
652
|
+
"dissentingWorkers": []
|
|
653
|
+
}
|
|
654
|
+
],
|
|
655
|
+
"roundHistory": [
|
|
656
|
+
{
|
|
657
|
+
"round": 1,
|
|
658
|
+
"inputQueueSize": 2,
|
|
659
|
+
"resolvedCount": 2,
|
|
660
|
+
"carriedForwardCount": 0,
|
|
661
|
+
"dispatches": [
|
|
662
|
+
{"worker": "claude-worker", "status": "completed", "durationMs": 92110},
|
|
663
|
+
{"worker": "codex-worker", "status": "completed", "durationMs": 184221},
|
|
664
|
+
{"worker": "gemini-worker", "status": "completed", "durationMs": 201337}
|
|
665
|
+
],
|
|
666
|
+
"skippedWorkers": [],
|
|
667
|
+
"verificationsRequested": 3,
|
|
668
|
+
"verificationsCompleted": 3,
|
|
669
|
+
"newConsensus": 2,
|
|
670
|
+
"remainingInQueue": 0,
|
|
671
|
+
"earlyExit": true
|
|
672
|
+
}
|
|
673
|
+
],
|
|
674
|
+
"round2SkippedReason": "queue-empty",
|
|
675
|
+
"finalState": "converged",
|
|
676
|
+
"totalRounds": 1,
|
|
677
|
+
"finalClassificationCounts": {
|
|
678
|
+
"fullConsensus": 2,
|
|
679
|
+
"partialConsensus": 0,
|
|
680
|
+
"contested": 0,
|
|
681
|
+
"workerUnique": 0
|
|
682
|
+
},
|
|
683
|
+
"summary": {
|
|
684
|
+
"fullConsensus": 2,
|
|
685
|
+
"partialConsensus": 0,
|
|
686
|
+
"contested": 0,
|
|
687
|
+
"workerUnique": 0
|
|
688
|
+
}
|
|
689
|
+
}
|
|
690
|
+
```
|
|
691
|
+
|
|
692
|
+
- [ ] **Step 3: Validate the JSON immediately**
|
|
693
|
+
|
|
694
|
+
Run: `python3 -c "import json; json.load(open('tests/fixtures/convergence/early-exit.json'))"`
|
|
695
|
+
Expected: exit code 0, no output.
|
|
696
|
+
|
|
697
|
+
- [ ] **Step 4: Commit**
|
|
698
|
+
|
|
699
|
+
```bash
|
|
700
|
+
git add tests/fixtures/convergence/early-exit.json
|
|
701
|
+
git commit -m "test(convergence): fixture for Round 1 early-exit (queue-empty)
|
|
702
|
+
|
|
703
|
+
Two findings reach full-consensus in Round 1; queue empties so Round 2 is
|
|
704
|
+
skipped with round2SkippedReason=queue-empty. effectiveMaxRounds=2,
|
|
705
|
+
totalRounds=1."
|
|
706
|
+
```
|
|
707
|
+
|
|
708
|
+
---
|
|
709
|
+
|
|
710
|
+
## Task 10: Fixture 2 — mixed/unresolved Round 2
|
|
711
|
+
|
|
712
|
+
**Files:**
|
|
713
|
+
- Create: `tests/fixtures/convergence/mixed-round2.json`
|
|
714
|
+
|
|
715
|
+
- [ ] **Step 1: Write the file**
|
|
716
|
+
|
|
717
|
+
`tests/fixtures/convergence/mixed-round2.json`:
|
|
718
|
+
|
|
719
|
+
```json
|
|
720
|
+
{
|
|
721
|
+
"schemaVersion": "1.1",
|
|
722
|
+
"taskKey": "fixture/mixed-round2",
|
|
723
|
+
"config": {
|
|
724
|
+
"enabled": true,
|
|
725
|
+
"maxRounds": 2,
|
|
726
|
+
"effectiveMaxRounds": 2,
|
|
727
|
+
"verificationMode": "lightweight"
|
|
728
|
+
},
|
|
729
|
+
"findings": [
|
|
730
|
+
{
|
|
731
|
+
"findingId": "F-010",
|
|
732
|
+
"summary": "Race condition in session refresh",
|
|
733
|
+
"category": "bug",
|
|
734
|
+
"ticketIds": ["EX-200"],
|
|
735
|
+
"originWorker": "codex-worker",
|
|
736
|
+
"originEvidence": "src/session/refresh.ts:88",
|
|
737
|
+
"classification": "full-consensus",
|
|
738
|
+
"rounds": [
|
|
739
|
+
{
|
|
740
|
+
"round": 1,
|
|
741
|
+
"votes": {
|
|
742
|
+
"claude-worker": {"verdict": "agree", "explanation": "confirmed"},
|
|
743
|
+
"gemini-worker": {"verdict": "supplement", "explanation": "additional repro path"}
|
|
744
|
+
}
|
|
745
|
+
}
|
|
746
|
+
],
|
|
747
|
+
"consensusWorkers": ["codex-worker", "claude-worker", "gemini-worker"],
|
|
748
|
+
"dissentingWorkers": []
|
|
749
|
+
},
|
|
750
|
+
{
|
|
751
|
+
"findingId": "F-011",
|
|
752
|
+
"summary": "Inconsistent error message on 401",
|
|
753
|
+
"category": "observation",
|
|
754
|
+
"ticketIds": ["EX-201"],
|
|
755
|
+
"originWorker": "claude-worker",
|
|
756
|
+
"originEvidence": "src/api/auth.ts:120",
|
|
757
|
+
"classification": "contested",
|
|
758
|
+
"rounds": [
|
|
759
|
+
{
|
|
760
|
+
"round": 1,
|
|
761
|
+
"votes": {
|
|
762
|
+
"codex-worker": {"verdict": "agree", "explanation": "confirmed"},
|
|
763
|
+
"gemini-worker": {"verdict": "disagree", "explanation": "expected behavior per spec"}
|
|
764
|
+
}
|
|
765
|
+
},
|
|
766
|
+
{
|
|
767
|
+
"round": 2,
|
|
768
|
+
"votes": {
|
|
769
|
+
"codex-worker": {"verdict": "agree", "explanation": "still confirmed"},
|
|
770
|
+
"gemini-worker": {"verdict": "disagree", "explanation": "see RFC 7235"}
|
|
771
|
+
}
|
|
772
|
+
}
|
|
773
|
+
],
|
|
774
|
+
"consensusWorkers": ["claude-worker", "codex-worker"],
|
|
775
|
+
"dissentingWorkers": ["gemini-worker"]
|
|
776
|
+
}
|
|
777
|
+
],
|
|
778
|
+
"roundHistory": [
|
|
779
|
+
{
|
|
780
|
+
"round": 1,
|
|
781
|
+
"inputQueueSize": 2,
|
|
782
|
+
"resolvedCount": 1,
|
|
783
|
+
"carriedForwardCount": 1,
|
|
784
|
+
"dispatches": [
|
|
785
|
+
{"worker": "claude-worker", "status": "completed", "durationMs": 88012},
|
|
786
|
+
{"worker": "codex-worker", "status": "completed", "durationMs": 175044},
|
|
787
|
+
{"worker": "gemini-worker", "status": "completed", "durationMs": 199820}
|
|
788
|
+
],
|
|
789
|
+
"skippedWorkers": [],
|
|
790
|
+
"verificationsRequested": 3,
|
|
791
|
+
"verificationsCompleted": 3,
|
|
792
|
+
"newConsensus": 1,
|
|
793
|
+
"remainingInQueue": 1,
|
|
794
|
+
"earlyExit": false
|
|
795
|
+
},
|
|
796
|
+
{
|
|
797
|
+
"round": 2,
|
|
798
|
+
"inputQueueSize": 1,
|
|
799
|
+
"resolvedCount": 0,
|
|
800
|
+
"carriedForwardCount": 1,
|
|
801
|
+
"dispatches": [
|
|
802
|
+
{"worker": "codex-worker", "status": "completed", "durationMs": 110002},
|
|
803
|
+
{"worker": "gemini-worker", "status": "completed", "durationMs": 125110}
|
|
804
|
+
],
|
|
805
|
+
"skippedWorkers": [
|
|
806
|
+
{"worker": "claude-worker", "reason": "no items to verify"}
|
|
807
|
+
],
|
|
808
|
+
"verificationsRequested": 2,
|
|
809
|
+
"verificationsCompleted": 2,
|
|
810
|
+
"newConsensus": 0,
|
|
811
|
+
"remainingInQueue": 1,
|
|
812
|
+
"earlyExit": false
|
|
813
|
+
}
|
|
814
|
+
],
|
|
815
|
+
"round2SkippedReason": "not-skipped",
|
|
816
|
+
"finalState": "max-rounds-reached",
|
|
817
|
+
"totalRounds": 2,
|
|
818
|
+
"finalClassificationCounts": {
|
|
819
|
+
"fullConsensus": 1,
|
|
820
|
+
"partialConsensus": 0,
|
|
821
|
+
"contested": 1,
|
|
822
|
+
"workerUnique": 0
|
|
823
|
+
},
|
|
824
|
+
"summary": {
|
|
825
|
+
"fullConsensus": 1,
|
|
826
|
+
"partialConsensus": 0,
|
|
827
|
+
"contested": 1,
|
|
828
|
+
"workerUnique": 0
|
|
829
|
+
}
|
|
830
|
+
}
|
|
831
|
+
```
|
|
832
|
+
|
|
833
|
+
- [ ] **Step 2: Validate the JSON**
|
|
834
|
+
|
|
835
|
+
Run: `python3 -c "import json; json.load(open('tests/fixtures/convergence/mixed-round2.json'))"`
|
|
836
|
+
Expected: exit code 0.
|
|
837
|
+
|
|
838
|
+
- [ ] **Step 3: Verify the key invariants by hand**
|
|
839
|
+
|
|
840
|
+
The Round 2 input queue (`inputQueueSize=1`) must match the Round 1 output queue (`carriedForwardCount=1`). In addition, the `rounds` array of `F-010` (confirmed as full-consensus in Round 1) must contain no round 2 entry; confirm that the fixture reflects the invariant that a confirmed finding never re-enters a prompt.
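
The hand check above can also be expressed as a small script. This is a sketch only; the helper names `carry_forward_breaks` and `rounds_voted` are hypothetical and operate on the state dict as laid out in the fixture.

```python
from __future__ import annotations


def carry_forward_breaks(state: dict) -> list[str]:
    """List every round whose input queue differs from the previous carry-forward."""
    problems: list[str] = []
    history = state["roundHistory"]
    # Compare each round with its immediate predecessor.
    for prev, curr in zip(history, history[1:]):
        if curr["inputQueueSize"] != prev["carriedForwardCount"]:
            problems.append(
                f"round {curr['round']}: input {curr['inputQueueSize']} "
                f"!= round {prev['round']} carry {prev['carriedForwardCount']}"
            )
    return problems


def rounds_voted(state: dict, finding_id: str) -> list[int]:
    """Return the round numbers recorded in one finding's vote ledger."""
    for finding in state["findings"]:
        if finding["findingId"] == finding_id:
            return [entry["round"] for entry in finding["rounds"]]
    raise KeyError(finding_id)
```

Loading `tests/fixtures/convergence/mixed-round2.json` with `json.load` and passing the result to these helpers should yield an empty list from `carry_forward_breaks`, `[1]` for `F-010`, and `[1, 2]` for `F-011`.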
|
|
841
|
+
|
|
842
|
+
- [ ] **Step 4: Commit**
|
|
843
|
+
|
|
844
|
+
```bash
|
|
845
|
+
git add tests/fixtures/convergence/mixed-round2.json
|
|
846
|
+
git commit -m "test(convergence): fixture for mixed Round 1 → unresolved Round 2
|
|
847
|
+
|
|
848
|
+
F-010 resolves in Round 1; F-011 remains in queue (claude+codex agree,
|
|
849
|
+
gemini disagrees), enters Round 2 which still cannot resolve it →
|
|
850
|
+
classified as contested. claude-worker is skipped in Round 2 because the
|
|
851
|
+
only queued item is its own. round2SkippedReason=not-skipped."
|
|
852
|
+
```
|
|
853
|
+
|
|
854
|
+
---
|
|
855
|
+
|
|
856
|
+
## Task 11: Fixture 3 — worker-failure (all reverify non-result)
|
|
857
|
+
|
|
858
|
+
**Files:**
|
|
859
|
+
- Create: `tests/fixtures/convergence/reverify-all-failed.json`
|
|
860
|
+
|
|
861
|
+
- [ ] **Step 1: Write the file**
|
|
862
|
+
|
|
863
|
+
`tests/fixtures/convergence/reverify-all-failed.json`:
|
|
864
|
+
|
|
865
|
+
```json
|
|
866
|
+
{
|
|
867
|
+
"schemaVersion": "1.1",
|
|
868
|
+
"taskKey": "fixture/reverify-all-failed",
|
|
869
|
+
"config": {
|
|
870
|
+
"enabled": true,
|
|
871
|
+
"maxRounds": 2,
|
|
872
|
+
"effectiveMaxRounds": 2,
|
|
873
|
+
"verificationMode": "lightweight"
|
|
874
|
+
},
|
|
875
|
+
"findings": [
|
|
876
|
+
{
|
|
877
|
+
"findingId": "F-020",
|
|
878
|
+
"summary": "Possible memory leak in long-running session",
|
|
879
|
+
"category": "risk",
|
|
880
|
+
"ticketIds": ["EX-300"],
|
|
881
|
+
"originWorker": "codex-worker",
|
|
882
|
+
"originEvidence": "src/session/cache.ts:200",
|
|
883
|
+
"classification": "contested",
|
|
884
|
+
"rounds": [
|
|
885
|
+
{
|
|
886
|
+
"round": 1,
|
|
887
|
+
"votes": {
|
|
888
|
+
"claude-worker": {"verdict": "verification-error", "explanation": "dispatch timeout after 900s"},
|
|
889
|
+
"gemini-worker": {"verdict": "verification-error", "explanation": "CLI exit 137 (OOM)"}
|
|
890
|
+
}
|
|
891
|
+
}
|
|
892
|
+
],
|
|
893
|
+
"consensusWorkers": ["codex-worker"],
|
|
894
|
+
"dissentingWorkers": []
|
|
895
|
+
}
|
|
896
|
+
],
|
|
897
|
+
"roundHistory": [
|
|
898
|
+
{
|
|
899
|
+
"round": 1,
|
|
900
|
+
"inputQueueSize": 1,
|
|
901
|
+
"resolvedCount": 0,
|
|
902
|
+
"carriedForwardCount": 1,
|
|
903
|
+
"dispatches": [
|
|
904
|
+
{"worker": "claude-worker", "status": "timeout", "durationMs": 900000},
|
|
905
|
+
{"worker": "gemini-worker", "status": "error", "durationMs": 14210}
|
|
906
|
+
],
|
|
907
|
+
"skippedWorkers": [
|
|
908
|
+
{"worker": "claude-worker", "reason": "dispatch-non-result", "terminalStatus": "timeout"},
|
|
909
|
+
{"worker": "gemini-worker", "reason": "dispatch-non-result", "terminalStatus": "error"}
|
|
910
|
+
],
|
|
911
|
+
"verificationsRequested": 2,
|
|
912
|
+
"verificationsCompleted": 0,
|
|
913
|
+
"newConsensus": 0,
|
|
914
|
+
"remainingInQueue": 1,
|
|
915
|
+
"earlyExit": false
|
|
916
|
+
}
|
|
917
|
+
],
|
|
918
|
+
"round2SkippedReason": "all-reverify-non-result",
|
|
919
|
+
"finalState": "aborted-non-result",
|
|
920
|
+
"totalRounds": 1,
|
|
921
|
+
"finalClassificationCounts": {
|
|
922
|
+
"fullConsensus": 0,
|
|
923
|
+
"partialConsensus": 0,
|
|
924
|
+
"contested": 1,
|
|
925
|
+
"workerUnique": 0
|
|
926
|
+
},
|
|
927
|
+
"summary": {
|
|
928
|
+
"fullConsensus": 0,
|
|
929
|
+
"partialConsensus": 0,
|
|
930
|
+
"contested": 1,
|
|
931
|
+
"workerUnique": 0
|
|
932
|
+
}
|
|
933
|
+
}
|
|
934
|
+
```
|
|
935
|
+
|
|
936
|
+
- [ ] **Step 2: Validate the JSON**
|
|
937
|
+
|
|
938
|
+
Run: `python3 -c "import json; json.load(open('tests/fixtures/convergence/reverify-all-failed.json'))"`
|
|
939
|
+
Expected: exit code 0.
|
|
940
|
+
|
|
941
|
+
- [ ] **Step 3: Commit**
|
|
942
|
+
|
|
943
|
+
```bash
|
|
944
|
+
git add tests/fixtures/convergence/reverify-all-failed.json
|
|
945
|
+
git commit -m "test(convergence): fixture for all-reverify-non-result Round 1 abort
|
|
946
|
+
|
|
947
|
+
Single finding remains unresolved because both reverify dispatches return
|
|
948
|
+
terminal non-result (timeout + error). Round 2 is suppressed with
|
|
949
|
+
round2SkippedReason=all-reverify-non-result; finalState=aborted-non-result.
|
|
950
|
+
Verdicts are recorded as verification-error, not DISAGREE."
|
|
951
|
+
```
|
|
952
|
+
|
|
953
|
+
---
|
|
954
|
+
|
|
955
|
+
## Task 12: Convergence schema contract test (pytest)
|
|
956
|
+
|
|
957
|
+
**Files:**
|
|
958
|
+
- Create: `tests/test_convergence_state_contract.py`
|
|
959
|
+
|
|
960
|
+
This task writes code, so it proceeds with TDD. The fixtures are assumed to already exist from Tasks 9-11.
|
|
961
|
+
|
|
962
|
+
- [ ] **Step 1: Write the first failing test (schemaVersion check)**
|
|
963
|
+
|
|
964
|
+
`tests/test_convergence_state_contract.py`:
|
|
965
|
+
|
|
966
|
+
```python
|
|
967
|
+
"""Contract tests for convergence-<task-type>-<seq>.json (schema v1.1).
|
|
968
|
+
|
|
969
|
+
Convergence state is a documentation contract — no production code reads it.
|
|
970
|
+
These tests check that fixtures shipped under tests/fixtures/convergence/
|
|
971
|
+
respect the v1.1 invariants documented in
|
|
972
|
+
skills/okstra-convergence/SKILL.md "Convergence State Artifact".
|
|
973
|
+
"""
|
|
974
|
+
from __future__ import annotations
|
|
975
|
+
|
|
976
|
+
import json
|
|
977
|
+
from pathlib import Path
|
|
978
|
+
|
|
979
|
+
import pytest
|
|
980
|
+
|
|
981
|
+
FIXTURE_DIR = Path(__file__).parent / "fixtures" / "convergence"
|
|
982
|
+
ALL_FIXTURES = sorted(FIXTURE_DIR.glob("*.json"))
|
|
983
|
+
|
|
984
|
+
VALID_ROUND2_SKIP_REASONS = {
|
|
985
|
+
"queue-empty",
|
|
986
|
+
"max-rounds-1",
|
|
987
|
+
"all-reverify-non-result",
|
|
988
|
+
"not-skipped",
|
|
989
|
+
}
|
|
990
|
+
VALID_FINAL_STATES = {"converged", "max-rounds-reached", "aborted-non-result"}
|
|
991
|
+
VALID_CLASSIFICATIONS = {
|
|
992
|
+
"full-consensus",
|
|
993
|
+
"partial-consensus",
|
|
994
|
+
"contested",
|
|
995
|
+
"worker-unique",
|
|
996
|
+
}
|
|
997
|
+
VALID_DISPATCH_STATUSES = {"completed", "timeout", "error", "not-run"}
|
|
998
|
+
VALID_VERDICTS = {"agree", "disagree", "supplement", "verification-error"}
|
|
999
|
+
|
|
1000
|
+
|
|
1001
|
+
@pytest.fixture(params=ALL_FIXTURES, ids=lambda p: p.stem)
|
|
1002
|
+
def fixture(request) -> dict:
|
|
1003
|
+
return json.loads(request.param.read_text())
|
|
1004
|
+
|
|
1005
|
+
|
|
1006
|
+
def test_schema_version_is_1_1(fixture):
|
|
1007
|
+
assert fixture["schemaVersion"] == "1.1"
|
|
1008
|
+
```
|
|
1009
|
+
|
|
1010
|
+
- [ ] **Step 2: Run the first test (either fail or pass is a valid outcome at this step)**
|
|
1011
|
+
|
|
1012
|
+
Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_convergence_state_contract.py -v`
|
|
1013
|
+
Expected: 3 fixtures × 1 test = 3 PASS (because Tasks 9-11 already wrote `"schemaVersion": "1.1"`). Otherwise the fixtures are at fault; fix the fixtures first.
|
|
1014
|
+
|
|
1015
|
+
- [ ] **Step 3: Add config / round2SkippedReason / finalState tests**
|
|
1016
|
+
|
|
1017
|
+
Append the following to the end of `tests/test_convergence_state_contract.py`:
|
|
1018
|
+
|
|
1019
|
+
```python
|
|
1020
|
+
def test_effective_max_rounds_is_one_to_three(fixture):
|
|
1021
|
+
e = fixture["config"]["effectiveMaxRounds"]
|
|
1022
|
+
assert isinstance(e, int) and 1 <= e <= 3
|
|
1023
|
+
|
|
1024
|
+
|
|
1025
|
+
def test_round2_skipped_reason_is_enum(fixture):
|
|
1026
|
+
assert fixture["round2SkippedReason"] in VALID_ROUND2_SKIP_REASONS
|
|
1027
|
+
|
|
1028
|
+
|
|
1029
|
+
def test_final_state_is_enum(fixture):
|
|
1030
|
+
assert fixture["finalState"] in VALID_FINAL_STATES
|
|
1031
|
+
|
|
1032
|
+
|
|
1033
|
+
def test_final_classification_counts_keys_present(fixture):
|
|
1034
|
+
counts = fixture["finalClassificationCounts"]
|
|
1035
|
+
assert set(counts.keys()) == {
|
|
1036
|
+
"fullConsensus",
|
|
1037
|
+
"partialConsensus",
|
|
1038
|
+
"contested",
|
|
1039
|
+
"workerUnique",
|
|
1040
|
+
}
|
|
1041
|
+
for v in counts.values():
|
|
1042
|
+
assert isinstance(v, int) and v >= 0
|
|
1043
|
+
```
|
|
1044
|
+
|
|
1045
|
+
- [ ] **Step 4: Add per-round invariant tests**
|
|
1046
|
+
|
|
1047
|
+
```python
|
|
1048
|
+
def test_round_arithmetic_consistent(fixture):
|
|
1049
|
+
"""inputQueueSize == resolvedCount + carriedForwardCount."""
|
|
1050
|
+
for r in fixture["roundHistory"]:
|
|
1051
|
+
assert r["inputQueueSize"] == r["resolvedCount"] + r["carriedForwardCount"], (
|
|
1052
|
+
f"round {r['round']}: {r['inputQueueSize']} != {r['resolvedCount']} + {r['carriedForwardCount']}"
|
|
1053
|
+
)
|
|
1054
|
+
|
|
1055
|
+
|
|
1056
|
+
def test_round_input_matches_previous_carry_forward(fixture):
|
|
1057
|
+
"""Round N+1 inputQueueSize equals Round N carriedForwardCount."""
|
|
1058
|
+
rounds = fixture["roundHistory"]
|
|
1059
|
+
for prev, curr in zip(rounds, rounds[1:]):
|
|
1060
|
+
assert curr["inputQueueSize"] == prev["carriedForwardCount"], (
|
|
1061
|
+
f"round {curr['round']} input {curr['inputQueueSize']} "
|
|
1062
|
+
f"!= round {prev['round']} carry {prev['carriedForwardCount']}"
|
|
1063
|
+
)
|
|
1064
|
+
|
|
1065
|
+
|
|
1066
|
+
def test_dispatch_statuses_are_terminal(fixture):
|
|
1067
|
+
for r in fixture["roundHistory"]:
|
|
1068
|
+
for d in r["dispatches"]:
|
|
1069
|
+
assert d["status"] in VALID_DISPATCH_STATUSES
|
|
1070
|
+
assert isinstance(d["durationMs"], int) and d["durationMs"] >= 0
|
|
1071
|
+
|
|
1072
|
+
|
|
1073
|
+
def test_total_rounds_matches_round_history(fixture):
|
|
1074
|
+
assert fixture["totalRounds"] == len(fixture["roundHistory"])
|
|
1075
|
+
```
|
|
1076
|
+
|
|
1077
|
+
- [ ] **Step 5: Add classification / queue pruning invariant tests**
|
|
1078
|
+
|
|
1079
|
+
```python
|
|
1080
|
+
def test_findings_classifications_are_enum(fixture):
|
|
1081
|
+
for f in fixture["findings"]:
|
|
1082
|
+
assert f["classification"] in VALID_CLASSIFICATIONS
|
|
1083
|
+
|
|
1084
|
+
|
|
1085
|
+
def test_confirmed_findings_dont_reappear_after_classification_round(fixture):
|
|
1086
|
+
"""A finding classified as full/partial/worker-unique must not have a
|
|
1087
|
+
`rounds` entry after the round in which it was resolved.
|
|
1088
|
+
|
|
1089
|
+
The fixture's `rounds[]` array is the per-finding vote ledger. A
|
|
1090
|
+
re-prompt that re-included a confirmed item would surface as extra
|
|
1091
|
+
rounds in this ledger. `contested` items may legitimately appear in
|
|
1092
|
+
every round (queue carry-forward).
|
|
1093
|
+
"""
|
|
1094
|
+
for f in fixture["findings"]:
|
|
1095
|
+
if f["classification"] == "contested":
|
|
1096
|
+
continue
|
|
1097
|
+
if not f["rounds"]:
|
|
1098
|
+
continue
|
|
1099
|
+
# The last round in `rounds[]` is the resolution round. There must
|
|
1100
|
+
# be no entries beyond it. Equivalent to: every classification
|
|
1101
|
+
# other than contested resolves in its last recorded round.
|
|
1102
|
+
resolution_round = f["rounds"][-1]["round"]
|
|
1103
|
+
for r in f["rounds"]:
|
|
1104
|
+
assert r["round"] <= resolution_round
|
|
1105
|
+
|
|
1106
|
+
|
|
1107
|
+
def test_contested_only_when_last_executed_round_done(fixture):
|
|
1108
|
+
"""A finding labeled `contested` must have a `rounds[]` entry whose
|
|
1109
|
+
round index equals `totalRounds` (the last executed round)."""
|
|
1110
|
+
for f in fixture["findings"]:
|
|
1111
|
+
if f["classification"] != "contested":
|
|
1112
|
+
continue
|
|
1113
|
+
last_voted = max(r["round"] for r in f["rounds"])
|
|
1114
|
+
assert last_voted == fixture["totalRounds"], (
|
|
1115
|
+
f"{f['findingId']} contested but last vote in round "
|
|
1116
|
+
f"{last_voted}, while totalRounds={fixture['totalRounds']}"
|
|
1117
|
+
)
|
|
1118
|
+
|
|
1119
|
+
|
|
1120
|
+
def test_verdicts_are_enum(fixture):
|
|
1121
|
+
for f in fixture["findings"]:
|
|
1122
|
+
for r in f["rounds"]:
|
|
1123
|
+
for vote in r["votes"].values():
|
|
1124
|
+
assert vote["verdict"] in VALID_VERDICTS
|
|
1125
|
+
|
|
1126
|
+
|
|
1127
|
+
def test_classification_counts_match_findings(fixture):
|
|
1128
|
+
counts = {"fullConsensus": 0, "partialConsensus": 0, "contested": 0, "workerUnique": 0}
|
|
1129
|
+
mapping = {
|
|
1130
|
+
"full-consensus": "fullConsensus",
|
|
1131
|
+
"partial-consensus": "partialConsensus",
|
|
1132
|
+
"contested": "contested",
|
|
1133
|
+
"worker-unique": "workerUnique",
|
|
1134
|
+
}
|
|
1135
|
+
for f in fixture["findings"]:
|
|
1136
|
+
counts[mapping[f["classification"]]] += 1
|
|
1137
|
+
assert counts == fixture["finalClassificationCounts"]
|
|
1138
|
+
assert counts == fixture["summary"] # legacy alias parity
|
|
1139
|
+
```
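
The `rounds[]` ledger alone does not record which round resolved a finding, so one possible extra cross-check, offered here as a sketch and not as part of the plan, is to require that a non-contested finding's last recorded vote falls in a round whose `resolvedCount` is at least 1. The helper name `pruning_violations` is hypothetical, and the check rests on the assumption that a finding's last voted round is the round that resolved it.

```python
from __future__ import annotations


def pruning_violations(state: dict) -> list[str]:
    """Flag non-contested findings whose last vote lands in a round that resolved nothing."""
    resolved_by_round = {r["round"]: r["resolvedCount"] for r in state["roundHistory"]}
    violations: list[str] = []
    for finding in state["findings"]:
        # Contested items legitimately ride the queue through every round.
        if finding["classification"] == "contested" or not finding["rounds"]:
            continue
        last = max(entry["round"] for entry in finding["rounds"])
        # A confirmed finding re-sent to a later round would show a last vote
        # in a round where the history reports no resolutions.
        if resolved_by_round.get(last, 0) < 1:
            violations.append(
                f"{finding['findingId']}: last vote in round {last}, "
                "but that round resolved nothing"
            )
    return violations
```

This catches a confirmed finding re-sent to a later round only when that round resolved no other item, so it complements the ledger test above and does not replace it.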
|
|
1140
|
+
|
|
1141
|
+
- [ ] **Step 6: Add a round2SkippedReason semantic-consistency test**
|
|
1142
|
+
|
|
1143
|
+
```python
|
|
1144
|
+
def test_skip_reason_consistent_with_round_history(fixture):
|
|
1145
|
+
reason = fixture["round2SkippedReason"]
|
|
1146
|
+
rounds = fixture["roundHistory"]
|
|
1147
|
+
effective_max = fixture["config"]["effectiveMaxRounds"]
|
|
1148
|
+
last_round = rounds[-1]
|
|
1149
|
+
|
|
1150
|
+
if reason == "queue-empty":
|
|
1151
|
+
assert last_round["carriedForwardCount"] == 0
|
|
1152
|
+
assert last_round["round"] < effective_max
|
|
1153
|
+
elif reason == "max-rounds-1":
|
|
1154
|
+
assert effective_max == 1
|
|
1155
|
+
elif reason == "all-reverify-non-result":
|
|
1156
|
+
# every dispatch in the last round was non-completed
|
|
1157
|
+
assert all(d["status"] != "completed" for d in last_round["dispatches"])
|
|
1158
|
+
assert fixture["finalState"] == "aborted-non-result"
|
|
1159
|
+
elif reason == "not-skipped":
|
|
1160
|
+
# round 2 executed (or effective_max == 2 and last round was 2)
|
|
1161
|
+
# AND finalState is converged or max-rounds-reached
|
|
1162
|
+
assert fixture["finalState"] in {"converged", "max-rounds-reached"}
|
|
1163
|
+
```
|
|
1164
|
+
|
|
1165
|
+
- [ ] **Step 7: Run the full test suite**
|
|
1166
|
+
|
|
1167
|
+
Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_convergence_state_contract.py -v`
|
|
1168
|
+
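Expected: 3 fixtures × 15 tests = **45 PASS** (Steps 1 and 3-6 define 15 test functions in total). Any failure means an invariant in a fixture or a test is wrong; suspect the fixture first, since it most likely fails to model the schema v1.1 contract accurately.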
|
|
1169
|
+
|
|
1170
|
+
- [ ] **Step 8: Commit**
|
|
1171
|
+
|
|
1172
|
+
```bash
|
|
1173
|
+
git add tests/test_convergence_state_contract.py
|
|
1174
|
+
git commit -m "test(convergence): contract tests for schema v1.1 invariants
|
|
1175
|
+
|
|
1176
|
+
Parametrized over the three convergence fixtures. Asserts: schemaVersion,
|
|
1177
|
+
effectiveMaxRounds range, round2SkippedReason enum, finalState enum,
|
|
1178
|
+
finalClassificationCounts shape, per-round arithmetic
|
|
1179
|
+
(inputQueueSize == resolvedCount + carriedForwardCount), round carry-forward
|
|
1180
|
+
chain, dispatch statuses, verdict enum, contested-only-at-last-round, and
|
|
1181
|
+
skip-reason / round-history consistency."
|
|
1182
|
+
```
|
|
1183
|
+
|
|
1184
|
+
---
|
|
1185
|
+
|
|
1186
|
+
## Task 13: Baseline measurement helper (`scripts/okstra-convergence-stats.py`)
|
|
1187
|
+
|
|
1188
|
+
**Files:**
|
|
1189
|
+
- Create: `scripts/okstra-convergence-stats.py`
|
|
1190
|
+
- Test: `tests/test_okstra_convergence_stats.py`
|
|
1191
|
+
|
|
1192
|
+
Helper for measuring the effect of P1 (before/after comparison). Written with TDD.
|
|
1193
|
+
|
|
1194
|
+
- [ ] **Step 1: Write the first failing test**
|
|
1195
|
+
|
|
1196
|
+
`tests/test_okstra_convergence_stats.py`:
|
|
1197
|
+
|
|
1198
|
+
```python
"""Tests for scripts/okstra-convergence-stats.py: baseline metrics helper."""
from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]
SCRIPT = REPO / "scripts" / "okstra-convergence-stats.py"


def run_stats(*args: str) -> dict:
    """Invoke the script with --json and return parsed stdout."""
    proc = subprocess.run(
        [sys.executable, str(SCRIPT), *args, "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def test_reads_convergence_only(tmp_path):
    """When team-state is omitted, the script still reports convergence-side metrics."""
    fx = REPO / "tests" / "fixtures" / "convergence" / "early-exit.json"
    out = run_stats("--convergence-state", str(fx))
    assert out["roundCount"] == 1
    assert out["dispatchCount"] == 3
    assert out["dispatchDurationMsTotal"] == 92110 + 184221 + 201337
    assert out["round2SkippedReason"] == "queue-empty"
    assert out["finalClassificationCounts"]["fullConsensus"] == 2
```

- [ ] **Step 2: Run the test and confirm it fails**

Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_okstra_convergence_stats.py -v`
Expected: FAIL, because `scripts/okstra-convergence-stats.py` does not exist yet.

- [ ] **Step 3: Minimal implementation (convergence-state-only mode)**

`scripts/okstra-convergence-stats.py`:

```python
#!/usr/bin/env python3
"""Baseline metrics for okstra Phase 5.5 convergence.

Aggregates dispatch counts, wall-clock totals, and (when team-state is
supplied) worker token usage filtered to reverify dispatches. Reads:

- convergence-<task-type>-<seq>.json (required)
- team-state-<task-type>-<seq>.json (optional; for token deltas)

Output: human-readable table by default, JSON when --json is passed. Used
to record before/after numbers for the P1 convergence queue-pruning change.
"""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path


def collect(convergence_path: Path, team_state_path: Path | None) -> dict:
    convergence = json.loads(convergence_path.read_text())

    dispatch_count = 0
    duration_total = 0
    for r in convergence["roundHistory"]:
        for d in r["dispatches"]:
            dispatch_count += 1
            duration_total += int(d.get("durationMs", 0))

    out = {
        "schemaVersion": convergence.get("schemaVersion"),
        "taskKey": convergence.get("taskKey"),
        "effectiveMaxRounds": convergence["config"]["effectiveMaxRounds"],
        "roundCount": len(convergence["roundHistory"]),
        "dispatchCount": dispatch_count,
        "dispatchDurationMsTotal": duration_total,
        "round2SkippedReason": convergence.get("round2SkippedReason"),
        "finalState": convergence.get("finalState"),
        "finalClassificationCounts": convergence.get(
            "finalClassificationCounts", convergence.get("summary", {})
        ),
        "reverifyTokenTotal": None,
        "reverifyCostUsdTotal": None,
    }

    if team_state_path is not None and team_state_path.exists():
        team = json.loads(team_state_path.read_text())
        reverify_tokens = 0
        reverify_cost = 0.0
        for w in team.get("workers", []):
            usage = w.get("usage", {})
            agent_name = (w.get("agentName") or "")
            # Reverify dispatches use the `-reverify-r<N>-` slug per
            # okstra-convergence "Re-verification Agent Dispatch".
            if "-reverify-r" not in agent_name:
                continue
            reverify_tokens += int(usage.get("totalTokens", 0) or 0)
            reverify_cost += float(usage.get("estimatedCostUsd", 0) or 0)
        out["reverifyTokenTotal"] = reverify_tokens
        out["reverifyCostUsdTotal"] = round(reverify_cost, 4)

    return out


def format_human(stats: dict) -> str:
    lines = [
        f"taskKey : {stats.get('taskKey')}",
        f"schemaVersion : {stats.get('schemaVersion')}",
        f"effectiveMaxRounds : {stats['effectiveMaxRounds']}",
        f"roundCount : {stats['roundCount']}",
        f"dispatchCount : {stats['dispatchCount']}",
        f"dispatchDurationMsTotal: {stats['dispatchDurationMsTotal']}",
        f"round2SkippedReason : {stats['round2SkippedReason']}",
        f"finalState : {stats['finalState']}",
        f"finalClassificationCounts: {stats['finalClassificationCounts']}",
    ]
    if stats["reverifyTokenTotal"] is not None:
        lines.append(f"reverifyTokenTotal : {stats['reverifyTokenTotal']}")
        lines.append(f"reverifyCostUsdTotal : ${stats['reverifyCostUsdTotal']:.4f}")
    return "\n".join(lines)


def main(argv: list[str]) -> int:
    p = argparse.ArgumentParser(description=__doc__)
    p.add_argument("--convergence-state", required=True, type=Path)
    p.add_argument("--team-state", type=Path, default=None)
    p.add_argument("--json", action="store_true", help="emit JSON to stdout")
    args = p.parse_args(argv)

    if not args.convergence_state.exists():
        print(f"error: convergence state not found: {args.convergence_state}", file=sys.stderr)
        return 2

    stats = collect(args.convergence_state, args.team_state)

    if args.json:
        print(json.dumps(stats, ensure_ascii=False, indent=2))
    else:
        print(format_human(stats))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```
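
Because the script's filename contains hyphens, it cannot be imported with a plain `import` statement, which is presumably why the tests go through `subprocess`. If in-process access to `collect` is ever wanted, for example to unit-test it without spawning an interpreter, one option is to load the file by path with `importlib`. The helper name `load_script_module` and the module name `okstra_convergence_stats` are hypothetical and not part of the plan.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_script_module(path: Path, name: str = "okstra_convergence_stats") -> ModuleType:
    """Load a Python file whose filename is not a valid module name."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top-level code; the `if __name__ == "__main__"` guard
    # does not fire because __name__ is set to `name`.
    spec.loader.exec_module(module)
    return module
```

A test could then call `load_script_module(SCRIPT).collect(fx, None)` directly. The subprocess approach in the plan still has the advantage of exercising argument parsing and exit codes end to end.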

- [ ] **Step 4: Grant execute permission**

Run: `chmod +x /Volumes/Workspaces/workspace/projects/Okstra/scripts/okstra-convergence-stats.py`
Expected: no output.

- [ ] **Step 5: Re-run the test and confirm it passes**

Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_okstra_convergence_stats.py -v`
Expected: 1 passed.

- [ ] **Step 6: Second test (reverify token aggregation when team-state is supplied)**

Append to the end of `tests/test_okstra_convergence_stats.py`:

```python
def test_reverify_token_aggregation(tmp_path):
    convergence = tmp_path / "convergence.json"
    convergence.write_text(json.dumps({
        "schemaVersion": "1.1",
        "taskKey": "fixture/tokens",
        "config": {"enabled": True, "maxRounds": 2, "effectiveMaxRounds": 2, "verificationMode": "lightweight"},
        "findings": [],
        "roundHistory": [
            {
                "round": 1,
                "inputQueueSize": 0,
                "resolvedCount": 0,
                "carriedForwardCount": 0,
                "dispatches": [
                    {"worker": "codex-worker", "status": "completed", "durationMs": 1000}
                ],
                "skippedWorkers": [],
                "verificationsRequested": 1,
                "verificationsCompleted": 1,
                "newConsensus": 0,
                "remainingInQueue": 0,
                "earlyExit": True
            }
        ],
        "round2SkippedReason": "queue-empty",
        "finalState": "converged",
        "totalRounds": 1,
        "finalClassificationCounts": {"fullConsensus": 0, "partialConsensus": 0, "contested": 0, "workerUnique": 0},
        "summary": {"fullConsensus": 0, "partialConsensus": 0, "contested": 0, "workerUnique": 0}
    }))

    team = tmp_path / "team-state.json"
    team.write_text(json.dumps({
        "workers": [
            {"agentName": "codex-worker-error-analysis-001", "usage": {"totalTokens": 9999, "estimatedCostUsd": 0.10}},
            {"agentName": "codex-worker-reverify-r1-error-analysis-001", "usage": {"totalTokens": 5000, "estimatedCostUsd": 0.06}},
            {"agentName": "gemini-worker-reverify-r1-error-analysis-001", "usage": {"totalTokens": 3000, "estimatedCostUsd": 0.04}}
        ]
    }))

    out = run_stats("--convergence-state", str(convergence), "--team-state", str(team))
    assert out["reverifyTokenTotal"] == 8000
    assert out["reverifyCostUsdTotal"] == 0.10
```
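
The substring filter in the script counts all reverify dispatches together. If a per-round breakdown is wanted when comparing before/after numbers, the round number can be pulled out of the same slug. This sketch assumes agent names always embed `-reverify-r<N>-` with a decimal `N`, as in the test data; `reverify_tokens_by_round` is a hypothetical helper and not part of the script.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches the `-reverify-r<N>-` slug and captures the round number N.
_REVERIFY_SLUG = re.compile(r"-reverify-r(\d+)-")


def reverify_tokens_by_round(workers: list[dict]) -> dict[int, int]:
    """Sum usage.totalTokens per reverify round; non-reverify workers are ignored."""
    totals: dict[int, int] = defaultdict(int)
    for w in workers:
        match = _REVERIFY_SLUG.search(w.get("agentName") or "")
        if match is None:
            continue
        usage = w.get("usage") or {}
        totals[int(match.group(1))] += int(usage.get("totalTokens", 0) or 0)
    return dict(totals)
```

Note that the regex is stricter than the script's substring check: a name containing `-reverify-r` without a following number and hyphen would be counted by the script but skipped here.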

- [ ] **Step 7: Re-run the tests**

Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest tests/test_okstra_convergence_stats.py -v`
Expected: 2 passed.

- [ ] **Step 8: Commit**

```bash
git add scripts/okstra-convergence-stats.py tests/test_okstra_convergence_stats.py
git commit -m "feat(scripts): okstra-convergence-stats.py baseline metrics helper

Reads convergence-<task-type>-<seq>.json and (optionally) team-state to
aggregate dispatch count, wall-clock total, and worker token/cost for
reverify dispatches (filter on agentName containing '-reverify-r'). Used to
record before/after numbers for the P1 queue-pruning change. Emits JSON
when --json is passed."
```

---

## Task 14: Update the Section 9 conclusion in docs/kr/performance-improvement-plan-v2.md

**Files:**
- Modify: `docs/kr/performance-improvement-plan-v2.md`

- [ ] **Step 1: Update the Section 9 body**

Existing text (lines 318~326):

```markdown
## 9. 이번 계획의 결론

현재 작업 계획은 P1을 최우선으로 둔 방향은 맞지만, 기존 표현의 "7-phase lifecycle"과 "contested-only 2라운드"는 코드와 맞지 않았다. 개선된 계획은 다음처럼 재정렬한다.

1. P0로 용어와 측정 기준을 고정한다.
2. P1에서 convergence queue pruning을 구현한다.
3. P3 fast-track과 P4 prompt caching은 별도 설계/검증이 필요한 후속 작업으로 둔다.
4. prepare render 병렬화와 token usage 증분화는 효과가 작거나 종료 단계 비용이므로 P1 이후 병렬 보조 작업으로 처리한다.
```

Replace it with:

```markdown
## 9. 이번 계획의 결론

현재 작업 계획은 P1을 최우선으로 둔 방향은 맞지만, 기존 표현의 "7-phase lifecycle"과 "contested-only 2라운드"는 코드와 맞지 않았다. 개선된 계획은 다음처럼 재정렬한다.

1. P0로 용어와 측정 기준을 고정한다.
2. P1에서 convergence queue pruning을 구현한다.
3. P3 fast-track과 P4 prompt caching은 별도 설계/검증이 필요한 후속 작업으로 둔다.
4. prepare render 병렬화와 token usage 증분화는 효과가 작거나 종료 단계 비용이므로 P1 이후 병렬 보조 작업으로 처리한다.

### 구현 plan 링크

- P0 + P1: `docs/superpowers/plans/2026-05-14-convergence-queue-pruning.md`
- P2 / P3 / P4 / P5 / P6: 미작성 (각 트랙별로 별도 plan 작성 필요)
```

- [ ] **Step 2: Commit**

```bash
git add docs/kr/performance-improvement-plan-v2.md
git commit -m "docs(kr): link P0+P1 implementation plan from improvement-plan v2

Section 9 conclusion now points to docs/superpowers/plans/2026-05-14-
convergence-queue-pruning.md as the concrete plan that operationalizes the
P0 terminology cleanup and P1 queue-pruning changes."
```

---

## Final Verification

After applying the whole plan, run the following commands to confirm there are no regressions.

- [ ] **Step A: Full pytest regression run**

Run: `cd /Volumes/Workspaces/workspace/projects/Okstra && pytest -q`
Expected: the newly added tests (`test_convergence_state_contract.py` + `test_okstra_convergence_stats.py`) PASS, with no regressions in the existing tests.

- [ ] **Step B: Validate every fixture JSON in one pass**

Run:
```bash
for f in /Volumes/Workspaces/workspace/projects/Okstra/tests/fixtures/convergence/*.json; do
  python3 -c "import json; json.load(open('$f'))" && echo "OK $f"
done
```
Expected: all 3 lines print `OK`.
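
The shell loop interpolates `$f` into a Python string literal, which works for these paths but would break on a filename containing a single quote. A Python equivalent avoids the quoting issue and reports which file failed and why. This is a sketch only; `invalid_json_files` is a hypothetical name, and the directory is passed in rather than hard-coded.

```python
from __future__ import annotations

import json
from pathlib import Path


def invalid_json_files(directory: Path) -> list[tuple[Path, str]]:
    """Return (path, error message) for every *.json file that fails to parse."""
    failures: list[tuple[Path, str]] = []
    for path in sorted(directory.glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            failures.append((path, str(exc)))
    return failures
```

An empty return value corresponds to every line printing `OK` in the shell version.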

- [ ] **Step C: Spot-check schema doc / fixture consistency**

Run: `grep -c "schemaVersion.*1.1" /Volumes/Workspaces/workspace/projects/Okstra/skills/okstra-convergence/SKILL.md`
Expected: `>= 1`.

Run: `grep -c "round2SkippedReason" /Volumes/Workspaces/workspace/projects/Okstra/skills/okstra-convergence/SKILL.md /Volumes/Workspaces/workspace/projects/Okstra/agents/SKILL.md`
Expected: `>= 1` for each of the two files.

- [ ] **Step D: Run the baseline measurement once (against a pre-built fixture)**

Run:
```bash
python3 /Volumes/Workspaces/workspace/projects/Okstra/scripts/okstra-convergence-stats.py \
  --convergence-state /Volumes/Workspaces/workspace/projects/Okstra/tests/fixtures/convergence/mixed-round2.json
```
Expected:
```
taskKey : fixture/mixed-round2
schemaVersion : 1.1
effectiveMaxRounds : 2
roundCount : 2
dispatchCount : 5
dispatchDurationMsTotal: 698988
round2SkippedReason : not-skipped
finalState : max-rounds-reached
finalClassificationCounts: {'fullConsensus': 1, 'partialConsensus': 0, 'contested': 1, 'workerUnique': 0}
```

---

## Out of Scope (separate plans required)

This plan intentionally excludes the following. They are handled as follow-up items under Section 9 of `docs/kr/performance-improvement-plan-v2.md`.

- P2 (Prompt diet): narrow the audience scope of `[Required reading]` in the worker definitions. Extracting `agents/workers/_common.md` first requires verifying install/packaging compatibility.
- P3 (Fast-track routing): a design in which `requirements-discovery` emits a routing token such as `route=lite-implementation-planning`. Requires a decision on approval-gate policy.
- P4 (Prompt caching): needs a preliminary spike to determine whether cache hints have any effect in the Codex/Gemini wrappers.
- P5 (Prepare render parallelization): parallelize the independent instruction-set writes in `scripts/okstra_ctl/run.py`.
- P6 (Incremental token usage): cache the linear jsonl scan in `scripts/okstra_token_usage/`.

---

## Self-Review

**Spec coverage:** mapping of the 6 items in the Section 7 P1 implementation checklist

1. Rewrite the Round 1-N pseudocode to use queue pruning → Task 2
2. Do not use `contested` as an intermediate state → Task 1
3. 8 new fields on the convergence state artifact → Task 4 + Tasks 9~11 (verified via fixtures) + Task 12 (enforced by the contract test)
4. report-writer reflects round history and final classification counts → Tasks 6, 7, 8
5. Simple early-convergence / mixed-unresolved fixtures + contract test → Tasks 9~12
6. Record before/after dispatch/token/wall-clock numbers with the token usage collector → Task 13

Mapping of the Section 5 P0 items

- Keep the docs from conflating the task-type lifecycle with the lead's operational phases → Task 1 Step 2 (the "Scope and Terminology" block)
- State the baseline for the new convergence state fields → Task 4

**Placeholder scan:** every step contains actual markdown/JSON/Python code, with no TBD/TODO/"add appropriate error handling"-style placeholders.

**Type consistency:**
- `effectiveMaxRounds` (Tasks 4, 5, 9~12, 13): integer, 1..3, nested under `config.`; consistent.
- `round2SkippedReason` (Tasks 2, 4, 5, 7, 8, 9~12): top-level string enum `queue-empty | max-rounds-1 | all-reverify-non-result | not-skipped`; consistent.
- `finalClassificationCounts` (Tasks 4, 7, 8, 9~12, 13): keys `fullConsensus | partialConsensus | contested | workerUnique`; consistent.
- `roundHistory[].dispatches[]` shape `{worker, status, durationMs}` (Tasks 4, 9~12, 13); consistent.
- `roundHistory[].skippedWorkers[]` shape `{worker, reason}`, or `{worker, reason, terminalStatus}` when the dispatch failed (Tasks 4, 11, 12); consistent (`terminalStatus` is optional).
- Verdict enum `agree | disagree | supplement | verification-error` (Tasks 3, 11, 12); consistent.
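
The shapes listed above can be written down as Python types, which makes the enum values and the optional key explicit. This is a sketch derived only from the bullets in this list: the type and function names are hypothetical, the dispatch `status` values are left as plain `str` because they are not enumerated here, and `is_valid_effective_max_rounds` simply encodes the stated 1..3 range.

```python
from __future__ import annotations

from typing import Literal, TypedDict, get_args

Round2SkippedReason = Literal[
    "queue-empty", "max-rounds-1", "all-reverify-non-result", "not-skipped"
]
Verdict = Literal["agree", "disagree", "supplement", "verification-error"]


class FinalClassificationCounts(TypedDict):
    fullConsensus: int
    partialConsensus: int
    contested: int
    workerUnique: int


class Dispatch(TypedDict):
    worker: str
    status: str  # allowed values are not listed in this section
    durationMs: int


class _SkippedWorkerRequired(TypedDict):
    worker: str
    reason: str


class SkippedWorker(_SkippedWorkerRequired, total=False):
    terminalStatus: str  # only present when the dispatch itself failed


def is_valid_effective_max_rounds(value: object) -> bool:
    """effectiveMaxRounds must be an integer in 1..3 (bool is rejected)."""
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 3


def is_valid_skip_reason(value: object) -> bool:
    """True when value is one of the four round2SkippedReason enum strings."""
    return value in get_args(Round2SkippedReason)
```

`TypedDict` is only a static-typing aid and does not validate at runtime, so the contract tests remain the enforcement point; the two small validator functions show what a runtime check for the scalar fields might look like.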

**Execution order constraint:** Task 12 (contract test) must run after Tasks 9~11 (fixture creation). Task 13 (stats helper) depends on Tasks 9~11 because it uses the fixtures, so it also comes after 9~11. Everything else is independent and can run in any order.
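
The ordering constraint can be checked mechanically. The sketch below encodes only the two dependencies stated in the constraint (Task 12 and Task 13 each depend on Tasks 9, 10, and 11) and uses the standard-library `graphlib` module (Python 3.9+) to produce one valid order. The `DEPENDS_ON` mapping name is hypothetical.

```python
from graphlib import TopologicalSorter

# Maps each task number to the set of tasks that must finish before it.
DEPENDS_ON = {
    12: {9, 10, 11},  # contract test needs the fixtures
    13: {9, 10, 11},  # stats helper tests read the fixtures
}

# static_order() yields every node with all of its predecessors first.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Tasks with no listed dependency are unconstrained, which matches the note that the remaining tasks can run in any order.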