okstra 0.51.0 → 0.53.0
- package/README.kr.md +1 -1
- package/README.md +1 -1
- package/docs/kr/architecture.md +1 -0
- package/docs/kr/cli.md +2 -1
- package/docs/superpowers/plans/2026-06-06-final-verification-whole-task-gate.md +993 -0
- package/docs/superpowers/plans/2026-06-06-stage-parallel-and-pending-fixes.md +93 -0
- package/docs/superpowers/plans/2026-06-06-stage-worktree-isolation-p1.md +447 -0
- package/docs/superpowers/plans/2026-06-06-stage-worktree-isolation-p2.md +289 -0
- package/docs/superpowers/plans/2026-06-06-stage-worktree-isolation-p3.md +774 -0
- package/docs/superpowers/plans/2026-06-06-stage-worktree-isolation-p4.md +303 -0
- package/docs/superpowers/plans/2026-06-06-stage-worktree-isolation-p5-multidep-base.md +387 -0
- package/docs/superpowers/specs/2026-06-06-final-verification-whole-task-gate-design.md +126 -0
- package/docs/superpowers/specs/2026-06-06-stage-worktree-isolation-design.md +180 -0
- package/docs/superpowers/specs/2026-06-06-vertical-slice-tdd-planning-design.md +179 -0
- package/package.json +1 -1
- package/runtime/BUILD.json +2 -2
- package/runtime/agents/workers/report-writer-worker.md +1 -0
- package/runtime/bin/lib/okstra/cli.sh +5 -1
- package/runtime/bin/okstra.sh +1 -0
- package/runtime/prompts/launch.template.md +1 -0
- package/runtime/prompts/profiles/_implementation-deliverable.md +1 -1
- package/runtime/prompts/profiles/_implementation-executor.md +16 -9
- package/runtime/prompts/profiles/_implementation-verifier.md +4 -1
- package/runtime/prompts/profiles/final-verification.md +7 -7
- package/runtime/prompts/profiles/implementation-planning.md +14 -7
- package/runtime/prompts/wizard/prompts.ko.json +3 -2
- package/runtime/python/okstra_ctl/analysis_packet.py +14 -2
- package/runtime/python/okstra_ctl/render.py +3 -0
- package/runtime/python/okstra_ctl/run.py +541 -41
- package/runtime/python/okstra_ctl/wizard.py +25 -7
- package/runtime/python/okstra_ctl/worktree.py +126 -9
- package/runtime/python/okstra_ctl/worktree_registry.py +88 -17
- package/runtime/schemas/final-report-v1.0.schema.json +36 -0
- package/runtime/skills/okstra-convergence/SKILL.md +14 -3
- package/runtime/skills/okstra-memory/SKILL.md +28 -5
- package/runtime/skills/okstra-run/SKILL.md +1 -1
- package/runtime/templates/reports/final-report.template.md +12 -0
- package/runtime/templates/reports/final-verification-input.template.md +8 -5
- package/runtime/templates/reports/i18n/en.json +3 -1
- package/runtime/templates/reports/i18n/ko.json +3 -1
- package/runtime/validators/validate-implementation-plan-stages.py +57 -11
- package/runtime/validators/validate-run.py +143 -1
- package/runtime/validators/validate-workflow.sh +6 -1
- package/src/memory.mjs +50 -11
@@ -0,0 +1,180 @@
+# Per-stage worktree isolation + concurrent parallel execution: design
+
+- Date: 2026-06-06
+- Scope: In the `implementation` phase, isolate each stage in a **separate git worktree** so that multiple `implementation` runs the user starts manually at the same time can safely work on different stages. `started-exclusion` (A2) is folded into the same design.
+- Out of scope
+  - **No automatic fan-out**: okstra does not automatically branch ready stages into multiple processes. The only supported parallel trigger is **manual concurrency**, where the user starts one run per stage.
+  - **No automatic merge by okstra**: stage branches are joined by a manual user merge (or collected by release-handoff).
+  - The worktree model of phases other than `implementation` (`requirements-discovery` / `error-analysis` / `implementation-planning` / `final-verification` / `release-handoff`) is unchanged: they keep the single existing task-key worktree.
+  - ADR↔gitignore (C1) is a separate plan. Multilingual/i18n.
+- Related: builds on the stage concept and carry-in model of [`2026-05-20-implementation-planning-multi-stage-design.md`](2026-05-20-implementation-planning-multi-stage-design.md). This design closes the **unimplemented started-exclusion** that [`2026-06-04-stage-run-batching.md`](../plans/2026-06-04-stage-run-batching.md) left as a known gap. It is compatible with the "parallelism = side effect, run batch = unit of cost" principle of [`2026-06-04-stage-splitting-cost-aware-design.md`](2026-06-04-stage-splitting-cost-aware-design.md) (this design only makes the side-effect parallelism **safe**; it does not change the splitting criteria).
+
+## 1. Motivation
+
+In `fontradar-v2-api:dev-9184`, the user started stage2/stage3 simultaneously as two `implementation` runs, and because both runs **shared the same task-key worktree and branch**, their commits were interleaved on a single branch. There are two root causes:
+
+1. **There is one worktree per task-key.** [`compute_worktree_path`](../../../scripts/okstra_ctl/worktree.py:489) and [`compute_branch_name`](../../../scripts/okstra_ctl/worktree.py:511) take neither a run-seq nor a stage, and every phase reuses the same worktree ([worktree.py:5-9](../../../scripts/okstra_ctl/worktree.py:5)). Two stage runs of the same task edit files in the same directory and commit to the same branch.
+2. **started-exclusion is not implemented.** [`_resolve_effective_stages`](../../../scripts/okstra_ctl/run.py:264) looks only at `done` in `consumers.jsonl` and ignores `started`, so while a stage is `started` (unfinished) another run can pick up the same ready-set again with `auto` (known gap: [stage-run-batching.md:13](../plans/2026-06-04-stage-run-batching.md); `tests/test_e2e_multi_stage_q1_q9.py::test_q7` pins the current behavior).
+
+What the user wants is "truly concurrent parallelism". This design makes that safe by isolating stages in worktrees and making occupancy atomic.
+
+## 2. Core principles
+
+### 2.1 Isolation model: stage worktrees for `implementation` only
+
+The stage concept exists only in `implementation`. Stage worktree isolation therefore happens only in the `implementation` phase.
+
+- `requirements-discovery` through `implementation-planning`: the single existing task-key worktree, unchanged (a single flow with no branching).
+- `implementation`: each run executes the stage (or ready batch) it owns in a **dedicated stage worktree**.
+
+**Non-git / nested-worktree degradation:** When [`provision_task_worktree`](../../../scripts/okstra_ctl/worktree.py) in the surrounding flow finds that `project_root` is not a git repo (`skipped-not-git`) or is already inside another worktree (`skipped-in-worktree`), ordinary phases degrade gracefully (no task worktree is issued, `EXECUTOR_WORKTREE_PATH=project_root`). `implementation` stage isolation degrades on the same signals: it skips issuing a stage worktree, leaves the ctx `EXECUTOR_WORKTREE_*` values untouched, and records only the `started` row of the selected stage in `consumers.jsonl`. (This path is a fallback for render-only tests and non-git projects. Real stage isolation and concurrent parallelism work only in git projects.)
+
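+### 2.2 Base selection: a git-native implementation of carry-in
+
+| Stage kind | Worktree base | Concurrency |
+|---|---|---|
+| **Independent** (`depends-on (none)`) | **Common base** = the task-key worktree HEAD at the moment `implementation` is first entered (= the state at the end of planning) | Can run **concurrently** with each other |
+| **Single dependency** (`depends-on X`) | The **done commit** of predecessor stage X (stacked on top of that stage's branch) | Starts after X is done |
+| **Multiple dependencies** (`depends-on X,Y…`) | Task-key worktree HEAD (the candidate), only when the `done.head_commit` of every predecessor is **an ancestor of the candidate** (= the user has merged the predecessors); see §9 #1 | After all predecessors are done and merged |
+
+- The common base is **decided and pinned once, when the first stage is entered** (§3.2 below). It is the point from which the stages in dev-9184 started: `8a18f99` (main with another task's merge applied).
+- The base of a **single-dependency** stage Y is the done commit of predecessor stage X. X's done is already recorded as `head_commit` in `consumers.jsonl` (the `stage:1 done head_commit:b3971782…` row in the dev-9184 consumers file is the evidence). Because Y is stacked on X's branch, merging Y brings X along.
+- For a **multi-dependency** stage, the predecessors are scattered across different stage branches, so no single base commit is self-evident. Option A auto-detection (§9 #1): take the task-key worktree HEAD as the candidate, and if the `done.head_commit` of every predecessor stage is an ancestor of the candidate (`git merge-base --is-ancestor`), treat that as the user having merged the predecessors and issue the candidate as the base. If even one is not an ancestor, raise `PrepareError` with the guidance "merge the predecessor stage branches and retry".
+- Conclusion: **truly concurrent execution holds only among independent stages**, and the dev-9184 problem of "a dependent stage starting from a base without its predecessor's changes" disappears structurally.
+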
+### 2.3 Occupancy / concurrency: the registry flock reservation is the SSOT (A2 integrated)
+
+- **Extend the worktree registry to stage-key granularity** and make **atomic reservations** with flock (via [worktree_registry](../../../scripts/okstra_ctl/worktree.py:47)).
+- **Ready set** = every `depends-on` is `done` **AND** the stage has no `started`/`done` row in `consumers.jsonl` **AND** there is no active reservation in the registry.
+- When two concurrent runs try to take the same stage, the **flock reservation serializes them**: only one succeeds in reserving, and the other takes the next ready stage or exits normally with "no stage to take".
+- **This is the SSOT for occupancy.** `started` in `consumers.jsonl` is for recording and observation; the actual concurrency backstop is the registry flock reservation. A2 (started-exclusion) is satisfied by adding the `started`/reservation exclusions to the ready-set computation.
+- **One run = one stage** (single-stage execution). The "ready-set batch" of [`cost-aware-design.md §2.3`](2026-06-04-stage-splitting-cost-aware-design.md) assumed that stages run sequentially in one worktree on one branch, but because this design requires an isolated worktree and an isolated branch per stage, the batch loses its meaning (trying to reserve two stage-keys on the same branch violates the branch-uniqueness invariant of the P1 registry). `_resolve_effective_stages` still returns a batch list for backward compat, but the implementation integration path runs **only the first ready stage**. To progress several stages at once, the user starts separate runs (this is the concurrent parallelism of this design); to progress sequentially, the user starts the next run as each stage becomes done. Either way the cost is equivalent.
+
+> **declaration ↔ enforcement** (okstra `CLAUDE.md` rule 3): the enforcement point of the occupancy rule is the **runtime registry reservation**. If `prepare_task_bundle` tries to take a stage-key that already has an active reservation, it aborts with `PrepareError`. The rule does not end at profile wording (MUST) alone.
+
+## 3. Data model
+
+### 3.1 Worktree key / branch
+
+```
+Path:   ~/.okstra/worktrees/<project>/<group>/<task-id>/stage-<N>/
+Branch: <work-category-prefix>-<task-id>-s<N>
+```
+
+- Add an optional `stage_number` to [`compute_worktree_path`](../../../scripts/okstra_ctl/worktree.py:489). When it is None, the existing task-key path is used (compatible with other phases).
+- Add an optional `stage_number` to [`compute_branch_name`](../../../scripts/okstra_ctl/worktree.py:511). When it is None, the existing `<prefix>-<task-id>` is used.
+
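Editorial sketch (not part of the diff): the optional `stage_number` rule in §3.1 can be illustrated as below. The parameter lists are assumptions made for illustration; the real signatures in `worktree.py` are not shown in this diff, and only the naming scheme (`-s<N>` branch suffix, `stage-<N>` path segment) comes from the spec.

```python
from pathlib import Path
from typing import Optional


def compute_branch_name(prefix: str, task_id: str,
                        stage_number: Optional[int] = None) -> str:
    """Task-key branch when stage_number is None, stage branch otherwise."""
    base = f"{prefix}-{task_id}"
    return base if stage_number is None else f"{base}-s{stage_number}"


def compute_worktree_path(root: Path, project: str, group: str, task_id: str,
                          stage_number: Optional[int] = None) -> Path:
    """Append a stage-<N> segment only for implementation stage worktrees."""
    # `root` stands in for ~/.okstra/worktrees (hypothetical parameter).
    path = root / project / group / task_id
    return path if stage_number is None else path / f"stage-{stage_number}"
```

Keeping `None` as the default means callers from other phases would keep producing the task-key path and branch without any change on their side.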
+### 3.2 Registry entries: task-key and stage-key coexist
+
+The existing task-key entry (used by other phases) is kept, and a stage-key entry is added for each `implementation` stage. The **pinned common base** is recorded once on the task-key entry.
+
+```jsonc
+"tasks": {
+  "<proj>/<group>/<task-id>": {                 // existing (other phases)
+    "branch": "<prefix>-<task-id>",
+    "implementationBaseCommit": "8a18f99…",     // new: pinned once when the first stage is entered
+    ...
+  },
+  "<proj>/<group>/<task-id>#stage-2": {         // new (implementation stage)
+    "branch": "<prefix>-<task-id>-s2",
+    "worktree_path": ".../stage-2",
+    "base_ref": "8a18f99…",                     // independent: common base
+    "stage": 2, "status": "active"
+  },
+  "<proj>/<group>/<task-id>#stage-3": {
+    "branch": "<prefix>-<task-id>-s3",
+    "base_ref": "b3971782…",                    // dependent: predecessor stage done commit
+    "stage": 3, "status": "active"
+  }
+}
+```
+
+- `implementationBaseCommit` is written once, inside the flock, when the first stage is reserved. Even simultaneous first entries are serialized, so the second run reads the recorded value (no race).
+
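Editorial sketch (not part of the diff): one way the "pin once inside the flock, then reserve the stage-key" behavior from §2.3 and §3.2 could look. The function name, the single-file JSON registry layout, and the use of `RuntimeError` in place of okstra's `PrepareError` are all assumptions. It covers only the independent-stage case, where `base_ref` is the common base. `fcntl` is POSIX-only.

```python
import fcntl
import json
import os


def reserve_stage(registry_path: str, task_key: str, stage: int,
                  head_commit: str) -> dict:
    """Reserve <task_key>#stage-<N> under an exclusive flock, pinning the common base once."""
    # Open (creating if needed) without truncating, then lock before reading.
    fd = os.open(registry_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            raw = fh.read()
            registry = json.loads(raw) if raw.strip() else {"tasks": {}}
            tasks = registry.setdefault("tasks", {})
            task_entry = tasks.setdefault(task_key, {})
            # The first entrant pins the base; later entrants only read it.
            base = task_entry.setdefault("implementationBaseCommit", head_commit)
            stage_key = f"{task_key}#stage-{stage}"
            if tasks.get(stage_key, {}).get("status") == "active":
                # Stand-in for PrepareError (duplicate reservation rejected).
                raise RuntimeError(f"{stage_key} is already reserved")
            tasks[stage_key] = {"stage": stage, "base_ref": base, "status": "active"}
            # Rewrite the whole file while still holding the lock.
            fh.seek(0)
            fh.truncate()
            json.dump(registry, fh)
            fh.flush()
            return tasks[stage_key]
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Because the read, the check, and the write all happen while the exclusive lock is held, two runs that enter at the same time are serialized: the second one sees the pinned `implementationBaseCommit` and any reservation the first one made.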
+### 3.3 Ready-set computation (A2)
+
+Extend the ready check of `_resolve_effective_stages` (the `requested == "auto"` path):
+
+```
+ready = [s for s in stages
+         if s.number not in done_stages
+         and s.number not in started_stages      # ← A2, new
+         and s.number not in reserved_stages     # ← registry reservation, new
+         and all(d in done_stages for d in s.depends_on)]
+```
+
+The explicit `--stage N` path attempts to reserve the single stage directly, and raises `PrepareError` if it is already reserved or done.
+
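Editorial sketch (not part of the diff): the ready-set rule of §3.3 as a small runnable example. The `Stage` dataclass and the function name are hypothetical; only the four filter conditions come from the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    number: int
    depends_on: tuple = ()


def ready_stages(stages, done, started, reserved):
    """Stage numbers whose dependencies are done and that nobody has started or reserved."""
    return [
        s.number for s in stages
        if s.number not in done
        and s.number not in started       # A2: started-exclusion
        and s.number not in reserved      # active registry reservation
        and all(d in done for d in s.depends_on)
    ]


stages = [Stage(1), Stage(2, (1,)), Stage(3, (1,)), Stage(4, (2, 3))]
# Stage 1 is done and stage 2 is already started by another run.
print(ready_stages(stages, done={1}, started={2}, reserved=set()))  # [3]
```

With stage 2 already `started`, an `auto` run is offered only stage 3, which is the behavior the current `done`-only check cannot provide.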
+## 4. Flow
+
+```
+implementation-planning done (task-key worktree HEAD = common base candidate)
+        │
+user starts two runs at once:  okstra-run --stage 2   │   okstra-run --stage 3
+        ▼                                                 ▼
+prepare(stage 2)                                  prepare(stage 3)
+  flock:                                            flock:
+   - pin implementationBaseCommit (once)             - (already pinned; read)
+   - check ready/reservation → reserve stage-2       - check ready/reservation → reserve stage-3
+   - base=common (independent)                       - base=common (independent; depends-on the same Stage1 only)
+   - worktree add stage-2 @ base                     - worktree add stage-3 @ base
+        ▼                                                 ▼
+executor(stage-2) → commit → consumers            executor(stage-3) → commit → consumers
+stage-2 done + carry/stage-2.json                 stage-3 done + carry/stage-3.json
+        ▼                                                 ▼
+        └──────── manual user merge (in dependency order) ────────┘
+                  release-handoff collects the stage PR list
+```
+
+Dependent stages (e.g. Stage 4 `depends-on 2,3`) are not ready until 2 and 3 are done, so they serialize automatically.
+
+## 5. Files to change (seed)
+
+| File | Change |
+|---|---|
+| [`worktree.py`](../../../scripts/okstra_ctl/worktree.py:489) | Optional `stage_number` on `compute_worktree_path`/`compute_branch_name`; make `provision_task_worktree` stage-aware and compute the base (independent = common `implementationBaseCommit`, dependent = predecessor done commit) |
+| [`worktree_registry.py`](../../../scripts/okstra_ctl/worktree.py:47) | Stage-key (`<task-key>#stage-<N>`) reservation/lookup; pin `implementationBaseCommit` once (inside the flock) |
+| [`run.py`](../../../scripts/okstra_ctl/run.py:264) | Exclude `started`/reserved stages in `_resolve_effective_stages` (A2); wire `_reserve_implementation_stages` ([run.py:890](../../../scripts/okstra_ctl/run.py:890)) to stage worktree issuance; `PrepareError` on duplicate reservation with `--stage N` |
+| [`_implementation-executor.md`](../../../prompts/profiles/_implementation-executor.md:30) | Recast "owns a ready-set batch" in the stage worktree context: the executor works only in its own stage worktree and must not access other stage worktrees |
+| [`okstra.sh`](../../../scripts/okstra.sh) · [`cli.sh`](../../../scripts/lib/okstra/cli.sh) | `--stage` passthrough (B2): the CLI entry point for manual concurrent parallelism |
+| [`release-handoff.md`](../../../prompts/profiles/release-handoff.md) | Collect the stage-branch (`-s<N>`) PR list from `consumers.jsonl` |
+| `validators/` (next to `validate-run.py`) | Reject duplicate stage-key reservations at prepare time (states the runtime enforcement point explicitly) |
+
+Seed rule (`feedback_okstra_fixes_target_end_users`): changes go only into the source files above, not into `runtime/` or a personal `.claude/`.
+
+## 6. Teardown / cleanup
+
+After all stages are done and merged, the stage worktrees are cleaned up. The existing manual procedure (`git worktree remove <path>` → `git branch -D <branch>` → delete the registry key) is **extended to N stage-keys**. Automatic teardown is out of scope for this design (manual by the user, or a follow-up spec).
+
+## 7. Compatibility
+
+okstra is pre-1.0 (`feedback_pre_v1_no_compat`). The existing single-worktree `implementation` flow is **replaced by the stage worktree flow**: even an N=1 (single-stage) plan is issued a `stage-1` worktree. There is no compat shim. Other phases are unaffected.
+
+## 8. Verification scenarios (manual QA)
+
+| # | Scenario | Expected |
+|---|----------|------|
+| W1 | `implementation` of a 1-stage plan | `stage-1` worktree issued, 1 PR |
+| W2 | Independent stages 2 and 3 in two concurrent runs, `--stage 2` / `--stage 3` | Each gets its own stage worktree and branch, common base, no conflict, 2 consumers rows |
+| W3 | W2 with both runs on `auto` concurrently | flock serialization → one run takes 2, the other takes 3 (no double occupancy) |
+| W4 | Multi-dependency stage 4 (`depends-on 2,3`) run with `--stage 4` before 2 and 3 are done | Not ready → `PrepareError` (wait for predecessors to be done) |
+| W5 | Single-dependency stage (`depends-on 2`) run after 2 is done | base = stage2 done commit, stacked on branch 2, carry-in injected |
+| W6 | Multi-dependency stage 4 run after 2 and 3 are done and integrated | base = §9 rule (integration commit), carry-in injected |
+| W7 | `--stage N` again on a stage that already has an active reservation | `PrepareError` (duplicate reservation rejected); confirms runtime enforcement |
+| W8 | Phases other than `implementation` | Existing task-key worktree unchanged (no stage branching) |
+
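+## 9. Open items / to settle before writing-plans
+
+1. **Base strategy for multi-dependency stages** (§2.2):
+   - **(Option A, provisional)** After the predecessor stages have been integrated onto one line by manual user merges, use that integration commit as the base. → Multi-dependency stages wait for the predecessor merge (natural serialization). Simple and consistent with the "manual merge" decision in §5. Downside: if the user delays merging, the multi-dependency stage is blocked.
+   - **(Option B)** okstra creates a base by temporarily octopus-merging the predecessor done commits. → Can start before the user merges, but responsibility for resolving merge conflicts moves into okstra (conflicts with automatic merge being out of scope).
+   - **Decided: Option A** (approved by the user on 2026-06-06). A multi-dependency stage becomes ready after its predecessor stages are integrated by user merges. Option B is deferred as a candidate for a follow-up spec.
+   - **Auto-detection implementation** (2026-06-06 P5): the base of a multi-dependency stage (`len(depends_on) >= 2`) is decided as follows:
+     1. candidate = task-key worktree HEAD (`git rev-parse HEAD` in `project_root`; this is the worktree shared by all phases and the place where the user lands the predecessor stage merges).
+     2. Collect the `done.head_commit` (consumers.jsonl) of each predecessor stage. If any done row/head_commit is missing, raise `PrepareError` (predecessor unfinished).
+     3. Verify that each predecessor done commit is an **ancestor** of the candidate with `git merge-base --is-ancestor <done> <candidate>` (returncode 0).
+        - All are ancestors → the user has merged the predecessors → return the candidate as the base.
+        - Any one is not → `PrepareError` with the guidance "merge the predecessor stage branches (`-s<X>`/`-s<Y>`) into the task worktree (or merge into main and refresh the worktree), then retry".
+     - User workflow: stages X and Y become done → merge each stage branch into the task-key worktree (or main) → the task worktree HEAD has the X and Y done commits as ancestors → the multi-dependency stage automatically picks its base on top of that and starts.
+     - Option B (okstra temporary octopus merge) remains deferred.
+2. **Automatic teardown of stage worktrees**: currently manual (§6). Candidate for a follow-up spec.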
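Editorial sketch (not part of the diff): the three auto-detection steps of §9 #1 written as a small function. `PrepareError` here is a local stand-in, and the function names and the injectable `is_ancestor` callback are assumptions made so the logic can be exercised without a repository. The only external behavior relied on is that `git merge-base --is-ancestor A B` exits with 0 when A is an ancestor of B.

```python
import subprocess
from typing import Callable, Mapping, Sequence


class PrepareError(Exception):
    """Stand-in for okstra's PrepareError; the real class is not shown in this diff."""


def git_is_ancestor(repo: str, commit: str, candidate: str) -> bool:
    """True when `commit` is an ancestor of `candidate` (git exits with 0)."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, candidate],
        check=False,
    )
    return result.returncode == 0


def resolve_multidep_base(candidate: str, depends_on: Sequence[int],
                          done_commits: Mapping[int, str],
                          is_ancestor: Callable[[str, str], bool]) -> str:
    """Return `candidate` only if every predecessor's done commit is merged into it."""
    for stage in depends_on:
        commit = done_commits.get(stage)
        if not commit:
            raise PrepareError(f"stage {stage} has no done head_commit yet")
        if not is_ancestor(commit, candidate):
            raise PrepareError(f"merge the stage {stage} branch (-s{stage}) and retry")
    return candidate
```

In real use the callback would be something like `lambda c, cand: git_is_ancestor(project_root, c, cand)`, with `candidate` taken from `git rev-parse HEAD` in the task-key worktree.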
@@ -0,0 +1,179 @@
+# implementation-planning: vertical slices + enforced RED→GREEN, design
+
+- Date: 2026-06-06
+- Status: Proposed (awaiting user review)
+- Target phase: `implementation-planning` (the plan output of okstra-run Phase 5)
+
+## 1. Background / problem
+
+To keep PR size from bloating, we want implementation-planning, when it draws up
+the work plan, to **cut PR work units into vertical slices (end-to-end per feature)** and to build each slice
+as a **TDD RED→GREEN** flow, and we want this enforced.
+
+Current state (infrastructure that already exists):
+
+- The plan splits work into **Stages**, and a cap of effective steps ≤ 6 per Stage applies.
+  ([implementation-planning.md:69](../../../prompts/profiles/implementation-planning.md:69))
+- When implementation runs, the rule is **one PR per run** (`One-PR-per-run`), so
+  the Stage = PR unit structure already holds.
+  ([_implementation-executor.md:44](../../../prompts/profiles/_implementation-executor.md:44))
+- The executor's **Mandatory TDD loop** is already enforced: failing test → `test(...)` commit →
+  minimal implementation → `feat|fix(...)` commit → refactor.
+  ([_implementation-executor.md:25](../../../prompts/profiles/_implementation-executor.md:25))
+
+Three gaps:
+
+1. The Stage split anchor is expressed as **"proximity of files that change together (cohesion / file proximity)"**,
+   which does not match the vertical-slice mental model of an "independently shippable increment of user value".
+   ([implementation-planning.md:72](../../../prompts/profiles/implementation-planning.md:72))
+2. No Stage declares "the user-observable increment this Stage delivers", so a horizontal
+   layer cut (horizontal slice) cannot be told apart from a vertical slice.
+3. TDD at planning time is only a **recommendation** at the `prefer TDD ordering` level, not a requirement.
+   ([implementation-planning.md:69](../../../prompts/profiles/implementation-planning.md:69))
+   It is enforced only at execution (executor), so plan and execution are not aligned.
+
+## 2. Goals / non-goals
+
+### Goals
+- Redefine the primary Stage split anchor as the **vertical slice**.
+- Have each Stage declare `Slice value:` / `Acceptance:`.
+- Raise the planning-time `Stepwise Execution Order` to **RED→GREEN mandatory**.
+- **Enforce the rules above with a validator**, so that declaration and enforcement match (Rule #3).
+
+### Non-goals
+- No full rework of the Stage Map contract skeleton (`## 5.5 Stage Map` + `## 5.5.<i> Stage <i>:` 4-subsection)
+  (approach A rejected). The skeleton is preserved; only wording and validation are added.
+- The rendering path of the flat `### 5.5.4 Stepwise Execution Order` in the report template
+  ([final-report.template.md:178](../../../templates/reports/final-report.template.md:178))
+  is not a target of this change (the multi-Stage body is written directly by report-writer from profile guidance,
+  and `tests/fixtures/plans/valid_one_stage.md` is the actual output shape).
+- No change to the executor's TDD loop behavior: it is already enforced, and the plan's
+  RED/GREEN steps map 1:1 to the executor's `test(...)` / `feat|fix(...)` commits.
+- No change to the `One-PR-per-run` / parallel-safety (S9) model.
+
+## 3. Chosen approach: C (surgical change)
+
+Leave the `Stage = PR = run` model as is and change only **3 places of profile wording + 1 validator check, S10**.
+This preserves the existing architecture and validator skeleton while covering both intents (vertical slices + explicit
+RED→GREEN).
+
+Approaches A (full validator rework) and B (an extra dimension on top of cohesion) were rejected for change size / duplication.
+
+## 4. Detailed design
+
+### 4.1 Delta 1: redefine the split anchor
+
+[implementation-planning.md:72](../../../prompts/profiles/implementation-planning.md:72)
+"Cohesion-first partition rule" → **"Vertical-slice-first partition rule"**:
+
+- Primary anchor = **a thin vertical slice that delivers one user-observable increment
+  end to end**. A Stage completes one feature increment even if it crosses layers.
+- File cohesion ("shared file/module proximity") is not discarded; it is demoted to **a secondary
+  criterion for grouping steps inside a slice**.
+- State an explicit **ban on horizontal layering**: for example, a split such as "DB layer only
+  in one Stage, service layer only in the next Stage" is rejected.
+- Keep the existing split triggers but reword only (c):
+  - (a) when there is a real `depends-on` data/contract dependency
+  - (b) when effective steps exceed 6
+  - (c) **when they are distinct vertical slices (different increments of user value)**
+- The principle "Maximising parallel stages is NOT a reason to split" is kept.
+
+### 4.2 Delta 2: per-Stage slice declaration
+
+[implementation-planning.md:67-74](../../../prompts/profiles/implementation-planning.md:67)
+Add **two required lines** to each `## 5.5.<i> Stage <i>:` section (directly under the heading, just before Carry-In):
+
+```
+Slice value: <one line: the user-observable increment this Stage delivers>
+Acceptance: <observable pass condition or the exact command>
+```
+
+- `Slice value` describes "what starts working" from the user/consumer point of view:
+  not a layer name ("add repository") but an increment ("querying X returns Y").
+- `Acceptance` is the observable signal proving the slice is finished; it is usually the same as the
+  test command with which the RED step of 4.3 flips to PASS.
+
+### 4.3 Delta 3: RED→GREEN mandatory at planning time
+
+[implementation-planning.md:69](../../../prompts/profiles/implementation-planning.md:69)
+Raise the `### Stepwise Execution Order` requirement from `prefer TDD ordering` to **MUST**:
+
+- In each non-exempt Stage, **the `action` cell of the first effective step starts with the literal `RED:`** and
+  describes a **failing test** that captures the slice's acceptance (`expected` = FAIL).
+- Among the later implementation steps, at least one `action` cell starts with the literal `GREEN:` and describes
+  the minimal implementation that makes the test pass (`expected` = PASS).
+- A refactor step is optional (prefix `REFACTOR:` if present).
+- **Exemption**: a Stage with no runtime-observable behavior, such as doc-only / config-only / pure rename,
+  may carry a one-line `TDD exemption: <reason>` in its section and omit RED/GREEN (consistent with the executor's
+  equivalent exemption rule, [_implementation-executor.md:27](../../../prompts/profiles/_implementation-executor.md:27)).
+
+Why literal tokens (`RED:` / `GREEN:` / `REFACTOR:` / `TDD exemption:`): so that the validator can
+check deterministically by substring instead of the brittle approach of "inferring from words whether this is a failing test".
+This is consistent with the literal-substring philosophy of the existing §"Section heading contract"
+([implementation-planning.md:54](../../../prompts/profiles/implementation-planning.md:54)).
+
+### 4.4 Delta 4: validator S10 (enforcement)
+
+[validators/validate-implementation-plan-stages.py](../../../validators/validate-implementation-plan-stages.py)
+gains `_check_slice_tdd()`, wired into `collect_validation_errors()`.
+For each Stage section (extracted with `_slice_stage_section`):
+
+- **S10a**: a `Slice value:` line exists and the value after the colon is non-empty.
+- **S10b**: an `Acceptance:` line exists and the value after the colon is non-empty.
+- **S10c (TDD ordering)**: one of the following holds.
+  - (i) the action cell of the **first effective row of `Stepwise Execution Order` starts with
+    `RED:`** AND the action cell of some row in the same table starts with `GREEN:`, **or**
+  - (ii) a `TDD exemption:` line exists in the section.
+
+Implementation notes:
+- To extract the action cell of the first effective row, reuse the cell-parsing logic of the existing `_count_effective_steps`
+  (header/divider skip, `strip("|").split("|")`) and factor it into a new helper. The table column
+  order is `step | action | files | command | expected`, so action is index 1.
+- S10 does not run when S1 (Stage Map missing) short-circuits; when stage parsing succeeds it is
+  called at the same level as `_check_each_stage_section`.
+- Error code `S10`, including the stage number; the message names the missing item.
+
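Editorial sketch (not part of the diff): a minimal reading of the S10a/S10b/S10c checks from §4.4. Both function names are hypothetical, the real `_check_slice_tdd()` is not shown in this diff, and this version scans every table row in the section rather than first narrowing to the `Stepwise Execution Order` table. Like the `strip("|").split("|")` approach named in the spec, it would mis-split a cell that itself contains a pipe.

```python
def action_cells(section: str, column: int = 1) -> list:
    """Action cells of the effective (non-header, non-divider) table rows in `section`."""
    cells = []
    for line in section.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        parts = [c.strip() for c in line.strip("|").split("|")]
        is_divider = all(set(p) <= set("-: ") for p in parts)
        if is_divider or parts[0].lower() == "step" or len(parts) <= column:
            continue
        cells.append(parts[column])
    return cells


def check_slice_tdd(stage_number: int, section: str) -> list:
    """Return S10 error strings for one Stage section (empty list = pass)."""
    errors = []
    lines = [ln.strip() for ln in section.splitlines()]
    for label, code in (("Slice value:", "S10a"), ("Acceptance:", "S10b")):
        values = [ln[len(label):].strip() for ln in lines if ln.startswith(label)]
        if not any(values):
            errors.append(f"{code} stage {stage_number}: missing or empty `{label}`")
    exempt = any(ln.startswith("TDD exemption:") for ln in lines)
    actions = action_cells(section)
    red_first = bool(actions) and actions[0].startswith("RED:")
    has_green = any(a.startswith("GREEN:") for a in actions)
    if not exempt and not (red_first and has_green):
        errors.append(f"S10c stage {stage_number}: need a RED: first step and a GREEN: step")
    return errors
```

Checking literal prefixes keeps the validator deterministic, which is the reason the spec gives for using `RED:` / `GREEN:` tokens instead of inferring intent from wording.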
+### 4.5 Consistency: executor / convergence
+
+- The executor's per-step TDD loop only has to run the plan's RED/GREEN steps as they are, so
+  the body of [_implementation-executor.md](../../../prompts/profiles/_implementation-executor.md) does not change.
+  However, since the plan already spells out RED/GREEN, a one-line alignment comment telling the executor to read
+  "the plan's RED step = the first failing test" may optionally be added (not required).
+- Add one line with Slice value/Acceptance/RED-GREEN check items to the "Stage Map self-check" of the
+  §5.5.9 Plan Body Verification / self-review pass ([implementation-planning.md:102](../../../prompts/profiles/implementation-planning.md:102)),
+  so that human review and machine review
+  back each other up.
+
+## 5. Impact / migration
+
+### 5.1 Test fixtures (BLOCKING)
+Adding S10 makes the existing valid fixtures newly fail, so they **must be updated**:
+- [tests/fixtures/plans/valid_one_stage.md](../../../tests/fixtures/plans/valid_one_stage.md):
+  add the two `Slice value:` / `Acceptance:` lines and change the action cells to `RED: ...` / `GREEN: ...`.
+- [tests/fixtures/plans/valid_three_stage_parallel.md](../../../tests/fixtures/plans/valid_three_stage_parallel.md):
+  same update (all 3 stages).
+- Add new invalid fixtures: `invalid_missing_slice_value.md`, `invalid_missing_red_step.md`.
+
+### 5.2 Test code
+- In [tests/test_validate_implementation_plan_stages.py](../../../tests/test_validate_implementation_plan_stages.py),
+  add S10a/S10b/S10c pass and fail cases and a `TDD exemption` pass case.
+- If `tests/test_render_final_report.py` / the golden reports include stage bodies, update them together.
+
+### 5.3 Build / sync
+- After editing the sources (`prompts/`, `validators/`), sync `runtime/` with `npm run build`.
+- Do not edit `runtime/` directly.
+
+### 5.4 Pre-checks (no assumptions)
+- When implementation starts, verify once by actual measurement where `report-writer` really emits the stage body
+  (profile guidance vs template). The `valid_one_stage.md` fixture shows the multi-Stage
+  structure, so the profile guidance path is likely, but confirm in code before proceeding.
+
+## 6. Acceptance criteria (done conditions for this design)
+
+1. The profile's split anchor is redefined as vertical-slice-first and the horizontal ban is stated.
+2. The profile requires each Stage to declare `Slice value:` / `Acceptance:`.
+3. The plan's `Stepwise Execution Order` requires a `RED:` first step + `GREEN:` as MUST.
+4. S10 in `validate-implementation-plan-stages.py` enforces the outputs of 1-3 above, and
+   `python3 -m pytest tests/test_validate_implementation_plan_stages.py` passes.
+5. With the existing valid fixtures updated, the full stages validation suite is green.
+6. After `npm run build`, the profiles and validators in `runtime/` match the sources.
package/package.json
CHANGED
package/runtime/BUILD.json
CHANGED
@@ -100,6 +100,7 @@ Rules (the schema enforces most of these — they are listed here so you know *w
 - If evidence is missing, write `"I don't know"` in the relevant statement field rather than fabricating confidence.
 - Cite file paths and line numbers in every `evidence.primary[].source` / `consensus[].evidence` cell.
 - Preserve every analysis worker's ticket tagging — every row's `ticketId` field carries the ticket key or the task-fallback. For single-ticket runs, set `ticketCoverage` to `{"singleTicket": "<ticket>"}`. For runs that do not require ticket tagging (`release-handoff`, `final-verification`), set `ticketCoverage` to `{"omit": true}`.
+- For `implementation-planning`, populate `implementationPlanning.requirementCoverage` with one row per concrete requirement from the brief / packet, using IDs `R-001`, `R-002`, ... in source order. `coveredBy` MUST name the specific Option Candidate plus Stage/Step that satisfies the requirement. Use `status: "covered"` only when the report's plan actually covers it; otherwise use `gap` or `blocked C-NNN` and ensure the corresponding `Clarification Items` row blocks approval. Do not collapse this into `ticketCoverage`; ticket coverage is not requirement coverage.
 - When the `Task Type` is `improvement-discovery`, populate `## 5.9 Improvement Candidates` with the 10-column schema enforced by `validators/validate-improvement-report.py`. Source the row IDs (`I-NNN`), lens whitelist, and Source workers patterns from `scripts/okstra_ctl/improvement_lenses.py` — do NOT introduce new lens names or worker prefixes.
 
 Write the data.json with your `Write` tool against the absolute `Result Path`. Then invoke the renderer (`Bash`): `python3 scripts/okstra-render-final-report.py <data.json path>`. Confirm both files exist and respond with a short status line: `data.json written to <abs path>; markdown rendered to <abs path>. Sections populated: <count>.`
@@ -95,6 +95,10 @@ while [[ $# -gt 0 ]]; do
       BASE_REF="$(require_option_value --base-ref "${2-}")"
       shift 2
       ;;
+    --stage)
+      STAGE="$(require_option_value --stage "${2-}")"
+      shift 2
+      ;;
     --task-type)
       TASK_TYPE="$(require_option_value --task-type "${2-}")"
       shift 2
@@ -185,7 +189,7 @@ while [[ $# -gt 0 ]]; do
           printf ' hint: did you mean --task-id?\n' >&2
           ;;
       esac
-      printf ' valid options: --render-only --resume-clarification --yes --workers --lead-model --claude-model --codex-model --gemini-model --report-writer-model --related-tasks --task-type --project-id --project-root --task-group --task-id --task-brief --directive --clarification-response --approved-plan --approve --implementation-option --no-plan-verification -h|--help\n' >&2
+      printf ' valid options: --render-only --resume-clarification --yes --workers --lead-model --claude-model --codex-model --gemini-model --report-writer-model --related-tasks --task-type --project-id --project-root --task-group --task-id --task-brief --directive --clarification-response --approved-plan --approve --implementation-option --stage --no-plan-verification -h|--help\n' >&2
       usage
       exit 1
       ;;
package/runtime/bin/okstra.sh
CHANGED
@@ -122,6 +122,7 @@ PY_ARGS=(
 [[ -n "${CLARIFICATION_RESPONSE_PATH-}" ]] && PY_ARGS+=(--clarification-response "$CLARIFICATION_RESPONSE_PATH")
 [[ -n "${WORK_CATEGORY-}" ]] && PY_ARGS+=(--work-category "$WORK_CATEGORY")
 [[ -n "${BASE_REF-}" ]] && PY_ARGS+=(--base-ref "$BASE_REF")
+[[ -n "${STAGE-}" ]] && PY_ARGS+=(--stage "$STAGE")
 [[ "$RENDER_ONLY" == "true" ]] && PY_ARGS+=(--render-only)
 [[ "$PLAN_VERIFICATION_ENABLED" == "false" ]] && PY_ARGS+=(--no-plan-verification)
 
@@ -16,6 +16,7 @@ Emit one `PROGRESS: <phase-id> <verb-phrase>` line as plain user-facing text at
 {{PHASE_FORBIDDEN_ACTIONS}}
 - This run executes `{{WORKFLOW_CURRENT_PHASE}}` only. Do not start `{{WORKFLOW_NEXT_RECOMMENDED_PHASE}}` or any later phase inside this run, even if the user says "다음 단계 진행해" or similar.
 {{STAGE_BATCH_DIRECTIVE}}
+{{VERIFICATION_TARGET}}
 - Phase advancement requires a new okstra invocation launched with `--task-type {{WORKFLOW_NEXT_RECOMMENDED_PHASE}}` after this run's final report is written and approved. The lead must not write source code, run builds/migrations/deployments, or otherwise produce artifacts of a different phase from inside this run.
 - See `Lifecycle Phase Boundaries` in the okstra skill (`agents/SKILL.md`) for the canonical rules and the phase-transition checklist.
 
@@ -49,6 +49,6 @@ are collected and convergence finished. Phase 1-5 do not need it.
|
|
|
49
49
|
|
|
50
50
|
- Parse the executor's `### Stage Carry Evidence` JSON block. If absent or unparsable, end with status `contract-violated` and route to a follow-up `error-analysis`.
|
|
51
51
|
- For EACH stage in this run's batch: write its JSON verbatim to `runs/<impl-task-key>/carry/stage-<N>.json`. Refuse to overwrite an existing file (one stage = one sidecar; re-runs are out of scope for this version).
|
|
52
|
-
- For EACH stage in this run's batch: append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, and the SHA of HEAD. Use the okstra runtime's `consumers_mutex` helper (NOT a raw filesystem write) to honour the lock.
|
|
52
|
+
- For EACH stage in this run's batch: append a `status:"done"` row to `runs/<plan-task-key>/consumers.jsonl` with `completed_at`, `carry_path`, `report_path` (this run's final-report path relative to the run root), and the SHA of HEAD. Use the okstra runtime's `consumers_mutex` helper (NOT a raw filesystem write) to honour the lock. `report_path` lets `final-verification` cite each stage's originating report when assembling its Source Implementation Report list.
|
|
53
53
|
- The verifier round, Phase 5.5 convergence, and this Phase 6 report run **once per run** over the batch's combined diff — NOT per stage. The single final report covers every batched stage, with a per-stage subsection.
|
|
54
54
|
- Quote every batched stage's new contents (each sidecar JSON in full, each new consumers row by itself) in the final report's `Stage sidecar evidence` deliverable section.
|
|
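The `consumers.jsonl` bullet in the hunk above can be pictured with a minimal sketch. It assumes a JSON-lines file with one object per row. The real `consumers_mutex` helper is not shown in this diff, so a plain `threading.Lock` stands in for it here. The `stage` key name and both function names are hypothetical; `head_commit` is borrowed from the `final-verification` text elsewhere in this diff.

```python
import json
import threading
from pathlib import Path

# Hypothetical stand-in for the runtime's `consumers_mutex` helper; its real
# signature is not part of this diff.
_consumers_lock = threading.Lock()


def build_done_row(stage, completed_at, carry_path, report_path, head_sha):
    """Assemble one status:"done" row carrying the fields the bullet lists."""
    return {
        "stage": stage,  # assumed key name
        "status": "done",
        "completed_at": completed_at,
        "carry_path": carry_path,
        "report_path": report_path,  # the field added in this version
        "head_commit": head_sha,
    }


def append_row(consumers_path, row):
    """Append the row as a single JSON line while holding the stand-in lock."""
    with _consumers_lock:
        with Path(consumers_path).open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Appending rather than rewriting keeps earlier `started` rows intact, which appears to be the intent of the reverse-link design.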
@@ -20,6 +20,7 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
|
|
|
20
20
|
|
|
21
21
|
- **Coding-conventions preflight (BLOCKING — runs before the first `Edit` / `Write`, and binds the TDD loop below):** load the applicable coding conventions for every language the diff will touch, then state in ONE line which conventions apply (e.g. `Applying TS + hexagonal overlay; domain at src/domains/*/domain/`). Lint/test green is necessary but NOT sufficient — self-mocked tests, interaction-only assertions, and untruthful names all pass a green pipeline; this gate is what keeps them out of the diff.
|
|
22
22
|
- **Language-specific rules load per situation — never inline them here.** Detect each touched file's language (extension / project manifest) and load the matching reference by reading okstra's installed coding-conventions files directly at `~/.claude/skills/okstra-coding-preflight/` (placed there by `okstra install`): read `languages/<lang>.md` (mock/spy API, idioms, test framework) + `clean-code.md` + any `architecture/*` overlay via the Read tool by absolute path. The skill is `user-invocable: false`, so do NOT rely on Skill-tool auto-invocation — read the files directly. For a ports-and-adapters / NestJS-hex layout (`domain/` + `ports/` + `adapters/`, `*.port.*`), load the hexagonal overlay too. This per-language split is the skill's job — the executor does not carry a multi-language block in context.
|
|
23
|
+
- **Project review rule packs:** also look for project-local review skills in `<PROJECT_ROOT>/skills/*review*`, `<PROJECT_ROOT>/.claude/skills/*review*`, and up to two parent directories' `skills/*review*/SKILL.md`. Read the relevant `SKILL.md` plus referenced `references/*.md` files and apply their rules during implementation. This is a prevention pass, not a PR-comment generation workflow: do not dispatch reviewer subagents from the executor. For Fonts Ninja-style PR review packs, the executor must avoid newly introduced duplicate helper stacks, tautological tests that merely re-call the delegated helper, self-mocking, domain rules in adapters/ports, domain objects outside `domain/`, dead APIs, weak public names, and functions that fail the plain-English read.
|
|
23
24
|
- **Language-agnostic principles that ALWAYS bind (the TDD loop below MUST satisfy them):** (1) no self-mocking of the SUT — stub/spy only injected collaborators, never the subject's own methods; (2) behavioral assertions on outcomes (return value, state, persisted rows, events, boundary calls) — never `toHaveBeenCalled*` on an internal helper as the only/primary assertion; (3) truthful names — a `get*` / `find*` that writes/inserts, or a name encoding the caller's use-case (`*ForInit`) or hiding a domain rule (`findValid*`), is a defect; (4) single-purpose functions ≤50 effective lines, plain-English readability.
|
|
24
25
|
- **Graceful degradation (codex / gemini executor runtimes, or any runtime where the `~/.claude/skills/okstra-coding-preflight/` files are absent or unreadable):** do NOT skip the gate — apply the agnostic principles above plus the project's own `CLAUDE.md` / `CONTRIBUTING` / formatter+lint config, and record `coding-conventions: skill-unavailable → applied <project rules + agnostic principles>` in the final report. Never claim a skill read that did not happen.
|
|
25
26
|
- **Mandatory TDD loop**: BEFORE the first `Edit` or `Write` call, the executor MUST apply a red-green-refactor loop for every code change in this run. This is required; skipping it is a `contract-violated` outcome. This governs HOW each step is executed (failing test first → minimal implementation → refactor); it does not override the approved plan's WHAT/file scope.
|
|
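The rule-pack lookup added in the hunk above could be sketched as follows. This is an illustration under assumptions, not okstra's implementation: the function name is hypothetical, and it only collects `SKILL.md` files from the three locations the bullet names.

```python
from pathlib import Path


def find_review_rule_packs(project_root):
    """Return SKILL.md paths of review rule packs near the project root."""
    root = Path(project_root).resolve()
    found = []

    # Project-local locations: <root>/skills and <root>/.claude/skills.
    bases = [root / "skills", root / ".claude" / "skills"]
    # Up to two parent directories' skills/ folders.
    bases += [parent / "skills" for parent in list(root.parents)[:2]]

    for base in bases:
        if base.is_dir():
            found.extend(sorted(base.glob("*review*/SKILL.md")))
    return found
```

A caller would then read each returned `SKILL.md` plus its referenced `references/*.md` files. An empty result corresponds to the `project-review-rules: none found` record that the verifier section of this diff describes.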
@@ -27,23 +28,29 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
|
|
|
27
28
|
- Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
|
|
28
29
|
- When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
|
|
29
30
|
- **DB / IO / SQL changes require real execution — mock-only is NOT validation evidence:** when this run's diff touches DB/IO/SQL (ORM / query-builder code — sequelize / typeorm / prisma / knex / raw SQL — `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string), a mocked unit test cannot observe the SQL the query builder actually emits — a mocked suite once passed while `count({ col: 'FontFamily.fontFamily' })` threw `Unknown column` on the real DB. The executor MUST run the change against a real (or faithful-replica) datastore — the `db-test` validation step (plan `validation` db step, else `project.json.qaCommands.db-test`), targeting a **local / replica** DB — and cite its exact command + exit code in the final report's `Validation evidence`. If no real DB / `db-test` command is reachable, do NOT claim the change verified: label the DB portion `정적 분석상 …, 미검증(실행 안 함)` in the report, surface it in the routing recommendation, and never downplay the real run as "too heavy". `git push` stays forbidden (universal list); the unverified DB state is carried forward so `final-verification` cannot accept it and `release-handoff` cannot push.
|
|
30
|
-
- re-read the approved plan end-to-end and parse the `## 5.5 Stage Map`. Read the **Stage
|
|
31
|
-
-
|
|
32
|
-
-
|
|
31
|
+
- re-read the approved plan end-to-end and parse the `## 5.5 Stage Map`. Read the **Stage** injected in the launch prompt (`Stage for this implementation run`): the single stage number this run owns. The runtime already selected and reserved this stage (one run = one stage) — do NOT recompute the start stage from `consumers.jsonl`.
|
|
32
|
+
- load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(this stage)` and inject them into the executor's working context as "runtime carry-in". For a `depends-on (none)` stage, no sidecar load — task-brief only.
|
|
33
|
+
- this stage's `depends-on` are all already `status:done`. Its file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path are the authoritative scope.
|
|
33
34
|
- inspect the current state of every file the plan names; if any file has changed materially since the plan was written, stop and route to a new `implementation-planning` run instead of editing speculatively
|
|
34
35
|
- "materially changed" means: the function, class, section, or behaviour the plan targets has been edited, renamed, moved, removed, or otherwise altered in a way that invalidates the plan's reasoning. Cosmetic edits (whitespace, comment-only changes, unrelated function modifications elsewhere in the same file) do NOT trigger a re-plan; cite the diff (`git log --oneline <plan-created-at>..HEAD -- <file>`) in the final report and proceed.
|
|
35
36
|
- distinguish the two file-scope rules (they are not in conflict):
|
|
36
37
|
- **drift rule** (this section): if a file *named in the plan* has materially drifted, refuse to edit and route back to planning. This protects trust in the approved scope.
|
|
37
38
|
- **out-of-plan rule** (Allowed actions section below): if a step *requires touching a file NOT in the plan list*, that is permitted with `Out-of-plan edits` justification. This handles honest scope discovery during execution.
|
|
38
39
|
- confirm the test/build commands referenced in the plan still exist and run from a clean state
|
|
40
|
+
- **Pre-commit review-rule sweep (BLOCKING before the executor's final commit):** inspect the run diff (`git diff <stage-base>..HEAD`) against the loaded coding conventions and project review rule packs. Fix in-place before handing to verifiers when the issue is inside this stage's scope. Minimum sweep:
|
|
41
|
+
- no new byte-identical or semantically equivalent helper stack appears in two touched services; extract a shared helper/domain module unless the approved plan explicitly justified duplication,
|
|
42
|
+
- no test asserts equality to a direct re-invocation of the collaborator/helper being delegated to; keep literal-value or observable-state assertions,
|
|
43
|
+
- no public method/repository function introduced by this run is left with zero in-scope callers unless it is part of a declared interface contract,
|
|
44
|
+
- no exported/public name hides side effects or omits the entity it acts on,
|
|
45
|
+
- no newly introduced function requires a reader to mentally name several phases that should be helper calls.
|
|
39
46
|
|
|
40
|
-
## Stage execution contract (this run owns
|
|
47
|
+
## Stage execution contract (this run owns one stage)
|
|
41
48
|
|
|
42
|
-
- **Sidecar evidence writer (BLOCKING
|
|
43
|
-
- **Reverse link (BLOCKING
|
|
44
|
-
- **One-PR-per-run.** This run creates exactly one PR titled `
|
|
45
|
-
- `## Stage <N>` —
|
|
46
|
-
- `## Carry-In summary` —
|
|
49
|
+
- **Sidecar evidence writer (BLOCKING).** When this stage's Stage Validation `post` commands all succeed, the Executor MUST emit a JSON object matching the schema in `docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md` §3.2 and the lead MUST persist it to `runs/<impl-task-key>/carry/stage-<N>.json`. The file MUST NOT exist before the run starts (overwrite is refused — see `--force-stage` non-goal).
|
|
50
|
+
- **Reverse link (BLOCKING).** The runtime already appended a `status:"started"` row for this stage before the run began. On completion, append a `status:"done"` row with `carry_path` populated for this stage number.
|
|
51
|
+
- **One-PR-per-run.** This run creates exactly one PR titled `Stage <N>: <title>`. The PR body MUST include:
|
|
52
|
+
- `## Stage <N>` — number, title (from Stage Map row), touched files, and validation result.
|
|
53
|
+
- `## Carry-In summary` — depends-on list + cited identifiers/SHAs from each loaded sidecar (omit when depends-on is empty).
|
|
47
54
|
- `## Previous run` / `## Next run` — links so a reviewer can navigate the run chain.
|
|
48
55
|
|
|
49
56
|
## Allowed actions during the run
|
|
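Two sidecar rules in the hunk above lend themselves to a short sketch: loading carry-in only for `depends-on` stages, and refusing to overwrite an existing `stage-<N>.json`. The function names and the `carry_dir` parameter are hypothetical, and the sidecar schema itself is not reproduced here.

```python
import json
from pathlib import Path


def load_carry_in(carry_dir, depends_on):
    """Load stage-<i>.json for each dependency; an empty list loads nothing."""
    carry_in = {}
    for stage in depends_on:
        sidecar = Path(carry_dir) / f"stage-{stage}.json"
        # A missing sidecar raises FileNotFoundError instead of being skipped.
        carry_in[stage] = json.loads(sidecar.read_text(encoding="utf-8"))
    return carry_in


def write_sidecar(carry_dir, stage, payload):
    """Persist a stage sidecar, refusing to overwrite an existing file."""
    target = Path(carry_dir) / f"stage-{stage}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" creates the file exclusively and raises FileExistsError
    # if it already exists, matching the one-stage-one-sidecar rule.
    with target.open("x", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False)
    return target
```

Exclusive-create mode makes the refusal atomic at the filesystem level, so no separate existence check is needed before the write.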
@@ -67,12 +67,15 @@ Re-running commands proves the diff *builds and passes*; it does NOT prove the d
|
|
|
67
67
|
|
|
68
68
|
- **Scope (no silent sampling).** Enumerate every changed source/test file via `git diff --name-only <base>...HEAD` and review each one. Skipping a changed file silently is a `contract-violated` outcome. If a file's language has no reference and is not covered by the agnostic checks below, record `design-review skipped: <file> (language=<x> no reference)` — never pass it silently.
|
|
69
69
|
- **Load the same conventions the executor used, per language.** For each touched language load the coding-conventions reference by reading `~/.claude/skills/okstra-coding-preflight/languages/<lang>.md` + `clean-code.md` + the `architecture/hexagonal.md` overlay when the layout matches; degrade to the agnostic checks below when those files are not readable. The verifier does NOT inline language rules — it loads them per situation, identical to the executor preflight.
|
|
70
|
+
- **Load project review rule packs when present.** Search the project root, `.claude/skills`, and up to two parent `skills/` directories for `*review*/SKILL.md` rule packs. Read their referenced `references/*.md` files and apply them as an overlay on this static review. If a premium review skill exists, use its coverage philosophy (recall-first enumeration followed by verify-only confirmation) as the verifier's mental model, but do NOT dispatch extra reviewer agents unless the task explicitly configured them. Record `project-review-rules: <paths read>` or `project-review-rules: none found` in the worker result.
|
|
70
71
|
- **Blocking checks (any hit → verdict `FAIL`, cited `path:line` + rule name, recommended fix recorded — the verifier does NOT apply it):**
|
|
72
|
+
- **New duplication / DRY:** two or more newly added or meaningfully modified blocks implement the same helper stack, transform, or domain rule. Literal copy-paste is always blocking; semantically equivalent transforms across services are blocking unless the approved plan explicitly justified keeping them separate. Recommend the shared module location.
|
|
71
73
|
- **Self-mocking:** a test for `Foo` stubs/spies a method on the `Foo` instance under test (`jest.spyOn(sut, ...)`, `spyOn(FooService.prototype, ...)` in `foo.*.spec.*`, `vi.mocked(sut)` + stub). Mocking injected collaborators is fine.
|
|
72
74
|
- **Interaction-only assertion:** a test whose only/primary assertion is `toHaveBeenCalled*` / `toHaveBeenCalledTimes` on an internal helper or a non-side-effecting collaborator, with no assertion on the returned value / resulting state / persisted row / emitted event.
|
|
75
|
+
- **Tautological delegation assertion:** a test asserts the SUT result equals a direct call to the same pure helper/collaborator that the SUT delegates to, instead of asserting an independent literal value or observable state.
|
|
73
76
|
- **Untruthful name:** a read-named function (`get*` / `find*` / `load*`) that writes/inserts/mutates; an adapter or repository name encoding the caller's use-case (`*ForInit`) or hiding a domain rule (`findValid*` / `findActive*`).
|
|
74
77
|
- **Hexagonal (only when the overlay is loaded):** business logic inside a port body; an adapter method that is not pure I/O (post-fetch JS filtering on domain state, domain-rule evaluation); a domain object declared outside the `domain/` boundary.
|
|
75
|
-
- **Advisory findings (recorded as recommendations; verdict MAY still PASS):** function >50 effective lines, a single body mixing read+write stages, weak readability, a missing-but-non-critical outcome assertion. These land in the verifier result as `should-fix` / `nit` recommendations, not as a `FAIL`.
|
|
78
|
+
- **Advisory findings (recorded as recommendations; verdict MAY still PASS):** function >50 effective lines, a single body mixing read+write stages, weak readability, a missing-but-non-critical outcome assertion, newly orphaned private/public code that is safe to remove but not on a critical path, or weak-but-not-misleading names. These land in the verifier result as `should-fix` / `nit` recommendations, not as a `FAIL`.
|
|
76
79
|
- **Output.** Every finding — blocking or advisory — is a structured item in the verifier's worker result (`path:line`, rule, severity, suggested fix) so it carries into Phase 5.5 convergence and the final report. A blocking hit sets the verifier verdict to `FAIL` with the rule cited, using the same verdict machinery as the Discrepancy rule above. `Claude lead` MUST NOT silently downgrade a cited blocking finding to advisory during synthesis; an override requires a concrete cited reason, exactly as for the Discrepancy rule.
|
|
77
80
|
|
|
78
81
|
### DB / IO / SQL change — real-execution gate (mock-only acceptance forbidden)
|
|
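The new "Tautological delegation assertion" check in the hunk above is easiest to see side by side with an acceptable test. The sketch below uses hypothetical names (`normalize_family`, `FontService`) and plain `assert` statements; it illustrates the pattern and is not code from the package.

```python
def normalize_family(name):
    """Pure helper the service delegates to (hypothetical)."""
    return " ".join(name.split()).title()


class FontService:
    """Hypothetical subject under test that delegates to the helper."""

    def display_name(self, raw):
        return normalize_family(raw)


def test_tautological():
    # Would be flagged: the expected value comes from the same helper the
    # SUT calls, so the test still passes if the helper itself is wrong.
    sut = FontService()
    assert sut.display_name("  open   sans ") == normalize_family("  open   sans ")


def test_independent_literal():
    # Acceptable: the expected value is an independent literal.
    sut = FontService()
    assert sut.display_name("  open   sans ") == "Open Sans"
```

Both tests pass today. Only the second would fail if `normalize_family` regressed, which is why the verifier treats the first form as blocking.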
@@ -22,20 +22,20 @@
|
|
|
22
22
|
- regression risk in adjacent code paths not directly changed
|
|
23
23
|
- documentation or rollout gaps
|
|
24
24
|
- production-specific failure modes not caught by tests (env/config drift across stages, secrets & permission/auth changes, migration ordering & rollback executability, observability gaps)
|
|
25
|
-
- Pre-verification entry gate (
|
|
26
|
-
- the
|
|
27
|
-
-
|
|
28
|
-
-
|
|
29
|
-
-
|
|
25
|
+
- Pre-verification entry gate (resolved & enforced by `okstra render-bundle` prep — the lead does NOT recompute it):
|
|
26
|
+
- the verification target (scope / worktree / base / head / stages / source reports / diff stat) is injected as the `VERIFICATION_TARGET` block. The lead MUST treat it as authoritative and MUST NOT re-pick a target from the brief.
|
|
27
|
+
- **whole-task scope** (`--stage auto`, default): prep has already verified every Stage Map stage is `status:done` in `consumers.jsonl`, every done stage's `head_commit` is an ancestor of the task worktree HEAD (all stage branches merged), and the worktree is clean outside `.okstra/`. If any check failed the run never started (PrepareError); a started whole-task run is therefore a fully-merged, clean target.
|
|
28
|
+
- **single-stage scope** (`--stage N`): prep verified stage N is `status:done` and its isolated stage worktree exists and is clean. Other stages' state is irrelevant. A single-stage run is a partial verification and MUST NOT recommend `release-handoff`.
|
|
29
|
+
- the lead still captures `git rev-parse HEAD` / `git status --short` from the injected worktree to confirm the analysis ran against the injected head; a mismatch is a `tool-failure`, not a silent proceed.
|
|
30
30
|
- Required deliverable shape (final report, in addition to the standard sections):
|
|
31
|
-
- **Source Implementation Report**:
|
|
31
|
+
- **Source Implementation Report(s)**: the `VERIFICATION_TARGET` snapshot verbatim — verification scope, worktree path, base/head SHAs, the list of stages under verification, and one row per stage citing its originating implementation final-report (`report_path` from `consumers.jsonl`; render `(report_path unrecorded)` when absent). The lead injects this same snapshot into every analyser prompt (`**Verification scope:** / **Worktree:** / **Verification base ref:** / **Verification head SHA:** / **Verification diff stat:**`); a worker that cannot confirm its analysis ran against that exact head MUST record a `tool-failure`.
|
|
32
32
|
- **Verdict vocabulary**: Section 7 (`Final Verdict`) MUST include a `Verdict Token` field whose value is exactly one of `accepted`, `conditional-accept`, or `blocked`. `conditional-accept` requires an explicit, exhaustive list of conditions; ambiguous verdicts ("looks good", "mostly ready") are not allowed. Each condition MUST be recorded as a row in the **Conditional Acceptance Conditions** deliverable (`id` `CA-NNN`, `condition`, `evidenceRequired`, `blocksReleaseHandoff`). The validator enforces verdict↔deliverable consistency: `accepted` ⇒ zero acceptance blockers, `blocked` ⇒ at least one, `conditional-accept` ⇒ at least one condition, and a `release-handoff` routing recommendation is allowed only when the verdict is `accepted`.
|
|
33
33
|
- **Acceptance Blockers block** (under section 4): one row per blocker with `id`, `severity` (`critical` / `major` / `minor`), evidence (file path, log excerpt, or test output), and the recommended follow-up phase (`error-analysis` or `implementation-planning`). Empty block is acceptable and preferred — render the single line `- No acceptance blockers found.`
|
|
34
34
|
- **Residual Risk block** (under section 4): risks that are not blockers but should be tracked, each with mitigation owner and a trigger that would escalate them to a blocker.
|
|
35
35
|
- **Validation Evidence**: for every requirement in the originating plan or task brief, cite the artifact (commit SHA, test output, log line, MCP SELECT result) that demonstrates coverage. Paraphrased "verified" claims without an artifact are rejected.
|
|
36
36
|
- **Read-only command log**: any pre-existing test/validation command executed during this run MUST be listed with its exact command line and exit code. No mutating commands may appear here.
|
|
37
37
|
- **Two-tier command lookup (shared with `implementation`):** when this phase performs its own independent re-validation, the command source is exactly the same two tiers `implementation` verifiers use — Tier 1 is the originating task brief / approved plan's `validation` set, Tier 2 is `<PROJECT_ROOT>/.okstra/project.json` under `qaCommands`. Auto-detecting tools from manifest files is forbidden; missing tiers are recorded as `qa-command not configured: <category>` and do NOT trigger a guess. The `cmd` deny-list (`--fix`, `--write`, ` -w`, ` -u`, `--snapshot-update`, `INSTA_UPDATE=<not-no>`, `cargo update`, `npm install` without `ci`, etc.) is enforced identically. NOTE: runtime fail-fast validation (`okstra_ctl.qa_commands.validate_qa_commands`) only fires at `--task-type implementation` run-prep, so this phase MUST self-check each `qaCommands` entry against the deny-list before executing it — if a denied token is present, skip the command and record it as a `Read-only command log` line `qa-command rejected (denied token: <token>): <label>`.
|
|
38
|
-
- **Routing recommendation**: the next safe phase — one of `release-handoff`, `done`, `error-analysis`, `implementation-planning` — tied to the verdict and blocker list. `release-handoff` is allowed ONLY when the Verdict Token is `accepted`.
|
|
38
|
+
- **Routing recommendation**: the next safe phase — one of `release-handoff`, `done`, `error-analysis`, `implementation-planning` — tied to the verdict and blocker list. `release-handoff` is allowed ONLY when the Verdict Token is `accepted`. `release-handoff` is additionally allowed ONLY when the verification scope (the `Verification scope:` line of the injected `VERIFICATION_TARGET` block, recorded as the report's `verificationScope` field) is `whole-task`; a `single-stage` run is partial and routes to `implementation` / `done` even on an `accepted` verdict.
|
|
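The verdict and routing rules above reduce to a few boolean checks. The following sketch assumes the report has already been parsed into plain values. The function names are hypothetical and the real validator is not shown in this diff.

```python
VERDICTS = {"accepted", "conditional-accept", "blocked"}
SCOPES = {"whole-task", "single-stage"}


def release_handoff_allowed(verdict_token, verification_scope):
    """release-handoff needs an accepted verdict AND whole-task scope."""
    if verdict_token not in VERDICTS:
        raise ValueError(f"unknown verdict token: {verdict_token!r}")
    if verification_scope not in SCOPES:
        raise ValueError(f"unknown verification scope: {verification_scope!r}")
    return verdict_token == "accepted" and verification_scope == "whole-task"


def verdict_violations(verdict_token, blocker_count, condition_count):
    """Return consistency violations between the verdict and deliverables."""
    violations = []
    if verdict_token == "accepted" and blocker_count != 0:
        violations.append("accepted requires zero acceptance blockers")
    if verdict_token == "blocked" and blocker_count < 1:
        violations.append("blocked requires at least one acceptance blocker")
    if verdict_token == "conditional-accept" and condition_count < 1:
        violations.append("conditional-accept requires at least one condition")
    return violations
```

For example, `release_handoff_allowed("accepted", "single-stage")` is `False`, which matches the rule that a single-stage run is partial even when its verdict is `accepted`.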
39
39
|
- Clarification request policy (phase-specific addendum — shared policy is in `_common-contract.md`):
|
|
40
40
|
- populate `## 1. Clarification Items` only when a blocker hinges on information only the user can supply (deployment intent, intended target environment, business-rule interpretation); use `Blocks=next-phase` for items that gate continuing to release-handoff
|
|
41
41
|
- Self-review pass before finalising the report (`Claude lead` runs this; do not delegate to a generic subagent):
|