okstra 0.64.0 → 0.65.0
- package/bin/okstra +1 -0
- package/docs/kr/architecture.md +2 -0
- package/docs/kr/cli.md +11 -3
- package/docs/kr/performance-improvement-plan-v2.md +2 -1
- package/docs/project-structure-overview.md +1 -0
- package/docs/superpowers/plans/2026-06-10-p6-token-usage-incremental.md +1029 -0
- package/docs/superpowers/specs/2026-06-10-blocking-contract-posthoc-conformance-design.md +168 -0
- package/package.json +1 -1
- package/runtime/BUILD.json +2 -2
- package/runtime/agents/SKILL.md +3 -1
- package/runtime/agents/workers/claude-worker.md +1 -1
- package/runtime/agents/workers/codex-worker.md +1 -0
- package/runtime/agents/workers/gemini-worker.md +1 -0
- package/runtime/bin/lib/okstra/cli.sh +4 -0
- package/runtime/bin/lib/okstra/globals.sh +1 -0
- package/runtime/bin/lib/okstra/usage.sh +4 -1
- package/runtime/bin/okstra.sh +1 -0
- package/runtime/prompts/profiles/_implementation-executor.md +2 -0
- package/runtime/prompts/profiles/_implementation-verifier.md +1 -0
- package/runtime/python/okstra_ctl/clarification_items.py +96 -37
- package/runtime/python/okstra_ctl/context_cost.py +86 -8
- package/runtime/python/okstra_ctl/locks.py +32 -0
- package/runtime/python/okstra_ctl/migrate.py +45 -6
- package/runtime/python/okstra_ctl/pr_template.py +2 -7
- package/runtime/python/okstra_ctl/run.py +58 -44
- package/runtime/python/okstra_ctl/run_context.py +3 -8
- package/runtime/python/okstra_ctl/seeding.py +25 -18
- package/runtime/python/okstra_ctl/wizard.py +8 -10
- package/runtime/python/okstra_ctl/worktree.py +13 -0
- package/runtime/python/okstra_project/dirs.py +10 -1
- package/runtime/python/okstra_token_usage/claude.py +226 -61
- package/runtime/python/okstra_token_usage/cli.py +10 -1
- package/runtime/python/okstra_token_usage/collect.py +34 -27
- package/runtime/python/okstra_token_usage/cursor.py +93 -0
- package/runtime/python/okstra_token_usage/paths.py +29 -2
- package/runtime/skills/okstra-coding-preflight/clean-code.md +15 -0
- package/runtime/skills/okstra-inspect/SKILL.md +16 -11
- package/runtime/skills/okstra-run/templates/pr-body.template.md +13 -16
- package/runtime/validators/lib/fixtures.sh +73 -10
- package/runtime/validators/lib/runners.sh +4 -0
- package/runtime/validators/validate-run.py +53 -0
- package/runtime/validators/validate_session_conformance.py +430 -0
- package/src/migrate.mjs +31 -0
|
@@ -0,0 +1,168 @@
|
|
|
1
|
+
# Design: post-hoc conformance checks for the three BLOCKING contracts
|
|
2
|
+
|
|
3
|
+
- Date written: 2026-06-10
|
|
4
|
+
- Status: implemented (approved 2026-06-10; implemented as `validators/validate_session_conformance.py`, including verification against a real run)
|
|
5
|
+
- Related global rule: CLAUDE.md "Distinguish declaration from enforcement" (rule #3): every MUST/BLOCKING written in a document needs one line stating where and how it is verified.
|
|
6
|
+
|
|
7
|
+
## 1. Background and motivation
|
|
8
|
+
|
|
9
|
+
`agents/SKILL.md` declares the three contracts below as BLOCKING, but no code path (validator, test, or runtime guard) currently turns a violation into a failure. They are declared but not enforced.
|
|
10
|
+
|
|
11
|
+
| # | Contract | Declared at | Current enforcement |
|
|
12
|
+
|---|------|----------|----------|
|
|
13
|
+
| 1 | The lead's 12 `PROGRESS: <phase-id>` checkpoint lines | [agents/SKILL.md:83-105](../../../agents/SKILL.md:83) | None: `grep -rln "PROGRESS:" validators/ tests/ scripts/okstra_ctl/` → 0 hits (checked 2026-06-10) |
|
|
14
|
+
| 2 | The claude-worker's 5-minute heartbeat in the audit sidecar (appending `- PROGRESS: <stage> <ISO-UTC>`) | [agents/SKILL.md:384](../../../agents/SKILL.md:384), [agents/workers/claude-worker.md:65](../../../agents/workers/claude-worker.md:65) | Only the sidecar's **existence** is checked ([validators/validate-run.py:974-981](../../../validators/validate-run.py:974)). Cadence is not checked. The existing heartbeat test ([tests/test_okstra_wrapper_status.py](../../../tests/test_okstra_wrapper_status.py)) covers only the `.status.json` of the codex/gemini wrappers |
|
|
15
|
+
| 3 | Implementation sidecar entry guard: mandatory Read of `_implementation-executor/-verifier/-deliverable.md` before entering Phase 5/6 | [agents/SKILL.md:176-188](../../../agents/SKILL.md:176) | None: the text only declares that the Read is "visible in the lead session jsonl", and nothing actually inspects the jsonl |
|
|
16
|
+
|
|
17
|
+
All three contracts **leave after-the-fact evidence in the lead session jsonl or in run artifacts (files on disk)**. The most natural enforcement point is therefore a post-hoc check added to `validators/validate-run.py`, which the lead must run in Phase 7: a violation falls through the existing mechanism unchanged, `validation failed → run status = contract-violated` ([validators/validate-run.py:256-258](../../../validators/validate-run.py:256)).
|
|
18
|
+
|
|
19
|
+
## 2. Scope / non-scope
|
|
20
|
+
|
|
21
|
+
**In scope**
|
|
22
|
+
- Three post-hoc checks that can be verified at the time `validate-run.py` runs (Phase 7).
|
|
23
|
+
- The check logic lives in one new module, called from `main()` of `validate-run.py`.
|
|
24
|
+
- Unit tests (synthetic jsonl / sidecar fixtures).
|
|
25
|
+
|
|
26
|
+
**Out of scope**
|
|
27
|
+
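- Runtime (real-time) enforcement. For example, watching for a 5-minute stale mtime belongs to the lead's polling during the run and cannot be replaced by a post-hoc check. This design prevents a violating run from being judged as passing.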
|
|
28
|
+
- Defense against malicious forgery. Heartbeat timestamps are self-reported by the worker; the post-hoc check catches **omissions**, not forgeries.
|
|
29
|
+
- Checking the `PROGRESS: complete` / `phase-7-teardown` lines. They are emitted **after** the validator runs, so they are structurally impossible to check (see 4.1 below).
|
|
30
|
+
|
|
31
|
+
## 3. Design overview
|
|
32
|
+
|
|
33
|
+
### 3.1 Module placement
|
|
34
|
+
|
|
35
|
+
New module: **`validators/validate_session_conformance.py`**
|
|
36
|
+
|
|
37
|
+
- Same pattern as the existing phase-specific validators: like [validators/validate_fanout.py](../../../validators/validate_fanout.py) and [validators/validate_improvement_report.py](../../../validators/validate_improvement_report.py), `validate-run.py` imports the module and folds its results into the `failures` list (prefix `session-conformance: `).
|
|
38
|
+
- The existing `scripts/okstra_ctl/conformance.py` is **stage QA conformance** (a separate concept, [docs/superpowers/specs/2026-06-07-stage-conformance-qa-design.md](2026-06-07-stage-conformance-qa-design.md)), so the new module is named `session_conformance` to avoid a name clash.
|
|
39
|
+
|
|
40
|
+
### 3.2 Shared infrastructure: reuse a single point of reference
|
|
41
|
+
|
|
42
|
+
Discovery and parsing of the lead session jsonl is already implemented in the `okstra_token_usage` package. It is reused as-is rather than reimplemented.
|
|
43
|
+
|
|
44
|
+
| Function | Reuse point |
|
|
45
|
+
|------|------------|
|
|
46
|
+
| Per-project jsonl directory (`~/.claude/projects/<encoded-cwd>/`) | [scripts/okstra_token_usage/paths.py:19](../../../scripts/okstra_token_usage/paths.py:19) `claude_project_dir` |
|
|
47
|
+
| Lead sessionId → jsonl mapping (plus a fallback scan for the teamName tag) | [scripts/okstra_token_usage/claude.py:100](../../../scripts/okstra_token_usage/claude.py:100) `find_claude_team_sessions` |
|
|
48
|
+
| jsonl record iterator | `scripts/okstra_token_usage/jsonl_io.py` `iter_jsonl` |
|
|
49
|
+
| Run time window (scopes the in-session lead's whole-session jsonl down to this run) | [scripts/okstra_token_usage/collect.py:121-137](../../../scripts/okstra_token_usage/collect.py:121) `_resolve_run_window`: **promoted to public `resolve_run_window`** and shared by `collect()` and the new module (pre-1.0, so renamed without a compatibility shim) |
|
|
50
|
+
| Source of the lead sessionId | team-state `lead.sessionId` (recorded at [scripts/okstra_ctl/render.py:425](../../../scripts/okstra_ctl/render.py:425), consumed at [scripts/okstra_token_usage/collect.py:165](../../../scripts/okstra_token_usage/collect.py:165)) |
|
|
51
|
+
|
|
52
|
+
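Why run-window scoping is mandatory: an in-session (`okstra-run` skill) lead is recorded in the jsonl of the user's whole session, so a scan without a window would **mistake PROGRESS lines left by an earlier okstra run in the same session for evidence of this run** (a false pass). Token accounting already uses a window for the same reason (comment at [collect.py:124-130](../../../scripts/okstra_token_usage/collect.py:124)).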
|
|
53
|
+
|
|
54
|
+
### 3.3 jsonl scan rules (shared by checks 1 and 3)
|
|
55
|
+
|
|
56
|
+
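- Records scanned: only records with `type == "assistant"`. When a Skill is invoked, the SKILL.md body (including the example checkpoint lines!) is injected into the transcript as a tool_result (a user record), so looking at non-assistant records would produce an immediate false pass.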
|
|
57
|
+
- PROGRESS lines: extracted from `message.content[].type == "text"` blocks of assistant records with the line-anchored regex `^PROGRESS: <phase-id>\b` (MULTILINE). Thinking blocks are excluded.
|
|
58
|
+
- Read tool calls: in assistant records, `message.content[].type == "tool_use"` with `name == "Read"`, matched on the basename of `input.file_path`.
|
|
59
|
+
- Each piece of evidence carries the record's `timestamp`, and only evidence inside the run window (`since ≤ ts ≤ until`) is accepted.
|
|
60
|
+
- Files scanned: from the result of `find_claude_team_sessions(cwd, team_name, lead_sid)`, the **lead candidate set** = {the jsonl for the recorded `lead.sessionId`} ∪ {jsonl files that carry the team tag but no `agentName`}. The latter absorbs the case where the lead session was forked by `claude --resume` (new sessionId, no `agentName`); worker sessions have an `agentName`, so they are excluded naturally.
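
The scan rules above can be sketched roughly as follows. This is an illustrative sketch, not the shipped `validate_session_conformance.py`: the names `scan_lead_records` and `parse_ts` are hypothetical, the regex captures the first token after `PROGRESS: ` instead of matching one phase id at a time, and the record shape (`type`, `timestamp`, `message.content[]`) is exactly the assumption the P0 item is meant to confirm.

```python
import json
import re
from datetime import datetime
from pathlib import PurePath

# Line-anchored: a PROGRESS mention in the middle of a sentence does not count.
PROGRESS_RE = re.compile(r"^PROGRESS: (\S+)", re.MULTILINE)


def parse_ts(value):
    # Assumption: timestamps are ISO-8601 strings with a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def scan_lead_records(lines, since, until):
    """Return ([(ts, phase_id)], [(ts, basename)]) found inside the run window."""
    progress, reads = [], []
    for raw in lines:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unrecognized line: skip, never count as evidence
        if not isinstance(rec, dict) or rec.get("type") != "assistant":
            continue  # user records can carry injected SKILL.md text
        raw_ts = rec.get("timestamp")
        if not isinstance(raw_ts, str):
            continue
        try:
            ts = parse_ts(raw_ts)
        except ValueError:
            continue
        if not (since <= ts <= until):
            continue  # evidence from another run in the same session
        content = (rec.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "text":  # thinking blocks are not "text"
                for match in PROGRESS_RE.finditer(block.get("text") or ""):
                    progress.append((ts, match.group(1)))
            elif block.get("type") == "tool_use" and block.get("name") == "Read":
                path = (block.get("input") or {}).get("file_path") or ""
                reads.append((ts, PurePath(path).name))
    return progress, reads
```

Both window bounds are expected to be timezone-aware datetimes (for example produced by `parse_ts`), since comparing aware and naive values raises `TypeError`.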
|
|
61
|
+
|
|
62
|
+
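**P0 verification item (first implementation step):** confirm the record-shape assumptions above (assistant text blocks / `tool_use.name=="Read"` / `input.file_path`) against the lead jsonl of one real okstra run, then finalize the parser. If the assumptions are wrong, return to the design.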
|
|
63
|
+
|
|
64
|
+
## 4. Check details
|
|
65
|
+
|
|
66
|
+
### 4.1 Check 1: PROGRESS checkpoint lines
|
|
67
|
+
|
|
68
|
+
Of the 12 lines in [agents/SKILL.md:87-101](../../../agents/SKILL.md:87), only those that **must already have been emitted when the validator runs** are required. The required set is built dynamically from the shape of the run (roster, artifacts):
|
|
69
|
+
|
|
70
|
+
| Checkpoint | Required when | Verdict |
|
|
71
|
+
|-----------|----------|------|
|
|
72
|
+
| `phase-1-intake reading…` / `phase-1-intake complete` | Always | ≥1 each |
|
|
73
|
+
| `phase-2-prompts…` | Always | ≥1 |
|
|
74
|
+
| `phase-3-team-create…` | At least one worker was dispatched (team-state worker status ∈ {completed, timeout, error, in-progress}; the same criterion as `any_dispatched` at [validators/validate-run.py:373-377](../../../validators/validate-run.py:373)) | ≥1 |
|
|
75
|
+
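| `phase-4-dispatch worker=<role>…` | For every worker whose dispatch was attempted (status ∈ ATTEMPTED_STATUSES) | ≥1 per role. Roles are matched by comparing the `worker=` token after normalization (lower-casing, treating whitespace and hyphens as equivalent); if the normalized match fails, that worker is reported as a failure item |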
|
|
76
|
+
| `phase-5-poll…` | Not checked (legitimately zero in a short run where no pending set was ever observed) | n/a |
|
|
77
|
+
| `phase-5-collect worker=<role>…` | For every worker with status == completed | ≥1 per role |
|
|
78
|
+
| `phase-5.5-convergence round=…` | A convergence state artifact exists in the run directory | ≥1 |
|
|
79
|
+
| `phase-5.6-critic…` | Not checked (opt-in; see [agents/SKILL.md:97](../../../agents/SKILL.md:97)) | n/a |
|
|
80
|
+
| `phase-6-synthesis…` | `Report writer worker` is in the required roster | ≥1 |
|
|
81
|
+
| `phase-7-persist…` | Always (the validator is invoked inside Phase 7, so the start line has already been emitted) | ≥1 |
|
|
82
|
+
| `phase-7-teardown…` / `complete…` | Not checked (emitted **after** the validator runs) | n/a |
|
|
83
|
+
|
|
84
|
+
Failure messages name the missing checkpoint id and the corresponding row of SKILL.md.
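
As a rough illustration of the per-role matching used for the `phase-4-dispatch` and `phase-5-collect` rows, the sketch below normalizes both sides before comparing. The helper names `normalize_role` and `missing_roles` are hypothetical, and the rule that trailing text after the role is tolerated is an assumption about how the lead phrases these lines, not something the design specifies.

```python
import re


def normalize_role(role):
    # Lower-case, then treat runs of whitespace and hyphens as one hyphen.
    return re.sub(r"[\s\-]+", "-", role.strip().lower())


def missing_roles(progress_lines, phase_id, roles):
    """Return the roles that have no matching `worker=` checkpoint line."""
    prefix = f"PROGRESS: {phase_id} worker="
    seen = [
        normalize_role(line[len(prefix):])
        for line in progress_lines
        if line.startswith(prefix)
    ]
    missing = []
    for role in roles:
        wanted = normalize_role(role)
        # Accept an exact match, or the role followed by more words.
        if not any(s == wanted or s.startswith(wanted + "-") for s in seen):
            missing.append(role)
    return missing
```

Each role returned here would become one failure item, so a single run can report several missing workers at once.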
|
|
85
|
+
|
|
86
|
+
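**When the lead jsonl is not found:** treated as a failure. Rationale: the token-usage contract already enforces the same principle. If the lead jsonl cannot be found, validation fails with `accuracy-failed` ([validators/validate-run.py:1819-1824](../../../validators/validate-run.py:1819)), so a run where "there is no jsonl, so conformance cannot be inspected" could not pass anyway. No separate opt-out is provided (allowlist principle: only the passing conditions are defined).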
|
|
87
|
+
|
|
88
|
+
### 4.2 Check 2: claude-worker heartbeat cadence
|
|
89
|
+
|
|
90
|
+
Target: `worker-results/claude-worker-audit-<task-type>-<seq>.md` (no jsonl needed; the file on disk is parsed). The sidecar's **existence** is already enforced by an existing check ([validators/validate-run.py:974-981](../../../validators/validate-run.py:974)), so this check adds only the **content cadence**. It applies only to runs where the claude-worker is ATTEMPTED: the 5-minute heartbeat contract belongs to [claude-worker.md:65](../../../agents/workers/claude-worker.md:65), while codex/gemini are already enforced through a separate path by the `.status.json` watchdog.
|
|
91
|
+
|
|
92
|
+
Parsing: extract the list of `- PROGRESS: <stage> <ISO-8601-UTC>` lines. Checks:
|
|
93
|
+
|
|
94
|
+
1. **Start marker**: the stage of the first PROGRESS line must be `started` ("write the sidecar … with one `- PROGRESS: started <ISO>` line").
|
|
95
|
+
2. **Pre-finish marker**: a `write-result-start` stage line must exist (a result file without this marker violates the per-stage append contract).
|
|
96
|
+
3. **Timestamps parseable and monotonically non-decreasing**: each unparseable line and each regressing timestamp is a separate failure item.
|
|
97
|
+
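4. **Cadence**: the gap between consecutive PROGRESS lines must be ≤ **5 minutes + 60 seconds of grace**. The contract says 5 minutes; the fixed grace absorbs the delay between the time the worker measures just before appending and the actual write.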
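
The four checks above could look roughly like this in code. It is a minimal sketch under stated assumptions: `check_heartbeat` and its error strings are hypothetical, the caller is assumed to pass in whether a result file exists, and a timestamp without a UTC offset is treated as unparseable because the contract asks for ISO-8601 UTC.

```python
from datetime import datetime, timedelta

# Contract cadence (5 minutes) plus the fixed 60-second grace.
MAX_GAP = timedelta(minutes=5, seconds=60)
MARKER = "- PROGRESS:"


def check_heartbeat(sidecar_text, result_exists):
    """Return a list of failure strings for one audit sidecar."""
    errors, stages, prev = [], [], None
    for line in sidecar_text.splitlines():
        if not line.startswith(MARKER):
            continue
        parts = line[len(MARKER):].split()
        if len(parts) != 2:
            errors.append(f"malformed PROGRESS line: {line!r}")
            continue
        stage, raw = parts
        stages.append(stage)
        try:
            ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            ts = None
        if ts is None or ts.tzinfo is None:
            errors.append(f"unparseable timestamp at stage {stage}")
            continue
        if prev is not None and ts < prev:
            errors.append(f"timestamp regresses at stage {stage}")
        elif prev is not None and ts - prev > MAX_GAP:
            errors.append(f"gap exceeds 5 minutes + grace before stage {stage}")
        prev = ts
    if not stages or stages[0] != "started":
        errors.append("first PROGRESS stage is not 'started'")
    if result_exists and "write-result-start" not in stages:
        errors.append("write-result-start marker is missing")
    return errors
```

Collecting every failure instead of stopping at the first one matches the "each is a separate failure item" wording of check 3.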
|
|
98
|
+
|
|
99
|
+
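**Limitation (stated explicitly):** a hang **after** the last PROGRESS line cannot be caught post-hoc. There is no end-time anchor: the file mtime is vulnerable to git and manipulation, so it is not used as one. That interval belongs to the runtime, covered by the lead's 5-minute stale-mtime watch during the run ([agents/SKILL.md:384](../../../agents/SKILL.md:384)).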
|
|
100
|
+
|
|
101
|
+
### 4.3 Check 3: implementation sidecar entry guard
|
|
102
|
+
|
|
103
|
+
Applies only to runs with `task_type == "implementation"`. Within the lead jsonl (inside the run window), the following is required, matched on the basename of `input.file_path` in `Read` tool_use blocks:
|
|
104
|
+
|
|
105
|
+
1. **Existence**: ≥1 Read for each of `_implementation-executor.md`, `_implementation-verifier.md`, and `_implementation-deliverable.md`.
|
|
106
|
+
2. **Ordering** (only when an anchor exists): the PROGRESS lines collected by check 1 serve as anchors:
|
|
107
|
+
- timestamp of the executor and verifier sidecar Reads < timestamp of the first `PROGRESS: phase-6-synthesis` line ([agents/SKILL.md:182-183](../../../agents/SKILL.md:182): both are "Read at Phase 5").
|
|
108
|
+
- timestamp of the deliverable sidecar Read < timestamp of the first `PROGRESS: phase-7-persist` line ([agents/SKILL.md:184](../../../agents/SKILL.md:184): "Read at Phase 6").
|
|
109
|
+
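- if the anchor line itself is missing (check 1 already reports that omission as a failure), the ordering check is skipped and only existence is checked, so the same root cause does not stack up a double failure.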
|
|
110
|
+
3. **Fresh-read rule** ([agents/SKILL.md:188](../../../agents/SKILL.md:188), "cannot be satisfied by memory of a prior run"): run-window scoping itself guarantees this, since only Reads inside this run's window are accepted.
|
|
111
|
+
|
|
112
|
+
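Why basename matching: the sidecar's absolute path differs by layer (repo / `~/.okstra/lib` / installed copy), but the file name is the same in all three layers, and the underscore-prefixed file names are distinctive enough that the lead has no reason to Read anything else matching them.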
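
A sketch of the existence and ordering rules for check 3, under a few assumptions: `check_entry_guard` and the `SIDECAR_ANCHORS` table are hypothetical names, the inputs are already window-scoped `(timestamp, value)` pairs, and "the Read precedes the anchor" is interpreted as the earliest Read coming before the first anchor line, which is one possible reading of the rule.

```python
# Sidecar basename -> checkpoint its Read must precede.
SIDECAR_ANCHORS = {
    "_implementation-executor.md": "phase-6-synthesis",
    "_implementation-verifier.md": "phase-6-synthesis",
    "_implementation-deliverable.md": "phase-7-persist",
}


def check_entry_guard(reads, progress):
    """reads: [(ts, basename)], progress: [(ts, phase_id)], both window-scoped."""
    errors = []
    for name, anchor_id in SIDECAR_ANCHORS.items():
        read_times = [ts for ts, base in reads if base == name]
        if not read_times:
            errors.append(f"no Read of {name} in this run's window")
            continue
        anchor_times = [ts for ts, phase in progress if phase == anchor_id]
        # No anchor: check 1 already reports it, so only existence is checked.
        if anchor_times and min(read_times) >= min(anchor_times):
            errors.append(f"{name} was first read after {anchor_id}")
    return errors
```

Timestamps only need to be mutually comparable, so the function works with the datetimes produced by a jsonl scan as well as with plain integers in a unit test.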
|
|
113
|
+
|
|
114
|
+
## 5. Integration into validate-run.py
|
|
115
|
+
|
|
116
|
+
```
|
|
117
|
+
main() # validators/validate-run.py:1969
|
|
118
|
+
...
|
|
119
|
+
task_type = effective_run_task_type(...)
|
|
120
|
+
+ sc_result = validate_session_conformance(
|
|
121
|
+
+ team_state=team_state,
|
|
122
|
+
+ project_root=project_root,
|
|
123
|
+
+ report_path=report_path,
|
|
124
|
+
+ run_manifest=run_manifest,
|
|
125
|
+
+ task_type=task_type,
|
|
126
|
+
+ claude_projects_dir=args.claude_projects_dir, # default None → the real ~/.claude/projects
|
|
127
|
+
+ )
|
|
128
|
+
+ failures.extend(f"session-conformance: {e}" for e in sc_result.errors)
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
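- New CLI argument `--claude-projects-dir <path>` (optional): an injection point for tests and diagnostics. The default is the real directory. It is an explicit argument rather than an env var (user rule: explicit arguments over env vars). The validator invocation command rendered by the launch prompt needs no change (it uses the default).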
|
|
132
|
+
- Check 2 (heartbeat) is independent of the jsonl, so it always runs even when the jsonl is not found.
|
|
133
|
+
- Failure behavior is unchanged: `failures` non-empty → `validation_status = "failed"` → `contract-violated` recorded in the run/task manifest ([validators/validate-run.py:2069-2076](../../../validators/validate-run.py:2069)).
|
|
134
|
+
|
|
135
|
+
## 6. Test strategy
|
|
136
|
+
|
|
137
|
+
New file `tests/test_validate_session_conformance.py`, following the existing validate-run test pattern (load the module with importlib, then call functions directly; [tests/test_validate_run_report_format.py:33-38](../../../tests/test_validate_run_report_format.py:33)). Verification is per function rather than through the `main()` integration path, so the existing validate-run tests are unaffected.
|
|
138
|
+
|
|
139
|
+
| Case | Fixture |
|
|
140
|
+
|--------|---------|
|
|
141
|
+
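| Check 1: happy path / missing checkpoint / lines outside the window ignored (contamination from the previous run) / SKILL.md body injection (false-pass prevention: PROGRESS text in user records is not accepted) | Synthetic jsonl (written under tmp_path, using the `--claude-projects-dir` injection point) |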
|
|
142
|
+
| Check 2: happy path / missing started / missing write-result-start / 6-minute gap / regressing or unparseable timestamp | Synthetic audit sidecar md |
|
|
143
|
+
| Check 3: all three Reads present / one missing / Read after the phase-6 anchor (ordering violation) / existence only when the anchor is absent | Synthetic jsonl |
|
|
144
|
+
| Lead jsonl not found → failure | Empty projects dir |
|
|
145
|
+
| No collect regression after promoting `resolve_run_window` | Confirm the existing token-usage tests pass |
|
|
146
|
+
|
|
147
|
+
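**Execution-based verification before declaring the implementation done (user rule: never declare "verified" on a mocked green):** in addition to unit tests, run validate-run.py against the real artifacts of one actual okstra run (lead jsonl + audit sidecar) and report the observed result. Until then, the status reads "passes statically / in unit tests, not verified on a real run".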
|
|
148
|
+
|
|
149
|
+
## 7. Risks and limitations
|
|
150
|
+
|
|
151
|
+
1. **jsonl flush timing**: the validator runs as a Bash tool call of the lead. This assumes the assistant messages up to that point are already written to the jsonl. Claude Code appends per message, so the assumption holds, but P0 also confirms it against a real jsonl.
|
|
152
|
+
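2. **Dependence on the harness format**: the jsonl record schema is an internal Claude Code format. Token accounting (`okstra_token_usage`) already carries the same dependency, so this is not a new risk, but a format change breaks both consumers together. The parser behaves conservatively ("skip unrecognized records + fail when there is zero evidence"), so there is no silent pass.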
|
|
153
|
+
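3. **Retroactive failures**: once this check lands, a run whose lead has not been honoring the contracts will fail in Phase 7. This is intended. validate-run is invoked only in Phase 7 of new runs, so already-completed past runs are unaffected.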
|
|
154
|
+
4. **Self-reported timestamps**: the limitation of check 2 (§4.2). The goal is detecting omissions; defense against forgery is out of scope.
|
|
155
|
+
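5. **Variation in role spelling**: the `worker=<role>` match in check 1 absorbs variation through normalization, but if variation is observed in real runs, the matching rule will be adjusted to follow the SSOT (the team-state role string).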
|
|
156
|
+
|
|
157
|
+
## 8. Implementation steps (after approval)
|
|
158
|
+
|
|
159
|
+
1. **P0**: verify the §3.3 record assumptions against one real lead jsonl → finalize the parser signature.
|
|
160
|
+
2. `scripts/okstra_token_usage/collect.py`: promote `_resolve_run_window` → `resolve_run_window` (update callers).
|
|
161
|
+
3. New `validators/validate_session_conformance.py`: checks 1, 2, and 3 plus a Result type (`ok` / `errors`), following the existing validator module pattern.
|
|
162
|
+
4. `validators/validate-run.py`: the `--claude-projects-dir` argument plus `main()` integration (§5).
|
|
163
|
+
5. New `tests/test_validate_session_conformance.py` (cases from §6).
|
|
164
|
+
6. Update the declarations: add one line each naming the enforcement point (the "one line next to it" of global rule #3):
|
|
165
|
+
- In the Progress reporting section at [agents/SKILL.md:83](../../../agents/SKILL.md:83): one line, "the Phase 7 `validate-run.py` scans the lead session jsonl and treats omissions as contract-violated".
|
|
166
|
+
- In the Heartbeat section at [agents/workers/claude-worker.md:65](../../../agents/workers/claude-worker.md:65): one line on the post-hoc cadence check.
|
|
167
|
+
- In the Entry guard section at [agents/SKILL.md:186](../../../agents/SKILL.md:186): after "visible in the lead session jsonl", state that the validator actually checks it.
|
|
168
|
+
7. `npm run build` + `python3 -m pytest tests/` + one real-run verification (§6) → add an entry to `CHANGES.md` that includes a `사용자 영향:` ("user impact") line.
|
package/package.json
CHANGED
package/runtime/BUILD.json
CHANGED
package/runtime/agents/SKILL.md
CHANGED
|
@@ -104,6 +104,8 @@ These lines are the only structured signal the user has during a long run. Do NO
|
|
|
104
104
|
|
|
105
105
|
`okstra-run` (in-session) surfaces these lines to the user directly; the bash-spawned path leaves them in the session jsonl for post-hoc retrieval. Neither path requires any additional formatting from Lead — emit the literal `PROGRESS:` prefix and the rest of the line as plain text.
|
|
106
106
|
|
|
107
|
+
**Enforcement:** the Phase 7 validator (`validators/validate-run.py` → `validate_session_conformance.py`) scans the lead session jsonl (run-window-scoped, assistant text blocks only) and fails the run as `contract-violated` when a required checkpoint is missing — including the per-worker `phase-4-dispatch` / `phase-5-collect` lines, which must name each worker's role. `phase-7-teardown` and `complete` fire after validation and are not checked.
|
|
108
|
+
|
|
107
109
|
## Model assignments
|
|
108
110
|
|
|
109
111
|
**The lead never invents a model.** Every role's model is read from `task-manifest.json` → `resultContract.requiredWorkerRoles[*].modelExecutionValue` (and the lead model metadata). A missing assignment is a manifest defect, not a license to fall back — see [okstra-team-contract](./skills/okstra-team-contract/SKILL.md) "Model Assignment Rules". The manifest is always populated at run-prep time by the CLI, which seeds these values from `OKSTRA_DEFAULT_*_MODEL` (`scripts/okstra_ctl/run.py`).
|
|
@@ -183,7 +185,7 @@ The `implementation` profile's thin core (`prompts/profiles/implementation.md`)
|
|
|
183
185
|
| `prompts/profiles/_implementation-verifier.md` | **Phase 5**, between Executor stage completion and the first verifier dispatch | Verifier roles, Two-tier command lookup, deny-list, discrepancy rule, Read-only command log, verifier-specific forbidden actions |
|
|
184
186
|
| `prompts/profiles/_implementation-deliverable.md` | **Phase 6**, after Phase 5.5 convergence completes, BEFORE constructing the report-writer dispatch prompt | Required deliverable shape, Validation / TDD evidence rules, Verifier results structure, Self-review pass, Lead post-stage persistence |
|
|
185
187
|
|
|
186
|
-
**Entry guard (BLOCKING).** Before transitioning into Phase 5 or Phase 6 for an `implementation` run, lead MUST emit a single Read tool call for the sidecar(s) above whose `Read at` matches the entering phase. If lead enters the phase without that Read on record (visible in the lead session jsonl), phase entry is refused: lead writes a `contract-violation` to the run-level errors log with `--message "implementation-sidecar-not-loaded"` and stops. Re-entry requires the sidecar Read first.
|
|
188
|
+
**Entry guard (BLOCKING).** Before transitioning into Phase 5 or Phase 6 for an `implementation` run, lead MUST emit a single Read tool call for the sidecar(s) above whose `Read at` matches the entering phase. If lead enters the phase without that Read on record (visible in the lead session jsonl), phase entry is refused: lead writes a `contract-violation` to the run-level errors log with `--message "implementation-sidecar-not-loaded"` and stops. Re-entry requires the sidecar Read first. **Enforcement:** the Phase 7 validator (`validate_session_conformance.py`) verifies post-hoc that all three sidecar Reads exist in the lead session jsonl within this run's window, and that they precede the `phase-6-synthesis` / `phase-7-persist` checkpoints respectively.
|
|
187
189
|
|
|
188
190
|
The guard is not satisfied by remembering content from a prior run — each implementation run reads the sidecar fresh, because the sidecars are part of the runtime shipped via `okstra install` and may have been updated between runs.
|
|
189
191
|
|
|
@@ -62,7 +62,7 @@ Before producing any output, you MUST:
|
|
|
62
62
|
1. Extract the absolute path from the lead's `**Worker Preamble Path:**` anchor header and Read that file end-to-end with a single `Read` call (no `offset`, no `limit`). This is the canonical SSOT for the Required Reading + Error Reporting + Output sections contract.
|
|
63
63
|
2. Read every primary input file the lead enumerated under `## Inputs` (or equivalent heading) in the dispatch prompt body, end-to-end, following the rules stated in the preamble. For analysis workers this is normally `analysis-packet.md`; the source files named inside that packet are fallback/evidence paths to open when needed. Analysis workers do NOT read `final-report-template.md` — that file is for the report writer only.
|
|
64
64
|
|
|
65
|
-
**Heartbeat — write the audit sidecar EARLY and APPEND per stage (BLOCKING).** Because this worker runs as an in-process Agent or a fresh-session tmux pane, the lead has no `BashOutput`-style liveness signal while waiting for your return. The audit sidecar is the only signal that survives a silent hang. Write the sidecar at `runs/<task-type>/worker-results/claude-worker-audit-<task-type>-<seq>.md` immediately after extracting `Project Root` and the assigned paths — BEFORE the per-file end-to-end reads — with just the heading line (`# Claude Worker Audit — <task-key>`) and one `- PROGRESS: started <ISO-8601-UTC>` line. Then APPEND one short progress line per stage as you advance: `read-<filename>`, `analysis-start`, `findings-draft-start`, `findings-draft-complete`, `write-result-start`. The append cadence MUST NOT exceed 5 minutes — if a single analysis stage is taking longer, emit a `- PROGRESS: in-stage:<stage> <ISO-8601-UTC>` heartbeat. A 5-minute stale sidecar mtime is the canonical "this worker has hung" signal for the operator. Sidecar write/append uses `Write` (initial) and `Edit` / heredoc `>>` (per-stage append).
|
|
65
|
+
**Heartbeat — write the audit sidecar EARLY and APPEND per stage (BLOCKING).** Because this worker runs as an in-process Agent or a fresh-session tmux pane, the lead has no `BashOutput`-style liveness signal while waiting for your return. The audit sidecar is the only signal that survives a silent hang. Write the sidecar at `runs/<task-type>/worker-results/claude-worker-audit-<task-type>-<seq>.md` immediately after extracting `Project Root` and the assigned paths — BEFORE the per-file end-to-end reads — with just the heading line (`# Claude Worker Audit — <task-key>`) and one `- PROGRESS: started <ISO-8601-UTC>` line. Then APPEND one short progress line per stage as you advance: `read-<filename>`, `analysis-start`, `findings-draft-start`, `findings-draft-complete`, `write-result-start`. The append cadence MUST NOT exceed 5 minutes — if a single analysis stage is taking longer, emit a `- PROGRESS: in-stage:<stage> <ISO-8601-UTC>` heartbeat. A 5-minute stale sidecar mtime is the canonical "this worker has hung" signal for the operator. Sidecar write/append uses `Write` (initial) and `Edit` / heredoc `>>` (per-stage append). **Enforcement:** the Phase 7 validator (`validate_session_conformance.py`) parses these `- PROGRESS:` lines post-hoc and fails the run when the first stage is not `started`, `write-result-start` is missing despite an existing result file, timestamps regress/unparse, or consecutive lines are more than 5 minutes (+60s grace) apart.
|
|
66
66
|
|
|
67
67
|
## Worker Output Structure
|
|
68
68
|
|
|
@@ -135,6 +135,7 @@ This wrapper does NOT invoke MCP tools directly. MCP availability inside the Cod
|
|
|
135
135
|
- Treat the prompt-history path as the canonical worker prompt history artifact for the current run, resolved to absolute against `Project Root` if given as relative.
|
|
136
136
|
- The assigned model execution value is canonical for CLI execution. Do not substitute a different Codex model unless the task bundle explicitly changes it.
|
|
137
137
|
- Pass the prompt received from Lead directly to codex after persisting the exact prompt to the assigned path.
|
|
138
|
+
- **Executor preflight forwarding check (implementation runs only).** When the lead prompt assigns this dispatch the `Executor` role for an `implementation` run, the persisted prompt body MUST contain the literal heading `Coding-conventions preflight` (transcribed by the lead from `prompts/profiles/_implementation-executor.md` → "Pre-implementation context exploration") — the Codex CLI does not share the lead's context, so an untranscribed gate never reaches the process that writes the code. If the heading is absent, return `CODEX_PREFLIGHT_MISSING: executor dispatch prompt lacks the coding-conventions preflight block` instead of invoking the CLI; the lead is responsible for re-dispatching with the block included. This check does NOT apply to verifier or analysis dispatches.
|
|
138
139
|
- Include context (code, diff, file paths) if provided.
|
|
139
140
|
- For long prompts, dispatch through the wrapper with literal absolute paths (plus the worktree path for implementation phase):
|
|
140
141
|
```bash
|
|
@@ -135,6 +135,7 @@ This wrapper does NOT invoke MCP tools directly. MCP availability inside the Gem
|
|
|
135
135
|
- Treat the prompt-history path as the canonical worker prompt history artifact for the current run, resolved to absolute against `Project Root` if given as relative.
|
|
136
136
|
- The assigned model execution value is canonical for CLI execution. Do not substitute a different Gemini model unless the task bundle explicitly changes it.
|
|
137
137
|
- Pass the prompt received from Lead directly to gemini after persisting the exact prompt to the assigned path.
|
|
138
|
+
- **Executor preflight forwarding check (implementation runs only).** When the lead prompt assigns this dispatch the `Executor` role for an `implementation` run, the persisted prompt body MUST contain the literal heading `Coding-conventions preflight` (transcribed by the lead from `prompts/profiles/_implementation-executor.md` → "Pre-implementation context exploration") — the Gemini CLI does not share the lead's context, so an untranscribed gate never reaches the process that writes the code. If the heading is absent, return `GEMINI_PREFLIGHT_MISSING: executor dispatch prompt lacks the coding-conventions preflight block` instead of invoking the CLI; the lead is responsible for re-dispatching with the block included. This check does NOT apply to verifier or analysis dispatches.
|
|
138
139
|
- Include context (code, diff, file paths) if provided.
|
|
139
140
|
- For long prompts, dispatch through the wrapper with literal absolute paths (plus the worktree path for implementation phase):
|
|
140
141
|
```bash
|
|
@@ -83,6 +83,10 @@ while [[ $# -gt 0 ]]; do
|
|
|
83
83
|
EXECUTOR_OVERRIDE="$(require_option_value --executor "${2-}")"
|
|
84
84
|
shift 2
|
|
85
85
|
;;
|
|
86
|
+
--critic)
|
|
87
|
+
CRITIC_CHOICE="$(require_option_value --critic "${2-}")"
|
|
88
|
+
shift 2
|
|
89
|
+
;;
|
|
86
90
|
--related-tasks)
|
|
87
91
|
RELATED_TASKS_RAW="$(require_option_value --related-tasks "${2-}")"
|
|
88
92
|
shift 2
|
|
@@ -3,7 +3,7 @@
|
|
|
3
3
|
usage() {
|
|
4
4
|
cat >&2 <<USAGE_EOF
|
|
5
5
|
usage:
|
|
6
|
-
$DISPLAY_COMMAND_NAME [--render-only] [--yes] [--no-plan-verification] --task-type <task-type> [--workers worker1,worker2] [--lead-model <model>] [--claude-model <model>] [--codex-model <model>] [--gemini-model <model>] [--report-writer-model <model>] [--executor claude|codex|gemini] [--related-tasks taskA,taskB] --project-id <project-id> [--project-root <path>] --task-group <task-group> --task-id <task-id> --task-brief <brief-path> [--directive <directive>]
|
|
6
|
+
$DISPLAY_COMMAND_NAME [--render-only] [--yes] [--no-plan-verification] --task-type <task-type> [--workers worker1,worker2] [--lead-model <model>] [--claude-model <model>] [--codex-model <model>] [--gemini-model <model>] [--report-writer-model <model>] [--executor claude|codex|gemini] [--critic off|claude|codex|gemini] [--related-tasks taskA,taskB] --project-id <project-id> [--project-root <path>] --task-group <task-group> --task-id <task-id> --task-brief <brief-path> [--directive <directive>]
|
|
7
7
|
|
|
8
8
|
summary:
|
|
9
9
|
$DISPLAY_TOOL_NAME prepares a task-keyed instruction bundle for Claude Code and launches an interactive Claude session by default.
|
|
@@ -94,6 +94,9 @@ options:
|
|
|
94
94
|
The Executor is the only worker allowed to mutate project files; the other two
|
|
95
95
|
providers are dispatched as read-only verifiers regardless of this selection.
|
|
96
96
|
Has no effect on other task types.
|
|
97
|
+
--critic Provider for the opt-in Phase 5.6 critic pass (coverage gaps /
|
|
98
|
+
acceptance devil's-advocate). One of: off | claude | codex | gemini.
|
|
99
|
+
Default: off.
|
|
97
100
|
--related-tasks Optional comma-separated related task identifiers. Example: auth-token-refresh,frontend-login-ui
|
|
98
101
|
--work-category Work-category classification for this task. One of:
|
|
99
102
|
bugfix | feature | refactor | ops | improvement | unknown.
|
package/runtime/bin/okstra.sh
CHANGED
|
@@ -115,6 +115,7 @@ PY_ARGS=(
|
|
|
115
115
|
[[ -n "${GEMINI_MODEL_OVERRIDE-}" ]] && PY_ARGS+=(--gemini-model "$GEMINI_MODEL_OVERRIDE")
|
|
116
116
|
[[ -n "${REPORT_WRITER_MODEL_OVERRIDE-}" ]] && PY_ARGS+=(--report-writer-model "$REPORT_WRITER_MODEL_OVERRIDE")
|
|
117
117
|
[[ -n "${EXECUTOR_OVERRIDE-}" ]] && PY_ARGS+=(--executor "$EXECUTOR_OVERRIDE")
|
|
118
|
+
[[ -n "${CRITIC_CHOICE-}" ]] && PY_ARGS+=(--critic "$CRITIC_CHOICE")
|
|
118
119
|
[[ -n "${RELATED_TASKS_RAW-}" ]] && PY_ARGS+=(--related-tasks "$RELATED_TASKS_RAW")
|
|
119
120
|
[[ -n "${APPROVED_PLAN_PATH-}" ]] && PY_ARGS+=(--approved-plan "$APPROVED_PLAN_PATH")
|
|
120
121
|
[[ "$APPROVE_PLAN_ACK" == "true" ]] && PY_ARGS+=(--approve)
|
|
@@ -23,11 +23,13 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
|
|
|
23
23
|
- **Project review rule packs:** also look for project-local review skills in `<PROJECT_ROOT>/skills/*review*`, `<PROJECT_ROOT>/.claude/skills/*review*`, and up to two parent directories' `skills/*review*/SKILL.md`. Read the relevant `SKILL.md` plus referenced `references/*.md` files and apply their rules during implementation. This is a prevention pass, not a PR-comment generation workflow: do not dispatch reviewer subagents from the executor. For Fonts Ninja-style PR review packs, the executor must avoid newly introduced duplicate helper stacks, tautological tests that merely re-call the delegated helper, self-mocking, domain rules in adapters/ports, domain objects outside `domain/`, dead APIs, weak public names, and functions that fail the plain-English read.
|
|
24
24
|
- **Language-agnostic principles that ALWAYS bind (the TDD loop below MUST satisfy them):** (1) no self-mocking of the SUT — stub/spy only injected collaborators, never the subject's own methods; (2) behavioral assertions on outcomes (return value, state, persisted rows, events, boundary calls) — never `toHaveBeenCalled*` on an internal helper as the only/primary assertion; (3) truthful names — a `get*` / `find*` that writes/inserts, or a name encoding the caller's use-case (`*ForInit`) or hiding a domain rule (`findValid*`), is a defect; (4) single-purpose functions ≤50 effective lines, plain-English readability.
|
|
25
25
|
- **Graceful degradation (codex / gemini executor runtimes, or any runtime where the `~/.claude/skills/okstra-coding-preflight/` files are absent or unreadable):** do NOT skip the gate — apply the agnostic principles above plus the project's own `CLAUDE.md` / `CONTRIBUTING` / formatter+lint config, and record `coding-conventions: skill-unavailable → applied <project rules + agnostic principles>` in the final report. Never claim a skill read that did not happen.
|
|
26
|
+
- **CLI executor transcription (BLOCKING when the executor provider is `codex` or `gemini`):** the executor CLI process does NOT share the lead's context — a gate that stays in lead memory never reaches it. The lead MUST copy this entire "Coding-conventions preflight" bullet tree (file-read instructions, project review rule packs, agnostic principles, graceful degradation) verbatim into the dispatched executor prompt body. Enforcement: the CLI wrapper agents refuse an implementation-Executor dispatch whose persisted prompt lacks the literal heading `Coding-conventions preflight`, returning `<SENTINEL_PREFIX>_PREFLIGHT_MISSING` (see `agents/workers/_cli-wrapper-template.md` → Prompt Composition).
|
|
26
27
|
- **Mandatory TDD loop**: BEFORE the first `Edit` or `Write` call, the executor MUST apply a red-green-refactor loop for every code change in this run. This is required; skipping it is a `contract-violated` outcome. This governs HOW each step is executed (failing test first → minimal implementation → refactor); it does not override the approved plan's WHAT/file scope.
|
|
27
28
|
- Order of operations per plan step: (1) write/extend the test that captures the step's acceptance criterion and confirm it fails for the right reason, (2) commit the failing test (`test(<scope>): ...`), (3) implement the minimum change to make it pass, (4) commit the implementation (`feat|fix(<scope>): ...`), (5) refactor without changing behaviour and commit separately if any cleanup is made (`refactor(<scope>): ...`). The failing-then-passing transition between steps (2) and (4) is the `TDD evidence` required by the final report.
|
|
28
29
|
- Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
|
|
29
30
|
- When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
|
|
30
31
|
- **DB / IO / SQL changes require real execution — mock-only is NOT validation evidence:** when this run's diff touches DB/IO/SQL (ORM / query-builder code — sequelize / typeorm / prisma / knex / raw SQL — `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string), a mocked unit test cannot observe the SQL the query builder actually emits — a mocked suite once passed while `count({ col: 'FontFamily.fontFamily' })` threw `Unknown column` on the real DB. The executor MUST run the change against a real (or faithful-replica) datastore — the `db-test` validation step (plan `validation` db step, else `project.json.qaCommands.db-test`), targeting a **local / replica** DB — and cite its exact command + exit code in the final report's `Validation evidence`. If no real DB / `db-test` command is reachable, do NOT claim the change verified: label the DB portion `정적 분석상 …, 미검증(실행 안 함)` in the report, surface it in the routing recommendation, and never downplay the real run as "too heavy". `git push` stays forbidden (universal list); the unverified DB state is carried forward so `final-verification` cannot accept it and `release-handoff` cannot push.
|
|
32
|
+
- **Real-IO test isolation (BLOCKING).** A test that exercises a **real** datastore, HTTP endpoint, external service, message queue, or filesystem — a live DB connection / DSN, a real `fetch` / `axios` / `http` request, an actual S3 / queue client, anything the project's normal CI test suite cannot run because that backend is absent — MUST be written under the task's qa directory `<task_root>/qa/` (the `TASK_QA_PATH` token; same directory that holds the Tier 3 conformance manifest). It MUST NOT be written into the project source test tree — `src/**`, `test/**`, `tests/**`, `**/__test__/**`, `**/__tests__/**`, `*.spec.*`, `*.test.*`, or anywhere the project's lint/test globs collect. Two reasons: (a) the project's CI / normal suite has no real DB or network, so a real-IO test placed in source silently breaks the pipeline; (b) it is an okstra verification artifact, and the artifact-home rule confines okstra outputs to `.okstra/`. **The dividing line is the IO, not the intent:** a unit test that stubs/spies only *injected collaborators* (mock — no real socket, no real DB handle) is a TDD red-green artifact and stays in source; the moment a test opens a real connection or makes a real network call it belongs in qa. A stage's real-IO requirement check is a Tier 3 conformance script under `<task_root>/qa/` (declared via the implementation-planning conformance entry) — never smuggle real IO into a `*.spec.*` in source to make it run "as a unit test". The `db-test` real-execution gate above is satisfied by the conformance/db-test path against the replica, NOT by adding a live-DB `*.spec.*` to the project suite.
|
|
31
33
|
- re-read the approved plan end-to-end and parse the `## 5.5 Stage Map`. Read the **Stage** injected in the launch prompt (`Stage for this implementation run`): the single stage number this run owns. The runtime already selected and reserved this stage (one run = one stage) — do NOT recompute the start stage from `consumers.jsonl`.
|
|
32
34
|
- load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(this stage)` and inject them into the executor's working context as "runtime carry-in". For a `depends-on (none)` stage, no sidecar load — task-brief only.
|
|
33
35
|
- this stage's `depends-on` are all already `status:done`. Its file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path are the authoritative scope.
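The carry-in rule above can be sketched roughly as follows. This is an illustration under assumptions, not okstra's loader: `load_carry_in`, its parameters, the `base_dir` root, and the returned dict shape are all hypothetical. Only the `runs/<plan-key>/carry/stage-<i>.json` layout comes from the text.

```python
import json
from pathlib import Path


def load_carry_in(base_dir: Path, plan_key: str, depends_on: list[int]) -> dict[int, dict]:
    """Load the carry sidecar of every stage that this stage depends on.

    Hypothetical helper: the name, arguments, and return shape are
    assumptions made for illustration only.
    """
    carry_dir = base_dir / "runs" / plan_key / "carry"
    carry_in: dict[int, dict] = {}
    for stage in depends_on:
        sidecar = carry_dir / f"stage-{stage}.json"
        # A missing sidecar raises FileNotFoundError instead of being
        # skipped, so an incomplete dependency is never silently ignored.
        carry_in[stage] = json.loads(sidecar.read_text(encoding="utf-8"))
    # An empty depends_on list yields {}: task-brief only, no sidecar load.
    return carry_in
```

Raising on a missing sidecar is a design choice of this sketch. It matches the statement that every `depends-on` stage is already `status:done`, so an absent file would indicate a broken precondition.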
|
|
@@ -96,6 +96,7 @@ Re-running commands proves the diff *builds and passes*; it does NOT prove the d
|
|
|
96
96
|
- **Tautological delegation assertion:** a test asserts the SUT result equals a direct call to the same pure helper/collaborator that the SUT delegates to, instead of asserting an independent literal value or observable state.
|
|
97
97
|
- **Untruthful name:** a read-named function (`get*` / `find*` / `load*`) that writes/inserts/mutates; an adapter or repository name encoding the caller's use-case (`*ForInit`) or hiding a domain rule (`findValid*` / `findActive*`).
|
|
98
98
|
- **Hexagonal (only when the overlay is loaded):** business logic inside a port body; an adapter method that is not pure I/O (post-fetch JS filtering on domain state, domain-rule evaluation); a domain object declared outside the `domain/` boundary.
|
|
99
|
+
- **Real-IO test in source tree:** a changed/added test under the project source test tree — `src/**`, `test/**`, `tests/**`, `**/__test__/**`, `**/__tests__/**`, `*.spec.*`, `*.test.*` — that opens a **real** DB connection / DSN, makes a real `fetch` / `axios` / `http` request, or otherwise hits real external IO without mocking the injected collaborator (a live handle, not a stub/spy). Real-IO tests MUST live under `<task_root>/qa/` per the executor's *Real-IO test isolation* rule — a live-IO test in source silently breaks the project's CI suite and violates the artifact-home rule. Cite the test file + the real-IO line; recommend moving it to `<task_root>/qa/` (or declaring it as a Tier 3 conformance script). Mock-only unit tests in source are NOT a hit.
|
|
99
100
|
- **Advisory findings (recorded as recommendations; verdict MAY still PASS):** function >50 effective lines, a single body mixing read+write stages, weak readability, a missing-but-non-critical outcome assertion, newly orphaned private/public code that is safe to remove but not on a critical path, or weak-but-not-misleading names. These land in the verifier result as `should-fix` / `nit` recommendations, not as a `FAIL`.
|
|
100
101
|
- **Output.** Every finding — blocking or advisory — is a structured item in the verifier's worker result (`path:line`, rule, severity, suggested fix) so it carries into Phase 5.5 convergence and the final report. A blocking hit sets the verifier verdict to `FAIL` with the rule cited, using the same verdict machinery as the Discrepancy rule above. `Claude lead` MUST NOT silently downgrade a cited blocking finding to advisory during synthesis; an override requires a concrete cited reason, exactly as for the Discrepancy rule.
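The location half of the *Real-IO test in source tree* check can be sketched as a path classifier. This is a minimal sketch that only mirrors the globs listed in the rule. `is_in_source_test_tree` and the constants are hypothetical names, and deciding whether a test actually opens a real connection still requires reading the test body, which this sketch does not attempt.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Mirrors the globs named in the rule; the constant names are assumptions.
SOURCE_ROOTS = {"src", "test", "tests"}          # src/**, test/**, tests/**
TEST_DIR_NAMES = {"__test__", "__tests__"}       # **/__test__/**, **/__tests__/**
TEST_FILE_GLOBS = ("*.spec.*", "*.test.*")       # *.spec.*, *.test.*


def is_in_source_test_tree(rel_path: str) -> bool:
    """Return True when a project-relative path falls under the listed globs."""
    path = PurePosixPath(rel_path)
    parts = path.parts
    if not parts:
        return False
    # A file below one of the top-level source roots.
    if len(parts) > 1 and parts[0] in SOURCE_ROOTS:
        return True
    # A file below a __test__ / __tests__ directory at any depth.
    if any(part in TEST_DIR_NAMES for part in parts[:-1]):
        return True
    # A file whose own name matches a test-file glob.
    return any(fnmatchcase(path.name, glob) for glob in TEST_FILE_GLOBS)
```

A verifier following the rule would flag a file only when this returns `True` and the test also performs real IO. Mock-only unit tests in the same locations are not a hit.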
|
|
101
102
|
|
|
@@ -8,15 +8,17 @@ one of ``{approval, next-phase, none}``. Rows with ``Blocks=approval`` are
|
|
|
8
8
|
the approval gate: they MUST resolve before the user flips the frontmatter
|
|
9
9
|
``approved`` field to ``true`` and starts the next ``implementation`` run.
|
|
10
10
|
|
|
11
|
-
This module exposes one read function for that gate
|
|
12
|
-
``
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
11
|
+
This module exposes one read function for that gate
|
|
12
|
+
(``scan_approval_gate``) so both ``_validate_approved_plan``
|
|
13
|
+
(pre-implementation run-prep) and the wizard share the same parsing logic.
|
|
14
|
+
|
|
15
|
+
Gate semantics are fail-closed: when the §1 schema cannot be read with
|
|
16
|
+
confidence (heading missing/drifted, table header unrecognized, or any
|
|
17
|
+
body row whose metadata cell fails to parse), the scan reports an
|
|
18
|
+
``unreadable_reason`` and callers must refuse approval instead of
|
|
19
|
+
soft-passing. ``parse_clarification_items`` keeps the lenient
|
|
20
|
+
None-on-absence contract for the HTML-view renderers, which only need
|
|
21
|
+
best-effort row extraction.
|
|
20
22
|
"""
|
|
21
23
|
from __future__ import annotations
|
|
22
24
|
|
|
@@ -150,24 +152,28 @@ def parse_meta_cell(cell: str) -> Optional[ClarificationItem]:
|
|
|
150
152
|
)
|
|
151
153
|
|
|
152
154
|
|
|
153
|
-
|
|
154
|
-
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
155
|
+
@dataclass(frozen=True)
|
|
156
|
+
class _Section1Table:
|
|
157
|
+
"""Outcome of walking the §1 slice for its data table.
|
|
158
|
+
|
|
159
|
+
``items`` is ``None`` when no recognizable table header exists among the
|
|
160
|
+
pipe lines. ``unparsed_row_count`` counts body rows whose metadata cell
|
|
161
|
+
failed ``parse_meta_cell`` (all-empty filler rows excluded).
|
|
162
|
+
``has_pipe_lines`` distinguishes the renderer's legitimate table-less
|
|
163
|
+
placeholder (emptyState bullet) from a table whose header drifted.
|
|
161
164
|
"""
|
|
162
|
-
|
|
163
|
-
|
|
164
|
-
|
|
165
|
+
items: Optional[list[ClarificationItem]]
|
|
166
|
+
unparsed_row_count: int
|
|
167
|
+
has_pipe_lines: bool
|
|
165
168
|
|
|
169
|
+
|
|
170
|
+
def _walk_section_1_table(section: str) -> _Section1Table:
|
|
166
171
|
lines = section.splitlines()
|
|
172
|
+
has_pipe_lines = any(line.lstrip().startswith("|") for line in lines)
|
|
167
173
|
# Locate the §1 data table by its header. The merged-meta layout collapses
|
|
168
174
|
# ID/Ticket/Kind/Blocks/Status into one metadata cell and keeps the
|
|
169
175
|
# English `Statement` + `User input` columns; detect on those two (any
|
|
170
|
-
# other table — intro, legacy 5.1/5.2 — is rejected
|
|
176
|
+
# other table — intro, legacy 5.1/5.2 — is rejected).
|
|
171
177
|
header_idx = -1
|
|
172
178
|
for idx, line in enumerate(lines):
|
|
173
179
|
if not line.lstrip().startswith("|"):
|
|
@@ -177,9 +183,10 @@ def parse_clarification_items(report_text: str) -> Optional[list[ClarificationIt
|
|
|
177
183
|
header_idx = idx
|
|
178
184
|
break
|
|
179
185
|
if header_idx < 0:
|
|
180
|
-
return None
|
|
186
|
+
return _Section1Table(None, 0, has_pipe_lines)
|
|
181
187
|
|
|
182
188
|
items: list[ClarificationItem] = []
|
|
189
|
+
unparsed = 0
|
|
183
190
|
body_started = False
|
|
184
191
|
for line in lines[header_idx + 1:]:
|
|
185
192
|
if not line.lstrip().startswith("|"):
|
|
@@ -192,32 +199,84 @@ def parse_clarification_items(report_text: str) -> Optional[list[ClarificationIt
|
|
|
192
199
|
if not body_started:
|
|
193
200
|
continue
|
|
194
201
|
cells = _split_pipe_row(line)
|
|
195
|
-
if not cells:
|
|
202
|
+
if not any(cells):
|
|
196
203
|
continue
|
|
197
204
|
item = parse_meta_cell(cells[0])
|
|
198
|
-
if item is
|
|
199
|
-
|
|
200
|
-
|
|
205
|
+
if item is None:
|
|
206
|
+
unparsed += 1
|
|
207
|
+
continue
|
|
208
|
+
items.append(item)
|
|
209
|
+
return _Section1Table(items, unparsed, True)
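The switch from `if not cells` to `if not any(cells)` is what excludes all-empty filler rows from `unparsed_row_count`. The sketch below shows the difference. `split_pipe_row` is a hypothetical stand-in for `_split_pipe_row`, whose real implementation is not shown in this diff; the assumption is that it returns one trimmed string per cell.

```python
def split_pipe_row(line: str) -> list[str]:
    """Hypothetical stand-in for `_split_pipe_row`: drop the outer pipes
    and return the trimmed cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


filler = split_pipe_row("|   |   |   |")
malformed = split_pipe_row("| ??? | some statement |  |  |")

# A filler row still yields a non-empty list of empty strings, so the old
# `if not cells` check would not skip it.
old_check_skips_filler = not filler

# `any()` is False when every cell is an empty string, so the new check
# skips the filler row before it can be counted as unparsed.
new_check_skips_filler = not any(filler)

# A row with real but malformed content is not skipped; under the new code
# it would reach `parse_meta_cell` and be counted when that returns None.
new_check_skips_malformed = not any(malformed)
```

Under these assumptions only the filler row is skipped, while the malformed row goes on to be counted.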
|
|
210
|
+
|
|
211
|
+
|
|
212
|
+
def parse_clarification_items(report_text: str) -> Optional[list[ClarificationItem]]:
|
|
213
|
+
"""Return the list of §1 rows. ``None`` means "no §1 meta table detected"
|
|
214
|
+
(missing section or unrecognized table header) — caller must NOT treat
|
|
215
|
+
that as "table is empty".
|
|
216
|
+
|
|
217
|
+
Lenient view-renderer contract: rows whose metadata cell fails to parse
|
|
218
|
+
are skipped, not surfaced. The approval gate must use
|
|
219
|
+
``scan_approval_gate`` instead, which fail-closes on those rows.
|
|
220
|
+
"""
|
|
221
|
+
section = _section_1_slice(report_text)
|
|
222
|
+
if section is None:
|
|
223
|
+
return None
|
|
224
|
+
return _walk_section_1_table(section).items
|
|
201
225
|
|
|
202
226
|
|
|
203
227
|
UNRESOLVED_STATUSES = {"open", "answered"}
|
|
204
228
|
|
|
205
229
|
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
230
|
+
@dataclass(frozen=True)
|
|
231
|
+
class ApprovalGateScan:
|
|
232
|
+
"""Fail-closed read of the §1 approval gate.
|
|
209
233
|
|
|
210
|
-
``None``
|
|
211
|
-
|
|
212
|
-
|
|
234
|
+
``unreadable_reason`` is ``None`` only when the scan is confident: §1
|
|
235
|
+
parsed cleanly (or is the legitimate table-less placeholder) and
|
|
236
|
+
``blockers`` is therefore authoritative. A non-None reason means the
|
|
237
|
+
gate must refuse approval — never soft-pass.
|
|
213
238
|
"""
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
239
|
+
blockers: list[ClarificationItem]
|
|
240
|
+
unreadable_reason: Optional[str]
|
|
241
|
+
|
|
242
|
+
|
|
243
|
+
def scan_approval_gate(report_text: str) -> ApprovalGateScan:
|
|
244
|
+
"""Scan §1 for unresolved ``Blocks=approval`` rows (``Status`` in
|
|
245
|
+
``{open, answered}``), refusing to guess whenever the schema drifted."""
|
|
246
|
+
section = _section_1_slice(report_text)
|
|
247
|
+
if section is None:
|
|
248
|
+
if _LOOSE_SECTION_1_RE.search(report_text):
|
|
249
|
+
reason = (
|
|
250
|
+
"`## 1. Clarification Items` heading exists but does not match "
|
|
251
|
+
"the schema heading format (anchor/format drift)"
|
|
252
|
+
)
|
|
253
|
+
else:
|
|
254
|
+
reason = (
|
|
255
|
+
"report has no `## 1. Clarification Items` section — the gate "
|
|
256
|
+
"cannot confirm there are no unresolved `Blocks=approval` rows"
|
|
257
|
+
)
|
|
258
|
+
return ApprovalGateScan([], reason)
|
|
259
|
+
table = _walk_section_1_table(section)
|
|
260
|
+
if table.items is None:
|
|
261
|
+
if table.has_pipe_lines:
|
|
262
|
+
return ApprovalGateScan([], (
|
|
263
|
+
"§1 contains a table but its header row is not the schema "
|
|
264
|
+
"header (`| ... | Statement | Expected form | User input |`)"
|
|
265
|
+
))
|
|
266
|
+
# Renderer's emptyState placeholder: heading is intact and no table
|
|
267
|
+
# was emitted — confidently "no approval-blocking items".
|
|
268
|
+
return ApprovalGateScan([], None)
|
|
269
|
+
if table.unparsed_row_count:
|
|
270
|
+
return ApprovalGateScan([], (
|
|
271
|
+
f"§1 table has {table.unparsed_row_count} row(s) whose metadata "
|
|
272
|
+
"cell could not be parsed (Blocks/Status markers missing or "
|
|
273
|
+
"malformed)"
|
|
274
|
+
))
|
|
275
|
+
blockers = [
|
|
276
|
+
it for it in table.items
|
|
219
277
|
if it.blocks == "approval" and it.status in UNRESOLVED_STATUSES
|
|
220
278
|
]
|
|
279
|
+
return ApprovalGateScan(blockers, None)
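A caller honoring the fail-closed contract might consume the scan result as sketched below. This is an illustration, not the code in `_validate_approved_plan` or the wizard. The dataclasses are stand-ins that copy the two `ApprovalGateScan` fields shown in the diff, `item_id` is an assumed field on `ClarificationItem`, and `approval_refusal` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClarificationItem:
    # Stand-in: only `blocks` and `status` appear in the diff;
    # `item_id` is an assumption for illustration.
    item_id: str
    blocks: str
    status: str


@dataclass(frozen=True)
class ApprovalGateScan:
    # Same two fields as the class in the diff.
    blockers: list[ClarificationItem]
    unreadable_reason: Optional[str]


def approval_refusal(scan: ApprovalGateScan) -> Optional[str]:
    """Return a refusal message, or None when approval may proceed."""
    # Readability is checked first: an empty `blockers` list carries no
    # meaning when the scan could not read §1 with confidence.
    if scan.unreadable_reason is not None:
        return f"cannot verify approval gate: {scan.unreadable_reason}"
    if scan.blockers:
        ids = ", ".join(item.item_id for item in scan.blockers)
        return f"unresolved Blocks=approval rows: {ids}"
    return None
```

Checking `unreadable_reason` before `blockers` matters because every unreadable path in `scan_approval_gate` returns an empty `blockers` list. A caller that looked only at `blockers` would soft-pass exactly the cases the gate is meant to refuse.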
|
|
221
280
|
|
|
222
281
|
|
|
223
282
|
# Loose §1 heading detection: if this matches even when the strict SECTION_HEADING_PATTERN fails,
|