@kodevibe/harness 0.11.1 → 0.11.3
- package/README.ko.md +33 -3
- package/README.md +29 -4
- package/harness/agents/lead.md +12 -4
- package/harness/agents/pm.md +20 -12
- package/harness/agents/reviewer.md +26 -29
- package/harness/project-state.md +11 -0
- package/harness/skills/breakdown.md +2 -1
- package/harness/skills/pr-review.md +16 -0
- package/harness/skills/setup.md +1 -0
- package/harness/skills/state-check.md +55 -0
- package/harness/skills/wrap-up.md +32 -25
- package/package.json +11 -2
- package/src/dependency-scan.js +194 -0
- package/src/guard.js +1017 -0
- package/src/llm-bench.js +323 -0
- package/src/pack-check.js +47 -0
package/README.ko.md
CHANGED
|
@@ -11,7 +11,7 @@
|
|
|
11
11
|
|
|
12
12
|
> **AI coding agents forget everything when a session ends. kode:harness makes them remember goals, decisions, failures, and project direction.**
|
|
13
13
|
|
|
14
|
-
AI 코딩 에이전트를 위한
|
|
14
|
+
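Pre-1.0 guardrails for AI coding agents. Built around real project pilots and team use, it maintains direction, proof, and memory across sessions. It prevents context rot, enforces project direction, and persists state across sessions. Supports **Copilot, Claude, Cursor, Codex, Windsurf, and Gemini**. Zero dependencies.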
|
|
15
15
|
|
|
16
16
|
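> **Where this fits.** kode:harness is the *execution* layer of the **kode:vibe** ecosystem. It sits between the planning layer (PRD / architecture / ARB) and the infrastructure layer (CI / runtime) and keeps the AI on direction while you code. The other layers are optional, and kode:harness can be used on its own.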
|
|
17
17
|
|
|
@@ -32,6 +32,8 @@ npx @kodevibe/harness init # IDE 선택
|
|
|
32
32
|
|
|
33
33
|
That's it. Your AI now has persistent memory, direction guardrails, and self-correction loops.
|
|
34
34
|
|
|
35
|
+
New to kode:harness? Start with [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md). The proof-first flow is covered in [docs/PROOF_LEDGER_WALKTHROUGH.md](docs/PROOF_LEDGER_WALKTHROUGH.md), team adoption in [docs/TEAM_ONBOARDING.md](docs/TEAM_ONBOARDING.md), and install/uninstall safety boundaries in [docs/SAFETY_GUIDE.md](docs/SAFETY_GUIDE.md).
|
|
36
|
+
|
|
35
37
|
Starting with v0.11, **Proof-First Enforcement** is added on top of the Common Mode Confidence Loop: pm must plan runnable proof, lead does not mark a Story done without evidence, reviewer stops before commit guidance when proof is missing or failing, and state-check audits for missing Proof Ledger entries.
|
|
36
38
|
|
|
37
39
|
<details>
|
|
@@ -170,6 +172,27 @@ npx @kodevibe/harness validate # state 파일에 실제 내용 확인
|
|
|
170
172
|
npx @kodevibe/harness uninstall --dry-run --ide vscode # preview safe removal
|
|
171
173
|
```
|
|
172
174
|
|
|
175
|
+
Source repo maintainers can also run the deterministic guard and model-tier evidence checks:
|
|
176
|
+
|
|
177
|
+
```bash
|
|
178
|
+
npm run harness:check-pack
|
|
179
|
+
npm run harness:guard:all
|
|
180
|
+
npm run harness:guard:wrap-up -- --quiet
|
|
181
|
+
npm run harness:state-sync
|
|
182
|
+
npm run harness:dependency-scan
|
|
183
|
+
npm run harness:llm-bench:smoke
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
To count as release evidence, record sealed results from at least three real model tiers in `docs/llm-bench-results.json` instead of the example fixture, then run the check. Each run must use the standard scenarios in `docs/llm-bench-scenarios.json` and needs `capturedAt`, a matching `promptHash`, and either `outputHash` or `transcriptHash`.
|
|
187
|
+
|
|
188
|
+
```bash
|
|
189
|
+
npm run harness:llm-bench:export
|
|
190
|
+
node scripts/llm-bench.js --collect-results --model-id local-small-2026-06-08 --tier constrained --outputs bench/r10/local-small-2026-06-08 --out docs/llm-bench-results.json --append
|
|
191
|
+
npm run harness:llm-bench:real
|
|
192
|
+
```
|
|
193
|
+
|
|
194
|
+
See `docs/LLM_BENCH_GUIDE.md` for the full three-model evidence workflow.
|
|
195
|
+
|
|
173
196
|
---
|
|
174
197
|
|
|
175
198
|
## Supported IDEs
|
|
@@ -329,6 +352,12 @@ npx @kodevibe/harness init --team
|
|
|
329
352
|
|
|
330
353
|
---
|
|
331
354
|
|
|
355
|
+
## Documentation
|
|
356
|
+
|
|
357
|
+
Package users can start with this README. Repository maintainers use the GitHub-only [docs/wiki-index.md](https://github.com/AIDD-Projects/harness/blob/main/docs/wiki-index.md) as the source repo documentation map. It links the contribution guide, architecture notes, release history, and local docs hub boundaries without reinstalling active harness instructions into the source repo.
|
|
358
|
+
|
|
359
|
+
---
|
|
360
|
+
|
|
332
361
|
## Why We Built This
|
|
333
362
|
|
|
334
363
|
Existing AI coding frameworks focus on **what the AI does**: generating code, running tests, deploying. But the real problem is not capability. It is **direction**.
|
|
@@ -380,7 +409,7 @@ Bootstrap이 `docs/crew/`, `docs/PM/`, `docs/Analyst/`, `docs/ARB/`에서 crew
|
|
|
380
409
|
|
|
381
410
|
## Roadmap
|
|
382
411
|
|
|
383
|
-
kode:harness는 현재 **v0.11.
|
|
412
|
+
kode:harness is currently at **v0.11.2**, which adds deterministic source-repo guardrails and a manifest-sealed R10 model benchmark workflow on top of the v0.11 proof-first and uninstall safety foundation.
|
|
384
413
|
|
|
385
414
|
| Phase | Version | Status | Focus |
|
|
386
415
|
|------|------|------|------|
|
|
@@ -395,7 +424,8 @@ kode:harness는 현재 **v0.11.1** — manifest 기반 안전 제거를 추가
|
|
|
395
424
|
| **Drift Guard & Positioning** | v0.9.7 | ✅ Done | `harness/`↔`.github/` drift guard, reviewer working-proof gate, kode:vibe positioning guide, IDE selection guide, project-brief example |
|
|
396
425
|
| **Confidence Loop** | v0.10.0 | ✅ Done | Goal Card, Quiet Navigator, Evidence-Gated Progress Board, Proof Ledger, QA/content regression tests |
|
|
397
426
|
| **Proof-First Enforcement** | v0.11.0 | ✅ Done | Mandatory Proof Plan, lead proof blocker, reviewer proof blocker, state-check Proof Ledger coverage |
|
|
398
|
-
| **Uninstall Safety** | v0.11.1 | ✅
|
|
427
|
+
| **Uninstall Safety** | v0.11.1 | ✅ Done | Manifest-based uninstall, default state preservation, shared owner restore, purge cleanup |
|
|
428
|
+
| **Deterministic Release Guard** | v0.11.2 | ✅ Current | R1-R10 guard scripts, package-boundary scan, dependency-map scan, R10 manifest-sealed bench workflow |
|
|
399
429
|
| **Docs Bridge** | v0.11.1 | 🧪 Experimental | Project Docs Hub Index, docs-bridge skill, local docs hub index with visibility boundaries |
|
|
400
430
|
| **Safety & Branding** | v0.9.6 | ✅ Done | init overwrite backups, pm naming cleanup in shipped files, LICENSE branding cleanup |
|
|
401
431
|
| **Validation** | v1.0 | 🔜 Next | Real-world validation, user feedback collection |
|
package/README.md
CHANGED
|
@@ -11,7 +11,7 @@
|
|
|
11
11
|
|
|
12
12
|
> **Your AI coding agent forgets everything between sessions. kode:harness makes it remember — goals, decisions, failures, and project direction.**
|
|
13
13
|
|
|
14
|
-
|
|
14
|
+
Pre-1.0 guardrails for AI coding agents, designed for real project pilots and teams that need direction, proof, and memory across sessions. Prevents context rot, enforces project direction, and persists state across sessions. Works with **Copilot, Claude, Cursor, Codex, Windsurf, and Gemini**. Zero dependencies.
|
|
15
15
|
|
|
16
16
|
> **Where this fits.** kode:harness is the *execution* layer of the **kode:vibe** ecosystem — it sits between a planning layer (PRD / architecture / ARB) and an infrastructure layer (CI / runtime). kode:harness keeps the AI on direction while you code; the other layers are optional. You can use kode:harness alone.
|
|
17
17
|
|
|
@@ -32,6 +32,8 @@ npx @kodevibe/harness init # pick your IDE
|
|
|
32
32
|
|
|
33
33
|
That's it. Your AI now has persistent memory, direction guardrails, and self-correction loops.
|
|
34
34
|
|
|
35
|
+
New to kode:harness? Start with [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md), then read [docs/PROOF_LEDGER_WALKTHROUGH.md](docs/PROOF_LEDGER_WALKTHROUGH.md) for the proof-first loop. Teams should also read [docs/TEAM_ONBOARDING.md](docs/TEAM_ONBOARDING.md). For install and uninstall trust boundaries, see [docs/SAFETY_GUIDE.md](docs/SAFETY_GUIDE.md).
|
|
36
|
+
|
|
35
37
|
v0.11 adds **Proof-First Enforcement** on top of the Common Mode Confidence Loop: pm must define runnable proof, lead cannot mark a Story done without passing evidence, reviewer blocks commit guidance when proof is missing or failing, and state-check audits Proof Ledger coverage.
|
|
36
38
|
|
|
37
39
|
<details>
|
|
@@ -182,6 +184,27 @@ npx @kodevibe/harness validate # verify state files have real content
|
|
|
182
184
|
npx @kodevibe/harness uninstall --dry-run --ide vscode # preview safe removal
|
|
183
185
|
```
|
|
184
186
|
|
|
187
|
+
Source repo maintainers can also run the deterministic guard and model-tier evidence checks:
|
|
188
|
+
|
|
189
|
+
```bash
|
|
190
|
+
npm run harness:check-pack
|
|
191
|
+
npm run harness:guard:all
|
|
192
|
+
npm run harness:guard:wrap-up -- --quiet
|
|
193
|
+
npm run harness:state-sync
|
|
194
|
+
npm run harness:dependency-scan
|
|
195
|
+
npm run harness:llm-bench:smoke
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
For release evidence, replace the example fixture with sealed results from at least three real model tiers. Each run must use the standard scenarios in `docs/llm-bench-scenarios.json` and include `capturedAt`, a matching `promptHash`, and `outputHash` or `transcriptHash`.
|
|
199
|
+
|
|
200
|
+
```bash
|
|
201
|
+
npm run harness:llm-bench:export
|
|
202
|
+
node scripts/llm-bench.js --collect-results --model-id local-small-2026-06-08 --tier constrained --outputs bench/r10/local-small-2026-06-08 --out docs/llm-bench-results.json --append
|
|
203
|
+
npm run harness:llm-bench:real
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
See `docs/LLM_BENCH_GUIDE.md` for the full three-model evidence workflow.
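As a rough illustration of the release-evidence rule, the sketch below checks a single result entry and counts model tiers. The entry shape, the `tier` property, and the helpers `validateRun` and `hasEnoughTiers` are assumptions made for this example and are not taken from the package's bench script; only the field names `capturedAt`, `promptHash`, `outputHash`, and `transcriptHash` come from the README text.

```javascript
// Hypothetical sketch: the package's real bench checks may differ.
// `run` is one result entry (assumed shape); `expectedPromptHash` would come
// from the matching scenario in docs/llm-bench-scenarios.json.
function validateRun(run, expectedPromptHash) {
  const problems = [];
  if (!run.capturedAt || Number.isNaN(Date.parse(run.capturedAt))) {
    problems.push("capturedAt missing or not a valid date");
  }
  if (run.promptHash !== expectedPromptHash) {
    problems.push("promptHash does not match the scenario");
  }
  if (!run.outputHash && !run.transcriptHash) {
    problems.push("need outputHash or transcriptHash");
  }
  return problems;
}

// Release evidence needs results from at least three distinct tiers.
// The `tier` property name is an assumption.
function hasEnoughTiers(runs, minTiers = 3) {
  return new Set(runs.map((r) => r.tier)).size >= minTiers;
}
```

An empty array from `validateRun` means the entry carries every field the README requires; it says nothing about whether the hashes themselves are genuine.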
|
|
207
|
+
|
|
185
208
|
## Supported IDEs
|
|
186
209
|
|
|
187
210
|
Not sure which to pick? Use the IDE you already code in — each install path is generated from the same `harness/` source, so the underlying skills/agents are identical:
|
|
@@ -315,7 +338,7 @@ These 11 rules are enforced across all skills and agents. They form the quality
|
|
|
315
338
|
|
|
316
339
|
## Documentation
|
|
317
340
|
|
|
318
|
-
|
|
341
|
+
Package users can start with this README. Repository maintainers can use the GitHub-only [docs/wiki-index.md](https://github.com/AIDD-Projects/harness/blob/main/docs/wiki-index.md) as the source repo documentation map; it links the contribution guide, architecture notes, release history, and local docs hub boundaries without installing active harness instructions into this repo.
|
|
319
342
|
|
|
320
343
|
## Why We Built This
|
|
321
344
|
|
|
@@ -366,7 +389,7 @@ It adds a Project Docs Hub Index to `project-brief.md` with each local source, r
|
|
|
366
389
|
|
|
367
390
|
## Roadmap
|
|
368
391
|
|
|
369
|
-
kode:harness is at **v0.11.
|
|
392
|
+
kode:harness is at **v0.11.3** — adds R15 experiment hardening for section integrity, Wave Scope drift, and filter coverage honesty on top of the v0.11 proof-first and deterministic release guard foundation.
|
|
370
393
|
|
|
371
394
|
| Phase | Version | Status | Focus |
|
|
372
395
|
|---|---|---|---|
|
|
@@ -381,7 +404,9 @@ kode:harness is at **v0.11.1** — adds manifest-based safe uninstall so generat
|
|
|
381
404
|
| **Drift Guard & Positioning** | v0.9.7 | ✅ Done | `harness/`↔`.github/` drift detector, reviewer working-proof gate, kode:vibe positioning, IDE selection guide, project-brief example |
|
|
382
405
|
| **Confidence Loop** | v0.10.0 | ✅ Done | Goal Card, Quiet Navigator, Evidence-Gated Progress Board, Proof Ledger, QA/content regression tests |
|
|
383
406
|
| **Proof-First Enforcement** | v0.11.0 | ✅ Complete | Mandatory Proof Plan, lead proof blockers, reviewer proof blockers, state-check Proof Ledger coverage |
|
|
384
|
-
| **Uninstall Safety** | v0.11.1 | ✅
|
|
407
|
+
| **Uninstall Safety** | v0.11.1 | ✅ Complete | Manifest-based uninstall, default state preservation, shared owner restore, purge cleanup |
|
|
408
|
+
| **Deterministic Release Guard** | v0.11.2 | ✅ Complete | R1-R10 guard scripts, package-boundary scan, dependency-map scan, R10 manifest-sealed bench workflow |
|
|
409
|
+
| **Experiment Hardening** | v0.11.3 | ✅ Current | R15 Recent Changes integrity, Wave Scope boundary drift checks, enum/filter coverage honesty, R15 bench scenarios |
|
|
385
410
|
| **Docs Bridge** | v0.11.1 | 🧪 Experimental | Project Docs Hub Index, docs-bridge skill, local docs hub index with visibility boundaries |
|
|
386
411
|
| **Safety & Branding** | v0.9.6 | ✅ Done | init overwrite backups, shipped pm naming cleanup, LICENSE branding cleanup |
|
|
387
412
|
| **Validation** | v1.0 | 🔜 Next | Real-world project adoption, user feedback collection |
|
package/harness/agents/lead.md
CHANGED
|
@@ -104,15 +104,18 @@ After every status check, recommend the next action based on current context:
|
|
|
104
104
|
|
|
105
105
|
**Request: "story done" / "S{N}-{M} done"**
|
|
106
106
|
1. Read the Story's Proof Plan and current Evidence-Gated Progress Board row.
|
|
107
|
-
2.
|
|
107
|
+
2. Read `## Story Contracts` rows for the Story. Every row must be proven before Done.
|
|
108
|
+
3. Require proof before marking done:
|
|
108
109
|
- Passing proof → set state to `Proven`, update Story status to `✅ done`, append Proof Ledger / Evidence Summary row.
|
|
109
110
|
- Missing proof → keep state `Proof Pending`, output `[BLOCKER: PROOF_MISSING]`, and do not advance to the next Story.
|
|
110
111
|
- Failing proof → keep state `Implementing`, output `[BLOCKER: PROOF_FAILING]`, and fix within current Story.
|
|
111
|
-
|
|
112
|
-
4.
|
|
112
|
+
- Unproven contract → keep state `Proof Pending`, output `[BLOCKER: CONTRACT_NOT_PROVEN]`, and update the contract Evidence target.
|
|
113
|
+
4. If proof passes, update Story Contract Proof Status to `✅ pass` / `proven` with evidence. Never leave `needs-user-confirmation` on a done Story.
|
|
114
|
+
5. Add completion record to "Recent Changes" section only after passing proof.
|
|
115
|
+
6. **Commit/Push check**: If changes are uncommitted, remind:
|
|
113
116
|
- "⚠️ S{N}-{M} 완료 — 커밋하셨나요? `git add <files> && git commit -m \"S{N}-{M}: {description}\"`"
|
|
114
117
|
- Team mode: Also remind to push — "팀원에게 공유하려면 `git push origin {branch}` 실행"
|
|
115
|
-
|
|
118
|
+
7. Guide to next Story only after proof passes.
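The "story done" gate above can be read as a small decision function. The following is a minimal sketch under assumptions: the input shapes, the function name `resolveStoryDone`, and the order in which the conditions are checked are all hypothetical, while the state names and blocker strings are copied from the steps.

```javascript
// Hypothetical sketch of the lead agent's "story done" gate.
// proof: "pass" | "fail" | anything else is treated as missing (assumption).
// contracts: [{ name, proven: boolean }] for the Story's contract rows.
function resolveStoryDone(proof, contracts = []) {
  if (proof === "fail") {
    return { state: "Implementing", blocker: "[BLOCKER: PROOF_FAILING]" };
  }
  if (proof !== "pass") {
    return { state: "Proof Pending", blocker: "[BLOCKER: PROOF_MISSING]" };
  }
  if (contracts.some((c) => !c.proven)) {
    return { state: "Proof Pending", blocker: "[BLOCKER: CONTRACT_NOT_PROVEN]" };
  }
  // Only here may the Story be marked done and the next Story suggested.
  return { state: "Proven", blocker: null };
}
```

Checking proof before contracts is a design choice of this sketch; the agent file does not state a precedence between the blockers.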
|
|
116
119
|
|
|
117
120
|
**Request: "new story" / "next task"**
|
|
118
121
|
1. Find next `todo` Story in docs/project-state.md
|
|
@@ -147,7 +150,12 @@ When invoked after pm approval, verify that pm wrote state files correctly:
|
|
|
147
150
|
|
|
148
151
|
When a Story contains multiple Tasks/Waves (from breakdown):
|
|
149
152
|
- Guide implementation **one Wave at a time** (not one file at a time, not all at once)
|
|
153
|
+
- For every Wave, print a compact **Wave Scope** before implementation:
|
|
154
|
+
`Wave {N} Scope: allowed files = <files>; expected proof = <command/manual observation>`
|
|
150
155
|
- After each Wave is implemented, **run tests or smoke proof** to verify the Wave is clean before proceeding
|
|
156
|
+
- After each Wave, compare changed files against the Wave Scope:
|
|
157
|
+
- Only allowed files changed → continue.
|
|
158
|
+
- Extra files changed → output `[SCOPE-DRIFT: WAVE_BOUNDARY]`, record the extra files, and ask whether the Wave should be collapsed/approved before proceeding.
|
|
151
159
|
- Record a mini Proof Ledger row inline: Evidence, Result, Command / Observation
|
|
152
160
|
- Only after verification passes, prompt: "Wave {N} 완료 (tests pass). Wave {N+1}로 넘어갈까요?"
|
|
153
161
|
- If tests fail → output `[BLOCKER: WAVE_PROOF_FAILING]`, fix within the current Wave, and do NOT advance.
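The Wave Scope comparison described above amounts to a set difference between the files a Wave was allowed to touch and the files that actually changed. A minimal sketch, assuming exact path matching with no glob support; the helper name `checkWaveScope` and the return shape are hypothetical, and the changed-file list could come from something like `git diff --name-only`.

```javascript
// Hypothetical helper: exact path matching only, no glob patterns.
function checkWaveScope(allowedFiles, changedFiles) {
  const allowed = new Set(allowedFiles);
  const extra = changedFiles.filter((file) => !allowed.has(file));
  if (extra.length === 0) {
    return { ok: true, extra };
  }
  // Extra files are recorded, not reverted; the user decides what happens next.
  return { ok: false, flag: "[SCOPE-DRIFT: WAVE_BOUNDARY]", extra };
}
```

Returning the extra files alongside the flag matches the instruction to record them and ask whether the Wave should be collapsed or approved.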
|
package/harness/agents/pm.md
CHANGED
|
@@ -70,7 +70,7 @@ Apply these insights when creating the implementation plan. If the memory file i
|
|
|
70
70
|
<!-- CREW_MODE_END -->
|
|
71
71
|
|
|
72
72
|
1. Read the Goals in `docs/project-brief.md` and the current module structure in `docs/dependency-map.md`
|
|
73
|
-
2.
|
|
73
|
+
2. Draft the Roadmap:
|
|
74
74
|
```
|
|
75
75
|
## Feature Roadmap
|
|
76
76
|
### Phase 1 — Core (Goal 달성 필수)
|
|
@@ -78,9 +78,8 @@ Apply these insights when creating the implementation plan. If the memory file i
|
|
|
78
78
|
### Phase 2 — Enhancement (사용성/완성도)
|
|
79
79
|
- [ ] F-002: ...
|
|
80
80
|
```
|
|
81
|
-
3.
|
|
82
|
-
4.
|
|
83
|
-
5. Once the Feature Roadmap is confirmed, proceed with the "For New Feature" procedure below
|
|
81
|
+
3. After corrections, record it in `docs/project-brief.md`
|
|
82
|
+
4. Once confirmed, proceed to "For New Feature"
|
|
84
83
|
|
|
85
84
|
### For New Feature
|
|
86
85
|
|
|
@@ -132,10 +131,11 @@ Apply these insights when creating the implementation plan. If the memory file i
|
|
|
132
131
|
10. Run **check-impact** skill for each existing module being modified (pm calls both skills independently — breakdown does NOT invoke check-impact internally. Ordering: breakdown first → register modules → check-impact second.)
|
|
133
132
|
11. Check `docs/failure-patterns.md` for relevant past mistakes
|
|
134
133
|
12. Produce a **Goal Card** (6 lines max) and implementation plan.
|
|
135
|
-
13. Produce a **Proof Plan** per Story: exact test/smoke
|
|
136
|
-
14.
|
|
137
|
-
15. **
|
|
138
|
-
16. **After user approves** → Update `docs/
|
|
134
|
+
13. Produce a **Proof Plan** per Story: exact test/smoke/manual proof; never TBD. No path → add Story 0: set up test/smoke proof. Blank → `[ERROR: PROOF_PLAN_UNDEFINED]` and STOP.
|
|
135
|
+
14. Produce **Story Contracts**: Done-blocking field/API/UI/ARB assertions.
|
|
136
|
+
15. **Wait for Plan Confirmation** (see Plan Confirmation Gate below) — do NOT write state files yet
|
|
137
|
+
16. **After user approves** → Update `docs/project-state.md` with the new Story and Story Contract rows
|
|
138
|
+
17. **After user approves** → Update `docs/features.md` with the new feature entry
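The Proof Plan rule in step 13 (exact proof, never TBD, blank is an error) could be checked mechanically before the Plan Confirmation Gate. The sketch below is an illustration only: the `proofPlan` property, the `checkProofPlans` helper, and the choice to treat a literal "TBD" the same as a blank plan are assumptions, not behavior documented for pm.

```javascript
// Hypothetical sketch: stories is [{ id, proofPlan }] (assumed shape).
function checkProofPlans(stories) {
  const undefinedPlans = stories.filter((story) => {
    const plan = (story.proofPlan ?? "").trim();
    // Treating "TBD" like a blank plan is an assumption of this sketch.
    return plan === "" || /^tbd$/i.test(plan);
  });
  if (undefinedPlans.length === 0) {
    return { ok: true, stories: [] };
  }
  return {
    ok: false,
    error: "[ERROR: PROOF_PLAN_UNDEFINED]",
    stories: undefinedPlans.map((story) => story.id),
  };
}
```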
|
|
139
139
|
|
|
140
140
|
State writes (Steps 15-16) execute ONLY after user approval. Rejected plans never touch state.
|
|
141
141
|
|
|
@@ -185,8 +185,10 @@ After user approves the plan, perform all writes before 🧭:
|
|
|
185
185
|
- If no Sprint exists, create Sprint 1 with theme
|
|
186
186
|
- Add Story rows to the Story Status table (status = `⬜ todo`)
|
|
187
187
|
- Each Story: ID (S{N}-{M}), Title, Status, Scope (files/modules), Proof Plan
|
|
188
|
+
- Add `## Story Contracts` rows. Initial status: `⬜ not proven` or `needs-user-confirmation`.
|
|
188
189
|
<!-- CREW_MODE_START -->
|
|
189
190
|
- If crew-driven, include FR reference
|
|
191
|
+
- Convert FR/ARB criteria naming fields, API contracts, invariants, proof gates, or UI requirements.
|
|
190
192
|
<!-- CREW_MODE_END -->
|
|
191
193
|
- Update Quick Summary section
|
|
192
194
|
|
|
@@ -201,10 +203,11 @@ After user approves the plan, perform all writes before 🧭:
|
|
|
201
203
|
- ARB Fail Resolution: fill Story column with mapped Story IDs
|
|
202
204
|
<!-- CREW_MODE_END -->
|
|
203
205
|
|
|
204
|
-
**Completion Check**:
|
|
205
|
-
- [ ] features.md
|
|
206
|
-
- [ ] project-state.md
|
|
207
|
-
- [ ]
|
|
206
|
+
**Completion Check**:
|
|
207
|
+
- [ ] features.md new row(s)
|
|
208
|
+
- [ ] project-state.md Story rows with `⬜ todo`
|
|
209
|
+
- [ ] project-state.md Story Contract rows or "none identified"
|
|
210
|
+
- [ ] dependency-map.md new module rows (if any)
|
|
208
211
|
<!-- CREW_MODE_START -->
|
|
209
212
|
- [ ] project-brief.md Validation Tracker updated (if 🟣 pipeline)
|
|
210
213
|
<!-- CREW_MODE_END -->
|
|
@@ -247,6 +250,11 @@ After the Post-Approval state writes complete, run the `state-check` skill:
|
|
|
247
250
|
| S{N}-0 | Proof setup, if needed | `npm test` / `npm run smoke` / manual checklist |
|
|
248
251
|
| S{N}-{M} | Tests / smoke / manual | exact command/checklist; never TBD |
|
|
249
252
|
|
|
253
|
+
### Story Contract Plan
|
|
254
|
+
| Story | Contract | Required Assertion | Initial Proof Status | Evidence Target |
|
|
255
|
+
|-------|----------|--------------------|----------------------|-----------------|
|
|
256
|
+
| S{N}-{M} | Field/API/domain/UI contract | assertion to prove before Done | ⬜ not proven | unit/API/smoke/manual |
|
|
257
|
+
|
|
250
258
|
### Implementation Plan
|
|
251
259
|
[Output from breakdown skill]
|
|
252
260
|
|
package/harness/agents/reviewer.md
CHANGED
|
@@ -57,6 +57,7 @@ Changed file list (user-provided or from `git diff --name-only`)
|
|
|
57
57
|
**Step 1: Identify Change Scope**
|
|
58
58
|
- Run `git diff --cached --stat` or `git diff --stat` to see changed files
|
|
59
59
|
- Compare against current Story scope in docs/project-state.md
|
|
60
|
+
- If lead/pm named a Wave Scope, changed files outside that Wave are `[SCOPE-DRIFT: WAVE_BOUNDARY]`. Do not revert automatically; require user approval or a state note explaining the collapsed wave.
|
|
60
61
|
|
|
61
62
|
**Step 2: Architecture Rule Check**
|
|
62
63
|
- [ ] No imports from infrastructure in domain layer
|
|
@@ -64,6 +65,15 @@ Changed file list (user-provided or from `git diff --name-only`)
|
|
|
64
65
|
- [ ] Constructor parameters match actual source (FP-002)
|
|
65
66
|
- [ ] **Common First (Iron Law #9)**: No crew-specific logic outside crew marker blocks. All features must work without crew artifacts.
|
|
66
67
|
|
|
68
|
+
**Step 2.2: Acceptance Contract Gate**
|
|
69
|
+
|
|
70
|
+
If `docs/project-state.md` has `## Story Contracts` rows for the Story:
|
|
71
|
+
1. Review each row before code-quality review.
|
|
72
|
+
2. Compare assertion vs code, tests, API/UI output, and proof.
|
|
73
|
+
3. Output **Story Contract Review**: `Contract | Status | Evidence`.
|
|
74
|
+
4. `FAIL`, `NOT_PROVEN`, blank Proof Status, or `needs-user-confirmation` blocks `DONE` and commit guidance.
|
|
75
|
+
5. Wrong-contract tests fail.
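The blocking rule in item 4 can be sketched as a filter over the Story Contract Review rows. The row shape and the `reviewContracts` helper are assumptions for illustration; the status strings are the ones listed in the gate. Item 5 (tests that assert the wrong contract) needs human or LLM judgment and is not something this sketch can detect.

```javascript
// Statuses that block DONE and commit guidance, per the gate above.
// A blank Proof Status is represented as the empty string.
const BLOCKING_STATUSES = new Set(["FAIL", "NOT_PROVEN", "", "needs-user-confirmation"]);

// Hypothetical helper: rows is [{ contract, status }] (assumed shape).
function reviewContracts(rows) {
  const blocking = rows.filter((row) => BLOCKING_STATUSES.has((row.status ?? "").trim()));
  return {
    blocked: blocking.length > 0,
    blocking: blocking.map((row) => row.contract),
  };
}
```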
|
|
76
|
+
|
|
67
77
|
<!-- CREW_MODE_START -->
|
|
68
78
|
**Step 2.5: CI Standards Compliance (🟣 Pipeline only)**
|
|
69
79
|
|
|
@@ -79,11 +89,11 @@ Changed file list (user-provided or from `git diff --name-only`)
|
|
|
79
89
|
1. Check if `docs/project-brief.md` has a `## CI Artifact Index` section (or `.harness/ci-index.md` exists). If neither → skip this step.
|
|
80
90
|
2. Read the project's primary language/build tool from `docs/project-brief.md` Key Technical Decisions.
|
|
81
91
|
3. Match the language/build tool to a row in the CI Artifact Index → get the reference URL and Key Constraints.
|
|
82
|
-
4. Surface the reference
|
|
92
|
+
4. Surface the reference under `### CI Standards Compliance`:
|
|
83
93
|
- Reference URL (the indexed guide)
|
|
84
|
-
- Key Constraints
|
|
85
|
-
- `[CI-STANDARD]`
|
|
86
|
-
5. **Warning only — do NOT block commit**.
|
|
94
|
+
- Key Constraints from the index
|
|
95
|
+
- `[CI-STANDARD]` if changed files obviously mismatch a listed constraint
|
|
96
|
+
5. **Warning only — do NOT block commit**. Point to the guide; do not assert compliance.
|
|
87
97
|
|
|
88
98
|
If neither `## CI Artifact Index` nor `.harness/ci-index.md` is present → skip this step entirely (also true for 🟢/🔵/🔴 pipelines).
|
|
89
99
|
<!-- CREW_MODE_END -->
|
|
@@ -92,10 +102,12 @@ If neither `## CI Artifact Index` nor `.harness/ci-index.md` is present → skip
|
|
|
92
102
|
- [ ] Interface changes have synchronized mocks (FP-001)
|
|
93
103
|
- [ ] New features have tests
|
|
94
104
|
- [ ] Existing tests pass
|
|
105
|
+
- [ ] **Enum/filter coverage**: If the Story adds finite values (e.g. `normal/watch/breached`), tests must cover every value at the claimed boundary (domain/API/UI) or mark untested boundaries as partial. Claiming full API coverage while HTTP tests exercise one value → `[ACCEPTANCE-GAP: FILTER_COVERAGE]`.
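The enum/filter coverage check can be pictured as comparing, per claimed boundary, the values a Story introduces against the values its tests exercise. This is a sketch under assumptions: the input shapes and the `checkFilterCoverage` helper are hypothetical, and how tested values are collected from real test files is out of scope here.

```javascript
// Hypothetical sketch.
// allValues: the finite values the Story adds, e.g. ["normal", "watch", "breached"].
// testedByBoundary: { domain: [...], api: [...], ui: [...] } values actually exercised.
// claimedBoundaries: boundaries for which full coverage is claimed.
function checkFilterCoverage(allValues, testedByBoundary, claimedBoundaries) {
  const gaps = [];
  for (const boundary of claimedBoundaries) {
    const tested = new Set(testedByBoundary[boundary] ?? []);
    const missing = allValues.filter((value) => !tested.has(value));
    if (missing.length > 0) {
      gaps.push({ boundary, missing });
    }
  }
  if (gaps.length === 0) {
    return { ok: true, gaps };
  }
  return { ok: false, flag: "[ACCEPTANCE-GAP: FILTER_COVERAGE]", gaps };
}
```

A boundary that is honestly marked partial would simply be left out of `claimedBoundaries`, which mirrors the rule that untested boundaries must be declared rather than silently claimed.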
|
|
95
106
|
|
|
96
|
-
**Verification is a gate
|
|
97
|
-
- Run the project
|
|
98
|
-
- If
|
|
107
|
+
**Verification is a gate.** Before Step 4, include concrete working proof:
|
|
108
|
+
- Run the project test/verification command (`npm test`, `pytest`, `go test ./...`, or recorded proof command).
|
|
109
|
+
- If user-facing behavior is untested, include smoke proof (command, URL, screenshot/manual action, or observed output).
|
|
110
|
+
- UI/manual proof needs artifact/checklist or URL + exact counts/elements.
|
|
99
111
|
- If any existing test fails → output `[BLOCKER: TESTS_FAILING]`. STOP before Step 4.
|
|
100
112
|
- If a Proof Plan command cannot run → output `[BLOCKER: PROOF_COMMAND_INVALID]` with the command. STOP.
|
|
101
113
|
- If test files exist but no test command exists → output `[BLOCKER: NO_TEST_COMMAND]`. STOP.
|
|
@@ -114,6 +126,7 @@ If state files are in scope, write/request Proof Ledger / Evidence Summary immed
|
|
|
114
126
|
- [ ] No credentials, .env, or temp files in staging (FP-004)
|
|
115
127
|
- [ ] No hardcoded API keys or passwords
|
|
116
128
|
- [ ] No injection vulnerabilities (SQL, XSS)
|
|
129
|
+
- [ ] Evaluator artifacts require approval (`harness-owner: evaluator` → `harness-edit-approved`)
|
|
117
130
|
|
|
118
131
|
**Step 5: Failure Pattern Cross-Check**
|
|
119
132
|
- Compare current changes against all FP-NNN items in docs/failure-patterns.md
|
|
@@ -124,28 +137,9 @@ If state files are in scope, write/request Proof Ledger / Evidence Summary immed
|
|
|
124
137
|
|
|
125
138
|
If `docs/project-brief.md` contains a `## Crew Artifact Index` table with entries:
|
|
126
139
|
|
|
127
|
-
1. **ARB Fail Item Check**:
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
- Read the relevant section in the ARB checklist (path from Artifact Index)
|
|
131
|
-
- Verify implementation matches the recommended action
|
|
132
|
-
- If not → flag as `[ARB-COMPLIANCE]` in output
|
|
133
|
-
- **Indirect resolution check**: Even if the Story does NOT have `[ARB-FAIL]` prefix, scan the changed files against ARB Fail items. If a change resolves or partially addresses a Fail item (e.g., fixing a security vulnerability flagged by ARB), flag as `[ARB-INDIRECT]` in output with a recommendation to update the Validation Tracker.
|
|
134
|
-
|
|
135
|
-
2. **NFR Spot Check** (lightweight — check only NFRs relevant to changed files):
|
|
136
|
-
- Read PRD's non-functional requirements section (path from Artifact Index)
|
|
137
|
-
- Check ONLY the NFRs related to changed code:
|
|
138
|
-
- Performance-related change? → Check performance NFRs
|
|
139
|
-
- Security-related change? → Check security NFRs
|
|
140
|
-
- API change? → Check scalability/reliability NFRs
|
|
141
|
-
- Flag violations as `[NFR-GAP]` in output
|
|
142
|
-
- Note: This is a best-effort check by the LLM, not a guarantee of 100% detection
|
|
143
|
-
|
|
144
|
-
3. **FR Acceptance Criteria Check**:
|
|
145
|
-
- If the current Story has `[FR-NNN]` reference:
|
|
146
|
-
- Read the corresponding FR acceptance criteria from PRD (path from Artifact Index)
|
|
147
|
-
- Verify tests cover the acceptance criteria
|
|
148
|
-
- If missing → flag as `[ACCEPTANCE-GAP]` in output
|
|
140
|
+
1. **ARB Fail Item Check**: If the Story has `[ARB-FAIL]`, verify the related ARB checklist recommendation. If changed files indirectly resolve a Fail item, flag `[ARB-INDIRECT]` and recommend Tracker update.
|
|
141
|
+
2. **NFR Spot Check**: Read only NFRs relevant to changed files (security/performance/API/reliability). Violations → `[NFR-GAP]`.
|
|
142
|
+
3. **FR Acceptance Criteria Check**: If the Story references `FR-NNN`, verify tests/proof cover the PRD acceptance criteria. Missing coverage → `[ACCEPTANCE-GAP]`.
|
|
149
143
|
|
|
150
144
|
All flags (`[ARB-COMPLIANCE]`, `[ARB-INDIRECT]`, `[NFR-GAP]`, `[ACCEPTANCE-GAP]`) are warnings, not blockers. Include them in the review output under a new "### Crew Artifact Compliance" section.
|
|
151
145
|
|
|
@@ -171,6 +165,7 @@ Verify that state file updates actually happened. **Run the `state-check` skill
|
|
|
171
165
|
|
|
172
166
|
After running state-check, also verify:
|
|
173
167
|
- [ ] **docs/project-state.md**: If stories were worked on, is Quick Summary current? Are story statuses updated?
|
|
168
|
+
- [ ] **docs/project-state.md section integrity**: `## Recent Changes` must not contain proof/evidence headings or UI checklist bullets. If it does, flag `[STATE-AUDIT: SECTION_CORRUPTION]` and fix before DONE.
|
|
174
169
|
- [ ] **docs/features.md**: If new features were added, are they registered? If features were completed, is status updated?
|
|
175
170
|
- [ ] **Cross-check features ↔ stories**: If a feature status is `✅ done` in features.md, verify all related stories in project-state.md are also `done`. If stories are `done` but their feature is still `🔄 in-progress`, flag as `[STATE-AUDIT]`.
|
|
176
171
|
- [ ] **FR Coverage validation**: For the Story being reviewed, check if it implements a feature (FR-NNN reference in Story name, or changes to files listed in features.md Key Files):
|
|
@@ -223,6 +218,7 @@ If review is BLOCKED → do NOT suggest commit. Fix first.
|
|
|
223
218
|
|
|
224
219
|
### Passed Items
|
|
225
220
|
- Architecture rules: ✅
|
|
221
|
+
- Story Contract Review: ✅ / ❌ / ⚠️ (include table when contracts exist)
|
|
226
222
|
- Test integrity: ✅ / ⚠️ (detail)
|
|
227
223
|
- Working proof: command/evidence + PASS result
|
|
228
224
|
- Proof Ledger: compact table with evidence, result, and command/observation
|
|
@@ -276,6 +272,7 @@ After review completes, always append a 🧭 block based on the outcome:
|
|
|
276
272
|
| All checks pass, more stories remain | Commit → `lead` — "커밋 후 다음 Story는?" |
|
|
277
273
|
| All checks pass, all stories done | Commit → `wrap-up` — "커밋 후 세션을 마무리해줘" |
|
|
278
274
|
| STATE-AUDIT flags found | Two valid paths: (1) `wrap-up` now → "지금 state 파일을 정리해줘" or (2) `lead` → continue coding, resolve at session end |
|
|
275
|
+
| FILTER_COVERAGE or SECTION_CORRUPTION found | [Fix] — "테스트/상태 구조 지적사항을 수정하세요. 완료 후 다시 reviewer 호출" |
|
|
279
276
|
| Security/architecture issues blocking | [Fix] — "리뷰 지적사항을 수정하세요. 완료 후 **새 프롬프트**에서 다시 `@reviewer` 호출" |
|
|
280
277
|
|
|
281
278
|
Example 🧭 block for passing review:
|
package/harness/project-state.md
CHANGED
|
@@ -61,6 +61,17 @@
|
|
|
61
61
|
| S1-1 | First usable result | Proof Pending | npm test | - | tests not run |
|
|
62
62
|
-->
|
|
63
63
|
|
|
64
|
+
## Story Contracts
|
|
65
|
+
|
|
66
|
+
<!-- Semantic acceptance contracts for active Stories.
|
|
67
|
+
Keep rows short and testable. Every row for a ✅ done Story must be proven.
|
|
68
|
+
Use this to prevent semantic drift such as implementing `risk` when the contract requires `slaRisk`.
|
|
69
|
+
|
|
70
|
+
| Story | Contract | Required Assertion | Proof Status | Evidence |
|
|
71
|
+
|-------|----------|--------------------|--------------|----------|
|
|
72
|
+
| S1-1 | Field contract | Public response exposes `expectedField`, not `wrongField` | ⬜ not proven | unit/API/UI assertion |
|
|
73
|
+
-->
|
|
74
|
+
|
|
64
75
|
## Proof Ledger
|
|
65
76
|
|
|
66
77
|
<!-- One line per completed proof. Do not paste long logs.
|
package/harness/skills/breakdown.md
CHANGED
|
@@ -78,6 +78,7 @@ Ensures bottom-up implementation: foundations first, then layers that depend on
|
|
|
78
78
|
- Each task should be completable in one session
|
|
79
79
|
- Every task must include its test files
|
|
80
80
|
- Implementation and tests belong in the same Wave whenever possible. Do not defer tests to a later Wave unless the proof harness itself is the earlier Wave.
|
|
81
|
+
- Each Story/Wave must preserve the Story Contracts defined by pm. If a task changes a public field, API response, domain invariant, UI label, persistence behavior, or ARB control, name the affected contract and the assertion that will prove it.
|
|
81
82
|
- New modules MUST be registered in docs/dependency-map.md (Iron Law #6) — the breakdown OUTPUT section lists these registrations, and pm (or the user, if invoked directly) is responsible for executing the actual state file writes
|
|
82
83
|
- If a task exceeds Story scope, stop and report to user
|
|
83
84
|
|
|
@@ -87,7 +88,7 @@ After completing the breakdown, update these files in the same session:
|
|
|
87
88
|
|
|
88
89
|
- [ ] **docs/dependency-map.md**: Register all NEW_MODULE entries. Update "Depends On" / "Depended By" for INTERFACE_CHANGE entries.
|
|
89
90
|
- [ ] **docs/features.md**: Add a new row for the feature with Status `🔧 active`, Key Files from Wave tasks, and Test Files.
|
|
90
|
-
- [ ] **docs/project-state.md**: Add Stories to the Story Status table for each Wave.
|
|
91
|
+
- [ ] **docs/project-state.md**: Add Stories to the Story Status table for each Wave and add/update `## Story Contracts` rows for semantic assertions that must be proven before Done.
|
|
91
92
|
|
|
92
93
|
### 🧭 Navigation — After Feature Breakdown
|
|
93
94
|
|
package/harness/skills/pr-review.md
CHANGED
@@ -40,6 +40,17 @@ Unlike the `reviewer` agent (which reviews your own changes pre-commit with full

 ### Step 4: Code Quality

+Before ordinary code-quality comments, run the **Acceptance Contract Gate**:
+
+1. Read `docs/project-state.md` → `## Story Contracts`.
+2. If the PR's Story has contract rows, verify each required assertion against the diff, tests, and proof evidence.
+3. Produce:
+| Contract | Status | Evidence |
+|----------|--------|----------|
+| Field/API/domain/UI contract | PASS / FAIL / NOT_PROVEN | exact file/test/proof evidence |
+4. Any `FAIL`, `NOT_PROVEN`, blank Proof Status, or `needs-user-confirmation` → `REQUEST_CHANGES`.
+5. Green tests are insufficient if they assert the wrong semantic contract.
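
As a rough illustration of steps 3 to 5 of the Acceptance Contract Gate, the sketch below reduces a contract table to a review verdict. It is not taken from the package: `ContractRow`, `gate_verdict`, and the `GATE_CLEAR` label are hypothetical names, and only the blocking statuses come from the skill text.

```python
from dataclasses import dataclass

# Statuses that block approval per step 4; a blank status blocks as well.
BLOCKING = {"FAIL", "NOT_PROVEN", "needs-user-confirmation", ""}


@dataclass
class ContractRow:
    contract: str   # the semantic contract being reviewed
    status: str     # PASS / FAIL / NOT_PROVEN / needs-user-confirmation / blank
    evidence: str   # exact file, test, or proof reference


def gate_verdict(rows):
    """Return (verdict, blocking_rows) for one PR's contract rows.

    'GATE_CLEAR' is a hypothetical label meaning the gate itself raises no
    objection; ordinary code-quality review still follows.
    """
    blockers = [row for row in rows if row.status.strip() in BLOCKING]
    if blockers:
        return "REQUEST_CHANGES", blockers
    return "GATE_CLEAR", []
```

Returning the blocking rows alongside the verdict keeps the evidence column available for the review comment, which matters because step 5 asks the reviewer to judge what the tests assert, not only whether they pass.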
+
 Run through these checks for each changed file:

 - [ ] Architecture rules respected (no layer violations — check docs/dependency-map.md)
@@ -50,6 +61,7 @@ Run through these checks for each changed file:
 - [ ] No duplicated logic that should be extracted to a shared module
 - [ ] No circular imports (verify against docs/dependency-map.md)
 - [ ] Naming conventions follow project standards (per project-brief.md → Key Technical Decisions)
+- [ ] Evaluator/run-card/scorecard artifacts are not rewritten unless explicitly user-approved

 ### Step 5: Test Coverage

@@ -57,6 +69,7 @@ Run through these checks for each changed file:
 - [ ] Modified code has updated tests
 - [ ] No `.only` or `.skip` left in test files
 - [ ] Interface changes have synchronized mocks (Iron Law #1)
+- [ ] UI/manual proof is durable: artifact/checklist, or URL plus exact observed UI counts/elements

 ### Step 6: State File Compliance

@@ -85,6 +98,9 @@ Run through these checks for each changed file:
 - [issue 1]
 - [issue 2]

+### Story Contract Review
+- PASS / FAIL / NOT_PROVEN rows, when `## Story Contracts` exists
+
 ### Test Coverage: ✅ Sufficient / ⚠️ Gaps found
 [list of gaps if any]

package/harness/skills/setup.md
CHANGED
@@ -189,6 +189,7 @@ Using data from Phase 1 + Phase 2, fill the following files:
 - Quick Summary: filled with current project state
 - Sprint 1 stories: based on what setup discovered
 - Module Registry: summary from docs/dependency-map.md
+- Story Contracts: keep the section. Add compact `⬜ not proven` rows for explicit field/API/invariant/UI/ARB contracts; if unsure, write `needs-user-confirmation`.

 **docs/failure-patterns.md**:
 - Keep FP-001 through FP-004 as templates (Frequency: 0)
@@ -131,6 +131,49 @@ Outcomes:
 - Missing local path, blank required column, invalid vocabulary, local machine path leakage, or restricted row with external target → WARN
 - No `Project Docs Hub Index` section or no real rows → skip; this project may not use docs-bridge

+### Check 9: Story Contract Coverage
+
+1. Read `docs/project-state.md` (or `.harness/project-state.md` in Team mode).
+2. If any Story is marked `✅ done`, inspect `## Story Contracts`.
+3. Outcomes:
+- Done Story has no `## Story Contracts` section → WARN: `[WARN] Story {S-N-M} done but no Story Contracts section — add semantic contract rows for new work`
+- Done Story has no rows in `## Story Contracts` → WARN: `[WARN] Story {S-N-M} done but has no Story Contract rows`
+- Story Contracts table is missing required columns `Story`, `Contract`, or `Proof Status` → FAIL
+- Done Story has any contract row with blank / `⬜ not proven` / `NOT_PROVEN` / `FAIL` / `needs-user-confirmation` → FAIL
+- Done Story has all rows `✅ pass`, `proven`, `verified`, or equivalent → PASS
+- In-progress Story has unproven contracts → PASS; proof pending is normal
+
+This is the semantic acceptance gate. It exists to catch drift such as "contract requires `slaRisk`, implementation/tests/UI use `risk`" before DONE.
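
The outcomes of Check 9 can be sketched as a small table parser plus a per-Story classifier. This is an illustrative sketch, not the package's implementation: `parse_table` and `check_story` are hypothetical helpers, the parser assumes no cell contains a literal `|`, and only the required column names and the unproven status values are taken from the check's text.

```python
# Column names and unproven statuses as listed in Check 9.
REQUIRED_COLUMNS = {"Story", "Contract", "Proof Status"}
UNPROVEN = {"", "⬜ not proven", "NOT_PROVEN", "FAIL", "needs-user-confirmation"}


def parse_table(lines):
    """Parse a pipe-delimited markdown table into (header, list of row dicts)."""
    cells = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
        if line.strip().startswith("|")
    ]
    if len(cells) < 2:
        return [], []
    header, body = cells[0], cells[2:]  # cells[1] is the |---| separator row
    return header, [dict(zip(header, row)) for row in body]


def check_story(story_id, header, rows):
    """Classify one done Story as PASS, WARN, or FAIL."""
    if not header:
        return "WARN"  # no Story Contracts section or table at all
    if not REQUIRED_COLUMNS.issubset(header):
        return "FAIL"  # table exists but lacks a required column
    mine = [row for row in rows if row.get("Story") == story_id]
    if not mine:
        return "WARN"  # done Story with no contract rows
    if any(row.get("Proof Status", "") in UNPROVEN for row in mine):
        return "FAIL"  # done Story with an unproven contract
    return "PASS"
```

The classifier is meant to be called only for Stories marked `✅ done`; in-progress Stories pass regardless, so they never reach it.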
+
+### Check 10: Durable Smoke Evidence
+
+For each `✅ done` Story, any passing Proof Ledger row whose Evidence says `UI`, `browser`, `smoke`, or `manual` must include durable proof:
+- screenshot/trace/video/Playwright/Cypress artifact, or
+- checklist row, or
+- URL plus exact observed UI counts/elements.
+
+Vague rows like `UI manual | pass | checked browser` → FAIL.
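
One way to approximate Check 10 mechanically is a keyword heuristic over the Evidence cell. The sketch below is an assumption-heavy illustration: `is_durable` is a hypothetical helper, and "exact observed UI counts/elements" is reduced to "the text outside the URL contains at least one number", which is weaker than what a human reviewer would accept.

```python
import re

# Evidence wording that makes a row subject to Check 10.
TRIGGERS = re.compile(r"\b(ui|browser|smoke|manual)\b", re.IGNORECASE)
# Mentions of a durable artifact or a checklist row.
ARTIFACT = re.compile(
    r"\b(screenshot|trace|video|playwright|cypress|checklist)\b", re.IGNORECASE
)
URL = re.compile(r"https?://\S+")
COUNT = re.compile(r"\b\d+\b")


def is_durable(evidence: str) -> bool:
    """Return False only for UI/manual evidence that carries no durable proof."""
    if not TRIGGERS.search(evidence):
        return True  # not a UI/browser/smoke/manual row; the check does not apply
    if ARTIFACT.search(evidence):
        return True
    # A URL counts only when an observed number appears outside the URL itself.
    without_url = URL.sub("", evidence)
    return bool(URL.search(evidence)) and bool(COUNT.search(without_url))
```

Stripping the URL before looking for a count avoids treating a port number as an observation.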
+
+### Check 11: Evaluator Artifact Protection
+
+If changed files include `experiment/kode-harness-scorecard.md`, `experiment/run-card.md`, `experiment/evaluator-*.md`, or `docs/evaluator-*.md`, WARN unless explicitly requested by the user. If the file has `<!-- harness-owner: evaluator -->`, FAIL unless it also has `<!-- harness-edit-approved: <reason> -->`.
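
Check 11 combines a path match with two marker comments. A minimal sketch follows, assuming the owner marker is only consulted for files that match one of the protected patterns and that an approved edit still warns unless the user asked for it; the check's text does not settle either point. `check_artifact` and `user_requested` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Protected paths and markers as listed in Check 11.
PROTECTED = [
    "experiment/kode-harness-scorecard.md",
    "experiment/run-card.md",
    "experiment/evaluator-*.md",
    "docs/evaluator-*.md",
]
OWNER_MARKER = "<!-- harness-owner: evaluator -->"
APPROVAL_PREFIX = "<!-- harness-edit-approved:"


def check_artifact(path, content, user_requested=False):
    """Classify one changed file as PASS, WARN, or FAIL."""
    if not any(fnmatchcase(path, pattern) for pattern in PROTECTED):
        return "PASS"  # not an evaluator artifact
    if OWNER_MARKER in content and APPROVAL_PREFIX not in content:
        return "FAIL"  # evaluator-owned file edited without an approval marker
    return "PASS" if user_requested else "WARN"
```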
+
+### Check 12: Scope Split Approval
+
+If `docs/project-brief.md` maps one FR/KPI/ARB row to multiple Story IDs, require `Scope split approved: <reason>` or `<!-- harness-scope-split-approved: ... -->`. Without approval → FAIL.
+
+### Check 13: Recent Changes Section Integrity
+
+`docs/project-state.md` must keep session changelog entries separate from proof/evidence sections.
+
+1. Read `## Recent Changes`.
+2. FAIL if the section contains nested headings such as `### Scenario`, `### Color Coding`, or `### Conclusion`.
+3. FAIL if the section contains UI/proof checklist bullets such as dropdown/column/color-coding/evidence details.
+4. Valid Recent Changes entries should be compact changelog rows, preferably:
+`- YYYY-MM-DD S{N}-{M}: {what changed} (STATUS: DONE)`
+
+This catches wrap-up corruption where `## Recent Changes` is inserted in the middle of `FR-008 Durable UI Evidence` and steals the remaining proof content.
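
Steps 1 to 4 of Check 13 can be approximated with a line scan of the section. In this sketch the keyword list for "UI/proof checklist bullets" is a guess based on the examples in step 3, and a line is flagged only when it contains such a keyword and does not match the preferred entry format. `recent_changes_section` and `check_recent_changes` are hypothetical names.

```python
import re

# Preferred entry format from step 4.
ENTRY = re.compile(r"^- \d{4}-\d{2}-\d{2} S\d+-\d+: .+ \(STATUS: [A-Z_]+\)$")
# Assumed markers of misplaced UI/proof detail, taken from step 3's examples.
EVIDENCE_WORDS = re.compile(r"dropdown|column|color[- ]coding|evidence", re.IGNORECASE)


def recent_changes_section(markdown):
    """Return the lines under '## Recent Changes' up to the next '## ' heading."""
    lines, inside = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            inside = line.strip() == "## Recent Changes"
            continue
        if inside:
            lines.append(line)
    return lines


def check_recent_changes(markdown):
    """Return (status, problems), where problems lists (kind, line) pairs."""
    problems = []
    for line in recent_changes_section(markdown):
        if not line.strip():
            continue
        if line.startswith("###"):
            problems.append(("nested heading", line))
        elif EVIDENCE_WORDS.search(line) and not ENTRY.match(line):
            problems.append(("misplaced evidence", line))
    return ("FAIL" if problems else "PASS"), problems
```

A nested `###` heading does not end the section in this sketch, so content "stolen" from a proof section still shows up in the problem list.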
+
 ## Output Format

 ```
@@ -158,6 +201,18 @@ Outcomes:
 ### Check 8: Project Docs Hub Index
 - {N} rows checked / {M} missing paths / {K} visibility warnings / skipped if unused

+### Check 9: Story Contract Coverage
+- {N} done Stories checked / {M} missing contract rows / {K} unproven contracts
+
+### Check 10: Durable Smoke Evidence
+- {N} UI/manual proof rows checked / {M} vague rows
+
+### Check 12: Scope Split Approval
+- {N} split tracker rows checked / {M} missing approval
+
+### Check 13: Recent Changes Section Integrity
+- Recent Changes contains only changelog entries / {M} misplaced evidence lines
+
 <!-- CREW_MODE_START -->
 ### Check 6: Validation Tracker (🟣)
 - {N} FR references checked / {M} drifted