@kodevibe/harness 0.11.2 → 0.11.4

package/README.ko.md CHANGED
@@ -409,7 +409,7 @@ Bootstrap이 `docs/crew/`, `docs/PM/`, `docs/Analyst/`, `docs/ARB/`에서 crew
 
  ## Roadmap
 
- kode:harness is currently at **v0.11.2**: on top of the v0.11 proof-first and uninstall safety foundation, it adds deterministic source-repo guardrails and a manifest-sealed R10 model benchmark workflow.
+ kode:harness is currently at **v0.11.4**: on top of the v0.11 proof-first foundation, it adds R16 recovery hardening (false clean state-check claims, surface-specific Story Contracts, reviewer dependency evidence, dirty wrap-up guard).
 
  | Phase | Version | Status | Focus |
  |------|------|------|------|
@@ -425,7 +425,9 @@ kode:harness는 현재 **v0.11.2** — v0.11의 proof-first와 uninstall safety
  | **Confidence Loop** | v0.10.0 | ✅ Done | Goal Card, Quiet Navigator, Evidence-Gated Progress Board, Proof Ledger, QA/content regression tests |
  | **Proof-First Enforcement** | v0.11.0 | ✅ Done | Mandatory Proof Plan, lead proof blocker, reviewer proof blocker, state-check Proof Ledger coverage |
  | **Uninstall Safety** | v0.11.1 | ✅ Done | Manifest-based uninstall, state preserved by default, shared owner restore, purge cleanup |
- | **Deterministic Release Guard** | v0.11.2 | ✅ Current | R1-R10 guard scripts, package-boundary scan, dependency-map scan, R10 manifest-sealed bench workflow |
+ | **Deterministic Release Guard** | v0.11.2 | ✅ Done | R1-R10 guard scripts, package-boundary scan, dependency-map scan, R10 manifest-sealed bench workflow |
+ | **Experiment Hardening** | v0.11.3 | ✅ Done | R15 Recent Changes integrity, Wave Scope boundary drift checks, enum/filter coverage honesty |
+ | **Recovery Hardening** | v0.11.4 | ✅ Current | R16 false PASS claim guard, surface-specific Story Contract checks, reviewer dependency evidence, dirty wrap-up guard |
  | **Docs Bridge** | v0.11.1 | 🧪 Experimental | Project Docs Hub Index, docs-bridge skill, local docs hub index with visibility boundaries |
  | **Safety & Branding** | v0.9.6 | ✅ Done | init overwrite backups, pm naming cleanup in shipped files, LICENSE branding cleanup |
  | **Validation** | v1.0 | 🔜 Next | Real-world usage validation, user feedback collection |
package/README.md CHANGED
@@ -389,7 +389,7 @@ It adds a Project Docs Hub Index to `project-brief.md` with each local source, r
 
  ## Roadmap
 
- kode:harness is at **v0.11.2** — adds deterministic source-repo guardrails and a manifest-sealed R10 model benchmark workflow on top of the v0.11 proof-first and uninstall safety foundation.
+ kode:harness is at **v0.11.4** — adds R16 recovery hardening for false clean state-check claims, surface-specific Story Contracts, reviewer dependency evidence, and dirty wrap-up truthfulness on top of the v0.11 proof-first and deterministic release guard foundation.
 
  | Phase | Version | Status | Focus |
  |---|---|---|---|
@@ -405,7 +405,9 @@ kode:harness is at **v0.11.2** — adds deterministic source-repo guardrails and
  | **Confidence Loop** | v0.10.0 | ✅ Done | Goal Card, Quiet Navigator, Evidence-Gated Progress Board, Proof Ledger, QA/content regression tests |
  | **Proof-First Enforcement** | v0.11.0 | ✅ Complete | Mandatory Proof Plan, lead proof blockers, reviewer proof blockers, state-check Proof Ledger coverage |
  | **Uninstall Safety** | v0.11.1 | ✅ Complete | Manifest-based uninstall, default state preservation, shared owner restore, purge cleanup |
- | **Deterministic Release Guard** | v0.11.2 | ✅ Current | R1-R10 guard scripts, package-boundary scan, dependency-map scan, R10 manifest-sealed bench workflow |
+ | **Deterministic Release Guard** | v0.11.2 | ✅ Complete | R1-R10 guard scripts, package-boundary scan, dependency-map scan, R10 manifest-sealed bench workflow |
+ | **Experiment Hardening** | v0.11.3 | ✅ Complete | R15 Recent Changes integrity, Wave Scope boundary drift checks, enum/filter coverage honesty, R15 bench scenarios |
+ | **Recovery Hardening** | v0.11.4 | ✅ Current | R16 false PASS claim guard, surface-specific Story Contract checks, reviewer dependency evidence, dirty wrap-up guard |
  | **Docs Bridge** | v0.11.1 | 🧪 Experimental | Project Docs Hub Index, docs-bridge skill, local docs hub index with visibility boundaries |
  | **Safety & Branding** | v0.9.6 | ✅ Done | init overwrite backups, shipped pm naming cleanup, LICENSE branding cleanup |
  | **Validation** | v1.0 | 🔜 Next | Real-world project adoption, user feedback collection |
@@ -104,15 +104,18 @@ After every status check, recommend the next action based on current context:
 
  **Request: "story done" / "S{N}-{M} done"**
  1. Read the Story's Proof Plan and current Evidence-Gated Progress Board row.
- 2. Require proof before marking done:
+ 2. Read `## Story Contracts` rows for the Story. Every row must be proven before Done.
+ 3. Require proof before marking done:
  - Passing proof → set state to `Proven`, update Story status to `✅ done`, append Proof Ledger / Evidence Summary row.
  - Missing proof → keep state `Proof Pending`, output `[BLOCKER: PROOF_MISSING]`, and do not advance to the next Story.
  - Failing proof → keep state `Implementing`, output `[BLOCKER: PROOF_FAILING]`, and fix within current Story.
- 3. Add completion record to "Recent Changes" section only after passing proof.
- 4. **Commit/Push check**: If changes are uncommitted, remind:
+ - Unproven contract → keep state `Proof Pending`, output `[BLOCKER: CONTRACT_NOT_PROVEN]`, and update the contract Evidence target.
+ 4. If proof passes, update Story Contract Proof Status to `✅ pass` / `proven` with evidence. Never leave `needs-user-confirmation` on a done Story.
+ 5. Add completion record to "Recent Changes" section only after passing proof.
+ 6. **Commit/Push check**: If changes are uncommitted, remind:
  - "⚠️ S{N}-{M} done. Have you committed? `git add <files> && git commit -m \"S{N}-{M}: {description}\"`"
  - Team mode: Also remind to push, e.g. "To share with teammates, run `git push origin {branch}`"
- 5. Guide to next Story only after proof passes.
+ 7. Guide to next Story only after proof passes.
 
  **Request: "new story" / "next task"**
  1. Find next `todo` Story in docs/project-state.md
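
The revised "story done" flow in the hunk above amounts to a small decision table. A minimal Python sketch follows. The proof labels `pass`/`missing`/`fail` are hypothetical, `✅ pass` and `proven` are assumed to be the only proven contract statuses, and the check order (proof first, then contracts) is an assumption because the hunk does not state one:

```python
from typing import Optional

# Proof Status values treated as proven; the exact vocabulary is an assumption.
PROVEN = {"✅ pass", "proven"}


def done_gate(proof: str, contract_statuses: list[str]) -> tuple[str, Optional[str]]:
    """Return (state, blocker) for a 'story done' request.

    proof is one of the hypothetical labels 'pass', 'missing', 'fail'.
    contract_statuses holds the Proof Status cell of each Story Contract row.
    """
    if proof == "fail":
        return "Implementing", "[BLOCKER: PROOF_FAILING]"
    if proof == "missing":
        return "Proof Pending", "[BLOCKER: PROOF_MISSING]"
    # Any row that is blank, 'needs-user-confirmation', or otherwise unproven blocks Done.
    if any(status.strip() not in PROVEN for status in contract_statuses):
        return "Proof Pending", "[BLOCKER: CONTRACT_NOT_PROVEN]"
    return "Proven", None
```

A Story with passing proof but one `needs-user-confirmation` row stays in `Proof Pending`, which matches the rule that such a status must never remain on a done Story.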
@@ -147,8 +150,14 @@ When invoked after pm approval, verify that pm wrote state files correctly:
 
  When a Story contains multiple Tasks/Waves (from breakdown):
  - Guide implementation **one Wave at a time** (not one file at a time, not all at once)
+ - For every Wave, print a compact **Wave Scope** before implementation:
+ `Wave {N} Scope: allowed files = <files>; expected proof = <command/manual observation>`
  - After each Wave is implemented, **run tests or smoke proof** to verify the Wave is clean before proceeding
+ - After each Wave, compare changed files against the Wave Scope:
+ - Only allowed files changed → continue.
+ - Extra files changed → output `[SCOPE-DRIFT: WAVE_BOUNDARY]`, record the extra files, and ask whether the Wave should be collapsed/approved before proceeding.
  - Record a mini Proof Ledger row inline: Evidence, Result, Command / Observation
+ - For semantic contracts with "always/every/all/항상", include public surfaces in the Wave proof target (for example: `create/list/get/resolve` return paths). A test that covers only one return path is partial proof.
  - Only after verification passes, prompt: "Wave {N} complete (tests pass). Move on to Wave {N+1}?"
  - If tests fail → output `[BLOCKER: WAVE_PROOF_FAILING]`, fix within the current Wave, and do NOT advance.
  - This prevents context overload from modifying too many modules simultaneously
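
The Wave Scope comparison added in this hunk is a set difference between the files a Wave was allowed to touch and the files that actually changed. A minimal Python sketch, assuming both lists are already available as sets of repository-relative paths (how they are collected, for example from `git diff --name-only`, is left out); the function name and the `"continue"` label are hypothetical:

```python
def check_wave_scope(allowed_files: set[str], changed_files: set[str]) -> tuple[str, list[str]]:
    """Compare changed files against the Wave Scope.

    Returns ('continue', []) when only allowed files changed, otherwise the
    drift flag plus the sorted list of extra files to record.
    """
    extra = sorted(changed_files - allowed_files)
    if extra:
        return "[SCOPE-DRIFT: WAVE_BOUNDARY]", extra
    return "continue", []
```

Files that were allowed but not changed are not treated as drift here; the hunk only flags extra files.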
@@ -42,24 +42,22 @@ One of:
 
  ### Step 0: State File Readiness
 
- Before proceeding, verify that required state files have content (not just TODO placeholders):
+ Before proceeding, verify required state files have content:
  - `docs/project-brief.md` — Must have Vision and Goals filled
  - `docs/features.md` — Must have at least one feature row
  - `docs/dependency-map.md` — Must have at least one module row (for existing projects)
 
- If ALL files are empty/placeholder-only → **Stop and run the `setup` skill first.** Report: "State files are empty. Running setup to onboard this project."
- If `docs/project-brief.md` alone is empty → **Stop.** Without Vision/Goals, pm cannot check Non-Goals or provide direction guard. Run `setup` first.
+ If ALL files are empty/placeholder-only → **Stop and run `setup` first.**
+ If `docs/project-brief.md` alone is empty → **Stop.** Without Vision/Goals, pm cannot provide direction guard.
 
  > Step 0 runs BEFORE Step 1. If Step 0 stops (empty brief), Step 1 never executes. When Step 0 passes, Step 1 reads the now-confirmed non-empty project-brief.md for detailed content.
 
  ### Step 0.5: Load Agent Memory
 
  Read `docs/agent-memory/pm.md` for past learnings:
- - Estimation accuracy from previous sprints (did Wave estimates match reality?)
- - Architecture patterns that worked or failed in this project
- - Repeated planning mistakes to avoid
+ - estimation accuracy, architecture patterns, repeated planning mistakes
 
- Apply these insights when creating the implementation plan. If the memory file is empty or contains only placeholders, skip this step.
+ Apply these when planning. If memory is empty/placeholders only, skip.
 
  ### Step 0.7: Roadmap Draft
 
@@ -70,7 +68,7 @@ Apply these insights when creating the implementation plan. If the memory file i
  <!-- CREW_MODE_END -->
 
  1. Read the Goals in `docs/project-brief.md` and the current module structure in `docs/dependency-map.md`
- 2. Generate a **draft** Feature Roadmap:
+ 2. Generate a Roadmap draft:
  ```
  ## Feature Roadmap
  ### Phase 1 - Core (required to achieve the Goal)
@@ -78,9 +76,8 @@ Apply these insights when creating the implementation plan. If the memory file i
  ### Phase 2 - Enhancement (usability/polish)
  - [ ] F-002: ...
  ```
- 3. Present the draft to the user: **"Please review this Feature Roadmap and tell me what to add, remove, or reorder."**
- 4. Record the final Roadmap, with the user's corrections applied, in `docs/project-brief.md` as a `## Feature Roadmap` section
- 5. Once the Feature Roadmap is finalized, proceed with the "For New Feature" procedure below
+ 3. After corrections, record it in `docs/project-brief.md`
+ 4. Once finalized, proceed to "For New Feature"
 
  ### For New Feature
 
@@ -132,10 +129,11 @@ Apply these insights when creating the implementation plan. If the memory file i
  10. Run **check-impact** skill for each existing module being modified (pm calls both skills independently — breakdown does NOT invoke check-impact internally. Ordering: breakdown first → register modules → check-impact second.)
  11. Check `docs/failure-patterns.md` for relevant past mistakes
  12. Produce a **Goal Card** (6 lines max) and implementation plan.
- 13. Produce a **Proof Plan** per Story: exact test/smoke command or checklist; never TBD. No path → add Story 0: set up test/smoke proof. Any `TBD`/blank → `[ERROR: PROOF_PLAN_UNDEFINED]` and STOP before state writes.
- 14. **Wait for Plan Confirmation** (see Plan Confirmation Gate below) — do NOT write state files yet
- 15. **After user approves** → Update `docs/project-state.md` with the new Story
- 16. **After user approves** → Update `docs/features.md` with the new feature entry
+ 13. Produce a **Proof Plan** per Story: exact test/smoke/manual proof; never TBD. No path → add Story 0: set up test/smoke proof. Blank → `[ERROR: PROOF_PLAN_UNDEFINED]` and STOP.
+ 14. Produce **Story Contracts**: Done-blocking field/API/UI/ARB assertions.
+ 15. **Wait for Plan Confirmation** (see Plan Confirmation Gate below); do NOT write state files yet
+ 16. **After user approves** → Update `docs/project-state.md` with the new Story and Story Contract rows
+ 17. **After user approves** → Update `docs/features.md` with the new feature entry
 
  State writes (Steps 15-16) execute ONLY after user approval. Rejected plans never touch state.
 
@@ -185,8 +183,10 @@ After user approves the plan, perform all writes before 🧭:
  - If no Sprint exists, create Sprint 1 with theme
  - Add Story rows to the Story Status table (status = `⬜ todo`)
  - Each Story: ID (S{N}-{M}), Title, Status, Scope (files/modules), Proof Plan
+ - Add `## Story Contracts` rows. Initial status: `⬜ not proven` or `needs-user-confirmation`.
  <!-- CREW_MODE_START -->
  - If crew-driven, include FR reference
+ - Convert FR/ARB criteria that name fields, API contracts, invariants, proof gates, or UI requirements into Story Contract rows.
  <!-- CREW_MODE_END -->
  - Update Quick Summary section
 
@@ -201,10 +201,11 @@ After user approves the plan, perform all writes before 🧭:
  - ARB Fail Resolution: fill Story column with mapped Story IDs
  <!-- CREW_MODE_END -->
 
- **Completion Check**: Verify:
- - [ ] features.md has new feature row(s)
- - [ ] project-state.md has Story rows with `⬜ todo` status
- - [ ] dependency-map.md has new module rows (if plan introduces new modules)
+ **Completion Check**:
+ - [ ] features.md new row(s)
+ - [ ] project-state.md Story rows with `⬜ todo`
+ - [ ] project-state.md Story Contract rows or "none identified"
+ - [ ] dependency-map.md new module rows (if any)
  <!-- CREW_MODE_START -->
  - [ ] project-brief.md Validation Tracker updated (if 🟣 pipeline)
  <!-- CREW_MODE_END -->
@@ -247,6 +248,11 @@ After the Post-Approval state writes complete, run the `state-check` skill:
  | S{N}-0 | Proof setup, if needed | `npm test` / `npm run smoke` / manual checklist |
  | S{N}-{M} | Tests / smoke / manual | exact command/checklist; never TBD |
 
+ ### Story Contract Plan
+ | Story | Contract | Required Assertion | Initial Proof Status | Evidence Target |
+ |-------|----------|--------------------|----------------------|-----------------|
+ | S{N}-{M} | Field/API/domain/UI contract | assertion to prove before Done | ⬜ not proven | unit/API/smoke/manual |
+
  ### Implementation Plan
  [Output from breakdown skill]
 
@@ -36,17 +36,15 @@ Before reviewing, verify that required state files exist and are not empty:
  - `docs/failure-patterns.md` — Must exist (needed for Step 5 cross-check)
  - `docs/project-state.md` — Must have current Sprint info (needed for scope check)
 
- If state files are empty/placeholder-only → Warn: "State files are not filled. Review will proceed but scope check and failure pattern cross-check will be limited. Consider running `setup` skill."
- If `docs/failure-patterns.md` is empty, FP-cross-check (Step 5) will be skipped. This increases risk of recurring bugs.
+ If state files are empty/placeholder-only → warn that scope and FP checks are limited; suggest `setup`.
+ If `docs/failure-patterns.md` is empty, skip FP cross-check.
 
  ### Step 0.5: Load Agent Memory
 
  Read `docs/agent-memory/reviewer.md` for past learnings:
- - Frequently missed review items in this project
- - Common code patterns that caused issues
- - Review statistics (pass rate, common failure categories)
+ - missed review items, risky code patterns, review statistics
 
- Pay extra attention to items flagged in past reviews. If the memory file is empty or contains only placeholders, skip this step.
+ If memory is empty/placeholders only, skip.
 
  ### Input
 
@@ -57,6 +55,7 @@ Changed file list (user-provided or from `git diff --name-only`)
  **Step 1: Identify Change Scope**
  - Run `git diff --cached --stat` or `git diff --stat` to see changed files
  - Compare against current Story scope in docs/project-state.md
+ - If lead/pm named a Wave Scope, changed files outside that Wave are `[SCOPE-DRIFT: WAVE_BOUNDARY]`. Do not revert automatically; require user approval or a state note explaining the collapsed wave.
 
  **Step 2: Architecture Rule Check**
  - [ ] No imports from infrastructure in domain layer
@@ -64,6 +63,15 @@ Changed file list (user-provided or from `git diff --name-only`)
  - [ ] Constructor parameters match actual source (FP-002)
  - [ ] **Common First (Iron Law #9)**: No crew-specific logic outside crew marker blocks. All features must work without crew artifacts.
 
+ **Step 2.2: Acceptance Contract Gate**
+
+ If `docs/project-state.md` has `## Story Contracts` rows for the Story:
+ 1. Compare each row against code, tests, API/UI, and proof.
+ 2. Output **Story Contract Review**: `Contract | Status | Evidence`.
+ 3. `FAIL`, `NOT_PROVEN`, blank Proof Status, or `needs-user-confirmation` blocks `DONE`.
+ 4. Tests that assert the wrong contract count as `FAIL`, even when they pass.
+ 5. **R16 surface rule**: `always/every/all/항상` contracts must name/prove relevant public paths, e.g. `create/list/get/resolve`. Missing surfaces → `[CONTRACT-GAP: SURFACE_UNSPECIFIED]`.
+
  <!-- CREW_MODE_START -->
  **Step 2.5: CI Standards Compliance (🟣 Pipeline only)**
 
@@ -79,11 +87,11 @@ Changed file list (user-provided or from `git diff --name-only`)
  1. Check if `docs/project-brief.md` has a `## CI Artifact Index` section (or `.harness/ci-index.md` exists). If neither → skip this step.
  2. Read the project's primary language/build tool from `docs/project-brief.md` Key Technical Decisions.
  3. Match the language/build tool to a row in the CI Artifact Index → get the reference URL and Key Constraints.
- 4. Surface the reference in review output under a `### CI Standards Compliance` section:
+ 4. Surface the reference under `### CI Standards Compliance`:
  - Reference URL (the indexed guide)
- - Key Constraints listed in the index (echoed back so the user does not need to re-read the guide)
- - `[CI-STANDARD]` flag if any obvious mismatch is detected against a listed constraint (best-effort LLM judgment based only on the changed files)
- 5. **Warning only — do NOT block commit**. The user (or a human reviewer) decides whether the changes meet company standards. The reviewer agent never asserts compliance; it only points to the authoritative guide.
+ - Key Constraints from the index
+ - `[CI-STANDARD]` if changed files obviously mismatch a listed constraint
+ 5. **Warning only — do NOT block commit**. Point to the guide; do not assert compliance.
 
  If neither `## CI Artifact Index` nor `.harness/ci-index.md` is present → skip this step entirely (also true for 🟢/🔵/🔴 pipelines).
  <!-- CREW_MODE_END -->
@@ -92,10 +100,12 @@ If neither `## CI Artifact Index` nor `.harness/ci-index.md` is present → skip
  - [ ] Interface changes have synchronized mocks (FP-001)
  - [ ] New features have tests
  - [ ] Existing tests pass
+ - [ ] **Enum/filter coverage**: If the Story adds finite values (e.g. `normal/watch/breached`), tests must cover every value at the claimed boundary (domain/API/UI) or mark untested boundaries as partial. Claiming full API coverage while HTTP tests exercise one value → `[ACCEPTANCE-GAP: FILTER_COVERAGE]`.
 
- **Verification is a gate, not a suggestion.** Before continuing to Step 4, the reviewer must include concrete working proof:
- - Run the project's test/verification command when available (for example `npm test`, `pnpm test`, `pytest`, `go test ./...`, or the command recorded in docs/project-brief.md / package scripts).
- - If the change is user-facing and tests do not exercise the behavior, include a minimal smoke proof (command, URL, screenshot/manual action, or observed output).
+ **Verification is a gate.** Before Step 4, include concrete working proof:
+ - Run the project test/verification command (`npm test`, `pytest`, `go test ./...`, or recorded proof command).
+ - If user-facing behavior is untested, include smoke proof (command, URL, screenshot/manual action, or observed output).
+ - UI/manual proof needs artifact/checklist or URL + exact counts/elements.
  - If any existing test fails → output `[BLOCKER: TESTS_FAILING]`. STOP before Step 4.
  - If a Proof Plan command cannot run → output `[BLOCKER: PROOF_COMMAND_INVALID]` with the command. STOP.
  - If test files exist but no test command exists → output `[BLOCKER: NO_TEST_COMMAND]`. STOP.
@@ -111,9 +121,9 @@ Record the result as a **Proof Ledger** entry. Keep it short:
  If state files are in scope, write/request Proof Ledger / Evidence Summary immediately after proof passes.
 
  **Step 4: Security Check (secure skill)**
- - [ ] No credentials, .env, or temp files in staging (FP-004)
- - [ ] No hardcoded API keys or passwords
- - [ ] No injection vulnerabilities (SQL, XSS)
+ - [ ] No credentials, hardcoded secrets, injection risks, or temp files
+ - [ ] Evaluator artifacts require approval (`harness-owner: evaluator` → `harness-edit-approved`)
+ - [ ] **R16 scope/dependency evidence**: For "no external deps/auth/persistence", cite `package.json` and actual `require`/`import` lines. Do not name absent modules; hallucinated deps block `DONE`.
 
  **Step 5: Failure Pattern Cross-Check**
  - Compare current changes against all FP-NNN items in docs/failure-patterns.md
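
The R16 scope/dependency evidence item in Step 4 asks the reviewer to cite `package.json` and the actual `require`/`import` lines instead of naming modules from memory. A minimal Python sketch of gathering that evidence, assuming file contents are passed in as strings; the function name and return shape are hypothetical, and the regular expression is a rough heuristic rather than a JavaScript parser:

```python
import json
import re

# Matches require("x"), import ... from "x", and bare import "x" (heuristic only).
IMPORT_RE = re.compile(r"""(?:require\(\s*|from\s+|import\s+)['"]([^'"]+)['"]""")


def dependency_evidence(package_json_text: str, sources: dict[str, str]) -> dict:
    """Collect citeable evidence: declared dependencies and real import lines.

    sources maps a file path to its text. Each import is returned as
    (path, line_number, module) so the review can quote the exact line.
    """
    pkg = json.loads(package_json_text)
    declared = sorted(pkg.get("dependencies", {}))
    cited = []
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for module in IMPORT_RE.findall(line):
                cited.append((path, lineno, module))
    return {"declared": declared, "imports": cited}
```

A "no external deps" claim is then supported only if `declared` is empty and every cited module is a built-in; any module the review names that does not appear in `imports` would be an unsupported claim.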
@@ -124,28 +134,9 @@ If state files are in scope, write/request Proof Ledger / Evidence Summary immed
 
  If `docs/project-brief.md` contains a `## Crew Artifact Index` table with entries:
 
- 1. **ARB Fail Item Check**:
- - Read Validation Tracker ARB Fail Resolution section
- - If the current Story addresses a Fail item (has `[ARB-FAIL]` prefix):
- - Read the relevant section in the ARB checklist (path from Artifact Index)
- - Verify implementation matches the recommended action
- - If not → flag as `[ARB-COMPLIANCE]` in output
- - **Indirect resolution check**: Even if the Story does NOT have `[ARB-FAIL]` prefix, scan the changed files against ARB Fail items. If a change resolves or partially addresses a Fail item (e.g., fixing a security vulnerability flagged by ARB), flag as `[ARB-INDIRECT]` in output with a recommendation to update the Validation Tracker.
-
- 2. **NFR Spot Check** (lightweight — check only NFRs relevant to changed files):
- - Read PRD's non-functional requirements section (path from Artifact Index)
- - Check ONLY the NFRs related to changed code:
- - Performance-related change? → Check performance NFRs
- - Security-related change? → Check security NFRs
- - API change? → Check scalability/reliability NFRs
- - Flag violations as `[NFR-GAP]` in output
- - Note: This is a best-effort check by the LLM, not a guarantee of 100% detection
-
- 3. **FR Acceptance Criteria Check**:
- - If the current Story has `[FR-NNN]` reference:
- - Read the corresponding FR acceptance criteria from PRD (path from Artifact Index)
- - Verify tests cover the acceptance criteria
- - If missing → flag as `[ACCEPTANCE-GAP]` in output
+ 1. **ARB Fail Item Check**: If the Story has `[ARB-FAIL]`, verify the related ARB checklist recommendation. If changed files indirectly resolve a Fail item, flag `[ARB-INDIRECT]` and recommend Tracker update.
+ 2. **NFR Spot Check**: Read only NFRs relevant to changed files (security/performance/API/reliability). Violations → `[NFR-GAP]`.
+ 3. **FR Acceptance Criteria Check**: If the Story references `FR-NNN`, verify tests/proof cover the PRD acceptance criteria. Missing coverage → `[ACCEPTANCE-GAP]`.
 
  All flags (`[ARB-COMPLIANCE]`, `[ARB-INDIRECT]`, `[NFR-GAP]`, `[ACCEPTANCE-GAP]`) are warnings, not blockers. Include them in the review output under a new "### Crew Artifact Compliance" section.
 
@@ -171,6 +162,7 @@ Verify that state file updates actually happened. **Run the `state-check` skill
 
  After running state-check, also verify:
  - [ ] **docs/project-state.md**: If stories were worked on, is Quick Summary current? Are story statuses updated?
+ - [ ] **docs/project-state.md section integrity**: `## Recent Changes` must not contain proof/evidence headings or UI checklist bullets. If it does, flag `[STATE-AUDIT: SECTION_CORRUPTION]` and fix before DONE.
  - [ ] **docs/features.md**: If new features were added, are they registered? If features were completed, is status updated?
  - [ ] **Cross-check features ↔ stories**: If a feature status is `✅ done` in features.md, verify all related stories in project-state.md are also `done`. If stories are `done` but their feature is still `🔄 in-progress`, flag as `[STATE-AUDIT]`.
  - [ ] **FR Coverage validation**: For the Story being reviewed, check if it implements a feature (FR-NNN reference in Story name, or changes to files listed in features.md Key Files):
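
The section-integrity item added in this hunk says `## Recent Changes` must not contain proof/evidence headings or UI checklist bullets. A minimal Python sketch of that scan, assuming "proof/evidence heading" means a sub-heading of level three or deeper containing either word, and "UI checklist bullet" means a Markdown task-list item; both readings are assumptions, as is the function name:

```python
import re


def recent_changes_corruption(markdown: str) -> list[str]:
    """Return suspicious lines found inside the '## Recent Changes' section."""
    in_section = False
    findings = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A new level-2 heading opens or closes the section of interest.
            in_section = line.strip() == "## Recent Changes"
            continue
        if not in_section:
            continue
        is_proof_heading = re.match(r"#{3,}\s.*\b(proof|evidence)\b", line, re.IGNORECASE)
        is_checklist = re.match(r"\s*- \[[ xX]\]", line)
        if is_proof_heading or is_checklist:
            findings.append(line)
    return findings
```

A non-empty result corresponds to `[STATE-AUDIT: SECTION_CORRUPTION]`, which the hunk requires to be fixed before DONE.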
@@ -181,6 +173,8 @@ After running state-check, also verify:
  - [ ] **docs/failure-patterns.md**: If a bug was fixed that matched a pattern, was frequency incremented?
  - [ ] **docs/project-brief.md**: If a technology or architectural decision was made, is it in Decision Log?
  - [ ] **docs/agent-memory/*.md**: If an agent (reviewer/pm/lead) was used this session, was its memory updated by the wrap-up skill?
+ - [ ] **R16 guard evidence**: Run/request the guard command and include its exact summary. Any guard error forbids `DONE`/`DONE_WITH_CONCERNS`:
+ `HARNESS_GUARD_ROOT="$PWD" node /path/to/k-harness/scripts/harness-guard.js docs/project-state.md`
 
  For each missing update: flag as `[STATE-AUDIT]` in the output and provide the exact update that should be made.
  **Severity**:
@@ -210,6 +204,8 @@ When review result is DONE or DONE_WITH_CONCERNS (no blockers):
 
  If review is BLOCKED → do NOT suggest commit. Fix first.
 
+ Before commit guidance, run `git status --short`; do not imply a commit exists unless `git log --oneline -1` confirms it.
+
  ### Output Format
 
  ```
@@ -223,6 +219,7 @@ If review is BLOCKED → do NOT suggest commit. Fix first.
 
  ### Passed Items
  - Architecture rules: ✅
+ - Story Contract Review: ✅ / ❌ / ⚠️ (include table when contracts exist)
  - Test integrity: ✅ / ⚠️ (detail)
  - Working proof: command/evidence + PASS result
  - Proof Ledger: compact table with evidence, result, and command/observation
@@ -276,6 +273,7 @@ After review completes, always append a 🧭 block based on the outcome:
  | All checks pass, more stories remain | Commit → `lead`: "After committing, what is the next Story?" |
  | All checks pass, all stories done | Commit → `wrap-up`: "After committing, wrap up the session" |
  | STATE-AUDIT flags found | Two valid paths: (1) `wrap-up` now → "Clean up the state files now" or (2) `lead` → continue coding, resolve at session end |
+ | FILTER_COVERAGE or SECTION_CORRUPTION found | [Fix]: "Fix the flagged test/state-structure issues, then call reviewer again" |
  | Security/architecture issues blocking | [Fix]: "Fix the review findings, then call `@reviewer` again in a **new prompt**" |
 
  Example 🧭 block for passing review:
@@ -61,6 +61,17 @@
  | S1-1 | First usable result | Proof Pending | npm test | - | tests not run |
  -->
 
+ ## Story Contracts
+
+ <!-- Semantic acceptance contracts for active Stories.
+ Keep rows short and testable. Every row for a ✅ done Story must be proven.
+ Use this to prevent semantic drift such as implementing `risk` when the contract requires `slaRisk`.
+
+ | Story | Contract | Required Assertion | Proof Status | Evidence |
+ |-------|----------|--------------------|--------------|----------|
+ | S1-1 | Field contract | Public response exposes `expectedField`, not `wrongField` | ⬜ not proven | unit/API/UI assertion |
+ -->
+
  ## Proof Ledger
 
  <!-- One line per completed proof. Do not paste long logs.
@@ -78,6 +78,7 @@ Ensures bottom-up implementation: foundations first, then layers that depend on
  - Each task should be completable in one session
  - Every task must include its test files
  - Implementation and tests belong in the same Wave whenever possible. Do not defer tests to a later Wave unless the proof harness itself is the earlier Wave.
+ - Each Story/Wave must preserve the Story Contracts defined by pm. If a task changes a public field, API response, domain invariant, UI label, persistence behavior, or ARB control, name the affected contract and the assertion that will prove it.
  - New modules MUST be registered in docs/dependency-map.md (Iron Law #6) — the breakdown OUTPUT section lists these registrations, and pm (or the user, if invoked directly) is responsible for executing the actual state file writes
  - If a task exceeds Story scope, stop and report to user
 
@@ -87,7 +88,7 @@ After completing the breakdown, update these files in the same session:
 
  - [ ] **docs/dependency-map.md**: Register all NEW_MODULE entries. Update "Depends On" / "Depended By" for INTERFACE_CHANGE entries.
  - [ ] **docs/features.md**: Add a new row for the feature with Status `🔧 active`, Key Files from Wave tasks, and Test Files.
- - [ ] **docs/project-state.md**: Add Stories to the Story Status table for each Wave.
+ - [ ] **docs/project-state.md**: Add Stories to the Story Status table for each Wave and add/update `## Story Contracts` rows for semantic assertions that must be proven before Done.
 
  ### 🧭 Navigation — After Feature Breakdown
 
@@ -40,6 +40,17 @@ Unlike the `reviewer` agent (which reviews your own changes pre-commit with full
 
  ### Step 4: Code Quality
 
+ Before ordinary code-quality comments, run the **Acceptance Contract Gate**:
+
+ 1. Read `docs/project-state.md` → `## Story Contracts`.
+ 2. If the PR's Story has contract rows, verify each required assertion against the diff, tests, and proof evidence.
+ 3. Produce:
+ | Contract | Status | Evidence |
+ |----------|--------|----------|
+ | Field/API/domain/UI contract | PASS / FAIL / NOT_PROVEN | exact file/test/proof evidence |
+ 4. Any `FAIL`, `NOT_PROVEN`, blank Proof Status, or `needs-user-confirmation` → `REQUEST_CHANGES`.
+ 5. Green tests are insufficient if they assert the wrong semantic contract.
+
  Run through these checks for each changed file:
 
  - [ ] Architecture rules respected (no layer violations — check docs/dependency-map.md)
@@ -50,6 +61,7 @@ Run through these checks for each changed file:
  - [ ] No duplicated logic that should be extracted to a shared module
  - [ ] No circular imports (verify against docs/dependency-map.md)
  - [ ] Naming conventions follow project standards (per project-brief.md → Key Technical Decisions)
+ - [ ] Evaluator/run-card/scorecard artifacts are not rewritten unless explicitly user-approved
 
  ### Step 5: Test Coverage
 
@@ -57,6 +69,7 @@ Run through these checks for each changed file:
  - [ ] Modified code has updated tests
  - [ ] No `.only` or `.skip` left in test files
  - [ ] Interface changes have synchronized mocks (Iron Law #1)
+ - [ ] UI/manual proof is durable: artifact/checklist, or URL plus exact observed UI counts/elements
60
73
 
61
74
  ### Step 6: State File Compliance
62
75
 
@@ -85,6 +98,9 @@ Run through these checks for each changed file:
  - [issue 1]
  - [issue 2]
 
+ ### Story Contract Review
+ - PASS / FAIL / NOT_PROVEN rows, when `## Story Contracts` exists
+
  ### Test Coverage: ✅ Sufficient / ⚠️ Gaps found
  [list of gaps if any]
 
@@ -61,10 +61,8 @@ Use `--overwrite` only to reset corrupted state after backup; then rerun setup t
  - `sprint-manager.md` → should be renamed to `lead.md`
  - `navigator.md` → should be renamed to `lead.md`
  - `builder.md` → should be renamed to `pm.md`
- - For each legacy file found:
- - If the new name does NOT exist offer to rename: `mv {legacy}.md {new}.md` (preserves history)
- - If BOTH exist → ask the user which to keep, or merge contents into the new name and delete the legacy
- - Confirm with the user before renaming. Record the migration in `docs/project-state.md` Recent Changes.
+ - For each legacy file: offer rename if the new name is absent; if both exist, ask whether to keep or merge.
+ - Confirm before renaming and record the migration in Recent Changes.
 
  **Do NOT modify any code files in this phase.**
 
@@ -189,6 +187,7 @@ Using data from Phase 1 + Phase 2, fill the following files:
  - Quick Summary: filled with current project state
  - Sprint 1 stories: based on what setup discovered
  - Module Registry: summary from docs/dependency-map.md
+ - Story Contracts: keep the section. Add compact `⬜ not proven` rows for explicit field/API/invariant/UI/ARB contracts; if unsure, write `needs-user-confirmation`.
 
  **docs/failure-patterns.md**:
  - Keep FP-001 through FP-004 as templates (Frequency: 0)
@@ -131,6 +131,61 @@ Outcomes:
  - Missing local path, blank required column, invalid vocabulary, local machine path leakage, or restricted row with external target → WARN
  - No `Project Docs Hub Index` section or no real rows → skip; this project may not use docs-bridge
 
+ ### Check 9: Story Contract Coverage
+
+ 1. Read `docs/project-state.md` (or `.harness/project-state.md` in Team mode).
+ 2. If any Story is marked `✅ done`, inspect `## Story Contracts`.
+ 3. Outcomes:
+ - Done Story has no `## Story Contracts` section → WARN: `[WARN] Story {S-N-M} done but no Story Contracts section — add semantic contract rows for new work`
+ - Done Story has no rows in `## Story Contracts` → WARN: `[WARN] Story {S-N-M} done but has no Story Contract rows`
+ - Story Contracts table is missing required columns `Story`, `Contract`, or `Proof Status` → FAIL
+ - Done Story has any contract row with blank / `⬜ not proven` / `NOT_PROVEN` / `FAIL` / `needs-user-confirmation` → FAIL
+ - Done Story has all rows `✅ pass`, `proven`, `verified`, or equivalent → PASS
+ - In-progress Story has unproven contracts → PASS; proof pending is normal
+
+ This is the semantic acceptance gate. It exists to catch drift such as "contract requires `slaRisk`, implementation/tests/UI use `risk`" before DONE.
+
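The Check 9 outcome table can be read as a decision function. The sketch below assumes the state file has already been parsed into plain objects. The shapes (`stories`, `contracts`), the function names, and the rule that any non-blank status outside the unproven list counts as "equivalent" to proven are assumptions made for illustration.

```javascript
// Sketch of Check 9 outcomes. All names and data shapes are hypothetical;
// parsing docs/project-state.md into these shapes is left out.
const REQUIRED_COLUMNS = ["Story", "Contract", "Proof Status"];
const UNPROVEN = /not[ _]proven|fail|needs-user-confirmation/i;

function isUnproven(proofStatus) {
  const value = (proofStatus || "").trim();
  return value === "" || UNPROVEN.test(value);
}

// stories:   [{ id, done }]
// contracts: null when the section is absent, otherwise
//            { columns: string[], rows: [{ story, proofStatus }] }
function checkStoryContracts(stories, contracts) {
  const doneStories = stories.filter((story) => story.done);
  if (doneStories.length === 0) return []; // in-progress work: proof pending is normal
  if (contracts === null) {
    return doneStories.map((story) => ({
      story: story.id, level: "WARN", reason: "no Story Contracts section",
    }));
  }
  const missing = REQUIRED_COLUMNS.filter((name) => !contracts.columns.includes(name));
  if (missing.length > 0) {
    return [{ story: null, level: "FAIL", reason: `missing columns: ${missing.join(", ")}` }];
  }
  return doneStories.map((story) => {
    const rows = contracts.rows.filter((row) => row.story === story.id);
    if (rows.length === 0) {
      return { story: story.id, level: "WARN", reason: "no Story Contract rows" };
    }
    return rows.some((row) => isUnproven(row.proofStatus))
      ? { story: story.id, level: "FAIL", reason: "unproven contract row" }
      : { story: story.id, level: "PASS", reason: "all contract rows proven" };
  });
}

const report = checkStoryContracts(
  [{ id: "S-1-1", done: true }, { id: "S-1-2", done: false }],
  {
    columns: ["Story", "Contract", "Proof Status"],
    rows: [
      { story: "S-1-1", proofStatus: "✅ pass" },
      { story: "S-1-1", proofStatus: "⬜ not proven" },
    ],
  }
);
console.log(report[0].level); // FAIL
```

The in-progress Story `S-1-2` produces no entry at all, which matches the rule that pending proof is only a problem once a Story is marked done.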
+ ### Check 10: Durable Smoke Evidence
+
+ For each `✅ done` Story, any passing Proof Ledger row whose Evidence says `UI`, `browser`, `smoke`, or `manual` must include durable proof:
+ - screenshot/trace/video/Playwright/Cypress artifact, or
+ - checklist row, or
+ - URL plus exact observed UI counts/elements.
+
+ Vague rows like `UI manual | pass | checked browser` → FAIL.
+
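One rough way to approximate the durable-proof rule is a keyword heuristic over the raw ledger row. The sketch below is only that: the function name, the file extensions, and the "URL plus at least one number outside the URL" test for "exact observed counts" are assumptions, and a real check would need the actual Proof Ledger column layout. The caller is assumed to pass only passing rows of done Stories.

```javascript
// Heuristic sketch of Check 10; every pattern here is an illustrative assumption.
const SMOKE = /\b(ui|browser|smoke|manual)\b/i;
const ARTIFACT = /\b(screenshot|trace|video|playwright|cypress)\b|\.(png|jpe?g|webm|mp4|zip)\b/i;
const CHECKLIST = /\bchecklist\b|\[[xX]\]/;
const URL_PATTERN = /https?:\/\/\S+/g;

function checkSmokeRow(rowText) {
  if (!SMOKE.test(rowText)) return "N/A"; // not a UI/manual proof row
  if (ARTIFACT.test(rowText) || CHECKLIST.test(rowText)) return "PASS";
  // Strip URLs first so digits inside a URL (ports, ids) do not count as observations.
  const withoutUrls = rowText.replace(URL_PATTERN, "");
  const hasUrl = withoutUrls !== rowText;
  return hasUrl && /\b\d+\b/.test(withoutUrls) ? "PASS" : "FAIL";
}

console.log(checkSmokeRow("UI manual | pass | checked browser")); // FAIL
// The URL below is a made-up local address used only for illustration.
console.log(checkSmokeRow("UI manual | pass | http://localhost:3000/tickets shows 12 rows, 3 red badges")); // PASS
```

A heuristic like this can only flag obviously vague rows; it cannot confirm that the cited artifact exists or that the counts are correct.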
+ ### Check 11: Evaluator Artifact Protection
+
+ If changed files include `experiment/kode-harness-scorecard.md`, `experiment/run-card.md`, `experiment/evaluator-*.md`, or `docs/evaluator-*.md`, WARN unless explicitly requested by the user. If the file has `<!-- harness-owner: evaluator -->`, FAIL unless it also has `<!-- harness-edit-approved: <reason> -->`.
+
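As a sketch, the Check 11 rule reduces to a path match plus two marker lookups. The function name and the `userRequested` flag are assumptions, as is the reading that the owner-marker rule applies to the listed evaluator paths and takes precedence over the WARN.

```javascript
// Sketch of Check 11; names and precedence are assumptions, paths and markers come from the check text.
const PROTECTED_PATHS = [
  /^experiment\/kode-harness-scorecard\.md$/,
  /^experiment\/run-card\.md$/,
  /^experiment\/evaluator-[^/]*\.md$/,
  /^docs\/evaluator-[^/]*\.md$/,
];
const OWNER_MARKER = /<!--\s*harness-owner:\s*evaluator\s*-->/;
// Requires a non-blank reason after the colon.
const APPROVAL_MARKER = /<!--\s*harness-edit-approved:\s*\S[^>]*-->/;

function checkEvaluatorArtifact(path, content, userRequested) {
  const isProtected = PROTECTED_PATHS.some((pattern) => pattern.test(path));
  if (!isProtected) return "PASS";
  // Owner-marked files need an explicit approval marker in the file itself.
  if (OWNER_MARKER.test(content) && !APPROVAL_MARKER.test(content)) return "FAIL";
  return userRequested ? "PASS" : "WARN";
}

console.log(checkEvaluatorArtifact("experiment/run-card.md", "<!-- harness-owner: evaluator -->\n# Run card", true)); // FAIL
console.log(checkEvaluatorArtifact("docs/evaluator-notes.md", "# Notes", false)); // WARN
```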
+ ### Check 12: Scope Split Approval
+
+ If `docs/project-brief.md` maps one FR/KPI/ARB row to multiple Story IDs, require `Scope split approved: <reason>` or `<!-- harness-scope-split-approved: ... -->`. Without approval → FAIL.
+
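A minimal sketch of the split rule, applied to one tracker row at a time. It assumes Story IDs look like `S1-2` or `S-1-2` and that the approval note sits in the same row as the split; neither detail is confirmed by the check text, and `checkScopeSplit` is a hypothetical name.

```javascript
// Sketch of Check 12 for a single FR/KPI/ARB row; ID format and approval placement are assumptions.
const STORY_ID = /\bS-?\d+-\d+\b/g;
const APPROVAL_TEXT = /Scope split approved:\s*\S/;
const APPROVAL_COMMENT = /<!--\s*harness-scope-split-approved:[^>]*-->/;

function checkScopeSplit(rowText) {
  const storyIds = new Set(rowText.match(STORY_ID) || []);
  if (storyIds.size <= 1) return "PASS"; // no split, nothing to approve
  const approved = APPROVAL_TEXT.test(rowText) || APPROVAL_COMMENT.test(rowText);
  return approved ? "PASS" : "FAIL";
}

console.log(checkScopeSplit("| FR-003 | S1-2, S2-1 |")); // FAIL
console.log(checkScopeSplit("| FR-003 | S1-2, S2-1 | Scope split approved: UI ships one Wave later |")); // PASS
```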
+ ### Check 13: Recent Changes Section Integrity
+
+ `docs/project-state.md` must keep session changelog entries separate from proof/evidence sections.
+
+ 1. Read `## Recent Changes`.
+ 2. FAIL if the section contains nested headings such as `### Scenario`, `### Color Coding`, or `### Conclusion`.
+ 3. FAIL if the section contains UI/proof checklist bullets such as dropdown/column/color-coding/evidence details.
+ 4. Valid Recent Changes entries should be compact changelog rows, preferably:
+ `- YYYY-MM-DD S{N}-{M}: {what changed} (STATUS: DONE)`
+
+ This catches wrap-up corruption where `## Recent Changes` is inserted in the middle of `FR-008 Durable UI Evidence` and steals the remaining proof content.
+
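The two FAIL conditions of Check 13 can be sketched as a line scan over the `## Recent Changes` section. The keyword list is a rough stand-in for "UI/proof checklist bullets" and will produce false positives on legitimate entries that mention a column or evidence; the function name and the sample text are made up for illustration.

```javascript
// Sketch of Check 13; the keyword heuristic is an assumption, not the guard's actual rule.
const PROOF_WORDS = /dropdown|column|color[- ]coding|evidence/i;

function recentChangesIssues(markdown) {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => /^## Recent Changes\s*$/.test(line));
  if (start === -1) return []; // section absent: nothing for this check to inspect
  const issues = [];
  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i];
    if (/^#{1,2}\s/.test(line)) break; // next top-level section ends Recent Changes
    if (/^#{3,}\s/.test(line)) {
      issues.push(`nested heading: ${line.trim()}`);
    } else if (/^\s*[-*]\s/.test(line) && PROOF_WORDS.test(line)) {
      issues.push(`proof detail in changelog: ${line.trim()}`);
    }
  }
  return issues;
}

const sample = [
  "## Recent Changes",
  "- 2025-01-15 S1-2: added status filter (STATUS: DONE)",
  "### Color Coding",
  "- red badge shown in the SLA column",
  "## Proof Ledger",
].join("\n");
console.log(recentChangesIssues(sample).length); // 2
```

An empty result means neither FAIL condition was detected. It does not confirm that every entry follows the preferred changelog format.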
+ ### Check 14: Self-Verify Claim Integrity (R16)
+
+ If `docs/project-state.md` or the caller output claims `state-check PASS`, `0 FAIL`, `0 WARN`, or `guard no issues`, the claim must be backed by deterministic evidence:
+
+ 1. Prefer running the installed guard command:
+ `HARNESS_GUARD_ROOT="$PWD" node /path/to/k-harness/scripts/harness-guard.js docs/project-state.md`
+ 2. If CLI execution is unavailable, do not claim `0 FAIL, 0 WARN`; say `manual state-check only`.
+ 3. FAIL if any markdown/state/contract/handoff/env-seal issue is visible while the file claims clean self-verify.
+ 4. FAIL if the guard output is summarized but not shown.
+
+ This catches reports such as "state-check PASS: 0 FAIL, 0 WARN" when a Proof Ledger table is malformed or Environment Seal is missing.
+
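Rules 2 and 4 of Check 14 can be sketched without knowing the guard's output format: a clean claim is only accepted when captured guard output exists and appears verbatim in the report. The function name and parameters are hypothetical. The sketch does not parse the guard output or cover rule 3, since the text here does not specify what that output looks like.

```javascript
// Sketch of Check 14 rules 2 and 4; it never interprets the guard output itself.
const CLEAN_CLAIM = /state-check PASS|\b0 FAIL\b|\b0 WARN\b|guard no issues/i;

// reportText:  the state file or caller output being audited
// guardOutput: raw text captured from the guard CLI, or null if it was not run
function checkSelfVerifyClaim(reportText, guardOutput) {
  if (!CLEAN_CLAIM.test(reportText)) return "PASS"; // no clean claim to back up
  const captured = (guardOutput || "").trim();
  if (captured === "") return "FAIL"; // clean claim without deterministic evidence
  // The output must be shown, not just summarized.
  return reportText.includes(captured) ? "PASS" : "FAIL";
}

console.log(checkSelfVerifyClaim("state-check PASS: 0 FAIL, 0 WARN", null)); // FAIL
console.log(checkSelfVerifyClaim("manual state-check only", null)); // PASS
```

Whether the shown output really reports zero failures still has to be read from the output itself.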
  ## Output Format
 
  ```
@@ -158,6 +213,22 @@ Outcomes:
  ### Check 8: Project Docs Hub Index
  - {N} rows checked / {M} missing paths / {K} visibility warnings / skipped if unused
 
+ ### Check 9: Story Contract Coverage
+ - {N} done Stories checked / {M} missing contract rows / {K} unproven contracts
+
+ ### Check 10: Durable Smoke Evidence
+ - {N} UI/manual proof rows checked / {M} vague rows
+
+ ### Check 12: Scope Split Approval
+ - {N} split tracker rows checked / {M} missing approval
+
+ ### Check 13: Recent Changes Section Integrity
+ - Recent Changes contains only changelog entries / {M} misplaced evidence lines
+
+ ### Check 14: Self-Verify Claim Integrity
+ - Guard output: shown / missing
+ - Clean PASS claim matches deterministic result: yes/no
+
  <!-- CREW_MODE_START -->
  ### Check 6: Validation Tracker (🟣)
  - {N} FR references checked / {M} drifted
@@ -197,7 +268,7 @@ When invoked by another agent (pm/reviewer/wrap-up), control returns to the call
 
  - Do NOT invent data. Read the files and report exactly what you find.
  - Do NOT modify state files in this skill — diagnosis only. Caller decides remediation.
- - Do NOT run shell scripts. All checks are markdown-described file reads + comparisons.
+ - Do NOT invent deterministic results. If a guard CLI is available, run it; otherwise mark clean PASS claims as manual-only, not `0 FAIL, 0 WARN`.
  - If a check cannot be performed (e.g., `docs/` missing entirely), report it as FAIL and stop — further checks are meaningless.
 
  ## Anti-patterns