@kodevibe/harness 0.11.2 → 0.11.3

package/README.md CHANGED
@@ -389,7 +389,7 @@ It adds a Project Docs Hub Index to `project-brief.md` with each local source, r
389
389
 
390
390
  ## Roadmap
391
391
 
392
- kode:harness is at **v0.11.2** — adds deterministic source-repo guardrails and a manifest-sealed R10 model benchmark workflow on top of the v0.11 proof-first and uninstall safety foundation.
392
+ kode:harness is at **v0.11.3** — adds R15 experiment hardening for section integrity, Wave Scope drift, and filter coverage honesty on top of the v0.11 proof-first and deterministic release guard foundation.
393
393
 
394
394
  | Phase | Version | Status | Focus |
395
395
  |---|---|---|---|
@@ -405,7 +405,8 @@ kode:harness is at **v0.11.2** — adds deterministic source-repo guardrails and
405
405
  | **Confidence Loop** | v0.10.0 | ✅ Done | Goal Card, Quiet Navigator, Evidence-Gated Progress Board, Proof Ledger, QA/content regression tests |
406
406
  | **Proof-First Enforcement** | v0.11.0 | ✅ Complete | Mandatory Proof Plan, lead proof blockers, reviewer proof blockers, state-check Proof Ledger coverage |
407
407
  | **Uninstall Safety** | v0.11.1 | ✅ Complete | Manifest-based uninstall, default state preservation, shared owner restore, purge cleanup |
408
- | **Deterministic Release Guard** | v0.11.2 | ✅ Current | R1-R10 guard scripts, package-boundary scan, dependency-map scan, R10 manifest-sealed bench workflow |
408
+ | **Deterministic Release Guard** | v0.11.2 | ✅ Complete | R1-R10 guard scripts, package-boundary scan, dependency-map scan, R10 manifest-sealed bench workflow |
409
+ | **Experiment Hardening** | v0.11.3 | ✅ Current | R15 Recent Changes integrity, Wave Scope boundary drift checks, enum/filter coverage honesty, R15 bench scenarios |
409
410
  | **Docs Bridge** | v0.11.1 | 🧪 Experimental | Project Docs Hub Index, docs-bridge skill, local docs hub index with visibility boundaries |
410
411
  | **Safety & Branding** | v0.9.6 | ✅ Done | init overwrite backups, shipped pm naming cleanup, LICENSE branding cleanup |
411
412
  | **Validation** | v1.0 | 🔜 Next | Real-world project adoption, user feedback collection |
@@ -104,15 +104,18 @@ After every status check, recommend the next action based on current context:
104
104
 
105
105
  **Request: "story done" / "S{N}-{M} done"**
106
106
  1. Read the Story's Proof Plan and current Evidence-Gated Progress Board row.
107
- 2. Require proof before marking done:
107
+ 2. Read `## Story Contracts` rows for the Story. Every row must be proven before Done.
108
+ 3. Require proof before marking done:
108
109
  - Passing proof → set state to `Proven`, update Story status to `✅ done`, append Proof Ledger / Evidence Summary row.
109
110
  - Missing proof → keep state `Proof Pending`, output `[BLOCKER: PROOF_MISSING]`, and do not advance to the next Story.
110
111
  - Failing proof → keep state `Implementing`, output `[BLOCKER: PROOF_FAILING]`, and fix within current Story.
111
- 3. Add completion record to "Recent Changes" section only after passing proof.
112
- 4. **Commit/Push check**: If changes are uncommitted, remind:
112
+ - Unproven contract → keep state `Proof Pending`, output `[BLOCKER: CONTRACT_NOT_PROVEN]`, and update the contract Evidence target.
113
+ 4. If proof passes, update Story Contract Proof Status to `✅ pass` / `proven` with evidence. Never leave `needs-user-confirmation` on a done Story.
114
+ 5. Add completion record to "Recent Changes" section only after passing proof.
115
+ 6. **Commit/Push check**: If changes are uncommitted, remind:
113
116
  - "⚠️ S{N}-{M} done. Have you committed? `git add <files> && git commit -m \"S{N}-{M}: {description}\"`"
114
117
  - Team mode: Also remind to push: "To share with teammates, run `git push origin {branch}`"
115
- 5. Guide to next Story only after proof passes.
118
+ 7. Guide to next Story only after proof passes.
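The "story done" gate above can be read as a small decision function. The following is a minimal sketch only, not code from the package: the function name `story_done_decision`, the input shapes, and the order in which proof and contract checks are evaluated are all assumptions made for illustration.

```python
# Hypothetical sketch of the "story done" gate; not part of @kodevibe/harness.
# Assumption: proof_result is "pass", "fail", or None (no proof supplied).
# Assumption: a contract counts as proven only with these two status strings.
PROVEN_STATUSES = {"✅ pass", "proven"}


def story_done_decision(proof_result, contract_statuses):
    """Return (state, blocker) for a Story the user reports as done."""
    if proof_result is None:
        return ("Proof Pending", "[BLOCKER: PROOF_MISSING]")
    if proof_result == "fail":
        return ("Implementing", "[BLOCKER: PROOF_FAILING]")
    # Proof passed: every Story Contract row must also be proven.
    if any(s.strip() not in PROVEN_STATUSES for s in contract_statuses):
        return ("Proof Pending", "[BLOCKER: CONTRACT_NOT_PROVEN]")
    return ("Proven", None)
```

Under these assumptions a Story with passing proof but one `needs-user-confirmation` contract row stays in `Proof Pending`, which matches the rule that such a status must never remain on a done Story.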
116
119
 
117
120
  **Request: "new story" / "next task"**
118
121
  1. Find next `todo` Story in docs/project-state.md
@@ -147,7 +150,12 @@ When invoked after pm approval, verify that pm wrote state files correctly:
147
150
 
148
151
  When a Story contains multiple Tasks/Waves (from breakdown):
149
152
  - Guide implementation **one Wave at a time** (not one file at a time, not all at once)
153
+ - For every Wave, print a compact **Wave Scope** before implementation:
154
+ `Wave {N} Scope: allowed files = <files>; expected proof = <command/manual observation>`
150
155
  - After each Wave is implemented, **run tests or smoke proof** to verify the Wave is clean before proceeding
156
+ - After each Wave, compare changed files against the Wave Scope:
157
+ - Only allowed files changed → continue.
158
+ - Extra files changed → output `[SCOPE-DRIFT: WAVE_BOUNDARY]`, record the extra files, and ask whether the Wave should be collapsed/approved before proceeding.
151
159
  - Record a mini Proof Ledger row inline: Evidence, Result, Command / Observation
152
160
  - Only after verification passes, prompt: "Wave {N} done (tests pass). Move on to Wave {N+1}?"
153
161
  - If tests fail → output `[BLOCKER: WAVE_PROOF_FAILING]`, fix within the current Wave, and do NOT advance.
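The Wave Scope comparison described in this hunk amounts to a set check between the allowed files and the changed files. The sketch below is illustrative only: `check_wave_scope` and `report` are hypothetical names, and treating the allowed list as shell-style glob patterns is an assumption, since the source only says `allowed files = <files>`.

```python
# Hypothetical sketch of the Wave Scope drift check; not part of the package.
from fnmatch import fnmatchcase


def check_wave_scope(allowed, changed):
    """Return (ok, extra_files). `allowed` entries are treated as glob patterns (assumption)."""
    extra = sorted(
        path for path in changed
        if not any(fnmatchcase(path, pattern) for pattern in allowed)
    )
    return (not extra, extra)


def report(wave, allowed, changed):
    """Format the outcome with the flag text quoted in the diff."""
    ok, extra = check_wave_scope(allowed, changed)
    if ok:
        return f"Wave {wave}: scope clean"
    return f"[SCOPE-DRIFT: WAVE_BOUNDARY] Wave {wave} extra files: {', '.join(extra)}"
```

The changed-file list would typically come from `git diff --name-only`. The sketch only reports drift; per the text, the decision to collapse or approve the Wave stays with the user.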
@@ -70,7 +70,7 @@ Apply these insights when creating the implementation plan. If the memory file i
70
70
  <!-- CREW_MODE_END -->
71
71
 
72
72
  1. Read the Goals in `docs/project-brief.md` and the current module structure in `docs/dependency-map.md`
73
- 2. Generate a **draft** Feature Roadmap:
73
+ 2. Generate a draft Roadmap:
74
74
  ```
75
75
  ## Feature Roadmap
76
76
  ### Phase 1 — Core (required to achieve the Goal)
@@ -78,9 +78,8 @@ Apply these insights when creating the implementation plan. If the memory file i
78
78
  ### Phase 2 — Enhancement (usability/polish)
79
79
  - [ ] F-002: ...
80
80
  ```
81
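- 3. Present the draft to the user: **"Please review this Feature Roadmap and tell me what to add, remove, or reorder."**
82
- 4. Record the final Roadmap, with the user's corrections applied, in `docs/project-brief.md` as a `## Feature Roadmap` section
83
- 5. Once the Feature Roadmap is finalized, proceed with the "For New Feature" procedure below
81
+ 3. After corrections, record it in `docs/project-brief.md`
82
+ 4. Once finalized, proceed to "For New Feature"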
84
83
 
85
84
  ### For New Feature
86
85
 
@@ -132,10 +131,11 @@ Apply these insights when creating the implementation plan. If the memory file i
132
131
  10. Run **check-impact** skill for each existing module being modified (pm calls both skills independently — breakdown does NOT invoke check-impact internally. Ordering: breakdown first → register modules → check-impact second.)
133
132
  11. Check `docs/failure-patterns.md` for relevant past mistakes
134
133
  12. Produce a **Goal Card** (6 lines max) and implementation plan.
135
- 13. Produce a **Proof Plan** per Story: exact test/smoke command or checklist; never TBD. No path → add Story 0: set up test/smoke proof. Any `TBD`/blank → `[ERROR: PROOF_PLAN_UNDEFINED]` and STOP before state writes.
136
- 14. **Wait for Plan Confirmation** (see Plan Confirmation Gate below) — do NOT write state files yet
137
- 15. **After user approves** → Update `docs/project-state.md` with the new Story
138
- 16. **After user approves** → Update `docs/features.md` with the new feature entry
134
+ 13. Produce a **Proof Plan** per Story: exact test/smoke/manual proof; never TBD. No path → add Story 0: set up test/smoke proof. Blank → `[ERROR: PROOF_PLAN_UNDEFINED]` and STOP.
135
+ 14. Produce **Story Contracts**: Done-blocking field/API/UI/ARB assertions.
136
+ 15. **Wait for Plan Confirmation** (see Plan Confirmation Gate below); do NOT write state files yet
137
+ 16. **After user approves** → Update `docs/project-state.md` with the new Story and Story Contract rows
138
+ 17. **After user approves** → Update `docs/features.md` with the new feature entry
139
139
 
140
140
  State writes (Steps 15-16) execute ONLY after user approval. Rejected plans never touch state.
141
141
 
@@ -185,8 +185,10 @@ After user approves the plan, perform all writes before 🧭:
185
185
  - If no Sprint exists, create Sprint 1 with theme
186
186
  - Add Story rows to the Story Status table (status = `⬜ todo`)
187
187
  - Each Story: ID (S{N}-{M}), Title, Status, Scope (files/modules), Proof Plan
188
+ - Add `## Story Contracts` rows. Initial status: `⬜ not proven` or `needs-user-confirmation`.
188
189
  <!-- CREW_MODE_START -->
189
190
  - If crew-driven, include FR reference
191
+ - Convert FR/ARB criteria naming fields, API contracts, invariants, proof gates, or UI requirements into Story Contract rows.
190
192
  <!-- CREW_MODE_END -->
191
193
  - Update Quick Summary section
192
194
 
@@ -201,10 +203,11 @@ After user approves the plan, perform all writes before 🧭:
201
203
  - ARB Fail Resolution: fill Story column with mapped Story IDs
202
204
  <!-- CREW_MODE_END -->
203
205
 
204
- **Completion Check**: Verify:
205
- - [ ] features.md has new feature row(s)
206
- - [ ] project-state.md has Story rows with `⬜ todo` status
207
- - [ ] dependency-map.md has new module rows (if plan introduces new modules)
206
+ **Completion Check**:
207
+ - [ ] features.md new row(s)
208
+ - [ ] project-state.md Story rows with `⬜ todo`
209
+ - [ ] project-state.md Story Contract rows or "none identified"
210
+ - [ ] dependency-map.md new module rows (if any)
208
211
  <!-- CREW_MODE_START -->
209
212
  - [ ] project-brief.md Validation Tracker updated (if 🟣 pipeline)
210
213
  <!-- CREW_MODE_END -->
@@ -247,6 +250,11 @@ After the Post-Approval state writes complete, run the `state-check` skill:
247
250
  | S{N}-0 | Proof setup, if needed | `npm test` / `npm run smoke` / manual checklist |
248
251
  | S{N}-{M} | Tests / smoke / manual | exact command/checklist; never TBD |
249
252
 
253
+ ### Story Contract Plan
254
+ | Story | Contract | Required Assertion | Initial Proof Status | Evidence Target |
255
+ |-------|----------|--------------------|----------------------|-----------------|
256
+ | S{N}-{M} | Field/API/domain/UI contract | assertion to prove before Done | ⬜ not proven | unit/API/smoke/manual |
257
+
250
258
  ### Implementation Plan
251
259
  [Output from breakdown skill]
252
260
 
@@ -57,6 +57,7 @@ Changed file list (user-provided or from `git diff --name-only`)
57
57
  **Step 1: Identify Change Scope**
58
58
  - Run `git diff --cached --stat` or `git diff --stat` to see changed files
59
59
  - Compare against current Story scope in docs/project-state.md
60
+ - If lead/pm named a Wave Scope, changed files outside that Wave are `[SCOPE-DRIFT: WAVE_BOUNDARY]`. Do not revert automatically; require user approval or a state note explaining the collapsed wave.
60
61
 
61
62
  **Step 2: Architecture Rule Check**
62
63
  - [ ] No imports from infrastructure in domain layer
@@ -64,6 +65,15 @@ Changed file list (user-provided or from `git diff --name-only`)
64
65
  - [ ] Constructor parameters match actual source (FP-002)
65
66
  - [ ] **Common First (Iron Law #9)**: No crew-specific logic outside crew marker blocks. All features must work without crew artifacts.
66
67
 
68
+ **Step 2.2: Acceptance Contract Gate**
69
+
70
+ If `docs/project-state.md` has `## Story Contracts` rows for the Story:
71
+ 1. Review each row before code-quality review.
72
+ 2. Compare assertion vs code, tests, API/UI output, and proof.
73
+ 3. Output **Story Contract Review**: `Contract | Status | Evidence`.
74
+ 4. `FAIL`, `NOT_PROVEN`, blank Proof Status, or `needs-user-confirmation` blocks `DONE` and commit guidance.
75
+ 5. Wrong-contract tests fail.
76
+
67
77
  <!-- CREW_MODE_START -->
68
78
  **Step 2.5: CI Standards Compliance (🟣 Pipeline only)**
69
79
 
@@ -79,11 +89,11 @@ Changed file list (user-provided or from `git diff --name-only`)
79
89
  1. Check if `docs/project-brief.md` has a `## CI Artifact Index` section (or `.harness/ci-index.md` exists). If neither → skip this step.
80
90
  2. Read the project's primary language/build tool from `docs/project-brief.md` Key Technical Decisions.
81
91
  3. Match the language/build tool to a row in the CI Artifact Index → get the reference URL and Key Constraints.
82
- 4. Surface the reference in review output under a `### CI Standards Compliance` section:
92
+ 4. Surface the reference under `### CI Standards Compliance`:
83
93
  - Reference URL (the indexed guide)
84
- - Key Constraints listed in the index (echoed back so the user does not need to re-read the guide)
85
- - `[CI-STANDARD]` flag if any obvious mismatch is detected against a listed constraint (best-effort LLM judgment based only on the changed files)
86
- 5. **Warning only — do NOT block commit**. The user (or a human reviewer) decides whether the changes meet company standards. The reviewer agent never asserts compliance; it only points to the authoritative guide.
94
+ - Key Constraints from the index
95
+ - `[CI-STANDARD]` if changed files obviously mismatch a listed constraint
96
+ 5. **Warning only — do NOT block commit**. Point to the guide; do not assert compliance.
87
97
 
88
98
  If neither `## CI Artifact Index` nor `.harness/ci-index.md` is present → skip this step entirely (also true for 🟢/🔵/🔴 pipelines).
89
99
  <!-- CREW_MODE_END -->
@@ -92,10 +102,12 @@ If neither `## CI Artifact Index` nor `.harness/ci-index.md` is present → skip
92
102
  - [ ] Interface changes have synchronized mocks (FP-001)
93
103
  - [ ] New features have tests
94
104
  - [ ] Existing tests pass
105
+ - [ ] **Enum/filter coverage**: If the Story adds finite values (e.g. `normal/watch/breached`), tests must cover every value at the claimed boundary (domain/API/UI) or mark untested boundaries as partial. Claiming full API coverage while HTTP tests exercise one value → `[ACCEPTANCE-GAP: FILTER_COVERAGE]`.
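The enum/filter coverage rule can be illustrated as a per-boundary set difference. This is a sketch under stated assumptions, not the reviewer's actual logic: the function name `filter_coverage` and the dictionary layout are hypothetical, and how tested values are discovered from real test files is left out entirely.

```python
# Hypothetical sketch of the enum/filter coverage check; not part of the package.
def filter_coverage(values, tested_by_boundary, claimed_boundaries):
    """Return {boundary: missing_values} for each claimed boundary with gaps.

    values: the finite set the Story adds, e.g. normal/watch/breached.
    tested_by_boundary: values each boundary's tests actually exercise (assumed input).
    claimed_boundaries: boundaries for which full coverage is claimed.
    """
    gaps = {}
    for boundary in claimed_boundaries:
        tested = set(tested_by_boundary.get(boundary, ()))
        missing = sorted(set(values) - tested)
        if missing:
            gaps[boundary] = missing
    return gaps


gaps = filter_coverage(
    ["normal", "watch", "breached"],
    {"domain": ["normal", "watch", "breached"], "api": ["breached"]},
    ["domain", "api"],
)
if gaps:
    print("[ACCEPTANCE-GAP: FILTER_COVERAGE]", gaps)
```

A non-empty result corresponds to the case named in the checklist: full API coverage is claimed while the HTTP tests exercise only one value. Marking the `api` boundary as partial would mean removing it from `claimed_boundaries`.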
95
106
 
96
- **Verification is a gate, not a suggestion.** Before continuing to Step 4, the reviewer must include concrete working proof:
97
- - Run the project's test/verification command when available (for example `npm test`, `pnpm test`, `pytest`, `go test ./...`, or the command recorded in docs/project-brief.md / package scripts).
98
- - If the change is user-facing and tests do not exercise the behavior, include a minimal smoke proof (command, URL, screenshot/manual action, or observed output).
107
+ **Verification is a gate.** Before Step 4, include concrete working proof:
108
+ - Run the project test/verification command (`npm test`, `pytest`, `go test ./...`, or recorded proof command).
109
+ - If user-facing behavior is untested, include smoke proof (command, URL, screenshot/manual action, or observed output).
110
+ - UI/manual proof needs artifact/checklist or URL + exact counts/elements.
99
111
  - If any existing test fails → output `[BLOCKER: TESTS_FAILING]`. STOP before Step 4.
100
112
  - If a Proof Plan command cannot run → output `[BLOCKER: PROOF_COMMAND_INVALID]` with the command. STOP.
101
113
  - If test files exist but no test command exists → output `[BLOCKER: NO_TEST_COMMAND]`. STOP.
@@ -114,6 +126,7 @@ If state files are in scope, write/request Proof Ledger / Evidence Summary immed
114
126
  - [ ] No credentials, .env, or temp files in staging (FP-004)
115
127
  - [ ] No hardcoded API keys or passwords
116
128
  - [ ] No injection vulnerabilities (SQL, XSS)
129
+ - [ ] Evaluator artifacts require approval (`harness-owner: evaluator` → `harness-edit-approved`)
117
130
 
118
131
  **Step 5: Failure Pattern Cross-Check**
119
132
  - Compare current changes against all FP-NNN items in docs/failure-patterns.md
@@ -124,28 +137,9 @@ If state files are in scope, write/request Proof Ledger / Evidence Summary immed
124
137
 
125
138
  If `docs/project-brief.md` contains a `## Crew Artifact Index` table with entries:
126
139
 
127
- 1. **ARB Fail Item Check**:
128
- - Read Validation Tracker ARB Fail Resolution section
129
- - If the current Story addresses a Fail item (has `[ARB-FAIL]` prefix):
130
- - Read the relevant section in the ARB checklist (path from Artifact Index)
131
- - Verify implementation matches the recommended action
132
- - If not → flag as `[ARB-COMPLIANCE]` in output
133
- - **Indirect resolution check**: Even if the Story does NOT have `[ARB-FAIL]` prefix, scan the changed files against ARB Fail items. If a change resolves or partially addresses a Fail item (e.g., fixing a security vulnerability flagged by ARB), flag as `[ARB-INDIRECT]` in output with a recommendation to update the Validation Tracker.
134
-
135
- 2. **NFR Spot Check** (lightweight — check only NFRs relevant to changed files):
136
- - Read PRD's non-functional requirements section (path from Artifact Index)
137
- - Check ONLY the NFRs related to changed code:
138
- - Performance-related change? → Check performance NFRs
139
- - Security-related change? → Check security NFRs
140
- - API change? → Check scalability/reliability NFRs
141
- - Flag violations as `[NFR-GAP]` in output
142
- - Note: This is a best-effort check by the LLM, not a guarantee of 100% detection
143
-
144
- 3. **FR Acceptance Criteria Check**:
145
- - If the current Story has `[FR-NNN]` reference:
146
- - Read the corresponding FR acceptance criteria from PRD (path from Artifact Index)
147
- - Verify tests cover the acceptance criteria
148
- - If missing → flag as `[ACCEPTANCE-GAP]` in output
140
+ 1. **ARB Fail Item Check**: If the Story has `[ARB-FAIL]`, verify the related ARB checklist recommendation. If changed files indirectly resolve a Fail item, flag `[ARB-INDIRECT]` and recommend Tracker update.
141
+ 2. **NFR Spot Check**: Read only NFRs relevant to changed files (security/performance/API/reliability). Violations → `[NFR-GAP]`.
142
+ 3. **FR Acceptance Criteria Check**: If the Story references `FR-NNN`, verify tests/proof cover the PRD acceptance criteria. Missing coverage → `[ACCEPTANCE-GAP]`.
149
143
 
150
144
  All flags (`[ARB-COMPLIANCE]`, `[ARB-INDIRECT]`, `[NFR-GAP]`, `[ACCEPTANCE-GAP]`) are warnings, not blockers. Include them in the review output under a new "### Crew Artifact Compliance" section.
151
145
 
@@ -171,6 +165,7 @@ Verify that state file updates actually happened. **Run the `state-check` skill
171
165
 
172
166
  After running state-check, also verify:
173
167
  - [ ] **docs/project-state.md**: If stories were worked on, is Quick Summary current? Are story statuses updated?
168
+ - [ ] **docs/project-state.md section integrity**: `## Recent Changes` must not contain proof/evidence headings or UI checklist bullets. If it does, flag `[STATE-AUDIT: SECTION_CORRUPTION]` and fix before DONE.
174
169
  - [ ] **docs/features.md**: If new features were added, are they registered? If features were completed, is status updated?
175
170
  - [ ] **Cross-check features ↔ stories**: If a feature status is `✅ done` in features.md, verify all related stories in project-state.md are also `done`. If stories are `done` but their feature is still `🔄 in-progress`, flag as `[STATE-AUDIT]`.
176
171
  - [ ] **FR Coverage validation**: For the Story being reviewed, check if it implements a feature (FR-NNN reference in Story name, or changes to files listed in features.md Key Files):
@@ -223,6 +218,7 @@ If review is BLOCKED → do NOT suggest commit. Fix first.
223
218
 
224
219
  ### Passed Items
225
220
  - Architecture rules: ✅
221
+ - Story Contract Review: ✅ / ❌ / ⚠️ (include table when contracts exist)
226
222
  - Test integrity: ✅ / ⚠️ (detail)
227
223
  - Working proof: command/evidence + PASS result
228
224
  - Proof Ledger: compact table with evidence, result, and command/observation
@@ -276,6 +272,7 @@ After review completes, always append a 🧭 block based on the outcome:
276
272
  | All checks pass, more stories remain | Commit → `lead`: "After committing, what is the next Story?" |
277
273
  | All checks pass, all stories done | Commit → `wrap-up`: "After committing, wrap up the session" |
278
274
  | STATE-AUDIT flags found | Two valid paths: (1) `wrap-up` now → "Clean up the state files now" or (2) `lead` → continue coding, resolve at session end |
275
+ | FILTER_COVERAGE or SECTION_CORRUPTION found | [Fix]: "Fix the flagged test/state-structure issues. When done, invoke reviewer again" |
279
276
  | Security/architecture issues blocking | [Fix]: "Fix the review findings. When done, invoke `@reviewer` again in a **new prompt**" |
280
277
 
281
278
  Example 🧭 block for passing review:
@@ -61,6 +61,17 @@
61
61
  | S1-1 | First usable result | Proof Pending | npm test | - | tests not run |
62
62
  -->
63
63
 
64
+ ## Story Contracts
65
+
66
+ <!-- Semantic acceptance contracts for active Stories.
67
+ Keep rows short and testable. Every row for a ✅ done Story must be proven.
68
+ Use this to prevent semantic drift such as implementing `risk` when the contract requires `slaRisk`.
69
+
70
+ | Story | Contract | Required Assertion | Proof Status | Evidence |
71
+ |-------|----------|--------------------|--------------|----------|
72
+ | S1-1 | Field contract | Public response exposes `expectedField`, not `wrongField` | ⬜ not proven | unit/API/UI assertion |
73
+ -->
74
+
64
75
  ## Proof Ledger
65
76
 
66
77
  <!-- One line per completed proof. Do not paste long logs.
@@ -78,6 +78,7 @@ Ensures bottom-up implementation: foundations first, then layers that depend on
78
78
  - Each task should be completable in one session
79
79
  - Every task must include its test files
80
80
  - Implementation and tests belong in the same Wave whenever possible. Do not defer tests to a later Wave unless the proof harness itself is the earlier Wave.
81
+ - Each Story/Wave must preserve the Story Contracts defined by pm. If a task changes a public field, API response, domain invariant, UI label, persistence behavior, or ARB control, name the affected contract and the assertion that will prove it.
81
82
  - New modules MUST be registered in docs/dependency-map.md (Iron Law #6) — the breakdown OUTPUT section lists these registrations, and pm (or the user, if invoked directly) is responsible for executing the actual state file writes
82
83
  - If a task exceeds Story scope, stop and report to user
83
84
 
@@ -87,7 +88,7 @@ After completing the breakdown, update these files in the same session:
87
88
 
88
89
  - [ ] **docs/dependency-map.md**: Register all NEW_MODULE entries. Update "Depends On" / "Depended By" for INTERFACE_CHANGE entries.
89
90
  - [ ] **docs/features.md**: Add a new row for the feature with Status `🔧 active`, Key Files from Wave tasks, and Test Files.
90
- - [ ] **docs/project-state.md**: Add Stories to the Story Status table for each Wave.
91
+ - [ ] **docs/project-state.md**: Add Stories to the Story Status table for each Wave and add/update `## Story Contracts` rows for semantic assertions that must be proven before Done.
91
92
 
92
93
  ### 🧭 Navigation — After Feature Breakdown
93
94
 
@@ -40,6 +40,17 @@ Unlike the `reviewer` agent (which reviews your own changes pre-commit with full
40
40
 
41
41
  ### Step 4: Code Quality
42
42
 
43
+ Before ordinary code-quality comments, run the **Acceptance Contract Gate**:
44
+
45
+ 1. Read `docs/project-state.md` → `## Story Contracts`.
46
+ 2. If the PR's Story has contract rows, verify each required assertion against the diff, tests, and proof evidence.
47
+ 3. Produce:
48
+ | Contract | Status | Evidence |
49
+ |----------|--------|----------|
50
+ | Field/API/domain/UI contract | PASS / FAIL / NOT_PROVEN | exact file/test/proof evidence |
51
+ 4. Any `FAIL`, `NOT_PROVEN`, blank Proof Status, or `needs-user-confirmation` → `REQUEST_CHANGES`.
52
+ 5. Green tests are insufficient if they assert the wrong semantic contract.
53
+
43
54
  Run through these checks for each changed file:
44
55
 
45
56
  - [ ] Architecture rules respected (no layer violations — check docs/dependency-map.md)
@@ -50,6 +61,7 @@ Run through these checks for each changed file:
50
61
  - [ ] No duplicated logic that should be extracted to a shared module
51
62
  - [ ] No circular imports (verify against docs/dependency-map.md)
52
63
  - [ ] Naming conventions follow project standards (per project-brief.md → Key Technical Decisions)
64
+ - [ ] Evaluator/run-card/scorecard artifacts are not rewritten unless explicitly user-approved
53
65
 
54
66
  ### Step 5: Test Coverage
55
67
 
@@ -57,6 +69,7 @@ Run through these checks for each changed file:
57
69
  - [ ] Modified code has updated tests
58
70
  - [ ] No `.only` or `.skip` left in test files
59
71
  - [ ] Interface changes have synchronized mocks (Iron Law #1)
72
+ - [ ] UI/manual proof is durable: artifact/checklist, or URL plus exact observed UI counts/elements
60
73
 
61
74
  ### Step 6: State File Compliance
62
75
 
@@ -85,6 +98,9 @@ Run through these checks for each changed file:
85
98
  - [issue 1]
86
99
  - [issue 2]
87
100
 
101
+ ### Story Contract Review
102
+ - PASS / FAIL / NOT_PROVEN rows, when `## Story Contracts` exists
103
+
88
104
  ### Test Coverage: ✅ Sufficient / ⚠️ Gaps found
89
105
  [list of gaps if any]
90
106
 
@@ -189,6 +189,7 @@ Using data from Phase 1 + Phase 2, fill the following files:
189
189
  - Quick Summary: filled with current project state
190
190
  - Sprint 1 stories: based on what setup discovered
191
191
  - Module Registry: summary from docs/dependency-map.md
192
+ - Story Contracts: keep the section. Add compact `⬜ not proven` rows for explicit field/API/invariant/UI/ARB contracts; if unsure, write `needs-user-confirmation`.
192
193
 
193
194
  **docs/failure-patterns.md**:
194
195
  - Keep FP-001 through FP-004 as templates (Frequency: 0)
@@ -131,6 +131,49 @@ Outcomes:
131
131
  - Missing local path, blank required column, invalid vocabulary, local machine path leakage, or restricted row with external target → WARN
132
132
  - No `Project Docs Hub Index` section or no real rows → skip; this project may not use docs-bridge
133
133
 
134
+ ### Check 9: Story Contract Coverage
135
+
136
+ 1. Read `docs/project-state.md` (or `.harness/project-state.md` in Team mode).
137
+ 2. If any Story is marked `✅ done`, inspect `## Story Contracts`.
138
+ 3. Outcomes:
139
+ - Done Story has no `## Story Contracts` section → WARN: `[WARN] Story {S-N-M} done but no Story Contracts section — add semantic contract rows for new work`
140
+ - Done Story has no rows in `## Story Contracts` → WARN: `[WARN] Story {S-N-M} done but has no Story Contract rows`
141
+ - Story Contracts table is missing required columns `Story`, `Contract`, or `Proof Status` → FAIL
142
+ - Done Story has any contract row with blank / `⬜ not proven` / `NOT_PROVEN` / `FAIL` / `needs-user-confirmation` → FAIL
143
+ - Done Story has all rows `✅ pass`, `proven`, `verified`, or equivalent → PASS
144
+ - In-progress Story has unproven contracts → PASS; proof pending is normal
145
+
146
+ This is the semantic acceptance gate. It exists to catch drift such as "contract requires `slaRisk`, implementation/tests/UI use `risk`" before DONE.
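The outcome rules of Check 9 can be sketched as a function over already-parsed table rows. This is a minimal illustration, not the `state-check` implementation: the name `check_contract_coverage`, the row dictionaries, and the exact set of accepted "proven" strings are assumptions, and the "or equivalent" clause from the text is not modelled.

```python
# Hypothetical sketch of Check 9 outcomes; not part of the package.
# Assumption: rows were already parsed from the `## Story Contracts` table
# into dicts keyed by column name.
PROVEN = {"✅ pass", "proven", "verified"}


def check_contract_coverage(done_stories, contract_rows):
    """Map each done Story ID to WARN (no rows), PASS (all proven), or FAIL."""
    results = {}
    for story in done_stories:
        rows = [row for row in contract_rows if row.get("Story") == story]
        if not rows:
            results[story] = "WARN"
        elif all(row.get("Proof Status", "").strip() in PROVEN for row in rows):
            results[story] = "PASS"
        else:
            # Blank, not proven, FAIL, or needs-user-confirmation all land here.
            results[story] = "FAIL"
    return results
```

In-progress Stories are simply not passed in, which mirrors the rule that unproven contracts are normal while proof is pending.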
147
+
148
+ ### Check 10: Durable Smoke Evidence
149
+
150
+ For each `✅ done` Story, any passing Proof Ledger row whose Evidence says `UI`, `browser`, `smoke`, or `manual` must include durable proof:
151
+ - screenshot/trace/video/Playwright/Cypress artifact, or
152
+ - checklist row, or
153
+ - URL plus exact observed UI counts/elements.
154
+
155
+ Vague rows like `UI manual | pass | checked browser` → FAIL.
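One way to approximate the durable-evidence rule is a keyword heuristic over a Proof Ledger row. The sketch below is an assumption-heavy illustration, not the package's check: `is_durable`, the regular expressions, and the idea that "exact counts" can be detected by a digit outside the URL are all hypothetical simplifications.

```python
# Hypothetical heuristic for Check 10; not part of the package.
import re

UI_EVIDENCE = re.compile(r"\b(ui|browser|smoke|manual)\b", re.IGNORECASE)
ARTIFACT = re.compile(r"screenshot|trace|video|playwright|cypress|checklist", re.IGNORECASE)
URL = re.compile(r"https?://\S+")


def is_durable(evidence, observation):
    """Return True when a passing row carries durable proof (or the check does not apply)."""
    if not UI_EVIDENCE.search(evidence):
        return True  # not a UI/browser/smoke/manual row
    if ARTIFACT.search(observation):
        return True  # artifact or checklist reference
    # URL plus at least one number outside the URL, as a stand-in for exact counts.
    has_url = bool(URL.search(observation))
    has_count = bool(re.search(r"\d", URL.sub("", observation)))
    return has_url and has_count
```

A heuristic like this can only flag obviously vague rows such as `checked browser`; whether a screenshot or count actually proves the behavior still needs a reviewer.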
156
+
157
+ ### Check 11: Evaluator Artifact Protection
158
+
159
+ If changed files include `experiment/kode-harness-scorecard.md`, `experiment/run-card.md`, `experiment/evaluator-*.md`, or `docs/evaluator-*.md`, WARN unless explicitly requested by the user. If the file has `<!-- harness-owner: evaluator -->`, FAIL unless it also has `<!-- harness-edit-approved: <reason> -->`.
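The path patterns and marker comments in Check 11 lend themselves to a small classifier. The following sketch is illustrative only: `check_evaluator_artifact` and the `user_requested` flag are hypothetical, and applying the owner-marker rule to any file that carries the marker, not only the listed paths, is one reading of the text.

```python
# Hypothetical sketch of Check 11; not part of the package.
import re
from fnmatch import fnmatchcase

PROTECTED = [
    "experiment/kode-harness-scorecard.md",
    "experiment/run-card.md",
    "experiment/evaluator-*.md",
    "docs/evaluator-*.md",
]
OWNER = re.compile(r"<!--\s*harness-owner:\s*evaluator\s*-->")
# Requires a non-empty reason after the colon.
APPROVED = re.compile(r"<!--\s*harness-edit-approved:\s*\S.*?-->")


def check_evaluator_artifact(path, content, user_requested=False):
    """Return FAIL, WARN, or PASS for one changed file."""
    if OWNER.search(content) and not APPROVED.search(content):
        return "FAIL"
    if any(fnmatchcase(path, pattern) for pattern in PROTECTED) and not user_requested:
        return "WARN"
    return "PASS"
```

How "explicitly requested by the user" is established is not specified in the diff, so the sketch takes it as a boolean supplied by the caller.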
160
+
161
+ ### Check 12: Scope Split Approval
162
+
163
+ If `docs/project-brief.md` maps one FR/KPI/ARB row to multiple Story IDs, require `Scope split approved: <reason>` or `<!-- harness-scope-split-approved: ... -->`. Without approval → FAIL.
164
+
165
+ ### Check 13: Recent Changes Section Integrity
166
+
167
+ `docs/project-state.md` must keep session changelog entries separate from proof/evidence sections.
168
+
169
+ 1. Read `## Recent Changes`.
170
+ 2. FAIL if the section contains nested headings such as `### Scenario`, `### Color Coding`, or `### Conclusion`.
171
+ 3. FAIL if the section contains UI/proof checklist bullets such as dropdown/column/color-coding/evidence details.
172
+ 4. Valid Recent Changes entries should be compact changelog rows, preferably:
173
+ `- YYYY-MM-DD S{N}-{M}: {what changed} (STATUS: DONE)`
174
+
175
+ This catches wrap-up corruption where `## Recent Changes` is inserted in the middle of `FR-008 Durable UI Evidence` and steals the remaining proof content.
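Check 13 can be approximated by scanning the lines that fall under `## Recent Changes` and flagging anything that is not a compact changelog row. This is a sketch, not the package's check: `recent_changes_violations` is a hypothetical name, and it is stricter than the text, which only says the row format is preferred.

```python
# Hypothetical sketch of Check 13; not part of the package.
import re

# Accepts both `- YYYY-MM-DD S1-1: ... (STATUS: DONE)` and the bracketed date form.
ENTRY = re.compile(r"^- \[?\d{4}-\d{2}-\d{2}\]? S\d+-\d+: .+ \(STATUS: [A-Z_ -]+\)$")


def recent_changes_violations(markdown):
    """Return the lines under `## Recent Changes` that are not changelog rows."""
    in_section = False
    violations = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A new level-2 heading opens or closes the section.
            in_section = line.strip() == "## Recent Changes"
            continue
        if not in_section or not line.strip():
            continue
        if line.startswith("#") or not ENTRY.match(line.strip()):
            violations.append(line)  # nested heading or misplaced evidence
    return violations
```

Because the section is taken to run until the next level-2 heading, a `## Recent Changes` heading inserted in the middle of another section will surface that section's remaining lines as violations, which is the corruption case the check describes.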
176
+
134
177
  ## Output Format
135
178
 
136
179
  ```
@@ -158,6 +201,18 @@ Outcomes:
158
201
  ### Check 8: Project Docs Hub Index
159
202
  - {N} rows checked / {M} missing paths / {K} visibility warnings / skipped if unused
160
203
 
204
+ ### Check 9: Story Contract Coverage
205
+ - {N} done Stories checked / {M} missing contract rows / {K} unproven contracts
206
+
207
+ ### Check 10: Durable Smoke Evidence
208
+ - {N} UI/manual proof rows checked / {M} vague rows
209
+
210
+ ### Check 12: Scope Split Approval
211
+ - {N} split tracker rows checked / {M} missing approval
212
+
213
+ ### Check 13: Recent Changes Section Integrity
214
+ - Recent Changes contains only changelog entries / {M} misplaced evidence lines
215
+
161
216
  <!-- CREW_MODE_START -->
162
217
  ### Check 6: Validation Tracker (🟣)
163
218
  - {N} FR references checked / {M} drifted
@@ -20,7 +20,7 @@ This is kode:harness's memory mechanism — without it, the same mistakes repeat
20
20
  - After a review revealed a repeated mistake
21
21
  - When the user explicitly asks to record a lesson
22
22
 
23
- > **Timing**: Invoke this skill **once at session end**, not after each individual skill. It aggregates the entire session's work into state files.
23
+ > **Timing**: Invoke once at session end; aggregate the whole session into state files.
24
24
 
25
25
  ## Procedure
26
26
 
@@ -35,7 +35,7 @@ If `git diff --stat` shows no changes and `git log` shows no new commits this se
35
35
  - Report: "📝 Quiet session — status checks only, no code changes."
36
36
  - Skip Step 3 (Failure Pattern Detection) — no code changes means no new failure patterns
37
37
  - Skip Step 5 (features.md update) and Step 5.5 (dependency-map.md verify) — nothing changed
38
- - Still execute Step 4 (Quick Summary update) and Step 6 (Agent Memory) if an agent was used
38
+ - Still execute Step 4 and Step 6 if an agent was used
39
39
  - Still execute Step 2 (Direction Drift Check) — discussion-only drift is possible
40
40
 
41
41
  ### Step 2: Direction Drift Check
@@ -49,14 +49,14 @@ Before recording failures, verify that the session's work stayed aligned with pr
49
49
  - Did the user explicitly change direction during this session? → Note for pivot recommendation
50
50
  3. **If drift detected**:
51
51
  - Add a warning to the Step 7 Report: `⚠️ Direction drift: [description of misalignment]`
52
- - Recommend: "Consider running `pivot` skill to formally update project direction. You can run it AFTER this wrap-up session completes."
52
+ - Recommend `pivot` after wrap-up
53
53
  - Do NOT block — the wrap-up skill always completes
54
54
  4. **If no drift**: Proceed silently (no output for this step)
55
55
 
56
56
  <!-- CREW_MODE_START -->
57
57
  #### Step 2.5: Validation Tracker Update (🟣 Pipeline only) ⚠️ MANDATORY
58
58
 
59
- > **⛔ Completion Gate**: Step 2.5 must be completed before proceeding to Step 3 and beyond. If a Validation Tracker exists, update it before moving to the next step.
59
+ > **⛔ Completion Gate**: If a Validation Tracker exists, update it, then proceed to Step 3.
60
60
 
61
61
  If `docs/project-brief.md` contains a `## Validation Tracker` section with data:
62
62
 
@@ -70,9 +70,9 @@ If `docs/project-brief.md` contains a `## Validation Tracker` section with data:
70
70
  - Did this session produce Stories/code that don't map to any FR or KPI? → warn
71
71
  - Are there KPIs/FRs with no Stories after 2+ sprints? → warn as "⚠️ Unplanned KPI/FR risk"
72
72
  - Include warnings in Step 7 Report
73
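- 5. **Self-check**: Verify that the Validation Tracker's KPI/FR/ARB statuses accurately reflect the Stories completed in this session. If any item has not been updated, update it immediately before proceeding to the next step.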
73
+ 5. **Self-check**: KPI/FR/ARB statuses must match the completed Stories. Update any missing items immediately.
74
74
 
75
- > ⛔ Do not proceed to Step 3 without updating the Validation Tracker. Skipping this step leaves FR/KPI Coverage out of sync with actual progress.
75
+ > ⛔ Do not proceed to Step 3 without updating the Tracker.
76
76
 
77
77
  If no Validation Tracker → skip this step entirely.
78
78
  <!-- CREW_MODE_END -->
@@ -83,7 +83,7 @@ For each issue/error that occurred in this session:
83
83
 
84
84
  1. Read `docs/failure-patterns.md`
85
85
  2. Check if this matches an existing pattern (FP-NNN):
86
- - **If match found AND already incremented by `debug` in this session**: Skip — do not double-count. Check `docs/project-state.md` Recent Changes for debug entries from this session to determine if a pattern was already recorded.
86
+ - **If match found AND already incremented by `debug` in this session**: Skip. Check Recent Changes to avoid double-count.
87
87
  - **If match found AND NOT already incremented this session**: Increment the Frequency counter, add the Sprint/Story to "Occurred"
88
88
  - **If new pattern**: Assign next FP-NNN number, create a new entry using this format:
89
89
 
@@ -100,7 +100,7 @@ For each issue/error that occurred in this session:
100
100
 
101
101
  3. If the failure relates to a specific skill or agent, note it for that skill's checklist
102
102
 
103
- > **Self-check**: When Step 3 completes, `docs/failure-patterns.md` must contain at least one FP entry (if no failure occurred in this session, just confirm the existing entries).
103
+ > **Self-check**: `docs/failure-patterns.md` has at least one FP entry; if no new failure, existing entries are enough.
104
104
 
105
105
  ### Step 4: Update docs/project-state.md ⚠️ MANDATORY
106
106
 
@@ -116,6 +116,9 @@ For each issue/error that occurred in this session:
116
116
  ```
117
117
  - [YYYY-MM-DD] S{N}-{M}: {what was done} (STATUS: DONE)
118
118
  ```
119
+ - Append compact changelog rows only. If creating the section, place it before `## Session Handoff Protocol` or EOF.
120
+ - Never insert it inside proof/evidence sections (`Proof Ledger`, durable UI evidence, smoke/manual proof).
121
+ - Self-check: no nested headings or UI/proof checklist bullets under `## Recent Changes`.
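The placement rule for `## Recent Changes` can be illustrated with a minimal sketch. This is not the package's implementation: `appendRecentChange` is a hypothetical helper name, and it assumes `docs/project-state.md` uses plain `## ` level-2 headings.

```javascript
// Hypothetical helper (not part of @kodevibe/harness): append one compact
// changelog row to `## Recent Changes`. If the section does not exist, create
// it before `## Session Handoff Protocol`, or at EOF when that heading is absent.
function appendRecentChange(state, row) {
  const lines = state.split('\n');
  const start = lines.findIndex((l) => /^##\s+Recent Changes\s*$/.test(l));

  if (start === -1) {
    const block = ['## Recent Changes', '', row, ''];
    const handoff = lines.findIndex((l) => /^##\s+Session Handoff Protocol\s*$/.test(l));
    if (handoff === -1) {
      // No handoff section: append the new section at EOF.
      return `${state.replace(/\n*$/, '')}\n\n${block.join('\n')}`;
    }
    lines.splice(handoff, 0, ...block);
    return lines.join('\n');
  }

  // Section exists: locate its end (next level-2 heading or EOF) and
  // insert the row after the last non-blank line of the section.
  let end = lines.findIndex((l, i) => i > start && /^##\s+/.test(l));
  if (end === -1) end = lines.length;
  let insertAt = end;
  while (insertAt > start + 1 && lines[insertAt - 1].trim() === '') insertAt -= 1;
  lines.splice(insertAt, 0, row);
  return lines.join('\n');
}
```

Because the helper only ever inserts at a level-2 heading boundary, it cannot land inside a `Proof Ledger` or other evidence section, which is the failure the bullets above warn about.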
119
122
 
120
123
  ### Step 5: Update docs/features.md (if applicable)
121
124
 
@@ -136,7 +139,7 @@ For each issue/error that occurred in this session:
136
139
  - WARN → include warnings in the final wrap-up report, then proceed
137
140
  - FAIL → fix the listed drift (update affected state files), then re-run state-check until PASS or WARN
138
141
 
139
- > **Self-check**: Confirm that every module newly added in this session is registered in `docs/dependency-map.md`. Add any missing module immediately. state-check must return PASS or WARN before proceeding to the next step.
142
+ > **Self-check**: New modules are registered in `docs/dependency-map.md`; state-check is PASS/WARN.
140
143
 
141
144
  ### Step 5.55: Refresh Project Docs Hub Index (if applicable)
142
145
 
@@ -145,7 +148,7 @@ Run only if user used/requested `docs-bridge`, or Project Docs Hub Index has rea
145
148
  1. For changed docs (`README*`, `docs/`, ADR/design/runbook/API specs), refresh indexed review/status.
146
149
  2. For new docs, add `proposed` rows or recommend `docs-bridge`.
147
150
  3. Never write external hubs, invent targets, change visibility, or convert `local-only` / `pending` without explicit request.
148
- 4. Keep filesystem paths, vault names, page IDs, and resolver details out of tracked state; use `.harness/docs-bridge.local.*`.
151
+ 4. Keep private paths/IDs/resolvers out of tracked state; use `.harness/docs-bridge.local.*`.
149
152
 
150
153
  ### Step 5.6: Resolve STATE-AUDIT Flags (if applicable)
151
154
 
@@ -160,8 +163,12 @@ Before session end, record the working proof that justified completion:
160
163
  1. Read reviewer output or recent terminal evidence for passing tests/smoke proof.
161
164
  2. Add one compact row to `docs/project-state.md` → `## Proof Ledger` for each completed Story.
162
165
  3. Cross-check completed Stories against `## Evidence Summary` / `## Proof Ledger`.
163
- 4. If no proof exists, write `[PROOF-GAP] Proof missing` in the wrap-up report and recommend returning to `reviewer`; do not claim the Story is complete.
164
- 5. If `[PROOF-GAP]` exists, STOP before Step 5.65. Do not auto-commit state files that mark a Story done/reviewed without passing proof. Revert the Story to Proof Pending or return to `reviewer`.
166
+ 4. Check `## Story Contracts`: completed Story rows must be `✅ pass`/`proven`; otherwise `[CONTRACT-GAP]`.
167
+ 5. UI/manual/smoke proof needs artifact/checklist or URL + exact counts/elements.
168
+ 6. Keep durable UI evidence in its own section, never under `## Recent Changes`.
169
+ 7. Split FR/KPI/ARB mappings need `Scope split approved: <reason>`.
170
+ 8. If proof is missing, write `[PROOF-GAP]` and return to `reviewer`.
171
+ 9. If `[PROOF-GAP]` or `[CONTRACT-GAP]` exists, STOP before Step 5.65.
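The gap rules in steps 4, 8, and 9 can be sketched as a small decision function. The data shapes here (`doneStories`, `ledger`, `contracts`) and the name `findGaps` are assumptions for illustration only; the skill itself operates on markdown tables in `docs/project-state.md`.

```javascript
// Hypothetical sketch: derive [PROOF-GAP] / [CONTRACT-GAP] flags for completed Stories.
function findGaps({ doneStories, ledger, contracts }) {
  const flags = [];
  for (const id of doneStories) {
    // A completed Story needs at least one passing Proof Ledger row.
    const hasProof = ledger.some((r) => r.story === id && /✅|pass/i.test(r.result));
    if (!hasProof) flags.push(`[PROOF-GAP] ${id}`);

    // Every Story Contract row must be exactly `✅ pass` or `proven`.
    const unproven = contracts.filter(
      (c) => c.story === id && !/^(?:✅ pass|proven)$/i.test(String(c.status).trim()),
    );
    if (unproven.length > 0) flags.push(`[CONTRACT-GAP] ${id}`);
  }
  // Any flag means the wrap-up must stop before Step 5.65.
  return { flags, mayProceedToStep565: flags.length === 0 };
}
```

The contract status is matched exactly on purpose in this sketch, so that a value such as `not proven` does not count as proven.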
165
172
 
166
173
  Proof rows must stay short: Date, Story, Evidence, Result, Command / Observation. Do not paste long logs.
167
174
 
@@ -171,7 +178,7 @@ State file 변경사항을 커밋합니다. Learn 실행 결과가 커밋되지
171
178
 
172
179
  1. Stage state files: `git add docs/project-state.md docs/failure-patterns.md docs/features.md docs/dependency-map.md docs/agent-memory/`
173
180
  2. Commit: `git commit -m "wrap-up: session lessons captured"`
174
- 3. If commit fails (nothing to commit), skip — state files were already committed
181
+ 3. If nothing to commit, skip.
175
182
 
176
183
  > **Self-check**: `git status` must show no unstaged files under docs/.
177
184
 
@@ -179,16 +186,16 @@ State file 변경사항을 커밋합니다. Learn 실행 결과가 커밋되지
179
186
 
180
187
  Before ending the session, check for unpushed commits:
181
188
 
182
- 1. Run `git log --oneline @{u}..HEAD 2>/dev/null || echo "no upstream"` to find unpushed commits
189
+ 1. Run `git log --oneline @{u}..HEAD 2>/dev/null || echo "no upstream"`
183
190
  2. **If unpushed commits exist**:
184
191
  - List the commits: `git log --oneline @{u}..HEAD`
185
192
  - Solo mode: Recommend push — `git push origin {branch}`
186
- - Team mode: **Strongly recommend** push — unpushed work is invisible to teammates
193
+ - Team mode: **Strongly recommend** push
187
194
  - Warn: "⚠️ {N}개의 커밋이 push되지 않았습니다. 작업물을 원격에 백업하세요."
188
195
  3. **If no upstream configured** (`no upstream`):
189
196
  - Check if remote exists: `git remote -v`
190
197
  - If remote exists: Suggest `git push -u origin {branch}` (first push)
191
- - If no remote: Note "원격 저장소가 설정되지 않았습니다. 팀 협업 시 remote 설정이 필요합니다."
198
+ - If no remote: Note "원격 저장소가 설정되지 않았습니다."
192
199
  4. **If all commits are pushed**: Skip (no output)
193
200
 
194
201
  ### Step 6: Update Agent Memory (if applicable)
@@ -199,15 +206,15 @@ If an agent (reviewer, pm, lead, architect) was used in this session, update its
199
206
  2. **Auto-initialize if needed**: If the file only contains `<!-- Example entries` placeholder comments and no real data:
200
207
  - Replace the placeholder block with actual entries from this session
201
208
  - Initialize statistics counters with real values
202
- - If real entries already exist alongside placeholders, APPEND new entries and remove only the placeholder comments. Do not overwrite existing real data.
203
- - **When does initialization happen?**: On the FIRST session where the agent is used AND wrap-up is invoked. If an agent is never used, its memory stays as a placeholder indefinitely — this is expected.
209
+ - If real entries exist, APPEND and remove only placeholder comments.
210
+ - Initialize on the first wrap-up session where the agent was used.
204
211
  - Example transformation:
205
212
  ```
206
213
  Before: <!-- Example entries (replace with real findings after first review):
207
214
  After: - [S1-1] Mock sync missed for UserService interface change
208
215
  ```
209
216
  3. **Append session learnings** to the appropriate section:
210
- - **reviewer.md**: Add review patterns found, update statistics (total reviews +1, auto-fixes, escalations)
217
+ - **reviewer.md**: Add review patterns and update statistics
211
218
  - **pm.md**: Record estimation accuracy (planned vs actual), note architecture discoveries
212
219
  - **lead.md**: Update velocity (stories done/planned), record any scope drift incidents
213
220
  - **architect.md**: Record design decisions made, module boundary insights, anti-patterns observed
@@ -275,11 +282,11 @@ If crew artifacts were used this session (🟣 pipeline), also append:
275
282
  - Keep failure pattern descriptions concise (1-2 sentences for Cause and Prevention)
276
283
  - If no failures occurred, skip Step 2 and just update state files
277
284
  - Do not modify source code — this skill only updates state files
278
- - Quick Summary must be exactly 3 lines — concise enough for the next session to scan instantly
285
+ - Quick Summary must be exactly 3 lines
279
286
 
280
287
  ## Enforced Rules
281
288
 
282
- - **Session Handoff**: Before ending a session, docs/project-state.md Quick Summary MUST be updated. Never leave the project in an undocumented state.
289
+ - **Session Handoff**: Before ending, update docs/project-state.md Quick Summary.
283
290
  - **State File Size Limits**: Keep state files compact for LLM context windows:
284
291
  - docs/project-brief.md: Max 200 lines
285
292
  - docs/project-state.md: Max 300 lines (archive completed sprints)
@@ -302,7 +309,7 @@ If crew artifacts were used this session (🟣 pipeline), also append:
302
309
  ## Team Mode: Session Wrap-up
303
310
 
304
311
  ### Pre-Pull (mandatory before any shared file edit)
305
- 1. Run `git pull` on the default branch before updating docs/features.md or docs/dependency-map.md (per project-brief.md → Key Technical Decisions; default: main)
312
+ 1. Run `git pull` on the default branch before updating docs/features.md or docs/dependency-map.md (default: main)
306
313
  2. If merge conflicts occur, follow the **Merge Conflict SOP** below
307
314
 
308
315
  ### Merge Conflict SOP
@@ -319,7 +326,7 @@ When `git pull` causes merge conflicts in shared state files:
319
326
  | `docs/failure-patterns.md` | Keep BOTH entries, deduplicate by FP-NNN number |
320
327
  3. **After resolving**: `git add <resolved-files> && git commit`
321
328
  4. **Verify**: Re-read the resolved file and confirm no data was lost
322
- 5. **If unsure**: Do NOT force-resolve. Ask the designated authority (per project-brief.md; default: team lead) or the Owner of the conflicting rows.
329
+ 5. **If unsure**: Do NOT force-resolve. Ask the designated authority or row Owner.
323
330
 
324
331
  ### Owner-Scoped Updates
325
332
  - **docs/features.md**: only update rows where Owner = you
@@ -327,7 +334,7 @@ When `git pull` causes merge conflicts in shared state files:
327
334
  - **Personal files** (.harness/project-state.md, .harness/failure-patterns.md, .harness/agent-memory/): update freely — no coordination needed
328
335
 
329
336
  ### Failure Pattern Promotion
330
- If a personal failure pattern (FP-NNN in .harness/failure-patterns.md) is likely to affect other developers:
337
+ If a personal FP in `.harness/failure-patterns.md` may affect others:
331
338
  1. Discuss with the team (Slack, PR comment, etc.)
332
- 2. If agreed, add it to a shared location (team wiki, PR description) so others can add it to their personal .harness/failure-patterns.md
339
+ 2. If agreed, add it to a shared location so others can copy it.
333
340
  <!-- TEAM_MODE_END -->
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@kodevibe/harness",
3
- "version": "0.11.2",
3
+ "version": "0.11.3",
4
4
  "description": "kode:harness — harness engineering for keeping every developer's AI aligned on one project direction.",
5
5
  "keywords": [
6
6
  "llm",
package/src/guard.js CHANGED
@@ -16,6 +16,11 @@
16
16
  // R8 checkPublicBoundary — public package surface does not leak internal refs
17
17
  // R9 checkEnvSeal — proof records include reproducible environment seal
18
18
  // R10 checkInstructionBudget — instruction files fit model-tier budgets
19
+ // R11 checkStoryContracts — done Stories must prove their semantic contracts
20
+ // R12 checkEvaluatorArtifact — evaluator-owned artifacts are not rewritten silently
21
+ // R13 checkSmokeEvidence — browser/manual proof must leave durable evidence
22
+ // R14 checkScopeSplitApproval — FR/KPI/ARB split mappings need approval
23
+ // R15 checkRecentChangesIntegrity — wrap-up must not corrupt state sections
19
24
  //
20
25
  // Severity: 'error' blocks the commit (exit 1). 'warn' is informational.
21
26
 
@@ -153,6 +158,12 @@ function parseMarkdownTable(section) {
153
158
  });
154
159
  }
155
160
 
161
+ function markdownTableHeaders(section) {
162
+ const lines = section.split('\n').map((l) => l.trim()).filter((l) => l.startsWith('|'));
163
+ if (lines.length < 2) return [];
164
+ return lines[0].replace(/^\|/, '').replace(/\|$/, '').split('|').map((c) => c.trim());
165
+ }
166
+
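To show what `markdownTableHeaders` returns, the helper added above is copied into a standalone snippet together with a hypothetical `hasRequiredHeaders` wrapper. The wrapper name and sample tables are assumptions; the exact-match comparison mirrors the `headers.includes(name)` check used by the R11 gate later in this diff.

```javascript
// Copy of the helper from this diff, reproduced so the example runs standalone.
function markdownTableHeaders(section) {
  const lines = section.split('\n').map((l) => l.trim()).filter((l) => l.startsWith('|'));
  if (lines.length < 2) return [];
  return lines[0].replace(/^\|/, '').replace(/\|$/, '').split('|').map((c) => c.trim());
}

// Hypothetical wrapper: header names are compared exactly (case-sensitive).
function hasRequiredHeaders(section, required) {
  const headers = markdownTableHeaders(section);
  return required.every((name) => headers.includes(name));
}

const table = [
  '| Story | Contract | Proof Status |',
  '|---|---|---|',
  '| S1-1 | Filter shows all enum values | ✅ pass |',
].join('\n');

console.log(markdownTableHeaders(table));
// → [ 'Story', 'Contract', 'Proof Status' ]
console.log(hasRequiredHeaders(table, ['Story', 'Contract', 'Proof Status']));
// → true
```

A header row without a separator row yields an empty array, and a header spelled `proof status` would not satisfy the required-column check.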
156
167
  function storyIdFromRow(row) {
157
168
  return row.ID || row.Id || row.Story || row['Story ID'] || row['Story'] || '';
158
169
  }
@@ -161,6 +172,56 @@ function rowStatus(row) {
161
172
  return row.Status || row.status || '';
162
173
  }
163
174
 
175
+ function storyContractStatus(row) {
176
+ return row['Proof Status'] || row.Status || row.status || row.Result || row.result || '';
177
+ }
178
+
179
+ const RECENT_CHANGES_EVIDENCE_TERMS = /\b(?:scenario|expected|result|observer|url|screenshot|playwright|dropdown|column|color coding|ui elements?)\b|시나리오|기대|결과|관찰|드롭다운|컬럼|색상|스크린샷/i;
180
+
181
+ /**
182
+ * Recent Changes is the session changelog. A recurring wrap-up failure is
183
+ * inserting it in the middle of an evidence/proof section, which silently moves
184
+ * the rest of that section under "Recent Changes". Catch obvious structure
185
+ * corruption deterministically.
186
+ *
187
+ * @param {string} content project-state.md
188
+ * @returns {Array}
189
+ */
190
+ function checkRecentChangesIntegrity(content) {
191
+ const violations = [];
192
+ const visible = stripHtmlComments(content);
193
+ const recent = getSection(visible, 'Recent Changes');
194
+ if (recent === null) return violations;
195
+
196
+ const lines = recent.split('\n');
197
+ const nestedHeading = lines.find((line) => /^#{3,}\s+/.test(line.trim()));
198
+ if (nestedHeading) {
199
+ violations.push({
200
+ check: 'state-structure',
201
+ severity: 'error',
202
+ line: 0,
203
+ message: `Recent Changes contains nested heading "${nestedHeading.trim()}". It was likely inserted inside an evidence/proof section; move Recent Changes after the completed evidence block (R15).`,
204
+ });
205
+ }
206
+
207
+ const misplacedEvidence = lines.find((line) => {
208
+ const trimmed = line.trim();
209
+ if (!/^[-*]\s+/.test(trimmed)) return false;
210
+ if (/\d{4}-\d{2}-\d{2}|\bS\d+-\d+\b|session|wrap-up|recent/i.test(trimmed)) return false;
211
+ return RECENT_CHANGES_EVIDENCE_TERMS.test(trimmed);
212
+ });
213
+ if (misplacedEvidence) {
214
+ violations.push({
215
+ check: 'state-structure',
216
+ severity: 'error',
217
+ line: 0,
218
+ message: `Recent Changes contains evidence/checklist content (${misplacedEvidence.trim()}). Do not place proof details under Recent Changes; keep durable evidence in its own section (R15).`,
219
+ });
220
+ }
221
+
222
+ return violations;
223
+ }
224
+
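The two R15 conditions can be exercised with a simplified, standalone sketch. `sectionBody` is an assumed stand-in for the package's `getSection` helper (whose implementation is not shown in this diff), the evidence pattern is an English-only subset of `RECENT_CHANGES_EVIDENCE_TERMS`, and `recentChangesProblems` is a hypothetical name.

```javascript
// Assumed stand-in for getSection(): body of a `## <title>` section up to the
// next level-2 heading, or null when the section is missing.
function sectionBody(markdown, title) {
  const lines = markdown.split('\n');
  const start = lines.findIndex((l) => l.trim() === `## ${title}`);
  if (start === -1) return null;
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((l) => /^##\s+/.test(l));
  return (end === -1 ? rest : rest.slice(0, end)).join('\n');
}

// English-only subset of the evidence terms used by the guard.
const EVIDENCE_TERMS = /\b(?:scenario|expected|result|url|screenshot|playwright|dropdown|column)\b/i;

function recentChangesProblems(markdown) {
  const body = sectionBody(markdown, 'Recent Changes');
  if (body === null) return [];
  const lines = body.split('\n').map((l) => l.trim());
  const problems = [];
  // Condition 1: a level-3+ heading nested under Recent Changes.
  if (lines.some((l) => /^#{3,}\s+/.test(l))) problems.push('nested-heading');
  // Condition 2: a bullet that looks like proof detail rather than a changelog row.
  const misplaced = lines.some((l) => /^[-*]\s+/.test(l)
    && !/\d{4}-\d{2}-\d{2}|\bS\d+-\d+\b|session|wrap-up|recent/i.test(l)
    && EVIDENCE_TERMS.test(l));
  if (misplaced) problems.push('evidence-bullet');
  return problems;
}
```

Note that, as in the guard itself, any bullet carrying a date or Story ID is treated as a changelog row, so an evidence bullet that happens to include one would not be flagged by the second condition.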
164
225
  /**
165
226
  * Validate docs/project-state.md content for handoff + proof-first integrity.
166
227
  * @param {string} content
@@ -215,6 +276,96 @@ function checkStateFile(content) {
215
276
  }
216
277
  }
217
278
 
279
+ violations.push(...checkRecentChangesIntegrity(content));
280
+
281
+ return violations;
282
+ }
283
+
284
+ // ─── Story Contract Gate (R11 / semantic acceptance) ────────────────
285
+
286
+ const STORY_CONTRACT_PASS = /✅|pass(?:ed)?|proven|verified|reviewed|done|ok/i;
287
+ const STORY_CONTRACT_NOT_PROVEN = /❌|fail(?:ed)?|not[_ -]?proven|not[_ -]?verified|pending|todo|tbd|blank|needs[_ -]?user[_ -]?confirmation|needs[_ -]?confirmation|⬜|🚫|blocked/i;
288
+
289
+ /**
290
+ * Semantic Story Contract gate. This is intentionally project-agnostic:
291
+ * the harness does not parse the app domain itself; it requires the agent to
292
+ * write a compact Story Contract table and prove every row before Done.
293
+ *
294
+ * Backward compatibility: existing projects with done Stories but no contract
295
+ * rows receive WARN, not FAIL. Once a contract row exists, an unproven row is
296
+ * blocking.
297
+ *
298
+ * @param {{projectState?: string}|string} input project-state.md content or object
299
+ * @returns {Array}
300
+ */
301
+ function checkStoryContracts(input = {}) {
302
+ const projectState = typeof input === 'string' ? input : (input.projectState || '');
303
+ const violations = [];
304
+ const visible = stripHtmlComments(projectState);
305
+ const doneStories = parseMarkdownTable(getSection(visible, 'Story Status') || '')
306
+ .filter((row) => /✅\s*done/i.test(rowStatus(row)));
307
+ if (doneStories.length === 0) return violations;
308
+
309
+ const section = getSection(visible, 'Story Contracts');
310
+ if (section === null) {
311
+ for (const story of doneStories) {
312
+ const id = storyIdFromRow(story);
313
+ violations.push({
314
+ check: 'story-contract',
315
+ severity: 'warn',
316
+ line: 0,
317
+ message: `Story ${id || '(unknown)'} is done but project-state.md has no Story Contracts section (R11). Add semantic contract rows for new work.`,
318
+ });
319
+ }
320
+ return violations;
321
+ }
322
+
323
+ const headers = markdownTableHeaders(section);
324
+ const requiredHeaders = ['Story', 'Contract', 'Proof Status'];
325
+ const hasContractTable = headers.length > 0;
326
+ const hasRequiredHeaders = requiredHeaders.every((name) => headers.includes(name));
327
+ if (!hasContractTable || !hasRequiredHeaders) {
328
+ violations.push({
329
+ check: 'story-contract',
330
+ severity: 'error',
331
+ line: 0,
332
+ message: 'Story Contracts table is malformed. Required columns: Story, Contract, Proof Status (R11).',
333
+ });
334
+ return violations;
335
+ }
336
+
337
+ const rows = parseMarkdownTable(section);
338
+ for (const story of doneStories) {
339
+ const id = storyIdFromRow(story);
340
+ const contracts = rows.filter((row) => {
341
+ const cell = String(row.Story || row['Story ID'] || '');
342
+ return id && cell.includes(id);
343
+ });
344
+
345
+ if (contracts.length === 0) {
346
+ violations.push({
347
+ check: 'story-contract',
348
+ severity: 'warn',
349
+ line: 0,
350
+ message: `Story ${id || '(unknown)'} is done but has no Story Contract rows (R11). Add semantic acceptance contracts for new work.`,
351
+ });
352
+ continue;
353
+ }
354
+
355
+ for (const row of contracts) {
356
+ const status = storyContractStatus(row);
357
+ const contract = row.Contract || row['Required Assertion'] || '(unnamed contract)';
358
+ if (!status || STORY_CONTRACT_NOT_PROVEN.test(status) || !STORY_CONTRACT_PASS.test(status)) {
359
+ violations.push({
360
+ check: 'story-contract',
361
+ severity: 'error',
362
+ line: 0,
363
+ message: `Story ${id} is done but Story Contract "${contract}" is not proven (status: ${status || 'blank'}). Prove every contract row before Done (R11).`,
364
+ });
365
+ }
366
+ }
367
+ }
368
+
218
369
  return violations;
219
370
  }
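How the two R11 status patterns interact can be seen in a small standalone sketch. The regular expressions are copied from this diff; `isContractProven` is a hypothetical wrapper that inverts the blocking condition used in `checkStoryContracts`.

```javascript
// Patterns copied from the diff above.
const STORY_CONTRACT_PASS = /✅|pass(?:ed)?|proven|verified|reviewed|done|ok/i;
const STORY_CONTRACT_NOT_PROVEN = /❌|fail(?:ed)?|not[_ -]?proven|not[_ -]?verified|pending|todo|tbd|blank|needs[_ -]?user[_ -]?confirmation|needs[_ -]?confirmation|⬜|🚫|blocked/i;

// Hypothetical wrapper: true when a contract row would NOT raise an R11 error.
function isContractProven(status) {
  const s = String(status || '').trim();
  if (!s) return false;                               // blank status is unproven
  if (STORY_CONTRACT_NOT_PROVEN.test(s)) return false; // negative terms win
  return STORY_CONTRACT_PASS.test(s);
}

console.log(isContractProven('✅ pass'));              // → true
console.log(isContractProven('not proven'));           // → false
console.log(isContractProven('✅ pass (pending QA)')); // → false
console.log(isContractProven(''));                     // → false
```

The negative pattern is checked first, so a mixed status such as `✅ pass (pending QA)` stays blocking. Because the pass pattern is unanchored, a free-text status such as `broken` would also match (it contains `ok`), so short, conventional status values are the safer choice.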
220
371
 
@@ -385,6 +536,44 @@ function checkStateSync({ projectState = '', features = '', dependencyMap = '' }
385
536
  return violations;
386
537
  }
387
538
 
539
+ // ─── Scope Split Approval Gate (R14) ────────────────────────────────
540
+
541
+ const STORY_ID_RE = /\bS\d+-\d+\b/g;
542
+ const TRACKER_ROW_RE = /\b(?:FR|KPI|ARB|ARB-FAIL)[-_]?\d+\b/i;
543
+ const SCOPE_SPLIT_APPROVAL_RE = /\bScope split approved\b|범위\s*분할\s*승인|<!--\s*harness-scope-split-approved:/i;
544
+
545
+ /**
546
+ * When a Validation Tracker maps a single FR/KPI/ARB item to multiple Stories,
547
+ * the split is a product decision. Require an explicit approval marker so a
548
+ * model cannot silently shrink or repartition scope.
549
+ *
550
+ * @param {{projectBrief?: string}|string} input project-brief.md content or object
551
+ * @returns {Array}
552
+ */
553
+ function checkScopeSplitApproval(input = {}) {
554
+ const projectBrief = typeof input === 'string' ? input : (input.projectBrief || '');
555
+ const visible = stripHtmlComments(projectBrief);
556
+ if (!/Validation Tracker|FR Coverage|KPI Coverage|ARB Fail Resolution/i.test(visible)) return [];
557
+ if (SCOPE_SPLIT_APPROVAL_RE.test(projectBrief)) return [];
558
+
559
+ const violations = [];
560
+ const lines = visible.split('\n');
561
+ for (let i = 0; i < lines.length; i++) {
562
+ const line = lines[i];
563
+ if (!TRACKER_ROW_RE.test(line)) continue;
564
+ const storyIds = [...new Set(line.match(STORY_ID_RE) || [])];
565
+ if (storyIds.length > 1) {
566
+ violations.push({
567
+ check: 'scope-split',
568
+ severity: 'error',
569
+ line: i + 1,
570
+ message: `Validation Tracker maps one item to multiple Stories (${storyIds.join(', ')}) without Scope split approved marker (R14). Add "Scope split approved: <reason>" before Done.`,
571
+ });
572
+ }
573
+ }
574
+ return violations;
575
+ }
576
+
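The row scan in the R14 gate can be tried in isolation with the sketch below. The first two patterns are copied from this diff; `APPROVAL_RE` is an English-only subset of `SCOPE_SPLIT_APPROVAL_RE`, and `findSplitRows` plus the sample tracker are hypothetical.

```javascript
const STORY_ID_RE = /\bS\d+-\d+\b/g;
const TRACKER_ROW_RE = /\b(?:FR|KPI|ARB|ARB-FAIL)[-_]?\d+\b/i;
const APPROVAL_RE = /\bScope split approved\b/i; // English subset of the marker

// Hypothetical sketch: list tracker lines that map one FR/KPI/ARB item
// to more than one distinct Story ID.
function findSplitRows(brief) {
  if (APPROVAL_RE.test(brief)) return []; // one marker anywhere disables the check
  const found = [];
  brief.split('\n').forEach((line, i) => {
    if (!TRACKER_ROW_RE.test(line)) return;
    const ids = [...new Set(line.match(STORY_ID_RE) || [])];
    if (ids.length > 1) found.push({ line: i + 1, ids });
  });
  return found;
}

const brief = [
  '## Validation Tracker',
  '| FR-1 | Login | S1-1, S1-2 |',
  '| FR-2 | Logout | S1-3 |',
].join('\n');

console.log(findSplitRows(brief));
// → [ { line: 2, ids: [ 'S1-1', 'S1-2' ] } ]
console.log(findSplitRows(`${brief}\nScope split approved: login split by platform`));
// → []
```

As in the guard, the approval marker is file-wide: a single `Scope split approved` line suppresses the check for every tracker row, not only the row it was written for.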
388
577
  // ─── Integration / Persistence DoD (R4) ──────────────────────────────
389
578
 
390
579
  // Evidence terms that prove only in-memory/unit behaviour (insufficient alone).
@@ -392,6 +581,11 @@ const UNIT_ONLY_TERMS = /\bunit\b|단위\s*테스트/i;
392
581
  // Evidence terms that prove integration / persistence reached real boundaries.
393
582
  const INTEGRATION_TERMS = /\bintegration\b|통합|\bpersist|영속|\brow count|적재|\bcontext test|\be2e\b|database|\bdb\b|repository|commit boundary/i;
394
583
 
584
+ const SMOKE_EVIDENCE_TERMS = /\b(?:ui|browser|smoke|manual)\b|브라우저|수동|화면/i;
585
+ const DURABLE_SMOKE_ARTIFACT_TERMS = /\b(?:screenshot|playwright|cypress|selenium|trace|video|manual checklist|checklist)\b|\.(?:png|jpe?g|webp|mp4)\b/i;
586
+ const SMOKE_URL_TERMS = /https?:\/\/|localhost(?::\d+)?|127\.0\.0\.1(?::\d+)?/i;
587
+ const SMOKE_OBSERVATION_TERMS = /\b(?:observed|rows?|counts?|items?|open|closed|breached|at-risk|normal|columns?|filters?)\b|렌더|확인|컬럼|필터|개\s*open|\d+/i;
588
+
395
589
  /**
396
590
  * Integration/Persistence DoD: a Story marked "✅ done" must have at least one
397
591
  * Proof Ledger row whose evidence indicates integration/persistence — not only
@@ -429,6 +623,52 @@ function checkIntegrationDoD(content) {
429
623
  return violations;
430
624
  }
431
625
 
626
+ /**
627
+ * Browser/manual/smoke proof is too easy to fake as prose. If a done Story has
628
+ * a passing UI/manual/smoke Proof Ledger row, require a durable artifact or a
629
+ * URL plus concrete observed UI data.
630
+ *
631
+ * @param {string} content project-state.md
632
+ * @returns {Array}
633
+ */
634
+ function checkSmokeEvidence(content) {
635
+ const violations = [];
636
+ const visible = stripHtmlComments(content);
637
+ const doneIds = parseMarkdownTable(getSection(visible, 'Story Status') || '')
638
+ .filter((row) => /✅\s*done/i.test(rowStatus(row)))
639
+ .map((row) => storyIdFromRow(row))
640
+ .filter(Boolean);
641
+ if (doneIds.length === 0) return violations;
642
+
643
+ const ledgerRows = parseMarkdownTable(getSection(visible, 'Proof Ledger') || '');
644
+ for (const row of ledgerRows) {
645
+ const values = Object.values(row).filter((v) => typeof v === 'string');
646
+ const raw = values.join(' ');
647
+ const story = row.Story || row['Story ID'] || '';
648
+ const result = row.Result || row.result || raw;
649
+ if (!/(✅|pass)/i.test(result)) continue;
650
+ if (story && !doneIds.some((id) => story.includes(id))) continue;
651
+ if (!SMOKE_EVIDENCE_TERMS.test(raw)) continue;
652
+
653
+ const artifact = row.Artifact || row.artifact || '';
654
+ const command = row['Command / Observation'] || row.Command || row.Observation || '';
655
+ const hasDurableArtifact = artifact.trim() && !/^[-—n/a]*$/i.test(artifact.trim());
656
+ const hasToolProof = DURABLE_SMOKE_ARTIFACT_TERMS.test(raw);
657
+ const hasConcreteManualProof = SMOKE_URL_TERMS.test(command || raw) && SMOKE_OBSERVATION_TERMS.test(command || raw);
658
+
659
+ if (!(hasDurableArtifact || hasToolProof || hasConcreteManualProof)) {
660
+ violations.push({
661
+ check: 'smoke-evidence',
662
+ severity: 'error',
663
+ line: 0,
664
+ message: `Passing UI/manual/smoke proof for Story ${story || '(unknown)'} is not durable. Add screenshot/Playwright artifact, checklist row, or URL plus exact observed UI counts/elements (R13).`,
665
+ });
666
+ }
667
+ }
668
+
669
+ return violations;
670
+ }
671
+
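What counts as durable smoke proof under R13 can be explored with a reduced sketch. The three patterns are copied from this diff; `isDurableSmokeProof` is a hypothetical function that looks only at the evidence text and omits the separate `Artifact` column handling of `checkSmokeEvidence`.

```javascript
// Patterns copied from the diff above.
const DURABLE_SMOKE_ARTIFACT_TERMS = /\b(?:screenshot|playwright|cypress|selenium|trace|video|manual checklist|checklist)\b|\.(?:png|jpe?g|webp|mp4)\b/i;
const SMOKE_URL_TERMS = /https?:\/\/|localhost(?::\d+)?|127\.0\.0\.1(?::\d+)?/i;
const SMOKE_OBSERVATION_TERMS = /\b(?:observed|rows?|counts?|items?|open|closed|breached|at-risk|normal|columns?|filters?)\b|렌더|확인|컬럼|필터|개\s*open|\d+/i;

// Hypothetical reduction: durable if the text names a tool/artifact, or
// pairs a URL with a concrete observation.
function isDurableSmokeProof(evidenceText) {
  const hasToolProof = DURABLE_SMOKE_ARTIFACT_TERMS.test(evidenceText);
  const hasManualProof = SMOKE_URL_TERMS.test(evidenceText)
    && SMOKE_OBSERVATION_TERMS.test(evidenceText);
  return hasToolProof || hasManualProof;
}

console.log(isDurableSmokeProof('Manual browser check, looks fine'));
// → false
console.log(isDurableSmokeProof('Browser smoke: http://localhost:3000/tickets, observed 12 rows, 3 breached'));
// → true
console.log(isDurableSmokeProof('UI smoke: screenshots/tickets.png'));
// → true
```

Because the observation pattern accepts any digit, a bare URL that includes a port number (for example `http://localhost:3000`) already satisfies both halves of the manual-proof test, so the check is a floor rather than a guarantee that counts were actually recorded.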
432
672
  // ─── Environment Seal Gate (R9) ──────────────────────────────────────
433
673
 
434
674
  /**
@@ -501,6 +741,51 @@ function checkPublicBoundary(content, filename = '') {
501
741
  return violations;
502
742
  }
503
743
 
744
+ // ─── Evaluator Artifact Protection (R12) ────────────────────────────
745
+
746
+ const EVALUATOR_ARTIFACT_PATHS = [
747
+ /^experiment\/(?:kode-harness-scorecard|run-card|evaluator-).*\.md$/i,
748
+ /^docs\/experiment\/(?:kode-harness-scorecard|run-card|evaluator-).*\.md$/i,
749
+ /^docs\/evaluator-.*\.md$/i,
750
+ ];
751
+ const EVALUATOR_OWNER_MARKER = /<!--\s*harness-owner:\s*evaluator\s*-->/i;
752
+ const EVALUATOR_APPROVAL_MARKER = /<!--\s*harness-edit-approved:\s*[^>]+-->/i;
753
+
754
+ function isEvaluatorArtifact(file = '') {
755
+ const normalized = file.replace(/\\/g, '/');
756
+ return EVALUATOR_ARTIFACT_PATHS.some((re) => re.test(normalized));
757
+ }
758
+
759
+ /**
760
+ * Protect evaluator/run-card/scorecard artifacts from model-under-test rewrites.
761
+ * Ordinary evaluator-looking paths warn. Files explicitly marked
762
+ * `harness-owner: evaluator` fail unless a human approval marker is present.
763
+ *
764
+ * @param {string} content
765
+ * @param {string} [filename]
766
+ * @returns {Array}
767
+ */
768
+ function checkEvaluatorArtifact(content, filename = '') {
769
+ if (!isEvaluatorArtifact(filename)) return [];
770
+ const owned = EVALUATOR_OWNER_MARKER.test(content);
771
+ const approved = EVALUATOR_APPROVAL_MARKER.test(content);
772
+ if (owned && !approved) {
773
+ return [{
774
+ check: 'evaluator-artifact',
775
+ severity: 'error',
776
+ line: 0,
777
+ message: `${filename}: evaluator-owned artifact changed without explicit approval marker (R12). Add <!-- harness-edit-approved: <reason> --> only when the user explicitly requested this edit.`,
778
+ }];
779
+ }
780
+ if (owned && approved) return [];
781
+ return [{
782
+ check: 'evaluator-artifact',
783
+ severity: 'warn',
784
+ line: 0,
785
+ message: `${filename}: evaluator/run-card artifact changed. Confirm the user explicitly requested this edit (R12).`,
786
+ }];
787
+ }
788
+
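The R12 outcome depends on three inputs: the path, the owner marker, and the approval marker. The sketch below reduces `checkEvaluatorArtifact` to the severity it would report; the patterns are copied from this diff, while `evaluatorSeverity` and the sample file names are hypothetical.

```javascript
// Patterns copied from the diff above.
const EVALUATOR_ARTIFACT_PATHS = [
  /^experiment\/(?:kode-harness-scorecard|run-card|evaluator-).*\.md$/i,
  /^docs\/experiment\/(?:kode-harness-scorecard|run-card|evaluator-).*\.md$/i,
  /^docs\/evaluator-.*\.md$/i,
];
const EVALUATOR_OWNER_MARKER = /<!--\s*harness-owner:\s*evaluator\s*-->/i;
const EVALUATOR_APPROVAL_MARKER = /<!--\s*harness-edit-approved:\s*[^>]+-->/i;

// Hypothetical reduction: returns 'error', 'warn', or null (no violation).
function evaluatorSeverity(content, filename = '') {
  const normalized = filename.replace(/\\/g, '/'); // tolerate Windows separators
  if (!EVALUATOR_ARTIFACT_PATHS.some((re) => re.test(normalized))) return null;
  const owned = EVALUATOR_OWNER_MARKER.test(content);
  const approved = EVALUATOR_APPROVAL_MARKER.test(content);
  if (owned && !approved) return 'error'; // owned file edited without approval
  if (owned && approved) return null;     // owned file with explicit approval
  return 'warn';                          // evaluator-looking path, no owner marker
}

console.log(evaluatorSeverity('# Run card', 'experiment/run-card-01.md'));
// → warn
console.log(evaluatorSeverity('<!-- harness-owner: evaluator -->\n# Scorecard', 'docs/evaluator-notes.md'));
// → error
```

An approval marker on a file without the owner marker still produces a warning, since only the owned-and-approved combination is treated as clean.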
504
789
  // ─── Markdown lint (R6 / L3-8) ───────────────────────────────────────
505
790
 
506
791
  /**
@@ -640,6 +925,10 @@ function isStateMarkdownFile(file) {
640
925
  return /(?:^|\/)(?:docs|\.harness)\/(?:project-state|features|dependency-map|project-brief|failure-patterns)\.md$/.test(file);
641
926
  }
642
927
 
928
+ function isProjectBriefFile(file) {
929
+ return /(?:^|\/)(?:docs|\.harness)\/project-brief\.md$/.test(file);
930
+ }
931
+
643
932
  function isScannableForSecrets(file) {
644
933
  return /\.(js|ts|jsx|tsx|json|jsonc|ya?ml|env|sh|py|java|md|properties|toml)$/i.test(file)
645
934
  && !/\.lock$/.test(file);
@@ -672,6 +961,7 @@ function runGuard({ files, cwd = process.cwd() }) {
672
961
  if (isPublicPackageFile(rel)) {
673
962
  all.push(...checkPublicBoundary(content, rel));
674
963
  }
964
+ all.push(...checkEvaluatorArtifact(content, rel));
675
965
  if (file.endsWith('.md') && isStateMarkdownFile(file)) {
676
966
  all.push(...lintMarkdownTables(content, rel));
677
967
  }
@@ -682,12 +972,17 @@ function runGuard({ files, cwd = process.cwd() }) {
682
972
  const base = path.basename(file);
683
973
  all.push(...checkStateFile(content));
684
974
  all.push(...checkReviewerHandoff(content));
975
+ all.push(...checkStoryContracts({ projectState: content }));
685
976
  all.push(...checkIntegrationDoD(content));
977
+ all.push(...checkSmokeEvidence(content));
686
978
  all.push(...checkEnvSeal(content));
687
979
  if (STATE_LINE_LIMITS[base]) {
688
980
  all.push(...lintLineLimit(content, STATE_LINE_LIMITS[base], rel));
689
981
  }
690
982
  }
983
+ if (isProjectBriefFile(file)) {
984
+ all.push(...checkScopeSplitApproval({ projectBrief: content }));
985
+ }
691
986
  }
692
987
 
693
988
  const errorCount = all.filter((v) => v.severity === 'error').length;
@@ -699,11 +994,16 @@ module.exports = {
699
994
  scanSecrets,
700
995
  checkStateFile,
701
996
  checkReviewerHandoff,
997
+ checkStoryContracts,
702
998
  checkLearnCompletion,
703
999
  checkStateSync,
1000
+ checkScopeSplitApproval,
1001
+ checkRecentChangesIntegrity,
704
1002
  checkIntegrationDoD,
1003
+ checkSmokeEvidence,
705
1004
  checkEnvSeal,
706
1005
  checkPublicBoundary,
1006
+ checkEvaluatorArtifact,
707
1007
  lintMarkdownTables,
708
1008
  lintLineLimit,
709
1009
  checkInstructionBudget,