npm - @kodevibe/harness - Versions diffs - 0.9.6 → 0.11.0 - Mend

@kodevibe/harness 0.9.6 → 0.11.0


Files changed (15)

package/README.ko.md +32 -13
package/README.md +32 -13
package/harness/agents/lead.md +31 -8
package/harness/agents/pm.md +42 -35
package/harness/agents/reviewer.md +25 -3
package/harness/core-rules.md +28 -14
package/harness/project-brief.md +38 -28
package/harness/project-state.md +31 -5
package/harness/skills/breakdown.md +1 -0
package/harness/skills/pivot.md +2 -0
package/harness/skills/setup.md +19 -1
package/harness/skills/state-check.md +15 -2
package/harness/skills/wrap-up.md +12 -0
package/package.json +4 -2
package/src/init.js +1 -1

package/README.ko.md CHANGED

@@ -13,6 +13,10 @@
 AI 코딩 에이전트를 위한 프로덕션급 가드레일. 컨텍스트 부패를 방지하고, 프로젝트 방향을 강제하며, 세션 간 상태를 유지합니다. **Copilot, Claude, Cursor, Codex, Windsurf, Gemini** 지원. 의존성 제로.
+> **에코시스템 내 위치.** kode:harness는 **kode:vibe** 에코시스템의 *실행(execution)* 레이어입니다 — 계획 레이어(PRD / 아키텍처 / ARB)와 인프라 레이어(CI / 런타임) 사이에 위치하며, 코딩 중 AI의 방향을 잡아주는 역할을 합니다. 다른 레이어는 선택이며, kode:harness만 독립적으로 쓸 수 있습니다.
+> **Pre-1.0 안정성 고지.** 0.x 활발 개발 단계. CLI 플래그·상태 파일·스킬/에이전트 계약은 minor 버전 간 변경될 수 있습니다. v1.0.0은 IDE 호환·외부 production 사용이 30일 이상 동결된 후에야 컷합니다.
 ---
 ## 빠른 시작
@@ -28,6 +32,8 @@ npx @kodevibe/harness init          # IDE 선택
 끝입니다. 이제 AI는 영속적인 메모리, 방향 가드레일, 자기 교정 루프를 갖게 됩니다.
+v0.11부터는 **Proof-First Enforcement**가 Common Mode Confidence Loop 위에 추가됩니다. pm은 실행 가능한 proof를 계획해야 하고, lead는 증거 없이 Story를 완료 처리하지 않으며, reviewer는 proof가 없거나 실패하면 커밋 안내 전에 멈추고, state-check는 Proof Ledger 누락을 점검합니다.
 <details>
 <summary>추가 설치 옵션</summary>
@@ -71,6 +77,7 @@ kode:harness는 세 가지 메커니즘으로 해결합니다:
 | **상태 영속성** | AI가 세션 간 목표, 결정, 진행 상황을 잊는 것 |
 | **방향 가드** | AI가 프로젝트 목표에서 이탈하거나 과거 결정과 모순되는 것 |
 | **실패 패턴** | AI가 세션 간 같은 실수를 반복하는 것 |
+| **Proof Ledger** | 테스트·스모크 증거 없이 진행됐다고 착각하는 것 |
 ---
@@ -90,7 +97,10 @@ kode:harness는 세 가지 메커니즘으로 해결합니다:
 | 기능 | 설명 |
 |------|------|
 | 🛡️ **Direction Guard** | 모든 코딩 요청을 프로젝트 목표/비목표와 대조 후 실행 |
-| 🧭 **Navigation Dispatcher** | 5개 파이프라인을 따라 다음 단계 프롬프트를 자동 안내 |
+| 🧭 **Quiet Navigator** | 현재 목표와 필요한 증거 중심의 짧은 다음 행동 안내 |
+| ✅ **Evidence-Gated Progress Board** | 테스트/스모크 증거가 있어야 Planned → Proof Pending → Proven으로 진행 |
+| 📒 **Proof Ledger** | review/wrap-up 출력에 명령어, 결과, 관찰 증거를 짧게 기록 |
+| 🔒 **Proof-First Enforcement** | pm/lead/reviewer/state-check가 모호한 계획, 증거 없는 완료, 실패한 테스트, 누락된 proof 기록을 막음 |
 | 📝 **상태 영속성** | LLM 세션 간 프로젝트 지식을 유지하는 5개 마크다운 파일 |
 | 🔄 **5개 파이프라인** | 🟢 신규 → 🔵 계속 → 🔴 버그 수정 → 🟡 방향 전환 → 🟣 Crew 기반 |
 | 🛠️ **11개 스킬** | 단계별 절차: setup, debug, breakdown, review, pivot, state-check 등 |
@@ -112,14 +122,16 @@ npx @kodevibe/harness validate  # state 파일에 실제 내용 확인
 ## 지원 IDE
-| IDE | 디스패처 (always-on) | 스킬 | 에이전트 |
-|-----|---------------------|------|----------|
-| **VS Code Copilot** | `.github/copilot-instructions.md` | `.github/skills/*/SKILL.md` | `.github/agents/*.agent.md` |
-| **Claude Code** | `CLAUDE.md` (+ `.claude/rules/core.md`) | `.claude/skills/*/SKILL.md` | `.claude/agents/*.md` |
-| **Cursor** | `.cursor/rules/core.mdc` (+ `AGENTS.md`) | `.agents/skills/*/SKILL.md` (cross-tool) | `.cursor/rules/<agent>.mdc` |
-| **Codex** | `AGENTS.md` | `.agents/skills/*/SKILL.md` | `.codex/agents/*.toml` |
-| **Windsurf** | `.windsurf/rules/core.md` | `.windsurf/skills/*/SKILL.md` | *(스킬로 설치)* |
-| **Antigravity** | `AGENTS.md` | `.agents/skills/*/SKILL.md` (cross-tool) | `.agents/rules/<agent>.md` |
+어떤 걸 고를지 모르겠다면? 이미 코딩하고 있는 IDE를 그대로 고르면 됩니다 — 모든 경로는 동일한 `harness/` 소스에서 생성되므로 스킬/에이전트 내용은 동일합니다:
+| IDE | 이럴 때 고르세요 | 디스패처 (always-on) | 스킬 | 에이전트 |
+|-----|--------------------|---------------------|------|----------|
+| **VS Code Copilot** | VS Code를 주로 쓰고 GitHub Copilot Chat 사용. | `.github/copilot-instructions.md` | `.github/skills/*/SKILL.md` | `.github/agents/*.agent.md` |
+| **Claude Code** | 터미널/Claude Code CLI 선호. | `CLAUDE.md` (+ `.claude/rules/core.md`) | `.claude/skills/*/SKILL.md` | `.claude/agents/*.md` |
+| **Cursor** | Cursor 에디터 사용. | `.cursor/rules/core.mdc` (+ `AGENTS.md`) | `.agents/skills/*/SKILL.md` (cross-tool) | `.cursor/rules/<agent>.mdc` |
+| **Codex** | OpenAI Codex CLI 서브에이전트 사용. | `AGENTS.md` | `.agents/skills/*/SKILL.md` | `.codex/agents/*.toml` |
+| **Windsurf** | Codeium/Windsurf 사용. | `.windsurf/rules/core.md` | `.windsurf/skills/*/SKILL.md` | *(스킬로 설치)* |
+| **Antigravity** | Google Antigravity / Gemini 사용. | `AGENTS.md` | `.agents/skills/*/SKILL.md` (cross-tool) | `.agents/rules/<agent>.md` |
 모든 IDE에 `docs/` 디렉토리에 State 파일(`project-state.md`, `project-brief.md`, `features.md`, `failure-patterns.md`, `dependency-map.md`)도 함께 설치됩니다.
@@ -244,7 +256,7 @@ npx @kodevibe/harness init --team
 ## Iron Laws
-10개 규칙이 모든 스킬과 에이전트에 적용됩니다. kode:harness로 관리되는 프로젝트의 품질 근간을 형성합니다.
+11개 규칙이 모든 스킬과 에이전트에 적용됩니다. kode:harness로 관리되는 프로젝트의 품질 근간을 형성합니다.
 | # | 규칙 | 적용 대상 |
 |---|------|----------|
@@ -256,6 +268,9 @@ npx @kodevibe/harness init --team
 | 6 | **의존성 맵** — 모듈 추가/수정 → 같은 커밋에서 `dependency-map.md` 업데이트 | `reviewer`, `wrap-up` |
 | 7 | **기능 레지스트리** — 새 기능 → 같은 커밋에서 `features.md`에 등록 | `reviewer`, `wrap-up` |
 | 8 | **세션 핸드오프** — 세션 종료 → `project-state.md` Quick Summary 업데이트 | `wrap-up` |
+| 9 | **Common First** — Common 모드는 crew 산출물 없이 동작해야 하며 crew-only 로직은 marker block 안에만 둠 | 모든 에이전트 |
+| 10 | **Self-Verify** — DONE 보고 전 `state-check` 실행. FAIL이면 DONE 금지 | 모든 에이전트 |
+| 11 | **Proof First** — passing proof 없이는 Story를 Proven, Reviewed, DONE, commit guidance로 이동 금지 | `pm`, `lead`, `reviewer`, `state-check` |
 ---
@@ -292,7 +307,7 @@ Bootstrap이 `docs/crew/`, `docs/PM/`, `docs/Analyst/`, `docs/ARB/`에서 crew
 | 의존성 | Node 20+ | Bun + Node + Playwright | Node 18+ | **Zero** |
 | IDE 지원 | 20+ (installer) | 5 (setup --host) | 13 (runtime select) | 6 (네이티브 포맷) |
 | 방향 관리 | ❌ | ❌ | ❌ | ✅ (Direction Guard + pivot + Decision Log) |
-| Iron Laws (코드 품질 규칙) | ❌ | ❌ | ❌ | ✅ (10개 규칙이 스킬에 임베딩) |
+| Iron Laws (코드 품질 규칙) | ❌ | ❌ | ❌ | ✅ (11개 규칙이 스킬에 임베딩) |
 | Cold start | ❌ | ❌ | `/gsd-new-project` | ✅ (`setup` 스킬) |
 | 태스크당 컨텍스트 | 4-6 파일 | 1 파일 | 매번 200k 플랜 | **2-3 파일 (136줄 디스패처)** |
@@ -300,7 +315,7 @@ Bootstrap이 `docs/crew/`, `docs/PM/`, `docs/Analyst/`, `docs/ARB/`에서 crew
 ## 로드맵
-kode:harness는 현재 **v0.9.6** — init이 덮어쓰는 IDE 파일을 `.harness/init-backups/<timestamp>/...`에 백업하고, 배포 파일의 pm 네이밍과 LICENSE 브랜딩을 정리했습니다. v0.9.5는 경량성 예산 재교정(40K/1500/2500)과 Iron Laws/디스패처 정합성 수정입니다.
+kode:harness는 현재 **v0.11.0** — Common Mode의 proof-first 동작을 강제합니다. Proof Plan은 정확한 명령/체크여야 하고, Story 완료는 증거 없이는 막히며, reviewer는 proof 실패/누락 시 멈추고, state-check는 Proof Ledger 누락을 점검합니다.
 | 단계 | 버전 | 상태 | 초점 |
 |------|------|------|------|
@@ -312,11 +327,15 @@ kode:harness는 현재 **v0.9.6** — init이 덮어쓰는 IDE 파일을 `.harne
 | **Self-Verify** | v0.9.2 | ✅ 완료 | state-check 스킬, Iron Law #10, Confirmation Gate Defaults, 멀티 IDE 수정, CI Artifact Index |
 | **IDE Realignment** | v0.9.4 | ✅ 완료 | 6개 IDE 어댑터 공식 문서 정합; Antigravity `.agents/`, Codex `.toml`, Cursor `.cursor/rules/`; release 스킬 Step 6.5 + qa-check.sh §10 회귀 가드 |
 | **Consistency & Budget** | v0.9.5 | ✅ 완료 | reviewer.md Iron Laws stale 수정, 디스패처 동기화, 경량성 예산 재교정(40K/1500/2500) 및 근거 기록 |
-| **Safety & Branding** | v0.9.6 | ✅ 현재 | init overwrite 백업, 배포 파일 pm 네이밍 정리, LICENSE 브랜딩 정리 |
+| **Drift Guard & Positioning** | v0.9.7 | ✅ 완료 | `harness/`↔`.github/` drift 가드, reviewer working-proof 게이트, kode:vibe 위치 안내, IDE 선택 가이드, project-brief 예시 |
+| **Confidence Loop** | v0.10.0 | ✅ 완료 | Goal Card, Quiet Navigator, Evidence-Gated Progress Board, Proof Ledger, QA/content 회귀 테스트 |
+| **Proof-First Enforcement** | v0.11.0 | ✅ 현재 | Mandatory Proof Plan, lead proof blocker, reviewer proof blocker, state-check Proof Ledger coverage |
+| **Safety & Branding** | v0.9.6 | ✅ 완료 | init overwrite 백업, 배포 파일 pm 네이밍 정리, LICENSE 브랜딩 정리 |
 | **Validation** | v1.0 | 🔜 다음 | 실사용 검증, 사용자 피드백 수집 |
 ### 다음 단계
+- [ ] 파일럿: 실제 프로젝트에서 v0.11 Common Mode proof coverage 측정
 - [ ] 파일럿: 외부 기획 산출물을 kode:harness의 🟣 파이프라인으로 실제 프로젝트에 적용
 - [ ] 실제 프로젝트에 kode:harness를 적용하고 사용 데이터 수집
 - [ ] 사용 사례 문서화: Solo vs Team, crew vs no-crew

package/README.md CHANGED

@@ -13,6 +13,10 @@
 Production-grade guardrails for AI coding agents. Prevents context rot, enforces project direction, and persists state across sessions. Works with **Copilot, Claude, Cursor, Codex, Windsurf, and Gemini**. Zero dependencies.
+> **Where this fits.** kode:harness is the *execution* layer of the **kode:vibe** ecosystem — it sits between a planning layer (PRD / architecture / ARB) and an infrastructure layer (CI / runtime). kode:harness keeps the AI on direction while you code; the other layers are optional. You can use kode:harness alone.
+> **Pre-1.0 stability notice.** Active 0.x development. CLI flags, state files, and skill/agent contracts may change between minor versions. v1.0.0 ships only after 30 days of frozen IDE compatibility matrix + external production usage.
 ---
 ## Quick Start
@@ -28,6 +32,8 @@ npx @kodevibe/harness init          # pick your IDE
 That's it. Your AI now has persistent memory, direction guardrails, and self-correction loops.
+v0.11 adds **Proof-First Enforcement** on top of the Common Mode Confidence Loop: pm must define runnable proof, lead cannot mark a Story done without passing evidence, reviewer blocks commit guidance when proof is missing or failing, and state-check audits Proof Ledger coverage.
 <details>
 <summary>More install options</summary>
@@ -71,6 +77,7 @@ kode:harness solves this with three mechanisms:
 | **State Persistence** | AI forgetting goals, decisions, and progress between sessions |
 | **Direction Guard** | AI drifting away from project goals or contradicting past decisions |
 | **Failure Patterns** | AI repeating the same mistakes across sessions |
+| **Proof Ledger** | AI claiming progress without tests, smoke proof, or user-visible evidence |
 ---
@@ -90,7 +97,10 @@ kode:harness solves this with three mechanisms:
 | Feature | Description |
 |---------|-------------|
 | 🛡️ **Direction Guard** | Every coding request is checked against project goals/non-goals before execution |
-| 🧭 **Navigation Dispatcher** | Turn-by-Turn navigation through 5 pipelines with copy-paste next-step prompts |
+| 🧭 **Quiet Navigator** | Short next-action guidance centered on current goal and required evidence |
+| ✅ **Evidence-Gated Progress Board** | Stories move from Planned → Proof Pending → Proven only when tests or smoke proof exist |
+| 📒 **Proof Ledger** | Review and wrap-up outputs record compact proof: command, result, and observation |
+| 🔒 **Proof-First Enforcement** | pm/lead/reviewer/state-check block vague plans, unproven completion, failing tests, and missing proof records |
 | 📝 **State Persistence** | 5 markdown files that persist project knowledge across LLM sessions |
 | 🔄 **5 Pipelines** | 🟢 New Dev → 🔵 Continue → 🔴 Bug Fix → 🟡 Direction Change → 🟣 Crew-Driven |
 | 🛠️ **11 Skills** | Step-by-step procedures: setup, debug, breakdown, review, pivot, state-check, and more |
@@ -110,14 +120,16 @@ npx @kodevibe/harness validate  # verify state files have real content
 ## Supported IDEs
-| IDE | Dispatcher (always-on) | Skills | Agents |
-|-----|----------------------|--------|--------|
-| **VS Code Copilot** | `.github/copilot-instructions.md` | `.github/skills/*/SKILL.md` | `.github/agents/*.agent.md` |
-| **Claude Code** | `CLAUDE.md` (+ `.claude/rules/core.md`) | `.claude/skills/*/SKILL.md` | `.claude/agents/*.md` |
-| **Cursor** | `.cursor/rules/core.mdc` (+ `AGENTS.md`) | `.agents/skills/*/SKILL.md` (cross-tool) | `.cursor/rules/<agent>.mdc` |
-| **Codex** | `AGENTS.md` | `.agents/skills/*/SKILL.md` | `.codex/agents/*.toml` |
-| **Windsurf** | `.windsurf/rules/core.md` | `.windsurf/skills/*/SKILL.md` | *(agents installed as skills)* |
-| **Antigravity** | `AGENTS.md` | `.agents/skills/*/SKILL.md` (cross-tool) | `.agents/rules/<agent>.md` |
+Not sure which to pick? Use the IDE you already code in — each install path is generated from the same `harness/` source, so the underlying skills/agents are identical:
+| IDE | Pick this if… | Dispatcher (always-on) | Skills | Agents |
+|-----|---------------|----------------------|--------|--------|
+| **VS Code Copilot** | You use VS Code daily and have GitHub Copilot Chat. | `.github/copilot-instructions.md` | `.github/skills/*/SKILL.md` | `.github/agents/*.agent.md` |
+| **Claude Code** | You prefer Claude in the terminal / Claude Code CLI. | `CLAUDE.md` (+ `.claude/rules/core.md`) | `.claude/skills/*/SKILL.md` | `.claude/agents/*.md` |
+| **Cursor** | You use Cursor as your editor. | `.cursor/rules/core.mdc` (+ `AGENTS.md`) | `.agents/skills/*/SKILL.md` (cross-tool) | `.cursor/rules/<agent>.mdc` |
+| **Codex** | You use OpenAI Codex CLI subagents. | `AGENTS.md` | `.agents/skills/*/SKILL.md` | `.codex/agents/*.toml` |
+| **Windsurf** | You use Codeium/Windsurf. | `.windsurf/rules/core.md` | `.windsurf/skills/*/SKILL.md` | *(agents installed as skills)* |
+| **Antigravity** | You use Google Antigravity / Gemini. | `AGENTS.md` | `.agents/skills/*/SKILL.md` (cross-tool) | `.agents/rules/<agent>.md` |
 All IDEs also get state files (`project-state.md`, `project-brief.md`, `features.md`, `failure-patterns.md`, `dependency-map.md`) in the `docs/` directory.
@@ -218,7 +230,7 @@ npx @kodevibe/harness init --team
 ## Iron Laws
-These 10 rules are enforced across all skills and agents. They form the quality backbone of every kode:harness project managed with harness engineering.
+These 11 rules are enforced across all skills and agents. They form the quality backbone of every kode:harness project managed with harness engineering.
 | # | Law | Enforced By |
 |---|-----|-------------|
@@ -230,6 +242,9 @@ These 10 rules are enforced across all skills and agents. They form the quality
 | 6 | **Dependency Map** — New/modified module → update `dependency-map.md` in the same commit. | `reviewer`, `wrap-up` |
 | 7 | **Feature Registry** — New feature → register in `features.md` in the same commit. | `reviewer`, `wrap-up` |
 | 8 | **Session Handoff** — Session end → update `project-state.md` Quick Summary. | `wrap-up` |
+| 9 | **Common First** — Common mode must work without crew artifacts; crew-only logic stays in crew marker blocks. | All agents |
+| 10 | **Self-Verify** — Run `state-check` before reporting DONE. FAIL blocks DONE. | All agents |
+| 11 | **Proof First** — No Story moves to Proven, Reviewed, DONE, or commit guidance without passing proof. | `pm`, `lead`, `reviewer`, `state-check` |
 ## Documentation
@@ -268,13 +283,13 @@ Original crew documents are **never modified**. Only the index and tracker are c
 | Dependencies | Node 20+ | Bun + Node + Playwright | Node 18+ | Zero |
 | IDE support | 20+ (installer) | 5 (setup --host) | 13 (runtime select) | 6 (native format) |
 | Direction management | ❌ | ❌ | ❌ | ✅ (Direction Guard + pivot + Decision Log) |
-| Iron Laws (code quality rules) | ❌ | ❌ | ❌ | ✅ (10 laws embedded in skills) |
+| Iron Laws (code quality rules) | ❌ | ❌ | ❌ | ✅ (11 laws embedded in skills) |
 | Cold start | ❌ | ❌ | `/gsd-new-project` | ✅ (`setup` skill) |
 | Context per task | 4-6 files | 1 file | Fresh 200k per plan | 2-3 files (136-line dispatcher) |
 ## Roadmap
-kode:harness is at **v0.9.6** — init now backs up overwritten IDE files under `.harness/init-backups/<timestamp>/...`, shipped pm naming is aligned, and LICENSE branding is cleaned up. v0.9.5 recalibrated lightness budgets (40K/1500/2500) and fixed Iron Laws/dispatcher consistency.
+kode:harness is at **v0.11.0** — makes Common Mode proof-first behavior enforceable: Proof Plans need exact commands/checklists, Story completion is blocked without evidence, reviewer must stop on missing/failing proof, and state-check audits Proof Ledger coverage.
 | Phase | Version | Status | Focus |
 |---|---|---|---|
@@ -286,11 +301,15 @@ kode:harness is at **v0.9.6** — init now backs up overwritten IDE files under
 | **Self-Verify** | v0.9.2 | ✅ Done | state-check skill, Iron Law #10, Confirmation Gate Defaults, multi-IDE fix, CI Artifact Index |
 | **IDE Realignment** | v0.9.4 | ✅ Done | All 6 IDE adapters aligned with official docs; Antigravity `.agents/`, Codex `.toml`, Cursor `.cursor/rules/`; release skill Step 6.5 + qa-check.sh §10 regression guards |
 | **Consistency & Budget** | v0.9.5 | ✅ Done | Iron Laws stale-copy fix (reviewer.md), dispatcher sync (core-rules.md ↔ copilot-instructions.md), lightness budgets recalibrated (40K/1500/2500) with rationale |
-| **Safety & Branding** | v0.9.6 | ✅ Current | init overwrite backups, shipped pm naming cleanup, LICENSE branding cleanup |
+| **Drift Guard & Positioning** | v0.9.7 | ✅ Done | `harness/`↔`.github/` drift detector, reviewer working-proof gate, kode:vibe positioning, IDE selection guide, project-brief example |
+| **Confidence Loop** | v0.10.0 | ✅ Done | Goal Card, Quiet Navigator, Evidence-Gated Progress Board, Proof Ledger, QA/content regression tests |
+| **Proof-First Enforcement** | v0.11.0 | ✅ Current | Mandatory Proof Plan, lead proof blockers, reviewer proof blockers, state-check Proof Ledger coverage |
+| **Safety & Branding** | v0.9.6 | ✅ Done | init overwrite backups, shipped pm naming cleanup, LICENSE branding cleanup |
 | **Validation** | v1.0 | 🔜 Next | Real-world project adoption, user feedback collection |
 ### What's Next
+- [ ] Pilot: Run v0.11 proof-first Common Mode on a real project and measure proof coverage
 - [ ] Pilot: Run external planning artifacts through kode:harness's 🟣 pipeline on a real project
 - [ ] Adopt kode:harness in real projects and collect usage data
 - [ ] Document case studies: solo vs team, crew vs no-crew
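
Note on the Proof Ledger row added to the feature table above: the README describes a compact record of command, result, and observation, and the reviewer.md hunk further down shows a three-column table. A minimal sketch of how such a table could be rendered, assuming a hypothetical `renderProofLedger` helper and hypothetical entry fields (`evidence`, `passed`, `detail`); the `❌ fail` label is also an assumption, since the diff only shows passing rows. None of this is code shipped in the package.

```javascript
// Hypothetical helper, not part of @kodevibe/harness.
// Renders Proof Ledger entries as the three-column markdown table
// that the reviewer.md diff uses (Evidence | Result | Command / Observation).
function renderProofLedger(entries) {
  const header = [
    "| Evidence | Result | Command / Observation |",
    "|----------|--------|-----------------------|",
  ];
  const rows = entries.map(({ evidence, passed, detail }) => {
    // "❌ fail" is an assumed label; the source only shows "✅ pass".
    const result = passed ? "✅ pass" : "❌ fail";
    return `| ${evidence} | ${result} | ${detail} |`;
  });
  return [...header, ...rows].join("\n");
}

const ledger = renderProofLedger([
  { evidence: "Unit tests", passed: true, detail: "`npm test`" },
  { evidence: "Smoke proof", passed: true, detail: "`curl /health → 200`" },
]);
console.log(ledger);
```

Keeping one row per piece of evidence mirrors the "keep it short" guidance in the reviewer hunk.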

package/harness/agents/lead.md CHANGED

@@ -67,7 +67,12 @@ User request: "next task", "current status", "story done", "new sprint", "scope
 After every status check, recommend the next action based on current context:
 1. Read `docs/project-state.md`, `docs/features.md`, `docs/project-brief.md`, `docs/failure-patterns.md`
-2. Determine the project phase and recommend accordingly:
+2. MUST render a compact **Evidence-Gated Progress Board** before recommending action:
+   - Goal: one-line Goal Card from project-brief or current Story
+   - State: `Planned | Implementing | Proof Pending | Proven | Reviewed | Blocked`
+   - Evidence: last passing test/smoke proof, or `missing`
+   - Blocker: one line, or `none`
+3. Determine the project phase and recommend accordingly:
 | Situation | Recommendation |
 |-----------|---------------|
@@ -83,10 +88,12 @@ After every status check, recommend the next action based on current context:
 | All ARB Fail items resolved | → "ARB Fail items all resolved — deployment readiness can be checked" |
 <!-- CREW_MODE_END -->
-3. Format the recommendation as a 🧭 Next Step block:
+4. Format the recommendation as a quiet 🧭 Next Step block. Prefer one next action and one required evidence item; do not restate the full pipeline unless the user asks.
 ```
 ---
 🧭 Next Step
+→ Goal: [Goal Card in one line]
+→ Evidence: [test command / smoke proof / state-check needed]
 → Next: `[skill or agent name]` (슬래시 메뉴에서 선택하거나, 채팅에 아래 프롬프트 입력)
 → Prompt: "[copy-paste ready prompt]"
 → Why: [one-sentence reason]
@@ -96,12 +103,16 @@ After every status check, recommend the next action based on current context:
 ```
 **Request: "story done" / "S{N}-{M} done"**
-1. Update the Story status to `done` in docs/project-state.md
-2. Add completion record to "Recent Changes" section
-3. **Commit/Push check**: If changes are uncommitted, remind:
+1. Read the Story's Proof Plan and current Evidence-Gated Progress Board row.
+2. Require proof before marking done:
+   - Passing proof → set state to `Proven`, update Story status to `✅ done`, append Proof Ledger / Evidence Summary row.
+   - Missing proof → keep state `Proof Pending`, output `[BLOCKER: PROOF_MISSING]`, and do not advance to the next Story.
+   - Failing proof → keep state `Implementing`, output `[BLOCKER: PROOF_FAILING]`, and fix within current Story.
+3. Add completion record to "Recent Changes" section only after passing proof.
+4. **Commit/Push check**: If changes are uncommitted, remind:
    - "⚠️ S{N}-{M} 완료 — 커밋하셨나요? `git add <files> && git commit -m \"S{N}-{M}: {description}\"`"
    - Team mode: Also remind to push — "팀원에게 공유하려면 `git push origin {branch}` 실행"
-4. Guide to next Story if available
+5. Guide to next Story only after proof passes.
 **Request: "new story" / "next task"**
 1. Find next `todo` Story in docs/project-state.md
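
Note on the "story done" hunk above: the new step 2 is a three-way decision on the Story's proof. A minimal sketch of that decision, assuming a hypothetical `resolveStoryDone` function and a hypothetical `proof` object with a `passed` flag. The state names and blocker strings are copied from the hunk; everything else is illustrative.

```javascript
// Hypothetical sketch of the lead agent's "story done" gate.
// `proof` is null/undefined when no evidence exists, else { passed: boolean }.
function resolveStoryDone(proof) {
  if (!proof) {
    // Missing proof: stay in Proof Pending and do not advance.
    return { state: "Proof Pending", blocker: "[BLOCKER: PROOF_MISSING]", advance: false };
  }
  if (!proof.passed) {
    // Failing proof: go back to Implementing and fix within the current Story.
    return { state: "Implementing", blocker: "[BLOCKER: PROOF_FAILING]", advance: false };
  }
  // Passing proof: the Story may be marked done and the next Story offered.
  return { state: "Proven", blocker: null, advance: true };
}

console.log(resolveStoryDone(null));
console.log(resolveStoryDone({ passed: false }));
console.log(resolveStoryDone({ passed: true }));
```

Only the last branch sets `advance` to true, which matches steps 3 and 5 of the hunk: the completion record and the next-Story guidance both wait for passing proof.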
@@ -136,9 +147,10 @@ When invoked after pm approval, verify that pm wrote state files correctly:
 When a Story contains multiple Tasks/Waves (from breakdown):
 - Guide implementation **one Wave at a time** (not one file at a time, not all at once)
-- After each Wave is implemented, **run tests (or invoke `reviewer` for a quick check)** to verify the Wave is clean before proceeding
+- After each Wave is implemented, **run tests or smoke proof** to verify the Wave is clean before proceeding
+- Record a mini Proof Ledger row inline: Evidence, Result, Command / Observation
 - Only after verification passes, prompt: "Wave {N} 완료 (tests pass). Wave {N+1}로 넘어갈까요?"
-- If tests fail → fix within the current Wave before moving on. Do NOT advance to the next Wave with failing tests.
+- If tests fail → output `[BLOCKER: WAVE_PROOF_FAILING]`, fix within the current Wave, and do NOT advance.
 - This prevents context overload from modifying too many modules simultaneously
 - Exception: If a Wave contains only a single trivial task, it may be combined with the next Wave
@@ -158,6 +170,17 @@ When a Story contains multiple Tasks/Waves (from breakdown):
 ```
 ## Sprint Status
+### Goal Card
+- Goal: {current project/story goal}
+- First usable result: {smallest working outcome}
+- Required proof: {test command / smoke proof}
+### Evidence-Gated Progress Board
+| Story | State | Evidence | Blocker |
+|-------|-------|----------|---------|
+| S{N}-1 | Reviewed | `npm test` ✅ | none |
+| S{N}-2 | Proof Pending | missing | needs reviewer proof |
 Sprint: {N} — {theme}
 Progress: {done}/{total} Stories

package/harness/agents/pm.md CHANGED

@@ -2,9 +2,7 @@
 ## Role
-Feature planning and dependency management.
-Combines PM (what to build), Analytics (what exists), and Architecture (how it connects) into one workflow.
-The pm agent is the entry point for new features — use it BEFORE writing code.
+Feature planning and dependency management. Use before writing new feature code.
 ## Invoked By
@@ -36,7 +34,9 @@ One of:
 - **New Feature**: "I want to add [feature description]"
 - **Architecture Query**: "What depends on [module]?" / "Show me the current module map"
 - **Refactor Plan**: "I need to refactor [module/area]"
+<!-- CREW_MODE_START -->
 - **Crew-Driven Feature**: "crew 산출물을 기반으로 [기능]을 계획해줘" — when external planning artifacts exist in `docs/crew/`
+<!-- CREW_MODE_END -->
 ## Procedure
@@ -61,30 +61,24 @@ Read `docs/agent-memory/pm.md` for past learnings:
 Apply these insights when creating the implementation plan. If the memory file is empty or contains only placeholders, skip this step.
-### Step 0.7: Feature Roadmap Planning (Draft & Correct)
+### Step 0.7: Roadmap Draft
-**Trigger**: `docs/project-brief.md`에 `## Feature Roadmap` 섹션이 없을 때
+**Trigger**: `docs/project-brief.md`에 `## Feature Roadmap`이 없을 때
 <!-- CREW_MODE_START -->
 **Crew 파이프라인(🟣)**: FR목록이 이미 Roadmap 역할을 하므로 이 Step을 skip한다.
 <!-- CREW_MODE_END -->
 1. `docs/project-brief.md`의 Goals + `docs/dependency-map.md`의 현재 모듈 구조를 읽는다
-2. Phase 구조의 Feature Roadmap **초안**을 생성한다:
+2. Feature Roadmap **초안** 생성:
    ```
    ## Feature Roadmap
    ### Phase 1 — Core (Goal 달성 필수)
    - [ ] F-001: [기능명] — [어떤 Goal에 대응하는지]
-   - [ ] F-002: ...
    ### Phase 2 — Enhancement (사용성/완성도)
-   - [ ] F-003: ...
-   ### Phase 3 — Nice-to-have
-   - [ ] F-004: ...
+   - [ ] F-002: ...
    ```
-3. 사용자에게 초안을 제시한다: **"이 Feature Roadmap을 검토하고, 추가/삭제/순서 변경을 알려주세요."**
+3. 사용자에게 초안 제시: **"이 Feature Roadmap을 검토하고, 추가/삭제/순서 변경을 알려주세요."**
 4. 사용자 교정을 반영한 최종 Roadmap을 `docs/project-brief.md`에 `## Feature Roadmap` 섹션으로 기록한다
 5. Feature Roadmap이 확정되면 아래 "For New Feature" 절차로 진행한다
@@ -96,15 +90,10 @@ Apply these insights when creating the implementation plan. If the memory file i
    If `docs/project-brief.md` contains a `## Crew Artifact Index` table with entries:
    a. **Read PRD** (path from Artifact Index):
-      - Extract functional requirements (FR-001, FR-002, ...)
-      - Extract priority (P0, P1, P2)
-      - Extract acceptance criteria for each FR
-      - Extract non-functional requirements (performance, security, scalability)
+      - Extract FRs, priorities, acceptance criteria, and NFRs
    b. **Read Product Brief** (path from Artifact Index):
-      - Extract user personas → tag each Story with target persona
-      - Extract user journey steps → map to implementation order
-      - Extract KPIs → attach as acceptance criteria to relevant Stories
+      - Map personas, journey steps, and KPIs to Stories
    c. **Map FR → Stories**:
       - Each FR-NNN generates 1+ Stories
@@ -128,7 +117,7 @@ Apply these insights when creating the implementation plan. If the memory file i
    If no Crew Artifact Index → proceed with normal user-driven planning below.
 <!-- CREW_MODE_END -->
-3. **Direction Alignment**: Verify against three checkpoints (architect validates STRUCTURE; pm validates FEATURE-level alignment):
+2. **Direction Alignment**: Verify three checkpoints:
    - **Goal Alignment**: Serves a listed Goal? If no clear link → **warn but proceed**. Add `⚠️ Goal Alignment: [feature] does not directly map to listed goals` under `### Direction Alignment` in the plan output.
    - **Non-Goal Violation**: Falls into Non-Goals? → **stop and ask the user**. May need `pivot`.
    - **Decision Consistency**: Contradicts a Decision Log entry? → **stop and warn**. Recommend `pivot`.
@@ -142,12 +131,13 @@ Apply these insights when creating the implementation plan. If the memory file i
 9. Register NEW modules from breakdown output in `docs/dependency-map.md` (so check-impact reads the updated map)
 10. Run **check-impact** skill for each existing module being modified (pm calls both skills independently — breakdown does NOT invoke check-impact internally. Ordering: breakdown first → register modules → check-impact second.)
 11. Check `docs/failure-patterns.md` for relevant past mistakes
-12. Produce implementation plan (see Output Format)
-12. **Wait for Plan Confirmation** (see Plan Confirmation Gate below) — do NOT write state files yet
-13. **After user approves** → Update `docs/project-state.md` with the new Story
-14. **After user approves** → Update `docs/features.md` with the new feature entry
+12. Produce a **Goal Card** (6 lines max) and implementation plan.
+13. Produce a **Proof Plan** per Story: exact test/smoke command or checklist; never TBD. No path → add Story 0: set up test/smoke proof. Any `TBD`/blank → `[ERROR: PROOF_PLAN_UNDEFINED]` and STOP before state writes.
+14. **Wait for Plan Confirmation** (see Plan Confirmation Gate below) — do NOT write state files yet
+15. **After user approves** → Update `docs/project-state.md` with the new Story
+16. **After user approves** → Update `docs/features.md` with the new feature entry
-State file writes (Steps 13-14) execute ONLY after user approval. Rejected plans never touch state.
+State writes (Steps 15-16) execute ONLY after user approval. Rejected plans never touch state.
 ### For Architecture Query
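
Note on the new step 13 in the hunk above: a Proof Plan entry that is blank or `TBD` stops the plan before any state write. A minimal sketch of that check, assuming a hypothetical `validateProofPlan` function and hypothetical row fields (`story`, `proof`). Treating `TBD` case-insensitively is an assumption; the error string is copied from the hunk.

```javascript
// Hypothetical validator, not part of @kodevibe/harness.
// Each row pairs a Story ID with its proof command or checklist text.
function validateProofPlan(rows) {
  const undefinedStories = rows
    .filter(({ proof }) => {
      const text = (proof ?? "").trim();
      // Blank or containing "TBD" counts as undefined (case-insensitive by assumption).
      return text === "" || /\bTBD\b/i.test(text);
    })
    .map(({ story }) => story);

  return undefinedStories.length === 0
    ? { ok: true }
    : { ok: false, error: "[ERROR: PROOF_PLAN_UNDEFINED]", stories: undefinedStories };
}

console.log(validateProofPlan([
  { story: "S1-1", proof: "npm test" },
  { story: "S1-2", proof: "TBD" },
  { story: "S1-3", proof: "" },
]));
```

Returning the offending Story IDs alongside the error makes it easy to show the user which rows need an exact command or checklist before confirmation.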
@@ -173,28 +163,31 @@ State file writes (Steps 13-14) execute ONLY after user approval. Rejected plans
 ## Plan Confirmation Gate
-After producing ANY plan (New Feature, Refactor, or Crew-Driven), **do NOT proceed to coding immediately**.
+After any plan, **do NOT proceed to coding immediately**.
 1. Present the complete plan to the user
-2. Ask: **"이 경로(Plan)대로 구현을 시작할까요?"** (or equivalent confirmation request)
+2. Ask: **"이 경로와 Proof 명령으로 검증 가능할까요?"**
 3. Wait for explicit user approval (`Yes`, `Go`, `진행해줘`, etc.)
 4. **Only after approval** → execute **MANDATORY State File Write** (below), then output 🧭 Next Step pointing to `lead`
-5. If the user requests changes → revise the plan and re-confirm. **No state files are written until approval.**
+5. If the user requests changes → revise and re-confirm. **No state files are written until approval.**
 ### ⚠️ MANDATORY: Post-Approval State File Write
-**This section executes IMMEDIATELY after user approval. Do NOT skip. Do NOT output the 🧭 Next Step block until ALL writes below are complete.**
-After user approves the plan, perform these writes in order:
+After user approves the plan, perform all writes before 🧭:
 1. **`docs/features.md`** — Register new feature(s):
    - Add row(s) to the Feature Registry table
+<!-- CREW_MODE_START -->
    - Include FR reference (if crew-driven), status = `planned`
+<!-- CREW_MODE_END -->
 2. **`docs/project-state.md`** — Create Sprint/Stories:
    - If no Sprint exists, create Sprint 1 with theme
    - Add Story rows to the Story Status table (status = `⬜ todo`)
-   - Each Story: ID (S{N}-{M}), Title, Status, Scope (files/modules), FR reference (if crew-driven)
+   - Each Story: ID (S{N}-{M}), Title, Status, Scope (files/modules), Proof Plan
+<!-- CREW_MODE_START -->
+   - If crew-driven, include FR reference
+<!-- CREW_MODE_END -->
    - Update Quick Summary section
 3. **`docs/dependency-map.md`** — Register new modules (if any):
@@ -208,7 +201,7 @@ After user approves the plan, perform these writes in order:
    - ARB Fail Resolution: fill Story column with mapped Story IDs
 <!-- CREW_MODE_END -->
-**Completion Check**: Before outputting 🧭, verify:
+**Completion Check**: Verify:
 - [ ] features.md has new feature row(s)
 - [ ] project-state.md has Story rows with `⬜ todo` status
 - [ ] dependency-map.md has new module rows (if plan introduces new modules)
@@ -235,11 +228,25 @@ After the Post-Approval state writes complete, run the `state-check` skill:
 **Scope**: [modules affected]
 **Risk**: Low | Medium | High
+### Goal Card
+- Goal: [project goal served]
+- First usable result: [smallest outcome]
+- Non-goal boundary: [not included]
+- Required proof: [test/smoke/manual]
+- Risk: [highest uncertainty]
+- Next action: [one concrete action]
 ### Architecture Impact
 - New modules: [list]
 - Modified modules: [list]
 - Unchanged dependents that need testing: [list]
+### Proof Plan
+| Story | Required Evidence | Command / Manual Proof |
+|-------|-------------------|------------------------|
+| S{N}-0 | Proof setup, if needed | `npm test` / `npm run smoke` / manual checklist |
+| S{N}-{M} | Tests / smoke / manual | exact command/checklist; never TBD |
 ### Implementation Plan
 [Output from breakdown skill]

package/harness/agents/reviewer.md CHANGED

@@ -2,8 +2,7 @@
 ## Role
-Review code changes before commit or PR for quality, security, and test integrity.
-Finds issues and auto-fixes where safe, escalates where not.
+Review changes before commit/PR for quality, security, tests. Auto-fix safe issues; escalate the rest.
 ## Invoked By
@@ -94,6 +93,23 @@ If neither `## CI Artifact Index` nor `.harness/ci-index.md` is present → skip
 - [ ] New features have tests
 - [ ] Existing tests pass
+**Verification is a gate, not a suggestion.** Before continuing to Step 4, the reviewer must include concrete working proof:
+- Run the project's test/verification command when available (for example `npm test`, `pnpm test`, `pytest`, `go test ./...`, or the command recorded in docs/project-brief.md / package scripts).
+- If the change is user-facing and tests do not exercise the behavior, include a minimal smoke proof (command, URL, screenshot/manual action, or observed output).
+- If any existing test fails → output `[BLOCKER: TESTS_FAILING]`. STOP before Step 4.
+- If a Proof Plan command cannot run → output `[BLOCKER: PROOF_COMMAND_INVALID]` with the command. STOP.
+- If test files exist but no test command exists → output `[BLOCKER: NO_TEST_COMMAND]`. STOP.
+- If no proof path exists → output `[BLOCKER: NO_PROOF_STRATEGY]` and `[BLOCKER: WORKING_PROOF_MISSING]`. STOP.
+Record the result as a **Proof Ledger** entry. Keep it short:
+| Evidence | Result | Command / Observation |
+|----------|--------|-----------------------|
+| Unit tests | ✅ pass | `npm test` |
+| Smoke proof | ✅ pass | `curl /health → 200` |
+If state files are in scope, write/request Proof Ledger / Evidence Summary immediately after proof passes.
 **Step 4: Security Check (secure skill)**
 - [ ] No credentials, .env, or temp files in staging (FP-004)
 - [ ] No hardcoded API keys or passwords
@@ -169,11 +185,13 @@ After running state-check, also verify:
 For each missing update: flag as `[STATE-AUDIT]` in the output and provide the exact update that should be made.
 **Severity**:
 - Missing dependency-map or features.md entries for new modules/features are **blockers** — fix before commit.
-- `[STATE-AUDIT: FR-COVERAGE]` flags (features.md status ↔ Story completion mismatch) are **blockers**; update the features.md status before committing. The fix takes about 30 seconds, and deferring it to wrap-up leaves FR tracking out of sync with reality.
+- `[STATE-AUDIT: FR-COVERAGE]` flags (features.md status ↔ Story completion mismatch) are **blockers**; update the features.md status before committing.
 - Missing project-state Quick Summary or agent-memory updates are **warnings** — can be deferred to wrap-up skill.
 **Step 9: Commit Guidance**
+Commit-message-only requests are guidance; provide only after proof passes.
 When review result is DONE or DONE_WITH_CONCERNS (no blockers):
 1. **Commit message format**: `S{N}-{M}: {short description}`
@@ -206,6 +224,8 @@ If review is BLOCKED → do NOT suggest commit. Fix first.
 ### Passed Items
 - Architecture rules: ✅
 - Test integrity: ✅ / ⚠️ (detail)
+- Working proof: command/evidence + PASS result
+- Proof Ledger: compact table with evidence, result, and command/observation
 - Security check: ✅ / ❌ (detail)
 - Failure pattern check: ✅ / ⚠️ (FP-NNN)
 <!-- CREW_MODE_START -->
@@ -234,6 +254,8 @@ These rules are enforced during every review. The full Iron Laws (10) are define
 ### Completion Protocol
 Report using: **DONE** | **DONE_WITH_CONCERNS** | **BLOCKED** | **NEEDS_CONTEXT**
+`DONE` and `DONE_WITH_CONCERNS` require: tests pass, working proof is shown, and no blocker remains. If tests fail or working proof is missing, report `BLOCKED`.
 ### Concreteness
 - Specify exact file paths and line numbers
 - Quote test names and error messages on failure
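Note on the reviewer.md verification gate: the blocker rules added in Step 3 can be read as a small decision function. The sketch below is illustrative only. The function name `evaluateProofGate` and the input field names are hypothetical, and the order in which blockers are collected is an assumption, since the diff lists the rules without ranking them.

```javascript
// Hypothetical sketch of the Step 3 proof gate described in reviewer.md.
// Input field names are illustrative, not part of the package.
function evaluateProofGate(facts) {
  const blockers = [];

  // Any failing existing test blocks the review.
  if (facts.failingTests > 0) {
    blockers.push('[BLOCKER: TESTS_FAILING]');
  }
  // A Proof Plan command that cannot run is reported together with the command.
  if (facts.invalidProofCommand) {
    blockers.push(`[BLOCKER: PROOF_COMMAND_INVALID] ${facts.invalidProofCommand}`);
  }
  // Test files exist but there is no command to run them.
  if (facts.hasTestFiles && !facts.hasTestCommand) {
    blockers.push('[BLOCKER: NO_TEST_COMMAND]');
  }
  // No proof path at all yields both blockers named in the rule.
  if (!facts.hasProofPath) {
    blockers.push('[BLOCKER: NO_PROOF_STRATEGY]', '[BLOCKER: WORKING_PROOF_MISSING]');
  }

  // The reviewer continues to Step 4 only when nothing blocks.
  return { proceedToStep4: blockers.length === 0, blockers };
}
```

Returning every blocker at once, rather than stopping at the first, is a design choice of this sketch so the user sees the full list of missing proof in one pass.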

package/harness/core-rules.md CHANGED

@@ -4,11 +4,19 @@ This project uses kode:harness for structured AI-assisted development.
 Skills and agents work together through shared state files.
 **Every response must end with a 🧭 Next Step block** — guide the user to the next action.
+## Quiet Navigator + Confidence Loop
+Common-mode users often begin with rough goals. Keep the navigator short and evidence-first:
+- **Goal Card**: Goal, first usable result, non-goal, risk, required proof.
+- **Proof Ledger**: command/evidence that proves the feature works.
+- **Evidence-Gated Progress Board**: `Planned → Implementing → Proof Pending → Proven → Reviewed`.
+- **Quiet Navigator**: one next action plus why; restate the pipeline only when useful.
+- **Proof-First Enforcement**: code/state/pipeline movement is not progress until tests, smoke proof, or manual check proves the usable result works.
 ## Session Start
-Read `docs/project-state.md` first. If all state files are empty, run `setup` skill.
-If `.harness/my-context.md` exists, read it for personal environment and preferences.
-> This file is user-created, not generated by any skill or agent. Create it manually to store personal environment notes (IDE settings, local paths, preferences). See `.harness/` in docs/ for the expected location.
+Read `docs/project-state.md` first. If state files are empty, run `setup`.
+If `.harness/my-context.md` exists, read it for local preferences.
 ## Development Pipeline
@@ -52,9 +60,9 @@ When external planning artifacts exist (requirements, analysis, design documents
 5. `reviewer` → code review + crew artifact compliance check → commit → push
 6. `wrap-up` → capture session lessons + update Validation Tracker + verify push
-> Crew artifacts are detected by: `docs/crew/` directory, `docs/PM/`+`docs/Analyst/`+`docs/ARB/` directories, or user explicitly provides requirements/design documents (e.g., mentions "PRD", "산출물", "설계서", or provides file paths to planning artifacts).
-> **Reference, don't summarize**: setup creates a Crew Artifact Index (path table) in project-brief.md — each skill reads the original artifact directly via the indexed path.
-> Crew mode also enables the **CI Artifact Index** reference layer: if `docs/project-brief.md` contains `## CI Artifact Index`, reviewer Step 2.5 and release Step 3.5 surface the indexed company CI/CD guide when build/CI files change. The guide content stays external; only the path and key constraints are indexed.
+> Crew artifacts are detected by `docs/crew/`, `docs/PM/`+`docs/Analyst/`+`docs/ARB/`, or explicit requirements/design docs.
+> **Reference, don't summarize**: setup writes an Artifact Index; skills read originals via indexed paths.
+> If `## CI Artifact Index` exists, reviewer Step 2.5 and release Step 3.5 surface the external CI guide when build/CI files change.
 > This pipeline produces the same state files as 🟢 — the difference is the INPUT source and the addition of Validation Tracker for traceability.
 <!-- CREW_MODE_END -->
@@ -79,18 +87,22 @@ When the user provides a feature request or development goal in their prompt:
 **Every response must end with a 🧭 Next Step block.** This is mandatory — never omit it.
-When a skill or agent reports STATUS: DONE, output the next step in this format:
+Keep the block concise. When code changed, include the next evidence:
 ```
 ---
 🧭 Next Step
-→ Next: `{skill or agent name}` (select it from the slash menu, or type the prompt in chat)
-→ Prompt: "{copy-paste ready prompt for the user}"
+→ Goal: {current Goal Card in one line}
+→ Evidence: {test command / smoke proof / state-check needed next}
+→ Next: `{skill or agent name}` or [Coding]
+→ Prompt: "{copy-paste ready prompt}"
 → Why: {one-sentence reason}
-→ Pipeline: {🟢|🔵|🔴|🟡} Step {N}/{total}
+→ Pipeline: {🟢|🔵|🔴|🟡|🟣} Step {N}/{total}
 ---
 ```
+When a skill or agent reports STATUS: DONE, use the same block and point to the next row in the Chaining Map.
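Note on the new Next Step block: the added Goal and Evidence lines make the block a fixed eight-line template. As a rough illustration, a renderer could look like the sketch below. `renderNextStep` and its parameter names are hypothetical and do not come from the package.

```javascript
// Hypothetical renderer for the 0.11.0 Next Step block shown in the diff.
// All parameter names are illustrative.
function renderNextStep({ goal, evidence, next, prompt, why, pipeline, step, total }) {
  return [
    '---',
    '🧭 Next Step',
    `→ Goal: ${goal}`,
    `→ Evidence: ${evidence}`,
    `→ Next: ${next}`,
    `→ Prompt: "${prompt}"`,
    `→ Why: ${why}`,
    `→ Pipeline: ${pipeline} Step ${step}/${total}`,
    '---',
  ].join('\n');
}
```

Passing `next` as a preformatted string lets the caller choose between a backticked skill name and the literal `[Coding]` marker.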
 ### Chaining Map — what comes after what
 | Completed | Next | Prompt Example |
@@ -125,17 +137,19 @@ These laws are enforced across all skills and agents. Violations should be flagg
 2. **Type Check**: Before calling a constructor or factory, read the actual source file to verify parameters.
 3. **Scope Compliance**: Do not modify files outside the current Story scope without reporting first.
 4. **Security**: Never include credentials, passwords, or API keys in code or commits.
-5. **3-Failure Stop + Recalculating**: If the same approach fails 3 times, stop the current approach. Then:
+5. **3-Failure Stop + Recalculating**: If the same approach fails 3 times:
    - Automatically invoke `debug` skill in **Recalculating Mode** (one attempt)
-   - **Inject failure context**: Pass to debug a summary of the 3 failed attempts: (a) what approach was tried, (b) the error message for each attempt. This prevents debug from repeating the same failed approaches.
-   - Present the user with: (a) the blocker diagnosis, (b) 1-2 alternative approaches that differ from all 3 failed attempts
+   - Pass the failed approach and error for each attempt
+   - Present blocker diagnosis plus 1-2 different alternatives
    - If debug itself fails or the alternatives are rejected → **full stop**, escalate to the user
-   - Never retry the original failed approach after the 3-Failure Stop triggers
+   - Never retry the original failed approach
 6. **Dependency Map**: When adding or modifying a module, update dependency-map.md in the same commit.
 7. **Feature Registry**: When adding a feature, register it in features.md in the same commit.
 8. **Session Handoff**: At session end, update project-state.md Quick Summary so the next session has context.
 9. **Common First**: All features must work at Common level (🟢🔵🔴) without crew dependency. Crew-specific logic must be inside crew marker blocks only. Never add crew-only code to Common paths.
 10. **Self-Verify**: Every agent MUST run the `state-check` skill before reporting STATUS: DONE. If state-check returns FAIL, the agent must NOT report DONE — fix the listed drift first. WARN may proceed but warnings must be included in the agent's output.
+11. **Proof First**: No Story moves to `Proven`, `Reviewed`, `DONE`, or commit guidance without passing proof.
+   Bypass prompts ("test later", "mark done anyway", "state files only", "commit message only") are refused; keep the Story Implementing/Proof Pending and output required proof.
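Note on Iron Law 11 and the Evidence-Gated Progress Board: together they describe a linear state sequence in which the `Proven` and `Reviewed` states cannot be entered without passing proof. The following is a minimal sketch of that rule, assuming a single boolean proof flag. `nextState` and the constant names are hypothetical.

```javascript
// Board states in the order given in core-rules.md.
const BOARD_STATES = ['Planned', 'Implementing', 'Proof Pending', 'Proven', 'Reviewed'];
// States that Iron Law 11 gates behind passing proof.
const PROOF_GATED_STATES = new Set(['Proven', 'Reviewed']);

// Hypothetical helper: returns the state a Story moves to, or the same
// state when the move is refused or the Story is already at the end.
function nextState(current, proofPassed) {
  const index = BOARD_STATES.indexOf(current);
  if (index === -1) {
    throw new Error(`Unknown state: ${current}`);
  }
  if (index === BOARD_STATES.length - 1) {
    return current; // Reviewed is the final state
  }
  const candidate = BOARD_STATES[index + 1];
  if (PROOF_GATED_STATES.has(candidate) && !proofPassed) {
    return current; // bypass refused: the Story stays where it is
  }
  return candidate;
}
```

Under this reading, a "mark done anyway" request on a Story in `Proof Pending` simply leaves it in `Proof Pending`.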
 ## Confirmation Gate Defaults

package/harness/project-brief.md CHANGED

@@ -1,45 +1,55 @@
 # Project Brief
-> **Fill this out immediately after running `@kodevibe/harness init`.** The pm agent uses this file for Direction Guard — without it, scope drift cannot be prevented.
+> **Fill this out immediately after running `@kodevibe/harness init`.** The pm agent uses this file for Direction Guard. Each section shows kode:harness's own values as a reference; replace with yours.
 ## Vision
-<!-- What is this project and why does it exist? Keep it to 1-2 sentences.
-   This is the north star for all decisions.
-   Examples:
-   - "An open-source MCP hub that connects AI tools to enterprise services."
-   - "A CLI tool that generates IDE-specific instruction files for LLM agents."
-   - "An e-commerce platform focused on local artisan products."
--->
+_Example (kode:harness)_: Keep AI coding agents aligned on project direction across sessions and teammates, via markdown-native guardrails inside whichever IDE the developer uses.
+<!-- What is this project and why does it exist? Keep it to 1-2 sentences. The north star for all decisions. Replace the example above with your own. -->
 ## Goals
-<!-- What must this project achieve? List 3-5 concrete, measurable goals.
-   Examples:
-   - Support 50+ MCP servers with auto-discovery
-   - Sub-100ms routing latency
-   - Zero-config developer experience
-   - API coverage for all CRUD operations by v1.0
--->
+_Example (kode:harness)_:
+- Persist project memory across LLM sessions via 5 markdown state files.
+- Detect direction drift before code is written (Direction Guard in pm/lead/reviewer).
+- Stay lightweight: ≤30 files, ≤40K tokens. Zero runtime deps. MIT.
+- Support 6 IDEs with one `npx` install.
+<!-- 3-5 concrete, measurable goals. Replace the example above with your own. -->
 ## Non-Goals
-<!-- What is explicitly OUT OF SCOPE? This is equally important as Goals.
-   The pm agent will WARN you when a requested feature falls here.
-   Examples:
-   - Not a hosting platform — users deploy their own
-   - Not supporting legacy REST APIs — MCP only
-   - Not building a UI dashboard in v1
-   - No mobile app — web only
--->
+_Example (kode:harness)_:
+- Not a runtime / SDK — we ship instructions, not LLM execution.
+- Not a project-management replacement — state files coordinate AI, not standups.
+- Not solo-only — multi-developer alignment is the differentiator.
+- Not a UI/dashboard — markdown in the repo is the interface.
+<!-- Explicitly OUT OF SCOPE. The pm agent WARNs when a request falls here. Replace the example above with your own. -->
 ## Target Users
-<!-- Who is this for? Be specific.
-   Examples:
-   - "Solo developers and small teams (1-3) using AI coding assistants."
-   - "Enterprise teams migrating from monolith to microservices."
-   - "Data scientists who need reproducible ML pipelines."
+_Example (kode:harness)_: Developers and small teams (1–10) using AI coding assistants daily, who have felt their AI "forget" decisions and prefer markdown-in-repo over a SaaS dashboard.
+<!-- Who is this for? Be specific. Replace the example above with your own. -->
+## Done Definition
+<!-- What makes the project or first usable slice releasable? Keep this to 3-5 observable checks.
+   Example:
+   - [ ] User can complete the core workflow end-to-end
+   - [ ] Automated tests pass
+   - [ ] Smoke proof confirms the app/CLI responds
+-->
+## Success Proof
+<!-- How will the user know this works? Prefer commands, metrics, or manual checks.
+   Example:
+   - Test command: npm test
+   - Smoke proof: npm run build && npm start
+   - Manual proof: create item → refresh → item persists
 -->
 <!-- CREW_MODE_START -->

package/harness/project-state.md CHANGED

@@ -35,14 +35,40 @@
 ## Story Status
-| ID | Title | Status | Assignee |
-|----|-------|--------|----------|
-| S1-1 | Project scaffolding | ⬜ todo | |
-| S1-2 | Core feature implementation | ⬜ todo | |
-| S1-3 | Test coverage | ⬜ todo | |
+| ID | Title | Status | Assignee | Proof Plan |
+|----|-------|--------|----------|------------|
+| S1-0 | Proof setup | ⬜ todo | | test command or smoke proof |
+| S1-1 | Project scaffolding | ⬜ todo | | exact command/checklist |
+| S1-2 | Core feature implementation | ⬜ todo | | exact command/checklist |
+| S1-3 | Test coverage | ⬜ todo | | exact command/checklist |
 <!-- Status legend: ⬜ todo, 🔧 active, ✅ done, 🚫 blocked, ❌ dropped -->
+## Evidence Summary
+<!-- Keep the current proof state visible at a glance.
+   | Story | Status | Last Proof | Result | Blocker |
+   |-------|--------|------------|--------|---------|
+   | S1-1 | Proof Pending | - | - | tests not run |
+-->
+## Evidence-Gated Progress Board
+<!-- Keep this compact. It tells the user where the project is and what proof is missing.
+   State: Planned → Implementing → Proof Pending → Proven → Reviewed → Blocked
+   | Story | Goal | State | Required Evidence | Last Proof | Blocker |
+   |-------|------|-------|-------------------|------------|---------|
+   | S1-1 | First usable result | Proof Pending | npm test | - | tests not run |
+-->
+## Proof Ledger
+<!-- One line per completed proof. Do not paste long logs.
+   | Date | Story | Evidence | Result | Command / Observation |
+   |------|-------|----------|--------|-----------------------|
+   | 2026-05-04 | S1-1 | Unit tests | ✅ pass | npm test |
+-->
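Note on the Proof Ledger template: because the ledger is a five-column markdown table, it can be read mechanically. The sketch below shows one possible parser, assuming the column order from the template (Date, Story, Evidence, Result, Command / Observation) and assuming no cell contains a literal `|`. `parseLedgerRows` is a hypothetical name.

```javascript
// Hypothetical parser for Proof Ledger rows in project-state.md.
// Assumes five columns and no pipe characters inside cells.
function parseLedgerRows(markdown) {
  return markdown
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('|'))
    // Drop the outer pipes, then split into trimmed cells.
    .map((line) => line.slice(1, line.endsWith('|') ? -1 : undefined))
    .map((inner) => inner.split('|').map((cell) => cell.trim()))
    // Skip the header row and the |----| separator row.
    .filter((cells) => cells.length === 5 && cells[0] !== 'Date' && !/^-+$/.test(cells[0]))
    .map(([date, story, evidence, result, proof]) => ({
      date,
      story,
      evidence,
      result,
      proof,
      passed: /pass/i.test(result), // treats any result containing "pass" as passing
    }));
}
```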
 ## Module Registry
 <!-- Summary of current modules. Full details in docs/dependency-map.md -->

package/harness/skills/breakdown.md CHANGED

@@ -77,6 +77,7 @@ Ensures bottom-up implementation: foundations first, then layers that depend on
 - Never implement a module before its dependencies exist
 - Each task should be completable in one session
 - Every task must include its test files
+- Implementation and tests belong in the same Wave whenever possible. Do not defer tests to a later Wave unless the proof harness itself is the earlier Wave.
 - New modules MUST be registered in docs/dependency-map.md (Iron Law #6) — the breakdown OUTPUT section lists these registrations, and pm (or the user, if invoked directly) is responsible for executing the actual state file writes
 - If a task exceeds Story scope, stop and report to user

package/harness/skills/pivot.md CHANGED

@@ -114,7 +114,9 @@ After pivot completes, always append a 🧭 block:
 | Pivot Result | 🧭 Next Step |
 |---|---|
 | All state files updated | `pm`: "Re-plan to match the changed direction" |
+<!-- CREW_MODE_START -->
 | Crew artifacts exist for new direction | `setup` (🟣): "Re-run state setup based on the crew artifacts" |
+<!-- CREW_MODE_END -->
 | User cancelled | 🏁 No action: "Keeping the existing direction" |
 Example 🧭 block:

package/harness/skills/setup.md CHANGED

@@ -135,6 +135,17 @@ Ask the user these questions (skip any already answered by Phase 1):
 5. "Are there any type decisions or conventions the AI should know about?"
 6. "What is your test command?" (show detected command if found, e.g., `npm test`, `pytest`, `go test ./...`)
+### Phase 2.1: Common Mode Confidence Loop
+Before filling state files, collapse the answers into a **Goal Card** and **Proof Profile**:
+- Goal: one sentence from Vision + top goal
+- First usable result: smallest working outcome the user can inspect
+- Non-goal boundary: what will not be built now
+- Required proof: test command, smoke proof, or manual checklist
+- Open question: at most one unresolved ambiguity
+If the user's answers are vague, ask at most one follow-up question. If still unclear, record the assumption instead of looping.
 ### Phase 3: Fill State Files
 Using data from Phase 1 + Phase 2, fill the following files:
@@ -144,6 +155,8 @@ Using data from Phase 1 + Phase 2, fill the following files:
 - Vision → from user answer #1
 - Goals → from user answer #2
 - Non-Goals → from user answer #3
+- Done Definition → from Goal Card / first usable result
+- Success Proof → from Proof Profile (test command, smoke proof, or manual checklist)
 <!-- CREW_MODE_START -->
 - Crew Artifact Index → from Phase 1.5 (🟣 pipeline only — leave as template comment for 🟢 pipeline)
 - Validation Tracker → from Phase 1.5 (🟣 pipeline only — leave as template comment for 🟢 pipeline)
@@ -231,6 +244,11 @@ After setup completes, remind the user that shared files require `git pull` befo
 - [x]docs/project-state.md — Sprint 1 initialized
 - [ ]docs/failure-patterns.md — templates only (no changes)
+### Goal Card
+- Goal: [one sentence]
+- First usable result: [observable outcome]
+- Required proof: [test/smoke/manual]
 STATUS: DONE
 ```
@@ -238,7 +256,7 @@ STATUS: DONE
 Bootstrap always leads to `pm`. Append this block after STATUS: DONE:
-**If NO crew artifacts** (🟢 pipeline):
+**Default common pipeline**:
 ```
 ---
 🧭 Next Step

package/harness/skills/state-check.md CHANGED

@@ -46,7 +46,7 @@ For each file:
 1. Read all `✅ done` Stories from `docs/project-state.md` (or `.harness/project-state.md` in Team mode) Story Status table
 2. Read `docs/features.md` Feature Registry
 3. For each `✅ done` Story:
-   - If Story has `[FR-NNN]` prefix → must map to a feature row with that FR reference
+   - If Story has an external reference prefix → must map to a feature row with the same reference
    - Otherwise → must map to at least one feature row whose Key Files overlap with the Story's Scope
 4. Outcomes:
    - Story ✅ done but no matching feature row → FAIL: `[FAIL] Story {S-N-M} done but no feature registered`
@@ -105,6 +105,16 @@ If `docs/project-brief.md` contains a `## Validation Tracker` section:
 If no Validation Tracker → skip.
 <!-- CREW_MODE_END -->
+### Check 7: Proof Ledger Coverage
+1. Read `docs/project-state.md` (or `.harness/project-state.md` in Team mode).
+2. For each Story marked `✅ done`, verify at least one Proof Ledger or Evidence Summary row exists with a passing result.
+3. Outcomes:
+   - Done Story with passing proof → PASS
+   - Done Story with no proof → FAIL: `[FAIL] Story {S-N-M} is done but has no passing Proof Ledger/Evidence Summary entry — revert to Proof Pending or run reviewer proof before DONE/commit guidance`
+   - Done Story with failing proof → FAIL: `[FAIL] Story {S-N-M} proof shows failure but status is done`
+   - In-progress Story without proof → PASS; proof pending is normal
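Note on Check 7: the four outcomes above reduce to a lookup of proof rows per Story. A minimal sketch follows, assuming Stories and proof rows have already been parsed into plain objects. `checkProofCoverage` and the object shapes are hypothetical, and the failure messages are shortened from the ones in the diff.

```javascript
// Hypothetical implementation of state-check Check 7.
// stories:   [{ id: 'S1-1', status: '✅ done' }, ...]
// proofRows: [{ story: 'S1-1', passed: true }, ...]
function checkProofCoverage(stories, proofRows) {
  return stories.map((story) => {
    // Only Stories marked done need proof; anything else is proof pending.
    if (!story.status.includes('done')) {
      return { id: story.id, outcome: 'PASS', message: 'proof pending is normal' };
    }
    const rows = proofRows.filter((row) => row.story === story.id);
    // At least one passing row satisfies the check.
    if (rows.some((row) => row.passed)) {
      return { id: story.id, outcome: 'PASS', message: 'passing proof found' };
    }
    if (rows.length > 0) {
      return {
        id: story.id,
        outcome: 'FAIL',
        message: `[FAIL] Story ${story.id} proof shows failure but status is done`,
      };
    }
    return {
      id: story.id,
      outcome: 'FAIL',
      message: `[FAIL] Story ${story.id} is done but has no passing Proof Ledger/Evidence Summary entry`,
    };
  });
}
```

A done Story with both failing and passing rows counts as PASS here, following the "at least one ... passing result" wording in step 2.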
 ## Output Format
 ```
@@ -126,6 +136,9 @@ If no Validation Tracker → skip.
 ### Check 4: Agent Memory Legacy Names
 - No legacy names found (or list of legacy files to rename)
+### Check 7: Proof Ledger Coverage
+- {N} done Stories checked / {M} missing proof / {K} failing proof
 <!-- CREW_MODE_START -->
 ### Check 6: Validation Tracker (🟣)
 - {N} FR references checked / {M} drifted
@@ -142,7 +155,7 @@ STATUS: PASS | WARN | FAIL
 ### Result Interpretation
 - **PASS** — all checks passed; calling agent may proceed with STATUS: DONE
-- **WARN** — non-blocking issues; calling agent should include warnings in its output but may proceed
+- **WARN** — non-blocking issues; calling agent should include warnings in its output but may proceed. Exception: proof coverage gaps are FAIL.
 - **FAIL** — blocking; calling agent must NOT report STATUS: DONE until failures are resolved
 ### 🧭 Navigation — After State Check

package/harness/skills/wrap-up.md CHANGED

@@ -145,6 +145,17 @@ If the `reviewer` agent was run in this session and produced `[STATE-AUDIT]` fla
 2. Apply the recommended state file update
 3. If the flag was already resolved during the session, skip it
+### Step 5.6a: Finalize Proof Ledger ⚠️ MANDATORY
+Before session end, record the working proof that justified completion:
+1. Read reviewer output or recent terminal evidence for passing tests/smoke proof.
+2. Add one compact row to `docs/project-state.md` → `## Proof Ledger` for each completed Story.
+3. Cross-check completed Stories against `## Evidence Summary` / `## Proof Ledger`.
+4. If no proof exists, write `[PROOF-GAP] Proof missing` in the wrap-up report and recommend returning to `reviewer`; do not claim the Story is complete.
+5. If `[PROOF-GAP]` exists, STOP before Step 5.65. Do not auto-commit state files that mark a Story done/reviewed without passing proof. Revert the Story to Proof Pending or return to `reviewer`.
+Proof rows must stay short: Date, Story, Evidence, Result, Command / Observation. Do not paste long logs.
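Note on Step 5.6a: the "keep rows short, do not paste long logs" rule can be enforced when a row is written. The sketch below is one way to do that. `formatLedgerRow`, the 80-character limit, and the `❌ fail` label are assumptions of this example; the diff itself only shows `✅ pass` rows and sets no length limit.

```javascript
// Hypothetical formatter for one Proof Ledger row.
// maxLen (80) and the "❌ fail" label are assumptions, not package behavior.
function formatLedgerRow({ date, story, evidence, passed, proof }, maxLen = 80) {
  // Collapse newlines and repeated whitespace so multi-line output fits one cell.
  const oneLine = String(proof).replace(/\s+/g, ' ').trim();
  // Truncate long observations and mark the cut with an ellipsis.
  const short = oneLine.length > maxLen ? `${oneLine.slice(0, maxLen - 1)}…` : oneLine;
  const result = passed ? '✅ pass' : '❌ fail';
  return `| ${date} | ${story} | ${evidence} | ${result} | ${short} |`;
}
```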
 ### Step 5.65: Auto-Commit State Files ⚠️ MANDATORY
 Commit the state file changes. If the Learn results are not committed, they are lost in the next session.
@@ -209,6 +220,7 @@ Present a summary of all updates made.
 ### State Files Updated:
 - [x] docs/project-state.md — Quick Summary refreshed
+- [x] docs/project-state.md — Proof Ledger updated (if any Story completed)
 - [x] docs/failure-patterns.md — [N] patterns added/updated
 - [x] docs/features.md — [N] features updated (if applicable)
 - [x] docs/dependency-map.md — [N] modules verified/added (if applicable)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@kodevibe/harness",
-  "version": "0.9.6",
+  "version": "0.11.0",
   "description": "kode:harness — harness engineering for keeping every developer's AI aligned on one project direction.",
   "keywords": [
     "llm",
@@ -45,7 +45,9 @@
     "node": ">=18.0.0"
   },
   "scripts": {
-    "test": "node --test tests/*.test.js"
+    "test": "node --test tests/*.test.js",
+    "harness:check-drift": "node scripts/check-harness-drift.js",
+    "harness:sync": "node bin/cli.js init --ide vscode --batch --dir . --overwrite"
   },
   "publishConfig": {
     "access": "public"

package/src/init.js CHANGED

@@ -653,7 +653,7 @@ function runValidate(targetDir) {
     'project-state.md': 'S1-1 | Project scaffolding',
     'dependency-map.md': 'Add new modules above this line',
     'features.md': 'Add new features above this line',
-    'project-brief.md': 'This is the north star for all decisions',
+    'project-brief.md': 'The north star for all decisions',
   };
   for (const file of STATE_FILES) {