npm - @haposoft/cafekit - Versions diffs - 0.8.7 → 0.8.9 - Mend

@haposoft/cafekit 0.8.7 → 0.8.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/package.json +1 -1
package/src/claude/CLAUDE.md +2 -2
package/src/claude/agents/code-auditor.md +9 -1
package/src/claude/agents/inspector.md +24 -1
package/src/claude/agents/spec-maker.md +26 -22
package/src/claude/agents/test-runner.md +27 -5
package/src/claude/hooks/spec-state.cjs +2 -1
package/src/claude/migration-manifest.json +2 -1
package/src/claude/rules/workflow.md +5 -4
package/src/claude/scripts/validate-spec-output.cjs +271 -0
package/src/claude/skills/code-review/references/spec-compliance-review.md +1 -1
package/src/claude/skills/develop/SKILL.md +43 -12
package/src/claude/skills/develop/references/quality-gate.md +43 -40
package/src/claude/skills/specs/SKILL.md +32 -27
package/src/claude/skills/specs/references/review.md +2 -2
package/src/claude/skills/specs/rules/tasks-generation.md +35 -9
package/src/claude/skills/specs/templates/task.md +43 -33
package/src/claude/skills/sync/SKILL.md +2 -2
package/src/claude/skills/sync/references/sync-protocols.md +5 -5
package/src/claude/skills/test/SKILL.md +32 -1

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@haposoft/cafekit",
-  "version": "0.8.7",
+  "version": "0.8.9",
   "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
   "author": "Haposoft <nghialt@haposoft.com>",
   "license": "MIT",

package/src/claude/CLAUDE.md CHANGED

@@ -37,7 +37,7 @@ These rules reduce common agent coding failures: hidden assumptions, overbuilt s
 ### 4. Goal-Driven Execution
 - Convert requests into verifiable success criteria.
-- For spec tasks, use `Completion Criteria` and `Task Test Plan & Verification Evidence` as the source of truth.
+- For spec tasks, use `Completion Criteria` and `Evidence` as the source of truth. Existing task files may use `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence`.
 - For bugs, reproduce with a failing test or concrete evidence when feasible before fixing.
 - Loop until verification passes or a real blocker is recorded.
@@ -65,7 +65,7 @@ Use this loop for non-trivial work:
 A task is done only when all apply:
 - implementation satisfies `Completion Criteria`
-- `Task Test Plan & Verification Evidence` is satisfied with concrete proof
+- `Evidence` is satisfied with concrete proof
 - preflight/build/test outcomes are passing or an explicit blocker is recorded
 - code review has no critical issues
 - a verification receipt exists before task state is synced to `done`
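The CLAUDE.md definition of done is a conjunction of five gates. A minimal sketch of that gate as a pure function follows. The `task` object and its field names (`criteriaMet`, `evidenceSatisfied`, `outcomes`, `blocker`, `criticalIssues`, `receipt`) are hypothetical, because CafeKit records these facts in task files and `spec.json` rather than in a single object.

```javascript
// Sketch of the CLAUDE.md "definition of done" gate.
// The task record shape is hypothetical and used only for illustration.
function unmetDoneGates(task) {
  const unmet = [];
  if (!task.criteriaMet) unmet.push('Completion Criteria not satisfied');
  if (!task.evidenceSatisfied) unmet.push('Evidence lacks concrete proof');
  // Preflight/build/test must pass, unless an explicit blocker is recorded.
  const checksOk = ['preflight', 'build', 'test'].every((k) => task.outcomes?.[k] === 'pass');
  if (!checksOk && !task.blocker) unmet.push('checks not passing and no blocker recorded');
  if ((task.criticalIssues ?? 0) > 0) unmet.push('code review has critical issues');
  if (!task.receipt) unmet.push('no verification receipt');
  return unmet;
}

function canSyncToDone(task) {
  return unmetDoneGates(task).length === 0;
}

// Usage: everything is green except the missing verification receipt.
const task = {
  criteriaMet: true,
  evidenceSatisfied: true,
  outcomes: { preflight: 'pass', build: 'pass', test: 'pass' },
  blocker: null,
  criticalIssues: 0,
  receipt: null,
};
console.log(canSyncToDone(task)); // false
console.log(unmetDoneGates(task)); // [ 'no verification receipt' ]
```

Returning the list of unmet gates instead of a bare boolean gives the agent something concrete to record as a blocker.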

package/src/claude/agents/code-auditor.md CHANGED

@@ -16,16 +16,20 @@ You DO NOT fix code. You only READ, SCORE, and REPORT.
 ## Pre-Review: Task / Spec Compliance (MANDATORY)
 If the prompt includes task file paths, requirement IDs, completion criteria, or design contracts, you MUST read them before reviewing code.
+If the prompt says `SPEC COMPLIANCE REVIEW ONLY`, do not perform a general quality review yet. First prove the implementation matches the active task, `scope_lock`, requirements, design contracts, and scout-discovered runtime entrypoints.
+Do NOT trust implementer reports. Verify claims by reading the actual code and, where useful, grepping import/call sites.
 Extract and verify:
 1. Declared deliverables (files, routes, entrypoints, UI surfaces, schemas, migrations)
 2. Declared task scope (`Related Files` and direct support files that are clearly justified)
 3. Completion Criteria
-4. Task Test Plan & Verification Evidence expectations (or legacy Verification & Evidence)
+4. Task Evidence expectations (or Task Test Plan & Verification Evidence / legacy Verification & Evidence)
 5. Canonical Contracts & Invariants from the design
 6. Named technologies and runtime choices that the task/spec explicitly requires
+7. Runtime entrypoints/callers and reachability obligations from task evidence or the task-aware scout report
 Any missing declared deliverable, placeholder-only wiring, or contract drift is a **Critical** issue even if tests/build pass.
+Any scoped behavior omitted, unapproved behavior added, orphaned component/service/route/command/worker/provider/reducer, unmounted UI, unregistered route, uncalled loader/service, or unreachable runtime surface is a **Critical** issue even if tests/build pass.
 If the task/spec explicitly names Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, or any other concrete choice, replacing it with a custom simplification is a **Critical** issue unless the spec was amended first.
 ## Pre-Review: Blast Radius Check (MANDATORY)
@@ -61,6 +65,8 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
 - Hunt serious logic bugs (crashes, data loss, infinite loops).
 - Hunt severe architecture violations (circular imports, cross-layer coupling).
 - Hunt missing required artifacts/runtime entrypoints and spec contract mismatches.
+- Hunt reachability failures: created exports with no importers, UI not mounted, route not registered, service/data loader never called, provider never wrapping consumers, reducer/action disconnected from runtime state, CLI/worker/manifest not wired.
+- Hunt scope drift: accepted requirement omitted or out-of-scope behavior added without spec amendment.
 - Hunt overscope edits: later-task deliverables, unjustified file additions, or edits outside the active task packet.
 - Hunt named-contract substitutions: custom placeholders or in-memory stand-ins where the spec required a concrete framework/service.
 - Hunt fake cross-service proof: flows that claim web ↔ api ↔ worker ↔ extension integration while using isolated local state on each side.
@@ -131,6 +137,8 @@ When called from `hapo:develop` Step 4 (Quality Gate Auto-Fix):
 **Automatic Criticals:**
 - Missing required entrypoint/artifact/runtime output named in the task/spec
+- Runtime-facing artifact exists only as orphaned or unreachable code: component/export unused, UI unmounted, route unregistered, service/loader uncalled, provider not mounted, reducer/action disconnected, command/worker/manifest not wired
+- Missing scoped acceptance criteria or behavior outside `scope_lock` without a spec amendment
 - Placeholder scaffolding marked as complete when the task demanded real wiring
 - Auth/session/transport/persistence behavior that contradicts the design contracts
 - Silent replacement of a named framework/auth/provider/transport/datastore with a custom simplification
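The code-auditor's reachability hunt ("created exports with no importers") can be approximated mechanically. The sketch below is a naive, regex-based illustration over an in-memory map of file paths to source text. `findOrphanExports` and its `entrypoints` allowlist are hypothetical. The sketch matches by exported name only (not by module path) and ignores default exports, re-exports, and dynamic `import()`, so its output is a list of leads to inspect, not a verdict.

```javascript
// Naive orphan-export finder: reports named exports that no file imports by name.
// Assumptions: simple ESM syntax only; default exports, re-exports, `export *`,
// and dynamic import() are ignored, and matching is by name, not module path.
function findOrphanExports(files, entrypoints = []) {
  const exported = []; // { file, name } for every named export found
  const imported = new Set(); // names that appear in some `import { ... }` clause
  const exportRe = /^export\s+(?:async\s+)?(?:function\*?|const|let|class)\s+([A-Za-z_$][\w$]*)/gm;
  const importRe = /import\s*\{([^}]*)\}\s*from\s*['"][^'"]+['"]/g;
  for (const [file, source] of Object.entries(files)) {
    for (const m of source.matchAll(exportRe)) exported.push({ file, name: m[1] });
    for (const m of source.matchAll(importRe)) {
      for (const spec of m[1].split(',')) {
        // `Foo as Bar` imports the exported name `Foo`.
        const name = spec.trim().split(/\s+as\s+/)[0];
        if (name) imported.add(name);
      }
    }
  }
  // Entrypoint files are reached by the runtime, not by imports, so skip them.
  return exported.filter(({ file, name }) => !entrypoints.includes(file) && !imported.has(name));
}

const files = {
  'src/App.tsx': "import { Header } from './Header';\nexport function App() { return Header(); }",
  'src/Header.tsx': 'export function Header() { return null; }',
  'src/NewPanel.tsx': 'export function NewPanel() { return null; }',
};
console.log(findOrphanExports(files, ['src/App.tsx']).map((o) => o.name)); // [ 'NewPanel' ]
```

Without the `entrypoints` allowlist, `App` would also be reported. This is why the auditor needs the declared runtime entrypoint before judging reachability.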

package/src/claude/agents/inspector.md CHANGED

@@ -8,7 +8,7 @@ model: haiku
 # Inspect — Codebase Scout
 You hold two primary roles depending on when you are called:
-1. **Architecture Scout (Pre-coding):** Quickly map out directory trees to identify the EXACT FILES relevant to a new feature.
+1. **Task-Aware Architecture Scout (Pre-coding):** Quickly map out directory trees, runtime entrypoints, integration points, and exact files relevant to the active task.
 2. **Edge Case Scout (Code Review phase):** Quickly grep and scan the codebase to find where modified functions/components are imported elsewhere. You hunt for hidden side-effects and boundary errors to inform the `code-auditor`.
 You scout. You DO NOT analyze bugs deeply and you NEVER modify code.
@@ -21,6 +21,8 @@ Before packaging your report, verify:
 - [ ] Followed the 2-Phase rule: (Phase 1) Quick scan via `Glob`/`ls` for root structure. (Phase 2) Read specific files to narrow down scope.
 - [ ] Did NOT dump thousands of files. Only reported CORE relevant files.
 - [ ] Noted the layer/tier of each file (e.g., API files = backend, Component files = frontend).
+- [ ] Identified the runtime entrypoint/caller for runtime-facing work, or explicitly reported that it could not be determined.
+- [ ] Checked whether prior task outputs are currently imported, mounted, registered, invoked, or still orphaned.
 - [ ] Report is Short, Solid, and Sharp.
 ## Capabilities
@@ -31,6 +33,10 @@ Before packaging your report, verify:
 ## Responsibilities
 - Provide a file list with brief context descriptions — fast and concise.
 - Target the right directories, skip noise.
+- For `hapo:develop`, scout PER ACTIVE TASK. Use the task packet, `scope_lock`, requirement IDs, and design contracts to identify only the code paths relevant to that task.
+- Find integration seams: app/page entrypoints, router registration, CLI command dispatch, worker registration, extension manifests, API consumers, provider mounting, service invocation, state/reducer/action wiring.
+- Flag reachability risks clearly: orphan component/export, unmounted UI, unregistered route, uncalled service/loader, disconnected provider/state, unused reducer/action, generated artifact never referenced.
+- Identify blast-radius touchpoints: current importers/callers of modified exports, public contracts that depend on them, tests likely affected.
 ## Core Skills
 - Summarize root config (README, package.json, turbo.json) to identify repo type.
@@ -42,10 +48,27 @@ Before packaging your report, verify:
 ```markdown
 # Inspect Report
+## Runtime Entrypoints / Callers
+- `path/to/App.tsx` — Why this is the feature entrypoint
+- `path/to/router.ts` — Route registration point
+## Integration Points
+- `path/to/provider.tsx` — Existing provider to mount/use
+- `path/to/service.ts` — Existing service call path
+## Prior Task Outputs / Reachability
+- `path/to/NewComponent.tsx` — imported by X | orphaned | intentionally internal for task Y
 ## Relevant Files
 - `path/to/file.ts` — Brief role description (e.g., Handles JWT Auth)
 - ...
+## Blast Radius / Dependents
+- `path/to/dependent.ts` — imports/calls changed symbol
+## Scope / Spec Risks
+- Missing entrypoint, orphan output, out-of-scope touch, stale contract, or "none"
 ## Identified Structure
 - (Monorepo or single app? Main libraries/frameworks detected)

package/src/claude/agents/spec-maker.md CHANGED

@@ -31,6 +31,7 @@ specs/<feature>/
 - `spec.json` is generated from `.claude/skills/specs/templates/spec-state.json`; never write `init.json` or `spec-state.json` into the spec directory.
 - Task filenames MUST include the `task-` prefix, requirement number, two-digit sequence, and descriptive slug, for example `tasks/task-R0-01-project-scaffolding.md`.
 - Do NOT write `hydration.md`; task hydration is session/task-state synchronization only.
+- Before setting `ready_for_implementation = true`, run `node .claude/scripts/validate-spec-output.cjs specs/<feature>` and fix every failure.
 ## Mental Models (How You Think)
@@ -125,23 +126,24 @@ Before writing `design.md`, select a discovery mode and record the reason:
 - Reject tasks outside `scope_lock.in_scope`
 - When requirement coverage format: list numeric IDs only, no descriptive suffixes
 - Apply `(P)` parallel markers when applicable (load `.claude/skills/specs/rules/tasks-parallel-analysis.md`)
-- Every task MUST include `Task Test Plan & Verification Evidence` with exact commands, artifacts/runtime surfaces, and negative-path checks.
+- Every task MUST use the compact implementation-ready shape: `Context`, `Steps`, `Requirements`, `Related Files`, `Completion Criteria`, `Evidence`, `Risk Assessment`.
+- `Evidence` MUST include exact commands, artifacts/runtime surfaces, runtime reachability proof, and negative-path checks. Existing specs may use `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence`.
 - Completion criteria MUST be objective enough that a downstream quality gate can prove them without guesswork.
-- Validation decisions that affect implementation MUST be written into implementation-facing sections (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`) rather than only `Risk Assessment`.
-### Sub-Task Detail Requirements (MANDATORY)
-Each task file MUST contain granular sub-tasks with the following structure:
-1. **Major steps** (`- [ ] 1. ...`) group related work by cohesion
-2. **Sub-tasks** (`- [ ] 1.1 ...`) describe specific actionable items (1-3 hours each)
-3. **Detail bullets** under each sub-task describe:
-   - Business logic and behavior to implement
-   - Edge cases and constraints
-   - Validation rules
-4. **Requirement mapping** (`_Requirements: X.X_`) at the end of EVERY sub-task — no exceptions
-5. **Test coverage section** as the last major step in every task, with unit + integration sub-tasks
-6. **Completion criteria** must be observable and testable — not subjective
-**FORBIDDEN**: Task files with only 3-5 top-level checkboxes and no sub-task breakdown. This level of detail is INSUFFICIENT for implementation.
+- UI/app/runtime workflows MUST include a final integration/reachability task or final integration section that names the real entrypoint and proves all scoped user-facing surfaces are wired.
+- Do not allow orphan task outputs: components, services, hooks, routes, commands, workers, providers, reducers, data loaders, and generated artifacts must be reachable now or assigned to a named later integration task.
+- Validation decisions that affect implementation MUST be written into implementation-facing sections (`Context`, `Steps`, `Requirements`, `Completion Criteria`, `Evidence`) rather than only `Risk Assessment`.
+### Task Detail Requirements (MANDATORY)
+Each task file MUST be compact but implementation-ready:
+1. `Context` explains why the task exists, current state, target outcome, and exact relevant files.
+2. `Steps` lists actionable implementation steps with business intent and code-level detail.
+3. `Requirements` lists the requirement IDs covered by this task.
+4. `Related Files` names exact paths and action type when known.
+5. `Completion Criteria` is observable and testable.
+6. `Evidence` names commands, artifact/runtime proof, negative-path proof, and reachability proof.
+7. `Risk Assessment` states real risks or `None identified`.
+**FORBIDDEN**: Vague task files with no exact files, no requirement mapping, or no evidence. Compact is good; vague is invalid.
 ## Research Phase
@@ -172,12 +174,14 @@ Before marking the spec ready:
 4. Fail if any path in `task_files` does not exist
 5. Fail if any on-disk task file is missing from `task_registry` or any registry path does not exist
 6. Fail if any task file path does not match `tasks/task-R{N}-{SEQ}-<slug>.md` with two-digit `SEQ` (for example `tasks/task-R0-01-project-scaffolding.md`)
-7. Infer `design_context.validation_recommended = true` for auth, privacy, delete-data, migration, schema-change, browser-extension-permission, external-provider, or 5+ task file specs
-8. If the spec scope switched away from Claude/Anthropic, fail if `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider strings like `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` may mention old providers only as historical comparison.
-9. For delete/privacy specs, fail if requirements/design/tasks mix multiple deletion policies (for example `email_hash` in one place and `deleted-<uuid>` in another) without one canonical design decision.
-10. If `validation_recommended = true` and validation has not completed (or the user did not explicitly accept risk), keep `ready_for_implementation = false`
-11. Reject task files that use legacy non-numeric mappings like `NFR-1`
-12. If validation decisions were accepted, fail unless they are reflected in implementation-facing sections of affected artifacts and `spec.json.updated_at` / review timestamps reflect the reviewed state
+7. Fail if all task files are `R0` when the spec has more than two tasks
+8. Run `node .claude/scripts/validate-spec-output.cjs specs/<feature>` and treat non-zero exit as blocking
+9. Infer `design_context.validation_recommended = true` for auth, privacy, delete-data, migration, schema-change, browser-extension-permission, external-provider, or 5+ task file specs
+10. If the spec scope switched away from Claude/Anthropic, fail if `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider strings like `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` may mention old providers only as historical comparison.
+11. For delete/privacy specs, fail if requirements/design/tasks mix multiple deletion policies (for example `email_hash` in one place and `deleted-<uuid>` in another) without one canonical design decision.
+12. If `validation_recommended = true` and validation has not completed (or the user did not explicitly accept risk), keep `ready_for_implementation = false`
+13. Reject task files that use legacy non-numeric mappings like `NFR-1`
+14. If validation decisions were accepted, fail unless they are reflected in implementation-facing sections of affected artifacts and `spec.json.updated_at` / review timestamps reflect the reviewed state
 ## Execution Workflow Summary
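`validate-spec-output.cjs` (added later in this diff) checks only four of the seven task sections that spec-maker.md now requires: Context, Steps, Requirements, and Evidence, each with legacy aliases. Below is a sketch of a stricter check covering all seven. It assumes level-two `## Heading` lines, as the validator does. The alias lists for Context, Steps, and Evidence mirror the validator, and the other three sections have none. The validator's `**Requirement:**` line fallback is not modelled. `missingSections` is a hypothetical name.

```javascript
// Sketch: check a task file for the seven-section compact shape in spec-maker.md.
// Aliases mirror validate-spec-output.cjs where it has them; the rest are exact names.
const REQUIRED_SECTIONS = {
  Context: ['Context', 'Objective', 'Goal'],
  Steps: ['Steps', 'Implementation Steps'],
  Requirements: ['Requirements'],
  'Related Files': ['Related Files'],
  'Completion Criteria': ['Completion Criteria'],
  Evidence: ['Evidence', 'Task Test Plan & Verification Evidence', 'Verification & Evidence'],
  'Risk Assessment': ['Risk Assessment'],
};

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the canonical names of sections with no matching `## Heading` line.
function missingSections(markdown) {
  return Object.entries(REQUIRED_SECTIONS)
    .filter(([, aliases]) =>
      !aliases.some((h) => new RegExp(`^##\\s+${escapeRegExp(h)}\\s*$`, 'm').test(markdown)))
    .map(([name]) => name);
}

const draft = [
  '# Task R1-01: Login form',
  '## Context', 'Adds the login form.',
  '## Steps', '- [ ] Build form',
  '## Requirements', '- 1.1',
  '## Evidence', '- `npm test -- login`',
].join('\n');
console.log(missingSections(draft)); // [ 'Related Files', 'Completion Criteria', 'Risk Assessment' ]
```

The `draft` above satisfies the validator's section check but still falls short of the compact shape, which is the gap this sketch highlights.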

package/src/claude/agents/test-runner.md CHANGED

@@ -11,14 +11,26 @@ You are a battle-hardened QA engineer who has been burned by production incident
 ## Task-Aware Inputs
-If the prompt includes task file paths, Completion Criteria, Task Test Plan & Verification Evidence, or legacy Verification & Evidence instructions, treat them as authoritative.
+If the prompt includes task file paths, Completion Criteria, Evidence, Task Test Plan & Verification Evidence, or legacy Verification & Evidence instructions, treat them as authoritative.
 Diff-aware test selection does NOT replace task-specific verification.
 If the task/spec names a specific framework, auth system, transport, or shared-state boundary, keep that contract visible while evaluating evidence.
+If the prompt includes a feature name or `specs/<feature>`, load `spec.json`, `requirements.md`, `design.md`, and the active/recent task files. Treat `scope_lock`, Completion Criteria, and Evidence as the test contract.
+## Test Type Expectations
+Select tests by the task's touched surface:
+- Pure logic/data/parser/sort/filter/validator/regression work requires unit tests with negative-path coverage.
+- Stateful UI, context/store, API/service, persistence, or provider wiring requires component or integration proof.
+- Complete user workflows require E2E/UI-flow proof once the vertical slice exists.
+- Layout/theme/responsive work requires runtime visual checks, viewport checks, or screenshot proof when practical.
+- Interactive UI requires accessibility checks for focus, labels, roles, keyboard behavior, and ARIA when relevant.
+- Scaffold/config/release plumbing can pass with smoke proof when deeper behavior is not in scope.
+- Performance/security checks are required only when requirements, risk, or changed boundaries make them relevant.
 ## Command Resolution Order
 When the task file names exact commands, use this order:
-1. Run every exact executable command from `Task Test Plan & Verification Evidence` (or legacy `Verification & Evidence`) in declaration order.
+1. Run every exact executable command from `Evidence` (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`) in declaration order.
 2. Run repo-default typecheck/test/build commands only to fill gaps not already covered above.
 3. Apply diff-aware test selection only after task-mandated commands are satisfied.
@@ -56,9 +68,11 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 4. **No-op Detection:** Parse runner output for executed test count. If the command exits 0 but runs 0 tests, report `NO_TESTS` instead of `PASS`.
 5. **Build Verification:** Run the relevant build command when available (or the exact command requested by the task evidence section).
 6. **Task Evidence Audit:** Execute or inspect every verification item provided by the task. If a check cannot run, mark it `UNVERIFIED` with the exact blocker.
-7. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
-8. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
-9. **Verdict:** Output structured report.
+7. **Runtime Reachability Audit:** For runtime-facing work, grep/read the declared entrypoint/caller and verify created components/services/routes/commands/workers/providers/loaders are imported, mounted, registered, or invoked. If a task output is orphaned, mark evidence FAIL.
+8. **Scope Coverage Audit:** Compare reachable behavior against scoped requirements/task criteria. Missing scoped behavior is FAIL; out-of-scope behavior is NEEDS_ATTENTION unless user-approved.
+9. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
+10. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
+11. **Verdict:** Output structured report.
 ## Supported Ecosystems
@@ -100,6 +114,12 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 ### Task Evidence
 - [PASS|FAIL|UNVERIFIED] [verification item] → [proof or blocker]
+### Runtime Reachability
+- [PASS|FAIL|UNVERIFIED] `entrypoint/caller` reaches `artifact` → [proof or blocker]
+### Scope / Spec Coverage
+- [PASS|FAIL|NEEDS_ATTENTION] Scoped requirement/task criterion → [reachable proof or missing behavior]
 ### Unverified Items
 - [list anything that could not be executed or inspected]
@@ -120,6 +140,8 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 - **Named Contract Trap:** If the task/spec requires a named dependency or protocol and the implementation replaced it with a custom simplification, flag the evidence as FAIL.
 - **Cross-Service Reality Trap:** If web/api/worker/extension proof relies on separate in-memory stores or other process-local stand-ins instead of shared real state, return FAIL.
 - **Required Command Missing = FAIL:** If the task explicitly names a command and it was not run successfully, you MUST NOT return PASS.
+- **Runtime Reachability Missing = FAIL:** If a task created runtime-facing code but it is not imported, mounted, registered, invoked, or otherwise reachable from the declared entrypoint/caller, return FAIL.
+- **Scope Coverage Missing = FAIL:** If scoped requirements or task completion criteria are not exercised or inspectably reachable, return FAIL even when build/typecheck pass.
 - **PRECHECK_FAIL Semantics:** If compile/typecheck/build fails, return `PRECHECK_FAIL` even when no tests exist yet.
 - **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when preflight passed, the task did not require a dedicated automated test suite, and all other required commands/evidence passed.
 - **Zero-Test Green Is NO_TESTS:** If `npm test`, `pnpm test`, `pytest`, or an equivalent runner exits successfully while reporting 0 tests, treat it as `NO_TESTS`, not a passing suite.
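The verdict rules above interact with each other. PRECHECK_FAIL outranks NO_TESTS, a zero-test green run is NO_TESTS, and NO_TESTS coexists with PASS only when no dedicated suite was required. The sketch below shows one way to encode that precedence. The input fields are hypothetical, and ranking FAIL above NO_TESTS is an assumption that the agent file does not state explicitly.

```javascript
// Sketch of test-runner verdict precedence. Input field names are hypothetical.
// Assumed order: PRECHECK_FAIL > FAIL > NO_TESTS > PASS.
function resolveVerdict(run) {
  const noTests = run.testsExecuted === 0; // exit 0 with 0 tests is NO_TESTS, not a green suite
  let verdict;
  if (!run.preflightPassed) {
    verdict = 'PRECHECK_FAIL'; // compile/typecheck/build failure outranks NO_TESTS
  } else if (
    run.requiredCommands.some((c) => !c.ran || c.exitCode !== 0) || // named command missing or red
    !run.reachable || // runtime reachability missing
    !run.scopeCovered || // scoped behavior not exercised
    run.testsFailed > 0
  ) {
    verdict = 'FAIL';
  } else if (noTests && run.suiteRequired) {
    verdict = 'NO_TESTS';
  } else {
    verdict = 'PASS'; // may still carry noTests: true when no suite was required
  }
  return { verdict, noTests };
}

const base = {
  preflightPassed: true,
  requiredCommands: [{ cmd: 'npm test', ran: true, exitCode: 0 }],
  reachable: true,
  scopeCovered: true,
  testsFailed: 0,
  testsExecuted: 0,
  suiteRequired: true,
};
console.log(resolveVerdict(base).verdict); // NO_TESTS
console.log(resolveVerdict({ ...base, preflightPassed: false }).verdict); // PRECHECK_FAIL
console.log(resolveVerdict({ ...base, reachable: false }).verdict); // FAIL
```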

package/src/claude/hooks/spec-state.cjs CHANGED

@@ -94,7 +94,8 @@ try {
  lines.push(`> You MUST use the Edit tool to update the physical state only after real verification evidence exists (build/test/runtime/artifact), not merely because the code has been written.`);
  lines.push(`> 1. Edit \`spec.json\` (status, phase/current_phase, timestamps, \`task_files\`, \`task_registry\`, and validation state if it changed).`);
  lines.push(`> 2. Only after verification is complete, edit \`tasks/task-*.md\` (status, and tick '[x]' on the related sub-tasks and completion criteria).`);
-  lines.push(`> 3. IF YOU HAVE JUST COMPLETED A TASK THAT CHANGED SOURCE CODE, YOU MUST IMMEDIATELY update the documentation in \`docs/\` (\`system-architecture.md\` or the Changelog) to keep it in sync.`);
+  lines.push(`> 3. Before setting \`ready_for_implementation = true\`, you MUST run \`node .claude/scripts/validate-spec-output.cjs specs/${featureName}\` and fix every error.`);
+  lines.push(`> 4. IF YOU HAVE JUST COMPLETED A TASK THAT CHANGED SOURCE CODE, YOU MUST IMMEDIATELY update the documentation in \`docs/\` (\`system-architecture.md\` or the Changelog) to keep it in sync.`);
  lines.push(`> VIOLATING THIS TOLLGATE RULE IS FORBIDDEN; IT KEEPS THE SYSTEM IN SYNC.`);
   lines.push('');

package/src/claude/migration-manifest.json CHANGED

@@ -56,7 +56,8 @@
   "scripts": {
     "required": [
       "validate-docs.cjs",
-      "browser-tool.cjs"
+      "browser-tool.cjs",
+      "validate-spec-output.cjs"
     ]
   },
   "agentReferences": {

package/src/claude/rules/workflow.md CHANGED

@@ -15,11 +15,12 @@ Use the CafeKit loop: **Understand -> Plan -> Execute -> Verify -> Sync**.
 - For non-trivial features, use `/hapo:specs` to create or validate the spec.
 - For approved specs, work one task file at a time.
 - Extract from the active task:
-  - `Objective`
-  - `Constraints`
+  - `Context`
+  - `Steps`
+  - `Requirements`
   - `Related Files`
   - `Completion Criteria`
-  - `Task Test Plan & Verification Evidence`
+  - `Evidence`
 - If these are missing or too vague to verify, route back to spec correction.
 ## 3. Execute
@@ -31,7 +32,7 @@ Use the CafeKit loop: **Understand -> Plan -> Execute -> Verify -> Sync**.
 ## 4. Verify
-- Run exact commands from `Task Test Plan & Verification Evidence` first.
+- Run exact commands from `Evidence` first.
 - Then run repo-level lint/test/build as needed for confidence.
 - Use only fresh verification from the current run when claiming completion.
 - `PRECHECK_FAIL` outranks `NO_TESTS`.
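In code terms, the Plan and Verify steps in workflow.md do three things. They parse the active task into named sections, fall back through the Evidence heading aliases, and run Evidence commands in declaration order. The sketch below rests on two assumptions that the template does not guarantee. First, sections are `## Heading` blocks. Second, commands appear as inline code spans, which may also capture file paths, so a real runner would need a stricter convention. `parseSections` and `extractTaskPacket` are hypothetical names.

```javascript
// Sketch: pull the fields workflow.md asks for out of a task file.
const EVIDENCE_HEADINGS = ['Evidence', 'Task Test Plan & Verification Evidence', 'Verification & Evidence'];
const FIELDS = ['Context', 'Steps', 'Requirements', 'Related Files', 'Completion Criteria'];

// Split markdown into a Map of `## Heading` -> trimmed body text.
function parseSections(markdown) {
  const sections = new Map();
  let current = null;
  for (const line of markdown.split('\n')) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      current = heading[1];
      sections.set(current, []);
    } else if (current) {
      sections.get(current).push(line);
    }
  }
  return new Map([...sections].map(([k, v]) => [k, v.join('\n').trim()]));
}

function extractTaskPacket(markdown) {
  const sections = parseSections(markdown);
  const packet = {};
  const missing = [];
  for (const field of FIELDS) {
    if (sections.get(field)) packet[field] = sections.get(field);
    else missing.push(field);
  }
  // Prefer `Evidence`, then the two legacy headings, in that order.
  const evidenceHeading = EVIDENCE_HEADINGS.find((h) => sections.get(h));
  if (evidenceHeading) {
    // Inline code spans, kept in declaration order, are treated as commands to run first.
    packet.commands = [...sections.get(evidenceHeading).matchAll(/`([^`]+)`/g)].map((m) => m[1]);
  } else {
    missing.push('Evidence');
  }
  return { packet, missing }; // a non-empty `missing` means: route back to spec correction
}

const md = [
  '## Context', 'Add login.',
  '## Steps', '1. Build form',
  '## Requirements', '1.1',
  '## Related Files', '- `src/Login.tsx` (create)',
  '## Completion Criteria', '- Form submits',
  '## Verification & Evidence', '- `npm run typecheck`', '- `npm test -- login`',
].join('\n');
const { packet, missing } = extractTaskPacket(md);
console.log(packet.commands); // [ 'npm run typecheck', 'npm test -- login' ]
console.log(missing); // []
```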

package/src/claude/scripts/validate-spec-output.cjs ADDED

@@ -0,0 +1,271 @@
+#!/usr/bin/env node
+/**
+ * CafeKit spec artifact validator.
+ *
+ * This is intentionally deterministic. Prompt rules can drift; this script is
+ * the hard backstop before a spec is marked ready for implementation.
+ */
+const fs = require('fs');
+const path = require('path');
+const TASK_PATH_RE = /^tasks\/task-R\d+-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$/;
+const REQUIRED_REGISTRY_KEYS = [
+  'id',
+  'title',
+  'status',
+  'dependencies',
+  'blocker',
+  'started_at',
+  'completed_at',
+  'last_updated_at',
+];
+function usage() {
+  console.error('Usage: node .claude/scripts/validate-spec-output.cjs specs/<feature>');
+}
+function resolveSpecDir(input) {
+  if (!input) return null;
+  const cwd = process.cwd();
+  const direct = path.resolve(cwd, input);
+  if (fs.existsSync(direct)) return direct;
+  const viaSpecs = path.resolve(cwd, 'specs', input);
+  if (fs.existsSync(viaSpecs)) return viaSpecs;
+  return direct;
+}
+function readJson(filePath, errors) {
+  try {
+    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
+  } catch (error) {
+    errors.push(`${filePath}: invalid JSON (${error.message})`);
+    return null;
+  }
+}
+function listTaskFiles(specDir) {
+  const tasksDir = path.join(specDir, 'tasks');
+  if (!fs.existsSync(tasksDir)) return [];
+  return fs
+    .readdirSync(tasksDir, { withFileTypes: true })
+    .filter((entry) => entry.isFile() && entry.name.endsWith('.md'))
+    .map((entry) => `tasks/${entry.name}`)
+    .sort();
+}
+function hasHeading(content, heading) {
+  const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+  return new RegExp(`^##\\s+${escaped}\\s*$`, 'm').test(content);
+}
+function extractRequirementIds(requirementsText) {
+  const ids = new Set();
+  const headingRe = /^#{2,4}\s+(?:(?:Requirement)\s+)?((?:REQ-\d+)|(?:R\d+))\b/gim;
+  let match;
+  while ((match = headingRe.exec(requirementsText)) !== null) {
+    ids.add(match[1].toUpperCase());
+  }
+  const numericRequirementRe = /^#{2,4}\s+(?:Requirement\s+)?(\d+)(?=[:.\s-])/gim;
+  while ((match = numericRequirementRe.exec(requirementsText)) !== null) {
+    ids.add(`R${match[1]}`);
+  }
+  const bracketRe = /\[((?:REQ-\d+)|(?:R\d+))\]/gi;
+  while ((match = bracketRe.exec(requirementsText)) !== null) {
+    ids.add(match[1].toUpperCase());
+  }
+  return [...ids].filter((id) => id !== 'R0').sort();
+}
+function validateTaskSections(taskPath, content, errors) {
+  const hasContext =
+    hasHeading(content, 'Context') ||
+    hasHeading(content, 'Objective') ||
+    hasHeading(content, 'Goal');
+  const hasSteps =
+    hasHeading(content, 'Steps') || hasHeading(content, 'Implementation Steps');
+  const hasRequirements =
+    hasHeading(content, 'Requirements') || /^\*\*Requirement:\*\*/m.test(content);
+  const hasEvidence =
+    hasHeading(content, 'Evidence') ||
+    hasHeading(content, 'Task Test Plan & Verification Evidence') ||
+    hasHeading(content, 'Verification & Evidence');
+  if (!hasContext) errors.push(`${taskPath}: missing Context/Objective/Goal`);
+  if (!hasSteps) errors.push(`${taskPath}: missing Steps/Implementation Steps`);
+  if (!hasRequirements) errors.push(`${taskPath}: missing Requirements mapping`);
+  if (!hasEvidence) errors.push(`${taskPath}: missing Evidence or task test plan`);
+}
+function validateSpec(specDir) {
+  const errors = [];
+  const warnings = [];
+  const specJsonPath = path.join(specDir, 'spec.json');
+  if (!fs.existsSync(specDir)) {
+    errors.push(`${specDir}: spec directory does not exist`);
+    return { errors, warnings };
+  }
+  for (const forbidden of ['init.json', 'spec-state.json', 'hydration.md']) {
+    if (fs.existsSync(path.join(specDir, forbidden))) {
+      errors.push(`${forbidden}: forbidden generated artifact`);
+    }
+  }
+  if (!fs.existsSync(specJsonPath)) {
+    errors.push('spec.json: missing');
+    return { errors, warnings };
+  }
+  const spec = readJson(specJsonPath, errors);
+  if (!spec) return { errors, warnings };
+  if (!spec.scope_lock || typeof spec.scope_lock !== 'object' || Array.isArray(spec.scope_lock)) {
+    errors.push('spec.json.scope_lock: must be an object, not a boolean or array');
+  }
+  const taskFiles = listTaskFiles(specDir);
+  const taskFileSet = new Set(taskFiles);
+  if (!Array.isArray(spec.task_files)) {
+    errors.push('spec.json.task_files: missing array');
+    if (Array.isArray(spec.tasks)) {
+      errors.push('spec.json.tasks: legacy field detected; use task_files');
+    }
+  } else {
+    const declared = [...spec.task_files].sort();
+    if (JSON.stringify(declared) !== JSON.stringify(taskFiles)) {
+      errors.push('spec.json.task_files: must exactly match files under tasks/');
+      warnings.push(`expected task_files=${JSON.stringify(taskFiles)}`);
+    }
+  }
+  if (!spec.task_registry || typeof spec.task_registry !== 'object' || Array.isArray(spec.task_registry)) {
+    errors.push('spec.json.task_registry: missing object keyed by task file path');
+  } else {
+    const registryKeys = Object.keys(spec.task_registry).sort();
+    if (JSON.stringify(registryKeys) !== JSON.stringify(taskFiles)) {
+      errors.push('spec.json.task_registry: keys must exactly match task file paths');
+    }
+    for (const [registryPath, entry] of Object.entries(spec.task_registry)) {
+      if (!taskFileSet.has(registryPath)) {
+        errors.push(`spec.json.task_registry.${registryPath}: no matching task file`);
+      }
+      for (const key of REQUIRED_REGISTRY_KEYS) {
+        if (!(key in (entry || {}))) {
+          errors.push(`spec.json.task_registry.${registryPath}: missing ${key}`);
+        }
+      }
+      if (entry && !Array.isArray(entry.dependencies)) {
+        errors.push(`spec.json.task_registry.${registryPath}.dependencies: must be an array`);
+      }
+      for (const dep of entry?.dependencies || []) {
+        if (!taskFileSet.has(dep)) {
+          errors.push(`spec.json.task_registry.${registryPath}.dependencies: unknown dependency ${dep}`);
+        }
+      }
+    }
+  }
+  for (const taskFile of taskFiles) {
+    if (!TASK_PATH_RE.test(taskFile)) {
+      errors.push(`${taskFile}: must match tasks/task-R{N}-{SEQ}-<slug>.md with two-digit SEQ`);
+    }
+  }
+  if (taskFiles.length > 2 && taskFiles.every((taskFile) => /^tasks\/task-R0-/.test(taskFile))) {
+    errors.push('tasks/: feature work cannot be entirely R0; reserve R0 for shared foundation tasks');
+  }
+  const requirementsPath = path.join(specDir, 'requirements.md');
+  const designPath = path.join(specDir, 'design.md');
+  const researchPath = path.join(specDir, 'research.md');
+  if (!fs.existsSync(requirementsPath)) errors.push('requirements.md: missing');
+  if (!fs.existsSync(designPath)) errors.push('design.md: missing');
+  if (taskFiles.length > 0) {
+    if (!fs.existsSync(researchPath)) {
+      errors.push('research.md: missing Evidence Summary for non-trivial spec');
+    } else {
+      const research = fs.readFileSync(researchPath, 'utf8');
+      if (!/^##\s+Evidence Summary\s*$/m.test(research)) {
+        errors.push('research.md: missing ## Evidence Summary');
+      }
+    }
+  }
+  let requirementIds = [];
+  if (fs.existsSync(requirementsPath)) {
+    requirementIds = extractRequirementIds(fs.readFileSync(requirementsPath, 'utf8'));
+  }
+  const coveredRequirementIds = new Set();
+  for (const taskFile of taskFiles) {
+    const fullPath = path.join(specDir, taskFile);
+    const content = fs.readFileSync(fullPath, 'utf8');
+    validateTaskSections(taskFile, content, errors);
+    const idRe = /\b((?:REQ-\d+)|(?:R\d+))\b/gi;
+    let match;
+    while ((match = idRe.exec(content)) !== null) {
+      const id = match[1].toUpperCase();
+      if (id !== 'R0') coveredRequirementIds.add(id);
+    }
+    const numericMappingRe = /_Requirements:\s*([^_\n]+)_/gi;
+    while ((match = numericMappingRe.exec(content)) !== null) {
+      for (const token of match[1].split(',')) {
+        const number = token.trim().match(/^(\d+)(?:\.\d+)?$/);
+        if (number) coveredRequirementIds.add(`R${number[1]}`);
+      }
+    }
+  }
+  for (const requirementId of requirementIds) {
+    if (!coveredRequirementIds.has(requirementId)) {
+      errors.push(`requirements.md:${requirementId}: not covered by any task`);
+    }
+  }
+  if (spec.ready_for_implementation === true && errors.length > 0) {
+    errors.push('spec.json.ready_for_implementation: cannot be true while validator errors exist');
+  }
+  return { errors, warnings };
+}
+function main() {
+  const specDir = resolveSpecDir(process.argv[2]);
+  if (!specDir) {
+    usage();
+    process.exit(2);
+  }
+  const { errors, warnings } = validateSpec(specDir);
+  for (const warning of warnings) {
+    console.warn(`[WARN] ${warning}`);
+  }
+  if (errors.length > 0) {
+    console.error(`FAIL ${path.relative(process.cwd(), specDir) || specDir}`);
+    for (const error of errors) {
+      console.error(`- ${error}`);
+    }
+    process.exit(1);
+  }
+  console.log(`PASS ${path.relative(process.cwd(), specDir) || specDir}`);
+}
+main();
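On success the script exits 0 and prints `PASS <dir>`. On failure it exits 1 and prints one `- <error>` line per problem. With no argument it exits 2 after printing a usage message. Because the script calls `main()`, and possibly `process.exit`, when it loads, it cannot simply be `require`d. The sketch below therefore replicates `TASK_PATH_RE` and the coverage-extraction loop in standalone form so their behavior can be checked on sample inputs. `coveredIds` is a hypothetical name, and the loop is rewritten with `matchAll` instead of `exec`.

```javascript
// Standalone replicas of two rules from validate-spec-output.cjs
// (replicated, not imported, because the script runs main() on load).
const TASK_PATH_RE = /^tasks\/task-R\d+-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$/;

// Mirrors the coverage loop: bare R<n>/REQ-<n> tokens plus numeric `_Requirements: 1.1, 2_` lists.
function coveredIds(content) {
  const ids = new Set();
  for (const m of content.matchAll(/\b((?:REQ-\d+)|(?:R\d+))\b/gi)) {
    const id = m[1].toUpperCase();
    if (id !== 'R0') ids.add(id); // R0 is shared foundation, never a requirement
  }
  for (const m of content.matchAll(/_Requirements:\s*([^_\n]+)_/gi)) {
    for (const token of m[1].split(',')) {
      const number = token.trim().match(/^(\d+)(?:\.\d+)?$/); // "2.3" covers R2
      if (number) ids.add(`R${number[1]}`);
    }
  }
  return [...ids].sort();
}

for (const p of [
  'tasks/task-R0-01-project-scaffolding.md', // valid
  'tasks/task-R1-1-login.md', // invalid: SEQ must be two digits
  'tasks/task-R2-03-Login_Form.md', // invalid: slug must be lowercase kebab-case
]) {
  console.log(p, TASK_PATH_RE.test(p));
}
console.log(coveredIds('- [ ] 1.1 Build form _Requirements: 1.1, 2.3_\nSee R4 and R0.'));
// [ 'R1', 'R2', 'R4' ]
```

Because any `R<n>` token counts, a prose mention of a path such as `tasks/task-R3-01-checkout.md` (a hypothetical file) marks R3 as covered. The coverage check proves that every requirement ID is mentioned by some task, not that it is implemented.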

package/src/claude/skills/code-review/references/spec-compliance-review.md CHANGED

@@ -24,7 +24,7 @@ Do not attempt a standard text-based review if the project includes Visual Specs
 3. If NO (Markdown Spec only): Read the spec directly and extract:
    - requirement bullets
    - task `Completion Criteria`
-   - task `Task Test Plan & Verification Evidence` (or legacy `Verification & Evidence`)
+   - task `Evidence` (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`)
    - canonical contracts/invariants from `design.md`
    Then verify the changed files against those concrete obligations.