@haposoft/cafekit 0.8.7 → 0.8.9

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@haposoft/cafekit",
- "version": "0.8.7",
+ "version": "0.8.9",
  "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
  "author": "Haposoft <nghialt@haposoft.com>",
  "license": "MIT",
@@ -37,7 +37,7 @@ These rules reduce common agent coding failures: hidden assumptions, overbuilt s
  ### 4. Goal-Driven Execution
 
  - Convert requests into verifiable success criteria.
- - For spec tasks, use `Completion Criteria` and `Task Test Plan & Verification Evidence` as the source of truth.
+ - For spec tasks, use `Completion Criteria` and `Evidence` as the source of truth. Existing task files may use `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence`.
  - For bugs, reproduce with a failing test or concrete evidence when feasible before fixing.
  - Loop until verification passes or a real blocker is recorded.
 
@@ -65,7 +65,7 @@ Use this loop for non-trivial work:
  A task is done only when all apply:
 
  - implementation satisfies `Completion Criteria`
- - `Task Test Plan & Verification Evidence` is satisfied with concrete proof
+ - `Evidence` is satisfied with concrete proof
  - preflight/build/test outcomes are passing or an explicit blocker is recorded
  - code review has no critical issues
  - a verification receipt exists before task state is synced to `done`
@@ -16,16 +16,20 @@ You DO NOT fix code. You only READ, SCORE, and REPORT.
  ## Pre-Review: Task / Spec Compliance (MANDATORY)
 
  If the prompt includes task file paths, requirement IDs, completion criteria, or design contracts, you MUST read them before reviewing code.
+ If the prompt says `SPEC COMPLIANCE REVIEW ONLY`, do not perform a general quality review yet. First prove the implementation matches the active task, `scope_lock`, requirements, design contracts, and scout-discovered runtime entrypoints.
+ Do NOT trust implementer reports. Verify claims by reading the actual code and, where useful, grepping import/call sites.
 
  Extract and verify:
  1. Declared deliverables (files, routes, entrypoints, UI surfaces, schemas, migrations)
  2. Declared task scope (`Related Files` and direct support files that are clearly justified)
  3. Completion Criteria
- 4. Task Test Plan & Verification Evidence expectations (or legacy Verification & Evidence)
+ 4. Task Evidence expectations (or Task Test Plan & Verification Evidence / legacy Verification & Evidence)
  5. Canonical Contracts & Invariants from the design
  6. Named technologies and runtime choices that the task/spec explicitly requires
+ 7. Runtime entrypoints/callers and reachability obligations from task evidence or the task-aware scout report
 
  Any missing declared deliverable, placeholder-only wiring, or contract drift is a **Critical** issue even if tests/build pass.
+ Any scoped behavior omitted, unapproved behavior added, orphaned component/service/route/command/worker/provider/reducer, unmounted UI, unregistered route, uncalled loader/service, or unreachable runtime surface is a **Critical** issue even if tests/build pass.
  If the task/spec explicitly names Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, or any other concrete choice, replacing it with a custom simplification is a **Critical** issue unless the spec was amended first.
 
  ## Pre-Review: Blast Radius Check (MANDATORY)
@@ -61,6 +65,8 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
  - Hunt serious logic bugs (crashes, data loss, infinite loops).
  - Hunt severe architecture violations (circular imports, cross-layer coupling).
  - Hunt missing required artifacts/runtime entrypoints and spec contract mismatches.
+ - Hunt reachability failures: created exports with no importers, UI not mounted, route not registered, service/data loader never called, provider never wrapping consumers, reducer/action disconnected from runtime state, CLI/worker/manifest not wired.
+ - Hunt scope drift: accepted requirement omitted or out-of-scope behavior added without spec amendment.
  - Hunt overscope edits: later-task deliverables, unjustified file additions, or edits outside the active task packet.
  - Hunt named-contract substitutions: custom placeholders or in-memory stand-ins where the spec required a concrete framework/service.
  - Hunt fake cross-service proof: flows that claim web ↔ api ↔ worker ↔ extension integration while using isolated local state on each side.
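The reviewer finds "created exports with no importers" by reading and grepping. As an illustration only (this helper is not shipped by the package), a crude importer check over an in-memory map of file contents might look like the sketch below. `findImporters` is a hypothetical name, and it assumes relative specifiers resolve by stripping the extension; path aliases, `index` files, re-export chains, and dynamic imports are ignored.

```javascript
// Hypothetical helper, not part of @haposoft/cafekit.
const path = require('path');

// Captures the specifier in `from '...'`, `import '...'`, and `require('...')`.
const SPECIFIER_RE = /(?:from\s+|import\s+|require\(\s*)['"]([^'"]+)['"]/g;

// Drop .js/.ts/.jsx/.tsx/.cjs/.mjs/.cts/.mts so `./Button` matches `Button.tsx`.
function stripExt(p) {
  return p.replace(/\.(?:[cm]?[jt]sx?)$/, '');
}

// Returns the files whose relative imports resolve to `targetPath`.
function findImporters(files, targetPath) {
  const target = stripExt(path.posix.normalize(targetPath));
  const importers = [];
  for (const [filePath, source] of Object.entries(files)) {
    if (filePath === targetPath) continue;
    for (const match of source.matchAll(SPECIFIER_RE)) {
      const spec = match[1];
      if (!spec.startsWith('.')) continue; // package imports are out of scope here
      const resolved = stripExt(path.posix.join(path.posix.dirname(filePath), spec));
      if (resolved === target) {
        importers.push(filePath);
        break;
      }
    }
  }
  return importers;
}
```

An empty result for a runtime-facing file matches what the prompts call an orphaned output. A real audit would still need to confirm mounting, registration, or invocation, because an import alone does not prove reachability.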
@@ -131,6 +137,8 @@ When called from `hapo:develop` Step 4 (Quality Gate Auto-Fix):
 
  **Automatic Criticals:**
  - Missing required entrypoint/artifact/runtime output named in the task/spec
+ - Runtime-facing artifact exists only as orphaned or unreachable code: component/export unused, UI unmounted, route unregistered, service/loader uncalled, provider not mounted, reducer/action disconnected, command/worker/manifest not wired
+ - Missing scoped acceptance criteria or behavior outside `scope_lock` without a spec amendment
  - Placeholder scaffolding marked as complete when the task demanded real wiring
  - Auth/session/transport/persistence behavior that contradicts the design contracts
  - Silent replacement of a named framework/auth/provider/transport/datastore with a custom simplification
@@ -8,7 +8,7 @@ model: haiku
  # Inspect — Codebase Scout
 
  You hold two primary roles depending on when you are called:
- 1. **Architecture Scout (Pre-coding):** Quickly map out directory trees to identify the EXACT FILES relevant to a new feature.
+ 1. **Task-Aware Architecture Scout (Pre-coding):** Quickly map out directory trees, runtime entrypoints, integration points, and exact files relevant to the active task.
  2. **Edge Case Scout (Code Review phase):** Quickly grep and scan the codebase to find where modified functions/components are imported elsewhere. You hunt for hidden side-effects and boundary errors to inform the `code-auditor`.
 
  You scout. You DO NOT analyze bugs deeply and you NEVER modify code.
@@ -21,6 +21,8 @@ Before packaging your report, verify:
  - [ ] Followed the 2-Phase rule: (Phase 1) Quick scan via `Glob`/`ls` for root structure. (Phase 2) Read specific files to narrow down scope.
  - [ ] Did NOT dump thousands of files. Only reported CORE relevant files.
  - [ ] Noted the layer/tier of each file (e.g., API files = backend, Component files = frontend).
+ - [ ] Identified the runtime entrypoint/caller for runtime-facing work, or explicitly reported that it could not be determined.
+ - [ ] Checked whether prior task outputs are currently imported, mounted, registered, invoked, or still orphaned.
  - [ ] Report is Short, Solid, and Sharp.
 
  ## Capabilities
@@ -31,6 +33,10 @@ Before packaging your report, verify:
  ## Responsibilities
  - Provide a file list with brief context descriptions — fast and concise.
  - Target the right directories, skip noise.
+ - For `hapo:develop`, scout PER ACTIVE TASK. Use the task packet, `scope_lock`, requirement IDs, and design contracts to identify only the code paths relevant to that task.
+ - Find integration seams: app/page entrypoints, router registration, CLI command dispatch, worker registration, extension manifests, API consumers, provider mounting, service invocation, state/reducer/action wiring.
+ - Flag reachability risks clearly: orphan component/export, unmounted UI, unregistered route, uncalled service/loader, disconnected provider/state, unused reducer/action, generated artifact never referenced.
+ - Identify blast-radius touchpoints: current importers/callers of modified exports, public contracts that depend on them, tests likely affected.
 
  ## Core Skills
  - Summarize root config (README, package.json, turbo.json) to identify repo type.
@@ -42,10 +48,27 @@ Before packaging your report, verify:
  ```markdown
  # Inspect Report
 
+ ## Runtime Entrypoints / Callers
+ - `path/to/App.tsx` — Why this is the feature entrypoint
+ - `path/to/router.ts` — Route registration point
+
+ ## Integration Points
+ - `path/to/provider.tsx` — Existing provider to mount/use
+ - `path/to/service.ts` — Existing service call path
+
+ ## Prior Task Outputs / Reachability
+ - `path/to/NewComponent.tsx` — imported by X | orphaned | intentionally internal for task Y
+
  ## Relevant Files
  - `path/to/file.ts` — Brief role description (e.g., Handles JWT Auth)
  - ...
 
+ ## Blast Radius / Dependents
+ - `path/to/dependent.ts` — imports/calls changed symbol
+
+ ## Scope / Spec Risks
+ - Missing entrypoint, orphan output, out-of-scope touch, stale contract, or "none"
+
  ## Identified Structure
  - (Monorepo or single app? Main libraries/frameworks detected)
 
@@ -31,6 +31,7 @@ specs/<feature>/
  - `spec.json` is generated from `.claude/skills/specs/templates/spec-state.json`; never write `init.json` or `spec-state.json` into the spec directory.
  - Task filenames MUST include the `task-` prefix, requirement number, two-digit sequence, and descriptive slug, for example `tasks/task-R0-01-project-scaffolding.md`.
  - Do NOT write `hydration.md`; task hydration is session/task-state synchronization only.
+ - Before setting `ready_for_implementation = true`, run `node .claude/scripts/validate-spec-output.cjs specs/<feature>` and fix every failure.
 
  ## Mental Models (How You Think)
 
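This rule turns the validator into a hard gate on `ready_for_implementation`. A minimal sketch of that gate, assuming (as readiness check 8 states) that the validator exits non-zero on any failure; `runValidator`, `markReadyIfValid`, and the injectable `run` parameter are hypothetical names, and the runner is injected only so the gate can be exercised without spawning a process.

```javascript
const { spawnSync } = require('child_process');

// Default runner: invoke the validator with the current Node binary.
function runValidator(specDir) {
  const result = spawnSync(
    process.execPath,
    ['.claude/scripts/validate-spec-output.cjs', specDir],
    { encoding: 'utf8' },
  );
  // `status` is null if the process was killed by a signal, which counts as failure below.
  return { status: result.status, output: `${result.stdout || ''}${result.stderr || ''}` };
}

// Returns an updated copy of the spec; the flag is true only when the validator exits 0.
function markReadyIfValid(spec, specDir, run = runValidator) {
  const { status, output } = run(specDir);
  const ok = status === 0;
  return { spec: { ...spec, ready_for_implementation: ok }, ok, output };
}
```

On a failed run the caller is expected to read `output`, fix each reported failure, and rerun the gate, rather than setting the flag by hand.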
@@ -125,23 +126,24 @@ Before writing `design.md`, select a discovery mode and record the reason:
  - Reject tasks outside `scope_lock.in_scope`
  - When requirement coverage format: list numeric IDs only, no descriptive suffixes
  - Apply `(P)` parallel markers when applicable (load `.claude/skills/specs/rules/tasks-parallel-analysis.md`)
- - Every task MUST include `Task Test Plan & Verification Evidence` with exact commands, artifacts/runtime surfaces, and negative-path checks.
+ - Every task MUST use the compact implementation-ready shape: `Context`, `Steps`, `Requirements`, `Related Files`, `Completion Criteria`, `Evidence`, `Risk Assessment`.
+ - `Evidence` MUST include exact commands, artifacts/runtime surfaces, runtime reachability proof, and negative-path checks. Existing specs may use `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence`.
  - Completion criteria MUST be objective enough that a downstream quality gate can prove them without guesswork.
- - Validation decisions that affect implementation MUST be written into implementation-facing sections (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`) rather than only `Risk Assessment`.
-
- ### Sub-Task Detail Requirements (MANDATORY)
- Each task file MUST contain granular sub-tasks with the following structure:
- 1. **Major steps** (`- [ ] 1. ...`) group related work by cohesion
- 2. **Sub-tasks** (`- [ ] 1.1 ...`) describe specific actionable items (1-3 hours each)
- 3. **Detail bullets** under each sub-task describe:
- - Business logic and behavior to implement
- - Edge cases and constraints
- - Validation rules
- 4. **Requirement mapping** (`_Requirements: X.X_`) at the end of EVERY sub-task — no exceptions
- 5. **Test coverage section** as the last major step in every task, with unit + integration sub-tasks
- 6. **Completion criteria** must be observable and testable — not subjective
-
- **FORBIDDEN**: Task files with only 3-5 top-level checkboxes and no sub-task breakdown. This level of detail is INSUFFICIENT for implementation.
+ - UI/app/runtime workflows MUST include a final integration/reachability task or final integration section that names the real entrypoint and proves all scoped user-facing surfaces are wired.
+ - Do not allow orphan task outputs: components, services, hooks, routes, commands, workers, providers, reducers, data loaders, and generated artifacts must be reachable now or assigned to a named later integration task.
+ - Validation decisions that affect implementation MUST be written into implementation-facing sections (`Context`, `Steps`, `Requirements`, `Completion Criteria`, `Evidence`) rather than only `Risk Assessment`.
+
+ ### Task Detail Requirements (MANDATORY)
+ Each task file MUST be compact but implementation-ready:
+ 1. `Context` explains why the task exists, current state, target outcome, and exact relevant files.
+ 2. `Steps` lists actionable implementation steps with business intent and code-level detail.
+ 3. `Requirements` lists the requirement IDs covered by this task.
+ 4. `Related Files` names exact paths and action type when known.
+ 5. `Completion Criteria` is observable and testable.
+ 6. `Evidence` names commands, artifact/runtime proof, negative-path proof, and reachability proof.
+ 7. `Risk Assessment` states real risks or `None identified`.
+
+ **FORBIDDEN**: Vague task files with no exact files, no requirement mapping, or no evidence. Compact is good; vague is invalid.
 
  ## Research Phase
 
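The seven-section shape above can be checked mechanically. A minimal sketch, assuming each section is a level-two `##` heading carrying exactly these names; `missingTaskSections` is a hypothetical helper. It is stricter than the package's own `validate-spec-output.cjs` (added later in this diff), which checks only four sections and also accepts legacy aliases such as `Objective` and `Implementation Steps`.

```javascript
// Section names of the compact task shape, in the order the rules list them.
const COMPACT_TASK_SECTIONS = [
  'Context',
  'Steps',
  'Requirements',
  'Related Files',
  'Completion Criteria',
  'Evidence',
  'Risk Assessment',
];

// Collect every `## Title` heading (level two only; `###` does not match).
function headingsOf(markdown) {
  const titles = new Set();
  for (const match of markdown.matchAll(/^##\s+(.+?)\s*$/gm)) {
    titles.add(match[1]);
  }
  return titles;
}

// Returns the compact-shape sections the task file does not declare.
function missingTaskSections(markdown) {
  const present = headingsOf(markdown);
  return COMPACT_TASK_SECTIONS.filter((name) => !present.has(name));
}
```

A non-empty result means the task is not yet implementation-ready under the compact shape, even if it would pass the more lenient validator.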
@@ -172,12 +174,14 @@ Before marking the spec ready:
  4. Fail if any path in `task_files` does not exist
  5. Fail if any on-disk task file is missing from `task_registry` or any registry path does not exist
  6. Fail if any task file path does not match `tasks/task-R{N}-{SEQ}-<slug>.md` with two-digit `SEQ` (for example `tasks/task-R0-01-project-scaffolding.md`)
- 7. Infer `design_context.validation_recommended = true` for auth, privacy, delete-data, migration, schema-change, browser-extension-permission, external-provider, or 5+ task file specs
- 8. If the spec scope switched away from Claude/Anthropic, fail if `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider strings like `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` may mention old providers only as historical comparison.
- 9. For delete/privacy specs, fail if requirements/design/tasks mix multiple deletion policies (for example `email_hash` in one place and `deleted-<uuid>` in another) without one canonical design decision.
- 10. If `validation_recommended = true` and validation has not completed (or the user did not explicitly accept risk), keep `ready_for_implementation = false`
- 11. Reject task files that use legacy non-numeric mappings like `NFR-1`
- 12. If validation decisions were accepted, fail unless they are reflected in implementation-facing sections of affected artifacts and `spec.json.updated_at` / review timestamps reflect the reviewed state
+ 7. Fail if all task files are `R0` when the spec has more than two tasks
+ 8. Run `node .claude/scripts/validate-spec-output.cjs specs/<feature>` and treat non-zero exit as blocking
+ 9. Infer `design_context.validation_recommended = true` for auth, privacy, delete-data, migration, schema-change, browser-extension-permission, external-provider, or 5+ task file specs
+ 10. If the spec scope switched away from Claude/Anthropic, fail if `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider strings like `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` may mention old providers only as historical comparison.
+ 11. For delete/privacy specs, fail if requirements/design/tasks mix multiple deletion policies (for example `email_hash` in one place and `deleted-<uuid>` in another) without one canonical design decision.
+ 12. If `validation_recommended = true` and validation has not completed (or the user did not explicitly accept risk), keep `ready_for_implementation = false`
+ 13. Reject task files that use legacy non-numeric mappings like `NFR-1`
+ 14. If validation decisions were accepted, fail unless they are reflected in implementation-facing sections of affected artifacts and `spec.json.updated_at` / review timestamps reflect the reviewed state
 
  ## Execution Workflow Summary
 
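Checks 7 and 9 reduce to two small predicates. A sketch under one loud assumption: the diff does not say how a spec is classified as auth, privacy, and so on, so the `tags` input below is hypothetical. `inferValidationRecommended` and `isAllR0` are illustrative names; the R0 predicate mirrors the `every(... /^tasks\/task-R0-/ ...)` test in `validate-spec-output.cjs`.

```javascript
// Risk categories named in check 9.
const VALIDATION_TRIGGERS = new Set([
  'auth',
  'privacy',
  'delete-data',
  'migration',
  'schema-change',
  'browser-extension-permission',
  'external-provider',
]);

// Check 9: recommend validation for any risky category or for 5+ task files.
function inferValidationRecommended(tags, taskFiles) {
  return taskFiles.length >= 5 || tags.some((tag) => VALIDATION_TRIGGERS.has(tag));
}

// Check 7: more than two tasks, all filed under R0, means feature work was not split by requirement.
function isAllR0(taskFiles) {
  return taskFiles.length > 2 && taskFiles.every((file) => /^tasks\/task-R0-/.test(file));
}
```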
@@ -11,14 +11,26 @@ You are a battle-hardened QA engineer who has been burned by production incident
 
  ## Task-Aware Inputs
 
- If the prompt includes task file paths, Completion Criteria, Task Test Plan & Verification Evidence, or legacy Verification & Evidence instructions, treat them as authoritative.
+ If the prompt includes task file paths, Completion Criteria, Evidence, Task Test Plan & Verification Evidence, or legacy Verification & Evidence instructions, treat them as authoritative.
  Diff-aware test selection does NOT replace task-specific verification.
  If the task/spec names a specific framework, auth system, transport, or shared-state boundary, keep that contract visible while evaluating evidence.
+ If the prompt includes a feature name or `specs/<feature>`, load `spec.json`, `requirements.md`, `design.md`, and the active/recent task files. Treat `scope_lock`, Completion Criteria, and Evidence as the test contract.
+
+ ## Test Type Expectations
+
+ Select tests by the task's touched surface:
+ - Pure logic/data/parser/sort/filter/validator/regression work requires unit tests with negative-path coverage.
+ - Stateful UI, context/store, API/service, persistence, or provider wiring requires component or integration proof.
+ - Complete user workflows require E2E/UI-flow proof once the vertical slice exists.
+ - Layout/theme/responsive work requires runtime visual checks, viewport checks, or screenshot proof when practical.
+ - Interactive UI requires accessibility checks for focus, labels, roles, keyboard behavior, and ARIA when relevant.
+ - Scaffold/config/release plumbing can pass with smoke proof when deeper behavior is not in scope.
+ - Performance/security checks are required only when requirements, risk, or changed boundaries make them relevant.
 
  ## Command Resolution Order
 
  When the task file names exact commands, use this order:
- 1. Run every exact executable command from `Task Test Plan & Verification Evidence` (or legacy `Verification & Evidence`) in declaration order.
+ 1. Run every exact executable command from `Evidence` (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`) in declaration order.
  2. Run repo-default typecheck/test/build commands only to fill gaps not already covered above.
  3. Apply diff-aware test selection only after task-mandated commands are satisfied.
 
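Steps 1 and 2 of the resolution order amount to an ordered merge: task commands first, in declaration order, then repo defaults that fill gaps. A sketch, assuming "already covered" means an identical command string (the prompt leaves gap detection to the agent's judgment, so this is a simplification); `resolveCommands` is a hypothetical name, and step 3 (diff-aware selection) is deliberately left out.

```javascript
// Task-mandated commands keep their declared order; repo defaults only fill gaps.
// Exact-string dedup is an assumption, not the prompt's definition of "covered".
function resolveCommands(taskCommands, repoDefaults) {
  const ordered = [];
  const seen = new Set();
  for (const cmd of [...taskCommands, ...repoDefaults]) {
    const key = cmd.trim();
    if (key === '' || seen.has(key)) continue;
    seen.add(key);
    ordered.push(key);
  }
  return ordered;
}
```

For example, task commands `pnpm test --filter web` and `pnpm build` plus repo defaults `pnpm typecheck` and `pnpm build` resolve to the two task commands followed by `pnpm typecheck`, so `pnpm build` runs only once.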
@@ -56,9 +68,11 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
  4. **No-op Detection:** Parse runner output for executed test count. If the command exits 0 but runs 0 tests, report `NO_TESTS` instead of `PASS`.
  5. **Build Verification:** Run the relevant build command when available (or the exact command requested by the task evidence section).
  6. **Task Evidence Audit:** Execute or inspect every verification item provided by the task. If a check cannot run, mark it `UNVERIFIED` with the exact blocker.
- 7. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
- 8. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
- 9. **Verdict:** Output structured report.
+ 7. **Runtime Reachability Audit:** For runtime-facing work, grep/read the declared entrypoint/caller and verify created components/services/routes/commands/workers/providers/loaders are imported, mounted, registered, or invoked. If a task output is orphaned, mark evidence FAIL.
+ 8. **Scope Coverage Audit:** Compare reachable behavior against scoped requirements/task criteria. Missing scoped behavior is FAIL; out-of-scope behavior is NEEDS_ATTENTION unless user-approved.
+ 9. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
+ 10. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
+ 11. **Verdict:** Output structured report.
 
  ## Supported Ecosystems
 
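No-op detection and the `PRECHECK_FAIL` / `NO_TESTS` rules in this agent imply a precedence when a single test run is collapsed into a status. A minimal sketch of that ordering, assuming results arrive as a plain object; `classifyRun` and its fields are hypothetical. It summarizes only the test command itself, not the full task evidence audit. A non-zero exit with zero tests is treated as `FAIL` here, because the prompt's explicit `NO_TESTS` rule covers only the exit-0 case.

```javascript
// Precedence: a failed compile/typecheck/build outranks everything,
// and a green exit with zero executed tests is NO_TESTS, never PASS.
function classifyRun({ preflightPassed, exitCode, testsRun }) {
  if (!preflightPassed) return 'PRECHECK_FAIL';
  if (exitCode === 0 && testsRun === 0) return 'NO_TESTS';
  if (exitCode !== 0) return 'FAIL';
  return 'PASS';
}
```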
@@ -100,6 +114,12 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
  ### Task Evidence
  - [PASS|FAIL|UNVERIFIED] [verification item] → [proof or blocker]
 
+ ### Runtime Reachability
+ - [PASS|FAIL|UNVERIFIED] `entrypoint/caller` reaches `artifact` → [proof or blocker]
+
+ ### Scope / Spec Coverage
+ - [PASS|FAIL|NEEDS_ATTENTION] Scoped requirement/task criterion → [reachable proof or missing behavior]
+
  ### Unverified Items
  - [list anything that could not be executed or inspected]
 
@@ -120,6 +140,8 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
  - **Named Contract Trap:** If the task/spec requires a named dependency or protocol and the implementation replaced it with a custom simplification, flag the evidence as FAIL.
  - **Cross-Service Reality Trap:** If web/api/worker/extension proof relies on separate in-memory stores or other process-local stand-ins instead of shared real state, return FAIL.
  - **Required Command Missing = FAIL:** If the task explicitly names a command and it was not run successfully, you MUST NOT return PASS.
+ - **Runtime Reachability Missing = FAIL:** If a task created runtime-facing code but it is not imported, mounted, registered, invoked, or otherwise reachable from the declared entrypoint/caller, return FAIL.
+ - **Scope Coverage Missing = FAIL:** If scoped requirements or task completion criteria are not exercised or inspectably reachable, return FAIL even when build/typecheck pass.
  - **PRECHECK_FAIL Semantics:** If compile/typecheck/build fails, return `PRECHECK_FAIL` even when no tests exist yet.
  - **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when preflight passed, the task did not require a dedicated automated test suite, and all other required commands/evidence passed.
  - **Zero-Test Green Is NO_TESTS:** If `npm test`, `pnpm test`, `pytest`, or an equivalent runner exits successfully while reporting 0 tests, treat it as `NO_TESTS`, not a passing suite.
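Detecting a zero-test green run depends on reading the runner's summary. A sketch covering two common formats, Jest's `Tests: ... N total` line and pytest's `N passed` / `N failed` / `no tests ran` summary. `countExecutedTests` is a hypothetical helper; it returns `null` for unrecognized output instead of guessing, and it counts only passed and failed tests (skipped tests are not executed). Other runners would need their own patterns.

```javascript
// Returns the number of executed tests, or null when the output format is unrecognized.
function countExecutedTests(output) {
  // Jest summary, e.g. "Tests:       1 failed, 2 passed, 3 total"
  const jest = output.match(/^Tests:\s+.*?(\d+) total/m);
  if (jest) return Number(jest[1]);
  // Jest with nothing to run, or pytest with nothing collected.
  if (/No tests found|no tests ran/.test(output)) return 0;
  // pytest summary, e.g. "==== 1 failed, 2 passed, 1 skipped in 0.12s ===="
  let total = 0;
  let matched = false;
  for (const m of output.matchAll(/(\d+) (passed|failed)\b/g)) {
    total += Number(m[1]);
    matched = true;
  }
  return matched ? total : null;
}
```

A count of `0` combined with exit code 0 is exactly the zero-test-green case that must be reported as `NO_TESTS`.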
@@ -94,7 +94,8 @@ try {
  lines.push(`> You MUST use the Edit tool to update the physical state only after you have real verification evidence (build/test/runtime/artifact), not merely because the code has been written.`);
  lines.push(`> 1. Update \`spec.json\` (status, phase/current_phase, timestamps, \`task_files\`, \`task_registry\`, and validation state if it changed).`);
  lines.push(`> 2. Only after verification is complete, update \`tasks/task-*.md\` (status, plus tick '[x]' on the related sub-tasks and completion criteria).`);
- lines.push(`> 3. IF YOU JUST COMPLETED A TASK THAT MODIFIES SOURCE CODE, you MUST immediately update the documentation in \`docs/\` (\`system-architecture.md\` or the Changelog) to keep it in sync.`);
+ lines.push(`> 3. Before setting \`ready_for_implementation = true\`, you MUST run \`node .claude/scripts/validate-spec-output.cjs specs/${featureName}\` and fix every error.`);
+ lines.push(`> 4. IF YOU JUST COMPLETED A TASK THAT INCLUDES SOURCE CODE CHANGES, you MUST immediately update the documentation in \`docs/\` (\`system-architecture.md\` or the Changelog) to keep it in sync.`);
  lines.push(`> DO NOT VIOLATE THIS TOLLGATE RULE; IT KEEPS THE SYSTEM IN SYNC.`);
  lines.push('');
 
@@ -56,7 +56,8 @@
  "scripts": {
  "required": [
  "validate-docs.cjs",
- "browser-tool.cjs"
+ "browser-tool.cjs",
+ "validate-spec-output.cjs"
  ]
  },
  "agentReferences": {
@@ -15,11 +15,12 @@ Use the CafeKit loop: **Understand -> Plan -> Execute -> Verify -> Sync**.
  - For non-trivial features, use `/hapo:specs` to create or validate the spec.
  - For approved specs, work one task file at a time.
  - Extract from the active task:
- - `Objective`
- - `Constraints`
+ - `Context`
+ - `Steps`
+ - `Requirements`
  - `Related Files`
  - `Completion Criteria`
- - `Task Test Plan & Verification Evidence`
+ - `Evidence`
  - If these are missing or too vague to verify, route back to spec correction.
 
  ## 3. Execute
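Extracting the active task's sections is where the new and legacy heading names meet. A sketch, assuming `##` headings and reusing the aliases that `validate-spec-output.cjs` in this diff accepts: `Objective`/`Goal` for `Context`, `Implementation Steps` for `Steps`, and `Task Test Plan & Verification Evidence` then `Verification & Evidence` for `Evidence`. `splitSections` and `extractTaskSections` are hypothetical names. A missing or empty section lands in `missing`, which corresponds to the "route back to spec correction" case.

```javascript
// Preferred heading first, then accepted fallbacks.
const SECTION_ALIASES = {
  Context: ['Context', 'Objective', 'Goal'],
  Steps: ['Steps', 'Implementation Steps'],
  Requirements: ['Requirements'],
  'Related Files': ['Related Files'],
  'Completion Criteria': ['Completion Criteria'],
  Evidence: ['Evidence', 'Task Test Plan & Verification Evidence', 'Verification & Evidence'],
};

// Split a task file into { headingTitle: trimmedBody } on level-two headings.
function splitSections(markdown) {
  const sections = {};
  let current = null;
  for (const line of markdown.split('\n')) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      current = heading[1];
      sections[current] = [];
    } else if (current) {
      sections[current].push(line);
    }
  }
  return Object.fromEntries(
    Object.entries(sections).map(([title, lines]) => [title, lines.join('\n').trim()]),
  );
}

// Returns the extracted sections plus the names that are missing or empty.
function extractTaskSections(markdown) {
  const raw = splitSections(markdown);
  const found = {};
  const missing = [];
  for (const [name, aliases] of Object.entries(SECTION_ALIASES)) {
    const alias = aliases.find((a) => raw[a]);
    if (alias) found[name] = raw[alias];
    else missing.push(name);
  }
  return { found, missing };
}
```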
@@ -31,7 +32,7 @@ Use the CafeKit loop: **Understand -> Plan -> Execute -> Verify -> Sync**.
 
  ## 4. Verify
 
- - Run exact commands from `Task Test Plan & Verification Evidence` first.
+ - Run exact commands from `Evidence` first.
  - Then run repo-level lint/test/build as needed for confidence.
  - Use only fresh verification from the current run when claiming completion.
  - `PRECHECK_FAIL` outranks `NO_TESTS`.
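The JSON hunk above adds `validate-spec-output.cjs` to `scripts.required`. A hypothetical install-time check could confirm each listed script is present. The sketch assumes the scripts live under `.claude/scripts/`, the path used throughout this diff, and that the JSON is already parsed. `missingRequiredScripts` is an illustrative name, and the existence check is injectable so it can run without a real install.

```javascript
const fs = require('fs');
const path = require('path');

// Lists required scripts from the parsed manifest that are not present under .claude/scripts/.
function missingRequiredScripts(manifest, rootDir, exists = fs.existsSync) {
  const required = manifest?.scripts?.required ?? [];
  return required.filter((name) => !exists(path.join(rootDir, '.claude', 'scripts', name)));
}
```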
@@ -0,0 +1,271 @@
+ #!/usr/bin/env node
+ /**
+  * CafeKit spec artifact validator.
+  *
+  * This is intentionally deterministic. Prompt rules can drift; this script is
+  * the hard backstop before a spec is marked ready for implementation.
+  */
+
+ const fs = require('fs');
+ const path = require('path');
+
+ const TASK_PATH_RE = /^tasks\/task-R\d+-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$/;
+ const REQUIRED_REGISTRY_KEYS = [
+   'id',
+   'title',
+   'status',
+   'dependencies',
+   'blocker',
+   'started_at',
+   'completed_at',
+   'last_updated_at',
+ ];
+
+ function usage() {
+   console.error('Usage: node .claude/scripts/validate-spec-output.cjs specs/<feature>');
+ }
+
+ function resolveSpecDir(input) {
+   if (!input) return null;
+
+   const cwd = process.cwd();
+   const direct = path.resolve(cwd, input);
+   if (fs.existsSync(direct)) return direct;
+
+   const viaSpecs = path.resolve(cwd, 'specs', input);
+   if (fs.existsSync(viaSpecs)) return viaSpecs;
+
+   return direct;
+ }
+
+ function readJson(filePath, errors) {
+   try {
+     return JSON.parse(fs.readFileSync(filePath, 'utf8'));
+   } catch (error) {
+     errors.push(`${filePath}: invalid JSON (${error.message})`);
+     return null;
+   }
+ }
+
+ function listTaskFiles(specDir) {
+   const tasksDir = path.join(specDir, 'tasks');
+   if (!fs.existsSync(tasksDir)) return [];
+
+   return fs
+     .readdirSync(tasksDir, { withFileTypes: true })
+     .filter((entry) => entry.isFile() && entry.name.endsWith('.md'))
+     .map((entry) => `tasks/${entry.name}`)
+     .sort();
+ }
+
+ function hasHeading(content, heading) {
+   const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+   return new RegExp(`^##\\s+${escaped}\\s*$`, 'm').test(content);
+ }
+
+ function extractRequirementIds(requirementsText) {
+   const ids = new Set();
+   const headingRe = /^#{2,4}\s+(?:(?:Requirement)\s+)?((?:REQ-\d+)|(?:R\d+))\b/gim;
+   let match;
+
+   while ((match = headingRe.exec(requirementsText)) !== null) {
+     ids.add(match[1].toUpperCase());
+   }
+
+   const numericRequirementRe = /^#{2,4}\s+(?:Requirement\s+)?(\d+)(?=[:.\s-])/gim;
+   while ((match = numericRequirementRe.exec(requirementsText)) !== null) {
+     ids.add(`R${match[1]}`);
+   }
+
+   const bracketRe = /\[((?:REQ-\d+)|(?:R\d+))\]/gi;
+   while ((match = bracketRe.exec(requirementsText)) !== null) {
+     ids.add(match[1].toUpperCase());
+   }
+
+   return [...ids].filter((id) => id !== 'R0').sort();
+ }
+
+ function validateTaskSections(taskPath, content, errors) {
+   const hasContext =
+     hasHeading(content, 'Context') ||
+     hasHeading(content, 'Objective') ||
+     hasHeading(content, 'Goal');
+   const hasSteps =
+     hasHeading(content, 'Steps') || hasHeading(content, 'Implementation Steps');
+   const hasRequirements =
+     hasHeading(content, 'Requirements') || /^\*\*Requirement:\*\*/m.test(content);
+   const hasEvidence =
+     hasHeading(content, 'Evidence') ||
+     hasHeading(content, 'Task Test Plan & Verification Evidence') ||
+     hasHeading(content, 'Verification & Evidence');
+
+   if (!hasContext) errors.push(`${taskPath}: missing Context/Objective/Goal`);
+   if (!hasSteps) errors.push(`${taskPath}: missing Steps/Implementation Steps`);
+   if (!hasRequirements) errors.push(`${taskPath}: missing Requirements mapping`);
+   if (!hasEvidence) errors.push(`${taskPath}: missing Evidence or task test plan`);
+ }
+
+ function validateSpec(specDir) {
+   const errors = [];
+   const warnings = [];
+   const specJsonPath = path.join(specDir, 'spec.json');
+
+   if (!fs.existsSync(specDir)) {
+     errors.push(`${specDir}: spec directory does not exist`);
+     return { errors, warnings };
+   }
+
+   for (const forbidden of ['init.json', 'spec-state.json', 'hydration.md']) {
+     if (fs.existsSync(path.join(specDir, forbidden))) {
+       errors.push(`${forbidden}: forbidden generated artifact`);
+     }
+   }
+
+   if (!fs.existsSync(specJsonPath)) {
+     errors.push('spec.json: missing');
+     return { errors, warnings };
+   }
+
+   const spec = readJson(specJsonPath, errors);
+   if (!spec) return { errors, warnings };
+
+   if (!spec.scope_lock || typeof spec.scope_lock !== 'object' || Array.isArray(spec.scope_lock)) {
+     errors.push('spec.json.scope_lock: must be an object, not a boolean or array');
+   }
+
+   const taskFiles = listTaskFiles(specDir);
+   const taskFileSet = new Set(taskFiles);
+
+   if (!Array.isArray(spec.task_files)) {
+     errors.push('spec.json.task_files: missing array');
+     if (Array.isArray(spec.tasks)) {
+       errors.push('spec.json.tasks: legacy field detected; use task_files');
+     }
+   } else {
+     const declared = [...spec.task_files].sort();
+     if (JSON.stringify(declared) !== JSON.stringify(taskFiles)) {
+       errors.push('spec.json.task_files: must exactly match files under tasks/');
+       warnings.push(`expected task_files=${JSON.stringify(taskFiles)}`);
+     }
+   }
+
+   if (!spec.task_registry || typeof spec.task_registry !== 'object' || Array.isArray(spec.task_registry)) {
+     errors.push('spec.json.task_registry: missing object keyed by task file path');
+   } else {
+     const registryKeys = Object.keys(spec.task_registry).sort();
+     if (JSON.stringify(registryKeys) !== JSON.stringify(taskFiles)) {
+       errors.push('spec.json.task_registry: keys must exactly match task file paths');
+     }
+
+     for (const [registryPath, entry] of Object.entries(spec.task_registry)) {
+       if (!taskFileSet.has(registryPath)) {
+         errors.push(`spec.json.task_registry.${registryPath}: no matching task file`);
+       }
+       for (const key of REQUIRED_REGISTRY_KEYS) {
+         if (!(key in (entry || {}))) {
+           errors.push(`spec.json.task_registry.${registryPath}: missing ${key}`);
+         }
+       }
+       if (entry && !Array.isArray(entry.dependencies)) {
+         errors.push(`spec.json.task_registry.${registryPath}.dependencies: must be an array`);
+       }
+       for (const dep of entry?.dependencies || []) {
+         if (!taskFileSet.has(dep)) {
+           errors.push(`spec.json.task_registry.${registryPath}.dependencies: unknown dependency ${dep}`);
+         }
+       }
+     }
+   }
+
+   for (const taskFile of taskFiles) {
+     if (!TASK_PATH_RE.test(taskFile)) {
+       errors.push(`${taskFile}: must match tasks/task-R{N}-{SEQ}-<slug>.md with two-digit SEQ`);
+     }
+   }
+
+   if (taskFiles.length > 2 && taskFiles.every((taskFile) => /^tasks\/task-R0-/.test(taskFile))) {
+     errors.push('tasks/: feature work cannot be entirely R0; reserve R0 for shared foundation tasks');
+   }
+
+   const requirementsPath = path.join(specDir, 'requirements.md');
+   const designPath = path.join(specDir, 'design.md');
+   const researchPath = path.join(specDir, 'research.md');
+
+   if (!fs.existsSync(requirementsPath)) errors.push('requirements.md: missing');
+   if (!fs.existsSync(designPath)) errors.push('design.md: missing');
+
+   if (taskFiles.length > 0) {
+     if (!fs.existsSync(researchPath)) {
+       errors.push('research.md: missing Evidence Summary for non-trivial spec');
+     } else {
+       const research = fs.readFileSync(researchPath, 'utf8');
+       if (!/^##\s+Evidence Summary\s*$/m.test(research)) {
+         errors.push('research.md: missing ## Evidence Summary');
+       }
+     }
+   }
+
+   let requirementIds = [];
+   if (fs.existsSync(requirementsPath)) {
+     requirementIds = extractRequirementIds(fs.readFileSync(requirementsPath, 'utf8'));
+   }
+
+   const coveredRequirementIds = new Set();
+   for (const taskFile of taskFiles) {
+     const fullPath = path.join(specDir, taskFile);
+     const content = fs.readFileSync(fullPath, 'utf8');
+     validateTaskSections(taskFile, content, errors);
218
+
219
+ const idRe = /\b((?:REQ-\d+)|(?:R\d+))\b/gi;
220
+ let match;
221
+ while ((match = idRe.exec(content)) !== null) {
222
+ const id = match[1].toUpperCase();
223
+ if (id !== 'R0') coveredRequirementIds.add(id);
224
+ }
225
+
226
+ const numericMappingRe = /_Requirements:\s*([^_\n]+)_/gi;
227
+ while ((match = numericMappingRe.exec(content)) !== null) {
228
+ for (const token of match[1].split(',')) {
229
+ const number = token.trim().match(/^(\d+)(?:\.\d+)?$/);
230
+ if (number) coveredRequirementIds.add(`R${number[1]}`);
231
+ }
232
+ }
233
+ }
234
+
235
+ for (const requirementId of requirementIds) {
236
+ if (!coveredRequirementIds.has(requirementId)) {
237
+ errors.push(`requirements.md:${requirementId}: not covered by any task`);
238
+ }
239
+ }
240
+
241
+ if (spec.ready_for_implementation === true && errors.length > 0) {
242
+ errors.push('spec.json.ready_for_implementation: cannot be true while validator errors exist');
243
+ }
244
+
245
+ return { errors, warnings };
246
+ }
247
+
248
+ function main() {
249
+ const specDir = resolveSpecDir(process.argv[2]);
250
+ if (!specDir) {
251
+ usage();
252
+ process.exit(2);
253
+ }
254
+
255
+ const { errors, warnings } = validateSpec(specDir);
256
+ for (const warning of warnings) {
257
+ console.warn(`[WARN] ${warning}`);
258
+ }
259
+
260
+ if (errors.length > 0) {
261
+ console.error(`FAIL ${path.relative(process.cwd(), specDir) || specDir}`);
262
+ for (const error of errors) {
263
+ console.error(`- ${error}`);
264
+ }
265
+ process.exit(1);
266
+ }
267
+
268
+ console.log(`PASS ${path.relative(process.cwd(), specDir) || specDir}`);
269
+ }
270
+
271
+ main();
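The coverage rule near the end of `validateSpec()` credits a requirement in two ways. A task file can mention the ID directly (for example `R2` or `REQ-7`, with `R0` ignored), or it can list the requirement number in a line such as `_Requirements: 2.1, 3_`, where `2.1` counts toward `R2`. The sketch below isolates that scan. It is only an illustration: `collectCoveredIds` and `uncoveredRequirements` are hypothetical helpers, not package exports, and the requirement IDs are passed in directly instead of coming from the package's `extractRequirementIds`.

```javascript
// Hypothetical standalone helper mirroring the scan in validateSpec():
// collects the requirement IDs that one task file's content covers.
function collectCoveredIds(content) {
  const covered = new Set();

  // Pass 1: explicit IDs such as "REQ-7" or "R2" anywhere in the text.
  // R0 is skipped because it is reserved for shared foundation tasks.
  const idRe = /\b((?:REQ-\d+)|(?:R\d+))\b/gi;
  let match;
  while ((match = idRe.exec(content)) !== null) {
    const id = match[1].toUpperCase();
    if (id !== 'R0') covered.add(id);
  }

  // Pass 2: numeric mappings such as "_Requirements: 2.1, 3_".
  // A sub-number like "2.1" is credited to its parent requirement R2.
  const numericMappingRe = /_Requirements:\s*([^_\n]+)_/gi;
  while ((match = numericMappingRe.exec(content)) !== null) {
    for (const token of match[1].split(',')) {
      const number = token.trim().match(/^(\d+)(?:\.\d+)?$/);
      if (number) covered.add(`R${number[1]}`);
    }
  }
  return covered;
}

// Returns the requirement IDs that no task covers, in declaration order.
function uncoveredRequirements(requirementIds, taskContents) {
  const covered = new Set();
  for (const content of taskContents) {
    for (const id of collectCoveredIds(content)) covered.add(id);
  }
  return requirementIds.filter((id) => !covered.has(id));
}

// Made-up task files for illustration.
const tasks = [
  '# Task R1-01\nImplements R1 and REQ-7.\n_Requirements: 2.1, 3_',
  '# Task R0-01\nShared setup (R0 only).',
];
console.log(uncoveredRequirements(['R1', 'R2', 'R3', 'R4', 'REQ-7'], tasks));
// prints [ 'R4' ]
```

Because the ID regex is case-insensitive and matches anywhere in the file, a task that merely mentions `R4` in passing also counts as covering it. The check is therefore a safety net for forgotten requirements, not proof that a requirement is implemented.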
@@ -24,7 +24,7 @@ Do not attempt a standard text-based review if the project includes Visual Specs
24
24
  3. If NO (Markdown Spec only): Read the spec directly and extract:
25
25
  - requirement bullets
26
26
  - task `Completion Criteria`
27
- - task `Task Test Plan & Verification Evidence` (or legacy `Verification & Evidence`)
27
+ - task `Evidence` (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`)
28
28
  - canonical contracts/invariants from `design.md`
29
29
  Then verify the changed files against those concrete obligations.
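The review step now reads a task's `Evidence` section but still accepts `Task Test Plan & Verification Evidence` and the legacy `Verification & Evidence`. The sketch below shows one way to express that fallback order in JavaScript. It assumes each section is a Markdown heading of level 2 or deeper that runs until the next heading. The helper names and the sample task are hypothetical and are not taken from the package.

```javascript
// Accepted section names, in order of preference (assumed from the prompt text).
const EVIDENCE_HEADINGS = [
  'Evidence',
  'Task Test Plan & Verification Evidence',
  'Verification & Evidence', // legacy
];

// Escape regex metacharacters so a heading can be embedded in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns { heading, body } for the first accepted
// heading found, or null when the task has no evidence section.
function findEvidenceSection(markdown) {
  for (const heading of EVIDENCE_HEADINGS) {
    // Match "## <heading>" on its own line, then capture lazily up to the
    // next heading of any level or the end of the input.
    const re = new RegExp(
      `^#{2,6}\\s+${escapeRegExp(heading)}\\s*$\\n([\\s\\S]*?)(?=^#{1,6}\\s|(?![\\s\\S]))`,
      'm',
    );
    const match = markdown.match(re);
    if (match) return { heading, body: match[1].trim() };
  }
  return null;
}

// Made-up task file that still uses the legacy heading.
const task = [
  '# Task R2-01: Add login form',
  '## Completion Criteria',
  '- Form submits credentials',
  '## Verification & Evidence',
  '- npm test passes',
  '',
].join('\n');

const found = findEvidenceSection(task);
console.log(found.heading);
// prints Verification & Evidence
```

Checking the headings in preference order means a task file that contains both the new and the legacy section is reviewed against `Evidence`.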
30
30