npm - @haposoft/cafekit - Versions diffs - 0.8.7 → 0.8.8 - Mend

@haposoft/cafekit 0.8.7 → 0.8.8


Files changed (12)

package/package.json +1 -1
package/src/claude/agents/code-auditor.md +8 -0
package/src/claude/agents/inspector.md +24 -1
package/src/claude/agents/spec-maker.md +4 -1
package/src/claude/agents/test-runner.md +14 -3
package/src/claude/skills/develop/SKILL.md +39 -8
package/src/claude/skills/develop/references/quality-gate.md +40 -38
package/src/claude/skills/specs/SKILL.md +5 -0
package/src/claude/skills/specs/references/review.md +1 -1
package/src/claude/skills/specs/rules/tasks-generation.md +6 -0
package/src/claude/skills/specs/templates/task.md +5 -0
package/src/claude/skills/test/SKILL.md +23 -1

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@haposoft/cafekit",
-  "version": "0.8.7",
+  "version": "0.8.8",
   "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
   "author": "Haposoft <nghialt@haposoft.com>",
   "license": "MIT",

package/src/claude/agents/code-auditor.md CHANGED

@@ -16,6 +16,8 @@ You DO NOT fix code. You only READ, SCORE, and REPORT.
 ## Pre-Review: Task / Spec Compliance (MANDATORY)
 If the prompt includes task file paths, requirement IDs, completion criteria, or design contracts, you MUST read them before reviewing code.
+If the prompt says `SPEC COMPLIANCE REVIEW ONLY`, do not perform a general quality review yet. First prove the implementation matches the active task, `scope_lock`, requirements, design contracts, and scout-discovered runtime entrypoints.
+Do NOT trust implementer reports. Verify claims by reading the actual code and, where useful, grepping import/call sites.
 Extract and verify:
 1. Declared deliverables (files, routes, entrypoints, UI surfaces, schemas, migrations)
@@ -24,8 +26,10 @@ Extract and verify:
 4. Task Test Plan & Verification Evidence expectations (or legacy Verification & Evidence)
 5. Canonical Contracts & Invariants from the design
 6. Named technologies and runtime choices that the task/spec explicitly requires
+7. Runtime entrypoints/callers and reachability obligations from task evidence or the task-aware scout report
 Any missing declared deliverable, placeholder-only wiring, or contract drift is a **Critical** issue even if tests/build pass.
+Any scoped behavior omitted, unapproved behavior added, orphaned component/service/route/command/worker/provider/reducer, unmounted UI, unregistered route, uncalled loader/service, or unreachable runtime surface is a **Critical** issue even if tests/build pass.
 If the task/spec explicitly names Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, or any other concrete choice, replacing it with a custom simplification is a **Critical** issue unless the spec was amended first.
 ## Pre-Review: Blast Radius Check (MANDATORY)
@@ -61,6 +65,8 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
 - Hunt serious logic bugs (crashes, data loss, infinite loops).
 - Hunt severe architecture violations (circular imports, cross-layer coupling).
 - Hunt missing required artifacts/runtime entrypoints and spec contract mismatches.
+- Hunt reachability failures: created exports with no importers, UI not mounted, route not registered, service/data loader never called, provider never wrapping consumers, reducer/action disconnected from runtime state, CLI/worker/manifest not wired.
+- Hunt scope drift: accepted requirement omitted or out-of-scope behavior added without spec amendment.
 - Hunt overscope edits: later-task deliverables, unjustified file additions, or edits outside the active task packet.
 - Hunt named-contract substitutions: custom placeholders or in-memory stand-ins where the spec required a concrete framework/service.
 - Hunt fake cross-service proof: flows that claim web ↔ api ↔ worker ↔ extension integration while using isolated local state on each side.
@@ -131,6 +137,8 @@ When called from `hapo:develop` Step 4 (Quality Gate Auto-Fix):
 **Automatic Criticals:**
 - Missing required entrypoint/artifact/runtime output named in the task/spec
+- Runtime-facing artifact exists only as orphaned or unreachable code: component/export unused, UI unmounted, route unregistered, service/loader uncalled, provider not mounted, reducer/action disconnected, command/worker/manifest not wired
+- Missing scoped acceptance criteria or behavior outside `scope_lock` without a spec amendment
 - Placeholder scaffolding marked as complete when the task demanded real wiring
 - Auth/session/transport/persistence behavior that contradicts the design contracts
 - Silent replacement of a named framework/auth/provider/transport/datastore with a custom simplification
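
The reachability checks added to `code-auditor.md` above ("created exports with no importers", "orphaned or unreachable code") can be illustrated with a rough sketch. This is not part of the package: `find_orphans`, the in-memory `sources` map, and the sample file names are hypothetical, and a textual reference in another file is only a first approximation of real reachability from an entrypoint.

```python
import re


def find_orphans(sources: dict[str, str], artifacts: dict[str, str]) -> list[str]:
    """Return artifact names that no file other than the defining one mentions.

    sources:   path -> file text (hypothetical in-memory snapshot of the repo)
    artifacts: exported symbol -> path of the file that defines it
    """
    orphans = []
    for symbol, defined_in in artifacts.items():
        # Whole-word match so "Dashboard" does not match "DashboardLegacy".
        pattern = re.compile(rf"\b{re.escape(symbol)}\b")
        referenced = any(
            pattern.search(text)
            for path, text in sources.items()
            if path != defined_in
        )
        if not referenced:
            orphans.append(symbol)
    return orphans


# Hypothetical repo snapshot: ExportPanel was created but never imported.
sources = {
    "src/App.tsx": "import { Dashboard } from './Dashboard';\n"
                   "export const App = () => <Dashboard />;",
    "src/Dashboard.tsx": "export const Dashboard = () => null;",
    "src/ExportPanel.tsx": "export const ExportPanel = () => null;",
}
artifacts = {
    "Dashboard": "src/Dashboard.tsx",
    "ExportPanel": "src/ExportPanel.tsx",
}

print(find_orphans(sources, artifacts))  # ['ExportPanel']
```

A symbol flagged here would correspond to an "orphaned" finding; a symbol that is referenced still needs the manual check that the referencing file is itself mounted, registered, or invoked.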

package/src/claude/agents/inspector.md CHANGED

@@ -8,7 +8,7 @@ model: haiku
 # Inspect — Codebase Scout
 You hold two primary roles depending on when you are called:
-1. **Architecture Scout (Pre-coding):** Quickly map out directory trees to identify the EXACT FILES relevant to a new feature.
+1. **Task-Aware Architecture Scout (Pre-coding):** Quickly map out directory trees, runtime entrypoints, integration points, and exact files relevant to the active task.
 2. **Edge Case Scout (Code Review phase):** Quickly grep and scan the codebase to find where modified functions/components are imported elsewhere. You hunt for hidden side-effects and boundary errors to inform the `code-auditor`.
 You scout. You DO NOT analyze bugs deeply and you NEVER modify code.
@@ -21,6 +21,8 @@ Before packaging your report, verify:
 - [ ] Followed the 2-Phase rule: (Phase 1) Quick scan via `Glob`/`ls` for root structure. (Phase 2) Read specific files to narrow down scope.
 - [ ] Did NOT dump thousands of files. Only reported CORE relevant files.
 - [ ] Noted the layer/tier of each file (e.g., API files = backend, Component files = frontend).
+- [ ] Identified the runtime entrypoint/caller for runtime-facing work, or explicitly reported that it could not be determined.
+- [ ] Checked whether prior task outputs are currently imported, mounted, registered, invoked, or still orphaned.
 - [ ] Report is Short, Solid, and Sharp.
 ## Capabilities
@@ -31,6 +33,10 @@ Before packaging your report, verify:
 ## Responsibilities
 - Provide a file list with brief context descriptions — fast and concise.
 - Target the right directories, skip noise.
+- For `hapo:develop`, scout PER ACTIVE TASK. Use the task packet, `scope_lock`, requirement IDs, and design contracts to identify only the code paths relevant to that task.
+- Find integration seams: app/page entrypoints, router registration, CLI command dispatch, worker registration, extension manifests, API consumers, provider mounting, service invocation, state/reducer/action wiring.
+- Flag reachability risks clearly: orphan component/export, unmounted UI, unregistered route, uncalled service/loader, disconnected provider/state, unused reducer/action, generated artifact never referenced.
+- Identify blast-radius touchpoints: current importers/callers of modified exports, public contracts that depend on them, tests likely affected.
 ## Core Skills
 - Summarize root config (README, package.json, turbo.json) to identify repo type.
@@ -42,10 +48,27 @@ Before packaging your report, verify:
 ```markdown
 # Inspect Report
+## Runtime Entrypoints / Callers
+- `path/to/App.tsx` — Why this is the feature entrypoint
+- `path/to/router.ts` — Route registration point
+## Integration Points
+- `path/to/provider.tsx` — Existing provider to mount/use
+- `path/to/service.ts` — Existing service call path
+## Prior Task Outputs / Reachability
+- `path/to/NewComponent.tsx` — imported by X | orphaned | intentionally internal for task Y
 ## Relevant Files
 - `path/to/file.ts` — Brief role description (e.g., Handles JWT Auth)
 - ...
+## Blast Radius / Dependents
+- `path/to/dependent.ts` — imports/calls changed symbol
+## Scope / Spec Risks
+- Missing entrypoint, orphan output, out-of-scope touch, stale contract, or "none"
 ## Identified Structure
 - (Monorepo or single app? Main libraries/frameworks detected)

package/src/claude/agents/spec-maker.md CHANGED

@@ -125,8 +125,10 @@ Before writing `design.md`, select a discovery mode and record the reason:
 - Reject tasks outside `scope_lock.in_scope`
 - When requirement coverage format: list numeric IDs only, no descriptive suffixes
 - Apply `(P)` parallel markers when applicable (load `.claude/skills/specs/rules/tasks-parallel-analysis.md`)
-- Every task MUST include `Task Test Plan & Verification Evidence` with exact commands, artifacts/runtime surfaces, and negative-path checks.
+- Every task MUST include `Task Test Plan & Verification Evidence` with exact commands, artifacts/runtime surfaces, runtime reachability proof, and negative-path checks.
 - Completion criteria MUST be objective enough that a downstream quality gate can prove them without guesswork.
+- UI/app/runtime workflows MUST include a final integration/reachability task or final integration section that names the real entrypoint and proves all scoped user-facing surfaces are wired.
+- Do not allow orphan task outputs: components, services, hooks, routes, commands, workers, providers, reducers, data loaders, and generated artifacts must be reachable now or assigned to a named later integration task.
 - Validation decisions that affect implementation MUST be written into implementation-facing sections (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`) rather than only `Risk Assessment`.
 ### Sub-Task Detail Requirements (MANDATORY)
@@ -140,6 +142,7 @@ Each task file MUST contain granular sub-tasks with the following structure:
 4. **Requirement mapping** (`_Requirements: X.X_`) at the end of EVERY sub-task — no exceptions
 5. **Test coverage section** as the last major step in every task, with unit + integration sub-tasks
 6. **Completion criteria** must be observable and testable — not subjective
+7. **Scope/reachability criteria** must prove the task implements scoped behavior without out-of-scope additions and without unreachable runtime-facing outputs
 **FORBIDDEN**: Task files with only 3-5 top-level checkboxes and no sub-task breakdown. This level of detail is INSUFFICIENT for implementation.

package/src/claude/agents/test-runner.md CHANGED

@@ -14,6 +14,7 @@ You are a battle-hardened QA engineer who has been burned by production incident
 If the prompt includes task file paths, Completion Criteria, Task Test Plan & Verification Evidence, or legacy Verification & Evidence instructions, treat them as authoritative.
 Diff-aware test selection does NOT replace task-specific verification.
 If the task/spec names a specific framework, auth system, transport, or shared-state boundary, keep that contract visible while evaluating evidence.
+If the prompt includes a feature name or `specs/<feature>`, load `spec.json`, `requirements.md`, `design.md`, and the active/recent task files. Treat `scope_lock`, Completion Criteria, and Task Test Plan evidence as the test contract.
 ## Command Resolution Order
@@ -56,9 +57,11 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 4. **No-op Detection:** Parse runner output for executed test count. If the command exits 0 but runs 0 tests, report `NO_TESTS` instead of `PASS`.
 5. **Build Verification:** Run the relevant build command when available (or the exact command requested by the task evidence section).
 6. **Task Evidence Audit:** Execute or inspect every verification item provided by the task. If a check cannot run, mark it `UNVERIFIED` with the exact blocker.
-7. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
-8. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
-9. **Verdict:** Output structured report.
+7. **Runtime Reachability Audit:** For runtime-facing work, grep/read the declared entrypoint/caller and verify created components/services/routes/commands/workers/providers/loaders are imported, mounted, registered, or invoked. If a task output is orphaned, mark evidence FAIL.
+8. **Scope Coverage Audit:** Compare reachable behavior against scoped requirements/task criteria. Missing scoped behavior is FAIL; out-of-scope behavior is NEEDS_ATTENTION unless user-approved.
+9. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
+10. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
+11. **Verdict:** Output structured report.
 ## Supported Ecosystems
@@ -100,6 +103,12 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 ### Task Evidence
 - [PASS|FAIL|UNVERIFIED] [verification item] → [proof or blocker]
+### Runtime Reachability
+- [PASS|FAIL|UNVERIFIED] `entrypoint/caller` reaches `artifact` → [proof or blocker]
+### Scope / Spec Coverage
+- [PASS|FAIL|NEEDS_ATTENTION] Scoped requirement/task criterion → [reachable proof or missing behavior]
 ### Unverified Items
 - [list anything that could not be executed or inspected]
@@ -120,6 +129,8 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 - **Named Contract Trap:** If the task/spec requires a named dependency or protocol and the implementation replaced it with a custom simplification, flag the evidence as FAIL.
 - **Cross-Service Reality Trap:** If web/api/worker/extension proof relies on separate in-memory stores or other process-local stand-ins instead of shared real state, return FAIL.
 - **Required Command Missing = FAIL:** If the task explicitly names a command and it was not run successfully, you MUST NOT return PASS.
+- **Runtime Reachability Missing = FAIL:** If a task created runtime-facing code but it is not imported, mounted, registered, invoked, or otherwise reachable from the declared entrypoint/caller, return FAIL.
+- **Scope Coverage Missing = FAIL:** If scoped requirements or task completion criteria are not exercised or inspectably reachable, return FAIL even when build/typecheck pass.
 - **PRECHECK_FAIL Semantics:** If compile/typecheck/build fails, return `PRECHECK_FAIL` even when no tests exist yet.
 - **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when preflight passed, the task did not require a dedicated automated test suite, and all other required commands/evidence passed.
 - **Zero-Test Green Is NO_TESTS:** If `npm test`, `pnpm test`, `pytest`, or an equivalent runner exits successfully while reporting 0 tests, treat it as `NO_TESTS`, not a passing suite.
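
The `PRECHECK_FAIL` / `NO_TESTS` semantics in `test-runner.md` above can be sketched as two small functions. The function names and parameters are hypothetical, and the handling of a non-zero exit code with zero executed tests (treated as `FAIL` here) is an assumption the source does not spell out.

```python
def classify_run(precheck_ok: bool, exit_code: int, tests_executed: int) -> str:
    """Map raw runner facts to a status, following the rules quoted above."""
    if not precheck_ok:
        # Compile/typecheck/build failure outranks everything else.
        return "PRECHECK_FAIL"
    if exit_code != 0:
        # Assumption: any non-zero exit is a failure, even with 0 tests.
        return "FAIL"
    if tests_executed == 0:
        # Exit 0 with zero executed tests is not a passing suite.
        return "NO_TESTS"
    return "PASS"


def task_verdict(status: str, tests_required: bool, evidence_ok: bool) -> str:
    """Combine the runner status with task requirements and evidence."""
    if status in ("PRECHECK_FAIL", "FAIL"):
        return "FAIL"
    if status == "NO_TESTS" and tests_required:
        # The develop skill also allows BLOCKED here; FAIL keeps the sketch simple.
        return "FAIL"
    return "PASS" if evidence_ok else "FAIL"


print(classify_run(True, 0, 0))                                   # NO_TESTS
print(task_verdict("NO_TESTS", tests_required=False, evidence_ok=True))  # PASS
print(task_verdict("NO_TESTS", tests_required=True, evidence_ok=True))   # FAIL
```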

package/src/claude/skills/develop/SKILL.md CHANGED

@@ -53,6 +53,11 @@ If the spec/task explicitly names a framework, auth system, datastore, transport
 You MUST NOT silently replace it with a simpler custom substitute ("for MVP", "placeholder", "temporary auth", "in-memory until later") unless the spec itself is updated first.
 </CONTRACT-FIDELITY>
+<SCOPE-FIDELITY>
+The approved `scope_lock`, requirements, design contracts, and active task packet are the implementation contract.
+You MUST implement all scoped behavior for the active task, MUST NOT add out-of-scope behavior, and MUST NOT mark work done while required surfaces exist only as orphaned files, unmounted UI, unregistered routes, uncalled loaders, or placeholder wiring.
+</SCOPE-FIDELITY>
 ## Anti-Rationalization Protocol
 | Thought (Excuse) | Reality (Rule) |
@@ -66,12 +71,14 @@ You MUST NOT silently replace it with a simpler custom substitute ("for MVP", "p
 flowchart TD
     A["/hapo:develop <feature>"] --> B[Step 1: Load Spec]
     B -->|Missing| Z[Stop: Run /hapo:specs]
-    B -->|Ready| C[Step 2: Scout Codebase (inspector)]
+    B -->|Ready| C[Step 2: Task-Aware Scout (inspector)]
     C --> D[Step 3: Implement Code (god-developer)]
-    D --> E[Step 4: Quality Gate: Test + Review + Evidence]
-    E -->|Fail (code-auditor)| D
+    D --> E[Step 4: Quality Gate: Test + Spec Review + Code Review + Evidence]
+    E -->|Fail| D
     E -->|Pass| F[Step 5: State Sync + Incremental Docs Sync]
-    F --> G[Report Completion]
+    F --> H{More tasks?}
+    H -->|Yes| B
+    H -->|No| G[Final Integration Scout + Report Completion]
 ```
 ### Step 1: Initialize & Load Spec
@@ -94,11 +101,26 @@ flowchart TD
 - Before coding, set the active task(s) to `in_progress` in both markdown and `spec.json.task_registry`, or route through `/hapo:sync` if the runtime expects the sync protocol.
 ### Step 2: Scout (Codebase Inspection)
-- **Mandatory:** Call agent `Agent(subagent_type="inspector", ...)` to scan the overall codebase structure (e.g., where components live, where utils are). Avoid wandering into forbidden zones. Use the legacy `Task` tool only in runtimes that have not renamed the subagent tool yet.
+- **Mandatory per task:** Call agent `Agent(subagent_type="inspector", ...)` before implementing EVERY active task. This is task-aware scouting, not a one-time global scan.
+- The inspector prompt MUST include:
+  - Active task file path and extracted task packet
+  - Requirement IDs and `scope_lock`
+  - Relevant `design.md` contracts/invariants
+  - Prior completed task outputs from `spec.json.task_registry`
+  - Related Files from the active task
+- Inspector MUST report:
+  - Real runtime entrypoints/callers affected by the task (`App.tsx`, routes, CLI command, worker registration, manifest, API consumer, etc.)
+  - Existing integration points and adjacent code patterns to follow
+  - Prior task outputs this task must consume or preserve
+  - Blast-radius touchpoints and dependent files that can regress
+  - Reachability risks: orphan components, unmounted UI, unregistered routes, uncalled services/loaders, unused providers, disconnected reducers/actions
+  - Exact files likely safe to modify and any files outside `Related Files` that require a justified scope escape
+- If the inspector cannot identify the entrypoint/caller for a runtime-facing task, STOP and route back to spec correction or ask the user. Do not guess.
 ### Step 3: Implement Code
 - Act as `god-developer` OR directly write code, executing tasks specified in the loaded Markdown file(s) sequentially.
 - **Important:** You may create and modify files directly, but must faithfully follow the design from the Spec.
+- You MUST use the Step 2 scout report as implementation context. If code reality contradicts the task packet, stop and reconcile the spec before coding.
 - Progress tracking: Temporarily change `[ ]` to `[/]` in Spec files while coding is in progress. Do NOT mark `[x]` before Step 4 passes.
 - **Task Boundary Protocol (CRITICAL):**
   - Default editable scope is `Related Files` from the task packet.
@@ -112,18 +134,22 @@ flowchart TD
 - **Named Technology Rule:** If the task/spec explicitly requires a named dependency or runtime choice (for example Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, S3), you MUST implement that choice or stop. Do not swap it for a custom/in-memory/local substitute and still call the task complete.
 - **Cross-Service Reality Rule:** If a task spans multiple processes or runtimes (web ↔ API, worker ↔ DB, extension ↔ backend), you MUST prove the integration uses shared real state or a real contract boundary. Process-local placeholders on both sides do not count as completion.
 - **Placeholder Completion Rule:** You MAY scaffold future files only when the active task truly needs them to compile, but placeholder route handlers, in-memory stores, or fake adapters MUST NOT be used as evidence that the current task's behavior works end-to-end.
+- **Reachability Rule:** Runtime-facing work is incomplete until it is reachable from the real entrypoint/caller named in the task evidence or Step 2 scout report. Creating a component/service/route/provider/reducer without importing, mounting, registering, or invoking it is not implementation.
+- **Prior Output Consumption Rule:** If this task depends on previous task outputs, verify those outputs are consumed through real code paths. If a prior output is unused and this task is responsible for wiring it, wire it now; if a later task owns the wiring, keep the current task pending unless that deferral is named in the active task evidence.
 ### Step 4: Self-Healing (Quality Gate Auto-Fix)
 The moment you finish coding, DO NOT proceed further. Switch to `references/quality-gate.md` and run the automatic review loop.
-**Mantra:** All feedback from code-auditor must be addressed thoroughly: Score >= 9.5 & Zero Critical issues.
+**Mantra:** Scope/spec compliance first, code quality second. All feedback from code-auditor must be addressed thoroughly: Score >= 9.5 & Zero Critical issues.
 - Passing Step 4 requires ALL of the following:
   1. Automated verification passes, including preflight compile/typecheck/build health and every exact command named in the task's `Task Test Plan & Verification Evidence` section (or legacy `Verification & Evidence`)
-  2. Code review passes
-  3. Task evidence passes (artifacts/runtime surfaces/negative-path checks from the task file are proven)
+  2. Spec compliance review passes: every scoped requirement and active task criterion is implemented, with no extras and no omissions
+  3. Code quality review passes
+  4. Task evidence passes (artifacts/runtime surfaces/reachability/negative-path checks from the task file are proven)
 - `PRECHECK_FAIL` outranks `NO_TESTS`. If compile/typecheck/build fails, the task is FAIL even when no test suite exists yet.
 - `NO_TESTS` is NOT equivalent to PASS. If the task explicitly requires a test command or automated test proof, `NO_TESTS` is a FAIL or BLOCKED outcome until the requirement is satisfied or the spec is corrected.
 - If build/test passes but task evidence is missing, the task is still FAIL.
+- If runtime-facing work is orphaned, unmounted, unregistered, uncalled, or unreachable from the declared entrypoint/caller, the task is still FAIL.
 - If the implementation silently replaced a named contract choice or relies on cross-service process-local stand-ins, the task is still FAIL.
 - Only escalate to the user after 3 consecutive failed review rounds.
@@ -147,6 +173,11 @@ The moment you finish coding, DO NOT proceed further. Switch to `references/qual
 - Task-level docs sync happens after every verified completed task, but actual edits still depend on `Docs impact`.
 - In **Specific-Task Mode**, STOP after sync and report the result.
 - In **Full-Spec Mode**, only after sync may you re-read `task_registry`, pick the next unblocked pending task, and repeat from Step 1 for that task.
+- When no pending tasks remain, run a **Final Integration Scout** before reporting completion:
+  - Trace runtime entrypoints from `main`/route/CLI/worker/manifest/API consumer through the scoped feature surfaces.
+  - Compare reachable behavior against `scope_lock`, `requirements.md`, `design.md`, and all task Completion Criteria.
+  - FAIL completion if any scoped surface is missing, any created runtime-facing artifact is orphaned, or spec progress/registry says done while evidence is missing.
+  - Only then set top-level progress to `code_done` / next phase.
 ---
 ## Attached References

package/src/claude/skills/develop/references/quality-gate.md CHANGED

@@ -1,11 +1,12 @@
-# Quality Gate — Parallel Test + Review Loop
+# Quality Gate — Task Evidence + Two-Stage Review Loop
 This is the critical checkpoint protecting codebase quality at Step 4 of `hapo:develop`.
 Runs AUTOMATICALLY. Only escalates to user after 3 consecutive failures or a critical block.
-Green tests are NOT enough. The gate requires three proofs:
+Green tests are NOT enough. The gate requires four proofs:
 1. Automated verification (typecheck/test/build)
-2. Code/spec review
-3. Task evidence (completion criteria + runtime/artifact proof from the task file)
+2. Spec compliance review (scope/task/design adherence)
+3. Code quality review
+4. Task evidence (completion criteria + runtime/artifact/reachability proof from the task file)
 ## Automation Semantics
@@ -16,8 +17,10 @@ Green tests are NOT enough. The gate requires three proofs:
 - If the task explicitly requires tests and the repo has no such test command or suite, the task is FAIL or BLOCKED, not done.
 - Named frameworks, auth systems, transports, datastores, and runtime boundaries in the task/spec are contractual. Silent substitutions are review failures, not acceptable implementation trade-offs.
 - Multi-process or multi-runtime flows must prove shared real state or a real boundary contract. Matching in-memory placeholders on both sides do not count as working integration.
+- Scope fidelity is mandatory: missing scoped behavior, extra unapproved behavior, or task output that exists only as orphaned/unreachable code is a review failure even when build/tests pass.
+- Runtime-facing artifacts must be reachable from the real entrypoint/caller named by the task or the task-aware scout report.
-## Parallel Quality Cycle
+## Quality Cycle
 Maximum retry counter: **3 attempts**. Exceeding 3 triggers a collapse warning.
@@ -29,66 +32,65 @@ Before START_LOOP:
   - Extract Related Files, Completion Criteria, Task Test Plan & Verification Evidence (or legacy Verification & Evidence)
   - Extract the exact executable verification commands in declaration order
   - Extract relevant design contracts/invariants for the touched area
+  - Extract scope_lock, requirement IDs, runtime entrypoints/callers, and reachability proof obligations
   - If any of these are missing or too vague to verify, FAIL immediately and route back to spec correction
 START_LOOP:
   ---------------------------------------------------------------
-  PARALLEL GATE: Spawn BOTH agents simultaneously
+  STAGE A: Test + SPEC COMPLIANCE review
   ---------------------------------------------------------------
   → Agent(subagent_type="test-runner",
-        prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute the exact verification commands named there first, in order. Preflight compile/typecheck/build failures must be reported as PRECHECK_FAIL and take precedence over NO_TESTS. After that, run any additional repo-level typecheck/test/build checks needed for confidence. Inspect named artifacts/runtime outputs. For multi-service tasks, verify the flow does not rely on process-local stand-ins masquerading as shared state. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED. Treat NO_TESTS as non-passing unless the task did not require a dedicated test suite.",
+        prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute the exact verification commands named there first, in order. Preflight compile/typecheck/build failures must be reported as PRECHECK_FAIL and take precedence over NO_TESTS. After that, run any additional repo-level typecheck/test/build checks needed for confidence. Inspect named artifacts/runtime outputs and prove runtime reachability from declared entrypoints/callers. For multi-service tasks, verify the flow does not rely on process-local stand-ins masquerading as shared state. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED. Treat NO_TESTS as non-passing unless the task did not require a dedicated test suite.",
         description="Test [feature]")
   → Agent(subagent_type="code-auditor",
-        prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, overscope edits outside the task packet, silent replacement of named technologies/contracts, or fake cross-service proof via process-local state are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
-        description="Review [feature]")
+        prompt="SPEC COMPLIANCE REVIEW ONLY. Do not trust the implementer's report. Read the active task file(s), scope_lock, referenced requirements, design contracts, task-aware scout report, and actual code. Verify line by line that every scoped requirement and completion criterion is implemented, nothing out-of-scope was added, and every runtime-facing artifact is reachable from the declared entrypoint/caller or explicitly deferred to a named later task. Missing deliverables, placeholder-only wiring, orphan components/services, unmounted UI, unregistered routes, uncalled loaders, missing runtime entrypoints, overscope edits outside the task packet, silent replacement of named technologies/contracts, or fake cross-service proof via process-local state are Critical even if build/tests pass. Return SPEC_PASS or SPEC_FAIL, critical count, file:line findings, and evidence gaps.",
+        description="Spec review [feature]")
   Wait for BOTH to return results.
-  ---------------------------------------------------------------
-  COMBINE RESULTS
-  ---------------------------------------------------------------
-  CASE 1 — PRECHECK_FAIL OR Automated FAIL OR required command missing OR Evidence FAIL / UNVERIFIED OR NO_TESTS when tests were required:
+  CASE 1 — PRECHECK_FAIL OR Automated FAIL OR required command missing OR Evidence FAIL / UNVERIFIED OR Reachability FAIL / SPEC_FAIL OR NO_TESTS when tests were required:
     - Increment retry_count++
     - If retry_count >= 3:
         → COLLAPSE! AskUserQuestion: "Quality gate cannot prove this task is complete! User intervention required!"
     - If retry_count < 3:
-        → Return to Step 3 (god-developer). Fix the failing checks or missing evidence first.
-        → GOTO START_LOOP (re-run BOTH test + review)
+        → Return to Step 3 (god-developer). Fix the failing checks, spec gaps, or missing evidence first.
+        → GOTO START_LOOP
+  CASE 2 — Test PASS + Evidence PASS + SPEC_PASS:
+    → Proceed to STAGE B code quality review.
-  CASE 2 — Test PASS + Evidence PASS + Review FAIL (Score < 9.5 OR Critical > 0):
+STAGE B:
+  ---------------------------------------------------------------
+  CODE QUALITY REVIEW (only after spec compliance passes)
+  ---------------------------------------------------------------
+  → Agent(subagent_type="code-auditor",
+        prompt="CODE QUALITY REVIEW. Spec compliance already passed. Review recently written code for security, logic correctness, architecture, YAGNI/KISS/DRY, maintainability, tests, and project conventions. Also re-check that no recent edits broke dependents found by the task-aware scout report. Return score (X/10), critical count, warning list, and concrete file:line findings.",
+        description="Quality review [feature]")
+  CASE 3 — Code quality review FAIL (Score < 9.5 OR Critical > 0):
     - Increment retry_count++
     - If retry_count >= 3:
         → COLLAPSE! AskUserQuestion: "Code does not meet minimum standards! User intervention required!"
     - If retry_count < 3:
-        → Fix each review issue from warning log.
-        → GOTO REVIEW_ONLY (skip re-test only if the fixes cannot affect automated evidence; otherwise rerun full loop)
+        → Fix each review issue.
+        → If the fix is prose-only and cannot affect evidence, re-run Stage B only; otherwise GOTO START_LOOP.
-  CASE 3 — Test PASS + Evidence PASS + Review PASS (Score >= 9.5 AND Critical = 0):
+  CASE 4 — Test PASS + Evidence PASS + SPEC_PASS + Code quality review PASS (Score >= 9.5 AND Critical = 0):
     → PASS! Auto-approved.
-    → PROCEED to completion report with a verification receipt summarizing exact commands executed, artifact/runtime proof, and review result.
-REVIEW_ONLY:
-  ---------------------------------------------------------------
-  Re-run ONLY code-auditor (tests already passed and no new evidence-producing code changed)
-  ---------------------------------------------------------------
-  → Agent(subagent_type="code-auditor", ...)
-  IF Score >= 9.5 AND Critical = 0 → PASS!
-  IF Score < 9.5 OR Critical > 0:
-    - retry_count++
-    - If retry_count >= 3 → COLLAPSE
-    - Else → fix issues, GOTO REVIEW_ONLY
+    → PROCEED to completion report with a verification receipt summarizing exact commands executed, artifact/runtime/reachability proof, spec review result, and code quality review result.
 ```
 ## Critical Issue Definitions
 - **Security:** XSS vulnerabilities, SQL injection, leaked env tokens/secrets.
-- **Performance:** Bottlenecks, O(n³) algorithms, unbounded loops over DB calls.
+- **Performance:** Bottlenecks, O(n^3) algorithms, unbounded loops over DB calls.
 - **Architecture:** Breaking MVC boundaries, cross-module coupling, convention violations.
 - **Principles:** YAGNI violations, KISS violations, DRY violations (excessive code duplication).
 - **Evidence / Done-Criteria Drift:** Missing required artifacts, placeholder-only wiring, missing entrypoints, unproven completion criteria, or runtime contract mismatches.
-- **Overscope Delivery Drift:** Implementing later-task deliverables or editing out-of-scope files without direct justification for the active task.
+- **Reachability Failure:** Orphan components/services/hooks/routes/workers/commands/providers/reducers, unmounted UI, unregistered routes, uncalled data loaders, unused providers, disconnected actions, or any runtime-facing artifact that cannot be reached from the declared entrypoint/caller.
+- **Scope Drift:** Scoped acceptance criteria omitted, behavior added outside `scope_lock`, or a task marked complete while part of its approved requirement remains unwired.
+- **Overscope Delivery Drift:** Implementing later-task deliverables or editing out-of-scope files without direct justification for the active task packet.
 - **Contract Substitution Drift:** Replacing a named framework/auth/transport/datastore/runtime boundary with a custom simplification without a spec amendment.
 - **Cross-Service Reality Failure:** Claiming end-to-end behavior across web/api/worker/extension boundaries while state only exists in local process memory or placeholder adapters.
@@ -96,8 +98,8 @@ REVIEW_ONLY:
 Must log the Quality Gate result to the terminal for user visibility:
-- **Quick Pass:** `✓ Step 4 Quality Gate: Test PASS + Evidence PASS + Review 9.5/10 - Auto-Approved`
-- **Hard-Won Pass:** `✓ Step 4 Quality Gate: Failed 2 rounds → Test PASS + Evidence PASS + Review 9.6/10`
+- **Quick Pass:** `✓ Step 4 Quality Gate: Test PASS + Evidence PASS + Spec PASS + Review 9.5/10 - Auto-Approved`
+- **Hard-Won Pass:** `✓ Step 4 Quality Gate: Failed 2 rounds → Test PASS + Evidence PASS + Spec PASS + Review 9.6/10`
 - **Preflight Fail:** `[x] Step 4 Quality Gate: PRECHECK_FAIL → compile/typecheck/build failed before tests mattered`
-- **Fix Needed:** `[~] Step 4 Quality Gate: Tests/evidence failed → returned to god-developer`
+- **Fix Needed:** `[~] Step 4 Quality Gate: Tests/spec/evidence failed → returned to god-developer`
 - **Awaiting Rescue:** `[!] Step 4 Quality Gate: Failed 3 rounds! Awaiting user intervention...`
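
The staged loop in this hunk can be read as a small state machine: Stage A (tests, evidence, spec compliance) must pass before Stage B (code quality review) runs, and both stages appear to share one retry counter that collapses at 3. A minimal Python sketch of that reading follows. The callables `run_stage_a`, `run_stage_b`, and `fix`, and the `GateResult` type, are hypothetical stand-ins rather than CafeKit APIs, and the sketch simplifies by always restarting the full loop after a Stage B fix.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

MAX_RETRIES = 3   # the gate collapses once retry_count reaches 3
MIN_SCORE = 9.5   # Stage B passes only at score >= 9.5 with zero critical issues


@dataclass
class GateResult:
    status: str   # "PASS" or "COLLAPSE"
    retries: int  # how many failed rounds were consumed
    reason: str   # which stage decided the outcome


def quality_gate(
    run_stage_a: Callable[[], bool],                 # hypothetical: tests + evidence + spec check
    run_stage_b: Callable[[], Tuple[float, int]],    # hypothetical: (score, critical_count)
    fix: Callable[[str], None],                      # hypothetical: hand failures back to the developer
) -> GateResult:
    retry_count = 0
    while True:  # START_LOOP
        # CASE 1: any Stage A failure costs one retry and restarts the loop.
        if not run_stage_a():
            retry_count += 1
            if retry_count >= MAX_RETRIES:
                return GateResult("COLLAPSE", retry_count, "stage_a")
            fix("stage_a")
            continue

        # CASE 2: Stage A passed, so the code quality review may run.
        score, critical = run_stage_b()

        # CASE 4: both stages passed.
        if score >= MIN_SCORE and critical == 0:
            return GateResult("PASS", retry_count, "auto-approved")

        # CASE 3: quality review failed; assumed to share the same counter.
        retry_count += 1
        if retry_count >= MAX_RETRIES:
            return GateResult("COLLAPSE", retry_count, "stage_b")
        fix("stage_b")
```

A run that fails Stage A once and Stage B once before passing ends with `retries == 2`, which corresponds to the "Failed 2 rounds" log line in the hunk.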

package/src/claude/skills/specs/SKILL.md CHANGED

@@ -267,6 +267,8 @@ Load: `references/scope-inquiry.md`
 - Each task file follows template `templates/task.md`
 - `Related Files` and test plans must inherit paths, contracts, and test targets from the codebase scout. If exact files/tests cannot be named for an enhancement, run targeted inspect before generating tasks.
 - Each task file MUST include `Completion Criteria` and `Task Test Plan & Verification Evidence` sections detailed enough that a downstream quality gate can prove the task is truly done.
+- Every task MUST preserve the approved `scope_lock`: implement all scoped acceptance criteria for its requirement, avoid out-of-scope features, and record any intentional deferral as a named later task rather than implicit omission.
+- For UI/app/runtime features, generate a final integration/reachability task or final section that names the real runtime entrypoint and proves prior task outputs are imported, mounted, registered, invoked, or otherwise reachable.
 - Build `spec.json.task_registry` alongside `task_files`. For each task file, register at minimum:
   - `id`
   - `title`
@@ -327,6 +329,7 @@ Each task file MUST be **self-contained and implementation-ready** — detailed
 4. **Related Files** — Table with exact paths, action type, and descriptions
 5. **Completion Criteria** — Observable, testable criteria (checkbox format)
 6. **Risk Assessment** — Table with risk, severity, mitigation
+7. **Runtime reachability** — For any created component, service, route, command, worker, provider, or data loader, state where it is reached from or which named later task wires it
 **Parallel markers:** Append `(P)` to tasks that can run concurrently (no data dependency, no shared files, no prerequisite approval from another task). Tasks serving DIFFERENT requirements are often parallelizable.
@@ -362,6 +365,8 @@ Load: `references/review.md` + `rules/design-review.md`
 - FAIL if a newly generated non-trivial spec lacks a `research.md` Evidence Summary with codebase scout result, external research result or skip rationale, selected decision, rejected alternatives, and downstream task/test implications.
 - FAIL if any requirement or NFR mapping uses non-numeric labels (`NFR-1`, `SEC-1`, etc.)
 - FAIL if a task lacks `Completion Criteria` or `Task Test Plan & Verification Evidence` (legacy `Verification & Evidence` is accepted only for pre-existing task files)
+- FAIL if a task creates runtime-facing artifacts but neither proves reachability from an entrypoint/caller nor names a later integration task responsible for wiring them.
+- FAIL if a UI/app/runtime spec has multiple user-facing task outputs but no final integration/reachability task or final integration section.
 - FAIL if accepted validation decisions exist in reports but are not reflected in the implementation-facing sections of affected artifacts (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`, canonical contracts, or requirements text).
 - FAIL if the spec scope/provider was switched away from Anthropic/Claude but `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider-specific strings such as `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` is the only allowed place for historical cost comparisons.
 - FAIL if privacy/delete-data work lacks a single canonical deletion policy. The design MUST explicitly choose either:

package/src/claude/skills/specs/references/review.md CHANGED

@@ -43,7 +43,7 @@ These rules override any self-reasoning or optimization the system may attempt:
 5. **No false completion.** You MUST NOT set `validation.status = "completed"` or `ready_for_implementation = true` until a reconciliation audit proves the accepted findings and validation decisions are reflected in the physical spec artifacts.
 6. **Provider drift is a real defect.** If the scope changed away from Claude/Anthropic, stale strings like `Claude API`, `Haiku`, or `haiku_reachable` in `requirements.md`, `design.md`, or `tasks/*.md` are validation failures. `research.md` may mention them only as historical comparison.
 7. **Implementation-facing propagation is mandatory.** A decision that affects implementation is NOT considered applied if it only appears in `Risk Assessment`, `validate-log.md`, or `red-team-report.md`. It must update at least one of: `requirements.md`, `Canonical Contracts & Invariants`, `Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, or `Task Test Plan & Verification Evidence`.
-8. **CafeKit command dialect only.** Validation output MUST use `/hapo:develop <feature>` as the implementation handoff.
+8. **CafeKit command dialect only.** Validation output MUST use `/hapo:develop <feature>` as the implementation handoff. Never mention `/sdd:execute-spec`, `/sdd:*`, `/work`, `/code`, `/specs <feature> --approve`, `/hapo:specs <feature> --approve`, or non-CafeKit aliases.
 9. **CafeKit task filename convention only.** Task files MUST use `tasks/task-R{N}-{SEQ}-<slug>.md` with two-digit `SEQ` (for example `tasks/task-R0-01-project-scaffolding.md`). Files like `tasks/R0-1-project-scaffolding.md` are legacy/foreign format; rename them and update `spec.json.task_files`, `spec.json.task_registry`, and dependency references before passing validation.
 ---
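
Rule 9 in this hunk (unchanged context) fixes the task filename shape as `tasks/task-R{N}-{SEQ}-<slug>.md` with a two-digit `SEQ`. A minimal Python sketch of a validator for that shape follows. The function name is hypothetical, and two details are assumptions not stated in the source: that `N` is a non-negative integer and that the slug is lowercase kebab-case.

```python
import re

# Assumptions: N is one or more digits; the slug is lowercase kebab-case.
# SEQ must be exactly two digits, per the convention quoted above.
TASK_FILE_RE = re.compile(
    r"tasks/task-R(\d+)-(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.md"
)


def is_cafekit_task_file(path: str) -> bool:
    """Return True when the path follows tasks/task-R{N}-{SEQ}-<slug>.md."""
    return TASK_FILE_RE.fullmatch(path) is not None
```

Under these assumptions, `tasks/task-R0-01-project-scaffolding.md` is accepted, while the legacy form `tasks/R0-1-project-scaffolding.md` is rejected because it lacks the `task-` prefix and uses a one-digit sequence.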

package/src/claude/skills/specs/rules/tasks-generation.md CHANGED

@@ -28,6 +28,7 @@ Detail bullets must include:
 **Every task must**:
 - Build on previous outputs (no orphaned code)
 - Connect to the overall system (no hanging features)
+- Stay inside the approved `scope_lock` and requirement IDs; do not add unapproved features or silently drop scoped behavior
 - Progress incrementally (no big jumps in complexity)
 - Validate core functionality early in sequence
 - Respect architecture boundaries defined in design.md (Architecture Pattern & Boundary Map)
@@ -37,6 +38,8 @@ Detail bullets must include:
 - Use major task summaries sparingly—omit detail bullets if the work is fully captured by child tasks.
 **End with integration tasks** to wire everything together.
+- For UI/app/runtime workflows, the final integration task MUST name the real entrypoint (`App.tsx`, route, command, worker, extension manifest, API route, etc.) and verify every user-visible surface from the requirements is reachable from that entrypoint.
+- Components, services, routes, commands, workers, providers, and data loaders created by earlier tasks MUST be consumed by a later integration task or explicitly marked as internal support in `design.md`; orphaned deliverables are invalid.
 ### 3. Flexible Task Sizing
@@ -80,6 +83,7 @@ Grouping tasks vertically by requirement carries the risk of "siloed" or fragmen
 1. **Foundation First (The R0 Concept)**: Extract shared infrastructure, core database migrations, authentication wrappers, and base UI layouts into foundational tasks running before feature work. If these aren't explicitly in requirements, classify them as `task-R0-XX-foundation.md` or map them to the most logical architectural requirement. All parallel feature tasks MUST depend on these foundation tasks.
 2. **Shared Interfaces (Horizontal Contracts)**: Sub-tasks that touch shared cross-requirement architecture (like registering a new page in a global `router.ts` or adding a column to a shared table) MUST explicitly reference the shared contract defined in `design.md`.
 3. **Integration Enforcers**: If R1 and R2 interact (e.g., R2 UI displays data fetched by R1 backend), the later task MUST have a sub-task explicitly dedicated to "Wiring/Integrating with [Previous Feature] output".
+4. **Final Runtime Integration**: For any feature that has a user-facing screen, public route, CLI command, background worker, browser extension surface, or API flow, create a final integration task (or a final integration section in the last dependent task) that proves the whole scoped feature works from its runtime entrypoint. This task MUST fail if prior-task outputs exist but are not imported, mounted, registered, or invoked.
 ### 3d. Spike Tasks for Complex/Uncertain Areas (MANDATORY)
@@ -144,10 +148,12 @@ That section is the task-level test plan and MUST contain:
 1. **Automated proof** — exact command(s) for typecheck, tests, build, or explicit `N/A`
 2. **Artifact/runtime proof** — exact files, routes, UI surfaces, generated outputs, or persisted state to inspect
 3. **Contract/negative-path proof** — at least one contract-preserving check for unauthorized, invalid, missing-permission, rollback, or failure-path behavior when relevant
+4. **Reachability proof** — when the task creates a runtime-facing artifact, name the upstream entrypoint or caller that reaches it; if reachability is deferred, name the exact later integration task responsible
 Rules:
 - If the task produces a build artifact or generated file, name the exact artifact path to inspect.
 - If the task wires entrypoints (popup, content script, route, worker, CLI command), name the exact runtime surface that must exist after implementation.
+- If the task creates a UI component, service, hook, reducer, route handler, worker, command, or data loader, the evidence MUST prove it is either reachable from the declared runtime surface or intentionally internal support for a named later task.
 - If verification depends on environment or manual setup, document the blocker explicitly instead of implying success.
 - Build success alone is NEVER enough evidence for a completed task.
 - For provider-sensitive work, use provider-neutral wording unless the scope lock explicitly names a vendor.
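
The reachability rules added in this file amount to a graph question: starting from the declared runtime entrypoint, can every artifact created by earlier tasks be reached through imports, mounts, registrations, or calls? A minimal Python sketch of that check follows, assuming the dependency edges have already been extracted into a plain dictionary. How CafeKit agents actually gather those edges is not specified in the diff; the function name and the example file names are illustrative only.

```python
from collections import deque
from typing import Dict, List


def unreachable_artifacts(
    graph: Dict[str, List[str]],   # hypothetical: file -> files it imports/mounts/registers
    entrypoint: str,               # the declared runtime entrypoint, e.g. "App.tsx"
    artifacts: List[str],          # runtime-facing files created by the tasks
) -> List[str]:
    """Return the artifacts that cannot be reached from the entrypoint."""
    seen = {entrypoint}
    queue = deque([entrypoint])
    # Breadth-first walk over the dependency edges.
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # Anything never visited is an orphan from the entrypoint's point of view.
    return [a for a in artifacts if a not in seen]


# Illustrative data only: ExportButton.tsx exists but nothing mounts it.
example_graph = {
    "App.tsx": ["routes.ts"],
    "routes.ts": ["Dashboard.tsx"],
    "Dashboard.tsx": ["useStats.ts"],
}
orphans = unreachable_artifacts(
    example_graph,
    "App.tsx",
    ["Dashboard.tsx", "useStats.ts", "ExportButton.tsx"],
)
```

In this example `orphans` contains only `ExportButton.tsx`, the kind of deliverable the new rules would treat as invalid unless a named later integration task wires it in.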

package/src/claude/skills/specs/templates/task.md CHANGED

@@ -16,6 +16,7 @@
 - **MUST**: {{Non-negotiable requirement or technical constraint}}
 - **SHOULD**: {{Recommended approach or optimization}}
 - **MUST NOT**: {{Explicitly forbidden action or approach}}
+- **SCOPE**: Implement only the behavior mapped to R{{REQ_NUMBER}} and the approved `scope_lock`; do not add out-of-scope features or leave scoped acceptance criteria unwired.
 ## Implementation Steps
@@ -57,6 +58,7 @@
 - [ ] {{Criteria 1 — observable output or artifact, maps to acceptance criteria R{{REQ_NUMBER}}}}
 - [ ] {{Criteria 2 — measurable behavior or negative-path outcome}}
 - [ ] {{Criteria 3 — maps directly to acceptance criteria from requirements.md and can be proven below}}
+- [ ] {{Criteria 4 — no orphaned component/service/route/command; created runtime-facing work is reachable from the declared entrypoint or explicitly deferred to a named integration task}}
 ## Task Test Plan & Verification Evidence
@@ -68,6 +70,9 @@ This section is the task-level test plan. It names the exact commands, observabl
 - [ ] Artifact / runtime verification
   - Inspect: `{{artifact path | route | UI state | DB object | manifest entry}}`
   - Expect: {{Observable result that proves the task is really wired}}
+- [ ] Runtime reachability verification
+  - Entrypoint/caller: `{{App.tsx | route file | CLI command | worker registration | manifest | API consumer}}`
+  - Expect: {{Created component/service/route/worker/loader is imported, mounted, registered, or invoked from the runtime path; if deferred, name the later integration task}}
 - [ ] Contract / negative-path verification
   - Check: {{Unauthorized path, validation error, permission omission, missing env behavior, deletion effect, etc.}}
   - Expect: {{Concrete failure mode or contract-preserving behavior}}

package/src/claude/skills/test/SKILL.md CHANGED

@@ -18,6 +18,8 @@ Designed to work **after `hapo:develop`**. Standalone `/hapo:test` uses the same
 /hapo:test                    # Blast-radius mode: only tests affected by recent changes
 /hapo:test --full             # Run full test suite regardless of changes
 /hapo:test <scope>            # Test a specific module or path
+/hapo:test <feature-name>     # Spec-aware test: load specs/<feature-name> and verify scope/task evidence
+/hapo:test specs/<feature>    # Spec-aware test by spec directory
 /hapo:test --ui <url>         # UI verification via chrome-devtools (public pages)
 /hapo:test --ui-auth <url>    # UI verification with auth injection (protected pages)
 /hapo:test --ui-flow <url>    # UI testing with User Journey (form fill/submit simulation)
@@ -31,6 +33,13 @@ If a test command exits 0 but runs 0 tests, report NO_TESTS — this is a green
 If tests fail, list every failure explicitly — do not summarize failures away.
 </HARD-GATE>
+<SCOPE-GATE>
+When a feature name or `specs/<feature>` path is supplied, testing is spec-aware.
+Load `spec.json`, `requirements.md`, `design.md`, active/recent task files, and Task Test Plan evidence.
+The verdict MUST compare executed/reachable behavior against `scope_lock`, requirements, design contracts, task Completion Criteria, and runtime reachability obligations.
+Build/typecheck success without scoped runtime proof is not PASS.
+</SCOPE-GATE>
 ## 4-Phase Execution
 ```mermaid
@@ -61,6 +70,12 @@ Auto-detect the test runner from project files:
 Unless `--full` is specified: apply **Blast Radius scoping** to run only tests
 affected by recent file changes. See `references/execution-strategy.md` Phase A.
+If the argument resolves to `specs/<feature>` or a feature directory under `specs/`, enter **Spec-Aware Mode**:
+- Load `spec.json`, `requirements.md`, `design.md`, and task files referenced by `task_registry`
+- Identify tasks marked `done`, `in_progress`, or recently changed
+- Extract exact commands, runtime/artifact proof, runtime reachability proof, and negative-path checks
+- Scope test selection by affected task files, but do not skip any mandatory task evidence
 ### Phase 2 — Execute
 **Code testing (default):**
@@ -68,6 +83,7 @@ affected by recent file changes. See `references/execution-strategy.md` Phase A.
 2. Execute test command with coverage flags
 3. Collect test counts, coverage percentages, and fail stack traces
 4. Treat 0 executed tests as `NO_TESTS`, even if the command exits 0
+5. In Spec-Aware Mode, inspect runtime reachability from declared entrypoints/callers and fail if scoped surfaces are missing or orphaned
 **UI verification (`--ui` / `--ui-auth` / `--ui-flow`):**
 Execute multi-page discovery, then spawn **Parallel UI Subagents** (test-runner instances) to handle Smoke, Core-Vitals, Accessibility, SEO, Security, and User Flows simultaneously.
@@ -76,7 +92,7 @@ See `references/execution-strategy.md` Phase C for full phase breakdown.
 Delegate execution to `test-runner` agent:
 ```
 Agent(subagent_type="test-runner",
-  prompt="Run tests. Scope: [blast-radius|full|ui]. Target: [path|url]. Return structured verdict.",
+  prompt="Run tests. Scope: [blast-radius|full|ui|spec-aware]. Target: [path|url|feature]. Load specs when target is a feature. Return structured verdict with scope/spec coverage and runtime reachability.",
   description="Test [feature]")
 ```
@@ -111,6 +127,12 @@ Return a **structured verdict** (required format — not free-form prose):
 - Accessibility issues: N found | none
 - Screenshots: [paths]
+### Scope / Spec Coverage (if feature scope)
+- Requirements covered: N/N
+- Task evidence checks: PASS | FAIL | UNVERIFIED
+- Runtime reachability: PASS | FAIL | UNVERIFIED
+- Out-of-scope behavior detected: none | [list]
 ### Test Regression Check
 - **Comparison:** Compare current test count and assertion depth against previous runs.
 - **Result:** OK | REGRESSION (tests deleted/weakened)
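
Two rules in this file lend themselves to a short illustration: a run that executes 0 tests is `NO_TESTS` even when the command exits 0, and in Spec-Aware Mode build or test success without task evidence and runtime reachability proof is not a PASS. A minimal Python sketch follows. Both function names are hypothetical, and treating a non-zero exit code as `FAIL` before counting tests is an assumption about ordering that the source does not spell out.

```python
def classify_test_run(exit_code: int, tests_executed: int, failures: int) -> str:
    """Map raw runner output to a test status."""
    if exit_code != 0:
        return "FAIL"          # assumption: a failing command is a failure regardless of counts
    if tests_executed == 0:
        return "NO_TESTS"      # exit 0 with zero tests is not a green result
    if failures > 0:
        return "FAIL"
    return "PASS"


def is_spec_aware_pass(test_status: str, task_evidence: str, reachability: str) -> bool:
    """Spec-aware PASS requires every check to be PASS; UNVERIFIED is not enough."""
    return all(s == "PASS" for s in (test_status, task_evidence, reachability))
```

For example, `classify_test_run(0, 0, 0)` yields `NO_TESTS`, and `is_spec_aware_pass("PASS", "PASS", "UNVERIFIED")` is false because reachability was never proven.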