npm - @haposoft/cafekit - Versions diffs - 0.8.7 → 0.8.9 - Mend

@haposoft/cafekit 0.8.7 → 0.8.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20) hide show

package/package.json +1 -1
package/src/claude/CLAUDE.md +2 -2
package/src/claude/agents/code-auditor.md +9 -1
package/src/claude/agents/inspector.md +24 -1
package/src/claude/agents/spec-maker.md +26 -22
package/src/claude/agents/test-runner.md +27 -5
package/src/claude/hooks/spec-state.cjs +2 -1
package/src/claude/migration-manifest.json +2 -1
package/src/claude/rules/workflow.md +5 -4
package/src/claude/scripts/validate-spec-output.cjs +271 -0
package/src/claude/skills/code-review/references/spec-compliance-review.md +1 -1
package/src/claude/skills/develop/SKILL.md +43 -12
package/src/claude/skills/develop/references/quality-gate.md +43 -40
package/src/claude/skills/specs/SKILL.md +32 -27
package/src/claude/skills/specs/references/review.md +2 -2
package/src/claude/skills/specs/rules/tasks-generation.md +35 -9
package/src/claude/skills/specs/templates/task.md +43 -33
package/src/claude/skills/sync/SKILL.md +2 -2
package/src/claude/skills/sync/references/sync-protocols.md +5 -5
package/src/claude/skills/test/SKILL.md +32 -1

package/src/claude/skills/specs/templates/task.md CHANGED Viewed

@@ -7,43 +7,39 @@
 **Dependencies:** {{DEPENDENCIES}}
 **Spec:** specs/{{FEATURE_NAME}}/
-## Objective
+## Context
-{{Brief 1-2 sentence objective detailing WHAT to accomplish, not HOW. Must relate directly to requirement R{{REQ_NUMBER}}.}}
+- **Why**: {{Business/user reason this task exists}}
+- **Current state**: {{Relevant existing files, route, model, API, screen, or "greenfield"}}
+- **Target outcome**: {{Observable behavior after this task is done}}
 ## Constraints
 - **MUST**: {{Non-negotiable requirement or technical constraint}}
 - **SHOULD**: {{Recommended approach or optimization}}
 - **MUST NOT**: {{Explicitly forbidden action or approach}}
+- **SCOPE**: Implement only the behavior mapped to R{{REQ_NUMBER}} and the approved `scope_lock`; do not add out-of-scope features or leave scoped acceptance criteria unwired.
-## Implementation Steps
-- [ ] 1. {{MAJOR_STEP_1}}
-  - [ ] 1.1 {{Sub-task describing specific behavior/action}}
-    - {{Detail: business logic, behavior, target validation}}
-    - {{Detail: edge case or constraint}}
-    - _Requirements: {{REQ_NUMBER}}.{{X}}_
-  - [ ] 1.2 {{Next sub-task}}
-    - {{Detail items}}
-    - _Requirements: {{REQ_NUMBER}}.{{Y}}_
-- [ ] 2. {{MAJOR_STEP_2}}
-  - [ ] 2.1 {{Sub-task}}
-    - {{Details}}
-    - _Requirements: {{REQ_NUMBER}}.{{Z}}_
-  - [ ] 2.2 {{Sub-task}}
-    - {{Details}}
-    - _Requirements: {{REQ_NUMBER}}.{{W}}_
-- [ ] 3. Test coverage for R{{REQ_NUMBER}}
-  - [ ] 3.1 Unit tests
-    - {{Test case 1: target behavior to verify}}
-    - {{Test case 2: edge case / error case}}
-    - _Requirements: {{REQ_NUMBER}}_
-  - [ ]* 3.2 Integration tests (optional for MVP)
-    - {{Describe end-to-end flow to verify}}
-    - _Requirements: {{REQ_NUMBER}}_
+## Steps
+- [ ] 1. {{Actionable step with exact file/path/contract}}
+  - {{Business intent: what user/system behavior this enables}}
+  - {{Code detail: schema/API/component/function/route and validation rules}}
+  - _Requirements: {{REQ_NUMBER}}.{{X}}_
+- [ ] 2. {{Next actionable step}}
+  - {{Business intent}}
+  - {{Code detail, edge case, or integration contract}}
+  - _Requirements: {{REQ_NUMBER}}.{{Y}}_
+- [ ] 3. Verification implementation
+  - {{Unit/integration/e2e test or explicit manual verification hook}}
+  - _Requirements: {{REQ_NUMBER}}_
+## Requirements
+- {{REQ_NUMBER}}.{{X}} — {{Acceptance criterion or requirement covered}}
+- {{REQ_NUMBER}}.{{Y}} — {{Acceptance criterion or requirement covered}}
 ## Related Files
@@ -57,17 +53,31 @@
 - [ ] {{Criteria 1 — observable output or artifact, maps to acceptance criteria R{{REQ_NUMBER}}}}
 - [ ] {{Criteria 2 — measurable behavior or negative-path outcome}}
 - [ ] {{Criteria 3 — maps directly to acceptance criteria from requirements.md and can be proven below}}
+- [ ] {{Criteria 4 — no orphaned component/service/route/command; created runtime-facing work is reachable from the declared entrypoint or explicitly deferred to a named integration task}}
+## Evidence
-## Task Test Plan & Verification Evidence
+This section is both the task-level test plan and the proof checklist. Keep it short, exact, and executable.
+Select the proof by task risk; do not run every test type for every task.
-This section is the task-level test plan. It names the exact commands, observable runtime/artifact proof, and negative-path checks required before this task can be marked done.
+- Logic/data/validator task: include unit tests.
+- Stateful UI/component task: include component or integration tests.
+- Cross-module/API/state flow task: include integration tests.
+- User-facing end-to-end workflow: include E2E/UI flow verification.
+- Layout/theme/responsive task: include visual/runtime viewport checks.
+- Interactive UI task: include accessibility checks when keyboard, focus, labels, or ARIA can regress.
+- Scaffold/release task: include smoke build/test/dev-server checks.
+- Performance/security checks are required only when the requirement, risk, or touched surface calls for them.
-- [ ] Automated verification
+- [ ] Automated verification (unit/component/integration/E2E as applicable)
   - Command(s): `{{TYPECHECK / TEST / BUILD COMMANDS OR N/A}}`
   - Expected proof: {{What output, exit code, or report proves success}}
 - [ ] Artifact / runtime verification
   - Inspect: `{{artifact path | route | UI state | DB object | manifest entry}}`
   - Expect: {{Observable result that proves the task is really wired}}
+- [ ] Runtime reachability verification
+  - Entrypoint/caller: `{{App.tsx | route file | CLI command | worker registration | manifest | API consumer}}`
+  - Expect: {{Created component/service/route/worker/loader is imported, mounted, registered, or invoked from the runtime path; if deferred, name the later integration task}}
 - [ ] Contract / negative-path verification
   - Check: {{Unauthorized path, validation error, permission omission, missing env behavior, deletion effect, etc.}}
   - Expect: {{Concrete failure mode or contract-preserving behavior}}
@@ -84,4 +94,4 @@ This section is the task-level test plan. It names the exact commands, observabl
 > **Parallel marker**: Append `(P)` to the title if this task can run concurrently with another (usually when serving different requirements).
 > **Test note**: If a test coverage sub-task can be deferred post-MVP, mark it with `- [ ]*`.
 > **Requirement mapping**: Every sub-task MUST end with `_Requirements: X.X_`. No mapping = invalid task file.
-> **Verification rule**: No `## Task Test Plan & Verification Evidence` section = invalid task file. Existing specs may use legacy `## Verification & Evidence`; agents must support both headings.
+> **Evidence rule**: No `## Evidence` section = invalid task file. Existing specs may use `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence`; agents must support all three headings.

package/src/claude/skills/sync/SKILL.md CHANGED Viewed

@@ -34,8 +34,8 @@ Scans the `spec.json` against all physical `task-R*.md` files to detect mismatch
 1. **Precision Edits:** Never overwrite the entire `spec.json` string blindly. Update only the required keys, while keeping JSON valid.
 2. **Machine + Human Sync:** Every task status update MUST modify both `spec.json.task_registry[...]` and the matching markdown task file header/status section.
-3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Implementation Steps` and relevant `Completion Criteria` / `Task Test Plan & Verification Evidence` checkboxes that have actual proof. Legacy `Verification & Evidence` sections are supported.
-4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
+3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Steps` / `## Implementation Steps` and relevant `Completion Criteria` / `Evidence` checkboxes that have actual proof. `Task Test Plan & Verification Evidence` and legacy `Verification & Evidence` sections are supported.
+4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Evidence`, `## Task Test Plan & Verification Evidence`, or legacy `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
 5. **Task Docs Hook:** Every time `hapo:sync` marks a task as `done`, it must flag that a task-level docs checkpoint is now due for that verified task.
 6. **Phase Prompt Rule:** When `hapo:sync` marks the final pending task in the whole feature as `done`, it should automatically prompt the user if they'd like to advance the phase, but only after the docs checkpoint for that last completed task has been considered.

package/src/claude/skills/sync/references/sync-protocols.md CHANGED Viewed

@@ -15,7 +15,7 @@ When requested to update a phase or change task configuration, `spec.json` must
     - full relative path like `tasks/task-R0-02-extension-shell.md`
 *   **Status Update:** If a task changes to `blocked`, the matching `task_registry[path].status` must become `"blocked"`, `task_registry[path].blocker` must record the reason, and `spec.json.status` / `spec.json.blocker` must reflect the top-level block if work is globally blocked.
 *   **Timestamp Rule:** Update `task_registry[path].started_at`, `completed_at`, and `last_updated_at` consistently with the new state. Also refresh `spec.json.updated_at`.
-*   **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
+*   **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Evidence`, `## Task Test Plan & Verification Evidence`, or legacy `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
 *   **Receipt Integrity Rule:** A valid verification receipt must include the exact commands run, their outcomes, and artifact/runtime proof. Receipts containing `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit "placeholder / simplified for MVP / production later" contract deviations are not eligible for `done`.
 *   **Contract Fidelity Rule:** If the task file notes or evidence show that a named framework/auth/runtime choice from the spec was silently replaced, sync MUST refuse `done` until the spec is amended or the implementation is corrected.
 *   **Task Docs Rule:** After a task is moved to `done`, emit a short alert that a task-level docs checkpoint is due for this verified task.
@@ -27,12 +27,12 @@ The structure of `tasks/task.md` relies heavily on exact keyword markers. Follow
 ### A. Completing a Task
 When `/hapo:sync <feature> <task-id> done`:
 1. Find: `**Status:** pending` (or `in_progress` / `blocked`).
-2. Inspect `## Task Test Plan & Verification Evidence` first. If the task uses legacy `## Verification & Evidence`, inspect that section instead. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
+2. Inspect `## Evidence` first. If the task uses `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence`, inspect that section instead. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
 3. Refuse completion if the receipt contains any non-passing marker such as `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation substituted a named contract with a placeholder/custom simplification.
 4. Replace with: `**Status:** done`.
-5. Locate block: `## Implementation Steps`.
+5. Locate block: `## Steps` or `## Implementation Steps`.
 6. Convert `- [ ]` into `- [x]` strictly within that section.
-7. Update relevant checkboxes in `## Completion Criteria` and `## Task Test Plan & Verification Evidence` only when the caller provides or the file already contains real proof. For legacy task files, update `## Verification & Evidence` instead.
+7. Update relevant checkboxes in `## Completion Criteria` and `## Evidence` only when the caller provides or the file already contains real proof. For v0.8 or legacy task files, update `## Task Test Plan & Verification Evidence` or `## Verification & Evidence` instead.
 8. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
 ### B. Blocking a Task
@@ -59,7 +59,7 @@ When `/hapo:sync audit <feature>` is activated:
    - Missing disk file referenced in registry → remove or flag it
    - Markdown says `done` but registry not done → registry wins only if evidence already exists; otherwise downgrade markdown or flag conflict
    - Registry says `done` but markdown still pending → update markdown only if evidence exists
-   - Either side says `done` but `## Task Test Plan & Verification Evidence` / legacy `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
+   - Either side says `done` but `## Evidence` / `## Task Test Plan & Verification Evidence` / legacy `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
    - Either side says `done` but the receipt contains `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit contract-substitution notes → downgrade to `in_progress` or flag conflict
 5. **Correction Alert:** Output a brief markdown alert detailing mismatches fixed and any unresolved conflicts requiring manual review.
 6. **Task Docs Alert:** If audit reveals tasks newly marked `done`, include whether task-level docs sync appears still due or already accounted for in the current run summary.

package/src/claude/skills/test/SKILL.md CHANGED Viewed

@@ -18,6 +18,8 @@ Designed to work **after `hapo:develop`**. Standalone `/hapo:test` uses the same
 /hapo:test                    # Blast-radius mode: only tests affected by recent changes
 /hapo:test --full             # Run full test suite regardless of changes
 /hapo:test <scope>            # Test a specific module or path
+/hapo:test <feature-name>     # Spec-aware test: load specs/<feature-name> and verify scope/task evidence
+/hapo:test specs/<feature>    # Spec-aware test by spec directory
 /hapo:test --ui <url>         # UI verification via chrome-devtools (public pages)
 /hapo:test --ui-auth <url>    # UI verification with auth injection (protected pages)
 /hapo:test --ui-flow <url>    # UI testing with User Journey (form fill/submit simulation)
@@ -31,6 +33,13 @@ If a test command exits 0 but runs 0 tests, report NO_TESTS — this is a green
 If tests fail, list every failure explicitly — do not summarize failures away.
 </HARD-GATE>
+<SCOPE-GATE>
+When a feature name or `specs/<feature>` path is supplied, testing is spec-aware.
+Load `spec.json`, `requirements.md`, `design.md`, active/recent task files, and task `Evidence` / test-plan proof.
+The verdict MUST compare executed/reachable behavior against `scope_lock`, requirements, design contracts, task Completion Criteria, and runtime reachability obligations.
+Build/typecheck success without scoped runtime proof is not PASS.
+</SCOPE-GATE>
 ## 4-Phase Execution
 ```mermaid
@@ -61,6 +70,12 @@ Auto-detect the test runner from project files:
 Unless `--full` is specified: apply **Blast Radius scoping** to run only tests
 affected by recent file changes. See `references/execution-strategy.md` Phase A.
+If the argument resolves to `specs/<feature>` or a feature directory under `specs/`, enter **Spec-Aware Mode**:
+- Load `spec.json`, `requirements.md`, `design.md`, and task files referenced by `task_registry`
+- Identify tasks marked `done`, `in_progress`, or recently changed
+- Extract exact commands, runtime/artifact proof, runtime reachability proof, and negative-path checks
+- Scope test selection by affected task files, but do not skip any mandatory task evidence
 ### Phase 2 — Execute
 **Code testing (default):**
@@ -68,6 +83,16 @@ affected by recent file changes. See `references/execution-strategy.md` Phase A.
 2. Execute test command with coverage flags
 3. Collect test counts, coverage percentages, and fail stack traces
 4. Treat 0 executed tests as `NO_TESTS`, even if the command exits 0
+5. In Spec-Aware Mode, inspect runtime reachability from declared entrypoints/callers and fail if scoped surfaces are missing or orphaned
+**Spec-aware test type escalation:**
+- Unit tests are mandatory when task evidence covers pure logic, transforms, validators, sorting/filtering, or regressions.
+- Component/integration tests are expected when task evidence covers stateful UI, context/store wiring, API/service boundaries, or persistence.
+- E2E/UI flow tests are expected once a complete user-facing workflow exists, not for isolated foundation tasks.
+- Visual/responsive checks are expected for layout, theme, dashboard, and style tasks.
+- Accessibility checks are expected for interactive UI surfaces where focus, roles, labels, keyboard navigation, or ARIA can regress.
+- Smoke checks are enough for scaffold/config tasks unless the task requires deeper proof.
+- Performance/security checks are only mandatory when the requirement, design risk, or touched runtime boundary calls for them.
 **UI verification (`--ui` / `--ui-auth` / `--ui-flow`):**
 Execute multi-page discovery, then spawn **Parallel UI Subagents** (test-runner instances) to handle Smoke, Core-Vitals, Accessibility, SEO, Security, and User Flows simultaneously.
@@ -76,7 +101,7 @@ See `references/execution-strategy.md` Phase C for full phase breakdown.
 Delegate execution to `test-runner` agent:
 ```
 Agent(subagent_type="test-runner",
-  prompt="Run tests. Scope: [blast-radius|full|ui]. Target: [path|url]. Return structured verdict.",
+  prompt="Run tests. Scope: [blast-radius|full|ui|spec-aware]. Target: [path|url|feature]. Load specs when target is a feature. Return structured verdict with scope/spec coverage and runtime reachability.",
   description="Test [feature]")
 ```
@@ -111,6 +136,12 @@ Return a **structured verdict** (required format — not free-form prose):
 - Accessibility issues: N found | none
 - Screenshots: [paths]
+### Scope / Spec Coverage (if feature scope)
+- Requirements covered: N/N
+- Task evidence checks: PASS | FAIL | UNVERIFIED
+- Runtime reachability: PASS | FAIL | UNVERIFIED
+- Out-of-scope behavior detected: none | [list]
 ### Test Regression Check
 - **Comparison:** Compare current test count and assertion depth against previous runs.
 - **Result:** OK | REGRESSION (tests deleted/weakened)