npm - @haposoft/cafekit - Versions diffs - 0.7.28 → 0.8.0 - Mend

@haposoft/cafekit 0.7.28 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/README.md +21 -11
package/package.json +4 -1
package/src/claude/CLAUDE.md +81 -135
package/src/claude/agents/brainstormer.md +24 -13
package/src/claude/agents/code-auditor.md +1 -1
package/src/claude/agents/spec-maker.md +2 -2
package/src/claude/agents/test-runner.md +10 -8
package/src/claude/rules/ai-dev-rules.md +36 -51
package/src/claude/rules/hook-protocols.md +35 -0
package/src/claude/rules/orchestrator.md +11 -0
package/src/claude/rules/workflow.md +41 -45
package/src/claude/skills/brainstorm/SKILL.md +123 -39
package/src/claude/skills/chrome-devtools/scripts/package.json +3 -1
package/src/claude/skills/code-review/references/spec-compliance-review.md +1 -1
package/src/claude/skills/develop/SKILL.md +7 -7
package/src/claude/skills/develop/references/quality-gate.md +2 -2
package/src/claude/skills/develop/references/subagent-patterns.md +1 -1
package/src/claude/skills/git/SKILL.md +19 -2
package/src/claude/skills/git/references/finish-branch.md +61 -0
package/src/claude/skills/pdf/scripts/__pycache__/check_bounding_boxes.cpython-314.pyc +0 -0
package/src/claude/skills/specs/SKILL.md +15 -6
package/src/claude/skills/specs/references/review.md +1 -1
package/src/claude/skills/specs/rules/tasks-generation.md +3 -3
package/src/claude/skills/specs/templates/task.md +4 -2
package/src/claude/skills/sync/SKILL.md +2 -2
package/src/claude/skills/sync/references/sync-protocols.md +4 -4
package/src/claude/skills/test/SKILL.md +4 -1
package/src/claude/skills/test/references/execution-strategy.md +3 -1
package/src/claude/skills/test/references/test-memory.md +2 -2

package/src/claude/skills/specs/SKILL.md CHANGED

@@ -85,7 +85,8 @@ Display selection menu via `AskUserQuestion`:
 ### When called WITH a feature description
 System auto-analyzes the description:
-- If description is too short (< 20 words) or vague → stop and ask 1-2 clarifying questions
+- If description is too short (< 20 words) or missing one concrete detail → stop and ask 1-2 clarifying questions
+- If the idea has unresolved architecture choices, unclear acceptance criteria, unclear scope boundaries, or multiple plausible approaches → stop and route to `/hapo:brainstorm <idea>` before creating spec artifacts
 - If task is simple (small bugfix, config change) → suggest "A spec may not be needed for this. Continue anyway?"
 - If task is complex (multi-module, security/migration related) → auto-activate deep research, ask user 3 scope questions
@@ -111,7 +112,9 @@ flowchart TD
     A["Call /hapo:specs"] --> B{Has description?}
     B -->|No| C["Menu: init / status / resume / --validate / archive"]
     B -->|Yes| D["Step 1: Analyze description"]
-    D --> E{Clear enough?}
+    D --> DB{"Needs pre-spec brainstorm?"}
+    DB -->|Yes| DB2["Stop: run /hapo:brainstorm with same idea"]
+    DB -->|No| E{Clear enough?}
     E -->|No| F["Ask user 1-2 clarifying questions"]
     F --> D
     E -->|Yes| G["Step 2: Scan specs/ for related specs"]
@@ -148,6 +151,12 @@ flowchart TD
 ### Step 1: Analyze Description
 - Assess clarity and complexity of the description
+- Route to `hapo:brainstorm` before creating files when:
+  - the expected output or acceptance criteria are not concrete
+  - the scope boundary is unknown
+  - the request has 2-3 viable architectures and no user-approved direction
+  - the feature spans 3+ independent subsystems and needs decomposition
+  - the user is explicitly asking to explore, compare, debate, or decide
 - **Multimodal & Document Auto-Ingestion (MANDATORY)**: If the input includes file paths or URLs pointing to images, audio, video, or Office documents, you MUST spawn the matching subagent to extract content BEFORE proceeding:
   - `.mp3`, `.wav`, `.mp4`, `.mov`, `.jpg`, `.png`, `.webp` → `Task(subagent_type="hapo:ai-multimodal", prompt="Transcribe/Analyze [path]")`
   - `.pdf` → `Task(subagent_type="hapo:pdf", prompt="Extract text and tables from [path]")`
@@ -218,7 +227,7 @@ Load: `references/scope-inquiry.md`
 - Load `rules/tasks-generation.md` for core principles
 - Load `rules/tasks-parallel-analysis.md` for parallel markers (default: enabled)
 - Each task file follows template `templates/task.md`
-- Each task file MUST include `Completion Criteria` and `Verification & Evidence` sections detailed enough that a downstream quality gate can prove the task is truly done.
+- Each task file MUST include `Completion Criteria` and `Task Test Plan & Verification Evidence` sections detailed enough that a downstream quality gate can prove the task is truly done.
 - Build `spec.json.task_registry` alongside `task_files`. For each task file, register at minimum:
   - `id`
   - `title`
@@ -311,8 +320,8 @@ Load: `references/review.md` + `rules/design-review.md`
 - FAIL if any task file exists on disk but is missing from `task_registry`
 - FAIL if any path in `task_registry` does not exist on disk
 - FAIL if any requirement or NFR mapping uses non-numeric labels (`NFR-1`, `SEC-1`, etc.)
-- FAIL if a task lacks `Completion Criteria` or `Verification & Evidence`
-- FAIL if accepted validation decisions exist in reports but are not reflected in the implementation-facing sections of affected artifacts (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Verification & Evidence`, canonical contracts, or requirements text).
+- FAIL if a task lacks `Completion Criteria` or `Task Test Plan & Verification Evidence` (legacy `Verification & Evidence` is accepted only for pre-existing task files)
+- FAIL if accepted validation decisions exist in reports but are not reflected in the implementation-facing sections of affected artifacts (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`, canonical contracts, or requirements text).
 - FAIL if the spec scope/provider was switched away from Anthropic/Claude but `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider-specific strings such as `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` is the only allowed place for historical cost comparisons.
 - FAIL if privacy/delete-data work lacks a single canonical deletion policy. The design MUST explicitly choose either:
   1. hard-delete with no re-registration lock, or
@@ -446,7 +455,7 @@ Before finalizing any specification, assert all the following:
 - [ ] **Requirements traceability** matrix present in design.md
 - [ ] **Canonical Contracts & Invariants** filled for auth/transport/persistence/artifact-sensitive work
 - [ ] **Every task file** maps to at least 1 valid in-scope requirement ID
-- [ ] **Every task file** includes `Verification & Evidence` with executable or inspectable proof
+- [ ] **Every task file** includes `Task Test Plan & Verification Evidence` with executable or inspectable proof
 - [ ] **State Machine Blueprint:** design.md contains Mermaid diagrams for non-trivial flows
 - [ ] **Dependency graph complete**: no task can start before its blockers are listed
 - [ ] **Risk matrix filled**: likelihood × impact, with mitigation for High items

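The 0.8.0 routing rules in Step 1 and the updated flowchart can be read as a small decision function. The sketch below is illustrative only. It assumes the description has already been reduced to a handful of signals, and `IdeaAnalysis` and `route` are hypothetical names, not part of cafekit. It follows the flowchart order (brainstorm check before the clarity check) and uses the 20-word threshold from the diff.

```python
from dataclasses import dataclass


@dataclass
class IdeaAnalysis:
    """Hypothetical signals extracted from a feature description (not a cafekit type)."""
    word_count: int
    has_concrete_detail: bool
    acceptance_criteria_concrete: bool
    scope_boundary_known: bool
    viable_architectures: int
    user_approved_direction: bool
    independent_subsystems: int
    explicit_exploration_request: bool


def route(idea: IdeaAnalysis) -> str:
    """Return "brainstorm", "clarify", or "spec" following the 0.8.0 Step 1 rules."""
    needs_brainstorm = (
        not idea.acceptance_criteria_concrete
        or not idea.scope_boundary_known
        # 2-3 viable architectures and no user-approved direction
        or (idea.viable_architectures >= 2 and not idea.user_approved_direction)
        # 3+ independent subsystems need decomposition first
        or idea.independent_subsystems >= 3
        or idea.explicit_exploration_request
    )
    if needs_brainstorm:
        return "brainstorm"  # stop and run /hapo:brainstorm with the same idea
    if idea.word_count < 20 or not idea.has_concrete_detail:
        return "clarify"  # ask 1-2 clarifying questions, then re-analyze
    return "spec"  # continue to Step 2 (scan specs/ for related specs)
```

Because the brainstorm check runs first, a vague idea with unclear acceptance criteria is routed to `/hapo:brainstorm` before any clarifying questions are asked, which matches the flowchart rather than the order of the bullet list.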
package/src/claude/skills/specs/references/review.md CHANGED

@@ -42,7 +42,7 @@ These rules override any self-reasoning or optimization the system may attempt:
 4. **Apply YAGNI to fixes.** When user says "configure later" or "decide later", add a single note to the task file. Do NOT generate multiple concrete implementations (e.g., 4 provider files when user only asked for abstraction).
 5. **No false completion.** You MUST NOT set `validation.status = "completed"` or `ready_for_implementation = true` until a reconciliation audit proves the accepted findings and validation decisions are reflected in the physical spec artifacts.
 6. **Provider drift is a real defect.** If the scope changed away from Claude/Anthropic, stale strings like `Claude API`, `Haiku`, or `haiku_reachable` in `requirements.md`, `design.md`, or `tasks/*.md` are validation failures. `research.md` may mention them only as historical comparison.
-7. **Implementation-facing propagation is mandatory.** A decision that affects implementation is NOT considered applied if it only appears in `Risk Assessment`, `validate-log.md`, or `red-team-report.md`. It must update at least one of: `requirements.md`, `Canonical Contracts & Invariants`, `Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, or `Verification & Evidence`.
+7. **Implementation-facing propagation is mandatory.** A decision that affects implementation is NOT considered applied if it only appears in `Risk Assessment`, `validate-log.md`, or `red-team-report.md`. It must update at least one of: `requirements.md`, `Canonical Contracts & Invariants`, `Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, or `Task Test Plan & Verification Evidence`.
 ---

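Rule 7 amounts to a membership test: a decision counts as applied only when it shows up in an implementation-facing location. A minimal sketch, assuming each artifact or section has already been extracted into a name-to-text map. `decision_applied` is a hypothetical helper, and the case-insensitive substring match is a deliberate simplification (a real audit has to judge meaning, not wording). The legacy `Verification & Evidence` heading is included because other 0.8.0 rules require readers to accept it for pre-existing task files.

```python
from typing import Dict

# Locations where an accepted decision must land (rule 7). Mentions in
# Risk Assessment, validate-log.md, or red-team-report.md do not count.
IMPLEMENTATION_FACING = {
    "requirements.md",
    "Canonical Contracts & Invariants",
    "Objective",
    "Constraints",
    "Implementation Steps",
    "Completion Criteria",
    "Task Test Plan & Verification Evidence",
    "Verification & Evidence",  # legacy heading, pre-existing task files only
}


def decision_applied(decision: str, locations: Dict[str, str]) -> bool:
    """True if `decision` appears in at least one implementation-facing location."""
    needle = decision.casefold()
    return any(
        name in IMPLEMENTATION_FACING and needle in text.casefold()
        for name, text in locations.items()
    )
```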
package/src/claude/skills/specs/rules/tasks-generation.md CHANGED

@@ -136,11 +136,11 @@ Every task file MUST contain the Risk Assessment table, even if no risks are ide
 - Never mark implementation work or integration-critical verification as optional—reserve `*` for auxiliary/deferrable test coverage that can be revisited post-MVP.
 - Never mark auth, permissions, privacy, data deletion, migration, schema, or contract verification work as optional.
-### Mandatory Verification & Evidence
+### Mandatory Task Test Plan & Verification Evidence
-Every task file MUST include a `## Verification & Evidence` section.
+Every new task file MUST include a `## Task Test Plan & Verification Evidence` section. Existing specs may still use the legacy `## Verification & Evidence` heading; readers and sync tools must support both.
-That section MUST contain:
+That section is the task-level test plan and MUST contain:
 1. **Automated proof** — exact command(s) for typecheck, tests, build, or explicit `N/A`
 2. **Artifact/runtime proof** — exact files, routes, UI surfaces, generated outputs, or persisted state to inspect
 3. **Contract/negative-path proof** — at least one contract-preserving check for unauthorized, invalid, missing-permission, rollback, or failure-path behavior when relevant

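Since readers and sync tools must now accept both headings, the lookup is worth centralizing. A sketch under stated assumptions: the section runs until the next `#` or `##` heading, the new heading is preferred when both exist, and headings inside fenced code blocks are not special-cased. `find_test_plan_section` and `review_failures` are hypothetical helpers; the second mirrors the review rule in `specs/SKILL.md` that accepts the legacy heading only for pre-existing task files.

```python
import re
from typing import List, Optional, Tuple

NEW_HEADING = "## Task Test Plan & Verification Evidence"
LEGACY_HEADING = "## Verification & Evidence"
_SECTION_END = re.compile(r"#{1,2} ")  # next level-1 or level-2 heading


def find_test_plan_section(markdown: str) -> Optional[Tuple[str, str]]:
    """Return (heading, body) for the test-plan section, preferring the new heading."""
    lines = markdown.splitlines()
    for heading in (NEW_HEADING, LEGACY_HEADING):
        for i, line in enumerate(lines):
            if line.rstrip() == heading:
                body: List[str] = []
                for nxt in lines[i + 1:]:
                    if _SECTION_END.match(nxt):
                        break
                    body.append(nxt)
                return heading, "\n".join(body).strip()
    return None


def review_failures(markdown: str, pre_existing: bool) -> List[str]:
    """List review failures; the legacy heading passes only for pre-existing files."""
    failures = []
    if not re.search(r"^## Completion Criteria\s*$", markdown, re.MULTILINE):
        failures.append("missing Completion Criteria")
    found = find_test_plan_section(markdown)
    if found is None:
        failures.append("missing Task Test Plan & Verification Evidence")
    elif found[0] == LEGACY_HEADING and not pre_existing:
        failures.append("legacy Verification & Evidence heading in a new task file")
    return failures
```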
package/src/claude/skills/specs/templates/task.md CHANGED

@@ -58,7 +58,9 @@
 - [ ] {{Criteria 2 — measurable behavior or negative-path outcome}}
 - [ ] {{Criteria 3 — maps directly to acceptance criteria from requirements.md and can be proven below}}
-## Verification & Evidence
+## Task Test Plan & Verification Evidence
+This section is the task-level test plan. It names the exact commands, observable runtime/artifact proof, and negative-path checks required before this task can be marked done.
 - [ ] Automated verification
   - Command(s): `{{TYPECHECK / TEST / BUILD COMMANDS OR N/A}}`
@@ -82,4 +84,4 @@
 > **Parallel marker**: Append `(P)` to the title if this task can run concurrently with another (usually when serving different requirements).
 > **Test note**: If a test coverage sub-task can be deferred post-MVP, mark it with `- [ ]*`.
 > **Requirement mapping**: Every sub-task MUST end with `_Requirements: X.X_`. No mapping = invalid task file.
-> **Verification rule**: No `## Verification & Evidence` section = invalid task file.
+> **Verification rule**: No `## Task Test Plan & Verification Evidence` section = invalid task file. Existing specs may use legacy `## Verification & Evidence`; agents must support both headings.

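The template's requirement-mapping rule, combined with the review rule that rejects non-numeric labels such as `NFR-1`, can be checked one sub-task line at a time. A sketch under assumptions: the suffix is exactly `_Requirements: ..._` at the end of the line, multiple IDs are comma-separated, and an ID is dot-separated digits (`1.2`, `3`). `requirement_ids` is a hypothetical helper.

```python
import re
from typing import List

# Assumed suffix format: `_Requirements: 1.2_` or `_Requirements: 1.2, 3.1_`.
REQ_SUFFIX = re.compile(r"_Requirements:\s*([^_]+)_\s*$")
NUMERIC_ID = re.compile(r"\d+(?:\.\d+)*")  # e.g. 1, 1.2, 2.3.1


def requirement_ids(subtask_line: str) -> List[str]:
    """Return the numeric requirement IDs a sub-task maps to, or raise ValueError."""
    match = REQ_SUFFIX.search(subtask_line)
    if match is None:
        raise ValueError("sub-task has no _Requirements: X.X_ mapping")
    ids = [part.strip() for part in match.group(1).split(",")]
    bad = [i for i in ids if not NUMERIC_ID.fullmatch(i)]
    if bad:
        raise ValueError(f"non-numeric requirement labels: {bad}")
    return ids
```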
package/src/claude/skills/sync/SKILL.md CHANGED

@@ -34,8 +34,8 @@ Scans the `spec.json` against all physical `task-R*.md` files to detect mismatch
 1. **Precision Edits:** Never overwrite the entire `spec.json` string blindly. Update only the required keys, while keeping JSON valid.
 2. **Machine + Human Sync:** Every task status update MUST modify both `spec.json.task_registry[...]` and the matching markdown task file header/status section.
-3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Implementation Steps` and relevant `Completion Criteria` / `Verification & Evidence` checkboxes that have actual proof.
-4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
+3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Implementation Steps` and relevant `Completion Criteria` / `Task Test Plan & Verification Evidence` checkboxes that have actual proof. Legacy `Verification & Evidence` sections are supported.
+4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
 5. **Task Docs Hook:** Every time `hapo:sync` marks a task as `done`, it must flag that a task-level docs checkpoint is now due for that verified task.
 6. **Phase Prompt Rule:** When `hapo:sync` marks the final pending task in the whole feature as `done`, it should automatically prompt the user if they'd like to advance the phase, but only after the docs checkpoint for that last completed task has been considered.

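Rules 1 and 2 call for targeted key updates rather than rewriting `spec.json` wholesale. The sketch below covers only the registry half of a status change; the markdown task file must be updated in the same operation. It assumes `task_registry` is keyed by task file path and uses the field names that appear in `sync-protocols.md` (`status`, `blocker`, `started_at`, `completed_at`, `last_updated_at`, top-level `updated_at`). The function name and the ISO-8601 UTC timestamp format are assumptions. Re-serializing with `json.dumps` keeps the JSON valid and preserves key order but may change whitespace. The sketch does not gate `done`; that still has to be checked against the verification receipt first.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def set_task_status(
    spec_json_text: str,
    task_path: str,
    status: str,
    blocker: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Update one task_registry entry plus spec.json.updated_at; leave other keys untouched."""
    spec = json.loads(spec_json_text)
    # KeyError for an unregistered path: refuse rather than invent an entry.
    entry = spec["task_registry"][task_path]
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    entry["status"] = status
    entry["last_updated_at"] = stamp
    if status == "in_progress" and not entry.get("started_at"):
        entry["started_at"] = stamp
    if status == "done":
        entry["completed_at"] = stamp
    if status == "blocked":
        entry["blocker"] = blocker  # record the reason for the block
    spec["updated_at"] = stamp
    return json.dumps(spec, indent=2, ensure_ascii=False) + "\n"
```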
package/src/claude/skills/sync/references/sync-protocols.md CHANGED

@@ -15,7 +15,7 @@ When requested to update a phase or change task configuration, `spec.json` must
     - full relative path like `tasks/task-R0-02-extension-shell.md`
 *   **Status Update:** If a task changes to `blocked`, the matching `task_registry[path].status` must become `"blocked"`, `task_registry[path].blocker` must record the reason, and `spec.json.status` / `spec.json.blocker` must reflect the top-level block if work is globally blocked.
 *   **Timestamp Rule:** Update `task_registry[path].started_at`, `completed_at`, and `last_updated_at` consistently with the new state. Also refresh `spec.json.updated_at`.
-*   **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
+*   **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
 *   **Receipt Integrity Rule:** A valid verification receipt must include the exact commands run, their outcomes, and artifact/runtime proof. Receipts containing `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit "placeholder / simplified for MVP / production later" contract deviations are not eligible for `done`.
 *   **Contract Fidelity Rule:** If the task file notes or evidence show that a named framework/auth/runtime choice from the spec was silently replaced, sync MUST refuse `done` until the spec is amended or the implementation is corrected.
 *   **Task Docs Rule:** After a task is moved to `done`, emit a short alert that a task-level docs checkpoint is due for this verified task.
@@ -27,12 +27,12 @@ The structure of `tasks/task.md` relies heavily on exact keyword markers. Follow
 ### A. Completing a Task
 When `/hapo:sync <feature> <task-id> done`:
 1. Find: `**Status:** pending` (or `in_progress` / `blocked`).
-2. Inspect `## Verification & Evidence` first. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
+2. Inspect `## Task Test Plan & Verification Evidence` first. If the task uses legacy `## Verification & Evidence`, inspect that section instead. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
 3. Refuse completion if the receipt contains any non-passing marker such as `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation substituted a named contract with a placeholder/custom simplification.
 4. Replace with: `**Status:** done`.
 5. Locate block: `## Implementation Steps`.
 6. Convert `- [ ]` into `- [x]` strictly within that section.
-7. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
+7. Update relevant checkboxes in `## Completion Criteria` and `## Task Test Plan & Verification Evidence` only when the caller provides or the file already contains real proof. For legacy task files, update `## Verification & Evidence` instead.
 8. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
 ### B. Blocking a Task
@@ -59,7 +59,7 @@ When `/hapo:sync audit <feature>` is activated:
    - Missing disk file referenced in registry → remove or flag it
    - Markdown says `done` but registry not done → registry wins only if evidence already exists; otherwise downgrade markdown or flag conflict
    - Registry says `done` but markdown still pending → update markdown only if evidence exists
-   - Either side says `done` but `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
+   - Either side says `done` but `## Task Test Plan & Verification Evidence` / legacy `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
    - Either side says `done` but the receipt contains `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit contract-substitution notes → downgrade to `in_progress` or flag conflict
 5. **Correction Alert:** Output a brief markdown alert detailing mismatches fixed and any unresolved conflicts requiring manual review.
 6. **Task Docs Alert:** If audit reveals tasks newly marked `done`, include whether task-level docs sync appears still due or already accounted for in the current run summary.
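
The Done-State and Receipt Integrity rules, together with steps 2 and 3 of "Completing a Task", reduce to a pre-check on the body of the test-plan section (new or legacy heading). A rough sketch with stated heuristics: an unchecked `- [ ]` plan item or an unfilled `{{...}}` template line is not proof; any other non-empty line is treated as a proof line; the three markers are matched as whole words; and the substitution phrases are matched case-insensitively, which errs on the side of refusing `done`. `done_blockers` is a hypothetical helper.

```python
import re
from typing import List

# Non-passing markers named by the Receipt Integrity Rule.
MARKER_RE = re.compile(r"\b(PRECHECK_FAIL|FAIL|UNVERIFIED)\b")
# Phrases that signal a contract substitution; crude, may over-block.
SUBSTITUTION_RE = re.compile(r"placeholder|simplified for mvp|production later", re.IGNORECASE)


def done_blockers(section_body: str) -> List[str]:
    """Return reasons the task must NOT be marked done; an empty list means eligible."""
    reasons = []
    proof_lines = [
        ln for ln in section_body.splitlines()
        if ln.strip() and "{{" not in ln and not ln.lstrip().startswith("- [ ]")
    ]
    if not proof_lines:
        reasons.append("no concrete proof lines")
    markers = sorted(set(MARKER_RE.findall(section_body)))
    if markers:
        reasons.append("non-passing markers: " + ", ".join(markers))
    if SUBSTITUTION_RE.search(section_body):
        reasons.append("contract substitution noted")
    return reasons
```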

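Steps 4 to 6 of "Completing a Task" are a line-oriented rewrite that must leave other sections alone. A minimal sketch, assuming the status line starts at column 0 as `**Status:** <state>` and that only `#`/`##` headings end a section, so `###` subheadings stay inside `## Implementation Steps`. `mark_done` is hypothetical and should run only after the receipt pre-check has passed. Checkboxes in `## Completion Criteria` and the test-plan section are deliberately untouched, since step 7 ticks those only against real proof.

```python
import re

_HEADING = re.compile(r"#{1,2} ")
_STATUS = re.compile(r"\*\*Status:\*\* (pending|in_progress|blocked)\s*$")
_OPEN_BOX = re.compile(r"^(\s*)- \[ \]")


def mark_done(markdown: str) -> str:
    """Set **Status:** done and tick checkboxes strictly inside ## Implementation Steps."""
    out, in_steps = [], False
    for line in markdown.splitlines():
        if _HEADING.match(line):
            in_steps = line.strip() == "## Implementation Steps"
        elif _STATUS.match(line):
            line = "**Status:** done"
        elif in_steps:
            line = _OPEN_BOX.sub(r"\1- [x]", line)
        out.append(line)
    trailing = "\n" if markdown.endswith("\n") else ""
    return "\n".join(out) + trailing
```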
package/src/claude/skills/test/SKILL.md CHANGED

@@ -27,6 +27,7 @@ Designed to work **after `hapo:develop`**. Standalone `/hapo:test` uses the same
 NEVER claim tests pass when they were NOT actually executed.
 NEVER mock, stub, or skip a failing test to produce a green result.
 If no test command is detected, report NO_TESTS — do not fabricate results.
+If a test command exits 0 but runs 0 tests, report NO_TESTS — this is a green lie, not a PASS.
 If tests fail, list every failure explicitly — do not summarize failures away.
 </HARD-GATE>
@@ -65,7 +66,8 @@ affected by recent file changes. See `references/execution-strategy.md` Phase A.
 **Code testing (default):**
 1. Pre-flight: run typecheck/lint to catch compile errors first
 2. Execute test command with coverage flags
-3. Collect results, coverage percentages, and fail stack traces
+3. Collect test counts, coverage percentages, and fail stack traces
+4. Treat 0 executed tests as `NO_TESTS`, even if the command exits 0
 **UI verification (`--ui` / `--ui-auth` / `--ui-flow`):**
 Execute multi-page discovery, then spawn **Parallel UI Subagents** (test-runner instances) to handle Smoke, Core-Vitals, Accessibility, SEO, Security, and User Flows simultaneously.
@@ -146,6 +148,7 @@ It merges the JSON data into `.hapo/test-memory.json` per `references/test-memor
 - `references/execution-strategy.md` — Blast-radius algorithm, auto-detect logic, UI verification phases (A–E)
 - `references/failure-triage.md` — Failure categories, triage decision tree, escalation rules
+- `references/test-memory.md` — `.hapo/test-memory.json` schema and merge rules
 ## Related

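The new HARD-GATE line and step 4 make the executed-test count part of the verdict, not just the exit code. A sketch of that decision, assuming the runner's exit code and counts have already been collected. `RunResult` and `classify` are hypothetical; `None` stands for "no test command detected", and treating a non-zero exit with 0 executed tests (for example a compile error) as `FAIL` is this sketch's choice rather than something the skill text specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    exit_code: Optional[int]  # None: no test command was detected or run
    tests_executed: int
    failures: int


def classify(r: RunResult) -> str:
    """Map a collected test run to NO_TESTS, FAIL, or PASS."""
    if r.exit_code is None:
        return "NO_TESTS"  # never fabricate a result
    if r.exit_code == 0 and r.tests_executed == 0:
        return "NO_TESTS"  # exit 0 with 0 tests is a green lie, not a PASS
    if r.exit_code != 0 or r.failures > 0:
        return "FAIL"
    return "PASS"
```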
package/src/claude/skills/test/references/execution-strategy.md CHANGED

@@ -84,6 +84,9 @@ cargo test
 flutter test --coverage
 ```
+After each command, parse the runner output for executed test count. A successful
+exit with 0 executed tests is `NO_TESTS`, not `PASS`.
 ### Coverage Thresholds
 | Metric    | Minimum | Focus Areas                        |
@@ -360,4 +363,3 @@ Flag as `Security Warning` if:
 - API keys, secrets, or JWT tokens visible in page HTML
 - Mixed content (HTTP resources on HTTPS page) detected via network audit
 - `autocomplete="off"` missing on password fields

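For `cargo test`, one of the runners listed in this file, the executed count can be read from the summary lines; cargo prints one `test result:` line per test target (unit tests, each integration test file, doc-tests). The sketch sums passed and failed tests across those lines; ignored and filtered-out tests did not run, so they are not counted. Other runners need their own patterns, and `cargo_executed_count` is a hypothetical name. If no summary line appears at all (for example, a build failure), the count is 0 and the exit code decides between `NO_TESTS` and a failure.

```python
import re

# Example summary line:
# "test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s"
SUMMARY = re.compile(r"test result: \w+\. (\d+) passed; (\d+) failed;")


def cargo_executed_count(output: str) -> int:
    """Sum passed + failed over every `test result:` line in cargo test output."""
    return sum(int(passed) + int(failed) for passed, failed in SUMMARY.findall(output))
```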
package/src/claude/skills/test/references/test-memory.md CHANGED

@@ -1,6 +1,6 @@
 # Test Memory
-The `packages/spec/src/claude/test-memory.json` file serves as the Long-Term Memory for the testing ecosystem.
+The `.hapo/test-memory.json` file in the target project serves as the Long-Term Memory for the testing ecosystem.
 ## Schema
@@ -37,4 +37,4 @@ Example:
 </lessons_learned>
 ```
-The orchestrating `hapo:test` skill (Phase 4) then intercepts this block and automatically merges it into `packages/spec/src/claude/test-memory.json`.
+The orchestrating `hapo:test` skill (Phase 4) then intercepts this block and automatically merges it into `.hapo/test-memory.json`.
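
With the path now resolved against the target project rather than the cafekit source tree, the Phase 4 merge can be sketched as: find each `<lessons_learned>` block in the agent output, parse its body as JSON, and fold it into `<project root>/.hapo/test-memory.json`. The real schema and merge rules live in `test-memory.md` and are not shown in this diff, so the merge policy here is an assumption: list values are appended without duplicates and other values are overwritten. `merge_lessons` is hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Any, Dict

LESSONS_RE = re.compile(r"<lessons_learned>(.*?)</lessons_learned>", re.DOTALL)


def _merge(memory: Dict[str, Any], update: Dict[str, Any]) -> None:
    """ASSUMED policy: extend lists without duplicates, overwrite everything else."""
    for key, value in update.items():
        current = memory.get(key)
        if isinstance(value, list) and isinstance(current, list):
            for item in value:
                if item not in current:
                    current.append(item)
        else:
            memory[key] = value


def merge_lessons(project_root: Path, agent_output: str) -> Path:
    """Merge every <lessons_learned> JSON block into <project_root>/.hapo/test-memory.json."""
    memory_path = project_root / ".hapo" / "test-memory.json"
    memory = json.loads(memory_path.read_text()) if memory_path.exists() else {}
    for block in LESSONS_RE.findall(agent_output):
        _merge(memory, json.loads(block))
    memory_path.parent.mkdir(parents=True, exist_ok=True)
    memory_path.write_text(json.dumps(memory, indent=2) + "\n")
    return memory_path
```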