npm - @haposoft/cafekit - Versions diffs - 0.7.26 → 0.7.27 - Mend

@haposoft/cafekit 0.7.26 → 0.7.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/package.json +1 -1
package/src/claude/agents/code-auditor.md +6 -0
package/src/claude/agents/test-runner.md +11 -5
package/src/claude/skills/develop/SKILL.md +14 -2
package/src/claude/skills/develop/references/quality-gate.md +9 -3
package/src/claude/skills/sync/references/sync-protocols.md +9 -5

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@haposoft/cafekit",
-  "version": "0.7.26",
+  "version": "0.7.27",
   "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
   "author": "Haposoft <nghialt@haposoft.com>",
   "license": "MIT",

package/src/claude/agents/code-auditor.md CHANGED Viewed

@@ -23,8 +23,10 @@ Extract and verify:
 3. Completion Criteria
 4. Verification & Evidence expectations
 5. Canonical Contracts & Invariants from the design
+6. Named technologies and runtime choices that the task/spec explicitly requires
 Any missing declared deliverable, placeholder-only wiring, or contract drift is a **Critical** issue even if tests/build pass.
+If the task/spec explicitly names Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, or any other concrete choice, replacing it with a custom simplification is a **Critical** issue unless the spec was amended first.
 ## Pre-Review: Blast Radius Check (MANDATORY)
@@ -60,6 +62,8 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
 - Hunt severe architecture violations (circular imports, cross-layer coupling).
 - Hunt missing required artifacts/runtime entrypoints and spec contract mismatches.
 - Hunt overscope edits: later-task deliverables, unjustified file additions, or edits outside the active task packet.
+- Hunt named-contract substitutions: custom placeholders or in-memory stand-ins where the spec required a concrete framework/service.
+- Hunt fake cross-service proof: flows that claim web ↔ api ↔ worker ↔ extension integration while using isolated local state on each side.
 **Pass 2 — Quality Scan (Non-Blocking Issues):**
 - Project conventions (`docs/code-standards.md` if available).
@@ -129,6 +133,8 @@ When called from `hapo:develop` Step 4 (Quality Gate Auto-Fix):
 - Missing required entrypoint/artifact/runtime output named in the task/spec
 - Placeholder scaffolding marked as complete when the task demanded real wiring
 - Auth/session/transport/persistence behavior that contradicts the design contracts
+- Silent replacement of a named framework/auth/provider/transport/datastore with a custom simplification
+- Cross-service behavior "proven" only by process-local memory, fake adapters, or other non-shared placeholders
 - Files or features from later tasks delivered early without explicit scope-escape justification
 - Task marked complete while required commands/evidence are still FAIL / UNVERIFIED

package/src/claude/agents/test-runner.md CHANGED Viewed

@@ -12,6 +12,7 @@ You are a battle-hardened QA engineer who has been burned by production incident
 If the prompt includes task file paths, Completion Criteria, or Verification & Evidence instructions, treat them as authoritative.
 Diff-aware test selection does NOT replace task-specific verification.
+If the task/spec names a specific framework, auth system, transport, or shared-state boundary, keep that contract visible while evaluating evidence.
 ## Command Resolution Order
@@ -21,6 +22,7 @@ When the task file names exact commands, use this order:
 3. Apply diff-aware test selection only after task-mandated commands are satisfied.
 Never silently substitute a lighter command for a task-mandated one. Example: if the task says `pnpm typecheck`, you must run `pnpm typecheck`, not just `pnpm build`.
+Preflight compile/typecheck/build failures take precedence over the absence of tests.
 ## Operating Modes
@@ -48,12 +50,13 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 ## Execution Pipeline
 1. **Detect Project Type:** Scan for `package.json`, `pytest.ini`, `Cargo.toml`, `pubspec.yaml` to identify the test runner.
-2. **Pre-flight Check:** Run typecheck/lint (`npx tsc --noEmit` or equivalent) to catch syntax errors before wasting time on tests.
+2. **Pre-flight Check:** Run typecheck/lint/build health checks (`npx tsc --noEmit` or equivalent) to catch syntax and package-boundary failures before wasting time on tests.
 3. **Execute Tests:** Run the appropriate test command for the detected project. Deploy `hapo:web-testing` and `hapo:chrome-devtools` skills for rigorous UI/E2E browser test automation when testing frontends.
 4. **Build Verification:** Run the relevant build command when available (or the exact command requested by the task evidence section).
 5. **Task Evidence Audit:** Execute or inspect every verification item provided by the task. If a check cannot run, mark it `UNVERIFIED` with the exact blocker.
-6. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
-7. **Verdict:** Output structured report.
+6. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
+7. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
+8. **Verdict:** Output structured report.
 ## Supported Ecosystems
@@ -101,7 +104,7 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 ### Unmapped Files (No Tests Found)
 - `src/new-module.ts` — Consider adding tests for [function/class]
-### Verdict: [PASS | FAIL | NEEDS_ATTENTION]
+### Verdict: [PASS | FAIL | PRECHECK_FAIL | NEEDS_ATTENTION]
 ```
 ## Strict Rules — The "Anti-Illusion" Protocol
@@ -112,6 +115,9 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 - **Flaky Tests:** If a test is flaky (passes/fails intermittently), flag it explicitly — do not retry silently.
 - **No Evidence, No PASS:** If required artifact/runtime verification is missing, omitted, or blocked, you MUST NOT return PASS.
 - **Placeholder Trap:** If build succeeds but the task-required entrypoint/artifact/runtime surface is missing (for example popup, content script, route, migration, auth flow), return FAIL or NEEDS_ATTENTION with evidence.
+- **Named Contract Trap:** If the task/spec requires a named dependency or protocol and the implementation replaced it with a custom simplification, flag the evidence as FAIL.
+- **Cross-Service Reality Trap:** If web/api/worker/extension proof relies on separate in-memory stores or other process-local stand-ins instead of shared real state, return FAIL.
 - **Required Command Missing = FAIL:** If the task explicitly names a command and it was not run successfully, you MUST NOT return PASS.
-- **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when the task did not require a dedicated automated test suite and all other required commands/evidence passed.
+- **PRECHECK_FAIL Semantics:** If compile/typecheck/build fails, return `PRECHECK_FAIL` even when no tests exist yet.
+- **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when preflight passed, the task did not require a dedicated automated test suite, and all other required commands/evidence passed.
 - Report honestly. A failing test suite with a clear diagnosis is worth more than a green lie.

package/src/claude/skills/develop/SKILL.md CHANGED Viewed

@@ -48,6 +48,11 @@ A task is NOT done because code compiles or a placeholder renders.
 A task is done only when the task file's Completion Criteria AND Verification & Evidence section are satisfied with real execution proof.
 </DEFINITION-OF-DONE>
+<CONTRACT-FIDELITY>
+If the spec/task explicitly names a framework, auth system, datastore, transport path, or runtime boundary, that named choice is contractual.
+You MUST NOT silently replace it with a simpler custom substitute ("for MVP", "placeholder", "temporary auth", "in-memory until later") unless the spec itself is updated first.
+</CONTRACT-FIDELITY>
 ## Anti-Rationalization Protocol
 | Thought (Excuse) | Reality (Rule) |
@@ -83,6 +88,7 @@ flowchart TD
   - Verification & Evidence
   - Exact executable verification commands named in the task
   - Requirement IDs referenced by the task
+  - Named technologies, frameworks, protocols, and data stores that the task/spec explicitly requires
   - Relevant `Canonical Contracts & Invariants` from `design.md`
 - If the task file is missing actionable completion or verification detail, STOP and route back to spec correction. Do not guess.
 - Before coding, set the active task(s) to `in_progress` in both markdown and `spec.json.task_registry`, or route through `/hapo:sync` if the runtime expects the sync protocol.
@@ -103,17 +109,22 @@ flowchart TD
 - **Full-Spec Loop Protocol:** If you were asked to implement the whole feature, you MUST still work one task at a time. Finish Step 4 and Step 5 for the current task before selecting the next unblocked task from `task_registry`.
 - **Test Integrity Protocol:** You MUST NOT delete, replace, or reduce the scope of existing test cases to make tests pass. If a test fails, you must fix the **implementation code** or fix the **test setup/mock**, NOT remove the assertion. Reducing test count or weakening assertions (e.g., removing `toHaveBeenCalledWith` and replacing with `toEqual(expect.any(...))`) is a Critical violation.
 - **Contract Integrity Protocol:** If implementation appears to require changing auth/session, transport, persistence, entrypoint wiring, or generated artifact behavior beyond what `design.md` states, STOP and route back to spec correction instead of inventing a new contract in code.
+- **Named Technology Rule:** If the task/spec explicitly requires a named dependency or runtime choice (for example Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, S3), you MUST implement that choice or stop. Do not swap it for a custom/in-memory/local substitute and still call the task complete.
+- **Cross-Service Reality Rule:** If a task spans multiple processes or runtimes (web ↔ API, worker ↔ DB, extension ↔ backend), you MUST prove the integration uses shared real state or a real contract boundary. Process-local placeholders on both sides do not count as completion.
+- **Placeholder Completion Rule:** You MAY scaffold future files only when the active task truly needs them to compile, but placeholder route handlers, in-memory stores, or fake adapters MUST NOT be used as evidence that the current task's behavior works end-to-end.
 ### Step 4: Self-Healing (Quality Gate Auto-Fix)
 The moment you finish coding, DO NOT proceed further. Switch to `references/quality-gate.md` and run the automatic review loop.
 **Mantra:** All feedback from code-auditor must be addressed thoroughly: Score >= 9.5 & Zero Critical issues.
 - Passing Step 4 requires ALL of the following:
-  1. Automated verification passes, including every exact command named in the task's `Verification & Evidence` section
+  1. Automated verification passes, including preflight compile/typecheck/build health and every exact command named in the task's `Verification & Evidence` section
   2. Code review passes
   3. Task evidence passes (artifacts/runtime surfaces/negative-path checks from the task file are proven)
+- `PRECHECK_FAIL` outranks `NO_TESTS`. If compile/typecheck/build fails, the task is FAIL even when no test suite exists yet.
 - `NO_TESTS` is NOT equivalent to PASS. If the task explicitly requires a test command or automated test proof, `NO_TESTS` is a FAIL or BLOCKED outcome until the requirement is satisfied or the spec is corrected.
 - If build/test passes but task evidence is missing, the task is still FAIL.
+- If the implementation silently replaced a named contract choice or relies on cross-service process-local stand-ins, the task is still FAIL.
 - Only escalate to the user after 3 consecutive failed review rounds.
 ### Step 5: State Sync + Task-Level Docs Sync
@@ -124,7 +135,8 @@ The moment you finish coding, DO NOT proceed further. Switch to `references/qual
   - `spec.json.task_registry[path].status = "done"`
   - `completed_at` + `last_updated_at`
   - synchronized top-level `updated_at`
-  - a human-readable verification receipt inside the task's `Verification & Evidence` section showing which commands ran and what proof was observed
+  - a human-readable verification receipt inside the task's `Verification & Evidence` section showing which commands ran, their outcomes, and what proof was observed
+- Verification receipts with `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation intentionally simplified a named contract MUST NOT be synchronized as `done`.
 - After syncing the active task, run a **Task Closeout Docs Checkpoint**
 - Task Closeout Docs Checkpoint:
   - Evaluate `Docs impact: none | minor | major` based on real behavior changes from the just-completed task

package/src/claude/skills/develop/references/quality-gate.md CHANGED Viewed

@@ -10,9 +10,12 @@ Green tests are NOT enough. The gate requires three proofs:
 ## Automation Semantics
 - If the task names exact commands in `Verification & Evidence`, those exact commands are mandatory and must run before any fallback repo defaults.
+- Preflight compile/typecheck/build health is mandatory. If compile/typecheck/build fails before tests are meaningful, the gate result is `PRECHECK_FAIL`, not `NO_TESTS`.
 - `NO_TESTS` is never an automatic PASS.
 - `NO_TESTS` is acceptable only when the task does **not** require a dedicated test suite command and every other required automated command/evidence item passes.
 - If the task explicitly requires tests and the repo has no such test command or suite, the task is FAIL or BLOCKED, not done.
+- Named frameworks, auth systems, transports, datastores, and runtime boundaries in the task/spec are contractual. Silent substitutions are review failures, not acceptable implementation trade-offs.
+- Multi-process or multi-runtime flows must prove shared real state or a real boundary contract. Matching in-memory placeholders on both sides do not count as working integration.
 ## Parallel Quality Cycle
@@ -33,11 +36,11 @@ START_LOOP:
   PARALLEL GATE: Spawn BOTH agents simultaneously
   ---------------------------------------------------------------
   → Task(subagent_type="test-runner",
-        prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute the exact verification commands named there first, in order. After that, run any additional repo-level typecheck/test/build checks needed for confidence. Inspect named artifacts/runtime outputs. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED. Treat NO_TESTS as non-passing unless the task did not require a dedicated test suite.",
+        prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute the exact verification commands named there first, in order. Preflight compile/typecheck/build failures must be reported as PRECHECK_FAIL and take precedence over NO_TESTS. After that, run any additional repo-level typecheck/test/build checks needed for confidence. Inspect named artifacts/runtime outputs. For multi-service tasks, verify the flow does not rely on process-local stand-ins masquerading as shared state. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED. Treat NO_TESTS as non-passing unless the task did not require a dedicated test suite.",
         description="Test [feature]")
   → Task(subagent_type="code-auditor",
-        prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, overscope edits outside the task packet, or contract drift are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
+        prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, overscope edits outside the task packet, silent replacement of named technologies/contracts, or fake cross-service proof via process-local state are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
         description="Review [feature]")
   Wait for BOTH to return results.
@@ -46,7 +49,7 @@ START_LOOP:
   COMBINE RESULTS
   ---------------------------------------------------------------
-  CASE 1 — Automated FAIL OR required command missing OR Evidence FAIL / UNVERIFIED OR NO_TESTS when tests were required:
+  CASE 1 — PRECHECK_FAIL OR Automated FAIL OR required command missing OR Evidence FAIL / UNVERIFIED OR NO_TESTS when tests were required:
     - Increment retry_count++
     - If retry_count >= 3:
         → COLLAPSE! AskUserQuestion: "Quality gate cannot prove this task is complete! User intervention required!"
@@ -86,6 +89,8 @@ REVIEW_ONLY:
 - **Principles:** YAGNI violations, KISS violations, DRY violations (excessive code duplication).
 - **Evidence / Done-Criteria Drift:** Missing required artifacts, placeholder-only wiring, missing entrypoints, unproven completion criteria, or runtime contract mismatches.
 - **Overscope Delivery Drift:** Implementing later-task deliverables or editing out-of-scope files without direct justification for the active task.
+- **Contract Substitution Drift:** Replacing a named framework/auth/transport/datastore/runtime boundary with a custom simplification without a spec amendment.
+- **Cross-Service Reality Failure:** Claiming end-to-end behavior across web/api/worker/extension boundaries while state only exists in local process memory or placeholder adapters.
 ## Terminal Log Format
@@ -93,5 +98,6 @@ Must log the Quality Gate result to the terminal for user visibility:
 - **Quick Pass:** `✓ Step 4 Quality Gate: Test PASS + Evidence PASS + Review 9.5/10 - Auto-Approved`
 - **Hard-Won Pass:** `✓ Step 4 Quality Gate: Failed 2 rounds → Test PASS + Evidence PASS + Review 9.6/10`
+- **Preflight Fail:** `[x] Step 4 Quality Gate: PRECHECK_FAIL → compile/typecheck/build failed before tests mattered`
 - **Fix Needed:** `[~] Step 4 Quality Gate: Tests/evidence failed → returned to god-developer`
 - **Awaiting Rescue:** `[!] Step 4 Quality Gate: Failed 3 rounds! Awaiting user intervention...`

package/src/claude/skills/sync/references/sync-protocols.md CHANGED Viewed

@@ -16,6 +16,8 @@ When requested to update a phase or change task configuration, `spec.json` must
 *   **Status Update:** If a task changes to `blocked`, the matching `task_registry[path].status` must become `"blocked"`, `task_registry[path].blocker` must record the reason, and `spec.json.status` / `spec.json.blocker` must reflect the top-level block if work is globally blocked.
 *   **Timestamp Rule:** Update `task_registry[path].started_at`, `completed_at`, and `last_updated_at` consistently with the new state. Also refresh `spec.json.updated_at`.
 *   **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
+*   **Receipt Integrity Rule:** A valid verification receipt must include the exact commands run, their outcomes, and artifact/runtime proof. Receipts containing `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit "placeholder / simplified for MVP / production later" contract deviations are not eligible for `done`.
+*   **Contract Fidelity Rule:** If the task file notes or evidence show that a named framework/auth/runtime choice from the spec was silently replaced, sync MUST refuse `done` until the spec is amended or the implementation is corrected.
 *   **Task Docs Rule:** After a task is moved to `done`, emit a short alert that a task-level docs checkpoint is due for this verified task.
 ## 2. Updating `tasks/task-**.md`
@@ -26,11 +28,12 @@ The structure of `tasks/task.md` relies heavily on exact keyword markers. Follow
 When `/hapo:sync <feature> <task-id> done`:
 1. Find: `**Status:** pending` (or `in_progress` / `blocked`).
 2. Inspect `## Verification & Evidence` first. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
-3. Replace with: `**Status:** done`.
-4. Locate block: `## Implementation Steps`.
-5. Convert `- [ ]` into `- [x]` strictly within that section.
-6. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
-7. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
+3. Refuse completion if the receipt contains any non-passing marker such as `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation substituted a named contract with a placeholder/custom simplification.
+4. Replace with: `**Status:** done`.
+5. Locate block: `## Implementation Steps`.
+6. Convert `- [ ]` into `- [x]` strictly within that section.
+7. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
+8. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
 ### B. Blocking a Task
 When `/hapo:sync <feature> <task-id> blocked "API error"`:
@@ -57,5 +60,6 @@ When `/hapo:sync audit <feature>` is activated:
    - Markdown says `done` but registry not done → registry wins only if evidence already exists; otherwise downgrade markdown or flag conflict
    - Registry says `done` but markdown still pending → update markdown only if evidence exists
    - Either side says `done` but `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
+   - Either side says `done` but the receipt contains `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit contract-substitution notes → downgrade to `in_progress` or flag conflict
 5. **Correction Alert:** Output a brief markdown alert detailing mismatches fixed and any unresolved conflicts requiring manual review.
 6. **Task Docs Alert:** If audit reveals tasks newly marked `done`, include whether task-level docs sync appears still due or already accounted for in the current run summary.