@haposoft/cafekit 0.7.26 → 0.7.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@haposoft/cafekit",
3
- "version": "0.7.26",
3
+ "version": "0.7.27",
4
4
  "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
5
5
  "author": "Haposoft <nghialt@haposoft.com>",
6
6
  "license": "MIT",
@@ -23,8 +23,10 @@ Extract and verify:
23
23
  3. Completion Criteria
24
24
  4. Verification & Evidence expectations
25
25
  5. Canonical Contracts & Invariants from the design
26
+ 6. Named technologies and runtime choices that the task/spec explicitly requires
26
27
 
27
28
  Any missing declared deliverable, placeholder-only wiring, or contract drift is a **Critical** issue even if tests/build pass.
29
+ If the task/spec explicitly names Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, or any other concrete choice, replacing it with a custom simplification is a **Critical** issue unless the spec was amended first.
28
30
 
29
31
  ## Pre-Review: Blast Radius Check (MANDATORY)
30
32
 
@@ -60,6 +62,8 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
60
62
  - Hunt severe architecture violations (circular imports, cross-layer coupling).
61
63
  - Hunt missing required artifacts/runtime entrypoints and spec contract mismatches.
62
64
  - Hunt overscope edits: later-task deliverables, unjustified file additions, or edits outside the active task packet.
65
+ - Hunt named-contract substitutions: custom placeholders or in-memory stand-ins where the spec required a concrete framework/service.
66
+ - Hunt fake cross-service proof: flows that claim web ↔ api ↔ worker ↔ extension integration while using isolated local state on each side.
63
67
 
64
68
  **Pass 2 — Quality Scan (Non-Blocking Issues):**
65
69
  - Project conventions (`docs/code-standards.md` if available).
@@ -129,6 +133,8 @@ When called from `hapo:develop` Step 4 (Quality Gate Auto-Fix):
129
133
  - Missing required entrypoint/artifact/runtime output named in the task/spec
130
134
  - Placeholder scaffolding marked as complete when the task demanded real wiring
131
135
  - Auth/session/transport/persistence behavior that contradicts the design contracts
136
+ - Silent replacement of a named framework/auth/provider/transport/datastore with a custom simplification
137
+ - Cross-service behavior "proven" only by process-local memory, fake adapters, or other non-shared placeholders
132
138
  - Files or features from later tasks delivered early without explicit scope-escape justification
133
139
  - Task marked complete while required commands/evidence are still FAIL / UNVERIFIED
134
140
 
@@ -12,6 +12,7 @@ You are a battle-hardened QA engineer who has been burned by production incident
12
12
 
13
13
  If the prompt includes task file paths, Completion Criteria, or Verification & Evidence instructions, treat them as authoritative.
14
14
  Diff-aware test selection does NOT replace task-specific verification.
15
+ If the task/spec names a specific framework, auth system, transport, or shared-state boundary, keep that contract visible while evaluating evidence.
15
16
 
16
17
  ## Command Resolution Order
17
18
 
@@ -21,6 +22,7 @@ When the task file names exact commands, use this order:
21
22
  3. Apply diff-aware test selection only after task-mandated commands are satisfied.
22
23
 
23
24
  Never silently substitute a lighter command for a task-mandated one. Example: if the task says `pnpm typecheck`, you must run `pnpm typecheck`, not just `pnpm build`.
25
+ Preflight compile/typecheck/build failures take precedence over the absence of tests.
24
26
 
25
27
  ## Operating Modes
26
28
 
@@ -48,12 +50,13 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
48
50
  ## Execution Pipeline
49
51
 
50
52
  1. **Detect Project Type:** Scan for `package.json`, `pytest.ini`, `Cargo.toml`, `pubspec.yaml` to identify the test runner.
51
- 2. **Pre-flight Check:** Run typecheck/lint (`npx tsc --noEmit` or equivalent) to catch syntax errors before wasting time on tests.
53
+ 2. **Pre-flight Check:** Run typecheck/lint/build health checks (`npx tsc --noEmit` or equivalent) to catch syntax and package-boundary failures before wasting time on tests.
52
54
  3. **Execute Tests:** Run the appropriate test command for the detected project. Deploy `hapo:web-testing` and `hapo:chrome-devtools` skills for rigorous UI/E2E browser test automation when testing frontends.
53
55
  4. **Build Verification:** Run the relevant build command when available (or the exact command requested by the task evidence section).
54
56
  5. **Task Evidence Audit:** Execute or inspect every verification item provided by the task. If a check cannot run, mark it `UNVERIFIED` with the exact blocker.
55
- 6. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
56
- 7. **Verdict:** Output structured report.
57
+ 6. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
58
+ 7. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
59
+ 8. **Verdict:** Output structured report.
57
60
 
58
61
  ## Supported Ecosystems
59
62
 
@@ -101,7 +104,7 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
101
104
  ### Unmapped Files (No Tests Found)
102
105
  - `src/new-module.ts` — Consider adding tests for [function/class]
103
106
 
104
- ### Verdict: [PASS | FAIL | NEEDS_ATTENTION]
107
+ ### Verdict: [PASS | FAIL | PRECHECK_FAIL | NEEDS_ATTENTION]
105
108
  ```
106
109
 
107
110
  ## Strict Rules — The "Anti-Illusion" Protocol
@@ -112,6 +115,9 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
112
115
  - **Flaky Tests:** If a test is flaky (passes/fails intermittently), flag it explicitly — do not retry silently.
113
116
  - **No Evidence, No PASS:** If required artifact/runtime verification is missing, omitted, or blocked, you MUST NOT return PASS.
114
117
  - **Placeholder Trap:** If build succeeds but the task-required entrypoint/artifact/runtime surface is missing (for example popup, content script, route, migration, auth flow), return FAIL or NEEDS_ATTENTION with evidence.
118
+ - **Named Contract Trap:** If the task/spec requires a named dependency or protocol and the implementation replaced it with a custom simplification, flag the evidence as FAIL.
119
+ - **Cross-Service Reality Trap:** If web/api/worker/extension proof relies on separate in-memory stores or other process-local stand-ins instead of shared real state, return FAIL.
115
120
  - **Required Command Missing = FAIL:** If the task explicitly names a command and it was not run successfully, you MUST NOT return PASS.
116
- - **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when the task did not require a dedicated automated test suite and all other required commands/evidence passed.
121
+ - **PRECHECK_FAIL Semantics:** If compile/typecheck/build fails, return `PRECHECK_FAIL` even when no tests exist yet.
122
+ - **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when preflight passed, the task did not require a dedicated automated test suite, and all other required commands/evidence passed.
117
123
  - Report honestly. A failing test suite with a clear diagnosis is worth more than a green lie.
@@ -48,6 +48,11 @@ A task is NOT done because code compiles or a placeholder renders.
48
48
  A task is done only when the task file's Completion Criteria AND Verification & Evidence section are satisfied with real execution proof.
49
49
  </DEFINITION-OF-DONE>
50
50
 
51
+ <CONTRACT-FIDELITY>
52
+ If the spec/task explicitly names a framework, auth system, datastore, transport path, or runtime boundary, that named choice is contractual.
53
+ You MUST NOT silently replace it with a simpler custom substitute ("for MVP", "placeholder", "temporary auth", "in-memory until later") unless the spec itself is updated first.
54
+ </CONTRACT-FIDELITY>
55
+
51
56
  ## Anti-Rationalization Protocol
52
57
 
53
58
  | Thought (Excuse) | Reality (Rule) |
@@ -83,6 +88,7 @@ flowchart TD
83
88
  - Verification & Evidence
84
89
  - Exact executable verification commands named in the task
85
90
  - Requirement IDs referenced by the task
91
+ - Named technologies, frameworks, protocols, and data stores that the task/spec explicitly requires
86
92
  - Relevant `Canonical Contracts & Invariants` from `design.md`
87
93
  - If the task file is missing actionable completion or verification detail, STOP and route back to spec correction. Do not guess.
88
94
  - Before coding, set the active task(s) to `in_progress` in both markdown and `spec.json.task_registry`, or route through `/hapo:sync` if the runtime expects the sync protocol.
@@ -103,17 +109,22 @@ flowchart TD
103
109
  - **Full-Spec Loop Protocol:** If you were asked to implement the whole feature, you MUST still work one task at a time. Finish Step 4 and Step 5 for the current task before selecting the next unblocked task from `task_registry`.
104
110
  - **Test Integrity Protocol:** You MUST NOT delete, replace, or reduce the scope of existing test cases to make tests pass. If a test fails, you must fix the **implementation code** or fix the **test setup/mock**, NOT remove the assertion. Reducing test count or weakening assertions (e.g., removing `toHaveBeenCalledWith` and replacing with `toEqual(expect.any(...))`) is a Critical violation.
105
111
  - **Contract Integrity Protocol:** If implementation appears to require changing auth/session, transport, persistence, entrypoint wiring, or generated artifact behavior beyond what `design.md` states, STOP and route back to spec correction instead of inventing a new contract in code.
112
+ - **Named Technology Rule:** If the task/spec explicitly requires a named dependency or runtime choice (for example Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, S3), you MUST implement that choice or stop. Do not swap it for a custom/in-memory/local substitute and still call the task complete.
113
+ - **Cross-Service Reality Rule:** If a task spans multiple processes or runtimes (web ↔ API, worker ↔ DB, extension ↔ backend), you MUST prove the integration uses shared real state or a real contract boundary. Process-local placeholders on both sides do not count as completion.
114
+ - **Placeholder Completion Rule:** You MAY scaffold future files only when the active task truly needs them to compile, but placeholder route handlers, in-memory stores, or fake adapters MUST NOT be used as evidence that the current task's behavior works end-to-end.
106
115
 
107
116
  ### Step 4: Self-Healing (Quality Gate Auto-Fix)
108
117
  The moment you finish coding, DO NOT proceed further. Switch to `references/quality-gate.md` and run the automatic review loop.
109
118
  **Mantra:** All feedback from code-auditor must be addressed thoroughly: Score >= 9.5 & Zero Critical issues.
110
119
 
111
120
  - Passing Step 4 requires ALL of the following:
112
- 1. Automated verification passes, including every exact command named in the task's `Verification & Evidence` section
121
+ 1. Automated verification passes, including preflight compile/typecheck/build health and every exact command named in the task's `Verification & Evidence` section
113
122
  2. Code review passes
114
123
  3. Task evidence passes (artifacts/runtime surfaces/negative-path checks from the task file are proven)
124
+ - `PRECHECK_FAIL` outranks `NO_TESTS`. If compile/typecheck/build fails, the task is FAIL even when no test suite exists yet.
115
125
  - `NO_TESTS` is NOT equivalent to PASS. If the task explicitly requires a test command or automated test proof, `NO_TESTS` is a FAIL or BLOCKED outcome until the requirement is satisfied or the spec is corrected.
116
126
  - If build/test passes but task evidence is missing, the task is still FAIL.
127
+ - If the implementation silently replaced a named contract choice or relies on cross-service process-local stand-ins, the task is still FAIL.
117
128
  - Only escalate to the user after 3 consecutive failed review rounds.
118
129
 
119
130
  ### Step 5: State Sync + Task-Level Docs Sync
@@ -124,7 +135,8 @@ The moment you finish coding, DO NOT proceed further. Switch to `references/qual
124
135
  - `spec.json.task_registry[path].status = "done"`
125
136
  - `completed_at` + `last_updated_at`
126
137
  - synchronized top-level `updated_at`
127
- - a human-readable verification receipt inside the task's `Verification & Evidence` section showing which commands ran and what proof was observed
138
+ - a human-readable verification receipt inside the task's `Verification & Evidence` section showing which commands ran, their outcomes, and what proof was observed
139
+ - Verification receipts with `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation intentionally simplified a named contract MUST NOT be synchronized as `done`.
128
140
  - After syncing the active task, run a **Task Closeout Docs Checkpoint**
129
141
  - Task Closeout Docs Checkpoint:
130
142
  - Evaluate `Docs impact: none | minor | major` based on real behavior changes from the just-completed task
@@ -10,9 +10,12 @@ Green tests are NOT enough. The gate requires three proofs:
10
10
  ## Automation Semantics
11
11
 
12
12
  - If the task names exact commands in `Verification & Evidence`, those exact commands are mandatory and must run before any fallback repo defaults.
13
+ - Preflight compile/typecheck/build health is mandatory. If compile/typecheck/build fails before tests are meaningful, the gate result is `PRECHECK_FAIL`, not `NO_TESTS`.
13
14
  - `NO_TESTS` is never an automatic PASS.
14
15
  - `NO_TESTS` is acceptable only when the task does **not** require a dedicated test suite command and every other required automated command/evidence item passes.
15
16
  - If the task explicitly requires tests and the repo has no such test command or suite, the task is FAIL or BLOCKED, not done.
17
+ - Named frameworks, auth systems, transports, datastores, and runtime boundaries in the task/spec are contractual. Silent substitutions are review failures, not acceptable implementation trade-offs.
18
+ - Multi-process or multi-runtime flows must prove shared real state or a real boundary contract. Matching in-memory placeholders on both sides do not count as working integration.
16
19
 
17
20
  ## Parallel Quality Cycle
18
21
 
@@ -33,11 +36,11 @@ START_LOOP:
33
36
  PARALLEL GATE: Spawn BOTH agents simultaneously
34
37
  ---------------------------------------------------------------
35
38
  → Task(subagent_type="test-runner",
36
- prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute the exact verification commands named there first, in order. After that, run any additional repo-level typecheck/test/build checks needed for confidence. Inspect named artifacts/runtime outputs. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED. Treat NO_TESTS as non-passing unless the task did not require a dedicated test suite.",
39
+ prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute the exact verification commands named there first, in order. Preflight compile/typecheck/build failures must be reported as PRECHECK_FAIL and take precedence over NO_TESTS. After that, run any additional repo-level typecheck/test/build checks needed for confidence. Inspect named artifacts/runtime outputs. For multi-service tasks, verify the flow does not rely on process-local stand-ins masquerading as shared state. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED. Treat NO_TESTS as non-passing unless the task did not require a dedicated test suite.",
37
40
  description="Test [feature]")
38
41
 
39
42
  → Task(subagent_type="code-auditor",
40
- prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, overscope edits outside the task packet, or contract drift are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
43
+ prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, overscope edits outside the task packet, silent replacement of named technologies/contracts, or fake cross-service proof via process-local state are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
41
44
  description="Review [feature]")
42
45
 
43
46
  Wait for BOTH to return results.
@@ -46,7 +49,7 @@ START_LOOP:
46
49
  COMBINE RESULTS
47
50
  ---------------------------------------------------------------
48
51
 
49
- CASE 1 — Automated FAIL OR required command missing OR Evidence FAIL / UNVERIFIED OR NO_TESTS when tests were required:
52
+ CASE 1 — PRECHECK_FAIL OR Automated FAIL OR required command missing OR Evidence FAIL / UNVERIFIED OR NO_TESTS when tests were required:
50
53
  - Increment retry_count++
51
54
  - If retry_count >= 3:
52
55
  → COLLAPSE! AskUserQuestion: "Quality gate cannot prove this task is complete! User intervention required!"
@@ -86,6 +89,8 @@ REVIEW_ONLY:
86
89
  - **Principles:** YAGNI violations, KISS violations, DRY violations (excessive code duplication).
87
90
  - **Evidence / Done-Criteria Drift:** Missing required artifacts, placeholder-only wiring, missing entrypoints, unproven completion criteria, or runtime contract mismatches.
88
91
  - **Overscope Delivery Drift:** Implementing later-task deliverables or editing out-of-scope files without direct justification for the active task.
92
+ - **Contract Substitution Drift:** Replacing a named framework/auth/transport/datastore/runtime boundary with a custom simplification without a spec amendment.
93
+ - **Cross-Service Reality Failure:** Claiming end-to-end behavior across web/api/worker/extension boundaries while state only exists in local process memory or placeholder adapters.
89
94
 
90
95
  ## Terminal Log Format
91
96
 
@@ -93,5 +98,6 @@ Must log the Quality Gate result to the terminal for user visibility:
93
98
 
94
99
  - **Quick Pass:** `✓ Step 4 Quality Gate: Test PASS + Evidence PASS + Review 9.5/10 - Auto-Approved`
95
100
  - **Hard-Won Pass:** `✓ Step 4 Quality Gate: Failed 2 rounds → Test PASS + Evidence PASS + Review 9.6/10`
101
+ - **Preflight Fail:** `[x] Step 4 Quality Gate: PRECHECK_FAIL → compile/typecheck/build failed before tests mattered`
96
102
  - **Fix Needed:** `[~] Step 4 Quality Gate: Tests/evidence failed → returned to god-developer`
97
103
  - **Awaiting Rescue:** `[!] Step 4 Quality Gate: Failed 3 rounds! Awaiting user intervention...`
@@ -16,6 +16,8 @@ When requested to update a phase or change task configuration, `spec.json` must
16
16
  * **Status Update:** If a task changes to `blocked`, the matching `task_registry[path].status` must become `"blocked"`, `task_registry[path].blocker` must record the reason, and `spec.json.status` / `spec.json.blocker` must reflect the top-level block if work is globally blocked.
17
17
  * **Timestamp Rule:** Update `task_registry[path].started_at`, `completed_at`, and `last_updated_at` consistently with the new state. Also refresh `spec.json.updated_at`.
18
18
  * **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
19
+ * **Receipt Integrity Rule:** A valid verification receipt must include the exact commands run, their outcomes, and artifact/runtime proof. Receipts containing `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit "placeholder / simplified for MVP / production later" contract deviations are not eligible for `done`.
20
+ * **Contract Fidelity Rule:** If the task file notes or evidence show that a named framework/auth/runtime choice from the spec was silently replaced, sync MUST refuse `done` until the spec is amended or the implementation is corrected.
19
21
  * **Task Docs Rule:** After a task is moved to `done`, emit a short alert that a task-level docs checkpoint is due for this verified task.
20
22
 
21
23
  ## 2. Updating `tasks/task-**.md`
@@ -26,11 +28,12 @@ The structure of `tasks/task.md` relies heavily on exact keyword markers. Follow
26
28
  When `/hapo:sync <feature> <task-id> done`:
27
29
  1. Find: `**Status:** pending` (or `in_progress` / `blocked`).
28
30
  2. Inspect `## Verification & Evidence` first. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
29
- 3. Replace with: `**Status:** done`.
30
- 4. Locate block: `## Implementation Steps`.
31
- 5. Convert `- [ ]` into `- [x]` strictly within that section.
32
- 6. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
33
- 7. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
31
+ 3. Refuse completion if the receipt contains any non-passing marker such as `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation substituted a named contract with a placeholder/custom simplification.
32
+ 4. Replace with: `**Status:** done`.
33
+ 5. Locate block: `## Implementation Steps`.
34
+ 6. Convert `- [ ]` into `- [x]` strictly within that section.
35
+ 7. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
36
+ 8. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
34
37
 
35
38
  ### B. Blocking a Task
36
39
  When `/hapo:sync <feature> <task-id> blocked "API error"`:
@@ -57,5 +60,6 @@ When `/hapo:sync audit <feature>` is activated:
57
60
  - Markdown says `done` but registry not done → registry wins only if evidence already exists; otherwise downgrade markdown or flag conflict
58
61
  - Registry says `done` but markdown still pending → update markdown only if evidence exists
59
62
  - Either side says `done` but `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
63
+ - Either side says `done` but the receipt contains `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit contract-substitution notes → downgrade to `in_progress` or flag conflict
60
64
  5. **Correction Alert:** Output a brief markdown alert detailing mismatches fixed and any unresolved conflicts requiring manual review.
61
65
  6. **Task Docs Alert:** If audit reveals tasks newly marked `done`, include whether task-level docs sync appears still due or already accounted for in the current run summary.