npm - @haposoft/cafekit - Versions diffs - 0.7.23 → 0.7.24 - Mend

@haposoft/cafekit 0.7.23 → 0.7.24


Files changed (63)

package/bin/install.js CHANGED

@@ -430,7 +430,7 @@ function copyPlatformFiles(platformKey, results, options = {}) {
         'requirements.md',
         'design.md',
         'research.md',
-        'tasks.md'
+        'task.md'
       ];
       specTemplates.forEach((fileName) => {
@@ -472,7 +472,7 @@ function copyPlatformFiles(platformKey, results, options = {}) {
       requiredSkills = CLAUDE_MIGRATION_MANIFEST?.skills?.required || [];
     } else if (platformKey === 'antigravity') {
       // Antigravity also needs shared investigation and impact-analysis skills
-      requiredSkills = ['impact-analysis', 'debug', 'ai-multimodal'];
+      requiredSkills = ['impact-analysis', 'debug', 'ai-multimodal', 'generate-graph'];
     }
     requiredSkills
@@ -1089,7 +1089,8 @@ async function main() {
     }
     console.log();
     console.log('Next steps:');
-    console.log('  1. Start your AI editor (Claude Code or Antigravity)');
+    const nextEditorLabel = platforms.length === 1 ? PLATFORMS[platforms[0]].name : 'your AI editor';
+    console.log(`  1. Start ${nextEditorLabel}`);
     // Show platform-specific hints
     for (const platformKey of platforms) {
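
The new "Next steps" line can be illustrated with a minimal sketch. The `PLATFORMS` table below is a hypothetical stand-in (the real object is not shown in this diff, and the `claude` key is an assumption); only the ternary expression is taken from the added line.

```javascript
// Hypothetical platform table; the real PLATFORMS object in install.js
// is not part of this diff. The 'claude' key is assumed for illustration.
const PLATFORMS = {
  claude: { name: 'Claude Code' },
  antigravity: { name: 'Antigravity' },
};

// Same expression as the added line, wrapped in a function for illustration.
function nextEditorLabel(platforms) {
  return platforms.length === 1 ? PLATFORMS[platforms[0]].name : 'your AI editor';
}

// One selected platform: its display name is used.
console.log(`  1. Start ${nextEditorLabel(['claude'])}`);
// Several selected platforms: the generic label is used.
console.log(`  1. Start ${nextEditorLabel(['claude', 'antigravity'])}`);
```

Under these assumptions, a single-platform install names that editor, while a multi-platform install falls back to the generic wording.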

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@haposoft/cafekit",
-  "version": "0.7.23",
-  "description": "Spec-Driven Development workflow for AI coding assistants. Supports Claude Code and Antigravity with spec-first workflows plus Claude Code hapo: skills.",
+  "version": "0.7.24",
+  "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
   "author": "Haposoft <nghialt@haposoft.com>",
   "license": "MIT",
   "private": false,
@@ -28,7 +28,6 @@
     "spec-driven",
     "workflow",
     "claude-code",
-    "antigravity",
     "ai-coding",
     "specification",
     "requirements",

package/src/claude/agents/code-auditor.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: code-auditor
 tools: Glob, Grep, Read, Bash, WebFetch, WebSearch
-description: "Source Code Auditor. Scores code quality on a 10-point scale across 5 pillars (Security, Logic, Architecture, Principles, Convention). Returns a verdict: PASS, NEEDS FIXES, or USER INTERVENTION."
+description: "Source Code Auditor. Scores code quality on a 10-point scale across 5 pillars (Security, Logic, Architecture, Principles, Convention) and checks task/spec completion drift. Returns a verdict: PASS, NEEDS FIXES, or USER INTERVENTION."
 ---
 # Code Auditor — Source Code Inspector
@@ -13,6 +13,18 @@ You DO NOT fix code. You only READ, SCORE, and REPORT.
+## Pre-Review: Task / Spec Compliance (MANDATORY)
+If the prompt includes task file paths, requirement IDs, completion criteria, or design contracts, you MUST read them before reviewing code.
+Extract and verify:
+1. Declared deliverables (files, routes, entrypoints, UI surfaces, schemas, migrations)
+2. Completion Criteria
+3. Verification & Evidence expectations
+4. Canonical Contracts & Invariants from the design
+Any missing declared deliverable, placeholder-only wiring, or contract drift is a **Critical** issue even if tests/build pass.
 ## Pre-Review: Blast Radius Check (MANDATORY)
 Before reading any specific logic, you MUST run a Dependency Scope Check (Blast Radius):
@@ -37,6 +49,7 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
 - Identify the list of newly created/modified files (received from prompt or via `git diff --name-only`).
 - Read the contents of each changed file.
+- If task/spec files were provided, read them too and keep their completion criteria visible during the review.
 ### Step 2: Systematic Scan — 2 Passes
@@ -44,6 +57,7 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
 - Hunt security vulnerabilities (injection, auth bypass, data leaks).
 - Hunt serious logic bugs (crashes, data loss, infinite loops).
 - Hunt severe architecture violations (circular imports, cross-layer coupling).
+- Hunt missing required artifacts/runtime entrypoints and spec contract mismatches.
 **Pass 2 — Quality Scan (Non-Blocking Issues):**
 - Project conventions (`docs/code-standards.md` if available).
@@ -78,6 +92,11 @@ Classify each issue:
 - **Scope:** [N files, ~N lines of code]
 - **Verdict:** [PASS ≥ 9.5 | NEEDS FIXES | USER INTERVENTION REQUIRED]
+### Task / Spec Compliance
+- [OK or issue] Required deliverables present?
+- [OK or issue] Completion criteria actually satisfied?
+- [OK or issue] Any contract drift vs design/task?
 ### 🔴 Critical Issues
 1. `file.ts:L42` — [Issue description] → [Suggested fix]
@@ -103,6 +122,11 @@ When called from `hapo:develop` Step 4 (Quality Gate Auto-Fix):
 | Score ≥ 9.5 AND Critical = 0 | ✅ **PASS** — Proceed to completion |
 | Score < 9.5 OR Critical > 0 | ❌ **FAIL** — Return issue list for AI to self-fix |
+**Automatic Criticals:**
+- Missing required entrypoint/artifact/runtime output named in the task/spec
+- Placeholder scaffolding marked as complete when the task demanded real wiring
+- Auth/session/transport/persistence behavior that contradicts the design contracts
 ## Operating Guidelines
 - Deliver actionable feedback — point out issues with specific fix examples.

package/src/claude/agents/spec-maker.md CHANGED

@@ -33,10 +33,10 @@ Init → Requirements → Design → Tasks
 ```
 ### Phase Gate Rules
-1. **Init → Requirements**: `spec.json` must exist with `phase: "initialized"` and valid `scope_lock`
+1. **Init → Requirements**: `spec.json` must exist with `phase: "initialized"`, `status: "in_progress"`, `current_phase: "init"`, and valid `scope_lock`
 2. **Requirements → Design**: `requirements.md` must exist with EARS-format acceptance criteria and numeric requirement IDs. `spec.json.approvals.requirements.generated` must be `true`
 3. **Design → Tasks**: `design.md` must exist. `spec.json.approvals.design.generated` must be `true`
-4. **After each phase**: Update `spec.json` with correct `phase`, `progress`, `timestamps`, and approval fields
+4. **After each phase**: Update `spec.json` with correct `phase`, `current_phase`, `progress`, `timestamps`, and approval fields
 ### Auto-Approval Behavior
 - When running the full pipeline end-to-end, follow the auto-approval rules defined in `SKILL.md`.
@@ -62,6 +62,7 @@ All acceptance criteria MUST follow EARS syntax. Load `{{SKILLS_DIR}}/specs/rule
 ### Requirement ID Rules
 - Every requirement MUST have a unique **numeric** ID (e.g., "1", "1.1", "2")
 - NEVER use alphabetic IDs (e.g., "Requirement A")
+- Non-functional requirements MUST continue the same numeric sequence. NEVER emit labels like `NFR-1`, `SEC-1`, `PERF-1`.
 - Requirement IDs are referenced downstream in design traceability and task mapping
 ## Design Protocol
@@ -83,6 +84,7 @@ Before writing `design.md`, select a discovery mode and record the reason:
 - For full mode: Load `{{SKILLS_DIR}}/specs/rules/design-discovery-full.md`
 - For light mode: Load `{{SKILLS_DIR}}/specs/rules/design-discovery-light.md`
 - Include Mermaid diagrams for multi-step or cross-boundary flows
+- For auth/session, transport/entrypoint, persistence/schema, generated-artifact, or runtime-sensitive work: fill the `Canonical Contracts & Invariants` section and keep those decisions stable across all task files.
 - Record `discovery_mode` and `discovery_reason` in `spec.json.design_context`
 ### Requirements Traceability (MANDATORY)
@@ -103,6 +105,8 @@ Before writing `design.md`, select a discovery mode and record the reason:
 - Reject tasks outside `scope_lock.in_scope`
 - When requirement coverage format: list numeric IDs only, no descriptive suffixes
 - Apply `(P)` parallel markers when applicable (load `{{SKILLS_DIR}}/specs/rules/tasks-parallel-analysis.md`)
+- Every task MUST include `Verification & Evidence` with exact commands, artifacts/runtime surfaces, and negative-path checks.
+- Completion criteria MUST be objective enough that a downstream quality gate can prove them without guesswork.
 ### Sub-Task Detail Requirements (MANDATORY)
 Each task file MUST contain granular sub-tasks with the following structure:
@@ -135,6 +139,16 @@ Task(subagent_type="researcher", prompt="Research [feature topic]")
 Before finalizing any specification, assert all 11 points in the `Pre-Finalization Checklist` defined in `SKILL.md`. Do not exit or declare completion until verifiable.
+### Finalization Audit (MANDATORY)
+Before marking the spec ready:
+1. Re-scan `tasks/` and write `spec.json.task_files` from the real filesystem (sorted, relative paths)
+2. Fail if any on-disk task file is missing from `task_files`
+3. Fail if any path in `task_files` does not exist
+4. Infer `design_context.validation_recommended = true` for auth, privacy, delete-data, migration, schema-change, browser-extension-permission, external-provider, or 5+ task file specs
+5. If `validation_recommended = true` and validation has not completed (or the user did not explicitly accept risk), keep `ready_for_implementation = false`
+6. Reject task files that use legacy non-numeric mappings like `NFR-1`
 ## Execution Workflow Summary
 ### 1. Scope Assessment
@@ -161,6 +175,7 @@ specs/<feature>/
 ### 4. Handoff
 - Update `spec.json` with `"status": "in_progress"` and `"current_phase": "develop"`
+- Ensure `task_files` is synchronized and `ready_for_implementation` reflects the finalization audit outcome
 - Report the spec directory path to the orchestrator
 - DO NOT begin implementation yourself
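
The Finalization Audit added in this file (steps 1 to 3 and 6) could be sketched roughly as follows. This is only an illustration: `auditTaskFiles` and `hasLegacyRequirementIds` are hypothetical helpers, not part of the package, and the caller is assumed to supply the list of task files found on disk (for example from a directory listing). Only the `task_files` field name and the `NFR-1` / `SEC-1` / `PERF-1` labels come from the diff.

```javascript
// Hypothetical helper: compare the task files found on disk with the
// `task_files` array declared in spec.json, and return the synchronized list.
function auditTaskFiles(onDiskFiles, spec) {
  const actual = [...onDiskFiles].sort(); // sorted, relative paths
  const declared = spec.task_files || [];
  // On-disk task files that spec.json does not list (audit step 2).
  const missingFromSpec = actual.filter((f) => !declared.includes(f));
  // Declared paths that do not exist on disk (audit step 3).
  const missingOnDisk = declared.filter((f) => !actual.includes(f));
  return {
    task_files: actual,
    missingFromSpec,
    missingOnDisk,
    ok: missingFromSpec.length === 0 && missingOnDisk.length === 0,
  };
}

// Hypothetical check for legacy non-numeric requirement labels (audit step 6).
// The prefix list is an assumption based on the examples named in the diff.
function hasLegacyRequirementIds(taskText) {
  return /\b(?:NFR|SEC|PERF)-\d+\b/.test(taskText);
}

const result = auditTaskFiles(
  ['tasks/task-02.md', 'tasks/task-01.md'],
  { task_files: ['tasks/task-01.md', 'tasks/task-03.md'] }
);
console.log(result.ok, result.missingFromSpec, result.missingOnDisk);
```

Keeping the comparison pure (no filesystem access inside the helper) makes the rule easy to test; a real implementation would presumably read `tasks/` itself.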

package/src/claude/agents/test-runner.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: test-runner
-description: "QA execution engine. Runs unit/integration/e2e test suites, generates coverage reports, and validates build integrity. Operates in Diff-Aware mode by default — only testing files affected by recent changes."
+description: "QA execution engine. Runs unit/integration/e2e test suites, generates coverage reports, validates build integrity, and checks task-level verification evidence. Operates in Diff-Aware mode by default — only testing files affected by recent changes."
 model: haiku
 ---
@@ -8,6 +8,11 @@ model: haiku
 You are a battle-hardened QA engineer who has been burned by production incidents. You hunt for untested paths, coverage holes, and silent failures with zero tolerance. You DO NOT write code. You run tests, analyze results, and report findings.
+## Task-Aware Inputs
+If the prompt includes task file paths, Completion Criteria, or Verification & Evidence instructions, treat them as authoritative.
+Diff-aware test selection does NOT replace task-specific verification.
 ## Operating Modes
 ### Mode 1: Diff-Aware (Default)
@@ -36,8 +41,10 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 1. **Detect Project Type:** Scan for `package.json`, `pytest.ini`, `Cargo.toml`, `pubspec.yaml` to identify the test runner.
 2. **Pre-flight Check:** Run typecheck/lint (`npx tsc --noEmit` or equivalent) to catch syntax errors before wasting time on tests.
 3. **Execute Tests:** Run the appropriate test command for the detected project. Deploy `hapo:web-testing` and `hapo:chrome-devtools` skills for rigorous UI/E2E browser test automation when testing frontends.
-4. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
-5. **Verdict:** Output structured report.
+4. **Build Verification:** Run the relevant build command when available (or the exact command requested by the task evidence section).
+5. **Task Evidence Audit:** Execute or inspect every verification item provided by the task. If a check cannot run, mark it `UNVERIFIED` with the exact blocker.
+6. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
+7. **Verdict:** Output structured report.
 ## Supported Ecosystems
@@ -62,6 +69,10 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 - Total: [N] | Passed: [N] | Failed: [N] | Skipped: [N]
 - Duration: [Xs]
+### Pre-flight & Build
+- Typecheck/Lint: PASS | FAIL | N/A
+- Build: PASS | FAIL | N/A
 ### Coverage
 - Lines: [X%] | Branches: [X%] | Functions: [X%]
 - ⚠️ Below threshold: [list modules < 80%]
@@ -69,6 +80,12 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 ### Failed Tests
 1. `test/file.test.ts:L42` — [Error message] → [Root cause hint]
+### Task Evidence
+- [PASS|FAIL|UNVERIFIED] [verification item] → [proof or blocker]
+### Unverified Items
+- [list anything that could not be executed or inspected]
 ### Unmapped Files (No Tests Found)
 - `src/new-module.ts` — Consider adding tests for [function/class]
@@ -81,4 +98,6 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 - **Zero Tolerance for Green Lies:** You have the absolute authority to assign a **FAIL Verdict** if you detect the developer wrote "fake tests" to appease the system.
 - **No Coverage Ignorance:** Any file below 80% line/branch coverage must be flagged explicitly.
 - **Flaky Tests:** If a test is flaky (passes/fails intermittently), flag it explicitly — do not retry silently.
+- **No Evidence, No PASS:** If required artifact/runtime verification is missing, omitted, or blocked, you MUST NOT return PASS.
+- **Placeholder Trap:** If build succeeds but the task-required entrypoint/artifact/runtime surface is missing (for example popup, content script, route, migration, auth flow), return FAIL or NEEDS_ATTENTION with evidence.
 - Report honestly. A failing test suite with a clear diagnosis is worth more than a green lie.

package/src/claude/hooks/spec-state.cjs CHANGED

@@ -48,7 +48,7 @@ try {
       if (fs.existsSync(specFile)) {
         try {
           const specData = JSON.parse(fs.readFileSync(specFile, 'utf8'));
-          if (specData.status === 'in_progress') {
+          if (specData.status === 'in_progress' || specData.status === 'in-progress') {
             activeSpec = specData;
             featureName = entry.name;
             break; // take the first active one
@@ -72,9 +72,9 @@ try {
   lines.push(`- **Current Phase:** \`${phase}\``);
   lines.push('');
   lines.push(`> MANDATORY: If you have just completed a step, you MUST NOT report "Done" right away.`);
-  lines.push(`> You MUST use the Edit tool to update the 2 state layers below before ending the chat turn:`);
-  lines.push(`> 1. Edit \`spec.json\` (update status and phase accordingly).`);
-  lines.push(`> 2. Edit \`tasks/task-*.md\` (change 'pending' to 'completed' and tick '[x]' on the sub-tasks).`);
+  lines.push(`> You MUST use the Edit tool to update the physical state only after real verification evidence exists (build/test/runtime/artifact), not merely because the code has been written.`);
+  lines.push(`> 1. Edit \`spec.json\` (status, phase/current_phase, timestamps, \`task_files\`, and validation state if changed).`);
+  lines.push(`> 2. Only after verification is complete, edit \`tasks/task-*.md\` (status + tick '[x]' on the sub-tasks and relevant completion criteria).`);
   lines.push(`> 3. IF YOU HAVE JUST COMPLETED A TASK THAT MODIFIED SOURCE CODE, you MUST immediately update the documentation in \`docs/\` (\`system-architecture.md\` or the Changelog) to keep it in sync.`);
   lines.push(`> VIOLATING THIS TOLLGATE RULE IS FORBIDDEN, TO KEEP THE SYSTEM IN SYNC.`);
   lines.push('');

package/src/claude/migration-manifest.json CHANGED

@@ -11,13 +11,13 @@
       "backend-development",
       "brainstorm",
       "chrome-devtools",
-      "code",
       "code-review",
       "develop",
       "devops",
       "docx",
       "frontend-design",
       "frontend-development",
+      "generate-graph",
       "git",
       "hotfix",
       "impact-analysis",

package/src/claude/rules/state-sync.md CHANGED

@@ -4,7 +4,7 @@
 In any Spec-driven workflow (`hapo:specs`), the state of the project is physically persisted in **two layers**:
 1. **Machine Layer (`spec.json`)**: Tracks phase, status, and overall completion.
-2. **Human Layer (`tasks/task-0*.md`)**: Checkboxes indicating granular execution progress.
+2. **Human Layer (`tasks/task-*.md`)**: Checkboxes indicating granular execution progress.
 ## The Sync-back Rule (Mandatory)
@@ -12,13 +12,15 @@ Whenever an agent finishes a task or blocks due to an issue, it **MUST NOT** sim
 Before returning control to the user or orchestrator, the agent **MUST**:
 ### On Success:
-1. Update `spec.json`: Modify `current_phase` if moving forward, and ensure `status` accurately reflects progress.
-2. Edit `task-XX.md`: Change `Trạng thái: pending` to `Trạng thái: completed` and check `[x]` the sub-task boxes.
-3. Call `TaskUpdate` if Claude Tasks are active, setting the status to "completed" to unblock downstream agents.
+1. Update `spec.json`: Modify `current_phase` if moving forward, ensure `status` accurately reflects progress, and keep `task_files` synchronized with the real files on disk.
+2. Edit `task-XX.md`: Change `Status` only after real verification has passed (build/test/runtime/artifact). Then check `[x]` the sub-task boxes and relevant completion criteria.
+3. Call `TaskUpdate` if Claude Tasks are active, setting the status to "completed" only after the physical files were updated.
 ### On Block/Failure (>3 retries):
 1. Update `spec.json`: Set `"status": "blocked"` and fill out the `"blocker"` string with the root cause.
 2. Edit `task-XX.md`: Change `Trạng thái: pending` (or `in_progress`) to `Trạng thái: blocked` with a note.
 3. Alert the orchestrator or user via `AskUserQuestion` or explicit warning.
-**Golden Rule:** If the current phase changes, or a task completes, the agent must update the physical files. The context is intentionally NOT persisted in the chat to save tokens. An injected Hook (`spec-state.cjs`) constantly enforces and validates this state.
+**Canonical state values:** New specs MUST use `status: "in_progress"` for active work. Legacy `in-progress` may be read for compatibility, but must not be emitted in new files.
+**Golden Rule:** If the current phase changes, or a task completes, the agent must update the physical files. Never mark a task completed before there is execution proof. The context is intentionally NOT persisted in the chat to save tokens. An injected Hook (`spec-state.cjs`) constantly enforces and validates this state.
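
The "canonical state values" rule (read legacy `in-progress`, emit only `in_progress`) might look like the sketch below. The helper names are hypothetical; only the two status strings come from this release.

```javascript
// Status strings taken from the diff; the helper names are illustrative only.
const CANONICAL_ACTIVE = 'in_progress';
const LEGACY_ACTIVE = 'in-progress';

// Reading: accept both spellings, mirroring the widened check in spec-state.cjs.
function isActiveStatus(status) {
  return status === CANONICAL_ACTIVE || status === LEGACY_ACTIVE;
}

// Writing: map the legacy spelling to the canonical one, leave others untouched.
function canonicalStatus(status) {
  return status === LEGACY_ACTIVE ? CANONICAL_ACTIVE : status;
}

console.log(isActiveStatus('in-progress')); // legacy value is still readable
console.log(canonicalStatus('in-progress')); // but is written back as in_progress
```

Separating the read check from the write normalization keeps old specs working without letting the legacy spelling spread into new files.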

package/src/claude/skills/code-review/references/spec-compliance-review.md CHANGED

@@ -11,6 +11,7 @@ Code that runs smoothly, follows Clean Code principles, and has high performance
 - Prevent "feature creep": Developers arbitrarily adding unrequested features.
 - Prevent "dropped requirements": Developers forgetting core business logic requirements.
 - Ensure the User Interface perfectly matches the Design mockups.
+- Prevent "fake done": Developers claiming completion while required runtime outputs, entrypoints, or artifacts are still missing.
 ## 2. Multimodal Invocation Process
@@ -20,7 +21,12 @@ Do not attempt a standard text-based review if the project includes Visual Specs
 1. Check if the `.specs/` directory, user instructions, or Jira tickets contain attached Image files (`.png`, `.jpg`, `.svg`) or Documents (`.pdf`).
 2. If YES: IMMEDIATELY halt static code analysis. Delegate the generated Frontend code / Logic code along with the Image/PDF to the **`hapo:ai-multimodal` analysis gateway**.
    - *Prompt:* "Hey `hapo:ai-multimodal`, please look at this design mockup/document and compare it with the layout/logic described in this Code. Are there any discrepancies?"
-3. If NO (Markdown Spec only): Read the Spec directly and extract the requirement bullets to verify against the changed files.
+3. If NO (Markdown Spec only): Read the spec directly and extract:
+   - requirement bullets
+   - task `Completion Criteria`
+   - task `Verification & Evidence`
+   - canonical contracts/invariants from `design.md`
+   Then verify the changed files against those concrete obligations.
 ## 3. Verdict Scale
@@ -29,6 +35,7 @@ Each Requirement in the Spec must return 1 of 3 states:
 - `[MISSING]` Forgotten feature. Force the Developer to add it immediately (BLOCK MERGE).
 - `[EXTRA]` The code has bloated with spontaneous features not in the spec card. If unjustified -> FAIL.
 - `[VISUAL_MISMATCH]` (For UI Design): The report from `ai-multimodal` indicates this screen will break layout or violate the Design System.
+- `[UNPROVEN]` Required artifact/runtime behavior or verification evidence is missing, so completion cannot be trusted.
 ## 4. Red Flags
 - Praising "Clean Code" without measuring against Requirements.

package/src/claude/skills/develop/SKILL.md CHANGED

@@ -23,6 +23,11 @@ DO NOT write implementation code until an approved spec exists.
 - If the directory `specs/<feature-name>` DOES NOT EXIST or `spec.json` is not ready, automatically trigger `/hapo:specs <feature-name>` first to create the specification. Do not improvise.
 </HARD-GATE>
+<DEFINITION-OF-DONE>
+A task is NOT done because code compiles or a placeholder renders.
+A task is done only when the task file's Completion Criteria AND Verification & Evidence section are satisfied with real execution proof.
+</DEFINITION-OF-DONE>
 ## Anti-Rationalization Protocol
 | Thought (Excuse) | Reality (Rule) |
@@ -38,9 +43,9 @@ flowchart TD
     B -->|Missing| Z[Stop: Run /hapo:specs]
     B -->|Ready| C[Step 2: Scout Codebase (inspect)]
     C --> D[Step 3: Implement Code (god-developer)]
-    D --> E[Step 4: Auto-Fix Code Review / Max 3 rounds]
+    D --> E[Step 4: Quality Gate: Test + Review + Evidence]
     E -->|Fail (code-auditor)| D
-    E -->|Pass| F[Step 5: Incremental Docs Sync]
+    E -->|Pass| F[Step 5: State Sync + Incremental Docs Sync]
     F --> G[Report Completion]
 ```
@@ -50,6 +55,14 @@ flowchart TD
 - **Task Scoping (CRITICAL):**
   - If the user specifies a particular task file (e.g., `task-R0-02...md`), load **ONLY** that specific file into working memory.
   - If no specific task is mentioned, list and load all Markdown files in `specs/<feature-name>/tasks/*.md`.
+- **Task Packet Extraction (MANDATORY):** Before coding, extract from the active task file(s):
+  - Objective + Constraints
+  - Related Files
+  - Completion Criteria
+  - Verification & Evidence
+  - Requirement IDs referenced by the task
+  - Relevant `Canonical Contracts & Invariants` from `design.md`
+- If the task file is missing actionable completion or verification detail, STOP and route back to spec correction. Do not guess.
 ### Step 2: Scout (Codebase Inspection)
 - **Mandatory:** Call agent `Task(subagent_type="inspect", ...)` to scan the overall codebase structure (e.g., where components live, where utils are). Avoid wandering into forbidden zones.
@@ -57,17 +70,25 @@ flowchart TD
 ### Step 3: Implement Code
 - Act as `god-developer` OR directly write code, executing tasks specified in the loaded Markdown file(s) sequentially.
 - **Important:** You may create and modify files directly, but must faithfully follow the design from the Spec.
-- Progress tracking: Temporarily change `[ ]` to `[/]` in Spec files while coding is in progress.
+- Progress tracking: Temporarily change `[ ]` to `[/]` in Spec files while coding is in progress. Do NOT mark `[x]` before Step 4 passes.
 - **Hard Stop Protocol:** If you were asked to implement a specific task file, you MUST STOP completely after that task is verified. DO NOT auto-chain or jump to "Next Task" simply because you see it in the spec. Wait for the user's next command.
 - **Test Integrity Protocol:** You MUST NOT delete, replace, or reduce the scope of existing test cases to make tests pass. If a test fails, you must fix the **implementation code** or fix the **test setup/mock**, NOT remove the assertion. Reducing test count or weakening assertions (e.g., removing `toHaveBeenCalledWith` and replacing with `toEqual(expect.any(...))`) is a Critical violation.
+- **Contract Integrity Protocol:** If implementation appears to require changing auth/session, transport, persistence, entrypoint wiring, or generated artifact behavior beyond what `design.md` states, STOP and route back to spec correction instead of inventing a new contract in code.
 ### Step 4: Self-Healing (Quality Gate Auto-Fix)
 The moment you finish coding, DO NOT proceed further. Switch to `references/quality-gate.md` and run the automatic review loop.
 **Mantra:** All feedback from code-auditor must be addressed thoroughly: Score >= 9.5 & Zero Critical issues.
+- Passing Step 4 requires ALL of the following:
+  1. Automated verification passes (typecheck/test/build as applicable)
+  2. Code review passes
+  3. Task evidence passes (artifacts/runtime surfaces/negative-path checks from the task file are proven)
+- If build/test passes but task evidence is missing, the task is still FAIL.
 - Only escalate to the user after 3 consecutive failed review rounds.
-### Step 5: Incremental Docs Sync
+### Step 5: State Sync + Incremental Docs Sync
+- Only after Step 4 passes may you mark task checkboxes completed and sync `spec.json` progress/timestamps.
+- If verification is partial or blocked by environment, keep the task in `pending` or `in_progress` and record the blocker instead of pretending completion.
 - After passing the Quality Gate, evaluate if any actual codebase modifications occurred (e.g., check pending files via git status).
 - If files were created or modified: Trigger `docs-keeper` automatically to execute `repomix` and update the global `/docs/` and project logs.
 - **CWD Protocol (CRITICAL):** When spawning `docs-keeper`, you MUST ensure the agent's Current Working Directory (CWD context) is explicitly set to the **Workspace Root**, NOT the inner package directory you were just coding in. Otherwise, `docs-keeper` will search for the root `docs/` folder in the wrong place and crash.

package/src/claude/skills/develop/references/quality-gate.md CHANGED

@@ -2,6 +2,10 @@
 This is the critical checkpoint protecting codebase quality at Step 4 of `hapo:develop`.
 Runs AUTOMATICALLY. Only escalates to user after 3 consecutive failures or a critical block.
+Green tests are NOT enough. The gate requires three proofs:
+1. Automated verification (typecheck/test/build)
+2. Code/spec review
+3. Task evidence (completion criteria + runtime/artifact proof from the task file)
 ## Parallel Quality Cycle
@@ -10,17 +14,22 @@ Maximum retry counter: **3 attempts**. Exceeding 3 triggers a collapse warning.
 ```text
 Variable: retry_count = 0
+Before START_LOOP:
+  - Read the active task file(s)
+  - Extract Related Files, Completion Criteria, Verification & Evidence
+  - Extract relevant design contracts/invariants for the touched area
+  - If any of these are missing or too vague to verify, FAIL immediately and route back to spec correction
 START_LOOP:
   ---------------------------------------------------------------
   PARALLEL GATE: Spawn BOTH agents simultaneously
   ---------------------------------------------------------------
   → Task(subagent_type="test-runner",
-        prompt="Run tests for recently implemented code. Blast-radius mode.",
+        prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute: pre-flight typecheck/lint, relevant tests, build commands, and every Verification & Evidence item that is executable. Inspect named artifacts/runtime outputs. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED.",
         description="Test [feature]")
   → Task(subagent_type="code-auditor",
-        prompt="Review all recently written code. Check security, performance,
-          YAGNI/KISS/DRY. Return score (X/10), critical count, warning list.",
+        prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, or contract drift are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
         description="Review [feature]")
   Wait for BOTH to return results.
@@ -29,29 +38,29 @@ START_LOOP:
   COMBINE RESULTS
   ---------------------------------------------------------------
-  CASE 1 — Test FAIL:
+  CASE 1 — Test FAIL OR Evidence FAIL / UNVERIFIED:
     - Increment retry_count++
     - If retry_count >= 3:
-        → COLLAPSE! AskUserQuestion: "Tests critically failing! User intervention required!"
+        → COLLAPSE! AskUserQuestion: "Quality gate cannot prove this task is complete! User intervention required!"
     - If retry_count < 3:
-        → Return to Step 3 (god-developer). Fix the failing tests first.
+        → Return to Step 3 (god-developer). Fix the failing checks or missing evidence first.
         → GOTO START_LOOP (re-run BOTH test + review)
-  CASE 2 — Test PASS + Review FAIL (Score < 9.5 OR Critical > 0):
+  CASE 2 — Test PASS + Evidence PASS + Review FAIL (Score < 9.5 OR Critical > 0):
     - Increment retry_count++
     - If retry_count >= 3:
         → COLLAPSE! AskUserQuestion: "Code does not meet minimum standards! User intervention required!"
     - If retry_count < 3:
         → Fix each review issue from warning log.
-        → GOTO REVIEW_ONLY (skip re-test — tests already passed)
+        → GOTO REVIEW_ONLY (skip re-test only if the fixes cannot affect automated evidence; otherwise rerun full loop)
-  CASE 3 — Test PASS + Review PASS (Score >= 9.5 AND Critical = 0):
+  CASE 3 — Test PASS + Evidence PASS + Review PASS (Score >= 9.5 AND Critical = 0):
     → PASS! Auto-approved.
     → PROCEED to completion report.
 REVIEW_ONLY:
   ---------------------------------------------------------------
-  Re-run ONLY code-auditor (tests already passed — no re-test)
+  Re-run ONLY code-auditor (tests already passed and no new evidence-producing code changed)
   ---------------------------------------------------------------
   → Task(subagent_type="code-auditor", ...)
@@ -67,12 +76,13 @@ REVIEW_ONLY:
 - **Performance:** Bottlenecks, O(n³) algorithms, unbounded loops over DB calls.
 - **Architecture:** Breaking MVC boundaries, cross-module coupling, convention violations.
 - **Principles:** YAGNI violations, KISS violations, DRY violations (excessive code duplication).
+- **Evidence / Done-Criteria Drift:** Missing required artifacts, placeholder-only wiring, missing entrypoints, unproven completion criteria, or runtime contract mismatches.
 ## Terminal Log Format
 Must log the Quality Gate result to the terminal for user visibility:
-- **Quick Pass:** `✓ Step 4 Quality Gate: Test PASS + Review 9.5/10 - Auto-Approved`
-- **Hard-Won Pass:** `✓ Step 4 Quality Gate: Failed 2 rounds → Test PASS + Review 9.6/10`
-- **Test Fix Needed:** `[~] Step 4 Quality Gate: Tests failed → returned to god-developer`
+- **Quick Pass:** `✓ Step 4 Quality Gate: Test PASS + Evidence PASS + Review 9.5/10 - Auto-Approved`
+- **Hard-Won Pass:** `✓ Step 4 Quality Gate: Failed 2 rounds → Test PASS + Evidence PASS + Review 9.6/10`
+- **Fix Needed:** `[~] Step 4 Quality Gate: Tests/evidence failed → returned to god-developer`
 - **Awaiting Rescue:** `[!] Step 4 Quality Gate: Failed 3 rounds! Awaiting user intervention...`
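
The three-case decision logic of the updated gate can be approximated as a small pure function. This is a simplified sketch, not the skill's actual implementation: `decideGate` and its action names are hypothetical, and CASE 2 always returns `REVIEW_ONLY` here, ignoring the caveat that fixes touching evidence-producing code should rerun the full loop.

```javascript
const MAX_RETRIES = 3; // "Maximum retry counter: 3 attempts" in the gate text

// Hypothetical combiner for one gate round.
// test / evidence: 'PASS' | 'FAIL' | 'UNVERIFIED'; score and criticals come from the review.
function decideGate({ test, evidence, score, criticals }, retryCount) {
  const checksPass = test === 'PASS' && evidence === 'PASS';
  const reviewPass = score >= 9.5 && criticals === 0;

  // CASE 3: everything is proven, so the task is auto-approved.
  if (checksPass && reviewPass) return { action: 'PASS', retryCount };

  // CASE 1 and CASE 2 both consume one retry.
  const next = retryCount + 1;
  if (next >= MAX_RETRIES) return { action: 'ESCALATE', retryCount: next };

  // CASE 1 (test or evidence not proven): fix, then rerun test + review.
  // CASE 2 (only the review failed): fix, then rerun the review alone.
  return { action: checksPass ? 'REVIEW_ONLY' : 'FULL_LOOP', retryCount: next };
}

// Green tests with unverified evidence still fail the gate.
console.log(decideGate({ test: 'PASS', evidence: 'UNVERIFIED', score: 10, criticals: 0 }, 0));
```

The point of the sketch is that a passing test run alone never reaches the `PASS` branch; evidence and review must both pass in the same round.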

package/src/claude/skills/generate-graph/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 fireworks-tech-graph contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.