npm - @haposoft/cafekit - Versions diffs - 0.7.25 → 0.7.27 - Mend

@haposoft/cafekit 0.7.25 → 0.7.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17) hide show

package/package.json +1 -1
package/src/claude/agents/code-auditor.md +14 -3
package/src/claude/agents/docs-keeper.md +12 -1
package/src/claude/agents/god-developer.md +2 -2
package/src/claude/agents/test-runner.md +24 -4
package/src/claude/hooks/privacy-block.cjs +133 -80
package/src/claude/hooks/state.cjs +164 -122
package/src/claude/rules/manage-docs.md +1 -0
package/src/claude/settings/settings.json +10 -1
package/src/claude/skills/develop/SKILL.md +55 -8
package/src/claude/skills/develop/references/quality-gate.md +19 -4
package/src/claude/skills/specs/references/review.md +1 -1
package/src/claude/skills/specs/references/task-hydration.md +4 -4
package/src/claude/skills/sync/SKILL.md +3 -1
package/src/claude/skills/sync/references/sync-protocols.md +14 -4
package/src/claude/skills/test/SKILL.md +2 -2
package/src/claude/skills/test/references/execution-strategy.md +2 -3

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@haposoft/cafekit",
-  "version": "0.7.25",
+  "version": "0.7.27",
   "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
   "author": "Haposoft <nghialt@haposoft.com>",
   "license": "MIT",

package/src/claude/agents/code-auditor.md CHANGED Viewed

@@ -19,11 +19,14 @@ If the prompt includes task file paths, requirement IDs, completion criteria, or
 Extract and verify:
 1. Declared deliverables (files, routes, entrypoints, UI surfaces, schemas, migrations)
-2. Completion Criteria
-3. Verification & Evidence expectations
-4. Canonical Contracts & Invariants from the design
+2. Declared task scope (`Related Files` and direct support files that are clearly justified)
+3. Completion Criteria
+4. Verification & Evidence expectations
+5. Canonical Contracts & Invariants from the design
+6. Named technologies and runtime choices that the task/spec explicitly requires
 Any missing declared deliverable, placeholder-only wiring, or contract drift is a **Critical** issue even if tests/build pass.
+If the task/spec explicitly names Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, or any other concrete choice, replacing it with a custom simplification is a **Critical** issue unless the spec was amended first.
 ## Pre-Review: Blast Radius Check (MANDATORY)
@@ -58,6 +61,9 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
 - Hunt serious logic bugs (crashes, data loss, infinite loops).
 - Hunt severe architecture violations (circular imports, cross-layer coupling).
 - Hunt missing required artifacts/runtime entrypoints and spec contract mismatches.
+- Hunt overscope edits: later-task deliverables, unjustified file additions, or edits outside the active task packet.
+- Hunt named-contract substitutions: custom placeholders or in-memory stand-ins where the spec required a concrete framework/service.
+- Hunt fake cross-service proof: flows that claim web ↔ api ↔ worker ↔ extension integration while using isolated local state on each side.
 **Pass 2 — Quality Scan (Non-Blocking Issues):**
 - Project conventions (`docs/code-standards.md` if available).
@@ -94,6 +100,7 @@ Classify each issue:
 ### Task / Spec Compliance
 - [OK or issue] Required deliverables present?
+- [OK or issue] Changes stayed within task scope?
 - [OK or issue] Completion criteria actually satisfied?
 - [OK or issue] Any contract drift vs design/task?
@@ -126,6 +133,10 @@ When called from `hapo:develop` Step 4 (Quality Gate Auto-Fix):
 - Missing required entrypoint/artifact/runtime output named in the task/spec
 - Placeholder scaffolding marked as complete when the task demanded real wiring
 - Auth/session/transport/persistence behavior that contradicts the design contracts
+- Silent replacement of a named framework/auth/provider/transport/datastore with a custom simplification
+- Cross-service behavior "proven" only by process-local memory, fake adapters, or other non-shared placeholders
+- Files or features from later tasks delivered early without explicit scope-escape justification
+- Task marked complete while required commands/evidence are still FAIL / UNVERIFIED
 ## Operating Guidelines

package/src/claude/agents/docs-keeper.md CHANGED Viewed

@@ -40,10 +40,21 @@ Before documenting ANY code reference, you MUST prove it exists:
 ### 4. Codebase Summary Engine
 Generate the project's technical DNA map:
-- Run `repomix` to compact the entire repo into `./repomix-output.xml`.
+- Default to direct code + docs verification first.
+- Run `repomix` only when macro-architecture context is truly required or `./docs/codebase-summary.md` is missing/stale enough that you cannot safely update docs from the local evidence alone.
+- When `repomix` is used, compact the repo into `./repomix-output.xml`.
 - Digest and synthesize into `./docs/codebase-summary.md` (Update the existing one).
 - This file acts as the single source of truth for all other agents to quickly grasp the project landscape.
+### 4b. Task Closeout Mode
+When called from `hapo:develop` after a verified task is complete:
+- Treat the job as a **lightweight task-closeout sync**
+- Update only the existing docs affected by that task
+- Start by classifying `Docs impact: none | minor | major`
+- If impact is `none`, return a short report and do not force edits
+- If impact is `minor` or `major`, prefer surgical edits to `docs/project-overview-pdr.md`, `docs/system-architecture.md`, `docs/code-standards.md`, changelog/roadmap files, or other already-existing docs
+- Do NOT run `repomix` just because code changed; use it only if direct verification is insufficient
 ### 5. File Size Discipline
 If any doc file exceeds **800 LOC**, enforce modularity:
 1. Identify semantic boundaries (distinct topics that can stand alone).

package/src/claude/agents/god-developer.md CHANGED Viewed

@@ -77,8 +77,8 @@ Upon completion, output a concise report in this format:
 - ...
 ### Tasks Completed
-- [x] Task 01: ...
-- [x] Task 02: ...
+- [x] R0-01: ...
+- [x] R0-02: ...
 ### Build Results
 - Typecheck: [pass/fail]

package/src/claude/agents/test-runner.md CHANGED Viewed

@@ -12,6 +12,17 @@ You are a battle-hardened QA engineer who has been burned by production incident
 If the prompt includes task file paths, Completion Criteria, or Verification & Evidence instructions, treat them as authoritative.
 Diff-aware test selection does NOT replace task-specific verification.
+If the task/spec names a specific framework, auth system, transport, or shared-state boundary, keep that contract visible while evaluating evidence.
+## Command Resolution Order
+When the task file names exact commands, use this order:
+1. Run every exact executable command from `Verification & Evidence` in declaration order.
+2. Run repo-default typecheck/test/build commands only to fill gaps not already covered above.
+3. Apply diff-aware test selection only after task-mandated commands are satisfied.
+Never silently substitute a lighter command for a task-mandated one. Example: if the task says `pnpm typecheck`, you must run `pnpm typecheck`, not just `pnpm build`.
+Preflight compile/typecheck/build failures take precedence over the absence of tests.
 ## Operating Modes
@@ -39,12 +50,13 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 ## Execution Pipeline
 1. **Detect Project Type:** Scan for `package.json`, `pytest.ini`, `Cargo.toml`, `pubspec.yaml` to identify the test runner.
-2. **Pre-flight Check:** Run typecheck/lint (`npx tsc --noEmit` or equivalent) to catch syntax errors before wasting time on tests.
+2. **Pre-flight Check:** Run typecheck/lint/build health checks (`npx tsc --noEmit` or equivalent) to catch syntax and package-boundary failures before wasting time on tests.
 3. **Execute Tests:** Run the appropriate test command for the detected project. Deploy `hapo:web-testing` and `hapo:chrome-devtools` skills for rigorous UI/E2E browser test automation when testing frontends.
 4. **Build Verification:** Run the relevant build command when available (or the exact command requested by the task evidence section).
 5. **Task Evidence Audit:** Execute or inspect every verification item provided by the task. If a check cannot run, mark it `UNVERIFIED` with the exact blocker.
-6. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
-7. **Verdict:** Output structured report.
+6. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
+7. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
+8. **Verdict:** Output structured report.
 ## Supported Ecosystems
@@ -73,6 +85,9 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 - Typecheck/Lint: PASS | FAIL | N/A
 - Build: PASS | FAIL | N/A
+### Exact Commands Executed
+- `command here` → PASS | FAIL | UNVERIFIED
 ### Coverage
 - Lines: [X%] | Branches: [X%] | Functions: [X%]
 - ⚠️ Below threshold: [list modules < 80%]
@@ -89,7 +104,7 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 ### Unmapped Files (No Tests Found)
 - `src/new-module.ts` — Consider adding tests for [function/class]
-### Verdict: [PASS | FAIL | NEEDS_ATTENTION]
+### Verdict: [PASS | FAIL | PRECHECK_FAIL | NEEDS_ATTENTION]
 ```
 ## Strict Rules — The "Anti-Illusion" Protocol
@@ -100,4 +115,9 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
 - **Flaky Tests:** If a test is flaky (passes/fails intermittently), flag it explicitly — do not retry silently.
 - **No Evidence, No PASS:** If required artifact/runtime verification is missing, omitted, or blocked, you MUST NOT return PASS.
 - **Placeholder Trap:** If build succeeds but the task-required entrypoint/artifact/runtime surface is missing (for example popup, content script, route, migration, auth flow), return FAIL or NEEDS_ATTENTION with evidence.
+- **Named Contract Trap:** If the task/spec requires a named dependency or protocol and the implementation replaced it with a custom simplification, flag the evidence as FAIL.
+- **Cross-Service Reality Trap:** If web/api/worker/extension proof relies on separate in-memory stores or other process-local stand-ins instead of shared real state, return FAIL.
+- **Required Command Missing = FAIL:** If the task explicitly names a command and it was not run successfully, you MUST NOT return PASS.
+- **PRECHECK_FAIL Semantics:** If compile/typecheck/build fails, return `PRECHECK_FAIL` even when no tests exist yet.
+- **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when preflight passed, the task did not require a dedicated automated test suite, and all other required commands/evidence passed.
 - Report honestly. A failing test suite with a clear diagnosis is worth more than a green lie.

package/src/claude/hooks/privacy-block.cjs CHANGED Viewed

@@ -3,124 +3,177 @@
  * Copyright (c) 2026 Haposoft. MIT License.
  *
  * PreToolUse Hook — privacy-block.cjs
- * Implements: https://docs.anthropic.com/en/docs/claude-code/hooks
  *
- * Blocks access to sensitive files unless the user explicitly approves.
+ * Claude Code CLI privacy gate for sensitive files.
  *
- * Approval flow:
- *   1. Hook blocks with exit(2) and shows a prompt
- *   2. Claude Code asks user for approval
- *   3. User approves → Claude retries with "APPROVED:" prefix
- *   4. Hook detects prefix → allows through
- *
- * Disable: set "privacyBlock": false in .claude/runtime.json
+ * Runtime contract:
+ * - Non-bash file access to sensitive files is blocked with a JSON marker
+ * - Assistant must use AskUserQuestion with that JSON payload
+ * - If user approves, assistant should read via `bash cat "file"`
+ * - Bash access is allowed with a warning to enable the approved follow-up path
  *
  * Exit: 0 = allow, 2 = block
  */
 try {
-  const fs   = require('fs');
+  const fs = require('fs');
   const path = require('path');
-  // Sensitive file patterns — matched against basename and full path
   const RESTRICTED_PATTERNS = [
-    /^\.env(\.|$)/i,                  // .env, .env.local, .env.production …
-    /^credentials/i,                  // credentials.json, aws-credentials …
-    /secrets?\.(ya?ml|json)$/i,       // secrets.yaml, secret.json
-    /\.pem$/i,                        // TLS certificates
-    /\.key$/i,                        // Private keys
-    /\.p12$/i,                        // PKCS12 bundles
-    /\.pfx$/i,                        // PFX bundles
-    /^id_(rsa|ed25519|ecdsa|dsa)$/i,  // SSH private keys
-    /\.netrc$/i,                      // Network credentials
-    /\.pgpass$/i,                     // PostgreSQL passwords
-    /kubeconfig/i,                    // Kubernetes config
-    /\.keystore$/i,                   // Java / Android keystores
-    /\.jks$/i,                        // Java KeyStore
-    /auth\.json$/i,                   // OAuth tokens
-    /token(s)?\.json$/i,              // Token files
+    /^\.env(\.|$)/i,
+    /^credentials/i,
+    /secrets?\.(ya?ml|json)$/i,
+    /\.pem$/i,
+    /\.key$/i,
+    /\.p12$/i,
+    /\.pfx$/i,
+    /^id_(rsa|ed25519|ecdsa|dsa)$/i,
+    /\.netrc$/i,
+    /\.pgpass$/i,
+    /kubeconfig/i,
+    /\.keystore$/i,
+    /\.jks$/i,
+    /auth\.json$/i,
+    /token(s)?\.json$/i
   ];
-  // Safe exceptions — these always pass through (example / template files)
   const ALLOWED_EXEMPTIONS = [
-    /\.env\.(example|sample|template|test)$/i,
+    /\.env\.(example|sample|template|test)$/i
   ];
-  function isSafe(p)      { const b = path.basename(p); return ALLOWED_EXEMPTIONS.some(r => r.test(b) || r.test(p)); }
-  function isSensitive(p) { const b = path.basename(p); return RESTRICTED_PATTERNS.some(r => r.test(b) || r.test(p)); }
-  /** Extract file paths from various tool inputs */
-  function extractPaths(toolName, input) {
-    const out = [];
-    if (!input) return out;
-    if (input.file_path) out.push(input.file_path);
-    if (input.path)      out.push(input.path);
-    // Bash: look for cat/less/head/tail etc.
-    if (typeof input.command === 'string') {
-      const m = input.command.match(/(?:cat|less|more|head|tail|source|\.)\s+(\S+)/g);
-      if (m) m.forEach(s => out.push(s.trim().split(/\s+/).pop()));
+  function readRuntime(cwd) {
+    try {
+      const file = path.join(cwd, '.claude', 'runtime.json');
+      return fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : {};
+    } catch {
+      return {};
     }
-    return out.filter(Boolean);
   }
-  /** True if the user prompt contains an APPROVED: prefix */
-  function approved(prompt) {
-    return typeof prompt === 'string' && prompt.includes('APPROVED:');
+  function isSafe(filePath) {
+    const base = path.basename(filePath);
+    return ALLOWED_EXEMPTIONS.some((rule) => rule.test(base) || rule.test(filePath));
   }
-  /** Read runtime.json */
-  function readRuntime(cwd) {
-    try {
-      const p = path.join(cwd, '.claude', 'runtime.json');
-      return fs.existsSync(p) ? JSON.parse(fs.readFileSync(p, 'utf8')) : {};
-    } catch { return {}; }
+  function isSensitive(filePath) {
+    const base = path.basename(filePath);
+    return RESTRICTED_PATTERNS.some((rule) => rule.test(base) || rule.test(filePath));
+  }
+  function extractBashPaths(command) {
+    const paths = [];
+    const regex = /(?:cat|less|more|head|tail|source|\.)\s+(?:"([^"]+)"|'([^']+)'|([^\s]+))/g;
+    let match;
+    while ((match = regex.exec(command)) !== null) {
+      paths.push(match[1] || match[2] || match[3]);
+    }
+    return paths;
+  }
+  function extractPaths(toolName, input) {
+    const paths = [];
+    if (!input) return paths;
+    for (const key of ['file_path', 'path']) {
+      if (typeof input[key] === 'string' && input[key].trim()) {
+        paths.push(input[key].trim());
+      }
+    }
+    for (const key of ['paths', 'search_paths']) {
+      if (Array.isArray(input[key])) {
+        paths.push(...input[key].filter(Boolean));
+      }
+    }
+    if (toolName === 'Bash' && typeof input.command === 'string') {
+      paths.push(...extractBashPaths(input.command));
+    }
+    return paths.filter(Boolean);
   }
-  // ── Main ──────────────────────────────────────────────────────────────────
+  function formatBlockMessage(filePath) {
+    const basename = path.basename(filePath);
+    const promptData = {
+      type: 'PRIVACY_PROMPT',
+      file: filePath,
+      basename,
+      question: {
+        header: 'File Access',
+        text: `I need to read "${basename}" which may contain sensitive data (API keys, passwords, tokens). Do you approve?`,
+        options: [
+          {
+            label: 'Yes, approve access',
+            description: `Allow reading ${basename} this time`
+          },
+          {
+            label: 'No, skip this file',
+            description: 'Continue without accessing this file'
+          }
+        ]
+      }
+    };
+    return [
+      'NOTE: This is not an error. This block protects sensitive data.',
+      '',
+      `PRIVACY BLOCK: Sensitive file access requires user approval`,
+      `File: ${filePath}`,
+      '',
+      '@@PRIVACY_PROMPT_START@@',
+      JSON.stringify(promptData, null, 2),
+      '@@PRIVACY_PROMPT_END@@',
+      '',
+      'Claude Code follow-up:',
+      `- If approved: use bash to read: cat "${filePath}"`,
+      '- If denied: continue without this file'
+    ].join('\n');
+  }
-  const stdin   = fs.readFileSync(0, 'utf8').trim();
+  const stdin = fs.readFileSync(0, 'utf8').trim();
   if (!stdin) process.exit(0);
-  const data      = JSON.parse(stdin);
-  const toolName  = data.tool_name  || '';
+  const data = JSON.parse(stdin);
+  const toolName = data.tool_name || '';
   const toolInput = data.tool_input || {};
-  const prompt    = data.prompt     || '';
-  const cwd       = data.cwd        || process.cwd();
-  const runtime   = readRuntime(cwd);
+  const cwd = data.cwd || process.cwd();
+  const runtime = readRuntime(cwd);
-  // Disabled via config
   if (runtime.privacyBlock === false) process.exit(0);
-  // Already approved by user
-  if (approved(prompt)) process.exit(0);
   const paths = extractPaths(toolName, toolInput);
   if (!paths.length) process.exit(0);
   for (const filePath of paths) {
     if (isSafe(filePath)) continue;
-    if (isSensitive(filePath)) {
-      console.log(
-        `RESTRICTED ACCESS: File protection active — approval required\n` +
-        `Target: ${filePath}\n\n` +
-        `Retry query with: APPROVED:${filePath}\n\n` +
-        `--- RESTRICTED_FILE_PROMPT_BEGIN ---\n` +
-        JSON.stringify({ type: 'RESTRICTED_PROMPT', file: filePath, tool: toolName }) + '\n' +
-        `--- RESTRICTED_FILE_PROMPT_END ---`
-      );
-      process.exit(2);
+    if (!isSensitive(filePath)) continue;
+    if (toolName === 'Bash') {
+      console.error(`WARN: Privacy-sensitive file access via bash allowed for approved follow-up: ${path.basename(filePath)}`);
+      process.exit(0);
     }
+    console.error(formatBlockMessage(filePath));
+    process.exit(2);
   }
   process.exit(0);
-} catch (e) {
+} catch (error) {
   try {
-    const fs = require('fs'), p = require('path');
-    const d = p.join(__dirname, '.logs');
-    if (!fs.existsSync(d)) fs.mkdirSync(d, { recursive: true });
-    fs.appendFileSync(p.join(d, 'hook-log.jsonl'),
-      JSON.stringify({ ts: new Date().toISOString(), hook: 'privacy-block', status: 'crash', error: e.message }) + '\n');
+    const fs = require('fs');
+    const path = require('path');
+    const logDir = path.join(__dirname, '.logs');
+    if (!fs.existsSync(logDir)) fs.mkdirSync(logDir, { recursive: true });
+    fs.appendFileSync(
+      path.join(logDir, 'hook-log.jsonl'),
+      JSON.stringify({
+        ts: new Date().toISOString(),
+        hook: 'privacy-block',
+        status: 'crash',
+        error: error.message
+      }) + '\n'
+    );
   } catch (_) {}
-  process.exit(0); // fail-open
+  process.exit(0);
 }

package/src/claude/hooks/state.cjs CHANGED Viewed

@@ -3,32 +3,30 @@
  * Copyright (c) 2026 Haposoft. MIT License.
  *
  * Multi-event Hook — state.cjs
- * Implements: https://docs.anthropic.com/en/docs/claude-code/hooks
  *
  * Persists and restores session progress across Claude Code sessions.
  *
  * Events:
  *   SessionStart  → load previous state and print to context
- *   Stop          → extract todos + git changes, save to latest.md
+ *   PostToolUse   → refresh state after Task/TaskCreate/TaskUpdate/TodoWrite
+ *   Stop          → persist full session state and archive
  *   SubagentStop  → append agent completion note to current state
  *
  * Storage: .claude/session-state/latest.md (+ archive/)
- * Safety:  atomic writes, 7-day expiry, max 5 archives
- *
  * Exit: 0 always (fail-open)
  */
 try {
-  const fs     = require('fs');
-  const path   = require('path');
-  const os     = require('os');
+  const fs = require('fs');
+  const path = require('path');
+  const os = require('os');
   const crypto = require('crypto');
   const { execSync } = require('child_process');
+  const { parseTranscript } = require('./lib/parser.cjs');
-  const EXPIRY_DAYS  = 7;
+  const EXPIRY_DAYS = 7;
   const MAX_ARCHIVES = 5;
-  // ── Storage ───────────────────────────────────────────────────────────────
+  const TRACKED_POST_TOOL_EVENTS = new Set(['Task', 'TaskCreate', 'TaskUpdate', 'TodoWrite']);
   function stateDir(cwd) {
     try {
@@ -37,56 +35,73 @@ try {
         if (!fs.existsSync(local)) fs.mkdirSync(local, { recursive: true });
         return local;
       }
-      const hash   = crypto.createHash('md5').update(cwd).digest('hex').slice(0, 12);
+      const hash = crypto.createHash('md5').update(cwd).digest('hex').slice(0, 12);
       const global = path.join(os.homedir(), '.claude', 'session-states', hash);
       if (!fs.existsSync(global)) fs.mkdirSync(global, { recursive: true });
       return global;
-    } catch { return null; }
+    } catch {
+      return null;
+    }
   }
   function loadLatest(cwd) {
     try {
-      const dir   = stateDir(cwd);
+      const dir = stateDir(cwd);
       if (!dir) return null;
-      const file  = path.join(dir, 'latest.md');
+      const file = path.join(dir, 'latest.md');
       if (!fs.existsSync(file)) return null;
-      const text  = fs.readFileSync(file, 'utf8');
-      const tsMatch = text.match(/<!-- Generated: (.+?) -->/);
-      if (tsMatch) {
-        const parsed = new Date(tsMatch[1]).getTime();
-        if (isNaN(parsed)) return null;
-        if (Date.now() - parsed > EXPIRY_DAYS * 24 * 60 * 60 * 1000) return null;
+      const text = fs.readFileSync(file, 'utf8');
+      const match = text.match(/<!-- Generated: (.+?) -->/);
+      if (match) {
+        const generatedAt = new Date(match[1]).getTime();
+        if (Number.isNaN(generatedAt)) return null;
+        if (Date.now() - generatedAt > EXPIRY_DAYS * 24 * 60 * 60 * 1000) return null;
       }
       return text;
-    } catch { return null; }
+    } catch {
+      return null;
+    }
   }
   function writeAtomic(filePath, content) {
-    const tmp = `${filePath}.${process.pid}.${Math.random().toString(36).slice(2)}.tmp`;
-    fs.writeFileSync(tmp, content);
-    fs.renameSync(tmp, filePath);
+    const tempFile = `${filePath}.${process.pid}.${Math.random().toString(36).slice(2)}.tmp`;
+    fs.writeFileSync(tempFile, content);
+    fs.renameSync(tempFile, filePath);
   }
   function archive(dir) {
     try {
-      const src  = path.join(dir, 'latest.md');
-      if (!fs.existsSync(src)) return;
-      const aDir = path.join(dir, 'archive');
-      if (!fs.existsSync(aDir)) fs.mkdirSync(aDir);
-      const now  = new Date();
-      const pad  = n => String(n).padStart(2, '0');
-      const ts   = `${now.getFullYear()}${pad(now.getMonth()+1)}${pad(now.getDate())}-${pad(now.getHours())}${pad(now.getMinutes())}`;
-      fs.copyFileSync(src, path.join(aDir, `${ts}.md`));
-      const files = fs.readdirSync(aDir).filter(f => f.endsWith('.md')).sort();
+      const latestFile = path.join(dir, 'latest.md');
+      if (!fs.existsSync(latestFile)) return;
+      const archiveDir = path.join(dir, 'archive');
+      if (!fs.existsSync(archiveDir)) fs.mkdirSync(archiveDir);
+      const now = new Date();
+      const pad = (value) => String(value).padStart(2, '0');
+      const stamp = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}-${pad(now.getHours())}${pad(now.getMinutes())}`;
+      fs.copyFileSync(latestFile, path.join(archiveDir, `${stamp}.md`));
+      const files = fs.readdirSync(archiveDir).filter((file) => file.endsWith('.md')).sort();
       while (files.length > MAX_ARCHIVES) {
-        try { fs.unlinkSync(path.join(aDir, files.shift())); } catch { /* ignore */ }
+        const oldest = files.shift();
+        try {
+          fs.unlinkSync(path.join(archiveDir, oldest));
+        } catch {
+          // fail-open
+        }
       }
-    } catch { /* fail-open */ }
+    } catch {
+      // fail-open
+    }
   }
-  // ── Data extraction ───────────────────────────────────────────────────────
-  function extractSessionData(stdinData) {
+  async function extractSessionData(stdinData) {
     const data = {
       timestamp: new Date().toISOString(),
       branch: process.env.GIT_BRANCH || '',
@@ -96,132 +111,159 @@ try {
     if (stdinData.transcript_path && fs.existsSync(stdinData.transcript_path)) {
       try {
-        const latest = [];
-        const lines  = fs.readFileSync(stdinData.transcript_path, 'utf8').split('\n').filter(Boolean);
-        for (const line of lines) {
-          try {
-            const entry = JSON.parse(line);
-            const blocks = entry.message?.content;
-            if (!Array.isArray(blocks)) continue;
-            for (const b of blocks) {
-              if (b.type === 'tool_use' && b.name === 'TodoWrite' && Array.isArray(b.input?.todos)) {
-                latest.length = 0;
-                latest.push(...b.input.todos);
-              }
-            }
-          } catch { /* skip bad lines */ }
-        }
-        data.todos = latest;
-      } catch { /* ignore */ }
+        const transcript = await parseTranscript(stdinData.transcript_path);
+        data.todos = transcript.todos;
+      } catch {
+        // fail-open
+      }
     }
     try {
-      const out = execSync('git diff --name-only HEAD', {
-        encoding: 'utf8', timeout: 3000, stdio: ['pipe','pipe','pipe']
+      const diff = execSync('git diff --name-only HEAD', {
+        encoding: 'utf8',
+        timeout: 3000,
+        stdio: ['pipe', 'pipe', 'pipe']
       }).trim();
-      if (out) data.modifiedFiles = out.split('\n').slice(0, 20);
-    } catch { /* ignore */ }
+      if (diff) {
+        data.modifiedFiles = diff.split('\n').slice(0, 20);
+      }
+    } catch {
+      // fail-open
+    }
     return data;
   }
-  // ── Markdown builder ──────────────────────────────────────────────────────
   function buildStateContent(data) {
-    const done    = data.todos.filter(t => t.status === 'completed');
-    const pending = data.todos.filter(t => t.status !== 'completed');
+    const done = data.todos.filter((todo) => todo.status === 'completed' || todo.status === 'done');
+    const pending = data.todos.filter((todo) => !['completed', 'done'].includes(todo.status));
     return [
       '# Session State',
       `<!-- Generated: ${data.timestamp} -->`,
       `<!-- Branch: ${data.branch || 'unknown'} -->`,
       '',
       '## What Worked (Verified)',
-      ...(done.length    ? done.map(t => `- ${t.content}`)    : ['- (No completed tasks recorded)']),
+      ...(done.length ? done.map((todo) => `- ${todo.content}`) : ['- (No completed tasks recorded)']),
       '',
       "## What's Left",
-      ...(pending.length ? pending.map(t => `- [ ] ${t.content}`) : ['- (All tasks completed)']),
+      ...(pending.length ? pending.map((todo) => `- [ ] ${todo.content}`) : ['- (All tasks completed)']),
       '',
       '## Key Files Modified',
-      ...(data.modifiedFiles.length   ? data.modifiedFiles.map(f => `- ${f}`) : ['- (No file changes detected)']),
+      ...(data.modifiedFiles.length ? data.modifiedFiles.map((file) => `- ${file}`) : ['- (No file changes detected)']),
       ''
     ].join('\n');
   }
   function buildAgentSection(data) {
-    const type = data.agent_type || 'unknown';
-    const ts   = new Date().toISOString().slice(11, 19);
-    return `\n## Agent Result: ${type} (${ts})\n- Completed at ${ts}\n`;
+    const agentType = data.agent_type || 'unknown';
+    const time = new Date().toISOString().slice(11, 19);
+    return `\n## Agent Result: ${agentType} (${time})\n- Completed at ${time}\n`;
   }
-  // ── Main ──────────────────────────────────────────────────────────────────
+  function mergeAgentSections(existing, content) {
+    if (!existing) return content;
+    const agentSections = existing.match(/## Agent Result:.+?(?=\n## |$)/gs);
+    if (!agentSections) return content;
-  const stdin   = fs.readFileSync(0, 'utf8').trim();
-  if (!stdin) process.exit(0);
+    const marker = '\n## Key Files Modified';
+    if (content.includes(marker)) {
+      return content.replace(marker, `\n${agentSections.join('\n')}${marker}`);
+    }
+    return `${content.trimEnd()}\n\n${agentSections.join('\n')}\n`;
+  }
-  const data  = JSON.parse(stdin);
-  const event = data.hook_event_name || '';
-  const cwd   = data.cwd || process.cwd();
-  const dir   = stateDir(cwd);
+  function appendAgentSection(existing, agentSection) {
+    if (!existing) return agentSection.trimStart();
-  // SessionStart: restore previous state
-  if (event === 'SessionStart') {
-    const prev = loadLatest(cwd);
-    if (prev) {
-      console.log('\n=== Prior Execution Context ===');
-      console.log(prev.trim());
-      console.log('=== End of Prior Context ===\n');
+    const marker = '\n## Key Files Modified';
+    if (existing.includes(marker)) {
+      return existing.replace(marker, `\n${agentSection}${marker}`);
     }
-    process.exit(0);
+    return `${existing.trimEnd()}\n${agentSection}`;
   }
-  // SubagentStop: append completion note
-  if (event === 'SubagentStop' && dir) {
-    const file    = path.join(dir, 'latest.md');
-    const agentSection = buildAgentSection(data);
+  async function persistSnapshot(dir, data, options = {}) {
+    const file = path.join(dir, 'latest.md');
     const existing = fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
-    let updated;
-    if (existing) {
-      updated = existing.replace(/(\n## Key Files Modified)/, `\n${agentSection}$1`);
-      if (updated === existing) updated = existing.trimEnd() + '\n' + agentSection;
-    } else {
-      updated = buildStateContent(extractSessionData(data)) + '\n' + agentSection;
-    }
-    writeAtomic(file, updated);
-    process.exit(0);
+    const content = mergeAgentSections(existing, buildStateContent(data));
+    writeAtomic(file, content);
+    if (options.archive) archive(dir);
   }
-  // Stop: persist full state
-  if (event === 'Stop' && dir) {
-    const file    = path.join(dir, 'latest.md');
-    const sessionData = extractSessionData(data);
-    let content = buildStateContent(sessionData);
-    // Preserve agent sections from SubagentStop
-    if (fs.existsSync(file)) {
-      const existing = fs.readFileSync(file, 'utf8');
-      const agentMatches = existing.match(/## Agent Result:.+?(?=\n## |$)/gs);
-      if (agentMatches) {
-        content = content.replace(
-          /(\n## Key Files Modified)/,
-          `\n${agentMatches.join('\n')}$1`
-        );
+  async function main() {
+    const stdin = fs.readFileSync(0, 'utf8').trim();
+    if (!stdin) process.exit(0);
+    const data = JSON.parse(stdin);
+    const event = data.hook_event_name || '';
+    const cwd = data.cwd || process.cwd();
+    const dir = stateDir(cwd);
+    if (event === 'SessionStart') {
+      const previous = loadLatest(cwd);
+      if (previous) {
+        console.log('\n=== Prior Execution Context ===');
+        console.log(previous.trim());
+        console.log('=== End of Prior Context ===\n');
       }
+      process.exit(0);
+    }
+    if (!dir) process.exit(0);
+    if (event === 'PostToolUse') {
+      const toolName = data.tool_name || '';
+      if (TRACKED_POST_TOOL_EVENTS.has(toolName)) {
+        const sessionData = await extractSessionData(data);
+        await persistSnapshot(dir, sessionData);
+      }
+      process.exit(0);
+    }
+    if (event === 'SubagentStop') {
+      const file = path.join(dir, 'latest.md');
+      const agentSection = buildAgentSection(data);
+      const existing = fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
+      const updated = existing
+        ? appendAgentSection(existing, agentSection)
+        : `${buildStateContent(await extractSessionData(data))}\n${agentSection}`;
+      writeAtomic(file, updated);
+      process.exit(0);
+    }
+    if (event === 'Stop') {
+      const sessionData = await extractSessionData(data);
+      await persistSnapshot(dir, sessionData, { archive: true });
+      process.exit(0);
     }
-    writeAtomic(file, content);
-    archive(dir);
     process.exit(0);
   }
-  process.exit(0);
-} catch (e) {
+  main().catch(() => {
+    process.exit(0);
+  });
+} catch (error) {
   try {
-    const fs = require('fs'), p = require('path');
-    const d = p.join(__dirname, '.logs');
-    if (!fs.existsSync(d)) fs.mkdirSync(d, { recursive: true });
-    fs.appendFileSync(p.join(d, 'hook-log.jsonl'),
-      JSON.stringify({ ts: new Date().toISOString(), hook: 'state', status: 'crash', error: e.message }) + '\n');
+    const fs = require('fs');
+    const path = require('path');
+    const logDir = path.join(__dirname, '.logs');
+    if (!fs.existsSync(logDir)) fs.mkdirSync(logDir, { recursive: true });
+    fs.appendFileSync(
+      path.join(logDir, 'hook-log.jsonl'),
+      JSON.stringify({
+        ts: new Date().toISOString(),
+        hook: 'state',
+        status: 'crash',
+        error: error.message
+      }) + '\n'
+    );
   } catch (_) {}
   process.exit(0);
 }

package/src/claude/rules/manage-docs.md CHANGED Viewed

@@ -22,6 +22,7 @@ The project maintains these core documents in `./docs`:
 The `hapo:docs-keeper` agent is responsible for keeping these documents current. Trigger an update whenever:
 - A development phase transitions (e.g., "In Progress" → "Complete")
+- A verified task completion changes user-facing behavior, architecture, API contracts, operational flow, or project status enough that docs should be refreshed
 - A significant feature ships or a critical bug is resolved
 - Security patches are applied or dependencies change
 - Project scope or timeline shifts

package/src/claude/settings/settings.json CHANGED Viewed

@@ -56,7 +56,7 @@
     ],
     "PreToolUse": [
       {
-        "matcher": "Read|Write|Edit|MultiEdit|Bash|Glob",
+        "matcher": "Read|Write|Edit|MultiEdit|Bash|Glob|Grep",
         "hooks": [
           {
             "type": "command",
@@ -70,6 +70,15 @@
       }
     ],
     "PostToolUse": [
+      {
+        "matcher": "Task|TaskCreate|TaskUpdate|TodoWrite",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "node \"$CLAUDE_PROJECT_DIR/.claude/hooks/state.cjs\""
+          }
+        ]
+      },
       {
         "matcher": "Edit|Write|MultiEdit",
         "hooks": [

package/src/claude/skills/develop/SKILL.md CHANGED Viewed

@@ -4,9 +4,9 @@ description: "Code execution engine: Reads specs and implements code end-to-end
 argument-hint: "[feature-name|specs-directory-path]"
 ---
-# Develop — Feature Implementation (Full Build)
+# Develop — Feature Implementation (Task-Orchestrated Build)
-Reads the full project specification (`hapo:specs`) and relentlessly implements code from A to Z in a disciplined, single-track workflow. Automatically overcomes obstacles and only escalates to the user when facing persistent critical failures.
+Reads the project specification (`hapo:specs`) and implements code through a disciplined task loop. In specific-task mode it behaves like a surgical executor. In full-spec mode it behaves like a sequential orchestrator, processing one unblocked task at a time and syncing state after every verified task.
 **Principles:** YAGNI, KISS, DRY | Continuous execution | Smart self-healing
@@ -18,6 +18,26 @@ Reads the full project specification (`hapo:specs`) and relentlessly implements
 /hapo:develop <feature name> <specific-task-file.md>
 ```
+## Execution Modes
+### 1. Specific-Task Mode
+Triggered by `/hapo:develop <feature> <task-file>`.
+- Load exactly one task file.
+- Implement only that task packet.
+- STOP immediately after the task is verified and synchronized.
+- Never auto-chain into the next task.
+### 2. Full-Spec Mode
+Triggered by `/hapo:develop <feature>` or `/hapo:develop specs/<feature>`.
+- Build a queue from `spec.json.task_registry`.
+- Select the next `pending` + unblocked task only.
+- Run the full implementation cycle for that single task.
+- Sync state.
+- Recompute the queue and continue.
+- STOP the overall run on the first blocked task, unresolved gate failure, or missing proof.
 <HARD-GATE>
 DO NOT write implementation code until an approved spec exists.
 - If the directory `specs/<feature-name>` DOES NOT EXIST or `spec.json` is not ready, automatically trigger `/hapo:specs <feature-name>` first to create the specification. Do not improvise.
@@ -28,6 +48,11 @@ A task is NOT done because code compiles or a placeholder renders.
 A task is done only when the task file's Completion Criteria AND Verification & Evidence section are satisfied with real execution proof.
 </DEFINITION-OF-DONE>
+<CONTRACT-FIDELITY>
+If the spec/task explicitly names a framework, auth system, datastore, transport path, or runtime boundary, that named choice is contractual.
+You MUST NOT silently replace it with a simpler custom substitute ("for MVP", "placeholder", "temporary auth", "in-memory until later") unless the spec itself is updated first.
+</CONTRACT-FIDELITY>
 ## Anti-Rationalization Protocol
 | Thought (Excuse) | Reality (Rule) |
@@ -55,13 +80,15 @@ flowchart TD
 - Load `task_registry` and verify it matches the requested task file(s). If registry is missing or stale, route to `/hapo:sync audit <feature>` before coding.
 - **Task Scoping (CRITICAL):**
   - If the user specifies a particular task file (e.g., `task-R0-02...md`), load **ONLY** that specific file into working memory.
-  - If no specific task is mentioned, list and load all Markdown files in `specs/<feature-name>/tasks/*.md`.
+  - If no specific task is mentioned, DO NOT load all tasks into working memory. Resolve the next single unblocked `pending` task from `task_registry` and load only that task packet.
 - **Task Packet Extraction (MANDATORY):** Before coding, extract from the active task file(s):
   - Objective + Constraints
   - Related Files
   - Completion Criteria
   - Verification & Evidence
+  - Exact executable verification commands named in the task
   - Requirement IDs referenced by the task
+  - Named technologies, frameworks, protocols, and data stores that the task/spec explicitly requires
   - Relevant `Canonical Contracts & Invariants` from `design.md`
 - If the task file is missing actionable completion or verification detail, STOP and route back to spec correction. Do not guess.
 - Before coding, set the active task(s) to `in_progress` in both markdown and `spec.json.task_registry`, or route through `/hapo:sync` if the runtime expects the sync protocol.
@@ -73,22 +100,34 @@ flowchart TD
 - Act as `god-developer` OR directly write code, executing tasks specified in the loaded Markdown file(s) sequentially.
 - **Important:** You may create and modify files directly, but must faithfully follow the design from the Spec.
 - Progress tracking: Temporarily change `[ ]` to `[/]` in Spec files while coding is in progress. Do NOT mark `[x]` before Step 4 passes.
+- **Task Boundary Protocol (CRITICAL):**
+  - Default editable scope is `Related Files` from the task packet.
+  - You may additionally touch direct test files plus minimal support files required to make the current task executable (shared types, exports, config glue, generated migration wiring).
+  - If you must edit a file outside this scope, explicitly treat it as a `scope escape` and justify why it is required for the current task.
+  - If the out-of-scope change would deliver functionality clearly assigned to a later task, STOP instead of implementing it early.
 - **Hard Stop Protocol:** If you were asked to implement a specific task file, you MUST STOP completely after that task is verified. DO NOT auto-chain or jump to "Next Task" simply because you see it in the spec. Wait for the user's next command.
+- **Full-Spec Loop Protocol:** If you were asked to implement the whole feature, you MUST still work one task at a time. Finish Step 4 and Step 5 for the current task before selecting the next unblocked task from `task_registry`.
 - **Test Integrity Protocol:** You MUST NOT delete, replace, or reduce the scope of existing test cases to make tests pass. If a test fails, you must fix the **implementation code** or fix the **test setup/mock**, NOT remove the assertion. Reducing test count or weakening assertions (e.g., removing `toHaveBeenCalledWith` and replacing with `toEqual(expect.any(...))`) is a Critical violation.
 - **Contract Integrity Protocol:** If implementation appears to require changing auth/session, transport, persistence, entrypoint wiring, or generated artifact behavior beyond what `design.md` states, STOP and route back to spec correction instead of inventing a new contract in code.
+- **Named Technology Rule:** If the task/spec explicitly requires a named dependency or runtime choice (for example Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, S3), you MUST implement that choice or stop. Do not swap it for a custom/in-memory/local substitute and still call the task complete.
+- **Cross-Service Reality Rule:** If a task spans multiple processes or runtimes (web ↔ API, worker ↔ DB, extension ↔ backend), you MUST prove the integration uses shared real state or a real contract boundary. Process-local placeholders on both sides do not count as completion.
+- **Placeholder Completion Rule:** You MAY scaffold future files only when the active task truly needs them to compile, but placeholder route handlers, in-memory stores, or fake adapters MUST NOT be used as evidence that the current task's behavior works end-to-end.
 ### Step 4: Self-Healing (Quality Gate Auto-Fix)
 The moment you finish coding, DO NOT proceed further. Switch to `references/quality-gate.md` and run the automatic review loop.
 **Mantra:** All feedback from code-auditor must be addressed thoroughly: Score >= 9.5 & Zero Critical issues.
 - Passing Step 4 requires ALL of the following:
-  1. Automated verification passes (typecheck/test/build as applicable)
+  1. Automated verification passes, including preflight compile/typecheck/build health and every exact command named in the task's `Verification & Evidence` section
   2. Code review passes
   3. Task evidence passes (artifacts/runtime surfaces/negative-path checks from the task file are proven)
+- `PRECHECK_FAIL` outranks `NO_TESTS`. If compile/typecheck/build fails, the task is FAIL even when no test suite exists yet.
+- `NO_TESTS` is NOT equivalent to PASS. If the task explicitly requires a test command or automated test proof, `NO_TESTS` is a FAIL or BLOCKED outcome until the requirement is satisfied or the spec is corrected.
 - If build/test passes but task evidence is missing, the task is still FAIL.
+- If the implementation silently replaced a named contract choice or relies on cross-service process-local stand-ins, the task is still FAIL.
 - Only escalate to the user after 3 consecutive failed review rounds.
-### Step 5: State Sync + Incremental Docs Sync
+### Step 5: State Sync + Task-Level Docs Sync
 - Only after Step 4 passes may you mark task checkboxes completed and sync `spec.json` progress/timestamps/task_registry.
 - If verification is partial or blocked by environment, keep the task in `pending` or `in_progress` and record the blocker instead of pretending completion.
 - A completed task MUST leave behind:
@@ -96,10 +135,18 @@ The moment you finish coding, DO NOT proceed further. Switch to `references/qual
   - `spec.json.task_registry[path].status = "done"`
   - `completed_at` + `last_updated_at`
   - synchronized top-level `updated_at`
-- After passing the Quality Gate, evaluate if any actual codebase modifications occurred (e.g., check pending files via git status).
-- If files were created or modified: Trigger `docs-keeper` automatically to execute `repomix` and update the global `/docs/` and project logs.
+  - a human-readable verification receipt inside the task's `Verification & Evidence` section showing which commands ran, their outcomes, and what proof was observed
+- Verification receipts with `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation intentionally simplified a named contract MUST NOT be synchronized as `done`.
+- After syncing the active task, run a **Task Closeout Docs Checkpoint**
+- Task Closeout Docs Checkpoint:
+  - Evaluate `Docs impact: none | minor | major` based on real behavior changes from the just-completed task
+  - If `none`: record that explicitly in the completion report and stop
+  - If `minor` or `major`: trigger `docs-keeper` to surgically update affected existing docs under `./docs`
+  - Default to **lightweight docs sync**: update only the docs touched by this task and its verified behavior; do NOT run `repomix` unless `docs-keeper` truly cannot verify the required architecture/context from the code, spec, and current docs
 - **CWD Protocol (CRITICAL):** When spawning `docs-keeper`, you MUST ensure the agent's Current Working Directory (CWD context) is explicitly set to the **Workspace Root**, NOT the inner package directory you were just coding in. Otherwise, `docs-keeper` will search for the root `docs/` folder in the wrong place and crash.
-- Do NOT skip this step! The user explicitly requires documentation to be synced immediately after every `/hapo:develop` action, overriding the default Phase 3-only rule.
+- Task-level docs sync happens after every verified completed task, but actual edits still depend on `Docs impact`.
+- In **Specific-Task Mode**, STOP after sync and report the result.
+- In **Full-Spec Mode**, only after sync may you re-read `task_registry`, pick the next unblocked pending task, and repeat from Step 1 for that task.
 ---
 ## Attached References

package/src/claude/skills/develop/references/quality-gate.md CHANGED Viewed

@@ -7,6 +7,16 @@ Green tests are NOT enough. The gate requires three proofs:
 2. Code/spec review
 3. Task evidence (completion criteria + runtime/artifact proof from the task file)
+## Automation Semantics
+- If the task names exact commands in `Verification & Evidence`, those exact commands are mandatory and must run before any fallback repo defaults.
+- Preflight compile/typecheck/build health is mandatory. If compile/typecheck/build fails before tests are meaningful, the gate result is `PRECHECK_FAIL`, not `NO_TESTS`.
+- `NO_TESTS` is never an automatic PASS.
+- `NO_TESTS` is acceptable only when the task does **not** require a dedicated test suite command and every other required automated command/evidence item passes.
+- If the task explicitly requires tests and the repo has no such test command or suite, the task is FAIL or BLOCKED, not done.
+- Named frameworks, auth systems, transports, datastores, and runtime boundaries in the task/spec are contractual. Silent substitutions are review failures, not acceptable implementation trade-offs.
+- Multi-process or multi-runtime flows must prove shared real state or a real boundary contract. Matching in-memory placeholders on both sides do not count as working integration.
 ## Parallel Quality Cycle
 Maximum retry counter: **3 attempts**. Exceeding 3 triggers a collapse warning.
@@ -17,6 +27,7 @@ Variable: retry_count = 0
 Before START_LOOP:
   - Read the active task file(s)
   - Extract Related Files, Completion Criteria, Verification & Evidence
+  - Extract the exact executable verification commands in declaration order
   - Extract relevant design contracts/invariants for the touched area
   - If any of these are missing or too vague to verify, FAIL immediately and route back to spec correction
@@ -25,11 +36,11 @@ START_LOOP:
   PARALLEL GATE: Spawn BOTH agents simultaneously
   ---------------------------------------------------------------
   → Task(subagent_type="test-runner",
-        prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute: pre-flight typecheck/lint, relevant tests, build commands, and every Verification & Evidence item that is executable. Inspect named artifacts/runtime outputs. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED.",
+        prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute the exact verification commands named there first, in order. Preflight compile/typecheck/build failures must be reported as PRECHECK_FAIL and take precedence over NO_TESTS. After that, run any additional repo-level typecheck/test/build checks needed for confidence. Inspect named artifacts/runtime outputs. For multi-service tasks, verify the flow does not rely on process-local stand-ins masquerading as shared state. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED. Treat NO_TESTS as non-passing unless the task did not require a dedicated test suite.",
         description="Test [feature]")
   → Task(subagent_type="code-auditor",
-        prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, or contract drift are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
+        prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, overscope edits outside the task packet, silent replacement of named technologies/contracts, or fake cross-service proof via process-local state are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
         description="Review [feature]")
   Wait for BOTH to return results.
@@ -38,7 +49,7 @@ START_LOOP:
   COMBINE RESULTS
   ---------------------------------------------------------------
-  CASE 1 — Test FAIL OR Evidence FAIL / UNVERIFIED:
+  CASE 1 — PRECHECK_FAIL OR Automated FAIL OR required command missing OR Evidence FAIL / UNVERIFIED OR NO_TESTS when tests were required:
     - Increment retry_count++
     - If retry_count >= 3:
         → COLLAPSE! AskUserQuestion: "Quality gate cannot prove this task is complete! User intervention required!"
@@ -56,7 +67,7 @@ START_LOOP:
   CASE 3 — Test PASS + Evidence PASS + Review PASS (Score >= 9.5 AND Critical = 0):
     → PASS! Auto-approved.
-    → PROCEED to completion report.
+    → PROCEED to completion report with a verification receipt summarizing exact commands executed, artifact/runtime proof, and review result.
 REVIEW_ONLY:
   ---------------------------------------------------------------
@@ -77,6 +88,9 @@ REVIEW_ONLY:
 - **Architecture:** Breaking MVC boundaries, cross-module coupling, convention violations.
 - **Principles:** YAGNI violations, KISS violations, DRY violations (excessive code duplication).
 - **Evidence / Done-Criteria Drift:** Missing required artifacts, placeholder-only wiring, missing entrypoints, unproven completion criteria, or runtime contract mismatches.
+- **Overscope Delivery Drift:** Implementing later-task deliverables or editing out-of-scope files without direct justification for the active task.
+- **Contract Substitution Drift:** Replacing a named framework/auth/transport/datastore/runtime boundary with a custom simplification without a spec amendment.
+- **Cross-Service Reality Failure:** Claiming end-to-end behavior across web/api/worker/extension boundaries while state only exists in local process memory or placeholder adapters.
 ## Terminal Log Format
@@ -84,5 +98,6 @@ Must log the Quality Gate result to the terminal for user visibility:
 - **Quick Pass:** `✓ Step 4 Quality Gate: Test PASS + Evidence PASS + Review 9.5/10 - Auto-Approved`
 - **Hard-Won Pass:** `✓ Step 4 Quality Gate: Failed 2 rounds → Test PASS + Evidence PASS + Review 9.6/10`
+- **Preflight Fail:** `[x] Step 4 Quality Gate: PRECHECK_FAIL → compile/typecheck/build failed before tests mattered`
 - **Fix Needed:** `[~] Step 4 Quality Gate: Tests/evidence failed → returned to god-developer`
 - **Awaiting Rescue:** `[!] Step 4 Quality Gate: Failed 3 rounds! Awaiting user intervention...`

package/src/claude/skills/specs/references/review.md CHANGED Viewed

@@ -138,7 +138,7 @@ If "Review each one": For each finding, ask: "Apply" | "Reject" | "Modify sugges
 | # | Finding | Severity | Disposition | Applied To |
 |---|---------|----------|-------------|------------|
-| 1 | {title} | Critical | Accept | Task 02 |
+| 1 | {title} | Critical | Accept | task-R0-02-... |
 ```
 ---

package/src/claude/skills/specs/references/task-hydration.md CHANGED Viewed

@@ -16,16 +16,16 @@ Convert task files (persistent storage) into Claude Tasks (session-scoped only),
 ┌──────────────────┐  Hydrate   ┌───────────────────┐
 │ Task Files       │ ─────────► │ Claude Tasks      │
 │ (persistent)     │            │ (session-scoped)  │
-│ [ ] Task 01      │            │ ◼ pending         │
-│ [ ] Task 02      │            │ ◼ pending         │
+│ [ ] task-R0-01   │            │ ◼ pending         │
+│ [ ] task-R0-02   │            │ ◼ pending         │
 └──────────────────┘            └───────────────────┘
                                         │ Work
                                         ▼
 ┌──────────────────┐  Sync-back ┌───────────────────┐
 │ Task Files       │ ◄───────── │ Task Updates      │
 │ (updated)        │            │ (completed)       │
-│ [x] Task 01      │            │ ✓ completed       │
-│ [ ] Task 02      │            │ ◼ in_progress     │
+│ [x] task-R0-01   │            │ ✓ completed       │
+│ [ ] task-R0-02   │            │ ◼ in_progress     │
 └──────────────────┘            └───────────────────┘
 ```

package/src/claude/skills/sync/SKILL.md CHANGED Viewed

@@ -35,7 +35,9 @@ Scans the `spec.json` against all physical `task-R*.md` files to detect mismatch
 1. **Precision Edits:** Never overwrite the entire `spec.json` string blindly. Update only the required keys, while keeping JSON valid.
 2. **Machine + Human Sync:** Every task status update MUST modify both `spec.json.task_registry[...]` and the matching markdown task file header/status section.
 3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Implementation Steps` and relevant `Completion Criteria` / `Verification & Evidence` checkboxes that have actual proof.
-4. **Task Completion Hook:** When `hapo:sync` marks the final pending task as `done`, it should automatically prompt the user if they'd like to advance the phase.
+4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
+5. **Task Docs Hook:** Every time `hapo:sync` marks a task as `done`, it must flag that a task-level docs checkpoint is now due for that verified task.
+6. **Phase Prompt Rule:** When `hapo:sync` marks the final pending task in the whole feature as `done`, it should automatically prompt the user if they'd like to advance the phase, but only after the docs checkpoint for that last completed task has been considered.
 ## References
 Read `references/sync-protocols.md` for exact Search/Replace regex patterns and JSON schema expectations before acting on the files.

package/src/claude/skills/sync/references/sync-protocols.md CHANGED Viewed

@@ -15,6 +15,10 @@ When requested to update a phase or change task configuration, `spec.json` must
     - full relative path like `tasks/task-R0-02-extension-shell.md`
 *   **Status Update:** If a task changes to `blocked`, the matching `task_registry[path].status` must become `"blocked"`, `task_registry[path].blocker` must record the reason, and `spec.json.status` / `spec.json.blocker` must reflect the top-level block if work is globally blocked.
 *   **Timestamp Rule:** Update `task_registry[path].started_at`, `completed_at`, and `last_updated_at` consistently with the new state. Also refresh `spec.json.updated_at`.
+*   **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
+*   **Receipt Integrity Rule:** A valid verification receipt must include the exact commands run, their outcomes, and artifact/runtime proof. Receipts containing `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit "placeholder / simplified for MVP / production later" contract deviations are not eligible for `done`.
+*   **Contract Fidelity Rule:** If the task file notes or evidence show that a named framework/auth/runtime choice from the spec was silently replaced, sync MUST refuse `done` until the spec is amended or the implementation is corrected.
+*   **Task Docs Rule:** After a task is moved to `done`, emit a short alert that a task-level docs checkpoint is due for this verified task.
 ## 2. Updating `tasks/task-**.md`
@@ -23,10 +27,13 @@ The structure of `tasks/task.md` relies heavily on exact keyword markers. Follow
 ### A. Completing a Task
 When `/hapo:sync <feature> <task-id> done`:
 1. Find: `**Status:** pending` (or `in_progress` / `blocked`).
-2. Replace with: `**Status:** done`.
-3. Locate block: `## Implementation Steps`.
-4. Convert `- [ ]` into `- [x]` strictly within that section.
-5. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
+2. Inspect `## Verification & Evidence` first. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
+3. Refuse completion if the receipt contains any non-passing marker such as `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation substituted a named contract with a placeholder/custom simplification.
+4. Replace with: `**Status:** done`.
+5. Locate block: `## Implementation Steps`.
+6. Convert `- [ ]` into `- [x]` strictly within that section.
+7. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
+8. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
 ### B. Blocking a Task
 When `/hapo:sync <feature> <task-id> blocked "API error"`:
@@ -52,4 +59,7 @@ When `/hapo:sync audit <feature>` is activated:
    - Missing disk file referenced in registry → remove or flag it
    - Markdown says `done` but registry not done → registry wins only if evidence already exists; otherwise downgrade markdown or flag conflict
    - Registry says `done` but markdown still pending → update markdown only if evidence exists
+   - Either side says `done` but `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
+   - Either side says `done` but the receipt contains `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit contract-substitution notes → downgrade to `in_progress` or flag conflict
 5. **Correction Alert:** Output a brief markdown alert detailing mismatches fixed and any unresolved conflicts requiring manual review.
+6. **Task Docs Alert:** If audit reveals tasks newly marked `done`, include whether task-level docs sync appears still due or already accounted for in the current run summary.

package/src/claude/skills/test/SKILL.md CHANGED Viewed

@@ -1,14 +1,14 @@
 ---
 name: hapo:test
 description: "Run and verify project tests across all scopes: unit, integration, e2e, and UI. Blast-radius scoping for speed, chrome-devtools for UI verification, structured verdicts for downstream automation."
-argument-hint: "[scope|--full|--ui <url>|--ui-auth <url>]"
+argument-hint: "[scope|--full|--ui <url>|--ui-auth <url>|--ui-flow <url>]"
 version: 2.0.0
 ---
 # Test — Verify Implementation Quality
 Run the project's test suite, analyze results, and return a structured verdict.
-Designed to work **after `hapo:code`** and to run **in parallel with `hapo:code-review`** during the `hapo:develop` Quality Gate.
+Designed to work **after `hapo:develop`**. Standalone `/hapo:test` uses the same `test-runner` contract that `hapo:develop` relies on during its Quality Gate, and may run **in parallel with `hapo:code-review`**.
 **Principles:** Fail-fast | Blast-radius scoping | Zero hidden failures | No mocking to pass

package/src/claude/skills/test/references/execution-strategy.md CHANGED Viewed

@@ -38,7 +38,7 @@ When `--full` is NOT specified, narrow the test scope to only what changed:
 ### Pre-flight Checks (always run first)
 Catch compile errors before spending time on tests.
-**HARD RULE:** If a pre-flight tool (like `eslint`, `flake8`, `tsc`) is missing, you MUST install it (e.g., `npm install -D eslint`, `pip install flake8`). Do NOT skip the pre-flight check.
+**HARD RULE:** Do NOT auto-install missing tooling during verification. If a required pre-flight tool (like `eslint`, `flake8`, `tsc`) is missing, stop and report it as an environment gap or missing project setup.
 ```bash
 # JavaScript / TypeScript
@@ -59,7 +59,7 @@ cargo check
 flutter analyze
 ```
-If pre-flight fails → report `Compile Error`, do NOT proceed to test execution.
+If pre-flight fails or a required tool is missing → report `Compile Error` / `Environment Gap`, do NOT proceed to test execution.
 ### Test Execution by Language
@@ -361,4 +361,3 @@ Flag as `Security Warning` if:
 - Mixed content (HTTP resources on HTTPS page) detected via network audit
 - `autocomplete="off"` missing on password fields