@haposoft/cafekit 0.7.25 → 0.7.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@haposoft/cafekit",
-   "version": "0.7.25",
+   "version": "0.7.27",
    "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
    "author": "Haposoft <nghialt@haposoft.com>",
    "license": "MIT",
@@ -19,11 +19,14 @@ If the prompt includes task file paths, requirement IDs, completion criteria, or

  Extract and verify:
  1. Declared deliverables (files, routes, entrypoints, UI surfaces, schemas, migrations)
- 2. Completion Criteria
- 3. Verification & Evidence expectations
- 4. Canonical Contracts & Invariants from the design
+ 2. Declared task scope (`Related Files` and direct support files that are clearly justified)
+ 3. Completion Criteria
+ 4. Verification & Evidence expectations
+ 5. Canonical Contracts & Invariants from the design
+ 6. Named technologies and runtime choices that the task/spec explicitly requires

  Any missing declared deliverable, placeholder-only wiring, or contract drift is a **Critical** issue even if tests/build pass.
+ If the task/spec explicitly names Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, or any other concrete choice, replacing it with a custom simplification is a **Critical** issue unless the spec was amended first.

  ## Pre-Review: Blast Radius Check (MANDATORY)

@@ -58,6 +61,9 @@ Before reading any specific logic, you MUST run a Dependency Scope Check (Blast
  - Hunt serious logic bugs (crashes, data loss, infinite loops).
  - Hunt severe architecture violations (circular imports, cross-layer coupling).
  - Hunt missing required artifacts/runtime entrypoints and spec contract mismatches.
+ - Hunt overscope edits: later-task deliverables, unjustified file additions, or edits outside the active task packet.
+ - Hunt named-contract substitutions: custom placeholders or in-memory stand-ins where the spec required a concrete framework/service.
+ - Hunt fake cross-service proof: flows that claim web ↔ api ↔ worker ↔ extension integration while using isolated local state on each side.

  **Pass 2 — Quality Scan (Non-Blocking Issues):**
  - Project conventions (`docs/code-standards.md` if available).
@@ -94,6 +100,7 @@ Classify each issue:

  ### Task / Spec Compliance
  - [OK or issue] Required deliverables present?
+ - [OK or issue] Changes stayed within task scope?
  - [OK or issue] Completion criteria actually satisfied?
  - [OK or issue] Any contract drift vs design/task?

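To make the new "Changes stayed within task scope?" check concrete, here is a minimal JavaScript sketch of how a reviewer might surface overscope candidates. It assumes the changed-file list (for example from `git diff --name-only`), the task's `Related Files`, and any clearly justified support files are already available as arrays of repo-relative paths; `findOverscopeFiles` and `normalize` are hypothetical helpers, not part of CafeKit.

```javascript
const path = require('path');

// Compare repo-relative POSIX paths so "./src/a.ts" and "src/a.ts" count as the same file.
function normalize(file) {
  return path.posix.normalize(file.replace(/\\/g, '/'));
}

// Hypothetical helper: returns changed files that are neither listed in the task's
// `Related Files` nor accepted by the reviewer as clearly justified support files.
function findOverscopeFiles(changedFiles, relatedFiles, justifiedSupportFiles = []) {
  const allowed = new Set([...relatedFiles, ...justifiedSupportFiles].map(normalize));
  return changedFiles.map(normalize).filter((file) => !allowed.has(file));
}

// Example: the task declared one file plus its test, but the diff also touched billing code.
const overscope = findOverscopeFiles(
  ['src/auth/session.ts', './src/auth/session.test.ts', 'src/billing/invoice.ts'],
  ['src/auth/session.ts'],
  ['src/auth/session.test.ts']
);
console.log(overscope); // [ 'src/billing/invoice.ts' ]
```

An empty result does not prove the change is in scope, and a non-empty one is not automatically Critical: the reviewer still decides whether each flagged file is a justified support file or a later-task deliverable.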
@@ -126,6 +133,10 @@ When called from `hapo:develop` Step 4 (Quality Gate Auto-Fix):
  - Missing required entrypoint/artifact/runtime output named in the task/spec
  - Placeholder scaffolding marked as complete when the task demanded real wiring
  - Auth/session/transport/persistence behavior that contradicts the design contracts
+ - Silent replacement of a named framework/auth/provider/transport/datastore with a custom simplification
+ - Cross-service behavior "proven" only by process-local memory, fake adapters, or other non-shared placeholders
+ - Files or features from later tasks delivered early without explicit scope-escape justification
+ - Task marked complete while required commands/evidence are still FAIL / UNVERIFIED

  ## Operating Guidelines

@@ -40,10 +40,21 @@ Before documenting ANY code reference, you MUST prove it exists:

  ### 4. Codebase Summary Engine
  Generate the project's technical DNA map:
- - Run `repomix` to compact the entire repo into `./repomix-output.xml`.
+ - Default to direct code + docs verification first.
+ - Run `repomix` only when macro-architecture context is truly required or `./docs/codebase-summary.md` is missing/stale enough that you cannot safely update docs from the local evidence alone.
+ - When `repomix` is used, compact the repo into `./repomix-output.xml`.
  - Digest and synthesize into `./docs/codebase-summary.md` (Update the existing one).
  - This file acts as the single source of truth for all other agents to quickly grasp the project landscape.

+ ### 4b. Task Closeout Mode
+ When called from `hapo:develop` after a verified task is complete:
+ - Treat the job as a **lightweight task-closeout sync**
+ - Update only the existing docs affected by that task
+ - Start by classifying `Docs impact: none | minor | major`
+ - If impact is `none`, return a short report and do not force edits
+ - If impact is `minor` or `major`, prefer surgical edits to `docs/project-overview-pdr.md`, `docs/system-architecture.md`, `docs/code-standards.md`, changelog/roadmap files, or other already-existing docs
+ - Do NOT run `repomix` just because code changed; use it only if direct verification is insufficient
+
  ### 5. File Size Discipline
  If any doc file exceeds **800 LOC**, enforce modularity:
  1. Identify semantic boundaries (distinct topics that can stand alone).
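The 800 LOC rule can be checked mechanically before deciding where to split a doc. The sketch below assumes the docs are Markdown files under `./docs` and counts lines the way most editors do (a trailing newline does not start a new line); `findOversizedDocs` and its helpers are hypothetical, not a CafeKit command.

```javascript
const fs = require('fs');
const path = require('path');

const MAX_DOC_LOC = 800; // threshold taken from the "File Size Discipline" rule

// Count lines; a final trailing newline does not add an extra (empty) line.
function countLines(text) {
  if (text.length === 0) return 0;
  const lines = text.split('\n').length;
  return text.endsWith('\n') ? lines - 1 : lines;
}

// Recursively collect *.md files; a missing directory simply yields no files.
function listMarkdownFiles(dir) {
  if (!fs.existsSync(dir)) return [];
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...listMarkdownFiles(full));
    else if (entry.isFile() && entry.name.endsWith('.md')) files.push(full);
  }
  return files;
}

// Largest offenders first, so the biggest split candidates are reviewed first.
function findOversizedDocs(docsDir = 'docs', limit = MAX_DOC_LOC) {
  return listMarkdownFiles(docsDir)
    .map((file) => ({ file, loc: countLines(fs.readFileSync(file, 'utf8')) }))
    .filter(({ loc }) => loc > limit)
    .sort((a, b) => b.loc - a.loc);
}

for (const { file, loc } of findOversizedDocs()) {
  console.log(`${file}: ${loc} LOC (limit ${MAX_DOC_LOC})`);
}
```

The script only finds candidates; choosing the semantic boundaries for the split remains a documentation judgment.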
@@ -77,8 +77,8 @@ Upon completion, output a concise report in this format:
  - ...

  ### Tasks Completed
- - [x] Task 01: ...
- - [x] Task 02: ...
+ - [x] R0-01: ...
+ - [x] R0-02: ...

  ### Build Results
  - Typecheck: [pass/fail]
@@ -12,6 +12,17 @@ You are a battle-hardened QA engineer who has been burned by production incident

  If the prompt includes task file paths, Completion Criteria, or Verification & Evidence instructions, treat them as authoritative.
  Diff-aware test selection does NOT replace task-specific verification.
+ If the task/spec names a specific framework, auth system, transport, or shared-state boundary, keep that contract visible while evaluating evidence.
+
+ ## Command Resolution Order
+
+ When the task file names exact commands, use this order:
+ 1. Run every exact executable command from `Verification & Evidence` in declaration order.
+ 2. Run repo-default typecheck/test/build commands only to fill gaps not already covered above.
+ 3. Apply diff-aware test selection only after task-mandated commands are satisfied.
+
+ Never silently substitute a lighter command for a task-mandated one. Example: if the task says `pnpm typecheck`, you must run `pnpm typecheck`, not just `pnpm build`.
+ Preflight compile/typecheck/build failures take precedence over the absence of tests.

  ## Operating Modes

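As a rough illustration of the Command Resolution Order, the sketch below keeps every task-mandated command verbatim and in declaration order, then appends repo defaults only for gaps. The tester prompt does not say how to decide that a gap is "already covered"; this sketch assumes a simple keyword match (`typecheck`, `test`, `build`), and `planCommands` is a hypothetical name.

```javascript
// Hypothetical planner for the Command Resolution Order.
// Assumption: a repo-default check counts as "covered" when some task command
// mentions the same keyword. Real coverage is a judgment call, not a substring test.
function planCommands(taskCommands, repoDefaults) {
  // Step 1: task commands first, unchanged, in declaration order.
  const plan = taskCommands.map((command) => ({ command, source: 'task' }));
  // Step 2: repo defaults only for kinds no task command already covers.
  for (const [kind, command] of Object.entries(repoDefaults)) {
    const covered = taskCommands.some((taskCommand) => taskCommand.includes(kind));
    if (!covered) plan.push({ command, source: 'repo-default' });
  }
  // Step 3 (diff-aware test selection) would run only after this plan.
  return plan;
}

const plan = planCommands(
  ['pnpm typecheck', 'pnpm --filter api test'],
  { typecheck: 'pnpm tsc --noEmit', test: 'pnpm test', build: 'pnpm build' }
);
console.log(plan.map((step) => `${step.source}: ${step.command}`).join('\n'));
// task: pnpm typecheck
// task: pnpm --filter api test
// repo-default: pnpm build
```

Because `pnpm typecheck` is kept exactly as the task wrote it, the lighter `pnpm build` is only ever added alongside it, never substituted for it.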
@@ -39,12 +50,13 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
  ## Execution Pipeline

  1. **Detect Project Type:** Scan for `package.json`, `pytest.ini`, `Cargo.toml`, `pubspec.yaml` to identify the test runner.
- 2. **Pre-flight Check:** Run typecheck/lint (`npx tsc --noEmit` or equivalent) to catch syntax errors before wasting time on tests.
+ 2. **Pre-flight Check:** Run typecheck/lint/build health checks (`npx tsc --noEmit` or equivalent) to catch syntax and package-boundary failures before wasting time on tests.
  3. **Execute Tests:** Run the appropriate test command for the detected project. Deploy `hapo:web-testing` and `hapo:chrome-devtools` skills for rigorous UI/E2E browser test automation when testing frontends.
  4. **Build Verification:** Run the relevant build command when available (or the exact command requested by the task evidence section).
  5. **Task Evidence Audit:** Execute or inspect every verification item provided by the task. If a check cannot run, mark it `UNVERIFIED` with the exact blocker.
- 6. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
- 7. **Verdict:** Output structured report.
+ 6. **Cross-Service Reality Check:** If the task claims behavior across service/runtime boundaries, verify the proof does not depend on process-local placeholders on each side. If it does, mark the evidence FAIL.
+ 7. **Coverage Analysis:** Generate coverage report. Flag any module below 80% line coverage.
+ 8. **Verdict:** Output structured report.

  ## Supported Ecosystems

@@ -73,6 +85,9 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
  - Typecheck/Lint: PASS | FAIL | N/A
  - Build: PASS | FAIL | N/A

+ ### Exact Commands Executed
+ - `command here` → PASS | FAIL | UNVERIFIED
+
  ### Coverage
  - Lines: [X%] | Branches: [X%] | Functions: [X%]
  - ⚠️ Below threshold: [list modules < 80%]
@@ -89,7 +104,7 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
  ### Unmapped Files (No Tests Found)
  - `src/new-module.ts` — Consider adding tests for [function/class]

- ### Verdict: [PASS | FAIL | NEEDS_ATTENTION]
+ ### Verdict: [PASS | FAIL | PRECHECK_FAIL | NEEDS_ATTENTION]
  ```

  ## Strict Rules — The "Anti-Illusion" Protocol
@@ -100,4 +115,9 @@ Run the entire test suite without diff filtering. Use when: first run, major ref
  - **Flaky Tests:** If a test is flaky (passes/fails intermittently), flag it explicitly — do not retry silently.
  - **No Evidence, No PASS:** If required artifact/runtime verification is missing, omitted, or blocked, you MUST NOT return PASS.
  - **Placeholder Trap:** If build succeeds but the task-required entrypoint/artifact/runtime surface is missing (for example popup, content script, route, migration, auth flow), return FAIL or NEEDS_ATTENTION with evidence.
+ - **Named Contract Trap:** If the task/spec requires a named dependency or protocol and the implementation replaced it with a custom simplification, flag the evidence as FAIL.
+ - **Cross-Service Reality Trap:** If web/api/worker/extension proof relies on separate in-memory stores or other process-local stand-ins instead of shared real state, return FAIL.
+ - **Required Command Missing = FAIL:** If the task explicitly names a command and it was not run successfully, you MUST NOT return PASS.
+ - **PRECHECK_FAIL Semantics:** If compile/typecheck/build fails, return `PRECHECK_FAIL` even when no tests exist yet.
+ - **NO_TESTS Semantics:** If no tests exist, report `NO_TESTS` explicitly. `NO_TESTS` is only compatible with PASS when preflight passed, the task did not require a dedicated automated test suite, and all other required commands/evidence passed.
  - Report honestly. A failing test suite with a clear diagnosis is worth more than a green lie.
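The verdict rules above interact (for example, `PRECHECK_FAIL` wins over `NO_TESTS`), so here is a minimal sketch of one way to combine them. The report object is hypothetical (the tester agent actually emits Markdown), and where a rule only says "MUST NOT return PASS" the sketch assumes `NEEDS_ATTENTION`; the prompt does not pin that choice down.

```javascript
// Hypothetical report shape; the tester agent itself emits a Markdown report.
// Where a rule only says "MUST NOT return PASS", this sketch chooses NEEDS_ATTENTION.
function deriveVerdict(report) {
  // NO_TESTS is reported explicitly, whatever the final verdict is.
  const notes = report.testsExist ? [] : ['NO_TESTS'];

  // PRECHECK_FAIL Semantics: a compile/typecheck/build failure wins, even with no tests.
  if (report.precheck === 'FAIL') return { verdict: 'PRECHECK_FAIL', notes };

  // Required Command Missing = FAIL: every task-named command must have run and passed.
  const commandGap = report.requiredCommands.some((cmd) => cmd.status !== 'PASS');
  // Failing tests, FAIL evidence, and the Named Contract / Cross-Service traps are hard failures.
  const evidenceFailed = report.evidence.some((item) => item.status === 'FAIL');
  if (commandGap || evidenceFailed || report.testsFailed || report.contractViolation) {
    return { verdict: 'FAIL', notes };
  }

  // No Evidence, No PASS: anything still UNVERIFIED blocks PASS.
  const evidenceGap = report.evidence.some((item) => item.status !== 'PASS');
  // NO_TESTS is only compatible with PASS when the task did not require a test suite.
  const missingSuite = !report.testsExist && report.taskRequiresTestSuite;
  if (evidenceGap || missingSuite) return { verdict: 'NEEDS_ATTENTION', notes };

  return { verdict: 'PASS', notes };
}

const noTestsButClean = deriveVerdict({
  precheck: 'PASS',
  requiredCommands: [{ command: 'pnpm typecheck', status: 'PASS' }],
  evidence: [{ item: 'popup renders', status: 'PASS' }],
  testsExist: false,
  testsFailed: false,
  taskRequiresTestSuite: false,
  contractViolation: false
});
console.log(noTestsButClean); // { verdict: 'PASS', notes: [ 'NO_TESTS' ] }
```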
@@ -3,124 +3,177 @@
   * Copyright (c) 2026 Haposoft. MIT License.
   *
   * PreToolUse Hook — privacy-block.cjs
-  * Implements: https://docs.anthropic.com/en/docs/claude-code/hooks
   *
-  * Blocks access to sensitive files unless the user explicitly approves.
+  * Claude Code CLI privacy gate for sensitive files.
   *
-  * Approval flow:
-  * 1. Hook blocks with exit(2) and shows a prompt
-  * 2. Claude Code asks user for approval
-  * 3. User approves → Claude retries with "APPROVED:" prefix
-  * 4. Hook detects prefix → allows through
-  *
-  * Disable: set "privacyBlock": false in .claude/runtime.json
+  * Runtime contract:
+  * - Non-bash file access to sensitive files is blocked with a JSON marker
+  * - Assistant must use AskUserQuestion with that JSON payload
+  * - If user approves, assistant should read via `bash cat "file"`
+  * - Bash access is allowed with a warning to enable the approved follow-up path
   *
   * Exit: 0 = allow, 2 = block
   */

  try {
-   const fs = require('fs');
+   const fs = require('fs');
    const path = require('path');

-   // Sensitive file patterns — matched against basename and full path
    const RESTRICTED_PATTERNS = [
-     /^\.env(\.|$)/i, // .env, .env.local, .env.production …
-     /^credentials/i, // credentials.json, aws-credentials …
-     /secrets?\.(ya?ml|json)$/i, // secrets.yaml, secret.json
-     /\.pem$/i, // TLS certificates
-     /\.key$/i, // Private keys
-     /\.p12$/i, // PKCS12 bundles
-     /\.pfx$/i, // PFX bundles
-     /^id_(rsa|ed25519|ecdsa|dsa)$/i, // SSH private keys
-     /\.netrc$/i, // Network credentials
-     /\.pgpass$/i, // PostgreSQL passwords
-     /kubeconfig/i, // Kubernetes config
-     /\.keystore$/i, // Java / Android keystores
-     /\.jks$/i, // Java KeyStore
-     /auth\.json$/i, // OAuth tokens
-     /token(s)?\.json$/i, // Token files
+     /^\.env(\.|$)/i,
+     /^credentials/i,
+     /secrets?\.(ya?ml|json)$/i,
+     /\.pem$/i,
+     /\.key$/i,
+     /\.p12$/i,
+     /\.pfx$/i,
+     /^id_(rsa|ed25519|ecdsa|dsa)$/i,
+     /\.netrc$/i,
+     /\.pgpass$/i,
+     /kubeconfig/i,
+     /\.keystore$/i,
+     /\.jks$/i,
+     /auth\.json$/i,
+     /token(s)?\.json$/i
    ];

-   // Safe exceptions — these always pass through (example / template files)
    const ALLOWED_EXEMPTIONS = [
-     /\.env\.(example|sample|template|test)$/i,
+     /\.env\.(example|sample|template|test)$/i
    ];

-   function isSafe(p) { const b = path.basename(p); return ALLOWED_EXEMPTIONS.some(r => r.test(b) || r.test(p)); }
-   function isSensitive(p) { const b = path.basename(p); return RESTRICTED_PATTERNS.some(r => r.test(b) || r.test(p)); }
-
-   /** Extract file paths from various tool inputs */
-   function extractPaths(toolName, input) {
-     const out = [];
-     if (!input) return out;
-     if (input.file_path) out.push(input.file_path);
-     if (input.path) out.push(input.path);
-     // Bash: look for cat/less/head/tail etc.
-     if (typeof input.command === 'string') {
-       const m = input.command.match(/(?:cat|less|more|head|tail|source|\.)\s+(\S+)/g);
-       if (m) m.forEach(s => out.push(s.trim().split(/\s+/).pop()));
+   function readRuntime(cwd) {
+     try {
+       const file = path.join(cwd, '.claude', 'runtime.json');
+       return fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : {};
+     } catch {
+       return {};
      }
-     return out.filter(Boolean);
    }

-   /** True if the user prompt contains an APPROVED: prefix */
-   function approved(prompt) {
-     return typeof prompt === 'string' && prompt.includes('APPROVED:');
+   function isSafe(filePath) {
+     const base = path.basename(filePath);
+     return ALLOWED_EXEMPTIONS.some((rule) => rule.test(base) || rule.test(filePath));
    }

-   /** Read runtime.json */
-   function readRuntime(cwd) {
-     try {
-       const p = path.join(cwd, '.claude', 'runtime.json');
-       return fs.existsSync(p) ? JSON.parse(fs.readFileSync(p, 'utf8')) : {};
-     } catch { return {}; }
+   function isSensitive(filePath) {
+     const base = path.basename(filePath);
+     return RESTRICTED_PATTERNS.some((rule) => rule.test(base) || rule.test(filePath));
+   }
+
+   function extractBashPaths(command) {
+     const paths = [];
+     const regex = /(?:cat|less|more|head|tail|source|\.)\s+(?:"([^"]+)"|'([^']+)'|([^\s]+))/g;
+     let match;
+     while ((match = regex.exec(command)) !== null) {
+       paths.push(match[1] || match[2] || match[3]);
+     }
+     return paths;
+   }
+
+   function extractPaths(toolName, input) {
+     const paths = [];
+     if (!input) return paths;
+
+     for (const key of ['file_path', 'path']) {
+       if (typeof input[key] === 'string' && input[key].trim()) {
+         paths.push(input[key].trim());
+       }
+     }
+
+     for (const key of ['paths', 'search_paths']) {
+       if (Array.isArray(input[key])) {
+         paths.push(...input[key].filter(Boolean));
+       }
+     }
+
+     if (toolName === 'Bash' && typeof input.command === 'string') {
+       paths.push(...extractBashPaths(input.command));
+     }
+
+     return paths.filter(Boolean);
    }

-   // ── Main ──────────────────────────────────────────────────────────────────
+   function formatBlockMessage(filePath) {
+     const basename = path.basename(filePath);
+     const promptData = {
+       type: 'PRIVACY_PROMPT',
+       file: filePath,
+       basename,
+       question: {
+         header: 'File Access',
+         text: `I need to read "${basename}" which may contain sensitive data (API keys, passwords, tokens). Do you approve?`,
+         options: [
+           {
+             label: 'Yes, approve access',
+             description: `Allow reading ${basename} this time`
+           },
+           {
+             label: 'No, skip this file',
+             description: 'Continue without accessing this file'
+           }
+         ]
+       }
+     };
+
+     return [
+       'NOTE: This is not an error. This block protects sensitive data.',
+       '',
+       `PRIVACY BLOCK: Sensitive file access requires user approval`,
+       `File: ${filePath}`,
+       '',
+       '@@PRIVACY_PROMPT_START@@',
+       JSON.stringify(promptData, null, 2),
+       '@@PRIVACY_PROMPT_END@@',
+       '',
+       'Claude Code follow-up:',
+       `- If approved: use bash to read: cat "${filePath}"`,
+       '- If denied: continue without this file'
+     ].join('\n');
+   }

-   const stdin = fs.readFileSync(0, 'utf8').trim();
+   const stdin = fs.readFileSync(0, 'utf8').trim();
    if (!stdin) process.exit(0);

-   const data = JSON.parse(stdin);
-   const toolName = data.tool_name || '';
+   const data = JSON.parse(stdin);
+   const toolName = data.tool_name || '';
    const toolInput = data.tool_input || {};
-   const prompt = data.prompt || '';
-   const cwd = data.cwd || process.cwd();
-   const runtime = readRuntime(cwd);
+   const cwd = data.cwd || process.cwd();
+   const runtime = readRuntime(cwd);

-   // Disabled via config
    if (runtime.privacyBlock === false) process.exit(0);

-   // Already approved by user
-   if (approved(prompt)) process.exit(0);
-
    const paths = extractPaths(toolName, toolInput);
    if (!paths.length) process.exit(0);

    for (const filePath of paths) {
      if (isSafe(filePath)) continue;
-     if (isSensitive(filePath)) {
-       console.log(
-         `RESTRICTED ACCESS: File protection active — approval required\n` +
-         `Target: ${filePath}\n\n` +
-         `Retry query with: APPROVED:${filePath}\n\n` +
-         `--- RESTRICTED_FILE_PROMPT_BEGIN ---\n` +
-         JSON.stringify({ type: 'RESTRICTED_PROMPT', file: filePath, tool: toolName }) + '\n' +
-         `--- RESTRICTED_FILE_PROMPT_END ---`
-       );
-       process.exit(2);
+     if (!isSensitive(filePath)) continue;
+
+     if (toolName === 'Bash') {
+       console.error(`WARN: Privacy-sensitive file access via bash allowed for approved follow-up: ${path.basename(filePath)}`);
+       process.exit(0);
      }
+
+     console.error(formatBlockMessage(filePath));
+     process.exit(2);
    }

    process.exit(0);
-
- } catch (e) {
+ } catch (error) {
    try {
-     const fs = require('fs'), p = require('path');
-     const d = p.join(__dirname, '.logs');
-     if (!fs.existsSync(d)) fs.mkdirSync(d, { recursive: true });
-     fs.appendFileSync(p.join(d, 'hook-log.jsonl'),
-       JSON.stringify({ ts: new Date().toISOString(), hook: 'privacy-block', status: 'crash', error: e.message }) + '\n');
+     const fs = require('fs');
+     const path = require('path');
+     const logDir = path.join(__dirname, '.logs');
+     if (!fs.existsSync(logDir)) fs.mkdirSync(logDir, { recursive: true });
+     fs.appendFileSync(
+       path.join(logDir, 'hook-log.jsonl'),
+       JSON.stringify({
+         ts: new Date().toISOString(),
+         hook: 'privacy-block',
+         status: 'crash',
+         error: error.message
+       }) + '\n'
+     );
    } catch (_) {}
-   process.exit(0); // fail-open
+   process.exit(0);
  }
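To see how the 0.7.27 hook treats different tool inputs, here is a trimmed re-statement of its decision loop that can run outside Claude Code. It copies only three of the `RESTRICTED_PATTERNS`, ignores the `paths`/`search_paths` arrays and the `privacyBlock` switch, and returns a string instead of exiting; `decide`, `bashPaths`, and `matchesAny` are hypothetical names.

```javascript
const path = require('path');

// Subset of the hook's patterns, copied verbatim.
const RESTRICTED = [/^\.env(\.|$)/i, /\.pem$/i, /^id_(rsa|ed25519|ecdsa|dsa)$/i];
const EXEMPT = [/\.env\.(example|sample|template|test)$/i];

// Same regex as extractBashPaths: the first token (or quoted string) after cat/less/etc.
function bashPaths(command) {
  const regex = /(?:cat|less|more|head|tail|source|\.)\s+(?:"([^"]+)"|'([^']+)'|([^\s]+))/g;
  const paths = [];
  let match;
  while ((match = regex.exec(command)) !== null) paths.push(match[1] || match[2] || match[3]);
  return paths;
}

// Like isSafe/isSensitive: test both the basename and the full path.
const matchesAny = (rules, file) =>
  rules.some((rule) => rule.test(path.basename(file)) || rule.test(file));

// 'allow' = exit 0, 'warn' = exit 0 plus a stderr warning, 'block' = exit 2.
function decide(toolName, input) {
  const candidates = [input.file_path, input.path]
    .filter((p) => typeof p === 'string' && p.trim())
    .map((p) => p.trim());
  if (toolName === 'Bash' && typeof input.command === 'string') {
    candidates.push(...bashPaths(input.command));
  }
  for (const file of candidates) {
    if (matchesAny(EXEMPT, file) || !matchesAny(RESTRICTED, file)) continue;
    return toolName === 'Bash' ? 'warn' : 'block';
  }
  return 'allow';
}

console.log(decide('Read', { file_path: '/app/.env.local' }));   // block
console.log(decide('Read', { file_path: '/app/.env.example' })); // allow
console.log(decide('Bash', { command: 'cat "/app/.env"' }));     // warn
console.log(bashPaths('head -n 5 .env'));                        // [ '-n' ]
```

The last line shows a limitation inherited from the `extractBashPaths` regex: it captures only the first token after the command word, so a flag such as `-n` hides the real path. In this version that affects only the stderr warning, because Bash access exits 0 whether or not a sensitive path is detected.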
@@ -3,32 +3,30 @@
   * Copyright (c) 2026 Haposoft. MIT License.
   *
   * Multi-event Hook — state.cjs
-  * Implements: https://docs.anthropic.com/en/docs/claude-code/hooks
   *
   * Persists and restores session progress across Claude Code sessions.
   *
   * Events:
   * SessionStart → load previous state and print to context
-  * Stop → extract todos + git changes, save to latest.md
+  * PostToolUse → refresh state after Task/TaskCreate/TaskUpdate/TodoWrite
+  * Stop → persist full session state and archive
   * SubagentStop → append agent completion note to current state
   *
   * Storage: .claude/session-state/latest.md (+ archive/)
-  * Safety: atomic writes, 7-day expiry, max 5 archives
-  *
   * Exit: 0 always (fail-open)
   */

  try {
-   const fs = require('fs');
-   const path = require('path');
-   const os = require('os');
+   const fs = require('fs');
+   const path = require('path');
+   const os = require('os');
    const crypto = require('crypto');
    const { execSync } = require('child_process');
+   const { parseTranscript } = require('./lib/parser.cjs');

-   const EXPIRY_DAYS = 7;
+   const EXPIRY_DAYS = 7;
    const MAX_ARCHIVES = 5;
-
-   // ── Storage ───────────────────────────────────────────────────────────────
+   const TRACKED_POST_TOOL_EVENTS = new Set(['Task', 'TaskCreate', 'TaskUpdate', 'TodoWrite']);

    function stateDir(cwd) {
      try {
@@ -37,56 +35,73 @@ try {
          if (!fs.existsSync(local)) fs.mkdirSync(local, { recursive: true });
          return local;
        }
-       const hash = crypto.createHash('md5').update(cwd).digest('hex').slice(0, 12);
+
+       const hash = crypto.createHash('md5').update(cwd).digest('hex').slice(0, 12);
        const global = path.join(os.homedir(), '.claude', 'session-states', hash);
        if (!fs.existsSync(global)) fs.mkdirSync(global, { recursive: true });
        return global;
-     } catch { return null; }
+     } catch {
+       return null;
+     }
    }

    function loadLatest(cwd) {
      try {
-       const dir = stateDir(cwd);
+       const dir = stateDir(cwd);
        if (!dir) return null;
-       const file = path.join(dir, 'latest.md');
+
+       const file = path.join(dir, 'latest.md');
        if (!fs.existsSync(file)) return null;
-       const text = fs.readFileSync(file, 'utf8');
-       const tsMatch = text.match(/<!-- Generated: (.+?) -->/);
-       if (tsMatch) {
-         const parsed = new Date(tsMatch[1]).getTime();
-         if (isNaN(parsed)) return null;
-         if (Date.now() - parsed > EXPIRY_DAYS * 24 * 60 * 60 * 1000) return null;
+
+       const text = fs.readFileSync(file, 'utf8');
+       const match = text.match(/<!-- Generated: (.+?) -->/);
+       if (match) {
+         const generatedAt = new Date(match[1]).getTime();
+         if (Number.isNaN(generatedAt)) return null;
+         if (Date.now() - generatedAt > EXPIRY_DAYS * 24 * 60 * 60 * 1000) return null;
        }
+
        return text;
-     } catch { return null; }
+     } catch {
+       return null;
+     }
    }

    function writeAtomic(filePath, content) {
-     const tmp = `${filePath}.${process.pid}.${Math.random().toString(36).slice(2)}.tmp`;
-     fs.writeFileSync(tmp, content);
-     fs.renameSync(tmp, filePath);
+     const tempFile = `${filePath}.${process.pid}.${Math.random().toString(36).slice(2)}.tmp`;
+     fs.writeFileSync(tempFile, content);
+     fs.renameSync(tempFile, filePath);
    }

    function archive(dir) {
      try {
-       const src = path.join(dir, 'latest.md');
-       if (!fs.existsSync(src)) return;
-       const aDir = path.join(dir, 'archive');
-       if (!fs.existsSync(aDir)) fs.mkdirSync(aDir);
-       const now = new Date();
-       const pad = n => String(n).padStart(2, '0');
-       const ts = `${now.getFullYear()}${pad(now.getMonth()+1)}${pad(now.getDate())}-${pad(now.getHours())}${pad(now.getMinutes())}`;
-       fs.copyFileSync(src, path.join(aDir, `${ts}.md`));
-       const files = fs.readdirSync(aDir).filter(f => f.endsWith('.md')).sort();
+       const latestFile = path.join(dir, 'latest.md');
+       if (!fs.existsSync(latestFile)) return;
+
+       const archiveDir = path.join(dir, 'archive');
+       if (!fs.existsSync(archiveDir)) fs.mkdirSync(archiveDir);
+
+       const now = new Date();
+       const pad = (value) => String(value).padStart(2, '0');
+       const stamp = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}-${pad(now.getHours())}${pad(now.getMinutes())}`;
+
+       fs.copyFileSync(latestFile, path.join(archiveDir, `${stamp}.md`));
+
+       const files = fs.readdirSync(archiveDir).filter((file) => file.endsWith('.md')).sort();
        while (files.length > MAX_ARCHIVES) {
-         try { fs.unlinkSync(path.join(aDir, files.shift())); } catch { /* ignore */ }
+         const oldest = files.shift();
+         try {
+           fs.unlinkSync(path.join(archiveDir, oldest));
+         } catch {
+           // fail-open
+         }
        }
-     } catch { /* fail-open */ }
+     } catch {
+       // fail-open
+     }
    }

-   // ── Data extraction ───────────────────────────────────────────────────────
-
-   function extractSessionData(stdinData) {
+   async function extractSessionData(stdinData) {
      const data = {
        timestamp: new Date().toISOString(),
        branch: process.env.GIT_BRANCH || '',
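Two of the state hook's rules are easier to check as pure functions: the 7-day expiry applied by `loadLatest` and the archive naming and pruning done by `archive`. The sketch below restates them without touching the filesystem; `isSnapshotFresh`, `archiveName`, and `filesToPrune` are hypothetical names, and the timestamps are made up for illustration.

```javascript
const EXPIRY_DAYS = 7;
const MAX_ARCHIVES = 5;

// loadLatest(): a snapshot whose <!-- Generated: ... --> stamp is unparseable or older
// than EXPIRY_DAYS is ignored; a file with no stamp at all is still used.
function isSnapshotFresh(text, now = Date.now()) {
  const match = text.match(/<!-- Generated: (.+?) -->/);
  if (!match) return true;
  const generatedAt = new Date(match[1]).getTime();
  if (Number.isNaN(generatedAt)) return false;
  return now - generatedAt <= EXPIRY_DAYS * 24 * 60 * 60 * 1000;
}

// archive(): copies are named YYYYMMDD-HHMM.md from local time.
function archiveName(date) {
  const pad = (value) => String(value).padStart(2, '0');
  return `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}-${pad(date.getHours())}${pad(date.getMinutes())}.md`;
}

// archive(): after sorting, the oldest entries beyond MAX_ARCHIVES are deleted.
function filesToPrune(names) {
  const sorted = names.filter((name) => name.endsWith('.md')).sort();
  return sorted.slice(0, Math.max(0, sorted.length - MAX_ARCHIVES));
}

const now = Date.parse('2026-01-10T00:00:00Z');
console.log(isSnapshotFresh('<!-- Generated: 2026-01-05T00:00:00Z -->', now)); // true
console.log(isSnapshotFresh('<!-- Generated: 2025-12-01T00:00:00Z -->', now)); // false
console.log(archiveName(new Date(2026, 0, 9, 7, 5))); // 20260109-0705.md
```

Because the stamp is fixed-width, a plain lexicographic sort is chronological, which is what lets `archive` delete from the front of the sorted list. Two `Stop` events in the same minute map to the same name, so the later copy overwrites the earlier one.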
@@ -96,132 +111,159 @@ try {

      if (stdinData.transcript_path && fs.existsSync(stdinData.transcript_path)) {
        try {
-         const latest = [];
-         const lines = fs.readFileSync(stdinData.transcript_path, 'utf8').split('\n').filter(Boolean);
-         for (const line of lines) {
-           try {
-             const entry = JSON.parse(line);
-             const blocks = entry.message?.content;
-             if (!Array.isArray(blocks)) continue;
-             for (const b of blocks) {
-               if (b.type === 'tool_use' && b.name === 'TodoWrite' && Array.isArray(b.input?.todos)) {
-                 latest.length = 0;
-                 latest.push(...b.input.todos);
-               }
-             }
-           } catch { /* skip bad lines */ }
-         }
-         data.todos = latest;
-       } catch { /* ignore */ }
+         const transcript = await parseTranscript(stdinData.transcript_path);
+         data.todos = transcript.todos;
+       } catch {
+         // fail-open
+       }
      }

      try {
-       const out = execSync('git diff --name-only HEAD', {
-         encoding: 'utf8', timeout: 3000, stdio: ['pipe','pipe','pipe']
+       const diff = execSync('git diff --name-only HEAD', {
+         encoding: 'utf8',
+         timeout: 3000,
+         stdio: ['pipe', 'pipe', 'pipe']
        }).trim();
-       if (out) data.modifiedFiles = out.split('\n').slice(0, 20);
-     } catch { /* ignore */ }
+
+       if (diff) {
+         data.modifiedFiles = diff.split('\n').slice(0, 20);
+       }
+     } catch {
+       // fail-open
+     }

      return data;
    }

-   // ── Markdown builder ──────────────────────────────────────────────────────
-
    function buildStateContent(data) {
-     const done = data.todos.filter(t => t.status === 'completed');
-     const pending = data.todos.filter(t => t.status !== 'completed');
+     const done = data.todos.filter((todo) => todo.status === 'completed' || todo.status === 'done');
+     const pending = data.todos.filter((todo) => !['completed', 'done'].includes(todo.status));
+
      return [
        '# Session State',
        `<!-- Generated: ${data.timestamp} -->`,
        `<!-- Branch: ${data.branch || 'unknown'} -->`,
        '',
        '## What Worked (Verified)',
-       ...(done.length ? done.map(t => `- ${t.content}`) : ['- (No completed tasks recorded)']),
+       ...(done.length ? done.map((todo) => `- ${todo.content}`) : ['- (No completed tasks recorded)']),
        '',
        "## What's Left",
-       ...(pending.length ? pending.map(t => `- [ ] ${t.content}`) : ['- (All tasks completed)']),
+       ...(pending.length ? pending.map((todo) => `- [ ] ${todo.content}`) : ['- (All tasks completed)']),
        '',
        '## Key Files Modified',
-       ...(data.modifiedFiles.length ? data.modifiedFiles.map(f => `- ${f}`) : ['- (No file changes detected)']),
+       ...(data.modifiedFiles.length ? data.modifiedFiles.map((file) => `- ${file}`) : ['- (No file changes detected)']),
        ''
      ].join('\n');
    }

    function buildAgentSection(data) {
-     const type = data.agent_type || 'unknown';
-     const ts = new Date().toISOString().slice(11, 19);
-     return `\n## Agent Result: ${type} (${ts})\n- Completed at ${ts}\n`;
+     const agentType = data.agent_type || 'unknown';
+     const time = new Date().toISOString().slice(11, 19);
+     return `\n## Agent Result: ${agentType} (${time})\n- Completed at ${time}\n`;
    }

-   // ── Main ──────────────────────────────────────────────────────────────────
+   function mergeAgentSections(existing, content) {
+     if (!existing) return content;
+
+     const agentSections = existing.match(/## Agent Result:.+?(?=\n## |$)/gs);
+     if (!agentSections) return content;

-   const stdin = fs.readFileSync(0, 'utf8').trim();
-   if (!stdin) process.exit(0);
+     const marker = '\n## Key Files Modified';
+     if (content.includes(marker)) {
+       return content.replace(marker, `\n${agentSections.join('\n')}${marker}`);
+     }
+
+     return `${content.trimEnd()}\n\n${agentSections.join('\n')}\n`;
+   }

-   const data = JSON.parse(stdin);
-   const event = data.hook_event_name || '';
-   const cwd = data.cwd || process.cwd();
-   const dir = stateDir(cwd);
+   function appendAgentSection(existing, agentSection) {
+     if (!existing) return agentSection.trimStart();

-   // SessionStart: restore previous state
-   if (event === 'SessionStart') {
-     const prev = loadLatest(cwd);
-     if (prev) {
-       console.log('\n=== Prior Execution Context ===');
-       console.log(prev.trim());
-       console.log('=== End of Prior Context ===\n');
+     const marker = '\n## Key Files Modified';
+     if (existing.includes(marker)) {
+       return existing.replace(marker, `\n${agentSection}${marker}`);
      }
-     process.exit(0);
+
+     return `${existing.trimEnd()}\n${agentSection}`;
    }

-   // SubagentStop: append completion note
-   if (event === 'SubagentStop' && dir) {
-     const file = path.join(dir, 'latest.md');
-     const agentSection = buildAgentSection(data);
+   async function persistSnapshot(dir, data, options = {}) {
+     const file = path.join(dir, 'latest.md');
      const existing = fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
-     let updated;
-     if (existing) {
-       updated = existing.replace(/(\n## Key Files Modified)/, `\n${agentSection}$1`);
-       if (updated === existing) updated = existing.trimEnd() + '\n' + agentSection;
-     } else {
-       updated = buildStateContent(extractSessionData(data)) + '\n' + agentSection;
-     }
-     writeAtomic(file, updated);
-     process.exit(0);
+     const content = mergeAgentSections(existing, buildStateContent(data));
+     writeAtomic(file, content);
+     if (options.archive) archive(dir);
    }

-   // Stop: persist full state
-   if (event === 'Stop' && dir) {
-     const file = path.join(dir, 'latest.md');
-     const sessionData = extractSessionData(data);
-     let content = buildStateContent(sessionData);
-
-     // Preserve agent sections from SubagentStop
-     if (fs.existsSync(file)) {
-       const existing = fs.readFileSync(file, 'utf8');
-       const agentMatches = existing.match(/## Agent Result:.+?(?=\n## |$)/gs);
-       if (agentMatches) {
-         content = content.replace(
-           /(\n## Key Files Modified)/,
-           `\n${agentMatches.join('\n')}$1`
-         );
+   async function main() {
+     const stdin = fs.readFileSync(0, 'utf8').trim();
+     if (!stdin) process.exit(0);
+
+     const data = JSON.parse(stdin);
+     const event = data.hook_event_name || '';
+     const cwd = data.cwd || process.cwd();
+     const dir = stateDir(cwd);
+
+     if (event === 'SessionStart') {
+       const previous = loadLatest(cwd);
+       if (previous) {
+         console.log('\n=== Prior Execution Context ===');
+         console.log(previous.trim());
+         console.log('=== End of Prior Context ===\n');
        }
+       process.exit(0);
+     }
+
+     if (!dir) process.exit(0);
+
+     if (event === 'PostToolUse') {
+       const toolName = data.tool_name || '';
+       if (TRACKED_POST_TOOL_EVENTS.has(toolName)) {
+         const sessionData = await extractSessionData(data);
+         await persistSnapshot(dir, sessionData);
+       }
+       process.exit(0);
+     }
+
+     if (event === 'SubagentStop') {
+       const file = path.join(dir, 'latest.md');
+       const agentSection = buildAgentSection(data);
+       const existing = fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
+       const updated = existing
+         ? appendAgentSection(existing, agentSection)
+         : `${buildStateContent(await extractSessionData(data))}\n${agentSection}`;
+
+       writeAtomic(file, updated);
+       process.exit(0);
+     }
+
+     if (event === 'Stop') {
+       const sessionData = await extractSessionData(data);
+       await persistSnapshot(dir, sessionData, { archive: true });
+       process.exit(0);
      }

-     writeAtomic(file, content);
-     archive(dir);
      process.exit(0);
    }

-   process.exit(0);
-
- } catch (e) {
+   main().catch(() => {
+     process.exit(0);
+   });
+ } catch (error) {
    try {
-     const fs = require('fs'), p = require('path');
-     const d = p.join(__dirname, '.logs');
222
- if (!fs.existsSync(d)) fs.mkdirSync(d, { recursive: true });
223
- fs.appendFileSync(p.join(d, 'hook-log.jsonl'),
224
- JSON.stringify({ ts: new Date().toISOString(), hook: 'state', status: 'crash', error: e.message }) + '\n');
254
+ const fs = require('fs');
255
+ const path = require('path');
256
+ const logDir = path.join(__dirname, '.logs');
257
+ if (!fs.existsSync(logDir)) fs.mkdirSync(logDir, { recursive: true });
258
+ fs.appendFileSync(
259
+ path.join(logDir, 'hook-log.jsonl'),
260
+ JSON.stringify({
261
+ ts: new Date().toISOString(),
262
+ hook: 'state',
263
+ status: 'crash',
264
+ error: error.message
265
+ }) + '\n'
266
+ );
225
267
  } catch (_) {}
226
268
  process.exit(0);
227
269
  }
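Two behaviors of the rewritten `state.cjs` can be checked in isolation. The section merge inserts a new `## Agent Result` block just before `## Key Files Modified`, or appends it when that heading is missing. The sketch below assumes the truncated function at the top of this hunk is `appendAgentSection`. It deliberately passes a replacer function to `String.prototype.replace`: with a plain replacement string, any `$&` or `$'` in an agent's output would be expanded as a replacement pattern instead of being copied. The `PostToolUse` branch filters on `TRACKED_POST_TOOL_EVENTS`, whose definition is not in this diff. Here it is assumed to hold the same four tool names as the new settings matcher, and the sketch adds a check that the two lists agree.

```javascript
// Hypothetical standalone copies of two behaviors from the rewritten state hook.

const KEY_FILES_MARKER = '\n## Key Files Modified';

// Assumed definition: the diff uses TRACKED_POST_TOOL_EVENTS but does not show it.
const TRACKED_POST_TOOL_EVENTS = new Set(['Task', 'TaskCreate', 'TaskUpdate', 'TodoWrite']);

// Mirror of the PostToolUse matcher added to settings.json in this release.
const POST_TOOL_MATCHER = 'Task|TaskCreate|TaskUpdate|TodoWrite';

function appendAgentSection(existing, agentSection) {
  // No prior snapshot: the agent section becomes the whole file.
  if (!existing) return agentSection.trimStart();
  // Insert before the first "Key Files Modified" heading. The replacer function
  // keeps "$&", "$'" and similar sequences in agentSection literal.
  if (existing.includes(KEY_FILES_MARKER)) {
    return existing.replace(KEY_FILES_MARKER, () => `\n${agentSection}${KEY_FILES_MARKER}`);
  }
  // No marker: append at the end after trimming trailing whitespace.
  return `${existing.trimEnd()}\n${agentSection}`;
}

// True when a PostToolUse payload should trigger a snapshot write.
function shouldPersistSnapshot(hookInput) {
  return hookInput.hook_event_name === 'PostToolUse'
    && TRACKED_POST_TOOL_EVENTS.has(hookInput.tool_name || '');
}

// Tools the matcher routes to the hook that the script would then ignore.
function untrackedMatcherTools(matcher, tracked) {
  return matcher.split('|').filter((tool) => !tracked.has(tool));
}
```

If `untrackedMatcherTools` ever returns a non-empty list, the hook is being launched for tool calls that it will silently drop.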
@@ -22,6 +22,7 @@ The project maintains these core documents in `./docs`:
22
22
  The `hapo:docs-keeper` agent is responsible for keeping these documents current. Trigger an update whenever:
23
23
 
24
24
  - A development phase transitions (e.g., "In Progress" → "Complete")
25
+ - A verified task completion changes user-facing behavior, architecture, API contracts, operational flow, or project status enough that docs should be refreshed
25
26
  - A significant feature ships or a critical bug is resolved
26
27
  - Security patches are applied or dependencies change
27
28
  - Project scope or timeline shifts
@@ -56,7 +56,7 @@
56
56
  ],
57
57
  "PreToolUse": [
58
58
  {
59
- "matcher": "Read|Write|Edit|MultiEdit|Bash|Glob",
59
+ "matcher": "Read|Write|Edit|MultiEdit|Bash|Glob|Grep",
60
60
  "hooks": [
61
61
  {
62
62
  "type": "command",
@@ -70,6 +70,15 @@
70
70
  }
71
71
  ],
72
72
  "PostToolUse": [
73
+ {
74
+ "matcher": "Task|TaskCreate|TaskUpdate|TodoWrite",
75
+ "hooks": [
76
+ {
77
+ "type": "command",
78
+ "command": "node \"$CLAUDE_PROJECT_DIR/.claude/hooks/state.cjs\""
79
+ }
80
+ ]
81
+ },
73
82
  {
74
83
  "matcher": "Edit|Write|MultiEdit",
75
84
  "hooks": [
@@ -4,9 +4,9 @@ description: "Code execution engine: Reads specs and implements code end-to-end
4
4
  argument-hint: "[feature-name|specs-directory-path]"
5
5
  ---
6
6
 
7
- # Develop — Feature Implementation (Full Build)
7
+ # Develop — Feature Implementation (Task-Orchestrated Build)
8
8
 
9
- Reads the full project specification (`hapo:specs`) and relentlessly implements code from A to Z in a disciplined, single-track workflow. Automatically overcomes obstacles and only escalates to the user when facing persistent critical failures.
9
+ Reads the project specification (`hapo:specs`) and implements code through a disciplined task loop. In specific-task mode it behaves like a surgical executor. In full-spec mode it behaves like a sequential orchestrator, processing one unblocked task at a time and syncing state after every verified task.
10
10
 
11
11
  **Principles:** YAGNI, KISS, DRY | Continuous execution | Smart self-healing
12
12
 
@@ -18,6 +18,26 @@ Reads the full project specification (`hapo:specs`) and relentlessly implements
18
18
  /hapo:develop <feature name> <specific-task-file.md>
19
19
  ```
20
20
 
21
+ ## Execution Modes
22
+
23
+ ### 1. Specific-Task Mode
24
+ Triggered by `/hapo:develop <feature> <task-file>`.
25
+
26
+ - Load exactly one task file.
27
+ - Implement only that task packet.
28
+ - STOP immediately after the task is verified and synchronized.
29
+ - Never auto-chain into the next task.
30
+
31
+ ### 2. Full-Spec Mode
32
+ Triggered by `/hapo:develop <feature>` or `/hapo:develop specs/<feature>`.
33
+
34
+ - Build a queue from `spec.json.task_registry`.
35
+ - Select the next `pending` + unblocked task only.
36
+ - Run the full implementation cycle for that single task.
37
+ - Sync state.
38
+ - Recompute the queue and continue.
39
+ - STOP the overall run on the first blocked task, unresolved gate failure, or missing proof.
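Full-Spec Mode's queue step can be sketched as a pure function over `spec.json.task_registry`, keyed by task path. The diff does not say how "unblocked" is encoded, so this sketch assumes a hypothetical `depends_on` array of registry paths. It also orders candidates by path so that `task-R0-01` precedes `task-R0-02`. `runTask` stands in for the implement, quality-gate and sync cycle of a single task and returns whether that task was verified.

```javascript
// Returns the registry path of the next runnable task, or null when nothing is runnable.
// Assumes a hypothetical `depends_on` array; the real registry schema may differ.
function nextUnblockedTask(taskRegistry) {
  for (const taskPath of Object.keys(taskRegistry).sort()) {
    const task = taskRegistry[taskPath];
    if (task.status !== 'pending') continue;
    const deps = task.depends_on || [];
    const ready = deps.every((dep) => taskRegistry[dep] && taskRegistry[dep].status === 'done');
    if (ready) return taskPath;
  }
  return null;
}

// One task at a time: run, sync, then recompute the queue before picking again.
function runFullSpec(taskRegistry, runTask) {
  const completed = [];
  let taskPath = nextUnblockedTask(taskRegistry);
  while (taskPath) {
    if (!runTask(taskPath)) {
      // Gate failure or missing proof: stop the whole run; status stays as-is for the blocker record.
      return { completed, stoppedAt: taskPath };
    }
    taskRegistry[taskPath].status = 'done'; // stands in for the Step 5 sync
    completed.push(taskPath);
    taskPath = nextUnblockedTask(taskRegistry);
  }
  return { completed, stoppedAt: null };
}
```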
40
+
21
41
  <HARD-GATE>
22
42
  DO NOT write implementation code until an approved spec exists.
23
43
  - If the directory `specs/<feature-name>` DOES NOT EXIST or `spec.json` is not ready, automatically trigger `/hapo:specs <feature-name>` first to create the specification. Do not improvise.
@@ -28,6 +48,11 @@ A task is NOT done because code compiles or a placeholder renders.
28
48
  A task is done only when the task file's Completion Criteria AND Verification & Evidence section are satisfied with real execution proof.
29
49
  </DEFINITION-OF-DONE>
30
50
 
51
+ <CONTRACT-FIDELITY>
52
+ If the spec/task explicitly names a framework, auth system, datastore, transport path, or runtime boundary, that named choice is contractual.
53
+ You MUST NOT silently replace it with a simpler custom substitute ("for MVP", "placeholder", "temporary auth", "in-memory until later") unless the spec itself is updated first.
54
+ </CONTRACT-FIDELITY>
55
+
31
56
  ## Anti-Rationalization Protocol
32
57
 
33
58
  | Thought (Excuse) | Reality (Rule) |
@@ -55,13 +80,15 @@ flowchart TD
55
80
  - Load `task_registry` and verify it matches the requested task file(s). If registry is missing or stale, route to `/hapo:sync audit <feature>` before coding.
56
81
  - **Task Scoping (CRITICAL):**
57
82
  - If the user specifies a particular task file (e.g., `task-R0-02...md`), load **ONLY** that specific file into working memory.
58
- - If no specific task is mentioned, list and load all Markdown files in `specs/<feature-name>/tasks/*.md`.
83
+ - If no specific task is mentioned, DO NOT load all tasks into working memory. Resolve the next single unblocked `pending` task from `task_registry` and load only that task packet.
59
84
  - **Task Packet Extraction (MANDATORY):** Before coding, extract from the active task file(s):
60
85
  - Objective + Constraints
61
86
  - Related Files
62
87
  - Completion Criteria
63
88
  - Verification & Evidence
89
+ - Exact executable verification commands named in the task
64
90
  - Requirement IDs referenced by the task
91
+ - Named technologies, frameworks, protocols, and data stores that the task/spec explicitly requires
65
92
  - Relevant `Canonical Contracts & Invariants` from `design.md`
66
93
  - If the task file is missing actionable completion or verification detail, STOP and route back to spec correction. Do not guess.
67
94
  - Before coding, set the active task(s) to `in_progress` in both markdown and `spec.json.task_registry`, or route through `/hapo:sync` if the runtime expects the sync protocol.
@@ -73,22 +100,34 @@ flowchart TD
73
100
  - Act as `god-developer` OR directly write code, executing tasks specified in the loaded Markdown file(s) sequentially.
74
101
  - **Important:** You may create and modify files directly, but must faithfully follow the design from the Spec.
75
102
  - Progress tracking: Temporarily change `[ ]` to `[/]` in Spec files while coding is in progress. Do NOT mark `[x]` before Step 4 passes.
103
+ - **Task Boundary Protocol (CRITICAL):**
104
+ - Default editable scope is `Related Files` from the task packet.
105
+ - You may additionally touch direct test files plus minimal support files required to make the current task executable (shared types, exports, config glue, generated migration wiring).
106
+ - If you must edit a file outside this scope, explicitly treat it as a `scope escape` and justify why it is required for the current task.
107
+ - If the out-of-scope change would deliver functionality clearly assigned to a later task, STOP instead of implementing it early.
76
108
  - **Hard Stop Protocol:** If you were asked to implement a specific task file, you MUST STOP completely after that task is verified. DO NOT auto-chain or jump to "Next Task" simply because you see it in the spec. Wait for the user's next command.
109
+ - **Full-Spec Loop Protocol:** If you were asked to implement the whole feature, you MUST still work one task at a time. Finish Step 4 and Step 5 for the current task before selecting the next unblocked task from `task_registry`.
77
110
  - **Test Integrity Protocol:** You MUST NOT delete, replace, or reduce the scope of existing test cases to make tests pass. If a test fails, you must fix the **implementation code** or fix the **test setup/mock**, NOT remove the assertion. Reducing test count or weakening assertions (e.g., removing `toHaveBeenCalledWith` and replacing with `toEqual(expect.any(...))`) is a Critical violation.
78
111
  - **Contract Integrity Protocol:** If implementation appears to require changing auth/session, transport, persistence, entrypoint wiring, or generated artifact behavior beyond what `design.md` states, STOP and route back to spec correction instead of inventing a new contract in code.
112
+ - **Named Technology Rule:** If the task/spec explicitly requires a named dependency or runtime choice (for example Better Auth, Hono, Next.js proxy routes, Redis, Drizzle, S3), you MUST implement that choice or stop. Do not swap it for a custom/in-memory/local substitute and still call the task complete.
113
+ - **Cross-Service Reality Rule:** If a task spans multiple processes or runtimes (web ↔ API, worker ↔ DB, extension ↔ backend), you MUST prove the integration uses shared real state or a real contract boundary. Process-local placeholders on both sides do not count as completion.
114
+ - **Placeholder Completion Rule:** You MAY scaffold future files only when the active task truly needs them to compile, but placeholder route handlers, in-memory stores, or fake adapters MUST NOT be used as evidence that the current task's behavior works end-to-end.
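The Task Boundary Protocol can be approximated by comparing the files an implementation touched against the task packet. The inputs are assumptions: `relatedFiles` is the task's `Related Files` list, and `supportFiles` is a hand-maintained list of justified support files. Test files are recognized by a `*.test.*` / `*.spec.*` naming convention for JavaScript and TypeScript, which may not match every repo. Anything left over is reported as a scope escape that needs an explicit justification or a stop.

```javascript
// Assumed JS/TS test naming convention; adjust for the repo at hand.
const TEST_FILE = /\.(test|spec)\.[cm]?[jt]sx?$/;

// Buckets touched files by how they relate to the active task packet.
function classifyTouchedFiles(relatedFiles, supportFiles, touchedFiles) {
  const related = new Set(relatedFiles);
  const support = new Set(supportFiles);
  const result = { inScope: [], tests: [], support: [], scopeEscapes: [] };
  for (const file of touchedFiles) {
    if (related.has(file)) result.inScope.push(file);
    else if (TEST_FILE.test(file)) result.tests.push(file);
    else if (support.has(file)) result.support.push(file);
    else result.scopeEscapes.push(file); // needs explicit justification, or STOP
  }
  return result;
}
```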
79
115
 
80
116
  ### Step 4: Self-Healing (Quality Gate Auto-Fix)
81
117
  The moment you finish coding, DO NOT proceed further. Switch to `references/quality-gate.md` and run the automatic review loop.
82
118
  **Mantra:** All feedback from code-auditor must be addressed thoroughly: Score >= 9.5 & Zero Critical issues.
83
119
 
84
120
  - Passing Step 4 requires ALL of the following:
85
- 1. Automated verification passes (typecheck/test/build as applicable)
121
+ 1. Automated verification passes, including preflight compile/typecheck/build health and every exact command named in the task's `Verification & Evidence` section
86
122
  2. Code review passes
87
123
  3. Task evidence passes (artifacts/runtime surfaces/negative-path checks from the task file are proven)
124
+ - `PRECHECK_FAIL` outranks `NO_TESTS`. If compile/typecheck/build fails, the task is FAIL even when no test suite exists yet.
125
+ - `NO_TESTS` is NOT equivalent to PASS. If the task explicitly requires a test command or automated test proof, `NO_TESTS` is a FAIL or BLOCKED outcome until the requirement is satisfied or the spec is corrected.
88
126
  - If build/test passes but task evidence is missing, the task is still FAIL.
127
+ - If the implementation silently replaced a named contract choice or relies on cross-service process-local stand-ins, the task is still FAIL.
89
128
  - Only escalate to the user after 3 consecutive failed review rounds.
90
129
 
91
- ### Step 5: State Sync + Incremental Docs Sync
130
+ ### Step 5: State Sync + Task-Level Docs Sync
92
131
  - Only after Step 4 passes may you mark task checkboxes completed and sync `spec.json` progress/timestamps/task_registry.
93
132
  - If verification is partial or blocked by environment, keep the task in `pending` or `in_progress` and record the blocker instead of pretending completion.
94
133
  - A completed task MUST leave behind:
@@ -96,10 +135,18 @@ The moment you finish coding, DO NOT proceed further. Switch to `references/qual
96
135
  - `spec.json.task_registry[path].status = "done"`
97
136
  - `completed_at` + `last_updated_at`
98
137
  - synchronized top-level `updated_at`
99
- - After passing the Quality Gate, evaluate if any actual codebase modifications occurred (e.g., check pending files via git status).
100
- - If files were created or modified: Trigger `docs-keeper` automatically to execute `repomix` and update the global `/docs/` and project logs.
138
+ - a human-readable verification receipt inside the task's `Verification & Evidence` section showing which commands ran, their outcomes, and what proof was observed
139
+ - Verification receipts with `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation intentionally simplified a named contract MUST NOT be synchronized as `done`.
140
+ - After syncing the active task, run a **Task Closeout Docs Checkpoint**
141
+ - Task Closeout Docs Checkpoint:
142
+ - Evaluate `Docs impact: none | minor | major` based on real behavior changes from the just-completed task
143
+ - If `none`: record that explicitly in the completion report and stop
144
+ - If `minor` or `major`: trigger `docs-keeper` to surgically update affected existing docs under `./docs`
145
+ - Default to **lightweight docs sync**: update only the docs touched by this task and its verified behavior; do NOT run `repomix` unless `docs-keeper` truly cannot verify the required architecture/context from the code, spec, and current docs
101
146
  - **CWD Protocol (CRITICAL):** When spawning `docs-keeper`, you MUST ensure the agent's Current Working Directory (CWD context) is explicitly set to the **Workspace Root**, NOT the inner package directory you were just coding in. Otherwise, `docs-keeper` will search for the root `docs/` folder in the wrong place and crash.
102
- - Do NOT skip this step! The user explicitly requires documentation to be synced immediately after every `/hapo:develop` action, overriding the default Phase 3-only rule.
147
+ - Task-level docs sync happens after every verified completed task, but actual edits still depend on `Docs impact`.
148
+ - In **Specific-Task Mode**, STOP after sync and report the result.
149
+ - In **Full-Spec Mode**, only after sync may you re-read `task_registry`, pick the next unblocked pending task, and repeat from Step 1 for that task.
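Step 5's registry bookkeeping can be sketched as an in-memory update of the parsed `spec.json`. Only the touched keys change, which follows the spirit of the sync skill's precision-edit rule. The receipt argument is a simplification: a non-empty string stands in for a real verification receipt. The `task-Rn-mm` id extraction assumes file names like `tasks/task-R0-02-extension-shell.md`, as shown elsewhere in this release.

```javascript
// Marks one task done in a parsed spec.json object and returns the docs-checkpoint note.
function markTaskDone(spec, taskPath, receipt, now = new Date()) {
  const entry = spec.task_registry && spec.task_registry[taskPath];
  if (!entry) throw new Error(`Unknown task: ${taskPath}`);
  // Simplification: any non-empty receipt counts here; real receipts need command proof.
  if (!receipt || !receipt.trim()) {
    throw new Error(`Refusing done for ${taskPath}: no verification receipt`);
  }

  const timestamp = now.toISOString();
  entry.status = 'done';
  entry.completed_at = timestamp;
  entry.last_updated_at = timestamp;
  spec.updated_at = timestamp; // keep the top-level timestamp in sync

  // Assumes file names like tasks/task-R0-02-extension-shell.md.
  const match = taskPath.match(/task-(R\d+-\d+)/);
  const taskId = match ? match[1] : taskPath;
  return `Docs checkpoint due: task ${taskId} just completed`;
}
```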
103
150
 
104
151
  ---
105
152
  ## Attached References
@@ -7,6 +7,16 @@ Green tests are NOT enough. The gate requires three proofs:
7
7
  2. Code/spec review
8
8
  3. Task evidence (completion criteria + runtime/artifact proof from the task file)
9
9
 
10
+ ## Automation Semantics
11
+
12
+ - If the task names exact commands in `Verification & Evidence`, those exact commands are mandatory and must run before any fallback repo defaults.
13
+ - Preflight compile/typecheck/build health is mandatory. If compile/typecheck/build fails before tests are meaningful, the gate result is `PRECHECK_FAIL`, not `NO_TESTS`.
14
+ - `NO_TESTS` is never an automatic PASS.
15
+ - `NO_TESTS` is acceptable only when the task does **not** require a dedicated test suite command and every other required automated command/evidence item passes.
16
+ - If the task explicitly requires tests and the repo has no such test command or suite, the task is FAIL or BLOCKED, not done.
17
+ - Named frameworks, auth systems, transports, datastores, and runtime boundaries in the task/spec are contractual. Silent substitutions are review failures, not acceptable implementation trade-offs.
18
+ - Multi-process or multi-runtime flows must prove shared real state or a real boundary contract. Matching in-memory placeholders on both sides do not count as working integration.
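The precedence rules in this section can be read as an ordered classifier. The input shape is hypothetical. `precheckPassed` summarizes compile/typecheck/build health, and `commands` lists the task's exact verification commands with whether each ran and its exit code. `testsFound` and `testsRequired` describe the test-suite situation. The classifier only returns `NO_TESTS` when the task never required tests, so `PASS` and that `NO_TESTS` case are the only passable outcomes.

```javascript
// Ordered checks: earlier verdicts outrank later ones (PRECHECK_FAIL beats NO_TESTS).
function classifyAutomatedResult({ precheckPassed, commands, testsFound, testsRequired }) {
  if (!precheckPassed) return { verdict: 'PRECHECK_FAIL', passable: false };
  // A named command that ran and failed is a hard failure.
  if (commands.some((c) => c.ran && c.exitCode !== 0)) return { verdict: 'FAIL', passable: false };
  // A named command that never ran leaves the task unproven.
  if (commands.some((c) => !c.ran)) return { verdict: 'UNVERIFIED', passable: false };
  if (!testsFound) {
    return testsRequired
      ? { verdict: 'FAIL', passable: false } // required tests are missing: not done
      : { verdict: 'NO_TESTS', passable: true }; // acceptable only because everything else passed
  }
  return { verdict: 'PASS', passable: true };
}
```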
19
+
10
20
  ## Parallel Quality Cycle
11
21
 
12
22
  Maximum retry counter: **3 attempts**. Exceeding 3 triggers a collapse warning.
@@ -17,6 +27,7 @@ Variable: retry_count = 0
17
27
  Before START_LOOP:
18
28
  - Read the active task file(s)
19
29
  - Extract Related Files, Completion Criteria, Verification & Evidence
30
+ - Extract the exact executable verification commands in declaration order
20
31
  - Extract relevant design contracts/invariants for the touched area
21
32
  - If any of these are missing or too vague to verify, FAIL immediately and route back to spec correction
22
33
 
@@ -25,11 +36,11 @@ START_LOOP:
25
36
  PARALLEL GATE: Spawn BOTH agents simultaneously
26
37
  ---------------------------------------------------------------
27
38
  → Task(subagent_type="test-runner",
28
- prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute: pre-flight typecheck/lint, relevant tests, build commands, and every Verification & Evidence item that is executable. Inspect named artifacts/runtime outputs. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED.",
39
+ prompt="Run task-aware verification for the recently implemented code. Read the active task file(s) and execute the exact verification commands named there first, in order. Preflight compile/typecheck/build failures must be reported as PRECHECK_FAIL and take precedence over NO_TESTS. After that, run any additional repo-level typecheck/test/build checks needed for confidence. Inspect named artifacts/runtime outputs. For multi-service tasks, verify the flow does not rely on process-local stand-ins masquerading as shared state. Return PASS only if automated checks and task evidence both pass. Mark anything unexecuted as UNVERIFIED. Treat NO_TESTS as non-passing unless the task did not require a dedicated test suite.",
29
40
  description="Test [feature]")
30
41
 
31
42
  → Task(subagent_type="code-auditor",
32
- prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, or contract drift are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
43
+ prompt="Review all recently written code against the active task file(s), referenced requirements, and design contracts. Missing deliverables, placeholder-only wiring, missing runtime entrypoints, overscope edits outside the task packet, silent replacement of named technologies/contracts, or fake cross-service proof via process-local state are Critical even if build/tests pass. Check security, logic, architecture, YAGNI/KISS/DRY. Return score (X/10), critical count, warning list, and evidence gaps.",
33
44
  description="Review [feature]")
34
45
 
35
46
  Wait for BOTH to return results.
@@ -38,7 +49,7 @@ START_LOOP:
38
49
  COMBINE RESULTS
39
50
  ---------------------------------------------------------------
40
51
 
41
- CASE 1 — Test FAIL OR Evidence FAIL / UNVERIFIED:
52
+ CASE 1 — PRECHECK_FAIL OR Automated FAIL OR required command missing OR Evidence FAIL / UNVERIFIED OR NO_TESTS when tests were required:
42
53
  - Increment retry_count++
43
54
  - If retry_count >= 3:
44
55
  → COLLAPSE! AskUserQuestion: "Quality gate cannot prove this task is complete! User intervention required!"
@@ -56,7 +67,7 @@ START_LOOP:
56
67
 
57
68
  CASE 3 — Test PASS + Evidence PASS + Review PASS (Score >= 9.5 AND Critical = 0):
58
69
  → PASS! Auto-approved.
59
- → PROCEED to completion report.
70
+ → PROCEED to completion report with a verification receipt summarizing exact commands executed, artifact/runtime proof, and review result.
60
71
 
61
72
  REVIEW_ONLY:
62
73
  ---------------------------------------------------------------
@@ -77,6 +88,9 @@ REVIEW_ONLY:
77
88
  - **Architecture:** Breaking MVC boundaries, cross-module coupling, convention violations.
78
89
  - **Principles:** YAGNI violations, KISS violations, DRY violations (excessive code duplication).
79
90
  - **Evidence / Done-Criteria Drift:** Missing required artifacts, placeholder-only wiring, missing entrypoints, unproven completion criteria, or runtime contract mismatches.
91
+ - **Overscope Delivery Drift:** Implementing later-task deliverables or editing out-of-scope files without direct justification for the active task.
92
+ - **Contract Substitution Drift:** Replacing a named framework/auth/transport/datastore/runtime boundary with a custom simplification without a spec amendment.
93
+ - **Cross-Service Reality Failure:** Claiming end-to-end behavior across web/api/worker/extension boundaries while state only exists in local process memory or placeholder adapters.
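This diff spells out two of the combine-results cases in the quality-gate loop. CASE 1 (any precheck, automated, or evidence failure) increments the retry counter and collapses at 3. CASE 3 passes on a review score of at least 9.5 with zero critical findings. The sketch covers only those two branches. CASE 2 is elided from the diff, so anything else is returned as unclassified rather than guessed. `automatedPassable` is assumed to come from a classifier of the automated results, and `evidence` is assumed to be a `PASS`, `FAIL`, or `UNVERIFIED` string.

```javascript
const MAX_ROUNDS = 3; // retry_count >= 3 triggers the collapse warning

function combineGate(state, { automatedPassable, evidence, reviewScore, criticalCount }) {
  // CASE 1: automated failure, precheck failure, or missing/failed evidence.
  if (!automatedPassable || evidence !== 'PASS') {
    const retryCount = state.retryCount + 1;
    return { retryCount, action: retryCount >= MAX_ROUNDS ? 'COLLAPSE' : 'RETRY' };
  }
  // CASE 3: everything proven and review clears the bar.
  if (reviewScore >= 9.5 && criticalCount === 0) {
    return { retryCount: state.retryCount, action: 'PASS' };
  }
  // CASE 2 is not shown in this diff, so it is left unclassified here.
  return { retryCount: state.retryCount, action: 'UNCLASSIFIED' };
}
```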
80
94
 
81
95
  ## Terminal Log Format
82
96
 
@@ -84,5 +98,6 @@ Must log the Quality Gate result to the terminal for user visibility:
84
98
 
85
99
  - **Quick Pass:** `✓ Step 4 Quality Gate: Test PASS + Evidence PASS + Review 9.5/10 - Auto-Approved`
86
100
  - **Hard-Won Pass:** `✓ Step 4 Quality Gate: Failed 2 rounds → Test PASS + Evidence PASS + Review 9.6/10`
101
+ - **Preflight Fail:** `[x] Step 4 Quality Gate: PRECHECK_FAIL → compile/typecheck/build failed before tests mattered`
87
102
  - **Fix Needed:** `[~] Step 4 Quality Gate: Tests/evidence failed → returned to god-developer`
88
103
  - **Awaiting Rescue:** `[!] Step 4 Quality Gate: Failed 3 rounds! Awaiting user intervention...`
@@ -138,7 +138,7 @@ If "Review each one": For each finding, ask: "Apply" | "Reject" | "Modify sugges
138
138
 
139
139
  | # | Finding | Severity | Disposition | Applied To |
140
140
  |---|---------|----------|-------------|------------|
141
- | 1 | {title} | Critical | Accept | Task 02 |
141
+ | 1 | {title} | Critical | Accept | task-R0-02-... |
142
142
  ```
143
143
 
144
144
  ---
@@ -16,16 +16,16 @@ Convert task files (persistent storage) into Claude Tasks (session-scoped only),
16
16
  ┌──────────────────┐ Hydrate ┌───────────────────┐
17
17
  │ Task Files │ ─────────► │ Claude Tasks │
18
18
  │ (persistent) │ │ (session-scoped) │
19
- │ [ ] Task 01 │ │ ◼ pending │
20
- │ [ ] Task 02 │ │ ◼ pending │
19
+ │ [ ] task-R0-01 │ │ ◼ pending │
20
+ │ [ ] task-R0-02 │ │ ◼ pending │
21
21
  └──────────────────┘ └───────────────────┘
22
22
  │ Work
23
23
 
24
24
  ┌──────────────────┐ Sync-back ┌───────────────────┐
25
25
  │ Task Files │ ◄───────── │ Task Updates │
26
26
  │ (updated) │ │ (completed) │
27
- │ [x] Task 01 │ │ ✓ completed │
28
- │ [ ] Task 02 │ │ ◼ in_progress │
27
+ │ [x] task-R0-01 │ │ ✓ completed │
28
+ │ [ ] task-R0-02 │ │ ◼ in_progress │
29
29
  └──────────────────┘ └───────────────────┘
30
30
  ```
31
31
 
@@ -35,7 +35,9 @@ Scans the `spec.json` against all physical `task-R*.md` files to detect mismatch
35
35
  1. **Precision Edits:** Never overwrite the entire `spec.json` string blindly. Update only the required keys, while keeping JSON valid.
36
36
  2. **Machine + Human Sync:** Every task status update MUST modify both `spec.json.task_registry[...]` and the matching markdown task file header/status section.
37
37
  3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Implementation Steps` and relevant `Completion Criteria` / `Verification & Evidence` checkboxes that have actual proof.
38
- 4. **Task Completion Hook:** When `hapo:sync` marks the final pending task as `done`, it should automatically prompt the user if they'd like to advance the phase.
38
+ 4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
39
+ 5. **Task Docs Hook:** Every time `hapo:sync` marks a task as `done`, it must flag that a task-level docs checkpoint is now due for that verified task.
40
+ 6. **Phase Prompt Rule:** When `hapo:sync` marks the final pending task in the whole feature as `done`, it should automatically prompt the user if they'd like to advance the phase, but only after the docs checkpoint for that last completed task has been considered.
39
41
 
40
42
  ## References
41
43
  Read `references/sync-protocols.md` for exact Search/Replace regex patterns and JSON schema expectations before acting on the files.
@@ -15,6 +15,10 @@ When requested to update a phase or change task configuration, `spec.json` must
15
15
  - full relative path like `tasks/task-R0-02-extension-shell.md`
16
16
  * **Status Update:** If a task changes to `blocked`, the matching `task_registry[path].status` must become `"blocked"`, `task_registry[path].blocker` must record the reason, and `spec.json.status` / `spec.json.blocker` must reflect the top-level block if work is globally blocked.
17
17
  * **Timestamp Rule:** Update `task_registry[path].started_at`, `completed_at`, and `last_updated_at` consistently with the new state. Also refresh `spec.json.updated_at`.
18
+ * **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
19
+ * **Receipt Integrity Rule:** A valid verification receipt must include the exact commands run, their outcomes, and artifact/runtime proof. Receipts containing `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit "placeholder / simplified for MVP / production later" contract deviations are not eligible for `done`.
20
+ * **Contract Fidelity Rule:** If the task file notes or evidence show that a named framework/auth/runtime choice from the spec was silently replaced, sync MUST refuse `done` until the spec is amended or the implementation is corrected.
21
+ * **Task Docs Rule:** After a task is moved to `done`, emit a short alert that a task-level docs checkpoint is due for this verified task.
18
22
 
19
23
  ## 2. Updating `tasks/task-**.md`
20
24
 
@@ -23,10 +27,13 @@ The structure of `tasks/task.md` relies heavily on exact keyword markers. Follow
23
27
  ### A. Completing a Task
24
28
  When `/hapo:sync <feature> <task-id> done`:
25
29
  1. Find: `**Status:** pending` (or `in_progress` / `blocked`).
26
- 2. Replace with: `**Status:** done`.
27
- 3. Locate block: `## Implementation Steps`.
28
- 4. Convert `- [ ]` into `- [x]` strictly within that section.
29
- 5. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
30
+ 2. Inspect `## Verification & Evidence` first. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
31
+ 3. Refuse completion if the receipt contains any non-passing marker such as `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation substituted a named contract with a placeholder/custom simplification.
32
+ 4. Replace with: `**Status:** done`.
33
+ 5. Locate block: `## Implementation Steps`.
34
+ 6. Convert `- [ ]` into `- [x]` strictly within that section.
35
+ 7. Update relevant checkboxes in `## Completion Criteria` and `## Verification & Evidence` only when the caller provides or the file already contains real proof.
36
+ 8. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
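The refusal checks in steps 2 and 3 amount to scanning one markdown section. The sketch assumes a hypothetical receipt convention in which each proof line looks like ``- `npm test` → PASS``. The protocol itself only requires commands, outcomes, and proof to be present, so the pattern would need to follow whatever receipt format the task files actually use. Marker matching is keyword-based and deliberately conservative: a stray `FAIL` or `placeholder` in the section blocks `done` rather than letting a fake completion through.

```javascript
// Non-passing markers and contract-substitution phrases named in the sync rules.
const NON_PASSING = /\b(PRECHECK_FAIL|FAIL|UNVERIFIED)\b/;
const SUBSTITUTION = /placeholder|simplified for MVP|production later/i;
// Hypothetical receipt line format: - `command` → PASS
const PROOF_LINE = /`[^`]+`\s*(?:→|->)\s*PASS\b/;

// Returns the body of a "## Heading" section, up to the next level-2 heading.
function extractSection(markdown, heading) {
  const lines = markdown.split('\n');
  const start = lines.findIndex((line) => line.trim() === heading);
  if (start === -1) return '';
  const body = [];
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith('## ')) break;
    body.push(line);
  }
  return body.join('\n').trim();
}

// Decides whether a task file's receipt allows moving the task to done.
function doneEligibility(taskMarkdown) {
  const receipt = extractSection(taskMarkdown, '## Verification & Evidence');
  if (!receipt) return { eligible: false, reason: 'no verification receipt' };
  if (NON_PASSING.test(receipt)) return { eligible: false, reason: 'non-passing marker' };
  if (SUBSTITUTION.test(receipt)) return { eligible: false, reason: 'contract substitution noted' };
  if (!PROOF_LINE.test(receipt)) return { eligible: false, reason: 'no command proof lines' };
  return { eligible: true, reason: 'ok' };
}
```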
30
37
 
31
38
  ### B. Blocking a Task
32
39
  When `/hapo:sync <feature> <task-id> blocked "API error"`:
@@ -52,4 +59,7 @@ When `/hapo:sync audit <feature>` is activated:
52
59
  - Missing disk file referenced in registry → remove or flag it
53
60
  - Markdown says `done` but registry not done → registry wins only if evidence already exists; otherwise downgrade markdown or flag conflict
54
61
  - Registry says `done` but markdown still pending → update markdown only if evidence exists
62
+ - Either side says `done` but `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
63
+ - Either side says `done` but the receipt contains `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit contract-substitution notes → downgrade to `in_progress` or flag conflict
55
64
  5. **Correction Alert:** Output a brief markdown alert detailing mismatches fixed and any unresolved conflicts requiring manual review.
65
+ 6. **Task Docs Alert:** If audit reveals tasks newly marked `done`, include whether task-level docs sync appears still due or already accounted for in the current run summary.
@@ -1,14 +1,14 @@
1
1
  ---
2
2
  name: hapo:test
3
3
  description: "Run and verify project tests across all scopes: unit, integration, e2e, and UI. Blast-radius scoping for speed, chrome-devtools for UI verification, structured verdicts for downstream automation."
4
- argument-hint: "[scope|--full|--ui <url>|--ui-auth <url>]"
4
+ argument-hint: "[scope|--full|--ui <url>|--ui-auth <url>|--ui-flow <url>]"
5
5
  version: 2.0.0
6
6
  ---
7
7
 
8
8
  # Test — Verify Implementation Quality
9
9
 
10
10
  Run the project's test suite, analyze results, and return a structured verdict.
11
- Designed to work **after `hapo:code`** and to run **in parallel with `hapo:code-review`** during the `hapo:develop` Quality Gate.
11
+ Designed to work **after `hapo:develop`**. Standalone `/hapo:test` uses the same `test-runner` contract that `hapo:develop` relies on during its Quality Gate, and may run **in parallel with `hapo:code-review`**.
12
12
 
13
13
  **Principles:** Fail-fast | Blast-radius scoping | Zero hidden failures | No mocking to pass
14
14
 
@@ -38,7 +38,7 @@ When `--full` is NOT specified, narrow the test scope to only what changed:
38
38
  ### Pre-flight Checks (always run first)
39
39
 
40
40
  Catch compile errors before spending time on tests.
41
- **HARD RULE:** If a pre-flight tool (like `eslint`, `flake8`, `tsc`) is missing, you MUST install it (e.g., `npm install -D eslint`, `pip install flake8`). Do NOT skip the pre-flight check.
41
+ **HARD RULE:** Do NOT auto-install missing tooling during verification. If a required pre-flight tool (like `eslint`, `flake8`, `tsc`) is missing, stop and report it as an environment gap or missing project setup.
42
42
 
43
43
  ```bash
44
44
  # JavaScript / TypeScript
@@ -59,7 +59,7 @@ cargo check
59
59
  flutter analyze
60
60
  ```
61
61
 
62
- If pre-flight fails → report `Compile Error`, do NOT proceed to test execution.
62
+ If pre-flight fails or a required tool is missing → report `Compile Error` / `Environment Gap`, do NOT proceed to test execution.
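Under the no-install rule, a missing tool should surface as a reported gap rather than trigger `npm install` or `pip install`. A minimal Node sketch using `child_process.spawnSync` separates the three outcomes. If the executable cannot be found (`ENOENT`), it reports `Environment Gap`. If the tool runs and exits non-zero, it reports `Compile Error`. Otherwise the check passes. The verdict strings mirror the wording above; the function name `runPreflight` is hypothetical. For JavaScript tooling, calling the project-local binary (for example `node_modules/.bin/tsc`) avoids launchers such as `npx`, which may offer to fetch a missing package.

```javascript
const { spawnSync } = require('child_process');

// Runs one pre-flight command without installing anything.
function runPreflight(command, args = []) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  if (result.error && result.error.code === 'ENOENT') {
    return { verdict: 'Environment Gap', detail: `${command} is not installed or not on PATH` };
  }
  if (result.error) return { verdict: 'Environment Gap', detail: result.error.message };
  // A non-zero exit (or termination by signal, where status is null) counts as a failed check.
  if (result.status !== 0) {
    return { verdict: 'Compile Error', detail: (result.stderr || result.stdout || '').trim() };
  }
  return { verdict: 'OK', detail: '' };
}
```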
63
63
 
64
64
  ### Test Execution by Language
65
65
 
@@ -361,4 +361,3 @@ Flag as `Security Warning` if:
361
361
  - Mixed content (HTTP resources on HTTPS page) detected via network audit
362
362
  - `autocomplete="off"` missing on password fields
363
363
 
364
-