npm - @haposoft/cafekit - Versions diffs - 0.8.8 → 0.8.9 - Mend

@haposoft/cafekit 0.8.8 → 0.8.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/package.json +1 -1
package/src/claude/CLAUDE.md +2 -2
package/src/claude/agents/code-auditor.md +1 -1
package/src/claude/agents/spec-maker.md +24 -23
package/src/claude/agents/test-runner.md +14 -3
package/src/claude/hooks/spec-state.cjs +2 -1
package/src/claude/migration-manifest.json +2 -1
package/src/claude/rules/workflow.md +5 -4
package/src/claude/scripts/validate-spec-output.cjs +271 -0
package/src/claude/skills/code-review/references/spec-compliance-review.md +1 -1
package/src/claude/skills/develop/SKILL.md +4 -4
package/src/claude/skills/develop/references/quality-gate.md +3 -2
package/src/claude/skills/specs/SKILL.md +28 -28
package/src/claude/skills/specs/references/review.md +1 -1
package/src/claude/skills/specs/rules/tasks-generation.md +29 -9
package/src/claude/skills/specs/templates/task.md +38 -33
package/src/claude/skills/sync/SKILL.md +2 -2
package/src/claude/skills/sync/references/sync-protocols.md +5 -5
package/src/claude/skills/test/SKILL.md +10 -1

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@haposoft/cafekit",
-  "version": "0.8.8",
+  "version": "0.8.9",
   "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
   "author": "Haposoft <nghialt@haposoft.com>",
   "license": "MIT",

package/src/claude/CLAUDE.md CHANGED

@@ -37,7 +37,7 @@ These rules reduce common agent coding failures: hidden assumptions, overbuilt s
 ### 4. Goal-Driven Execution
 - Convert requests into verifiable success criteria.
-- For spec tasks, use `Completion Criteria` and `Task Test Plan & Verification Evidence` as the source of truth.
+- For spec tasks, use `Completion Criteria` and `Evidence` as the source of truth. Existing task files may use `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence`.
 - For bugs, reproduce with a failing test or concrete evidence when feasible before fixing.
 - Loop until verification passes or a real blocker is recorded.
@@ -65,7 +65,7 @@ Use this loop for non-trivial work:
 A task is done only when all apply:
 - implementation satisfies `Completion Criteria`
-- `Task Test Plan & Verification Evidence` is satisfied with concrete proof
+- `Evidence` is satisfied with concrete proof
 - preflight/build/test outcomes are passing or an explicit blocker is recorded
 - code review has no critical issues
 - a verification receipt exists before task state is synced to `done`

package/src/claude/agents/code-auditor.md CHANGED

@@ -23,7 +23,7 @@ Extract and verify:
 1. Declared deliverables (files, routes, entrypoints, UI surfaces, schemas, migrations)
 2. Declared task scope (`Related Files` and direct support files that are clearly justified)
 3. Completion Criteria
-4. Task Test Plan & Verification Evidence expectations (or legacy Verification & Evidence)
+4. Task Evidence expectations (or Task Test Plan & Verification Evidence / legacy Verification & Evidence)
 5. Canonical Contracts & Invariants from the design
 6. Named technologies and runtime choices that the task/spec explicitly requires
 7. Runtime entrypoints/callers and reachability obligations from task evidence or the task-aware scout report

package/src/claude/agents/spec-maker.md CHANGED

@@ -31,6 +31,7 @@ specs/<feature>/
 - `spec.json` is generated from `.claude/skills/specs/templates/spec-state.json`; never write `init.json` or `spec-state.json` into the spec directory.
 - Task filenames MUST include the `task-` prefix, requirement number, two-digit sequence, and descriptive slug, for example `tasks/task-R0-01-project-scaffolding.md`.
 - Do NOT write `hydration.md`; task hydration is session/task-state synchronization only.
+- Before setting `ready_for_implementation = true`, run `node .claude/scripts/validate-spec-output.cjs specs/<feature>` and fix every failure.
 ## Mental Models (How You Think)
@@ -125,26 +126,24 @@ Before writing `design.md`, select a discovery mode and record the reason:
 - Reject tasks outside `scope_lock.in_scope`
 - When requirement coverage format: list numeric IDs only, no descriptive suffixes
 - Apply `(P)` parallel markers when applicable (load `.claude/skills/specs/rules/tasks-parallel-analysis.md`)
-- Every task MUST include `Task Test Plan & Verification Evidence` with exact commands, artifacts/runtime surfaces, runtime reachability proof, and negative-path checks.
+- Every task MUST use the compact implementation-ready shape: `Context`, `Steps`, `Requirements`, `Related Files`, `Completion Criteria`, `Evidence`, `Risk Assessment`.
+- `Evidence` MUST include exact commands, artifacts/runtime surfaces, runtime reachability proof, and negative-path checks. Existing specs may use `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence`.
 - Completion criteria MUST be objective enough that a downstream quality gate can prove them without guesswork.
 - UI/app/runtime workflows MUST include a final integration/reachability task or final integration section that names the real entrypoint and proves all scoped user-facing surfaces are wired.
 - Do not allow orphan task outputs: components, services, hooks, routes, commands, workers, providers, reducers, data loaders, and generated artifacts must be reachable now or assigned to a named later integration task.
-- Validation decisions that affect implementation MUST be written into implementation-facing sections (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`) rather than only `Risk Assessment`.
-### Sub-Task Detail Requirements (MANDATORY)
-Each task file MUST contain granular sub-tasks with the following structure:
-1. **Major steps** (`- [ ] 1. ...`) group related work by cohesion
-2. **Sub-tasks** (`- [ ] 1.1 ...`) describe specific actionable items (1-3 hours each)
-3. **Detail bullets** under each sub-task describe:
-   - Business logic and behavior to implement
-   - Edge cases and constraints
-   - Validation rules
-4. **Requirement mapping** (`_Requirements: X.X_`) at the end of EVERY sub-task — no exceptions
-5. **Test coverage section** as the last major step in every task, with unit + integration sub-tasks
-6. **Completion criteria** must be observable and testable — not subjective
-7. **Scope/reachability criteria** must prove the task implements scoped behavior without out-of-scope additions and without unreachable runtime-facing outputs
-**FORBIDDEN**: Task files with only 3-5 top-level checkboxes and no sub-task breakdown. This level of detail is INSUFFICIENT for implementation.
+- Validation decisions that affect implementation MUST be written into implementation-facing sections (`Context`, `Steps`, `Requirements`, `Completion Criteria`, `Evidence`) rather than only `Risk Assessment`.
+### Task Detail Requirements (MANDATORY)
+Each task file MUST be compact but implementation-ready:
+1. `Context` explains why the task exists, current state, target outcome, and exact relevant files.
+2. `Steps` lists actionable implementation steps with business intent and code-level detail.
+3. `Requirements` lists the requirement IDs covered by this task.
+4. `Related Files` names exact paths and action type when known.
+5. `Completion Criteria` is observable and testable.
+6. `Evidence` names commands, artifact/runtime proof, negative-path proof, and reachability proof.
+7. `Risk Assessment` states real risks or `None identified`.
+**FORBIDDEN**: Vague task files with no exact files, no requirement mapping, or no evidence. Compact is good; vague is invalid.
 ## Research Phase
@@ -175,12 +174,14 @@ Before marking the spec ready:
 4. Fail if any path in `task_files` does not exist
 5. Fail if any on-disk task file is missing from `task_registry` or any registry path does not exist
 6. Fail if any task file path does not match `tasks/task-R{N}-{SEQ}-<slug>.md` with two-digit `SEQ` (for example `tasks/task-R0-01-project-scaffolding.md`)
-7. Infer `design_context.validation_recommended = true` for auth, privacy, delete-data, migration, schema-change, browser-extension-permission, external-provider, or 5+ task file specs
-8. If the spec scope switched away from Claude/Anthropic, fail if `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider strings like `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` may mention old providers only as historical comparison.
-9. For delete/privacy specs, fail if requirements/design/tasks mix multiple deletion policies (for example `email_hash` in one place and `deleted-<uuid>` in another) without one canonical design decision.
-10. If `validation_recommended = true` and validation has not completed (or the user did not explicitly accept risk), keep `ready_for_implementation = false`
-11. Reject task files that use legacy non-numeric mappings like `NFR-1`
-12. If validation decisions were accepted, fail unless they are reflected in implementation-facing sections of affected artifacts and `spec.json.updated_at` / review timestamps reflect the reviewed state
+7. Fail if all task files are `R0` when the spec has more than two tasks
+8. Run `node .claude/scripts/validate-spec-output.cjs specs/<feature>` and treat non-zero exit as blocking
+9. Infer `design_context.validation_recommended = true` for auth, privacy, delete-data, migration, schema-change, browser-extension-permission, external-provider, or 5+ task file specs
+10. If the spec scope switched away from Claude/Anthropic, fail if `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider strings like `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` may mention old providers only as historical comparison.
+11. For delete/privacy specs, fail if requirements/design/tasks mix multiple deletion policies (for example `email_hash` in one place and `deleted-<uuid>` in another) without one canonical design decision.
+12. If `validation_recommended = true` and validation has not completed (or the user did not explicitly accept risk), keep `ready_for_implementation = false`
+13. Reject task files that use legacy non-numeric mappings like `NFR-1`
+14. If validation decisions were accepted, fail unless they are reflected in implementation-facing sections of affected artifacts and `spec.json.updated_at` / review timestamps reflect the reviewed state
 ## Execution Workflow Summary

package/src/claude/agents/test-runner.md CHANGED

@@ -11,15 +11,26 @@ You are a battle-hardened QA engineer who has been burned by production incident
 ## Task-Aware Inputs
-If the prompt includes task file paths, Completion Criteria, Task Test Plan & Verification Evidence, or legacy Verification & Evidence instructions, treat them as authoritative.
+If the prompt includes task file paths, Completion Criteria, Evidence, Task Test Plan & Verification Evidence, or legacy Verification & Evidence instructions, treat them as authoritative.
 Diff-aware test selection does NOT replace task-specific verification.
 If the task/spec names a specific framework, auth system, transport, or shared-state boundary, keep that contract visible while evaluating evidence.
-If the prompt includes a feature name or `specs/<feature>`, load `spec.json`, `requirements.md`, `design.md`, and the active/recent task files. Treat `scope_lock`, Completion Criteria, and Task Test Plan evidence as the test contract.
+If the prompt includes a feature name or `specs/<feature>`, load `spec.json`, `requirements.md`, `design.md`, and the active/recent task files. Treat `scope_lock`, Completion Criteria, and Evidence as the test contract.
+## Test Type Expectations
+Select tests by the task's touched surface:
+- Pure logic/data/parser/sort/filter/validator/regression work requires unit tests with negative-path coverage.
+- Stateful UI, context/store, API/service, persistence, or provider wiring requires component or integration proof.
+- Complete user workflows require E2E/UI-flow proof once the vertical slice exists.
+- Layout/theme/responsive work requires runtime visual checks, viewport checks, or screenshot proof when practical.
+- Interactive UI requires accessibility checks for focus, labels, roles, keyboard behavior, and ARIA when relevant.
+- Scaffold/config/release plumbing can pass with smoke proof when deeper behavior is not in scope.
+- Performance/security checks are required only when requirements, risk, or changed boundaries make them relevant.
 ## Command Resolution Order
 When the task file names exact commands, use this order:
-1. Run every exact executable command from `Task Test Plan & Verification Evidence` (or legacy `Verification & Evidence`) in declaration order.
+1. Run every exact executable command from `Evidence` (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`) in declaration order.
 2. Run repo-default typecheck/test/build commands only to fill gaps not already covered above.
 3. Apply diff-aware test selection only after task-mandated commands are satisfied.
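The command resolution order above can be sketched as a small ordering function. This is a minimal illustration, not code from the package: `resolveCommandOrder` and its parameter names are hypothetical, and "filling gaps" is simplified here to exact-string de-duplication.

```javascript
// Hypothetical helper illustrating the resolution order; not part of CafeKit.
// Assumption: a repo default "fills a gap" only if the same command string
// was not already named by the task's Evidence section.
function resolveCommandOrder({ evidenceCommands, repoDefaults, diffAware }) {
  const ordered = [];
  const seen = new Set();
  const push = (cmd) => {
    if (!seen.has(cmd)) {
      seen.add(cmd);
      ordered.push(cmd);
    }
  };
  evidenceCommands.forEach(push); // 1. exact task commands, declaration order
  repoDefaults.forEach(push);     // 2. repo defaults, gaps only
  diffAware.forEach(push);        // 3. diff-aware selection comes last
  return ordered;
}

const order = resolveCommandOrder({
  evidenceCommands: ['npm run test:auth', 'npm run build'],
  repoDefaults: ['npm run build', 'npm test'],
  diffAware: ['npm test -- changed'],
});
console.log(order);
// [ 'npm run test:auth', 'npm run build', 'npm test', 'npm test -- changed' ]
```

The sample command strings are placeholders; a real task file would supply its own.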

package/src/claude/hooks/spec-state.cjs CHANGED

@@ -94,7 +94,8 @@ try {
   lines.push(`> Bạn PHẢI sử dụng công cụ Edit để cập nhật trạng thái vật lý sau khi đã có bằng chứng verify thật (build/test/runtime/artifact), không phải chỉ vì code đã viết xong.`);
   lines.push(`> 1. Sửa file \`spec.json\` (status, phase/current_phase, timestamps, \`task_files\`, \`task_registry\`, validation state nếu có thay đổi).`);
   lines.push(`> 2. Chỉ khi verify xong mới sửa file \`tasks/task-*.md\` (status + tick '[x]' các sub-task và completion criteria liên quan).`);
-  lines.push(`> 3. NẾU VỪA HOÀN THÀNH 1 TASK CÓ SỬA SOURCE CODE, BẮT BUỘC cập nhật ngay tài liệu trong \`docs/\` (\`system-architecture.md\` hoặc Changelog) cho đồng bộ.`);
+  lines.push(`> 3. Trước khi set \`ready_for_implementation = true\`, PHẢI chạy \`node .claude/scripts/validate-spec-output.cjs specs/${featureName}\` và sửa mọi lỗi.`);
+  lines.push(`> 4. NẾU VỪA HOÀN THÀNH 1 TASK CÓ SỬA SOURCE CODE, BẮT BUỘC cập nhật ngay tài liệu trong \`docs/\` (\`system-architecture.md\` hoặc Changelog) cho đồng bộ.`);
   lines.push(`> CẤM VI PHẠM LUẬT TOLLGATE NÀY NHẰM ĐẢM BẢO TÍNH ĐỒNG BỘ CỦA HỆ THỐNG.`);
   lines.push('');

package/src/claude/migration-manifest.json CHANGED

@@ -56,7 +56,8 @@
   "scripts": {
     "required": [
       "validate-docs.cjs",
-      "browser-tool.cjs"
+      "browser-tool.cjs",
+      "validate-spec-output.cjs"
     ]
   },
   "agentReferences": {

package/src/claude/rules/workflow.md CHANGED

@@ -15,11 +15,12 @@ Use the CafeKit loop: **Understand -> Plan -> Execute -> Verify -> Sync**.
 - For non-trivial features, use `/hapo:specs` to create or validate the spec.
 - For approved specs, work one task file at a time.
 - Extract from the active task:
-  - `Objective`
-  - `Constraints`
+  - `Context`
+  - `Steps`
+  - `Requirements`
   - `Related Files`
   - `Completion Criteria`
-  - `Task Test Plan & Verification Evidence`
+  - `Evidence`
 - If these are missing or too vague to verify, route back to spec correction.
 ## 3. Execute
@@ -31,7 +32,7 @@ Use the CafeKit loop: **Understand -> Plan -> Execute -> Verify -> Sync**.
 ## 4. Verify
-- Run exact commands from `Task Test Plan & Verification Evidence` first.
+- Run exact commands from `Evidence` first.
 - Then run repo-level lint/test/build as needed for confidence.
 - Use only fresh verification from the current run when claiming completion.
 - `PRECHECK_FAIL` outranks `NO_TESTS`.
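Several files in this diff treat `Evidence`, `Task Test Plan & Verification Evidence`, and legacy `Verification & Evidence` as the same contract. A minimal sketch of resolving which heading a task file actually uses, assuming level-2 Markdown headings as in the validator's `hasHeading`; the function `findEvidenceHeading` is hypothetical and not part of the package.

```javascript
// Accepted heading names, newest first, as listed in the diff.
const EVIDENCE_HEADINGS = [
  'Evidence',
  'Task Test Plan & Verification Evidence',
  'Verification & Evidence',
];

// Same matching approach as hasHeading in validate-spec-output.cjs:
// an exact "## <heading>" line, with regex metacharacters escaped.
function hasHeading(content, heading) {
  const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^##\\s+${escaped}\\s*$`, 'm').test(content);
}

// Hypothetical helper: returns the first accepted heading present, or null.
function findEvidenceHeading(content) {
  return EVIDENCE_HEADINGS.find((h) => hasHeading(content, h)) ?? null;
}

const legacyTask = '# Task\n\n## Objective\nDo X\n\n## Verification & Evidence\n- npm test\n';
console.log(findEvidenceHeading(legacyTask)); // Verification & Evidence
console.log(findEvidenceHeading('## Context\nonly context\n')); // null
```

Because the match is anchored to the whole line, `## Verification & Evidence` does not count as a plain `## Evidence` heading, so the function reports which contract name is in use.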

package/src/claude/scripts/validate-spec-output.cjs ADDED

@@ -0,0 +1,271 @@
+#!/usr/bin/env node
+/**
+ * CafeKit spec artifact validator.
+ *
+ * This is intentionally deterministic. Prompt rules can drift; this script is
+ * the hard backstop before a spec is marked ready for implementation.
+ */
+const fs = require('fs');
+const path = require('path');
+const TASK_PATH_RE = /^tasks\/task-R\d+-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$/;
+const REQUIRED_REGISTRY_KEYS = [
+  'id',
+  'title',
+  'status',
+  'dependencies',
+  'blocker',
+  'started_at',
+  'completed_at',
+  'last_updated_at',
+];
+function usage() {
+  console.error('Usage: node .claude/scripts/validate-spec-output.cjs specs/<feature>');
+}
+function resolveSpecDir(input) {
+  if (!input) return null;
+  const cwd = process.cwd();
+  const direct = path.resolve(cwd, input);
+  if (fs.existsSync(direct)) return direct;
+  const viaSpecs = path.resolve(cwd, 'specs', input);
+  if (fs.existsSync(viaSpecs)) return viaSpecs;
+  return direct;
+}
+function readJson(filePath, errors) {
+  try {
+    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
+  } catch (error) {
+    errors.push(`${filePath}: invalid JSON (${error.message})`);
+    return null;
+  }
+}
+function listTaskFiles(specDir) {
+  const tasksDir = path.join(specDir, 'tasks');
+  if (!fs.existsSync(tasksDir)) return [];
+  return fs
+    .readdirSync(tasksDir, { withFileTypes: true })
+    .filter((entry) => entry.isFile() && entry.name.endsWith('.md'))
+    .map((entry) => `tasks/${entry.name}`)
+    .sort();
+}
+function hasHeading(content, heading) {
+  const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+  return new RegExp(`^##\\s+${escaped}\\s*$`, 'm').test(content);
+}
+function extractRequirementIds(requirementsText) {
+  const ids = new Set();
+  const headingRe = /^#{2,4}\s+(?:(?:Requirement)\s+)?((?:REQ-\d+)|(?:R\d+))\b/gim;
+  let match;
+  while ((match = headingRe.exec(requirementsText)) !== null) {
+    ids.add(match[1].toUpperCase());
+  }
+  const numericRequirementRe = /^#{2,4}\s+(?:Requirement\s+)?(\d+)(?=[:.\s-])/gim;
+  while ((match = numericRequirementRe.exec(requirementsText)) !== null) {
+    ids.add(`R${match[1]}`);
+  }
+  const bracketRe = /\[((?:REQ-\d+)|(?:R\d+))\]/gi;
+  while ((match = bracketRe.exec(requirementsText)) !== null) {
+    ids.add(match[1].toUpperCase());
+  }
+  return [...ids].filter((id) => id !== 'R0').sort();
+}
+function validateTaskSections(taskPath, content, errors) {
+  const hasContext =
+    hasHeading(content, 'Context') ||
+    hasHeading(content, 'Objective') ||
+    hasHeading(content, 'Goal');
+  const hasSteps =
+    hasHeading(content, 'Steps') || hasHeading(content, 'Implementation Steps');
+  const hasRequirements =
+    hasHeading(content, 'Requirements') || /^\*\*Requirement:\*\*/m.test(content);
+  const hasEvidence =
+    hasHeading(content, 'Evidence') ||
+    hasHeading(content, 'Task Test Plan & Verification Evidence') ||
+    hasHeading(content, 'Verification & Evidence');
+  if (!hasContext) errors.push(`${taskPath}: missing Context/Objective/Goal`);
+  if (!hasSteps) errors.push(`${taskPath}: missing Steps/Implementation Steps`);
+  if (!hasRequirements) errors.push(`${taskPath}: missing Requirements mapping`);
+  if (!hasEvidence) errors.push(`${taskPath}: missing Evidence or task test plan`);
+}
+function validateSpec(specDir) {
+  const errors = [];
+  const warnings = [];
+  const specJsonPath = path.join(specDir, 'spec.json');
+  if (!fs.existsSync(specDir)) {
+    errors.push(`${specDir}: spec directory does not exist`);
+    return { errors, warnings };
+  }
+  for (const forbidden of ['init.json', 'spec-state.json', 'hydration.md']) {
+    if (fs.existsSync(path.join(specDir, forbidden))) {
+      errors.push(`${forbidden}: forbidden generated artifact`);
+    }
+  }
+  if (!fs.existsSync(specJsonPath)) {
+    errors.push('spec.json: missing');
+    return { errors, warnings };
+  }
+  const spec = readJson(specJsonPath, errors);
+  if (!spec) return { errors, warnings };
+  if (!spec.scope_lock || typeof spec.scope_lock !== 'object' || Array.isArray(spec.scope_lock)) {
+    errors.push('spec.json.scope_lock: must be an object, not a boolean or array');
+  }
+  const taskFiles = listTaskFiles(specDir);
+  const taskFileSet = new Set(taskFiles);
+  if (!Array.isArray(spec.task_files)) {
+    errors.push('spec.json.task_files: missing array');
+    if (Array.isArray(spec.tasks)) {
+      errors.push('spec.json.tasks: legacy field detected; use task_files');
+    }
+  } else {
+    const declared = [...spec.task_files].sort();
+    if (JSON.stringify(declared) !== JSON.stringify(taskFiles)) {
+      errors.push('spec.json.task_files: must exactly match files under tasks/');
+      warnings.push(`expected task_files=${JSON.stringify(taskFiles)}`);
+    }
+  }
+  if (!spec.task_registry || typeof spec.task_registry !== 'object' || Array.isArray(spec.task_registry)) {
+    errors.push('spec.json.task_registry: missing object keyed by task file path');
+  } else {
+    const registryKeys = Object.keys(spec.task_registry).sort();
+    if (JSON.stringify(registryKeys) !== JSON.stringify(taskFiles)) {
+      errors.push('spec.json.task_registry: keys must exactly match task file paths');
+    }
+    for (const [registryPath, entry] of Object.entries(spec.task_registry)) {
+      if (!taskFileSet.has(registryPath)) {
+        errors.push(`spec.json.task_registry.${registryPath}: no matching task file`);
+      }
+      for (const key of REQUIRED_REGISTRY_KEYS) {
+        if (!(key in (entry || {}))) {
+          errors.push(`spec.json.task_registry.${registryPath}: missing ${key}`);
+        }
+      }
+      if (entry && !Array.isArray(entry.dependencies)) {
+        errors.push(`spec.json.task_registry.${registryPath}.dependencies: must be an array`);
+      }
+      for (const dep of entry?.dependencies || []) {
+        if (!taskFileSet.has(dep)) {
+          errors.push(`spec.json.task_registry.${registryPath}.dependencies: unknown dependency ${dep}`);
+        }
+      }
+    }
+  }
+  for (const taskFile of taskFiles) {
+    if (!TASK_PATH_RE.test(taskFile)) {
+      errors.push(`${taskFile}: must match tasks/task-R{N}-{SEQ}-<slug>.md with two-digit SEQ`);
+    }
+  }
+  if (taskFiles.length > 2 && taskFiles.every((taskFile) => /^tasks\/task-R0-/.test(taskFile))) {
+    errors.push('tasks/: feature work cannot be entirely R0; reserve R0 for shared foundation tasks');
+  }
+  const requirementsPath = path.join(specDir, 'requirements.md');
+  const designPath = path.join(specDir, 'design.md');
+  const researchPath = path.join(specDir, 'research.md');
+  if (!fs.existsSync(requirementsPath)) errors.push('requirements.md: missing');
+  if (!fs.existsSync(designPath)) errors.push('design.md: missing');
+  if (taskFiles.length > 0) {
+    if (!fs.existsSync(researchPath)) {
+      errors.push('research.md: missing Evidence Summary for non-trivial spec');
+    } else {
+      const research = fs.readFileSync(researchPath, 'utf8');
+      if (!/^##\s+Evidence Summary\s*$/m.test(research)) {
+        errors.push('research.md: missing ## Evidence Summary');
+      }
+    }
+  }
+  let requirementIds = [];
+  if (fs.existsSync(requirementsPath)) {
+    requirementIds = extractRequirementIds(fs.readFileSync(requirementsPath, 'utf8'));
+  }
+  const coveredRequirementIds = new Set();
+  for (const taskFile of taskFiles) {
+    const fullPath = path.join(specDir, taskFile);
+    const content = fs.readFileSync(fullPath, 'utf8');
+    validateTaskSections(taskFile, content, errors);
+    const idRe = /\b((?:REQ-\d+)|(?:R\d+))\b/gi;
+    let match;
+    while ((match = idRe.exec(content)) !== null) {
+      const id = match[1].toUpperCase();
+      if (id !== 'R0') coveredRequirementIds.add(id);
+    }
+    const numericMappingRe = /_Requirements:\s*([^_\n]+)_/gi;
+    while ((match = numericMappingRe.exec(content)) !== null) {
+      for (const token of match[1].split(',')) {
+        const number = token.trim().match(/^(\d+)(?:\.\d+)?$/);
+        if (number) coveredRequirementIds.add(`R${number[1]}`);
+      }
+    }
+  }
+  for (const requirementId of requirementIds) {
+    if (!coveredRequirementIds.has(requirementId)) {
+      errors.push(`requirements.md:${requirementId}: not covered by any task`);
+    }
+  }
+  if (spec.ready_for_implementation === true && errors.length > 0) {
+    errors.push('spec.json.ready_for_implementation: cannot be true while validator errors exist');
+  }
+  return { errors, warnings };
+}
+function main() {
+  const specDir = resolveSpecDir(process.argv[2]);
+  if (!specDir) {
+    usage();
+    process.exit(2);
+  }
+  const { errors, warnings } = validateSpec(specDir);
+  for (const warning of warnings) {
+    console.warn(`[WARN] ${warning}`);
+  }
+  if (errors.length > 0) {
+    console.error(`FAIL ${path.relative(process.cwd(), specDir) || specDir}`);
+    for (const error of errors) {
+      console.error(`- ${error}`);
+    }
+    process.exit(1);
+  }
+  console.log(`PASS ${path.relative(process.cwd(), specDir) || specDir}`);
+}
+main();
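The filename rule enforced by the script above can be exercised in isolation. A minimal sketch that reuses the `TASK_PATH_RE` pattern from the diff; the helper `isValidTaskPath` is hypothetical, and the rejected sample paths are modeled on the shorthand names the spec skill forbids.

```javascript
// Pattern copied from validate-spec-output.cjs in this diff.
const TASK_PATH_RE = /^tasks\/task-R\d+-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$/;

// Hypothetical helper, not part of the package.
function isValidTaskPath(taskPath) {
  return TASK_PATH_RE.test(taskPath);
}

// Canonical form: task- prefix, requirement number, two-digit sequence, slug.
console.log(isValidTaskPath('tasks/task-R0-01-project-scaffolding.md')); // true
// Shorthand forms are rejected: one-digit sequence, no slug, or no task- prefix.
console.log(isValidTaskPath('tasks/task-R0-1.md')); // false
console.log(isValidTaskPath('tasks/R0-1-project-scaffolding.md')); // false
// Slugs are lowercase alphanumerics separated by single hyphens.
console.log(isValidTaskPath('tasks/task-R1-02-Login-Form.md')); // false
```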

package/src/claude/skills/code-review/references/spec-compliance-review.md CHANGED

@@ -24,7 +24,7 @@ Do not attempt a standard text-based review if the project includes Visual Specs
 3. If NO (Markdown Spec only): Read the spec directly and extract:
    - requirement bullets
    - task `Completion Criteria`
-   - task `Task Test Plan & Verification Evidence` (or legacy `Verification & Evidence`)
+   - task `Evidence` (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`)
    - canonical contracts/invariants from `design.md`
    Then verify the changed files against those concrete obligations.

package/src/claude/skills/develop/SKILL.md CHANGED

@@ -45,7 +45,7 @@ DO NOT write implementation code until an approved spec exists.
 <DEFINITION-OF-DONE>
 A task is NOT done because code compiles or a placeholder renders.
-A task is done only when the task file's Completion Criteria AND Task Test Plan & Verification Evidence section are satisfied with real execution proof. Existing specs may use legacy `Verification & Evidence`; treat that as the same contract.
+A task is done only when the task file's Completion Criteria AND Evidence section are satisfied with real execution proof. Existing specs may use `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence`; treat those as the same contract.
 </DEFINITION-OF-DONE>
 <CONTRACT-FIDELITY>
@@ -92,7 +92,7 @@ flowchart TD
   - Objective + Constraints
   - Related Files
   - Completion Criteria
-  - Task Test Plan & Verification Evidence (or legacy Verification & Evidence)
+  - Evidence (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`)
   - Exact executable verification commands named in the task
   - Requirement IDs referenced by the task
   - Named technologies, frameworks, protocols, and data stores that the task/spec explicitly requires
@@ -142,7 +142,7 @@ The moment you finish coding, DO NOT proceed further. Switch to `references/qual
 **Mantra:** Scope/spec compliance first, code quality second. All feedback from code-auditor must be addressed thoroughly: Score >= 9.5 & Zero Critical issues.
 - Passing Step 4 requires ALL of the following:
-  1. Automated verification passes, including preflight compile/typecheck/build health and every exact command named in the task's `Task Test Plan & Verification Evidence` section (or legacy `Verification & Evidence`)
+  1. Automated verification passes, including preflight compile/typecheck/build health and every exact command named in the task's `Evidence` section (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`)
   2. Spec compliance review passes: every scoped requirement and active task criterion is implemented, with no extras and no omissions
   3. Code quality review passes
   4. Task evidence passes (artifacts/runtime surfaces/reachability/negative-path checks from the task file are proven)
@@ -161,7 +161,7 @@ The moment you finish coding, DO NOT proceed further. Switch to `references/qual
   - `spec.json.task_registry[path].status = "done"`
   - `completed_at` + `last_updated_at`
   - synchronized top-level `updated_at`
-  - a human-readable verification receipt inside the task's `Task Test Plan & Verification Evidence` section showing which commands ran, their outcomes, and what proof was observed
+  - a human-readable verification receipt inside the task's `Evidence` section showing which commands ran, their outcomes, and what proof was observed
 - Verification receipts with `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation intentionally simplified a named contract MUST NOT be synchronized as `done`.
 - After syncing the active task, run a **Task Closeout Docs Checkpoint**
 - Task Closeout Docs Checkpoint:

package/src/claude/skills/develop/references/quality-gate.md CHANGED

@@ -10,11 +10,12 @@ Green tests are NOT enough. The gate requires four proofs:
 ## Automation Semantics
-- If the task names exact commands in `Task Test Plan & Verification Evidence` (or legacy `Verification & Evidence`), those exact commands are mandatory and must run before any fallback repo defaults.
+- If the task names exact commands in `Evidence` (or `Task Test Plan & Verification Evidence` / legacy `Verification & Evidence`), those exact commands are mandatory and must run before any fallback repo defaults.
 - Preflight compile/typecheck/build health is mandatory. If compile/typecheck/build fails before tests are meaningful, the gate result is `PRECHECK_FAIL`, not `NO_TESTS`.
 - `NO_TESTS` is never an automatic PASS.
 - `NO_TESTS` is acceptable only when the task does **not** require a dedicated test suite command and every other required automated command/evidence item passes.
 - If the task explicitly requires tests and the repo has no such test command or suite, the task is FAIL or BLOCKED, not done.
+- If the task kind implies a concrete test type, the gate must enforce it: unit tests for logic/regression, component or integration tests for stateful UI or cross-module wiring, E2E/UI-flow checks for complete user workflows, visual/responsive checks for layout/theme work, accessibility checks for interactive UI, and smoke checks for scaffold/config. Performance/security checks are mandatory only when specified by requirement/risk/boundary.
 - Named frameworks, auth systems, transports, datastores, and runtime boundaries in the task/spec are contractual. Silent substitutions are review failures, not acceptable implementation trade-offs.
 - Multi-process or multi-runtime flows must prove shared real state or a real boundary contract. Matching in-memory placeholders on both sides do not count as working integration.
 - Scope fidelity is mandatory: missing scoped behavior, extra unapproved behavior, or task output that exists only as orphaned/unreachable code is a review failure even when build/tests pass.
@@ -29,7 +30,7 @@ Variable: retry_count = 0
 Before START_LOOP:
   - Read the active task file(s)
-  - Extract Related Files, Completion Criteria, Task Test Plan & Verification Evidence (or legacy Verification & Evidence)
+  - Extract Related Files, Completion Criteria, Evidence (or Task Test Plan & Verification Evidence / legacy Verification & Evidence)
   - Extract the exact executable verification commands in declaration order
   - Extract relevant design contracts/invariants for the touched area
   - Extract scope_lock, requirement IDs, runtime entrypoints/callers, and reachability proof obligations

package/src/claude/skills/specs/SKILL.md CHANGED

@@ -80,6 +80,9 @@ Forbidden generated artifacts:
 - Do NOT create shorthand task files such as `tasks/task-R0-1.md`, `tasks/task-R1-1.md`, or `tasks/R0-1-<slug>.md`.
 - The template file name is never the output file name. `templates/spec-state.json` is only the schema source for generated `spec.json`.
 - Task hydration is session/task-state synchronization only; it MUST NOT be written as a markdown artifact.
+- Before marking a spec ready, run the deterministic validator:
+  - `node .claude/scripts/validate-spec-output.cjs specs/<feature>`
+  - Any validator failure blocks `ready_for_implementation = true`.
 ### Writing Style
 - Concise, prefer bullet lists
@@ -266,7 +269,8 @@ Load: `references/scope-inquiry.md`
 - Load `rules/tasks-parallel-analysis.md` for parallel markers (default: enabled)
 - Each task file follows template `templates/task.md`
 - `Related Files` and test plans must inherit paths, contracts, and test targets from the codebase scout. If exact files/tests cannot be named for an enhancement, run targeted inspect before generating tasks.
-- Each task file MUST include `Completion Criteria` and `Task Test Plan & Verification Evidence` sections detailed enough that a downstream quality gate can prove the task is truly done.
+- Each task file MUST include `Completion Criteria` and `Evidence` sections detailed enough that a downstream quality gate can prove the task is truly done. Existing specs may use `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence`.
+- Each task's `Evidence` MUST choose the right proof type for the touched surface: unit for pure logic, component/integration for UI or state wiring, E2E/UI flow for complete user workflows, visual/responsive checks for style/layout work, accessibility checks for interactive UI, smoke checks for scaffold/config, regression checks for bug fixes, and performance/security checks only when the requirement or risk calls for them.
 - Every task MUST preserve the approved `scope_lock`: implement all scoped acceptance criteria for its requirement, avoid out-of-scope features, and record any intentional deferral as a named later task rather than implicit omission.
 - For UI/app/runtime features, generate a final integration/reachability task or final section that names the real runtime entrypoint and proves prior task outputs are imported, mounted, registered, invoked, or otherwise reachable.
 - Build `spec.json.task_registry` alongside `task_files`. For each task file, register at minimum:
@@ -280,11 +284,11 @@ Load: `references/scope-inquiry.md`
   - `last_updated_at`
 - Update `spec.json` phase + task metadata
-#### Requirement-Driven Task Grouping (MANDATORY)
-Tasks MUST be organized **by requirement**, NOT by technical concern. Each requirement from `requirements.md` gets its own cluster of task files.
+#### Requirement-Covered Task Grouping (MANDATORY)
+Tasks MUST be organized by implementation flow while preserving explicit requirement coverage. Foundation work uses `R0`; feature work uses `R1+`.
 **Naming convention:** `tasks/task-R{N}-{SEQ}-<slug>.md`
-- `R{N}` = requirement number (e.g., R1, R2, R3...)
+- `R{N}` = foundation or implementation cluster (R0 foundation, R1+ feature work)
 - `{SEQ}` = sequential number within that requirement (01, 02, 03...)
 - `<slug>` = descriptive kebab-case name
@@ -303,11 +307,11 @@ tasks/
 ```
 **Splitting rules:**
-- Each requirement → 1 or more task files (split by sub-scope within the requirement)
-- A task file MUST serve exactly 1 primary requirement (cross-cutting references allowed as secondary)
-- If a requirement has only 1 natural task, create 1 file (no forced splitting)
-- If a requirement has many acceptance criteria spanning different concerns → split into multiple task files
-- After generating all tasks: verify **every requirement ID** appears as primary in at least one task file — gaps = failure
+- Split by real implementation dependency chain first: model/schema -> service -> API -> UI -> integration.
+- A task file MAY cover multiple requirement IDs when one code change naturally satisfies them.
+- A requirement MAY be covered by multiple task files when it spans layers.
+- Do not create all tasks under `R0`; `R0` is only shared foundation/setup.
+- After generating all tasks: verify **every requirement ID** appears in at least one task file's `## Requirements` section — gaps = failure.
 - **Legacy Protection:** If the `research.md` identified existing codebase files or tests that will be broken (Blast Radius), you MUST generate explicitly tasked files (e.g., `task-R5-01-update-legacy-tests.md`) to fix those breakages. Do not leave broken tests out of scope.
 **Dependency ordering:** Tasks within the same requirement are ordered by natural implementation flow. Cross-requirement dependencies use `Dependencies:` field referencing other task file names.
@@ -316,24 +320,17 @@ tasks/
 Each task file MUST be **self-contained and implementation-ready** — detailed enough for a junior developer or AI coding agent to execute without guessing.
 **Structure per task file:**
-1. **Objective** — 1-2 sentence objective (WHAT, not HOW)
-2. **Implementation Steps** — Hierarchical breakdown:
-   - Major steps (`- [ ] 1. ...`) group by cohesion
-   - Sub-tasks (`- [ ] 1.1 ...`) are specific actionable items (1-3 hours each)
-   - Detail bullets under each sub-task describe:
-     - Business logic and behavior to implement
-     - Edge cases and constraints
-     - Validation rules
-   - `_Requirements: X.X_` at the END of every sub-task — **no exceptions**
-3. **Test coverage** — Last major step in every task must cover unit + integration tests
-4. **Related Files** — Table with exact paths, action type, and descriptions
-5. **Completion Criteria** — Observable, testable criteria (checkbox format)
-6. **Risk Assessment** — Table with risk, severity, mitigation
-7. **Runtime reachability** — For any created component, service, route, command, worker, provider, or data loader, state where it is reached from or which named later task wires it
+1. **Context** — why this task exists, current state, target outcome, relevant exact files.
+2. **Steps** — concise implementation checklist with business intent and code-level detail.
+3. **Requirements** — list requirement IDs and acceptance criteria covered by this task.
+4. **Related Files** — table with exact paths, action type, and descriptions when paths are known; otherwise run scout first.
+5. **Completion Criteria** — observable, testable criteria.
+6. **Evidence** — automated command(s), artifact/runtime proof, negative-path proof, and runtime reachability proof.
+7. **Risk Assessment** — table with risk, severity, mitigation.
 **Parallel markers:** Append `(P)` to tasks that can run concurrently (no data dependency, no shared files, no prerequisite approval from another task). Tasks serving DIFFERENT requirements are often parallelizable.
-**FORBIDDEN:** Task files with only 3-5 top-level checkboxes and no sub-task breakdown. This level of detail is INSUFFICIENT for implementation.
+**FORBIDDEN:** Task files with only vague checkboxes and no exact files, requirements, or evidence. Compact is good; vague is invalid.
 ### Step 8: Task Hydration
 Load: `references/task-hydration.md`
@@ -357,6 +354,7 @@ Load: `references/review.md` + `rules/design-review.md`
 ### Step 9.5: Finalization Audit (MANDATORY)
 - Re-scan the `tasks/` directory and rebuild `spec.json.task_files` from the real filesystem (sorted, relative paths)
 - Rebuild `spec.json.task_registry` from the real filesystem if it is missing, stale, or missing keys. Preserve task status fields when the path still matches.
+- Run `node .claude/scripts/validate-spec-output.cjs specs/<feature>` and treat any non-zero exit as a blocking failure.
 - FAIL if any task file exists on disk but is missing from `task_files`
 - FAIL if any path in `task_files` does not exist on disk
 - FAIL if any task file exists on disk but is missing from `task_registry`
@@ -364,10 +362,10 @@ Load: `references/review.md` + `rules/design-review.md`
 - FAIL if any task file path does not match `tasks/task-R{N}-{SEQ}-<slug>.md` with two-digit `SEQ` (for example `tasks/task-R0-01-project-scaffolding.md`)
 - FAIL if a newly generated non-trivial spec lacks a `research.md` Evidence Summary with codebase scout result, external research result or skip rationale, selected decision, rejected alternatives, and downstream task/test implications.
 - FAIL if any requirement or NFR mapping uses non-numeric labels (`NFR-1`, `SEC-1`, etc.)
-- FAIL if a task lacks `Completion Criteria` or `Task Test Plan & Verification Evidence` (legacy `Verification & Evidence` is accepted only for pre-existing task files)
+- FAIL if a task lacks `Completion Criteria` or `Evidence` (existing `Task Test Plan & Verification Evidence` or legacy `Verification & Evidence` is accepted)
 - FAIL if a task creates runtime-facing artifacts but neither proves reachability from an entrypoint/caller nor names a later integration task responsible for wiring them.
 - FAIL if a UI/app/runtime spec has multiple user-facing task outputs but no final integration/reachability task or final integration section.
-- FAIL if accepted validation decisions exist in reports but are not reflected in the implementation-facing sections of affected artifacts (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`, canonical contracts, or requirements text).
+- FAIL if accepted validation decisions exist in reports but are not reflected in the implementation-facing sections of affected artifacts (`Context`, `Steps`, `Requirements`, `Completion Criteria`, `Evidence`, canonical contracts, or requirements text).
 - FAIL if the spec scope/provider was switched away from Anthropic/Claude but `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider-specific strings such as `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` is the only allowed place for historical cost comparisons.
 - FAIL if privacy/delete-data work lacks a single canonical deletion policy. The design MUST explicitly choose either:
   1. hard-delete with no re-registration lock, or
@@ -450,7 +448,7 @@ specs/
     ├── requirements.md        # Technical requirements (EARS format)
     ├── research.md            # Research notes
     ├── design.md              # Architectural design
-    ├── tasks/                 # Grouped by requirement (R1, R2, R3...)
+    ├── tasks/                 # Foundation + implementation clusters (R0, R1, R2...)
     │   ├── task-R0-01-foundation.md
     │   ├── task-R1-01-<slug>.md
     │   ├── task-R1-02-<slug>.md
@@ -505,13 +503,14 @@ Before finalizing any specification, assert all the following:
 - [ ] **Requirements traceability** matrix present in design.md
 - [ ] **Canonical Contracts & Invariants** filled for auth/transport/persistence/artifact-sensitive work
 - [ ] **Every task file** maps to at least 1 valid in-scope requirement ID
-- [ ] **Every task file** includes `Task Test Plan & Verification Evidence` with executable or inspectable proof
+- [ ] **Every task file** includes `Evidence` with executable or inspectable proof
 - [ ] **State Machine Blueprint:** design.md contains Mermaid diagrams for non-trivial flows
 - [ ] **Dependency graph complete**: no task can start before its blockers are listed
 - [ ] **Risk matrix filled**: likelihood × impact, with mitigation for High items
 - [ ] **Test strategy defined**: what gets unit tested, integration tested, e2e validated
 - [ ] **task_files inventory synced**: no missing or orphaned task references
 - [ ] **task_registry synced**: every task file has exactly one machine-state entry with valid status + dependencies
+- [ ] **deterministic validator passed**: `node .claude/scripts/validate-spec-output.cjs specs/<feature>`
 - [ ] **Validation gate consistent**: validation_recommended and validation.status agree with spec risk
 - [ ] **Provider wording clean**: no stale vendor/provider strings outside allowed research context
 - [ ] **spec.json fully updated**: phase, current_phase, progress, timestamps, approvals, design_context
@@ -538,6 +537,7 @@ Before finalizing any specification, assert all the following:
 - `design.md` — Design document template
 - `research.md` — Research log template
 - `task.md` — Template for individual task file
+- `.claude/scripts/validate-spec-output.cjs` — Deterministic validator for generated spec artifacts
 ### Rules (`rules/`)
 - `ears-format.md` — EARS requirements standard
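
The Finalization Audit above fails any task path that does not match `tasks/task-R{N}-{SEQ}-<slug>.md` with a two-digit `SEQ`. A minimal sketch of that one check follows, assuming a lowercase kebab-case slug. The names `TASK_FILE_PATTERN`, `parseTaskPath`, and `findInvalidTaskPaths` are hypothetical and are not taken from `validate-spec-output.cjs`, whose contents are not shown in this part of the diff.

```javascript
// Hypothetical helper for illustration only; not the shipped validator.
// Matches tasks/task-R{N}-{SEQ}-<slug>.md with a two-digit SEQ.
// Assumption: the slug is lowercase kebab-case (letters, digits, single hyphens).
const TASK_FILE_PATTERN = /^tasks\/task-R(\d+)-(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.md$/;

// Returns the parsed parts of a task path, or null when the path is malformed.
function parseTaskPath(relativePath) {
  const match = TASK_FILE_PATTERN.exec(relativePath);
  if (!match) return null;
  return { cluster: Number(match[1]), seq: match[2], slug: match[3] };
}

// Returns every path that violates the naming convention.
function findInvalidTaskPaths(taskFiles) {
  return taskFiles.filter((taskPath) => parseTaskPath(taskPath) === null);
}
```

Under these assumptions, shorthand names such as `tasks/task-R0-1.md` or `tasks/R0-1-project-scaffolding.md` would be reported, while `tasks/task-R0-01-project-scaffolding.md` would pass.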

package/src/claude/skills/specs/references/review.md CHANGED

@@ -42,7 +42,7 @@ These rules override any self-reasoning or optimization the system may attempt:
 4. **Apply YAGNI to fixes.** When user says "configure later" or "decide later", add a single note to the task file. Do NOT generate multiple concrete implementations (e.g., 4 provider files when user only asked for abstraction).
 5. **No false completion.** You MUST NOT set `validation.status = "completed"` or `ready_for_implementation = true` until a reconciliation audit proves the accepted findings and validation decisions are reflected in the physical spec artifacts.
 6. **Provider drift is a real defect.** If the scope changed away from Claude/Anthropic, stale strings like `Claude API`, `Haiku`, or `haiku_reachable` in `requirements.md`, `design.md`, or `tasks/*.md` are validation failures. `research.md` may mention them only as historical comparison.
-7. **Implementation-facing propagation is mandatory.** A decision that affects implementation is NOT considered applied if it only appears in `Risk Assessment`, `validate-log.md`, or `red-team-report.md`. It must update at least one of: `requirements.md`, `Canonical Contracts & Invariants`, `Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, or `Task Test Plan & Verification Evidence`.
+7. **Implementation-facing propagation is mandatory.** A decision that affects implementation is NOT considered applied if it only appears in `Risk Assessment`, `validate-log.md`, or `red-team-report.md`. It must update at least one of: `requirements.md`, `Canonical Contracts & Invariants`, `Context`, `Steps`, `Requirements`, `Completion Criteria`, or `Evidence`.
 8. **CafeKit command dialect only.** Validation output MUST use `/hapo:develop <feature>` as the implementation handoff. Never mention `/sdd:execute-spec`, `/sdd:*`, `/work`, `/code`, `/specs <feature> --approve`, `/hapo:specs <feature> --approve`, or non-CafeKit aliases.
 9. **CafeKit task filename convention only.** Task files MUST use `tasks/task-R{N}-{SEQ}-<slug>.md` with two-digit `SEQ` (for example `tasks/task-R0-01-project-scaffolding.md`). Files like `tasks/R0-1-project-scaffolding.md` are legacy/foreign format; rename them and update `spec.json.task_files`, `spec.json.task_registry`, and dependency references before passing validation.

package/src/claude/skills/specs/rules/tasks-generation.md CHANGED

@@ -40,6 +40,8 @@ Detail bullets must include:
 **End with integration tasks** to wire everything together.
 - For UI/app/runtime workflows, the final integration task MUST name the real entrypoint (`App.tsx`, route, command, worker, extension manifest, API route, etc.) and verify every user-visible surface from the requirements is reachable from that entrypoint.
 - Components, services, routes, commands, workers, providers, and data loaders created by earlier tasks MUST be consumed by a later integration task or explicitly marked as internal support in `design.md`; orphaned deliverables are invalid.
+- Prefer compact, implementation-ready task prose over large boilerplate. The golden shape is: `Context` -> `Steps` -> `Requirements` -> `Related Files` -> `Completion Criteria` -> `Evidence` -> `Risk Assessment`.
+- A compact task is valid when it names exact files/contracts, maps requirements, and gives executable evidence. Do not expand it into nested filler just to satisfy a template.
 ### 3. Flexible Task Sizing
@@ -140,11 +142,11 @@ Every task file MUST contain the Risk Assessment table, even if no risks are ide
 - Never mark implementation work or integration-critical verification as optional—reserve `*` for auxiliary/deferrable test coverage that can be revisited post-MVP.
 - Never mark auth, permissions, privacy, data deletion, migration, schema, or contract verification work as optional.
-### Mandatory Task Test Plan & Verification Evidence
+### Mandatory Evidence Section
-Every new task file MUST include a `## Task Test Plan & Verification Evidence` section. Existing specs may still use the legacy `## Verification & Evidence` heading; readers and sync tools must support both.
+Every new task file MUST include a `## Evidence` section. Existing specs may still use the v0.8 heading `## Task Test Plan & Verification Evidence` or the legacy `## Verification & Evidence` heading; readers and sync tools must support all three.
-That section is the task-level test plan and MUST contain:
+That section is the task-level test plan and proof checklist. It MUST contain:
 1. **Automated proof** — exact command(s) for typecheck, tests, build, or explicit `N/A`
 2. **Artifact/runtime proof** — exact files, routes, UI surfaces, generated outputs, or persisted state to inspect
 3. **Contract/negative-path proof** — at least one contract-preserving check for unauthorized, invalid, missing-permission, rollback, or failure-path behavior when relevant
@@ -159,14 +161,32 @@ Rules:
 - For provider-sensitive work, use provider-neutral wording unless the scope lock explicitly names a vendor.
 - For delete-data/privacy work, task text MUST match the single deletion/retention policy chosen in `design.md`. Mixed policies are invalid.
+### Test Type Selection
+Choose verification by task risk and touched surface. Do not force every task to include every test type, but do not omit the test type that proves the task's actual behavior.
+| Task kind | Required / expected proof |
+|---|---|
+| Pure logic, data transform, parser, sorting, filtering, validator | Unit test plus negative-path case |
+| Stateful UI component or user interaction | Component test or integration test; add runtime UI check if the component must be mounted |
+| Cross-module state, API, persistence, provider, or service boundary | Integration test that proves real contract/state handoff |
+| User-facing workflow across screens/components | E2E or UI flow verification after the vertical slice exists |
+| Layout, theme, responsive, visual style | Runtime/visual viewport checks; screenshot proof when practical |
+| Keyboard/focus/form/modal/table interaction | Accessibility check for focus, labels, roles, and keyboard behavior |
+| Scaffolding/config/release plumbing | Smoke checks: typecheck/build/test/dev-server or equivalent |
+| Bug fix/regression | Regression test reproducing the old failure, then passing |
+| Performance/security-sensitive requirement or touched surface | Performance/security check only when specified by requirements, design risk, or changed boundary |
+`hapo:specs` writes the expected proof into each task. `hapo:develop` executes the task-local proof before marking the task done. `hapo:test` runs the broader system pass after implementation or for a requested feature scope.
 ## Task Hierarchy Rules
 ### Maximum 2 Levels
-- **Level 1**: Major tasks (1, 2, 3, 4...)
-- **Level 2**: Sub-tasks (1.1, 1.2, 2.1, 2.2...)
-- **No deeper nesting** (no 1.1.1)
-- If a major task would contain only a single actionable item, collapse the structure and promote the sub-task to the major level (e.g., replace `1.1` with `1.`).
-- When a major task exists purely as a container, keep the checkbox description concise and avoid duplicating detailed bullets—reserve specifics for its sub-tasks.
+- Prefer one actionable checkbox per real implementation step.
+- Use sub-tasks (`1.1`, `1.2`) only when a step has multiple separately verifiable units.
+- **No deeper nesting** (no `1.1.1`).
+- If a major task would contain only a single actionable item, collapse the structure and promote the sub-task to the major level.
+- When a major task exists purely as a container, keep the checkbox description concise and avoid duplicating detailed bullets.
 ### Sequential Numbering
 - Major tasks MUST increment: 1, 2, 3, 4, 5...
@@ -216,6 +236,6 @@ Rules:
 - If gaps found: Return to requirements or design phase
 - No requirement should be left without corresponding tasks
-Use `N.M`-style numeric requirement IDs where `N` is the top-level requirement number from requirements.md (for example, Requirement 1 → 1.1, 1.2; Requirement 2 → 2.1, 2.2), and `M` is a local index within that requirement group.
+Use the requirement ID style already present in `requirements.md` (`R1`, `REQ-01`, or `N.M`). The task filename cluster (`task-R1-01-*`) does not have to mirror every requirement ID exactly, but every requirement MUST be listed in at least one task's `## Requirements` section.
 Document any intentionally deferred requirements with rationale.
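
The coverage rule above (every requirement ID must be listed in at least one task's `## Requirements` section) can be sketched roughly as below. This is an illustration under stated assumptions, not the package's validator: `requirementsSection` and `findUncoveredRequirements` are hypothetical names, task files are passed in as markdown strings, and IDs are matched as whole tokens so that `1.1` does not match `1.10` and `R1` does not match `R10`.

```javascript
// Hypothetical helpers for illustration only; not the shipped validator.

// Returns the body of the "## Requirements" section, or "" when it is absent.
function requirementsSection(markdown) {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => /^##\s+Requirements\s*$/.test(line));
  if (start === -1) return "";
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((line) => /^##\s/.test(line));
  return (end === -1 ? rest : rest.slice(0, end)).join("\n");
}

// Returns the requirement IDs that no task lists in its Requirements section.
function findUncoveredRequirements(requirementIds, taskMarkdowns) {
  const sections = taskMarkdowns.map(requirementsSection);
  return requirementIds.filter((id) => {
    // Escape regex metacharacters so IDs like "1.1" match literally.
    const escaped = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Whole-token match: reject neighbours that would extend the ID.
    const pattern = new RegExp(
      `(^|[^A-Za-z0-9.-])${escaped}(?![A-Za-z0-9-]|\\.\\d)`,
      "m"
    );
    return !sections.some((section) => pattern.test(section));
  });
}
```

A non-empty result would correspond to the "gaps = failure" condition; mentions of an ID outside the `## Requirements` section are deliberately ignored in this sketch.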

package/src/claude/skills/specs/templates/task.md CHANGED

@@ -7,9 +7,11 @@
 **Dependencies:** {{DEPENDENCIES}}
 **Spec:** specs/{{FEATURE_NAME}}/
-## Objective
+## Context
-{{Brief 1-2 sentence objective detailing WHAT to accomplish, not HOW. Must relate directly to requirement R{{REQ_NUMBER}}.}}
+- **Why**: {{Business/user reason this task exists}}
+- **Current state**: {{Relevant existing files, route, model, API, screen, or "greenfield"}}
+- **Target outcome**: {{Observable behavior after this task is done}}
 ## Constraints
@@ -18,33 +20,26 @@
 - **MUST NOT**: {{Explicitly forbidden action or approach}}
 - **SCOPE**: Implement only the behavior mapped to R{{REQ_NUMBER}} and the approved `scope_lock`; do not add out-of-scope features or leave scoped acceptance criteria unwired.
-## Implementation Steps
-- [ ] 1. {{MAJOR_STEP_1}}
-  - [ ] 1.1 {{Sub-task describing specific behavior/action}}
-    - {{Detail: business logic, behavior, target validation}}
-    - {{Detail: edge case or constraint}}
-    - _Requirements: {{REQ_NUMBER}}.{{X}}_
-  - [ ] 1.2 {{Next sub-task}}
-    - {{Detail items}}
-    - _Requirements: {{REQ_NUMBER}}.{{Y}}_
-- [ ] 2. {{MAJOR_STEP_2}}
-  - [ ] 2.1 {{Sub-task}}
-    - {{Details}}
-    - _Requirements: {{REQ_NUMBER}}.{{Z}}_
-  - [ ] 2.2 {{Sub-task}}
-    - {{Details}}
-    - _Requirements: {{REQ_NUMBER}}.{{W}}_
-- [ ] 3. Test coverage for R{{REQ_NUMBER}}
-  - [ ] 3.1 Unit tests
-    - {{Test case 1: target behavior to verify}}
-    - {{Test case 2: edge case / error case}}
-    - _Requirements: {{REQ_NUMBER}}_
-  - [ ]* 3.2 Integration tests (optional for MVP)
-    - {{Describe end-to-end flow to verify}}
-    - _Requirements: {{REQ_NUMBER}}_
+## Steps
+- [ ] 1. {{Actionable step with exact file/path/contract}}
+  - {{Business intent: what user/system behavior this enables}}
+  - {{Code detail: schema/API/component/function/route and validation rules}}
+  - _Requirements: {{REQ_NUMBER}}.{{X}}_
+- [ ] 2. {{Next actionable step}}
+  - {{Business intent}}
+  - {{Code detail, edge case, or integration contract}}
+  - _Requirements: {{REQ_NUMBER}}.{{Y}}_
+- [ ] 3. Verification implementation
+  - {{Unit/integration/e2e test or explicit manual verification hook}}
+  - _Requirements: {{REQ_NUMBER}}_
+## Requirements
+- {{REQ_NUMBER}}.{{X}} — {{Acceptance criterion or requirement covered}}
+- {{REQ_NUMBER}}.{{Y}} — {{Acceptance criterion or requirement covered}}
 ## Related Files
@@ -60,11 +55,21 @@
 - [ ] {{Criteria 3 — maps directly to acceptance criteria from requirements.md and can be proven below}}
 - [ ] {{Criteria 4 — no orphaned component/service/route/command; created runtime-facing work is reachable from the declared entrypoint or explicitly deferred to a named integration task}}
-## Task Test Plan & Verification Evidence
+## Evidence
+This section is both the task-level test plan and the proof checklist. Keep it short, exact, and executable.
+Select the proof by task risk; do not run every test type for every task.
-This section is the task-level test plan. It names the exact commands, observable runtime/artifact proof, and negative-path checks required before this task can be marked done.
+- Logic/data/validator task: include unit tests.
+- Stateful UI/component task: include component or integration tests.
+- Cross-module/API/state flow task: include integration tests.
+- User-facing end-to-end workflow: include E2E/UI flow verification.
+- Layout/theme/responsive task: include visual/runtime viewport checks.
+- Interactive UI task: include accessibility checks when keyboard, focus, labels, or ARIA can regress.
+- Scaffold/release task: include smoke build/test/dev-server checks.
+- Performance/security checks are required only when the requirement, risk, or touched surface calls for them.
-- [ ] Automated verification
+- [ ] Automated verification (unit/component/integration/E2E as applicable)
   - Command(s): `{{TYPECHECK / TEST / BUILD COMMANDS OR N/A}}`
   - Expected proof: {{What output, exit code, or report proves success}}
 - [ ] Artifact / runtime verification
@@ -89,4 +94,4 @@ This section is the task-level test plan. It names the exact commands, observabl
 > **Parallel marker**: Append `(P)` to the title if this task can run concurrently with another (usually when serving different requirements).
 > **Test note**: If a test coverage sub-task can be deferred post-MVP, mark it with `- [ ]*`.
 > **Requirement mapping**: Every sub-task MUST end with `_Requirements: X.X_`. No mapping = invalid task file.
-> **Verification rule**: No `## Task Test Plan & Verification Evidence` section = invalid task file. Existing specs may use legacy `## Verification & Evidence`; agents must support both headings.
+> **Evidence rule**: No `## Evidence` section = invalid task file. Existing specs may use `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence`; agents must support all three headings.
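
Since agents must support all three headings, a reader might resolve the evidence section along these lines. This is a minimal sketch, assuming headings appear exactly as level-two markdown headings and that `## Evidence` takes precedence when more than one is present. `EVIDENCE_HEADINGS` and `findEvidenceSection` are hypothetical names, not part of the package.

```javascript
// Hypothetical helper for illustration only; not code from the package.
// Headings in lookup order: current, v0.8, then legacy.
const EVIDENCE_HEADINGS = [
  "Evidence",
  "Task Test Plan & Verification Evidence",
  "Verification & Evidence",
];

// Returns { heading, body } for the first supported heading found, else null.
function findEvidenceSection(markdown) {
  const lines = markdown.split(/\r?\n/);
  for (const heading of EVIDENCE_HEADINGS) {
    const start = lines.findIndex((line) => line.trim() === `## ${heading}`);
    if (start === -1) continue;
    const rest = lines.slice(start + 1);
    // The section ends at the next level-two heading or at end of file.
    const end = rest.findIndex((line) => /^##\s/.test(line));
    const body = (end === -1 ? rest : rest.slice(0, end)).join("\n").trim();
    return { heading, body };
  }
  return null;
}
```

A `null` result would correspond to the "invalid task file" case described in the evidence rule.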

package/src/claude/skills/sync/SKILL.md CHANGED

@@ -34,8 +34,8 @@ Scans the `spec.json` against all physical `task-R*.md` files to detect mismatch
 1. **Precision Edits:** Never overwrite the entire `spec.json` string blindly. Update only the required keys, while keeping JSON valid.
 2. **Machine + Human Sync:** Every task status update MUST modify both `spec.json.task_registry[...]` and the matching markdown task file header/status section.
-3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Implementation Steps` and relevant `Completion Criteria` / `Task Test Plan & Verification Evidence` checkboxes that have actual proof. Legacy `Verification & Evidence` sections are supported.
-4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
+3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Steps` / `## Implementation Steps` and relevant `Completion Criteria` / `Evidence` checkboxes that have actual proof. `Task Test Plan & Verification Evidence` and legacy `Verification & Evidence` sections are supported.
+4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Evidence`, `## Task Test Plan & Verification Evidence`, or legacy `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
 5. **Task Docs Hook:** Every time `hapo:sync` marks a task as `done`, it must flag that a task-level docs checkpoint is now due for that verified task.
 6. **Phase Prompt Rule:** When `hapo:sync` marks the final pending task in the whole feature as `done`, it should automatically prompt the user if they'd like to advance the phase, but only after the docs checkpoint for that last completed task has been considered.

package/src/claude/skills/sync/references/sync-protocols.md CHANGED

@@ -15,7 +15,7 @@ When requested to update a phase or change task configuration, `spec.json` must
     - full relative path like `tasks/task-R0-02-extension-shell.md`
 *   **Status Update:** If a task changes to `blocked`, the matching `task_registry[path].status` must become `"blocked"`, `task_registry[path].blocker` must record the reason, and `spec.json.status` / `spec.json.blocker` must reflect the top-level block if work is globally blocked.
 *   **Timestamp Rule:** Update `task_registry[path].started_at`, `completed_at`, and `last_updated_at` consistently with the new state. Also refresh `spec.json.updated_at`.
-*   **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
+*   **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Evidence`, `## Task Test Plan & Verification Evidence`, or legacy `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
 *   **Receipt Integrity Rule:** A valid verification receipt must include the exact commands run, their outcomes, and artifact/runtime proof. Receipts containing `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit "placeholder / simplified for MVP / production later" contract deviations are not eligible for `done`.
 *   **Contract Fidelity Rule:** If the task file notes or evidence show that a named framework/auth/runtime choice from the spec was silently replaced, sync MUST refuse `done` until the spec is amended or the implementation is corrected.
 *   **Task Docs Rule:** After a task is moved to `done`, emit a short alert that a task-level docs checkpoint is due for this verified task.
@@ -27,12 +27,12 @@ The structure of `tasks/task.md` relies heavily on exact keyword markers. Follow
 ### A. Completing a Task
 When `/hapo:sync <feature> <task-id> done`:
 1. Find: `**Status:** pending` (or `in_progress` / `blocked`).
-2. Inspect `## Task Test Plan & Verification Evidence` first. If the task uses legacy `## Verification & Evidence`, inspect that section instead. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
+2. Inspect `## Evidence` first. If the task uses `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence`, inspect that section instead. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
 3. Refuse completion if the receipt contains any non-passing marker such as `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation substituted a named contract with a placeholder/custom simplification.
 4. Replace with: `**Status:** done`.
-5. Locate block: `## Implementation Steps`.
+5. Locate block: `## Steps` or `## Implementation Steps`.
 6. Convert `- [ ]` into `- [x]` strictly within that section.
-7. Update relevant checkboxes in `## Completion Criteria` and `## Task Test Plan & Verification Evidence` only when the caller provides or the file already contains real proof. For legacy task files, update `## Verification & Evidence` instead.
+7. Update relevant checkboxes in `## Completion Criteria` and `## Evidence` only when the caller provides or the file already contains real proof. For v0.8 or legacy task files, update `## Task Test Plan & Verification Evidence` or `## Verification & Evidence` instead.
 8. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
 ### B. Blocking a Task
@@ -59,7 +59,7 @@ When `/hapo:sync audit <feature>` is activated:
    - Missing disk file referenced in registry → remove or flag it
    - Markdown says `done` but registry not done → registry wins only if evidence already exists; otherwise downgrade markdown or flag conflict
    - Registry says `done` but markdown still pending → update markdown only if evidence exists
-   - Either side says `done` but `## Task Test Plan & Verification Evidence` / legacy `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
+   - Either side says `done` but `## Evidence` / `## Task Test Plan & Verification Evidence` / legacy `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
    - Either side says `done` but the receipt contains `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit contract-substitution notes → downgrade to `in_progress` or flag conflict
 5. **Correction Alert:** Output a brief markdown alert detailing mismatches fixed and any unresolved conflicts requiring manual review.
 6. **Task Docs Alert:** If audit reveals tasks newly marked `done`, include whether task-level docs sync appears still due or already accounted for in the current run summary.
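
The evidence lookup order introduced in these hunks (`## Evidence` first, then `## Task Test Plan & Verification Evidence`, then legacy `## Verification & Evidence`) can be sketched roughly as below. This is an illustrative sketch only, not code from the package: the function names, the rule that a "proof line" is any bullet other than an unchecked checkbox, and the word-boundary marker matching are all assumptions. It does not attempt to detect contract-substitution notes.

```python
import re
from typing import Optional

# Heading precedence as described in the diff: new format, then v0.8, then legacy.
EVIDENCE_HEADINGS = [
    "## Evidence",
    "## Task Test Plan & Verification Evidence",
    "## Verification & Evidence",
]
# Non-passing receipt markers named in the protocol.
NON_PASSING = ("PRECHECK_FAIL", "FAIL", "UNVERIFIED")


def find_section(markdown: str, heading: str) -> Optional[str]:
    """Return the text under an exact level-2 heading, or None if absent."""
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == heading:
            body = []
            for nxt in lines[i + 1:]:
                if nxt.startswith("## "):  # next level-2 heading ends the section
                    break
                body.append(nxt)
            return "\n".join(body)
    return None


def evidence_section(markdown: str) -> Optional[str]:
    """Pick the first evidence section that exists, in precedence order."""
    for heading in EVIDENCE_HEADINGS:
        body = find_section(markdown, heading)
        if body is not None:
            return body
    return None


def can_mark_done(markdown: str) -> bool:
    """Hypothetical gate: require a proof line and no non-passing marker."""
    body = evidence_section(markdown)
    if body is None:
        return False
    # Assumption: a proof line is a bullet that is not an unchecked checkbox.
    has_proof = any(
        re.match(r"^\s*-\s+(?!\[ \])\S", line) for line in body.splitlines()
    )
    has_failure = any(
        re.search(r"\b" + re.escape(marker) + r"\b", body) for marker in NON_PASSING
    )
    return has_proof and not has_failure
```

A task file with only unchecked boxes, or with no evidence section at all, is refused here, which mirrors the "STOP and refuse to mark the task done" rule in step 2.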

package/src/claude/skills/test/SKILL.md CHANGED

@@ -35,7 +35,7 @@ If tests fail, list every failure explicitly — do not summarize failures away.
 <SCOPE-GATE>
 When a feature name or `specs/<feature>` path is supplied, testing is spec-aware.
-Load `spec.json`, `requirements.md`, `design.md`, active/recent task files, and Task Test Plan evidence.
+Load `spec.json`, `requirements.md`, `design.md`, active/recent task files, and task `Evidence` / test-plan proof.
 The verdict MUST compare executed/reachable behavior against `scope_lock`, requirements, design contracts, task Completion Criteria, and runtime reachability obligations.
 Build/typecheck success without scoped runtime proof is not PASS.
 </SCOPE-GATE>
@@ -85,6 +85,15 @@ If the argument resolves to `specs/<feature>` or a feature directory under `spec
 4. Treat 0 executed tests as `NO_TESTS`, even if the command exits 0
 5. In Spec-Aware Mode, inspect runtime reachability from declared entrypoints/callers and fail if scoped surfaces are missing or orphaned
+**Spec-aware test type escalation:**
+- Unit tests are mandatory when task evidence covers pure logic, transforms, validators, sorting/filtering, or regressions.
+- Component/integration tests are expected when task evidence covers stateful UI, context/store wiring, API/service boundaries, or persistence.
+- E2E/UI flow tests are expected once a complete user-facing workflow exists, not for isolated foundation tasks.
+- Visual/responsive checks are expected for layout, theme, dashboard, and style tasks.
+- Accessibility checks are expected for interactive UI surfaces where focus, roles, labels, keyboard navigation, or ARIA can regress.
+- Smoke checks are enough for scaffold/config tasks unless the task requires deeper proof.
+- Performance/security checks are only mandatory when the requirement, design risk, or touched runtime boundary calls for them.
 **UI verification (`--ui` / `--ui-auth` / `--ui-flow`):**
 Execute multi-page discovery, then spawn **Parallel UI Subagents** (test-runner instances) to handle Smoke, Core-Vitals, Accessibility, SEO, Security, and User Flows simultaneously.
 See `references/execution-strategy.md` Phase C for full phase breakdown.
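
The spec-aware test type escalation rules added in this hunk could be modeled as a small lookup. The following is a sketch under stated assumptions, not the package's implementation: the tag names, the `expected_tests` function, and the two boolean flags are hypothetical labels chosen to mirror the bullet list.

```python
from typing import Iterable, List, Tuple

# (trigger tags, test type, obligation level); tag names are hypothetical.
RULES = [
    ({"pure-logic", "transform", "validator", "sort-filter", "regression"},
     "unit", "mandatory"),
    ({"stateful-ui", "store-wiring", "api-boundary", "persistence"},
     "component/integration", "expected"),
    ({"layout", "theme", "dashboard", "style"},
     "visual/responsive", "expected"),
    ({"interactive-ui"}, "accessibility", "expected"),
    ({"scaffold", "config"}, "smoke", "sufficient"),
]


def expected_tests(
    tags: Iterable[str],
    complete_workflow: bool = False,
    risk_flagged: bool = False,
) -> List[Tuple[str, str]]:
    """Map what the task evidence covers to the test types the rules call for."""
    tag_set = set(tags)
    result = [(kind, level) for triggers, kind, level in RULES if tag_set & triggers]
    # E2E/UI flow tests apply only once a complete user-facing workflow exists.
    if complete_workflow:
        result.append(("e2e/ui-flow", "expected"))
    # Performance/security checks apply only when a requirement, design risk,
    # or touched runtime boundary calls for them.
    if risk_flagged:
        result.append(("performance/security", "mandatory"))
    return result
```

For example, `expected_tests(["validator", "persistence"])` yields unit tests as mandatory and component/integration tests as expected, while an isolated scaffold task yields only a smoke check.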