cc-dev-template 0.1.52 → 0.1.54

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (22)
  1. package/package.json +1 -1
  2. package/src/skills/creating-agent-skills/references/create-step-2-design.md +57 -2
  3. package/src/skills/creating-agent-skills/references/fix-step-1-diagnose.md +28 -1
  4. package/src/skills/creating-agent-skills/references/fix-step-2-apply.md +13 -0
  5. package/src/skills/creating-agent-skills/references/fix-step-3-validate.md +2 -0
  6. package/src/skills/project-setup/SKILL.md +11 -0
  7. package/src/skills/project-setup/references/step-1-analyze.md +54 -0
  8. package/src/skills/project-setup/references/step-2-makefile.md +72 -0
  9. package/src/skills/project-setup/references/step-3-hooks.md +80 -0
  10. package/src/skills/project-setup/references/step-4-documentation.md +34 -0
  11. package/src/skills/project-setup/references/step-5-verify.md +70 -0
  12. package/src/skills/project-setup/templates/quality-gate.cjs +132 -0
  13. package/src/skills/spec-interview/SKILL.md +17 -0
  14. package/src/skills/spec-interview/references/step-1-opening.md +2 -2
  15. package/src/skills/spec-interview/references/step-2-ui-ux.md +73 -0
  16. package/src/skills/spec-interview/references/{step-2-deep-dive.md → step-3-deep-dive.md} +15 -3
  17. package/src/skills/spec-interview/references/{step-3-research-needs.md → step-4-research-needs.md} +14 -8
  18. package/src/skills/spec-interview/references/step-5-verification.md +74 -0
  19. package/src/skills/spec-interview/references/step-6-finalize.md +52 -0
  20. package/src/skills/spec-review/SKILL.md +19 -3
  21. package/src/skills/spec-sanity-check/SKILL.md +79 -0
  22. package/src/skills/spec-interview/references/step-4-finalize.md +0 -19
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "cc-dev-template",
- "version": "0.1.52",
+ "version": "0.1.54",
  "description": "Structured AI-assisted development framework for Claude Code",
  "bin": {
  "cc-dev-template": "./bin/install.js"
package/src/skills/creating-agent-skills/references/create-step-2-design.md CHANGED
@@ -19,6 +19,43 @@ There are two types. Pick one:
 
  **How to decide:** If the skill describes a process with distinct sequential phases, it is procedural. If it captures principles or knowledge applied whenever relevant, it is informational.
 
+ ## Determine Invocation Mode
+
+ Ask the user: "Should Claude be able to auto-activate this skill, or is it user-invoked only (via `/skill-name`)?"
+
+ **User-invoked only** — The user explicitly triggers with `/skill-name`. Use for:
+ - Actions with side effects (deploy, commit, publish)
+ - Workflows the user wants full control over
+ - Tasks that should never run unexpectedly
+
+ Configuration: `disable-model-invocation: true` + minimal description (just states what it does, no trigger phrases needed)
+
+ **Agent-invocable** — Claude detects when to activate based on conversation. Use for:
+ - Knowledge and guidance skills
+ - Workflows triggered by natural language patterns
+ - Skills where "just knowing when" is valuable
+
+ Configuration: No special flag + rich description with trigger phrases
+
+ ## Determine Execution Context
+
+ Ask the user: "Should this skill run inline (in the main conversation) or as a sub-agent (isolated context)?"
+
+ **Inline** — Skill runs in the main conversation. The agent sees all prior context and can continue the conversation naturally after the skill completes. Use for:
+ - Most skills
+ - Skills that need conversation history
+ - Skills where follow-up interaction is expected
+
+ Configuration: No special setting needed
+
+ **Sub-agent (fork)** — Skill runs in an isolated context. The sub-agent cannot see prior conversation and returns a single result. Use for:
+ - Heavy research or exploration tasks
+ - Parallel execution of independent work
+ - Skills that should not pollute the main context
+ - Long-running tasks that benefit from isolation
+
+ Configuration: `context: fork` (optionally add `agent: Explore` or `agent: Plan` for specialized behavior)
+
  ## Design the Frontmatter
 
  Every skill has YAML frontmatter. Required and optional fields:
@@ -54,6 +91,10 @@ Every skill has YAML frontmatter. Required and optional fields:
 
  ## Craft the Description
 
+ The description's purpose depends on the invocation mode:
+
+ ### For Agent-Invocable Skills
+
  The description determines when Claude activates the skill. This is the most important piece of metadata.
 
  Collect 5-10 trigger phrases from the user: "When you need this skill, what would you say to Claude?"
@@ -66,7 +107,14 @@ Combine into a description that:
 
  **Key insight:** If the description explains too much about WHAT the skill does, Claude believes it already knows enough and will not activate. Keep it about WHEN.
 
- **Context budget note:** Skill descriptions load at startup and share a character budget (default 15,000 chars across all skills). Keep descriptions tight — they cost tokens every session.
+ ### For User-Invoked Only Skills
+
+ The description just needs to tell the user what the skill does. Keep it minimal:
+ - One sentence stating the action
+ - No trigger phrases needed (Claude will not use them)
+ - Example: "Deploy the application to production"
+
+ **Context budget note:** Skill descriptions load at startup and share a character budget (default 15,000 chars across all skills). Keep descriptions tight — they cost tokens every session. User-invoked skills with `disable-model-invocation: true` should have especially minimal descriptions since Claude does not need activation cues.
 
  ## Plan the File Layout
 
@@ -103,6 +151,8 @@ List out the steps. Each step becomes one markdown file in `references/`. For ea
  Present the design:
  - Name
  - Type (informational or procedural)
+ - Invocation mode (agent-invocable or user-invoked only)
+ - Execution context (inline or sub-agent)
  - Frontmatter configuration
  - Description text
  - File layout
@@ -110,4 +160,9 @@ Present the design:
 
  Ask: "Does this design look right?"
 
- Read `references/create-step-3-write.md` when the design is confirmed.
+ ## Next Step
+
+ | Context | Action |
+ |---------|--------|
+ | Creating a new skill | Read `references/create-step-3-write.md` |
+ | Fixing an existing skill (came from fix-step-1-diagnose) | Read `references/fix-step-2-apply.md` to apply the structural changes |
package/src/skills/creating-agent-skills/references/fix-step-1-diagnose.md CHANGED
@@ -80,4 +80,31 @@ Present the diagnosis to the user:
  - What principle it violates
  - What the fix would look like
 
- Read `references/fix-step-2-apply.md` when diagnosis is complete and findings are summarized.
+ ## Classify the Fix Type
+
+ Based on your diagnosis, determine which type of fix is needed:
+
+ **Surface fixes** — wording, style, dead ends, missing chain links, description tweaks:
+ - Fixing second person to imperative
+ - Adding positive framing
+ - Removing meta-descriptions
+ - Fixing broken file references
+ - Small description improvements
+
+ **Structural changes** — anything that changes the skill's architecture:
+ - Converting between informational and procedural types
+ - Adding or removing step files
+ - Adding new workflow branches
+ - Redesigning the file layout
+ - Adding scripts/ or templates/ directories
+ - Significantly rewriting the description and trigger phrases
+ - Changing frontmatter configuration (allowed-tools, context, agent, etc.)
+
+ Tell the user which type of fix is needed.
+
+ ## Next Step
+
+ | Fix Type | Action |
+ |----------|--------|
+ | Surface fixes only | Read `references/fix-step-2-apply.md` |
+ | Structural changes needed | Read `references/create-step-2-design.md` first to design the new structure, then read `references/fix-step-2-apply.md` to apply |
package/src/skills/creating-agent-skills/references/fix-step-2-apply.md CHANGED
@@ -2,6 +2,17 @@
 
  Fix the issues identified in diagnosis. Plan the changes, confirm with the user, then apply.
 
+ ## If Coming From Design Step
+
+ If you just read `create-step-2-design.md` for structural changes, you now have:
+ - The skill type decision (informational vs procedural)
+ - The frontmatter configuration
+ - The crafted description with trigger phrases
+ - The file layout plan
+ - For procedural: the step breakdown
+
+ Use that design as the blueprint for your changes. The writing principles below still apply to all content you write.
+
  ## Plan Changes First
 
  Before editing, state:
@@ -87,4 +98,6 @@ Does this justify its token cost? If Claude already knows it — remove it. If i
 
  Make all planned modifications now.
 
+ For structural changes that involve writing new files (new step files, new SKILL.md sections, new scripts), reference `references/create-step-3-write.md` for the complete writing guidance — it covers size targets, step file structure, MCP tool references, and anti-patterns in more depth than the principles above.
+
  Read `references/fix-step-3-validate.md` when all fixes are applied.
package/src/skills/creating-agent-skills/references/fix-step-3-validate.md CHANGED
@@ -1,5 +1,7 @@
  # Step 3: Validate and Test
 
+ For structural changes, use the full review checklist from `references/create-step-4-review.md` — it covers additional criteria like size targets, frontmatter validation, and description quality that matter when you've changed the skill's architecture.
+
  ## Public Knowledge Audit
 
  Go through every changed file. Find each code block, CLI command, API call, and implementation detail.
package/src/skills/project-setup/SKILL.md ADDED
@@ -0,0 +1,11 @@
+ ---
+ name: project-setup
+ description: Set up standardized dev environment with Makefile, scripts, and hooks.
+ disable-model-invocation: true
+ ---
+
+ # Project Setup
+
+ Configure this project with a standardized development environment.
+
+ Read `references/step-1-analyze.md`.
package/src/skills/project-setup/references/step-1-analyze.md ADDED
@@ -0,0 +1,54 @@
+ # Step 1: Analyze the Project
+
+ Understand the project before making changes.
+
+ ## Detect Project Type
+
+ Check for language indicators:
+
+ | Language | Indicators |
+ |----------|------------|
+ | TypeScript/JavaScript | `package.json`, `tsconfig.json`, `.ts`/`.tsx` files |
+ | Go | `go.mod`, `go.sum`, `.go` files |
+ | C# | `*.csproj`, `*.sln`, `.cs` files |
+
+ If multiple languages exist, this may be a monorepo. Note each component.
+
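The indicator table above can be sketched as a small shell helper. This is illustrative only: `detect_project_type` is a hypothetical name, and real detection may also need to walk subdirectories in a monorepo.

```shell
#!/usr/bin/env bash
# Sketch: detect project type from indicator files (see table above).

detect_project_type() {
  dir="${1:-.}"
  found=""
  # TypeScript/JavaScript indicators
  if [ -f "$dir/package.json" ] || [ -f "$dir/tsconfig.json" ]; then
    found="$found typescript"
  fi
  # Go indicators
  if [ -f "$dir/go.mod" ] || [ -f "$dir/go.sum" ]; then
    found="$found go"
  fi
  # C# indicators (glob fails quietly when no match)
  if ls "$dir"/*.csproj "$dir"/*.sln >/dev/null 2>&1; then
    found="$found csharp"
  fi
  if [ -z "$found" ]; then echo "unknown"; else echo "${found# }"; fi
}
```

Two or more words in the output suggest a monorepo, which is the cue to note each component.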
+ ## Check for Submodules
+
+ Run `git submodule status`. If submodules exist, ask the user:
+ - Which submodule(s) to set up
+ - Whether to orchestrate multiple submodules together under one `make dev`
+
+ ## Check Existing Setup
+
+ Look for:
+ - `Makefile` in root
+ - `scripts/` directory with `build.sh`, `dev.sh`, `test.sh`
+ - `.claude/settings.json` with Stop hook
+ - `.git/hooks/pre-commit`
+
+ If partial setup exists, show the user what's already configured vs what's missing. Ask whether to:
+ - Complete the missing pieces
+ - Replace existing setup entirely
+ - Verify existing setup works correctly
+
+ ## Gather Project-Specific Details
+
+ For the detected language(s), identify:
+ - Build command (e.g., `npm run build`, `go build ./...`, `dotnet build`)
+ - Dev server command and typical port
+ - Test command (e.g., `npm test`, `go test ./...`, `dotnet test`)
+ - Lint/format commands (e.g., `npm run lint`, `go fmt ./...`)
+
+ If anything is unclear, ask the user.
+
+ ## When Ready
+
+ Once you have:
+ - Confirmed project type
+ - Resolved any submodule questions
+ - Determined what needs to be created vs already exists
+ - Gathered language-specific commands
+
+ Read `references/step-2-makefile.md`.
package/src/skills/project-setup/references/step-2-makefile.md ADDED
@@ -0,0 +1,72 @@
+ # Step 2: Create Makefile and Scripts
+
+ Create the Makefile and supporting scripts. The Makefile stays minimal—it just calls scripts.
+
+ ## Create the Makefile
+
+ Create `Makefile` in the project root:
+
+ ```makefile
+ .PHONY: build dev stop test
+
+ build:
+ 	@./scripts/build.sh
+
+ dev:
+ 	@./scripts/dev.sh start
+
+ stop:
+ 	@./scripts/dev.sh stop
+
+ test:
+ 	@./scripts/test.sh
+ ```
+
+ ## Create scripts/build.sh
+
+ Purpose: Compile, typecheck, lint. Return minimal output. No binaries.
+
+ Requirements:
+ - Run all quality checks for the detected language (typecheck, lint, format check)
+ - Exit 0 on success with single "Build passed" message
+ - Exit non-zero on failure with only the relevant error lines
+ - Strip ANSI colors and limit output to essential errors
+
+ ## Create scripts/dev.sh
+
+ Purpose: Start/stop dev server as background process. Track state via PID file.
+
+ Requirements:
+ - Accept `start` or `stop` argument
+ - On `start`:
+   - Check if already running (PID file exists and process alive)
+   - If running, print "Already running on port XXXX" and exit 0
+   - If not running, start in background
+   - Pipe all output to `agent.log` in project root
+   - Truncate `agent.log` on each start (fresh logs)
+   - Save PID to `.dev.pid`
+   - Print "Started on port XXXX. Logs: agent.log"
+ - On `stop`:
+   - Kill process from PID file
+   - Clean up PID file
+   - Print "Stopped"
+
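The dev.sh requirements above can be sketched as follows. The default command and port (`npm run dev`, 3000) are placeholder assumptions; substitute the project's real dev command.

```shell
#!/usr/bin/env bash
# scripts/dev.sh (sketch) — start/stop a background dev server tracked by a PID file.
# DEV_CMD and PORT defaults are placeholders; adapt them to the project.
DEV_CMD="${DEV_CMD:-npm run dev}"
PORT="${PORT:-3000}"
PID_FILE=".dev.pid"
LOG_FILE="agent.log"

start() {
  if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    echo "Already running on port $PORT"
    return 0
  fi
  : > "$LOG_FILE"                 # truncate for fresh logs
  $DEV_CMD >>"$LOG_FILE" 2>&1 &   # background; all output piped to agent.log
  echo $! > "$PID_FILE"
  echo "Started on port $PORT. Logs: $LOG_FILE"
}

stop() {
  if [ -f "$PID_FILE" ]; then
    kill "$(cat "$PID_FILE")" 2>/dev/null
    rm -f "$PID_FILE"
  fi
  echo "Stopped"
}

# Dispatch only when an argument is given (so the file can also be sourced).
if [ "$#" -gt 0 ]; then
  case "$1" in
    start) start ;;
    stop)  stop ;;
    *) echo "Usage: dev.sh start|stop" >&2; exit 1 ;;
  esac
fi
```

`kill -0` tests process liveness without sending a signal, which is what makes the "Already running" check safe against stale PID files.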
+ ## Create scripts/test.sh
+
+ Purpose: Run ALL tests. Return pass/fail with minimal output.
+
+ Requirements:
+ - Run the full test suite (unit, integration, all test types)
+ - On success: print "All tests passed"
+ - On failure: print only failing test names and brief error messages
+ - Filter verbose test output to extract just failures
+
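build.sh and test.sh share the same output-filtering requirement; a sketch of that filter follows. The grep patterns are assumptions, so tune them to the toolchain's actual failure output.

```shell
#!/usr/bin/env bash
# Shared output filter for scripts/build.sh and scripts/test.sh (sketch).
# Strips ANSI color codes, keeps only likely failure lines, caps the output.

filter_failures() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*[a-zA-Z]//g" \
    | grep -E 'error|Error|ERROR|FAIL|failed|:[0-9]+:[0-9]+' \
    | head -10
}
```

Usage on the failure path would look like `npm test 2>&1 | filter_failures`; on success, print the single summary line instead.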
+ ## Make Scripts Executable
+
+ ```bash
+ chmod +x scripts/build.sh scripts/dev.sh scripts/test.sh
+ ```
+
+ ## When Ready
+
+ Once all scripts are created and executable, read `references/step-3-hooks.md`.
package/src/skills/project-setup/references/step-3-hooks.md ADDED
@@ -0,0 +1,80 @@
+ # Step 3: Configure Hooks
+
+ Set up the stop hook and pre-commit hook.
+
+ ## Stop Hook
+
+ The stop hook runs `make build` when the agent finishes. If build fails, the agent continues working to fix it (up to 3 attempts).
+
+ ### Create .claude/settings.json
+
+ ```json
+ {
+   "hooks": {
+     "Stop": [
+       {
+         "hooks": [
+           {
+             "type": "command",
+             "command": "node .claude/hooks/quality-gate.cjs"
+           }
+         ]
+       }
+     ]
+   }
+ }
+ ```
+
+ If `.claude/settings.json` already exists, merge the hooks configuration.
+
+ ### Create the Quality Gate Hook
+
+ Copy the template from this skill's `templates/quality-gate.cjs` to `.claude/hooks/quality-gate.cjs`.
+
+ Adapt it for this project:
+ - Update the `checks` array to match project language
+   - For TypeScript: tsc, eslint, tests
+   - For Go: go build, go vet, go test
+   - For C#: dotnet build, dotnet test
+
+ The hook:
+ 1. Reads JSON from stdin (required by hook protocol)
+ 2. Runs `make build`
+ 3. If build passes: returns `{ "decision": "approve" }`
+ 4. If build fails: returns `{ "decision": "block", "reason": "..." }` with concise error info
+ 5. Checks `stop_hook_active` field—if true and still failing, approve to prevent infinite loops
+
+ ## Pre-Commit Hook
+
+ Create `.git/hooks/pre-commit`:
+
+ ```bash
+ #!/bin/bash
+ set -e
+
+ echo "Running pre-commit checks..."
+
+ # Run build checks
+ if ! make build; then
+   echo "Build failed. Commit aborted."
+   exit 1
+ fi
+
+ # Run tests
+ if ! make test; then
+   echo "Tests failed. Commit aborted."
+   exit 1
+ fi
+
+ echo "Pre-commit checks passed."
+ ```
+
+ Make it executable:
+
+ ```bash
+ chmod +x .git/hooks/pre-commit
+ ```
+
+ ## When Ready
+
+ Once hooks are configured, read `references/step-4-documentation.md`.
package/src/skills/project-setup/references/step-4-documentation.md ADDED
@@ -0,0 +1,34 @@
+ # Step 4: Update CLAUDE.md
+
+ Document the make commands so the agent knows how to use them.
+
+ ## If CLAUDE.md Exists
+
+ Add a "Dev Commands" section. Place it near the top, after any project overview.
+
+ ## If CLAUDE.md Does Not Exist
+
+ Create it with the dev commands section plus a minimal project description.
+
+ ## Content to Add
+
+ ```markdown
+ ## Dev Commands
+
+ | Command | Purpose |
+ |---------|---------|
+ | `make dev` | Start dev server (background). Logs to `agent.log`. |
+ | `make stop` | Stop the dev server. |
+ | `make build` | Run all build checks (typecheck, lint). No output on success. |
+ | `make test` | Run all tests. Shows only failures. |
+
+ - `agent.log` is cleared on each `make dev` — check it for errors after changes
+ - Stop hook runs `make build` automatically when you finish working
+ - Pre-commit hook runs `make build` and `make test` before each commit
+ ```
+
+ Adjust the table if this project has additional commands or specific notes.
+
+ ## When Ready
+
+ Once CLAUDE.md is updated, read `references/step-5-verify.md`.
package/src/skills/project-setup/references/step-5-verify.md ADDED
@@ -0,0 +1,70 @@
+ # Step 5: Verify Setup
+
+ Run each command to confirm everything works.
+
+ ## Verification Steps
+
+ ### 1. Build
+
+ ```bash
+ make build
+ ```
+
+ Expected: exits 0, minimal output (or single success message).
+
+ If it fails, fix the underlying issue—missing dependencies, syntax errors, etc.
+
+ ### 2. Test
+
+ ```bash
+ make test
+ ```
+
+ Expected: exits 0 with "All tests passed" or similar.
+
+ If tests fail, that's fine—report them to the user but the setup itself is working.
+
+ ### 3. Dev Server
+
+ ```bash
+ make dev
+ ```
+
+ Expected: prints port and confirms it's running in background.
+
+ Then verify it's actually running:
+
+ ```bash
+ make dev
+ ```
+
+ Expected: prints "Already running on port XXXX".
+
+ ### 4. Stop
+
+ ```bash
+ make stop
+ ```
+
+ Expected: prints "Stopped" or similar.
+
+ Then verify it actually stopped:
+
+ ```bash
+ make dev
+ ```
+
+ Expected: starts fresh (not "already running").
+
+ ### 5. agent.log
+
+ Check that `agent.log` exists and contains server output.
+
+ ## Report Results
+
+ Tell the user:
+ - Which commands succeeded
+ - Any issues encountered
+ - Whether the project is ready to use
+
+ If everything passed, the setup is complete.
package/src/skills/project-setup/templates/quality-gate.cjs ADDED
@@ -0,0 +1,132 @@
+ #!/usr/bin/env node
+ /**
+  * Quality gate stop hook.
+  * Runs make build when the agent stops. Blocks if build fails (up to 3 retries).
+  *
+  * Adapt the `checks` array for your project's language/tooling.
+  */
+ const { execSync } = require('child_process');
+ const fs = require('fs');
+ const path = require('path');
+
+ const PROJECT_ROOT = process.cwd();
+ const MAX_RETRIES = 3;
+ const RETRY_FILE = path.join(PROJECT_ROOT, '.claude', 'hooks', '.quality-gate-retries');
+
+ function approve() {
+   cleanupRetryFile();
+   console.log(JSON.stringify({ decision: 'approve' }));
+   process.exit(0);
+ }
+
+ function block(reason) {
+   console.log(JSON.stringify({ decision: 'block', reason }));
+   process.exit(0);
+ }
+
+ function cleanupRetryFile() {
+   try {
+     if (fs.existsSync(RETRY_FILE)) {
+       fs.unlinkSync(RETRY_FILE);
+     }
+   } catch (e) {
+     // Ignore cleanup errors
+   }
+ }
+
+ function getRetryCount() {
+   try {
+     if (fs.existsSync(RETRY_FILE)) {
+       return parseInt(fs.readFileSync(RETRY_FILE, 'utf8').trim(), 10) || 0;
+     }
+   } catch (e) {
+     // Ignore read errors
+   }
+   return 0;
+ }
+
+ function incrementRetryCount() {
+   const count = getRetryCount() + 1;
+   const dir = path.dirname(RETRY_FILE);
+   if (!fs.existsSync(dir)) {
+     fs.mkdirSync(dir, { recursive: true });
+   }
+   fs.writeFileSync(RETRY_FILE, String(count));
+   return count;
+ }
+
+ // Strip ANSI color codes
+ function stripAnsi(str) {
+   return str.replace(/\x1B\[[0-9;]*[a-zA-Z]/g, '');
+ }
+
+ // Extract key error lines from output
+ function extractErrors(output, maxLines = 10) {
+   const clean = stripAnsi(output);
+   const lines = clean.split('\n');
+
+   const errorLines = lines.filter(line =>
+     line.toLowerCase().includes('error') ||
+     line.includes('FAIL') ||
+     line.includes('failed') ||
+     line.match(/:\d+:\d+/) // file:line:col pattern
+   );
+
+   return errorLines.slice(0, maxLines);
+ }
+
+ async function main() {
+   try {
+     // Read hook input from stdin
+     let input = '';
+     for await (const chunk of process.stdin) {
+       input += chunk;
+     }
+
+     JSON.parse(input); // Validate JSON (content not needed)
+
+     // Check retry count
+     const retries = getRetryCount();
+     if (retries >= MAX_RETRIES) {
+       // Max retries reached, approve to prevent infinite loop
+       approve();
+       return;
+     }
+
+     // Run make build
+     try {
+       execSync('make build', {
+         cwd: PROJECT_ROOT,
+         encoding: 'utf8',
+         stdio: 'pipe',
+         timeout: 120000
+       });
+
+       // Build passed
+       approve();
+     } catch (error) {
+       const output = (error.stdout || '') + '\n' + (error.stderr || '');
+       const errors = extractErrors(output);
+
+       // Increment retry count
+       const currentRetry = incrementRetryCount();
+
+       let message = `Build failed (attempt ${currentRetry}/${MAX_RETRIES})`;
+       if (errors.length > 0) {
+         message += ':\n' + errors.map(e => `  ${e}`).join('\n');
+       }
+
+       if (currentRetry >= MAX_RETRIES) {
+         message += '\n\nMax retries reached. Fix manually and run `make build`.';
+         // Still block this time, but next time will approve
+       }
+
+       block(message);
+     }
+   } catch (e) {
+     // Parse error or other issue - approve to fail open
+     approve();
+   }
+ }
+
+ main();
package/src/skills/spec-interview/SKILL.md CHANGED
@@ -6,6 +6,23 @@ argument-hint: <spec-name>
 
  # Spec Interview
 
+ ## Context Hygiene
+
+ **IMPORTANT:** During planning, protect the context window. Never write code. Never search, grep, or read files directly.
+
+ Use Explorer subagents for ALL codebase research:
+ - Explorer uses a faster, cheaper model
+ - Explorer works better with focused tasks
+ - Explorer returns only relevant findings, keeping your context clean
+
+ **Layered approach:**
+ 1. First: One Explorer for broad understanding of a system
+ 2. Then: Multiple Explorers in parallel for deep dives on specifics
+
+ Spin up as many Explorers as needed. There is no downside to parallel subagents.
+
+ **Why this matters:** Search results and file contents that aren't directly relevant cause context rot, degrading planning quality. Subagents curate information before it enters your context.
+
  ## What To Do Now
 
  If an argument was provided, use it as the feature name. Otherwise, ask what feature to spec out.
package/src/skills/spec-interview/references/step-1-opening.md CHANGED
@@ -4,7 +4,7 @@ Establish understanding of the feature before diving into details.
 
  ## Opening Questions
 
- Ask one or two questions at a time. Follow up on anything unclear.
+ Use AskUserQuestion to gather information. Ask one or two questions at a time. Follow up on anything unclear.
 
  Start with:
  - What problem does this feature solve?
@@ -16,6 +16,6 @@ Then explore:
 
  ## When to Move On
 
- Move to `references/step-2-deep-dive.md` when:
+ Move to `references/step-2-ui-ux.md` when:
  - The core problem and user goal are clear
  - Success criteria are understood at a high level
package/src/skills/spec-interview/references/step-2-ui-ux.md ADDED
@@ -0,0 +1,73 @@
+ # Step 2: UI/UX Design
+
+ If the feature has no user interface, skip to `references/step-3-deep-dive.md`.
+
+ ## Determine Design Direction
+
+ Before any wireframes, establish the visual approach. Use AskUserQuestion to confirm:
+
+ **Product context:**
+ - What does this product need to feel like?
+ - Who uses it? (Power users want density, occasional users want guidance)
+ - What's the emotional job? (Trust, efficiency, delight, focus)
+
+ **Design direction options:**
+ - Precision & Density — tight spacing, monochrome, information-forward (Linear, Raycast)
+ - Warmth & Approachability — generous spacing, soft shadows, friendly (Notion, Coda)
+ - Sophistication & Trust — cool tones, layered depth, financial gravitas (Stripe, Mercury)
+ - Boldness & Clarity — high contrast, dramatic negative space (Vercel)
+ - Utility & Function — muted palette, functional density (GitHub)
+
+ **Color foundation:**
+ - Warm (creams, warm grays) — approachable, human
+ - Cool (slate, blue-gray) — professional, serious
+ - Pure neutrals (true grays) — minimal, technical
+
+ **Layout approach:**
+ - Dense grids for scanning/comparing
+ - Generous spacing for focused tasks
+ - Sidebar navigation for multi-section apps
+ - Split panels for list-detail patterns
+
+ Use AskUserQuestion to present 2-3 options and get the user's preference.
+
+ ## Create ASCII Wireframes
+
+ Sketch the interface in ASCII. Keep it rough—this is for alignment, not pixel precision.
+
+ ```
+ Example:
+ ┌─────────────────────────────────────────┐
+ │ Page Title                   [Action ▾] │
+ ├──────────┬──────────────────────────────┤
+ │ Nav Item │ Content Area                 │
+ │ Nav Item │ ┌─────────────────────────┐  │
+ │ Nav Item │ │ Component               │  │
+ │          │ └─────────────────────────┘  │
+ └──────────┴──────────────────────────────┘
+ ```
+
+ Create wireframes for:
+ - Primary screen(s) the user will interact with
+ - Key states (empty, loading, error, populated)
+ - Any modals or secondary views
+
+ Present each wireframe to the user. Use AskUserQuestion to confirm or iterate.
+
+ ## Map User Flows
+
+ For each primary action, document the interaction sequence:
+
+ 1. Where does the user start?
+ 2. What do they click/type?
+ 3. What feedback do they see?
+ 4. Where do they end up?
+
+ Format as simple numbered steps under each flow name.
+
+ ## When to Move On
+
+ Proceed to `references/step-3-deep-dive.md` when:
+ - Design direction is agreed upon
+ - Wireframes exist for primary screens
+ - User has confirmed the layout approach
package/src/skills/spec-interview/references/{step-2-deep-dive.md → step-3-deep-dive.md} CHANGED
@@ -1,7 +1,9 @@
- # Step 2: Deep Dive
+ # Step 3: Deep Dive
 
  Cover all specification areas through conversation. Update `docs/specs/<name>/spec.md` incrementally as information emerges.
 
+ Use AskUserQuestion whenever requirements are ambiguous or multiple approaches exist. Present options with tradeoffs and get explicit decisions.
+
  ## Areas to Cover
 
  ### Intent & Goals
@@ -13,7 +15,17 @@ Cover all specification areas through conversation. Update `docs/specs/<name>/sp
  - External services, APIs, or libraries
  - Data flows in and out
 
- Spawn exploration subagents to investigate the codebase when integration questions arise. They return only relevant findings.
+ **IMPORTANT:** Use Explorer subagents for all codebase investigation. Never search or read files directly.
+
+ Layered approach:
+ 1. First Explorer: "How does [system] work at a high level?"
+ 2. Parallel Explorers: Deep dive into specific components identified in step 1
+
+ Example: To understand auth integration:
+ - Explorer 1: "How does authentication work in this codebase?"
+ - Then parallel: "How are auth tokens validated?", "Where is the user session stored?", "What middleware handles protected routes?"
+
+ No assumptions. If you don't know how something works, send an Explorer to find out.
 
  ### Data Model
  - Entities and relationships
@@ -74,4 +86,4 @@ Write to `docs/specs/<name>/spec.md` with this structure:
 
  ## When to Move On
 
- Move to `references/step-3-research-needs.md` when all areas have been covered and the spec document is substantially complete.
+ Move to `references/step-4-research-needs.md` when all areas have been covered and the spec document is substantially complete.
package/src/skills/spec-interview/references/{step-3-research-needs.md → step-4-research-needs.md} CHANGED
@@ -1,4 +1,4 @@
- # Step 3: Identify Research Needs
+ # Step 4: Identify Research Needs
 
  Before finalizing, determine if implementation requires unfamiliar paradigms.
 
@@ -12,10 +12,16 @@ This is not about whether Claude knows how to do something in general. It's abou
 
  Review the spec's integration points, data model, and behavior sections.
 
- For each significant implementation element:
- 1. Search the codebase for existing examples of this pattern
- 2. If found this paradigm is established, no research needed
- 3. If not found this is a new paradigm requiring research
+ **IMPORTANT:** Use Explorer subagents to check for existing patterns. Never search directly.
+
+ For each significant implementation element, spawn an Explorer:
+ - "Does this codebase have an existing example of [pattern]? If yes, where and how does it work?"
+
+ Spin up multiple Explorers in parallel for different patterns.
+
+ Based on Explorer findings:
+ - If pattern exists → paradigm is established, no research needed
+ - If not found → this is a new paradigm requiring research
 
  Examples of "new paradigm" triggers:
  - Using a library not yet in the project
@@ -27,18 +33,18 @@ Examples of "new paradigm" triggers:
 
  For each new paradigm identified:
  1. State what needs research and why (no existing example found)
- 2. Ask the user if they want to proceed with research, or if they have existing knowledge to share
+ 2. Use AskUserQuestion to ask if they want to proceed with research, or if they have existing knowledge to share
  3. If proceeding, invoke the `research` skill for that topic
 
  Wait for research to complete before continuing. The research output goes to `docs/research/` and informs implementation.
 
  ## If No Research Needed
 
- State that all paradigms have existing examples in the codebase. Proceed to `references/step-4-finalize.md`.
+ State that all paradigms have existing examples in the codebase. Proceed to `references/step-5-verification.md`.
 
  ## When to Move On
 
- Proceed to `references/step-4-finalize.md` when:
+ Proceed to `references/step-5-verification.md` when:
  - All new paradigms have been researched, OR
  - User confirmed no research is needed, OR
  - All patterns have existing codebase examples
@@ -0,0 +1,74 @@
+ # Step 5: Verification Planning
+
+ Every acceptance criterion needs a specific, executable verification method. The goal: autonomous implementation with zero ambiguity about whether something works.
+
+ ## Verification Methods
+
+ ### UI Verification: agent-browser
+
+ For any criterion involving visual output or user interaction, use Vercel's agent-browser CLI:
+
+ ```
+ agent-browser open <url>          # Navigate to page
+ agent-browser snapshot            # Get accessibility tree with refs
+ agent-browser click @ref          # Click element by ref
+ agent-browser fill @ref "value"   # Fill input by ref
+ agent-browser get text @ref       # Read text content
+ agent-browser screenshot file.png # Capture visual state
+ agent-browser close               # Close browser
+ ```
+
+ Example verification for "Dashboard shows signup count":
+ 1. `agent-browser open /admin`
+ 2. `agent-browser snapshot`
+ 3. `agent-browser get text @signup-count`
+ 4. Assert the returned value is a number
+
+ ### Automated Tests
+
+ For logic, data, and API behavior, specify the exact test:
+ - Unit tests for pure functions
+ - Integration tests for API endpoints
+ - End-to-end tests for critical flows
+
+ Include the exact command and test file path: `pnpm test src/convex/featureFlags.test.ts`
+
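To make "specify the exact test" concrete: a criterion like "unknown flags default to off" should map to a small, runnable unit test. The helper below is a hypothetical sketch — the function name and flag shape are assumptions, not the project's real API:

```typescript
// Hypothetical pure helper standing in for the project's real flag logic.
function isFlagEnabled(flags: Record<string, boolean>, name: string): boolean {
  return flags[name] ?? false; // unknown flags default to off
}

// The criterion becomes an exact check instead of a vague claim:
const known = isFlagEnabled({ beta: true }, "beta");       // true
const unknown = isFlagEnabled({ beta: true }, "signupV2"); // false
```

The spec would then cite the command that runs the real version of this check, such as `pnpm test src/convex/featureFlags.test.ts`.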
+ ### Database/State Verification
+
+ For data persistence criteria:
+ 1. Perform the action
+ 2. Query the database directly
+ 3. Assert expected state
+
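The act → query → assert loop can be sketched as a generic helper. Everything here is illustrative — the in-memory `Map` stands in for a real database read, and `verifyPersisted` is not an existing project utility:

```typescript
// Illustrative persistence check: perform the action, read the state back, compare.
function verifyPersisted<T>(
  performAction: () => void, // 1. Perform the action (e.g. the mutation under test)
  queryState: () => T,       // 2. Query the database directly
  expected: T,               // the state the criterion requires
): boolean {
  performAction();
  const actual = queryState(); // 3. Assert expected state
  return JSON.stringify(actual) === JSON.stringify(expected);
}

// Demo with an in-memory store standing in for the database:
const store = new Map<string, boolean>();
const persisted = verifyPersisted(
  () => { store.set("flag:beta", true); },
  () => store.get("flag:beta") ?? false,
  true,
); // true when the toggle was actually written
```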
+ ### Manual Verification (Fallback)
+
+ If no automated method exists, document exactly what to check. Flag these as candidates for future automation.
+
+ ## Update Each Acceptance Criterion
+
+ Review every acceptance criterion in the spec. Add a verification method using this format:
+
+ ```markdown
+ ## Acceptance Criteria
+
+ - [ ] Dashboard loads in under 2s
+   **Verify:** `agent-browser open /admin`, measure time until the snapshot is ready
+
+ - [ ] Flag toggles persist across refresh
+   **Verify:** `pnpm test src/convex/featureFlags.test.ts` (toggle persistence test)
+
+ - [ ] Signup chart shows accurate counts
+   **Verify:** `agent-browser get text @chart-total`, compare to `npx convex run users:count`
+ ```
+
+ ## Confirm With User
+
+ Use AskUserQuestion to review verification methods with the user:
+ - "For [criterion], I'll verify by [method]. Does that prove it works?"
+ - Flag any criteria where verification seems insufficient
+
+ The standard: if the agent executes the verification and it passes, the feature is done. No human checking required.
+
+ ## When to Move On
+
+ Proceed to `references/step-6-finalize.md` when every acceptance criterion has a verification method and the user agrees each method proves the criterion works.
@@ -0,0 +1,52 @@
+ # Step 6: Finalize
+
+ Review the spec for completeness and soundness, then hand off.
+
+ ## Run Both Reviews
+
+ Invoke both skills in parallel, specifying the spec path:
+ - `spec-review` — checks completeness, format, and implementation readiness
+ - `spec-sanity-check` — checks logic, assumptions, and unconsidered scenarios
+
+ Both return findings to you. They do not modify the spec directly.
+
+ ## Curate the Findings
+
+ Synthesize the findings from both reviews. Findings may be:
+ - Critical issues that must be addressed
+ - Valid suggestions worth considering
+ - Pedantic or irrelevant items to skip
+
+ For each finding, form a recommendation: address it or skip it, and why.
+
+ ## Walk Through With User
+
+ Use AskUserQuestion to present findings in batches (2-3 at a time). For each finding:
+ - State what the review found
+ - Give your recommendation (always include a recommended option)
+ - Let the user decide: fix, skip, or something else
+
+ Track two lists:
+ - **Addressed**: findings the user chose to fix
+ - **Intentionally skipped**: findings the user chose to ignore
+
+ After walking through all findings, make the approved changes to the spec.
+
+ ## Offer Another Pass
+
+ Use AskUserQuestion: "Do you want to run the reviews again?"
+
+ If yes, invoke both reviews again with additional context:
+ - "We already ran a review. These changes were made: [list]. These findings were intentionally skipped: [list]. Look for anything new we haven't considered."
+
+ Repeat the curate → walk through → offer another pass cycle until the user is satisfied.
+
+ ## Complete the Interview
+
+ Once the user confirms no more review passes are needed:
+
+ 1. Show the user the final spec
+ 2. Use AskUserQuestion to confirm they are satisfied
+ 3. Ask if they want to proceed to task breakdown
+
+ If yes, invoke `spec-to-tasks` and specify which spec to break down.
@@ -11,8 +11,10 @@ context: fork
 
  1. **Find the spec** - Use the path from the prompt if provided. Otherwise, find the most recently modified file in `docs/specs/`. If no specs exist, inform the user and stop.
  2. **Read the spec file**
- 3. **Evaluate against the checklist below**
- 4. **Return structured feedback using the output format**
+ 3. **Find all CLAUDE.md files** - Search for every CLAUDE.md in the project (root and subdirectories)
+ 4. **Read all CLAUDE.md files** - These contain project constraints and conventions
+ 5. **Evaluate against the checklist below** - Including CLAUDE.md alignment
+ 6. **Return structured feedback using the output format**
 
  ## Completeness Checklist
 
@@ -25,22 +27,33 @@ A spec is implementation-ready when ALL of these are satisfied:
  - [ ] **Integration points mapped** - What existing code this touches is documented
  - [ ] **Core behavior specified** - Main flows are spelled out step by step
  - [ ] **Acceptance criteria exist** - Testable requirements are listed
+ - [ ] **Verification methods defined** - Every acceptance criterion has a specific way to verify it (test command, agent-browser steps, or explicit check)
+ - [ ] **No ambiguities** - Nothing requires interpretation; all requirements are explicit
+ - [ ] **No unknowns** - All information needed for implementation is present; nothing left to discover
+ - [ ] **CLAUDE.md alignment** - Spec does not conflict with constraints in any CLAUDE.md file
 
  ### Should Have (Gaps that cause implementation friction)
 
  - [ ] **Edge cases covered** - Error conditions and boundaries are addressed
  - [ ] **External dependencies documented** - APIs, libraries, services are listed
  - [ ] **Blockers section exists** - Missing credentials, pending decisions are called out
+ - [ ] **UI/UX wireframes exist** - If the feature has a user interface, ASCII wireframes are present
+ - [ ] **Design direction documented** - If the feature has UI, the visual approach is explicit (not assumed)
 
  ### Implementation Readiness
 
- The test: could someone implement this feature completely hands-off, with zero questions?
+ The test: could an agent implement this feature with ZERO assumptions? If the agent would need to guess, interpret, or discover anything, the spec is not ready.
 
  Flag these problems:
  - Vague language ("should handle errors appropriately" — HOW?)
  - Missing details ("integrates with auth" — WHERE? HOW?)
  - Unstated assumptions ("uses the standard pattern" — WHICH pattern?)
  - Blocking dependencies ("needs API access" — DO WE HAVE IT?)
+ - Unverifiable criteria ("dashboard works correctly" — HOW DO WE CHECK?)
+ - Missing verification ("loads fast" — WHAT COMMAND PROVES IT?)
+ - Implicit knowledge ("depends on how X works" — SPECIFY IT)
+ - Unverified claims ("the API returns..." — HAS THIS BEEN CONFIRMED?)
+ - CLAUDE.md conflicts (spec proposes X but CLAUDE.md requires Y — WHICH IS IT?)
 
  ## Output Format
 
@@ -54,6 +67,9 @@ Return the review as:
  ### Missing (Blocking)
  - [Item]: [What's missing and why it blocks implementation]
 
+ ### CLAUDE.md Conflicts
+ - [Constraint from CLAUDE.md]: [How the spec conflicts with it]
+
  ### Gaps (Non-blocking but should address)
  - [Item]: [What's unclear or incomplete]
 
@@ -0,0 +1,79 @@
+ ---
+ name: spec-sanity-check
+ description: This skill should be used alongside spec-review to catch logic gaps and incorrect assumptions. Invoked when the user says "sanity check this spec", "does this plan make sense", or "what am I missing". Also auto-invoked by spec-interview during finalization.
+ argument-hint: <spec-path>
+ context: fork
+ ---
+
+ # Spec Sanity Check
+
+ Provide a "fresh eyes" review of the spec. This is different from spec-review — you're not checking format or completeness. You're checking whether the plan will actually work.
+
+ ## Find the Spec
+
+ Use the path from the prompt if provided. Otherwise, find the most recently modified file in `docs/specs/`. If no specs exist, inform the user and stop.
+
+ ## Read and Understand
+
+ Read the entire spec. Understand what is being built and how.
+
+ ## Ask These Questions
+
+ For each section of the spec, challenge it:
+
+ ### Logic Gaps
+ - Does the described flow actually work end-to-end?
+ - Are there steps that assume a previous step succeeded without checking?
+ - Are there circular dependencies?
+ - Does the order of operations make sense?
+
+ ### Incorrect Assumptions
+ - Are there assumptions about how existing systems work that might be wrong?
+ - Are there assumptions about external APIs, libraries, or services?
+ - Are there assumptions about data formats or availability?
+ - Use Explorer subagents to verify assumptions against the actual codebase
+
+ ### Unconsidered Scenarios
+ - What happens in edge cases not explicitly covered?
+ - What happens under load or at scale?
+ - What happens if external dependencies fail?
+ - What happens if data is malformed or missing?
+
+ ### Implementation Pitfalls
+ - Are there common bugs this approach would likely introduce?
+ - Are there security implications not addressed?
+ - Are there performance implications not addressed?
+ - Are there race conditions or timing issues?
+
+ ### The "What If" Test
+ - What if [key assumption] is wrong?
+ - What if [external dependency] changes?
+ - What if [data volume] is 10x what we expect?
+
+ ## Output Format
+
+ Return findings as:
+
+ ```
+ ## Sanity Check: [Feature Name]
+
+ ### Status: [SOUND | CONCERNS]
+
+ ### Logic Issues
+ - [Issue]: [Why this is a problem]
+
+ ### Questionable Assumptions
+ - [Assumption]: [Why this might be wrong] [How to verify]
+
+ ### Unconsidered Scenarios
+ - [Scenario]: [What could go wrong]
+
+ ### Potential Pitfalls
+ - [Pitfall]: [How to avoid it]
+
+ ### Recommendation
+ [Either "Plan is sound" or the specific concerns to address]
+ ```
+
+ **SOUND**: No significant concerns found.
+ **CONCERNS**: Issues that should be addressed before implementation.
@@ -1,19 +0,0 @@
- # Step 4: Finalize
-
- Review the spec for completeness and hand off.
-
- ## Review for Gaps
-
- Invoke the `spec-review` skill, specifying which spec to review. It analyzes the spec and returns feedback.
-
- If gaps are found, ask follow-up questions to address them. Repeat until review passes.
-
- ## Complete the Interview
-
- Once review passes:
-
- 1. Show the user the final spec
- 2. Confirm they are satisfied
- 3. Ask if they want to proceed to task breakdown
-
- If yes, invoke `spec-to-tasks` and specify which spec to break down.