npm - pan-wizard - Versions diffs - 2.8.1 → 2.9.1 - Mend

pan-wizard 2.8.1 → 2.9.1


Files changed (35)

package/README.md +4 -2
package/bin/install.js +23 -0
package/commands/pan/assumptions.md +38 -3
package/commands/pan/audit-deployment.md +6 -0
package/commands/pan/debug.md +71 -2
package/commands/pan/exec-phase.md +90 -0
package/commands/pan/focus-auto.md +181 -18
package/commands/pan/focus-design.md +302 -14
package/commands/pan/focus-doc-audit.md +530 -0
package/commands/pan/focus-drift-walking.md +525 -0
package/commands/pan/focus-exec.md +168 -46
package/commands/pan/focus-plan.md +204 -12
package/commands/pan/focus-scan.md +17 -5
package/commands/pan/map-codebase.md +32 -6
package/commands/pan/milestone-audit.md +23 -0
package/commands/pan/new-project.md +64 -0
package/commands/pan/pause.md +42 -1
package/commands/pan/plan-phase.md +84 -0
package/commands/pan/profile.md +2 -1
package/commands/pan/quick.md +15 -0
package/commands/pan/resume.md +62 -2
package/commands/pan/verify-phase.md +42 -0
package/package.json +1 -1
package/pan-wizard-core/bin/lib/commands.cjs +29 -7
package/pan-wizard-core/bin/lib/config.cjs +10 -0
package/pan-wizard-core/bin/lib/constants.cjs +3 -1
package/pan-wizard-core/bin/lib/core.cjs +168 -21
package/pan-wizard-core/bin/lib/focus.cjs +5 -0
package/pan-wizard-core/bin/lib/verify.cjs +283 -4
package/pan-wizard-core/bin/pan-tools.cjs +11 -2
package/pan-wizard-core/references/model-profiles.md +191 -62
package/pan-wizard-core/workflows/help.md +11 -1
package/pan-wizard-core/workflows/profile.md +8 -1
package/pan-wizard-core/workflows/settings.md +14 -0
package/scripts/generate-skills-docs.py +560 -0

package/commands/pan/focus-exec.md CHANGED

@@ -18,6 +18,18 @@ Execute items from the current focus batch with capacity-based sizing, full sess
 **Goal:** One-command pipeline that starts a session, loads the planned batch, implements items with tier-based execution protocols, verifies the work, syncs documentation, and closes the session cleanly.
+<completion_contract>
+Execution is complete when ALL conditions are met:
+1. All batch items processed (each marked DONE or FAILED with reason)
+2. Full test suite passes with count >= Stage 1 baseline
+3. Stage 6 pre-commit checklist passes (all 6 checks)
+4. Commit created listing only VERIFIED items
+5. Session recorded with before/after test counts and budget usage
+6. Active scan file updated with item statuses
+Execution FAILS if: test baseline cannot be established (Stage 1), or test count drops below baseline after all reverts.
+</completion_contract>
 ---
 ## Pipeline Overview
@@ -46,13 +58,33 @@ Execute items from the current focus batch with capacity-based sizing, full sess
     - Commit, record session, generate summary
 ```
+<action_gating>
+Each stage has a restricted set of appropriate actions. Using the wrong tool at the wrong stage causes regressions.
+| Stage | Read | Grep/Glob | Edit/Write | Bash (tests) | Bash (git) |
+|-------|------|-----------|------------|--------------|------------|
+| 1. Session Start | YES | YES | NO | YES | YES |
+| 2. Batch Loading | YES | YES | NO | NO | NO |
+| 3. Execution | YES | YES | YES | YES | NO |
+| 4. Verification | YES | YES | NO | YES | NO |
+| 5. Doc Sync | YES | YES | YES | NO | NO |
+| 6. Session End | YES | NO | YES | NO | YES |
+**Key constraints:**
+- Stage 1: NO Edit/Write — you are establishing baseline, not changing code
+- Stage 2: Read-only — validating the batch, not modifying anything
+- Stage 4: NO Edit/Write — you are verifying work, not doing more work. If tests fail, go back to Stage 3 to fix.
+- Stage 5: Edit docs only — no code changes during doc sync
+- Stage 6: Git operations + session recording only — all work must be done
+</action_gating>
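
The action-gating table above can be read as a lookup from stage number to permitted action classes. The following is a minimal sketch under that reading; the dictionary layout, the action labels (`read`, `grep_glob`, `edit_write`, `bash_tests`, `bash_git`), and the function name `is_allowed` are illustrative assumptions, not part of PAN Wizard.

```python
# Allowed action classes per stage, transcribed from the action-gating table.
# The labels and the function below are hypothetical, for illustration only.
ALLOWED_ACTIONS = {
    1: {"read", "grep_glob", "bash_tests", "bash_git"},    # Session Start
    2: {"read", "grep_glob"},                              # Batch Loading
    3: {"read", "grep_glob", "edit_write", "bash_tests"},  # Execution
    4: {"read", "grep_glob", "bash_tests"},                # Verification
    5: {"read", "grep_glob", "edit_write"},                # Doc Sync
    6: {"read", "edit_write", "bash_git"},                 # Session End
}


def is_allowed(stage: int, action: str) -> bool:
    """Return True if the action class is permitted at the given stage."""
    return action in ALLOWED_ACTIONS.get(stage, set())
```

With this shape, `is_allowed(4, "edit_write")` is false, which mirrors the Stage 4 constraint that failing tests send the work back to Stage 3 rather than being patched during verification.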
 ---
-## CRITICAL: Project Scope Boundary
+## Project Scope Boundary
-This command executes work on the **host project's source code** — NOT on PAN Wizard's own infrastructure.
+This command executes work on the **host project's source code** — not on PAN Wizard's own infrastructure.
-**NEVER read, modify, or "fix" files in these PAN directories:**
+**Do not read, modify, or fix files in these PAN directories:**
 - `.claude/`, `.github/copilot-instructions.md`, `.opencode/`, `.gemini/`, `.codex/` — PAN runtime directories
 - Any `pan-wizard-core/`, `pan-tools`, agent `.md`, or command `.md` files within PAN runtime directories
@@ -60,9 +92,22 @@ This command executes work on the **host project's source code** — NOT on PAN
 ---
-## MANDATORY: Execute ALL Stages Sequentially
+## Execute All Stages Sequentially
+When `/pan:focus-exec` is invoked, run all 6 stages in order. Do not skip stages or stop between them unless tests regress.
+<stage_dependencies>
+Stage 1 → Stage 2: Baseline MUST exist before batch loads (regression detection requires it)
+Stage 2 → Stage 3: Batch MUST be validated before execution begins (prevents working on stale/empty batches)
+Stage 3 → Stage 4: All items MUST be processed before verification (partial verification produces false confidence)
+Stage 4 → Stage 5: Tests MUST pass before doc sync (don't document broken code)
+Stage 5 → Stage 6: Docs MUST be updated before commit (commit captures the complete state)
-When `/pan:focus-exec` is invoked, run ALL 6 stages in order. Do NOT skip stages. Do NOT stop between stages unless a critical failure occurs (tests regress).
+HARD STOP conditions (do not proceed to next stage):
+- Stage 1: Test suite fails → fix tests before proceeding
+- Stage 2: No batch file found → tell user to run /pan:focus-plan
+- Stage 4: Test count below baseline → revert last changes, re-verify
+</stage_dependencies>
 **Flags:**
 - `--budget N` — Override capacity budget in points (default: 50, min: 5, max: 100)
@@ -86,34 +131,54 @@ When `/pan:focus-exec` is invoked, run ALL 6 stages in order. Do NOT skip stages
 ---
-## AI Behavioral Rules (ALL 9 MANDATORY)
+## AI Behavioral Rules
-### Rule 1: Read Before You Write (MANDATORY)
-Before changing ANY file, read it first. Understand context, callers, and invariants.
+### Rule 1: Read Before You Write
+Before changing any file, read it first. Understand context, callers, and invariants.
-### Rule 2: Understand the Root Cause (MANDATORY)
-Do NOT apply surface-level patches. Trace the code path, identify the actual defect.
+**Violation example:**
+```
+BAD:  Rename parameter `opts` → `options` in utils.cjs without reading callers
+      → 3 callers in api.cjs, workers.cjs break silently
+GOOD: Grep for "utils\." → read all 3 callers → confirm param name is safe to change → edit
+```
-### Rule 3: One Change, One Test (MANDATORY)
+### Rule 2: Understand the Root Cause
+Do not apply surface-level patches. Trace the code path, identify the actual defect.
+**Violation example:**
+```
+BAD:  Test fails with "Cannot read property 'name' of undefined"
+      → Add `if (!obj) return null` at the crash site
+      → Root cause: caller passes wrong argument order — still broken
+GOOD: Trace the call chain → find caller passes (id, name) but function expects (name, id) → fix caller
+```
+### Rule 3: One Change, One Test
 Every code change must be tested before moving to the next item.
 Test cadence by tier:
 - **MICRO (XS/S):** Run specific test after implementing. Batch up to 3 independent items before smoke.
-- **STANDARD (M):** Full test suite after EACH item.
-- **FULL (L/XL):** Build hooks + full test suite after EACH item.
-### Rule 4: Don't Invent — Follow the Plan (MANDATORY)
-Implement exactly what the batch says. No scope creep.
-### Rule 5: Cross-Platform Awareness (MANDATORY)
+- **STANDARD (M):** Full test suite after each item.
+- **FULL (L/XL):** Build hooks + full test suite after each item.
+### Rule 4: Don't Invent — Follow the Plan
+Implement exactly what the batch says. Do not:
+- Add features not in the batch item
+- Refactor surrounding code that isn't broken
+- Add comments or docstrings to unchanged files
+- Create abstractions for one-time operations
+- Add error handling for scenarios that cannot happen
+### Rule 5: Cross-Platform Awareness
 - Use platform-agnostic path APIs (no hardcoded separators)
 - Follow the project's module format conventions (discover from existing code)
 - Use file-based input for shell-sensitive content when needed
-### Rule 6: Revert Fast, Don't Dig Deep (MANDATORY)
+### Rule 6: Revert Fast, Don't Dig Deep
 If a fix doesn't work within 5 minutes, revert and move on. Failed items carry forward.
-### Rule 7: Verify Understanding Before Committing (MANDATORY)
+### Rule 7: Verify Understanding Before Coding
 For M/L/XL items, state your understanding before writing code:
 ```
 Item P2-3 — Add tests for billing module
@@ -123,11 +188,19 @@ Files: billing.ts, tests/billing.test.ts
 Confidence: HIGH
 ```
-### Rule 8: Preserve Existing Test Expectations (MANDATORY)
+### Rule 8: Preserve Existing Test Expectations
 Never change an existing test's expected output to match broken code.
-### Rule 9: Commit Messages Must Be Accurate (MANDATORY)
-List ONLY items that are actually VERIFIED (passed tests). Include actual test counts.
+### Rule 9: Commit Messages Must Be Accurate
+List only items that are verified (passed tests). Include actual test counts.
+### Rule 10: Vary Approach for Similar Items
+When a batch contains 3+ items of the same type (e.g., "add null check to X", "add null check to Y"), deliberately vary your approach to avoid tunnel vision:
+- Item 1: Fix as planned
+- Item 2: Before fixing, re-read the module's error handling pattern — does the same fix apply or does this module handle errors differently?
+- Item 3+: Check if the first fixes introduced a pattern that should be extracted (shared helper) or if each case is genuinely independent
+This catches emergent interactions: 5 "add try-catch" fixes might reveal the module needs a centralized error boundary, not 5 scattered try-catches.
 ---
@@ -185,9 +258,10 @@ Display the execution batch to user, then continue automatically.
 ```
 1. STATE UNDERSTANDING (Rule 7)
 2. READ target files + test files
-3. IMPLEMENT across necessary files
-4. TEST — full test suite
-5. CONFIRM — pass -> DONE | regresses -> REVERT -> FAILED
+3. STATE INTENT — "I will modify [files], adding [what], to achieve [goal]"
+4. IMPLEMENT across necessary files
+5. TEST — full test suite
+6. CONFIRM — pass -> DONE | regresses -> REVERT -> FAILED
 ```
 #### FULL Items (L/XL)
@@ -195,20 +269,55 @@ Display the execution batch to user, then continue automatically.
 1. STATE UNDERSTANDING (detailed)
 2. READ WIDELY — target files, callers, tests, related code
 3. DESIGN — outline approach before coding
-4. IMPLEMENT in logical chunks
-5. BUILD — build hooks if hooks changed
-6. TEST — full test suite
-7. CONFIRM — all pass -> DONE | fail -> investigate (15 min max) -> REVERT -> FAILED
+4. STATE INTENT — "I will modify [files]. Risk: [what could break]"
+5. IMPLEMENT in logical chunks
+6. BUILD — build hooks if hooks changed
+7. TEST — full test suite
+8. CONFIRM — all pass -> DONE | fail -> investigate (15 min max) -> REVERT -> FAILED
 ```
 ### 3.2 Failure Handling
-- Build breaks: fix typo or revert (5 min limit)
-- Test regression: identify cause, one fix attempt, else revert
-- **Never let a failed item block other items**
-### 3.3 Progress Tracking
+Classify every error before acting. The classification determines the recovery protocol.
+**RECOVERABLE (retry with analysis, max 3 attempts):**
+- Test failure after code change — read the error output, fix the root cause, re-test
+- File not found — search for moved/renamed paths via Grep/Glob
+- Build failure from syntax error — fix the typo, rebuild
+- Merge conflict in a non-critical file — attempt auto-resolution
+**UNRECOVERABLE (halt the item, mark FAILED, move to next):**
+- Same test failure persists after 3 fix attempts — revert all changes for this item
+- Permission or auth error on a critical path — cannot proceed without user action
+- State corruption (malformed JSON in planning files) — stop, report to user
+- Persistent build failure unrelated to current item — stop execution, report
+- Test regression in unrelated code — revert, flag for investigation
+**Never let a failed item block other items.** Mark it FAILED with the error classification and move on.
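
The recovery protocol in section 3.2 could be sketched as a bounded retry loop. This is an illustration only: `ErrorKind`, `process_item`, and the `attempt` and `revert` callbacks are hypothetical names, and reverting on every failure path is a simplifying assumption (the text asks for a revert in some unrecoverable cases and a report to the user in others).

```python
from enum import Enum
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # "max 3 attempts" for recoverable errors


class ErrorKind(Enum):
    RECOVERABLE = "recoverable"      # e.g. test failure, syntax error, moved file
    UNRECOVERABLE = "unrecoverable"  # e.g. auth error, state corruption


def process_item(
    attempt: Callable[[], Optional[ErrorKind]],
    revert: Callable[[], None],
) -> str:
    """Run one batch item and return "DONE" or "FAILED".

    `attempt` (hypothetical) implements and tests the item, returning None
    on success or the classification of the error it hit.
    `revert` (hypothetical) undoes the item's changes.
    """
    for _ in range(MAX_ATTEMPTS):
        error = attempt()
        if error is None:
            return "DONE"
        if error is ErrorKind.UNRECOVERABLE:
            break  # halt the item immediately, no retry
    revert()  # retries exhausted, or an unrecoverable error occurred
    return "FAILED"
```

Because the function always returns a status instead of raising, a caller looping over the batch simply records the result and moves to the next item, so a failed item never blocks the rest.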
+### 3.3 Failure Pattern Detection
+When marking an item FAILED, check if its error matches a previous failure in this batch:
+- Same error type or root cause category
+- Same file or module involved
+If a pattern repeats (2+ items fail the same way), log it in the session record:
+```
+FAILURE PATTERN: {description} — Items {ID1}, {ID2} — Root cause: {cause}
+Suggested avoidance: {what to check before similar items}
+```
+Before executing remaining items, check if they match the pattern. If so, skip with reason "matches known failure pattern" rather than burning budget on predictable failures.
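
One way to detect a repeating failure is to group FAILED items by error type and module and keep the groups with two or more members. The sketch below assumes both conditions must match, which the text leaves open; the record keys `id`, `error_type`, and `module` are hypothetical.

```python
from collections import defaultdict


def find_failure_patterns(failures):
    """Group failed items by (error_type, module); keep groups of 2+ items.

    `failures` is a list of dicts with hypothetical keys
    "id", "error_type", and "module".
    """
    groups = defaultdict(list)
    for failure in failures:
        key = (failure["error_type"], failure["module"])
        groups[key].append(failure["id"])
    # A pattern exists only when at least two items failed the same way.
    return {key: ids for key, ids in groups.items() if len(ids) >= 2}
```

Each returned key and its item IDs supply the fields for one entry in the session record; how remaining items are matched against a pattern is left to the planner.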
+### 3.4 Progress Tracking
 Update progress tracker after each item with status and budget tracking.
+**Attention anchor — emit after each item completes:**
+```
+Item {N}/{total} {DONE|FAILED} | Budget: {used}/{budget} pts | Tests: {baseline} → {current}
+Remaining: {count} items [{IDs with sizes}]
+Next: {next item ID} — {title} ({tier})
+```
+This prevents lost-in-the-middle drift in large batches where the agent forgets budget limits or remaining items.
 ---
 ## Stage 4: Verification
@@ -254,17 +363,30 @@ Edit the active scan file:
 ## Stage 6: Session End
-### 6.1 Commit Changes
+### 6.1 Pre-Commit Verification Checklist
+Before committing, run through ALL checks. Do not commit until every check passes.
+1. Every modified file was read before editing (no blind writes)
+2. `git diff --stat` contains only files related to batch items (no stray changes)
+3. Full test suite passes — count matches or exceeds baseline from Stage 1
+4. No `TODO`, `FIXME`, or `HACK` introduced without a matching batch item tracking it
+5. Commit message lists only items that are VERIFIED (tests ran, tests passed)
+6. No secrets, credentials, or `.env` files staged
+If any check fails: fix the issue and re-run all checks. Only proceed to commit when all 6 pass.
+### 6.2 Commit Changes
 Unless `--no-commit`:
 1. Stage modified files (specific paths, not `git add -A`)
 2. Create commit with accurate message listing verified items
 3. Verify commit succeeded
-### 6.2 Record Session
+### 6.3 Record Session
 - Record session summary (items completed, tests before/after, budget used)
 - Append error patterns if any failures occurred
-### 6.3 Final Report
+### 6.4 Final Report
 ```markdown
 ## /pan:focus-exec Complete
@@ -293,15 +415,15 @@ Run `/pan:focus-scan` to regenerate the scan.
 ## NEVER DO
-- Skip reading files before editing them (Rule 1)
-- Apply symptom patches instead of root cause fixes (Rule 2)
-- Batch implement without testing between items (Rule 3)
-- Expand scope beyond the batch item (Rule 4)
-- Ignore cross-platform path issues (Rule 5)
-- Spend more than 5 minutes debugging a single failure (Rule 6)
-- Start coding without stating understanding for M+ items (Rule 7)
-- Change test expectations to match broken code (Rule 8)
-- Claim items are fixed without running tests (Rule 9)
+- Skip reading files before editing them — blind edits break callers, miss invariants, and create regressions (Rule 1)
+- Apply symptom patches instead of root cause fixes — surface patches recur and erode trust in the codebase (Rule 2)
+- Batch implement without testing between items — a silent failure in item 2 corrupts items 3-5 before you detect it (Rule 3)
+- Expand scope beyond the batch item — unplanned changes bypass the budget system and risk compounding failures (Rule 4)
+- Ignore cross-platform path issues — hardcoded separators break on Windows or vice versa (Rule 5)
+- Spend more than 5 minutes debugging a single failure — diminishing returns; revert preserves budget for remaining items (Rule 6)
+- Start coding without stating understanding for M+ items — misunderstanding the problem wastes the entire implementation (Rule 7)
+- Change test expectations to match broken code — this hides bugs instead of fixing them (Rule 8)
+- Claim items are fixed without running tests — unverified claims erode the entire verification pipeline (Rule 9)
 ## ALWAYS DO

package/commands/pan/focus-plan.md CHANGED

@@ -1,19 +1,21 @@
 ---
 name: focus-plan
 group: Focus
-description: Create capacity-budgeted work batch with 4 execution modes
+description: Create capacity-budgeted work batch with spec coverage verification and 4 execution modes
 allowed-tools:
   - Read
+  - Write
+  - Edit
   - Bash
   - Grep
   - Glob
 ---
-# /pan:focus-plan — Capacity-Budgeted Work Batch Planner
+# /pan:focus-plan — Capacity-Budgeted Work Batch Planner with Spec Coverage Verification
-Create a capacity-budgeted work batch from focus-scan results. $ARGUMENTS
+Create a capacity-budgeted work batch from focus-scan results **with mandatory verification that planned work covers all relevant spec and ADR requirements.** $ARGUMENTS
-**Goal:** Select a right-sized batch of work items that fits within the session's point budget, ordered for maximum impact with minimum risk.
+**Goal:** Select a right-sized batch of work items that (a) fits within the session's point budget, (b) is ordered for maximum impact with minimum risk, and (c) demonstrably covers the requirements from any associated specs, ADRs, and success criteria — flagging coverage gaps BEFORE execution begins.
 ---
@@ -42,10 +44,67 @@ If no recent scan exists, run `/pan:focus-scan` automatically before proceeding.
   - `full` — Full-spectrum: enhanced budget, all priorities equally weighted (60 pts)
 - `--priority P0-P6` — Only pick items from these priority tiers
 - `--lean` — Apply RS filtering: exclude items with RS < 1.5
+- `--no-spec-check` — Skip spec coverage verification (NOT recommended — use only for pure bugfix batches)
 ---
-## Capacity Budget System
+## Phase 1: Spec & ADR Discovery (MANDATORY)
+> *Before planning work, understand what has been designed and promised.*
+### 1.1 Scan for Specifications
+Search the project for feature specifications and design documents:
+- `docs/specs/*.md` or `docs/specs/**/*.md`
+- `.planning/specs/` or `.planning/designs/`
+- Any `*_featureai.md`, `*_spec.md`, `*_design.md` files
+- README sections describing planned features
+For each spec found, extract:
+| Spec File | Feature Name | Status | Requirements Count | Success Criteria Count |
+|-----------|-------------|--------|-------------------|----------------------|
+| [path] | [name] | Proposed/In Progress/Complete | [N] | [N] |
+### 1.2 Scan for ADRs
+Search for Architecture Decision Records:
+- `docs/decisions/ADR-*.md`
+- `.planning/decisions/`
+For each ADR, extract:
+| ADR | Decision | Status | Success Criteria | Implementation Tasks |
+|-----|----------|--------|-----------------|---------------------|
+| [ADR-NNNN] | [summary] | Proposed/Accepted/Implemented | [count or "none defined"] | [count or "none defined"] |
+### 1.3 Extract Requirement Inventory
+From every spec and ADR found, build a **master requirements list**:
+| Req ID | Source | Requirement | Type | Implemented? |
+|--------|--------|-------------|------|-------------|
+| SC-1 | ADR-0015 | JWT auth with 4-role RBAC | Feature | Yes/No/Partial |
+| SC-2 | spec/extraction.md | Image extraction for JPG/PNG | Feature | Yes/No/Partial |
+| T-3 | ADR-0018 §Task 6 | Unmatched description table | Task | Yes/No/Partial |
+| BRK-1 | ADR-0018 §Breaking | Hierarchy roll-up for backward compat | Migration | Yes/No/Partial |
+**Verification method for "Implemented?":**
+- Search the codebase for files, classes, functions, routes, or tests matching each requirement
+- Check if tests exist that verify the requirement
+- Mark as `Partial` if code exists but tests don't, or if the feature is stubbed
+### 1.4 Identify Unimplemented Requirements
+Filter the master list to requirements where `Implemented? = No` or `Partial`:
+| Req ID | Source | Requirement | Gap Type | Estimated Effort |
+|--------|--------|-------------|----------|-----------------|
+| SC-2 | ADR-0018 | Keyword count >= 500 | Not started | M |
+| T-6 | ADR-0018 | Unmatched description table | Not started | M |
+| BRK-1 | ADR-0018 | Hierarchy roll-up | Partial (code, no tests) | S |
+This becomes the **spec gap backlog** — items that specs/ADRs promised but the codebase doesn't deliver yet.
+---
+## Phase 2: Capacity Budget System
 | Size | Points | Per Session | Meaning |
 |------|--------|-------------|---------|
@@ -57,45 +116,178 @@ If no recent scan exists, run `/pan:focus-scan` automatically before proceeding.
 ---
-## Execution Modes
+## Phase 3: Execution Modes & Batch Selection
 ### `bugfix` — Stability-First
 - **Budget:** 40 pts
 - **Algorithm:** P0 mandatory -> P1 -> P2-P4 smallest-first
 - **Feature allocation:** None
+- **Spec coverage:** Verify P0/P1 items close spec gaps where applicable
 ### `balanced` — Mix of Fixes + Features (DEFAULT)
 - **Budget:** 50 pts
 - **Stability pass (60%):** 30 pts for P0-P2
 - **Feature pass (40%):** 20 pts for P3-P6
+- **Spec coverage:** Cross-reference feature items against spec gap backlog — prefer items that close gaps
 ### `features` — Feature-Focused Sprint
 - **Budget:** 50 pts
 - **Mandatory pass:** All P0 items
 - **Feature pass (80%):** 40 pts for P3-P5
 - **Stability pass (20%):** 10 pts for P1-P2 quick wins
+- **Spec coverage:** Feature items MUST map to spec requirements — reject unspecified feature work
 ### `full` — Full-Spectrum Marathon
 - **Budget:** 60 pts
 - **All priorities weighted equally, largest-impact-first**
+- **Spec coverage:** Full traceability — every item maps to a spec/ADR requirement or is flagged as unspecified
+### Batch Selection Algorithm
+1. Build candidate list from focus-scan results
+2. **For each candidate, attempt to map it to a spec/ADR requirement** (by keyword match, file overlap, or feature area)
+3. Score candidates: `impact_score = base_priority_score + spec_coverage_bonus`
+   - Items that close spec gaps get +2 priority bonus
+   - Items that close success criteria get +3 priority bonus
+   - Items with no spec mapping get +0 (no penalty, but no bonus)
+4. Apply mode-specific budget allocation
+5. Select items greedily by score until budget exhausted
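
The scoring and greedy selection steps above can be sketched as follows. This is a simplified reading: the mode-specific budget split from step 4 is omitted, an item that does not fit is skipped so that smaller items may still be picked, and the candidate keys (`id`, `base_score`, `points`, `closes`) are hypothetical.

```python
def select_batch(candidates, budget):
    """Greedy selection by impact score within a point budget.

    Each candidate is a dict with hypothetical keys: "id", "base_score",
    "points", and "closes" ("criterion", "gap", or None).
    """
    # Bonuses from the algorithm: +3 success criterion, +2 spec gap, +0 none.
    bonus = {"criterion": 3, "gap": 2, None: 0}
    scored = sorted(
        candidates,
        key=lambda c: c["base_score"] + bonus[c["closes"]],
        reverse=True,
    )
    batch, used = [], 0
    for item in scored:
        if used + item["points"] <= budget:  # never exceed the hard limit
            batch.append(item["id"])
            used += item["points"]
    return batch, used
```

Under this sketch, a lower-priority item that closes a success criterion can outrank a higher-priority item with no spec mapping, which is the intended effect of the bonus.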
+---
+## Phase 4: Spec Coverage Analysis (MANDATORY unless `--no-spec-check`)
+> *The most important output of focus-plan: does the batch actually deliver against what was designed?*
+### 4.1 Coverage Matrix
+For each spec/ADR requirement, show whether the batch covers it:
+| Req ID | Source | Requirement | Batch Item | Coverage |
+|--------|--------|-------------|-----------|----------|
+| SC-1 | ADR-0018 | Category count >= 65 | #3: Expand categories | COVERED |
+| SC-2 | ADR-0018 | Keyword count >= 500 | #4: Expand keywords | COVERED |
+| SC-3 | ADR-0018 | Unmatched queue API | — | **GAP** |
+| SC-4 | ADR-0018 | NCA affordability output | — | **GAP (deferred to v1)** |
+| SC-5 | ADR-0018 | No regression | #1: Run existing tests | COVERED |
+### 4.2 Coverage Score
+```
+Spec Coverage: X / Y requirements covered (Z%)
+├── Fully covered:    N items
+├── Partially covered: N items (code but no tests, or tests but incomplete)
+├── Gaps:             N items (not in batch)
+└── Deferred:         N items (explicitly deferred to future version)
+```
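
The coverage score can be computed from the per-requirement statuses in the coverage matrix. The sketch below assumes four status strings (`COVERED`, `PARTIAL`, `GAP`, `DEFERRED`) and counts only fully covered requirements toward the percentage; both choices are assumptions, since the template does not say how partial coverage is weighted.

```python
from collections import Counter


def coverage_summary(statuses):
    """Summarise coverage from a list of per-requirement status strings.

    Statuses are assumed to be "COVERED", "PARTIAL", "GAP", or "DEFERRED".
    """
    counts = Counter(statuses)
    total = len(statuses)
    covered = counts["COVERED"]
    return {
        "total": total,
        "covered": covered,
        "partial": counts["PARTIAL"],
        "gaps": counts["GAP"],
        "deferred": counts["DEFERRED"],
        # Guard against an empty requirement list.
        "percent": round(100 * covered / total) if total else 0,
    }
```

For 12 requirements with 8 covered, 2 gaps, and 2 deferred, this yields 67%, which is what would trigger or suppress the 50% warning described in section 4.3.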
+### 4.3 Gap Analysis & Justification
+For every **GAP** in the coverage matrix, provide:
+| Gap | Requirement | Why Not In This Batch | When Will It Be Addressed |
+|-----|------------|----------------------|--------------------------|
+| SC-3 | Unmatched queue API | Exceeds budget (M=4pts, only 2pts remaining) | Next batch (features mode) |
+| SC-4 | NCA affordability | Depends on SC-1 + SC-2 (must complete first) | After category expansion |
+**CRITICAL:** If the coverage score is < 50% for a spec that has `Status: In Progress`, flag this prominently:
+```
+⚠️ WARNING: Batch covers only X% of [spec name] requirements.
+   Y requirements remain unaddressed. Consider:
+   - Increasing budget (--budget N)
+   - Switching to features mode (--mode features)
+   - Breaking spec into smaller milestones
+```
+### 4.4 Dependency Verification
+Check that batch items respect dependency ordering from specs:
+| Batch Item | Depends On | Dependency In Batch? | Order Correct? |
+|-----------|-----------|---------------------|----------------|
+| #4: Keywords | #3: Categories | Yes | Yes (#3 before #4) |
+| #6: Suggestions | #5: Unmatched API | No — #5 not in batch | **BLOCKED** |
+**If any item is BLOCKED:** Either add the dependency to the batch (if budget allows) or remove the blocked item and flag it.
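
The dependency check above amounts to verifying, for each batch item, that every dependency is either already implemented or appears earlier in the batch. A minimal sketch, with hypothetical names throughout; the `MISORDERED` label is introduced here for illustration and corresponds to a "No" in the "Order Correct?" column.

```python
def check_dependencies(batch_order, depends_on, implemented=frozenset()):
    """Return {item: "OK" | "BLOCKED" | "MISORDERED"} for each batch item.

    batch_order: item IDs in planned execution order.
    depends_on:  mapping from item ID to the IDs it depends on.
    implemented: dependencies already present in the codebase.
    """
    position = {item: index for index, item in enumerate(batch_order)}
    result = {}
    for item in batch_order:
        status = "OK"
        for dep in depends_on.get(item, []):
            if dep in implemented:
                continue  # already satisfied outside the batch
            if dep not in position:
                status = "BLOCKED"  # dependency missing from the batch
                break
            if position[dep] > position[item]:
                status = "MISORDERED"  # dependency scheduled too late
        result[item] = status
    return result
```

Using the example rows from the table, `check_dependencies(["#3", "#4", "#6"], {"#4": ["#3"], "#6": ["#5"]})` marks `#6` as `BLOCKED` because `#5` is neither in the batch nor implemented.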
+### 4.5 Success Criteria Verification Plan
+For each success criterion in the batch, specify HOW it will be verified after execution:
+| SC ID | Criterion | Verification Command | Expected Result |
+|-------|-----------|---------------------|-----------------|
+| SC-1 | Category count >= 65 | `SELECT COUNT(*) FROM stx_category` | >= 65 |
+| SC-2 | Keywords >= 500 | `SELECT COUNT(*) FROM stx_keyword` | >= 500 |
+| SC-5 | No regression | `dotnet test` | All pass, count >= N |
+This becomes the post-execution checklist for `/pan:focus-exec`.
 ---
-## Output
+## Phase 5: Output
 Produce a batch file at `.planning/focus/batch-<YYYY-MM-DD>.json` via `pan-tools focus plan`:
 ```markdown
 ## Focus Batch — <date>
 **Mode:** balanced | **Budget:** 50 pts | **Allocated:** N pts
+**Specs referenced:** N specs, M ADRs
+**Spec coverage:** X/Y requirements (Z%)
+### Batch Items
+| # | ID | Title | Priority | Size | Pts | Tier | Track | Spec Req |
+|---|----|-------|----------|------|-----|------|-------|----------|
+| 1 | P0-1 | Fix crash in state cmd | P0 | S | 2 | MICRO | Stability | ADR-0005 SC-3 |
+| 2 | P2-3 | Add tests for milestone | P2 | M | 4 | STANDARD | Stability | — |
+| 3 | P3-1 | Expand category taxonomy | P3 | M | 4 | STANDARD | Feature | ADR-0018 SC-1 |
-| # | ID | Title | Priority | Size | Pts | Tier | Track |
-|---|----|-------|----------|------|-----|------|-------|
-| 1 | P0-1 | Fix crash in state cmd | P0 | S | 2 | MICRO | Stability |
-| 2 | P2-3 | Add tests for milestone | P2 | M | 4 | STANDARD | Stability |
-| 3 | P3-1 | Add --json flag to phase | P3 | M | 4 | STANDARD | Feature |
+### Spec Coverage Summary
+| Source | Total Reqs | Covered | Gaps | Deferred |
+|--------|-----------|---------|------|----------|
+| ADR-0018 | 7 | 3 | 2 | 2 |
+| spec/extraction.md | 5 | 5 | 0 | 0 |
+| **Total** | **12** | **8 (67%)** | **2** | **2** |
+### Uncovered Requirements (Gaps)
+| Req | Source | Reason | Next Batch? |
+|-----|--------|--------|-------------|
+| Unmatched queue API | ADR-0018 SC-3 | Budget exceeded | Yes — features mode |
+| NCA affordability | ADR-0018 SC-4 | Blocked by SC-1, SC-2 | After this batch |
+### Dependency Order
+```
+#1 (P0 crash fix) → independent
+#3 (categories) → #4 (keywords) → #5 (match types)
+#2 (tests) → independent
+```
+### Post-Execution Verification Checklist
+- [ ] SC-1: Category count >= 65 → `SELECT COUNT(*) FROM stx_category`
+- [ ] SC-2: Keywords >= 500 → `SELECT COUNT(*) FROM stx_keyword`
+- [ ] SC-5: All existing tests pass → `dotnet test`
 Execution Order: MICRO first, then STANDARD, then FULL
 ```
 Ready for `/pan:focus-exec`.
+---
+## NEVER DO
+- Plan a batch without checking specs and ADRs for coverage gaps
+- Include a feature item that contradicts or conflicts with an accepted ADR
+- Ignore dependency ordering defined in specs (Task A before Task B)
+- Claim 100% spec coverage without actually verifying each requirement against the codebase
+- Include blocked items (items whose dependencies are not in the batch and not yet implemented)
+- Silently drop spec requirements — every gap must be justified and scheduled
+- Plan implementation tasks that aren't traceable to a spec, ADR, scan finding, or user request
+- Exceed the capacity budget (hard limit — not "approximately")
+## ALWAYS DO
+- Discover ALL specs and ADRs before selecting batch items
+- Cross-reference every batch item against spec requirements where applicable
+- Flag coverage gaps prominently with justification and scheduling
+- Verify dependency ordering matches spec-defined task dependencies
+- Include a post-execution verification checklist with concrete commands
+- Prefer items that close spec gaps over items with no spec mapping (when priority is equal)
+- State the coverage score as a percentage in the batch header
+- Report unimplemented success criteria that aren't addressed by this batch

package/commands/pan/focus-scan.md CHANGED

@@ -17,11 +17,11 @@ Survey the project for prioritized work items with evidence-based scoring. $ARGU
 ---
-## CRITICAL: Project Scope Boundary
+## Project Scope Boundary
-This command scans the **host project's source code** for work items — NOT PAN Wizard's own infrastructure.
+This command scans the **host project's source code** for work items — not PAN Wizard's own infrastructure.
-**ALWAYS EXCLUDE these directories from scanning:**
+**Exclude these directories from scanning:**
 - `.claude/`, `.github/copilot-instructions.md`, `.opencode/`, `.gemini/`, `.codex/` — PAN runtime directories
 - `.planning/` — PAN planning state (read for context, but never report PAN planning files as "issues")
 - Any `pan-wizard-core/`, `pan-tools`, agent `.md`, or command `.md` files within PAN runtime directories
@@ -32,9 +32,21 @@ If a scan finding points to a file inside `.claude/`, `.github/`, `.opencode/`,
 ---
-## MANDATORY: Execute ALL Phases Automatically
+## Tool Selection Priority
-When `/pan:focus-scan` is invoked, execute ALL phases without stopping. Do NOT ask questions between phases. Do NOT skip phases. The output is a prioritized work list with Reality Score filtering.
+Use the simplest sufficient tool for each scanning operation:
+1. **Grep** — for finding patterns (TODO, FIXME, error-prone code) across the codebase
+2. **Glob** — for discovering files by name pattern (test files, config files, modules)
+3. **Read** — for examining specific files identified by Grep/Glob
+4. **Bash** — only for commands that dedicated tools cannot do (git log, test runners)
+Do not read entire files when Grep can find the relevant lines. Do not use Bash for searches that Grep handles.
+---
+## Execute All Phases Automatically
+When `/pan:focus-scan` is invoked, execute all phases without stopping. Do not ask questions between phases or skip phases. The output is a prioritized work list with Reality Score filtering.
 **Flags:**
 - `--focus <area>` — Weight items toward a specific area (e.g., `--focus commands`, `--focus hooks`, `--focus tests`)