npm - pan-wizard - Versions diffs - 2.9.0 → 3.4.1 - Mend

pan-wizard 2.9.0 → 3.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (69)

package/README.md +8 -8
package/agents/pan-conductor.md +189 -0
package/agents/pan-counterfactual.md +112 -0
package/agents/pan-debugger.md +15 -1
package/agents/pan-document_code.md +21 -0
package/agents/pan-executor.md +16 -0
package/agents/pan-hardener.md +113 -0
package/agents/pan-integration-checker.md +2 -0
package/agents/pan-knowledge.md +81 -0
package/agents/pan-meta-reviewer.md +91 -0
package/agents/pan-plan-checker.md +2 -0
package/agents/pan-previewer.md +98 -0
package/agents/pan-project-researcher.md +4 -4
package/agents/pan-reviewer.md +2 -0
package/agents/pan-verifier.md +2 -0
package/bin/install-lib.cjs +197 -0
package/bin/install.js +1999 -1959
package/commands/pan/assumptions.md +38 -3
package/commands/pan/audit-deployment.md +6 -0
package/commands/pan/cost.md +132 -0
package/commands/pan/debug.md +71 -2
package/commands/pan/exec-phase.md +105 -0
package/commands/pan/focus-auto.md +199 -18
package/commands/pan/focus-design.md +67 -2
package/commands/pan/focus-exec.md +178 -47
package/commands/pan/focus-scan.md +17 -5
package/commands/pan/knowledge.md +129 -0
package/commands/pan/map-codebase.md +47 -6
package/commands/pan/mcp-bridge.md +145 -0
package/commands/pan/milestone-audit.md +23 -0
package/commands/pan/new-project.md +64 -0
package/commands/pan/pause.md +42 -1
package/commands/pan/plan-phase.md +95 -0
package/commands/pan/preview.md +114 -0
package/commands/pan/profile.md +37 -0
package/commands/pan/quick.md +15 -0
package/commands/pan/resume.md +62 -2
package/commands/pan/review-deep.md +128 -0
package/commands/pan/verify-phase.md +53 -0
package/commands/pan/what-if.md +146 -0
package/hooks/dist/pan-cost-logger.js +102 -0
package/hooks/dist/pan-statusline.js +154 -108
package/package.json +1 -1
package/pan-wizard-core/bin/lib/bridge.cjs +269 -0
package/pan-wizard-core/bin/lib/bus.cjs +251 -0
package/pan-wizard-core/bin/lib/codebase.cjs +118 -0
package/pan-wizard-core/bin/lib/constants.cjs +42 -1
package/pan-wizard-core/bin/lib/context-budget.cjs +27 -0
package/pan-wizard-core/bin/lib/core.cjs +91 -6
package/pan-wizard-core/bin/lib/cost.cjs +359 -0
package/pan-wizard-core/bin/lib/focus.cjs +105 -2
package/pan-wizard-core/bin/lib/init.cjs +5 -5
package/pan-wizard-core/bin/lib/knowledge.cjs +331 -0
package/pan-wizard-core/bin/lib/memory.cjs +252 -0
package/pan-wizard-core/bin/lib/phase.cjs +40 -13
package/pan-wizard-core/bin/lib/preview.cjs +480 -0
package/pan-wizard-core/bin/lib/review-deep.cjs +280 -0
package/pan-wizard-core/bin/lib/roadmap.cjs +4 -4
package/pan-wizard-core/bin/lib/state.cjs +2 -2
package/pan-wizard-core/bin/lib/verify.cjs +34 -1
package/pan-wizard-core/bin/lib/whatif.cjs +289 -0
package/pan-wizard-core/bin/pan-tools.cjs +239 -4
package/pan-wizard-core/templates/playbook.md +53 -0
package/pan-wizard-core/templates/preview-report.md +93 -0
package/pan-wizard-core/templates/roadmap.md +24 -24
package/pan-wizard-core/templates/state.md +12 -9
package/pan-wizard-core/workflows/plan-phase.md +1 -1
package/scripts/build-hooks.js +2 -1
package/scripts/generate-skills-docs.py +560 -0

package/commands/pan/focus-auto.md CHANGED

@@ -18,11 +18,11 @@ Run purpose-driven improvement campaigns with a single command. The auto-runner
 **ADR:** ADR-0015 | **Heritage:** execplan budget + PanMonty categories + focus-exec pipeline
-## CRITICAL: Project Scope Boundary
+## Project Scope Boundary
-This command runs improvement campaigns on the **host project's source code** — NOT on PAN Wizard's own infrastructure.
+This command runs improvement campaigns on the **host project's source code** — not on PAN Wizard's own infrastructure.
-**ALWAYS EXCLUDE these directories from scanning and execution:**
+**Exclude these directories from scanning and execution:**
 - `.claude/`, `.github/copilot-instructions.md`, `.opencode/`, `.gemini/`, `.codex/` — PAN runtime directories
 - Any `pan-wizard-core/`, `pan-tools`, agent `.md`, or command `.md` files within PAN runtime directories
@@ -30,6 +30,18 @@ This command runs improvement campaigns on the **host project's source code**
 ---
+<completion_contract>
+A campaign is complete when ANY stop condition is met:
+1. Max cycles reached (--max-cycles, default 10)
+2. Total budget exhausted (--total-budget, default 200)
+3. Scan returns zero items for the selected category
+4. Context window drops below 25% (CRITICAL threshold)
+5. User sends /pan:focus-auto --stop
+6. Category-specific completion (e.g., prompts_remaining === 0)
+Each cycle is complete when: scan → plan → exec → commit succeeds, OR a safety harness triggers and the cycle is cleanly aborted with state preserved.
+</completion_contract>
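Reviewer sketch: the completion contract above amounts to an ordered check run between cycles. The following is a minimal illustration, not pan-wizard's implementation. The `run` object and all of its field names are hypothetical, and only the labels `max_cycles`, `prompts_complete`, and `null` appear elsewhere in this diff; the other labels are invented here.

```javascript
// Hypothetical run-state shape; every field name is an assumption for illustration.
// Defaults mirror the contract: --max-cycles 10, --total-budget 200, 25% context floor.
function firstStopCondition(run) {
  const maxCycles = run.maxCycles ?? 10;
  const totalBudget = run.totalBudget ?? 200;

  if (run.cyclesDone >= maxCycles) return "max_cycles";
  if (run.pointsSpent >= totalBudget) return "budget_exhausted"; // invented label
  if (run.lastScanItemCount === 0) return "scan_empty"; // invented label
  if (run.contextRemainingPct < 25) return "context_critical"; // invented label
  if (run.stopRequested) return "user_stop"; // invented label
  if (run.category === "prompts" && run.promptsRemaining === 0) return "prompts_complete";
  return null; // no stop condition met: continue to the next cycle
}
```

Because the contract says ANY condition ends the campaign, the order of the checks only affects which reason is reported, not whether the campaign stops.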
 ## FIRST ACTION — Category Selection (if no --category argument)
 If `$ARGUMENTS` does NOT contain `--category`, you MUST ask the user before doing anything else.
@@ -45,8 +57,9 @@ Which category should this auto campaign focus on?
 4. **features** — Roadmap items, new capabilities (P3-P5)
 5. **docs** — Stale documentation, missing command descriptions (P5-P6)
 6. **optimize** — Performance bottlenecks, redundant computation, robustness hardening (P1-P4)
+7. **prompts** — Execute micro-prompt documents sequentially, or generate them from specs (P0-P6)
-Reply with a number (1-6) or category name.
+Reply with a number (1-7) or category name.
 ```
 **After the user replies, map their response to a category name:**
@@ -56,8 +69,9 @@ Reply with a number (1-6) or category name.
 - "4" or "features" → SELECTED_CATEGORY = features
 - "5" or "docs" → SELECTED_CATEGORY = docs
 - "6" or "optimize" → SELECTED_CATEGORY = optimize
+- "7" or "prompts" → SELECTED_CATEGORY = prompts
-**Do NOT proceed past this point without a category. Do NOT guess. Do NOT pick a default. STOP and wait for the user's reply.**
+Wait for the user's reply before proceeding. Do not guess or pick a default category.
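Reviewer sketch: the reply mapping above could be expressed as a small lookup. This is an illustration only. Positions 4 to 7 come from the mapping list in this hunk; positions 1 to 3 are an assumption taken from the order of the `--category` flag values, since those lines are not visible in the diff. The function name `parseCategoryReply` is hypothetical.

```javascript
// Positions 4-7 are confirmed by the mapping list; positions 1-3 are ASSUMED
// from the order of the --category flag values (not visible in this diff).
const CATEGORIES = ["cleanup", "tests", "stability", "features", "docs", "optimize", "prompts"];

// Returns the selected category, or null when the reply is not recognized.
// Returning null instead of a fallback matches "do not guess or pick a default".
function parseCategoryReply(reply) {
  const text = String(reply).trim().toLowerCase();
  if (/^[1-7]$/.test(text)) return CATEGORIES[Number(text) - 1];
  return CATEGORIES.includes(text) ? text : null;
}
```

A `null` result would mean asking the user again rather than proceeding.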
 ## AUTONOMY RULES (apply AFTER category is selected)
@@ -75,7 +89,7 @@ Reply with a number (1-6) or category name.
 | Flag | Default | Description |
 |------|---------|-------------|
-| `--category` | null (all) | cleanup, tests, stability, features, docs |
+| `--category` | null (all) | cleanup, tests, stability, features, docs, optimize, prompts |
 | `--mode` | category-dependent | bugfix, balanced, features, full |
 | `--budget` | category-dependent | Points per cycle (5-100) |
 | `--max-cycles` | 10 | Maximum iterations (1-50) |
@@ -95,6 +109,7 @@ Reply with a number (1-6) or category name.
 | features | P3-P5 | features | 50 |
 | docs | P5-P6 | balanced | 30 |
 | optimize | P1-P4 | balanced | 50 |
+| prompts | P0-P6 | balanced | 100 |
 ## Pipeline
@@ -122,6 +137,20 @@ Reply with a number (1-6) or category name.
 4. Run `git status` to verify clean working tree (warn if dirty, don't block)
 5. Create safety tag: `git tag -f focus-auto-baseline`
+<phase_dependencies>
+Phase 0 → Phase 1: Init MUST succeed before baseline (state tracking requires valid run)
+Phase 1 → Phase 2: Baseline MUST be captured before main loop (regression circuit breaker needs it)
+Phase 2 (each cycle): Scan → Plan → Exec → Commit is strictly sequential within a cycle
+  - Scan MUST complete before plan (plan needs scan items)
+  - Plan MUST complete before exec (exec needs batch file)
+  - Exec MUST complete and tests pass before commit (never commit broken code)
+HARD STOP conditions:
+- Phase 1 fails (tests broken): Do not enter main loop — report and exit
+- Any cycle: test count drops below baseline after revert → stop campaign, preserve state
+- Context drops below 25%: stop campaign cleanly (safety harness 3)
+</phase_dependencies>
 ### Phase 2: Main Loop
 **For each cycle (1 to max_cycles), execute Steps 2.1 through 2.5 without stopping:**
@@ -144,6 +173,9 @@ Perform a deep codebase scan to find actionable work items with evidence.
   - **features:** roadmap items not yet implemented, README promises without backing code
   - **docs:** stale documentation, missing command descriptions
   - **optimize:** N+1 operations (file I/O / network calls inside loops), redundant re-computation (`JSON.parse`/`stringify` of same data), synchronous blocking in async modules (`readFileSync`/`execSync` alongside async exports), algorithmic complexity (nested `.find()`/`.filter()` in loops creating O(n²)+), unnecessary allocations in hot paths (spread in loops, string concat vs `join()`), regex construction inside loops (should be hoisted), unbounded collection growth (`.push()` without size limits), swallowed errors (`catch {}` / `catch { /* */ }`), suboptimal data structures (array `.includes()` where Set is better), dead assignments, unguarded property access on nullable values (`.length`/`.split()`/`.match()[0]` without null check)
+  - **prompts:** Two operational modes — detect which applies:
+    - **Execute mode:** Find micro-prompt documents (`.md` files containing ordered prompt blocks, e.g., `## Prompt 1`, `## Prompt 2`, or numbered checklist items `- [ ] Prompt: ...`). Look in `.planning/`, project root, and `docs/` for files matching patterns: `*prompts*`, `*micro-prompt*`, `*prompt-plan*`, `*prompt-sequence*`. Each unchecked/incomplete prompt block is one work item.
+    - **Generate mode:** Find specification documents (files matching `*spec*`, `*prd*`, `*requirements*`, `*feature*` in `.planning/`, `docs/specs/`, project root) that do NOT already have a corresponding micro-prompt document. Each spec needing decomposition is one work item.
 **Optimize category: convergent re-scan.** On cycles 2+, cross-reference scan findings against previous cycle completions (`cycles[].items` in auto-run state). Only pick genuinely new items — skip IDs already completed or failed. If the count of new findings drops AND cycle efficiency drops below 30% of the prior cycle's, this signals convergence and the `diminishing_returns` stop condition fires.
 - Use the Agent tool with Explore subagent for thorough analysis if needed
@@ -226,6 +258,11 @@ Implement each item from the batch created in Step 2.2. Record `tests_before` by
 6. Run the project's test suite
 7. Pass = DONE | Fail = investigate (15 min max), then revert, mark FAILED
+**Error Recovery Classification:**
+- RECOVERABLE (retry up to 3 times): test failure after code change, build syntax error, file not found (search for moved path)
+- UNRECOVERABLE (mark FAILED, move to next item): same failure after 3 retries, permission errors, state corruption, unrelated test regression
+A failed item never blocks subsequent items.
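Reviewer sketch: one way to read the classification above is as a bounded retry loop per item, with no early exit from the batch. This is a minimal illustration under stated assumptions: `attempt` is a caller-supplied function, the error-kind strings are hypothetical labels, and "retry up to 3 times" is read as one initial attempt plus at most three retries.

```javascript
const MAX_RETRIES = 3;

// Error kinds the section lists as unrecoverable; the string labels are hypothetical.
const UNRECOVERABLE = new Set(["permission", "state_corruption", "unrelated_regression"]);

// `attempt(item)` is an assumption of this sketch: it returns
// { ok: true } on success or { ok: false, kind: "<error kind>" } on failure.
function runBatch(items, attempt) {
  const results = [];
  for (const item of items) {
    let status = "FAILED";
    let tries = 0;
    // One initial attempt plus up to MAX_RETRIES retries.
    while (tries <= MAX_RETRIES) {
      tries += 1;
      const outcome = attempt(item);
      if (outcome.ok) {
        status = "DONE";
        break;
      }
      if (UNRECOVERABLE.has(outcome.kind)) break; // mark FAILED without retrying
    }
    results.push({ item, status, tries });
    // No early return here: a failed item never blocks subsequent items.
  }
  return results;
}
```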
 **After all items in the batch:**
 1. Run full test suite — ALL tests must pass
 2. Record `tests_after` from the summary line
@@ -244,12 +281,43 @@ Check the response for stop conditions:
 - `max_cycles`: Maximum iterations reached — go to Phase 3
 - `zero_completed`: No items completed in this cycle — go to Phase 3
 - `diminishing_returns`: Optimize only — cycle efficiency < 30% of previous cycle — go to Phase 3
+- `prompts_complete`: Prompts only — all prompts in document executed — go to Phase 3
 - `null`: Continue to next cycle
-#### Step 2.5: Inter-Cycle Check
+#### Step 2.5: Inter-Cycle Context Management
+Between cycles, manage context to prevent quality degradation over long campaigns:
+- **KEEP:** Current cycle goals, test baseline, error states, active file paths
+- **SUMMARIZE:** Previous cycle results to a one-line summary each
+- **DISCARD:** Raw tool output from previous cycles, superseded scan results
 Display one-line cycle summary: `Cycle N/M | X/Y pts | Z items done | Tests: A -> B`
+#### Step 2.5a: Reflection Gate (Opus 4.7 thinking-capable models only)
+Before committing to the next cycle, call the reflection helper:
+```
+echo '{"run": <run-state>, "cycle": <just-completed-cycle>, "batch": <proposed-next-batch>, "tier": "reasoning"}' \
+  | pan-tools focus reflection
+```
+The helper returns `{reflect: true, prompt: "..."}` when the current model tier supports extended thinking. If `reflect: true`, think through the prompt — which asks whether running another cycle is worthwhile given telemetry and remaining items — and respond with JSON: `{"continue": true|false, "rationale": "..."}`.
+- If `continue: false`: stop the campaign and treat as a user-reason stop (preserve state, skip to Phase 3).
+- If `continue: true`: proceed to the next cycle.
+If the helper returns `reflect: false` (tier doesn't support thinking, or `reflection_enabled: false` in run state, or no next batch): skip this step silently and continue to the next cycle.
+The reflection gate catches "zero progress" or "wrong category" drift earlier than the automatic stop rules.
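Reviewer sketch: the branching described for the reflection gate can be summarized as a small decision function. This is an illustration, not the helper's code. The return labels `next_cycle` and `phase_3` are invented here, and treating an unparseable model reply as "continue" is an assumption, since the diff does not say what should happen in that case.

```javascript
// `helperResult` is the parsed JSON printed by the reflection helper;
// `modelReply` is the model's JSON answer as a string (undefined when no reflection ran).
// The return labels "next_cycle" and "phase_3" are invented for this sketch.
function afterReflection(helperResult, modelReply) {
  // reflect: false (or no result) means skip the gate silently.
  if (!helperResult || helperResult.reflect !== true) return "next_cycle";

  let decision;
  try {
    decision = JSON.parse(modelReply);
  } catch {
    return "next_cycle"; // ASSUMPTION: an unparseable reply does not stop the campaign
  }
  // Only an explicit `continue: false` stops the campaign and skips to Phase 3.
  return decision?.continue === false ? "phase_3" : "next_cycle";
}
```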
+**Attention anchor — emit after every cycle summary:**
+```
+Remaining: {cycles_left} cycles | {budget_remaining}/{total_budget} pts | Safety: {active_harness_warnings}
+Next: Cycle {N+1} — Scan → Plan → Exec → Commit
+```
+This prevents lost-in-the-middle drift in 10+ cycle campaigns where the agent forgets budget limits or stop conditions.
 Then continue immediately to the next cycle (back to Step 2.1).
 ### Phase 3: Campaign End
@@ -294,20 +362,133 @@ Then continue immediately to the next cycle (back to Step 2.1).
 7. **Verify Understanding** — State understanding for M+ items before coding.
 8. **Preserve Tests** — Never change test expectations to match broken code.
 9. **Accurate Commits** — Only claim verified items in commit messages. Include actual test counts.
+10. **Vary Similar Fixes** — When 3+ items in a cycle share the same fix pattern (e.g., "add null check"), re-read each module's conventions before applying. The same pattern may need different implementations in different modules. Check after the 3rd fix whether a shared helper would be better than scattered copies.
+## Prompts Category — Execution Details
+The prompts category operates in two distinct modes. Detect which mode applies during the scan phase based on what the scan finds.
+### Execute Mode (micro-prompt document found)
+A micro-prompt document contains an ordered sequence of self-contained implementation prompts. Each prompt describes a single, testable change.
+**Document format recognized:**
+```markdown
+# Micro-Prompts: <Feature Name>
+Source: <spec file or description>
+Generated: <date>
+## Prompt 1: <title>
+- [ ] Complete
+<implementation instructions>
+### Expected outcome
+<what should work after this prompt>
+### Test
+<how to verify>
+---
+## Prompt 2: <title>
+- [ ] Complete
+...
+```
+Alternative format — checklist style:
+```markdown
+- [ ] Prompt 1: <title> — <instructions>
+- [ ] Prompt 2: <title> — <instructions>
+```
+**Execution strategy:**
+1. Read the micro-prompt document, identify all prompt blocks
+2. Find the first uncompleted prompt (unchecked `- [ ]`)
+3. Execute that prompt's instructions — implement the code changes described
+4. Run the project's test suite (or the prompt-specific test if given)
+5. If tests pass: mark the prompt as complete (`- [x]`), commit, move to next prompt
+6. If tests fail: one fix attempt, then revert and mark prompt as FAILED, move to next prompt
+7. Each prompt = one batch item. Budget: 1 prompt per cycle unless prompt is trivial (XS)
+8. Record `prompts_remaining` count in cycle update — when 0, `prompts_complete` stop fires
+**Key rules:**
+- Execute prompts in document order — NEVER skip ahead or reorder
+- Each prompt is atomic — commit after each successful prompt
+- A failed prompt does NOT block subsequent prompts (mark failed, continue)
+- The prompt document is the plan — do not re-plan or expand scope beyond what each prompt says
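Reviewer sketch: steps 1, 2, and 8 of the execution strategy above (identify prompt blocks, find the first unchecked one, count `prompts_remaining`) could look roughly like this for the heading-style format. This is a minimal illustration; the function names are hypothetical, the checklist-style alternative is not handled, and a block with no checkbox line is treated as incomplete.

```javascript
// Parses the heading-style format: "## Prompt N: <title>" followed by a
// "- [ ] Complete" or "- [x] Complete" line somewhere in the same block.
function parsePrompts(markdown) {
  const prompts = [];
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.match(/^## Prompt (\d+): (.*)$/);
    if (heading) {
      current = { number: Number(heading[1]), title: heading[2].trim(), complete: false };
      prompts.push(current);
      continue;
    }
    const box = line.match(/^- \[([ xX])\] Complete\s*$/);
    if (box && current) current.complete = box[1] !== " ";
  }
  return prompts;
}

// First unchecked prompt in document order (never reordered), plus the remaining count.
function nextPrompt(markdown) {
  const pending = parsePrompts(markdown).filter((p) => !p.complete);
  return { next: pending[0] ?? null, promptsRemaining: pending.length };
}
```

The diff does not specify how a FAILED prompt is recorded in the document, so this sketch does not track failures; a real implementation would need some marker to avoid selecting the same failed prompt again.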
+### Generate Mode (spec found without corresponding prompt document)
+When a specification document is found that doesn't have a matching micro-prompt document, decompose it into ordered prompts.
+**Generation strategy:**
+1. Read the spec document thoroughly
+2. Identify all discrete implementation steps
+3. Order steps by dependency — foundation first, features that depend on earlier steps later
+4. For each step, write a prompt block containing:
+   - Clear title describing the change
+   - Implementation instructions (files to create/modify, logic to implement)
+   - Expected outcome (what should work after this prompt)
+   - Test instruction (how to verify the prompt succeeded)
+5. Write the micro-prompt document to `.planning/prompts/<spec-slug>-prompts.md`
+6. Each generated document = one batch item (typically M or L size)
+**Decomposition heuristics:**
+- One prompt per logical unit of work (one function, one API endpoint, one component)
+- Each prompt should be independently testable
+- Prompts should be 5-30 minutes of implementation work each
+- Aim for 5-20 prompts per spec (split large specs, combine trivial items)
+- Include a "Prompt 0: Project setup" if the spec requires new dependencies or scaffolding
+- Include a final "Prompt N: Integration test" that verifies the full feature end-to-end
+**After generation:** The document is written and committed. The next cycle will detect it in execute mode and begin executing prompts sequentially.
+<failure_pattern_capture>
+When the same failure pattern appears in 2+ items within a campaign, capture it for future runs.
+**Detection:** After marking an item FAILED, check if the error classification matches any previous failure in this campaign:
+- Same error type (e.g., "test regression in unrelated module")
+- Same file or module involved
+- Same root cause category (e.g., "missing null check pattern", "import path mismatch")
+**Capture (when pattern repeats):**
+Append to `.planning/focus/failure-patterns.md`:
+```markdown
+## Pattern: {short description}
+- **First seen:** Cycle {N}, Item {ID}
+- **Recurrence:** Cycle {M}, Item {ID2}
+- **Error type:** {classification}
+- **Root cause:** {what actually went wrong}
+- **Avoidance rule:** {what to check before attempting similar items}
+- **Files involved:** {paths}
+```
+**Use (on subsequent cycles):**
+Before executing an item, check if its target files or error category match a known failure pattern. If so:
+- Apply the avoidance rule BEFORE implementing
+- If the pattern suggests the item will fail (e.g., "all items touching module X regress"), skip with reason "matches known failure pattern — defer to manual investigation"
+This prevents the campaign from burning budget on items that will predictably fail.
+</failure_pattern_capture>
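Reviewer sketch: the detection and capture steps above can be illustrated with two small functions. This is not pan-wizard's code. The failure-record fields (`cycle`, `itemId`, `errorType`, `rootCause`, `files`) are hypothetical, and treating a match on any one of the three listed criteria as a repeat is an assumption about how "matches any previous failure" should be read.

```javascript
// Failure record shape (ASSUMED): { cycle, itemId, errorType, rootCause, files: [] }.
// A repeat is any earlier failure sharing an error type, a root cause, or a file.
function findRepeatedFailure(previousFailures, failure) {
  const match = previousFailures.find(
    (prev) =>
      prev.errorType === failure.errorType ||
      prev.rootCause === failure.rootCause ||
      prev.files.some((file) => failure.files.includes(file))
  );
  return match ?? null;
}

// Renders one entry in the failure-patterns.md layout shown above.
function formatPatternEntry(description, first, repeat, avoidanceRule) {
  const files = [...new Set([...first.files, ...repeat.files])];
  return [
    `## Pattern: ${description}`,
    `- **First seen:** Cycle ${first.cycle}, Item ${first.itemId}`,
    `- **Recurrence:** Cycle ${repeat.cycle}, Item ${repeat.itemId}`,
    `- **Error type:** ${repeat.errorType}`,
    `- **Root cause:** ${repeat.rootCause}`,
    `- **Avoidance rule:** ${avoidanceRule}`,
    `- **Files involved:** ${files.join(", ")}`,
  ].join("\n");
}
```

The returned string would then be appended to `.planning/focus/failure-patterns.md`; the file write is left out to keep the sketch free of side effects.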
 ## NEVER DO
-- Invoke the Skill tool (no `/pan:focus-scan`, `/pan:focus-plan`, `/pan:focus-exec`)
-- Stop or pause between phases (unless a safety harness triggers)
-- Ask the user questions after category selection (Phases 1-3 are fully autonomous)
-- Skip the baseline test capture (Phase 1)
-- Continue after a test regression (circuit breaker is mandatory)
-- Expand scope beyond what the scan found
-- Run more cycles than --max-cycles
-- Spend more points than --total-budget
-- Skip recording cycle results via --update
-- Change test expectations to match broken code
-- Use `git add -A` or `git add .` — stage specific files only
+- Invoke the Skill tool — scan/plan/exec must run inline so state stays coherent across cycles
+- Stop or pause between phases — interruptions break the autonomous loop and lose cycle momentum
+- Ask the user questions after category selection — the whole point is autonomous execution; questions defeat that
+- Skip the baseline test capture — without a baseline, the regression circuit breaker has nothing to compare against
+- Continue after a test regression — a test count decrease means code was broken; continuing compounds the damage
+- Expand scope beyond what the scan found — scope creep in an autonomous loop compounds unpredictably across cycles
+- Run more cycles than --max-cycles — the limit exists to cap total cost and prevent runaway loops
+- Spend more points than --total-budget — the budget cap is the user's cost control mechanism
+- Skip recording cycle results via --update — unrecorded cycles break resume, status, and stop-condition checks
+- Change test expectations to match broken code — this hides bugs instead of fixing them
+- Use `git add -A` or `git add .` — bulk staging can accidentally commit secrets, build artifacts, or unrelated changes
 ## ALWAYS DO

package/commands/pan/focus-design.md CHANGED

@@ -41,9 +41,74 @@ If you find yourself analyzing PAN command files, agent definitions, or `pan-too
 ---
-## MANDATORY: Complete All Phases For Selected Mode
+## Tool Selection Priority
-When `/pan:focus-design` is invoked, execute ALL phases for the selected mode automatically. Do NOT stop to ask questions between phases. Do NOT skip phases beyond what the mode specifies. Complete the FULL investigation and produce all output artifacts. The only permitted pause is the Strategy Gate in Phase 3 (if the user passed `--gate`).
+Use the simplest sufficient tool for each research operation:
+1. **Grep/Glob** — for finding patterns and files in the local codebase
+2. **Read** — for examining specific files identified by Grep/Glob
+3. **Bash** — for git history, test runs, build commands
+4. **Agent (subagent)** — for broad exploration spanning many files (>5 reads)
+5. **WebSearch/WebFetch** — for external research after local sources are exhausted
+6. **mcp__context7__*** — for library documentation lookups
+Prefer local evidence over web research. Start with the codebase, then broaden.
+## Context Management Across Phases
+This pipeline spans 10+ phases. Manage context to maintain quality:
+- **KEEP:** Problem statement, success criteria, key architectural decisions, file paths being designed for
+- **SUMMARIZE:** Research findings (compress to key takeaways after each research phase), competitive analysis results
+- **DISCARD:** Raw web fetch content after extracting relevant data, superseded design drafts
+- After Phase 3 (Strategic Analysis), summarize all findings from Phases 0-3 into a compact brief before entering design phases
+**Progressive context loading — load only what the current phase needs:**
+| Phase | What to Load | What NOT to Load Yet |
+|-------|-------------|---------------------|
+| 0. Problem | User's feature description, project README | Implementation details, test files |
+| 1. Landscape | Web search results, competitor docs | Project internals |
+| 2. Codebase | Relevant source files (Glob→Grep→Read) | Unrelated modules, full test suite |
+| 3. Strategic | Findings from 0-2 (summarized) | Raw web content (discard after summary) |
+| 4-6. Design | Architecture files, key modules, API surface | Test implementation details |
+| 7-8. Spec | Design decisions from 4-6, test patterns | Research raw data (long gone) |
+| 9. Output | Spec template, ADR template | Everything else (already in spec) |
+**Why:** A 10-phase pipeline that loads everything in Phase 0 exhausts context by Phase 5. Each phase loads only its inputs, summarizes its outputs, and discards its raw data.
+---
+## Reasoning Protocol
+For research and analysis phases (0, 1, 2, 3), follow observe-think-act:
+1. **OBSERVE** — State what you found (code patterns, competitive data, user needs)
+2. **THINK** — Reason about what this means for the design
+3. **ACT** — Record the finding and move to the next investigation step
+This keeps research structured and prevents rabbit holes.
+## Meta-Prompting: Self-Generated Investigation Strategy
+Before starting Phase 0, generate your own investigation plan based on the feature description:
+```
+Given: "{feature description}"
+My investigation strategy:
+1. What is the core problem? → {how I'll validate it}
+2. Who are the competitors? → {what to search for}
+3. What codebase areas are affected? → {what to Glob/Grep for}
+4. What are the likely architectural constraints? → {what to read}
+5. What risks should I watch for? → {security, performance, compatibility}
+6. What is the ideal output format? → {spec structure for this feature type}
+```
+This self-generated strategy adapts to the specific feature rather than following a generic checklist. A "add caching layer" feature needs different investigation than "add OAuth provider" — the meta-prompt captures that difference upfront.
+**After Phase 3, regenerate:** The strategy may need revision based on what you've learned. Update it before entering design phases.
+---
+## Complete All Phases For Selected Mode
+When `/pan:focus-design` is invoked, execute all phases for the selected mode automatically. Do not stop to ask questions between phases or skip phases beyond what the mode specifies. Complete the full investigation and produce all output artifacts. The only permitted pause is the Strategy Gate in Phase 3 (if the user passed `--gate`).
 **Modes (mutually exclusive — pick one, default `--full`):**