npm - pan-wizard - Versions diffs - 2.9.1 → 3.5.0 - Mend

pan-wizard 2.9.1 → 3.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (75)

package/README.md +31 -9
package/agents/pan-conductor.md +189 -0
package/agents/pan-counterfactual.md +112 -0
package/agents/pan-debugger.md +15 -1
package/agents/pan-distiller.md +82 -0
package/agents/pan-document_code.md +21 -0
package/agents/pan-executor.md +16 -0
package/agents/pan-hardener.md +113 -0
package/agents/pan-integration-checker.md +2 -0
package/agents/pan-knowledge.md +81 -0
package/agents/pan-meta-reviewer.md +91 -0
package/agents/pan-optimizer.md +242 -0
package/agents/pan-plan-checker.md +2 -0
package/agents/pan-previewer.md +98 -0
package/agents/pan-project-researcher.md +4 -4
package/agents/pan-reviewer.md +2 -0
package/agents/pan-verifier.md +2 -0
package/bin/install-lib.cjs +197 -0
package/bin/install.js +2048 -1959
package/commands/pan/cost.md +132 -0
package/commands/pan/exec-phase.md +15 -0
package/commands/pan/focus-auto.md +168 -3
package/commands/pan/focus-exec.md +21 -1
package/commands/pan/focus-scan.md +6 -0
package/commands/pan/git.md +223 -0
package/commands/pan/knowledge.md +129 -0
package/commands/pan/learn.md +61 -0
package/commands/pan/map-codebase.md +15 -0
package/commands/pan/mcp-bridge.md +145 -0
package/commands/pan/milestone-done.md +9 -0
package/commands/pan/optimize.md +86 -0
package/commands/pan/plan-phase.md +11 -0
package/commands/pan/preview.md +114 -0
package/commands/pan/profile.md +37 -0
package/commands/pan/review-deep.md +128 -0
package/commands/pan/verify-phase.md +11 -0
package/commands/pan/what-if.md +146 -0
package/hooks/dist/pan-cost-logger.js +102 -0
package/hooks/dist/pan-statusline.js +154 -108
package/hooks/dist/pan-trace-logger.js +197 -0
package/package.json +1 -1
package/pan-wizard-core/bin/lib/bridge.cjs +269 -0
package/pan-wizard-core/bin/lib/bus.cjs +251 -0
package/pan-wizard-core/bin/lib/codebase.cjs +118 -0
package/pan-wizard-core/bin/lib/commands.cjs +1 -0
package/pan-wizard-core/bin/lib/constants.cjs +44 -1
package/pan-wizard-core/bin/lib/context-budget.cjs +27 -0
package/pan-wizard-core/bin/lib/core.cjs +91 -6
package/pan-wizard-core/bin/lib/cost.cjs +359 -0
package/pan-wizard-core/bin/lib/distill.cjs +510 -0
package/pan-wizard-core/bin/lib/focus.cjs +108 -3
package/pan-wizard-core/bin/lib/git.cjs +407 -0
package/pan-wizard-core/bin/lib/init.cjs +5 -5
package/pan-wizard-core/bin/lib/knowledge.cjs +331 -0
package/pan-wizard-core/bin/lib/memory.cjs +252 -0
package/pan-wizard-core/bin/lib/optimize.cjs +653 -0
package/pan-wizard-core/bin/lib/phase.cjs +40 -13
package/pan-wizard-core/bin/lib/preview.cjs +480 -0
package/pan-wizard-core/bin/lib/review-deep.cjs +280 -0
package/pan-wizard-core/bin/lib/roadmap.cjs +4 -4
package/pan-wizard-core/bin/lib/state.cjs +2 -2
package/pan-wizard-core/bin/lib/verify.cjs +34 -1
package/pan-wizard-core/bin/lib/whatif.cjs +289 -0
package/pan-wizard-core/bin/pan-tools.cjs +317 -4
package/pan-wizard-core/templates/playbook.md +53 -0
package/pan-wizard-core/templates/preview-report.md +93 -0
package/pan-wizard-core/templates/roadmap.md +24 -24
package/pan-wizard-core/templates/state.md +12 -9
package/pan-wizard-core/workflows/exec-phase.md +97 -0
package/pan-wizard-core/workflows/learn.md +91 -0
package/pan-wizard-core/workflows/optimize.md +139 -0
package/pan-wizard-core/workflows/plan-phase.md +28 -1
package/pan-wizard-core/workflows/quick.md +7 -0
package/pan-wizard-core/workflows/verify-phase.md +16 -0
package/scripts/build-hooks.js +3 -1

package/commands/pan/cost.md ADDED

@@ -0,0 +1,132 @@
+---
+name: pan:cost
+group: Observability
+description: Show token usage and estimated cost across PAN commands and agents
+argument-hint: "[report|append|clear] [--format json|table|chart] [--since YYYY-MM-DD] [--until YYYY-MM-DD]"
+allowed-tools:
+  - Read
+  - Bash
+---
+<objective>
+Report token usage and estimated cost across all PAN invocations in this project.
+Reads `.planning/metrics/tokens.jsonl` — an append-only log where each line is one call (agent or command) with token counts and model. Cost is computed from a built-in rate table (overridable via `.planning/config.json` → `cost.rates`).
+Default output is JSON for piping. Use `--format table` for human-readable tables or `--format chart` for an ASCII bar chart of daily spend.
+</objective>
+<execution_context>
+@~/.claude/pan-wizard-core/bin/lib/cost.cjs
+</execution_context>
+<subcommands>
+### `report` (default)
+Aggregate all records into totals + breakdowns by agent, command, tier, and day.
+```
+pan-tools cost report [--format json|table|chart] [--since YYYY-MM-DD] [--until YYYY-MM-DD]
+```
+**Flags:**
+- `--format` — `json` (default, for tools) | `table` (aligned text columns) | `chart` (per-day ASCII bars).
+- `--since` — ISO date lower bound (inclusive). Records without `ts` always pass.
+- `--until` — ISO date upper bound (inclusive).
+**JSON output shape:**
+```json
+{
+  "totals": {
+    "calls": 42,
+    "input_tokens": 123456,
+    "output_tokens": 4567,
+    "cache_read_tokens": 50000,
+    "cache_write_tokens": 5000,
+    "cost_usd": 2.1234,
+    "cost_unknown": 0
+  },
+  "cache_hit_rate_pct": 40.5,
+  "by_agent": { "pan-planner": { "calls": 8, "input": 50000, ... } },
+  "by_command": { ... },
+  "by_tier": { ... },
+  "by_day": { "2026-04-18": { ... } },
+  "window": { "since": null, "until": null }
+}
+```
+### `append`
+Append a single cost record. Normally called by instrumented agent spawns; users rarely invoke directly.
+```
+pan-tools cost append \
+  [--agent <name>] [--command <name>] [--model <id>] [--tier reasoning|mid|fast] \
+  [--input-tokens N] [--output-tokens N] \
+  [--cache-read-tokens N] [--cache-write-tokens N] \
+  [--phase <num>] [--session <id>]
+```
+Missing fields are stored as `null` / `0`. Cost is auto-computed when `model` or `tier` resolves to a known rate.
+### `clear`
+Delete the cost log. Useful at the start of a billing cycle.
+```
+pan-tools cost clear
+```
+</subcommands>
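The `--since` / `--until` semantics described for `report` (inclusive bounds, records without `ts` always pass) could be sketched roughly as follows. This is an illustration only, not the code in `cost.cjs`: the function name `inWindow` is hypothetical, and it assumes `ts` is an ISO-8601 string whose first ten characters are the calendar date.

```javascript
// Hypothetical sketch of the --since / --until window described above.
// Assumption: `ts` is an ISO-8601 string such as "2026-04-18T09:30:00Z".
function inWindow(record, since, until) {
  if (!record.ts) return true; // records without `ts` always pass
  const day = String(record.ts).slice(0, 10); // "YYYY-MM-DD"
  if (since && day < since) return false; // lower bound is inclusive
  if (until && day > until) return false; // upper bound is inclusive
  return true;
}

const records = [
  { ts: "2026-04-01T08:00:00Z", input_tokens: 100 },
  { ts: "2026-04-18T23:59:59Z", input_tokens: 200 },
  { ts: "2026-05-02T00:00:00Z", input_tokens: 300 },
  { input_tokens: 50 }, // no timestamp
];

const kept = records.filter((r) => inWindow(r, "2026-04-01", "2026-04-18"));
console.log(kept.length);
```

Comparing `YYYY-MM-DD` strings lexicographically gives the same ordering as comparing the dates themselves, which keeps the sketch free of time zone parsing.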
+<rate_table>
+Default rates (USD per million tokens) as of 2026-04. Override per-model in `.planning/config.json`:
+```json
+{
+  "cost": {
+    "rates": {
+      "claude-opus-4-7": { "input": 15.0, "output": 75.0, "cache_read": 1.5, "cache_write": 18.75 },
+      "my-custom-model": { "input": 1.0, "output": 2.0, "cache_read": 0.1, "cache_write": 1.25 }
+    }
+  }
+}
+```
+When a record has neither a known model nor a known tier, its cost is `null` and it counts toward `totals.cost_unknown`.
+</rate_table>
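As a worked example of the rate table, the sketch below computes the cost of one record from per-million-token rates. It is a minimal illustration under stated assumptions, not the logic in `cost.cjs`: the function name `costUsd` is hypothetical, the four token counts are assumed to be billed independently, and the tier fallback mentioned above is left out.

```javascript
// Hypothetical sketch: cost of one record from a USD-per-million-tokens table.
// Rates are copied from the config example above.
const RATES = {
  "claude-opus-4-7": { input: 15.0, output: 75.0, cache_read: 1.5, cache_write: 18.75 },
};

function costUsd(record, rates) {
  const rate = rates[record.model];
  if (!rate) return null; // unknown model: would count toward totals.cost_unknown
  const weighted =
    (record.input_tokens || 0) * rate.input +
    (record.output_tokens || 0) * rate.output +
    (record.cache_read_tokens || 0) * rate.cache_read +
    (record.cache_write_tokens || 0) * rate.cache_write;
  return weighted / 1e6; // rates are per million tokens
}

const example = {
  model: "claude-opus-4-7",
  input_tokens: 1000000,
  output_tokens: 100000,
  cache_read_tokens: 0,
  cache_write_tokens: 0,
};
console.log(costUsd(example, RATES)); // 15.00 for input + 7.50 for output = 22.5
```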
+<workflow>
+**Daily check:** run `/pan:cost --format chart` at the end of a working day to see the spend shape.
+**Before shipping:** run `/pan:cost --since 2026-04-01 --format table` to get a total for the billing period.
+**After an expensive run:** check `by_agent` and `by_command` to see which stage drove the spend.
+**To reconcile with the provider bill:** providers report total tokens; PAN's log is append-only with ISO-8601 timestamps, so set `--since` / `--until` to match the provider's billing window.
+</workflow>
+<instrumentation_note>
+Token records are written by any caller that knows its usage — typically the host runtime or a wrapper. PAN ships the log format + aggregator (this command); the capture hook itself is opt-in (Wave 5 of Spec B v2). Until then, records can be appended manually via `pan-tools cost append` or by external scripts reading the provider API.
+If `.planning/metrics/tokens.jsonl` is empty, `/pan:cost` returns zero totals — the feature is inert, not broken.
+</instrumentation_note>
+<runtime_compatibility>
+| Runtime | Support |
+|---------|---------|
+| Claude Code | Full — data format + aggregation + all output formats |
+| OpenCode | Full aggregator; token capture depends on OpenCode's own hooks |
+| Gemini | Full aggregator; token capture depends on Gemini CLI instrumentation |
+| Codex | Full aggregator; token capture via external script |
+| Copilot CLI | Full aggregator; Copilot doesn't currently expose per-call usage |
+The aggregator is runtime-agnostic. What varies across runtimes is how records *get into* `tokens.jsonl` in the first place.
+</runtime_compatibility>

package/commands/pan/exec-phase.md CHANGED

@@ -61,6 +61,8 @@ Phase: $ARGUMENTS
 - `--skip-tests` — Skip automatic test generation after execution completes.
 - `--skip-review` — Skip automatic code review after execution completes.
 - `--fast` — Skip both test generation and code review (implies `--skip-tests --skip-review`).
+- `--deep-review` (v3.4+) — After the normal reviewer step, also run `/pan:review-deep <phase>` (security audit via pan-hardener + cross-check via pan-meta-reviewer). Produces `.planning/reviews/<N>/deep-review.md`. Recommended for phases touching auth, payment, PII, migrations, or public APIs. Costs roughly 3× a normal review.
+- `--hierarchical` (v3.4+, Claude + Opus 4.7 only) — Spawn `pan-conductor` as a top-level orchestrator that decomposes the phase and spawns executor/reviewer/verifier sub-agents in sequence. Bounded by safety harness: max 2 nesting levels, 12 spawns per phase, budget ceiling, `.planning/orchestration/abort` kill-switch. On non-Claude runtimes or older models, this flag is a no-op with a warning and falls back to flat exec. Use only for large phases (≥4 autonomous plans) where wall-clock reduction justifies the ~20-30% orchestration tax.
 Context files are resolved inside the workflow via `pan-tools init execute-phase` and per-subagent `<files_to_read>` blocks.
 </context>
@@ -85,6 +87,19 @@ Each execution stage has a restricted set of appropriate actions. Using the wron
 - Wave commit: git operations only — all code changes must be done before committing
 </action_gating>
+<cache_priming>
+**Before Discovery, prime the prompt cache once per invocation.** All subagents spawned within the next 5 minutes will hit the cache instead of re-sending the full context.
+Run once:
+```
+pan-tools cache prime --summary
+```
+This returns `{blocks: [{path, bytes, cache}], total_bytes, sha}` for the cacheable set (project.md, requirements.md, roadmap.md, state.md, standards.md). The `sha` is stable across identical inputs, so repeated calls within the phase hit cached reads.
+When spawning subagents for wave execution, include the cacheable block paths in each agent's system-context so the host runtime (Claude Code with Opus 4.7) can mark them `cache_control: ephemeral`. On non-Claude runtimes or older models, this step is a no-op — nothing breaks, just no savings.
+</cache_priming>
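The property that `sha` is stable across identical inputs could be obtained along the following lines. This is only a sketch: the diff does not show how `pan-tools cache prime` computes its digest, so the SHA-256 choice, the path sorting, and the name `primeSummary` are all assumptions, and the `cache` field of each block is omitted because its meaning is not specified here.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical sketch: summarize a set of files with a digest that depends
// only on paths and contents, not on the order in which files are supplied.
function primeSummary(files) {
  const hash = createHash("sha256"); // assumed algorithm
  const blocks = [];
  let totalBytes = 0;
  for (const path of Object.keys(files).sort()) {
    const bytes = Buffer.byteLength(files[path], "utf8");
    // NUL separators keep "ab" + "c" distinct from "a" + "bc".
    hash.update(path).update("\0").update(files[path]).update("\0");
    blocks.push({ path, bytes });
    totalBytes += bytes;
  }
  return { blocks, total_bytes: totalBytes, sha: hash.digest("hex") };
}

const first = primeSummary({ "project.md": "abc", "state.md": "de" });
const second = primeSummary({ "state.md": "de", "project.md": "abc" });
console.log(first.sha === second.sha); // same inputs, different order
```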
 <process>
 Execute the execute-phase workflow from @~/.claude/pan-wizard-core/workflows/exec-phase.md end-to-end.
 Preserve all workflow gates (wave execution, checkpoint handling, verification, state updates, routing).

package/commands/pan/focus-auto.md CHANGED

@@ -58,8 +58,10 @@ Which category should this auto campaign focus on?
 5. **docs** — Stale documentation, missing command descriptions (P5-P6)
 6. **optimize** — Performance bottlenecks, redundant computation, robustness hardening (P1-P4)
 7. **prompts** — Execute micro-prompt documents sequentially, or generate them from specs (P0-P6)
+8. **security** — OWASP Top 10 violations, STRIDE threats, auth/injection/crypto hardening (P0-P2)
+9. **distill** — AI code-bloat: phantom try/catch, unused imports, repeated blocks, premature abstraction, god functions (P1-P5)
-Reply with a number (1-7) or category name.
+Reply with a number (1-9) or category name.
 ```
 **After the user replies, map their response to a category name:**
@@ -70,6 +72,8 @@ Reply with a number (1-7) or category name.
 - "5" or "docs" → SELECTED_CATEGORY = docs
 - "6" or "optimize" → SELECTED_CATEGORY = optimize
 - "7" or "prompts" → SELECTED_CATEGORY = prompts
+- "8" or "security" → SELECTED_CATEGORY = security
+- "9" or "distill" → SELECTED_CATEGORY = distill
 Wait for the user's reply before proceeding. Do not guess or pick a default category.
@@ -85,11 +89,12 @@ Wait for the user's reply before proceeding. Do not guess or pick a default cate
 ```
 /pan:focus-auto [--category CAT] [--mode MODE] [--budget N] [--max-cycles N]
                 [--total-budget N] [--continue] [--stop] [--status] [--dry-run]
+                [--deep-review]
 ```
 | Flag | Default | Description |
 |------|---------|-------------|
-| `--category` | null (all) | cleanup, tests, stability, features, docs, optimize, prompts |
+| `--category` | null (all) | cleanup, tests, stability, features, docs, optimize, prompts, security, distill |
 | `--mode` | category-dependent | bugfix, balanced, features, full |
 | `--budget` | category-dependent | Points per cycle (5-100) |
 | `--max-cycles` | 10 | Maximum iterations (1-50) |
@@ -98,6 +103,7 @@ Wait for the user's reply before proceeding. Do not guess or pick a default cate
 | `--stop` | — | Gracefully stop active run |
 | `--status` | — | Show current campaign progress |
 | `--dry-run` | — | Show plan without executing |
+| `--deep-review` | off | After every exec cycle, run inline OWASP security check on changed files. Verdict `block` or `review_required` stops the campaign (6th safety harness). Works with all categories. |
 ## Category Defaults
@@ -110,6 +116,7 @@ Wait for the user's reply before proceeding. Do not guess or pick a default cate
 | docs | P5-P6 | balanced | 30 |
 | optimize | P1-P4 | balanced | 50 |
 | prompts | P0-P6 | balanced | 100 |
+| security | P0-P2 | bugfix | 40 |
 ## Pipeline
@@ -173,6 +180,11 @@ Perform a deep codebase scan to find actionable work items with evidence.
   - **features:** roadmap items not yet implemented, README promises without backing code
   - **docs:** stale documentation, missing command descriptions
   - **optimize:** N+1 operations (file I/O / network calls inside loops), redundant re-computation (`JSON.parse`/`stringify` of same data), synchronous blocking in async modules (`readFileSync`/`execSync` alongside async exports), algorithmic complexity (nested `.find()`/`.filter()` in loops creating O(n²)+), unnecessary allocations in hot paths (spread in loops, string concat vs `join()`), regex construction inside loops (should be hoisted), unbounded collection growth (`.push()` without size limits), swallowed errors (`catch {}` / `catch { /* */ }`), suboptimal data structures (array `.includes()` where Set is better), dead assignments, unguarded property access on nullable values (`.length`/`.split()`/`.match()[0]` without null check)
+  - **security:** Three-pass approach:
+    - **Pass 1 — Injection & crypto (inline grep):** Scan source files for `eval(`, `execSync`, `exec(`, string concatenation in SQL patterns (`` `SELECT...${`` / `"SELECT..."+`), `md5(`/`sha1(`/`createHash('md5'`/`createHash('sha1'`, hardcoded secrets (`password\s*=\s*['"]`, `api_key\s*=\s*['"]`, `secret\s*=\s*['"]`), `Math.random()` used for security purposes.
+    - **Pass 2 — Auth & access control (inline grep):** Routes without auth middleware (look for `router.get/post/put/delete` without preceding `app.use(...auth...)`), `req.params.id` used directly without ownership check, `JSON.parse(` on `req.body` without schema validation, CORS `origin: '*'` or `Access-Control-Allow-Origin: *`, verbose errors that expose stack traces (`res.json({ stack:`).
+    - **Pass 3 — Semantic depth (Agent tool, optional):** For M/L items where grep found a suspicious pattern but fix guidance needs code-path tracing, use the Agent tool with Explore subagent to read the specific file and confirm exploitability before including in the batch.
+    - **Classification:** Map findings to priorities: OWASP critical/exploit-ready → P0, High/auth-bypass → P1, Medium/defense-in-depth → P2. Drop LOW/INFO — they don't meet the P0-P2 filter.
   - **prompts:** Two operational modes — detect which applies:
     - **Execute mode:** Find micro-prompt documents (`.md` files containing ordered prompt blocks, e.g., `## Prompt 1`, `## Prompt 2`, or numbered checklist items `- [ ] Prompt: ...`). Look in `.planning/`, project root, and `docs/` for files matching patterns: `*prompts*`, `*micro-prompt*`, `*prompt-plan*`, `*prompt-sequence*`. Each unchecked/incomplete prompt block is one work item.
     - **Generate mode:** Find specification documents (files matching `*spec*`, `*prd*`, `*requirements*`, `*feature*` in `.planning/`, `docs/specs/`, project root) that do NOT already have a corresponding micro-prompt document. Each spec needing decomposition is one work item.
@@ -271,6 +283,32 @@ A failed item never blocks subsequent items.
 5. Stage specific changed files (not `git add -A`) and commit with accurate message listing only verified items
 6. Count: `items_completed`, `items_failed`, `points_used`
+**If `--deep-review` flag is active (run after commit, before recording cycle):**
+Get changed files from this cycle's commit:
+```bash
+CHANGED=$(git diff HEAD~1 --name-only 2>/dev/null | grep -E '\.(js|ts|jsx|tsx|py|go|rb|java|php)$')
+```
+Run inline OWASP security check on changed files only:
+- Grep each changed file for critical patterns:
+  - Injection: `eval(`, `execSync(`, SQL string concat (`` `SELECT...${`` ), `child_process.exec(`
+  - Crypto: `createHash('md5'`, `createHash('sha1'`, `Math.random()` near auth/token/secret context
+  - Auth bypass: routes with no auth guard added, `req.params` used as DB key without ownership check
+  - Secrets: `password\s*=\s*['"]`, `apiKey\s*=\s*['"]`, `token\s*=\s*['"]` assigned to a literal value
+- Score findings by severity: critical (exploit-ready) → BLOCK; high (auth/injection surface) → WARN; medium/low → LOG
+**Handle deep-review verdict:**
+| Severity found | Verdict | Action |
+|---------------|---------|--------|
+| Critical pattern in changed file | `block` | STOP campaign — do NOT record cycle, revert last commit, present finding to user |
+| High pattern in changed file | `review_required` | STOP campaign — record cycle as completed, flag finding, recommend manual review |
+| Medium/low only | `ok_with_minor` | Continue — append findings to `.planning/focus/security-log-<date>.md` |
+| No patterns | `ok` | Continue silently |
+Write all non-ok findings to `.planning/focus/security-log-<date>.md` with file:line references.
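The verdict table above reduces to "take the worst severity found and map it to an action". A minimal sketch of that mapping follows; the function name `deepReviewVerdict` and the shape of a finding (`{ severity }`) are hypothetical, not taken from the package.

```javascript
// Hypothetical sketch of the deep-review verdict table.
const SEVERITY_RANK = { critical: 3, high: 2, medium: 1, low: 1 };

function deepReviewVerdict(findings) {
  let worst = 0;
  for (const f of findings) {
    worst = Math.max(worst, SEVERITY_RANK[f.severity] || 0);
  }
  // critical: stop, do not record the cycle, revert the last commit
  if (worst === 3) return { verdict: "block", stop: true, revert: true };
  // high: stop, but the cycle is still recorded as completed
  if (worst === 2) return { verdict: "review_required", stop: true, revert: false };
  // medium/low only: keep going and log the findings
  if (worst === 1) return { verdict: "ok_with_minor", stop: false, revert: false };
  return { verdict: "ok", stop: false, revert: false };
}

console.log(deepReviewVerdict([{ severity: "low" }, { severity: "critical" }]).verdict);
```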
 #### Step 2.4: Record Cycle
 Run: `pan-tools focus auto --update --items-completed N --items-failed N --points-used N --tests-before N --tests-after N --batch-file <path>`
@@ -282,6 +320,8 @@ Check the response for stop conditions:
 - `zero_completed`: No items completed in this cycle — go to Phase 3
 - `diminishing_returns`: Optimize only — cycle efficiency < 30% of previous cycle — go to Phase 3
 - `prompts_complete`: Prompts only — all prompts in document executed — go to Phase 3
+- `security_complete`: Security only — scan found no HIGH/CRITICAL items remaining — go to Phase 3
+- `deep_review_block`: `--deep-review` only — critical pattern detected in changed files — go to Phase 3 with warning
 - `null`: Continue to next cycle
 #### Step 2.5: Inter-Cycle Context Management
@@ -293,6 +333,24 @@ Between cycles, manage context to prevent quality degradation over long campaign
 Display one-line cycle summary: `Cycle N/M | X/Y pts | Z items done | Tests: A -> B`
+#### Step 2.5a: Reflection Gate (Opus 4.7 thinking-capable models only)
+Before committing to the next cycle, call the reflection helper:
+```
+echo '{"run": <run-state>, "cycle": <just-completed-cycle>, "batch": <proposed-next-batch>, "tier": "reasoning"}' \
+  | pan-tools focus reflection
+```
+The helper returns `{reflect: true, prompt: "..."}` when the current model tier supports extended thinking. If `reflect: true`, think through the prompt — which asks whether running another cycle is worthwhile given telemetry and remaining items — and respond with JSON: `{"continue": true|false, "rationale": "..."}`.
+- If `continue: false`: stop the campaign and treat as a user-reason stop (preserve state, skip to Phase 3).
+- If `continue: true`: proceed to the next cycle.
+If the helper returns `reflect: false` (tier doesn't support thinking, or `reflection_enabled: false` in run state, or no next batch): skip this step silently and continue to the next cycle.
+The reflection gate catches "zero progress" or "wrong category" drift earlier than the automatic stop rules.
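The control flow around the reflection helper can be sketched as below. The function name `afterReflection` and the return values are hypothetical. The text above does not say what happens when the reply is missing or malformed, so continuing in that case is an assumption of this sketch.

```javascript
// Hypothetical sketch of the decision after calling `pan-tools focus reflection`.
// helperResult: parsed helper output, e.g. { reflect: true, prompt: "..." }
// modelReply:   parsed model response, e.g. { continue: false, rationale: "..." }
function afterReflection(helperResult, modelReply) {
  // reflect: false (or no result) means skip the gate silently
  if (!helperResult || helperResult.reflect !== true) return "next_cycle";
  // explicit continue: false stops the campaign as a user-reason stop
  if (modelReply && modelReply.continue === false) return "stop";
  // assumption: any other reply is treated as "continue"
  return "next_cycle";
}

console.log(afterReflection({ reflect: true, prompt: "..." }, { continue: false, rationale: "no progress" }));
```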
 **Attention anchor — emit after every cycle summary:**
 ```
 Remaining: {cycles_left} cycles | {budget_remaining}/{total_budget} pts | Safety: {active_harness_warnings}
@@ -323,7 +381,7 @@ Then continue immediately to the next cycle (back to Step 2.1).
 3. Remove safety tag: `git tag -d focus-auto-baseline 2>/dev/null`
-## 5-Layer Safety Harness
+## 6-Layer Safety Harness
 | Layer | Mechanism | Action |
 |-------|-----------|--------|
@@ -332,6 +390,7 @@ Then continue immediately to the next cycle (back to Step 2.1).
 | Iteration limit | `--max-cycles N` | Hard stop on loop count |
 | Regression circuit breaker | tests_after < tests_before | Immediate stop, status=stopped |
 | Zero-completed guard | 0 items done in a cycle | Stop — further cycles won't help |
+| Security gate (`--deep-review`) | Critical/high OWASP pattern in changed files | Revert last commit (critical) or flag for manual review (high), stop campaign |
 ## 9 Behavioral Rules
@@ -430,6 +489,112 @@ When a specification document is found that doesn't have a matching micro-prompt
 **After generation:** The document is written and committed. The next cycle will detect it in execute mode and begin executing prompts sequentially.
+## Security Category — Execution Details
+The security category scans for OWASP Top 10 (2025) violations and STRIDE threats, then fixes them cycle by cycle until the scan returns zero HIGH/CRITICAL findings.
+### Scan approach (Step 2.1)
+Three passes per cycle:
+**Pass 1 — Fast grep scan (always runs):**
+| OWASP | Grep pattern | Priority |
+|-------|-------------|---------|
+| A03 Injection | `eval(`, `execSync(`, `` `SELECT.*\${ ``, `child_process.exec(` | P0 |
+| A02 Crypto | `createHash\(['"]md5\|sha1`, `Math\.random\(\)` near auth/token | P0 |
+| A01 Access | Route without auth middleware, IDOR (raw `req.params.id` to DB) | P1 |
+| A05 Misconfig | `origin:\s*['"]?\*`, `Access-Control-Allow-Origin: \*`, stack in response | P1 |
+| A07 Auth | No session expiry, credentials in URL params | P1 |
+| A04 Design | Missing rate-limit on auth/payment endpoints | P2 |
+| A09 Logging | Security events (`login`, `payment`, `admin`) with no log call nearby | P2 |
+**Pass 2 — Structural check (always runs):**
+- Read route files and check: does every mutating endpoint (POST/PUT/PATCH/DELETE) have auth middleware before the handler?
+- Check for hardcoded secrets: grep for `['"][A-Za-z0-9_]{20,}['"]` assigned to variables named `key`/`token`/`secret`/`password`/`apiKey`
+- Check for prototype pollution risk: `Object.assign(req.body)` or spread from untrusted input into a stored object
+**Pass 3 — Semantic depth (Agent tool, for M/L items only):**
+When a pattern match needs code-path confirmation, spawn an Explore subagent:
+> "Read [file]. Confirm whether [line N] is reachable from an unauthenticated request path and whether the input is sanitized before use."
+Use the confirmation to decide whether to include the item at P0/P1 or drop it as a false positive.
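A Pass 1 style scan can be approximated with a few regular expressions applied line by line. The sketch below covers only three of the table rows; the pattern set, the function name `scanSource`, and the finding shape are illustrative assumptions, and matches of this kind still need the confirmation step described above before being treated as real vulnerabilities.

```javascript
// Hypothetical sketch of a fast line-by-line pattern scan.
const PATTERNS = [
  { owasp: "A03 Injection", priority: "P0", re: /\beval\(|\bexecSync\(|child_process\.exec\(/ },
  { owasp: "A02 Crypto", priority: "P0", re: /createHash\(['"](md5|sha1)['"]/ },
  { owasp: "A05 Misconfig", priority: "P1", re: /origin:\s*['"]?\*/ },
];

function scanSource(file, source) {
  const findings = [];
  source.split("\n").forEach((text, index) => {
    for (const p of PATTERNS) {
      if (p.re.test(text)) {
        // line numbers are 1-based to match file:line references
        findings.push({ file, line: index + 1, owasp: p.owasp, priority: p.priority });
      }
    }
  });
  return findings;
}

const sample = [
  "const crypto = require('crypto');",
  "const h = crypto.createHash('md5').update(pw).digest('hex');",
  "app.use(cors({ origin: '*' }));",
  "eval(req.body.code);",
].join("\n");

const findings = scanSource("app.js", sample);
console.log(findings.map((f) => `${f.file}:${f.line} ${f.owasp} ${f.priority}`));
```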
+### Item classification
+| Hardener severity | Focus priority | Example |
+|------------------|----------------|---------|
+| Critical | P0 | `eval(req.body.code)` — direct RCE |
+| High | P1 | Auth bypass on admin route |
+| Medium | P2 | Rate-limiting absent on login |
+| Low / Info | DROP | Missing security header on non-sensitive route |
+### Execution (Step 2.3)
+Treat each security item as a STANDARD or FULL item regardless of effort estimate:
+1. **State threat:** "This is [OWASP category]. The exploit path is: [attacker does X → Y → data/system compromised]."
+2. **Read the file** — confirm the pattern is real, not a false positive
+3. **Implement the fix** — use established patterns (parameterized queries, allowlists, bcrypt, rate-limit middleware)
+4. **Write or update the test** — every security fix MUST have a test that proves the vulnerability is closed (e.g., send the malicious payload, assert 400/403 not 200)
+5. **Run full test suite** — regression check before marking DONE
+### Stop condition
+`security_complete` fires when the scan finds zero P0/P1 items. P2 items (medium) may remain — they won't stop the campaign unless `zero_completed` fires (no items at all).
+A security campaign that ends with `security_complete` means: no critical or high OWASP violations found in the scanned files. Medium/low items can be addressed in subsequent targeted passes or documented as accepted risk.
+---
+## Distill Category — Execution Details
+The `distill` category targets **AI-generated code bloat** with a 5-pass pipeline based on the SOTA agentic-refactoring architecture (deterministic-first, LLM-on-narrow-spans).
+### Pipeline
+| Pass | What | Cost | Tier output |
+|------|------|------|-------------|
+| 1 | **Deterministic patterns** — phantom try/catch, unused imports, magic numbers, long functions, wide param lists | Free | safe / review |
+| 2 | **AST-style analysis** — single-instance factories, deep nesting | Free | review |
+| 3 | **Cross-file graph** — repeated 5+ line blocks, unreferenced exports | Free | review |
+| 4 | **LLM judgment** — pan-distiller agent receives ONLY flagged spans (max 50 lines context per finding); validates pattern, refines tier, proposes minimal rewrite | LLM tokens | safe / review / risky |
+| 5 | **Cross-session memory** — compares findings to `.planning/memory/distill-patterns.md`; flags **regressed** patterns ("we already fixed this") | Free | metadata |
+### Safety Tiers
+| Tier | Rule | Action |
+|------|------|--------|
+| `safe` | Deterministic, behavior-preserving (e.g., remove unused import) | Auto-applied |
+| `review_required` | Behavior preserved under invariants but human should verify | Surfaced to user |
+| `risky` | Cross-file impact or might surface latent bugs | Never auto-applied |
+A finding's confidence below 0.85 is automatically downgraded to `review_required` regardless of original tier.
+### Bloat Budget
+After each cycle, distill computes:
+- **touched_loc** — total LOC modified in cycle
+- **removable_loc** — sum of `loc_saved` across findings
+- **essential_loc** — touched_loc − removable_loc
+- **bloat ratio** — touched_loc / essential_loc
+Default threshold: **2.0x**. If a cycle's ratio exceeds threshold, the bloat budget gate flags it for review.
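The bloat budget arithmetic can be checked with a short example. This is a sketch of the formulas listed above, not the code in `distill.cjs`; the function name `bloatBudget` is hypothetical, and treating a cycle with no essential lines as an infinite ratio is an assumption.

```javascript
// Hypothetical sketch of the bloat budget computation.
function bloatBudget(touchedLoc, findings, threshold = 2.0) {
  const removableLoc = findings.reduce((sum, f) => sum + (f.loc_saved || 0), 0);
  const essentialLoc = touchedLoc - removableLoc;
  // assumption: if nothing essential remains, the ratio is unbounded
  const ratio = essentialLoc > 0 ? touchedLoc / essentialLoc : Infinity;
  return {
    touched_loc: touchedLoc,
    removable_loc: removableLoc,
    essential_loc: essentialLoc,
    bloat_ratio: ratio,
    exceeds_threshold: ratio > threshold,
  };
}

// 300 touched lines, 180 of them removable: 300 / 120 = 2.5x, over the 2.0x default.
const budget = bloatBudget(300, [{ loc_saved: 120 }, { loc_saved: 60 }]);
console.log(budget.bloat_ratio, budget.exceeds_threshold);
```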
+### Stop condition
+`distill_complete` fires when the scan finds zero bloat findings. The codebase is fully distilled for the patterns the deterministic + AST + graph passes detect.
+### CLI
+```bash
+node ~/.claude/pan-wizard-core/bin/pan-tools.cjs distill scan
+node ~/.claude/pan-wizard-core/bin/pan-tools.cjs distill analyze [--touched-loc N] [--bloat-threshold X]
+node ~/.claude/pan-wizard-core/bin/pan-tools.cjs distill report
+```
+`scan` returns findings. `analyze` adds bloat budget + regressed pattern detection. `report` writes findings to `.planning/memory/distill-patterns.md` for the next session.
 <failure_pattern_capture>
 When the same failure pattern appears in 2+ items within a campaign, capture it for future runs.

package/commands/pan/focus-exec.md CHANGED

@@ -116,6 +116,7 @@ HARD STOP conditions (do not proceed to next stage):
 - `--dry-run` — Run Stages 1-2 only (show what WOULD be executed)
 - `--no-commit` — Skip the commit step in Stage 6
 - `--continue` — Resume a previously interrupted execution
+- `--deep-review` (v3.4+) — After each high-stakes item's execution, run `/pan:review-deep` for that item (pan-hardener + pan-meta-reviewer security + cross-check). Slows the campaign by roughly 3× per item that triggers the deep pass; use for batches touching auth/payment/migrations.
 ---
@@ -209,7 +210,14 @@ This catches emergent interactions: 5 "add try-catch" fixes might reveal the mod
 1. **Check Project Status** — git status, recent commits
 2. **Test Baseline** — run test suite, record current counts
 3. **Create rollback snapshot** — git tag for safety
-4. **Report** — Output session start summary
+4. **Prime prompt cache** — `pan-tools cache prime --summary` (once; all sub-agents in the next 5 min hit cached context)
+5. **Report** — Output session start summary
+**Circular optimization — init trace:**
+```bash
+node ~/.claude/pan-wizard-core/bin/pan-tools.cjs optimize trace init \
+  --description "focus-exec session" --command "focus-exec" 2>/dev/null || true
+```
 **Record baseline:**
 ```
@@ -243,6 +251,13 @@ Display the execution batch to user, then continue automatically.
 ### 3.0 Pre-Execution Setup
 1. Cache project facts — do NOT re-read later
 2. Create/update progress tracker with the batch table
+3. Classify stages for parallel tool use:
+   ```
+   pan-tools focus classify-stages --raw
+   ```
+   The CLI reads the latest batch and returns `{waves, parallelism_hint}`. When `parallelism_hint` is `emit-micro-in-parallel` or `emit-standard-in-parallel`, all reads and greps for items in the current wave SHOULD be emitted in a single assistant turn (parallel tool calls). Opus 4.7 is markedly better at emitting parallel tool calls than earlier models; use that to collapse Stage 3 latency on MICRO-heavy batches.
+   Serialize on `FULL` tier items — each is its own wave.
 ### 3.1 Process Items by Tier
@@ -386,6 +401,11 @@ Unless `--no-commit`:
 - Record session summary (items completed, tests before/after, budget used)
 - Append error patterns if any failures occurred
+### 6.3.5 Circular optimization — end trace
+```bash
+node ~/.claude/pan-wizard-core/bin/pan-tools.cjs optimize trace end 2>/dev/null || true
+```
 ### 6.4 Final Report
 ```markdown

package/commands/pan/focus-scan.md CHANGED

@@ -58,6 +58,12 @@ When `/pan:focus-scan` is invoked, execute all phases without stopping. Do not a
 ## Phase 0: Orientation & Baseline Snapshot
+**Circular optimization — init trace:**
+```bash
+node ~/.claude/pan-wizard-core/bin/pan-tools.cjs optimize trace init \
+  --description "focus-scan" --command "focus-scan" 2>/dev/null || true
+```
 ### 0.1 Read Current State
 Read these files to establish baseline: