npm - @wazir-dev/cli - Versions diffs - 1.1.0 → 1.3.0 - Mend

@wazir-dev/cli 1.1.0 → 1.3.0


Files changed (138)

package/CHANGELOG.md +74 -10
package/README.md +15 -15
package/assets/demo.cast +47 -0
package/assets/demo.gif +0 -0
package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
package/docs/concepts/architecture.md +1 -1
package/docs/concepts/roles-and-workflows.md +2 -0
package/docs/concepts/why-wazir.md +59 -0
package/docs/decisions/2026-03-19-deferred-items.md +564 -0
package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
package/docs/readmes/INDEX.md +21 -5
package/docs/readmes/features/expertise/README.md +2 -2
package/docs/readmes/features/exports/README.md +2 -2
package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
package/docs/readmes/features/schemas/README.md +3 -0
package/docs/readmes/features/skills/README.md +17 -0
package/docs/readmes/features/skills/clarifier.md +5 -0
package/docs/readmes/features/skills/claude-cli.md +5 -0
package/docs/readmes/features/skills/codex-cli.md +5 -0
package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
package/docs/readmes/features/skills/executing-plans.md +5 -0
package/docs/readmes/features/skills/executor.md +5 -0
package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
package/docs/readmes/features/skills/gemini-cli.md +5 -0
package/docs/readmes/features/skills/humanize.md +5 -0
package/docs/readmes/features/skills/init-pipeline.md +5 -0
package/docs/readmes/features/skills/receiving-code-review.md +5 -0
package/docs/readmes/features/skills/requesting-code-review.md +5 -0
package/docs/readmes/features/skills/reviewer.md +5 -0
package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
package/docs/readmes/features/skills/wazir.md +5 -0
package/docs/readmes/features/skills/writing-skills.md +5 -0
package/docs/readmes/features/workflows/prepare-next.md +1 -1
package/docs/reference/configuration-reference.md +47 -6
package/docs/reference/hooks.md +1 -0
package/docs/reference/launch-checklist.md +4 -4
package/docs/reference/review-loop-pattern.md +119 -9
package/docs/reference/roles-reference.md +1 -0
package/docs/reference/skill-tiers.md +147 -0
package/docs/reference/tooling-cli.md +3 -1
package/docs/truth-claims.yaml +12 -0
package/expertise/antipatterns/process/ai-coding-antipatterns.md +214 -1
package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
package/exports/hosts/claude/.claude/commands/verify.md +30 -1
package/exports/hosts/claude/.claude/settings.json +9 -0
package/exports/hosts/claude/CLAUDE.md +1 -1
package/exports/hosts/claude/export.manifest.json +6 -4
package/exports/hosts/claude/host-package.json +3 -1
package/exports/hosts/codex/AGENTS.md +1 -1
package/exports/hosts/codex/export.manifest.json +6 -4
package/exports/hosts/codex/host-package.json +3 -1
package/exports/hosts/cursor/.cursor/hooks.json +4 -0
package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
package/exports/hosts/cursor/export.manifest.json +6 -4
package/exports/hosts/cursor/host-package.json +3 -1
package/exports/hosts/gemini/GEMINI.md +1 -1
package/exports/hosts/gemini/export.manifest.json +6 -4
package/exports/hosts/gemini/host-package.json +3 -1
package/hooks/context-mode-router +191 -0
package/hooks/definitions/context_mode_router.yaml +19 -0
package/hooks/hooks.json +31 -6
package/hooks/protected-path-write-guard +8 -0
package/hooks/routing-matrix.json +45 -0
package/hooks/session-start +62 -1
package/llms-full.txt +937 -134
package/package.json +2 -4
package/schemas/hook.schema.json +2 -1
package/schemas/phase-report.schema.json +89 -0
package/schemas/usage.schema.json +25 -1
package/schemas/wazir-manifest.schema.json +19 -0
package/skills/brainstorming/SKILL.md +32 -157
package/skills/clarifier/SKILL.md +289 -111
package/skills/claude-cli/SKILL.md +320 -0
package/skills/codex-cli/SKILL.md +260 -0
package/skills/debugging/SKILL.md +13 -0
package/skills/design/SKILL.md +13 -0
package/skills/dispatching-parallel-agents/SKILL.md +13 -0
package/skills/executing-plans/SKILL.md +13 -0
package/skills/executor/SKILL.md +139 -19
package/skills/finishing-a-development-branch/SKILL.md +13 -0
package/skills/gemini-cli/SKILL.md +260 -0
package/skills/humanize/SKILL.md +13 -0
package/skills/init-pipeline/SKILL.md +72 -164
package/skills/prepare-next/SKILL.md +81 -10
package/skills/receiving-code-review/SKILL.md +13 -0
package/skills/requesting-code-review/SKILL.md +13 -0
package/skills/reviewer/SKILL.md +369 -24
package/skills/run-audit/SKILL.md +13 -0
package/skills/scan-project/SKILL.md +13 -0
package/skills/self-audit/SKILL.md +217 -16
package/skills/skill-research/SKILL.md +188 -0
package/skills/subagent-driven-development/SKILL.md +13 -0
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
package/skills/subagent-driven-development/implementer-prompt.md +8 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
package/skills/tdd/SKILL.md +13 -0
package/skills/using-git-worktrees/SKILL.md +13 -0
package/skills/using-skills/SKILL.md +13 -0
package/skills/verification/SKILL.md +54 -3
package/skills/wazir/SKILL.md +464 -381
package/skills/writing-plans/SKILL.md +14 -1
package/skills/writing-skills/SKILL.md +13 -0
package/templates/artifacts/implementation-plan.md +3 -0
package/templates/artifacts/tasks-template.md +133 -0
package/templates/examples/phase-report.example.json +48 -0
package/tooling/src/adapters/composition-engine.js +256 -0
package/tooling/src/adapters/model-router.js +84 -0
package/tooling/src/capture/command.js +41 -2
package/tooling/src/capture/run-config.js +3 -1
package/tooling/src/capture/store.js +56 -0
package/tooling/src/capture/usage.js +106 -0
package/tooling/src/capture/user-input.js +66 -0
package/tooling/src/checks/ac-matrix.js +256 -0
package/tooling/src/checks/command-registry.js +12 -0
package/tooling/src/checks/docs-truth.js +1 -1
package/tooling/src/checks/security-sensitivity.js +69 -0
package/tooling/src/checks/skills.js +111 -0
package/tooling/src/cli.js +31 -20
package/tooling/src/commands/stats.js +161 -0
package/tooling/src/commands/validate.js +5 -1
package/tooling/src/export/compiler.js +33 -37
package/tooling/src/gating/agent.js +145 -0
package/tooling/src/guards/phase-prerequisite-guard.js +185 -0
package/tooling/src/hooks/routing-logic.js +69 -0
package/tooling/src/init/auto-detect.js +258 -0
package/tooling/src/init/command.js +38 -170
package/tooling/src/input/scanner.js +46 -0
package/tooling/src/reports/command.js +103 -0
package/tooling/src/reports/phase-report.js +323 -0
package/tooling/src/state/command.js +160 -0
package/tooling/src/state/db.js +287 -0
package/tooling/src/status/command.js +58 -1
package/tooling/src/verify/proof-collector.js +299 -0
package/wazir.manifest.yaml +26 -14
package/workflows/plan-review.md +3 -1
package/workflows/verify.md +30 -1

package/skills/self-audit/SKILL.md CHANGED

@@ -5,16 +5,101 @@ description: Run a self-audit loop in an isolated git worktree — validates, au
 # Self-Audit — Worktree-Isolated Audit-Fix Loop
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
 ## Overview
 This skill runs a structured self-audit of the Wazir project itself, operating entirely in an isolated git worktree. It validates the project against all canonical checks, performs deeper structural analysis, fixes issues found, verifies the fixes pass, and only merges back on all-green.
 **Safety guarantee:** The main worktree is never modified until all checks pass in isolation.
+## Severity Levels
+Every finding is assigned a severity that determines handling:
+| Severity | Action | Description |
+|----------|--------|-------------|
+| **critical** | Abort loop | Structural integrity threat — cannot safely continue. Discard worktree. |
+| **high** | Fix now | Must be resolved in this loop before proceeding to verify. |
+| **medium** | Fix if time | Fix within remaining loop budget. Skip if loop cap approaching. |
+| **low** | Log and skip | Record in report. No fix attempted. |
+Severity assignment rules:
+- Protected-path violation → **critical**
+- Test failure, broken hook, missing manifest entry → **high**
+- Documentation drift, stale export, missing schema → **medium**
+- Style issues, minor inconsistency, cosmetic → **low**
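The severity assignment rules above amount to a lookup from finding category to severity, and from severity to action. The following is a minimal sketch of that lookup, not code from the package: the category keys, the action strings, and the `classifyFinding` helper are all hypothetical names chosen for illustration, and throwing on an unknown category is an assumption (the skill text does not say how unlisted categories are handled).

```javascript
// Hypothetical category keys; the skill text lists these rules in prose only.
const SEVERITY_BY_CATEGORY = new Map([
  ['protected-path-violation', 'critical'],
  ['test-failure', 'high'],
  ['broken-hook', 'high'],
  ['missing-manifest-entry', 'high'],
  ['documentation-drift', 'medium'],
  ['stale-export', 'medium'],
  ['missing-schema', 'medium'],
  ['style', 'low'],
  ['minor-inconsistency', 'low'],
  ['cosmetic', 'low'],
]);

// Action per severity, mirroring the "Action" column of the severity table.
const ACTION_BY_SEVERITY = {
  critical: 'abort-loop',
  high: 'fix-now',
  medium: 'fix-if-time',
  low: 'log-and-skip',
};

// Assumption: an unlisted category is an error rather than a silent default.
function classifyFinding(category) {
  const severity = SEVERITY_BY_CATEGORY.get(category);
  if (severity === undefined) {
    throw new Error(`No severity rule for category: ${category}`);
  }
  return { category, severity, action: ACTION_BY_SEVERITY[severity] };
}

console.log(classifyFinding('stale-export'));
```

Failing loudly on unknown categories keeps a new kind of finding from being quietly logged and skipped; a real implementation might prefer a different default.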
+## Quality Scoring
+Each loop measures a quality score **before** and **after** fixes:
+```
+quality_score = (checks_passing / total_checks) * 100
+```
+Track per loop:
+- `quality_score_before` — score at start of loop (after Phase 1)
+- `quality_score_after` — score at end of loop (after Phase 4 verify)
+- `delta` — improvement from this loop's fixes
+**Effectiveness threshold:** If 3 consecutive loops show `delta < 2%`, the audit has converged — skip remaining loops.
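The quality score and effectiveness threshold above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: `qualityScore` and `hasConverged` are hypothetical helper names, and `delta < 2%` is read here as "fewer than 2 percentage points of improvement", which the skill text does not spell out.

```javascript
// quality_score = (checks_passing / total_checks) * 100
function qualityScore(checksPassing, totalChecks) {
  if (totalChecks <= 0) {
    throw new RangeError('totalChecks must be positive');
  }
  return (checksPassing / totalChecks) * 100;
}

// True when each of the last `window` loops improved the score by less than
// `threshold` percentage points (assumed reading of "delta < 2%").
function hasConverged(loops, window = 3, threshold = 2) {
  if (loops.length < window) return false;
  return loops
    .slice(-window)
    .every((l) => l.quality_score_after - l.quality_score_before < threshold);
}

const loops = [
  { quality_score_before: 70, quality_score_after: 85 },
  { quality_score_before: 85, quality_score_after: 86 },
  { quality_score_before: 86, quality_score_after: 87 },
  { quality_score_before: 87, quality_score_after: 87.5 },
];

console.log(qualityScore(15, 20)); // 75
console.log(hasConverged(loops)); // true: the last three deltas are 1, 1, 0.5
```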
+## Learning Integration
+After each audit loop, findings feed the learning pipeline:
+1. **Propose learnings:** For each finding category that appeared in this loop:
+   - Check `state.sqlite` findings table for the same `finding_hash` in previous runs
+   - If `recurrence_count >= 2`: auto-propose a learning to `memory/learnings/proposed/`
+   - Learning scope tags: `scope_roles: [executor]`, `scope_concerns: [quality]`
+2. **Store findings:** Insert all findings into `state.sqlite` via `insertFinding()` with severity and finding_hash
+3. **Store audit record:** Insert `{run_id, finding_count, fix_count, manual_count, quality_score_before, quality_score_after}` into `audit_history`
+## Trend Tracking
+Before starting Loop 1, query previous audit results:
+```javascript
+const trend = getAuditTrend(db, 5); // last 5 audits
+```
+Present trend in the report:
+- Are finding counts trending up or down?
+- Are the same finding_hashes recurring? If so, the fixes aren't preventing recurrence — escalate.
+- Has quality_score improved across runs?
+## Escalation Path
+Manual-required findings that cannot be auto-fixed are escalated:
+1. **Within the audit:** Logged in the report with remediation guidance
+2. **Cross-run recurrence:** If the same manual finding recurs across 3+ audits, escalate to:
+   - Create a task spec in `.wazir/runs/latest/tasks/` describing the fix
+   - Flag in the audit report as **RECURRING — needs dedicated task**
+3. **Critical findings:** Immediately logged. If 2+ critical findings in a single loop, abort the entire audit run.
 ## Trigger
 On-demand: operator invokes `/self-audit` or requests a self-audit loop.
+### Parameters
+| Flag | Default | Max | Description |
+|------|---------|-----|-------------|
+| `--loops N` | 5 | 10 | Number of audit-fix loops to run. Each loop executes the full Phase 1-5 cycle. If a loop finds 0 new issues, subsequent loops are skipped (convergence detection). |
 ## Worktree Isolation Model
 ```
@@ -45,7 +130,9 @@ node tooling/src/cli.js doctor --json
 node tooling/src/cli.js export --check
 ```
-Collect pass/fail for each. Any failure is a finding.
+Collect pass/fail for each. Any failure is a finding. Assign severity per the severity table above.
+Calculate `quality_score_before` from the pass/fail results.
 ## Phase 2: Deep Structural Audit
@@ -79,10 +166,72 @@ Beyond CLI checks, inspect for:
    - Each skill dir under `skills/` has a well-formed `SKILL.md` with frontmatter
    - Skills referenced in documentation actually exist
+7. **Code Quality**
+   - Run `node tooling/src/cli.js validate` (all subcommands) and capture exit codes
+   - If `eslint` is present in `package.json` scripts or devDependencies, run `npx eslint .` and capture results
+   - If `tsc` is present in `package.json` scripts or devDependencies, run `npx tsc --noEmit` and capture results
+   - Tools not found in `package.json` are skipped with a note in the report
+8. **Test Coverage**
+   - Run `npm test` and capture pass/fail counts from output
+   - Any test failure is a finding
+9. **Expertise Coverage**
+   - Read `expertise/composition-map.yaml`
+   - For every module path referenced, check that the file exists under `expertise/`
+   - Missing files are findings
+10. **Export Freshness**
+    - Run `wazir export --check`
+    - Any drift detected is a finding
+11. **Input Coverage** (run-scoped — only when a run directory exists)
+    - Read the original input file(s) from `.wazir/input/` or `.wazir/runs/<id>/sources/`
+    - Read the execution plan from `.wazir/runs/<id>/clarified/execution-plan.md`
+    - Read the actual commits on the branch: `git log --oneline main..HEAD`
+    - Build a coverage matrix: every distinct item in the input should map to:
+      - At least one task in the execution plan
+      - At least one commit in the git log
+    - **Missing items** (in input but not in plan AND not in commits) → **HIGH** severity finding
+    - **Partial items** (in plan but no corresponding commit) → **MEDIUM** severity finding
+    - **Fully covered items** (input → plan → commit) → pass
+    - Output the coverage matrix in the audit report:
+      ```
+      | Input Item | Plan Task | Commit | Status |
+      |------------|-----------|--------|--------|
+      | Item 1     | Task 3    | abc123 | PASS   |
+      | Item 2     | Task 5    | —      | PARTIAL|
+      | Item 3     | —         | —      | MISSING|
+      ```
+    - This dimension catches scope reduction AFTER the fact — a safety net for when the clarifier or planner fails
+## Protected-Path Safety Rails
+Before applying ANY fix in Phase 3, check if the target file is in a protected path. The self-audit loop MUST NOT modify files in:
+- `skills/`
+- `workflows/`
+- `roles/`
+- `schemas/`
+- `wazir.manifest.yaml`
+- `docs/concepts/`
+- `docs/reference/`
+- `expertise/composition-map.yaml`
+- `docs/plans/`
+- `program.md`
+If a fix would touch a protected path, log it as a **manual-required** finding and skip. If `git diff --name-only` shows any protected path was modified during a loop iteration, **ABORT** the loop and discard the worktree.
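The protected-path check described above can be sketched as a filter over the file list that `git diff --name-only` prints. This is a hedged illustration, not the package's guard: `isProtected` and `findProtectedChanges` are hypothetical names, paths are assumed to be relative to the repository root, and treating entries ending in `/` as directory prefixes while matching the others exactly is an assumption about the intended semantics.

```javascript
// The protected entries listed in the skill text.
const PROTECTED_PATHS = [
  'skills/',
  'workflows/',
  'roles/',
  'schemas/',
  'wazir.manifest.yaml',
  'docs/concepts/',
  'docs/reference/',
  'expertise/composition-map.yaml',
  'docs/plans/',
  'program.md',
];

// Assumption: directory entries match by prefix, file entries match exactly.
function isProtected(filePath) {
  const normalized = filePath.replace(/\\/g, '/').replace(/^\.\//, '');
  return PROTECTED_PATHS.some((entry) =>
    entry.endsWith('/') ? normalized.startsWith(entry) : normalized === entry
  );
}

// changedFiles: the lines of `git diff --name-only`, already split.
function findProtectedChanges(changedFiles) {
  return changedFiles.filter(isProtected);
}

console.log(
  findProtectedChanges(['docs/readmes/INDEX.md', 'skills/tdd/SKILL.md'])
); // [ 'skills/tdd/SKILL.md' ]
```

A non-empty result corresponds to the abort condition above: the loop stops and the worktree is discarded.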
 ## Phase 3: Fix
-For each finding from Phases 1-2:
+For each finding from Phases 1-2, ordered by severity (critical first):
+1. **Critical findings:** Abort immediately. Discard worktree. Report the critical finding.
+2. **High findings:** Must fix now.
+3. **Medium findings:** Fix if loop budget allows (remaining loops > 1).
+4. **Low findings:** Log only. No fix attempted.
+For high/medium findings:
 1. Categorize as **auto-fixable** or **manual-required**
 2. Auto-fixable issues: apply the fix directly
    - Missing files → create stubs or fix references
@@ -90,7 +239,7 @@ For each finding from Phases 1-2:
    - Documentation drift → update docs to match reality
    - Permission issues → `chmod +x` hook scripts
    - Schema formatting → auto-format
-3. Manual-required issues: document in the audit report with remediation guidance
+3. Manual-required issues: document in the audit report with remediation guidance. Check escalation path for recurrence.
 **Fix constraints:**
 - Never modify `input/` (read-only operator surface)
@@ -106,27 +255,61 @@ If any check fails after fixes:
 - Document the revert and the root cause
 - Re-verify
-## Phase 5: Report & Commit
+## Phase 5: Report, Learn & Commit
+### Quality Score
+Re-run Phase 1 checks and calculate `quality_score_after`. Compute delta.
+### Learning Extraction
+1. Hash each finding description → `finding_hash`
+2. Store all findings in `state.sqlite` via `insertFinding(db, {run_id, phase: 'self-audit', source: 'self-audit', severity, description, finding_hash})`
+3. Query `getRecurringFindingHashes(db, 2)` — findings occurring 2+ times across runs
+4. For each recurring finding not already in `memory/learnings/proposed/`:
+   - Write a learning proposal: `memory/learnings/proposed/self-audit-<hash-prefix>.md`
+   - Content: what the issue is, how often it recurs, recommended prevention
+5. Store audit record: `insertAuditRecord(db, {run_id, finding_count, fix_count, manual_count, quality_score_before, quality_score_after})`
+### Report
 Produce a structured report:
 ```markdown
 # Self-Audit Report — Loop N — <date>
+## Quality Score
+- Before: X% → After: Y% (delta: +Z%)
+## Trend (last 5 audits)
+| Run | Date | Findings | Fixes | Quality |
+|-----|------|----------|-------|---------|
+| ... | ...  | ...      | ...   | ...     |
 ## Validation Sweep
-| Check | Before | After |
-|-------|--------|-------|
-| manifest | PASS/FAIL | PASS |
-| hooks | PASS/FAIL | PASS |
-| ... | ... | ... |
-## Findings
-### Auto-Fixed (N)
-- [F-001] <description> — fixed by <change>
+| Check | Severity | Before | After |
+|-------|----------|--------|-------|
+| manifest | high | PASS/FAIL | PASS |
+| hooks | high | PASS/FAIL | PASS |
+| ... | ... | ... | ... |
+## Findings by Severity
+### Critical (N) — loop aborted if any
+### High (N) — fixed
+### Medium (N) — fixed if budget allowed
+### Low (N) — logged only
+## Auto-Fixed (N)
+- [F-001] [high] <description> — fixed by <change>
+- ...
+## Manual Required (N)
+- [M-001] [medium] <description> — remediation: <guidance>
+- [M-002] [medium] <description> — **RECURRING (3x)** — needs dedicated task
 - ...
-### Manual Required (N)
-- [M-001] <description> — remediation: <guidance>
+## Proposed Learnings (N)
+- <learning-file>: <summary>
 - ...
 ## Changes Made
@@ -146,8 +329,26 @@ The worktree agent returns its results. If changes were made, the caller can mer
 ## Loop Behavior
+Default: **5 loops** (override with `--loops N`, max 10).
 When running multiple loops:
 - Loop 1 audits the current state, fixes what it finds
 - Loop 2 audits the result of Loop 1, catches anything missed or introduced
 - Each loop is independent and runs in its own fresh worktree
-- Convergence: if Loop N finds 0 issues, the project is clean
+- **Convergence detection:** if Loop N finds **0 new issues** (no new findings beyond what previous loops already reported), all subsequent loops are skipped and the audit terminates early
+- **Effectiveness convergence:** if 3 consecutive loops show `quality_score delta < 2%`, skip remaining loops
+- **Critical abort:** if any loop encounters 2+ critical findings, abort the entire audit run
+- If a loop modifies a protected path (see Protected-Path Safety Rails above), the loop is aborted and the worktree is discarded
+The final branch is **NOT auto-merged** — it requires human review.
+## State Database Integration
+The self-audit skill requires `state.sqlite` (see `tooling/src/state/db.js`). At audit start:
+```javascript
+const { openStateDb, getAuditTrend, insertFinding, insertAuditRecord, getRecurringFindingHashes } = require('../../tooling/src/state/db');
+const db = openStateDb(stateRoot);
+```
+All findings are persisted across runs, enabling trend detection and learning extraction.

package/skills/skill-research/SKILL.md ADDED

@@ -0,0 +1,188 @@
+---
+name: wz:skill-research
+description: Deep competitive analysis of Wazir skills against the ecosystem. Research only — never auto-applies changes.
+---
+# Skill Research — Overnight Competitive Analysis
+Deeply analyze Wazir skills against equivalent skills in other frameworks. Produces comparison reports with ratings and recommendations. **Research only — never modifies skill files.**
+## Invocation
+```
+/wazir audit skills --all                    # Analyze all skills
+/wazir audit skills --skill tdd,debugging    # Analyze specific skills
+/wazir audit skills --skill executor --deep  # Deep analysis of one skill
+```
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+## Isolation
+This skill MUST run in an isolated git worktree:
+1. Create worktree: `git worktree add .worktrees/skill-research-<date> -b skill-research-<date>`
+2. All report files are written inside the worktree
+3. Commits contain ONLY report files — never skill changes
+4. On completion, present the branch for user review
+## Per-Skill Research Process
+For each skill being analyzed:
+### Step 1: Read the Wazir Skill
+Read the full `SKILL.md` for the skill being analyzed. Extract:
+- Purpose and trigger conditions
+- Enforcement mechanisms (hard gates, checks, rules)
+- Anti-rationalization coverage (how does it prevent agents from skipping steps?)
+- Token cost estimate (how many tokens does this skill add to context?)
+### Step 2: Research Competitors
+Fetch and analyze equivalent skills from:
+1. **superpowers** — the primary competitor. Fetch the equivalent skill from GitHub.
+2. **2-3 other frameworks** — depending on the skill type:
+   - For TDD: cursor-rules TDD patterns, aider commit conventions
+   - For debugging: rubber-duck debugging frameworks, systematic debugging methodologies
+   - For planning: software architecture patterns, agile story mapping tools
+   - For review: CodeRabbit, GitHub Copilot review, PR review best practices
+Use `WebFetch` or context-mode `fetch_and_index` to retrieve competitor content.
+### Step 3: Side-by-Side Comparison
+Produce a comparison table:
+```markdown
+| Dimension | Wazir | superpowers | Competitor B | Competitor C |
+|-----------|-------|-------------|-------------|-------------|
+| Completeness | ... | ... | ... | ... |
+| Enforcement | ... | ... | ... | ... |
+| Token efficiency | ... | ... | ... | ... |
+| Anti-rationalization | ... | ... | ... | ... |
+```
+For each dimension, note:
+- **Wazir strengths** — what Wazir does better
+- **Wazir weaknesses** — what competitors do better
+- **Gaps** — things competitors have that Wazir lacks entirely
+### Step 4: Rate
+Rate each skill on 4 dimensions (1-5 scale):
+1. **Completeness** (1-5) — Does the skill cover all necessary cases? Are there gaps in the workflow?
+2. **Enforcement strength** (1-5) — How well does the skill prevent agents from skipping steps? Are there hard gates or just suggestions?
+3. **Token efficiency** (1-5) — How concise is the skill? Could it achieve the same enforcement with fewer tokens?
+4. **Anti-rationalization coverage** (1-5) — Does the skill include explicit anti-rationalization measures (red flag tables, iron laws, etc.)?
+Each rating must include a 1-2 sentence justification.
+### Step 5: Recommend
+For each skill, produce specific, actionable recommendations:
+- What to add (with reasoning from competitor analysis)
+- What to remove (token bloat without enforcement value)
+- What to restructure (better organization for the same content)
+- Priority: high / medium / low
+**Recommendations are NEVER auto-applied.** They go in the report for human review.
+## Output Format
+Reports saved to `reports/skill-audit-<YYYY-MM-DD>/`:
+```
+reports/skill-audit-2026-03-20/
+├── README.md              # Summary with aggregate ratings
+├── skill-tdd.md           # Per-skill report
+├── skill-debugging.md
+├── skill-executor.md
+└── ...
+```
+### Per-Skill Report Template
+```markdown
+# Skill Research: [skill name]
+**Date:** YYYY-MM-DD
+**Wazir version:** [commit hash]
+**Competitors analyzed:** [list]
+## Current State
+[Summary of what the Wazir skill does, its enforcement mechanisms, and token cost]
+## Competitor Analysis
+[Side-by-side comparison table]
+## Ratings
+| Dimension | Score | Justification |
+|-----------|-------|---------------|
+| Completeness | X/5 | ... |
+| Enforcement | X/5 | ... |
+| Token efficiency | X/5 | ... |
+| Anti-rationalization | X/5 | ... |
+| **Overall** | **X/20** | |
+## Strengths
+[What Wazir does well]
+## Weaknesses
+[What competitors do better]
+## Recommendations
+| # | Priority | Recommendation | Reasoning |
+|---|----------|---------------|-----------|
+| 1 | high | ... | Based on [competitor] analysis |
+| 2 | medium | ... | ... |
+## Sources
+[URLs and references for all competitor content analyzed]
+```
+### Summary README Template
+```markdown
+# Skill Audit — YYYY-MM-DD
+**Skills analyzed:** N
+**Average score:** X/20
+| Skill | Completeness | Enforcement | Efficiency | Anti-rational | Total |
+|-------|-------------|-------------|------------|--------------|-------|
+| tdd | 4 | 5 | 3 | 4 | 16/20 |
+| debugging | 3 | 3 | 4 | 2 | 12/20 |
+| ... | | | | | |
+## Top Recommendations (cross-skill)
+1. ...
+2. ...
+3. ...
+```
+## Completion
+After all skills are analyzed:
+1. Commit reports in the worktree: `feat(reports): skill audit YYYY-MM-DD`
+2. Present the branch name and summary to the user
+3. Do NOT merge — user reviews and decides what to implement
+4. Do NOT modify any skill files — reports only
+> **Skill research complete.**
+>
+> - Skills analyzed: [N]
+> - Reports: `reports/skill-audit-<date>/` on branch `skill-research-<date>`
+> - Average score: [X]/20
+> - Top recommendations: [list top 3]
+>
+> **Next:** Review reports and decide which recommendations to implement.
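The `X/20` totals and the average score in the summary template above follow from adding up four 1-5 ratings per skill. A minimal sketch of that arithmetic, assuming hypothetical names (`DIMENSIONS`, `totalScore`, `summarize`) and a hypothetical ratings object shape; the skill itself only describes the report, not any code that produces it:

```javascript
// The four rated dimensions; the key names here are illustrative.
const DIMENSIONS = ['completeness', 'enforcement', 'efficiency', 'antiRationalization'];

// Sum the four ratings, rejecting anything outside the 1-5 integer scale.
function totalScore(ratings) {
  return DIMENSIONS.reduce((sum, dim) => {
    const score = ratings[dim];
    if (!Number.isInteger(score) || score < 1 || score > 5) {
      throw new RangeError(`${dim} must be an integer from 1 to 5`);
    }
    return sum + score;
  }, 0);
}

// Per-skill totals plus the average across all analyzed skills.
function summarize(skills) {
  const rows = skills.map((s) => ({ skill: s.skill, total: totalScore(s.ratings) }));
  const average = rows.reduce((sum, r) => sum + r.total, 0) / rows.length;
  return { rows, average };
}

// Sample values taken from the summary template rows (tdd 16/20, debugging 12/20).
const summary = summarize([
  { skill: 'tdd', ratings: { completeness: 4, enforcement: 5, efficiency: 3, antiRationalization: 4 } },
  { skill: 'debugging', ratings: { completeness: 3, enforcement: 3, efficiency: 4, antiRationalization: 2 } },
]);

console.log(summary.rows); // totals 16 and 12
console.log(summary.average); // 14
```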

package/skills/subagent-driven-development/SKILL.md CHANGED

@@ -5,6 +5,19 @@ description: Use when executing implementation plans with independent tasks in t
 # Subagent-Driven Development
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
 Execute plan by dispatching fresh subagent per task, with two-stage review after each: spec compliance review first, then code quality review.
 **Why subagents:** You delegate tasks to specialized agents with isolated context. By precisely crafting their instructions and context, you ensure they stay focused and succeed at their task. They should never inherit your session's context or history — you construct exactly what they need. This also preserves your own context for coordination work.

package/skills/subagent-driven-development/code-quality-reviewer-prompt.md CHANGED

@@ -17,6 +17,8 @@ Task tool (wz:code-reviewer):
   DESCRIPTION: [task summary]
 ```
+**Codebase Exploration:** Use wazir index search-symbols before direct file reads. Query `wazir index search-symbols <query>` to locate relevant code, then use `wazir recall file <path> --tier L1` for targeted reads.
 **In addition to standard code quality concerns, the reviewer should check:**
 - Does each file have one clear responsibility with a well-defined interface?
 - Are units decomposed so they can be understood and tested independently?

package/skills/subagent-driven-development/implementer-prompt.md CHANGED

@@ -26,6 +26,14 @@ Task tool (general-purpose):
     **Ask them now.** Raise any concerns before starting work.
+    ## Codebase Exploration
+    Use wazir index search-symbols before direct file reads.
+    1. Query `wazir index search-symbols <query>` to locate relevant code
+    2. Use `wazir recall file <path> --tier L1` for targeted reads
+    3. Fall back to direct file reads ONLY for files identified by index queries
+    4. If no index exists: `wazir index build && wazir index summarize --tier all`
     ## Your Job
     Once you're clear on requirements:

package/skills/subagent-driven-development/spec-reviewer-prompt.md CHANGED

@@ -34,6 +34,13 @@ Task tool (general-purpose):
     - Check for missing pieces they claimed to implement
     - Look for extra features they didn't mention
+    ## Codebase Exploration
+    Use wazir index search-symbols before direct file reads.
+    1. Query `wazir index search-symbols <query>` to locate relevant code
+    2. Use `wazir recall file <path> --tier L1` for targeted reads
+    3. Fall back to direct file reads ONLY for files identified by index queries
     ## Your Job
     Read the implementation code and verify:

package/skills/tdd/SKILL.md CHANGED

@@ -5,6 +5,19 @@ description: Use for implementation work that changes behavior. Follow RED -> GR
 # Test-Driven Development
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
 Sequence:
 1. RED

package/skills/using-git-worktrees/SKILL.md CHANGED

@@ -5,6 +5,19 @@ description: Use when starting feature work that needs isolation from current wo
 # Using Git Worktrees
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
 ## Overview
 Git worktrees create isolated workspaces sharing the same repository, allowing work on multiple branches simultaneously without switching.

package/skills/using-skills/SKILL.md CHANGED

@@ -3,6 +3,19 @@ name: wz:using-skills
 description: Use when starting any conversation — establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions
 ---
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
 <EXTREMELY_IMPORTANT>
 If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill.

package/skills/verification/SKILL.md CHANGED

@@ -5,18 +5,69 @@ description: Use before claiming work is complete. Every completion claim needs
 # Verification
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
+## Proof of Implementation
+1. Detect project type: `detectRunnableType(projectRoot)` → web | api | cli | library
+2. Collect evidence: `collectProof(taskSpec, runConfig)`
+3. Save evidence to `.wazir/runs/<id>/artifacts/proof-<task>.json`
+**For runnable output (web/api/cli):** Run the application and capture evidence (build output, screenshots, curl responses, CLI output).
+**For non-runnable output (library/config/skills):** Run lint, format check, type check, and tests. All must pass.
+Evidence collection uses `tooling/src/verify/proof-collector.js`.
+## Verification Requirements
 Every completion claim must include:
 - what was verified
 - the exact command or deterministic check
 - the actual result
-Minimum rule:
+## Proof Collection
+Use `proof-collector` (`tooling/src/verify/proof-collector.js`) for automated evidence gathering:
+1. **`detectRunnableType(projectRoot)`** — detects whether the project is `web`, `api`, `cli`, or `library` from `package.json`. Detection order: `pkg.bin` (cli), web framework deps (web), API framework deps (api), default (library).
+2. **`collectProof(projectRoot, opts?)`** — runs type-appropriate verification commands and returns structured evidence:
+   - **web:** `npm run build` + library checks
+   - **api:** library checks (test, tsc, eslint, prettier)
+   - **cli:** `<bin> --help` + library checks
+   - **library:** `npm test`, `tsc --noEmit`, `eslint .`, `prettier --check .`
+All commands use `execFileSync` (never shell `exec`) for security. Evidence is returned as `{ type, evidence: [{ check, ok, output }] }`.
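The detection order described above (`pkg.bin`, then web framework dependencies, then API framework dependencies, then the `library` default) can be sketched as a chain of early returns. This is not the code in `tooling/src/verify/proof-collector.js`, which this diff does not show: `sketchDetectRunnableType` is a hypothetical stand-in that takes an already-parsed `package.json` object, and the `WEB_DEPS` and `API_DEPS` lists are assumed examples rather than the package's real lists.

```javascript
// Assumed example lists; the real dependency lists are not shown in this diff.
const WEB_DEPS = ['next', 'react', 'vue', 'svelte'];
const API_DEPS = ['express', 'fastify', 'koa'];

// pkg: a parsed package.json object. Checks run in the documented order.
function sketchDetectRunnableType(pkg) {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  const hasAny = (names) => names.some((name) => name in deps);

  if (pkg.bin) return 'cli';
  if (hasAny(WEB_DEPS)) return 'web';
  if (hasAny(API_DEPS)) return 'api';
  return 'library';
}

console.log(sketchDetectRunnableType({ bin: { tool: 'cli.js' }, dependencies: { react: '^18.0.0' } })); // cli
console.log(sketchDetectRunnableType({ dependencies: { express: '^4.0.0' } })); // api
console.log(sketchDetectRunnableType({})); // library
```

Because the checks short-circuit, a package that declares both a `bin` entry and a web framework is classified as `cli`, which matches the stated order.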
+## Minimum Rules
 - no success claim without fresh evidence from the current change
+- always use `proof-collector` for Node.js projects to gather deterministic evidence
+- attach the evidence array to the verification proof artifact
 When verification fails:
 - do not mark the work complete
-- fix the issue or report the gap honestly
-- rerun verification after the fix
+- report the gap honestly
+Ask the user via AskUserQuestion:
+- **Question:** "Verification failed for [specific criteria]. How should we proceed?"
+- **Options:**
+  1. "Fix the issue and re-verify" *(Recommended)*
+  2. "Accept partial verification with documented gaps"
+  3. "Abort and review what went wrong"
+Wait for the user's selection before continuing.