@wazir-dev/cli 1.1.0 → 1.3.0

Files changed (138)
  1. package/CHANGELOG.md +74 -10
  2. package/README.md +15 -15
  3. package/assets/demo.cast +47 -0
  4. package/assets/demo.gif +0 -0
  5. package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
  6. package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
  7. package/docs/concepts/architecture.md +1 -1
  8. package/docs/concepts/roles-and-workflows.md +2 -0
  9. package/docs/concepts/why-wazir.md +59 -0
  10. package/docs/decisions/2026-03-19-deferred-items.md +564 -0
  11. package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
  12. package/docs/readmes/INDEX.md +21 -5
  13. package/docs/readmes/features/expertise/README.md +2 -2
  14. package/docs/readmes/features/exports/README.md +2 -2
  15. package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
  16. package/docs/readmes/features/schemas/README.md +3 -0
  17. package/docs/readmes/features/skills/README.md +17 -0
  18. package/docs/readmes/features/skills/clarifier.md +5 -0
  19. package/docs/readmes/features/skills/claude-cli.md +5 -0
  20. package/docs/readmes/features/skills/codex-cli.md +5 -0
  21. package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
  22. package/docs/readmes/features/skills/executing-plans.md +5 -0
  23. package/docs/readmes/features/skills/executor.md +5 -0
  24. package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
  25. package/docs/readmes/features/skills/gemini-cli.md +5 -0
  26. package/docs/readmes/features/skills/humanize.md +5 -0
  27. package/docs/readmes/features/skills/init-pipeline.md +5 -0
  28. package/docs/readmes/features/skills/receiving-code-review.md +5 -0
  29. package/docs/readmes/features/skills/requesting-code-review.md +5 -0
  30. package/docs/readmes/features/skills/reviewer.md +5 -0
  31. package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
  32. package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
  33. package/docs/readmes/features/skills/wazir.md +5 -0
  34. package/docs/readmes/features/skills/writing-skills.md +5 -0
  35. package/docs/readmes/features/workflows/prepare-next.md +1 -1
  36. package/docs/reference/configuration-reference.md +47 -6
  37. package/docs/reference/hooks.md +1 -0
  38. package/docs/reference/launch-checklist.md +4 -4
  39. package/docs/reference/review-loop-pattern.md +119 -9
  40. package/docs/reference/roles-reference.md +1 -0
  41. package/docs/reference/skill-tiers.md +147 -0
  42. package/docs/reference/tooling-cli.md +3 -1
  43. package/docs/truth-claims.yaml +12 -0
  44. package/expertise/antipatterns/process/ai-coding-antipatterns.md +214 -1
  45. package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
  46. package/exports/hosts/claude/.claude/commands/verify.md +30 -1
  47. package/exports/hosts/claude/.claude/settings.json +9 -0
  48. package/exports/hosts/claude/CLAUDE.md +1 -1
  49. package/exports/hosts/claude/export.manifest.json +6 -4
  50. package/exports/hosts/claude/host-package.json +3 -1
  51. package/exports/hosts/codex/AGENTS.md +1 -1
  52. package/exports/hosts/codex/export.manifest.json +6 -4
  53. package/exports/hosts/codex/host-package.json +3 -1
  54. package/exports/hosts/cursor/.cursor/hooks.json +4 -0
  55. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
  56. package/exports/hosts/cursor/export.manifest.json +6 -4
  57. package/exports/hosts/cursor/host-package.json +3 -1
  58. package/exports/hosts/gemini/GEMINI.md +1 -1
  59. package/exports/hosts/gemini/export.manifest.json +6 -4
  60. package/exports/hosts/gemini/host-package.json +3 -1
  61. package/hooks/context-mode-router +191 -0
  62. package/hooks/definitions/context_mode_router.yaml +19 -0
  63. package/hooks/hooks.json +31 -6
  64. package/hooks/protected-path-write-guard +8 -0
  65. package/hooks/routing-matrix.json +45 -0
  66. package/hooks/session-start +62 -1
  67. package/llms-full.txt +937 -134
  68. package/package.json +2 -4
  69. package/schemas/hook.schema.json +2 -1
  70. package/schemas/phase-report.schema.json +89 -0
  71. package/schemas/usage.schema.json +25 -1
  72. package/schemas/wazir-manifest.schema.json +19 -0
  73. package/skills/brainstorming/SKILL.md +32 -157
  74. package/skills/clarifier/SKILL.md +289 -111
  75. package/skills/claude-cli/SKILL.md +320 -0
  76. package/skills/codex-cli/SKILL.md +260 -0
  77. package/skills/debugging/SKILL.md +13 -0
  78. package/skills/design/SKILL.md +13 -0
  79. package/skills/dispatching-parallel-agents/SKILL.md +13 -0
  80. package/skills/executing-plans/SKILL.md +13 -0
  81. package/skills/executor/SKILL.md +139 -19
  82. package/skills/finishing-a-development-branch/SKILL.md +13 -0
  83. package/skills/gemini-cli/SKILL.md +260 -0
  84. package/skills/humanize/SKILL.md +13 -0
  85. package/skills/init-pipeline/SKILL.md +72 -164
  86. package/skills/prepare-next/SKILL.md +81 -10
  87. package/skills/receiving-code-review/SKILL.md +13 -0
  88. package/skills/requesting-code-review/SKILL.md +13 -0
  89. package/skills/reviewer/SKILL.md +369 -24
  90. package/skills/run-audit/SKILL.md +13 -0
  91. package/skills/scan-project/SKILL.md +13 -0
  92. package/skills/self-audit/SKILL.md +217 -16
  93. package/skills/skill-research/SKILL.md +188 -0
  94. package/skills/subagent-driven-development/SKILL.md +13 -0
  95. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
  96. package/skills/subagent-driven-development/implementer-prompt.md +8 -0
  97. package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
  98. package/skills/tdd/SKILL.md +13 -0
  99. package/skills/using-git-worktrees/SKILL.md +13 -0
  100. package/skills/using-skills/SKILL.md +13 -0
  101. package/skills/verification/SKILL.md +54 -3
  102. package/skills/wazir/SKILL.md +464 -381
  103. package/skills/writing-plans/SKILL.md +14 -1
  104. package/skills/writing-skills/SKILL.md +13 -0
  105. package/templates/artifacts/implementation-plan.md +3 -0
  106. package/templates/artifacts/tasks-template.md +133 -0
  107. package/templates/examples/phase-report.example.json +48 -0
  108. package/tooling/src/adapters/composition-engine.js +256 -0
  109. package/tooling/src/adapters/model-router.js +84 -0
  110. package/tooling/src/capture/command.js +41 -2
  111. package/tooling/src/capture/run-config.js +3 -1
  112. package/tooling/src/capture/store.js +56 -0
  113. package/tooling/src/capture/usage.js +106 -0
  114. package/tooling/src/capture/user-input.js +66 -0
  115. package/tooling/src/checks/ac-matrix.js +256 -0
  116. package/tooling/src/checks/command-registry.js +12 -0
  117. package/tooling/src/checks/docs-truth.js +1 -1
  118. package/tooling/src/checks/security-sensitivity.js +69 -0
  119. package/tooling/src/checks/skills.js +111 -0
  120. package/tooling/src/cli.js +31 -20
  121. package/tooling/src/commands/stats.js +161 -0
  122. package/tooling/src/commands/validate.js +5 -1
  123. package/tooling/src/export/compiler.js +33 -37
  124. package/tooling/src/gating/agent.js +145 -0
  125. package/tooling/src/guards/phase-prerequisite-guard.js +185 -0
  126. package/tooling/src/hooks/routing-logic.js +69 -0
  127. package/tooling/src/init/auto-detect.js +258 -0
  128. package/tooling/src/init/command.js +38 -170
  129. package/tooling/src/input/scanner.js +46 -0
  130. package/tooling/src/reports/command.js +103 -0
  131. package/tooling/src/reports/phase-report.js +323 -0
  132. package/tooling/src/state/command.js +160 -0
  133. package/tooling/src/state/db.js +287 -0
  134. package/tooling/src/status/command.js +58 -1
  135. package/tooling/src/verify/proof-collector.js +299 -0
  136. package/wazir.manifest.yaml +26 -14
  137. package/workflows/plan-review.md +3 -1
  138. package/workflows/verify.md +30 -1
@@ -5,16 +5,101 @@ description: Run a self-audit loop in an isolated git worktree — validates, au
5
5
 
6
6
  # Self-Audit — Worktree-Isolated Audit-Fix Loop
7
7
 
8
+ ## Command Routing
9
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
10
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
11
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
12
+ - If context-mode unavailable, fall back to native Bash with warning
13
+
14
+ ## Codebase Exploration
15
+ 1. Query `wazir index search-symbols <query>` first
16
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
17
+ 3. Fall back to direct file reads ONLY for files identified by index queries
18
+ 4. Maximum 10 direct file reads without a justifying index query
19
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
20
+
8
21
  ## Overview
9
22
 
10
23
  This skill runs a structured self-audit of the Wazir project itself, operating entirely in an isolated git worktree. It validates the project against all canonical checks, performs deeper structural analysis, fixes issues found, verifies the fixes pass, and only merges back on all-green.
11
24
 
12
25
  **Safety guarantee:** The main worktree is never modified until all checks pass in isolation.
13
26
 
27
+ ## Severity Levels
28
+
29
+ Every finding is assigned a severity that determines handling:
30
+
31
+ | Severity | Action | Description |
32
+ |----------|--------|-------------|
33
+ | **critical** | Abort loop | Structural integrity threat — cannot safely continue. Discard worktree. |
34
+ | **high** | Fix now | Must be resolved in this loop before proceeding to verify. |
35
+ | **medium** | Fix if time | Fix within remaining loop budget. Skip if loop cap approaching. |
36
+ | **low** | Log and skip | Record in report. No fix attempted. |
37
+
38
+ Severity assignment rules:
39
+ - Protected-path violation → **critical**
40
+ - Test failure, broken hook, missing manifest entry → **high**
41
+ - Documentation drift, stale export, missing schema → **medium**
42
+ - Style issues, minor inconsistency, cosmetic → **low**
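The severity rules above can be read as a lookup from finding category to severity, and from severity to action. The sketch below is only an illustration of that mapping, not code from the package: the category keys, the `classifyFinding` helper, and the choice to default unknown categories to `low` are all assumptions made for this example.

```javascript
// Hypothetical category keys; the skill text states the rules but not these identifiers.
const SEVERITY_BY_CATEGORY = {
  'protected-path-violation': 'critical',
  'test-failure': 'high',
  'broken-hook': 'high',
  'missing-manifest-entry': 'high',
  'documentation-drift': 'medium',
  'stale-export': 'medium',
  'missing-schema': 'medium',
  'style-issue': 'low',
};

// Action per severity, taken from the severity table.
const ACTION_BY_SEVERITY = {
  critical: 'abort-loop',
  high: 'fix-now',
  medium: 'fix-if-time',
  low: 'log-and-skip',
};

// Assumption: categories not listed above are treated as 'low'.
function classifyFinding(category) {
  const severity = SEVERITY_BY_CATEGORY[category] ?? 'low';
  return { severity, action: ACTION_BY_SEVERITY[severity] };
}
```

A table-driven mapping keeps the rules in one place, so a new category is one added entry rather than a new branch.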
43
+
44
+ ## Quality Scoring
45
+
46
+ Each loop measures a quality score **before** and **after** fixes:
47
+
48
+ ```
49
+ quality_score = (checks_passing / total_checks) * 100
50
+ ```
51
+
52
+ Track per loop:
53
+ - `quality_score_before` — score at start of loop (after Phase 1)
54
+ - `quality_score_after` — score at end of loop (after Phase 4 verify)
55
+ - `delta` — improvement from this loop's fixes
56
+
57
+ **Effectiveness threshold:** If 3 consecutive loops show `delta < 2%`, the audit has converged — skip remaining loops.
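The score formula and the effectiveness threshold can be sketched as two small functions. This is a minimal illustration under stated assumptions: `qualityScore` and `hasConverged` are hypothetical names, `delta < 2%` is read as "less than 2 percentage points", and a run with zero checks is scored as 0.

```javascript
// quality_score = (checks_passing / total_checks) * 100
function qualityScore(checksPassing, totalChecks) {
  if (totalChecks === 0) return 0; // assumption: no checks scores 0, not 100
  return (checksPassing / totalChecks) * 100;
}

// Each loop record is { before, after } in percent.
// Returns true when the last `window` loops each improved the score
// by less than `threshold` percentage points.
function hasConverged(loops, threshold = 2, window = 3) {
  if (loops.length < window) return false;
  return loops
    .slice(-window)
    .every((loop) => loop.after - loop.before < threshold);
}
```

For example, three loops moving the score from 70 to 71, 71 to 72, and 72 to 72.5 would count as converged, while any loop in the window with a gain of 2 points or more resets the condition.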
58
+
59
+ ## Learning Integration
60
+
61
+ After each audit loop, findings feed the learning pipeline:
62
+
63
+ 1. **Propose learnings:** For each finding category that appeared in this loop:
64
+ - Check `state.sqlite` findings table for the same `finding_hash` in previous runs
65
+ - If `recurrence_count >= 2`: auto-propose a learning to `memory/learnings/proposed/`
66
+ - Learning scope tags: `scope_roles: [executor]`, `scope_concerns: [quality]`
67
+ 2. **Store findings:** Insert all findings into `state.sqlite` via `insertFinding()` with severity and finding_hash
68
+ 3. **Store audit record:** Insert `{run_id, finding_count, fix_count, manual_count, quality_score_before, quality_score_after}` into `audit_history`
69
+
70
+ ## Trend Tracking
71
+
72
+ Before starting Loop 1, query previous audit results:
73
+
74
+ ```javascript
75
+ const trend = getAuditTrend(db, 5); // last 5 audits
76
+ ```
77
+
78
+ Present trend in the report:
79
+ - Are finding counts trending up or down?
80
+ - Are the same finding_hashes recurring? If so, the fixes aren't preventing recurrence — escalate.
81
+ - Has quality_score improved across runs?
82
+
83
+ ## Escalation Path
84
+
85
+ Manual-required findings that cannot be auto-fixed are escalated:
86
+
87
+ 1. **Within the audit:** Logged in the report with remediation guidance
88
+ 2. **Cross-run recurrence:** If the same manual finding recurs across 3+ audits, escalate to:
89
+ - Create a task spec in `.wazir/runs/latest/tasks/` describing the fix
90
+ - Flag in the audit report as **RECURRING — needs dedicated task**
91
+ 3. **Critical findings:** Immediately logged. If 2+ critical findings in a single loop, abort the entire audit run.
92
+
14
93
  ## Trigger
15
94
 
16
95
  On-demand: operator invokes `/self-audit` or requests a self-audit loop.
17
96
 
97
+ ### Parameters
98
+
99
+ | Flag | Default | Max | Description |
100
+ |------|---------|-----|-------------|
101
+ | `--loops N` | 5 | 10 | Number of audit-fix loops to run. Each loop executes the full Phase 1-5 cycle. If a loop finds 0 new issues, subsequent loops are skipped (convergence detection). |
102
+
18
103
  ## Worktree Isolation Model
19
104
 
20
105
  ```
@@ -45,7 +130,9 @@ node tooling/src/cli.js doctor --json
45
130
  node tooling/src/cli.js export --check
46
131
  ```
47
132
 
48
- Collect pass/fail for each. Any failure is a finding.
133
+ Collect pass/fail for each. Any failure is a finding. Assign severity per the severity table above.
134
+
135
+ Calculate `quality_score_before` from the pass/fail results.
49
136
 
50
137
  ## Phase 2: Deep Structural Audit
51
138
 
@@ -79,10 +166,72 @@ Beyond CLI checks, inspect for:
79
166
  - Each skill dir under `skills/` has a well-formed `SKILL.md` with frontmatter
80
167
  - Skills referenced in documentation actually exist
81
168
 
169
+ 7. **Code Quality**
170
+ - Run `node tooling/src/cli.js validate` (all subcommands) and capture exit codes
171
+ - If `eslint` is present in `package.json` scripts or devDependencies, run `npx eslint .` and capture results
172
+ - If `tsc` is present in `package.json` scripts or devDependencies, run `npx tsc --noEmit` and capture results
173
+ - Tools not found in `package.json` are skipped with a note in the report
174
+
175
+ 8. **Test Coverage**
176
+ - Run `npm test` and capture pass/fail counts from output
177
+ - Any test failure is a finding
178
+
179
+ 9. **Expertise Coverage**
180
+ - Read `expertise/composition-map.yaml`
181
+ - For every module path referenced, check that the file exists under `expertise/`
182
+ - Missing files are findings
183
+
184
+ 10. **Export Freshness**
185
+ - Run `wazir export --check`
186
+ - Any drift detected is a finding
187
+
188
+ 11. **Input Coverage** (run-scoped — only when a run directory exists)
189
+ - Read the original input file(s) from `.wazir/input/` or `.wazir/runs/<id>/sources/`
190
+ - Read the execution plan from `.wazir/runs/<id>/clarified/execution-plan.md`
191
+ - Read the actual commits on the branch: `git log --oneline main..HEAD`
192
+ - Build a coverage matrix: every distinct item in the input should map to:
193
+ - At least one task in the execution plan
194
+ - At least one commit in the git log
195
+ - **Missing items** (in input but not in plan AND not in commits) → **HIGH** severity finding
196
+ - **Partial items** (in plan but no corresponding commit) → **MEDIUM** severity finding
197
+ - **Fully covered items** (input → plan → commit) → pass
198
+ - Output the coverage matrix in the audit report:
199
+ ```
200
+ | Input Item | Plan Task | Commit | Status |
201
+ |------------|-----------|--------|--------|
202
+ | Item 1 | Task 3 | abc123 | PASS |
203
+ | Item 2 | Task 5 | — | PARTIAL|
204
+ | Item 3 | — | — | MISSING|
205
+ ```
206
+ - This dimension catches scope reduction AFTER the fact — a safety net for when the clarifier or planner fails
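The coverage classification in item 11 can be sketched as follows. The item shape (`name`, `planTask`, `commit`) and the function names are hypothetical. The `UNPLANNED` label covers a case the skill text does not define (a commit with no plan task), so treat it as an assumption of this example.

```javascript
// Hypothetical item shape: { name, planTask, commit }; null means "no match found".
function coverageStatus(item) {
  if (item.planTask && item.commit) return 'PASS';
  if (item.planTask) return 'PARTIAL'; // planned, but no corresponding commit
  if (!item.commit) return 'MISSING';  // neither planned nor committed
  return 'UNPLANNED'; // commit without a plan task; not defined by the skill text
}

// Severities as stated in the skill: PARTIAL is medium, MISSING is high.
const SEVERITY_BY_STATUS = { PARTIAL: 'medium', MISSING: 'high' };

// Returns one finding per input item that is not fully covered.
function coverageFindings(items) {
  return items
    .map((item) => ({ name: item.name, status: coverageStatus(item) }))
    .filter((row) => row.status in SEVERITY_BY_STATUS)
    .map((row) => ({ ...row, severity: SEVERITY_BY_STATUS[row.status] }));
}
```

Applied to the three rows of the example matrix, this would report Item 2 as a medium finding and Item 3 as a high finding, with Item 1 passing.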
207
+
208
+ ## Protected-Path Safety Rails
209
+
210
+ Before applying ANY fix in Phase 3, check if the target file is in a protected path. The self-audit loop MUST NOT modify files in:
211
+
212
+ - `skills/`
213
+ - `workflows/`
214
+ - `roles/`
215
+ - `schemas/`
216
+ - `wazir.manifest.yaml`
217
+ - `docs/concepts/`
218
+ - `docs/reference/`
219
+ - `expertise/composition-map.yaml`
220
+ - `docs/plans/`
221
+ - `program.md`
222
+
223
+ If a fix would touch a protected path, log it as a **manual-required** finding and skip. If `git diff --name-only` shows any protected path was modified during a loop iteration, **ABORT** the loop and discard the worktree.
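A protected-path check along these lines could run both before a fix and against the output of `git diff --name-only`. This is a sketch, not the package's guard: `isProtected` and `protectedPathsTouched` are hypothetical helpers, and it assumes repo-relative paths with forward slashes.

```javascript
// The protected list from the skill text.
const PROTECTED_PATHS = [
  'skills/', 'workflows/', 'roles/', 'schemas/', 'wazir.manifest.yaml',
  'docs/concepts/', 'docs/reference/', 'expertise/composition-map.yaml',
  'docs/plans/', 'program.md',
];

// Entries ending in '/' match any file beneath that directory;
// all other entries must match the path exactly.
function isProtected(filePath) {
  return PROTECTED_PATHS.some((entry) =>
    entry.endsWith('/') ? filePath.startsWith(entry) : filePath === entry
  );
}

// `changedFiles` is the output of `git diff --name-only`, split into lines.
// A non-empty result means the loop should abort and discard the worktree.
function protectedPathsTouched(changedFiles) {
  return changedFiles.filter(isProtected);
}
```

Matching directory entries by prefix and file entries exactly avoids false positives such as a file named `program.md.bak`.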
224
+
82
225
  ## Phase 3: Fix
83
226
 
84
- For each finding from Phases 1-2:
227
+ For each finding from Phases 1-2, ordered by severity (critical first):
228
+
229
+ 1. **Critical findings:** Abort immediately. Discard worktree. Report the critical finding.
230
+ 2. **High findings:** Must fix now.
231
+ 3. **Medium findings:** Fix if loop budget allows (remaining loops > 1).
232
+ 4. **Low findings:** Log only. No fix attempted.
85
233
 
234
+ For high/medium findings:
86
235
  1. Categorize as **auto-fixable** or **manual-required**
87
236
  2. Auto-fixable issues: apply the fix directly
88
237
  - Missing files → create stubs or fix references
@@ -90,7 +239,7 @@ For each finding from Phases 1-2:
90
239
  - Documentation drift → update docs to match reality
91
240
  - Permission issues → `chmod +x` hook scripts
92
241
  - Schema formatting → auto-format
93
- 3. Manual-required issues: document in the audit report with remediation guidance
242
+ 3. Manual-required issues: document in the audit report with remediation guidance. Check escalation path for recurrence.
94
243
 
95
244
  **Fix constraints:**
96
245
  - Never modify `input/` (read-only operator surface)
@@ -106,27 +255,61 @@ If any check fails after fixes:
106
255
  - Document the revert and the root cause
107
256
  - Re-verify
108
257
 
109
- ## Phase 5: Report & Commit
258
+ ## Phase 5: Report, Learn & Commit
259
+
260
+ ### Quality Score
261
+
262
+ Re-run Phase 1 checks and calculate `quality_score_after`. Compute delta.
263
+
264
+ ### Learning Extraction
265
+
266
+ 1. Hash each finding description → `finding_hash`
267
+ 2. Store all findings in `state.sqlite` via `insertFinding(db, {run_id, phase: 'self-audit', source: 'self-audit', severity, description, finding_hash})`
268
+ 3. Query `getRecurringFindingHashes(db, 2)` — findings occurring 2+ times across runs
269
+ 4. For each recurring finding not already in `memory/learnings/proposed/`:
270
+ - Write a learning proposal: `memory/learnings/proposed/self-audit-<hash-prefix>.md`
271
+ - Content: what the issue is, how often it recurs, recommended prevention
272
+ 5. Store audit record: `insertAuditRecord(db, {run_id, finding_count, fix_count, manual_count, quality_score_before, quality_score_after})`
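Steps 1 and 4 depend on a stable hash of each finding description. The skill text does not say which hash or normalization is used, so the sketch below makes assumptions: SHA-256 via Node's built-in `crypto` module, whitespace and case normalization before hashing, and an 8-character prefix for the proposal filename. `findingHash` and `proposalPath` are hypothetical names.

```javascript
const { createHash } = require('node:crypto');

// Assumption: descriptions are normalized so trivial wording differences
// (case, extra whitespace) produce the same hash across runs.
function findingHash(description) {
  const normalized = description.trim().toLowerCase().replace(/\s+/g, ' ');
  return createHash('sha256').update(normalized).digest('hex');
}

// Assumption: the <hash-prefix> in the proposal filename is the first 8 hex characters.
function proposalPath(hash, prefixLength = 8) {
  return `memory/learnings/proposed/self-audit-${hash.slice(0, prefixLength)}.md`;
}
```

Without some normalization, the same issue reported with slightly different wording would never register as recurring, which would weaken the recurrence query in step 3.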
273
+
274
+ ### Report
110
275
 
111
276
  Produce a structured report:
112
277
 
113
278
  ```markdown
114
279
  # Self-Audit Report — Loop N — <date>
115
280
 
281
+ ## Quality Score
282
+ - Before: X% → After: Y% (delta: +Z%)
283
+
284
+ ## Trend (last 5 audits)
285
+ | Run | Date | Findings | Fixes | Quality |
286
+ |-----|------|----------|-------|---------|
287
+ | ... | ... | ... | ... | ... |
288
+
116
289
  ## Validation Sweep
117
- | Check | Before | After |
118
- |-------|--------|-------|
119
- | manifest | PASS/FAIL | PASS |
120
- | hooks | PASS/FAIL | PASS |
121
- | ... | ... | ... |
122
-
123
- ## Findings
124
- ### Auto-Fixed (N)
125
- - [F-001] <description> — fixed by <change>
290
+ | Check | Severity | Before | After |
291
+ |-------|----------|--------|-------|
292
+ | manifest | high | PASS/FAIL | PASS |
293
+ | hooks | high | PASS/FAIL | PASS |
294
+ | ... | ... | ... | ... |
295
+
296
+ ## Findings by Severity
297
+ ### Critical (N) — loop aborted if any
298
+ ### High (N) — fixed
299
+ ### Medium (N) — fixed if budget allowed
300
+ ### Low (N) — logged only
301
+
302
+ ## Auto-Fixed (N)
303
+ - [F-001] [high] <description> — fixed by <change>
304
+ - ...
305
+
306
+ ## Manual Required (N)
307
+ - [M-001] [medium] <description> — remediation: <guidance>
308
+ - [M-002] [medium] <description> — **RECURRING (3x)** — needs dedicated task
126
309
  - ...
127
310
 
128
- ### Manual Required (N)
129
- - [M-001] <description> — remediation: <guidance>
311
+ ## Proposed Learnings (N)
312
+ - <learning-file>: <summary>
130
313
  - ...
131
314
 
132
315
  ## Changes Made
@@ -146,8 +329,26 @@ The worktree agent returns its results. If changes were made, the caller can mer
146
329
 
147
330
  ## Loop Behavior
148
331
 
332
+ Default: **5 loops** (override with `--loops N`, max 10).
333
+
149
334
  When running multiple loops:
150
335
  - Loop 1 audits the current state, fixes what it finds
151
336
  - Loop 2 audits the result of Loop 1, catches anything missed or introduced
152
337
  - Each loop is independent and runs in its own fresh worktree
153
- - Convergence: if Loop N finds 0 issues, the project is clean
338
+ - **Convergence detection:** if Loop N finds **0 new issues** (no new findings beyond what previous loops already reported), all subsequent loops are skipped and the audit terminates early
339
+ - **Effectiveness convergence:** if 3 consecutive loops show `quality_score delta < 2%`, skip remaining loops
340
+ - **Critical abort:** if any loop encounters 2+ critical findings, abort the entire audit run
341
+ - If a loop modifies a protected path (see Protected-Path Safety Rails above), the loop is aborted and the worktree is discarded
342
+
343
+ The final branch is **NOT auto-merged** — it requires human review.
344
+
345
+ ## State Database Integration
346
+
347
+ The self-audit skill requires `state.sqlite` (see `tooling/src/state/db.js`). At audit start:
348
+
349
+ ```javascript
350
+ const { openStateDb, getAuditTrend, insertFinding, insertAuditRecord, getRecurringFindingHashes } = require('../../tooling/src/state/db');
351
+ const db = openStateDb(stateRoot);
352
+ ```
353
+
354
+ All findings are persisted across runs, enabling trend detection and learning extraction.
@@ -0,0 +1,188 @@
1
+ ---
2
+ name: wz:skill-research
3
+ description: Deep competitive analysis of Wazir skills against the ecosystem. Research only — never auto-applies changes.
4
+ ---
5
+
6
+ # Skill Research — Overnight Competitive Analysis
7
+
8
+ Deeply analyze Wazir skills against equivalent skills in other frameworks. Produces comparison reports with ratings and recommendations. **Research only — never modifies skill files.**
9
+
10
+ ## Invocation
11
+
12
+ ```
13
+ /wazir audit skills --all # Analyze all skills
14
+ /wazir audit skills --skill tdd,debugging # Analyze specific skills
15
+ /wazir audit skills --skill executor --deep # Deep analysis of one skill
16
+ ```
17
+
18
+ ## Command Routing
19
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
20
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
21
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
22
+ - If context-mode unavailable, fall back to native Bash with warning
23
+
24
+ ## Isolation
25
+
26
+ This skill MUST run in an isolated git worktree:
27
+
28
+ 1. Create worktree: `git worktree add .worktrees/skill-research-<date> -b skill-research-<date>`
29
+ 2. All report files are written inside the worktree
30
+ 3. Commits contain ONLY report files — never skill changes
31
+ 4. On completion, present the branch for user review
32
+
33
+ ## Per-Skill Research Process
34
+
35
+ For each skill being analyzed:
36
+
37
+ ### Step 1: Read the Wazir Skill
38
+
39
+ Read the full `SKILL.md` for the skill being analyzed. Extract:
40
+ - Purpose and trigger conditions
41
+ - Enforcement mechanisms (hard gates, checks, rules)
42
+ - Anti-rationalization coverage (how does it prevent agents from skipping steps?)
43
+ - Token cost estimate (how many tokens does this skill add to context?)
44
+
45
+ ### Step 2: Research Competitors
46
+
47
+ Fetch and analyze equivalent skills from:
48
+
49
+ 1. **superpowers** — the primary competitor. Fetch the equivalent skill from GitHub.
50
+ 2. **2-3 other frameworks** — depending on the skill type:
51
+ - For TDD: cursor-rules TDD patterns, aider commit conventions
52
+ - For debugging: rubber-duck debugging frameworks, systematic debugging methodologies
53
+ - For planning: software architecture patterns, agile story mapping tools
54
+ - For review: CodeRabbit, GitHub Copilot review, PR review best practices
55
+
56
+ Use `WebFetch` or context-mode `fetch_and_index` to retrieve competitor content.
57
+
58
+ ### Step 3: Side-by-Side Comparison
59
+
60
+ Produce a comparison table:
61
+
62
+ ```markdown
63
+ | Dimension | Wazir | superpowers | Competitor B | Competitor C |
64
+ |-----------|-------|-------------|-------------|-------------|
65
+ | Completeness | ... | ... | ... | ... |
66
+ | Enforcement | ... | ... | ... | ... |
67
+ | Token efficiency | ... | ... | ... | ... |
68
+ | Anti-rationalization | ... | ... | ... | ... |
69
+ ```
70
+
71
+ For each dimension, note:
72
+ - **Wazir strengths** — what Wazir does better
73
+ - **Wazir weaknesses** — what competitors do better
74
+ - **Gaps** — things competitors have that Wazir lacks entirely
75
+
76
+ ### Step 4: Rate
77
+
78
+ Rate each skill on 4 dimensions (1-5 scale):
79
+
80
+ 1. **Completeness** (1-5) — Does the skill cover all necessary cases? Are there gaps in the workflow?
81
+ 2. **Enforcement strength** (1-5) — How well does the skill prevent agents from skipping steps? Are there hard gates or just suggestions?
82
+ 3. **Token efficiency** (1-5) — How concise is the skill? Could it achieve the same enforcement with fewer tokens?
83
+ 4. **Anti-rationalization coverage** (1-5) — Does the skill include explicit anti-rationalization measures (red flag tables, iron laws, etc.)?
84
+
85
+ Each rating must include a 1-2 sentence justification.
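The four ratings combine into the X/20 total used in the report templates. A minimal sketch of that aggregation, with hypothetical property names and a hypothetical `overallScore` helper, might validate each rating before summing:

```javascript
// Hypothetical keys for the four rating dimensions.
const DIMENSIONS = ['completeness', 'enforcement', 'tokenEfficiency', 'antiRationalization'];

// Sums the four 1-5 ratings into a total out of 20.
// Throws if any rating is missing or outside the 1-5 scale.
function overallScore(ratings) {
  let total = 0;
  for (const dim of DIMENSIONS) {
    const score = ratings[dim];
    if (!Number.isInteger(score) || score < 1 || score > 5) {
      throw new RangeError(`${dim} must be an integer from 1 to 5`);
    }
    total += score;
  }
  return total;
}
```

With ratings of 4, 5, 3, and 4, this gives 16, matching the `tdd` row in the summary template.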
86
+
87
+ ### Step 5: Recommend
88
+
89
+ For each skill, produce specific, actionable recommendations:
90
+
91
+ - What to add (with reasoning from competitor analysis)
92
+ - What to remove (token bloat without enforcement value)
93
+ - What to restructure (better organization for the same content)
94
+ - Priority: high / medium / low
95
+
96
+ **Recommendations are NEVER auto-applied.** They go in the report for human review.
97
+
98
+ ## Output Format
99
+
100
+ Reports saved to `reports/skill-audit-<YYYY-MM-DD>/`:
101
+
102
+ ```
103
+ reports/skill-audit-2026-03-20/
104
+ ├── README.md # Summary with aggregate ratings
105
+ ├── skill-tdd.md # Per-skill report
106
+ ├── skill-debugging.md
107
+ ├── skill-executor.md
108
+ └── ...
109
+ ```
110
+
111
+ ### Per-Skill Report Template
112
+
113
+ ```markdown
114
+ # Skill Research: [skill name]
115
+
116
+ **Date:** YYYY-MM-DD
117
+ **Wazir version:** [commit hash]
118
+ **Competitors analyzed:** [list]
119
+
120
+ ## Current State
121
+ [Summary of what the Wazir skill does, its enforcement mechanisms, and token cost]
122
+
123
+ ## Competitor Analysis
124
+ [Side-by-side comparison table]
125
+
126
+ ## Ratings
127
+
128
+ | Dimension | Score | Justification |
129
+ |-----------|-------|---------------|
130
+ | Completeness | X/5 | ... |
131
+ | Enforcement | X/5 | ... |
132
+ | Token efficiency | X/5 | ... |
133
+ | Anti-rationalization | X/5 | ... |
134
+ | **Overall** | **X/20** | |
135
+
136
+ ## Strengths
137
+ [What Wazir does well]
138
+
139
+ ## Weaknesses
140
+ [What competitors do better]
141
+
142
+ ## Recommendations
143
+ | # | Priority | Recommendation | Reasoning |
144
+ |---|----------|---------------|-----------|
145
+ | 1 | high | ... | Based on [competitor] analysis |
146
+ | 2 | medium | ... | ... |
147
+
148
+ ## Sources
149
+ [URLs and references for all competitor content analyzed]
150
+ ```
151
+
152
+ ### Summary README Template
153
+
154
+ ```markdown
155
+ # Skill Audit — YYYY-MM-DD
156
+
157
+ **Skills analyzed:** N
158
+ **Average score:** X/20
159
+
160
+ | Skill | Completeness | Enforcement | Efficiency | Anti-rational | Total |
161
+ |-------|-------------|-------------|------------|--------------|-------|
162
+ | tdd | 4 | 5 | 3 | 4 | 16/20 |
163
+ | debugging | 3 | 3 | 4 | 2 | 12/20 |
164
+ | ... | | | | | |
165
+
166
+ ## Top Recommendations (cross-skill)
167
+ 1. ...
168
+ 2. ...
169
+ 3. ...
170
+ ```
171
+
172
+ ## Completion
173
+
174
+ After all skills are analyzed:
175
+
176
+ 1. Commit reports in the worktree: `feat(reports): skill audit YYYY-MM-DD`
177
+ 2. Present the branch name and summary to the user
178
+ 3. Do NOT merge — user reviews and decides what to implement
179
+ 4. Do NOT modify any skill files — reports only
180
+
181
+ > **Skill research complete.**
182
+ >
183
+ > - Skills analyzed: [N]
184
+ > - Reports: `reports/skill-audit-<date>/` on branch `skill-research-<date>`
185
+ > - Average score: [X]/20
186
+ > - Top recommendations: [list top 3]
187
+ >
188
+ > **Next:** Review reports and decide which recommendations to implement.
@@ -5,6 +5,19 @@ description: Use when executing implementation plans with independent tasks in t
5
5
 
6
6
  # Subagent-Driven Development
7
7
 
8
+ ## Command Routing
9
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
10
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
11
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
12
+ - If context-mode unavailable, fall back to native Bash with warning
13
+
14
+ ## Codebase Exploration
15
+ 1. Query `wazir index search-symbols <query>` first
16
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
17
+ 3. Fall back to direct file reads ONLY for files identified by index queries
18
+ 4. Maximum 10 direct file reads without a justifying index query
19
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
20
+
8
21
  Execute plan by dispatching fresh subagent per task, with two-stage review after each: spec compliance review first, then code quality review.
9
22
 
10
23
  **Why subagents:** You delegate tasks to specialized agents with isolated context. By precisely crafting their instructions and context, you ensure they stay focused and succeed at their task. They should never inherit your session's context or history — you construct exactly what they need. This also preserves your own context for coordination work.
@@ -17,6 +17,8 @@ Task tool (wz:code-reviewer):
17
17
  DESCRIPTION: [task summary]
18
18
  ```
19
19
 
20
+ **Codebase Exploration:** Use wazir index search-symbols before direct file reads. Query `wazir index search-symbols <query>` to locate relevant code, then use `wazir recall file <path> --tier L1` for targeted reads.
21
+
20
22
  **In addition to standard code quality concerns, the reviewer should check:**
21
23
  - Does each file have one clear responsibility with a well-defined interface?
22
24
  - Are units decomposed so they can be understood and tested independently?
@@ -26,6 +26,14 @@ Task tool (general-purpose):
26
26
 
27
27
  **Ask them now.** Raise any concerns before starting work.
28
28
 
29
+ ## Codebase Exploration
30
+
31
+ Use wazir index search-symbols before direct file reads.
32
+ 1. Query `wazir index search-symbols <query>` to locate relevant code
33
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
34
+ 3. Fall back to direct file reads ONLY for files identified by index queries
35
+ 4. If no index exists: `wazir index build && wazir index summarize --tier all`
36
+
29
37
  ## Your Job
30
38
 
31
39
  Once you're clear on requirements:
@@ -34,6 +34,13 @@ Task tool (general-purpose):
  - Check for missing pieces they claimed to implement
  - Look for extra features they didn't mention
 
+ ## Codebase Exploration
+
+ Use wazir index search-symbols before direct file reads.
+ 1. Query `wazir index search-symbols <query>` to locate relevant code
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
+ 3. Fall back to direct file reads ONLY for files identified by index queries
+
  ## Your Job
 
  Read the implementation code and verify:
@@ -5,6 +5,19 @@ description: Use for implementation work that changes behavior. Follow RED -> GR
 
  # Test-Driven Development
 
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode unavailable, fall back to native Bash with warning
+
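The routing rule above can be illustrated with a short function. This is a sketch under stated assumptions: the schema of `hooks/routing-matrix.json` is not shown in this diff, so the inline `SMALL_COMMANDS` list stands in for it, and `routeCommand` is a hypothetical name. The sketch also treats every command that is not in the small list as large, which is a simplification.

```javascript
// Assumption: stand-in for hooks/routing-matrix.json, built from the
// "small commands" named in the bullet list above.
const SMALL_COMMANDS = ["git status", "ls", "pwd", "wazir"];

// Hypothetical router: picks a tool for a command line.
function routeCommand(command, contextModeAvailable) {
  const isSmall = SMALL_COMMANDS.some(
    (prefix) => command === prefix || command.startsWith(prefix + " ")
  );
  if (isSmall) {
    return { tool: "bash", warning: null };
  }
  if (contextModeAvailable) {
    return { tool: "context-mode", warning: null };
  }
  // Large command, but context-mode is missing: fall back with a warning.
  return {
    tool: "bash",
    warning: "context-mode unavailable; falling back to native Bash",
  };
}

console.log(routeCommand("npm test", true));
console.log(routeCommand("git status", true));
console.log(routeCommand("npm test", false));
```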
+ ## Codebase Exploration
+ 1. Query `wazir index search-symbols <query>` first
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
+ 3. Fall back to direct file reads ONLY for files identified by index queries
+ 4. Maximum 10 direct file reads without a justifying index query
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
  Sequence:
 
  1. RED
@@ -5,6 +5,19 @@ description: Use when starting feature work that needs isolation from current wo
 
  # Using Git Worktrees
 
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode unavailable, fall back to native Bash with warning
+
+ ## Codebase Exploration
+ 1. Query `wazir index search-symbols <query>` first
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
+ 3. Fall back to direct file reads ONLY for files identified by index queries
+ 4. Maximum 10 direct file reads without a justifying index query
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
  ## Overview
 
  Git worktrees create isolated workspaces sharing the same repository, allowing work on multiple branches simultaneously without switching.
@@ -3,6 +3,19 @@ name: wz:using-skills
  description: Use when starting any conversation — establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions
  ---
 
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode unavailable, fall back to native Bash with warning
+
+ ## Codebase Exploration
+ 1. Query `wazir index search-symbols <query>` first
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
+ 3. Fall back to direct file reads ONLY for files identified by index queries
+ 4. Maximum 10 direct file reads without a justifying index query
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
  <EXTREMELY_IMPORTANT>
  If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill.
 
@@ -5,18 +5,69 @@ description: Use before claiming work is complete. Every completion claim needs
 
  # Verification
 
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode unavailable, fall back to native Bash with warning
+
+ ## Codebase Exploration
+ 1. Query `wazir index search-symbols <query>` first
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
+ 3. Fall back to direct file reads ONLY for files identified by index queries
+ 4. Maximum 10 direct file reads without a justifying index query
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
+ ## Proof of Implementation
+
+ 1. Detect project type: `detectRunnableType(projectRoot)` → web | api | cli | library
+ 2. Collect evidence: `collectProof(taskSpec, runConfig)`
+ 3. Save evidence to `.wazir/runs/<id>/artifacts/proof-<task>.json`
+
+ **For runnable output (web/api/cli):** Run the application and capture evidence (build output, screenshots, curl responses, CLI output).
+
+ **For non-runnable output (library/config/skills):** Run lint, format check, type check, and tests. All must pass.
+
+ Evidence collection uses `tooling/src/verify/proof-collector.js`.
+
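Step 3 above, saving evidence to `.wazir/runs/<id>/artifacts/proof-<task>.json`, might look roughly like the following. This is a sketch, not the code in `proof-collector.js`: `saveProof` is a hypothetical helper, and the run id, task name, and proof object in the usage lines are placeholders. Only the directory layout and file name pattern come from the text.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: writes a proof object to
// <root>/.wazir/runs/<runId>/artifacts/proof-<task>.json and returns the path.
function saveProof(root, runId, task, proof) {
  const dir = path.join(root, ".wazir", "runs", runId, "artifacts");
  fs.mkdirSync(dir, { recursive: true }); // create missing parent directories
  const file = path.join(dir, `proof-${task}.json`);
  fs.writeFileSync(file, JSON.stringify(proof, null, 2) + "\n");
  return file;
}

// Usage with placeholder values, written under a temporary directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "wazir-demo-"));
console.log(saveProof(demoRoot, "run-1", "task-a", { type: "library", evidence: [] }));
```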
+ ## Verification Requirements
+
  Every completion claim must include:
 
  - what was verified
  - the exact command or deterministic check
  - the actual result
 
- Minimum rule:
+ ## Proof Collection
+
+ Use `proof-collector` (`tooling/src/verify/proof-collector.js`) for automated evidence gathering:
+
+ 1. **`detectRunnableType(projectRoot)`** — detects whether the project is `web`, `api`, `cli`, or `library` from `package.json`. Detection order: `pkg.bin` (cli), web framework deps (web), API framework deps (api), default (library).
+
+ 2. **`collectProof(projectRoot, opts?)`** — runs type-appropriate verification commands and returns structured evidence:
+ - **web:** `npm run build` + library checks
+ - **api:** library checks (test, tsc, eslint, prettier)
+ - **cli:** `<bin> --help` + library checks
+ - **library:** `npm test`, `tsc --noEmit`, `eslint .`, `prettier --check .`
+
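The detection order described in item 1 can be sketched as follows. This is an illustration only: `classifyPackage` is a hypothetical variant that takes an already-parsed `package.json` object instead of a project root, and the `WEB_DEPS` and `API_DEPS` lists are assumptions, since the framework lists used by `proof-collector.js` are not shown in this diff. Merging `dependencies` with `devDependencies` is also an assumption.

```javascript
// Assumption: illustrative framework lists, not the ones in proof-collector.js.
const WEB_DEPS = ["react", "vue", "next"];
const API_DEPS = ["express", "fastify"];

// Hypothetical classifier following the stated order:
// pkg.bin (cli), web framework deps (web), API framework deps (api), else library.
function classifyPackage(pkg) {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  const hasAny = (names) => names.some((name) => name in deps);
  if (pkg.bin) return "cli";
  if (hasAny(WEB_DEPS)) return "web";
  if (hasAny(API_DEPS)) return "api";
  return "library";
}

// A package with both a bin entry and a web dependency is classified as "cli",
// because the bin check runs first.
console.log(classifyPackage({ bin: { tool: "cli.js" }, dependencies: { react: "^18.0.0" } }));
```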
+ All commands use `execFileSync` (never shell `exec`) for security. Evidence is returned as `{ type, evidence: [{ check, ok, output }] }`.
+
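To make the evidence shape concrete, here is a minimal sketch of one check run through Node's `execFileSync`, which spawns the program directly without a shell. `runCheck` is a hypothetical helper and not the function in `proof-collector.js`. The usage lines run the current Node binary so the example works without `npm`, `tsc`, or `eslint` installed.

```javascript
const { execFileSync } = require("node:child_process");

// Hypothetical helper: runs one command and returns a { check, ok, output }
// entry matching the evidence shape described above.
function runCheck(check, file, args, cwd = process.cwd()) {
  try {
    // No shell is involved, so arguments are passed verbatim to the program.
    const output = execFileSync(file, args, { cwd, encoding: "utf8", stdio: "pipe" });
    return { check, ok: true, output };
  } catch (err) {
    // A non-zero exit (or a spawn failure) throws; keep whatever output exists.
    const output = [err.stdout, err.stderr].filter(Boolean).join("") || String(err.message);
    return { check, ok: false, output };
  }
}

// Usage: assemble a result object from individual checks.
const result = {
  type: "library",
  evidence: [
    runCheck("passing check", process.execPath, ["-e", "console.log('ok')"]),
    runCheck("failing check", process.execPath, ["-e", "process.exit(1)"]),
  ],
};
console.log(result.evidence.map((entry) => `${entry.check}: ${entry.ok}`));
```

Catching the error and recording `ok: false` lets a failing check appear in the evidence array instead of aborting the whole collection.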
+ ## Minimum Rules
 
  - no success claim without fresh evidence from the current change
+ - always use `proof-collector` for Node.js projects to gather deterministic evidence
+ - attach the evidence array to the verification proof artifact
 
  When verification fails:
 
  - do not mark the work complete
- - fix the issue or report the gap honestly
- - rerun verification after the fix
+ - report the gap honestly
+
+ Ask the user via AskUserQuestion:
+ - **Question:** "Verification failed for [specific criteria]. How should we proceed?"
+ - **Options:**
+ 1. "Fix the issue and re-verify" *(Recommended)*
+ 2. "Accept partial verification with documented gaps"
+ 3. "Abort and review what went wrong"
+
+ Wait for the user's selection before continuing.