@wazir-dev/cli 1.1.0 → 1.3.0

Files changed (138)
  1. package/CHANGELOG.md +74 -10
  2. package/README.md +15 -15
  3. package/assets/demo.cast +47 -0
  4. package/assets/demo.gif +0 -0
  5. package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
  6. package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
  7. package/docs/concepts/architecture.md +1 -1
  8. package/docs/concepts/roles-and-workflows.md +2 -0
  9. package/docs/concepts/why-wazir.md +59 -0
  10. package/docs/decisions/2026-03-19-deferred-items.md +564 -0
  11. package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
  12. package/docs/readmes/INDEX.md +21 -5
  13. package/docs/readmes/features/expertise/README.md +2 -2
  14. package/docs/readmes/features/exports/README.md +2 -2
  15. package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
  16. package/docs/readmes/features/schemas/README.md +3 -0
  17. package/docs/readmes/features/skills/README.md +17 -0
  18. package/docs/readmes/features/skills/clarifier.md +5 -0
  19. package/docs/readmes/features/skills/claude-cli.md +5 -0
  20. package/docs/readmes/features/skills/codex-cli.md +5 -0
  21. package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
  22. package/docs/readmes/features/skills/executing-plans.md +5 -0
  23. package/docs/readmes/features/skills/executor.md +5 -0
  24. package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
  25. package/docs/readmes/features/skills/gemini-cli.md +5 -0
  26. package/docs/readmes/features/skills/humanize.md +5 -0
  27. package/docs/readmes/features/skills/init-pipeline.md +5 -0
  28. package/docs/readmes/features/skills/receiving-code-review.md +5 -0
  29. package/docs/readmes/features/skills/requesting-code-review.md +5 -0
  30. package/docs/readmes/features/skills/reviewer.md +5 -0
  31. package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
  32. package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
  33. package/docs/readmes/features/skills/wazir.md +5 -0
  34. package/docs/readmes/features/skills/writing-skills.md +5 -0
  35. package/docs/readmes/features/workflows/prepare-next.md +1 -1
  36. package/docs/reference/configuration-reference.md +47 -6
  37. package/docs/reference/hooks.md +1 -0
  38. package/docs/reference/launch-checklist.md +4 -4
  39. package/docs/reference/review-loop-pattern.md +119 -9
  40. package/docs/reference/roles-reference.md +1 -0
  41. package/docs/reference/skill-tiers.md +147 -0
  42. package/docs/reference/tooling-cli.md +3 -1
  43. package/docs/truth-claims.yaml +12 -0
  44. package/expertise/antipatterns/process/ai-coding-antipatterns.md +214 -1
  45. package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
  46. package/exports/hosts/claude/.claude/commands/verify.md +30 -1
  47. package/exports/hosts/claude/.claude/settings.json +9 -0
  48. package/exports/hosts/claude/CLAUDE.md +1 -1
  49. package/exports/hosts/claude/export.manifest.json +6 -4
  50. package/exports/hosts/claude/host-package.json +3 -1
  51. package/exports/hosts/codex/AGENTS.md +1 -1
  52. package/exports/hosts/codex/export.manifest.json +6 -4
  53. package/exports/hosts/codex/host-package.json +3 -1
  54. package/exports/hosts/cursor/.cursor/hooks.json +4 -0
  55. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
  56. package/exports/hosts/cursor/export.manifest.json +6 -4
  57. package/exports/hosts/cursor/host-package.json +3 -1
  58. package/exports/hosts/gemini/GEMINI.md +1 -1
  59. package/exports/hosts/gemini/export.manifest.json +6 -4
  60. package/exports/hosts/gemini/host-package.json +3 -1
  61. package/hooks/context-mode-router +191 -0
  62. package/hooks/definitions/context_mode_router.yaml +19 -0
  63. package/hooks/hooks.json +31 -6
  64. package/hooks/protected-path-write-guard +8 -0
  65. package/hooks/routing-matrix.json +45 -0
  66. package/hooks/session-start +62 -1
  67. package/llms-full.txt +937 -134
  68. package/package.json +2 -4
  69. package/schemas/hook.schema.json +2 -1
  70. package/schemas/phase-report.schema.json +89 -0
  71. package/schemas/usage.schema.json +25 -1
  72. package/schemas/wazir-manifest.schema.json +19 -0
  73. package/skills/brainstorming/SKILL.md +32 -157
  74. package/skills/clarifier/SKILL.md +289 -111
  75. package/skills/claude-cli/SKILL.md +320 -0
  76. package/skills/codex-cli/SKILL.md +260 -0
  77. package/skills/debugging/SKILL.md +13 -0
  78. package/skills/design/SKILL.md +13 -0
  79. package/skills/dispatching-parallel-agents/SKILL.md +13 -0
  80. package/skills/executing-plans/SKILL.md +13 -0
  81. package/skills/executor/SKILL.md +139 -19
  82. package/skills/finishing-a-development-branch/SKILL.md +13 -0
  83. package/skills/gemini-cli/SKILL.md +260 -0
  84. package/skills/humanize/SKILL.md +13 -0
  85. package/skills/init-pipeline/SKILL.md +72 -164
  86. package/skills/prepare-next/SKILL.md +81 -10
  87. package/skills/receiving-code-review/SKILL.md +13 -0
  88. package/skills/requesting-code-review/SKILL.md +13 -0
  89. package/skills/reviewer/SKILL.md +369 -24
  90. package/skills/run-audit/SKILL.md +13 -0
  91. package/skills/scan-project/SKILL.md +13 -0
  92. package/skills/self-audit/SKILL.md +217 -16
  93. package/skills/skill-research/SKILL.md +188 -0
  94. package/skills/subagent-driven-development/SKILL.md +13 -0
  95. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
  96. package/skills/subagent-driven-development/implementer-prompt.md +8 -0
  97. package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
  98. package/skills/tdd/SKILL.md +13 -0
  99. package/skills/using-git-worktrees/SKILL.md +13 -0
  100. package/skills/using-skills/SKILL.md +13 -0
  101. package/skills/verification/SKILL.md +54 -3
  102. package/skills/wazir/SKILL.md +464 -381
  103. package/skills/writing-plans/SKILL.md +14 -1
  104. package/skills/writing-skills/SKILL.md +13 -0
  105. package/templates/artifacts/implementation-plan.md +3 -0
  106. package/templates/artifacts/tasks-template.md +133 -0
  107. package/templates/examples/phase-report.example.json +48 -0
  108. package/tooling/src/adapters/composition-engine.js +256 -0
  109. package/tooling/src/adapters/model-router.js +84 -0
  110. package/tooling/src/capture/command.js +41 -2
  111. package/tooling/src/capture/run-config.js +3 -1
  112. package/tooling/src/capture/store.js +56 -0
  113. package/tooling/src/capture/usage.js +106 -0
  114. package/tooling/src/capture/user-input.js +66 -0
  115. package/tooling/src/checks/ac-matrix.js +256 -0
  116. package/tooling/src/checks/command-registry.js +12 -0
  117. package/tooling/src/checks/docs-truth.js +1 -1
  118. package/tooling/src/checks/security-sensitivity.js +69 -0
  119. package/tooling/src/checks/skills.js +111 -0
  120. package/tooling/src/cli.js +31 -20
  121. package/tooling/src/commands/stats.js +161 -0
  122. package/tooling/src/commands/validate.js +5 -1
  123. package/tooling/src/export/compiler.js +33 -37
  124. package/tooling/src/gating/agent.js +145 -0
  125. package/tooling/src/guards/phase-prerequisite-guard.js +185 -0
  126. package/tooling/src/hooks/routing-logic.js +69 -0
  127. package/tooling/src/init/auto-detect.js +258 -0
  128. package/tooling/src/init/command.js +38 -170
  129. package/tooling/src/input/scanner.js +46 -0
  130. package/tooling/src/reports/command.js +103 -0
  131. package/tooling/src/reports/phase-report.js +323 -0
  132. package/tooling/src/state/command.js +160 -0
  133. package/tooling/src/state/db.js +287 -0
  134. package/tooling/src/status/command.js +58 -1
  135. package/tooling/src/verify/proof-collector.js +299 -0
  136. package/wazir.manifest.yaml +26 -14
  137. package/workflows/plan-review.md +3 -1
  138. package/workflows/verify.md +30 -1
@@ -15,6 +15,7 @@ These hook definitions are product contracts first. Host-specific native hooks o
  | `stop_handoff_harvest` | Persist final handoff and stop-time observability data | capture |
  | `protected_path_write_guard` | Block writes to protected canonical paths outside approved flows | block |
  | `loop_cap_guard` | Block extra iterations after the configured loop cap | block |
+ | `context_mode_router` | Route large command output through context-mode tools to avoid flooding model context | warn |
 
  ## Source of truth
 
@@ -26,7 +26,7 @@ Submit pull requests to these curated lists (one PR per list, follow each repo's
  ### awesome-claude-code
  - **Repo:** `github.com/anthropics/awesome-claude-code` (or the most-starred community fork)
  - **Section:** Tools / Plugins / Extensions
- - **Entry format:** `[Wazir](https://github.com/MohamedAbdallah-14/Wazir) - Host-native engineering OS kit with 10 roles, 14 phases, and 308 expertise modules.`
+ - **Entry format:** `[Wazir](https://github.com/MohamedAbdallah-14/Wazir) - Host-native engineering OS kit with 10 roles, 4 phases (15 workflows), and 315 expertise modules.`
  - **Tips:** Keep the description under 120 characters. Link directly to the repo.
 
  ### awesome-ai-agents
@@ -56,7 +56,7 @@ Show HN: Wazir – Engineering OS kit for AI coding agents (Claude, Codex, Gemin
  ### First comment
  Post a comment immediately after submission explaining:
  1. What problem Wazir solves (AI agents lack structured engineering workflows)
- 2. How it works (10 canonical roles, 14-phase pipeline, 308 expertise modules)
+ 2. How it works (10 canonical roles, 15-workflow pipeline, 315 expertise modules)
  3. What makes it different (host-native, works across Claude/Codex/Gemini/Cursor)
  4. Quick install: `npx @wazir-dev/cli init`
  5. Invite feedback -- HN readers appreciate genuine requests for input
@@ -75,7 +75,7 @@ Post a comment immediately after submission explaining:
  **Title:** "How I Built an Engineering OS for AI Coding Agents"
 
  1. **Hook** -- The problem: AI agents write code but lack engineering discipline.
- 2. **Architecture overview** -- 10 roles, 14 phases, expertise modules, quality gates.
+ 2. **Architecture overview** -- 10 roles, 4 phases (15 workflows), expertise modules, quality gates.
  3. **Code walkthrough** -- Show a real workflow: how a feature moves from requirements through TDD to deployment.
  4. **Host-native approach** -- Explain why one kit works across Claude, Codex, Gemini, and Cursor.
  5. **Results** -- Concrete metrics or before/after comparisons.
@@ -100,7 +100,7 @@ Structure as a 5-7 tweet thread:
 
  1. **Hook tweet:** One-liner about the problem + link to repo.
  2. **What it is:** Brief description of Wazir.
- 3. **Architecture:** 10 roles, 14 phases, 308 modules (include a diagram image).
+ 3. **Architecture:** 10 roles, 4 phases (15 workflows), 315 modules (include a diagram image).
  4. **Demo:** Short GIF or screenshot of a workflow in action.
  5. **Multi-host:** Works with Claude, Codex, Gemini, and Cursor.
  6. **Install:** `npx @wazir-dev/cli init`
@@ -134,10 +134,25 @@ review_loop(artifact_path, phase, dimensions[], depth, config, options={}):
  log(pass_number+1, dimension, findings) -> log_path
 
  if findings.has_issues:
- # --- Fix inline, do NOT return ---
+ # --- Fix and re-submit (MANDATORY) ---
+ # The producer MUST fix findings and the reviewer MUST re-review.
+ # "Fix and continue without re-review" is EXPLICITLY PROHIBITED.
  producer_fix(artifact_path, findings)
  # Continue to next pass -- the fix will be re-reviewed
 
+ # --- Post-loop: escalation if issues remain ---
+ if remaining.has_issues:
+ # Cap reached with unresolved findings. Present to user:
+ # 1. Approve with known issues (Recommended if non-blocking)
+ # 2. Fix manually and re-run
+ # 3. Abort
+ escalate_to_user(remaining, options=[
+ "approve-with-issues",
+ "fix-manually-and-rerun",
+ "abort"
+ ])
+ # User decides. If approved, log "user-approved-with-issues" in final pass file.
+
  return { pass_count: total_passes, issues_found, issues_fixed, remaining, attributions }
  ```
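The control flow this hunk describes (fix, mandatory re-review on the next pass, then escalation if findings remain after the last pass) can be sketched in Python. This is an illustrative sketch only, not the package's implementation: the callables `review`, `producer_fix`, and `escalate_to_user` are hypothetical stand-ins, and treating the final pass's findings as the unresolved set is an assumption.

```python
from typing import Callable, List

# Option identifiers taken from the diff above.
ESCALATION_OPTIONS = ["approve-with-issues", "fix-manually-and-rerun", "abort"]


def review_loop(
    review: Callable[[int], List[str]],
    producer_fix: Callable[[List[str]], None],
    escalate_to_user: Callable[[List[str], List[str]], str],
    total_passes: int,
) -> dict:
    """Run a fixed number of passes; every fix is re-reviewed by the next pass."""
    remaining: List[str] = []
    issues_found = 0
    for pass_number in range(1, total_passes + 1):
        findings = review(pass_number)
        issues_found += len(findings)
        remaining = findings
        if findings and pass_number < total_passes:
            # Fix and re-submit: the next pass re-reviews the fix.
            producer_fix(findings)

    decision = "clean"
    if remaining:
        # Assumption: findings from the final pass count as unresolved,
        # because no further pass is left to re-review a fix.
        decision = escalate_to_user(remaining, ESCALATION_OPTIONS)
    return {
        "pass_count": total_passes,
        "issues_found": issues_found,
        "remaining": remaining,
        "decision": decision,
    }
```

Passing the three behaviours in as callables keeps the sketch independent of how a real reviewer, producer, or user prompt is wired up.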
 
@@ -278,7 +293,7 @@ Matches canonical `workflows/design-review.md`:
  4. **Visual consistency** -- design tokens form a coherent system, dark/light mode alignment
  5. **Exported-code fidelity** -- do exported scaffolds match the designs? Mismatches are failures here, not implementation concerns.
 
- ### Plan Dimensions (7)
+ ### Plan Dimensions (8)
 
  1. **Completeness** -- all design decisions mapped to tasks
  2. **Ordering** -- dependencies correct, parallelizable identified
@@ -287,6 +302,7 @@ Matches canonical `workflows/design-review.md`:
  5. **Edge cases** -- error paths covered
  6. **Security** -- auth, injection, data exposure
  7. **Integration** -- tasks connect end-to-end
+ 8. **Input Coverage** -- every distinct item in the original input maps to at least one task. If `tasks < input items`, HIGH finding listing missing items
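As a rough illustration of the new Input Coverage dimension, the check below maps input items to tasks and reports a HIGH finding that lists whatever is uncovered. The function name, the `task_map` shape, and the finding dictionary are hypothetical. The sketch also checks the item-to-task mapping directly instead of only comparing counts, which is a slightly stricter reading of the rule.

```python
from typing import Dict, List, Optional


def input_coverage_finding(
    input_items: List[str], task_map: Dict[str, List[str]]
) -> Optional[dict]:
    """Return a HIGH finding if any input item maps to no task, else None.

    task_map is a hypothetical shape: task id -> input items that task covers.
    """
    covered = {item for items in task_map.values() for item in items}
    missing = [item for item in input_items if item not in covered]
    if not missing:
        return None
    return {
        "dimension": "Input Coverage",
        "severity": "HIGH",
        "missing_items": missing,
    }
```

For example, three input items covered by only two tasks would yield a finding whose `missing_items` names the third item.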
 
  ### Task Execution Dimensions (5)
 
@@ -328,10 +344,11 @@ Pass counts are FIXED per depth. Quick = 3 passes, standard = 5 passes, deep = 7
 
  ## Loop Cap Configuration
 
- The `phase_policy` section of `run-config.yaml` controls which phases are enabled and sets an absolute safety ceiling per phase. Only two fields exist: `enabled` and `loop_cap`. There is no `passes` field -- depth determines pass counts (3/5/7), not phase policy.
+ The `workflow_policy` section of `run-config.yaml` (legacy: `phase_policy`) controls which workflows are enabled and sets an absolute safety ceiling per workflow. Only two fields exist: `enabled` and `loop_cap`. There is no `passes` field -- depth determines pass counts (3/5/7), not workflow policy.
 
  ```yaml
- phase_policy:
+ workflow_policy:
+   # Clarifier phase workflows
    discover: { enabled: true, loop_cap: 10 }
    clarify: { enabled: true, loop_cap: 10 }
    specify: { enabled: true, loop_cap: 10 }
@@ -341,21 +358,24 @@ phase_policy:
    design-review: { enabled: true, loop_cap: 10 }
    plan: { enabled: true, loop_cap: 10 }
    plan-review: { enabled: true, loop_cap: 10 }
+   # Executor phase workflows
    execute: { enabled: true, loop_cap: 10 }
    verify: { enabled: true, loop_cap: 5 }
    review: { enabled: true, loop_cap: 10 }
-   learn: { enabled: false, loop_cap: 5 }
-   prepare_next: { enabled: false, loop_cap: 5 }
+   learn: { enabled: true, loop_cap: 5 }
+   prepare_next: { enabled: true, loop_cap: 5 }
    run_audit: { enabled: false, loop_cap: 10 }
  ```
 
  **`loop_cap`** is an absolute safety ceiling that prevents runaway loops regardless of depth. It is checked by `wazir capture loop-check` in pipeline mode. It is NOT the same as pass count (which is determined by depth: 3/5/7). Example: depth=deep gives 7 passes, but if `loop_cap: 5`, the cap guard fires at pass 5 and escalates. This is intentional -- the operator can constrain expensive phases.
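The interaction between depth-driven pass counts and `loop_cap` can be made concrete with a small sketch. The pass counts come from the text above. The function name is hypothetical, and the guard firing only when the planned passes exceed the cap is an assumption, since the text only gives the deep/`loop_cap: 5` case.

```python
from typing import Optional

# Pass counts per depth, as stated in the document.
DEPTH_PASSES = {"quick": 3, "standard": 5, "deep": 7}


def cap_guard_fires_at(depth: str, loop_cap: int) -> Optional[int]:
    """Return the pass at which the cap guard fires, or None if it never does."""
    planned = DEPTH_PASSES[depth]
    # Assumption: the guard only matters when the depth asks for more
    # passes than the cap allows; behaviour at planned == loop_cap is
    # not specified in the source.
    return loop_cap if loop_cap < planned else None
```

With `depth="deep"` and `loop_cap=5` this returns 5, matching the example in the paragraph. With the default `loop_cap: 10` it returns `None` for every depth.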
 
- **Adaptive phases** (`author`, `learn`, `prepare_next`, `run_audit`) default to `enabled: false`. They are activated by explicit operator config or intent detection. They do not participate in the standard review loop pattern because:
+ **Adaptive workflows** (`author`, `run_audit`) default to `enabled: false`. They are activated by explicit operator config or intent detection.
+
+ **Post-run workflows** (`learn`, `prepare_next`) default to `enabled: true`. They run as part of the Final Review phase:
 
+ - `learn` extracts durable learnings from review findings -- recurring findings become accepted learnings.
+ - `prepare_next` prepares context and handoff for the next run.
  - `author` has a human approval gate, not an iterative review loop.
- - `learn` extracts learnings from the completed run -- it is post-execution housekeeping.
- - `prepare_next` prepares context for the next run -- it is a handoff phase.
  - `run_audit` is an on-demand standalone audit, not part of the main pipeline flow.
 
  ---
@@ -427,3 +447,93 @@ Do NOT load or invoke any skills."
 
  For committed changes, replace `--uncommitted` with `--base <sha>`.
  Replace `[DIMENSION]`, `[dimension description]`, and `[criteria]` with the task-specific values from the execution plan and spec.
+
+ ---
+
+ ## Codex Output Context Protection
+
+ Codex CLI output includes internal traces (file reads, tool calls, reasoning) that are NOT useful for the review — only the final findings matter. To prevent context flooding:
+
+ ### Tee + Extract Pattern
+
+ 1. **Always tee** Codex output to a file:
+ ```bash
+ codex exec ... 2>&1 | tee .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
+ ```
+
+ 2. **Extract findings** after the last `codex` marker using `execute_file`:
+ ```bash
+ # If context-mode available (has_execute_file: true):
+ mcp__plugin_context-mode_context-mode__execute_file(
+ path: ".wazir/runs/latest/reviews/<phase>-review-pass-<N>.md",
+ language: "shell",
+ code: "tac $FILE | sed '/^codex$/q' | tac | tail -n +2"
+ )
+ ```
+
+ 3. **Present extracted findings only** — the raw trace stays in the file for debugging but never enters the main context window.
+
+ ### Fallback (no context-mode)
+
+ If `context_mode.has_execute_file` is false, extract using shell directly:
+
+ ```bash
+ tac <file> | sed '/^codex$/q' | tac | tail -n +2
+ ```
+
+ This reverses the file, finds the first (= last original) `codex` marker, reverses back, and skips the marker line.
+
+ **If no marker found:** fail closed
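The extraction rule and its fail-closed behaviour can be sketched in Python as follows. This is an illustrative equivalent of the shell pipeline, not code from the package, and the function name is hypothetical. The explicit check matters because `sed '/^codex$/q'` prints its whole input when nothing matches, so the bare pipeline would pass nearly the entire trace through.

```python
from typing import Optional

MARKER = "codex"  # a line consisting of exactly this text, as in sed '/^codex$/q'


def extract_findings(raw_output: str) -> Optional[str]:
    """Return the text after the last marker line, or None if no marker exists."""
    lines = raw_output.splitlines()
    last_marker = None
    for index, line in enumerate(lines):
        if line == MARKER:
            last_marker = index
    if last_marker is None:
        # Fail closed: report nothing instead of leaking the raw trace.
        return None
    return "\n".join(lines[last_marker + 1:])
```

A caller would treat `None` as "0 findings extracted" and leave the raw file untouched for manual review.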
+
+ ---
+
+ ## Phase Scoring: First vs Final Artifact Comparison
+
+ At the start of each review loop (pass 1), score the artifact on its phase's canonical dimension set (1-10 per dimension). At the end of the loop (final pass), score again using the **same canonical dimensions**. Present the delta in the end-of-phase report.
+
+ ### Canonical Dimension Sets Per Phase
+
+ These are the fixed rubrics — no ad-hoc dimension selection:
+
+ | Phase | Canonical Dimensions |
+ |-------|---------------------|
+ | research-review | Coverage, Source quality, Relevance, Gaps identified, Actionability |
+ | clarification-review / spec-challenge | Completeness, Testability, Ambiguity, Assumptions, Scope creep |
+ | design-review | Spec coverage, Design-spec consistency, Accessibility, Visual consistency, Exported-code fidelity |
+ | plan-review | Completeness, Testability, Task granularity, Dependency correctness, Phase structure, File coverage, Estimation accuracy, Input coverage |
+ | task-review | Correctness, Tests, Wiring, Drift, Quality |
+ | final | Correctness, Completeness, Wiring, Verification, Drift, Quality, Documentation |
+
+ ### Scoring Rules
+
+ 1. Initial and final scores MUST use the **same dimension set** — the delta is only meaningful on the same rubric.
+ 2. The reviewer records which dimension set was used in each pass file.
+ 3. Delta format: `Dimension: X/10 → Y/10 (+Z)`.
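A minimal sketch of the scoring rules, assuming scores are held as plain dictionaries keyed by dimension name (a hypothetical representation): it rejects mismatched dimension sets and emits lines in the delta format from rule 3.

```python
from typing import Dict, List


def quality_delta(initial: Dict[str, int], final: Dict[str, int]) -> List[str]:
    """Format per-dimension deltas; both score sets must share one rubric."""
    if initial.keys() != final.keys():
        raise ValueError("initial and final scores must use the same dimension set")
    return [
        f"{dim}: {initial[dim]}/10 → {final[dim]}/10 ({final[dim] - initial[dim]:+d})"
        for dim in initial
    ]
```

The `:+d` format spec always prints the sign, so a regression shows up as a negative delta.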
+
+ ### Quality Delta Report Section
+
+ The end-of-phase report (see "End-of-Phase Report" below) includes a **Quality Delta** section:
+
+ ```markdown
+ ## Quality Delta
+
+ | Dimension | Initial | Final | Delta |
+ |-----------|---------|-------|-------|
+ | Completeness | 4/10 | 9/10 | +5 |
+ | Testability | 3/10 | 8/10 | +5 |
+ | Ambiguity | 5/10 | 9/10 | +4 |
+ ```
+
+ ---
+
+ ## End-of-Phase Report
+
+ Every phase exit produces a report saved to `.wazir/runs/latest/reviews/<phase>-report.md` containing:
+
+ 1. **Summary** — what the phase produced
+ 2. **Key Changes** — first-version vs final-version highlights (not full diff — what improved)
+ 3. **Quality Delta** — per-dimension before/after scores (see Phase Scoring above)
+ 4. **Findings Log** — per-pass finding counts by severity (e.g., "Pass 1: 6 findings (3 blocking, 2 warning, 1 note). Pass 7: 0 findings. All resolved.")
+ 5. **Usage** — token usage from `wazir capture usage` (runs before report generation)
+ 6. **Context Savings** — context-mode stats if available, omit section if not
+ 7. **Time Spent** — wall-clock elapsed time from phase start to end
+
+ **If no `codex` marker is found** while extracting findings (see "Codex Output Context Protection" above): fail closed. Log "codex marker not found in output, cannot extract findings" and present a warning to the user with 0 findings extracted. The raw file is preserved for manual review. Do NOT fall back to `tail` or any best-effort extraction that could leak traces into context.
@@ -35,6 +35,7 @@ This is the lookup reference for canonical roles, workflows, and their contracts
  | `review` | `verify` | Adversarial quality review |
  | `learn` | `review` | Capture scoped learnings |
  | `prepare-next` | `learn` | Produce clean next-run handoff |
+ | `run-audit` | (standalone) | Structured codebase audit with source-backed findings |
 
  ## Role routing valid values
 
@@ -0,0 +1,147 @@
+ # Skill Tier Classification
+
+ Audit of Wazir skills against Superpowers v4.3.1 skills.
+ Each skill is classified into one of three tiers:
+
+ - **Delegate** -- use superpowers skill as-is, delete Wazir fork
+ - **Augment** -- use superpowers skill + inject Wazir context addendum (strictly additive, no overrides). **NOTE:** R2 validation found this tier is not implementable -- see [Augment Mechanism](#augment-mechanism) below.
+ - **Own** -- Wazir-original or structurally rewritten skill, rename to `wz:` prefix
+
+ ---
+
+ ## Classification Table
+
+ | Wazir Skill | Superpowers Equivalent | Tier | Rationale | Risk Notes |
+ |---|---|---|---|---|
+ | brainstorming | brainstorming | **Own** | Structurally rewritten. Superpowers version is a linear checklist (explore context, ask questions, propose approaches, present design, write doc, invoke writing-plans). Wazir replaces the entire process: adds Command Routing and Codebase Exploration preambles, replaces the design-doc step with a design-review loop (`--mode design-review` with canonical dimensions), and outputs to `.wazir/runs/latest/clarified/design.md` instead of `docs/plans/`. None of the superpowers process steps survive intact. | -- |
+ | clarifier | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | debugging | systematic-debugging | **Own** | Structurally rewritten. Superpowers has a 4-phase process (Root Cause Investigation with 5 substeps, Pattern Analysis, Hypothesis and Testing, Implementation) totaling ~300 lines with detailed examples, rationalization tables, and supporting technique references. Wazir condenses this to a 4-step observe-hypothesize-test-fix loop (~75 lines), replaces all codebase exploration with Wazir CLI symbol-first exploration (`wazir index search-symbols`, `wazir recall symbol` and `wazir recall file`), adds loop cap awareness (pipeline mode with `wazir capture loop-check` vs. standalone mode), and removes all superpowers examples, rationalization tables, and red-flag lists. The methodology is fundamentally different in structure despite sharing the spirit of "root cause first." | Delegating would lose Wazir CLI integration and loop cap awareness. Superpowers version is far more detailed on anti-patterns and may be worth referencing separately. |
+ | design | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | dispatching-parallel-agents | dispatching-parallel-agents | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers core (When to Use decision tree, The Pattern with 4 steps, Agent Prompt Structure, Common Mistakes section) plus Wazir additions (Command Routing preamble, Codebase Exploration preamble, philosophical paragraph in Overview, Problem/Fix format for Common Mistakes). Drops superpowers-only sections: "When NOT to Use," "Real Example from Session," "Key Benefits," "Verification," "Real-World Impact." | Superpowers informational sections (Real Example, Key Benefits, Verification, Real-World Impact) not carried forward. Low risk -- these are teaching content, not behavioral. |
+ | executing-plans | executing-plans | **Own** | Structurally rewritten. Superpowers uses batch execution (default first 3 tasks) with report-and-wait checkpoints and explicit batch feedback loops. Wazir replaces batching with per-task execution, adds a per-task review loop (`--mode task-review` with 5 task-execution dimensions, Codex integration, review log filenames, loop cap tracking via `wazir capture loop-check`), adds standalone vs. pipeline mode detection, and adds a note recommending wz:subagent-driven-development when subagents are available. The batch-vs-per-task change is a core behavioral difference. All integration references point to `wz:` skills. | Delegating would lose per-task review loops and pipeline mode integration. |
+ | executor | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | finishing-a-development-branch | finishing-a-development-branch | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers process (5 steps: verify tests, determine base branch, present 4 options, execute choice, cleanup worktree) preserved with identical structure and identical option semantics. Wazir adds Command Routing and Codebase Exploration preambles. Minor cosmetic changes: `<N>` removed from failure template, `<base-branch>` shortened to `<base>`, emoji checkmarks replaced with Y/-, `<commit-list>` changed to `<count>`, PR body simplified. Red Flags and Integration sections trimmed but no behavioral contradiction. | Low risk. The superpowers version has more detailed Red Flags and Integration sections not carried forward. |
+ | humanize | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | init-pipeline | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | prepare-next | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | receiving-code-review | receiving-code-review | **Own** | Structurally rewritten. Superpowers has extensive sections: Forbidden Responses, Source-Specific Handling, YAGNI Check, Implementation Order, When To Push Back, Acknowledging Correct Feedback (with detailed anti-patterns for gratitude), Gracefully Correcting Pushback, Common Mistakes table, Real Examples, and GitHub Thread Replies. Wazir preserves the core Response Pattern and Forbidden Responses but: (1) adds Loop Tracking section (pipeline mode with `wazir capture loop-check` and standalone pass counts), (2) restructures Implementation Order to a 4-tier priority (blocking, functional, quality, nice-to-have) instead of 3-tier, (3) adds a Quick Reference decision table, (4) removes the entire "Acknowledging Correct Feedback" anti-gratitude section, the "Gracefully Correcting Pushback" section, the Common Mistakes table, all Real Examples, the "When To Push Back" enumeration, and the GitHub Thread Replies section. The Loop Tracking addition and structural deletions make this a substantive rewrite. | Delegating would lose loop tracking. The removed anti-gratitude and pushback sections from superpowers are valuable behavioral guardrails worth preserving. |
+ | requesting-code-review | requesting-code-review | **Own** | Structurally rewritten. Both skills share the same When to Request triggers and Example structure. But Wazir: (1) replaces `superpowers:code-reviewer` with `wz:code-reviewer`, (2) adds explicit review loop parameters (`--mode`, depth-aware dimensions, pass number), (3) adds `codex review --uncommitted` and `codex review --base` commands, (4) adds Codex Error Handling section, (5) adds `{REVIEW_MODE}` placeholder, (6) changes Integration section to reference per-task review checkpoints instead of batch review, (7) adds "Dispatch review without explicit `--mode`" to Red Flags. The Codex integration and review loop parameter system are structural additions that change how reviews are dispatched. | Delegating would lose Codex integration and review loop protocol. |
+ | reviewer | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | run-audit | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | scan-project | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | self-audit | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | subagent-driven-development | subagent-driven-development | **Own** | Structurally rewritten. Both share the same high-level process (fresh subagent per task, two-stage review, spec then quality). But Wazir: (1) adds `Capture PRE_TASK_SHA` step to the process flowchart for diff scoping, (2) adds Code Review Scoping section (`codex review --base <pre-task-sha>`), (3) adds Review Loop Alignment section (explicit `--mode task-review`, task-scoped log filenames, loop cap via `wazir capture loop-check`), (4) adds Codex Error Handling section, (5) adds standalone mode fallback, (6) changes all skill references from `superpowers:` to `wz:`, (7) adds "Review the wrong diff" to Red Flags, (8) removes the Example Workflow, Advantages detail, and Cost breakdown from superpowers. The diff-scoping and review-loop integration are structural process changes. | Delegating would lose diff-scoped reviews and Codex integration. The removed Example Workflow from superpowers is a useful teaching tool. |
+ | tdd | test-driven-development | **Own** | Structurally rewritten. Superpowers has an exhaustive treatment (~370 lines): detailed Red-Green-Refactor with Good/Bad code examples, Iron Law with explicit "delete and start over" rules, a Verification Checklist, extensive Why Order Matters section, Common Rationalizations table, When Stuck guide, Testing Anti-Patterns reference, and Debugging Integration. Wazir condenses to ~45 lines with 3 steps (RED, GREEN, REFACTOR), adds a single-pass test quality check in RED phase ("Are these tests testing the right behavior? Are they real assertions?"), and removes all examples, rationalization tables, and elaboration. Different description and name (`wz:tdd` vs `test-driven-development`). | Delegating would lose the test quality check. The superpowers version's extensive rationalization prevention and examples are valuable for discipline enforcement but costly in tokens. |
+ | using-git-worktrees | using-git-worktrees | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers core process (directory selection priority, safety verification with `git check-ignore`, creation steps, project setup auto-detection, clean baseline verification) preserved structurally intact. Wazir adds: Command Routing preamble, Codebase Exploration preamble, global directory changed from `~/.config/superpowers/worktrees/` to `~/.wazir/worktrees/`, Cleanup and Common Issues sections (submodules, lock files, stale worktrees). Drops superpowers-only sections: Example Workflow, Quick Reference table, Common Mistakes, Red Flags, Integration. | Dropped superpowers sections (Quick Reference, Common Mistakes, Red Flags, Integration) reduce operational guardrails. Could be recovered into the Own skill. |
36
+ | using-skills | using-superpowers | **Own** | Structurally rewritten. Both enforce the same core rule (invoke skills before any response, even at 1% chance). But Wazir: (1) renames the skill from `using-superpowers` to `using-skills`, (2) changes all internal skill references from `superpowers:` to `wz:` throughout the flowchart and examples, (3) removes the "Rigid vs Flexible" elaboration from the Skill Types section, (4) removes the User Instructions elaboration. The name change and systematic `wz:` prefix replacement throughout the flowchart make this a namespace-level rewrite. | Could potentially be Augment if namespace mapping were handled at a routing layer rather than in-skill. |
37
+ | verification | verification-before-completion | **Own** | Structurally rewritten. Superpowers has an exhaustive treatment (~140 lines): Iron Law, Gate Function (5-step IDENTIFY/RUN/READ/VERIFY/CLAIM), Common Failures table, Red Flags list, Rationalization Prevention table, Key Patterns (tests, regression, build, requirements, agent delegation), Why This Matters section with 24 failure memories, and When To Apply section. Wazir condenses to ~35 lines with 3 bullet requirements (what was verified, exact command, actual result), a minimum rule, and a brief "when verification fails" section. Different name (`wz:verification` vs `verification-before-completion`). | Delegating would lose the concise Wazir format. The superpowers version's extensive rationalization prevention is valuable for discipline but token-expensive. The Wazir version may be too terse to enforce the discipline effectively. |
38
+ | wazir | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
39
+ | writing-plans | writing-plans | **Own** | Structurally rewritten. Superpowers focuses on plan document format (header template, task structure with bite-sized steps, code examples in plan, execution handoff to subagent-driven or parallel session). Wazir: (1) changes inputs to "approved design or approved clarified direction" instead of "spec or requirements", (2) adds pipeline-aware output paths (`.wazir/runs/latest/clarified/execution-plan.md` and `.wazir/runs/latest/tasks/task-NNN/spec.md` vs. standalone `docs/plans/`), (3) removes the plan document format template entirely (no header template, no task structure template, no code examples), (4) adds Plan Review Loop section with `wz:reviewer --mode plan-review`, Codex integration via stdin pipe, Codex error handling, depth-aware pass counts, and standalone fallback. The plan review loop and pipeline path system are structural additions; the removal of the format template is a structural deletion. | Delegating would lose pipeline integration and plan review loop. The removed format template from superpowers is valuable for plan quality and could be worth recovering. |
40
+ | writing-skills | writing-skills | **Own** | Structurally rewritten. Both share the TDD-for-skills philosophy and RED-GREEN-REFACTOR mapping. But Wazir: (1) condenses from ~650 lines to ~170 lines, (2) removes the extensive SKILL.md Structure template, CSO (Claude Search Optimization) section, Flowchart Usage guidelines, Code Examples guidelines, Token Efficiency section, File Organization examples, Testing All Skill Types section (discipline/technique/pattern/reference), Common Rationalizations for Skipping Testing table, Bulletproofing Skills Against Rationalization section (with Cialdini psychology reference), Skill Creation Checklist, Discovery Workflow, Anti-Patterns section, and STOP deployment gate, (3) adds "Be Prescriptive, Not Descriptive" guidance, "Use Rationalization Prevention" example, "Include Decision Trees" guidance, and skill reference syntax. The massive content reduction and different teaching approach make this a structural rewrite. | Delegating would lose the concise prescriptive format. The superpowers version's CSO guidelines, testing methodology, and anti-pattern catalog are extremely valuable reference material. |
41
+
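The `Capture PRE_TASK_SHA` step and the `codex review --base <pre-task-sha>` scoping described in the subagent-driven-development row can be sketched as follows. This is a minimal illustration, not the Wazir implementation: the function names `capturePreTaskSha`, `reviewArgs`, and `runScopedReview` are hypothetical, and only the `--base` flag is taken from the table above.

```javascript
const { execFileSync } = require('node:child_process');

// Record the commit the task starts from. Call this before the subagent edits anything.
function capturePreTaskSha(cwd = process.cwd()) {
  return execFileSync('git', ['rev-parse', 'HEAD'], { cwd, encoding: 'utf8' }).trim();
}

// Build the argument list for a review limited to changes made after the pre-task SHA.
// Rejecting anything that is not a hex SHA avoids reviewing the wrong diff by accident.
function reviewArgs(preTaskSha) {
  if (!/^[0-9a-f]{7,64}$/.test(preTaskSha)) {
    throw new Error(`not a commit SHA: ${preTaskSha}`);
  }
  return ['review', '--base', preTaskSha];
}

// Hypothetical wrapper: assumes a `codex` binary on PATH that accepts the arguments above.
function runScopedReview(preTaskSha, cwd = process.cwd()) {
  return execFileSync('codex', reviewArgs(preTaskSha), { cwd, encoding: 'utf8' });
}
```

A caller would store the result of `capturePreTaskSha()` before dispatching the task and pass it to `runScopedReview()` afterwards, so the review covers only that task's commits.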
42
+ ---
43
+
44
+ ## Superpowers Skills with No Wazir Counterpart
45
+
46
+ No superpowers skill currently lacks a Wazir fork. A skill without one could be used as-is via the superpowers plugin; the table lists only a renamed skill that might otherwise look unmatched.
47
+
48
+ | Superpowers Skill | Status | Notes |
49
+ |---|---|---|
50
+ | using-superpowers | Replaced by `wz:using-skills` | See using-skills row above. |
51
+
52
+ All 14 superpowers skills have a Wazir counterpart (using-superpowers maps to using-skills, systematic-debugging maps to debugging, test-driven-development maps to tdd, verification-before-completion maps to verification).
53
+
54
+ ---
55
+
56
+ ## Summary by Tier
57
+
58
+ | Tier | Count | Skills |
59
+ |---|---|---|
60
+ | **Own** | 25 | brainstorming, clarifier, debugging, design, dispatching-parallel-agents, executing-plans, executor, finishing-a-development-branch, humanize, init-pipeline, prepare-next, receiving-code-review, requesting-code-review, reviewer, run-audit, scan-project, self-audit, subagent-driven-development, tdd, using-git-worktrees, using-skills, verification, wazir, writing-plans, writing-skills |
61
+ | **Augment** | 0 | _(none -- tier not implementable, see [Augment Mechanism](#augment-mechanism))_ |
62
+ | **Delegate** | 0 | _(none)_ |
63
+
64
+ ---
65
+
66
+ ## Common Wazir Additions (Appear in All Forked Skills)
67
+
68
+ Every Wazir fork of a superpowers skill adds these two preamble sections:
69
+
70
+ 1. **Command Routing** -- routes large commands to context-mode tools and small commands to native Bash, following `hooks/routing-matrix.json`.
71
+ 2. **Codebase Exploration** -- prescribes symbol-first exploration via `wazir index search-symbols` and `wazir recall`, with fallback to direct file reads.
72
+
73
+ On their own, these preambles would have qualified a skill for **Augment** tier where no other structural changes exist, but that tier is not implementable (see [Augment Mechanism](#augment-mechanism)).
74
+
75
+ ---
76
+
77
+ ## Augment Mechanism
78
+
79
+ **Research date:** 2026-03-19 (R2: Composition Infrastructure Validation)
80
+
81
+ ### Finding: Augment tier is not implementable
82
+
83
+ The Augment tier assumed that placing a Wazir addendum at `~/.claude/skills/<skill-name>/SKILL.md` would layer Wazir context on top of the superpowers base skill. This assumption is wrong. **Skill shadowing is full-override, not merge/append.**
84
+
85
+ ### Evidence
86
+
87
+ **1. `skills-core.js` `resolveSkillPath()` (superpowers v4.3.1)**
88
+
89
+ The function at `lib/skills-core.js:108-140` checks personal skills directory first. If `~/.claude/skills/<name>/SKILL.md` exists, it returns that file immediately and never reads the superpowers version. There is no content merging.
90
+
91
+ ```
92
+ // Try personal skills first (unless explicitly superpowers:)
93
+ if (!forceSuperpowers && personalDir) {
94
+ const personalSkillFile = path.join(personalDir, actualSkillName, 'SKILL.md');
95
+ if (fs.existsSync(personalSkillFile)) {
96
+ return { skillFile: personalSkillFile, sourceType: 'personal', ... };
97
+ // ^^^ returns here -- superpowers version never consulted
98
+ }
99
+ }
100
+ ```
101
+
102
+ **2. Superpowers test suite confirms override behavior**
103
+
104
+ `tests/opencode/test-skills-core.sh` line 336 asserts:
105
+ ```
106
+ [PASS] Personal skills shadow superpowers skills
107
+ ```
108
+
109
+ The test creates `personal-skills/shared-skill/SKILL.md` and `superpowers-skills/shared-skill/SKILL.md`, resolves `shared-skill`, and verifies `sourceType` is `"personal"` -- the superpowers version is invisible.
110
+
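The override behavior described in items 1 and 2 can be reproduced with a small self-contained sketch. The `resolveSkill` function below is a hypothetical stand-in modeled on the excerpt above, not the superpowers source. The fixture directory names follow the test description, and the handling of the `superpowers:` prefix is an assumption based on the `forceSuperpowers` flag in the excerpt.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical resolver: personal skills win outright; nothing is merged.
function resolveSkill(name, personalDir, superpowersDir) {
  const forceSuperpowers = name.startsWith('superpowers:');
  const actualName = forceSuperpowers ? name.slice('superpowers:'.length) : name;

  if (!forceSuperpowers && personalDir) {
    const personalFile = path.join(personalDir, actualName, 'SKILL.md');
    if (fs.existsSync(personalFile)) {
      // Early return: the superpowers copy is never read.
      return { skillFile: personalFile, sourceType: 'personal' };
    }
  }
  if (superpowersDir) {
    const baseFile = path.join(superpowersDir, actualName, 'SKILL.md');
    if (fs.existsSync(baseFile)) {
      return { skillFile: baseFile, sourceType: 'superpowers' };
    }
  }
  return null;
}

// Build a throwaway fixture with the same skill name in both locations.
function makeFixture() {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-shadow-'));
  const personalDir = path.join(root, 'personal-skills');
  const superpowersDir = path.join(root, 'superpowers-skills');
  for (const dir of [personalDir, superpowersDir]) {
    fs.mkdirSync(path.join(dir, 'shared-skill'), { recursive: true });
    fs.writeFileSync(path.join(dir, 'shared-skill', 'SKILL.md'), `# from ${path.basename(dir)}\n`);
  }
  return { root, personalDir, superpowersDir };
}

const fixture = makeFixture();
const shadowed = resolveSkill('shared-skill', fixture.personalDir, fixture.superpowersDir);
const forced = resolveSkill('superpowers:shared-skill', fixture.personalDir, fixture.superpowersDir);
console.log(shadowed.sourceType, forced.sourceType);
fs.rmSync(fixture.root, { recursive: true, force: true });
```

In this sketch the only route to the base skill is the explicit prefix, which is the bypass that item 4 reports as unavailable in Claude Code's native `Skill` tool.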
111
+ **3. Superpowers RELEASE-NOTES.md v3.3.0**
112
+
113
+ Line 385 documents the behavior explicitly: "Personal skills override superpowers skills when names match."
114
+
115
+ **4. The `superpowers:` prefix bypass is not available in Claude Code**
116
+
117
+ `skills-core.js` supports `superpowers:skill-name` syntax to force resolution to the superpowers version even when a personal skill shadows it. However, `skills-core.js` is only used by the OpenCode plugin (`/.opencode/plugins/superpowers.js`). Claude Code's native `Skill` tool has its own built-in resolution logic that does not expose this prefix bypass.
118
+
119
+ ### Alternatives Considered
120
+
121
+ | Approach | Viable? | Why |
122
+ |---|---|---|
123
+ | Place addendum in `~/.claude/skills/<name>/` | No | Full override -- base skill content lost |
124
+ | Merge base + addendum in SKILL.md at install time | Partial | Would work but creates a maintenance coupling: every superpowers update requires re-merging. This is functionally identical to Own tier. |
125
+ | Inject Wazir context via CLAUDE.md | No | CLAUDE.md is project-scoped; skill behavior should be global across all projects |
126
+ | Use `superpowers:` prefix to load base, then append | No | Prefix only works in OpenCode's `skills-core.js`, not in Claude Code's native Skill tool |
127
+ | Propose upstream merge/append feature | Future | Would require a superpowers or Claude Code platform change |
128
+
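The maintenance coupling noted for the install-time merge row can be made concrete with a short sketch. Everything here is hypothetical: the marker comment format and the names `mergeSkill` and `isStale` are illustrative, and nothing in the source indicates that Wazir ships such a merge step.

```javascript
const crypto = require('node:crypto');

// Hash the base skill text so a later upstream change can be detected.
function hashOf(text) {
  return crypto.createHash('sha256').update(text).digest('hex');
}

// Append the addendum to the base skill and stamp the merge with the base hash.
function mergeSkill(baseText, addendumText) {
  const baseHash = hashOf(baseText);
  const marker = `<!-- wazir-addendum base-sha256=${baseHash} -->`;
  const merged = `${baseText.trimEnd()}\n\n${marker}\n${addendumText.trimEnd()}\n`;
  return { merged, baseHash };
}

// A merged file is stale as soon as the upstream base no longer matches the stamp.
function isStale(mergedText, currentBaseText) {
  const match = /base-sha256=([0-9a-f]{64})/.exec(mergedText);
  return !match || match[1] !== hashOf(currentBaseText);
}

const base = '# dispatching-parallel-agents\nBase process.\n';
const { merged } = mergeSkill(base, '## Command Routing\nWazir preamble.\n');
console.log(isStale(merged, base), isStale(merged, base + 'Upstream edit.\n'));
```

Any upstream edit flips `isStale` to true and forces a re-merge, which is why the table treats this approach as functionally identical to Own tier.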
129
+ ### Conclusion
130
+
131
+ The Augment tier is architecturally impossible with the current skill discovery mechanism. All three former Augment skills (dispatching-parallel-agents, finishing-a-development-branch, using-git-worktrees) are reclassified to **Own** tier. Since the Wazir versions already carry the full superpowers base content plus Wazir additions, no content is lost -- the skills simply cannot delegate to a shared base.
132
+
133
+ If superpowers or Claude Code introduces a composition/layering mechanism in the future (e.g., `extends: superpowers:dispatching-parallel-agents` in frontmatter), the Augment tier could be revisited.
134
+
135
+ ---
136
+
137
+ ## Observations
138
+
139
+ 1. **No Delegate candidates exist.** Every Wazir fork adds at minimum the Command Routing and Codebase Exploration preambles, which prevents pure delegation.
140
+
141
+ 2. **Augment tier is not implementable.** R2 validation (2026-03-19) found that skill shadowing in both superpowers `skills-core.js` and Claude Code's native Skill tool is full-override: placing a SKILL.md in `~/.claude/skills/<name>/` completely replaces the superpowers skill with the same name. There is no merge or append mechanism. The three former Augment candidates (dispatching-parallel-agents, finishing-a-development-branch, using-git-worktrees) have been reclassified to Own. See [Augment Mechanism](#augment-mechanism) for full analysis.
142
+
143
+ 3. **All 14 forked skills are Own** because either (a) they introduce structural process changes (review loops, pipeline mode, Codex integration, content restructuring) or (b) the Augment composition mechanism does not exist in the platform.
144
+
145
+ 4. **Token cost tradeoff is significant.** Several Wazir Own skills (tdd, verification, debugging, writing-skills) are far shorter than their superpowers counterparts: tdd drops from ~370 lines to ~45, verification from ~140 to ~35, and writing-skills from ~650 to ~170. The superpowers versions contain rationalization prevention tables, detailed examples, and anti-pattern catalogs that enforce discipline; the Wazir versions trade these for token efficiency. This tradeoff should be revisited -- some of the removed discipline content may be worth recovering as separate reference files.
146
+
147
+ 5. **The `wz:` prefix is already applied** in skill names within the Wazir SKILL.md frontmatter for all forked skills, consistent with the Own tier convention.
@@ -15,6 +15,7 @@ The `wazir` CLI is minimal on purpose. It exists to validate and export the host
15
15
  | `wazir validate commits` | implemented | Validates conventional commit format for commits in the range `--base..--head` (or auto-detected base to HEAD). |
16
16
  | `wazir validate changelog` | implemented | Validates `CHANGELOG.md` structure; with `--require-entries` and `--base`, enforces new entries since the base. |
17
17
  | `wazir validate docs-drift` | implemented | Detects when source files (roles, workflows, skills, hooks) change without corresponding documentation updates. Advisory by default; `--strict` exits non-zero on drift. |
18
+ | `wazir validate skills` | implemented | Validates skill frontmatter and checks for name conflicts with superpowers skills (skill names must carry the `wz:` prefix). Rejects any `CONTEXT.md` files, because R2 found the augment tier not implementable. |
18
19
  | `wazir validate artifacts` | reserved | Exits `2` until artifact-template and example validation expands. |
19
20
  | `wazir export build` | implemented | Generates host packages under `exports/hosts/*` from canonical sources. |
20
21
  | `wazir export --check` | implemented | Verifies generated host packages still match current canonical source hashes. |
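The checks listed for `wazir validate skills` could look roughly like the sketch below. It is an illustration under stated assumptions, not the CLI's actual code: the directory layout (`<skills-root>/<skill>/SKILL.md`), the minimal frontmatter parser, and the names `readSkillName` and `validateSkills` are all hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Read the `name:` field from a leading `---` frontmatter block; null if absent.
function readSkillName(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return null;
  const nameLine = match[1].split(/\r?\n/).find((line) => line.startsWith('name:'));
  if (!nameLine) return null;
  // Strip optional surrounding quotes from the value.
  return nameLine.slice('name:'.length).trim().replace(/^["']|["']$/g, '') || null;
}

// Hypothetical validator: returns a list of human-readable problems (empty means valid).
function validateSkills(skillsRoot) {
  const problems = [];
  const entries = fs.readdirSync(skillsRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .sort((a, b) => a.name.localeCompare(b.name));
  for (const entry of entries) {
    const dir = path.join(skillsRoot, entry.name);
    if (fs.existsSync(path.join(dir, 'CONTEXT.md'))) {
      problems.push(`${entry.name}: CONTEXT.md is not allowed`);
    }
    const skillFile = path.join(dir, 'SKILL.md');
    if (!fs.existsSync(skillFile)) {
      problems.push(`${entry.name}: missing SKILL.md`);
      continue;
    }
    const name = readSkillName(fs.readFileSync(skillFile, 'utf8'));
    if (!name) {
      problems.push(`${entry.name}: frontmatter has no name`);
    } else if (!name.startsWith('wz:')) {
      problems.push(`${entry.name}: name "${name}" lacks the wz: prefix`);
    }
  }
  return problems;
}

// Throwaway fixture: one valid skill, one with two violations.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'validate-skills-'));
fs.mkdirSync(path.join(root, 'tdd'));
fs.writeFileSync(path.join(root, 'tdd', 'SKILL.md'), '---\nname: wz:tdd\n---\nBody\n');
fs.mkdirSync(path.join(root, 'debugging'));
fs.writeFileSync(path.join(root, 'debugging', 'SKILL.md'), '---\nname: debugging\n---\nBody\n');
fs.writeFileSync(path.join(root, 'debugging', 'CONTEXT.md'), 'addendum\n');
const problems = validateSkills(root);
console.log(problems.join('\n'));
fs.rmSync(root, { recursive: true, force: true });
```

Returning a list of problems rather than throwing on the first one lets a caller report every violation in a single run.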
@@ -28,7 +29,8 @@ The `wazir` CLI is minimal on purpose. It exists to validate and export the host
28
29
  | `wazir recall file` | implemented | Returns an exact line-bounded slice from an indexed file. Supports `--tier L0\|L1` for summary recall. |
29
30
  | `wazir recall symbol` | implemented | Returns an exact slice for an indexed symbol match. Supports `--tier L0\|L1` for summary recall. |
30
31
  | `wazir doctor` | implemented | Validates the active repo surface for manifest, hooks, state-root policy, and host export directory presence. |
31
- | `wazir status` | implemented | Reads run status directly from `<state-root>/runs/<run-id>/status.json`. |
32
+ | `wazir status` | implemented | Reads run status directly from `<state-root>/runs/<run-id>/status.json`. Includes a one-line context savings summary when usage data is available. |
33
+ | `wazir stats` | implemented | Shows token savings statistics for a run, including total queries, estimated tokens saved, bytes avoided, per-tool breakdown, and overall savings ratio. |
32
34
  | `wazir capture init` | implemented | Creates a run ledger with `status.json`, `events.ndjson`, and a captures directory under the configured state root. |
33
35
  | `wazir capture event` | implemented | Appends a run event and can update phase, status, and loop counts in `status.json`. |
34
36
  | `wazir capture route` | implemented | Reserves a run-local capture file path for large tool output. |
@@ -130,6 +130,12 @@
130
130
  subject: wazir status
131
131
  verifier: command_registry
132
132
  required: true
133
+ - id: command-stats
134
+ file: docs/reference/tooling-cli.md
135
+ claim_type: command
136
+ subject: wazir stats
137
+ verifier: command_registry
138
+ required: true
133
139
  - id: command-capture-family
134
140
  file: docs/reference/tooling-cli.md
135
141
  claim_type: command
@@ -202,6 +208,12 @@
202
208
  subject: wazir validate docs-drift
203
209
  verifier: command_registry
204
210
  required: true
211
+ - id: command-validate-skills
212
+ file: docs/reference/tooling-cli.md
213
+ claim_type: command
214
+ subject: wazir validate skills
215
+ verifier: command_registry
216
+ required: true
205
217
  - id: generated-claude-package
206
218
  file: docs/reference/host-exports.md
207
219
  claim_type: generated_file