@wazir-dev/cli 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (138)
  1. package/CHANGELOG.md +74 -10
  2. package/README.md +15 -15
  3. package/assets/demo.cast +47 -0
  4. package/assets/demo.gif +0 -0
  5. package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
  6. package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
  7. package/docs/concepts/architecture.md +1 -1
  8. package/docs/concepts/roles-and-workflows.md +2 -0
  9. package/docs/concepts/why-wazir.md +59 -0
  10. package/docs/decisions/2026-03-19-deferred-items.md +564 -0
  11. package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
  12. package/docs/readmes/INDEX.md +21 -5
  13. package/docs/readmes/features/expertise/README.md +2 -2
  14. package/docs/readmes/features/exports/README.md +2 -2
  15. package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
  16. package/docs/readmes/features/schemas/README.md +3 -0
  17. package/docs/readmes/features/skills/README.md +17 -0
  18. package/docs/readmes/features/skills/clarifier.md +5 -0
  19. package/docs/readmes/features/skills/claude-cli.md +5 -0
  20. package/docs/readmes/features/skills/codex-cli.md +5 -0
  21. package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
  22. package/docs/readmes/features/skills/executing-plans.md +5 -0
  23. package/docs/readmes/features/skills/executor.md +5 -0
  24. package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
  25. package/docs/readmes/features/skills/gemini-cli.md +5 -0
  26. package/docs/readmes/features/skills/humanize.md +5 -0
  27. package/docs/readmes/features/skills/init-pipeline.md +5 -0
  28. package/docs/readmes/features/skills/receiving-code-review.md +5 -0
  29. package/docs/readmes/features/skills/requesting-code-review.md +5 -0
  30. package/docs/readmes/features/skills/reviewer.md +5 -0
  31. package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
  32. package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
  33. package/docs/readmes/features/skills/wazir.md +5 -0
  34. package/docs/readmes/features/skills/writing-skills.md +5 -0
  35. package/docs/readmes/features/workflows/prepare-next.md +1 -1
  36. package/docs/reference/configuration-reference.md +47 -6
  37. package/docs/reference/hooks.md +1 -0
  38. package/docs/reference/launch-checklist.md +4 -4
  39. package/docs/reference/review-loop-pattern.md +119 -9
  40. package/docs/reference/roles-reference.md +1 -0
  41. package/docs/reference/skill-tiers.md +147 -0
  42. package/docs/reference/tooling-cli.md +3 -1
  43. package/docs/truth-claims.yaml +12 -0
  44. package/expertise/antipatterns/process/ai-coding-antipatterns.md +214 -1
  45. package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
  46. package/exports/hosts/claude/.claude/commands/verify.md +30 -1
  47. package/exports/hosts/claude/.claude/settings.json +9 -0
  48. package/exports/hosts/claude/CLAUDE.md +1 -1
  49. package/exports/hosts/claude/export.manifest.json +6 -4
  50. package/exports/hosts/claude/host-package.json +3 -1
  51. package/exports/hosts/codex/AGENTS.md +1 -1
  52. package/exports/hosts/codex/export.manifest.json +6 -4
  53. package/exports/hosts/codex/host-package.json +3 -1
  54. package/exports/hosts/cursor/.cursor/hooks.json +4 -0
  55. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
  56. package/exports/hosts/cursor/export.manifest.json +6 -4
  57. package/exports/hosts/cursor/host-package.json +3 -1
  58. package/exports/hosts/gemini/GEMINI.md +1 -1
  59. package/exports/hosts/gemini/export.manifest.json +6 -4
  60. package/exports/hosts/gemini/host-package.json +3 -1
  61. package/hooks/context-mode-router +191 -0
  62. package/hooks/definitions/context_mode_router.yaml +19 -0
  63. package/hooks/hooks.json +31 -6
  64. package/hooks/protected-path-write-guard +8 -0
  65. package/hooks/routing-matrix.json +45 -0
  66. package/hooks/session-start +62 -1
  67. package/llms-full.txt +937 -134
  68. package/package.json +2 -4
  69. package/schemas/hook.schema.json +2 -1
  70. package/schemas/phase-report.schema.json +89 -0
  71. package/schemas/usage.schema.json +25 -1
  72. package/schemas/wazir-manifest.schema.json +19 -0
  73. package/skills/brainstorming/SKILL.md +32 -157
  74. package/skills/clarifier/SKILL.md +289 -111
  75. package/skills/claude-cli/SKILL.md +320 -0
  76. package/skills/codex-cli/SKILL.md +260 -0
  77. package/skills/debugging/SKILL.md +13 -0
  78. package/skills/design/SKILL.md +13 -0
  79. package/skills/dispatching-parallel-agents/SKILL.md +13 -0
  80. package/skills/executing-plans/SKILL.md +13 -0
  81. package/skills/executor/SKILL.md +139 -19
  82. package/skills/finishing-a-development-branch/SKILL.md +13 -0
  83. package/skills/gemini-cli/SKILL.md +260 -0
  84. package/skills/humanize/SKILL.md +13 -0
  85. package/skills/init-pipeline/SKILL.md +72 -164
  86. package/skills/prepare-next/SKILL.md +81 -10
  87. package/skills/receiving-code-review/SKILL.md +13 -0
  88. package/skills/requesting-code-review/SKILL.md +13 -0
  89. package/skills/reviewer/SKILL.md +369 -24
  90. package/skills/run-audit/SKILL.md +13 -0
  91. package/skills/scan-project/SKILL.md +13 -0
  92. package/skills/self-audit/SKILL.md +217 -16
  93. package/skills/skill-research/SKILL.md +188 -0
  94. package/skills/subagent-driven-development/SKILL.md +13 -0
  95. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
  96. package/skills/subagent-driven-development/implementer-prompt.md +8 -0
  97. package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
  98. package/skills/tdd/SKILL.md +13 -0
  99. package/skills/using-git-worktrees/SKILL.md +13 -0
  100. package/skills/using-skills/SKILL.md +13 -0
  101. package/skills/verification/SKILL.md +54 -3
  102. package/skills/wazir/SKILL.md +464 -381
  103. package/skills/writing-plans/SKILL.md +14 -1
  104. package/skills/writing-skills/SKILL.md +13 -0
  105. package/templates/artifacts/implementation-plan.md +3 -0
  106. package/templates/artifacts/tasks-template.md +133 -0
  107. package/templates/examples/phase-report.example.json +48 -0
  108. package/tooling/src/adapters/composition-engine.js +256 -0
  109. package/tooling/src/adapters/model-router.js +84 -0
  110. package/tooling/src/capture/command.js +41 -2
  111. package/tooling/src/capture/run-config.js +3 -1
  112. package/tooling/src/capture/store.js +56 -0
  113. package/tooling/src/capture/usage.js +106 -0
  114. package/tooling/src/capture/user-input.js +66 -0
  115. package/tooling/src/checks/ac-matrix.js +256 -0
  116. package/tooling/src/checks/command-registry.js +12 -0
  117. package/tooling/src/checks/docs-truth.js +1 -1
  118. package/tooling/src/checks/security-sensitivity.js +69 -0
  119. package/tooling/src/checks/skills.js +111 -0
  120. package/tooling/src/cli.js +31 -20
  121. package/tooling/src/commands/stats.js +161 -0
  122. package/tooling/src/commands/validate.js +5 -1
  123. package/tooling/src/export/compiler.js +33 -37
  124. package/tooling/src/gating/agent.js +145 -0
  125. package/tooling/src/guards/phase-prerequisite-guard.js +185 -0
  126. package/tooling/src/hooks/routing-logic.js +69 -0
  127. package/tooling/src/init/auto-detect.js +258 -0
  128. package/tooling/src/init/command.js +38 -170
  129. package/tooling/src/input/scanner.js +46 -0
  130. package/tooling/src/reports/command.js +103 -0
  131. package/tooling/src/reports/phase-report.js +323 -0
  132. package/tooling/src/state/command.js +160 -0
  133. package/tooling/src/state/db.js +287 -0
  134. package/tooling/src/status/command.js +58 -1
  135. package/tooling/src/verify/proof-collector.js +299 -0
  136. package/wazir.manifest.yaml +26 -14
  137. package/workflows/plan-review.md +3 -1
  138. package/workflows/verify.md +30 -1
package/llms-full.txt CHANGED
@@ -1,6 +1,6 @@
1
1
  # Wazir — Complete Documentation
2
2
 
3
- > Generated: 2026-03-16T23:53:58Z
3
+ > Generated: 2026-03-20T04:20:52Z
4
4
 
5
5
  ---
6
6
  ## Source: docs/concepts/architecture.md
@@ -17,7 +17,7 @@ Wazir is a host-native engineering OS kit. The host environment (Claude, Codex,
17
17
  | Workflows | Phase entrypoints that sequence roles through delivery |
18
18
  | Skills | Reusable procedures (wz:tdd, wz:debugging, wz:verification, wz:brainstorming) |
19
19
  | Hooks | Guardrails enforcing protected paths, loop caps, and capture routing |
20
- | Expertise | 308 curated knowledge modules composed into agent prompts |
20
+ | Expertise | 315 curated knowledge modules composed into agent prompts |
21
21
  | Templates | Artifact templates for phase outputs and handoff |
22
22
  | Schemas | Validation schemas for manifest, hooks, artifacts, and exports |
23
23
  | Exports | Generated host packages tailored per supported host |
@@ -93,6 +93,7 @@ open-pencil is integrated as an optional adapter (`open_pencil`) — it is not r
93
93
  - [Hooks](../reference/hooks.md)
94
94
  - [Expertise & Antipatterns](composition-engine.md)
95
95
 
96
+
96
97
  ---
97
98
  ## Source: docs/concepts/artifact-model.md
98
99
 
@@ -157,6 +158,7 @@ That path must remain gitignored so a normal run still leaves `git status` clean
157
158
  - archive stale or disproven learnings instead of rewriting history in place
158
159
  - prune external run-state captures when they no longer provide audit or debugging value
159
160
 
161
+
160
162
  ---
161
163
  ## Source: docs/concepts/composition-engine.md
162
164
 
@@ -197,6 +199,7 @@ Brute-force loading of all expertise modules would flood the context window. The
197
199
 
198
200
  For the complete module listing and anti-pattern catalog, see the [Expertise Index reference](../reference/expertise-index.md).
199
201
 
202
+
200
203
  ---
201
204
  ## Source: docs/concepts/indexing-and-recall.md
202
205
 
@@ -361,6 +364,7 @@ The optional `context-mode` adapter remains:
361
364
 
362
365
  See the adapter docs for current status and constraints.
363
366
 
367
+
364
368
  ---
365
369
  ## Source: docs/concepts/observability.md
366
370
 
@@ -406,6 +410,7 @@ The capture command family writes under:
406
410
  - captured output should reduce context flooding, not increase it
407
411
  - summaries must point back to captured files instead of pretending the full output stayed in context
408
412
 
413
+
409
414
  ---
410
415
  ## Source: docs/concepts/roles-and-workflows.md
411
416
 
@@ -442,6 +447,8 @@ The canonical workflow sequence is:
442
447
  13. **learn** — capture scoped learnings
443
448
  14. **prepare-next** — produce a clean handoff for the next run
444
449
 
450
+ Additionally, **run-audit** is a standalone workflow that can be invoked outside the linear pipeline to perform structured codebase audits with source-backed findings.
451
+
445
452
  ## Role routing
446
453
 
447
454
  The orchestrator dispatches three roles per task: `executor`, `reviewer`, and `verifier`. By default, all three run for every task. The `required_roles` field in a task's YAML frontmatter controls which roles are dispatched, allowing the orchestrator to skip unnecessary roles and save context window budget.
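The role-routing rule above can be sketched as a small function. This is an illustration only, assuming the task's YAML frontmatter has already been parsed into a dictionary; the helper name `resolve_roles` and the dictionary shape are hypothetical and are not taken from the Wazir codebase.

```python
# Hypothetical sketch of role routing; not Wazir's actual implementation.
DEFAULT_ROLES = ["executor", "reviewer", "verifier"]


def resolve_roles(frontmatter: dict) -> list:
    """Return the roles to dispatch for one task, in canonical order."""
    requested = frontmatter.get("required_roles")
    # With no required_roles field, all three roles run (the documented default).
    roles = list(requested) if requested else list(DEFAULT_ROLES)
    # security_critical: true always pulls the reviewer back in.
    if frontmatter.get("security_critical") is True and "reviewer" not in roles:
        roles.append("reviewer")
    # Normalise to the canonical dispatch order (an assumption of this sketch).
    return [role for role in DEFAULT_ROLES if role in roles]


print(resolve_roles({"required_roles": ["executor"], "security_critical": True}))
```

Filtering against `DEFAULT_ROLES` also discards unknown role names; a real implementation might prefer to reject them with an error instead.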
@@ -469,6 +476,7 @@ If `security_critical: true`, `reviewer` is always included.
469
476
 
470
477
  Use the files under `roles/` and `workflows/` as the canonical source for role contracts and phase entrypoints. For exact role and workflow tables, see the [Roles Reference](../reference/roles-reference.md).
471
478
 
479
+
472
480
  ---
473
481
  ## Source: docs/concepts/terminology-policy.md
474
482
 
@@ -500,6 +508,71 @@ Do not use terms that describe Wazir as a background service, a web-based contro
500
508
  - Use the canonical terms above in all roles, workflows, skills, and documentation.
501
509
  - When in doubt, describe what Wazir is, not what it is not.
502
510
 
511
+
512
+ ---
513
+ ## Source: docs/concepts/why-wazir.md
514
+
515
+ # Why Wazir
516
+
517
+ What makes Wazir the best engineering OS you can add to an AI coding agent.
518
+
519
+ ## 1. Measure Twice, Cut Once
520
+
521
+ Wazir clarifies before coding. The pipeline forces research, spec hardening, design review, and plan approval before a single line of implementation code is written. Most AI agents jump straight to code and fix mistakes after. Wazir prevents the mistakes.
522
+
523
+ ## 2. Deep Research
524
+
525
+ Every AI agent knows how to research. Users don't ask them to. Wazir makes research a mandatory phase — the researcher role scans the codebase, fetches external sources, and produces a research brief before clarification begins. The agent starts informed, not guessing.
526
+
527
+ ## 3. Clarifier + Task Planning
528
+
529
+ A structured clarification pipeline turns vague requests into measurable specs. Spec hardening catches ambiguity, missing constraints, and untestable acceptance criteria before they become bugs. Task planning produces execution-grade task specs — not TODO lists.
530
+
531
+ ## 4. Content Author
532
+
533
+ A dedicated role for any content need — database seeding, sample content, test fixtures, translations, copy, email templates, notification text. Most AI agents treat content as an afterthought bolted onto code tasks. Wazir gives content its own phase with editorial standards, i18n awareness, and humanization rules.
534
+
535
+ ## 5. Self-Audit
536
+
537
+ The agent audits its own work in an isolated git worktree. Validates, finds structural issues, fixes what it can, verifies the fixes, and only merges on all-green. 5-loop cycle with convergence detection. Protected-path safety rails prevent the agent from modifying its own identity-defining files. Safe self-improvement.
538
+
539
+ ## 6. Composer
540
+
541
+ 315 curated expertise modules across 12 domains. The composition engine assembles task-specific agents by loading the right expertise for each role, stack, and concern. The executor building a Flutter RTL app gets Flutter patterns, RTL layout rules, and mobile antipatterns composed into its context. The reviewer gets the corresponding antipattern catalog. Every dispatched agent is a specialist, not a generalist pretending.
542
+
543
+ ## 7. Review Loops
544
+
545
+ Multi-pass adversarial review at every pipeline checkpoint — not a single rubber-stamp at the end. Research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions. Findings are resolved before advancing. The reviewer is an adversary, not a cheerleader.
546
+
547
+ ## 8. Continuous Learning
548
+
549
+ Wazir evolves from its own mistakes. Review findings, audit findings, and user corrections feed into a learning system. Recurring issues become accepted learnings injected into future runs. A drift budget prevents learned behavior from diverging too far from the original design. The agent that builds your 10th feature is better than the one that built your 1st.
550
+
551
+ ## 9. Antipatterns
552
+
553
+ A first-class antipattern catalog loaded into reviewer context BEFORE domain expertise. Catches AI-specific failure modes: fake completion, unwired abstractions, shallow tests, security theater, architecture drift. The reviewer's first lens is "what could go wrong" — not "does this look right."
554
+
555
+ ## 10. Multi-Host
556
+
557
+ One canonical source, four host exports. Wazir works on Claude Code, Codex, Gemini, and Cursor from a single `wazir export build`. Roles, workflows, skills, and expertise are written once and compiled into each host's native format. Switch hosts without rewriting your engineering process.
558
+
559
+ ## 11. Context Efficiency
560
+
561
+ AI agents waste most of their context window on brute-force file reads and verbose command output. Wazir's routing hook auto-routes large commands through context-mode. The index provides symbol-first exploration — query first, read only what's needed. Capture routing redirects large output to files. Result: 60-80% token reduction on exploration-heavy phases. The agent thinks more, reads less.
562
+
563
+ ## 12. Verification Before Completion
564
+
565
+ No success claims without evidence. The verify phase produces deterministic proof — test results, lint output, type-check results — not "I believe it works." Every completion claim is backed by a command that was actually run and output that was actually checked. Evidence before assertions, always.
566
+
567
+ ## 13. Gating Agent
568
+
569
+ Autonomous phase transition decisions. After each phase, a gating agent reads the phase report and decides: continue (all gates pass), loop back (specific failures with fix paths), or escalate to human (ambiguous trade-offs, scope changes). Default posture: escalate. The pipeline doesn't blindly advance — it stops when it should stop.
570
+
571
+ ## 14. Humanize
572
+
573
+ Anti-AI-writing patterns across all text output. A vocabulary blacklist, domain-specific rules, and a self-audit checklist ensure that specs, plans, code comments, commit messages, and documentation read like they were written by a human engineer — not generated by an LLM. Because AI-sounding output erodes trust.
574
+
575
+
503
576
  ---
504
577
  ## Source: docs/getting-started/01-installation.md
505
578
 
@@ -582,6 +655,7 @@ npx wazir doctor
582
655
 
583
656
  [Your First Run](02-first-run.md) — walk through the full pipeline from brief to shipped code.
584
657
 
658
+
585
659
  ---
586
660
  ## Source: docs/getting-started/02-first-run.md
587
661
 
@@ -688,6 +762,7 @@ The 7 steps above map to 14 internal phases:
688
762
  - [Roles & Workflows](../concepts/roles-and-workflows.md) — deep dive into role contracts
689
763
  - [Composition Engine](../concepts/composition-engine.md) — how expertise modules are loaded
690
764
 
765
+
691
766
  ---
692
767
  ## Source: docs/guides/memory-and-learnings.md
693
768
 
@@ -726,6 +801,7 @@ Wazir keeps learning durable but scoped.
726
801
  3. Promote it to `memory/learnings/accepted/` only when the scope and evidence are durable.
727
802
  4. Move disproven or obsolete learnings to `memory/learnings/archived/`.
728
803
 
804
+
729
805
  ---
730
806
  ## Source: docs/guides/troubleshooting.md
731
807
 
@@ -831,6 +907,7 @@ If it says the run status is missing:
831
907
  - confirm the file exists on disk
832
908
  - use `--json` for machine-readable output during automation
833
909
 
910
+
834
911
  ---
835
912
  ## Source: docs/reference/configuration-reference.md
836
913
 
@@ -969,15 +1046,56 @@ Out of scope for this manifest check:
969
1046
 
970
1047
  Maintainers are responsible for policing those surfaces with the separate docs-truth, runtime-surface, and repository review checks.
971
1048
 
972
- ## Workflows vs phases
1049
+ ## Phases vs workflows
973
1050
 
974
- - `phases` are the core lifecycle states of the operating model.
975
- - `workflows` are the canonical callable or review-gated entrypoints that drive those phases.
1051
+ The pipeline has **4 phases** (Init, Clarifier, Executor, Final Review) and **15 workflows** (atomic units within those phases).
976
1052
 
977
- They overlap heavily, but they are not identical:
1053
+ - **Phases** are the top-level pipeline stages. Event capture and tracking use phase names: `init`, `clarifier`, `executor`, `final_review`.
1054
+ - **Workflows** are the canonical callable or review-gated entrypoints that run within phases. Each workflow can be independently enabled/disabled via `workflow_policy` in run-config.
978
1055
 
979
- - `spec_challenge`, `plan_review`, and `prepare_next` are workflows that sit between or around the core execution phases.
980
- - Validators and exports should treat manifest-declared workflows as the canonical workflow file roster.
1056
+ | Phase | Workflows |
1057
+ |-------|-----------|
1058
+ | Init | (inline — no workflow files) |
1059
+ | Clarifier | clarify, discover, specify, spec_challenge, author, design, design_review, plan, plan_review |
1060
+ | Executor | execute, verify |
1061
+ | Final Review | review, learn, prepare_next |
1062
+
1063
+ `run_audit` is a standalone on-demand workflow, not part of the main pipeline flow.
1064
+
1065
+ Validators and exports should treat manifest-declared workflows as the canonical workflow file roster.
1066
+
1067
+ ## Hook configuration
1068
+
1069
+ ### `hooks/routing-matrix.json`
1070
+
1071
+ The routing matrix defines how the context-mode router classifies commands:
1072
+
1073
+ - `large` — array of command prefixes that always route to context-mode (AC-3.1). The `# wazir:passthrough` marker does NOT exempt commands in this category.
1074
+ - `small` — array of command prefixes that always pass through without context-mode processing.
1075
+ - `ambiguous_heuristic` — rules for commands that match neither large nor small:
1076
+ - `pipe_detected` — classify piped commands as ambiguous
1077
+ - `redirect_detected` — classify redirected commands as ambiguous
1078
+ - `verbose_binaries` — array of binary names whose output is typically large
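A minimal sketch of how a router might apply the matrix keys listed above. The function name `classify_command`, the sample prefixes, the boolean shape of `pipe_detected` and `redirect_detected`, and the `"unmatched"` fallback label are all assumptions made for illustration; the shipped `hooks/context-mode-router` may behave differently.

```python
# Illustrative matrix: the keys follow the documented structure, the values are made up.
EXAMPLE_MATRIX = {
    "large": ["npm test", "pytest"],
    "small": ["git status", "ls"],
    "ambiguous_heuristic": {
        "pipe_detected": True,
        "redirect_detected": True,
        "verbose_binaries": ["find"],
    },
}


def classify_command(command: str, matrix: dict) -> str:
    """Classify a shell command as large, small, ambiguous, or unmatched."""
    cmd = command.strip()
    # "large" is checked first, so a trailing "# wazir:passthrough" marker
    # cannot exempt a command in this category.
    if any(cmd.startswith(prefix) for prefix in matrix.get("large", [])):
        return "large"
    if any(cmd.startswith(prefix) for prefix in matrix.get("small", [])):
        return "small"
    heuristic = matrix.get("ambiguous_heuristic", {})
    tokens = cmd.split()
    binary = tokens[0] if tokens else ""
    # Naive substring checks: quoted "|" or ">" characters also trigger them.
    if heuristic.get("pipe_detected") and "|" in cmd:
        return "ambiguous"
    if heuristic.get("redirect_detected") and (">" in cmd or "<" in cmd):
        return "ambiguous"
    if binary in heuristic.get("verbose_binaries", []):
        return "ambiguous"
    return "unmatched"  # assumption: the source does not name this case


print(classify_command("npm test # wazir:passthrough", EXAMPLE_MATRIX))
```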
1079
+
1080
+ ### `config/gating-rules.yaml`
1081
+
1082
+ The gating rules file defines conditions for phase transition decisions:
1083
+
1084
+ - `rules.continue` — all conditions must pass for a phase to advance (test failures, lint errors, type errors, drift delta, risk flags, uncertain outcomes)
1085
+ - `rules.loop_back` — any deterministic failure (test failures, lint errors, or type errors) triggers a loop-back with actionable fix descriptions
1086
+ - `rules.escalate` — fallback when neither continue nor loop_back match
1087
+ - `default_verdict` — verdict when the report is empty or missing (defaults to `escalate`)
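The three rule groups above can be read as a small decision function. The sketch below is one possible reading, not the contents of `tooling/src/gating/agent.js`: the report field names (`test_failures`, `drift_delta`, and so on) and the `max_drift_delta` threshold are hypothetical.

```python
# Hypothetical field names for the deterministic checks in a phase report.
DETERMINISTIC_KEYS = ("test_failures", "lint_errors", "type_errors")


def decide_verdict(report, default_verdict="escalate", max_drift_delta=0.0):
    """Return 'continue', 'loop_back', or 'escalate' for one phase report."""
    # An empty or missing report falls through to the default verdict.
    if not report:
        return default_verdict
    # Any deterministic failure triggers a loop-back.
    if any(report.get(key, 0) > 0 for key in DETERMINISTIC_KEYS):
        return "loop_back"
    # Continue only when every remaining condition also passes.
    clean = (
        report.get("drift_delta", 0.0) <= max_drift_delta
        and not report.get("risk_flags")
        and not report.get("uncertain_outcomes")
    )
    # Escalate is the fallback when neither rule matches.
    return "continue" if clean else "escalate"


print(decide_verdict({"test_failures": 2}))
```

Putting the escalation branch last keeps the function aligned with the documented default posture: anything the rules do not explicitly clear goes to a human.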
1088
+
1089
+ ### Composition proof artifacts
1090
+
1091
+ The composition engine (`tooling/src/adapters/composition-engine.js`) writes a proof artifact per dispatch to `.wazir/runs/<id>/artifacts/composition-<role>-<task>.json` containing:
1092
+
1093
+ - `modules_included[]` — `{ path, layer, tokens }` for each loaded module
1094
+ - `modules_dropped[]` — `{ path, layer, tokens, reason }` for each dropped module. Reason values:
1095
+ - `module_cap_exceeded` — module count exceeded the 15-module cap
1096
+ - `token_ceiling_exceeded` — total tokens exceeded the configurable ceiling (default: 50,000)
1097
+ - `total_tokens` — total token count of composed prompt
1098
+ - `prompt_hash` — SHA-256 hash of the composed prompt for audit traceability
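A rough sketch of how a proof artifact with these fields could be assembled. The 15-module cap, the 50,000-token default ceiling, and the SHA-256 hash come from the description above. The function name `build_proof`, the in-order drop policy, and computing `total_tokens` as the sum over included modules are assumptions; the real logic lives in `composition-engine.js` and may differ.

```python
import hashlib

MODULE_CAP = 15                  # documented module cap
DEFAULT_TOKEN_CEILING = 50_000   # documented default ceiling


def build_proof(modules, prompt, token_ceiling=DEFAULT_TOKEN_CEILING):
    """Build a proof dict from candidate modules given in priority order.

    Each module is a dict with 'path', 'layer', and 'tokens' keys.
    """
    included, dropped, total = [], [], 0
    for module in modules:
        if len(included) >= MODULE_CAP:
            dropped.append({**module, "reason": "module_cap_exceeded"})
        elif total + module["tokens"] > token_ceiling:
            dropped.append({**module, "reason": "token_ceiling_exceeded"})
        else:
            included.append(dict(module))
            total += module["tokens"]
    return {
        "modules_included": included,
        "modules_dropped": dropped,
        "total_tokens": total,
        # Hash of the composed prompt, for audit traceability.
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }


candidates = [{"path": f"expertise/m{i}.md", "layer": "stack", "tokens": 100} for i in range(16)]
proof = build_proof(candidates, "composed prompt text")
print(len(proof["modules_included"]), proof["modules_dropped"][0]["reason"])
```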
981
1099
 
982
1100
  ## Current index parser roster
983
1101
 
@@ -994,6 +1112,7 @@ The active manifest currently declares built-in heuristic extractors for:
994
1112
  - YAML
995
1113
  - Markdown
996
1114
 
1115
+
997
1116
  ---
998
1117
  ## Source: docs/reference/expertise-index.md
999
1118
 
@@ -1050,6 +1169,7 @@ The `expertise/humanize/` domain provides AI text pattern detection and removal.
1050
1169
 
1051
1170
  For conceptual understanding of how the composition engine works, see [Composition Engine](../concepts/composition-engine.md).
1052
1171
 
1172
+
1053
1173
  ---
1054
1174
  ## Source: docs/reference/git-flow.md
1055
1175
 
@@ -1097,6 +1217,7 @@ Allowed types: `feat`, `fix`, `docs`, `chore`, `refactor`, `test`, `ci`, `perf`,
1097
1217
  - **CI:** All three validators run on pull requests; `--require-entries` blocks feature/codex/hotfix branches without changelog entries
1098
1218
  - **Roles:** Each role has documented git-flow responsibilities in its contract
1099
1219
 
1220
+
1100
1221
  ---
1101
1222
  ## Source: docs/reference/hooks.md
1102
1223
 
@@ -1188,6 +1309,7 @@ Hook definitions are the authoritative product contracts. The canonical definiti
1188
1309
  - `0` allow
1189
1310
  - `43` block
1190
1311
 
1312
+
1191
1313
  ---
1192
1314
  ## Source: docs/reference/host-exports.md
1193
1315
 
@@ -1242,6 +1364,7 @@ The compiler generates the canonical host packages under `exports/hosts/*`.
1242
1364
 
1243
1365
  The only root host bootstrap retained is `.claude/settings.json`, which mirrors the generated Claude settings contract.
1244
1366
 
1367
+
1245
1368
  ---
1246
1369
  ## Source: docs/reference/launch-checklist.md
1247
1370
 
@@ -1273,7 +1396,7 @@ Submit pull requests to these curated lists (one PR per list, follow each repo's
1273
1396
  ### awesome-claude-code
1274
1397
  - **Repo:** `github.com/anthropics/awesome-claude-code` (or the most-starred community fork)
1275
1398
  - **Section:** Tools / Plugins / Extensions
1276
- - **Entry format:** `[Wazir](https://github.com/MohamedAbdallah-14/Wazir) - Host-native engineering OS kit with 10 roles, 14 phases, and 308 expertise modules.`
1399
+ - **Entry format:** `[Wazir](https://github.com/MohamedAbdallah-14/Wazir) - Host-native engineering OS kit with 10 roles, 4 phases (15 workflows), and 315 expertise modules.`
1277
1400
  - **Tips:** Keep the description under 120 characters. Link directly to the repo.
1278
1401
 
1279
1402
  ### awesome-ai-agents
@@ -1303,7 +1426,7 @@ Show HN: Wazir – Engineering OS kit for AI coding agents (Claude, Codex, Gemin
1303
1426
  ### First comment
1304
1427
  Post a comment immediately after submission explaining:
1305
1428
  1. What problem Wazir solves (AI agents lack structured engineering workflows)
1306
- 2. How it works (10 canonical roles, 14-phase pipeline, 308 expertise modules)
1429
+ 2. How it works (10 canonical roles, 4-phase pipeline with 15 workflows, 315 expertise modules)
1307
1430
  3. What makes it different (host-native, works across Claude/Codex/Gemini/Cursor)
1308
1431
  4. Quick install: `npx @wazir-dev/cli init`
1309
1432
  5. Invite feedback -- HN readers appreciate genuine requests for input
@@ -1322,7 +1445,7 @@ Post a comment immediately after submission explaining:
1322
1445
  **Title:** "How I Built an Engineering OS for AI Coding Agents"
1323
1446
 
1324
1447
  1. **Hook** -- The problem: AI agents write code but lack engineering discipline.
1325
- 2. **Architecture overview** -- 10 roles, 14 phases, expertise modules, quality gates.
1448
+ 2. **Architecture overview** -- 10 roles, 4 phases (15 workflows), expertise modules, quality gates.
1326
1449
  3. **Code walkthrough** -- Show a real workflow: how a feature moves from requirements through TDD to deployment.
1327
1450
  4. **Host-native approach** -- Explain why one kit works across Claude, Codex, Gemini, and Cursor.
1328
1451
  5. **Results** -- Concrete metrics or before/after comparisons.
@@ -1347,7 +1470,7 @@ Structure as a 5-7 tweet thread:
1347
1470
 
1348
1471
  1. **Hook tweet:** One-liner about the problem + link to repo.
1349
1472
  2. **What it is:** Brief description of Wazir.
1350
- 3. **Architecture:** 10 roles, 14 phases, 308 modules (include a diagram image).
1473
+ 3. **Architecture:** 10 roles, 4 phases (15 workflows), 315 modules (include a diagram image).
1351
1474
  4. **Demo:** Short GIF or screenshot of a workflow in action.
1352
1475
  5. **Multi-host:** Works with Claude, Codex, Gemini, and Cursor.
1353
1476
  6. **Install:** `npx @wazir-dev/cli init`
@@ -1418,6 +1541,7 @@ Monitor these metrics weekly for the first month, then monthly:
1418
1541
  | External PRs | 2+ |
1419
1542
  | HN points | 50+ |
1420
1543
 
1544
+
1421
1545
  ---
1422
1546
  ## Source: docs/reference/marketplace-listings.md
1423
1547
 
@@ -1498,6 +1622,7 @@ Run through this checklist after every `npm publish`:
1498
1622
  - [ ] **Host exports:** Run `npx wazir export --check` to verify no drift
1499
1623
  - [ ] **CHANGELOG:** Verify `CHANGELOG.md` is updated with the new version entry
1500
1624
 
1625
+
1501
1626
  ---
1502
1627
  ## Source: docs/reference/release-process.md
1503
1628
 
@@ -1536,6 +1661,550 @@ When no Wazir release tag exists yet:
1536
1661
  - Legacy tags are not considered release boundaries
1537
1662
  - The first release tag will be `v1.0.0` (or `v0.1.0` if pre-stable)
1538
1663
 
1664
+
1665
+ ---
1666
+ ## Source: docs/reference/review-loop-pattern.md
1667
+
1668
+ # Review Loop Pattern Reference
1669
+
1670
+ Canonical reference for the review loop pattern used across all Wazir pipeline phases. Skills and workflows link to this document rather than embedding loop logic inline.
1671
+
1672
+ ---
1673
+
1674
+ ## Core Principle: Producer-Reviewer Separation
1675
+
1676
+ The producer skill (clarifier, planner, designer, etc.) **emits** an artifact and calls for review. The **reviewer role** owns the review loop. The producer receives findings and resolves them. No role reviews its own output.
1677
+
1678
+ ```
1679
+ Producer emits artifact
1680
+ -> Reviewer runs review loop (N passes, Codex if available)
1681
+ -> Findings returned to producer
1682
+ -> Producer fixes and resubmits
1683
+ -> Loop until all passes exhausted or cap reached
1684
+ -> Escalate to user if cap exceeded
1685
+ ```
1686
+
1687
+ When Codex is available, the reviewer role delegates to `codex review` as a secondary input while maintaining its own independent primary verdict.
1688
+
1689
+ ---
1690
+
1691
+ ## Per-Task Review vs Final Review
1692
+
1693
+ These are two structurally different constructs:
1694
+
1695
+ | | Per-Task Review | Final Review |
1696
+ |---|---|---|
1697
+ | **When** | During execution, after each task | After all execution + verification complete |
1698
+ | **Dimensions** | 5 task-execution dims (correctness, tests, wiring, drift, quality) | 7 scored dims (correctness, completeness, wiring, verification, drift, quality, documentation) |
1699
+ | **Scope** | Single task's uncommitted changes | Entire implementation vs spec/plan |
1700
+ | **Output** | Pass/fix loop, no score | Scored verdict (0-70), PASS/FAIL |
1701
+ | **Workflow** | Inline in execution flow | `workflows/review.md` |
1702
+ | **Skill** | `wz:reviewer` in `task-review` mode | `wz:reviewer` in `final` mode |
1703
+ | **Log filename** | `<phase>-task-<NNN>-review-pass-<N>.md` | `final-review.md` |
1704
+
1705
+ ---
1706
+
1707
+ ## Standalone Mode
1708
+
1709
+ When no `.wazir/runs/latest/` directory exists (standalone skill invocation outside a pipeline run):
1710
+
1711
+ 1. **Review loops still run** -- the review logic is embedded in the skill, not dependent on run state.
1712
+ 2. **Artifact location** -- artifacts live in `docs/plans/`. This is the canonical standalone artifact path.
1713
+ 3. **Review log location** -- review logs go alongside the artifact: `docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md`. No temp dir.
1714
+ 4. **Loop cap is SKIPPED entirely** -- no `wazir capture loop-check` call. The loop runs for exactly `pass_counts[depth]` passes (3/5/7) and stops. No cap guard, no fallback constant.
1715
+ 5. **`wazir capture loop-check`** -- not invoked in standalone mode. The standalone detection happens before the cap guard call.
1716
+
1717
+ Detection logic:
1718
+
1719
+ ```
1720
+ if .wazir/runs/latest/ exists:
1721
+ run_mode = "pipeline"
1722
+ log_dir = .wazir/runs/latest/reviews/
1723
+ cap_guard = wazir capture loop-check (full guard)
1724
+ else:
1725
+ run_mode = "standalone"
1726
+ artifact_dir = docs/plans/
1727
+ log_dir = docs/plans/ (alongside artifact)
1728
+ cap_guard = none (depth pass count is the only limit)
1729
+ ```
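The detection logic above translates fairly directly into runnable code. The following is a sketch under stated assumptions: the function name `detect_run_mode` matches the pseudocode, but the `project_root` parameter and the returned dictionary shape are hypothetical conveniences for illustration.

```python
from pathlib import Path


def detect_run_mode(project_root="."):
    """Return run-mode settings based on whether .wazir/runs/latest/ exists."""
    root = Path(project_root)
    latest = root / ".wazir" / "runs" / "latest"
    if latest.is_dir():
        # Pipeline mode: logs go under the run directory, cap guard is active.
        return {
            "run_mode": "pipeline",
            "log_dir": latest / "reviews",
            "cap_guard": True,
        }
    # Standalone mode: artifacts and logs share docs/plans/, no cap guard.
    plans = root / "docs" / "plans"
    return {
        "run_mode": "standalone",
        "artifact_dir": plans,
        "log_dir": plans,
        "cap_guard": False,
    }


print(detect_run_mode()["run_mode"])
```

`Path.is_dir()` follows symlinks, so the check still succeeds if `latest` is a symlink to a concrete run directory.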
1730
+
1731
+ ---
1732
+
1733
+ ## Review Loop Pseudocode
1734
+
1735
+ ```
1736
+ review_loop(artifact_path, phase, dimensions[], depth, config, options={}):
1737
+
1738
+ # options.mode -- explicit review mode (required)
1739
+ # options.task_id -- task identifier for task-scoped reviews (optional)
1740
+
1741
+ # Standalone detection
1742
+ run_mode = detect_run_mode() # "pipeline" or "standalone"
1743
+
1744
+ # Fixed pass counts -- no extension
1745
+ pass_counts = { quick: 3, standard: 5, deep: 7 }
1746
+ total_passes = pass_counts[depth]
1747
+
1748
+ # Depth-aware dimension subsets (coverage contract)
1749
+ depth_dimensions = {
1750
+ quick: dimensions[0:3], # first 3 dimensions only
1751
+ standard: dimensions[0:5], # first 5
1752
+ deep: dimensions, # all available
1753
+ }
1754
+ active_dims = depth_dimensions[depth]
1755
+
1756
+ codex_available = check_codex() # which codex && codex --version
1757
+
1758
+ for pass_number in 0..total_passes-1:
1759
+
1760
+ # --- Cap guard check (pipeline mode only, before each pass) ---
1761
+ if run_mode == "pipeline":
1762
+ loop_check_args = "--run <run-id> --phase <phase> --loop-count <pass_number+1>"
1763
+ if options.task_id:
1764
+ loop_check_args += " --task-id <task_id>"
1765
+ wazir capture loop-check $loop_check_args
1766
+ # loop-check wraps: event capture + evaluateLoopCapGuard
1767
+ # If loop_cap_guard fires (exit 43), stop immediately:
1768
+ if last_exit_code == 43:
1769
+ log("Loop cap reached for phase: <phase>. Escalating to user.")
1770
+ escalate_to_user(evidence_gathered_so_far)
1771
+ return { pass_count: pass_number, escalated: true }
1772
+ # Standalone mode: no cap guard. Loop runs for total_passes and stops.
1773
+
1774
+ dimension = active_dims[pass_number % len(active_dims)]
1775
+
1776
+ # --- Primary review (reviewer role, not producer) ---
1777
+ # Mode is always explicit -- passed by caller via options.mode
1778
+ findings = self_review(artifact_path, focus=dimension, mode=options.mode)
1779
+
1780
+ # --- Secondary review (Codex, if available) ---
1781
+ if codex_available:
1782
+ codex_exit_code, codex_output = run_codex_review(artifact_path, dimension)
1783
+ if codex_exit_code != 0:
1784
+ # Codex failed -- log error, fall back to self-review for this pass
1785
+ log_error("Codex exited " + codex_exit_code + ": " + codex_output.stderr)
1786
+ mark_pass_codex_unavailable(pass_number)
1787
+ # Do NOT treat Codex failure as clean. Self-review findings stand alone.
1788
+ else:
1789
+ codex_findings = parse(codex_output.stdout)
1790
+ merge(findings, codex_findings, preserve_attribution=true)
1791
+
1792
+ # --- Log the review pass ---
1793
+ if run_mode == "pipeline":
1794
+ if options.task_id:
1795
+ log_path = .wazir/runs/latest/reviews/<phase>-task-<task_id>-review-pass-<N>.md
1796
+ else:
1797
+ log_path = .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
1798
+ log(pass_number+1, dimension, findings) -> log_path
1799
+ else:
1800
+ log_path = docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md
1801
+ log(pass_number+1, dimension, findings) -> log_path
1802
+
1803
+ if findings.has_issues:
1804
+ # --- Fix and re-submit (MANDATORY) ---
1805
+ # The producer MUST fix findings and the reviewer MUST re-review.
1806
+ # "Fix and continue without re-review" is EXPLICITLY PROHIBITED.
1807
+ producer_fix(artifact_path, findings)
1808
+ # Continue to next pass -- the fix will be re-reviewed
1809
+
1810
+ # --- Post-loop: escalation if issues remain ---
1811
+ if remaining.has_issues:
1812
+ # Cap reached with unresolved findings. Present to user:
1813
+ # 1. Approve with known issues (Recommended if non-blocking)
1814
+ # 2. Fix manually and re-run
1815
+ # 3. Abort
1816
+ escalate_to_user(remaining, options=[
1817
+ "approve-with-issues",
1818
+ "fix-manually-and-rerun",
1819
+ "abort"
1820
+ ])
1821
+ # User decides. If approved, log "user-approved-with-issues" in final pass file.
1822
+
1823
+ return { pass_count: total_passes, issues_found, issues_fixed, remaining, attributions }
1824
+ ```
1825
+
1826
+ Key properties of this pseudocode:
1827
+
1828
+ 1. **Fixed pass counts** -- Quick is exactly 3, standard exactly 5, deep exactly 7. No `max_passes = min_passes + 3`. No clean-streak early-exit. No extension.
1829
+ 2. **Task-scoped log filenames** -- `<phase>-task-<NNN>-review-pass-<N>.md` for per-task reviews, preventing log clobbering in parallel mode.
1830
+ 3. **Task-scoped loop cap keys** -- `--task-id` flag on `loop-check` so each task gets its own counter in `phase_loop_counts`.
1831
+ 4. **Explicit review mode** -- `options.mode` is always passed by the caller. No auto-detection.
1832
+ 5. **Codex error handling** -- non-zero exit is logged, pass marked `codex-unavailable`, self-review findings used alone. Never treated as clean.
1833
+ 6. **Standalone mode** -- uses `docs/plans/` for artifacts and logs. No temp dir. No cap guard at all.
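To make the fixed pass counts and the dimension rotation concrete, here is a small Python sketch of how a pass schedule might be derived. The helper name `pass_schedule` and the constant names are hypothetical; only the 3/5/7 counts, the depth-based dimension subsets, and the `pass_number % len(active_dims)` rotation come from the pseudocode above.

```python
PASS_COUNTS = {"quick": 3, "standard": 5, "deep": 7}
# How many leading dimensions are active per depth; None means "all available".
DIMENSION_LIMITS = {"quick": 3, "standard": 5, "deep": None}


def pass_schedule(dimensions, depth):
    """Return the dimension reviewed on each pass, in pass order."""
    if not dimensions:
        raise ValueError("at least one review dimension is required")
    limit = DIMENSION_LIMITS[depth]
    active = list(dimensions) if limit is None else list(dimensions[:limit])
    # Fixed number of passes; dimensions repeat round-robin when passes outnumber them.
    return [active[i % len(active)] for i in range(PASS_COUNTS[depth])]
```

With the five task-execution dimensions at `deep` depth, the schedule wraps around after the fifth pass, so passes 6 and 7 revisit the first two dimensions.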
1834
+
1835
+ ---
1836
+
1837
+ ## Codex Error Handling Contract
1838
+
1839
+ ```
1840
+ run_codex_review(artifact_path, dimension):
1841
+ CODEX_MODEL = read_config('.wazir/state/config.json', '.multi_tool.codex.model') or "gpt-5.4"
1842
+
1843
+ if is_code_artifact:
1844
+ cmd = codex review -c model="$CODEX_MODEL" --uncommitted --title "..." "Review for [dimension]..."
1845
+ # or: codex review -c model="$CODEX_MODEL" --base <sha> for committed changes
1846
+ else:
1847
+ cmd = cat <artifact_path> | codex exec -c model="$CODEX_MODEL" "Review this [type] for [dimension]..."
1848
+
1849
+ result = execute(cmd, timeout=120s, capture_stderr=true)
1850
+
1851
+ if result.exit_code != 0:
1852
+ return (result.exit_code, { stderr: result.stderr, stdout: "" })
1853
+ # Caller handles: log error, mark codex-unavailable, use self-review only
1854
+
1855
+ return (0, { stdout: result.stdout, stderr: result.stderr })
1856
+ ```
1857
+
1858
+ Rules:
1859
+
1860
+ - If Codex exits non-zero, log the full stderr.
1861
+ - Mark the pass as `codex-unavailable` in the review log metadata.
1862
+ - Fall back to self-review for that pass only. Do not skip the pass.
1863
+ - Do not retry Codex on the same pass. If Codex fails on pass 2, pass 3 still tries Codex, because transient failures may recover.
1864
+ - Never treat a Codex failure as a clean review pass.
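The rules above can be illustrated with a generic subprocess wrapper. This is a sketch under stated assumptions: `run_secondary_review` and `SecondaryReview` are hypothetical names, the command is passed in as an argument list rather than built here, and mapping a timeout to exit code 124 and a launch failure to 127 is a convention chosen for the example, not something this document specifies.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SecondaryReview:
    exit_code: int
    stdout: str
    stderr: str

    @property
    def usable(self) -> bool:
        # Only a zero exit yields findings to merge; anything else means "unavailable".
        return self.exit_code == 0


def run_secondary_review(argv: List[str], stdin_text: Optional[str] = None,
                         timeout_s: float = 120.0) -> SecondaryReview:
    """Run one secondary-review command and never report a failure as clean."""
    try:
        result = subprocess.run(argv, input=stdin_text, capture_output=True,
                                text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Assumption: treat a timeout like any other non-zero exit.
        return SecondaryReview(124, "", f"timed out after {timeout_s}s")
    except OSError as exc:
        # Assumption: the command could not be started at all.
        return SecondaryReview(127, "", str(exc))
    if result.returncode != 0:
        # Discard stdout on failure so partial output is never parsed as findings.
        return SecondaryReview(result.returncode, "", result.stderr)
    return SecondaryReview(0, result.stdout, result.stderr)
```

A caller would check `usable` on each pass: when it is false, log `stderr`, mark the pass `codex-unavailable`, and keep the self-review findings as they are.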
1865
+
1866
+ ---
1867
+
1868
+ ## Codex Availability Probe
1869
+
1870
+ Before any Codex call, verify availability once at loop start:
1871
+
1872
+ ```bash
1873
+ which codex >/dev/null 2>&1 && codex --version >/dev/null 2>&1
1874
+ ```
1875
+
1876
+ If the probe fails, set `codex_available = false` for the entire loop. Fall back to self-review only. Never error out.
1877
+
1878
+ Per-invocation failures (Codex available but a single call fails) are handled separately by the error contract above.
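For callers that are not shell scripts, the same probe might look like this in Python. The function name `tool_available` and the 10-second timeout are assumptions for illustration; the two checks (the binary is on `PATH`, and `--version` exits zero) mirror the bash one-liner above.

```python
import shutil
import subprocess


def tool_available(name: str, timeout_s: float = 10.0) -> bool:
    """Return True only if the tool is on PATH and `<tool> --version` exits 0."""
    if shutil.which(name) is None:
        return False
    try:
        result = subprocess.run([name, "--version"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        # A probe failure must never raise; it only disables the secondary reviewer.
        return False
    return result.returncode == 0
```

Calling `tool_available("codex")` once at loop start and caching the result matches the "probe once, never error out" rule.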
1879
+
1880
+ ---
1881
+
1882
+ ## Codex Artifact-Scoped Review
1883
+
1884
+ Never use `codex review` for non-code artifacts (specs, plans, designs). Instead, pipe the artifact content via stdin:
1885
+
1886
+ ```bash
1887
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
1888
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
1889
+ cat .wazir/runs/latest/clarified/spec-hardened.md | \
1890
+ codex exec -c model="$CODEX_MODEL" "Review this specification for: [dimension]. Be specific, cite sections. Say CLEAN if no issues." \
1891
+ 2>&1 | tee .wazir/runs/latest/reviews/spec-challenge-review-pass-N.md
1892
+ ```
1893
+
1894
+ For code artifacts, use `codex review -c model="$CODEX_MODEL" --uncommitted` (or `--base` for committed changes). See the next section for details.
1895
+
1896
+ ---
1897
+
1898
+ ## Code Review Scoping
1899
+
1900
+ **Rule: review BEFORE commit.**
1901
+
1902
+ For each task during execution:
1903
+
1904
+ 1. Implement the task (changes are uncommitted).
1905
+ 2. Review the uncommitted changes using the **5 task-execution dimensions** (NOT the 7 final-review dimensions):
1906
+ ```bash
1907
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
1908
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
1909
+ codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
1910
+ "Review against acceptance criteria: <criteria>" \
1911
+ 2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
1912
+ ```
1913
+ 3. Fix any findings (still uncommitted).
1914
+ 4. Re-review until all passes are exhausted or the loop cap is reached.
1915
+ 5. **Only after review passes:** commit with conventional commit format.
1916
+
1917
+ **If changes are already committed** (e.g., subagent workflow where the implementer subagent commits before review):
1918
+
1919
+ ```bash
1920
+ # Capture the SHA before the task starts
1921
+ PRE_TASK_SHA=$(git rev-parse HEAD)
1922
+
1923
+ # ... subagent implements and commits ...
1924
+
1925
+ # Review the committed changes against the pre-task baseline
1926
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
1927
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
1928
+ codex review -c model="$CODEX_MODEL" --base $PRE_TASK_SHA --title "Task NNN: <summary>" \
1929
+ "Review against acceptance criteria: <criteria>" \
1930
+ 2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
1931
+ ```
1932
+
1933
+ ---
1934
+
1935
+ ## Dimension Sets
1936
+
1937
+ ### Research Dimensions (5)
1938
+
1939
+ 1. **Coverage** -- all briefing topics researched
1940
+ 2. **Source quality** -- authoritative, current sources
1941
+ 3. **Relevance** -- research answers the actual questions
1942
+ 4. **Gaps** -- missing info that blocks later phases
1943
+ 5. **Contradictions** -- conflicting sources identified
1944
+
1945
+ ### Spec/Clarification Dimensions (5)
1946
+
1947
+ 1. **Completeness** -- all requirements covered
1948
+ 2. **Testability** -- each criterion verifiable
1949
+ 3. **Ambiguity** -- no dual-interpretation statements
1950
+ 4. **Assumptions** -- hidden assumptions explicit
1951
+ 5. **Scope creep** -- nothing beyond briefing
1952
+
1953
+ ### Design-Review Dimensions (5)
1954
+
1955
+ Matches canonical `workflows/design-review.md`:
1956
+
1957
+ 1. **Spec coverage** -- does the design address every acceptance criterion with a visual component?
1958
+ 2. **Design-spec consistency** -- does the design introduce anything not in the spec? (scope creep check)
1959
+ 3. **Accessibility** -- color contrast ratios (WCAG 2.1 AA), focus states, touch target sizes (44x44px minimum)
1960
+ 4. **Visual consistency** -- design tokens form a coherent system, dark/light mode alignment
1961
+ 5. **Exported-code fidelity** -- do exported scaffolds match the designs? Mismatches are failures here, not implementation concerns.
1962
+
1963
+ ### Plan Dimensions (7)
1964
+
1965
+ 1. **Completeness** -- all design decisions mapped to tasks
1966
+ 2. **Ordering** -- dependencies correct, parallelizable tasks identified
1967
+ 3. **Atomicity** -- each task fits one session
1968
+ 4. **Testability** -- concrete verification per task
1969
+ 5. **Edge cases** -- error paths covered
1970
+ 6. **Security** -- auth, injection, data exposure
1971
+ 7. **Integration** -- tasks connect end-to-end
1972
+
1973
+ ### Task Execution Dimensions (5)
1974
+
1975
+ Used for per-task review during execution:
1976
+
1977
+ 1. **Correctness** -- code matches spec
1978
+ 2. **Tests** -- real tests, not mocked/faked
1979
+ 3. **Wiring** -- all paths connected
1980
+ 4. **Drift** -- matches task spec
1981
+ 5. **Quality** -- naming, error handling
1982
+
1983
+ ### Final Review Dimensions (7)
1984
+
1985
+ Used for `workflows/review.md` scored gate:
1986
+
1987
+ 1. **Correctness** -- does the code do what the spec says?
1988
+ 2. **Completeness** -- are all acceptance criteria met?
1989
+ 3. **Wiring** -- are all paths connected end-to-end?
1990
+ 4. **Verification** -- is there evidence (tests, type checks) for each claim?
1991
+ 5. **Drift** -- does the implementation match the approved plan?
1992
+ 6. **Quality** -- code style, naming, error handling, security
1993
+ 7. **Documentation** -- changelog entries, commit messages, comments
1994
+
1995
+ The final review dimensions are the existing 7 from `skills/reviewer/SKILL.md`. `workflows/review.md` is not modified by this pattern.
1996
+
1997
+ ---
1998
+
1999
+ ## Per-Depth Coverage Contract
2000
+
2001
+ | Depth | Research | Spec | Design-Review | Plan | Task Execution | Final Review |
2002
+ |-------|----------|------|---------------|------|----------------|--------------|
2003
+ | Quick | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | always 7 dims, 1 pass |
2004
+ | Standard | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | always 7 dims, 1 pass |
2005
+ | Deep | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-7, 7 passes | dims 1-5, 7 passes | always 7 dims, 1 pass |
2006
+
2007
+ Pass counts are FIXED per depth. Quick = 3 passes, standard = 5 passes, deep = 7 passes. No extension. No early-exit. Final review is always a single scored pass across all 7 dimensions -- it is a gate, not a loop.
2008
+
2009
+ ---
2010
+
2011
+ ## Loop Cap Configuration
2012
+
2013
+ The `workflow_policy` section of `run-config.yaml` (legacy: `phase_policy`) controls which workflows are enabled and sets an absolute safety ceiling per workflow. Only two fields exist: `enabled` and `loop_cap`. There is no `passes` field -- depth determines pass counts (3/5/7), not workflow policy.
2014
+
2015
+ ```yaml
2016
+ workflow_policy:
2017
+ # Clarifier phase workflows
2018
+ discover: { enabled: true, loop_cap: 10 }
2019
+ clarify: { enabled: true, loop_cap: 10 }
2020
+ specify: { enabled: true, loop_cap: 10 }
2021
+ spec-challenge: { enabled: true, loop_cap: 10 }
2022
+ author: { enabled: false, loop_cap: 10 }
2023
+ design: { enabled: true, loop_cap: 10 }
2024
+ design-review: { enabled: true, loop_cap: 10 }
2025
+ plan: { enabled: true, loop_cap: 10 }
2026
+ plan-review: { enabled: true, loop_cap: 10 }
2027
+ # Executor phase workflows
2028
+ execute: { enabled: true, loop_cap: 10 }
2029
+ verify: { enabled: true, loop_cap: 5 }
2030
+ review: { enabled: true, loop_cap: 10 }
2031
+ learn: { enabled: true, loop_cap: 5 }
2032
+ prepare_next: { enabled: true, loop_cap: 5 }
2033
+ run_audit: { enabled: false, loop_cap: 10 }
2034
+ ```
2035
+
2036
+ **`loop_cap`** is an absolute safety ceiling that prevents runaway loops regardless of depth. It is checked by `wazir capture loop-check` in pipeline mode. It is NOT the same as pass count (which is determined by depth: 3/5/7). Example: depth=deep gives 7 passes, but if `loop_cap: 5`, the cap guard fires at pass 5 and escalates. This is intentional -- the operator can constrain expensive phases.
2037
+
2038
+ **Adaptive workflows** (`author`, `run_audit`) default to `enabled: false`. They are activated by explicit operator config or intent detection.
2039
+
2040
+ **Post-run workflows** (`learn`, `prepare_next`) default to `enabled: true` and run as part of the Final Review phase. How each adaptive and post-run workflow behaves:
2041
+
2042
+ - `learn` extracts durable learnings from review findings -- recurring findings become accepted learnings.
2043
+ - `prepare_next` prepares context and handoff for the next run.
2044
+ - `author` has a human approval gate, not an iterative review loop.
2045
+ - `run_audit` is an on-demand standalone audit, not part of the main pipeline flow.
2046
+
2047
+ ---
2048
+
2049
+ ## Reviewer Mode Table
2050
+
2051
+ The reviewer skill operates in different modes depending on the phase. **Mode is always explicit** -- the caller passes `--mode <mode>`. There is no auto-detection based on artifact availability.
2052
+
2053
+ | Mode | Invoked during | Prerequisites | Dimensions | Output |
2054
+ |------|---------------|---------------|------------|--------|
2055
+ | `final` | After execution + verification | Completed task artifacts in `.wazir/runs/latest/artifacts/` | 7 final-review dims, scored 0-70 | Verdict: PASS/NEEDS FIXES/NEEDS REWORK/FAIL |
2056
+ | `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Findings with severity, no score |
2057
+ | `design-review` | After design approval | Design artifact, approved spec, accessibility guidelines | 5 design-review dims (canonical) | Findings with severity (blocking/advisory) |
2058
+ | `plan-review` | After planning | Draft plan, approved spec, design artifact | 7 plan dims | Findings with severity, no score |
2059
+ | `task-review` | During execution, per task | Uncommitted changes (or committed with known base SHA) | 5 task-execution dims | Pass/fail per task, no score |
2060
+ | `research-review` | During discover | Research artifact | 5 research dims | Findings with severity, no score |
2061
+ | `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Findings with severity, no score |
2062
+
2063
+ If `--mode` is not provided, the reviewer asks the user which review to run. Auto-detection based on artifact availability is NOT used -- it causes ambiguity in resumed/multi-phase runs where stale artifacts from prior phases exist.
2064
+
2065
+ Each caller is responsible for passing the correct mode:
2066
+
2067
+ - Clarifier passes `--mode clarification-review` after Phase 1A
2068
+ - Discover workflow passes `--mode research-review` after research
2069
+ - Specifier flow passes `--mode spec-challenge` after specify
2070
+ - Brainstorming passes `--mode design-review` after user approval
2071
+ - Writing-plans passes `--mode plan-review` after planning
2072
+ - Executor passes `--mode task-review` for each task
2073
+ - `/wazir` runner passes `--mode final` for the final review gate
2074
+
2075
+ ---
2076
+
2077
+ ## Codex Prompt Templates
2078
+
2079
+ All Codex invocations read the model from config with a fallback:
2080
+
2081
+ ```bash
2082
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
2083
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
2084
+ ```
2085
+
2086
+ ### Artifact Review (specs, plans, designs via stdin)
2087
+
2088
+ Use this template with `codex exec` for non-code artifacts piped via stdin:
2089
+
2090
+ ```bash
2091
+ cat <artifact_path> | codex exec -c model="$CODEX_MODEL" \
2092
+ "You are reviewing a [ARTIFACT_TYPE] for the Wazir engineering OS.
2093
+ Focus on [DIMENSION]: [dimension description].
2094
+ Rules: cite specific sections, be actionable, say CLEAN if no issues.
2095
+ Do NOT load or invoke any skills. Do NOT read the codebase.
2096
+ Review ONLY the content provided via stdin."
2097
+ ```
2098
+
2099
+ Replace `[ARTIFACT_TYPE]` with: `specification`, `implementation plan`, `design document`, `research brief`, or `clarification`.
2100
+ Replace `[DIMENSION]` and `[dimension description]` with the current review pass dimension from the relevant dimension set above.
2101
+
2102
+ ### Code Review (diffs via --uncommitted or --base)
2103
+
2104
+ Use this template with `codex review` for code changes:
2105
+
2106
+ ```bash
2107
+ codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
2108
+ "Review the code changes for [DIMENSION]: [dimension description].
2109
+ Check against acceptance criteria: [criteria].
2110
+ Flag: correctness issues, missing tests, unwired paths, drift from spec.
2111
+ Do NOT load or invoke any skills."
2112
+ ```
2113
+
2114
+ For committed changes, replace `--uncommitted` with `--base <sha>`.
2115
+ Replace `[DIMENSION]`, `[dimension description]`, and `[criteria]` with the task-specific values from the execution plan and spec.
2116
+
2117
+ ---
2118
+
2119
+ ## Codex Output Context Protection
2120
+
2121
+ Codex CLI output includes internal traces (file reads, tool calls, reasoning) that are NOT useful for the review — only the final findings matter. To prevent context flooding:
2122
+
2123
+ ### Tee + Extract Pattern
2124
+
2125
+ 1. **Always tee** Codex output to a file:
2126
+ ```bash
2127
+ codex exec ... 2>&1 | tee .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
2128
+ ```
2129
+
2130
+ 2. **Extract findings** after the last `codex` marker using `execute_file`:
2131
+ ```bash
2132
+ # If context-mode available (has_execute_file: true):
2133
+ mcp__plugin_context-mode_context-mode__execute_file(
2134
+ path: ".wazir/runs/latest/reviews/<phase>-review-pass-<N>.md",
2135
+ language: "shell",
2136
+ code: "tac $FILE | sed '/^codex$/q' | tac | tail -n +2"
2137
+ )
2138
+ ```
2139
+
2140
+ 3. **Present extracted findings only** — the raw trace stays in the file for debugging but never enters the main context window.
2141
+
2142
+ ### Fallback (no context-mode)
2143
+
2144
+ If `context_mode.has_execute_file` is false, extract using shell directly:
2145
+
2146
+ ```bash
2147
+ tac <file> | sed '/^codex$/q' | tac | tail -n +2
2148
+ ```
2149
+
2150
+ This reverses the file, finds the first (= last original) `codex` marker, reverses back, and skips the marker line.
2151
+
2152
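+ **If no marker found:** fail closed. Log "codex marker not found in output, cannot extract findings" and present a warning to the user with 0 findings extracted. The raw file is preserved for manual review. Do NOT fall back to `tail` or any best-effort extraction that could leak traces into context.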
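The same extraction can be expressed in Python, which makes the fail-closed case explicit. This is a sketch, not part of the skill: `extract_findings` is a hypothetical helper that returns `None` when no line consisting solely of `codex` exists, so the caller can warn the user instead of passing the raw trace through.

```python
from typing import Optional


def extract_findings(raw_output: str, marker: str = "codex") -> Optional[str]:
    """Return the text after the last marker line, or None if no marker exists."""
    lines = raw_output.splitlines()
    last_marker = None
    for index, line in enumerate(lines):
        # Match the whole line, like the sed pattern /^codex$/.
        if line == marker:
            last_marker = index
    if last_marker is None:
        # Fail closed: the caller reports 0 findings and keeps the raw file.
        return None
    return "\n".join(lines[last_marker + 1:])
```

Note that the shell pipeline alone does not fail closed: without a marker, `sed` never quits early, so nearly the whole trace would be emitted. An explicit marker check like the one here avoids that.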
2153
+
2154
+ ---
2155
+
2156
+ ## Phase Scoring: First vs Final Artifact Comparison
2157
+
2158
+ At the start of each review loop (pass 1), score the artifact on its phase's canonical dimension set (1-10 per dimension). At the end of the loop (final pass), score again using the **same canonical dimensions**. Present the delta in the end-of-phase report.
2159
+
2160
+ ### Canonical Dimension Sets Per Phase
2161
+
2162
+ These are the fixed rubrics — no ad-hoc dimension selection:
2163
+
2164
+ | Phase | Canonical Dimensions |
2165
+ |-------|---------------------|
2166
+ | research-review | Coverage, Source quality, Relevance, Gaps identified, Actionability |
2167
+ | clarification-review / spec-challenge | Completeness, Testability, Ambiguity, Assumptions, Scope creep |
2168
+ | design-review | Spec coverage, Design-spec consistency, Accessibility, Visual consistency, Exported-code fidelity |
2169
+ | plan-review | Completeness, Testability, Task granularity, Dependency correctness, Phase structure, File coverage, Estimation accuracy |
2170
+ | task-review | Correctness, Tests, Wiring, Drift, Quality |
2171
+ | final | Correctness, Completeness, Wiring, Verification, Drift, Quality, Documentation |
2172
+
2173
+ ### Scoring Rules
2174
+
2175
+ 1. Initial and final scores MUST use the **same dimension set** — the delta is only meaningful on the same rubric.
2176
+ 2. The reviewer records which dimension set was used in each pass file.
2177
+ 3. Delta format: `Dimension: X/10 → Y/10 (+Z)`.
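A short Python sketch shows how the delta lines could be produced while enforcing the same-rubric rule. Both function names are hypothetical, and raising `ValueError` on mismatched dimension sets is an assumption about how a violation of rule 1 might be surfaced.

```python
from typing import Dict, List


def format_delta(dimension: str, initial: int, final: int) -> str:
    """Render one line in the `Dimension: X/10 → Y/10 (+Z)` format."""
    change = final - initial
    # The `+d` format spec always prints a sign, e.g. +5, -2, +0.
    return f"{dimension}: {initial}/10 → {final}/10 ({change:+d})"


def quality_delta(initial: Dict[str, int], final: Dict[str, int]) -> List[str]:
    """Build delta lines, refusing to compare scores from different rubrics."""
    if set(initial) != set(final):
        raise ValueError("initial and final scores must use the same dimension set")
    return [format_delta(name, initial[name], final[name]) for name in initial]
```

For example, `format_delta("Completeness", 4, 9)` yields `Completeness: 4/10 → 9/10 (+5)`.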
2178
+
2179
+ ### Quality Delta Report Section
2180
+
2181
+ The end-of-phase report (see "End-of-Phase Report" below) includes a **Quality Delta** section:
2182
+
2183
+ ```markdown
2184
+ ## Quality Delta
2185
+
2186
+ | Dimension | Initial | Final | Delta |
2187
+ |-----------|---------|-------|-------|
2188
+ | Completeness | 4/10 | 9/10 | +5 |
2189
+ | Testability | 3/10 | 8/10 | +5 |
2190
+ | Ambiguity | 5/10 | 9/10 | +4 |
2191
+ ```
2192
+
2193
+ ---
2194
+
2195
+ ## End-of-Phase Report
2196
+
2197
+ Every phase exit produces a report saved to `.wazir/runs/latest/reviews/<phase>-report.md` containing:
2198
+
2199
+ 1. **Summary** — what the phase produced
2200
+ 2. **Key Changes** — first-version vs final-version highlights (not full diff — what improved)
2201
+ 3. **Quality Delta** — per-dimension before/after scores (see Phase Scoring above)
2202
+ 4. **Findings Log** — per-pass finding counts by severity (e.g., "Pass 1: 6 findings (3 blocking, 2 warning, 1 note). Pass 7: 0 findings. All resolved.")
2203
+ 5. **Usage** — token usage from `wazir capture usage` (runs before report generation)
2204
+ 6. **Context Savings** — context-mode stats if available, omit section if not
2205
+ 7. **Time Spent** -- wall-clock elapsed time from phase start to end
2206
+
2207
+
1539
2208
  ---
1540
2209
  ## Source: docs/reference/roles-reference.md
1541
2210
 
@@ -1576,6 +2245,7 @@ This is the lookup reference for canonical roles, workflows, and their contracts
1576
2245
  | `review` | `verify` | Adversarial quality review |
1577
2246
  | `learn` | `review` | Capture scoped learnings |
1578
2247
  | `prepare-next` | `learn` | Produce clean next-run handoff |
2248
+ | `run-audit` | (standalone) | Structured codebase audit with source-backed findings |
1579
2249
 
1580
2250
  ## Role routing valid values
1581
2251
 
@@ -1617,6 +2287,159 @@ Roles that explore broadly (clarifier, researcher, planner) benefit most from L1
1617
2287
 
1618
2288
  See [Indexing and Recall](../concepts/indexing-and-recall.md) for full details on tiers and commands.
1619
2289
 
2290
+
2291
+ ---
2292
+ ## Source: docs/reference/skill-tiers.md
2293
+
2294
+ # Skill Tier Classification
2295
+
2296
+ Audit of Wazir skills against Superpowers v4.3.1 skills.
2297
+ Each skill is classified into one of three tiers:
2298
+
2299
+ - **Delegate** -- use superpowers skill as-is, delete Wazir fork
2300
+ - **Augment** -- use superpowers skill + inject Wazir context addendum (strictly additive, no overrides). **NOTE:** R2 validation found this tier is not implementable -- see [Augment Mechanism](#augment-mechanism) below.
2301
+ - **Own** -- Wazir-original or structurally rewritten skill, rename to `wz:` prefix
2302
+
2303
+ ---
2304
+
2305
+ ## Classification Table
2306
+
2307
+ | Wazir Skill | Superpowers Equivalent | Tier | Rationale | Risk Notes |
2308
+ |---|---|---|---|---|
2309
+ | brainstorming | brainstorming | **Own** | Structurally rewritten. Superpowers version is a linear checklist (explore context, ask questions, propose approaches, present design, write doc, invoke writing-plans). Wazir replaces the entire process: adds Command Routing and Codebase Exploration preambles, replaces the design-doc step with a design-review loop (`--mode design-review` with canonical dimensions), outputs to `.wazir/runs/latest/clarified/design.md` instead of `docs/plans/`, and adds a complete Agent Teams multi-agent brainstorming mode (Free Thinker / Grounder / Synthesizer / Arbiter pattern using TeamCreate/SendMessage). None of the superpowers process steps survive intact. | Dropping the Agent Teams mode would lose Wazir's most differentiated brainstorming capability. |
2310
+ | clarifier | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2311
+ | debugging | systematic-debugging | **Own** | Structurally rewritten. Superpowers has a 4-phase process (Root Cause Investigation with 5 substeps, Pattern Analysis, Hypothesis and Testing, Implementation) totaling ~300 lines with detailed examples, rationalization tables, and supporting technique references. Wazir condenses this to a 4-step observe-hypothesize-test-fix loop (~75 lines), replaces all codebase exploration with Wazir CLI symbol-first exploration (`wazir index search-symbols`, `wazir recall symbol` and `wazir recall file`), adds loop cap awareness (pipeline mode with `wazir capture loop-check` vs. standalone mode), and removes all superpowers examples, rationalization tables, and red-flag lists. The methodology is fundamentally different in structure despite sharing the spirit of "root cause first." | Delegating would lose Wazir CLI integration and loop cap awareness. Superpowers version is far more detailed on anti-patterns and may be worth referencing separately. |
2312
+ | design | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2313
+ | dispatching-parallel-agents | dispatching-parallel-agents | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers core (When to Use decision tree, The Pattern with 4 steps, Agent Prompt Structure, Common Mistakes section) plus Wazir additions (Command Routing preamble, Codebase Exploration preamble, philosophical paragraph in Overview, Problem/Fix format for Common Mistakes). Drops superpowers-only sections: "When NOT to Use," "Real Example from Session," "Key Benefits," "Verification," "Real-World Impact." | Superpowers informational sections (Real Example, Key Benefits, Verification, Real-World Impact) not carried forward. Low risk -- these are teaching content, not behavioral. |
2314
+ | executing-plans | executing-plans | **Own** | Structurally rewritten. Superpowers uses batch execution (default first 3 tasks) with report-and-wait checkpoints and explicit batch feedback loops. Wazir replaces batching with per-task execution, adds a per-task review loop (`--mode task-review` with 5 task-execution dimensions, Codex integration, review log filenames, loop cap tracking via `wazir capture loop-check`), adds standalone vs. pipeline mode detection, and adds a note recommending wz:subagent-driven-development when subagents are available. The batch-vs-per-task change is a core behavioral difference. All integration references point to `wz:` skills. | Delegating would lose per-task review loops and pipeline mode integration. |
2315
+ | executor | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2316
+ | finishing-a-development-branch | finishing-a-development-branch | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers process (5 steps: verify tests, determine base branch, present 4 options, execute choice, cleanup worktree) preserved with identical structure and identical option semantics. Wazir adds Command Routing and Codebase Exploration preambles. Minor cosmetic changes: `<N>` removed from failure template, `<base-branch>` shortened to `<base>`, emoji checkmarks replaced with Y/-, `<commit-list>` changed to `<count>`, PR body simplified. Red Flags and Integration sections trimmed but no behavioral contradiction. | Low risk. The superpowers version has more detailed Red Flags and Integration sections not carried forward. |
2317
+ | humanize | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2318
+ | init-pipeline | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2319
+ | prepare-next | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2320
+ | receiving-code-review | receiving-code-review | **Own** | Structurally rewritten. Superpowers has extensive sections: Forbidden Responses, Source-Specific Handling, YAGNI Check, Implementation Order, When To Push Back, Acknowledging Correct Feedback (with detailed anti-patterns for gratitude), Gracefully Correcting Pushback, Common Mistakes table, Real Examples, and GitHub Thread Replies. Wazir preserves the core Response Pattern and Forbidden Responses but: (1) adds Loop Tracking section (pipeline mode with `wazir capture loop-check` and standalone pass counts), (2) restructures Implementation Order to a 4-tier priority (blocking, functional, quality, nice-to-have) instead of 3-tier, (3) adds a Quick Reference decision table, (4) removes the entire "Acknowledging Correct Feedback" anti-gratitude section, the "Gracefully Correcting Pushback" section, the Common Mistakes table, all Real Examples, the "When To Push Back" enumeration, and the GitHub Thread Replies section. The Loop Tracking addition and structural deletions make this a substantive rewrite. | Delegating would lose loop tracking. The removed anti-gratitude and pushback sections from superpowers are valuable behavioral guardrails worth preserving. |
2321
+ | requesting-code-review | requesting-code-review | **Own** | Structurally rewritten. Both skills share the same When to Request triggers and Example structure. But Wazir: (1) replaces `superpowers:code-reviewer` with `wz:code-reviewer`, (2) adds explicit review loop parameters (`--mode`, depth-aware dimensions, pass number), (3) adds `codex review --uncommitted` and `codex review --base` commands, (4) adds Codex Error Handling section, (5) adds `{REVIEW_MODE}` placeholder, (6) changes Integration section to reference per-task review checkpoints instead of batch review, (7) adds "Dispatch review without explicit `--mode`" to Red Flags. The Codex integration and review loop parameter system are structural additions that change how reviews are dispatched. | Delegating would lose Codex integration and review loop protocol. |
2322
+ | reviewer | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2323
+ | run-audit | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2324
+ | scan-project | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2325
+ | self-audit | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2326
+ | subagent-driven-development | subagent-driven-development | **Own** | Structurally rewritten. Both share the same high-level process (fresh subagent per task, two-stage review, spec then quality). But Wazir: (1) adds `Capture PRE_TASK_SHA` step to the process flowchart for diff scoping, (2) adds Code Review Scoping section (`codex review --base <pre-task-sha>`), (3) adds Review Loop Alignment section (explicit `--mode task-review`, task-scoped log filenames, loop cap via `wazir capture loop-check`), (4) adds Codex Error Handling section, (5) adds standalone mode fallback, (6) changes all skill references from `superpowers:` to `wz:`, (7) adds "Review the wrong diff" to Red Flags, (8) removes the Example Workflow, Advantages detail, and Cost breakdown from superpowers. The diff-scoping and review-loop integration are structural process changes. | Delegating would lose diff-scoped reviews and Codex integration. The removed Example Workflow from superpowers is a useful teaching tool. |
2327
+ | tdd | test-driven-development | **Own** | Structurally rewritten. Superpowers has an exhaustive treatment (~370 lines): detailed Red-Green-Refactor with Good/Bad code examples, Iron Law with explicit "delete and start over" rules, a Verification Checklist, extensive Why Order Matters section, Common Rationalizations table, When Stuck guide, Testing Anti-Patterns reference, and Debugging Integration. Wazir condenses to ~45 lines with 3 steps (RED, GREEN, REFACTOR), adds a single-pass test quality check in RED phase ("Are these tests testing the right behavior? Are they real assertions?"), and removes all examples, rationalization tables, and elaboration. Different description and name (`wz:tdd` vs `test-driven-development`). | Delegating would lose the test quality check. The superpowers version's extensive rationalization prevention and examples are valuable for discipline enforcement but costly in tokens. |
2328
+ | using-git-worktrees | using-git-worktrees | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers core process (directory selection priority, safety verification with `git check-ignore`, creation steps, project setup auto-detection, clean baseline verification) preserved structurally intact. Wazir adds: Command Routing preamble, Codebase Exploration preamble, global directory changed from `~/.config/superpowers/worktrees/` to `~/.wazir/worktrees/`, Cleanup and Common Issues sections (submodules, lock files, stale worktrees). Drops superpowers-only sections: Example Workflow, Quick Reference table, Common Mistakes, Red Flags, Integration. | Dropped superpowers sections (Quick Reference, Common Mistakes, Red Flags, Integration) reduce operational guardrails. Could be recovered into the Own skill. |
2329
+ | using-skills | using-superpowers | **Own** | Structurally rewritten. Both enforce the same core rule (invoke skills before any response, even at 1% chance). But Wazir: (1) renames from `using-superpowers` to `using-skills`, (2) changes all internal skill references from `superpowers:` to `wz:` throughout flowchart and examples, (3) removes the Skill Types section detail about "Rigid vs Flexible" elaboration, (4) removes User Instructions elaboration. The name change and systematic `wz:` prefix replacement throughout the flowchart make this a namespace-level rewrite. | Could potentially be Augment if namespace mapping were handled at a routing layer rather than in-skill. |
2330
+ | verification | verification-before-completion | **Own** | Structurally rewritten. Superpowers has an exhaustive treatment (~140 lines): Iron Law, Gate Function (5-step IDENTIFY/RUN/READ/VERIFY/CLAIM), Common Failures table, Red Flags list, Rationalization Prevention table, Key Patterns (tests, regression, build, requirements, agent delegation), Why This Matters section with 24 failure memories, and When To Apply section. Wazir condenses to ~35 lines with 3 bullet requirements (what was verified, exact command, actual result), a minimum rule, and a brief "when verification fails" section. Different name (`wz:verification` vs `verification-before-completion`). | Delegating would lose the concise Wazir format. The superpowers version's extensive rationalization prevention is valuable for discipline but token-expensive. The Wazir version may be too terse to enforce the discipline effectively. |
2331
+ | wazir | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
2332
+ | writing-plans | writing-plans | **Own** | Structurally rewritten. Superpowers focuses on plan document format (header template, task structure with bite-sized steps, code examples in plan, execution handoff to subagent-driven or parallel session). Wazir: (1) changes inputs to "approved design or approved clarified direction" instead of "spec or requirements", (2) adds pipeline-aware output paths (`.wazir/runs/latest/clarified/execution-plan.md` and `.wazir/runs/latest/tasks/task-NNN/spec.md` vs. standalone `docs/plans/`), (3) removes the plan document format template entirely (no header template, no task structure template, no code examples), (4) adds Plan Review Loop section with `wz:reviewer --mode plan-review`, Codex integration via stdin pipe, Codex error handling, depth-aware pass counts, and standalone fallback. The plan review loop and pipeline path system are structural additions; the removal of the format template is a structural deletion. | Delegating would lose pipeline integration and plan review loop. The removed format template from superpowers is valuable for plan quality and could be worth recovering. |
2333
+ | writing-skills | writing-skills | **Own** | Structurally rewritten. Both share the TDD-for-skills philosophy and RED-GREEN-REFACTOR mapping. But Wazir: (1) condenses from ~650 lines to ~170 lines, (2) removes the extensive SKILL.md Structure template, CSO (Claude Search Optimization) section, Flowchart Usage guidelines, Code Examples guidelines, Token Efficiency section, File Organization examples, Testing All Skill Types section (discipline/technique/pattern/reference), Common Rationalizations for Skipping Testing table, Bulletproofing Skills Against Rationalization section (with Cialdini psychology reference), Skill Creation Checklist, Discovery Workflow, Anti-Patterns section, and STOP deployment gate, (3) adds "Be Prescriptive, Not Descriptive" guidance, "Use Rationalization Prevention" example, "Include Decision Trees" guidance, and skill reference syntax. The massive content reduction and different teaching approach make this a structural rewrite. | Delegating would lose the concise prescriptive format. The superpowers version's CSO guidelines, testing methodology, and anti-pattern catalog are extremely valuable reference material. |
2334
+
2335
+ ---
2336
+
2337
+ ## Superpowers Skills with No Wazir Counterpart
2338
+
2339
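+ This section lists superpowers skills with no Wazir fork, which could be used as-is via the superpowers plugin. None currently qualify; the single row below records a renamed skill, not a missing one.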
2340
+
2341
+ | Superpowers Skill | Status | Notes |
2342
+ |---|---|---|
2343
+ | using-superpowers | Replaced by `wz:using-skills` | See using-skills row above. |
2344
+
2345
+ All 14 superpowers skills have a Wazir counterpart (using-superpowers maps to using-skills, systematic-debugging maps to debugging, test-driven-development maps to tdd, verification-before-completion maps to verification).
2346
+
2347
+ ---
2348
+
2349
+ ## Summary by Tier
2350
+
2351
+ | Tier | Count | Skills |
2352
+ |---|---|---|
2353
+ | **Own** | 25 | brainstorming, clarifier, debugging, design, dispatching-parallel-agents, executing-plans, executor, finishing-a-development-branch, humanize, init-pipeline, prepare-next, receiving-code-review, requesting-code-review, reviewer, run-audit, scan-project, self-audit, subagent-driven-development, tdd, using-git-worktrees, using-skills, verification, wazir, writing-plans, writing-skills |
2354
+ | **Augment** | 0 | _(none -- tier not implementable, see [Augment Mechanism](#augment-mechanism))_ |
2355
+ | **Delegate** | 0 | _(none)_ |
2356
+
2357
+ ---
2358
+
2359
+ ## Common Wazir Additions (Appear in All Forked Skills)
2360
+
2361
+ Every Wazir fork of a superpowers skill adds these two preamble sections:
2362
+
2363
+ 1. **Command Routing** -- routes large commands to context-mode tools and small commands to native Bash, following `hooks/routing-matrix.json`.
2364
+ 2. **Codebase Exploration** -- prescribes symbol-first exploration via `wazir index search-symbols` and `wazir recall`, with fallback to direct file reads.
2365
+
2366
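+ Absent any other structural change, these preambles alone would have justified the **Augment** tier. Because that tier is not implementable (see [Augment Mechanism](#augment-mechanism)), such skills are classified as **Own** instead.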
2367
+
2368
+ ---
2369
+
2370
+ ## Augment Mechanism
2371
+
2372
+ **Research date:** 2026-03-19 (R2: Composition Infrastructure Validation)
2373
+
2374
+ ### Finding: Augment tier is not implementable
2375
+
2376
+ The Augment tier assumed that placing a Wazir addendum at `~/.claude/skills/<skill-name>/SKILL.md` would layer Wazir context on top of the superpowers base skill. This assumption is wrong. **Skill shadowing is full-override, not merge/append.**
2377
+
2378
+ ### Evidence
2379
+
2380
+ **1. `skills-core.js` `resolveSkillPath()` (superpowers v4.3.1)**
2381
+
2382
+ The function at `lib/skills-core.js:108-140` checks personal skills directory first. If `~/.claude/skills/<name>/SKILL.md` exists, it returns that file immediately and never reads the superpowers version. There is no content merging.
2383
+
2384
+ ```
2385
+ // Try personal skills first (unless explicitly superpowers:)
2386
+ if (!forceSuperpowers && personalDir) {
2387
+ const personalSkillFile = path.join(personalDir, actualSkillName, 'SKILL.md');
2388
+ if (fs.existsSync(personalSkillFile)) {
2389
+ return { skillFile: personalSkillFile, sourceType: 'personal', ... };
2390
+ // ^^^ returns here -- superpowers version never consulted
2391
+ }
2392
+ }
2393
+ ```
2394
+
2395
+ **2. Superpowers test suite confirms override behavior**
2396
+
2397
+ `tests/opencode/test-skills-core.sh` line 336 asserts:
2398
+ ```
2399
+ [PASS] Personal skills shadow superpowers skills
2400
+ ```
2401
+
2402
+ The test creates `personal-skills/shared-skill/SKILL.md` and `superpowers-skills/shared-skill/SKILL.md`, resolves `shared-skill`, and verifies `sourceType` is `"personal"` -- the superpowers version is invisible.
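
The override behavior described above can be reproduced with a minimal sketch. This is not the superpowers implementation: `resolveSkill` and `writeSkill` are hypothetical helpers, and the code models only the "personal first, return immediately" rule, using the same directory names as the test.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical, simplified resolver (an assumption, not superpowers code).
// It models only the rule: check the personal directory first and return at once.
function resolveSkill(name, personalDir, superpowersDir) {
  const personalFile = path.join(personalDir, name, 'SKILL.md');
  if (fs.existsSync(personalFile)) {
    // Full override: the superpowers copy is never read or merged.
    return { skillFile: personalFile, sourceType: 'personal' };
  }
  const superpowersFile = path.join(superpowersDir, name, 'SKILL.md');
  if (fs.existsSync(superpowersFile)) {
    return { skillFile: superpowersFile, sourceType: 'superpowers' };
  }
  return null;
}

// Hypothetical helper that writes <rootDir>/<name>/SKILL.md for the demo.
function writeSkill(rootDir, name, body) {
  const dir = path.join(rootDir, name);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'SKILL.md'), body);
}

// Build a throwaway fixture mirroring the layout the test describes.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-shadow-'));
const personalDir = path.join(tmp, 'personal-skills');
const superpowersDir = path.join(tmp, 'superpowers-skills');
writeSkill(personalDir, 'shared-skill', 'Wazir addendum only\n');
writeSkill(superpowersDir, 'shared-skill', 'Superpowers base content\n');
writeSkill(superpowersDir, 'base-only', 'Superpowers base content\n');

const shadowed = resolveSkill('shared-skill', personalDir, superpowersDir);
const loaded = fs.readFileSync(shadowed.skillFile, 'utf8');

console.log(shadowed.sourceType); // which copy won
console.log(loaded.includes('Superpowers base content')); // whether base content survived
```

Under these assumptions, an addendum-only personal file replaces the base skill wholesale, which is why an Augment-style addendum would lose the superpowers content rather than extend it.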
2403
+
2404
+ **3. Superpowers RELEASE-NOTES.md v3.3.0**
2405
+
2406
+ Line 385 documents the behavior explicitly: "Personal skills override superpowers skills when names match."
2407
+
2408
+ **4. The `superpowers:` prefix bypass is not available in Claude Code**
2409
+
2410
+ `skills-core.js` supports `superpowers:skill-name` syntax to force resolution to the superpowers version even when a personal skill shadows it. However, `skills-core.js` is only used by the OpenCode plugin (`/.opencode/plugins/superpowers.js`). Claude Code's native `Skill` tool has its own built-in resolution logic that does not expose this prefix bypass.
2411
+
2412
+ ### Alternatives Considered
2413
+
2414
+ | Approach | Viable? | Why |
2415
+ |---|---|---|
2416
+ | Place addendum in `~/.claude/skills/<name>/` | No | Full override -- base skill content lost |
2417
+ | Merge base + addendum in SKILL.md at install time | Partial | Would work but creates a maintenance coupling: every superpowers update requires re-merging. This is functionally identical to Own tier. |
2418
+ | Inject Wazir context via CLAUDE.md | No | CLAUDE.md is project-scoped; skill behavior should be global across all projects |
2419
+ | Use `superpowers:` prefix to load base, then append | No | Prefix only works in OpenCode's `skills-core.js`, not in Claude Code's native Skill tool |
2420
+ | Propose upstream merge/append feature | Future | Would require a superpowers or Claude Code platform change |
2421
+
2422
+ ### Conclusion
2423
+
2424
+ The Augment tier is architecturally impossible with the current skill discovery mechanism. All three former Augment skills (dispatching-parallel-agents, finishing-a-development-branch, using-git-worktrees) are reclassified to **Own** tier. Since the Wazir versions already carry the full superpowers base content plus Wazir additions, no content is lost -- the skills simply cannot delegate to a shared base.
2425
+
2426
+ If superpowers or Claude Code introduces a composition/layering mechanism in the future (e.g., `extends: superpowers:dispatching-parallel-agents` in frontmatter), the Augment tier could be revisited.
2427
+
2428
+ ---
2429
+
2430
+ ## Observations
2431
+
2432
+ 1. **No Delegate candidates exist.** Every Wazir fork adds at minimum the Command Routing and Codebase Exploration preambles, which prevents pure delegation.
2433
+
2434
+ 2. **Augment tier is not implementable.** R2 validation (2026-03-19) found that skill shadowing in both superpowers `skills-core.js` and Claude Code's native Skill tool is full-override: placing a SKILL.md in `~/.claude/skills/<name>/` completely replaces the superpowers skill with the same name. There is no merge or append mechanism. The three former Augment candidates (dispatching-parallel-agents, finishing-a-development-branch, using-git-worktrees) have been reclassified to Own. See [Augment Mechanism](#augment-mechanism) for full analysis.
2435
+
2436
+ 3. **All 14 forked skills are Own** because either (a) they introduce structural process changes (review loops, pipeline mode, Codex integration, Agent Teams, content restructuring) or (b) the Augment composition mechanism does not exist in the platform.
2437
+
2438
+ 4. **Token cost tradeoff is significant.** Several Wazir Own skills (tdd, verification, debugging, writing-skills) are dramatically shorter than their superpowers counterparts. The superpowers versions contain valuable rationalization prevention tables, detailed examples, and anti-pattern catalogs that enforce discipline. The Wazir versions trade this for token efficiency. This tradeoff should be revisited -- some of the removed discipline content may be worth recovering as separate reference files.
2439
+
2440
+ 5. **The `wz:` prefix is already applied** in skill names within the Wazir SKILL.md frontmatter for all forked skills, consistent with the Own tier convention.
2441
+
2442
+
1620
2443
  ---
1621
2444
  ## Source: docs/reference/skills.md
1622
2445
 
@@ -1654,6 +2477,7 @@ These skills remain on the active surface:
1654
2477
  - Skills must not instruct users to run background services or wrapper scripts that are not part of the canonical workflow surface.
1655
2478
  - When a skill becomes contradictory to the current operating model, remove it from `skills/`.
1656
2479
 
2480
+
1657
2481
  ---
1658
2482
  ## Source: docs/reference/templates.md
1659
2483
 
@@ -1687,6 +2511,7 @@ Each template requires run metadata, sources, loop number, and approval status w
1687
2511
 
1688
2512
  Schema-backed examples under `templates/examples/` exist to keep schemas, examples, and validation in sync.
1689
2513
 
2514
+
1690
2515
  ---
1691
2516
  ## Source: docs/reference/tooling-cli.md
1692
2517
 
@@ -1707,6 +2532,7 @@ The `wazir` CLI is minimal on purpose. It exists to validate and export the host
1707
2532
  | `wazir validate commits` | implemented | Validates conventional commit format for commits in the range `--base..--head` (or auto-detected base to HEAD). |
1708
2533
  | `wazir validate changelog` | implemented | Validates `CHANGELOG.md` structure; with `--require-entries` and `--base`, enforces new entries since the base. |
1709
2534
  | `wazir validate docs-drift` | implemented | Detects when source files (roles, workflows, skills, hooks) change without corresponding documentation updates. Advisory by default; `--strict` exits non-zero on drift. |
2535
+ | `wazir validate skills` | implemented | Validates skill frontmatter and checks for name conflicts with superpowers skills (requires `wz:` prefix). Rejects any `CONTEXT.md` files (augment tier concluded not implementable in R2). |
1710
2536
  | `wazir validate artifacts` | reserved | Exits `2` until artifact-template and example validation expands. |
1711
2537
  | `wazir export build` | implemented | Generates host packages under `exports/hosts/*` from canonical sources. |
1712
2538
  | `wazir export --check` | implemented | Verifies generated host packages still match current canonical source hashes. |
@@ -1720,19 +2546,22 @@ The `wazir` CLI is minimal on purpose. It exists to validate and export the host
1720
2546
  | `wazir recall file` | implemented | Returns an exact line-bounded slice from an indexed file. Supports `--tier L0\|L1` for summary recall. |
1721
2547
  | `wazir recall symbol` | implemented | Returns an exact slice for an indexed symbol match. Supports `--tier L0\|L1` for summary recall. |
1722
2548
  | `wazir doctor` | implemented | Validates the active repo surface for manifest, hooks, state-root policy, and host export directory presence. |
1723
- | `wazir status` | implemented | Reads run status directly from `<state-root>/runs/<run-id>/status.json`. |
2549
+ | `wazir status` | implemented | Reads run status directly from `<state-root>/runs/<run-id>/status.json`. Includes a one-line context savings summary when usage data is available. |
2550
+ | `wazir stats` | implemented | Shows token savings statistics for a run, including total queries, estimated tokens saved, bytes avoided, per-tool breakdown, and overall savings ratio. |
1724
2551
  | `wazir capture init` | implemented | Creates a run ledger with `status.json`, `events.ndjson`, and a captures directory under the configured state root. |
1725
2552
  | `wazir capture event` | implemented | Appends a run event and can update phase, status, and loop counts in `status.json`. |
1726
2553
  | `wazir capture route` | implemented | Reserves a run-local capture file path for large tool output. |
1727
2554
  | `wazir capture output` | implemented | Writes captured tool output to a run-local file and records a `post_tool_capture` event. |
1728
2555
  | `wazir capture summary` | implemented | Writes `summary.md` and records the chosen summary or handoff event. |
1729
2556
  | `wazir capture usage` | implemented | Generates a token savings report for a run, showing capture routing statistics and context window savings. |
2557
+ | `wazir capture loop-check` | implemented | Records a loop iteration event and evaluates the loop cap guard. Exits 43 if the phase loop cap is exceeded. Accepts `--task-id` for task-scoped cap tracking. In standalone mode (no status.json), exits 0. |
1730
2558
 
1731
2559
  ## Exit codes
1732
2560
 
1733
2561
  - `0`: requested check passed
1734
2562
  - `1`: invalid input or validation failure
1735
2563
  - `2`: command surface exists but the implementation is intentionally not complete yet
2564
+ - `43`: phase loop cap exceeded (returned by `wazir capture loop-check`)
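
A calling script might branch on these exit codes roughly as follows. This is a sketch only: `interpretExit` and the `action` labels are hypothetical and not part of the `wazir` CLI; only the code-to-meaning pairs come from the list above.

```javascript
// Exit codes and meanings as documented in the list above.
const EXIT_MEANINGS = new Map([
  [0, 'requested check passed'],
  [1, 'invalid input or validation failure'],
  [2, 'command surface exists but the implementation is intentionally not complete yet'],
  [43, 'phase loop cap exceeded'],
]);

// Hypothetical helper: the action names are illustrative assumptions,
// chosen here only to show that 43 is a distinct outcome from a plain failure.
function interpretExit(code) {
  const meaning = EXIT_MEANINGS.get(code) ?? 'undocumented exit code';
  let action;
  if (code === 0) {
    action = 'continue';
  } else if (code === 43) {
    action = 'stop-looping'; // loop cap hit: escalate instead of retrying
  } else if (code === 2) {
    action = 'unsupported'; // reserved command surface
  } else {
    action = 'fail';
  }
  return { code, meaning, action };
}

for (const code of [0, 1, 2, 43]) {
  const { meaning, action } = interpretExit(code);
  console.log(`${code}: ${meaning} -> ${action}`);
}
```

Treating `43` separately from `1` lets a caller tell "the review loop must stop" apart from "the input was invalid". Note that in standalone mode `wazir capture loop-check` is documented to exit `0`, so a caller would see `continue` there.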
1736
2565
 
1737
2566
  ## Root discovery
1738
2567
 
@@ -1785,6 +2614,7 @@ Executable documentation claims are registered in:
1785
2614
 
1786
2615
  `wazir validate docs` uses that file plus active markdown link checks to prevent stale command and path claims from silently drifting.
1787
2616
 
2617
+
1788
2618
  ---
1789
2619
  ## Source: README.md
1790
2620
 
@@ -1796,7 +2626,7 @@ Executable documentation claims are registered in:
1796
2626
  </picture>
1797
2627
  </p>
1798
2628
 
1799
- <h3 align="center">Wazir: engineering with itqan.</h3>
2629
+ <h3 align="center">Engineering with itqan.</h3>
1800
2630
 
1801
2631
  <p align="center">
1802
2632
  <a href="https://github.com/MohamedAbdallah-14/Wazir/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/MohamedAbdallah-14/Wazir/ci.yml?branch=main&label=CI" alt="CI"></a>
@@ -1814,80 +2644,60 @@ Executable documentation claims are registered in:
1814
2644
  <img src="https://img.shields.io/badge/Cursor-supported-FF6B35" alt="Cursor">
1815
2645
  </p>
1816
2646
 
1817
- <!-- Demo GIF: run assets/record-demo.sh to generate assets/demo.gif, then uncomment the img tag below -->
1818
- <!-- <p align="center"><img src="assets/demo.gif" alt="Wazir Demo" width="700"></p> -->
1819
-
1820
- A host-native operating model for AI coding agents. Wazir gives Claude, Codex, Gemini, and Cursor a 14-phase delivery pipeline, 10 canonical roles with enforceable contracts, 3 adversarial review phases with 9 hard approval gates, and 261 curated expertise modules loaded automatically per task. No server. No wrapper. No custom orchestration.
1821
-
1822
- Install once. Your agent works the way your best engineer does.
1823
-
1824
- ---
1825
-
1826
- ## Table of Contents
1827
-
1828
- - [Why Wazir?](#why-wazir)
1829
- - [Quick Start](#quick-start)
1830
- - [The Pipeline](#the-pipeline)
1831
- - [How It Works](#how-it-works)
1832
- - [How Wazir Handles Complex Tasks](#how-wazir-handles-complex-tasks)
1833
- - [Token Savings](#token-savings)
1834
- - [What's Included](#whats-included)
1835
- - [Compared to Other Tools](#compared-to-other-tools)
1836
- - [Install](#install)
1837
- - [Documentation](#documentation)
1838
- - [Project Status](#project-status)
1839
- - [Acknowledgments](#acknowledgments)
1840
- - [Contributing](#contributing)
1841
- - [License](#license)
1842
2647
 
1843
2648
  ---
1844
2649
 
1845
- ## Why Wazir?
1846
-
1847
- AI coding agents fail the same five ways. Every time.
1848
-
1849
- **Ambiguous specs become wrong code.** The clarifier role escalates unresolved ambiguity instead of guessing. No spec ships until material questions get answers. Escalation is a required output, not an option.
1850
-
1851
- **Output quality varies randomly.** The reviewer role is never the phase author. Adversarial review runs at three chokepoints -- spec-challenge, design-review, and final review -- always by a different model or model family. Nine hard approval gates block advancement until artifacts pass.
2650
+ > AI agents don't have a quality problem. They have a management problem.
1852
2651
 
1853
- **Context floods the window.** A 4-layer composition engine assembles only the relevant expertise modules per role per phase from a library of 261 curated modules across 12 domains. Max 15 modules per dispatch, token budget enforced. Three-tier recall (L0/L1/direct read) lets exploration roles load structural summaries instead of full files. Result: 60-80% fewer tokens on exploration-heavy phases. Run `wazir capture usage` to measure it.
2652
+ I'm Mohamed Abdallah. I kept watching AI agents write confident code that broke in production, skip tests, and forget what we agreed on yesterday. So I stopped asking them to be better and built them an engineering department instead.
1854
2653
 
1855
- **Good solutions vanish between sessions.** Proposed learnings start isolated. Only learnings that pass explicit review and scope-tagging get promoted into future runs. Stale or disproven learnings are archived. The system improves per-project without drifting.
1856
-
1857
- **Nothing prevents structural failures.** Seven hook contracts enforce protected paths (exit 42), loop caps (exit 43), and session observability. Hooks are enforcement, not suggestions.
2654
+ **Wazir puts engineering discipline inside AI coding agents.**
2655
+ No wrapper. No server. Just structure -- inside Claude, Codex, Gemini, and Cursor. Built on 300+ research sources distilled into 315 curated expertise modules across 12 domains.
1858
2656
 
1859
2657
  ---
1860
2658
 
1861
2659
  ## Quick Start
1862
2660
 
1863
- **Step 1: Install**
1864
-
1865
2661
  ```bash
1866
2662
  /plugin marketplace add MohamedAbdallah-14/Wazir
1867
2663
  /plugin install wazir
1868
2664
  ```
1869
2665
 
1870
- **Step 2: Initialize**
2666
+ Then tell your agent what to build:
1871
2667
 
1872
- ```bash
1873
- /init-pipeline
2668
+ ```
2669
+ /wazir Build a REST API for managing tasks with authentication
1874
2670
  ```
1875
2671
 
1876
- **Step 3: Build something**
2672
+ That's it. The pipeline takes over -- clarifies your requirements, writes a spec, plans the work, implements with TDD, reviews, and learns for next time. You approve at the gates. Everything else is automatic.
1877
2673
 
1878
- Drop your requirements in the input directory or just tell the agent what you want:
2674
+ You can also control the depth and intent directly:
1879
2675
 
1880
2676
  ```
1881
- /clarifier Build a REST API for managing tasks with authentication
2677
+ /wazir quick fix the login redirect bug
2678
+ /wazir deep design a new onboarding flow
2679
+ /wazir audit security
1882
2680
  ```
1883
2681
 
1884
- That's it. The pipeline takes over -- clarifies your requirements, writes a spec, plans the work, implements with TDD, reviews, and learns for next time. You approve at the gates. Everything else is automatic.
2682
+ ---
2683
+
2684
+ ### The reviewer is never the author.
2685
+
2686
+ When your AI agent reviews its own code, it finds what it expected to find -- nothing. Wazir's adversarial reviewer is a separate agent with different expertise modules. It catches the mistakes your agent is structurally blind to.
2687
+
2688
+ ### Silence isn't confidence -- it's assumptions.
2689
+
2690
+ Your AI agent doesn't ask questions because it's sure. It doesn't ask questions because it's trained to be helpful. Wazir's clarifier forces ambiguity to the surface before a single line is written.

2691
+
2692
+ ### Done means verified, not declared.
2693
+
2694
+ AI agents love to announce they're finished. Wazir doesn't care. Every phase loops until the work and its verification converge. The agent doesn't get to say "done." The process decides.
1885
2695
 
1886
2696
  ---
1887
2697
 
1888
2698
  ## The Pipeline
1889
2699
 
1890
- Every task flows through 14 phases. Three are adversarial review gates that block progress until the reviewer explicitly approves. Rejection loops back to the authoring phase.
2700
+ Every task flows through 15 workflows grouped into 4 phases. Three are adversarial review gates that block progress until the reviewer explicitly approves. Rejection loops back to the authoring phase.
1891
2701
 
1892
2702
  ```mermaid
1893
2703
  graph LR
@@ -1920,6 +2730,8 @@ graph LR
1920
2730
  style P8 fill:#c62828,color:#fff
1921
2731
  ```
1922
2732
 
2733
+
2734
+
1923
2735
  > **GATE** = Approval gate. The phase blocks until the reviewer explicitly approves. Rejection loops back to the authoring phase.
1924
2736
 
1925
2737
  ---
@@ -1930,23 +2742,9 @@ Three concepts.
1930
2742
 
1931
2743
  **1 -- Roles are isolation boundaries, not personas.** Each of the 10 roles has defined inputs, allowed tools, required outputs, escalation rules, and failure conditions. An agent inside a role cannot write to protected paths, cannot skip required outputs, and must escalate when ambiguity conditions are met. The discipline is structural, not instructional. See [Roles & Workflows](docs/concepts/roles-and-workflows.md).
1932
2744
 
1933
- **2 -- Phases are artifact checkpoints, not conversation stages.** Every phase consumes a named artifact from the previous phase and produces a named artifact for the next. Nothing flows through conversation history. A session can end, a new agent can pick up the artifacts, and delivery continues. The handoff is explicit, structured, and schema-validated against 18 JSON schemas. See [Architecture](docs/concepts/architecture.md).
2745
+ **2 -- Phases are artifact checkpoints, not conversation stages.** Every phase consumes a named artifact from the previous phase and produces a named artifact for the next. Nothing flows through conversation history. A session can end, a new agent can pick up the artifacts, and delivery continues. The handoff is explicit, structured, and schema-validated against 19 JSON schemas. See [Architecture](docs/concepts/architecture.md).
1934
2746
 
1935
- **3 -- The composition engine loads the right expert automatically.** A 4-layer system (always, auto, stacks, concerns) decides which of 261 expertise modules load into each role's context. The executor gets modules on how to build. The verifier gets modules on what to detect. The reviewer gets modules on what to flag. All resolved automatically from the task's declared stack and concerns. Max 15 modules per dispatch, token budget enforced.
1936
-
1937
- ---
1938
-
1939
- ## How Wazir Handles Complex Tasks
1940
-
1941
- Large coding tasks fail when agents lose track of quality. Wazir addresses this with three reinforcing mechanisms.
1942
-
1943
- **14-phase pipeline with 9 hard approval gates.** Every task passes through clarify, research, specify, design, plan, execute, verify, review, and learn. Nine transitions have hard blocking conditions. No phase is skipped, no shortcut taken. The pipeline is defined in `workflows/` and enforced by the orchestrator.
1944
-
1945
- **Adversarial review built in.** The reviewer role operates independently from the executor. It starts with structural summaries (L1 recall) to triage, then reads full source for logic errors, security concerns, or ambiguous code. Review criteria come from expertise modules, not guesswork.
1946
-
1947
- **TDD and verification-before-completion.** The executor writes failing tests before implementation (red-green-refactor). The verifier independently runs all tests, checks truth claims, and validates exports. No task completes until the verifier confirms all acceptance criteria pass. This catches regressions that the executor's own testing misses.
1948
-
1949
- The output is code held to the same standard a senior engineering team would enforce.
2747
+ **3 -- The composition engine loads the right expert automatically.** One agent pretending to be an expert in everything is an expert in nothing. A 4-layer system (always, auto, stacks, concerns) decides which of 315 expertise modules load into each role's context. The executor gets modules on how to build. The verifier gets modules on what to detect. The reviewer gets modules on what to flag. All resolved automatically from the task's declared stack and concerns. Max 15 modules per dispatch, token budget enforced.
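
The layering and the 15-module cap can be illustrated with a small sketch. The data shape, the module identifiers, and the choice to fill layers in the order `always`, `auto`, `stacks`, `concerns` are all assumptions made for illustration; the real engine also enforces a token budget, which is not modeled here.

```javascript
// The cap and the four layer names come from the paragraph above.
const MAX_MODULES_PER_DISPATCH = 15;
const LAYER_ORDER = ['always', 'auto', 'stacks', 'concerns'];

// Hypothetical composer: walks the layers in order, skips duplicates,
// and stops once the cap is reached. Precedence order is an assumption.
function composeModules(layers, limit = MAX_MODULES_PER_DISPATCH) {
  const selected = [];
  const seen = new Set();
  for (const layer of LAYER_ORDER) {
    for (const moduleId of layers[layer] ?? []) {
      if (seen.has(moduleId)) continue; // same module requested by two layers
      if (selected.length >= limit) return selected; // cap reached
      seen.add(moduleId);
      selected.push(moduleId);
    }
  }
  return selected;
}

// Hypothetical module identifiers, invented for this example.
const executorLayers = {
  always: ['core/quality'],
  auto: ['roles/executor'],
  stacks: ['stack/node', 'core/quality'],
  concerns: Array.from({ length: 20 }, (_, i) => `concern/${i}`),
};

const composed = composeModules(executorLayers);
console.log(composed.length);
console.log(composed.slice(0, 3));
```

In this sketch, earlier layers win when the cap forces a cut, so broad `concerns` lists are trimmed before role-level or stack-level modules are dropped.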
1950
2748
 
1951
2749
  ---
1952
2750
 
@@ -1954,11 +2752,13 @@ The output is code held to the same standard a senior engineering team would enf
1954
2752
 
1955
2753
  Wazir's tiered recall system loads the minimum context each role needs.
1956
2754
 
1957
- | Tier | Tokens | Content | Used by |
1958
- |------|--------|---------|---------|
1959
- | L0 | ~100 | One-line identifier | learner (inventory scans) |
1960
- | L1 | ~500-2k | Structural summary | clarifier, researcher, planner, reviewer (exploration) |
1961
- | Direct read | Full file | Exact source lines | executor, verifier (implementation) |
2755
+
2756
+ | Tier | Tokens | Content | Used by |
2757
+ | ----------- | --------- | ------------------- | ------------------------------------------------------ |
2758
+ | L0 | ~100 | One-line identifier | learner (inventory scans) |
2759
+ | L1 | ~500-2k | Structural summary | clarifier, researcher, planner, reviewer (exploration) |
2760
+ | Direct read | Full file | Exact source lines | executor, verifier (implementation) |
2761
+
1962
2762
 
1963
2763
  Capture routing redirects large tool output to run-local files. The agent gets a file path (~50 tokens) instead of the full output. Combined with tiered recall, this yields 60-80% token reduction on exploration-heavy phases.
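
The tier table and the capture-routing estimate can be expressed as a short sketch. The role-to-tier mapping mirrors the table above; the fallback to a direct read for unlisted roles, the function names, and the 10,000-token example output are assumptions for illustration, not measurements.

```javascript
// Role-to-tier mapping taken from the table above.
const RECALL_TIER_BY_ROLE = new Map([
  ['learner', 'L0'],
  ['clarifier', 'L1'],
  ['researcher', 'L1'],
  ['planner', 'L1'],
  ['reviewer', 'L1'],
  ['executor', 'direct'],
  ['verifier', 'direct'],
]);

// Hypothetical helper. Falling back to a direct read for roles the table
// does not list is an assumption of this sketch.
function recallTierFor(role) {
  return RECALL_TIER_BY_ROLE.get(role) ?? 'direct';
}

// Fraction of tokens avoided when a captured output is replaced by a
// file path of roughly 50 tokens (the figure quoted above).
function captureSavingsRatio(outputTokens, pathTokens = 50) {
  if (outputTokens <= pathTokens) return 0; // routing would not help
  return 1 - pathTokens / outputTokens;
}

console.log(recallTierFor('reviewer'));
console.log(recallTierFor('executor'));
console.log(captureSavingsRatio(10000)); // hypothetical 10,000-token tool output
```

The per-output ratio is much higher than the 60-80% quoted for whole phases, since a session also contains context that is neither captured nor recalled at a lower tier.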
1964
2764
 
@@ -1987,23 +2787,21 @@ Run `wazir capture usage` at the end of a session to see the savings:
1987
2787
 
1988
2788
  ## What's Included
1989
2789
 
1990
- **10 canonical role contracts.** Clarifier, researcher, specifier, content-author, designer, planner, executor, verifier, reviewer, learner. Each has enforceable inputs, outputs, and escalation rules. The spec-challenge phase adversarially reviews every spec before planning begins. [Roles reference](docs/reference/roles-reference.md)
2790
+ **10 canonical role contracts.** Clarifier, researcher, specifier, content-author, designer, planner, executor, verifier, reviewer, learner. Each has enforceable inputs, outputs, and escalation rules. [Roles reference](docs/reference/roles-reference.md)
1991
2791
 
1992
- **Adversarial review at three chokepoints.** Spec-challenge, plan-review, and final review run by the reviewer role, never the phase author. Three review phases and nine hard approval gates span the 14-phase pipeline. Nothing advances without explicit clearance. [Architecture](docs/concepts/architecture.md)
2792
+ **Adversarial review at three chokepoints.** Spec-challenge, plan-review, and final review run by the reviewer role, never the phase author. Nine hard approval gates span the 15-workflow pipeline. Nothing advances without explicit clearance. [Architecture](docs/concepts/architecture.md)
1993
2793
 
1994
- **261 curated expertise modules across 12 domains.** Loaded selectively per role per phase via a 4-layer composition engine. Max 15 modules per dispatch, token budget enforced. [Expertise index](docs/reference/expertise-index.md)
2794
+ **315 curated expertise modules across 12 domains.** Loaded selectively per role per phase via a 4-layer composition engine. Max 15 modules per dispatch, token budget enforced. Wazir ships with 315. Yours could be next. [Expertise index](docs/reference/expertise-index.md)
1995
2795
 
1996
- **Three-tier recall for token savings.** L0 (~100 tokens), L1 (~500-2k tokens), direct read for full source. Symbol-first exploration searches the index before reading source. Capture routing redirects large tool output to files. Result: 60-80% token reduction on exploration-heavy phases, measured per-session by `wazir capture usage`. [Indexing and Recall](docs/concepts/indexing-and-recall.md)
2796
+ **Three-tier recall for token savings.** L0 (~100 tokens), L1 (~500-2k tokens), direct read for full source. Symbol-first exploration searches the index before reading source. Capture routing redirects large tool output to files. Result: 60-80% token reduction on exploration-heavy phases, measured per-session by `wazir capture usage`. [Indexing and Recall](docs/concepts/indexing-and-recall.md)
1997
2797
 
1998
2798
  **Structured learning.** Proposed learnings require explicit review and scope tagging before promotion. Only learnings whose file patterns overlap the current task get injected into context. The system improves per-project without drifting.
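
The overlap rule for learnings could look something like the following sketch, which uses glob matching from the standard library. The record shape (`text`, `patterns`) and the sample learnings are invented for illustration. The README does not say how Wazir stores or matches them.

```python
from fnmatch import fnmatch


def relevant_learnings(learnings, task_files):
    """Keep only learnings with at least one pattern matching a file in the task."""
    return [
        learning["text"]
        for learning in learnings
        if any(
            fnmatch(path, pattern)
            for pattern in learning["patterns"]
            for path in task_files
        )
    ]


# Hypothetical promoted learnings, each tagged with the file patterns it applies to.
learnings = [
    {"text": "Validate manifest fields before export.", "patterns": ["schemas/*.json"]},
    {"text": "Keep hook scripts free of network calls.", "patterns": ["hooks/*"]},
]
injected = relevant_learnings(learnings, ["schemas/run.json", "README.md"])
print(injected)  # ['Validate manifest fields before export.']
```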
1999
2799
 
2000
- **7 hook contracts for structural guardrails.** These enforce protected path writes (exit 42), loop caps (exit 43), and session observability. [Hooks](docs/reference/hooks.md)
2001
-
2002
- **20 callable skills.** wz:tdd, wz:verification, wz:debugging, wz:scan-project, wz:writing-plans, and 14 more. Each enforces an exact procedure with evidence at each step. [Skills](docs/reference/skills.md)
2800
+ **8 hook contracts for structural guardrails.** These enforce protected path writes (exit 42), loop caps (exit 43), and session observability. [Hooks](docs/reference/hooks.md)
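
The exit codes above suggest guardrails that signal a violation through the process status. The sketch below shows that idea with two small checks. Only the codes 42 and 43 come from the README. The protected prefixes, the loop cap value, and the function names are assumptions, and a real hook would pass the returned code to `sys.exit`.

```python
EXIT_OK = 0
EXIT_PROTECTED_PATH = 42  # exit code stated in the README
EXIT_LOOP_CAP = 43        # exit code stated in the README

PROTECTED_PREFIXES = ("roles/", "schemas/")  # hypothetical protected paths
LOOP_CAP = 5                                 # hypothetical iteration limit


def check_protected_write(path: str) -> int:
    """Return 42 when the target path falls under a protected prefix."""
    if path.startswith(PROTECTED_PREFIXES):
        return EXIT_PROTECTED_PATH
    return EXIT_OK


def check_loop(iterations: int) -> int:
    """Return 43 once a loop has run more times than the cap allows."""
    if iterations > LOOP_CAP:
        return EXIT_LOOP_CAP
    return EXIT_OK


print(check_protected_write("roles/executor.md"))  # 42
print(check_protected_write("src/app.js"))         # 0
print(check_loop(6))                               # 43
```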
2003
2801
 
2004
- **Built-in text humanization.** The `wz:humanize` skill and 7 dedicated expertise modules automatically remove AI vocabulary patterns from generated text. The composition engine loads domain-specific rules per role: code rules for the executor (commit messages, comments), content rules for the content-author (microcopy, glossary), and technical-docs rules for the specifier, planner, reviewer, and learner. A 61-item vocabulary blacklist, 24-pattern sentence taxonomy, and two-pass self-audit checklist keep all output sounding like it was written by a person.
2802
+ **20+ callable skills.** `/wazir` runs the full pipeline. `/wazir audit security` runs a codebase audit. `/wazir prd` generates a product requirements document from completed runs. Plus TDD, verification, debugging, and more -- each enforcing an exact procedure with evidence at every step. [Skills](docs/reference/skills.md)
2005
2803
 
2006
- **Content-author role before design.** This role produces finalized i18n keys, microcopy, glossary entries, state coverage, and accessibility copy before design begins.
2804
+ **Built-in text humanization.** The composition engine loads domain-specific language rules per role: code rules for the executor (commit messages, comments), content rules for the content-author (microcopy, glossary), and technical-docs rules for the specifier, planner, reviewer, and learner. A 61-item vocabulary blacklist, 24-pattern sentence taxonomy, and two-pass self-audit checklist keep all output sounding like it was written by a person.
2007
2805
 
2008
2806
  **Runs on 4 platforms.** `wazir export build` compiles canonical sources into native packages for Claude, Codex, Gemini, and Cursor. SHA-256 drift detection catches stale exports in CI. [Host exports](docs/reference/host-exports.md)
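
Drift detection of this kind usually means hashing the canonical sources and comparing the result with a digest recorded at export time. The sketch below shows that general technique with `hashlib`. The directory layout, the function names, and where the recorded digest lives are assumptions, not a description of what `wazir export build` does internally.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_tree(root: Path) -> str:
    """Hash every file under root (relative path plus contents) in a stable order."""
    hasher = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        hasher.update(path.relative_to(root).as_posix().encode("utf-8"))
        hasher.update(b"\0")  # separator between the path and the file bytes
        hasher.update(path.read_bytes())
    return hasher.hexdigest()


def has_drifted(source_root: Path, recorded_digest: str) -> bool:
    """True when the sources no longer match the digest recorded at export time."""
    return digest_tree(source_root) != recorded_digest


root = Path(tempfile.mkdtemp())
(root / "roles").mkdir()
(root / "roles" / "executor.md").write_text("v1", encoding="utf-8")
recorded = digest_tree(root)          # what an export step might store
fresh = has_drifted(root, recorded)   # sources unchanged since the export
(root / "roles" / "executor.md").write_text("v2", encoding="utf-8")
stale = has_drifted(root, recorded)   # sources edited, export now out of date
print(fresh, stale)  # False True
```

A CI job could fail the build whenever the comparison reports drift, prompting a rebuild of the exports.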
2009
2807
 
@@ -2011,26 +2809,24 @@ Run `wazir capture usage` at the end of a session to see the savings:
2011
2809
 
2012
2810
  ## Compared to Other Tools
2013
2811
 
2014
- The AI coding tool space is fragmenting. Developers bolt together separate plugins for workflow management, specification, memory, output compression, and orchestration. Research shows this approach has a cost: tool selection accuracy drops to 13.6% when models face too many tools (Gan & Sun, 2025), and 20 tools can consume 62% of an 8k context window before the task even begins (PromptForward, 2025).
2812
+ The AI coding tool space is fragmenting. Developers bolt together separate plugins for workflow management, specification, memory, output compression, and orchestration. Not every project needs 15 workflows. For a weekend hack, prompting is fine. For production, you want structure.
2015
2813
 
2016
- Wazir takes a different path: one integrated operating model instead of many independent plugins.
2017
2814
 
2018
- | Dimension | Wazir | [Superpowers](https://github.com/obra/superpowers) | [Spec-Kit](https://github.com/github/spec-kit) | [Micro-Agent](https://github.com/BuilderIO/micro-agent) | [Distill](https://github.com/samuelfaj/distill) | [Claude-Mem](https://github.com/thedotmack/claude-mem) | [OMC](https://github.com/yeachan-heo/oh-my-claudecode) |
2019
- |---|---|---|---|---|---|---|---|
2020
- | **Category** | Engineering OS | Skills framework | Spec toolkit | Code gen agent | Output compressor | Memory plugin | Orchestration layer |
2021
- | **Scope** | Full lifecycle (14 phases) | Dev workflow (~20 skills) | Specify / Plan / Implement | Single-file TDD loop | CLI output compression | Session memory | Multi-agent orchestration |
2022
- | **Enforced roles** | 10 canonical, contractual | None (skills only) | None | None | None | None | 32 agents (behavioral) |
2023
- | **Phase model** | 14 explicit, artifact-gated | 7-step (advisory) | 3-step | 1 (generate/test) | N/A | N/A | 5-step pipeline |
2024
- | **Adversarial review** | 3 gate phases | Code review skill | No | No | No | No | team-verify step |
2025
- | **Context management** | L0/L1 tiered recall | None | None | None | LLM compression | Vector DB (ChromaDB) | Token routing |
2026
- | **Schema validation** | 18 JSON schemas | No | No | No | No | No | No |
2027
- | **Guardrails** | 7 hook contracts | None | None | None | None | 5 hooks (memory) | Agent tracking |
2028
- | **External deps** | None (host-native) | None (prompt-only) | Python CLI | Node.js CLI | Node.js + LLM | ChromaDB, SQLite, Bun | tmux, exp. teams API |
2029
- | **Host support** | Claude, Codex, Gemini, Cursor | Claude, Codex, Gemini, Cursor, OpenCode | Claude, Copilot, Gemini | Any LLM provider | Any LLM | Claude Code only | Claude Code (+ workers) |
2815
+ | Dimension | Wazir | [Superpowers](https://github.com/obra/superpowers) | [Spec-Kit](https://github.com/github/spec-kit) | [Micro-Agent](https://github.com/BuilderIO/micro-agent) | [Distill](https://github.com/samuelfaj/distill) | [Claude-Mem](https://github.com/thedotmack/claude-mem) | [OMC](https://github.com/yeachan-heo/oh-my-claudecode) |
2816
+ | ---------------------- | ----------------------------- | -------------------------------------------------- | ---------------------------------------------- | ------------------------------------------------------- | ----------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ |
2817
+ | **Category** | Engineering OS | Skills framework | Spec toolkit | Code gen agent | Output compressor | Memory plugin | Orchestration layer |
2818
+ | **Scope** | Full lifecycle (15 workflows) | Dev workflow (~20 skills) | Specify / Plan / Implement | Single-file TDD loop | CLI output compression | Session memory | Multi-agent orchestration |
2819
+ | **Enforced roles** | 10 canonical, contractual | None (skills only) | None | None | None | None | 32 agents (behavioral) |
2820
+ | **Phase model** | 15 explicit, artifact-gated | 7-step (advisory) | 3-step | 1 (generate/test) | N/A | N/A | 5-step pipeline |
2821
+ | **Adversarial review** | 3 gate phases | Code review skill | No | No | No | No | team-verify step |
2822
+ | **Context management** | L0/L1 tiered recall | None | None | None | LLM compression | Vector DB (ChromaDB) | Token routing |
2823
+ | **Schema validation** | 19 JSON schemas | No | No | No | No | No | No |
2824
+ | **Guardrails** | 8 hook contracts | None | None | None | None | 5 hooks (memory) | Agent tracking |
2825
+ | **External deps** | None (host-native) | None (prompt-only) | Python CLI | Node.js CLI | Node.js + LLM | ChromaDB, SQLite, Bun | tmux, exp. teams API |
2826
+ | **Host support** | Claude, Codex, Gemini, Cursor | Claude, Codex, Gemini, Cursor, OpenCode | Claude, Copilot, Gemini | Any LLM provider | Any LLM | Claude Code only | Claude Code (+ workers) |
2030
2827
 
2031
- Each of these tools solves a real problem. Wazir's approach is to solve them together -- one system, shared context, structural enforcement -- instead of asking developers to wire separate plugins into a coherent workflow.
2032
2828
 
2033
- **Research sources:** [RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection](https://arxiv.org/abs/2505.03275) (Gan & Sun, 2025). [MCP Overload: Why Your LLM Agent Doesn't Need 20 Tools](https://promptforward.dev/blog/mcp-overload) (PromptForward, 2025). [Less is More: Optimizing Function Calling for LLM Execution](https://arxiv.org/abs/2411.15399) (Paramanayakam et al., 2024). [Tool RAG: The Next Breakthrough in Scalable AI Agents](https://next.redhat.com/2025/11/26/tool-rag-the-next-breakthrough-in-scalable-ai-agents/) (Red Hat, 2025).
2829
+ Each of these tools solves a real problem. Wazir's approach is to solve them together -- one system, shared context, structural enforcement -- instead of asking developers to wire separate plugins into a coherent workflow.
2034
2830
 
2035
2831
  ---
2036
2832
 
@@ -2043,63 +2839,58 @@ Each of these tools solves a real problem. Wazir's approach is to solve them tog
2043
2839
  /plugin install wazir
2044
2840
  ```
2045
2841
 
2046
- The plugin loads skills, roles, and workflows into your Claude sessions. Done.
2842
+ The plugin loads skills, roles, and workflows into your Claude sessions. Then type `/wazir` and go.
2047
2843
 
2048
2844
  **npm / Homebrew:**
2049
2845
 
2050
2846
  ```bash
2051
- npm install -g @wazir-dev/cli # npm
2052
- brew tap MohamedAbdallah-14/Wazir && brew install wazir # Homebrew
2847
+ npm install -g @wazir-dev/cli # npm
2848
+ brew tap MohamedAbdallah-14/homebrew-wazir && brew install wazir # Homebrew
2053
2849
  ```
2054
2850
 
2055
- **Deploy to your project:**
2056
-
2057
- | Host | Command |
2058
- |------|---------|
2059
- | **Claude** | `cp -r exports/hosts/claude/.claude ~/your-project/ && cp exports/hosts/claude/CLAUDE.md ~/your-project/` |
2060
- | **Codex** | `cp exports/hosts/codex/AGENTS.md ~/your-project/` |
2061
- | **Gemini** | `cp exports/hosts/gemini/GEMINI.md ~/your-project/` |
2062
- | **Cursor** | `cp -r exports/hosts/cursor/.cursor ~/your-project/` |
2063
-
2064
- > npm/Homebrew users: clone the source and run `npx wazir export build` to generate host exports. See [Installation Guide](docs/getting-started/01-installation.md) for the full path.
2065
-
2066
2851
  ---
2067
2852
 
2068
2853
  ## Documentation
2069
2854
 
2070
2855
  **For users:**
2071
2856
 
2072
- | I want to... | Go to |
2073
- |---|---|
2074
- | Install and get started | [Installation](docs/getting-started/01-installation.md) |
2075
- | Run my first task | [First Run](docs/getting-started/02-first-run.md) |
2076
- | Understand the architecture | [Architecture](docs/concepts/architecture.md) |
2857
+
2858
+ | I want to... | Go to |
2859
+ | ------------------------------- | --------------------------------------------------------- |
2860
+ | Install and get started | [Installation](docs/getting-started/01-installation.md) |
2861
+ | Run my first task | [First Run](docs/getting-started/02-first-run.md) |
2862
+ | Understand the architecture | [Architecture](docs/concepts/architecture.md) |
2077
2863
  | Learn about roles and workflows | [Roles & Workflows](docs/concepts/roles-and-workflows.md) |
2078
2864
 
2865
+
2079
2866
  **For contributors:**
2080
2867
 
2081
- | I want to... | Go to |
2082
- |---|---|
2083
- | Set up for development | [CONTRIBUTING.md](CONTRIBUTING.md) |
2084
- | Look up CLI commands | [CLI Reference](docs/reference/tooling-cli.md) |
2085
- | Configure the manifest | [Configuration Reference](docs/reference/configuration-reference.md) |
2086
- | Browse all documentation | [Documentation Hub](docs/README.md) |
2868
+
2869
+ | I want to... | Go to |
2870
+ | ------------------------ | -------------------------------------------------------------------- |
2871
+ | Set up for development | [CONTRIBUTING.md](CONTRIBUTING.md) |
2872
+ | Look up CLI commands | [CLI Reference](docs/reference/tooling-cli.md) |
2873
+ | Configure the manifest | [Configuration Reference](docs/reference/configuration-reference.md) |
2874
+ | Browse all documentation | [Documentation Hub](docs/README.md) |
2875
+
2087
2876
 
2088
2877
  ---
2089
2878
 
2090
2879
  ## Project Status
2091
2880
 
2092
- Wazir is in active early development (**v0.1.0**, pre-1.0-alpha).
2881
+ Wazir is in active early development (pre-1.0-alpha).
2093
2882
 
2094
2883
  The pipeline, roles, and expertise modules are stable and used in production by the maintainers. The CLI, schemas, and hook contracts work. But this is early software -- APIs may change before 1.0.
2095
2884
 
2096
2885
  What's solid:
2097
- - The 14-phase pipeline and 10 role contracts
2098
- - 261 expertise modules across 12 domains
2886
+
2887
+ - The 4-phase pipeline (15 workflows) and 10 role contracts
2888
+ - 315 expertise modules across 12 domains
2099
2889
  - Host exports for Claude, Codex, Gemini, and Cursor
2100
2890
  - The composition engine and tiered recall system
2101
2891
 
2102
2892
  What may change:
2893
+
2103
2894
  - CLI command surface and flags
2104
2895
  - Schema field names
2105
2896
  - Hook contract signatures
@@ -2109,6 +2900,14 @@ Feedback and contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md).
2109
2900
 
2110
2901
  ---
2111
2902
 
2903
+ ## Why "Wazir"?
2904
+
2905
+ Wazir (وزير) -- the vizier. The operational mastermind who ran empires while the sultan held authority. In shatranj, the Arabic form of chess, the wazir is the piece that later became the queen: the most powerful piece on the board.
2906
+
2907
+ The Arabic word *itqan* (إتقان) means mastery -- doing something so well that nothing remains to improve. This isn't a tagline. It's the test every commit runs against.
2908
+
2909
+ ---
2910
+
2112
2911
  ## Acknowledgments
2113
2912
 
2114
2913
  Wazir builds on ideas and patterns from these projects:
@@ -2120,6 +2919,7 @@ Wazir builds on ideas and patterns from these projects:
2120
2919
  - **[micro-agent](https://github.com/BuilderIO/micro-agent)** by Builder.io -- test-driven code generation patterns
2121
2920
  - **[distill](https://github.com/samuelfaj/distill)** by [@samuelfaj](https://github.com/samuelfaj) -- CLI output compression for token savings
2122
2921
  - **[claude-mem](https://github.com/thedotmack/claude-mem)** by [@thedotmack](https://github.com/thedotmack) -- persistent memory patterns for coding agents
2922
+ - **[ideation](https://github.com/bladnman/ideation_team_skill)** by [@bladnman](https://github.com/bladnman) -- multi-agent structured dialogue patterns
2123
2923
 
2124
2924
  ---
2125
2925
 
@@ -2238,6 +3038,7 @@ Not sure where to start? Open a [Discussion](https://github.com/MohamedAbdallah-
2238
3038
  4. **Merge:** Once approved and all checks pass, a maintainer will merge your PR using a squash merge with a conventional commit message.
2239
3039
  5. **Timeline:** We aim to provide initial review feedback within a few days. If your PR has been open for more than a week without a response, feel free to leave a comment or ping in Discussions.
2240
3040
 
3041
+
2241
3042
  ---
2242
3043
  ## Source: AGENTS.md
2243
3044
 
@@ -2353,3 +3154,5 @@ This project uses Codex as a secondary reviewer. Review artifacts are in `tasks/
2353
3154
  - Use isolated feature branches
2354
3155
  - Reference `wazir.manifest.yaml` for the project manifest and schema
2355
3156
 
3157
+
3158
+ ---