@wazir-dev/cli 1.1.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (124)
  1. package/CHANGELOG.md +73 -4
  2. package/README.md +6 -6
  3. package/docs/concepts/architecture.md +1 -1
  4. package/docs/concepts/roles-and-workflows.md +2 -0
  5. package/docs/concepts/why-wazir.md +59 -0
  6. package/docs/decisions/2026-03-19-deferred-items.md +564 -0
  7. package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
  8. package/docs/readmes/INDEX.md +21 -5
  9. package/docs/readmes/features/expertise/README.md +2 -2
  10. package/docs/readmes/features/exports/README.md +2 -2
  11. package/docs/readmes/features/schemas/README.md +3 -0
  12. package/docs/readmes/features/skills/README.md +17 -0
  13. package/docs/readmes/features/skills/clarifier.md +5 -0
  14. package/docs/readmes/features/skills/claude-cli.md +5 -0
  15. package/docs/readmes/features/skills/codex-cli.md +5 -0
  16. package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
  17. package/docs/readmes/features/skills/executing-plans.md +5 -0
  18. package/docs/readmes/features/skills/executor.md +5 -0
  19. package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
  20. package/docs/readmes/features/skills/gemini-cli.md +5 -0
  21. package/docs/readmes/features/skills/humanize.md +5 -0
  22. package/docs/readmes/features/skills/init-pipeline.md +5 -0
  23. package/docs/readmes/features/skills/receiving-code-review.md +5 -0
  24. package/docs/readmes/features/skills/requesting-code-review.md +5 -0
  25. package/docs/readmes/features/skills/reviewer.md +5 -0
  26. package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
  27. package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
  28. package/docs/readmes/features/skills/wazir.md +5 -0
  29. package/docs/readmes/features/skills/writing-skills.md +5 -0
  30. package/docs/readmes/features/workflows/prepare-next.md +1 -1
  31. package/docs/reference/configuration-reference.md +47 -6
  32. package/docs/reference/launch-checklist.md +4 -4
  33. package/docs/reference/review-loop-pattern.md +117 -8
  34. package/docs/reference/roles-reference.md +1 -0
  35. package/docs/reference/skill-tiers.md +147 -0
  36. package/docs/reference/tooling-cli.md +3 -1
  37. package/docs/truth-claims.yaml +12 -0
  38. package/expertise/antipatterns/process/ai-coding-antipatterns.md +97 -1
  39. package/exports/hosts/claude/.claude/settings.json +9 -0
  40. package/exports/hosts/claude/CLAUDE.md +1 -1
  41. package/exports/hosts/claude/export.manifest.json +4 -2
  42. package/exports/hosts/claude/host-package.json +3 -1
  43. package/exports/hosts/codex/AGENTS.md +1 -1
  44. package/exports/hosts/codex/export.manifest.json +4 -2
  45. package/exports/hosts/codex/host-package.json +3 -1
  46. package/exports/hosts/cursor/.cursor/hooks.json +4 -0
  47. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
  48. package/exports/hosts/cursor/export.manifest.json +4 -2
  49. package/exports/hosts/cursor/host-package.json +3 -1
  50. package/exports/hosts/gemini/GEMINI.md +1 -1
  51. package/exports/hosts/gemini/export.manifest.json +4 -2
  52. package/exports/hosts/gemini/host-package.json +3 -1
  53. package/hooks/context-mode-router +191 -0
  54. package/hooks/definitions/context_mode_router.yaml +19 -0
  55. package/hooks/hooks.json +31 -6
  56. package/hooks/protected-path-write-guard +8 -0
  57. package/hooks/routing-matrix.json +45 -0
  58. package/hooks/session-start +62 -1
  59. package/llms-full.txt +905 -132
  60. package/package.json +2 -3
  61. package/schemas/hook.schema.json +2 -1
  62. package/schemas/phase-report.schema.json +80 -0
  63. package/schemas/usage.schema.json +25 -1
  64. package/schemas/wazir-manifest.schema.json +19 -0
  65. package/skills/brainstorming/SKILL.md +18 -155
  66. package/skills/clarifier/SKILL.md +122 -98
  67. package/skills/claude-cli/SKILL.md +320 -0
  68. package/skills/codex-cli/SKILL.md +260 -0
  69. package/skills/debugging/SKILL.md +13 -0
  70. package/skills/design/SKILL.md +13 -0
  71. package/skills/dispatching-parallel-agents/SKILL.md +13 -0
  72. package/skills/executing-plans/SKILL.md +13 -0
  73. package/skills/executor/SKILL.md +72 -19
  74. package/skills/finishing-a-development-branch/SKILL.md +13 -0
  75. package/skills/gemini-cli/SKILL.md +260 -0
  76. package/skills/humanize/SKILL.md +13 -0
  77. package/skills/init-pipeline/SKILL.md +73 -164
  78. package/skills/prepare-next/SKILL.md +81 -10
  79. package/skills/receiving-code-review/SKILL.md +13 -0
  80. package/skills/requesting-code-review/SKILL.md +13 -0
  81. package/skills/reviewer/SKILL.md +287 -15
  82. package/skills/run-audit/SKILL.md +13 -0
  83. package/skills/scan-project/SKILL.md +13 -0
  84. package/skills/self-audit/SKILL.md +197 -16
  85. package/skills/subagent-driven-development/SKILL.md +13 -0
  86. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
  87. package/skills/subagent-driven-development/implementer-prompt.md +8 -0
  88. package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
  89. package/skills/tdd/SKILL.md +13 -0
  90. package/skills/using-git-worktrees/SKILL.md +13 -0
  91. package/skills/using-skills/SKILL.md +13 -0
  92. package/skills/verification/SKILL.md +13 -0
  93. package/skills/wazir/SKILL.md +194 -377
  94. package/skills/writing-plans/SKILL.md +14 -1
  95. package/skills/writing-skills/SKILL.md +13 -0
  96. package/templates/artifacts/implementation-plan.md +3 -0
  97. package/templates/artifacts/tasks-template.md +133 -0
  98. package/templates/examples/phase-report.example.json +48 -0
  99. package/tooling/src/adapters/composition-engine.js +256 -0
  100. package/tooling/src/adapters/model-router.js +84 -0
  101. package/tooling/src/capture/command.js +24 -1
  102. package/tooling/src/capture/run-config.js +3 -1
  103. package/tooling/src/capture/store.js +24 -0
  104. package/tooling/src/capture/usage.js +106 -0
  105. package/tooling/src/checks/ac-matrix.js +256 -0
  106. package/tooling/src/checks/command-registry.js +12 -0
  107. package/tooling/src/checks/docs-truth.js +1 -1
  108. package/tooling/src/checks/skills.js +111 -0
  109. package/tooling/src/cli.js +9 -0
  110. package/tooling/src/commands/stats.js +161 -0
  111. package/tooling/src/commands/validate.js +5 -1
  112. package/tooling/src/export/compiler.js +33 -37
  113. package/tooling/src/gating/agent.js +145 -0
  114. package/tooling/src/guards/phase-prerequisite-guard.js +127 -0
  115. package/tooling/src/hooks/routing-logic.js +69 -0
  116. package/tooling/src/init/auto-detect.js +260 -0
  117. package/tooling/src/init/command.js +95 -135
  118. package/tooling/src/input/scanner.js +46 -0
  119. package/tooling/src/reports/command.js +103 -0
  120. package/tooling/src/reports/phase-report.js +323 -0
  121. package/tooling/src/state/command.js +160 -0
  122. package/tooling/src/state/db.js +287 -0
  123. package/tooling/src/status/command.js +53 -1
  124. package/wazir.manifest.yaml +26 -14
package/llms-full.txt CHANGED
@@ -1,6 +1,6 @@
  # Wazir — Complete Documentation
 
- > Generated: 2026-03-16T23:53:58Z
+ > Generated: 2026-03-19T16:28:21Z
 
  ---
  ## Source: docs/concepts/architecture.md
@@ -17,7 +17,7 @@ Wazir is a host-native engineering OS kit. The host environment (Claude, Codex,
  | Workflows | Phase entrypoints that sequence roles through delivery |
  | Skills | Reusable procedures (wz:tdd, wz:debugging, wz:verification, wz:brainstorming) |
  | Hooks | Guardrails enforcing protected paths, loop caps, and capture routing |
- | Expertise | 308 curated knowledge modules composed into agent prompts |
+ | Expertise | 268 curated knowledge modules composed into agent prompts |
  | Templates | Artifact templates for phase outputs and handoff |
  | Schemas | Validation schemas for manifest, hooks, artifacts, and exports |
  | Exports | Generated host packages tailored per supported host |
@@ -442,6 +442,8 @@ The canonical workflow sequence is:
  13. **learn** — capture scoped learnings
  14. **prepare-next** — produce a clean handoff for the next run
 
+ Additionally, **run-audit** is a standalone workflow that can be invoked outside the linear pipeline to perform structured codebase audits with source-backed findings.
+
  ## Role routing
 
  The orchestrator dispatches three roles per task: `executor`, `reviewer`, and `verifier`. By default, all three run for every task. The `required_roles` field in a task's YAML frontmatter controls which roles are dispatched, allowing the orchestrator to skip unnecessary roles and save context window budget.
@@ -500,6 +502,69 @@ Do not use terms that describe Wazir as a background service, a web-based contro
  - Use the canonical terms above in all roles, workflows, skills, and documentation.
  - When in doubt, describe what Wazir is, not what it is not.
 
+ ---
+ ## Source: docs/concepts/why-wazir.md
+
+ # Why Wazir
+
+ What makes Wazir the best engineering OS you can add to an AI coding agent.
+
+ ## 1. Measure Twice, Cut Once
+
+ Wazir clarifies before coding. The pipeline forces research, spec hardening, design review, and plan approval before a single line of implementation code is written. Most AI agents jump straight to code and fix mistakes after. Wazir prevents the mistakes.
+
+ ## 2. Deep Research
+
+ Every AI agent knows how to research. Users don't ask them to. Wazir makes research a mandatory phase — the researcher role scans the codebase, fetches external sources, and produces a research brief before clarification begins. The agent starts informed, not guessing.
+
+ ## 3. Clarifier + Task Planning
+
+ A structured clarification pipeline turns vague requests into measurable specs. Spec hardening catches ambiguity, missing constraints, and untestable acceptance criteria before they become bugs. Task planning produces execution-grade task specs — not TODO lists.
+
+ ## 4. Content Author
+
+ A dedicated role for any content need — database seeding, sample content, test fixtures, translations, copy, email templates, notification text. Most AI agents treat content as an afterthought bolted onto code tasks. Wazir gives content its own phase with editorial standards, i18n awareness, and humanization rules.
+
+ ## 5. Self-Audit
+
+ The agent audits its own work in an isolated git worktree. Validates, finds structural issues, fixes what it can, verifies the fixes, and only merges on all-green. 5-loop cycle with convergence detection. Protected-path safety rails prevent the agent from modifying its own identity-defining files. Safe self-improvement.
+
+ ## 6. Composer
+
+ ~300 curated expertise modules across 12 domains. The composition engine assembles task-specific agents by loading the right expertise for each role, stack, and concern. The executor building a Flutter RTL app gets Flutter patterns, RTL layout rules, and mobile antipatterns composed into its context. The reviewer gets the corresponding antipattern catalog. Every dispatched agent is a specialist, not a generalist pretending.
+
+ ## 7. Review Loops
+
+ Multi-pass adversarial review at every pipeline checkpoint — not a single rubber-stamp at the end. Research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions. Findings are resolved before advancing. The reviewer is an adversary, not a cheerleader.
+
+ ## 8. Continuous Learning
+
+ Wazir evolves from its own mistakes. Review findings, audit findings, and user corrections feed into a learning system. Recurring issues become accepted learnings injected into future runs. A drift budget prevents learned behavior from diverging too far from the original design. The agent that builds your 10th feature is better than the one that built your 1st.
+
+ ## 9. Antipatterns
+
+ A first-class antipattern catalog loaded into reviewer context BEFORE domain expertise. Catches AI-specific failure modes: fake completion, unwired abstractions, shallow tests, security theater, architecture drift. The reviewer's first lens is "what could go wrong" — not "does this look right."
+
+ ## 10. Multi-Host
+
+ One canonical source, four host exports. Wazir works on Claude Code, Codex, Gemini, and Cursor from a single `wazir export build`. Roles, workflows, skills, and expertise are written once and compiled into each host's native format. Switch hosts without rewriting your engineering process.
+
+ ## 11. Context Efficiency
+
+ AI agents waste most of their context window on brute-force file reads and verbose command output. Wazir's routing hook auto-routes large commands through context-mode. The index provides symbol-first exploration — query first, read only what's needed. Capture routing redirects large output to files. Result: 60-80% token reduction on exploration-heavy phases. The agent thinks more, reads less.
+
+ ## 12. Verification Before Completion
+
+ No success claims without evidence. The verify phase produces deterministic proof — test results, lint output, type-check results — not "I believe it works." Every completion claim is backed by a command that was actually run and output that was actually checked. Evidence before assertions, always.
+
+ ## 13. Gating Agent
+
+ Autonomous phase transition decisions. After each phase, a gating agent reads the phase report and decides: continue (all gates pass), loop back (specific failures with fix paths), or escalate to human (ambiguous trade-offs, scope changes). Default posture: escalate. The pipeline doesn't blindly advance — it stops when it should stop.
+
+ ## 14. Humanize
+
+ Anti-AI-writing patterns across all text output. A vocabulary blacklist, domain-specific rules, and a self-audit checklist ensure that specs, plans, code comments, commit messages, and documentation read like they were written by a human engineer — not generated by an LLM. Because AI-sounding output erodes trust.
+
  ---
  ## Source: docs/getting-started/01-installation.md
 
@@ -969,15 +1034,56 @@ Out of scope for this manifest check:
 
  Maintainers are responsible for policing those surfaces with the separate docs-truth, runtime-surface, and repository review checks.
 
- ## Workflows vs phases
+ ## Phases vs workflows
+
+ The pipeline has **4 phases** (Init, Clarifier, Executor, Final Review) and **15 workflows** (atomic units within those phases).
+
+ - **Phases** are the top-level pipeline stages. Event capture and tracking use phase names: `init`, `clarifier`, `executor`, `final_review`.
+ - **Workflows** are the canonical callable or review-gated entrypoints that run within phases. Each workflow can be independently enabled/disabled via `workflow_policy` in run-config.
+
+ | Phase | Workflows |
+ |-------|-----------|
+ | Init | (inline — no workflow files) |
+ | Clarifier | clarify, discover, specify, spec_challenge, author, design, design_review, plan, plan_review |
+ | Executor | execute, verify |
+ | Final Review | review, learn, prepare_next |
+
+ `run_audit` is a standalone on-demand workflow, not part of the main pipeline flow.
+
+ Validators and exports should treat manifest-declared workflows as the canonical workflow file roster.
+
+ ## Hook configuration
+
+ ### `hooks/routing-matrix.json`
+
+ The routing matrix defines how the context-mode router classifies commands:
 
- - `phases` are the core lifecycle states of the operating model.
- - `workflows` are the canonical callable or review-gated entrypoints that drive those phases.
+ - `large` array of command prefixes that always route to context-mode (AC-3.1). The `# wazir:passthrough` marker does NOT exempt commands in this category.
+ - `small` array of command prefixes that always pass through without context-mode processing.
+ - `ambiguous_heuristic` — rules for commands that match neither large nor small:
+   - `pipe_detected` — classify piped commands as ambiguous
+   - `redirect_detected` — classify redirected commands as ambiguous
+   - `verbose_binaries` — array of binary names whose output is typically large
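The routing rules above can be read as a small classifier. Below is a minimal JavaScript sketch, not code from the package. Several details are assumptions rather than anything the text specifies: the matrix shape, whole-word prefix matching, treating `verbose_binaries` hits as ambiguous, letting the passthrough marker exempt every command outside `large`, and passing unmatched commands through.

```javascript
// Hypothetical sketch, not the package's hooks/routing-logic.js.
const PASSTHROUGH_MARKER = "# wazir:passthrough";

// Whole-word prefix match, so a prefix of "ls" does not also match "lsof".
function matchesPrefix(cmd, prefixes) {
  return prefixes.some((p) => cmd === p || cmd.startsWith(p + " "));
}

function classifyCommand(rawCommand, matrix) {
  const cmd = rawCommand.trim();
  // `large` always routes to context-mode; the marker cannot exempt it.
  if (matchesPrefix(cmd, matrix.large)) return "large";
  // Assumption: the marker exempts every command outside `large`.
  if (cmd.includes(PASSTHROUGH_MARKER)) return "small";
  if (matchesPrefix(cmd, matrix.small)) return "small";

  const h = matrix.ambiguous_heuristic;
  const binary = cmd.split(/\s+/)[0];
  // A single "|" is a pipe; "||" is a shell OR and is ignored.
  if (h.pipe_detected && /(^|[^|])\|([^|]|$)/.test(cmd)) return "ambiguous";
  if (h.redirect_detected && /[<>]/.test(cmd)) return "ambiguous";
  if (h.verbose_binaries.includes(binary)) return "ambiguous";
  // Assumption: anything unmatched passes through without context-mode.
  return "small";
}
```

The sketch checks `large` before it looks for the marker. That ordering is what keeps `npm test # wazir:passthrough` routed to context-mode, matching the AC-3.1 note.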
 
- They overlap heavily, but they are not identical:
+ ### `config/gating-rules.yaml`
 
- - `spec_challenge`, `plan_review`, and `prepare_next` are workflows that sit between or around the core execution phases.
- - Validators and exports should treat manifest-declared workflows as the canonical workflow file roster.
+ The gating rules file defines conditions for phase transition decisions:
+
+ - `rules.continue` — all conditions must pass for a phase to advance (test failures, lint errors, type errors, drift delta, risk flags, uncertain outcomes)
+ - `rules.loop_back` — any deterministic failure (test failures, lint errors, or type errors) triggers a loop-back with actionable fix descriptions
+ - `rules.escalate` — fallback when neither continue nor loop_back match
+ - `default_verdict` — verdict when the report is empty or missing (defaults to `escalate`)
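One way to read the gating rules as code is to check `continue` first, then `loop_back`, and fall through to `escalate`. This JavaScript sketch is illustrative only. The report field names (`test_failures`, `lint_errors`, `type_errors`, `drift_delta`, `risk_flags`, `uncertain_outcomes`) and the `maxDrift` threshold are assumptions; neither comes from `config/gating-rules.yaml`.

```javascript
// Hypothetical sketch, not the package's tooling/src/gating/agent.js.
// Report field names and the maxDrift threshold are assumptions.
function decideGate(report, { maxDrift = 0, defaultVerdict = "escalate" } = {}) {
  // Empty or missing report: fall back to default_verdict.
  if (!report || Object.keys(report).length === 0) {
    return { verdict: defaultVerdict, fixes: [] };
  }
  const tests = report.test_failures ?? 0;
  const lint = report.lint_errors ?? 0;
  const types = report.type_errors ?? 0;
  const drift = report.drift_delta ?? 0;
  const risks = report.risk_flags ?? [];
  const uncertain = report.uncertain_outcomes ?? [];

  // rules.continue: every condition must hold.
  const allClear = tests === 0 && lint === 0 && types === 0 &&
    drift <= maxDrift && risks.length === 0 && uncertain.length === 0;
  if (allClear) return { verdict: "continue", fixes: [] };

  // rules.loop_back: any deterministic failure, each with a fix description.
  const fixes = [];
  if (tests > 0) fixes.push(`fix ${tests} failing test(s)`);
  if (lint > 0) fixes.push(`fix ${lint} lint error(s)`);
  if (types > 0) fixes.push(`fix ${types} type error(s)`);
  if (fixes.length > 0) return { verdict: "loop_back", fixes };

  // rules.escalate: drift, risk flags, or uncertain outcomes with no deterministic failure.
  return { verdict: "escalate", fixes: [] };
}
```

If a report contains both a failing test and a risk flag, this ordering returns `loop_back`, not `escalate`, because deterministic failures are checked first.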
+
+ ### Composition proof artifacts
+
+ The composition engine (`tooling/src/adapters/composition-engine.js`) writes a proof artifact per dispatch to `.wazir/runs/<id>/artifacts/composition-<role>-<task>.json` containing:
+
+ - `modules_included[]` — `{ path, layer, tokens }` for each loaded module
+ - `modules_dropped[]` — `{ path, layer, tokens, reason }` for each dropped module. Reason values:
+   - `module_cap_exceeded` — module count exceeded the 15-module cap
+   - `token_ceiling_exceeded` — total tokens exceeded the configurable ceiling (default: 50,000)
+ - `total_tokens` — total token count of composed prompt
+ - `prompt_hash` — SHA-256 hash of the composed prompt for audit traceability
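Below is a minimal sketch of how the two drop reasons and the proof fields could be produced. Only the field names, the 15-module cap, and the 50,000-token default come from the text above. The rest is assumed: the function name, the input module shape (`path`, `layer`, `tokens`, `content`), the priority ordering, and the decision to keep trying smaller modules after one exceeds the token ceiling.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch, not the package's composition-engine.js. Assumes the
// modules arrive sorted by priority and already carry a token count.
function composeWithProof(modules, { moduleCap = 15, tokenCeiling = 50_000 } = {}) {
  const included = [];
  const dropped = [];
  const bodies = [];
  let totalTokens = 0;

  for (const m of modules) {
    const entry = { path: m.path, layer: m.layer, tokens: m.tokens };
    if (included.length >= moduleCap) {
      dropped.push({ ...entry, reason: "module_cap_exceeded" });
    } else if (totalTokens + m.tokens > tokenCeiling) {
      // Skip this module but keep trying smaller ones after it.
      dropped.push({ ...entry, reason: "token_ceiling_exceeded" });
    } else {
      included.push(entry);
      bodies.push(m.content);
      totalTokens += m.tokens;
    }
  }

  // Assumption: the prompt is the included bodies joined in order, and
  // total_tokens is the sum of module counts (separators not counted).
  const prompt = bodies.join("\n\n");
  return {
    modules_included: included,
    modules_dropped: dropped,
    total_tokens: totalTokens,
    prompt_hash: crypto.createHash("sha256").update(prompt).digest("hex"),
  };
}
```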
 
  ## Current index parser roster
 
@@ -1273,7 +1379,7 @@ Submit pull requests to these curated lists (one PR per list, follow each repo's
  ### awesome-claude-code
  - **Repo:** `github.com/anthropics/awesome-claude-code` (or the most-starred community fork)
  - **Section:** Tools / Plugins / Extensions
- - **Entry format:** `[Wazir](https://github.com/MohamedAbdallah-14/Wazir) - Host-native engineering OS kit with 10 roles, 14 phases, and 308 expertise modules.`
+ - **Entry format:** `[Wazir](https://github.com/MohamedAbdallah-14/Wazir) - Host-native engineering OS kit with 10 roles, 4 phases (15 workflows), and 268 expertise modules.`
  - **Tips:** Keep the description under 120 characters. Link directly to the repo.
 
  ### awesome-ai-agents
@@ -1303,7 +1409,7 @@ Show HN: Wazir – Engineering OS kit for AI coding agents (Claude, Codex, Gemin
  ### First comment
  Post a comment immediately after submission explaining:
  1. What problem Wazir solves (AI agents lack structured engineering workflows)
- 2. How it works (10 canonical roles, 14-phase pipeline, 308 expertise modules)
+ 2. How it works (10 canonical roles, 14-phase pipeline, 268 expertise modules)
  3. What makes it different (host-native, works across Claude/Codex/Gemini/Cursor)
  4. Quick install: `npx @wazir-dev/cli init`
  5. Invite feedback -- HN readers appreciate genuine requests for input
@@ -1322,7 +1428,7 @@ Post a comment immediately after submission explaining:
  **Title:** "How I Built an Engineering OS for AI Coding Agents"
 
  1. **Hook** -- The problem: AI agents write code but lack engineering discipline.
- 2. **Architecture overview** -- 10 roles, 14 phases, expertise modules, quality gates.
+ 2. **Architecture overview** -- 10 roles, 4 phases (15 workflows), expertise modules, quality gates.
  3. **Code walkthrough** -- Show a real workflow: how a feature moves from requirements through TDD to deployment.
  4. **Host-native approach** -- Explain why one kit works across Claude, Codex, Gemini, and Cursor.
  5. **Results** -- Concrete metrics or before/after comparisons.
@@ -1347,7 +1453,7 @@ Structure as a 5-7 tweet thread:
 
  1. **Hook tweet:** One-liner about the problem + link to repo.
  2. **What it is:** Brief description of Wazir.
- 3. **Architecture:** 10 roles, 14 phases, 308 modules (include a diagram image).
+ 3. **Architecture:** 10 roles, 4 phases (15 workflows), 308 modules (include a diagram image).
  4. **Demo:** Short GIF or screenshot of a workflow in action.
  5. **Multi-host:** Works with Claude, Codex, Gemini, and Cursor.
  6. **Install:** `npx @wazir-dev/cli init`
@@ -1536,6 +1642,548 @@ When no Wazir release tag exists yet:
  - Legacy tags are not considered release boundaries
  - The first release tag will be `v1.0.0` (or `v0.1.0` if pre-stable)
 
+ ---
+ ## Source: docs/reference/review-loop-pattern.md
+
+ # Review Loop Pattern Reference
+
+ Canonical reference for the review loop pattern used across all Wazir pipeline phases. Skills and workflows link to this document rather than embedding loop logic inline.
+
+ ---
+
+ ## Core Principle: Producer-Reviewer Separation
+
+ The producer skill (clarifier, planner, designer, etc.) **emits** an artifact and calls for review. The **reviewer role** owns the review loop. The producer receives findings and resolves them. No role reviews its own output.
+
+ ```
+ Producer emits artifact
+   -> Reviewer runs review loop (N passes, Codex if available)
+   -> Findings returned to producer
+   -> Producer fixes and resubmits
+   -> Loop until all passes exhausted or cap reached
+   -> Escalate to user if cap exceeded
+ ```
+
+ When Codex is available, the reviewer role delegates to `codex review` as a secondary input while maintaining its own independent primary verdict.
+
+ ---
+
+ ## Per-Task Review vs Final Review
+
+ These are two structurally different constructs:
+
+ | | Per-Task Review | Final Review |
+ |---|---|---|
+ | **When** | During execution, after each task | After all execution + verification complete |
+ | **Dimensions** | 5 task-execution dims (correctness, tests, wiring, drift, quality) | 7 scored dims (correctness, completeness, wiring, verification, drift, quality, documentation) |
+ | **Scope** | Single task's uncommitted changes | Entire implementation vs spec/plan |
+ | **Output** | Pass/fix loop, no score | Scored verdict (0-70), PASS/FAIL |
+ | **Workflow** | Inline in execution flow | `workflows/review.md` |
+ | **Skill** | `wz:reviewer` in `task-review` mode | `wz:reviewer` in `final` mode |
+ | **Log filename** | `<phase>-task-<NNN>-review-pass-<N>.md` | `final-review.md` |
+
+ ---
+
+ ## Standalone Mode
+
+ When no `.wazir/runs/latest/` directory exists (standalone skill invocation outside a pipeline run):
+
+ 1. **Review loops still run** -- the review logic is embedded in the skill, not dependent on run state.
+ 2. **Artifact location** -- artifacts live in `docs/plans/`. This is the canonical standalone artifact path.
+ 3. **Review log location** -- review logs go alongside the artifact: `docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md`. No temp dir.
+ 4. **Loop cap is SKIPPED entirely** -- no `wazir capture loop-check` call. The loop runs for exactly `pass_counts[depth]` passes (3/5/7) and stops. No cap guard, no fallback constant.
+ 5. **`wazir capture loop-check`** -- not invoked in standalone mode. The standalone detection happens before the cap guard call.
+
+ Detection logic:
+
+ ```
+ if .wazir/runs/latest/ exists:
+   run_mode = "pipeline"
+   log_dir = .wazir/runs/latest/reviews/
+   cap_guard = wazir capture loop-check (full guard)
+ else:
+   run_mode = "standalone"
+   artifact_dir = docs/plans/
+   log_dir = docs/plans/ (alongside artifact)
+   cap_guard = none (depth pass count is the only limit)
+ ```
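The detection logic above can be combined with the log-path rules from the standalone list and the per-task table. The following JavaScript sketch does that using Node's `fs` and `path`. `detectRunMode` and `reviewLogPath` are hypothetical names. Zero-padding task ids to three digits is an assumption inferred from the `<NNN>` placeholder.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical sketch: the presence of .wazir/runs/latest/ is the only
// signal used to choose between pipeline and standalone mode.
function detectRunMode(root) {
  const latest = path.join(root, ".wazir", "runs", "latest");
  if (fs.existsSync(latest)) {
    return { runMode: "pipeline", logDir: path.join(latest, "reviews"), capGuard: true };
  }
  const plans = path.join(root, "docs", "plans");
  // Standalone: artifacts and logs share docs/plans/, and no cap guard runs.
  return { runMode: "standalone", artifactDir: plans, logDir: plans, capGuard: false };
}

// Review-log path for a 1-based pass number. Zero-padding task ids to three
// digits is an assumption drawn from the <NNN> placeholder.
function reviewLogPath(mode, { phase, pass, taskId, date, topic }) {
  if (mode.runMode === "standalone") {
    return path.join(mode.logDir, `${date}-${topic}-review-pass-${pass}.md`);
  }
  const task = taskId === undefined ? "" : `-task-${String(taskId).padStart(3, "0")}`;
  return path.join(mode.logDir, `${phase}${task}-review-pass-${pass}.md`);
}
```

The caller passes in the date and topic instead of letting the function read the clock. This keeps the function deterministic and easy to test.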
+
+ ---
+
+ ## Review Loop Pseudocode
+
+ ```
+ review_loop(artifact_path, phase, dimensions[], depth, config, options={}):
+
+   # options.mode -- explicit review mode (required)
+   # options.task_id -- task identifier for task-scoped reviews (optional)
+
+   # Standalone detection
+   run_mode = detect_run_mode() # "pipeline" or "standalone"
+
+   # Fixed pass counts -- no extension
+   pass_counts = { quick: 3, standard: 5, deep: 7 }
+   total_passes = pass_counts[depth]
+
+   # Depth-aware dimension subsets (coverage contract)
+   depth_dimensions = {
+     quick: dimensions[0:3], # first 3 dimensions only
+     standard: dimensions[0:5], # first 5
+     deep: dimensions, # all available
+   }
+   active_dims = depth_dimensions[depth]
+
+   codex_available = check_codex() # which codex && codex --version
+
+   for pass_number in 0..total_passes-1:
+
+     # --- Cap guard check (pipeline mode only, before each pass) ---
+     if run_mode == "pipeline":
+       loop_check_args = "--run <run-id> --phase <phase> --loop-count <pass_number+1>"
+       if options.task_id:
+         loop_check_args += " --task-id <task_id>"
+       wazir capture loop-check $loop_check_args
+       # loop-check wraps: event capture + evaluateLoopCapGuard
+       # If loop_cap_guard fires (exit 43), stop immediately:
+       if last_exit_code == 43:
+         log("Loop cap reached for phase: <phase>. Escalating to user.")
+         escalate_to_user(evidence_gathered_so_far)
+         return { pass_count: pass_number, escalated: true }
+     # Standalone mode: no cap guard. Loop runs for total_passes and stops.
+
+     dimension = active_dims[pass_number % len(active_dims)]
+
+     # --- Primary review (reviewer role, not producer) ---
+     # Mode is always explicit -- passed by caller via options.mode
+     findings = self_review(artifact_path, focus=dimension, mode=options.mode)
+
+     # --- Secondary review (Codex, if available) ---
+     if codex_available:
+       codex_exit_code, codex_output = run_codex_review(artifact_path, dimension)
+       if codex_exit_code != 0:
+         # Codex failed -- log error, fall back to self-review for this pass
+         log_error("Codex exited " + codex_exit_code + ": " + codex_output.stderr)
+         mark_pass_codex_unavailable(pass_number)
+         # Do NOT treat Codex failure as clean. Self-review findings stand alone.
+       else:
+         codex_findings = parse(codex_output.stdout)
+         merge(findings, codex_findings, preserve_attribution=true)
+
+     # --- Log the review pass ---
+     if run_mode == "pipeline":
+       if options.task_id:
+         log_path = .wazir/runs/latest/reviews/<phase>-task-<task_id>-review-pass-<N>.md
+       else:
+         log_path = .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
+       log(pass_number+1, dimension, findings) -> log_path
+     else:
+       log_path = docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md
+       log(pass_number+1, dimension, findings) -> log_path
+
+     if findings.has_issues:
+       # --- Fix and re-submit (MANDATORY) ---
+       # The producer MUST fix findings and the reviewer MUST re-review.
+       # "Fix and continue without re-review" is EXPLICITLY PROHIBITED.
+       producer_fix(artifact_path, findings)
+       # Continue to next pass -- the fix will be re-reviewed
+
+   # --- Post-loop: escalation if issues remain ---
+   if remaining.has_issues:
+     # Cap reached with unresolved findings. Present to user:
+     # 1. Approve with known issues (Recommended if non-blocking)
+     # 2. Fix manually and re-run
+     # 3. Abort
+     escalate_to_user(remaining, options=[
+       "approve-with-issues",
+       "fix-manually-and-rerun",
+       "abort"
+     ])
+     # User decides. If approved, log "user-approved-with-issues" in final pass file.
+
+   return { pass_count: total_passes, issues_found, issues_fixed, remaining, attributions }
+ ```
+
+ Key properties of this pseudocode:
+
+ 1. **Fixed pass counts** -- Quick is exactly 3, standard exactly 5, deep exactly 7. No `max_passes = min_passes + 3`. No clean-streak early-exit. No extension.
+ 2. **Task-scoped log filenames** -- `<phase>-task-<NNN>-review-pass-<N>.md` for per-task reviews, preventing log clobbering in parallel mode.
+ 3. **Task-scoped loop cap keys** -- `--task-id` flag on `loop-check` so each task gets its own counter in `phase_loop_counts`.
+ 4. **Explicit review mode** -- `options.mode` is always passed by the caller. No auto-detection.
+ 5. **Codex error handling** -- non-zero exit is logged, pass marked `codex-unavailable`, self-review findings used alone. Never treated as clean.
+ 6. **Standalone mode** -- uses `docs/plans/` for artifacts and logs. No temp dir. No cap guard at all.
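To make the control flow of `review_loop` concrete, here is a hypothetical JavaScript rendering. The side effects are injected as callbacks: `loopCheck` stands in for `wazir capture loop-check`, `review` for the reviewer role, and `fix` for the producer. Codex and logging are left out. The pseudocode never defines `remaining`, so the sketch assumes it means the findings of the final pass, since fixes made on that pass never get a re-review.

```javascript
// Hypothetical JavaScript rendering of review_loop; the callbacks stand in
// for `wazir capture loop-check`, the reviewer role, and the producer.
const PASS_COUNTS = { quick: 3, standard: 5, deep: 7 };
const DIMENSION_LIMITS = { quick: 3, standard: 5, deep: Infinity };
const LOOP_CAP_EXIT = 43;

function reviewLoop({ dimensions, depth, runMode, loopCheck, review, fix }) {
  const totalPasses = PASS_COUNTS[depth];
  if (totalPasses === undefined) throw new Error(`unknown depth: ${depth}`);
  // Depth-aware subset: quick -> first 3, standard -> first 5, deep -> all.
  const activeDims = dimensions.slice(0, DIMENSION_LIMITS[depth]);
  if (activeDims.length === 0) throw new Error("no review dimensions");

  const log = [];
  let remaining = [];
  for (let pass = 0; pass < totalPasses; pass++) {
    // Cap guard runs before every pass, and only in pipeline mode.
    if (runMode === "pipeline" && loopCheck(pass + 1) === LOOP_CAP_EXIT) {
      return { passCount: pass, escalated: true, remaining, log };
    }
    const dimension = activeDims[pass % activeDims.length];
    const findings = review(dimension);
    log.push({ pass: pass + 1, dimension, findings });
    // Fixes are mandatory; the next pass re-reviews the fixed artifact.
    if (findings.length > 0) fix(findings);
    // Interpretation: a fix made on the final pass is never re-reviewed,
    // so the last pass's findings count as remaining, not resolved.
    remaining = findings;
  }
  return { passCount: totalPasses, escalated: remaining.length > 0, remaining, log };
}
```

Note the rotation when there are fewer active dimensions than passes. With two dimensions at `standard` depth, the five passes cycle through them as correctness, tests, correctness, tests, correctness.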
1814
+
1815
+ ---
1816
+
1817
+ ## Codex Error Handling Contract
1818
+
1819
+ ```
1820
+ run_codex_review(artifact_path, dimension):
1821
+ CODEX_MODEL = read_config('.wazir/state/config.json', '.multi_tool.codex.model') or "gpt-5.4"
1822
+
1823
+ if is_code_artifact:
1824
+ cmd = codex review -c model="$CODEX_MODEL" --uncommitted --title "..." "Review for [dimension]..."
1825
+ # or: codex review -c model="$CODEX_MODEL" --base <sha> for committed changes
1826
+ else:
1827
+ cmd = cat <artifact_path> | codex exec -c model="$CODEX_MODEL" "Review this [type] for [dimension]..."
1828
+
1829
+ result = execute(cmd, timeout=120s, capture_stderr=true)
1830
+
1831
+ if result.exit_code != 0:
1832
+ return (result.exit_code, { stderr: result.stderr, stdout: "" })
1833
+ # Caller handles: log error, mark codex-unavailable, use self-review only
1834
+
1835
+ return (0, { stdout: result.stdout, stderr: result.stderr })
1836
+ ```
1837
+
1838
+ Rules:
1839
+
1840
+ - If Codex exits non-zero, log the full stderr.
1841
+ - Mark the pass as `codex-unavailable` in the review log metadata.
1842
+ - Fall back to self-review for that pass only. Do not skip the pass.
1843
+ - Do not retry Codex on the same pass. If Codex fails on pass 2, pass 3 still tries Codex (transient failures recover).
1844
+ - Never treat a Codex failure as a clean review pass.
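The following sketch applies these rules with Node's `child_process.spawnSync`, using the same 120-second timeout as the pseudocode. `secondaryReview`, the injectable `cmd`/`args`, and the `parseFindings` callback are all hypothetical. The format of Codex's output is not described in this document, so parsing is left to the caller.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper applying the contract above. cmd/args are injectable
// so the control flow runs without Codex; parseFindings is caller-supplied.
function secondaryReview(selfFindings, { cmd, args, input, parseFindings, logError = console.error }) {
  const self = selfFindings.map((f) => ({ ...f, source: "self" }));
  const result = spawnSync(cmd, args, { input, encoding: "utf8", timeout: 120_000 });
  // status is null when the process could not start or was killed on
  // timeout; both cases are treated as a failed (non-zero) run.
  const exitCode = result.status ?? 1;
  if (exitCode !== 0) {
    const detail = result.error ? String(result.error) : result.stderr;
    logError(`Codex exited ${exitCode}: ${detail}`);
    // Never treated as clean: the pass keeps its self-review findings only.
    return { findings: self, codexUnavailable: true };
  }
  const codex = parseFindings(result.stdout).map((f) => ({ ...f, source: "codex" }));
  return { findings: [...self, ...codex], codexUnavailable: false };
}
```

The helper does not retry inside a pass. The caller simply invokes it again on the next pass, which matches the "pass 3 still tries Codex" rule.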
+
+ ---
+
+ ## Codex Availability Probe
+
+ Before any Codex call, verify availability once at loop start:
+
+ ```bash
+ which codex >/dev/null 2>&1 && codex --version >/dev/null 2>&1
+ ```
+
+ If the probe fails, set `codex_available = false` for the entire loop. Fall back to self-review only. Never error out.
+
+ Per-invocation failures (Codex available but a single call fails) are handled separately by the error contract above.
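The shell probe translates to Node roughly as shown below. `spawnSync` resolves the binary on `PATH`, so a missing `codex` appears as an `ENOENT` error and no separate `which` step is needed. The helper name and the 10-second timeout are assumptions.

```javascript
const { spawnSync } = require("node:child_process");

// Equivalent of `which codex && codex --version`: a missing binary surfaces
// as result.error (ENOENT), and a broken install as a non-zero status.
function codexAvailable(bin = "codex") {
  const result = spawnSync(bin, ["--version"], { stdio: "ignore", timeout: 10_000 });
  return !result.error && result.status === 0;
}

// Probe once at loop start; the answer holds for every pass in the loop.
const codexAvailableForLoop = codexAvailable();
```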
+
+ ---
+
+ ## Codex Artifact-Scoped Review
+
+ Never use `codex review` for non-code artifacts (specs, plans, designs). Instead, pipe the artifact content via stdin:
+
+ ```bash
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+ cat .wazir/runs/latest/clarified/spec-hardened.md | \
+   codex exec -c model="$CODEX_MODEL" "Review this specification for: [dimension]. Be specific, cite sections. Say CLEAN if no issues." \
+   2>&1 | tee .wazir/runs/latest/reviews/spec-challenge-review-pass-N.md
+ ```
+
+ For code artifacts, use `codex review -c model="$CODEX_MODEL" --uncommitted` (or `--base` for committed changes). See the next section for details.
1875
+
+ ---
+
+ ## Code Review Scoping
+
+ **Rule: review BEFORE commit.**
+
+ For each task during execution:
+
+ 1. Implement the task (changes are uncommitted).
+ 2. Review the uncommitted changes using the **5 task-execution dimensions** (NOT the 7 final-review dimensions):
+    ```bash
+    CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+    CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+    codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
+      "Review against acceptance criteria: <criteria>" \
+      2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
+    ```
+ 3. Fix any findings (still uncommitted).
+ 4. Re-review until all passes are exhausted or the loop cap is reached.
+ 5. **Only after review passes:** commit with conventional commit format.
+
+ **If changes are already committed** (e.g., subagent workflow where the implementer subagent commits before review):
+
+ ```bash
+ # Capture the SHA before the task starts
+ PRE_TASK_SHA=$(git rev-parse HEAD)
+
+ # ... subagent implements and commits ...
+
+ # Review the committed changes against the pre-task baseline
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+ codex review -c model="$CODEX_MODEL" --base "$PRE_TASK_SHA" --title "Task NNN: <summary>" \
+   "Review against acceptance criteria: <criteria>" \
+   2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
+ ```
+
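If you do not know in advance whether the implementer committed, you can choose the scope by comparing `HEAD` with the captured `PRE_TASK_SHA`. This Bash sketch assumes a task commits either all of its work or none of it. A partially committed task would need both scopes. `review_scope` and the `SCOPE_ARGS` array are hypothetical names.

```shell
# Pick the codex review scope for one task.
# Usage: review_scope <pre_task_sha> <current_head_sha>
# Sets the global array SCOPE_ARGS for use as "${SCOPE_ARGS[@]}".
review_scope() {
  local pre_task_sha=$1 head_sha=$2
  if [ "$pre_task_sha" = "$head_sha" ]; then
    # Nothing committed since the task started: review the working tree.
    SCOPE_ARGS=(--uncommitted)
  else
    # The implementer already committed: diff against the pre-task baseline.
    SCOPE_ARGS=(--base "$pre_task_sha")
  fi
}
```

A caller would run `review_scope "$PRE_TASK_SHA" "$(git rev-parse HEAD)"` and then pass `"${SCOPE_ARGS[@]}"` to `codex review` in place of the hard-coded scope flag. Using an array keeps the SHA a single argument without relying on word splitting.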
+ ---
+
+ ## Dimension Sets
+
+ ### Research Dimensions (5)
+
+ 1. **Coverage** -- all briefing topics researched
+ 2. **Source quality** -- authoritative, current sources
+ 3. **Relevance** -- research answers the actual questions
+ 4. **Gaps** -- missing info that blocks later phases
+ 5. **Contradictions** -- conflicting sources identified
+
+ ### Spec/Clarification Dimensions (5)
+
+ 1. **Completeness** -- all requirements covered
+ 2. **Testability** -- each criterion verifiable
+ 3. **Ambiguity** -- no dual-interpretation statements
+ 4. **Assumptions** -- hidden assumptions explicit
+ 5. **Scope creep** -- nothing beyond briefing
+
+ ### Design-Review Dimensions (5)
+
+ Matches canonical `workflows/design-review.md`:
+
+ 1. **Spec coverage** -- does the design address every acceptance criterion with a visual component?
+ 2. **Design-spec consistency** -- does the design introduce anything not in the spec? (scope creep check)
+ 3. **Accessibility** -- color contrast ratios (WCAG 2.1 AA), focus states, touch target sizes (44x44px minimum)
+ 4. **Visual consistency** -- design tokens form a coherent system, dark/light mode alignment
+ 5. **Exported-code fidelity** -- do exported scaffolds match the designs? Mismatches are failures here, not implementation concerns.
+
+ ### Plan Dimensions (7)
+
+ 1. **Completeness** -- all design decisions mapped to tasks
+ 2. **Ordering** -- dependencies correct, parallelizable identified
+ 3. **Atomicity** -- each task fits one session
+ 4. **Testability** -- concrete verification per task
+ 5. **Edge cases** -- error paths covered
+ 6. **Security** -- auth, injection, data exposure
+ 7. **Integration** -- tasks connect end-to-end
+
+ ### Task Execution Dimensions (5)
+
+ Used for per-task review during execution:
+
+ 1. **Correctness** -- code matches spec
+ 2. **Tests** -- real tests, not mocked/faked
+ 3. **Wiring** -- all paths connected
+ 4. **Drift** -- matches task spec
+ 5. **Quality** -- naming, error handling
+
+ ### Final Review Dimensions (7)
+
+ Used for the `workflows/review.md` scored gate:
+
+ 1. **Correctness** -- does the code do what the spec says?
+ 2. **Completeness** -- are all acceptance criteria met?
+ 3. **Wiring** -- are all paths connected end-to-end?
+ 4. **Verification** -- is there evidence (tests, type checks) for each claim?
+ 5. **Drift** -- does the implementation match the approved plan?
+ 6. **Quality** -- code style, naming, error handling, security
+ 7. **Documentation** -- changelog entries, commit messages, comments
+
+ The final review dimensions are the existing 7 from `skills/reviewer/SKILL.md`. `workflows/review.md` is not modified by this pattern.
+
+ ---
+
+ ## Per-Depth Coverage Contract
+
+ | Depth | Research | Spec | Design-Review | Plan | Task Execution | Final Review |
+ |-------|----------|------|---------------|------|----------------|--------------|
+ | Quick | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | always 7 dims, 1 pass |
+ | Standard | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | always 7 dims, 1 pass |
+ | Deep | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-7, 7 passes | dims 1-5, 7 passes | always 7 dims, 1 pass |
+
+ Pass counts are FIXED per depth. Quick = 3 passes, standard = 5 passes, deep = 7 passes. No extension. No early-exit. Final review is always a single scored pass across all 7 dimensions -- it is a gate, not a loop.
+
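The table reduces to two lookups. In this sketch, the phase identifiers (`research`, `spec`, `design-review`, `plan`, `task`, `final`) and the function names are hypothetical labels for the table's columns. The numbers are taken from the table.

```shell
# Passes for a phase at a depth. Final review is a single scored gate.
review_passes() {
  local phase=$1 depth=$2
  if [ "$phase" = final ]; then echo 1; return 0; fi
  case $depth in
    quick)    echo 3 ;;
    standard) echo 5 ;;
    deep)     echo 7 ;;
    *)        echo "unknown depth: $depth" >&2; return 1 ;;
  esac
}

# Upper bound N of "dims 1-N" for a phase at a depth.
max_dimension() {
  local phase=$1 depth=$2
  if [ "$phase" = final ]; then echo 7; return 0; fi   # always all 7
  case $depth in
    quick)    echo 3 ;;
    standard) echo 5 ;;
    deep)
      # Only the plan phase has 7 dimensions to cover at deep depth.
      if [ "$phase" = plan ]; then echo 7; else echo 5; fi ;;
    *)        echo "unknown depth: $depth" >&2; return 1 ;;
  esac
}
```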
+ ---
+
+ ## Loop Cap Configuration
+
+ The `workflow_policy` section of `run-config.yaml` (legacy: `phase_policy`) controls which workflows are enabled and sets an absolute safety ceiling per workflow. Only two fields exist: `enabled` and `loop_cap`. There is no `passes` field -- depth determines pass counts (3/5/7), not workflow policy.
+
+ ```yaml
+ workflow_policy:
+   # Clarifier phase workflows
+   discover: { enabled: true, loop_cap: 10 }
+   clarify: { enabled: true, loop_cap: 10 }
+   specify: { enabled: true, loop_cap: 10 }
+   spec-challenge: { enabled: true, loop_cap: 10 }
+   author: { enabled: false, loop_cap: 10 }
+   design: { enabled: true, loop_cap: 10 }
+   design-review: { enabled: true, loop_cap: 10 }
+   plan: { enabled: true, loop_cap: 10 }
+   plan-review: { enabled: true, loop_cap: 10 }
+   # Executor phase workflows
+   execute: { enabled: true, loop_cap: 10 }
+   verify: { enabled: true, loop_cap: 5 }
+   review: { enabled: true, loop_cap: 10 }
+   learn: { enabled: true, loop_cap: 5 }
+   prepare_next: { enabled: true, loop_cap: 5 }
+   run_audit: { enabled: false, loop_cap: 10 }
+ ```
+
+ **`loop_cap`** is an absolute safety ceiling that prevents runaway loops regardless of depth. It is checked by `wazir capture loop-check` in pipeline mode. It is NOT the same as pass count (which is determined by depth: 3/5/7). Example: depth=deep gives 7 passes, but if `loop_cap: 5`, the cap guard fires at pass 5 and escalates. This is intentional -- the operator can constrain expensive phases.
+
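The following illustrative loop shows how `loop_cap` interacts with the depth-derived pass count. It does not reproduce how `wazir capture loop-check` works internally, since that interface is not documented here. It assumes the cap counts completed passes, so with 7 passes and `loop_cap: 5`, pass 5 runs and the loop then escalates. The function name and the use of exit status 2 for escalation are hypothetical.

```shell
# Illustrative only: run a fixed number of passes under a safety ceiling.
# Usage: run_review_loop <passes> <loop_cap>
# Returns 2 if the cap guard fires before all passes complete.
run_review_loop() {
  local passes=$1 loop_cap=$2 pass
  for ((pass = 1; pass <= passes; pass++)); do
    echo "pass $pass"   # placeholder for one review pass
    if (( pass >= loop_cap && pass < passes )); then
      echo "loop_cap $loop_cap reached at pass $pass of $passes: escalating" >&2
      return 2
    fi
  done
}
```

For example, `run_review_loop 7 5` prints passes 1 through 5 and returns 2. `run_review_loop 5 10` completes all five passes and returns 0.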
+ **Adaptive workflows** (`author`, `run_audit`) default to `enabled: false`. They are activated by explicit operator config or intent detection:
+
+ - `author` has a human approval gate, not an iterative review loop.
+ - `run_audit` is an on-demand standalone audit, not part of the main pipeline flow.
+
+ **Post-run workflows** (`learn`, `prepare_next`) default to `enabled: true`. They run as part of the Final Review phase:
+
+ - `learn` extracts durable learnings from review findings -- recurring findings become accepted learnings.
+ - `prepare_next` prepares context and handoff for the next run.
+
+ ---
+
+ ## Reviewer Mode Table
+
+ The reviewer skill operates in different modes depending on the phase. **Mode is always explicit** -- the caller passes `--mode <mode>`. There is no auto-detection based on artifact availability.
+
+ | Mode | Invoked during | Prerequisites | Dimensions | Output |
+ |------|---------------|---------------|------------|--------|
+ | `final` | After execution + verification | Completed task artifacts in `.wazir/runs/latest/artifacts/` | 7 final-review dims, scored 0-70 | Verdict: PASS/NEEDS FIXES/NEEDS REWORK/FAIL |
+ | `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Findings with severity, no score |
+ | `design-review` | After design approval | Design artifact, approved spec, accessibility guidelines | 5 design-review dims (canonical) | Findings with severity (blocking/advisory) |
+ | `plan-review` | After planning | Draft plan, approved spec, design artifact | 7 plan dims | Findings with severity, no score |
+ | `task-review` | During execution, per task | Uncommitted changes (or committed with known base SHA) | 5 task-execution dims | Pass/fail per task, no score |
+ | `research-review` | During discover | Research artifact | 5 research dims | Findings with severity, no score |
+ | `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Findings with severity, no score |
+
+ If `--mode` is not provided, the reviewer asks the user which review to run. Auto-detection based on artifact availability is NOT used -- it causes ambiguity in resumed/multi-phase runs where stale artifacts from prior phases exist.
+
+ Each caller is responsible for passing the correct mode:
+
+ - Clarifier passes `--mode clarification-review` after Phase 1A
+ - Discover workflow passes `--mode research-review` after research
+ - Specifier flow passes `--mode spec-challenge` after specify
+ - Brainstorming passes `--mode design-review` after user approval
+ - Writing-plans passes `--mode plan-review` after planning
+ - Executor passes `--mode task-review` for each task
+ - `/wazir` runner passes `--mode final` for the final review gate
+
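Because the mode is always explicit, dispatch is a plain lookup with no fallback to artifact detection. In this sketch, the function name, the labels, and the exit statuses are hypothetical. The mode names and dimension counts come from the table.

```shell
# Map an explicit reviewer --mode to its dimension set.
# Deliberately no auto-detection from artifacts on disk.
dimension_set_for_mode() {
  case $1 in
    final)           echo "final-review (7 dims, scored 0-70)" ;;
    spec-challenge|clarification-review)
                     echo "spec/clarification (5 dims)" ;;
    design-review)   echo "design-review (5 dims)" ;;
    plan-review)     echo "plan (7 dims)" ;;
    task-review)     echo "task-execution (5 dims)" ;;
    research-review) echo "research (5 dims)" ;;
    "")
      # No --mode: the reviewer asks the user instead of guessing.
      echo "no --mode given: ask the user which review to run" >&2
      return 2 ;;
    *)
      echo "unknown --mode: $1" >&2
      return 1 ;;
  esac
}
```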
+ ---
+
+ ## Codex Prompt Templates
+
+ All Codex invocations read the model from config with a fallback:
+
+ ```bash
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+ ```
+
+ ### Artifact Review (specs, plans, designs via stdin)
+
+ Use this template with `codex exec` for non-code artifacts piped via stdin:
+
+ ```bash
+ cat <artifact_path> | codex exec -c model="$CODEX_MODEL" \
+ "You are reviewing a [ARTIFACT_TYPE] for the Wazir engineering OS.
+ Focus on [DIMENSION]: [dimension description].
+ Rules: cite specific sections, be actionable, say CLEAN if no issues.
+ Do NOT load or invoke any skills. Do NOT read the codebase.
+ Review ONLY the content provided via stdin."
+ ```
+
+ Replace `[ARTIFACT_TYPE]` with: `specification`, `implementation plan`, `design document`, `research brief`, or `clarification`.
+ Replace `[DIMENSION]` and `[dimension description]` with the current review pass dimension from the relevant dimension set above.
+
+ ### Code Review (diffs via --uncommitted or --base)
+
+ Use this template with `codex review` for code changes:
+
+ ```bash
+ codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
+ "Review the code changes for [DIMENSION]: [dimension description].
+ Check against acceptance criteria: [criteria].
+ Flag: correctness issues, missing tests, unwired paths, drift from spec.
+ Do NOT load or invoke any skills."
+ ```
+
+ For committed changes, replace `--uncommitted` with `--base <sha>`.
+ Replace `[DIMENSION]`, `[dimension description]`, and `[criteria]` with the task-specific values from the execution plan and spec.
+
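A small helper that fills the artifact-review template ensures every pass sends the same wording. This sketch assumes the five artifact types listed above are the only valid ones. `build_artifact_prompt` is a hypothetical name, and the template text is copied from the artifact-review template.

```shell
# Fill the artifact-review prompt template.
# Usage: build_artifact_prompt <artifact_type> <dimension> <description>
build_artifact_prompt() {
  local artifact_type=$1 dimension=$2 description=$3
  case $artifact_type in
    specification|"implementation plan"|"design document"|"research brief"|clarification) ;;
    *) echo "unsupported artifact type: $artifact_type" >&2; return 1 ;;
  esac
  printf '%s\n' \
    "You are reviewing a ${artifact_type} for the Wazir engineering OS." \
    "Focus on ${dimension}: ${description}." \
    "Rules: cite specific sections, be actionable, say CLEAN if no issues." \
    "Do NOT load or invoke any skills. Do NOT read the codebase." \
    "Review ONLY the content provided via stdin."
}
```

For example, `prompt=$(build_artifact_prompt specification Testability "each criterion verifiable")` yields the prompt for the testability pass. You then pass it as the final argument to `codex exec`.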
+ ---
+
+ ## Codex Output Context Protection
+
+ Codex CLI output includes internal traces (file reads, tool calls, reasoning) that are NOT useful for the review — only the final findings matter. To prevent context flooding:
+
+ ### Tee + Extract Pattern
+
+ 1. **Always tee** Codex output to a file:
+    ```bash
+    codex exec ... 2>&1 | tee .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
+    ```
+
+ 2. **Extract findings** after the last `codex` marker using `execute_file`:
+    ```
+    # If context-mode available (has_execute_file: true):
+    mcp__plugin_context-mode_context-mode__execute_file(
+      path: ".wazir/runs/latest/reviews/<phase>-review-pass-<N>.md",
+      language: "shell",
+      code: "tac $FILE | sed '/^codex$/q' | tac | tail -n +2"
+    )
+    ```
+
+ 3. **Present extracted findings only** — the raw trace stays in the file for debugging but never enters the main context window.
+
+ ### Fallback (no context-mode)
+
+ If `context_mode.has_execute_file` is false, extract using shell directly:
+
+ ```bash
+ tac <file> | sed '/^codex$/q' | tac | tail -n +2
+ ```
+
+ This reverses the file, stops at the first `codex` line in the reversed stream (the last marker in the original), reverses back, and drops the marker line.
+
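Without a guard, the pipeline above degrades silently when the marker is missing. `sed` passes every line through, and `tail -n +2` drops only the first line, so the whole trace leaks into context. The sketch below fails closed instead. The helper name is hypothetical. `tac` is part of GNU coreutils; on macOS, `tail -r` is the usual substitute.

```shell
# Print only the findings after the last line that is exactly "codex".
# Fails closed: with no marker, print nothing to stdout and return 1.
extract_codex_findings() {
  local file=$1
  if ! grep -qx 'codex' "$file"; then
    echo "codex marker not found in output, cannot extract findings" >&2
    return 1   # the raw file is left untouched for manual review
  fi
  tac "$file" | sed '/^codex$/q' | tac | tail -n +2
}
```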
+ **If no marker found:** fail closed -- log "codex marker not found in output, cannot extract findings" and present a warning to the user with 0 findings extracted. The raw file is preserved for manual review. Do NOT fall back to `tail` or any best-effort extraction that could leak traces into context.
+
+ ---
+
+ ## Phase Scoring: First vs Final Artifact Comparison
+
+ At the start of each review loop (pass 1), score the artifact on its phase's canonical dimension set (1-10 per dimension). At the end of the loop (final pass), score again using the **same canonical dimensions**. Present the delta in the end-of-phase report.
+
+ ### Canonical Dimension Sets Per Phase
+
+ These are the fixed rubrics — no ad-hoc dimension selection:
+
+ | Phase | Canonical Dimensions |
+ |-------|---------------------|
+ | research-review | Coverage, Source quality, Relevance, Gaps identified, Actionability |
+ | clarification-review / spec-challenge | Completeness, Testability, Ambiguity, Assumptions, Scope creep |
+ | design-review | Spec coverage, Design-spec consistency, Accessibility, Visual consistency, Exported-code fidelity |
+ | plan-review | Completeness, Testability, Task granularity, Dependency correctness, Phase structure, File coverage, Estimation accuracy |
+ | task-review | Correctness, Tests, Wiring, Drift, Quality |
+ | final | Correctness, Completeness, Wiring, Verification, Drift, Quality, Documentation |
+
+ ### Scoring Rules
+
+ 1. Initial and final scores MUST use the **same dimension set** — the delta is only meaningful on the same rubric.
+ 2. The reviewer records which dimension set was used in each pass file.
+ 3. Delta format: `Dimension: X/10 → Y/10 (+Z)`.
+
+ ### Quality Delta Report Section
+
+ The end-of-phase report (see "End-of-Phase Report" below) includes a **Quality Delta** section:
+
+ ```markdown
+ ## Quality Delta
+
+ | Dimension | Initial | Final | Delta |
+ |-----------|---------|-------|-------|
+ | Completeness | 4/10 | 9/10 | +5 |
+ | Testability | 3/10 | 8/10 | +5 |
+ | Ambiguity | 5/10 | 9/10 | +4 |
+ ```
+
+ ---
+
+ ## End-of-Phase Report
+
+ Every phase exit produces a report saved to `.wazir/runs/latest/reviews/<phase>-report.md` containing:
+
+ 1. **Summary** — what the phase produced
+ 2. **Key Changes** — first-version vs final-version highlights (not full diff — what improved)
+ 3. **Quality Delta** — per-dimension before/after scores (see Phase Scoring above)
+ 4. **Findings Log** — per-pass finding counts by severity (e.g., "Pass 1: 6 findings (3 blocking, 2 warning, 1 note). Pass 7: 0 findings. All resolved.")
+ 5. **Usage** — token usage from `wazir capture usage` (runs before report generation)
+ 6. **Context Savings** — context-mode stats if available, omit section if not
+ 7. **Time Spent** — wall-clock elapsed time from phase start to end
+
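The Quality Delta rows and the `Dimension: X/10 → Y/10 (+Z)` line from the scoring rules are mechanical to produce. This sketch assumes integer scores on the 1-10 scale. All function names are hypothetical.

```shell
# Reject anything outside the 1-10 rubric scale.
check_score() {
  case $1 in
    [1-9]|10) return 0 ;;
    *) echo "score out of range (1-10): $1" >&2; return 1 ;;
  esac
}

# Signed difference, with an explicit "+" for non-negative deltas.
signed_delta() {
  local delta=$(( $2 - $1 ))
  if (( delta >= 0 )); then echo "+$delta"; else echo "$delta"; fi
}

# One row of the Quality Delta table.
quality_delta_row() {
  local dimension=$1 initial=$2 final=$3
  check_score "$initial" && check_score "$final" || return 1
  printf '| %s | %s/10 | %s/10 | %s |\n' "$dimension" "$initial" "$final" \
    "$(signed_delta "$initial" "$final")"
}

# The per-dimension delta line: "Dimension: X/10 → Y/10 (+Z)".
quality_delta_line() {
  local dimension=$1 initial=$2 final=$3
  check_score "$initial" && check_score "$final" || return 1
  printf '%s: %s/10 → %s/10 (%s)\n' "$dimension" "$initial" "$final" \
    "$(signed_delta "$initial" "$final")"
}
```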
  ---
  ## Source: docs/reference/roles-reference.md
 
@@ -1576,6 +2224,7 @@ This is the lookup reference for canonical roles, workflows, and their contracts
  | `review` | `verify` | Adversarial quality review |
  | `learn` | `review` | Capture scoped learnings |
  | `prepare-next` | `learn` | Produce clean next-run handoff |
+ | `run-audit` | (standalone) | Structured codebase audit with source-backed findings |
 
  ## Role routing valid values
 
@@ -1617,6 +2266,157 @@ Roles that explore broadly (clarifier, researcher, planner) benefit most from L1
 
  See [Indexing and Recall](../concepts/indexing-and-recall.md) for full details on tiers and commands.
 
+ ---
+ ## Source: docs/reference/skill-tiers.md
+
+ # Skill Tier Classification
+
+ Audit of Wazir skills against Superpowers v4.3.1 skills.
+ Each skill is classified into one of three tiers:
+
+ - **Delegate** -- use superpowers skill as-is, delete Wazir fork
+ - **Augment** -- use superpowers skill + inject Wazir context addendum (strictly additive, no overrides). **NOTE:** R2 validation found this tier is not implementable -- see [Augment Mechanism](#augment-mechanism) below.
+ - **Own** -- Wazir-original or structurally rewritten skill, rename to `wz:` prefix
+
+ ---
+
+ ## Classification Table
+
+ | Wazir Skill | Superpowers Equivalent | Tier | Rationale | Risk Notes |
+ |---|---|---|---|---|
+ | brainstorming | brainstorming | **Own** | Structurally rewritten. Superpowers version is a linear checklist (explore context, ask questions, propose approaches, present design, write doc, invoke writing-plans). Wazir replaces the entire process: adds Command Routing and Codebase Exploration preambles, replaces the design-doc step with a design-review loop (`--mode design-review` with canonical dimensions), outputs to `.wazir/runs/latest/clarified/design.md` instead of `docs/plans/`, and adds a complete Agent Teams multi-agent brainstorming mode (Free Thinker / Grounder / Synthesizer / Arbiter pattern using TeamCreate/SendMessage). None of the superpowers process steps survive intact. | Dropping the Agent Teams mode would lose Wazir's most differentiated brainstorming capability. |
+ | clarifier | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | debugging | systematic-debugging | **Own** | Structurally rewritten. Superpowers has a 4-phase process (Root Cause Investigation with 5 substeps, Pattern Analysis, Hypothesis and Testing, Implementation) totaling ~300 lines with detailed examples, rationalization tables, and supporting technique references. Wazir condenses this to a 4-step observe-hypothesize-test-fix loop (~75 lines), replaces all codebase exploration with Wazir CLI symbol-first exploration (`wazir index search-symbols`, `wazir recall symbol` and `wazir recall file`), adds loop cap awareness (pipeline mode with `wazir capture loop-check` vs. standalone mode), and removes all superpowers examples, rationalization tables, and red-flag lists. The methodology is fundamentally different in structure despite sharing the spirit of "root cause first." | Delegating would lose Wazir CLI integration and loop cap awareness. Superpowers version is far more detailed on anti-patterns and may be worth referencing separately. |
+ | design | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | dispatching-parallel-agents | dispatching-parallel-agents | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers core (When to Use decision tree, The Pattern with 4 steps, Agent Prompt Structure, Common Mistakes section) plus Wazir additions (Command Routing preamble, Codebase Exploration preamble, philosophical paragraph in Overview, Problem/Fix format for Common Mistakes). Drops superpowers-only sections: "When NOT to Use," "Real Example from Session," "Key Benefits," "Verification," "Real-World Impact." | Superpowers informational sections (Real Example, Key Benefits, Verification, Real-World Impact) not carried forward. Low risk -- these are teaching content, not behavioral. |
+ | executing-plans | executing-plans | **Own** | Structurally rewritten. Superpowers uses batch execution (default first 3 tasks) with report-and-wait checkpoints and explicit batch feedback loops. Wazir replaces batching with per-task execution, adds a per-task review loop (`--mode task-review` with 5 task-execution dimensions, Codex integration, review log filenames, loop cap tracking via `wazir capture loop-check`), adds standalone vs. pipeline mode detection, and adds a note recommending wz:subagent-driven-development when subagents are available. The batch-vs-per-task change is a core behavioral difference. All integration references point to `wz:` skills. | Delegating would lose per-task review loops and pipeline mode integration. |
+ | executor | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | finishing-a-development-branch | finishing-a-development-branch | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers process (5 steps: verify tests, determine base branch, present 4 options, execute choice, cleanup worktree) preserved with identical structure and identical option semantics. Wazir adds Command Routing and Codebase Exploration preambles. Minor cosmetic changes: `<N>` removed from failure template, `<base-branch>` shortened to `<base>`, emoji checkmarks replaced with Y/-, `<commit-list>` changed to `<count>`, PR body simplified. Red Flags and Integration sections trimmed but no behavioral contradiction. | Low risk. The superpowers version has more detailed Red Flags and Integration sections not carried forward. |
+ | humanize | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | init-pipeline | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | prepare-next | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | receiving-code-review | receiving-code-review | **Own** | Structurally rewritten. Superpowers has extensive sections: Forbidden Responses, Source-Specific Handling, YAGNI Check, Implementation Order, When To Push Back, Acknowledging Correct Feedback (with detailed anti-patterns for gratitude), Gracefully Correcting Pushback, Common Mistakes table, Real Examples, and GitHub Thread Replies. Wazir preserves the core Response Pattern and Forbidden Responses but: (1) adds Loop Tracking section (pipeline mode with `wazir capture loop-check` and standalone pass counts), (2) restructures Implementation Order to a 4-tier priority (blocking, functional, quality, nice-to-have) instead of 3-tier, (3) adds a Quick Reference decision table, (4) removes the entire "Acknowledging Correct Feedback" anti-gratitude section, the "Gracefully Correcting Pushback" section, the Common Mistakes table, all Real Examples, the "When To Push Back" enumeration, and the GitHub Thread Replies section. The Loop Tracking addition and structural deletions make this a substantive rewrite. | Delegating would lose loop tracking. The removed anti-gratitude and pushback sections from superpowers are valuable behavioral guardrails worth preserving. |
+ | requesting-code-review | requesting-code-review | **Own** | Structurally rewritten. Both skills share the same When to Request triggers and Example structure. But Wazir: (1) replaces `superpowers:code-reviewer` with `wz:code-reviewer`, (2) adds explicit review loop parameters (`--mode`, depth-aware dimensions, pass number), (3) adds `codex review --uncommitted` and `codex review --base` commands, (4) adds Codex Error Handling section, (5) adds `{REVIEW_MODE}` placeholder, (6) changes Integration section to reference per-task review checkpoints instead of batch review, (7) adds "Dispatch review without explicit `--mode`" to Red Flags. The Codex integration and review loop parameter system are structural additions that change how reviews are dispatched. | Delegating would lose Codex integration and review loop protocol. |
+ | reviewer | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | run-audit | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | scan-project | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | self-audit | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | subagent-driven-development | subagent-driven-development | **Own** | Structurally rewritten. Both share the same high-level process (fresh subagent per task, two-stage review, spec then quality). But Wazir: (1) adds `Capture PRE_TASK_SHA` step to the process flowchart for diff scoping, (2) adds Code Review Scoping section (`codex review --base <pre-task-sha>`), (3) adds Review Loop Alignment section (explicit `--mode task-review`, task-scoped log filenames, loop cap via `wazir capture loop-check`), (4) adds Codex Error Handling section, (5) adds standalone mode fallback, (6) changes all skill references from `superpowers:` to `wz:`, (7) adds "Review the wrong diff" to Red Flags, (8) removes the Example Workflow, Advantages detail, and Cost breakdown from superpowers. The diff-scoping and review-loop integration are structural process changes. | Delegating would lose diff-scoped reviews and Codex integration. The removed Example Workflow from superpowers is a useful teaching tool. |
+ | tdd | test-driven-development | **Own** | Structurally rewritten. Superpowers has an exhaustive treatment (~370 lines): detailed Red-Green-Refactor with Good/Bad code examples, Iron Law with explicit "delete and start over" rules, a Verification Checklist, extensive Why Order Matters section, Common Rationalizations table, When Stuck guide, Testing Anti-Patterns reference, and Debugging Integration. Wazir condenses to ~45 lines with 3 steps (RED, GREEN, REFACTOR), adds a single-pass test quality check in RED phase ("Are these tests testing the right behavior? Are they real assertions?"), and removes all examples, rationalization tables, and elaboration. Different description and name (`wz:tdd` vs `test-driven-development`). | Delegating would lose the test quality check. The superpowers version's extensive rationalization prevention and examples are valuable for discipline enforcement but costly in tokens. |
+ | using-git-worktrees | using-git-worktrees | **Own** | Reclassified from Augment to Own (R2). Skill shadowing is full-override, so Augment tier is not implementable via `~/.claude/skills/`. Wazir already carries the full content: superpowers core process (directory selection priority, safety verification with `git check-ignore`, creation steps, project setup auto-detection, clean baseline verification) preserved structurally intact. Wazir adds: Command Routing preamble, Codebase Exploration preamble, global directory changed from `~/.config/superpowers/worktrees/` to `~/.wazir/worktrees/`, Cleanup and Common Issues sections (submodules, lock files, stale worktrees). Drops superpowers-only sections: Example Workflow, Quick Reference table, Common Mistakes, Red Flags, Integration. | Dropped superpowers sections (Quick Reference, Common Mistakes, Red Flags, Integration) reduce operational guardrails. Could be recovered into the Own skill. |
+ | using-skills | using-superpowers | **Own** | Structurally rewritten. Both enforce the same core rule (invoke skills before any response, even at 1% chance). But Wazir: (1) renames from `using-superpowers` to `using-skills`, (2) changes all internal skill references from `superpowers:` to `wz:` throughout flowchart and examples, (3) removes the Skill Types section detail about "Rigid vs Flexible" elaboration, (4) removes User Instructions elaboration. The name change and systematic `wz:` prefix replacement throughout the flowchart make this a namespace-level rewrite. | Could potentially be Augment if namespace mapping were handled at a routing layer rather than in-skill. |
+ | verification | verification-before-completion | **Own** | Structurally rewritten. Superpowers has an exhaustive treatment (~140 lines): Iron Law, Gate Function (5-step IDENTIFY/RUN/READ/VERIFY/CLAIM), Common Failures table, Red Flags list, Rationalization Prevention table, Key Patterns (tests, regression, build, requirements, agent delegation), Why This Matters section with 24 failure memories, and When To Apply section. Wazir condenses to ~35 lines with 3 bullet requirements (what was verified, exact command, actual result), a minimum rule, and a brief "when verification fails" section. Different name (`wz:verification` vs `verification-before-completion`). | Delegating would lose the concise Wazir format. The superpowers version's extensive rationalization prevention is valuable for discipline but token-expensive. The Wazir version may be too terse to enforce the discipline effectively. |
+ | wazir | _(none)_ | **Own** | Wazir-original. No superpowers counterpart exists. | -- |
+ | writing-plans | writing-plans | **Own** | Structurally rewritten. Superpowers focuses on plan document format (header template, task structure with bite-sized steps, code examples in plan, execution handoff to subagent-driven or parallel session). Wazir: (1) changes inputs to "approved design or approved clarified direction" instead of "spec or requirements", (2) adds pipeline-aware output paths (`.wazir/runs/latest/clarified/execution-plan.md` and `.wazir/runs/latest/tasks/task-NNN/spec.md` vs. standalone `docs/plans/`), (3) removes the plan document format template entirely (no header template, no task structure template, no code examples), (4) adds Plan Review Loop section with `wz:reviewer --mode plan-review`, Codex integration via stdin pipe, Codex error handling, depth-aware pass counts, and standalone fallback. The plan review loop and pipeline path system are structural additions; the removal of the format template is a structural deletion. | Delegating would lose pipeline integration and plan review loop. The removed format template from superpowers is valuable for plan quality and could be worth recovering. |
+ | writing-skills | writing-skills | **Own** | Structurally rewritten. Both share the TDD-for-skills philosophy and RED-GREEN-REFACTOR mapping. But Wazir: (1) condenses from ~650 lines to ~170 lines, (2) removes the extensive SKILL.md Structure template, CSO (Claude Search Optimization) section, Flowchart Usage guidelines, Code Examples guidelines, Token Efficiency section, File Organization examples, Testing All Skill Types section (discipline/technique/pattern/reference), Common Rationalizations for Skipping Testing table, Bulletproofing Skills Against Rationalization section (with Cialdini psychology reference), Skill Creation Checklist, Discovery Workflow, Anti-Patterns section, and STOP deployment gate, (3) adds "Be Prescriptive, Not Descriptive" guidance, "Use Rationalization Prevention" example, "Include Decision Trees" guidance, and skill reference syntax. The massive content reduction and different teaching approach make this a structural rewrite. | Delegating would lose the concise prescriptive format. The superpowers version's CSO guidelines, testing methodology, and anti-pattern catalog are extremely valuable reference material. |
+
2313
+ ---
2314
+
2315
+ ## Superpowers Skills with No Wazir Counterpart
2316
+
2317
+ These superpowers skills have no Wazir fork. They could be used as-is via the superpowers plugin.
2318
+
2319
+ | Superpowers Skill | Status | Notes |
2320
+ |---|---|---|
2321
+ | using-superpowers | Replaced by `wz:using-skills` | See using-skills row above. |
2322
+
2323
+ All 14 superpowers skills have a Wazir counterpart (using-superpowers maps to using-skills, systematic-debugging maps to debugging, test-driven-development maps to tdd, verification-before-completion maps to verification).
2324
+
2325
+ ---
2326
+
2327
+ ## Summary by Tier
2328
+
2329
+ | Tier | Count | Skills |
2330
+ |---|---|---|
2331
+ | **Own** | 25 | brainstorming, clarifier, debugging, design, dispatching-parallel-agents, executing-plans, executor, finishing-a-development-branch, humanize, init-pipeline, prepare-next, receiving-code-review, requesting-code-review, reviewer, run-audit, scan-project, self-audit, subagent-driven-development, tdd, using-git-worktrees, using-skills, verification, wazir, writing-plans, writing-skills |
2332
+ | **Augment** | 0 | _(none -- tier not implementable, see [Augment Mechanism](#augment-mechanism))_ |
2333
+ | **Delegate** | 0 | _(none)_ |
2334
+
2335
+ ---
2336
+
2337
+ ## Common Wazir Additions (Appear in All Forked Skills)
2338
+
2339
+ Every Wazir fork of a superpowers skill adds these two preamble sections:
2340
+
2341
+ 1. **Command Routing** -- routes large commands to context-mode tools and small commands to native Bash, following `hooks/routing-matrix.json`.
2342
+ 2. **Codebase Exploration** -- prescribes symbol-first exploration via `wazir index search-symbols` and `wazir recall`, with fallback to direct file reads.
2343
+
2344
+ These preambles alone would have justified **Augment** tier for any skill without other structural changes, had that tier been implementable (see [Augment Mechanism](#augment-mechanism)).
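The routing preamble can be pictured as a small decision function. This is a hypothetical sketch. The schema of `hooks/routing-matrix.json` is not shown in this document, so the `largeCommands` list, the `maxInlineBytes` threshold, and `routeCommand` itself are illustrative assumptions, not Wazir's implementation.

```javascript
// Hypothetical stand-in for hooks/routing-matrix.json. The real schema is
// not documented here, so both keys below are illustrative assumptions.
const routingMatrix = {
  largeCommands: ["npm test", "git log", "find"],
  maxInlineBytes: 4096,
};

// Return "context-mode" when a command is known to be noisy or is expected
// to produce more output than the inline limit; otherwise use native Bash.
function routeCommand(command, expectedOutputBytes, matrix = routingMatrix) {
  const isKnownLarge = matrix.largeCommands.some(
    (prefix) => command === prefix || command.startsWith(prefix + " ")
  );
  return isKnownLarge || expectedOutputBytes > matrix.maxInlineBytes
    ? "context-mode"
    : "bash";
}

console.log(routeCommand("git status", 200)); // bash
console.log(routeCommand("npm test -- --runInBand", 100)); // context-mode
```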
2345
+
2346
+ ---
2347
+
2348
+ ## Augment Mechanism
2349
+
2350
+ **Research date:** 2026-03-19 (R2: Composition Infrastructure Validation)
2351
+
2352
+ ### Finding: Augment tier is not implementable
2353
+
2354
+ The Augment tier assumed that placing a Wazir addendum at `~/.claude/skills/<skill-name>/SKILL.md` would layer Wazir context on top of the superpowers base skill. This assumption is wrong. **Skill shadowing is full-override, not merge/append.**
2355
+
2356
+ ### Evidence
2357
+
2358
+ **1. `skills-core.js` `resolveSkillPath()` (superpowers v4.3.1)**
2359
+
2360
+ The function at `lib/skills-core.js:108-140` checks the personal skills directory first. If `~/.claude/skills/<name>/SKILL.md` exists, it returns that file immediately and never reads the superpowers version. There is no content merging.
2361
+
2362
+ ```js
2363
+ // Try personal skills first (unless explicitly superpowers:)
2364
+ if (!forceSuperpowers && personalDir) {
2365
+ const personalSkillFile = path.join(personalDir, actualSkillName, 'SKILL.md');
2366
+ if (fs.existsSync(personalSkillFile)) {
2367
+ return { skillFile: personalSkillFile, sourceType: 'personal', ... };
2368
+ // ^^^ returns here -- superpowers version never consulted
2369
+ }
2370
+ }
2371
+ ```
2372
+
2373
+ **2. Superpowers test suite confirms override behavior**
2374
+
2375
+ `tests/opencode/test-skills-core.sh` line 336 asserts:
2376
+ ```
2377
+ [PASS] Personal skills shadow superpowers skills
2378
+ ```
2379
+
2380
+ The test creates `personal-skills/shared-skill/SKILL.md` and `superpowers-skills/shared-skill/SKILL.md`, resolves `shared-skill`, and verifies `sourceType` is `"personal"` -- the superpowers version is invisible.
2381
+
2382
+ **3. Superpowers RELEASE-NOTES.md v3.3.0**
2383
+
2384
+ Line 385 documents the behavior explicitly: "Personal skills override superpowers skills when names match."
2385
+
2386
+ **4. The `superpowers:` prefix bypass is not available in Claude Code**
2387
+
2388
+ `skills-core.js` supports `superpowers:skill-name` syntax to force resolution to the superpowers version even when a personal skill shadows it. However, `skills-core.js` is only used by the OpenCode plugin (`/.opencode/plugins/superpowers.js`). Claude Code's native `Skill` tool has its own built-in resolution logic that does not expose this prefix bypass.
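A simplified JavaScript model of the lookup makes the override concrete. It is a sketch, not superpowers' code. `resolveSkill`, the injected `exists` callback, and the directory names are hypothetical, and the real function returns more fields. It shows the two behaviors described above: a personal `SKILL.md` wins outright, and only the explicit `superpowers:` prefix reaches the base copy.

```javascript
// Simplified model of the full-override lookup quoted above. `exists` is
// injected so the sketch runs without a filesystem; directory names are
// illustrative, and the real resolveSkillPath() returns more fields.
function resolveSkill(name, { personalDir, superpowersDir, exists }) {
  // The "superpowers:" prefix forces the base skill (OpenCode plugin only).
  const forceSuperpowers = name.startsWith("superpowers:");
  const skillName = forceSuperpowers ? name.slice("superpowers:".length) : name;

  if (!forceSuperpowers && personalDir) {
    const personalFile = `${personalDir}/${skillName}/SKILL.md`;
    if (exists(personalFile)) {
      // Early return: the superpowers copy is never read, so nothing is merged.
      return { skillFile: personalFile, sourceType: "personal" };
    }
  }

  const baseFile = `${superpowersDir}/${skillName}/SKILL.md`;
  return exists(baseFile) ? { skillFile: baseFile, sourceType: "superpowers" } : null;
}

// Same fixture shape as the shadowing test: one skill present in both dirs.
const files = new Set([
  "personal-skills/shared-skill/SKILL.md",
  "superpowers-skills/shared-skill/SKILL.md",
]);
const opts = {
  personalDir: "personal-skills",
  superpowersDir: "superpowers-skills",
  exists: (file) => files.has(file),
};

console.log(resolveSkill("shared-skill", opts).sourceType); // personal
console.log(resolveSkill("superpowers:shared-skill", opts).sourceType); // superpowers
```

The personal branch returns before the base path is even built. An addendum therefore has no point at which it could be appended, so layering would need a new code path, not a different file placement.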
2389
+
2390
+ ### Alternatives Considered
2391
+
2392
+ | Approach | Viable? | Why |
2393
+ |---|---|---|
2394
+ | Place addendum in `~/.claude/skills/<name>/` | No | Full override -- base skill content lost |
2395
+ | Merge base + addendum in SKILL.md at install time | Partial | Would work but creates a maintenance coupling: every superpowers update requires re-merging. This is functionally identical to Own tier. |
2396
+ | Inject Wazir context via CLAUDE.md | No | CLAUDE.md is project-scoped; skill behavior should be global across all projects |
2397
+ | Use `superpowers:` prefix to load base, then append | No | Prefix only works in OpenCode's `skills-core.js`, not in Claude Code's native Skill tool |
2398
+ | Propose upstream merge/append feature | Future | Would require a superpowers or Claude Code platform change |
2399
+
2400
+ ### Conclusion
2401
+
2402
+ The Augment tier is architecturally impossible with the current skill discovery mechanism. All three former Augment skills (dispatching-parallel-agents, finishing-a-development-branch, using-git-worktrees) are reclassified to **Own** tier. Since the Wazir versions already carry the full superpowers base content plus Wazir additions, no content is lost -- the skills simply cannot delegate to a shared base.
2403
+
2404
+ If superpowers or Claude Code introduces a composition/layering mechanism in the future (e.g., `extends: superpowers:dispatching-parallel-agents` in frontmatter), the Augment tier could be revisited.
2405
+
2406
+ ---
2407
+
2408
+ ## Observations
2409
+
2410
+ 1. **No Delegate candidates exist.** Every Wazir fork adds at minimum the Command Routing and Codebase Exploration preambles, which prevents pure delegation.
2411
+
2412
+ 2. **Augment tier is not implementable.** R2 validation (2026-03-19) found that skill shadowing in both superpowers `skills-core.js` and Claude Code's native Skill tool is full-override: placing a SKILL.md in `~/.claude/skills/<name>/` completely replaces the superpowers skill with the same name. There is no merge or append mechanism. The three former Augment candidates (dispatching-parallel-agents, finishing-a-development-branch, using-git-worktrees) have been reclassified to Own. See [Augment Mechanism](#augment-mechanism) for full analysis.
2413
+
2414
+ 3. **All 14 forked skills are Own** because either (a) they introduce structural process changes (review loops, pipeline mode, Codex integration, Agent Teams, content restructuring) or (b) the Augment composition mechanism does not exist in the platform.
2415
+
2416
+ 4. **Token cost tradeoff is significant.** Several Wazir Own skills (tdd, verification, debugging, writing-skills) are dramatically shorter than their superpowers counterparts. The superpowers versions contain valuable rationalization prevention tables, detailed examples, and anti-pattern catalogs that enforce discipline. The Wazir versions trade this for token efficiency. This tradeoff should be revisited -- some of the removed discipline content may be worth recovering as separate reference files.
2417
+
2418
+ 5. **The `wz:` prefix is already applied** in skill names within the Wazir SKILL.md frontmatter for all forked skills, consistent with the Own tier convention.
2419
+
1620
2420
  ---
1621
2421
  ## Source: docs/reference/skills.md
1622
2422
 
@@ -1707,6 +2507,7 @@ The `wazir` CLI is minimal on purpose. It exists to validate and export the host
1707
2507
  | `wazir validate commits` | implemented | Validates conventional commit format for commits in the range `--base..--head` (or auto-detected base to HEAD). |
1708
2508
  | `wazir validate changelog` | implemented | Validates `CHANGELOG.md` structure; with `--require-entries` and `--base`, enforces new entries since the base. |
1709
2509
  | `wazir validate docs-drift` | implemented | Detects when source files (roles, workflows, skills, hooks) change without corresponding documentation updates. Advisory by default; `--strict` exits non-zero on drift. |
2510
+ | `wazir validate skills` | implemented | Validates skill frontmatter and checks for name conflicts with superpowers skills (requires the `wz:` prefix). Rejects any `CONTEXT.md` files, because R2 concluded the Augment tier is not implementable. |
1710
2511
  | `wazir validate artifacts` | reserved | Exits `2` until artifact-template and example validation expands. |
1711
2512
  | `wazir export build` | implemented | Generates host packages under `exports/hosts/*` from canonical sources. |
1712
2513
  | `wazir export --check` | implemented | Verifies generated host packages still match current canonical source hashes. |
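Here is one plausible reading of the `wazir validate skills` checks, sketched in JavaScript. The input shape, the `validateSkills` name, and the three-name superpowers list are hypothetical. The real command parses SKILL.md frontmatter from disk and may apply stricter rules.

```javascript
// Hypothetical subset of superpowers skill names; a real check would read
// them from the installed superpowers plugin.
const SUPERPOWERS_NAMES = new Set(["brainstorming", "writing-plans", "writing-skills"]);

// skills: [{ dir, name, files }] where `name` comes from SKILL.md frontmatter.
function validateSkills(skills, superpowersNames = SUPERPOWERS_NAMES) {
  const errors = [];
  for (const { dir, name, files } of skills) {
    if (!name) {
      errors.push(`${dir}: frontmatter is missing "name"`);
      continue;
    }
    // A bare name that matches a superpowers skill would shadow it.
    const bare = name.replace(/^wz:/, "");
    if (superpowersNames.has(bare) && !name.startsWith("wz:")) {
      errors.push(`${dir}: "${name}" collides with a superpowers skill; rename to "wz:${bare}"`);
    }
    if (files.includes("CONTEXT.md")) {
      errors.push(`${dir}: CONTEXT.md is not allowed (Augment tier is not implementable)`);
    }
  }
  return errors;
}

const errors = validateSkills([
  { dir: "skills/writing-plans", name: "wz:writing-plans", files: ["SKILL.md"] },
  { dir: "skills/brainstorming", name: "brainstorming", files: ["SKILL.md", "CONTEXT.md"] },
]);
console.log(errors.length); // 2
```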
@@ -1720,19 +2521,22 @@ The `wazir` CLI is minimal on purpose. It exists to validate and export the host
1720
2521
  | `wazir recall file` | implemented | Returns an exact line-bounded slice from an indexed file. Supports `--tier L0\|L1` for summary recall. |
1721
2522
  | `wazir recall symbol` | implemented | Returns an exact slice for an indexed symbol match. Supports `--tier L0\|L1` for summary recall. |
1722
2523
  | `wazir doctor` | implemented | Validates the active repo surface for manifest, hooks, state-root policy, and host export directory presence. |
1723
- | `wazir status` | implemented | Reads run status directly from `<state-root>/runs/<run-id>/status.json`. |
2524
+ | `wazir status` | implemented | Reads run status directly from `<state-root>/runs/<run-id>/status.json`. Includes a one-line context savings summary when usage data is available. |
2525
+ | `wazir stats` | implemented | Shows token savings statistics for a run, including total queries, estimated tokens saved, bytes avoided, per-tool breakdown, and overall savings ratio. |
1724
2526
  | `wazir capture init` | implemented | Creates a run ledger with `status.json`, `events.ndjson`, and a captures directory under the configured state root. |
1725
2527
  | `wazir capture event` | implemented | Appends a run event and can update phase, status, and loop counts in `status.json`. |
1726
2528
  | `wazir capture route` | implemented | Reserves a run-local capture file path for large tool output. |
1727
2529
  | `wazir capture output` | implemented | Writes captured tool output to a run-local file and records a `post_tool_capture` event. |
1728
2530
  | `wazir capture summary` | implemented | Writes `summary.md` and records the chosen summary or handoff event. |
1729
2531
  | `wazir capture usage` | implemented | Generates a token savings report for a run, showing capture routing statistics and context window savings. |
2532
+ | `wazir capture loop-check` | implemented | Records a loop iteration event and evaluates the loop cap guard. Exits 43 if the phase loop cap is exceeded. Accepts `--task-id` for task-scoped cap tracking. In standalone mode (no status.json), exits 0. |
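The figures `wazir stats` reports (total queries, bytes avoided, estimated tokens saved, per-tool breakdown, savings ratio) can be derived from per-query records along these lines. The record fields and the 4-bytes-per-token heuristic are assumptions, not the CLI's documented formula.

```javascript
// Assumed heuristic of ~4 bytes per token; Wazir's real estimator is not
// documented here.
const BYTES_PER_TOKEN = 4;

// records: one entry per query, with bytes the raw output would have cost
// (`raw`) and bytes actually placed in context (`returned`).
function summarizeUsage(records) {
  const perTool = {};
  let rawBytes = 0;
  let returnedBytes = 0;
  for (const { tool, raw, returned } of records) {
    perTool[tool] ??= { queries: 0, bytesAvoided: 0 };
    perTool[tool].queries += 1;
    perTool[tool].bytesAvoided += raw - returned;
    rawBytes += raw;
    returnedBytes += returned;
  }
  const bytesAvoided = rawBytes - returnedBytes;
  return {
    totalQueries: records.length,
    bytesAvoided,
    estimatedTokensSaved: Math.round(bytesAvoided / BYTES_PER_TOKEN),
    savingsRatio: rawBytes === 0 ? 0 : bytesAvoided / rawBytes,
    perTool,
  };
}

const report = summarizeUsage([
  { tool: "recall", raw: 8000, returned: 1000 },
  { tool: "capture", raw: 12000, returned: 200 },
]);
console.log(report.estimatedTokensSaved); // 4700
```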
1730
2533
 
1731
2534
  ## Exit codes
1732
2535
 
1733
2536
  - `0`: requested check passed
1734
2537
  - `1`: invalid input or validation failure
1735
2538
  - `2`: command surface exists but the implementation is intentionally not complete yet
2539
+ - `43`: phase loop cap exceeded (returned by `wazir capture loop-check`)
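The following sketches the loop cap guard behind `wazir capture loop-check` and exit code `43`. The `loop_cap` and `loop_counts` fields and the `phase:task` key format are hypothetical stand-ins for whatever `status.json` actually stores. The sketch only mirrors the documented behavior: count an iteration, return 43 once the cap is exceeded, and return 0 in standalone mode.

```javascript
const EXIT_OK = 0;
const EXIT_LOOP_CAP = 43;

// `status` stands in for a parsed status.json (null in standalone mode).
// The `loop_cap` / `loop_counts` field names are assumptions.
function loopCheck(status, phase, taskId) {
  if (status === null) return EXIT_OK; // no run ledger: never block

  // Task-scoped tracking keeps a separate counter per phase+task pair.
  const key = taskId ? `${phase}:${taskId}` : phase;
  status.loop_counts[key] = (status.loop_counts[key] ?? 0) + 1;

  // "Exceeded" means strictly more iterations than the cap allows.
  return status.loop_counts[key] > status.loop_cap ? EXIT_LOOP_CAP : EXIT_OK;
}

const status = { loop_cap: 2, loop_counts: {} };
const codes = [1, 2, 3].map(() => loopCheck(status, "review"));
console.log(codes); // [ 0, 0, 43 ]
```

A hook wrapper would pass the returned value to `process.exit()`, so the host sees the non-zero code and stops looping.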
1736
2540
 
1737
2541
  ## Root discovery
1738
2542
 
@@ -1796,7 +2600,7 @@ Executable documentation claims are registered in:
1796
2600
  </picture>
1797
2601
  </p>
1798
2602
 
1799
- <h3 align="center">Wazir: engineering with itqan.</h3>
2603
+ <h3 align="center">Engineering with itqan.</h3>
1800
2604
 
1801
2605
  <p align="center">
1802
2606
  <a href="https://github.com/MohamedAbdallah-14/Wazir/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/MohamedAbdallah-14/Wazir/ci.yml?branch=main&label=CI" alt="CI"></a>
@@ -1814,74 +2618,54 @@ Executable documentation claims are registered in:
1814
2618
  <img src="https://img.shields.io/badge/Cursor-supported-FF6B35" alt="Cursor">
1815
2619
  </p>
1816
2620
 
1817
- <!-- Demo GIF: run assets/record-demo.sh to generate assets/demo.gif, then uncomment the img tag below -->
1818
- <!-- <p align="center"><img src="assets/demo.gif" alt="Wazir Demo" width="700"></p> -->
1819
-
1820
- A host-native operating model for AI coding agents. Wazir gives Claude, Codex, Gemini, and Cursor a 14-phase delivery pipeline, 10 canonical roles with enforceable contracts, 3 adversarial review phases with 9 hard approval gates, and 261 curated expertise modules loaded automatically per task. No server. No wrapper. No custom orchestration.
1821
-
1822
- Install once. Your agent works the way your best engineer does.
1823
-
1824
- ---
1825
-
1826
- ## Table of Contents
1827
-
1828
- - [Why Wazir?](#why-wazir)
1829
- - [Quick Start](#quick-start)
1830
- - [The Pipeline](#the-pipeline)
1831
- - [How It Works](#how-it-works)
1832
- - [How Wazir Handles Complex Tasks](#how-wazir-handles-complex-tasks)
1833
- - [Token Savings](#token-savings)
1834
- - [What's Included](#whats-included)
1835
- - [Compared to Other Tools](#compared-to-other-tools)
1836
- - [Install](#install)
1837
- - [Documentation](#documentation)
1838
- - [Project Status](#project-status)
1839
- - [Acknowledgments](#acknowledgments)
1840
- - [Contributing](#contributing)
1841
- - [License](#license)
1842
2621
 
1843
2622
  ---
1844
2623
 
1845
- ## Why Wazir?
2624
+ > AI agents don't have a quality problem. They have a management problem.
1846
2625
 
1847
- AI coding agents fail the same five ways. Every time.
2626
+ I'm Mohamed Abdallah. I kept watching AI agents write confident code that broke in production, skip tests, and forget what we agreed on yesterday. So I stopped asking them to be better and built them an engineering department instead.
1848
2627
 
1849
- **Ambiguous specs become wrong code.** The clarifier role escalates unresolved ambiguity instead of guessing. No spec ships until material questions get answers. Escalation is a required output, not an option.
1850
-
1851
- **Output quality varies randomly.** The reviewer role is never the phase author. Adversarial review runs at three chokepoints -- spec-challenge, design-review, and final review -- always by a different model or model family. Nine hard approval gates block advancement until artifacts pass.
1852
-
1853
- **Context floods the window.** A 4-layer composition engine assembles only the relevant expertise modules per role per phase from a library of 261 curated modules across 12 domains. Max 15 modules per dispatch, token budget enforced. Three-tier recall (L0/L1/direct read) lets exploration roles load structural summaries instead of full files. Result: 60-80% fewer tokens on exploration-heavy phases. Run `wazir capture usage` to measure it.
1854
-
1855
- **Good solutions vanish between sessions.** Proposed learnings start isolated. Only learnings that pass explicit review and scope-tagging get promoted into future runs. Stale or disproven learnings are archived. The system improves per-project without drifting.
1856
-
1857
- **Nothing prevents structural failures.** Seven hook contracts enforce protected paths (exit 42), loop caps (exit 43), and session observability. Hooks are enforcement, not suggestions.
2628
+ **Wazir puts engineering discipline inside AI coding agents.**
2629
+ No wrapper. No server. Just structure -- inside Claude, Codex, Gemini, and Cursor. Built on 300+ research sources distilled into 268 curated expertise modules across 12 domains.
1858
2630
 
1859
2631
  ---
1860
2632
 
1861
2633
  ## Quick Start
1862
2634
 
1863
- **Step 1: Install**
1864
-
1865
2635
  ```bash
1866
2636
  /plugin marketplace add MohamedAbdallah-14/Wazir
1867
2637
  /plugin install wazir
1868
2638
  ```
1869
2639
 
1870
- **Step 2: Initialize**
2640
+ Then tell your agent what to build:
1871
2641
 
1872
- ```bash
1873
- /init-pipeline
2642
+ ```
2643
+ /wazir Build a REST API for managing tasks with authentication
1874
2644
  ```
1875
2645
 
1876
- **Step 3: Build something**
2646
+ That's it. The pipeline takes over -- clarifies your requirements, writes a spec, plans the work, implements with TDD, reviews, and learns for next time. You approve at the gates. Everything else is automatic.
1877
2647
 
1878
- Drop your requirements in the input directory or just tell the agent what you want:
2648
+ You can also control the depth and intent directly:
1879
2649
 
1880
2650
  ```
1881
- /clarifier Build a REST API for managing tasks with authentication
2651
+ /wazir quick fix the login redirect bug
2652
+ /wazir deep design a new onboarding flow
2653
+ /wazir audit security
1882
2654
  ```
1883
2655
 
1884
- That's it. The pipeline takes over -- clarifies your requirements, writes a spec, plans the work, implements with TDD, reviews, and learns for next time. You approve at the gates. Everything else is automatic.
2656
+ ---
2657
+
2658
+ ### The reviewer is never the author.
2659
+
2660
+ When your AI agent reviews its own code, it finds what it expected to find -- nothing. Wazir's adversarial reviewer is a separate agent with different expertise modules. It catches the mistakes your agent is structurally blind to.
2661
+
2662
+ ### Silence isn't confidence -- it's assumptions.
2663
+
2664
+ Your AI agent doesn't stay quiet because it's sure. It stays quiet because it's trained to be helpful. Wazir's clarifier forces ambiguity to the surface before a single line is written.
2665
+
2666
+ ### Done means verified, not declared.
2667
+
2668
+ AI agents love to announce they're finished. Wazir doesn't care. Every phase loops until the work and its verification converge. The agent doesn't get to say "done." The process decides.
1885
2669
 
1886
2670
  ---
1887
2671
 
@@ -1920,6 +2704,8 @@ graph LR
1920
2704
  style P8 fill:#c62828,color:#fff
1921
2705
  ```
1922
2706
 
2707
+
2708
+
1923
2709
  > **GATE** = Approval gate. The phase blocks until the reviewer explicitly approves. Rejection loops back to the authoring phase.
1924
2710
 
1925
2711
  ---
@@ -1930,23 +2716,9 @@ Three concepts.
1930
2716
 
1931
2717
  **1 -- Roles are isolation boundaries, not personas.** Each of the 10 roles has defined inputs, allowed tools, required outputs, escalation rules, and failure conditions. An agent inside a role cannot write to protected paths, cannot skip required outputs, and must escalate when ambiguity conditions are met. The discipline is structural, not instructional. See [Roles & Workflows](docs/concepts/roles-and-workflows.md).
1932
2718
 
1933
- **2 -- Phases are artifact checkpoints, not conversation stages.** Every phase consumes a named artifact from the previous phase and produces a named artifact for the next. Nothing flows through conversation history. A session can end, a new agent can pick up the artifacts, and delivery continues. The handoff is explicit, structured, and schema-validated against 18 JSON schemas. See [Architecture](docs/concepts/architecture.md).
2719
+ **2 -- Phases are artifact checkpoints, not conversation stages.** Every phase consumes a named artifact from the previous phase and produces a named artifact for the next. Nothing flows through conversation history. A session can end, a new agent can pick up the artifacts, and delivery continues. The handoff is explicit, structured, and schema-validated against 19 JSON schemas. See [Architecture](docs/concepts/architecture.md).
1934
2720
 
1935
- **3 -- The composition engine loads the right expert automatically.** A 4-layer system (always, auto, stacks, concerns) decides which of 261 expertise modules load into each role's context. The executor gets modules on how to build. The verifier gets modules on what to detect. The reviewer gets modules on what to flag. All resolved automatically from the task's declared stack and concerns. Max 15 modules per dispatch, token budget enforced.
1936
-
1937
- ---
1938
-
1939
- ## How Wazir Handles Complex Tasks
1940
-
1941
- Large coding tasks fail when agents lose track of quality. Wazir addresses this with three reinforcing mechanisms.
1942
-
1943
- **14-phase pipeline with 9 hard approval gates.** Every task passes through clarify, research, specify, design, plan, execute, verify, review, and learn. Nine transitions have hard blocking conditions. No phase is skipped, no shortcut taken. The pipeline is defined in `workflows/` and enforced by the orchestrator.
1944
-
1945
- **Adversarial review built in.** The reviewer role operates independently from the executor. It starts with structural summaries (L1 recall) to triage, then reads full source for logic errors, security concerns, or ambiguous code. Review criteria come from expertise modules, not guesswork.
1946
-
1947
- **TDD and verification-before-completion.** The executor writes failing tests before implementation (red-green-refactor). The verifier independently runs all tests, checks truth claims, and validates exports. No task completes until the verifier confirms all acceptance criteria pass. This catches regressions that the executor's own testing misses.
1948
-
1949
- The output is code held to the same standard a senior engineering team would enforce.
2721
+ **3 -- The composition engine loads the right expert automatically.** One agent pretending to be an expert in everything is an expert in nothing. A 4-layer system (always, auto, stacks, concerns) decides which of 268 expertise modules load into each role's context. The executor gets modules on how to build. The verifier gets modules on what to detect. The reviewer gets modules on what to flag. All resolved automatically from the task's declared stack and concerns. Max 15 modules per dispatch, token budget enforced.
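The selection rules above (four layers, role-specific modules, a 15-module cap, a token budget) can be sketched as a single pass over the layers in order. The module fields (`layer`, `roles`, `stacks`, `concerns`, `tokens`) and the rule of skipping any module that would overflow the budget are assumptions for illustration. The description does not say how the engine orders or trims candidates.

```javascript
const MAX_MODULES = 15;
const LAYERS = ["always", "auto", "stacks", "concerns"];

// Does a module qualify for this dispatch? "auto" is treated as role-only,
// which is an assumption; the real auto-layer criteria aren't described here.
function qualifies(m, { role, stacks, concerns }) {
  if (!m.roles.includes(role)) return false;
  if (m.layer === "stacks") return m.stacks.some((s) => stacks.includes(s));
  if (m.layer === "concerns") return m.concerns.some((c) => concerns.includes(c));
  return true;
}

function composeModules(modules, ctx) {
  const selected = [];
  let usedTokens = 0;
  for (const layer of LAYERS) {
    for (const m of modules) {
      if (m.layer !== layer || !qualifies(m, ctx)) continue;
      if (selected.length === MAX_MODULES) return selected; // hard cap
      if (usedTokens + m.tokens > ctx.tokenBudget) continue; // would overflow budget
      selected.push(m.id);
      usedTokens += m.tokens;
    }
  }
  return selected;
}

const library = [
  { id: "core-quality", layer: "always", roles: ["executor", "reviewer"], tokens: 300 },
  { id: "node-testing", layer: "stacks", stacks: ["node"], roles: ["executor"], tokens: 500 },
  { id: "react-perf", layer: "stacks", stacks: ["react"], roles: ["executor"], tokens: 500 },
  { id: "auth-review", layer: "concerns", concerns: ["auth"], roles: ["executor", "reviewer"], tokens: 400 },
];
const ctx = { role: "executor", stacks: ["node"], concerns: ["auth"], tokenBudget: 1000 };
console.log(composeModules(library, ctx)); // [ 'core-quality', 'node-testing' ]
```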
1950
2722
 
1951
2723
  ---
1952
2724
 
@@ -1954,11 +2726,13 @@ The output is code held to the same standard a senior engineering team would enf
1954
2726
 
1955
2727
  Wazir's tiered recall system loads the minimum context each role needs.
1956
2728
 
1957
- | Tier | Tokens | Content | Used by |
1958
- |------|--------|---------|---------|
1959
- | L0 | ~100 | One-line identifier | learner (inventory scans) |
1960
- | L1 | ~500-2k | Structural summary | clarifier, researcher, planner, reviewer (exploration) |
1961
- | Direct read | Full file | Exact source lines | executor, verifier (implementation) |
2729
+
2730
+ | Tier | Tokens | Content | Used by |
2731
+ | ----------- | --------- | ------------------- | ------------------------------------------------------ |
2732
+ | L0 | ~100 | One-line identifier | learner (inventory scans) |
2733
+ | L1 | ~500-2k | Structural summary | clarifier, researcher, planner, reviewer (exploration) |
2734
+ | Direct read | Full file | Exact source lines | executor, verifier (implementation) |
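A rough cost model for the tiers in this table shows why exploration roles avoid direct reads. The role mapping comes from the table. Treating ~100 and ~2k tokens as per-file ceilings, and capping a summary at the file's own size, are simplifying assumptions.

```javascript
// Role-to-tier mapping copied from the recall tier table.
const TIER_BY_ROLE = {
  learner: "L0",
  clarifier: "L1",
  researcher: "L1",
  planner: "L1",
  reviewer: "L1",
  executor: "direct",
  verifier: "direct",
};

// Upper-bound per-file costs: L0 ~100 tokens, L1 up to ~2k tokens (never
// more than the file itself), direct reads cost the file's full size.
function estimateRecallTokens(role, fileTokenSizes) {
  const tier = TIER_BY_ROLE[role];
  if (tier === undefined) throw new Error(`unknown role: ${role}`);
  return fileTokenSizes.reduce((sum, full) => {
    if (tier === "L0") return sum + Math.min(100, full);
    if (tier === "L1") return sum + Math.min(2000, full);
    return sum + full;
  }, 0);
}

const fileSizes = [12000, 800, 30000];
console.log(estimateRecallTokens("learner", fileSizes)); // 300
console.log(estimateRecallTokens("reviewer", fileSizes)); // 4800
console.log(estimateRecallTokens("executor", fileSizes)); // 42800
```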
2735
+
1962
2736
 
1963
2737
  Capture routing redirects large tool output to run-local files. The agent gets a file path (~50 tokens) instead of the full output. Combined with tiered recall, this yields 60-80% token reduction on exploration-heavy phases.
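Here is capture routing, sketched with Node's `fs` API. The 2 KB cutoff, the `routeOutput` helper, and the `.txt` naming scheme are hypothetical. The point is that the agent receives a path and a byte count instead of the output itself.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Assumed cutoff; the real routing threshold is not stated in this document.
const INLINE_LIMIT_BYTES = 2048;

// Small outputs go straight into context; large ones are written to a
// run-local file and only the path (plus size) is returned to the agent.
function routeOutput(output, captureDir, name) {
  const bytes = Buffer.byteLength(output, "utf8");
  if (bytes <= INLINE_LIMIT_BYTES) return { inline: output };

  fs.mkdirSync(captureDir, { recursive: true });
  const file = path.join(captureDir, `${name}.txt`);
  fs.writeFileSync(file, output, "utf8");
  return { capturedTo: file, bytes };
}

const captureDir = fs.mkdtempSync(path.join(os.tmpdir(), "wazir-captures-"));
console.log(routeOutput("ok\n", captureDir, "git-status"));
console.log(routeOutput("x".repeat(50000), captureDir, "test-log").bytes); // 50000
```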
1964
2738
 
@@ -1987,23 +2761,21 @@ Run `wazir capture usage` at the end of a session to see the savings:
1987
2761
 
1988
2762
  ## What's Included
1989
2763
 
1990
- **10 canonical role contracts.** Clarifier, researcher, specifier, content-author, designer, planner, executor, verifier, reviewer, learner. Each has enforceable inputs, outputs, and escalation rules. The spec-challenge phase adversarially reviews every spec before planning begins. [Roles reference](docs/reference/roles-reference.md)
2764
+ **10 canonical role contracts.** Clarifier, researcher, specifier, content-author, designer, planner, executor, verifier, reviewer, learner. Each has enforceable inputs, outputs, and escalation rules. [Roles reference](docs/reference/roles-reference.md)
1991
2765
 
1992
- **Adversarial review at three chokepoints.** Spec-challenge, plan-review, and final review run by the reviewer role, never the phase author. Three review phases and nine hard approval gates span the 14-phase pipeline. Nothing advances without explicit clearance. [Architecture](docs/concepts/architecture.md)
2766
+ **Adversarial review at three chokepoints.** Spec-challenge, plan-review, and final review run by the reviewer role, never the phase author. Nine hard approval gates span the 14-phase pipeline. Nothing advances without explicit clearance. [Architecture](docs/concepts/architecture.md)
1993
2767
 
1994
- **261 curated expertise modules across 12 domains.** Loaded selectively per role per phase via a 4-layer composition engine. Max 15 modules per dispatch, token budget enforced. [Expertise index](docs/reference/expertise-index.md)
2768
+ **268 curated expertise modules across 12 domains.** Loaded selectively per role per phase via a 4-layer composition engine. Max 15 modules per dispatch, token budget enforced. Wazir ships with 268. Yours could be next. [Expertise index](docs/reference/expertise-index.md)
1995
2769
 
1996
- **Three-tier recall for token savings.** L0 (~100 tokens), L1 (~500-2k tokens), direct read for full source. Symbol-first exploration searches the index before reading source. Capture routing redirects large tool output to files. Result: 60-80% token reduction on exploration-heavy phases, measured per-session by `wazir capture usage`. [Indexing and Recall](docs/concepts/indexing-and-recall.md)
2770
+ **Three-tier recall for token savings.** L0 (~100 tokens), L1 (~500-2k tokens), direct read for full source. Symbol-first exploration searches the index before reading source. Capture routing redirects large tool output to files. Result: 60-80% token reduction on exploration-heavy phases, measured per-session by `wazir capture usage`. [Indexing and Recall](docs/concepts/indexing-and-recall.md)
1997
2771
 
1998
2772
  **Structured learning.** Proposed learnings require explicit review and scope tagging before promotion. Only learnings whose file patterns overlap the current task get injected into context. The system improves per-project without drifting.
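The injection filter can be sketched in two steps: keep only promoted learnings, then keep those whose file patterns match at least one file the task touches. The `status` and `filePatterns` fields and the minimal glob dialect (`*`, `**`) are assumptions. Wazir's learning schema is not shown here.

```javascript
// Convert a small glob dialect to a RegExp: "**/" spans any number of
// directories (including none), "**" matches anything, "*" stays within
// one path segment. Other regex metacharacters are escaped.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      if (glob[i + 2] === "/") {
        re += "(?:.*/)?";
        i += 2;
      } else {
        re += ".*";
        i += 1;
      }
    } else if (ch === "*") {
      re += "[^/]*";
    } else {
      re += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${re}$`);
}

// Inject only promoted learnings whose file patterns overlap the task's files.
function relevantLearnings(learnings, taskFiles) {
  return learnings.filter(
    (l) =>
      l.status === "promoted" &&
      l.filePatterns.some((p) => {
        const re = globToRegExp(p);
        return taskFiles.some((f) => re.test(f));
      })
  );
}

const learnings = [
  { id: "auth-token-refresh", status: "promoted", filePatterns: ["src/auth/**"] },
  { id: "auth-draft", status: "proposed", filePatterns: ["src/auth/**"] },
  { id: "docs-tone", status: "promoted", filePatterns: ["docs/*.md"] },
];
console.log(relevantLearnings(learnings, ["src/auth/login.ts"]).map((l) => l.id)); // [ 'auth-token-refresh' ]
```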
1999
2773
 
2000
2774
  **7 hook contracts for structural guardrails.** These enforce protected path writes (exit 42), loop caps (exit 43), and session observability. [Hooks](docs/reference/hooks.md)
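The following sketches a protected-path check that returns exit code `42`. The `PROTECTED_ROOTS` list and `checkWrite` are illustrative. The actual protected paths and hook wiring are defined by Wazir's hook contracts, which this document does not reproduce.

```javascript
const path = require("node:path");

const EXIT_PROTECTED = 42;

// Illustrative protected roots; Wazir's actual list lives in its hook config.
const PROTECTED_ROOTS = ["roles", "workflows", "schemas"];

// Normalize the target first so "./x/../roles/a.md" cannot slip past the check.
function checkWrite(repoRoot, target) {
  const rel = path.relative(repoRoot, path.resolve(repoRoot, target));
  const blocked = PROTECTED_ROOTS.some(
    (root) => rel === root || rel.startsWith(root + path.sep)
  );
  return blocked ? EXIT_PROTECTED : 0;
}

console.log(checkWrite("/repo", "roles/executor.md")); // 42
console.log(checkWrite("/repo", "src/app.ts")); // 0
```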
2001
2775
 
2002
- **20 callable skills.** wz:tdd, wz:verification, wz:debugging, wz:scan-project, wz:writing-plans, and 14 more. Each enforces an exact procedure with evidence at each step. [Skills](docs/reference/skills.md)
2003
-
2004
- **Built-in text humanization.** The `wz:humanize` skill and 7 dedicated expertise modules automatically remove AI vocabulary patterns from generated text. The composition engine loads domain-specific rules per role: code rules for the executor (commit messages, comments), content rules for the content-author (microcopy, glossary), and technical-docs rules for the specifier, planner, reviewer, and learner. A 61-item vocabulary blacklist, 24-pattern sentence taxonomy, and two-pass self-audit checklist keep all output sounding like it was written by a person.
2776
+ **20+ callable skills.** `/wazir` runs the full pipeline. `/wazir audit security` runs a codebase audit. `/wazir prd` generates a product requirements document from completed runs. Plus TDD, verification, debugging, and more -- each enforcing an exact procedure with evidence at every step. [Skills](docs/reference/skills.md)
2005
2777
 
2006
- **Content-author role before design.** This role produces finalized i18n keys, microcopy, glossary entries, state coverage, and accessibility copy before design begins.
2778
+ **Built-in text humanization.** The composition engine loads domain-specific language rules per role: code rules for the executor (commit messages, comments), content rules for the content-author (microcopy, glossary), and technical-docs rules for the specifier, planner, reviewer, and learner. A 61-item vocabulary blacklist, 24-pattern sentence taxonomy, and two-pass self-audit checklist keep all output sounding like it was written by a person.
2007
2779
 
2008
2780
  **Runs on 4 platforms.** `wazir export build` compiles canonical sources into native packages for Claude, Codex, Gemini, and Cursor. SHA-256 drift detection catches stale exports in CI. [Host exports](docs/reference/host-exports.md)
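Drift detection of the kind `wazir export --check` performs can be sketched in two steps: hash the canonical sources in a stable order, then compare the digest with the one recorded at build time. `manifestHash`, `checkExport`, and the in-memory `sources` map are hypothetical. `crypto.createHash("sha256")` is Node's standard API.

```javascript
const crypto = require("node:crypto");

// Hash canonical sources in a stable order so the digest does not depend on
// object key order. `sources` maps relative path -> file contents.
function manifestHash(sources) {
  const hash = crypto.createHash("sha256");
  for (const file of Object.keys(sources).sort()) {
    // NUL separators keep ("ab", "c") distinct from ("a", "bc").
    hash.update(file).update("\0").update(sources[file]).update("\0");
  }
  return hash.digest("hex");
}

function checkExport(sources, recordedHash) {
  return manifestHash(sources) === recordedHash ? "fresh" : "stale";
}

const sources = { "roles/executor.md": "# Executor\n", "skills/tdd/SKILL.md": "# TDD\n" };
const recorded = manifestHash(sources);
console.log(checkExport(sources, recorded)); // fresh
console.log(checkExport({ ...sources, "roles/executor.md": "# Executor v2\n" }, recorded)); // stale
```

In CI, a "stale" result would translate into a non-zero exit, so the job fails until `wazir export build` regenerates the packages.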
2009
2781
 
@@ -2011,26 +2783,24 @@ Run `wazir capture usage` at the end of a session to see the savings:
2011
2783
 
2012
2784
  ## Compared to Other Tools
2013
2785
 
2014
- The AI coding tool space is fragmenting. Developers bolt together separate plugins for workflow management, specification, memory, output compression, and orchestration. Research shows this approach has a cost: tool selection accuracy drops to 13.6% when models face too many tools (Gan & Sun, 2025), and 20 tools can consume 62% of an 8k context window before the task even begins (PromptForward, 2025).
2786
+ The AI coding tool space is fragmenting. Developers bolt together separate plugins for workflow management, specification, memory, output compression, and orchestration. Not every project needs 14 phases. For a weekend hack, prompting is fine. For production, you want structure.
2015
2787
 
2016
- Wazir takes a different path: one integrated operating model instead of many independent plugins.
2017
2788
 
2018
- | Dimension | Wazir | [Superpowers](https://github.com/obra/superpowers) | [Spec-Kit](https://github.com/github/spec-kit) | [Micro-Agent](https://github.com/BuilderIO/micro-agent) | [Distill](https://github.com/samuelfaj/distill) | [Claude-Mem](https://github.com/thedotmack/claude-mem) | [OMC](https://github.com/yeachan-heo/oh-my-claudecode) |
2019
- |---|---|---|---|---|---|---|---|
2020
- | **Category** | Engineering OS | Skills framework | Spec toolkit | Code gen agent | Output compressor | Memory plugin | Orchestration layer |
2021
- | **Scope** | Full lifecycle (14 phases) | Dev workflow (~20 skills) | Specify / Plan / Implement | Single-file TDD loop | CLI output compression | Session memory | Multi-agent orchestration |
2022
- | **Enforced roles** | 10 canonical, contractual | None (skills only) | None | None | None | None | 32 agents (behavioral) |
2023
- | **Phase model** | 14 explicit, artifact-gated | 7-step (advisory) | 3-step | 1 (generate/test) | N/A | N/A | 5-step pipeline |
2024
- | **Adversarial review** | 3 gate phases | Code review skill | No | No | No | No | team-verify step |
2025
- | **Context management** | L0/L1 tiered recall | None | None | None | LLM compression | Vector DB (ChromaDB) | Token routing |
2026
- | **Schema validation** | 18 JSON schemas | No | No | No | No | No | No |
2027
- | **Guardrails** | 7 hook contracts | None | None | None | None | 5 hooks (memory) | Agent tracking |
2028
- | **External deps** | None (host-native) | None (prompt-only) | Python CLI | Node.js CLI | Node.js + LLM | ChromaDB, SQLite, Bun | tmux, exp. teams API |
2029
- | **Host support** | Claude, Codex, Gemini, Cursor | Claude, Codex, Gemini, Cursor, OpenCode | Claude, Copilot, Gemini | Any LLM provider | Any LLM | Claude Code only | Claude Code (+ workers) |
2789
+ | Dimension | Wazir | [Superpowers](https://github.com/obra/superpowers) | [Spec-Kit](https://github.com/github/spec-kit) | [Micro-Agent](https://github.com/BuilderIO/micro-agent) | [Distill](https://github.com/samuelfaj/distill) | [Claude-Mem](https://github.com/thedotmack/claude-mem) | [OMC](https://github.com/yeachan-heo/oh-my-claudecode) |
2790
+ | ---------------------- | ----------------------------- | -------------------------------------------------- | ---------------------------------------------- | ------------------------------------------------------- | ----------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ |
2791
+ | **Category** | Engineering OS | Skills framework | Spec toolkit | Code gen agent | Output compressor | Memory plugin | Orchestration layer |
+ | **Scope** | Full lifecycle (14 phases) | Dev workflow (~20 skills) | Specify / Plan / Implement | Single-file TDD loop | CLI output compression | Session memory | Multi-agent orchestration |
+ | **Enforced roles** | 10 canonical, contractual | None (skills only) | None | None | None | None | 32 agents (behavioral) |
+ | **Phase model** | 14 explicit, artifact-gated | 7-step (advisory) | 3-step | 1 (generate/test) | N/A | N/A | 5-step pipeline |
+ | **Adversarial review** | 3 gate phases | Code review skill | No | No | No | No | team-verify step |
+ | **Context management** | L0/L1 tiered recall | None | None | None | LLM compression | Vector DB (ChromaDB) | Token routing |
+ | **Schema validation** | 19 JSON schemas | No | No | No | No | No | No |
+ | **Guardrails** | 7 hook contracts | None | None | None | None | 5 hooks (memory) | Agent tracking |
+ | **External deps** | None (host-native) | None (prompt-only) | Python CLI | Node.js CLI | Node.js + LLM | ChromaDB, SQLite, Bun | tmux, exp. teams API |
+ | **Host support** | Claude, Codex, Gemini, Cursor | Claude, Codex, Gemini, Cursor, OpenCode | Claude, Copilot, Gemini | Any LLM provider | Any LLM | Claude Code only | Claude Code (+ workers) |
 
- Each of these tools solves a real problem. Wazir's approach is to solve them together -- one system, shared context, structural enforcement -- instead of asking developers to wire separate plugins into a coherent workflow.
 
- **Research sources:** [RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection](https://arxiv.org/abs/2505.03275) (Gan & Sun, 2025). [MCP Overload: Why Your LLM Agent Doesn't Need 20 Tools](https://promptforward.dev/blog/mcp-overload) (PromptForward, 2025). [Less is More: Optimizing Function Calling for LLM Execution](https://arxiv.org/abs/2411.15399) (Paramanayakam et al., 2024). [Tool RAG: The Next Breakthrough in Scalable AI Agents](https://next.redhat.com/2025/11/26/tool-rag-the-next-breakthrough-in-scalable-ai-agents/) (Red Hat, 2025).
+ Each of these tools solves a real problem. Wazir's approach is to solve them together -- one system, shared context, structural enforcement -- instead of asking developers to wire separate plugins into a coherent workflow.
 
  ---
 
@@ -2043,63 +2813,58 @@ Each of these tools solves a real problem. Wazir's approach is to solve them tog
  /plugin install wazir
  ```
 
- The plugin loads skills, roles, and workflows into your Claude sessions. Done.
+ The plugin loads skills, roles, and workflows into your Claude sessions. Then type `/wazir` and go.
 
  **npm / Homebrew:**
 
  ```bash
- npm install -g @wazir-dev/cli # npm
- brew tap MohamedAbdallah-14/Wazir && brew install wazir # Homebrew
+ npm install -g @wazir-dev/cli # npm
+ brew tap MohamedAbdallah-14/homebrew-wazir && brew install wazir # Homebrew
  ```
 
- **Deploy to your project:**
-
- | Host | Command |
- |------|---------|
- | **Claude** | `cp -r exports/hosts/claude/.claude ~/your-project/ && cp exports/hosts/claude/CLAUDE.md ~/your-project/` |
- | **Codex** | `cp exports/hosts/codex/AGENTS.md ~/your-project/` |
- | **Gemini** | `cp exports/hosts/gemini/GEMINI.md ~/your-project/` |
- | **Cursor** | `cp -r exports/hosts/cursor/.cursor ~/your-project/` |
-
- > npm/Homebrew users: clone the source and run `npx wazir export build` to generate host exports. See [Installation Guide](docs/getting-started/01-installation.md) for the full path.
-
  ---
 
  ## Documentation
 
  **For users:**
 
- | I want to... | Go to |
- |---|---|
- | Install and get started | [Installation](docs/getting-started/01-installation.md) |
- | Run my first task | [First Run](docs/getting-started/02-first-run.md) |
- | Understand the architecture | [Architecture](docs/concepts/architecture.md) |
+
+ | I want to... | Go to |
+ | ------------------------------- | --------------------------------------------------------- |
+ | Install and get started | [Installation](docs/getting-started/01-installation.md) |
+ | Run my first task | [First Run](docs/getting-started/02-first-run.md) |
+ | Understand the architecture | [Architecture](docs/concepts/architecture.md) |
  | Learn about roles and workflows | [Roles & Workflows](docs/concepts/roles-and-workflows.md) |
 
+
  **For contributors:**
 
- | I want to... | Go to |
- |---|---|
- | Set up for development | [CONTRIBUTING.md](CONTRIBUTING.md) |
- | Look up CLI commands | [CLI Reference](docs/reference/tooling-cli.md) |
- | Configure the manifest | [Configuration Reference](docs/reference/configuration-reference.md) |
- | Browse all documentation | [Documentation Hub](docs/README.md) |
+
+ | I want to... | Go to |
+ | ------------------------ | -------------------------------------------------------------------- |
+ | Set up for development | [CONTRIBUTING.md](CONTRIBUTING.md) |
+ | Look up CLI commands | [CLI Reference](docs/reference/tooling-cli.md) |
+ | Configure the manifest | [Configuration Reference](docs/reference/configuration-reference.md) |
+ | Browse all documentation | [Documentation Hub](docs/README.md) |
+
 
  ---
 
  ## Project Status
 
- Wazir is in active early development (**v0.1.0**, pre-1.0-alpha).
+ Wazir is in active early development (pre-1.0-alpha).
 
  The pipeline, roles, and expertise modules are stable and used in production by the maintainers. The CLI, schemas, and hook contracts work. But this is early software -- APIs may change before 1.0.
 
  What's solid:
+
  - The 14-phase pipeline and 10 role contracts
- - 261 expertise modules across 12 domains
+ - 268 expertise modules across 12 domains
  - Host exports for Claude, Codex, Gemini, and Cursor
  - The composition engine and tiered recall system
 
  What may change:
+
  - CLI command surface and flags
  - Schema field names
  - Hook contract signatures
@@ -2109,6 +2874,14 @@ Feedback and contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md).
 
  ---
 
+ ## Why "Wazir"?
+
+ Wazir (وزير) -- the vizier. The operational mastermind who ran empires while the sultan held authority. In Arabic chess, the wazir became the queen: the most powerful piece on the board.
+
+ The Arabic word *itqan* (إتقان) means mastery -- doing something so well that nothing remains to improve. This isn't a tagline. It's the test every commit runs against.
+
+ ---
+
  ## Acknowledgments
 
  Wazir builds on ideas and patterns from these projects:
@@ -2120,6 +2893,7 @@ Wazir builds on ideas and patterns from these projects:
  - **[micro-agent](https://github.com/BuilderIO/micro-agent)** by Builder.io -- test-driven code generation patterns
  - **[distill](https://github.com/samuelfaj/distill)** by [@samuelfaj](https://github.com/samuelfaj) -- CLI output compression for token savings
  - **[claude-mem](https://github.com/thedotmack/claude-mem)** by [@thedotmack](https://github.com/thedotmack) -- persistent memory patterns for coding agents
+ - **[ideation](https://github.com/bladnman/ideation_team_skill)** by [@bladnman](https://github.com/bladnman) -- multi-agent structured dialogue patterns
 
  ---
 
@@ -2132,7 +2906,6 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, branch conventions
  ## License
 
  MIT -- see [LICENSE](LICENSE).
-
  ---
  ## Source: CONTRIBUTING.md