@wazir-dev/cli 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (138)
  1. package/CHANGELOG.md +74 -10
  2. package/README.md +15 -15
  3. package/assets/demo.cast +47 -0
  4. package/assets/demo.gif +0 -0
  5. package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
  6. package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
  7. package/docs/concepts/architecture.md +1 -1
  8. package/docs/concepts/roles-and-workflows.md +2 -0
  9. package/docs/concepts/why-wazir.md +59 -0
  10. package/docs/decisions/2026-03-19-deferred-items.md +564 -0
  11. package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
  12. package/docs/readmes/INDEX.md +21 -5
  13. package/docs/readmes/features/expertise/README.md +2 -2
  14. package/docs/readmes/features/exports/README.md +2 -2
  15. package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
  16. package/docs/readmes/features/schemas/README.md +3 -0
  17. package/docs/readmes/features/skills/README.md +17 -0
  18. package/docs/readmes/features/skills/clarifier.md +5 -0
  19. package/docs/readmes/features/skills/claude-cli.md +5 -0
  20. package/docs/readmes/features/skills/codex-cli.md +5 -0
  21. package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
  22. package/docs/readmes/features/skills/executing-plans.md +5 -0
  23. package/docs/readmes/features/skills/executor.md +5 -0
  24. package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
  25. package/docs/readmes/features/skills/gemini-cli.md +5 -0
  26. package/docs/readmes/features/skills/humanize.md +5 -0
  27. package/docs/readmes/features/skills/init-pipeline.md +5 -0
  28. package/docs/readmes/features/skills/receiving-code-review.md +5 -0
  29. package/docs/readmes/features/skills/requesting-code-review.md +5 -0
  30. package/docs/readmes/features/skills/reviewer.md +5 -0
  31. package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
  32. package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
  33. package/docs/readmes/features/skills/wazir.md +5 -0
  34. package/docs/readmes/features/skills/writing-skills.md +5 -0
  35. package/docs/readmes/features/workflows/prepare-next.md +1 -1
  36. package/docs/reference/configuration-reference.md +47 -6
  37. package/docs/reference/hooks.md +1 -0
  38. package/docs/reference/launch-checklist.md +4 -4
  39. package/docs/reference/review-loop-pattern.md +119 -9
  40. package/docs/reference/roles-reference.md +1 -0
  41. package/docs/reference/skill-tiers.md +147 -0
  42. package/docs/reference/tooling-cli.md +3 -1
  43. package/docs/truth-claims.yaml +12 -0
  44. package/expertise/antipatterns/process/ai-coding-antipatterns.md +214 -1
  45. package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
  46. package/exports/hosts/claude/.claude/commands/verify.md +30 -1
  47. package/exports/hosts/claude/.claude/settings.json +9 -0
  48. package/exports/hosts/claude/CLAUDE.md +1 -1
  49. package/exports/hosts/claude/export.manifest.json +6 -4
  50. package/exports/hosts/claude/host-package.json +3 -1
  51. package/exports/hosts/codex/AGENTS.md +1 -1
  52. package/exports/hosts/codex/export.manifest.json +6 -4
  53. package/exports/hosts/codex/host-package.json +3 -1
  54. package/exports/hosts/cursor/.cursor/hooks.json +4 -0
  55. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
  56. package/exports/hosts/cursor/export.manifest.json +6 -4
  57. package/exports/hosts/cursor/host-package.json +3 -1
  58. package/exports/hosts/gemini/GEMINI.md +1 -1
  59. package/exports/hosts/gemini/export.manifest.json +6 -4
  60. package/exports/hosts/gemini/host-package.json +3 -1
  61. package/hooks/context-mode-router +191 -0
  62. package/hooks/definitions/context_mode_router.yaml +19 -0
  63. package/hooks/hooks.json +31 -6
  64. package/hooks/protected-path-write-guard +8 -0
  65. package/hooks/routing-matrix.json +45 -0
  66. package/hooks/session-start +62 -1
  67. package/llms-full.txt +937 -134
  68. package/package.json +2 -4
  69. package/schemas/hook.schema.json +2 -1
  70. package/schemas/phase-report.schema.json +89 -0
  71. package/schemas/usage.schema.json +25 -1
  72. package/schemas/wazir-manifest.schema.json +19 -0
  73. package/skills/brainstorming/SKILL.md +32 -157
  74. package/skills/clarifier/SKILL.md +289 -111
  75. package/skills/claude-cli/SKILL.md +320 -0
  76. package/skills/codex-cli/SKILL.md +260 -0
  77. package/skills/debugging/SKILL.md +13 -0
  78. package/skills/design/SKILL.md +13 -0
  79. package/skills/dispatching-parallel-agents/SKILL.md +13 -0
  80. package/skills/executing-plans/SKILL.md +13 -0
  81. package/skills/executor/SKILL.md +139 -19
  82. package/skills/finishing-a-development-branch/SKILL.md +13 -0
  83. package/skills/gemini-cli/SKILL.md +260 -0
  84. package/skills/humanize/SKILL.md +13 -0
  85. package/skills/init-pipeline/SKILL.md +72 -164
  86. package/skills/prepare-next/SKILL.md +81 -10
  87. package/skills/receiving-code-review/SKILL.md +13 -0
  88. package/skills/requesting-code-review/SKILL.md +13 -0
  89. package/skills/reviewer/SKILL.md +369 -24
  90. package/skills/run-audit/SKILL.md +13 -0
  91. package/skills/scan-project/SKILL.md +13 -0
  92. package/skills/self-audit/SKILL.md +217 -16
  93. package/skills/skill-research/SKILL.md +188 -0
  94. package/skills/subagent-driven-development/SKILL.md +13 -0
  95. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
  96. package/skills/subagent-driven-development/implementer-prompt.md +8 -0
  97. package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
  98. package/skills/tdd/SKILL.md +13 -0
  99. package/skills/using-git-worktrees/SKILL.md +13 -0
  100. package/skills/using-skills/SKILL.md +13 -0
  101. package/skills/verification/SKILL.md +54 -3
  102. package/skills/wazir/SKILL.md +464 -381
  103. package/skills/writing-plans/SKILL.md +14 -1
  104. package/skills/writing-skills/SKILL.md +13 -0
  105. package/templates/artifacts/implementation-plan.md +3 -0
  106. package/templates/artifacts/tasks-template.md +133 -0
  107. package/templates/examples/phase-report.example.json +48 -0
  108. package/tooling/src/adapters/composition-engine.js +256 -0
  109. package/tooling/src/adapters/model-router.js +84 -0
  110. package/tooling/src/capture/command.js +41 -2
  111. package/tooling/src/capture/run-config.js +3 -1
  112. package/tooling/src/capture/store.js +56 -0
  113. package/tooling/src/capture/usage.js +106 -0
  114. package/tooling/src/capture/user-input.js +66 -0
  115. package/tooling/src/checks/ac-matrix.js +256 -0
  116. package/tooling/src/checks/command-registry.js +12 -0
  117. package/tooling/src/checks/docs-truth.js +1 -1
  118. package/tooling/src/checks/security-sensitivity.js +69 -0
  119. package/tooling/src/checks/skills.js +111 -0
  120. package/tooling/src/cli.js +31 -20
  121. package/tooling/src/commands/stats.js +161 -0
  122. package/tooling/src/commands/validate.js +5 -1
  123. package/tooling/src/export/compiler.js +33 -37
  124. package/tooling/src/gating/agent.js +145 -0
  125. package/tooling/src/guards/phase-prerequisite-guard.js +185 -0
  126. package/tooling/src/hooks/routing-logic.js +69 -0
  127. package/tooling/src/init/auto-detect.js +258 -0
  128. package/tooling/src/init/command.js +38 -170
  129. package/tooling/src/input/scanner.js +46 -0
  130. package/tooling/src/reports/command.js +103 -0
  131. package/tooling/src/reports/phase-report.js +323 -0
  132. package/tooling/src/state/command.js +160 -0
  133. package/tooling/src/state/db.js +287 -0
  134. package/tooling/src/status/command.js +58 -1
  135. package/tooling/src/verify/proof-collector.js +299 -0
  136. package/wazir.manifest.yaml +26 -14
  137. package/workflows/plan-review.md +3 -1
  138. package/workflows/verify.md +30 -1
package/CHANGELOG.md CHANGED
@@ -1,15 +1,9 @@
- # [1.1.0](https://github.com/MohamedAbdallah-14/Wazir/compare/v1.0.0...v1.1.0) (2026-03-18)
-
-
- ### Bug Fixes
-
- * address review findings — tests, Codex wiring, Teams, pipeline CLI integration ([0b03215](https://github.com/MohamedAbdallah-14/Wazir/commit/0b032150c4a7967ba070eccdced513f55343fc65))
- * CI changelog gate + CodeRabbit review findings ([0247941](https://github.com/MohamedAbdallah-14/Wazir/commit/024794136b7a44116ef2c4f5fcc23823bc72e7fc))
+ # [1.3.0](https://github.com/MohamedAbdallah-14/Wazir/compare/v1.2.0...v1.3.0) (2026-03-20)
 
 
  ### Features
 
- * add core review loop pattern across all pipeline phases ([aa4c1d8](https://github.com/MohamedAbdallah-14/Wazir/commit/aa4c1d8400e69ab4fe943043705a862f9e5861f3))
+ * 13 critical fixes + pipeline enforcement mechanisms ([#4](https://github.com/MohamedAbdallah-14/Wazir/issues/4)) ([9d0f3b0](https://github.com/MohamedAbdallah-14/Wazir/commit/9d0f3b0de63ace524bc48a513ab02ff377a9d354))
 
  # Changelog
 
@@ -19,18 +13,82 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/), and this
 
  ## [Unreleased]
 
+ ### Added
+ - Workflow completion enforcement — `validateRunCompletion()` ensures all enabled workflows complete before run finalizes (`wazir capture summary --complete`)
+ - Mandatory security gate — pattern-based diff scanner (`tooling/src/checks/security-sensitivity.js`) auto-adds 6 security review dimensions when auth/token/SQL/etc. patterns detected
+ - Three interaction modes: `auto` (overnight, Codex-required), `guided` (default), `interactive` (co-design) via `/wazir auto|interactive ...`
+ - User input capture — NDJSON logging of all user messages during a run (`tooling/src/capture/user-input.js`) with retention pruning
+ - Two-layer reasoning chain output — concise conversation triggers + detailed file output at `reasoning/phase-<name>-reasoning.md`
+ - Input Coverage dimension in self-audit (compares original input vs plan vs commits)
+ - Input Coverage dimension in plan-review (8th dimension, catches scope reduction)
+ - Two-level phase model — `parent_phase` and `workflow` fields in phase report schema, hierarchy display in `wazir status`
+ - CLI/context-mode enforcement — reviewer flags >5 direct reads without index query and large commands without context-mode
+ - Per-phase context savings display at phase boundaries via `wazir stats`
+ - Overnight skill research skill (`skills/skill-research/SKILL.md`) for competitive analysis against superpowers and other frameworks
+ - Anti-pattern docs: AP-23 (skipping enabled workflows), AP-24 (clarifier deciding scope without asking)
+
+ ### Changed
+ - Clarifier Phase 1A rewritten — research runs first, then informed question batches (3-7 per batch), every scope exclusion requires user confirmation
+ - Executor enforces one commit per task (hard rule, reviewer rejects multi-task batching)
+ - Per-phase savings display added to clarifier and executor phase boundaries
+
+ ### Fixed
+ - SQLite ExperimentalWarning suppressed via lazy dynamic imports in CLI entrypoint
+ - `--complete` flag properly parsed in `wazir capture summary`
+ - `validateRunCompletion` filters by `workflow_policy` (enabled workflows only), not full manifest list
+
+ ### Changed
+ - Restructured pipeline from 14 micro-phases to 4 main phases: Init, Clarifier, Executor, Final Review
+ - Removed depth and intent questions from pipeline init — depth defaults to standard (override via inline modifiers), intent inferred from request keywords
+ - Enabled learn + prepare-next workflows by default (part of Final Review phase)
+ - Renamed `phase_policy` to `workflow_policy` in run-config (legacy name still supported)
+ - Input directory (`input/`) now scanned automatically at startup
+ - Learning extraction with concrete proposal format in reviewer final mode
+ - Accepted learnings injected into clarifier context (top 10 by confidence, scope-matched)
+ - Prepare-next skill produces structured handoff document
+ - All pipeline checkpoints now use AskUserQuestion pattern instead of numbered lists
+ - Every pipeline phase outputs value-reporting text (before/after) explaining why the phase matters and what it found
+ - Review dimensions annotated with "catches:" descriptions explaining what class of bugs each dimension prevents
+
+ ### Removed
+ - `@inquirer/prompts` dependency and `--interactive` init path (always fails in non-TTY)
+ - All Agent Teams references (team_mode, parallel_backend, TeamCreate/SendMessage/TeamDelete, Free Thinker/Grounder/Synthesizer)
+
+ ### Fixed
+ - Router logs now write to manifest-derived state root instead of `_default` (Codex P1)
+ - Routing log replay scoped to current run via timestamp filtering (Codex P2)
+ - Index-query savings now computed from avoided bytes, not raw bytes (Codex P2)
+ - Index-query savings included in savings-ratio denominator (Codex P2)
+ - Cursor export now includes context-mode-router hook (Codex P2)
+ - SessionStart hook uses correct `database_path` key for index freshness check
+ - TabManager stop hook error documented as Claude Code internal (cannot fix from Wazir side)
+
  ### Added
  - Core review loop pattern across all pipeline phases with Codex CLI integration
  - `wazir capture loop-check` CLI subcommand with task-scoped cap tracking and run-config loader
- - `wazir init` interactive CLI command with arrow-key selection (depth, intent, teams, codex model)
+ - `wazir init` zero-config auto-init (no prompts, infer everything)
  - `docs/reference/review-loop-pattern.md` canonical reference for the review loop pattern
  - Standalone skills: `/wazir:clarifier`, `/wazir:executor`, `/wazir:reviewer`
- - Agent Teams real implementation in brainstorming (TeamCreate, SendMessage, TeamDelete)
  - Codex prompt templates (artifact + code) with "Do NOT load skills" instruction
+ - `tooling/src/verify/proof-collector.js` — detects project type (web/api/cli/library) and collects mechanical proof of implementation
+ - Phase reports wired into pipeline — `wazir report phase` called after each phase exit and displayed to user
+ - Proof-of-implementation in verify workflow — runnable vs non-runnable detection with evidence collection
  - Git branch enforcement in `/wazir` runner (validates branch, offers to create feature branch)
  - CLI wiring across pipeline phases (doctor gate, index build/refresh, capture events, validate gates)
  - CHANGELOG enforcement in executor and reviewer skills
  - 10 new tests: 7 for handleLoopCheck, 4 for init command (406 total)
+ - Spec-kit task template (`templates/artifacts/tasks-template.md`) with checklist format, phase structure, parallel markers, MVP strategy
+ - AC verification scaffold (`tooling/src/checks/ac-matrix.js`) — 111 automated acceptance criteria checks
+ - Context-mode detection in `wazir init` (3 core tools + optional execute_file under MCP prefix)
+ - Input preservation logic in clarifier (adopt input specs verbatim, never remove detail)
+ - Gap analysis exit gate in clarifier (invoke wz:reviewer --mode plan-review, fix-and-loop)
+ - Online research in clarifier Phase 0 (keyword extraction, fetch_and_index/WebFetch, error handling)
+ - Codex output context protection (tee + extract via execute_file, fail-closed fallback)
+ - Resume detection with staleness check and interactive checkpoint in /wazir runner
+ - Usage capture at every phase_exit event
+ - Run-scoped user feedback routing (plan corrections vs scope changes)
+ - Phase scoring with canonical dimension sets and quality delta reporting
+ - Full end-of-phase reports (7 sections: Summary, Key Changes, Quality Delta, Findings Log, Usage, Context Savings, Time Spent)
 
  ### Changed
  - All Codex CLI calls now read model from `config.multi_tool.codex.model` with fallback to `gpt-5.4`
@@ -41,3 +99,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/), and this
  - `/wazir` runner pipeline rewritten with all manifest phases and review loops
  - Wazir CLI is now required (removed "Skip" option)
  - Fixed pass counts: quick=3, standard=5, deep=7 (no extension)
+ - Clarifier now invokes `wz:reviewer --mode` explicitly instead of ad-hoc codex calls
+ - Fix-and-loop pattern: re-submission after fixes is mandatory, "fix and continue" prohibited
+ - Review loop escalation at cap: 3 user options (approve-with-issues, fix-manually, abort)
+ - CHANGELOG/gitflow hard gates before PR (validate changelog + validate commits)
+ - All checkpoints use numbered interactive options with (Recommended) markers
+ - Reviewer documents 5 owned responsibilities (Codex integration, dimensions, pass counting, attribution, dimension set recording)
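The "Mandatory security gate" entry describes a pattern-based diff scanner that adds six security review dimensions when sensitive code shows up in a change. The sketch below illustrates that idea in JavaScript, matching the package's Node tooling. It is not the contents of `security-sensitivity.js`: the `scanDiff` helper, the three regexes, and the six dimension names are hypothetical placeholders, and the regexes are deliberately crude.

```javascript
// Hypothetical sketch of a pattern-based diff scanner. The pattern names,
// regexes, and dimension labels are illustrative assumptions only.
const SENSITIVE_PATTERNS = {
  auth: /\b(auth|login|session|password)\b/i,
  token: /\b(token|jwt|api[_-]?key|secret)\b/i,
  sql: /\b(select|insert|update|delete)\b.*\b(from|into|set|where)\b/i,
};

// Six placeholder review dimensions added when any pattern matches.
const SECURITY_DIMENSIONS = [
  "input-validation",
  "authentication",
  "authorization",
  "secrets-handling",
  "injection",
  "data-exposure",
];

function scanDiff(diffText) {
  const matched = new Set();
  for (const line of diffText.split("\n")) {
    // Only inspect added lines, and skip the "+++ b/path" file header.
    if (!line.startsWith("+") || line.startsWith("+++")) continue;
    for (const [name, re] of Object.entries(SENSITIVE_PATTERNS)) {
      if (re.test(line)) matched.add(name);
    }
  }
  const hits = [...matched].sort();
  // Add the full security dimension set only when something sensitive matched.
  return { hits, dimensions: hits.length > 0 ? [...SECURITY_DIMENSIONS] : [] };
}
```

Scanning only added lines keeps the gate focused on what the change introduces. A keyword match like this will produce false positives (for example, English prose containing "delete ... from"), which is acceptable for a gate that only adds review work.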
package/README.md CHANGED
@@ -32,7 +32,7 @@
  I'm Mohamed Abdallah. I kept watching AI agents write confident code that broke in production, skip tests, and forget what we agreed on yesterday. So I stopped asking them to be better and built them an engineering department instead.
 
  **Wazir puts engineering discipline inside AI coding agents.**
- No wrapper. No server. Just structure -- inside Claude, Codex, Gemini, and Cursor. Built on 300+ research sources distilled into 261 curated expertise modules across 12 domains.
+ No wrapper. No server. Just structure -- inside Claude, Codex, Gemini, and Cursor. Built on 300+ research sources distilled into 315 curated expertise modules across 12 domains.
 
  ---
 
@@ -77,7 +77,7 @@ AI agents love to announce they're finished. Wazir doesn't care. Every phase loo
 
  ## The Pipeline
 
- Every task flows through 14 phases. Three are adversarial review gates that block progress until the reviewer explicitly approves. Rejection loops back to the authoring phase.
+ Every task flows through 15 workflows grouped into 4 phases. Three are adversarial review gates that block progress until the reviewer explicitly approves. Rejection loops back to the authoring phase.
 
  ```mermaid
  graph LR
@@ -122,9 +122,9 @@ Three concepts.
 
  **1 -- Roles are isolation boundaries, not personas.** Each of the 10 roles has defined inputs, allowed tools, required outputs, escalation rules, and failure conditions. An agent inside a role cannot write to protected paths, cannot skip required outputs, and must escalate when ambiguity conditions are met. The discipline is structural, not instructional. See [Roles & Workflows](docs/concepts/roles-and-workflows.md).
 
- **2 -- Phases are artifact checkpoints, not conversation stages.** Every phase consumes a named artifact from the previous phase and produces a named artifact for the next. Nothing flows through conversation history. A session can end, a new agent can pick up the artifacts, and delivery continues. The handoff is explicit, structured, and schema-validated against 18 JSON schemas. See [Architecture](docs/concepts/architecture.md).
+ **2 -- Phases are artifact checkpoints, not conversation stages.** Every phase consumes a named artifact from the previous phase and produces a named artifact for the next. Nothing flows through conversation history. A session can end, a new agent can pick up the artifacts, and delivery continues. The handoff is explicit, structured, and schema-validated against 19 JSON schemas. See [Architecture](docs/concepts/architecture.md).
 
- **3 -- The composition engine loads the right expert automatically.** One agent pretending to be an expert in everything is an expert in nothing. A 4-layer system (always, auto, stacks, concerns) decides which of 261 expertise modules load into each role's context. The executor gets modules on how to build. The verifier gets modules on what to detect. The reviewer gets modules on what to flag. All resolved automatically from the task's declared stack and concerns. Max 15 modules per dispatch, token budget enforced.
+ **3 -- The composition engine loads the right expert automatically.** One agent pretending to be an expert in everything is an expert in nothing. A 4-layer system (always, auto, stacks, concerns) decides which of 315 expertise modules load into each role's context. The executor gets modules on how to build. The verifier gets modules on what to detect. The reviewer gets modules on what to flag. All resolved automatically from the task's declared stack and concerns. Max 15 modules per dispatch, token budget enforced.
 
  ---
 
@@ -169,17 +169,17 @@ Run `wazir capture usage` at the end of a session to see the savings:
 
  **10 canonical role contracts.** Clarifier, researcher, specifier, content-author, designer, planner, executor, verifier, reviewer, learner. Each has enforceable inputs, outputs, and escalation rules. [Roles reference](docs/reference/roles-reference.md)
 
- **Adversarial review at three chokepoints.** Spec-challenge, plan-review, and final review run by the reviewer role, never the phase author. Nine hard approval gates span the 14-phase pipeline. Nothing advances without explicit clearance. [Architecture](docs/concepts/architecture.md)
+ **Adversarial review at three chokepoints.** Spec-challenge, plan-review, and final review run by the reviewer role, never the phase author. Nine hard approval gates span the 15-workflow pipeline. Nothing advances without explicit clearance. [Architecture](docs/concepts/architecture.md)
 
- **261 curated expertise modules across 12 domains.** Loaded selectively per role per phase via a 4-layer composition engine. Max 15 modules per dispatch, token budget enforced. Wazir ships with 261. Yours could be next. [Expertise index](docs/reference/expertise-index.md)
+ **315 curated expertise modules across 12 domains.** Loaded selectively per role per phase via a 4-layer composition engine. Max 15 modules per dispatch, token budget enforced. Wazir ships with 315. Yours could be next. [Expertise index](docs/reference/expertise-index.md)
 
  **Three-tier recall for token savings.** L0 (~100 tokens), L1 (~500-2k tokens), direct read for full source. Symbol-first exploration searches the index before reading source. Capture routing redirects large tool output to files. Result: 60-80% token reduction on exploration-heavy phases, measured per-session by `wazir capture usage`. [Indexing and Recall](docs/concepts/indexing-and-recall.md)
 
  **Structured learning.** Proposed learnings require explicit review and scope tagging before promotion. Only learnings whose file patterns overlap the current task get injected into context. The system improves per-project without drifting.
 
- **7 hook contracts for structural guardrails.** These enforce protected path writes (exit 42), loop caps (exit 43), and session observability. [Hooks](docs/reference/hooks.md)
+ **8 hook contracts for structural guardrails.** These enforce protected path writes (exit 42), loop caps (exit 43), and session observability. [Hooks](docs/reference/hooks.md)
 
- **20+ callable skills.** `/wazir` runs the full pipeline. `/wazir audit security` runs a codebase audit. `/wazir prd` generates a product requirements document from completed runs. Plus TDD, verification, debugging, and more -- each enforcing an exact procedure with evidence at every step. [Skills](docs/reference/skills.md)
+ **28 callable skills.** `/wazir` runs the full pipeline. `/wazir audit security` runs a codebase audit. `/wazir prd` generates a product requirements document from completed runs. Plus TDD, verification, debugging, and more -- each enforcing an exact procedure with evidence at every step. [Skills](docs/reference/skills.md)
 
  **Built-in text humanization.** The composition engine loads domain-specific language rules per role: code rules for the executor (commit messages, comments), content rules for the content-author (microcopy, glossary), and technical-docs rules for the specifier, planner, reviewer, and learner. A 61-item vocabulary blacklist, 24-pattern sentence taxonomy, and two-pass self-audit checklist keep all output sounding like it was written by a person.
 
@@ -189,19 +189,19 @@ Run `wazir capture usage` at the end of a session to see the savings:
 
  ## Compared to Other Tools
 
- The AI coding tool space is fragmenting. Developers bolt together separate plugins for workflow management, specification, memory, output compression, and orchestration. Not every project needs 14 phases. For a weekend hack, prompting is fine. For production, you want structure.
+ The AI coding tool space is fragmenting. Developers bolt together separate plugins for workflow management, specification, memory, output compression, and orchestration. Not every project needs 15 workflows. For a weekend hack, prompting is fine. For production, you want structure.
 
 
  | Dimension | Wazir | [Superpowers](https://github.com/obra/superpowers) | [Spec-Kit](https://github.com/github/spec-kit) | [Micro-Agent](https://github.com/BuilderIO/micro-agent) | [Distill](https://github.com/samuelfaj/distill) | [Claude-Mem](https://github.com/thedotmack/claude-mem) | [OMC](https://github.com/yeachan-heo/oh-my-claudecode) |
  | ---------------------- | ----------------------------- | -------------------------------------------------- | ---------------------------------------------- | ------------------------------------------------------- | ----------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ |
  | **Category** | Engineering OS | Skills framework | Spec toolkit | Code gen agent | Output compressor | Memory plugin | Orchestration layer |
- | **Scope** | Full lifecycle (14 phases) | Dev workflow (~20 skills) | Specify / Plan / Implement | Single-file TDD loop | CLI output compression | Session memory | Multi-agent orchestration |
+ | **Scope** | Full lifecycle (15 workflows) | Dev workflow (~20 skills) | Specify / Plan / Implement | Single-file TDD loop | CLI output compression | Session memory | Multi-agent orchestration |
  | **Enforced roles** | 10 canonical, contractual | None (skills only) | None | None | None | None | 32 agents (behavioral) |
- | **Phase model** | 14 explicit, artifact-gated | 7-step (advisory) | 3-step | 1 (generate/test) | N/A | N/A | 5-step pipeline |
+ | **Phase model** | 15 explicit, artifact-gated | 7-step (advisory) | 3-step | 1 (generate/test) | N/A | N/A | 5-step pipeline |
  | **Adversarial review** | 3 gate phases | Code review skill | No | No | No | No | team-verify step |
  | **Context management** | L0/L1 tiered recall | None | None | None | LLM compression | Vector DB (ChromaDB) | Token routing |
- | **Schema validation** | 18 JSON schemas | No | No | No | No | No | No |
- | **Guardrails** | 7 hook contracts | None | None | None | None | 5 hooks (memory) | Agent tracking |
+ | **Schema validation** | 19 JSON schemas | No | No | No | No | No | No |
+ | **Guardrails** | 8 hook contracts | None | None | None | None | 5 hooks (memory) | Agent tracking |
  | **External deps** | None (host-native) | None (prompt-only) | Python CLI | Node.js CLI | Node.js + LLM | ChromaDB, SQLite, Bun | tmux, exp. teams API |
  | **Host support** | Claude, Codex, Gemini, Cursor | Claude, Codex, Gemini, Cursor, OpenCode | Claude, Copilot, Gemini | Any LLM provider | Any LLM | Claude Code only | Claude Code (+ workers) |
 
@@ -264,8 +264,8 @@ The pipeline, roles, and expertise modules are stable and used in production by
 
  What's solid:
 
- - The 14-phase pipeline and 10 role contracts
- - 261 expertise modules across 12 domains
+ - The 15-workflow pipeline and 10 role contracts
+ - 315 expertise modules across 12 domains
  - Host exports for Claude, Codex, Gemini, and Cursor
  - The composition engine and tiered recall system
 
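The README describes the composition engine as a 4-layer system (always, auto, stacks, concerns) that picks modules per role, caps each dispatch at 15 modules, and enforces a token budget. A minimal sketch of that selection loop, assuming layer semantics the README does not spell out, is below. We assume `always` modules apply to every role, `auto` modules are keyed by role, and `stacks`/`concerns` modules also match the task's declared stacks and concerns. `composeModules`, `appliesTo`, and every field name are hypothetical; this is not the code in `tooling/src/adapters/composition-engine.js`.

```javascript
// Hypothetical sketch of layered expertise-module selection.
// Layer semantics and field names (layer, roles, stack, concern, tokens) are assumptions.
const LAYER_ORDER = ["always", "auto", "stacks", "concerns"];
const MAX_MODULES = 15;

function appliesTo(mod, role, task) {
  switch (mod.layer) {
    case "always":
      return true; // loaded for every role
    case "auto":
      return mod.roles.includes(role);
    case "stacks":
      return mod.roles.includes(role) && task.stacks.includes(mod.stack);
    case "concerns":
      return mod.roles.includes(role) && task.concerns.includes(mod.concern);
    default:
      return false;
  }
}

function composeModules(modules, role, task, tokenBudget) {
  const selected = [];
  let used = 0;
  // Walk layers in priority order so earlier layers claim budget first.
  for (const layer of LAYER_ORDER) {
    for (const mod of modules) {
      if (mod.layer !== layer || !appliesTo(mod, role, task)) continue;
      if (selected.length >= MAX_MODULES) return { selected, used };
      if (used + mod.tokens > tokenBudget) continue; // skip modules that do not fit
      selected.push(mod.id);
      used += mod.tokens;
    }
  }
  return { selected, used };
}
```

Skipping an oversized module (rather than stopping) lets smaller, later modules still use the remaining budget; whether the real engine does this is not stated in the README.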
@@ -0,0 +1,47 @@
+ {"version":3,"term":{"cols":387,"rows":85,"type":"xterm-256color","version":"Warp(v0.2026.03.04.08.20.stable_03)"},"timestamp":1773955554,"command":"bash assets/demo-script.sh","env":{"SHELL":"/bin/zsh"}}
+ [0.008, "o", "$ wazir doctor\r\n"]
+ [0.315, "o", "\u001b[1G"]
+ [0.000, "o", "\u001b[0K⠙"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠹"]
+ [0.080, "o", "\u001b[1G\u001b[0K⠸"]
+ [0.082, "o", "\u001b[1G\u001b[0K⠼"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠴"]
+ [0.080, "o", "\u001b[1G\u001b[0K⠦"]
+ [0.082, "o", "\u001b[1G\u001b[0K⠧"]
+ [0.082, "o", "\u001b[1G\u001b[0K⠇"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠏"]
+ [0.080, "o", "\u001b[1G\u001b[0K⠋"]
+ [0.080, "o", "\u001b[1G\u001b[0K⠙"]
+ [0.025, "o", "\u001b[1G\u001b[0K"]
+ [0.187, "o", "(node:20008) ExperimentalWarning: SQLite is an experimental feature and might change at any time\r\n(Use `node --trace-warnings ...` to show where the warning was created)\r\n"]
+ [0.074, "o", "PASS manifest: Manifest is valid.\r\nPASS hooks: Hook definitions are valid.\r\nPASS state-root: /Users/mohamedabdallah/.wazir/projects/wazir stays outside the project root\r\nPASS host-exports: All required host export directories exist.\r\n"]
+ [0.004, "o", "\u001b[1G\u001b[0K⠙"]
+ [0.001, "o", "\u001b[1G\u001b[0K"]
+ [2.010, "o", "\r\n$ wazir export build\r\n"]
+ [0.287, "o", "\u001b[1G"]
+ [0.000, "o", "\u001b[0K⠙"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠹"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠸"]
+ [0.080, "o", "\u001b[1G\u001b[0K⠼"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠴"]
+ [0.082, "o", "\u001b[1G\u001b[0K⠦"]
+ [0.012, "o", "\u001b[1G\u001b[0K"]
+ [0.164, "o", "(node:20085) ExperimentalWarning: SQLite is an experimental feature and might change at any time\r\n(Use `node --trace-warnings ...` to show where the warning was created)\r\n"]
+ [0.070, "o", "Generated host exports for claude, codex, gemini, cursor.\r\n"]
+ [0.004, "o", "\u001b[1G\u001b[0K⠙"]
+ [0.000, "o", "\u001b[1G\u001b[0K"]
+ [2.013, "o", "\r\n$ wazir index build\r\n"]
+ [0.304, "o", "\u001b[1G"]
+ [0.000, "o", "\u001b[0K⠙"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠹"]
+ [0.080, "o", "\u001b[1G\u001b[0K⠸"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠼"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠴"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠦"]
+ [0.081, "o", "\u001b[1G\u001b[0K⠧"]
+ [0.018, "o", "\u001b[1G\u001b[0K"]
+ [0.189, "o", "(node:20186) ExperimentalWarning: SQLite is an experimental feature and might change at any time\r\n(Use `node --trace-warnings ...` to show where the warning was created)\r\n"]
+ [1.042, "o", "Indexed 889 files, 7493 symbols, and 26395 outlines.\r\n"]
+ [0.005, "o", "\u001b[1G\u001b[0K⠙"]
+ [0.001, "o", "\u001b[1G\u001b[0K"]
+ [1.014, "x", "0"]
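The recording above is an asciicast v3 file: one JSON header line (with `term.cols` and `term.rows`), then `[interval, code, data]` events, where the interval is seconds since the previous event, `"o"` is terminal output, and `"x"` carries the exit status. The sketch below is ours, not part of the package. It replays such a file to recover the total duration, the exit status, and readable text. `summarizeCast` is a hypothetical helper, and it strips only CSI escape sequences, carriage returns, and Braille spinner glyphs, which is enough for this recording but is not a terminal emulator.

```javascript
// Sketch: summarise an asciicast v3 recording such as assets/demo.cast.
// Assumes the v3 layout: header line, then [interval, code, data] events.
function summarizeCast(text) {
  const lines = text
    .split("\n")
    .filter((l) => l.trim() !== "" && !l.startsWith("#")); // v3 allows "#" comment lines
  const header = JSON.parse(lines[0]);
  if (header.version !== 3) throw new Error(`unsupported cast version ${header.version}`);

  let elapsed = 0;
  let output = "";
  let exitStatus = null;
  for (const line of lines.slice(1)) {
    const [interval, code, data] = JSON.parse(line);
    elapsed += interval; // v3 stores time relative to the previous event
    if (code === "o") output += data;
    else if (code === "x") exitStatus = Number(data);
  }

  const plain = output
    .replace(/\u001b\[[0-9;]*[A-Za-z]/g, "") // cursor moves and line clears
    .replace(/\r/g, "")
    .replace(/[\u2800-\u28FF]/g, ""); // Braille spinner frames
  return { cols: header.term.cols, rows: header.term.rows, elapsed, exitStatus, plain };
}
```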
Binary file
@@ -0,0 +1,28 @@
1
+ # AP-23: Selectively Skipping Enabled Workflows Within a Phase
2
+
3
+ ## Pattern
4
+
5
+ An agent completes a phase but skips one or more enabled workflows. The run proceeds to completion without the skipped workflow's output. No error is raised because the phase gate only checks artifacts from explicitly required predecessors, not workflow-level completeness.
6
+
7
+ ## Example
8
+
9
+ The final review phase has three workflows: `review`, `learn`, `prepare_next`. The agent completes `review` and presents the verdict, but skips `learn` and `prepare_next`. The run is marked complete. No learnings are captured, no handoff document is produced.
10
+
11
+ ## Harm
12
+
13
+ - Learnings from the run are lost — the same mistakes repeat in future runs
14
+ - Handoff documents are missing — the next session starts without context
15
+ - Verification evidence is incomplete — claims cannot be audited
16
+ - The user believes the pipeline ran fully when it did not
17
+
18
+ ## Detection
19
+
20
+ `validateRunCompletion(runDir, manifestPath)` in `tooling/src/guards/phase-prerequisite-guard.js` checks that every workflow declared in `wazir.manifest.yaml` has a `phase_exit` event with `status: completed` in the run's `events.ndjson`.
21
+
22
+ `wazir capture summary --complete` calls this check and refuses to finalize the run if any enabled workflow was skipped.
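The check above can be sketched in a few lines. This sketch is not the real `validateRunCompletion`. The function name `findSkippedWorkflows` is hypothetical. It takes the enabled workflow names directly instead of reading `wazir.manifest.yaml`. It also assumes each `events.ndjson` line carries `event`, `phase`, and `status` fields, mirroring the `--event`, `--phase`, and `--status` flags of `wazir capture event`.

```javascript
// Hypothetical sketch, NOT the real validateRunCompletion from
// tooling/src/guards/phase-prerequisite-guard.js. Assumes each line of
// events.ndjson is a JSON object with `event`, `phase`, and `status` fields.
function findSkippedWorkflows(enabledWorkflows, eventsNdjson) {
  const completed = new Set();
  for (const line of eventsNdjson.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines
    const evt = JSON.parse(line);
    if (evt.event === "phase_exit" && evt.status === "completed") {
      completed.add(evt.phase);
    }
  }
  // Any enabled workflow without a completed phase_exit counts as skipped.
  return enabledWorkflows.filter((wf) => !completed.has(wf));
}

// The AP-23 example: review completed, learn failed, prepare_next never ran.
const events = [
  '{"event":"phase_exit","phase":"review","status":"completed"}',
  '{"event":"phase_exit","phase":"learn","status":"failed"}',
].join("\n");

const skipped = findSkippedWorkflows(["review", "learn", "prepare_next"], events);
if (skipped.length > 0) {
  console.log(`Refusing to finalize run; skipped: ${skipped.join(", ")}`);
}
```

A workflow that exited with any status other than `completed` is treated the same as one that never ran. This matches the stated rule that only `status: completed` counts.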
23
+
24
+ ## Fix
25
+
26
+ 1. Always emit `phase_exit` events for every workflow: `wazir capture event --run <id> --event phase_exit --phase <workflow> --status completed`
27
+ 2. Use `wazir capture summary --complete` instead of bare `wazir capture summary` at run end
28
+ 3. The wazir pipeline skill checks completion before presenting final results
@@ -0,0 +1,34 @@
1
+ # AP-24: Clarifier Making Scope Decisions Without Asking
2
+
3
+ ## Pattern
4
+
5
+ The clarifier autonomously decides that certain items are "out of scope" without asking the user. This typically happens when the input doesn't explicitly mention something (e.g., documentation, i18n, testing strategy), and the clarifier assumes silence means exclusion.
6
+
7
+ ## Example
8
+
9
+ User input: "Build a user authentication system with OAuth2."
10
+
11
+ Clarifier produces: "Out of scope: documentation, i18n, rate limiting, password recovery."
12
+
13
+ The user never agreed to exclude any of these. The clarifier decided unilaterally.
14
+
15
+ ## Harm
16
+
17
+ - Items the user wanted are silently dropped
18
+ - The user sees the final output and assumes the pipeline covered everything
19
+ - 21 input items become 5 tasks because the clarifier excluded 16 without asking
20
+ - Trust in the pipeline erodes when users discover missing features after delivery
21
+
22
+ ## Detection
23
+
24
+ - Clarification document contains "out of scope" items that were never discussed with the user
25
+ - Plan has fewer tasks than distinct items in the original input
26
+ - Scope coverage guard (`evaluateScopeCoverageGuard`) flags plan < input items
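Two of these signals are mechanical and can be sketched as a single check. This is not the real `evaluateScopeCoverageGuard`. The function `checkScopeCoverage` and the exclusion shape `{ item, confirmedBy }` are hypothetical. In that shape, `confirmedBy` cites the user reply that approved the exclusion, or is `null` when the user was never asked.

```javascript
// Hypothetical sketch, NOT the real evaluateScopeCoverageGuard.
// Assumed exclusion shape: { item, confirmedBy } where confirmedBy is a
// citation of the user's reply, or null if the user was never asked.
function checkScopeCoverage(inputItems, planTasks, exclusions) {
  const findings = [];
  // Signal 1: the plan has fewer tasks than distinct input items.
  if (planTasks.length < inputItems.length) {
    findings.push(
      `plan has ${planTasks.length} task(s) for ${inputItems.length} input item(s)`,
    );
  }
  // Signal 2: an item was excluded without an explicit user confirmation.
  for (const ex of exclusions) {
    if (!ex.confirmedBy) {
      findings.push(`"${ex.item}" excluded without user confirmation`);
    }
  }
  return findings;
}

const findings = checkScopeCoverage(
  ["OAuth2 login", "password recovery", "rate limiting"],
  ["OAuth2 login"],
  [
    { item: "password recovery", confirmedBy: null },
    { item: "rate limiting", confirmedBy: "user reply, question batch 1" },
  ],
);
findings.forEach((f) => console.log(f));
```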
27
+
28
+ ## Fix
29
+
30
+ 1. Research runs FIRST — the clarifier must have context before asking questions
31
+ 2. After research, ask INFORMED questions in batches of 3-7
32
+ 3. Every scope exclusion must reference an explicit user confirmation
33
+ 4. If the input is clear, zero questions is fine — but the clarifier must state "no ambiguities detected" rather than silently proceeding
34
+ 5. The clarification document must cite user responses for every scope boundary decision
@@ -10,7 +10,7 @@ Wazir is a host-native engineering OS kit. The host environment (Claude, Codex,
10
10
  | Workflows | Phase entrypoints that sequence roles through delivery |
11
11
  | Skills | Reusable procedures (wz:tdd, wz:debugging, wz:verification, wz:brainstorming) |
12
12
  | Hooks | Guardrails enforcing protected paths, loop caps, and capture routing |
13
- | Expertise | 308 curated knowledge modules composed into agent prompts |
13
+ | Expertise | 315 curated knowledge modules composed into agent prompts |
14
14
  | Templates | Artifact templates for phase outputs and handoff |
15
15
  | Schemas | Validation schemas for manifest, hooks, artifacts, and exports |
16
16
  | Exports | Generated host packages tailored per supported host |
@@ -31,6 +31,8 @@ The canonical workflow sequence is:
31
31
  13. **learn** — capture scoped learnings
32
32
  14. **prepare-next** — produce a clean handoff for the next run
33
33
 
34
+ Additionally, **run-audit** is a standalone workflow that can be invoked outside the linear pipeline to perform structured codebase audits with source-backed findings.
35
+
34
36
  ## Role routing
35
37
 
36
38
  The orchestrator dispatches three roles per task: `executor`, `reviewer`, and `verifier`. By default, all three run for every task. The `required_roles` field in a task's YAML frontmatter controls which roles are dispatched, allowing the orchestrator to skip unnecessary roles and save context window budget.
@@ -0,0 +1,59 @@
1
+ # Why Wazir
2
+
3
+ What makes Wazir the best engineering OS you can add to an AI coding agent.
4
+
5
+ ## 1. Measure Twice, Cut Once
6
+
7
+ Wazir clarifies before coding. The pipeline forces research, spec hardening, design review, and plan approval before a single line of implementation code is written. Most AI agents jump straight to code and fix their mistakes afterward. Wazir prevents the mistakes in the first place.
8
+
9
+ ## 2. Deep Research
10
+
11
+ Every AI agent knows how to research, but users don't ask it to. Wazir makes research a mandatory phase: the researcher role scans the codebase, fetches external sources, and produces a research brief before clarification begins. The agent starts informed, not guessing.
12
+
13
+ ## 3. Clarifier + Task Planning
14
+
15
+ A structured clarification pipeline turns vague requests into measurable specs. Spec hardening catches ambiguity, missing constraints, and untestable acceptance criteria before they become bugs. Task planning produces execution-grade task specs — not TODO lists.
16
+
17
+ ## 4. Content Author
18
+
19
+ A dedicated role for any content need — database seeding, sample content, test fixtures, translations, copy, email templates, notification text. Most AI agents treat content as an afterthought bolted onto code tasks. Wazir gives content its own phase with editorial standards, i18n awareness, and humanization rules.
20
+
21
+ ## 5. Self-Audit
22
+
23
+ The agent audits its own work in an isolated git worktree. It validates, finds structural issues, fixes what it can, verifies the fixes, and merges only when every check is green. It runs a 5-loop cycle with convergence detection. Protected-path safety rails prevent the agent from modifying its own identity-defining files. Safe self-improvement.
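A bounded audit loop with convergence detection might look like the sketch below. It makes several assumptions. `runSelfAudit` is hypothetical, and `validate` and `fix` are assumed callbacks, not Wazir APIs. The 5 loops are treated as an upper bound. "Convergence" is modeled as a loop that stops reducing the number of findings.

```javascript
// Hypothetical sketch of a bounded self-audit loop; not Wazir's implementation.
// validate() returns the current findings; fix(findings) attempts repairs.
function runSelfAudit(validate, fix, maxLoops = 5) {
  let previousCount = Infinity;
  for (let loop = 1; loop <= maxLoops; loop++) {
    const findings = validate(); // also verifies the previous loop's fixes
    if (findings.length === 0) {
      return { merge: true, loops: loop }; // all green: safe to merge
    }
    if (findings.length >= previousCount) {
      // No progress since the last loop: stop instead of churning.
      return { merge: false, loops: loop, reason: "converged without all-green" };
    }
    previousCount = findings.length;
    fix(findings);
  }
  return { merge: false, loops: maxLoops, reason: "loop cap reached" };
}

// Example: a worktree starts with 3 issues and each fix pass resolves one.
let issues = 3;
const result = runSelfAudit(
  () => Array.from({ length: issues }, (_, i) => `issue-${i}`),
  () => { issues -= 1; },
);
console.log(result); // { merge: true, loops: 4 }
```

Stopping on a stalled finding count keeps the agent from spending the remaining loops on fixes that are not working. The loop cap bounds the worst case.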
24
+
25
+ ## 6. Composer
26
+
27
+ 315 curated expertise modules across 12 domains. The composition engine assembles task-specific agents by loading the right expertise for each role, stack, and concern. The executor building a Flutter RTL app gets Flutter patterns, RTL layout rules, and mobile antipatterns composed into its context. The reviewer gets the corresponding antipattern catalog. Every dispatched agent is a specialist, not a generalist pretending.
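A minimal sketch of the selection step, assuming each expertise module is tagged with the roles, stacks, and concerns it serves. The tag shape, the `composeExpertise` function, and the module ids are all hypothetical. In this sketch, an empty `stacks` or `concerns` list means the module applies regardless of that dimension.

```javascript
// Hypothetical module shape: { id, roles: [...], stacks: [...], concerns: [...] }.
// An empty stacks/concerns list means "not specific to any stack/concern".
function composeExpertise(modules, { role, stack, concerns }) {
  return modules
    .filter((m) => m.roles.includes(role))
    .filter((m) => m.stacks.length === 0 || m.stacks.includes(stack))
    .filter((m) => m.concerns.length === 0 || m.concerns.some((c) => concerns.includes(c)))
    .map((m) => m.id);
}

const catalog = [
  { id: "flutter-patterns", roles: ["executor"], stacks: ["flutter"], concerns: [] },
  { id: "rtl-layout", roles: ["executor", "reviewer"], stacks: [], concerns: ["rtl"] },
  { id: "mobile-antipatterns", roles: ["executor", "reviewer"], stacks: ["flutter"], concerns: [] },
  { id: "django-orm", roles: ["executor"], stacks: ["django"], concerns: [] },
];

const task = { stack: "flutter", concerns: ["rtl"] };
console.log(composeExpertise(catalog, { role: "executor", ...task }));
console.log(composeExpertise(catalog, { role: "reviewer", ...task }));
```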
28
+
29
+ ## 7. Review Loops
30
+
31
+ Multi-pass adversarial review at every pipeline checkpoint — not a single rubber-stamp at the end. Research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions. Findings are resolved before advancing. The reviewer is an adversary, not a cheerleader.
32
+
33
+ ## 8. Continuous Learning
34
+
35
+ Wazir evolves from its own mistakes. Review findings, audit findings, and user corrections feed into a learning system. Recurring issues become accepted learnings injected into future runs. A drift budget prevents learned behavior from diverging too far from the original design. The agent that builds your 10th feature is better than the one that built your 1st.
36
+
37
+ ## 9. Antipatterns
38
+
39
+ A first-class antipattern catalog loaded into reviewer context BEFORE domain expertise. Catches AI-specific failure modes: fake completion, unwired abstractions, shallow tests, security theater, architecture drift. The reviewer's first lens is "what could go wrong" — not "does this look right."
40
+
41
+ ## 10. Multi-Host
42
+
43
+ One canonical source, four host exports. Wazir works on Claude Code, Codex, Gemini, and Cursor from a single `wazir export build`. Roles, workflows, skills, and expertise are written once and compiled into each host's native format. Switch hosts without rewriting your engineering process.
44
+
45
+ ## 11. Context Efficiency
46
+
47
+ AI agents waste most of their context window on brute-force file reads and verbose command output. Wazir's routing hook auto-routes large commands through context-mode. The index provides symbol-first exploration — query first, read only what's needed. Capture routing redirects large output to files. Result: 60-80% token reduction on exploration-heavy phases. The agent thinks more, reads less.
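The capture-routing idea can be sketched as a function that keeps small output inline and replaces large output with a short preview plus a pointer to the saved file. This is not Wazir's routing hook. The `save` callback, the 40-line threshold, and the `captures/test-output.txt` path are hypothetical example values.

```javascript
// Hypothetical sketch of capture routing; not Wazir's actual hook.
// `save` persists the full text and returns where it was written.
function routeOutput(output, save, maxInlineLines = 40, previewLines = 5) {
  const lines = output.split("\n");
  if (lines.length <= maxInlineLines) {
    return output; // small output stays inline in the agent's context
  }
  const location = save(output); // full text goes to a file instead
  const preview = lines.slice(0, previewLines);
  preview.push(`... ${lines.length - previewLines} more lines saved to ${location}`);
  return preview.join("\n");
}

// Usage: 200 lines of test output shrink to a 5-line preview plus a pointer.
const testLog = Array.from({ length: 200 }, (_, i) => `test ${i} passed`).join("\n");
const inContext = routeOutput(testLog, () => "captures/test-output.txt");
console.log(inContext);
```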
48
+
49
+ ## 12. Verification Before Completion
50
+
51
+ No success claims without evidence. The verify phase produces deterministic proof — test results, lint output, type-check results — not "I believe it works." Every completion claim is backed by a command that was actually run and output that was actually checked. Evidence before assertions, always.
52
+
53
+ ## 13. Gating Agent
54
+
55
+ Autonomous phase transition decisions. After each phase, a gating agent reads the phase report and decides: continue (all gates pass), loop back (specific failures with fix paths), or escalate to human (ambiguous trade-offs, scope changes). Default posture: escalate. The pipeline doesn't blindly advance — it stops when it should stop.
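The three outcomes and the default posture can be expressed as a small decision function. The actual gating agent reads a phase report in prose. This sketch assumes a structured report of the form `{ gates: [{ name, passed, fixPath }], scopeChanged, ambiguousTradeoffs }`. Both `decideTransition` and that shape are hypothetical.

```javascript
// Hypothetical sketch of the gating decision; the report shape is assumed.
function decideTransition(report) {
  // Scope changes and ambiguous trade-offs always go to a human.
  if (report.scopeChanged || report.ambiguousTradeoffs) {
    return { action: "escalate", reason: "needs a human decision" };
  }
  const gates = Array.isArray(report.gates) ? report.gates : [];
  if (gates.length === 0) {
    // No gate evidence at all: fall back to the default posture.
    return { action: "escalate", reason: "no gate results in report" };
  }
  const failing = gates.filter((g) => !g.passed);
  if (failing.length === 0) {
    return { action: "continue" };
  }
  if (failing.every((g) => Boolean(g.fixPath))) {
    return { action: "loop_back", fixes: failing.map((g) => g.fixPath) };
  }
  // Default posture: escalate when any failure lacks a known fix path.
  return { action: "escalate", reason: "failure without a fix path" };
}

console.log(decideTransition({ gates: [{ name: "tests", passed: true }] }).action); // continue
console.log(decideTransition({ gates: [{ name: "lint", passed: false, fixPath: "re-run formatter" }] }).action); // loop_back
console.log(decideTransition({ gates: [{ name: "tests", passed: false }] }).action); // escalate
```

Every branch that cannot justify `continue` or `loop_back` falls through to `escalate`. That ordering is what "default posture: escalate" means in this sketch.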
56
+
57
+ ## 14. Humanize
58
+
59
+ Anti-AI-writing patterns across all text output. A vocabulary blacklist, domain-specific rules, and a self-audit checklist ensure that specs, plans, code comments, commit messages, and documentation read like they were written by a human engineer — not generated by an LLM. Because AI-sounding output erodes trust.