@wazir-dev/cli 1.0.0 → 1.2.0

Files changed (163)
  1. package/CHANGELOG.md +100 -2
  2. package/README.md +6 -6
  3. package/docs/concepts/architecture.md +1 -1
  4. package/docs/concepts/roles-and-workflows.md +2 -0
  5. package/docs/concepts/why-wazir.md +59 -0
  6. package/docs/decisions/2026-03-19-deferred-items.md +564 -0
  7. package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
  8. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
  9. package/docs/readmes/INDEX.md +21 -5
  10. package/docs/readmes/features/expertise/README.md +2 -2
  11. package/docs/readmes/features/exports/README.md +2 -2
  12. package/docs/readmes/features/schemas/README.md +3 -0
  13. package/docs/readmes/features/skills/README.md +17 -0
  14. package/docs/readmes/features/skills/clarifier.md +5 -0
  15. package/docs/readmes/features/skills/claude-cli.md +5 -0
  16. package/docs/readmes/features/skills/codex-cli.md +5 -0
  17. package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
  18. package/docs/readmes/features/skills/executing-plans.md +5 -0
  19. package/docs/readmes/features/skills/executor.md +5 -0
  20. package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
  21. package/docs/readmes/features/skills/gemini-cli.md +5 -0
  22. package/docs/readmes/features/skills/humanize.md +5 -0
  23. package/docs/readmes/features/skills/init-pipeline.md +5 -0
  24. package/docs/readmes/features/skills/receiving-code-review.md +5 -0
  25. package/docs/readmes/features/skills/requesting-code-review.md +5 -0
  26. package/docs/readmes/features/skills/reviewer.md +5 -0
  27. package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
  28. package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
  29. package/docs/readmes/features/skills/wazir.md +5 -0
  30. package/docs/readmes/features/skills/writing-skills.md +5 -0
  31. package/docs/readmes/features/workflows/prepare-next.md +1 -1
  32. package/docs/reference/configuration-reference.md +47 -6
  33. package/docs/reference/launch-checklist.md +4 -4
  34. package/docs/reference/review-loop-pattern.md +538 -0
  35. package/docs/reference/roles-reference.md +1 -0
  36. package/docs/reference/skill-tiers.md +147 -0
  37. package/docs/reference/tooling-cli.md +5 -1
  38. package/docs/truth-claims.yaml +18 -0
  39. package/expertise/antipatterns/process/ai-coding-antipatterns.md +97 -1
  40. package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
  41. package/exports/hosts/claude/.claude/agents/designer.md +3 -0
  42. package/exports/hosts/claude/.claude/agents/executor.md +2 -0
  43. package/exports/hosts/claude/.claude/agents/planner.md +3 -0
  44. package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
  45. package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
  46. package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
  47. package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
  48. package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
  49. package/exports/hosts/claude/.claude/commands/design.md +4 -0
  50. package/exports/hosts/claude/.claude/commands/discover.md +4 -0
  51. package/exports/hosts/claude/.claude/commands/execute.md +4 -0
  52. package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
  53. package/exports/hosts/claude/.claude/commands/plan.md +4 -0
  54. package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
  55. package/exports/hosts/claude/.claude/commands/specify.md +4 -0
  56. package/exports/hosts/claude/.claude/commands/verify.md +4 -0
  57. package/exports/hosts/claude/.claude/settings.json +9 -0
  58. package/exports/hosts/claude/CLAUDE.md +1 -1
  59. package/exports/hosts/claude/export.manifest.json +22 -20
  60. package/exports/hosts/claude/host-package.json +3 -1
  61. package/exports/hosts/codex/AGENTS.md +1 -1
  62. package/exports/hosts/codex/export.manifest.json +22 -20
  63. package/exports/hosts/codex/host-package.json +3 -1
  64. package/exports/hosts/cursor/.cursor/hooks.json +4 -0
  65. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
  66. package/exports/hosts/cursor/export.manifest.json +22 -20
  67. package/exports/hosts/cursor/host-package.json +3 -1
  68. package/exports/hosts/gemini/GEMINI.md +1 -1
  69. package/exports/hosts/gemini/export.manifest.json +22 -20
  70. package/exports/hosts/gemini/host-package.json +3 -1
  71. package/hooks/context-mode-router +191 -0
  72. package/hooks/definitions/context_mode_router.yaml +19 -0
  73. package/hooks/definitions/loop_cap_guard.yaml +1 -1
  74. package/hooks/hooks.json +43 -0
  75. package/hooks/protected-path-write-guard +8 -0
  76. package/hooks/routing-matrix.json +45 -0
  77. package/hooks/session-start +62 -1
  78. package/llms-full.txt +905 -132
  79. package/package.json +3 -3
  80. package/roles/clarifier.md +3 -0
  81. package/roles/designer.md +3 -0
  82. package/roles/executor.md +2 -0
  83. package/roles/planner.md +3 -0
  84. package/roles/researcher.md +2 -0
  85. package/roles/reviewer.md +5 -1
  86. package/roles/specifier.md +3 -0
  87. package/schemas/hook.schema.json +2 -1
  88. package/schemas/phase-report.schema.json +80 -0
  89. package/schemas/usage.schema.json +25 -1
  90. package/schemas/wazir-manifest.schema.json +19 -0
  91. package/skills/brainstorming/SKILL.md +20 -56
  92. package/skills/clarifier/SKILL.md +243 -0
  93. package/skills/claude-cli/SKILL.md +320 -0
  94. package/skills/codex-cli/SKILL.md +260 -0
  95. package/skills/debugging/SKILL.md +24 -1
  96. package/skills/design/SKILL.md +13 -0
  97. package/skills/dispatching-parallel-agents/SKILL.md +13 -0
  98. package/skills/executing-plans/SKILL.md +28 -2
  99. package/skills/executor/SKILL.md +129 -0
  100. package/skills/finishing-a-development-branch/SKILL.md +13 -0
  101. package/skills/gemini-cli/SKILL.md +260 -0
  102. package/skills/humanize/SKILL.md +13 -0
  103. package/skills/init-pipeline/SKILL.md +76 -78
  104. package/skills/prepare-next/SKILL.md +81 -10
  105. package/skills/receiving-code-review/SKILL.md +21 -0
  106. package/skills/requesting-code-review/SKILL.md +38 -5
  107. package/skills/reviewer/SKILL.md +423 -0
  108. package/skills/run-audit/SKILL.md +13 -0
  109. package/skills/scan-project/SKILL.md +13 -0
  110. package/skills/self-audit/SKILL.md +197 -16
  111. package/skills/subagent-driven-development/SKILL.md +38 -2
  112. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
  113. package/skills/subagent-driven-development/implementer-prompt.md +8 -0
  114. package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
  115. package/skills/tdd/SKILL.md +21 -0
  116. package/skills/using-git-worktrees/SKILL.md +13 -0
  117. package/skills/using-skills/SKILL.md +13 -0
  118. package/skills/verification/SKILL.md +13 -0
  119. package/skills/wazir/SKILL.md +286 -262
  120. package/skills/writing-plans/SKILL.md +44 -4
  121. package/skills/writing-skills/SKILL.md +13 -0
  122. package/templates/artifacts/implementation-plan.md +3 -0
  123. package/templates/artifacts/tasks-template.md +133 -0
  124. package/templates/examples/phase-report.example.json +48 -0
  125. package/templates/examples/wazir-manifest.example.yaml +1 -1
  126. package/tooling/src/adapters/composition-engine.js +256 -0
  127. package/tooling/src/adapters/model-router.js +84 -0
  128. package/tooling/src/capture/command.js +111 -2
  129. package/tooling/src/capture/run-config.js +23 -0
  130. package/tooling/src/capture/store.js +24 -0
  131. package/tooling/src/capture/usage.js +106 -0
  132. package/tooling/src/checks/ac-matrix.js +256 -0
  133. package/tooling/src/checks/brand-truth.js +3 -6
  134. package/tooling/src/checks/command-registry.js +13 -0
  135. package/tooling/src/checks/docs-truth.js +1 -1
  136. package/tooling/src/checks/runtime-surface.js +3 -7
  137. package/tooling/src/checks/skills.js +111 -0
  138. package/tooling/src/cli.js +17 -3
  139. package/tooling/src/commands/stats.js +161 -0
  140. package/tooling/src/commands/validate.js +5 -1
  141. package/tooling/src/export/compiler.js +33 -37
  142. package/tooling/src/gating/agent.js +145 -0
  143. package/tooling/src/guards/phase-prerequisite-guard.js +127 -0
  144. package/tooling/src/hooks/routing-logic.js +69 -0
  145. package/tooling/src/init/auto-detect.js +260 -0
  146. package/tooling/src/init/command.js +161 -0
  147. package/tooling/src/input/scanner.js +46 -0
  148. package/tooling/src/reports/command.js +103 -0
  149. package/tooling/src/reports/phase-report.js +323 -0
  150. package/tooling/src/state/command.js +160 -0
  151. package/tooling/src/state/db.js +287 -0
  152. package/tooling/src/status/command.js +53 -1
  153. package/wazir.manifest.yaml +26 -17
  154. package/workflows/clarify.md +4 -0
  155. package/workflows/design-review.md +4 -0
  156. package/workflows/design.md +4 -0
  157. package/workflows/discover.md +4 -0
  158. package/workflows/execute.md +4 -0
  159. package/workflows/plan-review.md +4 -0
  160. package/workflows/plan.md +4 -0
  161. package/workflows/spec-challenge.md +4 -0
  162. package/workflows/specify.md +4 -0
  163. package/workflows/verify.md +4 -0
@@ -0,0 +1,300 @@
1
+ # Wazir Enhancement Decisions — 2026-03-19
2
+
3
+ Decisions agreed upon during brainstorming session. Each item is implementation-ready.
4
+
5
+ ## Research Summary
6
+
7
+ | # | Decision | Online Research Needed? | What to Research |
8
+ |---|----------|------------------------|------------------|
9
+ | 1 | Smart context-mode routing | No | Internal implementation — all info is local |
10
+ | 2 | Enforce wazir index | No | Internal tooling, no external deps |
11
+ | 3 | Enforce context-mode for large output | No | Internal, paired with #1 |
12
+ | 4 | Track context savings metrics | No | Metrics design, internal |
13
+ | 5 | Three-tier skill strategy | **Yes** | Blocked on R1 + R2 |
14
+ | 6 | Rich phase reports + gating agent | **Yes** | How other autonomous agent systems handle phase gating, agent self-evaluation patterns, prior art on LLM confidence calibration |
15
+ | 7 | Continuous learning + user input capture | **Yes** | Continuous learning in agent systems, feedback loop patterns, drift prevention, how reinforcement from human feedback is stored and applied |
16
+ | 8 | Autoresearch self-improvement | No | Already researched — decision made |
17
+ | 9 | Composer task-specific agents | **Yes** | Prompt composition patterns, multi-module prompt assembly, token budget strategies for large context injection |
18
+ | R1 | Superpowers skill audit | **Yes** | Latest superpowers GitHub, changelog, roadmap, community health, skill extensibility |
19
+ | R2 | Skill composition infrastructure | **Yes** | Claude Code plugin ecosystem patterns for skill chaining/extension, existing RFCs or discussions |
20
+
21
+ ---
22
+
23
+ ## Agreed
24
+
25
+ ### 1. Smart context-mode routing for Bash commands
26
+
27
+ **Decision:** Not full enforcement — smart routing via a PreToolUse:Bash hook.
28
+
29
+ - Small-output commands (git status, ls, short queries) pass through native Bash — no latency tax
30
+ - Known-large-output patterns (test runners, builds, logs, diffs, dependency trees) auto-route through `batch_execute`
31
+ - Threshold: commands whose output routinely exceeds ~30-50 lines
32
+ - Index + context-mode is the preferred research path: index for *where*, context-mode for *what*
33
+ - Skills should still be able to explicitly opt in/out when they know better than the heuristic
34
+
35
+ ### 2. Enforce wazir index for codebase exploration
36
+
37
+ **Decision:** All codebase exploration MUST use `wazir index` as the first step.
38
+
39
+ - Never spawn heavyweight exploration agents that brute-force read dozens of files
40
+ - Flow: `wazir index search-symbols` / `wazir recall` → locate targets → then read only what's needed
41
+ - Subagents and skills must query the index before falling back to direct file reads
42
+ - If no index exists, build one first (`wazir index build && wazir index summarize --tier all`)
43
+
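Editor's sketch (not part of the package): the index-first flow in decision 2 could look roughly like the following. The `tools` object and its four methods are hypothetical stand-ins for wrappers around `wazir index build`, `wazir index search-symbols`, and plain file reads; the diff does not show how the real tooling exposes them.

```javascript
// Index-first exploration: consult the index, then read only the files it points to.
// `tools` is a hypothetical adapter; none of these method names come from the package.
function explore(query, tools) {
  // Build the index first if none exists.
  if (!tools.hasIndex()) {
    tools.buildIndex();
  }
  // Locate targets through the index instead of scanning files.
  const hits = tools.searchIndex(query);
  const files = [...new Set(hits.map((hit) => hit.file))];
  // Read only what the index located.
  return files.map((file) => ({ file, content: tools.readFile(file) }));
}
```

Passing the tools in as an argument keeps the sketch testable without a real index on disk.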
44
+ ### 3. Enforce context-mode for large-output commands
45
+
46
+ **Decision:** Context-mode is mandatory for commands with routinely large output.
47
+
48
+ **Must use context-mode (`batch_execute` / `execute_file`):**
49
+ - Test runners (npm test, vitest, jest, pytest, etc.)
50
+ - Build commands (npm run build, tsc, etc.)
51
+ - Dependency trees (npm ls, pip list, etc.)
52
+ - Large git diffs (`git diff` with many files)
53
+ - Log tailing / large file reads (>50 lines)
54
+ - Linting / static analysis output
55
+ - CI/CD output parsing
56
+
57
+ **Pass through native Bash:**
58
+ - git status, git log (short), git branch
59
+ - ls, pwd, mkdir, cp, mv
60
+ - wazir CLI commands with short output (doctor, index build, capture)
61
+ - Any command known to produce <30 lines
62
+
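Editor's sketch (not part of the package): one way the routing heuristic from decisions 1 and 3 might be expressed. The pattern lists paraphrase the two lists above; the function name, the return shape, and the choice to let unknown commands pass through natively are assumptions, and this is not the logic in `hooks/context-mode-router`, whose content is not shown here.

```javascript
// Hypothetical pattern lists; a real hook would probably load them from configuration.
const LARGE_OUTPUT_PATTERNS = [
  /^(npm test|vitest|jest|pytest)\b/, // test runners
  /^(npm run build|tsc)\b/,           // build commands
  /^(npm ls|pip list)\b/,             // dependency trees
  /^git diff\b/,                      // assumption: every git diff is treated as large
];

const SMALL_OUTPUT_PATTERNS = [
  /^git (status|branch)\b/,
  /^(ls|pwd|mkdir|cp|mv)\b/,
  /^wazir (doctor|index build|capture)\b/,
];

// Returns where a Bash command should run and why.
// `override` lets a skill opt in or out explicitly, as decision 1 requires.
function routeCommand(command, override) {
  if (override === "context-mode" || override === "native") {
    return { route: override, reason: "override" };
  }
  const trimmed = command.trim();
  if (LARGE_OUTPUT_PATTERNS.some((re) => re.test(trimmed))) {
    return { route: "context-mode", reason: "known-large" };
  }
  if (SMALL_OUTPUT_PATTERNS.some((re) => re.test(trimmed))) {
    return { route: "native", reason: "known-small" };
  }
  // Assumption: unknown commands avoid the latency tax and pass through.
  return { route: "native", reason: "default" };
}
```

Returning a reason alongside the route makes each decision auditable, which may matter once routing feeds the usage metrics in decision 4.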
63
+ ### 4. Track context savings metrics
64
+
65
+ **Decision:** Every index query and context-mode invocation must update a running usage counter.
66
+
67
+ - Track per-session: queries made, estimated tokens saved, bytes avoided in context
68
+ - Track per-tool breakdown: index lookups vs. `execute_file` vs. `batch_execute` vs. `fetch_and_index`
69
+ - Store in run state (e.g., `.wazir/runs/<id>/usage.json` or equivalent)
70
+ - Surface via `wazir status` or a dedicated `wazir stats` subcommand
71
+ - If no tracking mechanism exists yet, build one — we can't optimize what we don't measure
72
+
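Editor's sketch (not part of the package): a minimal in-memory version of the running usage counter described in decision 4. The field names and the bytes-to-tokens ratio are assumptions; the actual `tooling/src/capture/usage.js` and `schemas/usage.schema.json` are listed in this diff but their contents are not shown.

```javascript
// Running usage counter with a per-tool breakdown.
function createUsageTracker() {
  const usage = { queries: 0, estimatedTokensSaved: 0, bytesAvoided: 0, byTool: {} };
  return {
    // rawBytes: size of the output that would have entered context;
    // returnedBytes: size of what actually entered context.
    record(tool, rawBytes, returnedBytes) {
      const avoided = Math.max(0, rawBytes - returnedBytes);
      usage.queries += 1;
      usage.bytesAvoided += avoided;
      // Assumption: roughly 4 bytes per token; a real tracker may use a tokenizer.
      usage.estimatedTokensSaved += Math.round(avoided / 4);
      const entry = usage.byTool[tool] ?? (usage.byTool[tool] = { queries: 0, bytesAvoided: 0 });
      entry.queries += 1;
      entry.bytesAvoided += avoided;
    },
    // Returns a copy that could be written to the run-state usage file.
    snapshot() {
      return JSON.parse(JSON.stringify(usage));
    },
  };
}
```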
73
+ ### 5. Three-tier skill strategy — delegate, augment, own
74
+
75
+ **Decision:** Stop forking superpowers skills wholesale. Categorize each into one of three tiers:
76
+
77
+ | Tier | Strategy | Naming |
78
+ |------|----------|--------|
79
+ | **Delegate** | Use superpowers skill as-is. Delete Wazir fork. | `superpowers:<name>` only |
80
+ | **Augment** | Invoke superpowers skill + inject a Wazir `CONTEXT.md` addendum (additive only, no overrides) | `superpowers:<name>` invoked with Wazir context |
81
+ | **Own** | Wazir-original or structurally rewritten skill. Rename to avoid conflict with superpowers. | `wz:<unique-name>` only |
82
+
83
+ **Rules:**
84
+ - Augment addenda must be strictly additive — no "replace step N" or "ignore this part"
85
+ - Owned skills must have a distinct name from any superpowers skill to prevent dual-registration confusion
86
+ - Delegated skills: delete the Wazir `skills/<name>/` directory entirely
87
+
88
+ **Status:** Blocked on Research Phase (see below).
89
+
90
+ ---
91
+
92
+ ## Research Required
93
+
94
+ ### R1. Superpowers skill audit and tier classification
95
+
96
+ **Why this matters:**
97
+ We're choosing between maintaining our own forks (current approach — high maintenance, falls behind upstream) and delegating to a well-maintained plugin (lower maintenance, auto-updates, but less control). Getting this wrong means either: (a) we fork everything and slowly diverge from improvements the superpowers community ships, or (b) we delegate too much and lose Wazir-specific behavior that matters. This is an architectural decision that affects every future skill interaction, so it must be evidence-based.
98
+
99
+ **What the research must cover:**
100
+
101
+ 1. **Full superpowers skill inventory (online)**
102
+ - Fetch the latest superpowers plugin source (GitHub/marketplace) — don't rely on our cached v4.3.1, it may be outdated
103
+ - Document every skill: name, purpose, structure, key behaviors
104
+ - Check the superpowers changelog/releases for skill evolution pace — are these skills actively improved or stable?
105
+
106
+ 2. **Skill-by-skill diff analysis**
107
+ - For each superpowers skill that has a Wazir counterpart: what exactly did Wazir change?
108
+ - Classify each change as:
109
+ - **Additive** — Wazir adds context/tooling but doesn't contradict superpowers behavior (→ Augment tier candidate)
110
+ - **Structural** — Wazir rewrites core logic, steps, or output format (→ Own tier candidate)
111
+ - **Cosmetic** — just naming/formatting, no behavioral difference (→ Delegate tier candidate)
112
+
113
+ 3. **Superpowers skills with NO Wazir counterpart**
114
+ - Are there superpowers skills we're not using but should be?
115
+ - Are there skills we could delegate to that we're currently handling ad-hoc?
116
+
117
+ 4. **Community and maintenance posture**
118
+ - How frequently does superpowers publish updates?
119
+ - Is there a public roadmap or skill deprecation policy?
120
+ - Are there breaking changes between versions that would affect our augment addenda?
121
+ - What's the plugin's approach to skill extensibility — do they support context injection natively or is that something we'd need to build?
122
+
123
+ 5. **Skill composition patterns in the ecosystem**
124
+ - How do other projects handle "use plugin X's skill but with my context"?
125
+ - Is there an established pattern for skill chaining/augmentation in Claude Code plugins?
126
+ - What are the failure modes — prompt priority conflicts, version drift, context bloat?
127
+
128
+ 6. **Risk analysis**
129
+ - What happens if superpowers changes a skill we depend on in Augment tier?
130
+ - What's our rollback path if delegation breaks a workflow?
131
+ - How do we test that augmented skills still work after an upstream update?
132
+
133
+ **How to execute:**
134
+ - Online research: superpowers GitHub repo, marketplace listing, changelogs, issues, discussions
135
+ - Local analysis: diff every Wazir skill against its superpowers counterpart (cached + latest)
136
+ - Output: a classification table with tier assignment, rationale, and risk notes per skill
137
+
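Editor's sketch (not part of the package): the classification table that R1 asks for could be derived mechanically from the change kinds listed in item 2. The rule that the strongest change kind decides the tier, and that a skill with no recorded changes is a Delegate candidate, are assumptions made for illustration.

```javascript
// Change kind to tier candidate, as listed under "Skill-by-skill diff analysis".
const TIER_BY_CHANGE = { cosmetic: "delegate", additive: "augment", structural: "own" };
// Assumption: when a skill has mixed changes, the strongest tier wins.
const TIER_RANK = { delegate: 0, augment: 1, own: 2 };

function classifySkill(name, changeKinds) {
  let tier = "delegate"; // assumption: no behavioral changes means delegate
  for (const kind of changeKinds) {
    const candidate = TIER_BY_CHANGE[kind];
    if (!candidate) {
      throw new Error(`Unknown change kind: ${kind}`);
    }
    if (TIER_RANK[candidate] > TIER_RANK[tier]) {
      tier = candidate;
    }
  }
  return { name, tier };
}
```

A real audit would still attach the rationale and risk notes per skill; this only covers the tier column.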
138
+ **Principle:** Do the right thing, not the easy thing. If the research shows we should own more skills than expected, we own them. If it shows we should delegate almost everything, we delegate. Follow the evidence.
139
+
140
+ ---
141
+
142
+ ### R2. Skill composition infrastructure design
143
+
144
+ **Why this matters:**
145
+ The Augment tier needs a mechanism to invoke an external skill with Wazir-specific context injected. A thin wrapper per skill is the easy path — but it recreates the maintenance problem we're trying to solve (one more file per skill that can drift). The right solution is a composition system that's declarative, testable, and resilient to upstream changes.
146
+
147
+ **What the research must cover:**
148
+
149
+ 1. **Composition model design**
150
+ - How should a composed skill be declared? Options:
151
+ - A manifest entry: `{ base: "superpowers:tdd", augment: "wazir-context/tdd.md" }`
152
+ - A skill resolver that chains skills at invocation time
153
+ - A hook-based approach: PostSkillLoad injects context automatically
154
+ - Which model keeps the augmentation visible and auditable (no hidden magic)?
155
+ - How does the composed skill appear in the skill list — as one entry or two?
156
+
157
+ 2. **Context injection semantics**
158
+ - Where does the Wazir context go relative to the base skill? Before? After? Interleaved?
159
+ - Prompt priority: if the base skill says "write output to X" and the context says "write output to Y", which wins? We need a clear rule, not ambiguity.
160
+ - How do we prevent addenda from accidentally overriding base behavior? (Lint rule? Structural constraint?)
161
+
162
+ 3. **Version pinning and drift detection**
163
+ - Should we pin the superpowers version we augment against?
164
+ - How do we detect when an upstream skill change breaks our addendum? (CI check? Hash comparison?)
165
+ - What's the upgrade path when superpowers ships a new version?
166
+
167
+ 4. **Testing surface**
168
+ - How do we test that a composed skill (base + addendum) produces the right behavior?
169
+ - Can we diff the resolved prompt to verify no conflicts?
170
+ - Should there be integration tests that run composed skills against known scenarios?
171
+
172
+ 5. **Ecosystem research (online)**
173
+ - How do other Claude Code plugins handle skill extension/composition?
174
+ - Are there existing RFCs, discussions, or patterns in the Claude Code plugin ecosystem for this?
175
+ - Does superpowers itself have any extension mechanism planned?
176
+
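Editor's sketch (not part of the package): the manifest-entry option from item 1, combined with a lint check for the "strictly additive" rule. Appending the addendum after the base skill is only one of the placements item 2 leaves open, and the override phrases, function names, and `loadText` loader are hypothetical.

```javascript
// Phrases taken from the rule "no 'replace step N' or 'ignore this part'".
// Assumption: a real lint rule would need a broader, reviewed list.
const OVERRIDE_PATTERNS = [/\breplace step\b/i, /\bignore this part\b/i];

// Returns the addendum lines that look like overrides rather than additions.
function lintAddendum(addendumText) {
  return addendumText
    .split("\n")
    .map((text, index) => ({ line: index + 1, text }))
    .filter(({ text }) => OVERRIDE_PATTERNS.some((re) => re.test(text)));
}

// entry has the shape { base: "superpowers:tdd", augment: "wazir-context/tdd.md" }.
// loadText is a hypothetical loader that maps either identifier to its text.
function composeSkill(entry, loadText) {
  const base = loadText(entry.base);
  const addendum = loadText(entry.augment);
  const violations = lintAddendum(addendum);
  if (violations.length > 0) {
    throw new Error(`Addendum ${entry.augment} is not additive (line ${violations[0].line})`);
  }
  // Assumption: context goes after the base skill, under a visible heading.
  return `${base}\n\n## Wazir context (additive)\n\n${addendum}`;
}
```

Keeping the addendum under its own heading is one way to make the augmentation visible in the resolved prompt, which also makes it easy to diff.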
177
+ **Implementation: blocked on R1.** The number of skills landing in Augment tier determines whether this infrastructure is justified. If R1 shows ≤2 augmented skills, a simple approach may suffice. If ≥5, build it properly.
178
+
179
+ **Principle:** Design now, build after evidence. Don't over-engineer, don't under-engineer — right-size to R1 results.
180
+
181
+ ---
182
+
183
+ ## Under Discussion
184
+
185
+ ### 6. Rich phase reports + three-way gating agent
186
+
187
+ **Decision:** Two parts — rich reports and a gating agent with three possible outputs.
188
+
189
+ **Part 1: Rich phase reports**
190
+ - Current reports are too thin to be actionable. Rebuild them to include:
191
+ - What was attempted and what the outcome was
192
+ - What succeeded, what failed, what's uncertain
193
+ - Drift from original intent / spec
194
+ - Quality metrics (test results, coverage, lint, type-checking)
195
+ - Risk flags and open questions
196
+ - Decisions made and their rationale
197
+ - Reports saved to file for Wazir self-improvement and auditability
198
+
199
+ **Part 2: Gating agent (three-way output)**
200
+ - Agent receives: user's original input, the phase report, and accumulated decisions
201
+ - Agent outputs ONE of three verdicts:
202
+ - **Continue** — proceed to next phase
203
+ - **Loop back** — return to current phase with specific fixes
204
+ - **Escalate to human** — agent cannot decide, needs human judgment
205
+
206
+ **Explicit criteria (not vibes):**
207
+
208
+ | Verdict | Criteria |
209
+ |---------|----------|
210
+ | Continue | All quality gates pass, no drift from spec, no open risks, no ambiguous trade-offs |
211
+ | Loop back | Specific failures identified, actionable fix path exists, no human judgment needed |
212
+ | Escalate | Ambiguous trade-off, scope change detected, conflicting signals, confidence below threshold, or any situation where two reasonable people could disagree |
213
+
214
+ **Critical design constraint:** The escalation criteria must be explicit and err toward escalating. If not codified, the agent will almost never escalate — LLMs are bad at recognizing their own uncertainty. Default posture: **when in doubt, escalate.**
215
+
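Editor's sketch (not part of the package): the verdict table could be codified along these lines, checking the escalation criteria first so the function errs toward escalating. The report field names and the 0.8 threshold are assumptions; they are not taken from `schemas/phase-report.schema.json` or `tooling/src/gating/agent.js`, whose contents are not shown in this diff.

```javascript
// Returns "continue", "loop_back", or "escalate" for a phase report.
function decideVerdict(report, confidenceThreshold = 0.8) {
  // Escalation criteria are checked first. A missing or non-numeric
  // confidence fails the >= comparison and therefore escalates.
  const needsHuman =
    Boolean(report.ambiguousTradeOff) ||
    Boolean(report.scopeChange) ||
    Boolean(report.conflictingSignals) ||
    !(report.confidence >= confidenceThreshold);
  if (needsHuman) return "escalate";

  const failures = report.failures ?? [];
  if (failures.length > 0) {
    // Loop back only when every failure has an actionable fix path.
    return failures.every((f) => Boolean(f.fixPath)) ? "loop_back" : "escalate";
  }

  const clean =
    report.qualityGatesPass === true &&
    report.specDrift === false &&
    Array.isArray(report.openRisks) &&
    report.openRisks.length === 0;
  // Anything that is not demonstrably clean goes to a human.
  return clean ? "continue" : "escalate";
}
```

Note that "continue" is the only verdict that requires every field to be present and clean; any gap in the report falls through to "escalate".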
216
+ ---
217
+
218
+ ### 7. Restore continuous learning loop + capture all user input
219
+
220
+ **Decision (parked — will circle back):**
221
+
222
+ **Part 1: Continuous learning**
223
+ - The old Wazir implementation had a final step that applied learnings from each run to future runs
224
+ - This must be restored — every completed run should extract what worked, what failed, and what was learned, and feed it forward
225
+ - Learning is cumulative across runs, not just within a single run
226
+
227
+ **Part 2: User input as learning signal**
228
+ - ALL user input during a run must be saved (not just the final output)
229
+ - User corrections, approvals, rejections, feedback, and mid-run redirections are the highest-quality training signal
230
+ - This feeds both the continuous learning loop and the phase reports (decision #6)
231
+
232
+ **Status:** Parked. Will design after current discussion topics are resolved.
233
+
234
+ ---
235
+
236
+ ### 8. Autoresearch pattern for Wazir self-improvement
237
+
238
+ **Decision:** Use autoresearch loop on Wazir itself, but with strict identity boundaries.
239
+
240
+ **Core risk:** A self-modifying system running overnight can drift Wazir into a completely different project. Each change passes the metric, but after 100 iterations the project's identity is gone. Skills define what Wazir *is* — an agent must not rewrite them unsupervised.
241
+
242
+ **The line: if changing it changes what Wazir *does*, a human decides. If it makes Wazir do the same thing better, loop it.**
243
+
244
+ **CAN loop overnight (mechanical, identity-safe):**
245
+ - Test coverage — add tests, never rewrite behavior
246
+ - Bug fixes for known, specific, scoped issues
247
+ - Lint / type errors / code quality
248
+ - Performance — make existing behavior faster
249
+ - Export validation fixes
250
+ - Documentation gaps
251
+
252
+ **CANNOT loop overnight (identity-defining, human-gated):**
253
+ - Skill files
254
+ - Workflow definitions
255
+ - Architecture
256
+ - Role contracts
257
+ - Manifest schema
258
+ - Design docs / program.md
259
+
260
+ **Patterns to adopt from autoresearch:**
261
+ - Keep/discard via git revert → use in executor
262
+ - Mechanical metric requirement (measurable before/after) → enforce in phase reports
263
+ - STRIDE + OWASP structured audit loop → inform `wz:run-audit` design
264
+ - Scoped overnight runs with morning human review gate
265
+
266
+ **Implementation: Enhanced self-audit with bounded loop (5 iterations).**
267
+
268
+ - Enhance self-audit quality first (richer audit dimensions, better findings, smarter fixes)
269
+ - Then run it in a 5-loop cycle: each loop finds new issues exposed by the previous loop's fixes
270
+ - Bounded — no drift risk, human reviews the final branch
271
+ - Simpler than autoresearch integration, uses existing worktree isolation
272
+ - Priority: make each individual audit *good* — 5 loops of a strong audit beats 100 loops of a shallow one
273
+
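Editor's sketch (not part of the package): a bounded audit loop with an identity guard, following the 5-iteration cap and the human-gated list above. The path prefixes are hypothetical mappings of that list onto the package layout, `auditOnce` is a hypothetical callback, and stopping early when an iteration finds nothing is an assumption.

```javascript
// Hypothetical mapping of the identity-defining list onto paths.
const HUMAN_GATED = [
  "skills/",
  "workflows/",
  "roles/",
  "schemas/wazir-manifest.schema.json",
  "program.md",
];

function isHumanGated(path) {
  return HUMAN_GATED.some((p) => path === p || (p.endsWith("/") && path.startsWith(p)));
}

// auditOnce(iteration) is assumed to return { findings: [...], changedPaths: [...] }.
function runBoundedAudit(auditOnce, maxLoops = 5) {
  const history = [];
  for (let iteration = 1; iteration <= maxLoops; iteration += 1) {
    const { findings, changedPaths } = auditOnce(iteration);
    const blocked = changedPaths.filter(isHumanGated);
    history.push({ iteration, findings: findings.length, blocked });
    // Touching identity-defining files stops the loop for human review.
    if (blocked.length > 0) return { status: "needs-human", history };
    // Assumption: an iteration with no findings ends the cycle early.
    if (findings.length === 0) return { status: "clean", history };
  }
  return { status: "loop-cap-reached", history };
}
```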
274
+ **Open:** What specifically needs enhancing in self-audit before the loop is worthwhile?
275
+
276
+ ---
277
+
278
+ ### 9. Composer generates task-specific agents with full expertise in context
279
+
280
+ **Decision:** The composition engine must compose full expertise content into each dispatched agent's context — not just filenames or summaries.
281
+
282
+ **How it works:**
283
+ 1. Detect task stack + concerns (from project scan / user input)
284
+ 2. Resolve which expertise modules apply per role (composition-map.yaml — 4 layers: always → auto → stacks → concerns)
285
+ 3. **Compose the full content** of every resolved module into the agent's prompt
286
+ 4. Dispatch executor, reviewer, verifier — each with the complete relevant expertise internalized
287
+
288
+ **Key principle:** Loading expertise is additive, not restrictive. An agent with Flutter expertise loaded doesn't forget React — it additionally knows Flutter patterns and antipatterns. This is strictly better than a generic agent.
289
+
290
+ **What this means for the reviewer:** The reviewer gets the full antipattern catalog + domain-specific review dimensions composed into its context. It reviews against *everything it knows*, with task-specific expertise making it sharper, not narrower.
291
+
292
+ **Why this matters:** Expertise files are meaningless if they're not in the prompt. A filename reference or summary doesn't give the agent the actual knowledge. The full content must be in context for the agent to act on it.
293
+
294
+ **Open:** How does this interact with context window limits? The composition engine already enforces max 15 modules per dispatch with token budget — this constraint stays. The composer must be smart about what fits.
295
+
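Editor's sketch (not part of the package): resolving modules across the four layers (always, auto, stacks, concerns) under the 15-module cap and a token budget. The shape of `roleMap`, the per-module `tokens` field, and the default budget are assumptions; the real `composition-map.yaml` and `tooling/src/adapters/composition-engine.js` are not shown in this diff.

```javascript
// roleMap: { always: [...], auto: [...], stacks: { name: [...] }, concerns: { name: [...] } }
// Each module is assumed to look like { path: "expertise/...", tokens: 1234 }.
function resolveModules(roleMap, detected, { maxModules = 15, tokenBudget = 60000 } = {}) {
  // Layer order follows the source: always, then auto, then stacks, then concerns.
  const candidates = [
    ...(roleMap.always ?? []),
    ...(roleMap.auto ?? []),
    ...detected.stacks.flatMap((stack) => roleMap.stacks?.[stack] ?? []),
    ...detected.concerns.flatMap((concern) => roleMap.concerns?.[concern] ?? []),
  ];
  const selected = [];
  const seen = new Set();
  let tokens = 0;
  for (const mod of candidates) {
    if (seen.has(mod.path)) continue;             // a module may be listed in several layers
    if (selected.length >= maxModules) break;     // hard cap per dispatch
    if (tokens + mod.tokens > tokenBudget) continue; // skip what does not fit, try smaller ones
    seen.add(mod.path);
    selected.push(mod);
    tokens += mod.tokens;
  }
  return { selected, tokens };
}
```

The caller would then concatenate the full content of each selected module into the agent prompt, as step 3 requires, rather than passing paths or summaries.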
296
+ ---
297
+
298
+ ## Rejected
299
+
300
+ *(nothing yet)*
@@ -29,7 +29,7 @@ Before starting any implementation, verify all of the following:
29
29
  - [ ] **Node.js >= 20.0.0** installed
30
30
  - [ ] **`npm test` passes on the clean branch** with zero failures
31
31
  - [ ] **`wazir export --check` passes** on the clean branch (no pre-existing drift)
32
- - [ ] **All 13 task spec files reviewed** in `.agent-os/tasks/clarified/` (004-016)
32
+ - [ ] **All 13 task spec files reviewed** in `.wazir/tasks/clarified/` (004-016)
33
33
  - [ ] **`tooling/src/capture/command.js` imports confirmed:** `fs` (line 1) and `path` (line 2) are already imported -- no additional module imports needed for task 006
34
34
  - [ ] **`tooling/test/capture.test.js` fixture pattern confirmed:** `createCaptureFixture()` provides `fixtureRoot`, `stateRoot`, and `cleanup()` -- new tests must use unique run IDs
35
35
  - [ ] **`tooling/test/role-contracts.test.js` is in `test:active`** -- confirmed, so workflow and role structural tests can be added there without new test file registration
@@ -1,6 +1,6 @@
1
1
  # Wazir README Index
2
2
 
3
- > 60 world-class README files covering every feature, workflow, role, skill, hook, and package.
3
+ > 76 world-class README files covering every feature, workflow, role, skill, hook, and package.
4
4
 
5
5
  ## Main README
6
6
 
@@ -48,7 +48,7 @@
48
48
  ### Skills (`features/skills/`)
49
49
  | File | Description |
50
50
  |------|-------------|
51
- | [README.md](features/skills/README.md) | Skills system overview — all 11 skills, type table, invocation rules |
51
+ | [README.md](features/skills/README.md) | Skills system overview — all 28 skills, type table, invocation rules |
52
52
  | [using-skills.md](features/skills/using-skills.md) | Bootstrap skill — enforces skill-check-before-action |
53
53
  | [brainstorming.md](features/skills/brainstorming.md) | Design gate skill — ideas into designs before implementation |
54
54
  | [writing-plans.md](features/skills/writing-plans.md) | Plan production skill — specs into bite-sized task files |
@@ -60,6 +60,23 @@
60
60
  | [run-audit.md](features/skills/run-audit.md) | Run audit skill — 6-step interactive audit pipeline |
61
61
  | [self-audit.md](features/skills/self-audit.md) | Self-audit skill — worktree-isolated drift detection |
62
62
  | [prepare-next.md](features/skills/prepare-next.md) | Prepare next skill — clean handoff between sessions |
63
+ | [clarifier.md](features/skills/clarifier.md) | Clarifier skill — research, scope, design, specs pipeline |
64
+ | [executor.md](features/skills/executor.md) | Executor skill — TDD execution with quality gates |
65
+ | [reviewer.md](features/skills/reviewer.md) | Reviewer skill — adversarial review against spec and plan |
66
+ | [wazir.md](features/skills/wazir.md) | Wazir skill — one-command full pipeline |
67
+ | [init-pipeline.md](features/skills/init-pipeline.md) | Init pipeline skill — zero-config project setup |
68
+ | [executing-plans.md](features/skills/executing-plans.md) | Executing plans skill — session-isolated plan execution |
69
+ | [dispatching-parallel-agents.md](features/skills/dispatching-parallel-agents.md) | Parallel agents skill — dispatch independent tasks |
70
+ | [subagent-driven-development.md](features/skills/subagent-driven-development.md) | Subagent development skill — in-session parallel execution |
71
+ | [using-git-worktrees.md](features/skills/using-git-worktrees.md) | Git worktrees skill — isolated feature branches |
72
+ | [finishing-a-development-branch.md](features/skills/finishing-a-development-branch.md) | Branch finishing skill — merge, PR, or cleanup |
73
+ | [humanize.md](features/skills/humanize.md) | Humanize skill — remove AI writing patterns |
74
+ | [writing-skills.md](features/skills/writing-skills.md) | Writing skills skill — create and verify skills |
75
+ | [receiving-code-review.md](features/skills/receiving-code-review.md) | Receiving review skill — process feedback with rigor |
76
+ | [requesting-code-review.md](features/skills/requesting-code-review.md) | Requesting review skill — structured review requests |
77
+ | [claude-cli.md](features/skills/claude-cli.md) | Claude CLI skill — programmatic Claude Code usage |
78
+ | [codex-cli.md](features/skills/codex-cli.md) | Codex CLI skill — programmatic Codex usage |
79
+ | [gemini-cli.md](features/skills/gemini-cli.md) | Gemini CLI skill — programmatic Gemini usage |
63
80
 
64
81
  ### Hooks (`features/hooks/`)
65
82
  | File | Description |
@@ -77,8 +94,8 @@
77
94
  ### Other Features
78
95
  | File | Description |
79
96
  |------|-------------|
80
- | [expertise/README.md](features/expertise/README.md) | Expertise system — 308 modules across 11 domains |
81
- | [schemas/README.md](features/schemas/README.md) | Schema system — 16 JSON schemas for artifact validation |
97
+ | [expertise/README.md](features/expertise/README.md) | Expertise system — 268 modules across 12 domains |
98
+ | [schemas/README.md](features/schemas/README.md) | Schema system — 19 JSON schemas for artifact validation |
82
99
  | [tooling/README.md](features/tooling/README.md) | CLI tooling — all commands with options and examples |
83
100
  | [exports/README.md](features/exports/README.md) | Host exports — Claude, Codex, Gemini, Cursor packages |
84
101
 
@@ -88,7 +105,6 @@
88
105
  |------|---------|-------------|
89
106
  | [README.md](packages/README.md) | — | Package index with versions and reading order |
90
107
  | [ajv.md](packages/ajv.md) | `ajv@^8.18.0` | JSON Schema 2020-12 validation |
91
- | [gray-matter.md](packages/gray-matter.md) | `gray-matter@^4.0.3` | YAML frontmatter parsing for skill files |
92
108
  | [yaml.md](packages/yaml.md) | `yaml@^2.0.0` | YAML 1.2 serialization for manifests |
93
109
  | [node-test.md](packages/node-test.md) | `node:test` | Zero-dependency built-in test runner |
94
110
  | [context-mode.md](packages/context-mode.md) | `context-mode` plugin | Context compression for large outputs |
@@ -1,6 +1,6 @@
1
1
  # Expertise System
2
2
 
3
- Wazir's expertise system is a curated library of **308 knowledge modules** spanning
3
+ Wazir's expertise system is a curated library of **268 knowledge modules** spanning
4
4
  architecture, security, performance, design, and more. Modules are loaded selectively into
5
5
  agent prompts — giving the right knowledge to the right role at the right phase — without
6
6
  flooding context with irrelevant content.
@@ -167,5 +167,5 @@ produce plausible-looking output that silently fails to meet requirements.
167
167
  |---|---|
168
168
  | `expertise/index.yaml` | Machine-readable module registry with phase metadata |
169
169
  | `expertise/index.md` | Human-readable semantic map and reading guide |
170
- | `expertise/PROGRESS.md` | Authoring history: 32 modules, 255 total files, completion dates |
170
+ | `expertise/PROGRESS.md` | Authoring history: 32 research batches, completion dates |
171
171
  | `expertise/README.md` | Directory contract: what is and is not allowed here |
@@ -50,8 +50,8 @@ wazir export build
50
50
  3. Collect canonical sources
51
51
  ┌────────────────────────────────┐
52
52
  │ wazir.manifest.yaml │
53
- │ roles/*.md (9 role files)
54
- │ workflows/*.md (13 workflows) │
53
+ │ roles/*.md (10 role files)
54
+ │ workflows/*.md (15 workflows) │
55
55
  │ hooks/definitions/*.yaml │
56
56
  └────────────────────────────────┘
57
57
 
@@ -53,6 +53,9 @@ suite via schema-backed example fixtures in `templates/examples/`.
53
53
  | `docs-claim.schema.json` | Docs Truth Claim | `validate docs` + `docs/truth-claims.yaml` |
54
54
  | `proposed-learning.schema.json` | Proposed Learning | `learner` role output |
55
55
  | `accepted-learning.schema.json` | Accepted Learning | `learn` workflow approval |
56
+ | `author-artifact.schema.json` | Author Artifact | `content-author` role output |
57
+ | `usage.schema.json` | Usage Report | `capture usage` output |
58
+ | `phase-report.schema.json` | Phase Report | Phase completion summary |
56
59
 
57
60
  ---
58
61
 
@@ -14,11 +14,28 @@ Skills are the **operational layer** of Wazir. Where roles define who acts and w
14
14
  | [TDD](tdd.md) | `wz:tdd` | Rigid | Enforce RED → GREEN → REFACTOR with evidence at each step |
15
15
  | [Debugging](debugging.md) | `wz:debugging` | Rigid | Observe-hypothesize-test-fix loop instead of guesswork |
16
16
  | [Verification](verification.md) | `wz:verification` | Rigid | Require fresh command evidence before any completion claim |
17
+ | [Receiving Code Review](receiving-code-review.md) | `wz:receiving-code-review` | Rigid | Process review feedback with technical rigor, not blind agreement |
18
+ | [Requesting Code Review](requesting-code-review.md) | `wz:requesting-code-review` | Rigid | Request review when completing tasks or before merging |
17
19
  | [Design](design.md) | `wz:design` | Flexible | Guide designer role through open-pencil MCP visual design workflow |
18
20
  | [Scan Project](scan-project.md) | `scan-project` | Flexible | Build an evidence-based project profile from repo surfaces |
19
21
  | [Run Audit](run-audit.md) | `run-audit` | Flexible | Interactive structured codebase audit with report or fix-plan output |
20
22
  | [Self-Audit](self-audit.md) | `self-audit` | Flexible | Worktree-isolated audit-fix loop — safe self-improvement |
21
23
  | [Prepare Next](prepare-next.md) | `prepare-next` | Flexible | Produce a clean next-run handoff without stale context bleed |
24
+ | [Clarifier](clarifier.md) | `wz:clarifier` | Rigid | Run the clarification pipeline — research, scope, design, specs |
25
+ | [Executor](executor.md) | `wz:executor` | Rigid | Run the execution phase with TDD, quality gates, and verification |
26
+ | [Reviewer](reviewer.md) | `wz:reviewer` | Rigid | Adversarial review against approved spec, plan, and evidence |
27
+ | [Wazir](wazir.md) | `wz:wazir` | Rigid | One-command pipeline — init, clarify, execute, review automatically |
28
+ | [Init Pipeline](init-pipeline.md) | `wz:init-pipeline` | Flexible | Initialize the Wazir pipeline with zero-config auto-detection |
29
+ | [Executing Plans](executing-plans.md) | `wz:executing-plans` | Flexible | Execute implementation plans in separate sessions with review checkpoints |
30
+ | [Dispatching Parallel Agents](dispatching-parallel-agents.md) | `wz:dispatching-parallel-agents` | Flexible | Dispatch 2+ independent tasks without shared state |
31
+ | [Subagent-Driven Development](subagent-driven-development.md) | `wz:subagent-driven-development` | Flexible | Execute plan tasks via independent subagents in current session |
32
+ | [Using Git Worktrees](using-git-worktrees.md) | `wz:using-git-worktrees` | Flexible | Create isolated worktrees for feature work or plan execution |
33
+ | [Finishing a Branch](finishing-a-development-branch.md) | `wz:finishing-a-development-branch` | Flexible | Guide completion — merge, PR, or cleanup options |
34
+ | [Humanize](humanize.md) | `wz:humanize` | Flexible | Detect and remove AI writing patterns from text artifacts |
35
+ | [Writing Skills](writing-skills.md) | `wz:writing-skills` | Flexible | Create, edit, or verify skills before deployment |
36
+ | [Claude CLI](claude-cli.md) | `wz:claude-cli` | Flexible | Use Claude Code CLI programmatically for reviews and automation |
37
+ | [Codex CLI](codex-cli.md) | `wz:codex-cli` | Flexible | Use Codex CLI programmatically for reviews and sandbox operations |
38
+ | [Gemini CLI](gemini-cli.md) | `wz:gemini-cli` | Flexible | Use Gemini CLI for headless reviews and sandbox operations |
22
39
 
23
40
  ## Skill Types
24
41
 
@@ -0,0 +1,5 @@
1
+ # wz:clarifier
2
+
3
+ Run the clarification pipeline — research, clarify scope, brainstorm design, generate task specs and execution plan. Pauses for user approval between phases.
4
+
5
+ See the full skill definition in `skills/clarifier/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:claude-cli
2
+
3
+ How to use Claude Code CLI programmatically for reviews, automation, and non-interactive operations within Wazir pipelines.
4
+
5
+ See the full skill definition in `skills/claude-cli/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:codex-cli
2
+
3
+ How to use Codex CLI programmatically for reviews, execution, and sandbox operations within Wazir pipelines.
4
+
5
+ See the full skill definition in `skills/codex-cli/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:dispatching-parallel-agents
2
+
3
+ Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies.
4
+
5
+ See the full skill definition in `skills/dispatching-parallel-agents/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:executing-plans
2
+
3
+ Use when you have a written implementation plan to execute in a separate session with review checkpoints.
4
+
5
+ See the full skill definition in `skills/executing-plans/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:executor
2
+
3
+ Run the execution phase — implement the approved plan with TDD, quality gates, and verification.
4
+
5
+ See the full skill definition in `skills/executor/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:finishing-a-development-branch
2
+
3
+ Use when implementation is complete, all tests pass, and you need to decide how to integrate the work. Guides completion of development work by presenting structured options for merge, PR, or cleanup.
4
+
5
+ See the full skill definition in `skills/finishing-a-development-branch/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:gemini-cli
2
+
3
+ How to use Gemini CLI programmatically for headless reviews, automation, and sandbox operations within Wazir pipelines.
4
+
5
+ See the full skill definition in `skills/gemini-cli/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:humanize
2
+
3
+ Use when reviewing or editing any text artifact (specs, plans, code comments, commit messages, content, documentation) to detect and remove AI writing patterns. Runs a 4-phase pipeline: scan for AI vocabulary and structural patterns, identify severity and domain, rewrite problematic sections, and verify meaning preservation. Invoke on existing text that needs corrective humanization. For preventive humanization, the composition engine loads domain-specific rules automatically.
4
+
5
+ See the full skill definition in `skills/humanize/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:init-pipeline
2
+
3
+ Initialize the Wazir pipeline — zero-config by default, auto-detects host and project stack. No mandatory questions.
4
+
5
+ See the full skill definition in `skills/init-pipeline/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:receiving-code-review
2
+
3
+ Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable. Requires technical rigor and verification, not performative agreement or blind implementation.
4
+
5
+ See the full skill definition in `skills/receiving-code-review/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:requesting-code-review
2
+
3
+ Use when completing tasks, implementing major features, or before merging, to verify that the work meets requirements.
4
+
5
+ See the full skill definition in `skills/requesting-code-review/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:reviewer
2
+
3
+ Run the review phase — adversarial review of implementation against the approved spec, plan, and verification evidence.
4
+
5
+ See the full skill definition in `skills/reviewer/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:subagent-driven-development
2
+
3
+ Use when executing implementation plans with independent tasks in the current session.
4
+
5
+ See the full skill definition in `skills/subagent-driven-development/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:using-git-worktrees
2
+
3
+ Use when starting feature work that needs isolation from the current workspace, or before executing implementation plans. Creates isolated git worktrees with smart directory selection and safety verification.
4
+
5
+ See the full skill definition in `skills/using-git-worktrees/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:wazir
2
+
3
+ One-command pipeline — type /wazir followed by what you want to build. Handles init, clarification, execution, review, and audits automatically.
4
+
5
+ See the full skill definition in `skills/wazir/SKILL.md`.
@@ -0,0 +1,5 @@
1
+ # wz:writing-skills
2
+
3
+ Use when creating new skills, editing existing skills, or verifying that skills work before deployment.
4
+
5
+ See the full skill definition in `skills/writing-skills/SKILL.md`.
@@ -42,7 +42,7 @@ The planner who closes a run is often the same role that will open the next one.
42
42
 
43
43
  One of:
44
44
 
45
- 1. **Full completion** — All 14 phases are done, review is accepted, learnings are proposed. Prepare the next feature's starting point.
45
+ 1. **Full completion** — All 4 phases are done, review is accepted, learnings are proposed. Prepare the next feature's starting point.
46
46
  2. **Partial completion** — The session is ending before the pipeline finishes. Prepare a mid-pipeline handoff so the next session can resume.
47
47
  3. **Slice boundary** — The approved plan is being executed in multiple slices. Prepare the handoff between slices.
48
48
 
@@ -133,15 +133,56 @@ Out of scope for this manifest check:
133
133
 
134
134
  Maintainers are responsible for policing those surfaces with the separate docs-truth, runtime-surface, and repository review checks.
135
135
 
136
- ## Workflows vs phases
136
+ ## Phases vs workflows
137
137
 
138
- - `phases` are the core lifecycle states of the operating model.
139
- - `workflows` are the canonical callable or review-gated entrypoints that drive those phases.
138
+ The pipeline has **4 phases** (Init, Clarifier, Executor, Final Review) and **15 workflows** (atomic units within those phases).
140
139
 
141
- They overlap heavily, but they are not identical:
140
+ - **Phases** are the top-level pipeline stages. Event capture and tracking use phase names: `init`, `clarifier`, `executor`, `final_review`.
141
+ - **Workflows** are the canonical callable or review-gated entrypoints that run within phases. Each workflow can be independently enabled/disabled via `workflow_policy` in run-config.
142
142
 
143
- - `spec_challenge`, `plan_review`, and `prepare_next` are workflows that sit between or around the core execution phases.
144
- - Validators and exports should treat manifest-declared workflows as the canonical workflow file roster.
143
+ | Phase | Workflows |
144
+ |-------|-----------|
145
+ | Init | (inline — no workflow files) |
146
+ | Clarifier | clarify, discover, specify, spec_challenge, author, design, design_review, plan, plan_review |
147
+ | Executor | execute, verify |
148
+ | Final Review | review, learn, prepare_next |
149
+
150
+ `run_audit` is a standalone on-demand workflow, not part of the main pipeline flow.
151
+
152
+ Validators and exports should treat manifest-declared workflows as the canonical workflow file roster.
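The phase-to-workflow table above can be expressed as a small lookup, which also makes it easy to check that the roster adds up to the stated 15 workflows (14 in-pipeline plus the standalone `run_audit`). This is an illustrative sketch only: the names `PHASE_WORKFLOWS`, `STANDALONE_WORKFLOWS`, `phase_of`, and `all_workflows` are hypothetical and are not part of the Wazir codebase.

```python
# Hypothetical mirror of the phase/workflow table; not Wazir's actual data model.
PHASE_WORKFLOWS = {
    "init": [],  # inline, no workflow files
    "clarifier": [
        "clarify", "discover", "specify", "spec_challenge", "author",
        "design", "design_review", "plan", "plan_review",
    ],
    "executor": ["execute", "verify"],
    "final_review": ["review", "learn", "prepare_next"],
}

# On-demand workflow that sits outside the main pipeline flow.
STANDALONE_WORKFLOWS = ["run_audit"]


def phase_of(workflow):
    """Return the phase name that owns a workflow, or None if it is standalone or unknown."""
    for phase, workflows in PHASE_WORKFLOWS.items():
        if workflow in workflows:
            return phase
    return None


def all_workflows():
    """Return every workflow name: pipeline workflows first, then standalone ones."""
    pipeline = [w for workflows in PHASE_WORKFLOWS.values() for w in workflows]
    return pipeline + STANDALONE_WORKFLOWS
```

A lookup like `phase_of("plan_review")` resolves to `clarifier`, while `phase_of("run_audit")` returns `None` because that workflow belongs to no phase.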
153
+
154
+ ## Hook configuration
155
+
156
+ ### `hooks/routing-matrix.json`
157
+
158
+ The routing matrix defines how the context-mode router classifies commands:
159
+
160
+ - `large` — array of command prefixes that always route to context-mode (AC-3.1). The `# wazir:passthrough` marker does NOT exempt commands in this category.
161
+ - `small` — array of command prefixes that always pass through without context-mode processing.
162
+ - `ambiguous_heuristic` — rules for commands that match neither large nor small:
163
+   - `pipe_detected` — classify piped commands as ambiguous
164
+   - `redirect_detected` — classify redirected commands as ambiguous
165
+   - `verbose_binaries` — array of binary names whose output is typically large
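One way to read the routing rules above is as a prefix check followed by heuristics. The sketch below is an illustration, not the router shipped in the package. The matrix contents, the `classify` function, the `unclassified` fallback, the precedence of `large` over `small`, the treatment of `verbose_binaries` as ambiguous, and the assumption that the passthrough marker forces pass-through for non-`large` commands are all hypothetical. The only behaviour taken from the text is that the marker does not exempt `large` commands.

```python
PASSTHROUGH_MARKER = "# wazir:passthrough"

# Hypothetical matrix contents; the real hooks/routing-matrix.json may differ.
ROUTING_MATRIX = {
    "large": ["npm test", "git log"],
    "small": ["git status", "ls"],
    "ambiguous_heuristic": {
        "pipe_detected": True,
        "redirect_detected": True,
        "verbose_binaries": ["find", "docker"],
    },
}


def classify(command, matrix=ROUTING_MATRIX):
    """Classify a shell command as 'large', 'small', 'ambiguous', or 'unclassified'."""
    text = command.strip()

    # Large prefixes win first, so the passthrough marker cannot exempt them.
    if any(text.startswith(prefix) for prefix in matrix["large"]):
        return "large"

    # Assumption: the marker forces pass-through for everything that is not large.
    if PASSTHROUGH_MARKER in text:
        return "small"

    # Naive prefix match; a real router would likely match on token boundaries.
    if any(text.startswith(prefix) for prefix in matrix["small"]):
        return "small"

    rules = matrix["ambiguous_heuristic"]
    if rules.get("pipe_detected") and "|" in text:
        return "ambiguous"
    if rules.get("redirect_detected") and (">" in text or "<" in text):
        return "ambiguous"

    binary = text.split()[0] if text else ""
    if binary in rules.get("verbose_binaries", []):
        return "ambiguous"

    return "unclassified"
```

Checking `large` before the marker is what keeps `npm test # wazir:passthrough` routed to context-mode in this sketch.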
166
+
167
+ ### `config/gating-rules.yaml`
168
+
169
+ The gating rules file defines conditions for phase transition decisions:
170
+
171
+ - `rules.continue` — all conditions must pass for a phase to advance (test failures, lint errors, type errors, drift delta, risk flags, uncertain outcomes)
172
+ - `rules.loop_back` — any deterministic failure (test failures, lint errors, or type errors) triggers a loop-back with actionable fix descriptions
173
+ - `rules.escalate` — fallback when neither continue nor loop_back match
174
+ - `default_verdict` — verdict when the report is empty or missing (defaults to `escalate`)
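The verdict logic described above can be sketched as a single function. This is a minimal illustration under stated assumptions: the report field names (`test_failures`, `lint_errors`, `type_errors`, `drift_delta`, `risk_flags`, `uncertain_outcomes`), the `max_drift` threshold, and the `decide` function are hypothetical and do not reflect the actual schema of `config/gating-rules.yaml`.

```python
DETERMINISTIC_KEYS = ("test_failures", "lint_errors", "type_errors")


def decide(report, default_verdict="escalate", max_drift=0):
    """Return 'continue', 'loop_back', or 'escalate' for a phase report.

    Field names and the drift threshold are assumptions for illustration.
    """
    # Empty or missing report: fall back to the default verdict.
    if not report:
        return default_verdict

    failures = [report.get(key, 0) for key in DETERMINISTIC_KEYS]

    # continue: every condition must pass.
    all_clear = (
        all(count == 0 for count in failures)
        and report.get("drift_delta", 0) <= max_drift
        and not report.get("risk_flags")
        and not report.get("uncertain_outcomes")
    )
    if all_clear:
        return "continue"

    # loop_back: any deterministic failure is enough.
    if any(count > 0 for count in failures):
        return "loop_back"

    # escalate: neither continue nor loop_back matched.
    return "escalate"
```

In this sketch a report with only risk flags set escalates rather than looping back, since there is no deterministic failure to fix.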
175
+
176
+ ### Composition proof artifacts
177
+
178
+ The composition engine (`tooling/src/adapters/composition-engine.js`) writes a proof artifact per dispatch to `.wazir/runs/<id>/artifacts/composition-<role>-<task>.json` containing:
179
+
180
+ - `modules_included[]` — `{ path, layer, tokens }` for each loaded module
181
+ - `modules_dropped[]` — `{ path, layer, tokens, reason }` for each dropped module. Reason values:
182
+   - `module_cap_exceeded` — module count exceeded the 15-module cap
183
+   - `token_ceiling_exceeded` — total tokens exceeded the configurable ceiling (default: 50,000)
184
+ - `total_tokens` — total token count of composed prompt
185
+ - `prompt_hash` — SHA-256 hash of the composed prompt for audit traceability
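The shape of the proof artifact can be illustrated with a short sketch. It is not the code in `composition-engine.js`. The `build_proof` function, the in-order selection strategy, and the use of summed module tokens for `total_tokens` are assumptions. Only the field names, the reason values, the 15-module cap, the 50,000-token default ceiling, and the SHA-256 hash come from the description above.

```python
import hashlib

MODULE_CAP = 15          # cap stated in the docs
TOKEN_CEILING = 50_000   # default ceiling stated in the docs


def build_proof(modules, prompt_text, module_cap=MODULE_CAP, token_ceiling=TOKEN_CEILING):
    """Build a proof-artifact dict from candidate modules and the composed prompt.

    Each module is a dict with 'path', 'layer', and 'tokens'. Modules are
    considered in the order given, which is an assumption of this sketch.
    """
    included, dropped, total = [], [], 0
    for module in modules:
        entry = {key: module[key] for key in ("path", "layer", "tokens")}
        if len(included) >= module_cap:
            dropped.append({**entry, "reason": "module_cap_exceeded"})
        elif total + module["tokens"] > token_ceiling:
            dropped.append({**entry, "reason": "token_ceiling_exceeded"})
        else:
            included.append(entry)
            total += module["tokens"]

    return {
        "modules_included": included,
        "modules_dropped": dropped,
        # Assumption: total is the sum of included module tokens.
        "total_tokens": total,
        "prompt_hash": hashlib.sha256(prompt_text.encode("utf-8")).hexdigest(),
    }
```

Because the hash is computed over the composed prompt text, two dispatches with identical prompts produce the same `prompt_hash`, which is what makes it usable for audit traceability.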
145
186
 
146
187
  ## Current index parser roster
147
188