npm - @wazir-dev/cli - Versions diffs - 1.0.0 → 1.2.0 - Mend

@wazir-dev/cli 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (163)

package/CHANGELOG.md +100 -2
package/README.md +6 -6
package/docs/concepts/architecture.md +1 -1
package/docs/concepts/roles-and-workflows.md +2 -0
package/docs/concepts/why-wazir.md +59 -0
package/docs/decisions/2026-03-19-deferred-items.md +564 -0
package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
package/docs/readmes/INDEX.md +21 -5
package/docs/readmes/features/expertise/README.md +2 -2
package/docs/readmes/features/exports/README.md +2 -2
package/docs/readmes/features/schemas/README.md +3 -0
package/docs/readmes/features/skills/README.md +17 -0
package/docs/readmes/features/skills/clarifier.md +5 -0
package/docs/readmes/features/skills/claude-cli.md +5 -0
package/docs/readmes/features/skills/codex-cli.md +5 -0
package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
package/docs/readmes/features/skills/executing-plans.md +5 -0
package/docs/readmes/features/skills/executor.md +5 -0
package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
package/docs/readmes/features/skills/gemini-cli.md +5 -0
package/docs/readmes/features/skills/humanize.md +5 -0
package/docs/readmes/features/skills/init-pipeline.md +5 -0
package/docs/readmes/features/skills/receiving-code-review.md +5 -0
package/docs/readmes/features/skills/requesting-code-review.md +5 -0
package/docs/readmes/features/skills/reviewer.md +5 -0
package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
package/docs/readmes/features/skills/wazir.md +5 -0
package/docs/readmes/features/skills/writing-skills.md +5 -0
package/docs/readmes/features/workflows/prepare-next.md +1 -1
package/docs/reference/configuration-reference.md +47 -6
package/docs/reference/launch-checklist.md +4 -4
package/docs/reference/review-loop-pattern.md +538 -0
package/docs/reference/roles-reference.md +1 -0
package/docs/reference/skill-tiers.md +147 -0
package/docs/reference/tooling-cli.md +5 -1
package/docs/truth-claims.yaml +18 -0
package/expertise/antipatterns/process/ai-coding-antipatterns.md +97 -1
package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
package/exports/hosts/claude/.claude/agents/designer.md +3 -0
package/exports/hosts/claude/.claude/agents/executor.md +2 -0
package/exports/hosts/claude/.claude/agents/planner.md +3 -0
package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
package/exports/hosts/claude/.claude/commands/design.md +4 -0
package/exports/hosts/claude/.claude/commands/discover.md +4 -0
package/exports/hosts/claude/.claude/commands/execute.md +4 -0
package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
package/exports/hosts/claude/.claude/commands/plan.md +4 -0
package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
package/exports/hosts/claude/.claude/commands/specify.md +4 -0
package/exports/hosts/claude/.claude/commands/verify.md +4 -0
package/exports/hosts/claude/.claude/settings.json +9 -0
package/exports/hosts/claude/CLAUDE.md +1 -1
package/exports/hosts/claude/export.manifest.json +22 -20
package/exports/hosts/claude/host-package.json +3 -1
package/exports/hosts/codex/AGENTS.md +1 -1
package/exports/hosts/codex/export.manifest.json +22 -20
package/exports/hosts/codex/host-package.json +3 -1
package/exports/hosts/cursor/.cursor/hooks.json +4 -0
package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
package/exports/hosts/cursor/export.manifest.json +22 -20
package/exports/hosts/cursor/host-package.json +3 -1
package/exports/hosts/gemini/GEMINI.md +1 -1
package/exports/hosts/gemini/export.manifest.json +22 -20
package/exports/hosts/gemini/host-package.json +3 -1
package/hooks/context-mode-router +191 -0
package/hooks/definitions/context_mode_router.yaml +19 -0
package/hooks/definitions/loop_cap_guard.yaml +1 -1
package/hooks/hooks.json +43 -0
package/hooks/protected-path-write-guard +8 -0
package/hooks/routing-matrix.json +45 -0
package/hooks/session-start +62 -1
package/llms-full.txt +905 -132
package/package.json +3 -3
package/roles/clarifier.md +3 -0
package/roles/designer.md +3 -0
package/roles/executor.md +2 -0
package/roles/planner.md +3 -0
package/roles/researcher.md +2 -0
package/roles/reviewer.md +5 -1
package/roles/specifier.md +3 -0
package/schemas/hook.schema.json +2 -1
package/schemas/phase-report.schema.json +80 -0
package/schemas/usage.schema.json +25 -1
package/schemas/wazir-manifest.schema.json +19 -0
package/skills/brainstorming/SKILL.md +20 -56
package/skills/clarifier/SKILL.md +243 -0
package/skills/claude-cli/SKILL.md +320 -0
package/skills/codex-cli/SKILL.md +260 -0
package/skills/debugging/SKILL.md +24 -1
package/skills/design/SKILL.md +13 -0
package/skills/dispatching-parallel-agents/SKILL.md +13 -0
package/skills/executing-plans/SKILL.md +28 -2
package/skills/executor/SKILL.md +129 -0
package/skills/finishing-a-development-branch/SKILL.md +13 -0
package/skills/gemini-cli/SKILL.md +260 -0
package/skills/humanize/SKILL.md +13 -0
package/skills/init-pipeline/SKILL.md +76 -78
package/skills/prepare-next/SKILL.md +81 -10
package/skills/receiving-code-review/SKILL.md +21 -0
package/skills/requesting-code-review/SKILL.md +38 -5
package/skills/reviewer/SKILL.md +423 -0
package/skills/run-audit/SKILL.md +13 -0
package/skills/scan-project/SKILL.md +13 -0
package/skills/self-audit/SKILL.md +197 -16
package/skills/subagent-driven-development/SKILL.md +38 -2
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
package/skills/subagent-driven-development/implementer-prompt.md +8 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
package/skills/tdd/SKILL.md +21 -0
package/skills/using-git-worktrees/SKILL.md +13 -0
package/skills/using-skills/SKILL.md +13 -0
package/skills/verification/SKILL.md +13 -0
package/skills/wazir/SKILL.md +286 -262
package/skills/writing-plans/SKILL.md +44 -4
package/skills/writing-skills/SKILL.md +13 -0
package/templates/artifacts/implementation-plan.md +3 -0
package/templates/artifacts/tasks-template.md +133 -0
package/templates/examples/phase-report.example.json +48 -0
package/templates/examples/wazir-manifest.example.yaml +1 -1
package/tooling/src/adapters/composition-engine.js +256 -0
package/tooling/src/adapters/model-router.js +84 -0
package/tooling/src/capture/command.js +111 -2
package/tooling/src/capture/run-config.js +23 -0
package/tooling/src/capture/store.js +24 -0
package/tooling/src/capture/usage.js +106 -0
package/tooling/src/checks/ac-matrix.js +256 -0
package/tooling/src/checks/brand-truth.js +3 -6
package/tooling/src/checks/command-registry.js +13 -0
package/tooling/src/checks/docs-truth.js +1 -1
package/tooling/src/checks/runtime-surface.js +3 -7
package/tooling/src/checks/skills.js +111 -0
package/tooling/src/cli.js +17 -3
package/tooling/src/commands/stats.js +161 -0
package/tooling/src/commands/validate.js +5 -1
package/tooling/src/export/compiler.js +33 -37
package/tooling/src/gating/agent.js +145 -0
package/tooling/src/guards/phase-prerequisite-guard.js +127 -0
package/tooling/src/hooks/routing-logic.js +69 -0
package/tooling/src/init/auto-detect.js +260 -0
package/tooling/src/init/command.js +161 -0
package/tooling/src/input/scanner.js +46 -0
package/tooling/src/reports/command.js +103 -0
package/tooling/src/reports/phase-report.js +323 -0
package/tooling/src/state/command.js +160 -0
package/tooling/src/state/db.js +287 -0
package/tooling/src/status/command.js +53 -1
package/wazir.manifest.yaml +26 -17
package/workflows/clarify.md +4 -0
package/workflows/design-review.md +4 -0
package/workflows/design.md +4 -0
package/workflows/discover.md +4 -0
package/workflows/execute.md +4 -0
package/workflows/plan-review.md +4 -0
package/workflows/plan.md +4 -0
package/workflows/spec-challenge.md +4 -0
package/workflows/specify.md +4 -0
package/workflows/verify.md +4 -0

package/docs/decisions/2026-03-19-enhancement-decisions.md ADDED

@@ -0,0 +1,300 @@
+# Wazir Enhancement Decisions — 2026-03-19
+Decisions agreed upon during a brainstorming session. Each item is implementation-ready.
+## Research Summary
+| # | Decision | Online Research Needed? | What to Research |
+|---|----------|------------------------|------------------|
+| 1 | Smart context-mode routing | No | Internal implementation — all info is local |
+| 2 | Enforce wazir index | No | Internal tooling, no external deps |
+| 3 | Enforce context-mode for large output | No | Internal, paired with #1 |
+| 4 | Track context savings metrics | No | Metrics design, internal |
+| 5 | Three-tier skill strategy | **Yes** | Blocked on R1 + R2 |
+| 6 | Rich phase reports + gating agent | **Yes** | How other autonomous agent systems handle phase gating, agent self-evaluation patterns, prior art on LLM confidence calibration |
+| 7 | Continuous learning + user input capture | **Yes** | Continuous learning in agent systems, feedback loop patterns, drift prevention, how reinforcement from human feedback is stored and applied |
+| 8 | Autoresearch self-improvement | No | Already researched — decision made |
+| 9 | Composer task-specific agents | **Yes** | Prompt composition patterns, multi-module prompt assembly, token budget strategies for large context injection |
+| R1 | Superpowers skill audit | **Yes** | Latest superpowers GitHub, changelog, roadmap, community health, skill extensibility |
+| R2 | Skill composition infrastructure | **Yes** | Claude Code plugin ecosystem patterns for skill chaining/extension, existing RFCs or discussions |
+---
+## Agreed
+### 1. Smart context-mode routing for Bash commands
+**Decision:** Not full enforcement — smart routing via a PreToolUse:Bash hook.
+- Small-output commands (git status, ls, short queries) pass through native Bash — no latency tax
+- Known-large-output patterns (test runners, builds, logs, diffs, dependency trees) auto-route through `batch_execute`
+- Threshold: commands whose output routinely exceeds ~30-50 lines
+- Index + context-mode is the preferred research path: index for *where*, context-mode for *what*
+- Skills should still be able to explicitly opt in/out when they know better than the heuristic
+### 2. Enforce wazir index for codebase exploration
+**Decision:** All codebase exploration MUST use `wazir index` as the first step.
+- Never spawn heavyweight exploration agents that brute-force read dozens of files
+- Flow: `wazir index search-symbols` / `wazir recall` → locate targets → then read only what's needed
+- Subagents and skills must query the index before falling back to direct file reads
+- If no index exists, build one first (`wazir index build && wazir index summarize --tier all`)
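The index-first flow in decision #2 can be sketched as follows. This version uses an in-memory symbol map in place of the real `wazir index` store. `buildIndex`, `searchSymbols`, and `explore` are hypothetical names, and the declaration regex only recognizes simple JavaScript `function`, `class`, and `const` forms. The sketch shows the order of operations: build an index if none exists, query it to find *where* a symbol lives, and read only the files the query returns.

```javascript
// Hypothetical stand-in for a symbol index; NOT how `wazir index` is implemented.
// Files arrive as { path: sourceText } so the sketch runs without touching disk.

// Matches simple declarations: `function foo(`, `class Foo`, `const foo =`.
const DECL =
  /\b(?:function\s+([A-Za-z_$][\w$]*)|class\s+([A-Za-z_$][\w$]*)|const\s+([A-Za-z_$][\w$]*)\s*=)/g;

// Build symbol name -> [{ path, line }] from every file once.
function buildIndex(files) {
  const index = new Map();
  for (const [path, source] of Object.entries(files)) {
    source.split("\n").forEach((text, i) => {
      for (const m of text.matchAll(DECL)) {
        const name = m[1] || m[2] || m[3];
        if (!index.has(name)) index.set(name, []);
        index.get(name).push({ path, line: i + 1 });
      }
    });
  }
  return index;
}

// Case-insensitive substring search over symbol names ("where", not "what").
function searchSymbols(index, query) {
  const q = query.toLowerCase();
  const hits = [];
  for (const [name, locations] of index) {
    if (name.toLowerCase().includes(q)) {
      for (const loc of locations) hits.push({ name, ...loc });
    }
  }
  return hits;
}

// Index first, then read only the files that matched.
function explore(files, query, state = {}) {
  if (!state.index) state.index = buildIndex(files); // no index yet: build one first
  const hits = searchSymbols(state.index, query);
  const filesRead = [...new Set(hits.map((h) => h.path))];
  const filesSkipped = Object.keys(files).length - filesRead.length;
  return { hits, filesRead, filesSkipped };
}

const files = {
  "src/router.js": "function routeCommand(cmd) {\n  return cmd;\n}",
  "src/usage.js": "class UsageTracker {}\nconst estimateTokens = (b) => b / 4;",
  "src/cli.js": "function main() {}",
};
console.log(explore(files, "route"));
```

Building the index still reads every file once. The savings come from later queries: here the query opens only `src/router.js` instead of all three files.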
+### 3. Enforce context-mode for large-output commands
+**Decision:** Context-mode is mandatory for commands with routinely large output.
+**Must use context-mode (`batch_execute` / `execute_file`):**
+- Test runners (npm test, vitest, jest, pytest, etc.)
+- Build commands (npm run build, tsc, etc.)
+- Dependency trees (npm ls, pip list, etc.)
+- Large git diffs (`git diff` with many files)
+- Log tailing / large file reads (>50 lines)
+- Linting / static analysis output
+- CI/CD output parsing
+**Pass through native Bash:**
+- git status, git log (short), git branch
+- ls, pwd, mkdir, cp, mv
+- wazir CLI commands with short output (doctor, index build, capture)
+- Any command known to produce <30 lines
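Below is a sketch of the routing heuristic from decisions #1 and #3. The regex lists paraphrase the bullets above and are illustrative only. They are not the contents of the package's `hooks/routing-matrix.json`. Two choices here are assumptions: unknown commands fall through to native Bash, and a "short" `git log` means one with an explicit count. `routeCommand` is a hypothetical name.

```javascript
// Known small-output commands: pass through native Bash.
const SMALL_OUTPUT = [
  /^git\s+(status|branch)\b/,
  /^git\s+log\s+(-n\s*\d+|-\d+)/, // "short" git log: an explicit count (assumption)
  /^(ls|pwd|mkdir|cp|mv)\b/,
  /^wazir\s+(doctor|capture|index\s+build)\b/,
];

// Known large-output commands: route through context-mode (batch_execute / execute_file).
const LARGE_OUTPUT = [
  /^(npm|pnpm|yarn)\s+(test|run\s+test)\b/, // test runners via a package manager
  /^(npx\s+)?(vitest|jest|pytest)\b/,
  /^(npm|pnpm|yarn)\s+run\s+build\b/,
  /^(npx\s+)?tsc\b/,
  /^(npm\s+ls|pip\s+list)\b/, // dependency trees
  /^(npx\s+)?eslint\b/, // linting
  /^git\s+diff\b(?!.*--(stat|name-only))/, // full diffs; --stat/--name-only stay native
];

function routeCommand(command, opts = {}) {
  // Explicit opt-in/out from a skill beats the heuristic (decision #1).
  if (opts.override === "native" || opts.override === "context-mode") return opts.override;
  const cmd = command.trim();
  if (SMALL_OUTPUT.some((re) => re.test(cmd))) return "native";
  if (LARGE_OUTPUT.some((re) => re.test(cmd))) return "context-mode";
  // Unknown commands pass through (assumption): no latency tax unless a pattern matches.
  return "native";
}

for (const c of ["git status", "npm test", "git diff --stat", "git diff main", "npx tsc --noEmit"]) {
  console.log(c.padEnd(18), "->", routeCommand(c));
}
```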
+### 4. Track context savings metrics
+**Decision:** Every index query and context-mode invocation must update a running usage counter.
+- Track per-session: queries made, estimated tokens saved, bytes avoided in context
+- Track per-tool breakdown: index lookups vs. `execute_file` vs. `batch_execute` vs. `fetch_and_index`
+- Store in run state (e.g., `.wazir/runs/<id>/usage.json` or equivalent)
+- Surface via `wazir status` or a dedicated `wazir stats` subcommand
+- If no tracking mechanism exists yet, build one — we can't optimize what we don't measure
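Here is a minimal per-session counter for decision #4. `UsageTracker`, its event shape, and the roughly 4-bytes-per-token estimate are assumptions made for illustration. The package's `usage.schema.json` may define a different format. Each call records how many bytes the agent was spared, both per tool and in total.

```javascript
const BYTES_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

class UsageTracker {
  constructor(sessionId) {
    this.sessionId = sessionId;
    this.totals = { queries: 0, bytesAvoided: 0, tokensSaved: 0 };
    this.byTool = {}; // tool name -> { queries, bytesAvoided }
  }

  // rawBytes: size the output would have had if dumped into context.
  // returnedBytes: size actually returned to the agent after indexing/summarizing.
  record(tool, rawBytes, returnedBytes) {
    const avoided = Math.max(0, rawBytes - returnedBytes);
    const entry = (this.byTool[tool] ??= { queries: 0, bytesAvoided: 0 });
    entry.queries += 1;
    entry.bytesAvoided += avoided;
    this.totals.queries += 1;
    this.totals.bytesAvoided += avoided;
    this.totals.tokensSaved = Math.round(this.totals.bytesAvoided / BYTES_PER_TOKEN);
  }

  // Called by JSON.stringify, e.g. before writing .wazir/runs/<id>/usage.json.
  toJSON() {
    return { sessionId: this.sessionId, totals: this.totals, byTool: this.byTool };
  }
}

const usage = new UsageTracker("run-1");
usage.record("batch_execute", 48000, 2000);
usage.record("index", 12000, 400);
usage.record("batch_execute", 8000, 8000); // small output: nothing saved
console.log(JSON.stringify(usage, null, 2));
```

A `wazir stats` style command could read the serialized object back and print the per-tool breakdown.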
+### 5. Three-tier skill strategy — delegate, augment, own
+**Decision:** Stop forking superpowers skills wholesale. Categorize each into one of three tiers:
+| Tier | Strategy | Naming |
+|------|----------|--------|
+| **Delegate** | Use superpowers skill as-is. Delete Wazir fork. | `superpowers:<name>` only |
+| **Augment** | Invoke superpowers skill + inject a Wazir `CONTEXT.md` addendum (additive only, no overrides) | `superpowers:<name>` invoked with Wazir context |
+| **Own** | Wazir-original or structurally rewritten skill. Rename to avoid conflict with superpowers. | `wz:<unique-name>` only |
+**Rules:**
+- Augment addenda must be strictly additive — no "replace step N" or "ignore this part"
+- Owned skills must have a distinct name from any superpowers skill to prevent dual-registration confusion
+- Delegated skills: delete the Wazir `skills/<name>/` directory entirely
+**Status:** Blocked on Research Phase (see below).
+---
+## Research Required
+### R1. Superpowers skill audit and tier classification
+**Why this matters:**
+We're choosing between maintaining our own forks (current approach — high maintenance, falls behind upstream) and delegating to a well-maintained plugin (lower maintenance, auto-updates, but less control). Getting this wrong means either: (a) we fork everything and slowly diverge from improvements the superpowers community ships, or (b) we delegate too much and lose Wazir-specific behavior that matters. This is an architectural decision that affects every future skill interaction, so it must be evidence-based.
+**What the research must cover:**
+1. **Full superpowers skill inventory (online)**
+   - Fetch the latest superpowers plugin source (GitHub/marketplace); don't rely on our cached v4.3.1, which may be outdated
+   - Document every skill: name, purpose, structure, key behaviors
+   - Check the superpowers changelog/releases for skill evolution pace — are these skills actively improved or stable?
+2. **Skill-by-skill diff analysis**
+   - For each superpowers skill that has a Wazir counterpart: what exactly did Wazir change?
+   - Classify each change as:
+     - **Additive** — Wazir adds context/tooling but doesn't contradict superpowers behavior (→ Augment tier candidate)
+     - **Structural** — Wazir rewrites core logic, steps, or output format (→ Own tier candidate)
+     - **Cosmetic** — just naming/formatting, no behavioral difference (→ Delegate tier candidate)
+3. **Superpowers skills with NO Wazir counterpart**
+   - Are there superpowers skills we're not using but should be?
+   - Are there skills we could delegate to that we're currently handling ad-hoc?
+4. **Community and maintenance posture**
+   - How frequently does superpowers publish updates?
+   - Is there a public roadmap or skill deprecation policy?
+   - Are there breaking changes between versions that would affect our augment addenda?
+   - What's the plugin's approach to skill extensibility — do they support context injection natively or is that something we'd need to build?
+5. **Skill composition patterns in the ecosystem**
+   - How do other projects handle "use plugin X's skill but with my context"?
+   - Is there an established pattern for skill chaining/augmentation in Claude Code plugins?
+   - What are the failure modes — prompt priority conflicts, version drift, context bloat?
+6. **Risk analysis**
+   - What happens if superpowers changes a skill we depend on in Augment tier?
+   - What's our rollback path if delegation breaks a workflow?
+   - How do we test that augmented skills still work after an upstream update?
+**How to execute:**
+- Online research: superpowers GitHub repo, marketplace listing, changelogs, issues, discussions
+- Local analysis: diff every Wazir skill against its superpowers counterpart (cached + latest)
+- Output: a classification table with tier assignment, rationale, and risk notes per skill
+**Principle:** Do the right thing, not the easy thing. If the research shows we should own more skills than expected, we own them. If it shows we should delegate almost everything, we delegate. Follow the evidence.
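The classification in R1 item 2 and the naming rules in decision #5 can be sketched as two small functions. Both `classifyTier` and `skillName` are hypothetical. The precedence (structural over additive over cosmetic) is an assumption: a single structural rewrite is taken to rule out a purely additive addendum. The inventory rows are invented placeholders; real rows would come from the skill-by-skill diff.

```javascript
// Each Wazir change vs. the superpowers original is labeled
// "additive", "structural", or "cosmetic".
function classifyTier(changes) {
  if (changes.includes("structural")) return "own";
  if (changes.includes("additive")) return "augment";
  return "delegate"; // cosmetic-only, or no changes at all
}

// Delegated/augmented skills keep the superpowers name; owned skills need a
// wz: name that no superpowers skill uses (avoids dual registration).
function skillName(tier, name, ownedName, upstreamNames) {
  if (tier !== "own") return `superpowers:${name}`;
  if (!ownedName || upstreamNames.has(ownedName)) {
    throw new Error(`owned skill "${name}" needs a name no superpowers skill uses`);
  }
  return `wz:${ownedName}`;
}

// Hypothetical inventory rows for illustration only.
const upstream = new Set(["tdd", "brainstorming", "verification"]);
const inventory = [
  { name: "tdd", changes: ["additive", "cosmetic"] },
  { name: "brainstorming", changes: ["structural", "additive"], ownedName: "design-gate" },
  { name: "verification", changes: ["cosmetic"] },
];
for (const row of inventory) {
  const tier = classifyTier(row.changes);
  console.log(row.name, tier, skillName(tier, row.name, row.ownedName, upstream));
}
```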
+---
+### R2. Skill composition infrastructure design
+**Why this matters:**
+The Augment tier needs a mechanism to invoke an external skill with Wazir-specific context injected. A thin wrapper per skill is the easy path — but it recreates the maintenance problem we're trying to solve (one more file per skill that can drift). The right solution is a composition system that's declarative, testable, and resilient to upstream changes.
+**What the research must cover:**
+1. **Composition model design**
+   - How should a composed skill be declared? Options:
+     - A manifest entry: `{ base: "superpowers:tdd", augment: "wazir-context/tdd.md" }`
+     - A skill resolver that chains skills at invocation time
+     - A hook-based approach: PostSkillLoad injects context automatically
+   - Which model keeps the augmentation visible and auditable (no hidden magic)?
+   - How does the composed skill appear in the skill list — as one entry or two?
+2. **Context injection semantics**
+   - Where does the Wazir context go relative to the base skill? Before? After? Interleaved?
+   - Prompt priority: if the base skill says "write output to X" and the context says "write output to Y", which wins? We need a clear rule, not ambiguity.
+   - How do we prevent addenda from accidentally overriding base behavior? (Lint rule? Structural constraint?)
+3. **Version pinning and drift detection**
+   - Should we pin the superpowers version we augment against?
+   - How do we detect when an upstream skill change breaks our addendum? (CI check? Hash comparison?)
+   - What's the upgrade path when superpowers ships a new version?
+4. **Testing surface**
+   - How do we test that a composed skill (base + addendum) produces the right behavior?
+   - Can we diff the resolved prompt to verify no conflicts?
+   - Should there be integration tests that run composed skills against known scenarios?
+5. **Ecosystem research (online)**
+   - How do other Claude Code plugins handle skill extension/composition?
+   - Are there existing RFCs, discussions, or patterns in the Claude Code plugin ecosystem for this?
+   - Does superpowers itself have any extension mechanism planned?
+**Implementation: blocked on R1.** The number of skills landing in Augment tier determines whether this infrastructure is justified. If R1 shows ≤2 augmented skills, a simple approach may suffice. If ≥5, build it properly.
+**Principle:** Design now, build after evidence. Don't over-engineer, don't under-engineer — right-size to R1 results.
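The sketch below covers one of the R2 options: a manifest entry of the form `{ base, augment }` plus a pinned SHA-256 of the upstream skill text. A hash mismatch flags drift, and a lint pass rejects override phrasing in the addendum. Several details are assumptions rather than a settled design: the `baseSha256` field, the "addendum after base" placement, and the phrase list. `composeSkill` and `lintAddendum` are hypothetical names.

```javascript
const { createHash } = require("node:crypto");

const sha256 = (text) => createHash("sha256").update(text, "utf8").digest("hex");

// Phrases an additive-only addendum must not contain (decision #5). Illustrative list.
const OVERRIDE_PATTERNS = [/\breplace step\b/i, /\bignore (this|that|the)\b/i, /\binstead of step\b/i];

function lintAddendum(addendum) {
  return OVERRIDE_PATTERNS.filter((re) => re.test(addendum)).map((re) => re.source);
}

function composeSkill(entry, baseText, addendumText) {
  const violations = lintAddendum(addendumText);
  if (violations.length > 0) {
    throw new Error(`addendum for ${entry.base} is not additive: ${violations.join(", ")}`);
  }
  // Hash mismatch: the upstream skill changed since it was pinned; re-review the addendum.
  const drifted = sha256(baseText) !== entry.baseSha256;
  // Base first, addendum under a visible heading so the augmentation stays auditable.
  const prompt = `${baseText}\n\n## Wazir context (additive)\n\n${addendumText}`;
  return { name: entry.base, prompt, drifted };
}

const base = "# TDD\n1. Write a failing test";
const entry = { base: "superpowers:tdd", augment: "wazir-context/tdd.md", baseSha256: sha256(base) };
const composed = composeSkill(entry, base, "Record test output in the phase report.");
console.log(composed.drifted, composed.prompt);
```

Because the resolved prompt is a plain string, a CI check can diff it between superpowers versions. That covers the testing question raised in item 4.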
+---
+## Under Discussion
+### 6. Rich phase reports + three-way gating agent
+**Decision:** Two parts — rich reports and a gating agent with three possible outputs.
+**Part 1: Rich phase reports**
+- Current reports are too thin to be actionable. Rebuild them to include:
+  - What was attempted and what the outcome was
+  - What succeeded, what failed, what's uncertain
+  - Drift from original intent / spec
+  - Quality metrics (test results, coverage, lint, type-checking)
+  - Risk flags and open questions
+  - Decisions made and their rationale
+- Reports saved to file for Wazir self-improvement and auditability
+**Part 2: Gating agent (three-way output)**
+- Agent receives: user's original input, the phase report, and accumulated decisions
+- Agent outputs ONE of three verdicts:
+  - **Continue** — proceed to next phase
+  - **Loop back** — return to current phase with specific fixes
+  - **Escalate to human** — agent cannot decide, needs human judgment
+**Explicit criteria (not vibes):**
+| Verdict | Criteria |
+|---------|----------|
+| Continue | All quality gates pass, no drift from spec, no open risks, no ambiguous trade-offs |
+| Loop back | Specific failures identified, actionable fix path exists, no human judgment needed |
+| Escalate | Ambiguous trade-off, scope change detected, conflicting signals, confidence below threshold, or any situation where two reasonable people could disagree |
+**Critical design constraint:** The escalation criteria must be explicit and err toward escalating. If not codified, the agent will almost never escalate — LLMs are bad at recognizing their own uncertainty. Default posture: **when in doubt, escalate.**
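One way to codify the verdict table is a rule-based first pass that runs before, or alongside, the gating agent. The report field names and the 0.8 confidence threshold are hypothetical: the decision calls for a threshold but gives no value. The rule order encodes the stated bias. Any escalation signal wins, a loop-back requires a concrete fix for every failure, and anything left ambiguous escalates.

```javascript
const CONFIDENCE_THRESHOLD = 0.8; // assumption

function gateVerdict(report) {
  const escalateSignal =
    report.scopeChanged ||
    report.conflictingSignals ||
    report.ambiguousTradeoffs.length > 0 ||
    !(report.confidence >= CONFIDENCE_THRESHOLD); // missing confidence also escalates
  if (escalateSignal) return { verdict: "escalate" };

  const failures = report.qualityGates.filter((gate) => !gate.passed);
  if (failures.length > 0) {
    const actionable = failures.every((gate) => typeof gate.fix === "string" && gate.fix !== "");
    // Loop back only when every failure has a concrete fix; otherwise a human decides.
    return actionable
      ? { verdict: "loop_back", fixes: failures.map((gate) => gate.fix) }
      : { verdict: "escalate" };
  }

  // Gates pass, but drift or open risks are not "specific failures": when in doubt, escalate.
  if (report.driftFromSpec || report.openRisks.length > 0) return { verdict: "escalate" };
  return { verdict: "continue" };
}

const report = {
  scopeChanged: false,
  conflictingSignals: false,
  ambiguousTradeoffs: [],
  confidence: 0.9,
  qualityGates: [
    { name: "tests", passed: true },
    { name: "lint", passed: false, fix: "fix 3 lint errors in cli.js" },
  ],
  driftFromSpec: false,
  openRisks: [],
};
console.log(gateVerdict(report));
```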
+---
+### 7. Restore continuous learning loop + capture all user input
+**Decision (parked — will circle back):**
+**Part 1: Continuous learning**
+- The old Wazir implementation had a final step that applied learnings from each run to future runs
+- This must be restored — every completed run should extract what worked, what failed, and what was learned, and feed it forward
+- Learning is cumulative across runs, not just within a single run
+**Part 2: User input as learning signal**
+- ALL user input during a run must be saved (not just the final output)
+- User corrections, approvals, rejections, feedback, and mid-run redirections are the highest-quality training signal
+- This feeds both the continuous learning loop and the phase reports (decision #6)
+**Status:** Parked. Will design after current discussion topics are resolved.
+---
+### 8. Autoresearch pattern for Wazir self-improvement
+**Decision:** Use autoresearch loop on Wazir itself, but with strict identity boundaries.
+**Core risk:** A self-modifying system running overnight can drift Wazir into a completely different project. Each change passes the metric, but after 100 iterations the project's identity is gone. Skills define what Wazir *is* — an agent must not rewrite them unsupervised.
+**The line: if changing it changes what Wazir *does*, a human decides. If it makes Wazir do the same thing better, loop it.**
+**CAN loop overnight (mechanical, identity-safe):**
+- Test coverage — add tests, never rewrite behavior
+- Bug fixes for known, specific, scoped issues
+- Lint / type errors / code quality
+- Performance — make existing behavior faster
+- Export validation fixes
+- Documentation gaps
+**CANNOT loop overnight (identity-defining, human-gated):**
+- Skill files
+- Workflow definitions
+- Architecture
+- Role contracts
+- Manifest schema
+- Design docs / program.md
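The identity boundary above can also be enforced mechanically. An overnight loop would split its changed paths into those it may touch and those that need a human. The path patterns are an assumption that maps the CANNOT list onto this package's layout (`skills/`, `workflows/`, `roles/`, the manifest and its schema, design docs). "Architecture" has no single directory, so `docs/concepts/architecture.md` stands in for it. `partitionChanges` is a hypothetical name.

```javascript
// Identity-defining paths (hypothetical mapping of the list above).
const IDENTITY_DEFINING = [
  /^skills\//,
  /^workflows\//,
  /^roles\//,
  /^docs\/concepts\/architecture\.md$/,
  /^schemas\/wazir-manifest\.schema\.json$/,
  /^wazir\.manifest\.yaml$/,
  /(^|\/)program\.md$/,
  /^docs\/(plans|decisions)\//,
];

function partitionChanges(paths) {
  const allowed = [];
  const needsHuman = [];
  for (const p of paths) {
    const norm = p.replace(/\\/g, "/").replace(/^\.\//, ""); // normalize separators and ./
    (IDENTITY_DEFINING.some((re) => re.test(norm)) ? needsHuman : allowed).push(p);
  }
  return { allowed, needsHuman };
}

const changed = ["tooling/src/cli.js", "skills/tdd/SKILL.md", "./wazir.manifest.yaml", "tooling/test/cli.test.js"];
console.log(partitionChanges(changed));
```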
+**Patterns to adopt from autoresearch:**
+- Keep/discard via git revert → use in executor
+- Mechanical metric requirement (measurable before/after) → enforce in phase reports
+- STRIDE + OWASP structured audit loop → inform `wz:run-audit` design
+- Scoped overnight runs with morning human review gate
+**Implementation: Enhanced self-audit with bounded loop (5 iterations).**
+- Enhance self-audit quality first (richer audit dimensions, better findings, smarter fixes)
+- Then run it in a 5-loop cycle: each loop finds new issues exposed by the previous loop's fixes
+- Bounded — no drift risk, human reviews the final branch
+- Simpler than autoresearch integration, uses existing worktree isolation
+- Priority: make each individual audit *good* — 5 loops of a strong audit beats 100 loops of a shallow one
+**Open:** What specifically needs enhancing in self-audit before the loop is worthwhile?
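The bounded cycle described above has a simple control flow. It runs at most 5 iterations, each auditing the tree left by the previous iteration's fixes, and stops early once an audit comes back clean. `audit` and `applyFixes` are hypothetical callbacks standing in for the self-audit skill. This shows the loop shape only, not the audit itself.

```javascript
const MAX_ITERATIONS = 5;

function boundedAudit(audit, applyFixes, maxIterations = MAX_ITERATIONS) {
  const log = [];
  for (let i = 1; i <= maxIterations; i++) {
    const findings = audit();
    log.push({ iteration: i, findings: findings.length });
    if (findings.length === 0) break; // the last round's fixes exposed nothing new
    applyFixes(findings);
  }
  return log; // summary for the human reviewing the final branch
}

// Simulated rounds: fixes in round 1 expose one new issue; round 3 comes back clean.
const rounds = [["a", "b", "c"], ["d"], []];
let round = 0;
console.log(boundedAudit(() => rounds[round] ?? [], () => { round += 1; }));
```

The hard cap bounds drift regardless of audit quality. The early exit keeps a strong audit from burning iterations once it stops finding issues.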
+---
+### 9. Composer generates task-specific agents with full expertise in context
+**Decision:** The composition engine must compose full expertise content into each dispatched agent's context — not just filenames or summaries.
+**How it works:**
+1. Detect task stack + concerns (from project scan / user input)
+2. Resolve which expertise modules apply per role (composition-map.yaml — 4 layers: always → auto → stacks → concerns)
+3. **Compose the full content** of every resolved module into the agent's prompt
+4. Dispatch executor, reviewer, verifier — each with the complete relevant expertise internalized
+**Key principle:** Loading expertise is additive, not restrictive. An agent with Flutter expertise loaded doesn't forget React — it additionally knows Flutter patterns and antipatterns. This is strictly better than a generic agent.
+**What this means for the reviewer:** The reviewer gets the full antipattern catalog + domain-specific review dimensions composed into its context. It reviews against *everything it knows*, with task-specific expertise making it sharper, not narrower.
+**Why this matters:** Expertise files are meaningless if they're not in the prompt. A filename reference or summary doesn't give the agent the actual knowledge. The full content must be in context for the agent to act on it.
+**Open:** How does this interact with context window limits? The composition engine already enforces a maximum of 15 modules per dispatch and a token budget; this constraint stays. The composer must be smart about what fits.
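The resolution order in decision #9 can be sketched as a single pass over the layers. The function walks always, auto, stacks, and concerns in that order, keeps the first occurrence of each module, and includes the full module text until either the 15-module cap or the token budget would be exceeded. `composeExpertise` is a hypothetical name. Two details are assumptions: the chars/4 token estimate, and the policy of skipping an oversized module while still trying later ones.

```javascript
const MAX_MODULES = 15;
const LAYER_ORDER = ["always", "auto", "stacks", "concerns"];
const estimateTokens = (text) => Math.ceil(text.length / 4); // rough; separators not counted

// layers: { always?: string[], auto?: string[], stacks?: string[], concerns?: string[] }
// modules: module id -> full markdown content
function composeExpertise(layers, modules, tokenBudget) {
  const seen = new Set();
  const included = [];
  const skipped = [];
  let used = 0;
  for (const layer of LAYER_ORDER) {
    for (const id of layers[layer] ?? []) {
      if (seen.has(id)) continue; // already resolved by an earlier layer
      seen.add(id);
      const content = modules[id];
      if (content === undefined) { skipped.push({ id, reason: "missing" }); continue; }
      if (included.length >= MAX_MODULES) { skipped.push({ id, reason: "module cap" }); continue; }
      const cost = estimateTokens(content);
      if (used + cost > tokenBudget) { skipped.push({ id, reason: "token budget" }); continue; }
      included.push(id);
      used += cost;
    }
  }
  // Full content, not filenames or summaries, goes into the agent's prompt.
  const prompt = included.map((id) => `<!-- expertise: ${id} -->\n${modules[id]}`).join("\n\n");
  return { included, skipped, tokensUsed: used, prompt };
}

const modules = {
  "core/security": "x".repeat(400), // ~100 tokens
  "flutter/patterns": "y".repeat(800), // ~200 tokens
  "react/antipatterns": "z".repeat(4000), // ~1000 tokens
  "concern/a11y": "w".repeat(200), // ~50 tokens
};
const layers = {
  always: ["core/security"],
  stacks: ["flutter/patterns", "react/antipatterns", "core/security"],
  concerns: ["concern/a11y", "concern/missing"],
};
const result = composeExpertise(layers, modules, 400);
console.log(result.included, result.skipped, result.tokensUsed);
```

Skipping an oversized module and continuing lets smaller modules from later layers still fit. Stopping at the first overflow would instead give earlier layers strict priority; which trade-off is right is part of the open question above.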
+---
+## Rejected
+*(nothing yet)*

package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md CHANGED

@@ -29,7 +29,7 @@ Before starting any implementation, verify all of the following:
 - [ ] **Node.js >= 20.0.0** installed
 - [ ] **`npm test` passes on the clean branch** with zero failures
 - [ ] **`wazir export --check` passes** on the clean branch (no pre-existing drift)
-- [ ] **All 13 task spec files reviewed** in `.agent-os/tasks/clarified/` (004-016)
+- [ ] **All 13 task spec files reviewed** in `.wazir/tasks/clarified/` (004-016)
 - [ ] **`tooling/src/capture/command.js` imports confirmed:** `fs` (line 1) and `path` (line 2) are already imported -- no additional module imports needed for task 006
 - [ ] **`tooling/test/capture.test.js` fixture pattern confirmed:** `createCaptureFixture()` provides `fixtureRoot`, `stateRoot`, and `cleanup()` -- new tests must use unique run IDs
 - [ ] **`tooling/test/role-contracts.test.js` is in `test:active`** -- confirmed, so workflow and role structural tests can be added there without new test file registration

package/docs/readmes/INDEX.md CHANGED

@@ -1,6 +1,6 @@
 # Wazir README Index
-> 60 world-class README files covering every feature, workflow, role, skill, hook, and package.
+> 76 world-class README files covering every feature, workflow, role, skill, hook, and package.
 ## Main README
@@ -48,7 +48,7 @@
 ### Skills (`features/skills/`)
 | File | Description |
 |------|-------------|
-| [README.md](features/skills/README.md) | Skills system overview — all 11 skills, type table, invocation rules |
+| [README.md](features/skills/README.md) | Skills system overview — all 28 skills, type table, invocation rules |
 | [using-skills.md](features/skills/using-skills.md) | Bootstrap skill — enforces skill-check-before-action |
 | [brainstorming.md](features/skills/brainstorming.md) | Design gate skill — ideas into designs before implementation |
 | [writing-plans.md](features/skills/writing-plans.md) | Plan production skill — specs into bite-sized task files |
@@ -60,6 +60,23 @@
 | [run-audit.md](features/skills/run-audit.md) | Run audit skill — 6-step interactive audit pipeline |
 | [self-audit.md](features/skills/self-audit.md) | Self-audit skill — worktree-isolated drift detection |
 | [prepare-next.md](features/skills/prepare-next.md) | Prepare next skill — clean handoff between sessions |
+| [clarifier.md](features/skills/clarifier.md) | Clarifier skill — research, scope, design, specs pipeline |
+| [executor.md](features/skills/executor.md) | Executor skill — TDD execution with quality gates |
+| [reviewer.md](features/skills/reviewer.md) | Reviewer skill — adversarial review against spec and plan |
+| [wazir.md](features/skills/wazir.md) | Wazir skill — one-command full pipeline |
+| [init-pipeline.md](features/skills/init-pipeline.md) | Init pipeline skill — zero-config project setup |
+| [executing-plans.md](features/skills/executing-plans.md) | Executing plans skill — session-isolated plan execution |
+| [dispatching-parallel-agents.md](features/skills/dispatching-parallel-agents.md) | Parallel agents skill — dispatch independent tasks |
+| [subagent-driven-development.md](features/skills/subagent-driven-development.md) | Subagent development skill — in-session parallel execution |
+| [using-git-worktrees.md](features/skills/using-git-worktrees.md) | Git worktrees skill — isolated feature branches |
+| [finishing-a-development-branch.md](features/skills/finishing-a-development-branch.md) | Branch finishing skill — merge, PR, or cleanup |
+| [humanize.md](features/skills/humanize.md) | Humanize skill — remove AI writing patterns |
+| [writing-skills.md](features/skills/writing-skills.md) | Writing skills skill — create and verify skills |
+| [receiving-code-review.md](features/skills/receiving-code-review.md) | Receiving review skill — process feedback with rigor |
+| [requesting-code-review.md](features/skills/requesting-code-review.md) | Requesting review skill — structured review requests |
+| [claude-cli.md](features/skills/claude-cli.md) | Claude CLI skill — programmatic Claude Code usage |
+| [codex-cli.md](features/skills/codex-cli.md) | Codex CLI skill — programmatic Codex usage |
+| [gemini-cli.md](features/skills/gemini-cli.md) | Gemini CLI skill — programmatic Gemini usage |
 ### Hooks (`features/hooks/`)
 | File | Description |
@@ -77,8 +94,8 @@
 ### Other Features
 | File | Description |
 |------|-------------|
-| [expertise/README.md](features/expertise/README.md) | Expertise system — 308 modules across 11 domains |
-| [schemas/README.md](features/schemas/README.md) | Schema system — 16 JSON schemas for artifact validation |
+| [expertise/README.md](features/expertise/README.md) | Expertise system — 268 modules across 12 domains |
+| [schemas/README.md](features/schemas/README.md) | Schema system — 19 JSON schemas for artifact validation |
 | [tooling/README.md](features/tooling/README.md) | CLI tooling — all commands with options and examples |
 | [exports/README.md](features/exports/README.md) | Host exports — Claude, Codex, Gemini, Cursor packages |
@@ -88,7 +105,6 @@
 |------|---------|-------------|
 | [README.md](packages/README.md) | — | Package index with versions and reading order |
 | [ajv.md](packages/ajv.md) | `ajv@^8.18.0` | JSON Schema 2020-12 validation |
-| [gray-matter.md](packages/gray-matter.md) | `gray-matter@^4.0.3` | YAML frontmatter parsing for skill files |
 | [yaml.md](packages/yaml.md) | `yaml@^2.0.0` | YAML 1.2 serialization for manifests |
 | [node-test.md](packages/node-test.md) | `node:test` | Zero-dependency built-in test runner |
 | [context-mode.md](packages/context-mode.md) | `context-mode` plugin | Context compression for large outputs |

package/docs/readmes/features/expertise/README.md CHANGED

@@ -1,6 +1,6 @@
 # Expertise System
-Wazir's expertise system is a curated library of **308 knowledge modules** spanning
+Wazir's expertise system is a curated library of **268 knowledge modules** spanning
 architecture, security, performance, design, and more. Modules are loaded selectively into
 agent prompts — giving the right knowledge to the right role at the right phase — without
 flooding context with irrelevant content.
@@ -167,5 +167,5 @@ produce plausible-looking output that silently fails to meet requirements.
 |---|---|
 | `expertise/index.yaml` | Machine-readable module registry with phase metadata |
 | `expertise/index.md` | Human-readable semantic map and reading guide |
-| `expertise/PROGRESS.md` | Authoring history: 32 modules, 255 total files, completion dates |
+| `expertise/PROGRESS.md` | Authoring history: 32 research batches, completion dates |
 | `expertise/README.md` | Directory contract: what is and is not allowed here |

package/docs/readmes/features/exports/README.md CHANGED

@@ -50,8 +50,8 @@ wazir export build
 3. Collect canonical sources
    ┌────────────────────────────────┐
    │ wazir.manifest.yaml       │
-   │ roles/*.md  (9 role files)     │
-   │ workflows/*.md (13 workflows)  │
+   │ roles/*.md  (10 role files)    │
+   │ workflows/*.md (15 workflows)  │
    │ hooks/definitions/*.yaml       │
    └────────────────────────────────┘
         │

package/docs/readmes/features/schemas/README.md CHANGED

@@ -53,6 +53,9 @@ suite via schema-backed example fixtures in `templates/examples/`.
 | `docs-claim.schema.json` | Docs Truth Claim | `validate docs` + `docs/truth-claims.yaml` |
 | `proposed-learning.schema.json` | Proposed Learning | `learner` role output |
 | `accepted-learning.schema.json` | Accepted Learning | `learn` workflow approval |
+| `author-artifact.schema.json` | Author Artifact | `content-author` role output |
+| `usage.schema.json` | Usage Report | `capture usage` output |
+| `phase-report.schema.json` | Phase Report | Phase completion summary |
 ---

package/docs/readmes/features/skills/README.md CHANGED

@@ -14,11 +14,28 @@ Skills are the **operational layer** of Wazir. Where roles define who acts and w
 | [TDD](tdd.md) | `wz:tdd` | Rigid | Enforce RED → GREEN → REFACTOR with evidence at each step |
 | [Debugging](debugging.md) | `wz:debugging` | Rigid | Observe-hypothesize-test-fix loop instead of guesswork |
 | [Verification](verification.md) | `wz:verification` | Rigid | Require fresh command evidence before any completion claim |
+| [Receiving Code Review](receiving-code-review.md) | `wz:receiving-code-review` | Rigid | Process review feedback with technical rigor, not blind agreement |
+| [Requesting Code Review](requesting-code-review.md) | `wz:requesting-code-review` | Rigid | Request review when completing tasks or before merging |
 | [Design](design.md) | `wz:design` | Flexible | Guide designer role through open-pencil MCP visual design workflow |
 | [Scan Project](scan-project.md) | `scan-project` | Flexible | Build an evidence-based project profile from repo surfaces |
 | [Run Audit](run-audit.md) | `run-audit` | Flexible | Interactive structured codebase audit with report or fix-plan output |
 | [Self-Audit](self-audit.md) | `self-audit` | Flexible | Worktree-isolated audit-fix loop — safe self-improvement |
 | [Prepare Next](prepare-next.md) | `prepare-next` | Flexible | Produce a clean next-run handoff without stale context bleed |
+| [Clarifier](clarifier.md) | `wz:clarifier` | Rigid | Run the clarification pipeline — research, scope, design, specs |
+| [Executor](executor.md) | `wz:executor` | Rigid | Run the execution phase with TDD, quality gates, and verification |
+| [Reviewer](reviewer.md) | `wz:reviewer` | Rigid | Adversarial review against approved spec, plan, and evidence |
+| [Wazir](wazir.md) | `wz:wazir` | Rigid | One-command pipeline — init, clarify, execute, review automatically |
+| [Init Pipeline](init-pipeline.md) | `wz:init-pipeline` | Flexible | Initialize the Wazir pipeline with zero-config auto-detection |
+| [Executing Plans](executing-plans.md) | `wz:executing-plans` | Flexible | Execute implementation plans in separate sessions with review checkpoints |
+| [Dispatching Parallel Agents](dispatching-parallel-agents.md) | `wz:dispatching-parallel-agents` | Flexible | Dispatch 2+ independent tasks without shared state |
+| [Subagent-Driven Development](subagent-driven-development.md) | `wz:subagent-driven-development` | Flexible | Execute plan tasks via independent subagents in current session |
+| [Using Git Worktrees](using-git-worktrees.md) | `wz:using-git-worktrees` | Flexible | Create isolated worktrees for feature work or plan execution |
+| [Finishing a Branch](finishing-a-development-branch.md) | `wz:finishing-a-development-branch` | Flexible | Guide completion — merge, PR, or cleanup options |
+| [Humanize](humanize.md) | `wz:humanize` | Flexible | Detect and remove AI writing patterns from text artifacts |
+| [Writing Skills](writing-skills.md) | `wz:writing-skills` | Flexible | Create, edit, or verify skills before deployment |
+| [Claude CLI](claude-cli.md) | `wz:claude-cli` | Flexible | Use Claude Code CLI programmatically for reviews and automation |
+| [Codex CLI](codex-cli.md) | `wz:codex-cli` | Flexible | Use Codex CLI programmatically for reviews and sandbox operations |
+| [Gemini CLI](gemini-cli.md) | `wz:gemini-cli` | Flexible | Use Gemini CLI for headless reviews and sandbox operations |
 ## Skill Types

package/docs/readmes/features/skills/clarifier.md ADDED

@@ -0,0 +1,5 @@
+# wz:clarifier
+Run the clarification pipeline — research, clarify scope, brainstorm design, generate task specs and execution plan. Pauses for user approval between phases.
+See the full skill definition in `skills/clarifier/SKILL.md`.

package/docs/readmes/features/skills/claude-cli.md ADDED

@@ -0,0 +1,5 @@
+# wz:claude-cli
+How to use Claude Code CLI programmatically for reviews, automation, and non-interactive operations within Wazir pipelines.
+See the full skill definition in `skills/claude-cli/SKILL.md`.

package/docs/readmes/features/skills/codex-cli.md ADDED

@@ -0,0 +1,5 @@
+# wz:codex-cli
+How to use Codex CLI programmatically for reviews, execution, and sandbox operations within Wazir pipelines.
+See the full skill definition in `skills/codex-cli/SKILL.md`.

package/docs/readmes/features/skills/dispatching-parallel-agents.md ADDED

@@ -0,0 +1,5 @@
+# wz:dispatching-parallel-agents
+Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies
+See the full skill definition in `skills/dispatching-parallel-agents/SKILL.md`.

package/docs/readmes/features/skills/executing-plans.md ADDED

@@ -0,0 +1,5 @@
+# wz:executing-plans
+Use when you have a written implementation plan to execute in a separate session with review checkpoints
+See the full skill definition in `skills/executing-plans/SKILL.md`.

package/docs/readmes/features/skills/executor.md ADDED

@@ -0,0 +1,5 @@
+# wz:executor
+Run the execution phase — implement the approved plan with TDD, quality gates, and verification.
+See the full skill definition in `skills/executor/SKILL.md`.

package/docs/readmes/features/skills/finishing-a-development-branch.md ADDED

@@ -0,0 +1,5 @@
+# wz:finishing-a-development-branch
+Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup
+See the full skill definition in `skills/finishing-a-development-branch/SKILL.md`.

package/docs/readmes/features/skills/gemini-cli.md ADDED

@@ -0,0 +1,5 @@
+# wz:gemini-cli
+How to use Gemini CLI programmatically for headless reviews, automation, and sandbox operations within Wazir pipelines.
+See the full skill definition in `skills/gemini-cli/SKILL.md`.

package/docs/readmes/features/skills/humanize.md ADDED

@@ -0,0 +1,5 @@
+# wz:humanize
+Use when reviewing or editing any text artifact (specs, plans, code comments, commit messages, content, documentation) to detect and remove AI writing patterns. Runs a 4-phase pipeline -- Scan for AI vocabulary and structural patterns, Identify severity and domain, Rewrite problematic sections, Verify meaning preservation. Invoke on existing text that needs corrective humanization. For preventive humanization, the composition engine loads domain-specific rules automatically.
+See the full skill definition in `skills/humanize/SKILL.md`.

package/docs/readmes/features/skills/init-pipeline.md ADDED

@@ -0,0 +1,5 @@
+# wz:init-pipeline
+Initialize the Wazir pipeline — zero-config by default, auto-detects host and project stack. No mandatory questions.
+See the full skill definition in `skills/init-pipeline/SKILL.md`.

package/docs/readmes/features/skills/receiving-code-review.md ADDED

@@ -0,0 +1,5 @@
+# wz:receiving-code-review
+Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation
+See the full skill definition in `skills/receiving-code-review/SKILL.md`.

package/docs/readmes/features/skills/requesting-code-review.md ADDED

@@ -0,0 +1,5 @@
+# wz:requesting-code-review
+Use when completing tasks, implementing major features, or before merging to verify work meets requirements
+See the full skill definition in `skills/requesting-code-review/SKILL.md`.

package/docs/readmes/features/skills/reviewer.md ADDED

@@ -0,0 +1,5 @@
+# wz:reviewer
+Run the review phase — adversarial review of implementation against the approved spec, plan, and verification evidence.
+See the full skill definition in `skills/reviewer/SKILL.md`.

package/docs/readmes/features/skills/subagent-driven-development.md ADDED

@@ -0,0 +1,5 @@
+# wz:subagent-driven-development
+Use when executing implementation plans with independent tasks in the current session
+See the full skill definition in `skills/subagent-driven-development/SKILL.md`.

package/docs/readmes/features/skills/using-git-worktrees.md ADDED

@@ -0,0 +1,5 @@
+# wz:using-git-worktrees
+Use when starting feature work that needs isolation from current workspace or before executing implementation plans - creates isolated git worktrees with smart directory selection and safety verification
+See the full skill definition in `skills/using-git-worktrees/SKILL.md`.

package/docs/readmes/features/skills/wazir.md ADDED

@@ -0,0 +1,5 @@
+# wz:wazir
+One-command pipeline — type /wazir followed by what you want to build. Handles init, clarification, execution, review, and audits automatically.
+See the full skill definition in `skills/wazir/SKILL.md`.

package/docs/readmes/features/skills/writing-skills.md ADDED

@@ -0,0 +1,5 @@
+# wz:writing-skills
+Use when creating new skills, editing existing skills, or verifying skills work before deployment
+See the full skill definition in `skills/writing-skills/SKILL.md`.

package/docs/readmes/features/workflows/prepare-next.md CHANGED

@@ -42,7 +42,7 @@ The planner who closes a run is often the same role that will open the next one.
 One of:
-1. **Full completion** — All 14 phases are done, review is accepted, learnings are proposed. Prepare the next feature's starting point.
+1. **Full completion** — All 4 phases are done, review is accepted, learnings are proposed. Prepare the next feature's starting point.
 2. **Partial completion** — The session is ending before the pipeline finishes. Prepare a mid-pipeline handoff so the next session can resume.
 3. **Slice boundary** — The approved plan is being executed in multiple slices. Prepare the handoff between slices.

package/docs/reference/configuration-reference.md CHANGED

@@ -133,15 +133,56 @@ Out of scope for this manifest check:
 Maintainers are responsible for policing those surfaces with the separate docs-truth, runtime-surface, and repository review checks.
-## Workflows vs phases
+## Phases vs workflows
-- `phases` are the core lifecycle states of the operating model.
-- `workflows` are the canonical callable or review-gated entrypoints that drive those phases.
+The pipeline has **4 phases** (Init, Clarifier, Executor, Final Review) and **15 workflows** (atomic units within those phases).
-They overlap heavily, but they are not identical:
+- **Phases** are the top-level pipeline stages. Event capture and tracking use phase names: `init`, `clarifier`, `executor`, `final_review`.
+- **Workflows** are the canonical callable or review-gated entrypoints that run within phases. Each workflow can be independently enabled/disabled via `workflow_policy` in run-config.
-- `spec_challenge`, `plan_review`, and `prepare_next` are workflows that sit between or around the core execution phases.
-- Validators and exports should treat manifest-declared workflows as the canonical workflow file roster.
+| Phase | Workflows |
+|-------|-----------|
+| Init | (inline — no workflow files) |
+| Clarifier | clarify, discover, specify, spec_challenge, author, design, design_review, plan, plan_review |
+| Executor | execute, verify |
+| Final Review | review, learn, prepare_next |
+`run_audit` is a standalone on-demand workflow, not part of the main pipeline flow.
+Validators and exports should treat manifest-declared workflows as the canonical workflow file roster.
+## Hook configuration
+### `hooks/routing-matrix.json`
+The routing matrix defines how the context-mode router classifies commands:
+- `large` — array of command prefixes that always route to context-mode (AC-3.1). The `# wazir:passthrough` marker does NOT exempt commands in this category.
+- `small` — array of command prefixes that always pass through without context-mode processing.
+- `ambiguous_heuristic` — rules for commands that match neither large nor small:
+  - `pipe_detected` — classify piped commands as ambiguous
+  - `redirect_detected` — classify redirected commands as ambiguous
+  - `verbose_binaries` — array of binary names whose output is typically large
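The classification order above can be sketched in JavaScript. This is a minimal illustration, not the package's router: the `classifyCommand` function, the example prefixes, the first-token rule for `verbose_binaries`, and the `'unclassified'` fallback are all assumptions. It encodes only what the list states: `large` prefixes are checked first and win even when the command carries a `# wazir:passthrough` marker.

```javascript
// Assumed contents of hooks/routing-matrix.json; the prefixes are made-up examples.
const routingMatrix = {
  large: ['npm test', 'git log'],
  small: ['git status', 'pwd'],
  ambiguous_heuristic: {
    pipe_detected: true,
    redirect_detected: true,
    verbose_binaries: ['find', 'grep'],
  },
};

// Hypothetical classifier returning 'large', 'small', 'ambiguous', or 'unclassified'.
function classifyCommand(command, matrix = routingMatrix) {
  const cmd = command.trim();
  // `large` is checked first and ignores any `# wazir:passthrough` marker.
  if (matrix.large.some((prefix) => cmd.startsWith(prefix))) return 'large';
  if (matrix.small.some((prefix) => cmd.startsWith(prefix))) return 'small';

  const rules = matrix.ambiguous_heuristic;
  // Naive checks: `||` also counts as a pipe, and a quoted `>` counts as a redirect.
  if (rules.pipe_detected && cmd.includes('|')) return 'ambiguous';
  if (rules.redirect_detected && /[<>]/.test(cmd)) return 'ambiguous';
  // Assumption: the binary name is the first whitespace-separated token.
  const binary = cmd.split(/\s+/)[0];
  if (rules.verbose_binaries.includes(binary)) return 'ambiguous';

  // Assumption: the docs do not say how a command matching no rule is routed.
  return 'unclassified';
}
```

Under these assumptions, `classifyCommand('cat log.txt | wc -l')` returns `'ambiguous'` and `classifyCommand('git status --short')` returns `'small'`. Prefix matching with `startsWith` is deliberately naive; a production router would probably match on word boundaries.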
+### `config/gating-rules.yaml`
+The gating rules file defines conditions for phase transition decisions:
+- `rules.continue` — all conditions must pass for a phase to advance (test failures, lint errors, type errors, drift delta, risk flags, uncertain outcomes)
+- `rules.loop_back` — any deterministic failure (test failures, lint errors, or type errors) triggers a loop-back with actionable fix descriptions
+- `rules.escalate` — fallback when neither continue nor loop_back match
+- `default_verdict` — verdict when the report is empty or missing (defaults to `escalate`)
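A minimal sketch of the verdict order described above, assuming a flat report object. The field names (`test_failures`, `lint_errors`, `type_errors`, `drift_delta`, `risk_flags`, `uncertain_outcomes`), the zero-drift pass condition, the fix-description wording, and the `decideVerdict` name are hypothetical; the actual conditions live in `config/gating-rules.yaml`.

```javascript
// Hypothetical report shape; every field name here is an assumption.
function decideVerdict(report, defaultVerdict = 'escalate') {
  // Empty or missing report: use default_verdict (documented default: escalate).
  if (!report || Object.keys(report).length === 0) {
    return { verdict: defaultVerdict, fixes: [] };
  }
  const tests = report.test_failures ?? 0;
  const lint = report.lint_errors ?? 0;
  const types = report.type_errors ?? 0;
  const drift = report.drift_delta ?? 0; // assumption: any non-zero drift fails
  const risks = report.risk_flags ?? [];
  const uncertain = report.uncertain_outcomes ?? [];

  // continue: every condition must pass.
  const allPass =
    tests === 0 && lint === 0 && types === 0 &&
    drift === 0 && risks.length === 0 && uncertain.length === 0;
  if (allPass) return { verdict: 'continue', fixes: [] };

  // loop_back: any deterministic failure, with one fix description per failure class.
  const fixes = [];
  if (tests > 0) fixes.push(`Fix ${tests} failing test(s)`);
  if (lint > 0) fixes.push(`Fix ${lint} lint error(s)`);
  if (types > 0) fixes.push(`Fix ${types} type error(s)`);
  if (fixes.length > 0) return { verdict: 'loop_back', fixes };

  // escalate: neither continue nor loop_back matched.
  return { verdict: 'escalate', fixes: [] };
}
```

With this ordering, a report containing only a risk flag fails `continue`, matches no deterministic failure, and so falls through to `escalate`.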
+### Composition proof artifacts
+The composition engine (`tooling/src/adapters/composition-engine.js`) writes a proof artifact per dispatch to `.wazir/runs/<id>/artifacts/composition-<role>-<task>.json` containing:
+- `modules_included[]` — `{ path, layer, tokens }` for each loaded module
+- `modules_dropped[]` — `{ path, layer, tokens, reason }` for each dropped module. Reason values:
+  - `module_cap_exceeded` — module count exceeded the 15-module cap
+  - `token_ceiling_exceeded` — total tokens exceeded the configurable ceiling (default: 50,000)
+- `total_tokens` — total token count of composed prompt
+- `prompt_hash` — SHA-256 hash of the composed prompt for audit traceability
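The two drop reasons can be illustrated with a sketch that builds a proof object of the documented shape. The input format, the priority ordering, the `text` field, and the choice to keep trying smaller modules after a ceiling overflow are assumptions, not taken from `composition-engine.js`. Only the 15-module cap, the 50,000-token default, the field names, and the SHA-256 hash come from the list above.

```javascript
const { createHash } = require('node:crypto');

const MODULE_CAP = 15; // documented 15-module cap
const DEFAULT_TOKEN_CEILING = 50000; // documented default ceiling

// Hypothetical input: modules in priority order, each { path, layer, tokens, text }.
// The ordering rule and the `text` field are assumptions made for this sketch.
function buildCompositionProof(modules, tokenCeiling = DEFAULT_TOKEN_CEILING) {
  const modulesIncluded = [];
  const modulesDropped = [];
  const texts = [];
  let totalTokens = 0;

  for (const { path, layer, tokens, text } of modules) {
    if (modulesIncluded.length >= MODULE_CAP) {
      modulesDropped.push({ path, layer, tokens, reason: 'module_cap_exceeded' });
    } else if (totalTokens + tokens > tokenCeiling) {
      // Drop this module but keep trying later ones that may still fit.
      modulesDropped.push({ path, layer, tokens, reason: 'token_ceiling_exceeded' });
    } else {
      modulesIncluded.push({ path, layer, tokens });
      texts.push(text);
      totalTokens += tokens;
    }
  }

  const prompt = texts.join('\n\n');
  return {
    modules_included: modulesIncluded,
    modules_dropped: modulesDropped,
    // Approximation: sums per-module counts and ignores separator tokens.
    total_tokens: totalTokens,
    prompt_hash: createHash('sha256').update(prompt).digest('hex'),
  };
}
```

Because the cap check runs first in this sketch, a sixteenth module is recorded as `module_cap_exceeded` even if it would fit under the token ceiling.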
 ## Current index parser roster