@wazir-dev/cli 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (138)
  1. package/CHANGELOG.md +74 -10
  2. package/README.md +15 -15
  3. package/assets/demo.cast +47 -0
  4. package/assets/demo.gif +0 -0
  5. package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
  6. package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
  7. package/docs/concepts/architecture.md +1 -1
  8. package/docs/concepts/roles-and-workflows.md +2 -0
  9. package/docs/concepts/why-wazir.md +59 -0
  10. package/docs/decisions/2026-03-19-deferred-items.md +564 -0
  11. package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
  12. package/docs/readmes/INDEX.md +21 -5
  13. package/docs/readmes/features/expertise/README.md +2 -2
  14. package/docs/readmes/features/exports/README.md +2 -2
  15. package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
  16. package/docs/readmes/features/schemas/README.md +3 -0
  17. package/docs/readmes/features/skills/README.md +17 -0
  18. package/docs/readmes/features/skills/clarifier.md +5 -0
  19. package/docs/readmes/features/skills/claude-cli.md +5 -0
  20. package/docs/readmes/features/skills/codex-cli.md +5 -0
  21. package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
  22. package/docs/readmes/features/skills/executing-plans.md +5 -0
  23. package/docs/readmes/features/skills/executor.md +5 -0
  24. package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
  25. package/docs/readmes/features/skills/gemini-cli.md +5 -0
  26. package/docs/readmes/features/skills/humanize.md +5 -0
  27. package/docs/readmes/features/skills/init-pipeline.md +5 -0
  28. package/docs/readmes/features/skills/receiving-code-review.md +5 -0
  29. package/docs/readmes/features/skills/requesting-code-review.md +5 -0
  30. package/docs/readmes/features/skills/reviewer.md +5 -0
  31. package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
  32. package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
  33. package/docs/readmes/features/skills/wazir.md +5 -0
  34. package/docs/readmes/features/skills/writing-skills.md +5 -0
  35. package/docs/readmes/features/workflows/prepare-next.md +1 -1
  36. package/docs/reference/configuration-reference.md +47 -6
  37. package/docs/reference/hooks.md +1 -0
  38. package/docs/reference/launch-checklist.md +4 -4
  39. package/docs/reference/review-loop-pattern.md +119 -9
  40. package/docs/reference/roles-reference.md +1 -0
  41. package/docs/reference/skill-tiers.md +147 -0
  42. package/docs/reference/tooling-cli.md +3 -1
  43. package/docs/truth-claims.yaml +12 -0
  44. package/expertise/antipatterns/process/ai-coding-antipatterns.md +214 -1
  45. package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
  46. package/exports/hosts/claude/.claude/commands/verify.md +30 -1
  47. package/exports/hosts/claude/.claude/settings.json +9 -0
  48. package/exports/hosts/claude/CLAUDE.md +1 -1
  49. package/exports/hosts/claude/export.manifest.json +6 -4
  50. package/exports/hosts/claude/host-package.json +3 -1
  51. package/exports/hosts/codex/AGENTS.md +1 -1
  52. package/exports/hosts/codex/export.manifest.json +6 -4
  53. package/exports/hosts/codex/host-package.json +3 -1
  54. package/exports/hosts/cursor/.cursor/hooks.json +4 -0
  55. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
  56. package/exports/hosts/cursor/export.manifest.json +6 -4
  57. package/exports/hosts/cursor/host-package.json +3 -1
  58. package/exports/hosts/gemini/GEMINI.md +1 -1
  59. package/exports/hosts/gemini/export.manifest.json +6 -4
  60. package/exports/hosts/gemini/host-package.json +3 -1
  61. package/hooks/context-mode-router +191 -0
  62. package/hooks/definitions/context_mode_router.yaml +19 -0
  63. package/hooks/hooks.json +31 -6
  64. package/hooks/protected-path-write-guard +8 -0
  65. package/hooks/routing-matrix.json +45 -0
  66. package/hooks/session-start +62 -1
  67. package/llms-full.txt +937 -134
  68. package/package.json +2 -4
  69. package/schemas/hook.schema.json +2 -1
  70. package/schemas/phase-report.schema.json +89 -0
  71. package/schemas/usage.schema.json +25 -1
  72. package/schemas/wazir-manifest.schema.json +19 -0
  73. package/skills/brainstorming/SKILL.md +32 -157
  74. package/skills/clarifier/SKILL.md +289 -111
  75. package/skills/claude-cli/SKILL.md +320 -0
  76. package/skills/codex-cli/SKILL.md +260 -0
  77. package/skills/debugging/SKILL.md +13 -0
  78. package/skills/design/SKILL.md +13 -0
  79. package/skills/dispatching-parallel-agents/SKILL.md +13 -0
  80. package/skills/executing-plans/SKILL.md +13 -0
  81. package/skills/executor/SKILL.md +139 -19
  82. package/skills/finishing-a-development-branch/SKILL.md +13 -0
  83. package/skills/gemini-cli/SKILL.md +260 -0
  84. package/skills/humanize/SKILL.md +13 -0
  85. package/skills/init-pipeline/SKILL.md +72 -164
  86. package/skills/prepare-next/SKILL.md +81 -10
  87. package/skills/receiving-code-review/SKILL.md +13 -0
  88. package/skills/requesting-code-review/SKILL.md +13 -0
  89. package/skills/reviewer/SKILL.md +369 -24
  90. package/skills/run-audit/SKILL.md +13 -0
  91. package/skills/scan-project/SKILL.md +13 -0
  92. package/skills/self-audit/SKILL.md +217 -16
  93. package/skills/skill-research/SKILL.md +188 -0
  94. package/skills/subagent-driven-development/SKILL.md +13 -0
  95. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
  96. package/skills/subagent-driven-development/implementer-prompt.md +8 -0
  97. package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
  98. package/skills/tdd/SKILL.md +13 -0
  99. package/skills/using-git-worktrees/SKILL.md +13 -0
  100. package/skills/using-skills/SKILL.md +13 -0
  101. package/skills/verification/SKILL.md +54 -3
  102. package/skills/wazir/SKILL.md +464 -381
  103. package/skills/writing-plans/SKILL.md +14 -1
  104. package/skills/writing-skills/SKILL.md +13 -0
  105. package/templates/artifacts/implementation-plan.md +3 -0
  106. package/templates/artifacts/tasks-template.md +133 -0
  107. package/templates/examples/phase-report.example.json +48 -0
  108. package/tooling/src/adapters/composition-engine.js +256 -0
  109. package/tooling/src/adapters/model-router.js +84 -0
  110. package/tooling/src/capture/command.js +41 -2
  111. package/tooling/src/capture/run-config.js +3 -1
  112. package/tooling/src/capture/store.js +56 -0
  113. package/tooling/src/capture/usage.js +106 -0
  114. package/tooling/src/capture/user-input.js +66 -0
  115. package/tooling/src/checks/ac-matrix.js +256 -0
  116. package/tooling/src/checks/command-registry.js +12 -0
  117. package/tooling/src/checks/docs-truth.js +1 -1
  118. package/tooling/src/checks/security-sensitivity.js +69 -0
  119. package/tooling/src/checks/skills.js +111 -0
  120. package/tooling/src/cli.js +31 -20
  121. package/tooling/src/commands/stats.js +161 -0
  122. package/tooling/src/commands/validate.js +5 -1
  123. package/tooling/src/export/compiler.js +33 -37
  124. package/tooling/src/gating/agent.js +145 -0
  125. package/tooling/src/guards/phase-prerequisite-guard.js +185 -0
  126. package/tooling/src/hooks/routing-logic.js +69 -0
  127. package/tooling/src/init/auto-detect.js +258 -0
  128. package/tooling/src/init/command.js +38 -170
  129. package/tooling/src/input/scanner.js +46 -0
  130. package/tooling/src/reports/command.js +103 -0
  131. package/tooling/src/reports/phase-report.js +323 -0
  132. package/tooling/src/state/command.js +160 -0
  133. package/tooling/src/state/db.js +287 -0
  134. package/tooling/src/status/command.js +58 -1
  135. package/tooling/src/verify/proof-collector.js +299 -0
  136. package/wazir.manifest.yaml +26 -14
  137. package/workflows/plan-review.md +3 -1
  138. package/workflows/verify.md +30 -1
@@ -5,10 +5,40 @@ description: Run the review phase — adversarial review of implementation again
 
  # Reviewer
 
- Run Phase 3 (Review) for the current project.
+ ## Model Annotation
+ When multi-model mode is enabled:
+ - **Sonnet** for internal review passes (internal-review)
+ - **Opus** for final review mode (final-review)
+ - **Opus** for spec-challenge mode (spec-harden)
+ - **Opus** for design-review mode (design)
+
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode unavailable, fall back to native Bash with warning
+
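The routing split above can be sketched as a small classifier. This is an illustrative sketch only — the command lists and the shape of `hooks/routing-matrix.json` are assumptions, not the shipped schema:

```javascript
// Hypothetical sketch of the routing decision. Large commands go to
// context-mode when it is available; small ones stay on native Bash;
// large commands fall back to Bash with a warning otherwise.
const routingMatrix = {
  large: ['npm test', 'npm run build', 'git diff', 'npm ls', 'eslint'],
  small: ['git status', 'ls', 'pwd', 'wazir'],
};

function routeCommand(cmd, contextModeAvailable) {
  const isLarge = routingMatrix.large.some((p) => cmd.startsWith(p));
  if (isLarge && contextModeAvailable) return { target: 'context-mode', warn: false };
  if (isLarge) return { target: 'bash', warn: true }; // fallback with warning
  return { target: 'bash', warn: false };
}
```

Prefix matching is a simplification — the real matrix presumably classifies by command category, not literal prefixes.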
+ ## Codebase Exploration
+ 1. Query `wazir index search-symbols <query>` first
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
+ 3. Fall back to direct file reads ONLY for files identified by index queries
+ 4. Maximum 10 direct file reads without a justifying index query
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
+ Run the Final Review phase — or any review mode invoked by other phases.
 
  The reviewer role owns all review loops across the pipeline: research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions from `docs/reference/review-loop-pattern.md`.
 
+ **Key principle for `final` mode:** Compare implementation against the **ORIGINAL INPUT** (briefing + input files), NOT the task specs. The executor's per-task reviewer already validated against task specs — that concern is covered. The final reviewer catches drift: does what we built match what the user actually asked for?
+
+ **Reviewer-owned responsibilities** (callers must NOT replicate these):
+ 1. **Two-tier review** — internal review first (fast, cheap, expertise-loaded), Codex second (fresh eyes on clean code)
+ 2. **Dimension selection** — the reviewer selects the correct dimension set for the review mode and depth
+ 3. **Pass counting** — the reviewer tracks pass numbers and enforces the depth-based cap (quick=3, standard=5, deep=7)
+ 4. **Finding attribution** — each finding is tagged `[Internal]`, `[Codex]`, or `[Both]` based on source
+ 5. **Dimension set recording** — each review pass file records which canonical dimension set was used, enabling Phase Scoring (first vs final delta)
+ 6. **Learning pipeline** — ALL findings (internal + Codex) feed into `state.sqlite` and the learning system
+
  ## Review Modes
 
  The reviewer operates in different modes depending on the phase. Mode MUST be passed explicitly by the caller (`--mode <mode>`). The reviewer does NOT auto-detect mode from artifact availability. If `--mode` is not provided, ask the user which review to run.
@@ -18,7 +48,7 @@ The reviewer operates in different modes depending on the phase. Mode MUST be pa
  | `final` | After execution + verification | Completed task artifacts, approved spec/plan/design | 7 final-review dims, scored 0-70 | Scored verdict (PASS/FAIL) |
  | `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Pass/fix loop, no score |
  | `design-review` | After design approval | Design artifact, approved spec | 5 design-review dims (canonical) | Pass/fix loop, no score |
- | `plan-review` | After planning | Draft plan artifact | 7 plan dims | Pass/fix loop, no score |
+ | `plan-review` | After planning | Draft plan artifact | 8 plan dims (7 + input coverage) | Pass/fix loop, no score |
  | `task-review` | During execution, per task | Uncommitted changes or `--base` SHA | 5 task-execution dims (correctness, tests, wiring, drift, quality) | Pass/fix loop, no score |
  | `research-review` | During discover | Research artifact | 5 research dims | Pass/fix loop, no score |
  | `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Pass/fix loop, no score |
@@ -34,6 +64,23 @@ In `task-review` and `final` modes, flag missing CHANGELOG entries for user-faci
  Prerequisites depend on the review mode:
 
  ### `final` mode
+
+ **Phase Prerequisites (Hard Gate):** Before proceeding, verify ALL of these artifacts exist. If ANY is missing, **STOP** and report which are missing.
+
+ - [ ] `.wazir/runs/latest/clarified/clarification.md`
+ - [ ] `.wazir/runs/latest/clarified/spec-hardened.md`
+ - [ ] `.wazir/runs/latest/clarified/design.md`
+ - [ ] `.wazir/runs/latest/clarified/execution-plan.md`
+ - [ ] `.wazir/runs/latest/artifacts/verification-proof.md`
+
+ If any file is missing:
+
+ > **Cannot run final review: missing prerequisite artifacts.**
+ >
+ > Missing: [list missing files]
+ >
+ > Run `/wazir:clarifier` (for clarified/* files) or `/wazir:executor` (for verification-proof.md) first.
+
  1. Check `.wazir/runs/latest/artifacts/` has completed task artifacts. If not, tell the user to run `/wazir:executor` first.
  2. Read the approved spec, plan, and design from `.wazir/runs/latest/clarified/`.
  3. Read `.wazir/state/config.json` for depth and multi_tool settings.
@@ -41,22 +88,42 @@ Prerequisites depend on the review mode:
  ### `task-review` mode
  1. Uncommitted changes exist for the current task, or a `--base` SHA is provided for committed changes.
  2. Read `.wazir/state/config.json` for depth and multi_tool settings.
+ 3. **Commit discipline check:** If uncommitted changes span work from multiple tasks (e.g., files from task N and task N+1 are both modified), REJECT immediately: "REJECTED: Multiple tasks in single commit. Split into per-task commits before review." This is a blocking finding — no other dimensions are evaluated until resolved.
+ 4. **Security sensitivity check:** Run `detectSecurityPatterns` from `tooling/src/checks/security-sensitivity.js` against the diff. If `triggered === true`, add the 6 security review dimensions (injection, auth bypass, data exposure, CSRF/SSRF, XSS, secrets leakage) to the standard 5 task-execution dimensions for this review pass. Security findings use severity levels: critical (exploitable), high (likely exploitable), medium (defense-in-depth gap), low (best-practice deviation).
 
  ### `spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes
  1. The appropriate input artifact for the mode exists.
  2. Read `.wazir/state/config.json` for depth and multi_tool settings.
+ 3. **`plan-review` additional dimension — Input Coverage:**
+    - Read the original input/briefing from `.wazir/input/briefing.md` and any `input/*.md` files
+    - Count distinct items/requirements in the input
+    - Count tasks in the execution plan
+    - If `tasks_in_plan < items_in_input` → **HIGH** finding: "Plan covers [N] of [M] input items. Missing: [list of uncovered items]"
+    - If `tasks_in_plan >= items_in_input` → dimension passes
+    - One task MAY cover multiple input items if justified in the task description
+    - This is the review-level enforcement of the "no scope reduction" rule
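The coverage rule above reduces to a count comparison plus a missing-item listing. A minimal sketch — the item and task shapes (`id`, `coversItems`) are hypothetical; the real reviewer counts requirements by reading the artifacts:

```javascript
// Sketch of the Input Coverage dimension. Returns null when the dimension
// passes, or a HIGH finding naming the uncovered input items.
function inputCoverageFinding(inputItems, planTasks) {
  if (planTasks.length >= inputItems.length) return null; // dimension passes
  const covered = new Set(planTasks.flatMap((t) => t.coversItems));
  const missing = inputItems.filter((item) => !covered.has(item.id));
  return {
    severity: 'HIGH',
    description:
      `Plan covers ${inputItems.length - missing.length} of ${inputItems.length} input items. ` +
      `Missing: ${missing.map((m) => m.id).join(', ')}`,
  };
}
```

Note the asymmetry the rule allows: one task may cover several input items, so the task count alone passing does not guarantee every item is covered — the semantic check still matters.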
 
  ## Review Process (`final` mode)
 
+ **Before starting this phase, output to the user:**
+
+ > **Final Review** — About to run adversarial 7-dimension review comparing your implementation against the original input, not just the task specs. The executor's per-task reviewer already validated correctness per-task — this catches drift between what you asked for and what was actually built.
+ >
+ > **Why this matters:** Without this, implementation drift ships undetected. Per-task review confirms each task matches its spec, but cannot catch: tasks that collectively miss the original intent, scope creep that added unrequested features, or acceptance criteria that were rewritten to match implementation instead of input.
+ >
+ > **Looking for:** Logic errors, missing features, dead code, unsubstantiated "it works" claims, scope creep, security gaps, stale documentation
+
+ **Input:** Read the ORIGINAL user input (`.wazir/input/briefing.md`, `input/` directory files) and compare against what was built. This catches intent drift that task-level review misses.
+
  Perform adversarial review across 7 dimensions:
 
- 1. **Correctness** — Does the code do what the spec says?
- 2. **Completeness** — Are all acceptance criteria met?
- 3. **Wiring** — Are all paths connected end-to-end?
- 4. **Verification** — Is there evidence (tests, type checks) for each claim?
- 5. **Drift** — Does the implementation match the approved plan?
- 6. **Quality** — Code style, naming, error handling, security
- 7. **Documentation** — Changelog entries, commit messages, comments
+ 1. **Correctness** — Does the code do what the original input asked for? *(catches: logic errors, wrong behavior, spec violations)*
+ 2. **Completeness** — Are all requirements from the original input met? *(catches: missing features, unimplemented acceptance criteria, partially delivered items)*
+ 3. **Wiring** — Are all paths connected end-to-end? *(catches: dead code, disconnected paths, missing imports, orphaned routes)*
+ 4. **Verification** — Is there evidence (tests, type checks) for each claim? *(catches: false claims of "it works" without evidence, untested code paths, missing type coverage)*
+ 5. **Drift** — Does the implementation match what the user originally requested? (not just the plan — the INPUT) *(catches: scope creep, plan deviations, unauthorized changes, gold-plating)*
+ 6. **Quality** — Code style, naming, error handling, security *(catches: security vulnerabilities, poor error handling, inconsistent naming, missing input validation)*
+ 7. **Documentation** — Changelog entries, commit messages, comments *(catches: missing changelogs, wrong commit messages, stale comments, undocumented breaking changes)*
 
  ## Context Retrieval
 
@@ -76,11 +143,28 @@ Score each dimension 0-10. Total out of 70.
  | **NEEDS REWORK** | 28-41 | Re-run affected tasks |
  | **FAIL** | 0-27 | Fundamental issues |
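The verdict bands can be expressed as a simple threshold function. Note: only the two lowest bands are visible in this hunk, so the upper thresholds used here (42 and 56) are assumptions for illustration, not confirmed by the diff:

```javascript
// Sketch of the score-to-verdict mapping over the 0-70 total.
// The 0-27 and 28-41 bands come from the table above; the upper
// two bands are assumed for illustration.
function verdictFromScore(total /* 0-70 */) {
  if (total <= 27) return 'FAIL';
  if (total <= 41) return 'NEEDS REWORK';
  if (total <= 55) return 'NEEDS MINOR FIXES'; // assumed band
  return 'PASS'; // assumed band
}
```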
 
- ## Secondary Review
+ ## Two-Tier Review Flow
+
+ The review process has two tiers. Internal review catches ~80% of issues quickly and cheaply. Codex review provides fresh eyes on clean code.
+
+ ### Tier 1: Internal Review (Fast, Cheap, Expertise-Loaded)
+
+ 1. **Compose expertise:** Load relevant expertise modules from `expertise/composition-map.yaml` into context based on the review mode and detected stack. This gives the internal reviewer domain-specific knowledge.
+ 2. **Run internal review** using the dimension set for the current mode. When multi-model is enabled, use **Sonnet** (not Opus) for internal review passes — it's fast and good enough for pattern matching against expertise.
+ 3. **Produce findings:** Each finding is tagged `[Internal]` with severity (blocking, warning, note).
+ 4. **Fix cycle:** If blocking findings exist, the executor fixes them. Re-run internal review. Repeat until clean or cap reached.
+
+ Internal review passes are logged to `.wazir/runs/latest/reviews/<mode>-internal-pass-<N>.md`.
+
+ ### Tier 2: External Review (Fresh Eyes on Clean Code)
+
+ Only runs AFTER Tier 1 produces a clean pass (no blocking findings).
 
- Read `.wazir/state/config.json`. If `multi_tool.tools` includes external reviewers, run them **after** your own review and **before** producing the final verdict.
+ Read `.wazir/state/config.json`. If `multi_tool.tools` includes external reviewers:
 
- ### Codex Review
+ #### Codex Review
+
+ **For detailed Codex CLI usage, see `wz:codex-cli` skill.**
 
  If `codex` is in `multi_tool.tools`:
 
@@ -101,10 +185,10 @@ If `codex` is in `multi_tool.tools`:
  2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
  ```
 
- 2. Read the Codex findings from `.wazir/runs/latest/reviews/codex-review.md`
- 3. Incorporate Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
+ 2. **Extract findings only** (context protection): After tee, use `execute_file` to extract only the final findings from the Codex output (everything after the last `codex` marker). If context-mode is unavailable, use `tac <file> | sed '/^codex$/q' | tac | tail -n +2`. If no marker found, fail closed (0 findings, warn user). See `docs/reference/review-loop-pattern.md` "Codex Output Context Protection" for full protocol.
+ 3. Incorporate extracted Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
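The fail-closed extraction in step 2 translates directly to a few lines of logic. This sketch implements the same rule as the shell fallback (`tac <file> | sed '/^codex$/q' | tac | tail -n +2`): keep only what follows the LAST `codex` marker line, and fail closed when no marker exists:

```javascript
// Keep only the text after the last line that is exactly `codex`.
// No marker → fail closed: zero findings plus a warning for the user.
function extractCodexFindings(raw) {
  const lines = raw.split('\n');
  const last = lines.lastIndexOf('codex');
  if (last === -1) {
    return { findings: '', warning: 'no codex marker found — failing closed (0 findings)' };
  }
  return { findings: lines.slice(last + 1).join('\n'), warning: null };
}
```

Failing closed matters here: treating an unparseable Codex transcript as "no findings, all clear" would silently convert a tooling failure into a clean review.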
 
- **Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use self-review findings only for that pass. Do NOT treat a Codex failure as a clean review. Do NOT skip the pass. The next pass still attempts Codex (transient failures may recover).
+ **Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use internal review findings only for that pass. Do NOT treat a Codex failure as a clean review. Do NOT skip the pass. The next pass still attempts Codex (transient failures may recover).
 
  **Code review scoping by mode:**
  - Use `--uncommitted` when reviewing uncommitted changes (`task-review` mode).
@@ -112,16 +196,69 @@ If `codex` is in `multi_tool.tools`:
  - Use `codex exec -c model="$CODEX_MODEL"` with stdin pipe for non-code artifacts (`spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes).
  - See `docs/reference/review-loop-pattern.md` for code review scoping rules.
 
- ### Gemini Review
+ #### Gemini Review
+
+ If `gemini` is in `multi_tool.tools`, follow the same pattern using the Gemini CLI when available. **For detailed Gemini CLI usage, see `wz:gemini-cli` skill.**
+
+ ### Fix Cycle (Codex Findings)
 
- If `gemini` is in `multi_tool.tools`, follow the same pattern using the Gemini CLI when available.
+ If Codex produces blocking findings:
+ 1. Executor fixes the Codex findings
+ 2. Re-run internal review (quick pass) to verify fixes didn't introduce regressions
+ 3. Optionally re-run Codex for a clean pass
 
  ### Merging Findings
 
  The final review report must clearly attribute each finding:
- `[Wazir]` — found by primary review
- `[Codex]` — found by Codex secondary review
- `[Both]` — found independently by both
+ - `[Internal]` — found by Tier 1 internal review
+ - `[Codex]` — found by Tier 2 Codex review
+ - `[Gemini]` — found by Tier 2 Gemini review
+ - `[Both]` — found independently by multiple sources
+
+ ### Finding Persistence (Learning Pipeline)
+
+ ALL findings from both tiers are persisted to `state.sqlite` for cross-run learning:
+
+ ```javascript
+ // After each review pass
+ const { openStateDb, insertFinding, getRecurringFindingHashes } = require('tooling/src/state/db');
+ const db = openStateDb(stateRoot);
+
+ for (const finding of allFindings) {
+   insertFinding(db, {
+     run_id: runId,
+     phase: reviewMode,
+     source: finding.attribution, // 'internal', 'codex', 'gemini'
+     severity: finding.severity,
+     description: finding.description,
+     finding_hash: hashFinding(finding.description),
+   });
+ }
+
+ // Check for recurring patterns
+ const recurring = getRecurringFindingHashes(db, 2);
+ // Recurring findings → auto-propose as learnings in the learn phase
+ ```
+
+ This is how Wazir evolves — findings that recur across runs become accepted learnings injected into future executor context, preventing the same mistakes.
+
+ ## Interaction Mode Awareness
+
+ Read `interaction_mode` from run-config:
+
+ - **`auto`:** No user checkpoints. Present verdict and let gating agent decide. On escalation, write reason and STOP.
+ - **`guided`:** Standard behavior — present verdict, ask user how to proceed.
+ - **`interactive`:** Discuss findings with user: "I found a potential auth bypass in `src/auth.js:42` — here's why I rated it high severity. Do you agree, or is there context I'm missing?" Show detailed reasoning for each dimension score.
+
+ ## CLI/Context-Mode Enforcement
+
+ In ALL review modes, check for these violations:
+
+ 1. **Index usage enforcement:** If the agent performed >5 direct file reads (Read tool) without a preceding `wazir index search-symbols` query, flag as **[warning]** finding: "Agent performed [N] direct file reads without using wazir index. Use `wazir index search-symbols <query>` before reading files to reduce context consumption."
+
+ 2. **Context-mode enforcement:** If the agent ran a large-category command (test runners, builds, diffs, dependency trees, linting — as classified by `hooks/routing-matrix.json`) using native Bash instead of context-mode tools (when context-mode is available), flag as **[warning]** finding: "Large command `[cmd]` run without context-mode. Route through `mcp__plugin_context-mode_context-mode__execute` to reduce context usage."
+
+ These are warnings, not blocking findings — they improve efficiency but don't affect correctness.
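The index-usage rule above can be sketched as a counter over the tool-call log. The `toolCalls` shape and the tool names used here are hypothetical; the shipped check presumably inspects the actual session transcript:

```javascript
// Count direct Read-tool calls since the most recent index query; past the
// threshold, emit a non-blocking [warning] finding. Tool names are illustrative.
function indexUsageFinding(toolCalls, threshold = 5) {
  let readsSinceQuery = 0;
  for (const call of toolCalls) {
    if (call.tool === 'wazir-index-search') readsSinceQuery = 0; // query resets the budget
    else if (call.tool === 'Read') readsSinceQuery += 1;
  }
  if (readsSinceQuery <= threshold) return null;
  return {
    severity: 'warning',
    description: `Agent performed ${readsSinceQuery} direct file reads without using wazir index.`,
  };
}
```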
 
  ## Task-Review Log Filenames
 
@@ -137,15 +274,223 @@ Save review results to `.wazir/runs/latest/reviews/review.md` with:
  - Score breakdown
  - Verdict
 
+ Run the phase report and display it to the user:
+ ```bash
+ wazir report phase --run <run-id> --phase <review-mode>
+ ```
+
+ Output the report content to the user in the conversation.
+
+ ## Phase Report Generation
+
+ After completing any review pass, generate a phase report following `schemas/phase-report.schema.json`:
+
+ 1. **`attempted_actions`** — Populate from the review findings. Each finding becomes an action entry:
+    - `description`: the finding summary
+    - `outcome`: `"success"` if the finding passed, `"fail"` if it is a blocking issue, `"uncertain"` if ambiguous
+    - `evidence`: the rationale or evidence supporting the outcome
+
+ 2. **`drift_analysis`** — Compare review findings against the approved spec:
+    - `delta`: count of deviations between implementation and spec (0 = no drift)
+    - `description`: summary of any drift detected and its impact
+
+ 3. **`quality_metrics`** — Populate from test, lint, and type-check results gathered during review:
+    - `test_pass_count`, `test_fail_count`: from test runner output
+    - `lint_errors`: from linter output
+    - `type_errors`: from type checker output
+
+ 4. **`risk_flags`** — Populate from any high-severity findings:
+    - `severity`: `"low"`, `"medium"`, or `"high"`
+    - `description`: what the risk is
+    - `mitigation`: recommended mitigation (if known)
+
+ 5. **`decisions`** — Populate from any scope or approach decisions made during the review:
+    - `description`: what was decided
+    - `rationale`: why
+    - `alternatives_considered`: other options evaluated (optional)
+    - `source`: `"[Wazir]"`, `"[Codex]"`, or `"[Both]"` (optional)
+
+ 6. **`verdict_recommendation`** — Set based on the gating rules in `config/gating-rules.yaml`:
+    - `verdict`: `"continue"` (PASS), `"loop_back"` (NEEDS MINOR FIXES / NEEDS REWORK), or `"escalate"` (FAIL with fundamental issues)
+    - `reasoning`: brief explanation of why this verdict was chosen
+
+ ### Report Output Paths
+
+ Save reports to two formats under the run directory:
+ - `.wazir/runs/<id>/reports/phase-<name>-report.json` — machine-readable, validated against `schemas/phase-report.schema.json`
+ - `.wazir/runs/<id>/reports/phase-<name>-report.md` — human-readable Markdown summary
+
+ The gating agent (`tooling/src/gating/agent.js`) consumes the JSON report to decide: **continue**, **loop_back**, or **escalate**.
+
+ ### Report Fields Reference
+
+ All fields per `schemas/phase-report.schema.json` (the Required column marks which are mandatory):
+
+ | Field | Type | Required | Description |
+ |-------|------|----------|-------------|
+ | `phase_name` | string | yes | Review mode name (e.g., `"final"`, `"task-review"`) |
+ | `run_id` | string | yes | Current run identifier |
+ | `timestamp` | string (date-time) | yes | ISO 8601 timestamp of report generation |
+ | `attempted_actions` | array | yes | Findings mapped to action outcomes |
+ | `drift_analysis` | object | yes | Spec-vs-implementation drift summary |
+ | `quality_metrics` | object | yes | Test/lint/type results |
+ | `risk_flags` | array | yes | High-severity risk items |
+ | `decisions` | array | yes | Scope/approach decisions made |
+ | `verdict_recommendation` | object | no | Gating verdict based on `config/gating-rules.yaml` |
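Assembled from the field table, a minimal report object might look like the following. All values are illustrative — only the field names come from the table; the schema itself may constrain them further:

```javascript
// Minimal phase report carrying every required field from the reference
// table. Values are made up for illustration.
const report = {
  phase_name: 'final',
  run_id: 'run-001',
  timestamp: new Date().toISOString(),
  attempted_actions: [
    { description: 'Wiring check on export pipeline', outcome: 'success', evidence: 'integration test passed' },
  ],
  drift_analysis: { delta: 0, description: 'No deviations from approved spec' },
  quality_metrics: { test_pass_count: 42, test_fail_count: 0, lint_errors: 0, type_errors: 0 },
  risk_flags: [],
  decisions: [],
  verdict_recommendation: { verdict: 'continue', reasoning: 'Total score in PASS band' },
};

// Sanity check: every required field from the table must be present.
const required = ['phase_name', 'run_id', 'timestamp', 'attempted_actions',
  'drift_analysis', 'quality_metrics', 'risk_flags', 'decisions'];
const missingFields = required.filter((k) => !(k in report));
```

In practice the JSON file would be validated against `schemas/phase-report.schema.json` before the gating agent consumes it; this presence check only mirrors the Required column.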
340
+
341
+ ## Post-Review: Learn (final mode only)
342
+
343
+ After the final review verdict, extract durable learnings using the **learner role** (`roles/learner.md`).
344
+
345
+ ### Step 1: Gather all findings
346
+
347
+ Collect review findings from ALL sources in this run:
348
+ - `.wazir/runs/<run-id>/reviews/` — all review pass logs (task-review, final review)
349
+ - Codex findings (attributed `[Codex]` or `[Both]`)
350
+ - Self-audit findings (if `run_audit` was enabled)
351
+
352
+ ### Step 2: Identify learning candidates
353
+
354
+ A finding becomes a learning candidate if:
355
+ - It recurred across 2+ review passes within this run (same issue found repeatedly)
356
+ - It matches a finding from a prior run (check `memory/learnings/proposed/` and `accepted/` for similar patterns)
357
+ - It represents a class of mistake, not just a single instance (e.g., "missing error handling in async functions" vs "missing try-catch on line 42")
+
+ ### Step 3: Write learning proposals
+
+ For each candidate, write a proposal to `memory/learnings/proposed/<run-id>-<NNN>.md`:
+
+ ```markdown
+ ---
+ artifact_type: proposed_learning
+ phase: learn
+ role: learner
+ run_id: <run-id>
+ status: proposed
+ sources:
+   - <review-file-1>
+   - <review-file-2>
+ approval_status: required
+ ---
+
+ # Proposed Learning: <title>
+
+ ## Scope
+ - **Roles:** [which roles should receive this learning — e.g., executor, reviewer]
+ - **Stacks:** [which tech stacks — e.g., node, react, or "all"]
+ - **Concerns:** [which concerns — e.g., error-handling, testing, security]
+
+ ## Evidence
+ - [finding from review pass N: description]
+ - [finding from review pass M: same pattern]
+ - [optional: similar finding from prior run <run-id>]
+
+ ## Learning
+ [The concrete, actionable instruction that should be injected into future executor context]
+
+ ## Expected Benefit
+ [What this prevents in future runs]
+
+ ## Confidence
+ - **Level:** low | medium | high
+ - **Basis:** [single run observation | multi-run recurrence | user correction]
+ ```
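+
+ A hypothetical filled-in proposal, reusing the async error-handling example from Step 2 (run id, file names, and wording are illustrative):
+
+ ```markdown
+ ---
+ artifact_type: proposed_learning
+ phase: learn
+ role: learner
+ run_id: 2026-03-19-example
+ status: proposed
+ sources:
+   - reviews/pass-1.md
+   - reviews/pass-3.md
+ approval_status: required
+ ---
+
+ # Proposed Learning: Handle rejections in async functions
+
+ ## Scope
+ - **Roles:** executor
+ - **Stacks:** node
+ - **Concerns:** error-handling
+
+ ## Evidence
+ - [review pass 1: fetchUsers() awaits a request with no error handling]
+ - [review pass 3: saveOrder() awaits a write with no error handling]
+
+ ## Learning
+ Every awaited call that can reject must be wrapped in try/catch or
+ propagate to a caller that handles the rejection.
+
+ ## Expected Benefit
+ Prevents unhandled promise rejections from recurring in future runs.
+
+ ## Confidence
+ - **Level:** medium
+ - **Basis:** single run observation
+ ```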
+
+ ### Step 4: Report
+
+ Present proposed learnings to the user:
+
+ > **Learnings proposed:** [count]
+ > - [title 1] (confidence: high, scope: executor/node)
+ > - [title 2] (confidence: medium, scope: reviewer/all)
+ >
+ > Proposals saved to `memory/learnings/proposed/`. Review and accept with `/wazir audit learnings`.
+
+ Learnings are NEVER auto-applied. They require explicit user acceptance before being injected into future runs.
+
+ ## Post-Review: Prepare Next (final mode only)
+
+ After learning extraction, invoke the `prepare-next` skill to produce the handoff:
+
+ ### Handoff document
+
+ Write to `.wazir/runs/<run-id>/handoff.md`:
+
+ ```markdown
+ # Handoff — <run-id>
+
+ **Status:** [Completed | Partial]
+ **Branch:** <branch-name>
+ **Date:** YYYY-MM-DD
+
+ ## What Was Done
+ [List of completed tasks with commit hashes]
+
+ ## Test Results
+ [Test count, pass/fail, validator status]
+
+ ## Review Score
+ [Final review verdict and score]
+
+ ## What's Next
+ [Pending items, deferred work, follow-up tasks]
+
+ ## Open Bugs
+ [Any known issues discovered during this run]
+
+ ## Learnings From This Run
+ [Key insights — what worked, what didn't, what to change]
+ ```
+
+ ### Cleanup
+
+ - Archive verbose intermediate review logs (compress to summary)
+ - Update `.wazir/runs/latest` symlink if creating a new run
+ - Do NOT mutate `input/` — it belongs to the user
+ - Do NOT auto-load proposed learnings into the next run
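+
+ A hypothetical post-cleanup layout (run id and file names illustrative):
+
+ ```markdown
+ .wazir/runs/
+   2026-03-19-example/          <- this run
+     handoff.md
+     reviews/summary.md         <- verbose pass logs compressed to a summary
+   latest -> 2026-03-19-example/
+ input/                         <- untouched; belongs to the user
+ memory/learnings/proposed/     <- kept on disk, NOT loaded into the next run
+ ```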
+
+ ## Reasoning Output
+
+ Throughout the reviewer phase, produce reasoning at two layers:
+
+ **Conversation (Layer 1):** Before each review pass, explain what dimensions are being checked and why. After findings, explain the reasoning behind severity assignments.
+
+ **File (Layer 2):** Write `.wazir/runs/<id>/reasoning/phase-reviewer-reasoning.md` with structured entries:
+ - **Trigger** — what prompted the finding (e.g., "diff adds SQL query without parameterization")
+ - **Options considered** — severity options, fix approaches
+ - **Chosen** — assigned severity and recommendation
+ - **Reasoning** — why this severity level
+ - **Confidence** — high/medium/low
+ - **Counterfactual** — what would ship if this finding were missed
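+
+ A sketch of one structured entry, expanding the trigger example above (severity and wording hypothetical):
+
+ ```markdown
+ - **Trigger:** diff adds SQL query without parameterization
+ - **Options considered:** blocking (injection risk) vs warning (input looks internal)
+ - **Chosen:** blocking; recommend a parameterized query
+ - **Reasoning:** request input reaches the query string, so injection is exploitable
+ - **Confidence:** high
+ - **Counterfactual:** an injectable endpoint would ship unreviewed
+ ```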
+
+ Key reviewer reasoning moments: severity assignments, PASS/FAIL decisions, dimension score justifications, and escalation decisions.
+
  ## Done
 
+ **After completing this phase, output to the user:**
+
+ > **Final Review complete.**
+ >
+ > **Found:** [N] findings across 7 dimensions — [N] blocking, [N] warnings, [N] notes. Score: [score]/70 ([VERDICT]).
+ >
+ > **Without this phase:** [N] blocking issues would have shipped — including [specific examples: e.g., "missing error handler on /api/users endpoint", "auth middleware not wired to 3 routes", "CHANGELOG missing entry for breaking API change"]
+ >
+ > **Changed because of this work:** [List of issues caught and fixed during review passes, score improvement from first to final pass]
+
  Present the verdict and offer next steps:
 
  > **Review complete: [VERDICT] ([score]/70)**
  >
  > [Score breakdown and findings summary]
  >
- > **What would you like to do?**
- > 1. **Create a PR** (if PASS)
- > 2. **Auto-fix and re-review** (if MINOR FIXES)
- > 3. **Review findings in detail**
+ > **Learnings proposed:** [count] (see `memory/learnings/proposed/`)
+ > **Handoff:** `.wazir/runs/<run-id>/handoff.md`
+
+ Ask the user via AskUserQuestion:
+ - **Question:** "How would you like to proceed with the review results?"
+ - **Options:**
+   1. "Create a PR" *(Recommended if PASS)*
+   2. "Auto-fix and re-review" *(Recommended if MINOR FIXES)*
+   3. "Review findings in detail"
+
+ Wait for the user's selection before continuing.
@@ -5,6 +5,19 @@ description: Run a structured audit on your codebase — security, code quality,
 
  # Run Audit — Structured Codebase Audit Pipeline
 
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode unavailable, fall back to native Bash with warning
+
+ ## Codebase Exploration
+ 1. Query `wazir index search-symbols <query>` first
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
+ 3. Fall back to direct file reads ONLY for files identified by index queries
+ 4. Maximum 10 direct file reads without a justifying index query
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
  ## Overview
 
  This skill runs a structured audit on your codebase. It collects three parameters interactively (audit type, scope, output mode), then feeds them through the pipeline: Research → Audit → Report or Plan.
@@ -5,6 +5,19 @@ description: Build a project profile from manifests, docs, tests, and `input/` s
 
  # Scan Project
 
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode unavailable, fall back to native Bash with warning
+
+ ## Codebase Exploration
+ 1. Query `wazir index search-symbols <query>` first
+ 2. Use `wazir recall file <path> --tier L1` for targeted reads
+ 3. Fall back to direct file reads ONLY for files identified by index queries
+ 4. Maximum 10 direct file reads without a justifying index query
+ 5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
  Inspect the smallest set of repo surfaces needed to answer:
 
  - what kind of project this is