@wazir-dev/cli 1.0.0 → 1.1.0

Files changed (69)
  1. package/CHANGELOG.md +31 -2
  2. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
  3. package/docs/reference/review-loop-pattern.md +429 -0
  4. package/docs/reference/tooling-cli.md +2 -0
  5. package/docs/truth-claims.yaml +6 -0
  6. package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
  7. package/exports/hosts/claude/.claude/agents/designer.md +3 -0
  8. package/exports/hosts/claude/.claude/agents/executor.md +2 -0
  9. package/exports/hosts/claude/.claude/agents/planner.md +3 -0
  10. package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
  11. package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
  12. package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
  13. package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
  14. package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
  15. package/exports/hosts/claude/.claude/commands/design.md +4 -0
  16. package/exports/hosts/claude/.claude/commands/discover.md +4 -0
  17. package/exports/hosts/claude/.claude/commands/execute.md +4 -0
  18. package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
  19. package/exports/hosts/claude/.claude/commands/plan.md +4 -0
  20. package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
  21. package/exports/hosts/claude/.claude/commands/specify.md +4 -0
  22. package/exports/hosts/claude/.claude/commands/verify.md +4 -0
  23. package/exports/hosts/claude/export.manifest.json +19 -19
  24. package/exports/hosts/codex/export.manifest.json +19 -19
  25. package/exports/hosts/cursor/export.manifest.json +19 -19
  26. package/exports/hosts/gemini/export.manifest.json +19 -19
  27. package/hooks/definitions/loop_cap_guard.yaml +1 -1
  28. package/hooks/hooks.json +18 -0
  29. package/package.json +3 -2
  30. package/roles/clarifier.md +3 -0
  31. package/roles/designer.md +3 -0
  32. package/roles/executor.md +2 -0
  33. package/roles/planner.md +3 -0
  34. package/roles/researcher.md +2 -0
  35. package/roles/reviewer.md +5 -1
  36. package/roles/specifier.md +3 -0
  37. package/skills/brainstorming/SKILL.md +139 -38
  38. package/skills/clarifier/SKILL.md +219 -0
  39. package/skills/debugging/SKILL.md +11 -1
  40. package/skills/executing-plans/SKILL.md +15 -2
  41. package/skills/executor/SKILL.md +76 -0
  42. package/skills/init-pipeline/SKILL.md +106 -17
  43. package/skills/receiving-code-review/SKILL.md +8 -0
  44. package/skills/requesting-code-review/SKILL.md +25 -5
  45. package/skills/reviewer/SKILL.md +151 -0
  46. package/skills/subagent-driven-development/SKILL.md +25 -2
  47. package/skills/tdd/SKILL.md +8 -0
  48. package/skills/wazir/SKILL.md +250 -43
  49. package/skills/writing-plans/SKILL.md +31 -4
  50. package/templates/examples/wazir-manifest.example.yaml +1 -1
  51. package/tooling/src/capture/command.js +87 -1
  52. package/tooling/src/capture/run-config.js +21 -0
  53. package/tooling/src/checks/brand-truth.js +3 -6
  54. package/tooling/src/checks/command-registry.js +1 -0
  55. package/tooling/src/checks/docs-truth.js +1 -1
  56. package/tooling/src/checks/runtime-surface.js +3 -7
  57. package/tooling/src/cli.js +8 -3
  58. package/tooling/src/init/command.js +201 -0
  59. package/wazir.manifest.yaml +0 -3
  60. package/workflows/clarify.md +4 -0
  61. package/workflows/design-review.md +4 -0
  62. package/workflows/design.md +4 -0
  63. package/workflows/discover.md +4 -0
  64. package/workflows/execute.md +4 -0
  65. package/workflows/plan-review.md +4 -0
  66. package/workflows/plan.md +4 -0
  67. package/workflows/spec-challenge.md +4 -0
  68. package/workflows/specify.md +4 -0
  69. package/workflows/verify.md +4 -0
package/CHANGELOG.md CHANGED
@@ -1,9 +1,15 @@
1
- # 1.0.0 (2026-03-17)
1
+ # [1.1.0](https://github.com/MohamedAbdallah-14/Wazir/compare/v1.0.0...v1.1.0) (2026-03-18)
2
+
3
+
4
+ ### Bug Fixes
5
+
6
+ * address review findings — tests, Codex wiring, Teams, pipeline CLI integration ([0b03215](https://github.com/MohamedAbdallah-14/Wazir/commit/0b032150c4a7967ba070eccdced513f55343fc65))
7
+ * CI changelog gate + CodeRabbit review findings ([0247941](https://github.com/MohamedAbdallah-14/Wazir/commit/024794136b7a44116ef2c4f5fcc23823bc72e7fc))
2
8
 
3
9
 
4
10
  ### Features
5
11
 
6
- * Wazir v0.1.0 - Engineering with itqan ([d9a5c1b](https://github.com/MohamedAbdallah-14/Wazir/commit/d9a5c1bf1ffe615f67d55181458ead68e5cf7ecf))
12
+ * add core review loop pattern across all pipeline phases ([aa4c1d8](https://github.com/MohamedAbdallah-14/Wazir/commit/aa4c1d8400e69ab4fe943043705a862f9e5861f3))
7
13
 
8
14
  # Changelog
9
15
 
@@ -12,3 +18,26 @@ All notable changes to this project will be documented in this file.
12
18
  The format is based on [Keep a Changelog](https://keepachangelog.com/), and this project adheres to [Semantic Versioning](https://semver.org/).
13
19
 
14
20
  ## [Unreleased]
21
+
22
+ ### Added
23
+ - Core review loop pattern across all pipeline phases with Codex CLI integration
24
+ - `wazir capture loop-check` CLI subcommand with task-scoped cap tracking and run-config loader
25
+ - `wazir init` interactive CLI command with arrow-key selection (depth, intent, teams, codex model)
26
+ - `docs/reference/review-loop-pattern.md` canonical reference for the review loop pattern
27
+ - Standalone skills: `/wazir:clarifier`, `/wazir:executor`, `/wazir:reviewer`
28
+ - Agent Teams real implementation in brainstorming (TeamCreate, SendMessage, TeamDelete)
29
+ - Codex prompt templates (artifact + code) with "Do NOT load skills" instruction
30
+ - Git branch enforcement in `/wazir` runner (validates branch, offers to create feature branch)
31
+ - CLI wiring across pipeline phases (doctor gate, index build/refresh, capture events, validate gates)
32
+ - CHANGELOG enforcement in executor and reviewer skills
33
+ - 10 new tests: 7 for handleLoopCheck, 4 for init command (406 total)
34
+
35
+ ### Changed
36
+ - All Codex CLI calls now read model from `config.multi_tool.codex.model` with fallback to `gpt-5.4`
37
+ - Producer-reviewer separation enforced: no role reviews its own output
38
+ - Reviewer skill is phase-aware with 7 explicit modes (final, spec-challenge, design-review, plan-review, task-review, research-review, clarification-review)
39
+ - Brainstorming design-review gate replaces direct handoff to writing-plans
40
+ - Clarifier delegates research to discover workflow, spec to specify workflow, planning to writing-plans
41
+ - `/wazir` runner pipeline rewritten with all manifest phases and review loops
42
+ - Wazir CLI is now required (removed "Skip" option)
43
+ - Fixed pass counts: quick=3, standard=5, deep=7 (no extension)
package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md CHANGED
@@ -29,7 +29,7 @@ Before starting any implementation, verify all of the following:
29
29
  - [ ] **Node.js >= 20.0.0** installed
30
30
  - [ ] **`npm test` passes on the clean branch** with zero failures
31
31
  - [ ] **`wazir export --check` passes** on the clean branch (no pre-existing drift)
32
- - [ ] **All 13 task spec files reviewed** in `.agent-os/tasks/clarified/` (004-016)
32
+ - [ ] **All 13 task spec files reviewed** in `.wazir/tasks/clarified/` (004-016)
33
33
  - [ ] **`tooling/src/capture/command.js` imports confirmed:** `fs` (line 1) and `path` (line 2) are already imported -- no additional module imports needed for task 006
34
34
  - [ ] **`tooling/test/capture.test.js` fixture pattern confirmed:** `createCaptureFixture()` provides `fixtureRoot`, `stateRoot`, and `cleanup()` -- new tests must use unique run IDs
35
35
  - [ ] **`tooling/test/role-contracts.test.js` is in `test:active`** -- confirmed, so workflow and role structural tests can be added there without new test file registration
package/docs/reference/review-loop-pattern.md ADDED
@@ -0,0 +1,429 @@
1
+ # Review Loop Pattern Reference
2
+
3
+ Canonical reference for the review loop pattern used across all Wazir pipeline phases. Skills and workflows link to this document rather than embedding loop logic inline.
4
+
5
+ ---
6
+
7
+ ## Core Principle: Producer-Reviewer Separation
8
+
9
+ The producer skill (clarifier, planner, designer, etc.) **emits** an artifact and calls for review. The **reviewer role** owns the review loop. The producer receives findings and resolves them. No role reviews its own output.
10
+
11
+ ```
12
+ Producer emits artifact
13
+ -> Reviewer runs review loop (N passes, Codex if available)
14
+ -> Findings returned to producer
15
+ -> Producer fixes and resubmits
16
+ -> Loop until all passes exhausted or cap reached
17
+ -> Escalate to user if cap exceeded
18
+ ```
19
+
20
+ When Codex is available, the reviewer role delegates to Codex (`codex review` for code changes, `codex exec` for non-code artifacts) as a secondary input while maintaining its own independent primary verdict.
21
+
22
+ ---
23
+
24
+ ## Per-Task Review vs Final Review
25
+
26
+ These are two structurally different constructs:
27
+
28
+ | | Per-Task Review | Final Review |
29
+ |---|---|---|
30
+ | **When** | During execution, after each task | After all execution + verification complete |
31
+ | **Dimensions** | 5 task-execution dims (correctness, tests, wiring, drift, quality) | 7 scored dims (correctness, completeness, wiring, verification, drift, quality, documentation) |
32
+ | **Scope** | Single task's uncommitted changes | Entire implementation vs spec/plan |
33
+ | **Output** | Pass/fix loop, no score | Scored verdict (0-70), PASS/FAIL |
34
+ | **Workflow** | Inline in execution flow | `workflows/review.md` |
35
+ | **Skill** | `wz:reviewer` in `task-review` mode | `wz:reviewer` in `final` mode |
36
+ | **Log filename** | `<phase>-task-<NNN>-review-pass-<N>.md` | `final-review.md` |
37
+
38
+ ---
39
+
40
+ ## Standalone Mode
41
+
42
+ When no `.wazir/runs/latest/` directory exists (standalone skill invocation outside a pipeline run):
43
+
44
+ 1. **Review loops still run** -- the review logic is embedded in the skill, not dependent on run state.
45
+ 2. **Artifact location** -- artifacts live in `docs/plans/`. This is the canonical standalone artifact path.
46
+ 3. **Review log location** -- review logs go alongside the artifact: `docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md`. No temp dir.
47
+ 4. **Loop cap is SKIPPED entirely** -- no `wazir capture loop-check` call. The loop runs for exactly `pass_counts[depth]` passes (3/5/7) and stops. No cap guard, no fallback constant.
48
+ 5. **`wazir capture loop-check`** -- not invoked in standalone mode. The standalone detection happens before the cap guard call.
49
+
50
+ Detection logic:
51
+
52
+ ```
53
+ if .wazir/runs/latest/ exists:
54
+ run_mode = "pipeline"
55
+ log_dir = .wazir/runs/latest/reviews/
56
+ cap_guard = wazir capture loop-check (full guard)
57
+ else:
58
+ run_mode = "standalone"
59
+ artifact_dir = docs/plans/
60
+ log_dir = docs/plans/ (alongside artifact)
61
+ cap_guard = none (depth pass count is the only limit)
62
+ ```
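The detection logic above can be sketched in Python as follows. This is a minimal illustration, not the package's implementation: the function name `detect_run_mode`, the `project_root` parameter, and the shape of the returned dictionary are assumptions made for the example. Only the paths and the pipeline/standalone split come from the reference text.

```python
from pathlib import Path


def detect_run_mode(project_root: str = ".") -> dict:
    """Hypothetical helper: choose review settings from the presence of .wazir/runs/latest/."""
    root = Path(project_root)
    latest = root / ".wazir" / "runs" / "latest"
    if latest.is_dir():
        return {
            "run_mode": "pipeline",
            "log_dir": latest / "reviews",
            "cap_guard": True,   # `wazir capture loop-check` runs before each pass
        }
    plans = root / "docs" / "plans"
    return {
        "run_mode": "standalone",
        "artifact_dir": plans,
        "log_dir": plans,        # review logs sit alongside the artifact
        "cap_guard": False,      # the depth pass count is the only limit
    }
```

Returning both settings from one place keeps the standalone check ahead of any cap guard call, which is the ordering the reference text asks for.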
63
+
64
+ ---
65
+
66
+ ## Review Loop Pseudocode
67
+
68
+ ```
69
+ review_loop(artifact_path, phase, dimensions[], depth, config, options={}):
70
+
71
+ # options.mode -- explicit review mode (required)
72
+ # options.task_id -- task identifier for task-scoped reviews (optional)
73
+
74
+ # Standalone detection
75
+ run_mode = detect_run_mode() # "pipeline" or "standalone"
76
+
77
+ # Fixed pass counts -- no extension
78
+ pass_counts = { quick: 3, standard: 5, deep: 7 }
79
+ total_passes = pass_counts[depth]
80
+
81
+ # Depth-aware dimension subsets (coverage contract)
82
+ depth_dimensions = {
83
+ quick: dimensions[0:3], # first 3 dimensions only
84
+ standard: dimensions[0:5], # first 5
85
+ deep: dimensions, # all available
86
+ }
87
+ active_dims = depth_dimensions[depth]
88
+
89
+ codex_available = check_codex() # which codex && codex --version
90
+
91
+ for pass_number in 0..total_passes-1:
92
+
93
+ # --- Cap guard check (pipeline mode only, before each pass) ---
94
+ if run_mode == "pipeline":
95
+ loop_check_args = "--run <run-id> --phase <phase> --loop-count <pass_number+1>"
96
+ if options.task_id:
97
+ loop_check_args += " --task-id <task_id>"
98
+ wazir capture loop-check $loop_check_args
99
+ # loop-check wraps: event capture + evaluateLoopCapGuard
100
+ # If loop_cap_guard fires (exit 43), stop immediately:
101
+ if last_exit_code == 43:
102
+ log("Loop cap reached for phase: <phase>. Escalating to user.")
103
+ escalate_to_user(evidence_gathered_so_far)
104
+ return { pass_count: pass_number, escalated: true }
105
+ # Standalone mode: no cap guard. Loop runs for total_passes and stops.
106
+
107
+ dimension = active_dims[pass_number % len(active_dims)]
108
+
109
+ # --- Primary review (reviewer role, not producer) ---
110
+ # Mode is always explicit -- passed by caller via options.mode
111
+ findings = self_review(artifact_path, focus=dimension, mode=options.mode)
112
+
113
+ # --- Secondary review (Codex, if available) ---
114
+ if codex_available:
115
+ codex_exit_code, codex_output = run_codex_review(artifact_path, dimension)
116
+ if codex_exit_code != 0:
117
+ # Codex failed -- log error, fall back to self-review for this pass
118
+ log_error("Codex exited " + codex_exit_code + ": " + codex_output.stderr)
119
+ mark_pass_codex_unavailable(pass_number)
120
+ # Do NOT treat Codex failure as clean. Self-review findings stand alone.
121
+ else:
122
+ codex_findings = parse(codex_output.stdout)
123
+ merge(findings, codex_findings, preserve_attribution=true)
124
+
125
+ # --- Log the review pass ---
126
+ if run_mode == "pipeline":
127
+ if options.task_id:
128
+ log_path = .wazir/runs/latest/reviews/<phase>-task-<task_id>-review-pass-<N>.md
129
+ else:
130
+ log_path = .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
131
+ log(pass_number+1, dimension, findings) -> log_path
132
+ else:
133
+ log_path = docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md
134
+ log(pass_number+1, dimension, findings) -> log_path
135
+
136
+ if findings.has_issues:
137
+ # --- Fix inline, do NOT return ---
138
+ producer_fix(artifact_path, findings)
139
+ # Continue to next pass -- the fix will be re-reviewed
140
+
141
+ return { pass_count: total_passes, issues_found, issues_fixed, remaining, attributions }
142
+ ```
143
+
144
+ Key properties of this pseudocode:
145
+
146
+ 1. **Fixed pass counts** -- Quick is exactly 3, standard exactly 5, deep exactly 7. No `max_passes = min_passes + 3`. No clean-streak early-exit. No extension.
147
+ 2. **Task-scoped log filenames** -- `<phase>-task-<NNN>-review-pass-<N>.md` for per-task reviews, preventing log clobbering in parallel mode.
148
+ 3. **Task-scoped loop cap keys** -- `--task-id` flag on `loop-check` so each task gets its own counter in `phase_loop_counts`.
149
+ 4. **Explicit review mode** -- `options.mode` is always passed by the caller. No auto-detection.
150
+ 5. **Codex error handling** -- non-zero exit is logged, pass marked `codex-unavailable`, self-review findings used alone. Never treated as clean.
151
+ 6. **Standalone mode** -- uses `docs/plans/` for artifacts and logs. No temp dir. No cap guard at all.
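To make the fixed pass counts and the dimension rotation concrete, here is a small Python sketch of which dimension each pass would focus on. The names `PASS_COUNTS`, `DIM_LIMITS`, and `pass_schedule` are hypothetical. The rotation rule (`active_dims[pass_number % len(active_dims)]`) and the depth subsets are taken from the pseudocode above.

```python
PASS_COUNTS = {"quick": 3, "standard": 5, "deep": 7}
DIM_LIMITS = {"quick": 3, "standard": 5, "deep": None}  # None means all dimensions


def pass_schedule(dimensions, depth):
    """Return the focus dimension for each pass at the given depth."""
    limit = DIM_LIMITS[depth]
    active = list(dimensions) if limit is None else list(dimensions[:limit])
    if not active:
        raise ValueError("at least one review dimension is required")
    # Rotate through the active dimensions; the pass count never changes.
    return [active[i % len(active)] for i in range(PASS_COUNTS[depth])]


task_dims = ["correctness", "tests", "wiring", "drift", "quality"]
print(pass_schedule(task_dims, "deep"))
```

With five task-execution dimensions at deep depth, the seven passes wrap around, so the first two dimensions are reviewed twice.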
152
+
153
+ ---
154
+
155
+ ## Codex Error Handling Contract
156
+
157
+ ```
158
+ run_codex_review(artifact_path, dimension):
159
+ CODEX_MODEL = read_config('.wazir/state/config.json', '.multi_tool.codex.model') or "gpt-5.4"
160
+
161
+ if is_code_artifact:
162
+ cmd = codex review -c model="$CODEX_MODEL" --uncommitted --title "..." "Review for [dimension]..."
163
+ # or: codex review -c model="$CODEX_MODEL" --base <sha> for committed changes
164
+ else:
165
+ cmd = cat <artifact_path> | codex exec -c model="$CODEX_MODEL" "Review this [type] for [dimension]..."
166
+
167
+ result = execute(cmd, timeout=120s, capture_stderr=true)
168
+
169
+ if result.exit_code != 0:
170
+ return (result.exit_code, { stderr: result.stderr, stdout: "" })
171
+ # Caller handles: log error, mark codex-unavailable, use self-review only
172
+
173
+ return (0, { stdout: result.stdout, stderr: result.stderr })
174
+ ```
175
+
176
+ Rules:
177
+
178
+ - If Codex exits non-zero, log the full stderr.
179
+ - Mark the pass as `codex-unavailable` in the review log metadata.
180
+ - Fall back to self-review for that pass only. Do not skip the pass.
181
+ - Do not retry Codex on the same pass. A failure on one pass does not disable Codex for later passes: if Codex fails on pass 2, pass 3 still tries Codex, since transient failures may recover.
182
+ - Never treat a Codex failure as a clean review pass.
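The contract above could look roughly like this in Python. This is a sketch under assumptions: `run_review_command` is a hypothetical name, the command is passed in as an argument list rather than built here, and the exit codes used for a timeout (124) and a missing binary (127) are borrowed shell conventions that the reference text does not specify.

```python
import subprocess


def run_review_command(cmd, stdin_text=None, timeout_s=120):
    """Run a review command and return (exit_code, {"stdout": ..., "stderr": ...})."""
    try:
        result = subprocess.run(
            cmd, input=stdin_text, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # Assumption: report a timeout as a failure, never as a clean pass.
        return 124, {"stdout": "", "stderr": f"timed out after {timeout_s}s"}
    except FileNotFoundError as exc:
        # Assumption: a missing executable is reported like a failed call.
        return 127, {"stdout": "", "stderr": str(exc)}
    if result.returncode != 0:
        # Drop stdout on failure so partial output is never parsed as findings.
        return result.returncode, {"stdout": "", "stderr": result.stderr}
    return 0, {"stdout": result.stdout, "stderr": result.stderr}
```

The caller stays responsible for logging stderr, marking the pass `codex-unavailable`, and continuing with self-review findings only.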
183
+
184
+ ---
185
+
186
+ ## Codex Availability Probe
187
+
188
+ Before any Codex call, verify availability once at loop start:
189
+
190
+ ```bash
191
+ which codex >/dev/null 2>&1 && codex --version >/dev/null 2>&1
192
+ ```
193
+
194
+ If the probe fails, set `codex_available = false` for the entire loop. Fall back to self-review only. Never error out.
195
+
196
+ Per-invocation failures (Codex available but a single call fails) are handled separately by the error contract above.
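For callers that are not shell scripts, the same one-time probe might be written in Python as below. The function name `tool_available` and the 10-second timeout are assumptions for illustration. The behaviour (check that the binary is on the PATH, then check that `--version` succeeds, and never raise) mirrors the bash probe.

```python
import shutil
import subprocess


def tool_available(name="codex"):
    """Return True only if the binary is on PATH and `--version` exits 0."""
    path = shutil.which(name)
    if path is None:
        return False
    try:
        probe = subprocess.run([path, "--version"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False  # never error out; fall back to self-review
    return probe.returncode == 0
```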
197
+
198
+ ---
199
+
200
+ ## Codex Artifact-Scoped Review
201
+
202
+ Never use `codex review` for non-code artifacts (specs, plans, designs). Instead, pipe the artifact content via stdin:
203
+
204
+ ```bash
205
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
206
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
207
+ cat .wazir/runs/latest/clarified/spec-hardened.md | \
208
+ codex exec -c model="$CODEX_MODEL" "Review this specification for: [dimension]. Be specific, cite sections. Say CLEAN if no issues." \
209
+ 2>&1 | tee .wazir/runs/latest/reviews/spec-challenge-review-pass-N.md
210
+ ```
211
+
212
+ For code artifacts, use `codex review -c model="$CODEX_MODEL" --uncommitted` (or `--base` for committed changes). See the next section for details.
213
+
214
+ ---
215
+
216
+ ## Code Review Scoping
217
+
218
+ **Rule: review BEFORE commit.**
219
+
220
+ For each task during execution:
221
+
222
+ 1. Implement the task (changes are uncommitted).
223
+ 2. Review the uncommitted changes using the **5 task-execution dimensions** (NOT the 7 final-review dimensions):
224
+ ```bash
225
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
226
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
227
+ codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
228
+ "Review against acceptance criteria: <criteria>" \
229
+ 2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
230
+ ```
231
+ 3. Fix any findings (still uncommitted).
232
+ 4. Re-review until all passes exhausted or cap reached.
233
+ 5. **Only after review passes:** commit with conventional commit format.
234
+
235
+ **If changes are already committed** (e.g., subagent workflow where the implementer subagent commits before review):
236
+
237
+ ```bash
238
+ # Capture the SHA before the task starts
239
+ PRE_TASK_SHA=$(git rev-parse HEAD)
240
+
241
+ # ... subagent implements and commits ...
242
+
243
+ # Review the committed changes against the pre-task baseline
244
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
245
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
246
+ codex review -c model="$CODEX_MODEL" --base "$PRE_TASK_SHA" --title "Task NNN: <summary>" \
247
+ "Review against acceptance criteria: <criteria>" \
248
+ 2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
249
+ ```
250
+
251
+ ---
252
+
253
+ ## Dimension Sets
254
+
255
+ ### Research Dimensions (5)
256
+
257
+ 1. **Coverage** -- all briefing topics researched
258
+ 2. **Source quality** -- authoritative, current sources
259
+ 3. **Relevance** -- research answers the actual questions
260
+ 4. **Gaps** -- missing info that blocks later phases
261
+ 5. **Contradictions** -- conflicting sources identified
262
+
263
+ ### Spec/Clarification Dimensions (5)
264
+
265
+ 1. **Completeness** -- all requirements covered
266
+ 2. **Testability** -- each criterion verifiable
267
+ 3. **Ambiguity** -- no dual-interpretation statements
268
+ 4. **Assumptions** -- hidden assumptions explicit
269
+ 5. **Scope creep** -- nothing beyond briefing
270
+
271
+ ### Design-Review Dimensions (5)
272
+
273
+ Matches canonical `workflows/design-review.md`:
274
+
275
+ 1. **Spec coverage** -- does the design address every acceptance criterion with a visual component?
276
+ 2. **Design-spec consistency** -- does the design introduce anything not in the spec? (scope creep check)
277
+ 3. **Accessibility** -- color contrast ratios (WCAG 2.1 AA), focus states, touch target sizes (44x44px minimum)
278
+ 4. **Visual consistency** -- design tokens form a coherent system, dark/light mode alignment
279
+ 5. **Exported-code fidelity** -- do exported scaffolds match the designs? Mismatches are failures here, not implementation concerns.
280
+
281
+ ### Plan Dimensions (7)
282
+
283
+ 1. **Completeness** -- all design decisions mapped to tasks
284
+ 2. **Ordering** -- dependencies correct, parallelizable identified
285
+ 3. **Atomicity** -- each task fits one session
286
+ 4. **Testability** -- concrete verification per task
287
+ 5. **Edge cases** -- error paths covered
288
+ 6. **Security** -- auth, injection, data exposure
289
+ 7. **Integration** -- tasks connect end-to-end
290
+
291
+ ### Task Execution Dimensions (5)
292
+
293
+ Used for per-task review during execution:
294
+
295
+ 1. **Correctness** -- code matches spec
296
+ 2. **Tests** -- real tests, not mocked/faked
297
+ 3. **Wiring** -- all paths connected
298
+ 4. **Drift** -- matches task spec
299
+ 5. **Quality** -- naming, error handling
300
+
301
+ ### Final Review Dimensions (7)
302
+
303
+ Used for `workflows/review.md` scored gate:
304
+
305
+ 1. **Correctness** -- does the code do what the spec says?
306
+ 2. **Completeness** -- are all acceptance criteria met?
307
+ 3. **Wiring** -- are all paths connected end-to-end?
308
+ 4. **Verification** -- is there evidence (tests, type checks) for each claim?
309
+ 5. **Drift** -- does the implementation match the approved plan?
310
+ 6. **Quality** -- code style, naming, error handling, security
311
+ 7. **Documentation** -- changelog entries, commit messages, comments
312
+
313
+ The final review dimensions are the existing 7 from `skills/reviewer/SKILL.md`. `workflows/review.md` is not modified by this pattern.
314
+
315
+ ---
316
+
317
+ ## Per-Depth Coverage Contract
318
+
319
+ | Depth | Research | Spec | Design-Review | Plan | Task Execution | Final Review |
320
+ |-------|----------|------|---------------|------|----------------|--------------|
321
+ | Quick | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | always 7 dims, 1 pass |
322
+ | Standard | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | always 7 dims, 1 pass |
323
+ | Deep | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-7, 7 passes | dims 1-5, 7 passes | always 7 dims, 1 pass |
324
+
325
+ Pass counts are FIXED per depth. Quick = 3 passes, standard = 5 passes, deep = 7 passes. No extension. No early-exit. Final review is always a single scored pass across all 7 dimensions -- it is a gate, not a loop.
326
+
327
+ ---
328
+
329
+ ## Loop Cap Configuration
330
+
331
+ The `phase_policy` section of `run-config.yaml` controls which phases are enabled and sets an absolute safety ceiling per phase. Only two fields exist: `enabled` and `loop_cap`. There is no `passes` field -- depth determines pass counts (3/5/7), not phase policy.
332
+
333
+ ```yaml
334
+ phase_policy:
335
+ discover: { enabled: true, loop_cap: 10 }
336
+ clarify: { enabled: true, loop_cap: 10 }
337
+ specify: { enabled: true, loop_cap: 10 }
338
+ spec-challenge: { enabled: true, loop_cap: 10 }
339
+ author: { enabled: false, loop_cap: 10 }
340
+ design: { enabled: true, loop_cap: 10 }
341
+ design-review: { enabled: true, loop_cap: 10 }
342
+ plan: { enabled: true, loop_cap: 10 }
343
+ plan-review: { enabled: true, loop_cap: 10 }
344
+ execute: { enabled: true, loop_cap: 10 }
345
+ verify: { enabled: true, loop_cap: 5 }
346
+ review: { enabled: true, loop_cap: 10 }
347
+ learn: { enabled: false, loop_cap: 5 }
348
+ prepare_next: { enabled: false, loop_cap: 5 }
349
+ run_audit: { enabled: false, loop_cap: 10 }
350
+ ```
351
+
352
+ **`loop_cap`** is an absolute safety ceiling that prevents runaway loops regardless of depth. It is checked by `wazir capture loop-check` in pipeline mode. It is NOT the same as pass count (which is determined by depth: 3/5/7). Example: depth=deep gives 7 passes, but if `loop_cap: 5`, the cap guard fires at pass 5 and escalates. This is intentional -- the operator can constrain expensive phases.
353
+
354
+ **Adaptive phases** (`author`, `learn`, `prepare_next`, `run_audit`) default to `enabled: false`. They are activated by explicit operator config or intent detection. They do not participate in the standard review loop pattern because:
355
+
356
+ - `author` has a human approval gate, not an iterative review loop.
357
+ - `learn` extracts learnings from the completed run -- it is post-execution housekeeping.
358
+ - `prepare_next` prepares context for the next run -- it is a handoff phase.
359
+ - `run_audit` is an on-demand standalone audit, not part of the main pipeline flow.
360
+
361
+ ---
362
+
363
+ ## Reviewer Mode Table
364
+
365
+ The reviewer skill operates in different modes depending on the phase. **Mode is always explicit** -- the caller passes `--mode <mode>`. There is no auto-detection based on artifact availability.
366
+
367
+ | Mode | Invoked during | Prerequisites | Dimensions | Output |
368
+ |------|---------------|---------------|------------|--------|
369
+ | `final` | After execution + verification | Completed task artifacts in `.wazir/runs/latest/artifacts/` | 7 final-review dims, scored 0-70 | Verdict: PASS/NEEDS FIXES/NEEDS REWORK/FAIL |
370
+ | `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Findings with severity, no score |
371
+ | `design-review` | After design approval | Design artifact, approved spec, accessibility guidelines | 5 design-review dims (canonical) | Findings with severity (blocking/advisory) |
372
+ | `plan-review` | After planning | Draft plan, approved spec, design artifact | 7 plan dims | Findings with severity, no score |
373
+ | `task-review` | During execution, per task | Uncommitted changes (or committed with known base SHA) | 5 task-execution dims | Pass/fail per task, no score |
374
+ | `research-review` | During discover | Research artifact | 5 research dims | Findings with severity, no score |
375
+ | `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Findings with severity, no score |
376
+
377
+ If `--mode` is not provided, the reviewer asks the user which review to run. Auto-detection based on artifact availability is NOT used -- it causes ambiguity in resumed/multi-phase runs where stale artifacts from prior phases exist.
378
+
379
+ Each caller is responsible for passing the correct mode:
380
+
381
+ - Clarifier passes `--mode clarification-review` after Phase 1A
382
+ - Discover workflow passes `--mode research-review` after research
383
+ - Specifier flow passes `--mode spec-challenge` after specify
384
+ - Brainstorming passes `--mode design-review` after user approval
385
+ - Writing-plans passes `--mode plan-review` after planning
386
+ - Executor passes `--mode task-review` for each task
387
+ - `/wazir` runner passes `--mode final` for the final review gate
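The explicit-mode rule can be illustrated with a short Python sketch. The names `MODE_DIMENSIONS`, `resolve_mode`, and the `ask_user` callback are hypothetical. The seven modes and their dimension counts come from the table above, and the behaviour when no mode is given (ask the user, never auto-detect) follows the text.

```python
# Mode name -> number of review dimensions, as listed in the mode table.
MODE_DIMENSIONS = {
    "final": 7,
    "spec-challenge": 5,
    "design-review": 5,
    "plan-review": 7,
    "task-review": 5,
    "research-review": 5,
    "clarification-review": 5,
}


def resolve_mode(mode, ask_user):
    """Return a valid review mode; ask the user when none was passed."""
    if mode is None:
        # No auto-detection from artifacts on disk: stale artifacts would mislead it.
        mode = ask_user(sorted(MODE_DIMENSIONS))
    if mode not in MODE_DIMENSIONS:
        raise ValueError(f"unknown review mode: {mode!r}")
    return mode
```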
388
+
389
+ ---
390
+
391
+ ## Codex Prompt Templates
392
+
393
+ All Codex invocations read the model from config with a fallback:
394
+
395
+ ```bash
396
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
397
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
398
+ ```
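A Python equivalent of the `jq` lookup with fallback might look like this. It is a sketch: `read_codex_model` is a hypothetical helper, and treating an unreadable or malformed config file as "use the default" is an assumption chosen to match the `2>/dev/null` behaviour of the shell version.

```python
import json

DEFAULT_CODEX_MODEL = "gpt-5.4"


def read_codex_model(config_path=".wazir/state/config.json"):
    """Read multi_tool.codex.model from the config, falling back to the default."""
    try:
        with open(config_path, encoding="utf-8") as fh:
            node = json.load(fh)
    except (OSError, json.JSONDecodeError):
        return DEFAULT_CODEX_MODEL
    for key in ("multi_tool", "codex", "model"):
        if not isinstance(node, dict):
            return DEFAULT_CODEX_MODEL
        node = node.get(key)
    # Empty, null, or non-string values fall back, like `// empty` plus `${VAR:-...}`.
    return node if isinstance(node, str) and node else DEFAULT_CODEX_MODEL
```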
399
+
400
+ ### Artifact Review (specs, plans, designs via stdin)
401
+
402
+ Use this template with `codex exec` for non-code artifacts piped via stdin:
403
+
404
+ ```bash
405
+ cat <artifact_path> | codex exec -c model="$CODEX_MODEL" \
406
+ "You are reviewing a [ARTIFACT_TYPE] for the Wazir engineering OS.
407
+ Focus on [DIMENSION]: [dimension description].
408
+ Rules: cite specific sections, be actionable, say CLEAN if no issues.
409
+ Do NOT load or invoke any skills. Do NOT read the codebase.
410
+ Review ONLY the content provided via stdin."
411
+ ```
412
+
413
+ Replace `[ARTIFACT_TYPE]` with: `specification`, `implementation plan`, `design document`, `research brief`, or `clarification`.
414
+ Replace `[DIMENSION]` and `[dimension description]` with the current review pass dimension from the relevant dimension set above.
415
+
416
+ ### Code Review (diffs via --uncommitted or --base)
417
+
418
+ Use this template with `codex review` for code changes:
419
+
420
+ ```bash
421
+ codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
422
+ "Review the code changes for [DIMENSION]: [dimension description].
423
+ Check against acceptance criteria: [criteria].
424
+ Flag: correctness issues, missing tests, unwired paths, drift from spec.
425
+ Do NOT load or invoke any skills."
426
+ ```
427
+
428
+ For committed changes, replace `--uncommitted` with `--base <sha>`.
429
+ Replace `[DIMENSION]`, `[dimension description]`, and `[criteria]` with the task-specific values from the execution plan and spec.
package/docs/reference/tooling-cli.md CHANGED
@@ -35,12 +35,14 @@ The `wazir` CLI is minimal on purpose. It exists to validate and export the host
35
35
  | `wazir capture output` | implemented | Writes captured tool output to a run-local file and records a `post_tool_capture` event. |
36
36
  | `wazir capture summary` | implemented | Writes `summary.md` and records the chosen summary or handoff event. |
37
37
  | `wazir capture usage` | implemented | Generates a token savings report for a run, showing capture routing statistics and context window savings. |
38
+ | `wazir capture loop-check` | implemented | Records a loop iteration event and evaluates the loop cap guard. Exits 43 if the phase loop cap is exceeded. Accepts `--task-id` for task-scoped cap tracking. In standalone mode (no status.json), exits 0. |
38
39
 
39
40
  ## Exit codes
40
41
 
41
42
  - `0`: requested check passed
42
43
  - `1`: invalid input or validation failure
43
44
  - `2`: command surface exists but the implementation is intentionally not complete yet
45
+ - `43`: phase loop cap exceeded (returned by `wazir capture loop-check`)
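A caller of `wazir capture loop-check` needs to tell the loop-cap exit code apart from ordinary failures. The Python sketch below shows one way to do that. The function name and the returned labels are hypothetical, and mapping every code other than 0 and 43 to `"error"` is an assumption, since the reference does not say how a review loop should react to exit codes 1 or 2.

```python
LOOP_CAP_EXCEEDED = 43  # documented exit code for `wazir capture loop-check`


def interpret_loop_check(exit_code: int) -> str:
    """Map a loop-check exit code to the action the review loop should take."""
    if exit_code == 0:
        return "continue"   # cap not exceeded, or standalone mode
    if exit_code == LOOP_CAP_EXCEEDED:
        return "escalate"   # stop the loop and hand the evidence to the user
    return "error"          # assumption: anything else is treated as a failure
```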
44
46
 
45
47
  ## Root discovery
46
48
 
package/docs/truth-claims.yaml CHANGED
@@ -184,6 +184,12 @@
184
184
  subject: wazir capture usage
185
185
  verifier: command_registry
186
186
  required: true
187
+ - id: command-capture-loop-check
188
+ file: docs/reference/tooling-cli.md
189
+ claim_type: command
190
+ subject: wazir capture loop-check
191
+ verifier: command_registry
192
+ required: true
187
193
  - id: command-validate-branches
188
194
  file: docs/reference/tooling-cli.md
189
195
  claim_type: command
package/exports/hosts/claude/.claude/agents/clarifier.md CHANGED
@@ -30,6 +30,7 @@ Default approach: recall L1 (structural summaries)
30
30
  - clarification artifact
31
31
  - unresolved questions list
32
32
  - scope summary with cited sources
33
+ - emits clarification artifact for reviewer loops
33
34
 
34
35
  ## Escalation Rules
35
36
 
@@ -40,3 +41,5 @@ Default approach: recall L1 (structural summaries)
40
41
  - leaves material ambiguity unresolved without escalation
41
42
  - mutates `input/`
42
43
  - invents constraints or facts without evidence
44
+ - self-reviews own output instead of delegating to reviewer
45
+ - performs substantial discovery research inline without delegating to the discover workflow when delegation is required
package/exports/hosts/claude/.claude/agents/designer.md CHANGED
@@ -35,6 +35,9 @@ Default approach: recall L1 (structural summaries)
  - exported HTML + CSS scaffold
  - design tokens JSON (colors, spacing, typography)
  - screenshot PNGs of key frames
+ - emits design artifact for design-review loop
+
+ Design is not approved for planning until it survives the design-review loop owned by the reviewer role.
 
  ## Git-Flow Responsibilities
 
@@ -31,6 +31,7 @@ Default approach: direct file read (full content)
  - code and docs changes
  - execution notes
  - verification evidence
+ - submits per-task output for review before commit
 
  ## Git-Flow Responsibilities
 
@@ -53,3 +54,4 @@ All text outputs (code comments, commit messages, PR descriptions, CHANGELOG ent
  - unwired paths
  - fake tests
  - writes to protected paths outside approved flows
+ - commits before review passes
@@ -32,6 +32,9 @@ Default approach: recall L1 (structural summaries)
  - implementation plan artifact
  - ordered task list
  - verification plan per section
+ - emits plan artifact for plan-review loop
+
+ Plan is not approved until it survives the plan-review loop owned by the reviewer role.
 
  ## Git-Flow Responsibilities
 
@@ -31,6 +31,7 @@ Default approach: recall L1 (structural summaries)
  - research artifact with citations
  - finding summaries linked to sources
  - open risks and unknowns
+ - submits research artifact for reviewer evaluation before it flows downstream
 
  ## Escalation Rules
 
@@ -41,3 +42,4 @@ Default approach: recall L1 (structural summaries)
  - unsupported claims
  - stale or missing citations
  - substituting confidence for evidence
+ - research artifact used downstream without passing review
@@ -2,7 +2,7 @@
 
  ## Purpose
 
- Perform adversarial review to find correctness, scope, wiring, verification, and drift failures.
+ Perform adversarial review to find correctness, scope, wiring, verification, and drift failures. Owns all review loops: research-review, clarification-review, spec-challenge, design-review, plan-review, task-review, and final review.
 
  ## Inputs
 
@@ -17,6 +17,7 @@ Perform adversarial review to find correctness, scope, wiring, verification, and
  - source-backed comparison to spec/plan
  - secondary model review when available
  - Wazir CLI recall and index commands (see Context retrieval)
+ - review loop pattern (see docs/reference/review-loop-pattern.md)
 
  ## Context retrieval
 
@@ -32,6 +33,9 @@ Default approach: recall L1, escalate to direct read for flagged issues
  - findings with severity
  - rationale tied to evidence
  - explicit no-findings verdict when applicable
+ - review loop pass logs with source attribution ([Wazir], [Codex], [Both])
+
+ Review mode is always passed explicitly by the caller (--mode). The reviewer does not auto-detect mode from artifact availability.
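The "explicit mode, no auto-detection" rule above can be illustrated with a required, enumerated `--mode` option. This is a sketch only, assuming a Python entry point that the package does not necessarily have. The loop names come from the reviewer Purpose text, but whether each is accepted verbatim as a `--mode` value (in particular `final`) is an assumption, as are the names `REVIEW_MODES`, `build_parser`, and `parse_mode`.

```python
import argparse
from typing import List

# Loop names taken from the reviewer role's Purpose line. Treating each as a
# literal --mode value is an assumption made for illustration.
REVIEW_MODES = (
    "research-review",
    "clarification-review",
    "spec-challenge",
    "design-review",
    "plan-review",
    "task-review",
    "final",
)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where --mode is mandatory and restricted to known modes."""
    parser = argparse.ArgumentParser(prog="reviewer-sketch")
    parser.add_argument(
        "--mode",
        required=True,          # the caller must always say which loop this is
        choices=REVIEW_MODES,   # unknown modes are rejected, never guessed
        help="review loop to run; never inferred from available artifacts",
    )
    return parser


def parse_mode(argv: List[str]) -> str:
    """Return the explicit review mode, or exit with an error if it is absent."""
    return build_parser().parse_args(argv).mode
```

With this shape, omitting `--mode` fails fast at argument parsing instead of falling back to a guess based on which artifacts happen to exist.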
 
  ## Git-Flow Responsibilities
 
@@ -31,6 +31,9 @@ Default approach: recall L1 (structural summaries)
  - spec artifact
  - measurable acceptance criteria
  - explicit non-goals and assumptions
+ - emits spec artifact for spec-challenge review loop
+
+ Spec is not approved until it survives the spec-challenge loop owned by the reviewer role.
 
  ## Writing Quality
 
@@ -24,6 +24,10 @@ On entering this phase, run:
  - unresolved questions list
  - scope summary
 
+ ## Review Loop
+
+ Clarification artifact is reviewed by the reviewer role using the review loop pattern with spec/clarification dimensions. The reviewer is invoked with `--mode clarification-review`. The clarifier resolves findings. Clarification does not flow to specify until all review passes complete.
+
  ## Approval Gate
 
  - no formal approval gate, but unresolved material ambiguity must be escalated
@@ -39,6 +39,10 @@ On rejection: `wazir capture event --run <run-id> --event gate_rejected --phase
  On completing this phase, run:
  `wazir capture event --run <run-id> --event phase_exit --phase <phase-name> --status completed`
 
+ ## Loop Structure
+
+ Follows the review loop pattern in `docs/reference/review-loop-pattern.md` with the canonical design-review dimensions (spec coverage, design-spec consistency, accessibility, visual consistency, exported-code fidelity). The designer role resolves findings. Starts when the approved design artifact enters the `design_review` phase. Pass count determined by depth. No extension.
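A fixed-pass loop of the kind described above ("pass count determined by depth, no extension") might look like the sketch below. It is not the algorithm from `docs/reference/review-loop-pattern.md`, which this diff excerpt does not show. The function name, the callback signatures, and the choice to run every pass even after a clean one are all assumptions, and the mapping from depth to pass count is left to the caller.

```python
from typing import Callable, List, Tuple

Findings = List[str]
PassLog = List[Tuple[int, Findings]]


def run_review_loop(
    artifact: str,
    review: Callable[[str], Findings],
    resolve: Callable[[str, Findings], str],
    passes: int,
) -> Tuple[str, PassLog]:
    """Run exactly `passes` review passes over an artifact.

    `review` stands in for the reviewer role and returns a list of findings.
    `resolve` stands in for the authoring role (for example the designer) and
    returns a revised artifact. The pass count is fixed up front and is never
    extended, mirroring the "No extension" note in the source text.
    """
    log: PassLog = []
    for pass_number in range(1, passes + 1):
        findings = review(artifact)
        log.append((pass_number, list(findings)))
        if findings:
            # The author resolves findings before the next pass starts.
            artifact = resolve(artifact, findings)
    return artifact, log
```

Because the loop bound is a plain `range`, there is no code path that can add passes; how unresolved findings from the last pass are handled is outside this sketch.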
+
  ## Failure Conditions
 
  - vague findings without visual evidence
@@ -33,6 +33,10 @@ On entering this phase, run:
 
  - explicit human approval required before design-review
 
+ ## Review Loop
+
+ After user approval, design artifact is reviewed via the design-review workflow (`workflows/design-review.md`) using the review loop pattern with the canonical design-review dimensions (spec coverage, design-spec consistency, accessibility, visual consistency, exported-code fidelity). The reviewer is invoked with `--mode design-review`. Design does not flow to planning until all review passes complete.
+
  ## Phase exit
 
  On completing this phase, run:
@@ -23,6 +23,10 @@ On entering this phase, run:
  - research artifact
  - cited findings
 
+ ## Review Loop
+
+ Research artifact is reviewed by the reviewer role using the review loop pattern (`docs/reference/review-loop-pattern.md`) with research dimensions (coverage, source quality, relevance, gaps, contradictions). The reviewer is invoked with `--mode research-review`. The researcher resolves findings. Research does not flow to specify until all review passes complete.
+
  ## Approval Gate
 
  - no formal approval gate, but unsupported research cannot flow forward