@wazir-dev/cli 1.0.0 → 1.2.0

Files changed (163)
  1. package/CHANGELOG.md +100 -2
  2. package/README.md +6 -6
  3. package/docs/concepts/architecture.md +1 -1
  4. package/docs/concepts/roles-and-workflows.md +2 -0
  5. package/docs/concepts/why-wazir.md +59 -0
  6. package/docs/decisions/2026-03-19-deferred-items.md +564 -0
  7. package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
  8. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
  9. package/docs/readmes/INDEX.md +21 -5
  10. package/docs/readmes/features/expertise/README.md +2 -2
  11. package/docs/readmes/features/exports/README.md +2 -2
  12. package/docs/readmes/features/schemas/README.md +3 -0
  13. package/docs/readmes/features/skills/README.md +17 -0
  14. package/docs/readmes/features/skills/clarifier.md +5 -0
  15. package/docs/readmes/features/skills/claude-cli.md +5 -0
  16. package/docs/readmes/features/skills/codex-cli.md +5 -0
  17. package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
  18. package/docs/readmes/features/skills/executing-plans.md +5 -0
  19. package/docs/readmes/features/skills/executor.md +5 -0
  20. package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
  21. package/docs/readmes/features/skills/gemini-cli.md +5 -0
  22. package/docs/readmes/features/skills/humanize.md +5 -0
  23. package/docs/readmes/features/skills/init-pipeline.md +5 -0
  24. package/docs/readmes/features/skills/receiving-code-review.md +5 -0
  25. package/docs/readmes/features/skills/requesting-code-review.md +5 -0
  26. package/docs/readmes/features/skills/reviewer.md +5 -0
  27. package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
  28. package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
  29. package/docs/readmes/features/skills/wazir.md +5 -0
  30. package/docs/readmes/features/skills/writing-skills.md +5 -0
  31. package/docs/readmes/features/workflows/prepare-next.md +1 -1
  32. package/docs/reference/configuration-reference.md +47 -6
  33. package/docs/reference/launch-checklist.md +4 -4
  34. package/docs/reference/review-loop-pattern.md +538 -0
  35. package/docs/reference/roles-reference.md +1 -0
  36. package/docs/reference/skill-tiers.md +147 -0
  37. package/docs/reference/tooling-cli.md +5 -1
  38. package/docs/truth-claims.yaml +18 -0
  39. package/expertise/antipatterns/process/ai-coding-antipatterns.md +97 -1
  40. package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
  41. package/exports/hosts/claude/.claude/agents/designer.md +3 -0
  42. package/exports/hosts/claude/.claude/agents/executor.md +2 -0
  43. package/exports/hosts/claude/.claude/agents/planner.md +3 -0
  44. package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
  45. package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
  46. package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
  47. package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
  48. package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
  49. package/exports/hosts/claude/.claude/commands/design.md +4 -0
  50. package/exports/hosts/claude/.claude/commands/discover.md +4 -0
  51. package/exports/hosts/claude/.claude/commands/execute.md +4 -0
  52. package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
  53. package/exports/hosts/claude/.claude/commands/plan.md +4 -0
  54. package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
  55. package/exports/hosts/claude/.claude/commands/specify.md +4 -0
  56. package/exports/hosts/claude/.claude/commands/verify.md +4 -0
  57. package/exports/hosts/claude/.claude/settings.json +9 -0
  58. package/exports/hosts/claude/CLAUDE.md +1 -1
  59. package/exports/hosts/claude/export.manifest.json +22 -20
  60. package/exports/hosts/claude/host-package.json +3 -1
  61. package/exports/hosts/codex/AGENTS.md +1 -1
  62. package/exports/hosts/codex/export.manifest.json +22 -20
  63. package/exports/hosts/codex/host-package.json +3 -1
  64. package/exports/hosts/cursor/.cursor/hooks.json +4 -0
  65. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
  66. package/exports/hosts/cursor/export.manifest.json +22 -20
  67. package/exports/hosts/cursor/host-package.json +3 -1
  68. package/exports/hosts/gemini/GEMINI.md +1 -1
  69. package/exports/hosts/gemini/export.manifest.json +22 -20
  70. package/exports/hosts/gemini/host-package.json +3 -1
  71. package/hooks/context-mode-router +191 -0
  72. package/hooks/definitions/context_mode_router.yaml +19 -0
  73. package/hooks/definitions/loop_cap_guard.yaml +1 -1
  74. package/hooks/hooks.json +43 -0
  75. package/hooks/protected-path-write-guard +8 -0
  76. package/hooks/routing-matrix.json +45 -0
  77. package/hooks/session-start +62 -1
  78. package/llms-full.txt +905 -132
  79. package/package.json +3 -3
  80. package/roles/clarifier.md +3 -0
  81. package/roles/designer.md +3 -0
  82. package/roles/executor.md +2 -0
  83. package/roles/planner.md +3 -0
  84. package/roles/researcher.md +2 -0
  85. package/roles/reviewer.md +5 -1
  86. package/roles/specifier.md +3 -0
  87. package/schemas/hook.schema.json +2 -1
  88. package/schemas/phase-report.schema.json +80 -0
  89. package/schemas/usage.schema.json +25 -1
  90. package/schemas/wazir-manifest.schema.json +19 -0
  91. package/skills/brainstorming/SKILL.md +20 -56
  92. package/skills/clarifier/SKILL.md +243 -0
  93. package/skills/claude-cli/SKILL.md +320 -0
  94. package/skills/codex-cli/SKILL.md +260 -0
  95. package/skills/debugging/SKILL.md +24 -1
  96. package/skills/design/SKILL.md +13 -0
  97. package/skills/dispatching-parallel-agents/SKILL.md +13 -0
  98. package/skills/executing-plans/SKILL.md +28 -2
  99. package/skills/executor/SKILL.md +129 -0
  100. package/skills/finishing-a-development-branch/SKILL.md +13 -0
  101. package/skills/gemini-cli/SKILL.md +260 -0
  102. package/skills/humanize/SKILL.md +13 -0
  103. package/skills/init-pipeline/SKILL.md +76 -78
  104. package/skills/prepare-next/SKILL.md +81 -10
  105. package/skills/receiving-code-review/SKILL.md +21 -0
  106. package/skills/requesting-code-review/SKILL.md +38 -5
  107. package/skills/reviewer/SKILL.md +423 -0
  108. package/skills/run-audit/SKILL.md +13 -0
  109. package/skills/scan-project/SKILL.md +13 -0
  110. package/skills/self-audit/SKILL.md +197 -16
  111. package/skills/subagent-driven-development/SKILL.md +38 -2
  112. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
  113. package/skills/subagent-driven-development/implementer-prompt.md +8 -0
  114. package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
  115. package/skills/tdd/SKILL.md +21 -0
  116. package/skills/using-git-worktrees/SKILL.md +13 -0
  117. package/skills/using-skills/SKILL.md +13 -0
  118. package/skills/verification/SKILL.md +13 -0
  119. package/skills/wazir/SKILL.md +286 -262
  120. package/skills/writing-plans/SKILL.md +44 -4
  121. package/skills/writing-skills/SKILL.md +13 -0
  122. package/templates/artifacts/implementation-plan.md +3 -0
  123. package/templates/artifacts/tasks-template.md +133 -0
  124. package/templates/examples/phase-report.example.json +48 -0
  125. package/templates/examples/wazir-manifest.example.yaml +1 -1
  126. package/tooling/src/adapters/composition-engine.js +256 -0
  127. package/tooling/src/adapters/model-router.js +84 -0
  128. package/tooling/src/capture/command.js +111 -2
  129. package/tooling/src/capture/run-config.js +23 -0
  130. package/tooling/src/capture/store.js +24 -0
  131. package/tooling/src/capture/usage.js +106 -0
  132. package/tooling/src/checks/ac-matrix.js +256 -0
  133. package/tooling/src/checks/brand-truth.js +3 -6
  134. package/tooling/src/checks/command-registry.js +13 -0
  135. package/tooling/src/checks/docs-truth.js +1 -1
  136. package/tooling/src/checks/runtime-surface.js +3 -7
  137. package/tooling/src/checks/skills.js +111 -0
  138. package/tooling/src/cli.js +17 -3
  139. package/tooling/src/commands/stats.js +161 -0
  140. package/tooling/src/commands/validate.js +5 -1
  141. package/tooling/src/export/compiler.js +33 -37
  142. package/tooling/src/gating/agent.js +145 -0
  143. package/tooling/src/guards/phase-prerequisite-guard.js +127 -0
  144. package/tooling/src/hooks/routing-logic.js +69 -0
  145. package/tooling/src/init/auto-detect.js +260 -0
  146. package/tooling/src/init/command.js +161 -0
  147. package/tooling/src/input/scanner.js +46 -0
  148. package/tooling/src/reports/command.js +103 -0
  149. package/tooling/src/reports/phase-report.js +323 -0
  150. package/tooling/src/state/command.js +160 -0
  151. package/tooling/src/state/db.js +287 -0
  152. package/tooling/src/status/command.js +53 -1
  153. package/wazir.manifest.yaml +26 -17
  154. package/workflows/clarify.md +4 -0
  155. package/workflows/design-review.md +4 -0
  156. package/workflows/design.md +4 -0
  157. package/workflows/discover.md +4 -0
  158. package/workflows/execute.md +4 -0
  159. package/workflows/plan-review.md +4 -0
  160. package/workflows/plan.md +4 -0
  161. package/workflows/spec-challenge.md +4 -0
  162. package/workflows/specify.md +4 -0
  163. package/workflows/verify.md +4 -0
@@ -26,7 +26,7 @@ Submit pull requests to these curated lists (one PR per list, follow each repo's
  ### awesome-claude-code
  - **Repo:** `github.com/anthropics/awesome-claude-code` (or the most-starred community fork)
  - **Section:** Tools / Plugins / Extensions
- - **Entry format:** `[Wazir](https://github.com/MohamedAbdallah-14/Wazir) - Host-native engineering OS kit with 10 roles, 14 phases, and 308 expertise modules.`
+ - **Entry format:** `[Wazir](https://github.com/MohamedAbdallah-14/Wazir) - Host-native engineering OS kit with 10 roles, 4 phases (15 workflows), and 268 expertise modules.`
  - **Tips:** Keep the description under 120 characters. Link directly to the repo.
 
  ### awesome-ai-agents
@@ -56,7 +56,7 @@ Show HN: Wazir – Engineering OS kit for AI coding agents (Claude, Codex, Gemin
  ### First comment
  Post a comment immediately after submission explaining:
  1. What problem Wazir solves (AI agents lack structured engineering workflows)
- 2. How it works (10 canonical roles, 14-phase pipeline, 308 expertise modules)
+ 2. How it works (10 canonical roles, 14-phase pipeline, 268 expertise modules)
  3. What makes it different (host-native, works across Claude/Codex/Gemini/Cursor)
  4. Quick install: `npx @wazir-dev/cli init`
  5. Invite feedback -- HN readers appreciate genuine requests for input
@@ -75,7 +75,7 @@ Post a comment immediately after submission explaining:
  **Title:** "How I Built an Engineering OS for AI Coding Agents"
 
  1. **Hook** -- The problem: AI agents write code but lack engineering discipline.
- 2. **Architecture overview** -- 10 roles, 14 phases, expertise modules, quality gates.
+ 2. **Architecture overview** -- 10 roles, 4 phases (15 workflows), expertise modules, quality gates.
  3. **Code walkthrough** -- Show a real workflow: how a feature moves from requirements through TDD to deployment.
  4. **Host-native approach** -- Explain why one kit works across Claude, Codex, Gemini, and Cursor.
  5. **Results** -- Concrete metrics or before/after comparisons.
@@ -100,7 +100,7 @@ Structure as a 5-7 tweet thread:
 
  1. **Hook tweet:** One-liner about the problem + link to repo.
  2. **What it is:** Brief description of Wazir.
- 3. **Architecture:** 10 roles, 14 phases, 308 modules (include a diagram image).
+ 3. **Architecture:** 10 roles, 4 phases (15 workflows), 308 modules (include a diagram image).
  4. **Demo:** Short GIF or screenshot of a workflow in action.
  5. **Multi-host:** Works with Claude, Codex, Gemini, and Cursor.
  6. **Install:** `npx @wazir-dev/cli init`
@@ -0,0 +1,538 @@
+ # Review Loop Pattern Reference
+
+ Canonical reference for the review loop pattern used across all Wazir pipeline phases. Skills and workflows link to this document rather than embedding loop logic inline.
+
+ ---
+
+ ## Core Principle: Producer-Reviewer Separation
+
+ The producer skill (clarifier, planner, designer, etc.) **emits** an artifact and calls for review. The **reviewer role** owns the review loop. The producer receives findings and resolves them. No role reviews its own output.
+
+ ```
+ Producer emits artifact
+ -> Reviewer runs review loop (N passes, Codex if available)
+ -> Findings returned to producer
+ -> Producer fixes and resubmits
+ -> Loop until all passes exhausted or cap reached
+ -> Escalate to user if cap exceeded
+ ```
+
+ When Codex is available, the reviewer role delegates to `codex review` as a secondary input while maintaining its own independent primary verdict.
+
+ ---
+
+ ## Per-Task Review vs Final Review
+
+ These are two structurally different constructs:
+
+ | | Per-Task Review | Final Review |
+ |---|---|---|
+ | **When** | During execution, after each task | After all execution + verification complete |
+ | **Dimensions** | 5 task-execution dims (correctness, tests, wiring, drift, quality) | 7 scored dims (correctness, completeness, wiring, verification, drift, quality, documentation) |
+ | **Scope** | Single task's uncommitted changes | Entire implementation vs spec/plan |
+ | **Output** | Pass/fix loop, no score | Scored verdict (0-70), PASS/FAIL |
+ | **Workflow** | Inline in execution flow | `workflows/review.md` |
+ | **Skill** | `wz:reviewer` in `task-review` mode | `wz:reviewer` in `final` mode |
+ | **Log filename** | `<phase>-task-<NNN>-review-pass-<N>.md` | `final-review.md` |
+
+ ---
+
+ ## Standalone Mode
+
+ When no `.wazir/runs/latest/` directory exists (standalone skill invocation outside a pipeline run):
+
+ 1. **Review loops still run** -- the review logic is embedded in the skill, not dependent on run state.
+ 2. **Artifact location** -- artifacts live in `docs/plans/`. This is the canonical standalone artifact path.
+ 3. **Review log location** -- review logs go alongside the artifact: `docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md`. No temp dir.
+ 4. **Loop cap is SKIPPED entirely** -- no `wazir capture loop-check` call. The loop runs for exactly `pass_counts[depth]` passes (3/5/7) and stops. No cap guard, no fallback constant.
+ 5. **`wazir capture loop-check`** -- not invoked in standalone mode. The standalone detection happens before the cap guard call.
+
+ Detection logic:
+
+ ```
+ if .wazir/runs/latest/ exists:
+   run_mode = "pipeline"
+   log_dir = .wazir/runs/latest/reviews/
+   cap_guard = wazir capture loop-check (full guard)
+ else:
+   run_mode = "standalone"
+   artifact_dir = docs/plans/
+   log_dir = docs/plans/ (alongside artifact)
+   cap_guard = none (depth pass count is the only limit)
+ ```
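The detection logic above could be sketched in Python roughly as follows. This is a minimal illustration that assumes the check is a plain directory-existence test relative to a project root; the function name `detect_run_settings` and the dictionary keys are hypothetical and are not part of the Wazir CLI.

```python
from pathlib import Path


def detect_run_settings(project_root: str) -> dict:
    """Pick pipeline or standalone settings from the presence of .wazir/runs/latest/.

    Hypothetical helper: the keys mirror the pseudocode, not a real Wazir API.
    """
    root = Path(project_root)
    latest = root / ".wazir" / "runs" / "latest"
    if latest.is_dir():
        return {
            "run_mode": "pipeline",
            "log_dir": latest / "reviews",
            "cap_guard": True,  # `wazir capture loop-check` is called before each pass
        }
    plans = root / "docs" / "plans"
    return {
        "run_mode": "standalone",
        "artifact_dir": plans,
        "log_dir": plans,  # review logs sit alongside the artifact
        "cap_guard": False,  # the depth pass count is the only limit
    }
```

Returning the settings as one value keeps the standalone decision in a single place, so the cap guard call can be skipped before it is ever constructed.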
+
+ ---
+
+ ## Review Loop Pseudocode
+
+ ```
+ review_loop(artifact_path, phase, dimensions[], depth, config, options={}):
+
+   # options.mode -- explicit review mode (required)
+   # options.task_id -- task identifier for task-scoped reviews (optional)
+
+   # Standalone detection
+   run_mode = detect_run_mode() # "pipeline" or "standalone"
+
+   # Fixed pass counts -- no extension
+   pass_counts = { quick: 3, standard: 5, deep: 7 }
+   total_passes = pass_counts[depth]
+
+   # Depth-aware dimension subsets (coverage contract)
+   depth_dimensions = {
+     quick: dimensions[0:3], # first 3 dimensions only
+     standard: dimensions[0:5], # first 5
+     deep: dimensions, # all available
+   }
+   active_dims = depth_dimensions[depth]
+
+   codex_available = check_codex() # which codex && codex --version
+
+   for pass_number in 0..total_passes-1:
+
+     # --- Cap guard check (pipeline mode only, before each pass) ---
+     if run_mode == "pipeline":
+       loop_check_args = "--run <run-id> --phase <phase> --loop-count <pass_number+1>"
+       if options.task_id:
+         loop_check_args += " --task-id <task_id>"
+       wazir capture loop-check $loop_check_args
+       # loop-check wraps: event capture + evaluateLoopCapGuard
+       # If loop_cap_guard fires (exit 43), stop immediately:
+       if last_exit_code == 43:
+         log("Loop cap reached for phase: <phase>. Escalating to user.")
+         escalate_to_user(evidence_gathered_so_far)
+         return { pass_count: pass_number, escalated: true }
+     # Standalone mode: no cap guard. Loop runs for total_passes and stops.
+
+     dimension = active_dims[pass_number % len(active_dims)]
+
+     # --- Primary review (reviewer role, not producer) ---
+     # Mode is always explicit -- passed by caller via options.mode
+     findings = self_review(artifact_path, focus=dimension, mode=options.mode)
+
+     # --- Secondary review (Codex, if available) ---
+     if codex_available:
+       codex_exit_code, codex_output = run_codex_review(artifact_path, dimension)
+       if codex_exit_code != 0:
+         # Codex failed -- log error, fall back to self-review for this pass
+         log_error("Codex exited " + codex_exit_code + ": " + codex_output.stderr)
+         mark_pass_codex_unavailable(pass_number)
+         # Do NOT treat Codex failure as clean. Self-review findings stand alone.
+       else:
+         codex_findings = parse(codex_output.stdout)
+         merge(findings, codex_findings, preserve_attribution=true)
+
+     # --- Log the review pass ---
+     if run_mode == "pipeline":
+       if options.task_id:
+         log_path = .wazir/runs/latest/reviews/<phase>-task-<task_id>-review-pass-<N>.md
+       else:
+         log_path = .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
+       log(pass_number+1, dimension, findings) -> log_path
+     else:
+       log_path = docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md
+       log(pass_number+1, dimension, findings) -> log_path
+
+     if findings.has_issues:
+       # --- Fix and re-submit (MANDATORY) ---
+       # The producer MUST fix findings and the reviewer MUST re-review.
+       # "Fix and continue without re-review" is EXPLICITLY PROHIBITED.
+       producer_fix(artifact_path, findings)
+       # Continue to next pass -- the fix will be re-reviewed
+
+   # --- Post-loop: escalation if issues remain ---
+   if remaining.has_issues:
+     # Cap reached with unresolved findings. Present to user:
+     # 1. Approve with known issues (Recommended if non-blocking)
+     # 2. Fix manually and re-run
+     # 3. Abort
+     escalate_to_user(remaining, options=[
+       "approve-with-issues",
+       "fix-manually-and-rerun",
+       "abort"
+     ])
+     # User decides. If approved, log "user-approved-with-issues" in final pass file.
+
+   return { pass_count: total_passes, issues_found, issues_fixed, remaining, attributions }
+ ```
+
+ Key properties of this pseudocode:
+
+ 1. **Fixed pass counts** -- Quick is exactly 3, standard exactly 5, deep exactly 7. No `max_passes = min_passes + 3`. No clean-streak early-exit. No extension.
+ 2. **Task-scoped log filenames** -- `<phase>-task-<NNN>-review-pass-<N>.md` for per-task reviews, preventing log clobbering in parallel mode.
+ 3. **Task-scoped loop cap keys** -- `--task-id` flag on `loop-check` so each task gets its own counter in `phase_loop_counts`.
+ 4. **Explicit review mode** -- `options.mode` is always passed by the caller. No auto-detection.
+ 5. **Codex error handling** -- non-zero exit is logged, pass marked `codex-unavailable`, self-review findings used alone. Never treated as clean.
+ 6. **Standalone mode** -- uses `docs/plans/` for artifacts and logs. No temp dir. No cap guard at all.
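To make the fixed pass counts and the dimension rotation concrete, here is a small Python sketch of the scheduling part of the pseudocode only. The names `plan_passes` and `pipeline_log_name` are hypothetical; the pass counts, the depth subsets, the modulo rotation, and the filename patterns are taken from the text above.

```python
from typing import List, Optional, Tuple

PASS_COUNTS = {"quick": 3, "standard": 5, "deep": 7}
DIM_LIMITS = {"quick": 3, "standard": 5, "deep": None}  # None means "all dimensions"


def plan_passes(dimensions: List[str], depth: str) -> List[Tuple[int, str]]:
    """Return (pass_number, dimension) pairs, 1-based, for one review loop."""
    limit = DIM_LIMITS[depth]
    active = dimensions if limit is None else dimensions[:limit]
    if not active:
        raise ValueError("at least one review dimension is required")
    # Passes beyond len(active) wrap around and revisit earlier dimensions.
    return [(n + 1, active[n % len(active)]) for n in range(PASS_COUNTS[depth])]


def pipeline_log_name(phase: str, pass_number: int,
                      task_id: Optional[str] = None) -> str:
    """Build the pipeline-mode review log filename; task-scoped if task_id is set."""
    if task_id:
        return f"{phase}-task-{task_id}-review-pass-{pass_number}.md"
    return f"{phase}-review-pass-{pass_number}.md"
```

With the five task-execution dimensions at `deep` depth, passes 1-5 cover each dimension once and passes 6-7 wrap around to the first two. With the seven plan dimensions at `deep`, each dimension gets exactly one pass.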
+
+ ---
+
+ ## Codex Error Handling Contract
+
+ ```
+ run_codex_review(artifact_path, dimension):
+   CODEX_MODEL = read_config('.wazir/state/config.json', '.multi_tool.codex.model') or "gpt-5.4"
+
+   if is_code_artifact:
+     cmd = codex review -c model="$CODEX_MODEL" --uncommitted --title "..." "Review for [dimension]..."
+     # or: codex review -c model="$CODEX_MODEL" --base <sha> for committed changes
+   else:
+     cmd = cat <artifact_path> | codex exec -c model="$CODEX_MODEL" "Review this [type] for [dimension]..."
+
+   result = execute(cmd, timeout=120s, capture_stderr=true)
+
+   if result.exit_code != 0:
+     return (result.exit_code, { stderr: result.stderr, stdout: "" })
+     # Caller handles: log error, mark codex-unavailable, use self-review only
+
+   return (0, { stdout: result.stdout, stderr: result.stderr })
+ ```
+
+ Rules:
+
+ - If Codex exits non-zero, log the full stderr.
+ - Mark the pass as `codex-unavailable` in the review log metadata.
+ - Fall back to self-review for that pass only. Do not skip the pass.
+ - Do not retry Codex on the same pass. If Codex fails on pass 2, pass 3 still tries Codex (transient failures recover).
+ - Never treat a Codex failure as a clean review pass.
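The contract and rules above can be illustrated with a generic Python wrapper. This is a sketch under stated assumptions: it runs an arbitrary command list rather than a specific Codex invocation, the names `run_secondary_review` and `review_pass` are hypothetical, and mapping a timeout to exit code 124 and a launch failure to 127 is a convention chosen here, not something the source specifies.

```python
import subprocess
from typing import Dict, List, Tuple


def run_secondary_review(cmd: List[str], stdin_text: str = "",
                         timeout_s: int = 120) -> Tuple[int, Dict[str, str]]:
    """Run a reviewer command and return (exit_code, {"stdout", "stderr"})."""
    try:
        result = subprocess.run(cmd, input=stdin_text, capture_output=True,
                                text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # Assumption: report a timeout as a non-zero exit (124).
        return 124, {"stdout": "", "stderr": f"timed out after {exc.timeout}s"}
    except OSError as exc:
        # Assumption: report a launch failure as a non-zero exit (127).
        return 127, {"stdout": "", "stderr": str(exc)}
    if result.returncode != 0:
        # On failure, stdout is discarded so it can never be read as "clean".
        return result.returncode, {"stdout": "", "stderr": result.stderr}
    return 0, {"stdout": result.stdout, "stderr": result.stderr}


def review_pass(self_findings: List[str], cmd: List[str]) -> dict:
    """Merge primary findings with secondary output, keeping attribution."""
    findings = [("self", text) for text in self_findings]
    code, out = run_secondary_review(cmd)
    if code != 0:
        # No retry on the same pass; self-review findings stand alone.
        return {"findings": findings, "secondary": "codex-unavailable",
                "error": out["stderr"]}
    findings += [("secondary", line)
                 for line in out["stdout"].splitlines() if line.strip()]
    return {"findings": findings, "secondary": "ok", "error": ""}
```

Because `review_pass` is called once per pass and holds no failure state, a failure on one pass does not prevent the next pass from trying the secondary reviewer again.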
+
+ ---
+
+ ## Codex Availability Probe
+
+ Before any Codex call, verify availability once at loop start:
+
+ ```bash
+ which codex >/dev/null 2>&1 && codex --version >/dev/null 2>&1
+ ```
+
+ If the probe fails, set `codex_available = false` for the entire loop. Fall back to self-review only. Never error out.
+
+ Per-invocation failures (Codex available but a single call fails) are handled separately by the error contract above.
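For callers that are not shell scripts, the same probe could be expressed in Python as below. This is a sketch, not Wazir code: `tool_available` is a hypothetical name and the 10-second timeout is an assumption added so the probe cannot hang.

```python
import shutil
import subprocess


def tool_available(name: str) -> bool:
    """True only if `name` resolves on PATH and `<name> --version` exits 0."""
    path = shutil.which(name)
    if path is None:
        return False
    try:
        result = subprocess.run([path, "--version"],
                                capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        # The probe never raises; any failure means "not available".
        return False
    return result.returncode == 0
```

Calling `tool_available("codex")` once at loop start and caching the result matches the "probe once, never error out" rule.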
+
+ ---
+
+ ## Codex Artifact-Scoped Review
+
+ Never use `codex review` for non-code artifacts (specs, plans, designs). Instead, pipe the artifact content via stdin:
+
+ ```bash
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+ cat .wazir/runs/latest/clarified/spec-hardened.md | \
+   codex exec -c model="$CODEX_MODEL" "Review this specification for: [dimension]. Be specific, cite sections. Say CLEAN if no issues." \
+   2>&1 | tee .wazir/runs/latest/reviews/spec-challenge-review-pass-N.md
+ ```
+
+ For code artifacts, use `codex review -c model="$CODEX_MODEL" --uncommitted` (or `--base` for committed changes). See the next section for details.
+
+ ---
+
+ ## Code Review Scoping
+
+ **Rule: review BEFORE commit.**
+
+ For each task during execution:
+
+ 1. Implement the task (changes are uncommitted).
+ 2. Review the uncommitted changes using the **5 task-execution dimensions** (NOT the 7 final-review dimensions):
+    ```bash
+    CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+    CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+    codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
+      "Review against acceptance criteria: <criteria>" \
+      2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
+    ```
+ 3. Fix any findings (still uncommitted).
+ 4. Re-review until all passes exhausted or cap reached.
+ 5. **Only after review passes:** commit with conventional commit format.
+
+ **If changes are already committed** (e.g., subagent workflow where the implementer subagent commits before review):
+
+ ```bash
+ # Capture the SHA before the task starts
+ PRE_TASK_SHA=$(git rev-parse HEAD)
+
+ # ... subagent implements and commits ...
+
+ # Review the committed changes against the pre-task baseline
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+ codex review -c model="$CODEX_MODEL" --base $PRE_TASK_SHA --title "Task NNN: <summary>" \
+   "Review against acceptance criteria: <criteria>" \
+   2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
+ ```
+
+ ---
+
+ ## Dimension Sets
+
+ ### Research Dimensions (5)
+
+ 1. **Coverage** -- all briefing topics researched
+ 2. **Source quality** -- authoritative, current sources
+ 3. **Relevance** -- research answers the actual questions
+ 4. **Gaps** -- missing info that blocks later phases
+ 5. **Contradictions** -- conflicting sources identified
+
+ ### Spec/Clarification Dimensions (5)
+
+ 1. **Completeness** -- all requirements covered
+ 2. **Testability** -- each criterion verifiable
+ 3. **Ambiguity** -- no dual-interpretation statements
+ 4. **Assumptions** -- hidden assumptions explicit
+ 5. **Scope creep** -- nothing beyond briefing
+
+ ### Design-Review Dimensions (5)
+
+ Matches canonical `workflows/design-review.md`:
+
+ 1. **Spec coverage** -- does the design address every acceptance criterion with a visual component?
+ 2. **Design-spec consistency** -- does the design introduce anything not in the spec? (scope creep check)
+ 3. **Accessibility** -- color contrast ratios (WCAG 2.1 AA), focus states, touch target sizes (44x44px minimum)
+ 4. **Visual consistency** -- design tokens form a coherent system, dark/light mode alignment
+ 5. **Exported-code fidelity** -- do exported scaffolds match the designs? Mismatches are failures here, not implementation concerns.
+
+ ### Plan Dimensions (7)
+
+ 1. **Completeness** -- all design decisions mapped to tasks
+ 2. **Ordering** -- dependencies correct, parallelizable identified
+ 3. **Atomicity** -- each task fits one session
+ 4. **Testability** -- concrete verification per task
+ 5. **Edge cases** -- error paths covered
+ 6. **Security** -- auth, injection, data exposure
+ 7. **Integration** -- tasks connect end-to-end
+
+ ### Task Execution Dimensions (5)
+
+ Used for per-task review during execution:
+
+ 1. **Correctness** -- code matches spec
+ 2. **Tests** -- real tests, not mocked/faked
+ 3. **Wiring** -- all paths connected
+ 4. **Drift** -- matches task spec
+ 5. **Quality** -- naming, error handling
+
+ ### Final Review Dimensions (7)
+
+ Used for `workflows/review.md` scored gate:
+
+ 1. **Correctness** -- does the code do what the spec says?
+ 2. **Completeness** -- are all acceptance criteria met?
+ 3. **Wiring** -- are all paths connected end-to-end?
+ 4. **Verification** -- is there evidence (tests, type checks) for each claim?
+ 5. **Drift** -- does the implementation match the approved plan?
+ 6. **Quality** -- code style, naming, error handling, security
+ 7. **Documentation** -- changelog entries, commit messages, comments
+
+ The final review dimensions are the existing 7 from `skills/reviewer/SKILL.md`. `workflows/review.md` is not modified by this pattern.
+
+ ---
+
+ ## Per-Depth Coverage Contract
+
+ | Depth | Research | Spec | Design-Review | Plan | Task Execution | Final Review |
+ |-------|----------|------|---------------|------|----------------|--------------|
+ | Quick | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | always 7 dims, 1 pass |
+ | Standard | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | always 7 dims, 1 pass |
+ | Deep | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-7, 7 passes | dims 1-5, 7 passes | always 7 dims, 1 pass |
+
+ Pass counts are FIXED per depth. Quick = 3 passes, standard = 5 passes, deep = 7 passes. No extension. No early-exit. Final review is always a single scored pass across all 7 dimensions -- it is a gate, not a loop.
+
+ ---
+
+ ## Loop Cap Configuration
+
+ The `workflow_policy` section of `run-config.yaml` (legacy: `phase_policy`) controls which workflows are enabled and sets an absolute safety ceiling per workflow. Only two fields exist: `enabled` and `loop_cap`. There is no `passes` field -- depth determines pass counts (3/5/7), not workflow policy.
+
+ ```yaml
+ workflow_policy:
+   # Clarifier phase workflows
+   discover: { enabled: true, loop_cap: 10 }
+   clarify: { enabled: true, loop_cap: 10 }
+   specify: { enabled: true, loop_cap: 10 }
+   spec-challenge: { enabled: true, loop_cap: 10 }
+   author: { enabled: false, loop_cap: 10 }
+   design: { enabled: true, loop_cap: 10 }
+   design-review: { enabled: true, loop_cap: 10 }
+   plan: { enabled: true, loop_cap: 10 }
+   plan-review: { enabled: true, loop_cap: 10 }
+   # Executor phase workflows
+   execute: { enabled: true, loop_cap: 10 }
+   verify: { enabled: true, loop_cap: 5 }
+   review: { enabled: true, loop_cap: 10 }
+   learn: { enabled: true, loop_cap: 5 }
+   prepare_next: { enabled: true, loop_cap: 5 }
+   run_audit: { enabled: false, loop_cap: 10 }
+ ```
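The "only two fields" rule can be checked mechanically. The sketch below validates an already-parsed `workflow_policy` mapping; `validate_workflow_policy` is a hypothetical helper, and treating both fields as required and `loop_cap` as a positive integer are assumptions made for the example rather than documented schema rules.

```python
ALLOWED_FIELDS = {"enabled", "loop_cap"}


def validate_workflow_policy(policy: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, entry in policy.items():
        if not isinstance(entry, dict):
            errors.append(f"{name}: expected a mapping")
            continue
        extra = sorted(set(entry) - ALLOWED_FIELDS)
        if extra:
            # e.g. a stray `passes` field: depth decides pass counts, not policy
            errors.append(f"{name}: unknown field(s) {', '.join(extra)}")
        if not isinstance(entry.get("enabled"), bool):
            errors.append(f"{name}: 'enabled' must be true or false")
        cap = entry.get("loop_cap")
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(cap, bool) or not isinstance(cap, int) or cap < 1:
            errors.append(f"{name}: 'loop_cap' must be a positive integer")
    return errors
```

For example, an entry such as `plan: { enabled: true, loop_cap: 10, passes: 7 }` would be reported with an unknown-field message, while the entries in the YAML above produce no errors.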
+
+ **`loop_cap`** is an absolute safety ceiling that prevents runaway loops regardless of depth. It is checked by `wazir capture loop-check` in pipeline mode. It is NOT the same as pass count (which is determined by depth: 3/5/7). Example: depth=deep gives 7 passes, but if `loop_cap: 5`, the cap guard fires at pass 5 and escalates. This is intentional -- the operator can constrain expensive phases.
+
+ **Adaptive workflows** (`author`, `run_audit`) default to `enabled: false`. They are activated by explicit operator config or intent detection.
+
+ **Post-run workflows** (`learn`, `prepare_next`) default to `enabled: true`. They run as part of the Final Review phase:
+
+ - `learn` extracts durable learnings from review findings -- recurring findings become accepted learnings.
+ - `prepare_next` prepares context and handoff for the next run.
+ - `author` has a human approval gate, not an iterative review loop.
+ - `run_audit` is an on-demand standalone audit, not part of the main pipeline flow.
+
+ ---
+
382
+ ## Reviewer Mode Table
383
+
384
+ The reviewer skill operates in different modes depending on the phase. **Mode is always explicit** -- the caller passes `--mode <mode>`. There is no auto-detection based on artifact availability.
385
+
386
+ | Mode | Invoked during | Prerequisites | Dimensions | Output |
387
+ |------|---------------|---------------|------------|--------|
388
+ | `final` | After execution + verification | Completed task artifacts in `.wazir/runs/latest/artifacts/` | 7 final-review dims, scored 0-70 | Verdict: PASS/NEEDS FIXES/NEEDS REWORK/FAIL |
389
+ | `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Findings with severity, no score |
390
+ | `design-review` | After design approval | Design artifact, approved spec, accessibility guidelines | 5 design-review dims (canonical) | Findings with severity (blocking/advisory) |
391
+ | `plan-review` | After planning | Draft plan, approved spec, design artifact | 7 plan dims | Findings with severity, no score |
392
+ | `task-review` | During execution, per task | Uncommitted changes (or committed with known base SHA) | 5 task-execution dims | Pass/fail per task, no score |
393
+ | `research-review` | During discover | Research artifact | 5 research dims | Findings with severity, no score |
394
+ | `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Findings with severity, no score |
395
+
396
+ If `--mode` is not provided, the reviewer asks the user which review to run. Auto-detection based on artifact availability is NOT used -- it causes ambiguity in resumed/multi-phase runs where stale artifacts from prior phases exist.
397
+
398
+ Each caller is responsible for passing the correct mode:
399
+
400
+ - Clarifier passes `--mode clarification-review` after Phase 1A
401
+ - Discover workflow passes `--mode research-review` after research
402
+ - Specifier flow passes `--mode spec-challenge` after specify
403
+ - Brainstorming passes `--mode design-review` after user approval
404
+ - Writing-plans passes `--mode plan-review` after planning
405
+ - Executor passes `--mode task-review` for each task
406
+ - `/wazir` runner passes `--mode final` for the final review gate
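
The caller-to-mode contract above can be sketched as a pair of small shell helpers. This is only an illustration: the function names `is_valid_review_mode` and `mode_for_caller` and the short caller keys are hypothetical and do not appear in the package; the mode names are taken from the table above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the value is one of the seven reviewer
# modes listed in the Reviewer Mode Table, 1 otherwise.
is_valid_review_mode() {
  case "$1" in
    final|spec-challenge|design-review|plan-review|task-review|research-review|clarification-review)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical lookup: maps a caller key to the mode that caller is expected
# to pass. Unknown callers get a non-zero status and no output, so nothing is
# auto-detected.
mode_for_caller() {
  case "$1" in
    clarifier)      echo "clarification-review" ;;
    discover)       echo "research-review" ;;
    specifier)      echo "spec-challenge" ;;
    brainstorming)  echo "design-review" ;;
    writing-plans)  echo "plan-review" ;;
    executor)       echo "task-review" ;;
    wazir)          echo "final" ;;
    *)              return 1 ;;
  esac
}
```

A caller could run `is_valid_review_mode "$mode"` before invoking the reviewer and ask the user when the check fails, which mirrors the "no auto-detection" rule.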
407
+
408
+ ---
409
+
410
+ ## Codex Prompt Templates
411
+
412
+ All Codex invocations read the model from config with a fallback:
413
+
414
+ ```bash
415
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
416
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
417
+ ```
418
+
419
+ ### Artifact Review (specs, plans, designs via stdin)
420
+
421
+ Use this template with `codex exec` for non-code artifacts piped via stdin:
422
+
423
+ ```bash
424
+ cat <artifact_path> | codex exec -c model="$CODEX_MODEL" \
425
+ "You are reviewing a [ARTIFACT_TYPE] for the Wazir engineering OS.
426
+ Focus on [DIMENSION]: [dimension description].
427
+ Rules: cite specific sections, be actionable, say CLEAN if no issues.
428
+ Do NOT load or invoke any skills. Do NOT read the codebase.
429
+ Review ONLY the content provided via stdin."
430
+ ```
431
+
432
+ Replace `[ARTIFACT_TYPE]` with: `specification`, `implementation plan`, `design document`, `research brief`, or `clarification`.
433
+ Replace `[DIMENSION]` and `[dimension description]` with the current review pass dimension from the relevant dimension set above.
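
Filling the placeholders could look roughly like the sketch below. The helper name `build_artifact_prompt` is hypothetical, and the sample dimension description in the usage comment is illustrative rather than taken from the package; only the prompt wording comes from the template above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fills the artifact-review template for one review pass.
# Arguments: artifact type, dimension name, dimension description.
build_artifact_prompt() {
  local artifact_type="$1" dimension="$2" description="$3"
  printf '%s\n' \
    "You are reviewing a ${artifact_type} for the Wazir engineering OS." \
    "Focus on ${dimension}: ${description}." \
    "Rules: cite specific sections, be actionable, say CLEAN if no issues." \
    "Do NOT load or invoke any skills. Do NOT read the codebase." \
    "Review ONLY the content provided via stdin."
}

# Illustrative usage, following the template above (the description text is
# an assumption):
#   prompt="$(build_artifact_prompt "specification" "Testability" "each requirement can be verified")"
#   codex exec -c model="$CODEX_MODEL" "$prompt" < <artifact_path>
```

Keeping the substitution in one function makes it harder for a pass to drift from the fixed wording of the template.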
434
+
435
+ ### Code Review (diffs via --uncommitted or --base)
436
+
437
+ Use this template with `codex review` for code changes:
438
+
439
+ ```bash
440
+ codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
441
+ "Review the code changes for [DIMENSION]: [dimension description].
442
+ Check against acceptance criteria: [criteria].
443
+ Flag: correctness issues, missing tests, unwired paths, drift from spec.
444
+ Do NOT load or invoke any skills."
445
+ ```
446
+
447
+ For committed changes, replace `--uncommitted` with `--base <sha>`.
448
+ Replace `[DIMENSION]`, `[dimension description]`, and `[criteria]` with the task-specific values from the execution plan and spec.
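
The choice between the two diff selectors can be sketched as follows. The helper name `review_diff_flags` is hypothetical; the `--uncommitted` and `--base <sha>` flags are the ones named in the template above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the diff-selection flags for `codex review`.
# With no argument it selects uncommitted changes; with a SHA it selects
# committed changes relative to that base.
review_diff_flags() {
  local base_sha="${1:-}"
  if [ -n "$base_sha" ]; then
    printf '%s %s\n' "--base" "$base_sha"
  else
    printf '%s\n' "--uncommitted"
  fi
}
```

This matches the `task-review` prerequisite in the mode table: uncommitted changes, or committed changes with a known base SHA.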
449
+
450
+ ---
451
+
452
+ ## Codex Output Context Protection
453
+
454
+ Codex CLI output includes internal traces (file reads, tool calls, reasoning) that are NOT useful for the review — only the final findings matter. To prevent context flooding:
455
+
456
+ ### Tee + Extract Pattern
457
+
458
+ 1. **Always tee** Codex output to a file:
459
+ ```bash
460
+ codex exec ... 2>&1 | tee .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
461
+ ```
462
+
463
+ 2. **Extract findings** after the last `codex` marker using `execute_file`:
464
+ ```bash
465
+ # If context-mode available (has_execute_file: true):
466
+ mcp__plugin_context-mode_context-mode__execute_file(
467
+ path: ".wazir/runs/latest/reviews/<phase>-review-pass-<N>.md",
468
+ language: "shell",
469
+ code: "tac $FILE | sed '/^codex$/q' | tac | tail -n +2"
470
+ )
471
+ ```
472
+
473
+ 3. **Present extracted findings only** — the raw trace stays in the file for debugging but never enters the main context window.
474
+
475
+ ### Fallback (no context-mode)
476
+
477
+ If `context_mode.has_execute_file` is false, extract using shell directly:
478
+
479
+ ```bash
480
+ tac <file> | sed '/^codex$/q' | tac | tail -n +2
481
+ ```
482
+
483
+ This reverses the file, finds the first (= last original) `codex` marker, reverses back, and skips the marker line.
484
+
485
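+ **If no marker found:** fail closed: log "codex marker not found in output, cannot extract findings" and present a warning to the user with 0 findings extracted. The raw file is preserved for manual review. Do NOT fall back to `tail` or any best-effort extraction that could leak traces into context.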
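
A minimal sketch of the fail-closed behaviour, assuming GNU `tac` is available as in the pipeline above. The function name `extract_codex_findings` is hypothetical. It checks for the marker first, because without that check the pipeline would emit the whole file minus its last line.

```bash
#!/usr/bin/env bash
# Sketch: print everything after the last line that is exactly `codex`.
# Fails closed: if the marker is absent, nothing is written to stdout, a
# message goes to stderr, and the function returns non-zero.
extract_codex_findings() {
  local file="$1"
  # -x matches whole lines only, so "codex exec ..." does not count as a marker
  if ! grep -qx 'codex' "$file"; then
    echo "codex marker not found in output, cannot extract findings" >&2
    return 1
  fi
  # Reverse, stop at the first marker (the last one in the original file),
  # reverse back, then drop the marker line itself.
  tac "$file" | sed '/^codex$/q' | tac | tail -n +2
}
```

A caller can treat a non-zero status as "0 findings extracted" and show the warning, leaving the raw file untouched for manual review.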
486
+
487
+ ---
488
+
489
+ ## Phase Scoring: First vs Final Artifact Comparison
490
+
491
+ At the start of each review loop (pass 1), score the artifact on its phase's canonical dimension set (1-10 per dimension). At the end of the loop (final pass), score again using the **same canonical dimensions**. Present the delta in the end-of-phase report.
492
+
493
+ ### Canonical Dimension Sets Per Phase
494
+
495
+ These are the fixed rubrics — no ad-hoc dimension selection:
496
+
497
+ | Phase | Canonical Dimensions |
498
+ |-------|---------------------|
499
+ | research-review | Coverage, Source quality, Relevance, Gaps identified, Actionability |
500
+ | clarification-review / spec-challenge | Completeness, Testability, Ambiguity, Assumptions, Scope creep |
501
+ | design-review | Spec coverage, Design-spec consistency, Accessibility, Visual consistency, Exported-code fidelity |
502
+ | plan-review | Completeness, Testability, Task granularity, Dependency correctness, Phase structure, File coverage, Estimation accuracy |
503
+ | task-review | Correctness, Tests, Wiring, Drift, Quality |
504
+ | final | Correctness, Completeness, Wiring, Verification, Drift, Quality, Documentation |
505
+
506
+ ### Scoring Rules
507
+
508
+ 1. Initial and final scores MUST use the **same dimension set** — the delta is only meaningful on the same rubric.
509
+ 2. The reviewer records which dimension set was used in each pass file.
510
+ 3. Delta format: `Dimension: X/10 → Y/10 (+Z)`.
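
The delta format in rule 3 can be produced with a one-line `printf`. The helper name `format_delta` is hypothetical, and the sketch assumes integer scores on the 1-10 scale described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: formats one delta line from two integer scores.
# Arguments: dimension name, initial score, final score.
format_delta() {
  local dimension="$1" initial="$2" final="$3"
  local delta=$(( final - initial ))
  # %+d always prints an explicit sign, so regressions show up as (-N)
  printf '%s: %d/10 → %d/10 (%+d)\n' "$dimension" "$initial" "$final" "$delta"
}
```

For example, `format_delta Completeness 4 9` prints `Completeness: 4/10 → 9/10 (+5)`.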
511
+
512
+ ### Quality Delta Report Section
513
+
514
+ The end-of-phase report (see "End-of-Phase Report" below) includes a **Quality Delta** section:
515
+
516
+ ```markdown
517
+ ## Quality Delta
518
+
519
+ | Dimension | Initial | Final | Delta |
520
+ |-----------|---------|-------|-------|
521
+ | Completeness | 4/10 | 9/10 | +5 |
522
+ | Testability | 3/10 | 8/10 | +5 |
523
+ | Ambiguity | 5/10 | 9/10 | +4 |
524
+ ```
525
+
526
+ ---
527
+
528
+ ## End-of-Phase Report
529
+
530
+ Every phase exit produces a report saved to `.wazir/runs/latest/reviews/<phase>-report.md` containing:
531
+
532
+ 1. **Summary** — what the phase produced
533
+ 2. **Key Changes** — first-version vs final-version highlights (not full diff — what improved)
534
+ 3. **Quality Delta** — per-dimension before/after scores (see Phase Scoring above)
535
+ 4. **Findings Log** — per-pass finding counts by severity (e.g., "Pass 1: 6 findings (3 blocking, 2 warning, 1 note). Pass 7: 0 findings. All resolved.")
536
+ 5. **Usage** — token usage from `wazir capture usage` (runs before report generation)
537
+ 6. **Context Savings** — context-mode stats if available, omit section if not
538
+ 7. **Time Spent** -- wall-clock elapsed time from phase start to end
@@ -35,6 +35,7 @@ This is the lookup reference for canonical roles, workflows, and their contracts
35
35
  | `review` | `verify` | Adversarial quality review |
36
36
  | `learn` | `review` | Capture scoped learnings |
37
37
  | `prepare-next` | `learn` | Produce clean next-run handoff |
38
+ | `run-audit` | (standalone) | Structured codebase audit with source-backed findings |
38
39
 
39
40
  ## Role routing valid values
40
41