@wazir-dev/cli 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (69) hide show
  1. package/CHANGELOG.md +31 -2
  2. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
  3. package/docs/reference/review-loop-pattern.md +429 -0
  4. package/docs/reference/tooling-cli.md +2 -0
  5. package/docs/truth-claims.yaml +6 -0
  6. package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
  7. package/exports/hosts/claude/.claude/agents/designer.md +3 -0
  8. package/exports/hosts/claude/.claude/agents/executor.md +2 -0
  9. package/exports/hosts/claude/.claude/agents/planner.md +3 -0
  10. package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
  11. package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
  12. package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
  13. package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
  14. package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
  15. package/exports/hosts/claude/.claude/commands/design.md +4 -0
  16. package/exports/hosts/claude/.claude/commands/discover.md +4 -0
  17. package/exports/hosts/claude/.claude/commands/execute.md +4 -0
  18. package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
  19. package/exports/hosts/claude/.claude/commands/plan.md +4 -0
  20. package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
  21. package/exports/hosts/claude/.claude/commands/specify.md +4 -0
  22. package/exports/hosts/claude/.claude/commands/verify.md +4 -0
  23. package/exports/hosts/claude/export.manifest.json +19 -19
  24. package/exports/hosts/codex/export.manifest.json +19 -19
  25. package/exports/hosts/cursor/export.manifest.json +19 -19
  26. package/exports/hosts/gemini/export.manifest.json +19 -19
  27. package/hooks/definitions/loop_cap_guard.yaml +1 -1
  28. package/hooks/hooks.json +18 -0
  29. package/package.json +3 -2
  30. package/roles/clarifier.md +3 -0
  31. package/roles/designer.md +3 -0
  32. package/roles/executor.md +2 -0
  33. package/roles/planner.md +3 -0
  34. package/roles/researcher.md +2 -0
  35. package/roles/reviewer.md +5 -1
  36. package/roles/specifier.md +3 -0
  37. package/skills/brainstorming/SKILL.md +139 -38
  38. package/skills/clarifier/SKILL.md +219 -0
  39. package/skills/debugging/SKILL.md +11 -1
  40. package/skills/executing-plans/SKILL.md +15 -2
  41. package/skills/executor/SKILL.md +76 -0
  42. package/skills/init-pipeline/SKILL.md +106 -17
  43. package/skills/receiving-code-review/SKILL.md +8 -0
  44. package/skills/requesting-code-review/SKILL.md +25 -5
  45. package/skills/reviewer/SKILL.md +151 -0
  46. package/skills/subagent-driven-development/SKILL.md +25 -2
  47. package/skills/tdd/SKILL.md +8 -0
  48. package/skills/wazir/SKILL.md +250 -43
  49. package/skills/writing-plans/SKILL.md +31 -4
  50. package/templates/examples/wazir-manifest.example.yaml +1 -1
  51. package/tooling/src/capture/command.js +87 -1
  52. package/tooling/src/capture/run-config.js +21 -0
  53. package/tooling/src/checks/brand-truth.js +3 -6
  54. package/tooling/src/checks/command-registry.js +1 -0
  55. package/tooling/src/checks/docs-truth.js +1 -1
  56. package/tooling/src/checks/runtime-surface.js +3 -7
  57. package/tooling/src/cli.js +8 -3
  58. package/tooling/src/init/command.js +201 -0
  59. package/wazir.manifest.yaml +0 -3
  60. package/workflows/clarify.md +4 -0
  61. package/workflows/design-review.md +4 -0
  62. package/workflows/design.md +4 -0
  63. package/workflows/discover.md +4 -0
  64. package/workflows/execute.md +4 -0
  65. package/workflows/plan-review.md +4 -0
  66. package/workflows/plan.md +4 -0
  67. package/workflows/spec-challenge.md +4 -0
  68. package/workflows/specify.md +4 -0
  69. package/workflows/verify.md +4 -0
@@ -5,27 +5,35 @@ description: Initialize the Wazir pipeline with interactive setup. Creates proje
5
5
 
6
6
  # Initialize Pipeline
7
7
 
8
- Set up the Wazir pipeline for this project. All questions use numbered interactive options — one question at a time.
8
+ Set up the Wazir pipeline for this project.
9
9
 
10
10
  ## Step 0: Check Wazir CLI
11
11
 
12
12
  Run `which wazir` to check if the CLI is installed.
13
13
 
14
+ **If installed** — run `wazir init` and let it handle the interactive setup (arrow-key selection). If the pipeline was already initialized, use `wazir init --force` to reinitialize. Once it completes, skip to Step 9 (Confirm).
15
+
14
16
  **If not installed**, present:
15
17
 
16
18
  > **The Wazir CLI is not installed. It's required for event capture, validation, and indexing.**
17
19
  >
18
20
  > **How would you like to install it?**
19
21
  >
20
- > 1. **npm** (Recommended) — `npm install -g wazir`
22
+ > 1. **npm** (Recommended) — `npm install -g @wazir-dev/cli`
21
23
  > 2. **Local link** — `npm link` from the Wazir project root
22
24
  > 3. **Skip** — Continue without the CLI (some features will be unavailable)
23
25
 
24
- If the user picks 1, run `npm install -g wazir` and verify with `wazir --version`.
26
+ Wait for the user to answer before continuing.
27
+
28
+ If the user picks 1, run `npm install -g @wazir-dev/cli` and verify with `wazir --version`.
25
29
  If the user picks 2, run `npm link` from the project root and verify.
26
- If the user picks 3, warn that `wazir capture`, `wazir validate`, and `wazir index` commands will not work, then continue.
30
+ If the user picks 3, warn that `wazir capture`, `wazir validate`, and `wazir index` commands will not work, then continue to the manual steps below.
31
+
32
+ After installing, run `wazir init` and let it handle the rest. Skip to Step 9.
33
+
34
+ ---
27
35
 
28
- **If installed**, run `wazir doctor --json` to verify repo health and continue.
36
+ **The steps below are the manual fallback only used when the CLI is not installed and the user chose to skip installation.**
29
37
 
30
38
  ## Step 1: Create Project Directories
31
39
 
@@ -39,9 +47,9 @@ Present this question:
39
47
 
40
48
  > **How should Wazir run in this project?**
41
49
  >
42
- > 1. **Claude only** (Recommended) — Everything runs in Claude Code. Single model, slash commands only.
43
- > 2. **Multi-model** — Still Claude Code, but routes tasks by complexity (Haiku for micro, Sonnet for standard, Opus for complex).
44
- > 3. **Multi-tool** — Claude Code + external tools for reviews.
50
+ > 1. **Single model** (Recommended) — Everything runs in the current model. Single model, slash commands only.
51
+ > 2. **Multi-model** — Still the current model, but routes tasks by complexity (Haiku for micro, Sonnet for standard, Opus for complex).
52
+ > 3. **Multi-tool** — Current model + external tools for reviews.
45
53
 
46
54
  Wait for the user to answer before continuing.
47
55
 
@@ -51,37 +59,118 @@ Only ask this if the user selected option 3:
51
59
 
52
60
  > **Which external tools should Wazir use for reviews?**
53
61
  >
54
- > 1. **Codex** — Send reviews to OpenAI Codex
62
+ > 1. **Codex** (Recommended) — Send reviews to OpenAI Codex
55
63
  > 2. **Gemini** — Send reviews to Google Gemini
56
64
  > 3. **Both** — Use Codex and Gemini as secondary reviewers
57
65
 
58
66
  Wait for the user to answer before continuing.
59
67
 
60
- ## Step 4: Write Config
68
+ ## Step 3.5: Codex Model (conditional)
69
+
70
+ Only ask this if Codex was selected in Step 3:
71
+
72
+ > **Which Codex model should Wazir use?**
73
+ >
74
+ > 1. **gpt-5.3-codex-spark** (Recommended) — Fast, good for review loops and grounder work
75
+ > 2. **gpt-5.4** — Slower, deeper analysis for complex reviews
76
+ >
77
+ > *You can change this later in `.wazir/state/config.json` under `multi_tool.codex.model`.*
78
+
79
+ Wait for the user to answer before continuing.
80
+
81
+ ## Step 4: Default Depth
82
+
83
+ > **What default depth should runs use?**
84
+ >
85
+ > 1. **Quick** — Minimal research, single-pass review, fast execution. Good for small fixes and config changes.
86
+ > 2. **Standard** (Recommended) — Balanced research, multi-pass hardening, full review. Good for most features.
87
+ > 3. **Deep** — Extended research, thorough hardening, strict review thresholds. Good for complex or security-critical work.
88
+ >
89
+ > *This sets the project default. Individual runs can override via inline modifiers (e.g. `/wazir quick ...`).*
90
+
91
+ Wait for the user to answer before continuing.
92
+
93
+ ## Step 5: Default Intent
94
+
95
+ > **What kind of work does this project mostly involve?**
96
+ >
97
+ > 1. **Feature** (Recommended) — New functionality or enhancement
98
+ > 2. **Bugfix** — Fix broken behavior
99
+ > 3. **Refactor** — Restructure without changing behavior
100
+ > 4. **Docs** — Documentation only
101
+ > 5. **Spike** — Research and exploration, no production code
102
+ >
103
+ > *This sets the project default. Individual runs can override via inline modifiers or when intent is obvious from the request.*
104
+
105
+ Wait for the user to answer before continuing.
106
+
107
+ ## Step 6: Agent Teams (conditional)
108
+
109
+ Only ask this if ALL of these are true:
110
+ - The host is Claude Code (not Codex/Gemini/Cursor)
111
+ - Default depth is `standard` or `deep`
112
+ - Default intent is `feature` or `refactor` (not bugfix/docs/spike)
113
+
114
+ > **Would you like to use Agent Teams for parallel execution?**
115
+ >
116
+ > 1. **No** (Recommended) — Tasks run sequentially. Predictable, lower cost.
117
+ > 2. **Yes** — Spawns parallel teammates for independent tasks. Potentially faster and richer output.
118
+ >
119
+ > *Agent Teams is experimental from Claude's side. Requires Opus model. Higher token consumption.*
120
+
121
+ If the conditions above are NOT met, silently default to `team_mode: sequential`.
122
+
123
+ If the user selects **Yes**, enable the Agent Teams experimental feature:
124
+
125
+ ```bash
126
+ claude config set env.CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS 1
127
+ ```
128
+
129
+ Then inform the user they need to restart their Claude Code session for it to take effect.
130
+
131
+ Wait for the user to answer before continuing.
132
+
133
+ ## Step 7: Write Config
61
134
 
62
135
  Create/update `.wazir/state/config.json`:
63
136
 
64
137
  - Set `model_mode` to the selected mode (`claude-only`, `multi-model`, or `multi-tool`)
65
138
  - If `multi-tool`, set `multi_tool.tools` to the selected tools (e.g. `["codex"]`, `["gemini"]`, or `["codex", "gemini"]`)
139
+ - Set `default_depth` to the selected depth (`quick`, `standard`, or `deep`)
140
+ - Set `default_intent` to the selected intent (`feature`, `bugfix`, `refactor`, `docs`, or `spike`)
141
+ - Set `team_mode` to the selected mode (`sequential` or `parallel`)
142
+ - If `team_mode` is `parallel`, set `parallel_backend` to `claude_teams`
143
+ - If Codex selected, set `multi_tool.codex.model` to the chosen model
66
144
 
67
- Example for claude-only:
145
+ Example for claude-only with defaults:
68
146
  ```json
69
147
  {
70
- "model_mode": "claude-only"
148
+ "model_mode": "claude-only",
149
+ "default_depth": "standard",
150
+ "default_intent": "feature",
151
+ "team_mode": "sequential",
152
+ "parallel_backend": "none"
71
153
  }
72
154
  ```
73
155
 
74
- Example for multi-tool with codex:
156
+ Example for multi-tool with teams enabled:
75
157
  ```json
76
158
  {
77
159
  "model_mode": "multi-tool",
78
160
  "multi_tool": {
79
- "tools": ["codex"]
80
- }
161
+ "tools": ["codex"],
162
+ "codex": {
163
+ "model": "gpt-5.3-codex-spark"
164
+ }
165
+ },
166
+ "default_depth": "deep",
167
+ "default_intent": "feature",
168
+ "team_mode": "parallel",
169
+ "parallel_backend": "claude_teams"
81
170
  }
82
171
  ```
83
172
 
84
- ## Step 5: Runtime-Specific Setup
173
+ ## Step 8: Runtime-Specific Setup
85
174
 
86
175
  Based on `multi_tool.tools`:
87
176
 
@@ -104,7 +193,7 @@ Based on `multi_tool.tools`:
104
193
 
105
194
  - If **both**: Create both files.
106
195
 
107
- ## Step 6: Confirm
196
+ ## Step 9: Confirm
108
197
 
109
198
  List all files created and show the selected mode. Then present:
110
199
 
@@ -11,6 +11,14 @@ Code review requires technical evaluation, not emotional performance.
11
11
 
12
12
  **Core principle:** Verify before implementing. Ask before assuming. Technical correctness over social comfort.
13
13
 
14
+ ## Loop Tracking
15
+
16
+ When receiving review findings, the fix-and-re-review cycle follows the review loop pattern:
17
+ - **Pipeline mode** (`.wazir/runs/latest/` exists): track iterations via `wazir capture loop-check`. If the cap is reached (exit 43), escalate to the user with current state and evidence.
18
+ - **Standalone mode** (no `.wazir/runs/latest/`): the loop runs for `pass_counts[depth]` passes (quick=3, standard=5, deep=7) with no cap guard. Track iteration count manually.
19
+
20
+ Reference `docs/reference/review-loop-pattern.md` for the full loop contract.
21
+
14
22
  ## The Response Pattern
15
23
 
16
24
  ```
@@ -7,7 +7,7 @@ description: Use when completing tasks, implementing major features, or before m
7
7
 
8
8
  Dispatch wz:code-reviewer subagent to catch issues before they cascade. The reviewer gets precisely crafted context for evaluation — never your session's history. This keeps the reviewer focused on the work product, not your thought process, and preserves your own context for continued work.
9
9
 
10
- **Core principle:** Review early, review often.
10
+ **Core principle:** Review early, review often. Review follows the loop pattern in `docs/reference/review-loop-pattern.md`. Dispatch the reviewer with explicit `--mode` and depth-aware loop parameters.
11
11
 
12
12
  ## When to Request Review
13
13
 
@@ -23,22 +23,36 @@ Dispatch wz:code-reviewer subagent to catch issues before they cascade. The revi
23
23
 
24
24
  ## How to Request
25
25
 
26
- **1. Get git SHAs:**
26
+ **1. Get git SHAs and scope the review:**
27
+
28
+ Use `--uncommitted` for uncommitted changes, `--base <sha>` for committed changes.
29
+
27
30
  ```bash
31
+ # For committed changes:
28
32
  BASE_SHA=$(git rev-parse HEAD~1) # or origin/main
29
33
  HEAD_SHA=$(git rev-parse HEAD)
34
+ codex review --base $BASE_SHA
35
+
36
+ # For uncommitted changes:
37
+ codex review --uncommitted
30
38
  ```
31
39
 
32
- **2. Dispatch code-reviewer subagent:**
40
+ **2. Dispatch code-reviewer subagent with loop config:**
33
41
 
34
42
  Use Task tool with wz:code-reviewer type, fill template at `./code-reviewer.md`
35
43
 
44
+ Include explicit loop parameters:
45
+ - `--mode` (e.g., `task-review`, `final`)
46
+ - Depth-aware dimensions and cap from `phase_policy`
47
+ - Review pass number (for log filenames)
48
+
36
49
  **Placeholders:**
37
50
  - `{WHAT_WAS_IMPLEMENTED}` - What you just built
38
51
  - `{PLAN_OR_REQUIREMENTS}` - What it should do
39
52
  - `{BASE_SHA}` - Starting commit
40
53
  - `{HEAD_SHA}` - Ending commit
41
54
  - `{DESCRIPTION}` - Brief summary
55
+ - `{REVIEW_MODE}` - Explicit review mode (e.g., task-review)
42
56
 
43
57
  **3. Act on feedback:**
44
58
  - Fix Critical issues immediately
@@ -46,6 +60,10 @@ Use Task tool with wz:code-reviewer type, fill template at `./code-reviewer.md`
46
60
  - Note Minor issues for later
47
61
  - Push back if reviewer is wrong (with reasoning)
48
62
 
63
+ ### Codex Error Handling
64
+
65
+ If codex exits non-zero during review, log the error, mark the pass as codex-unavailable, and use self-review findings only. Do not treat a Codex failure as a clean pass.
66
+
49
67
  ## Example
50
68
 
51
69
  ```
@@ -56,12 +74,13 @@ You: Let me request code review before proceeding.
56
74
  BASE_SHA=$(git log --oneline | grep "Task 1" | head -1 | awk '{print $1}')
57
75
  HEAD_SHA=$(git rev-parse HEAD)
58
76
 
59
- [Dispatch wz:code-reviewer subagent]
77
+ [Dispatch wz:code-reviewer subagent with --mode task-review]
60
78
  WHAT_WAS_IMPLEMENTED: Verification and repair functions for conversation index
61
79
  PLAN_OR_REQUIREMENTS: Task 2 from docs/plans/deployment-plan.md
62
80
  BASE_SHA: a7981ec
63
81
  HEAD_SHA: 3df7661
64
82
  DESCRIPTION: Added verifyIndex() and repairIndex() with 4 issue types
83
+ REVIEW_MODE: task-review
65
84
 
66
85
  [Subagent returns]:
67
86
  Strengths: Clean architecture, real tests
@@ -82,7 +101,7 @@ You: [Fix progress indicators]
82
101
  - Fix before moving to next task
83
102
 
84
103
  **Executing Plans:**
85
- - Review after each batch (3 tasks)
104
+ - Review after each task (per-task review checkpoint)
86
105
  - Get feedback, apply, continue
87
106
 
88
107
  **Ad-Hoc Development:**
@@ -96,6 +115,7 @@ You: [Fix progress indicators]
96
115
  - Ignore Critical issues
97
116
  - Proceed with unfixed Important issues
98
117
  - Argue with valid technical feedback
118
+ - Dispatch review without explicit `--mode`
99
119
 
100
120
  **If reviewer wrong:**
101
121
  - Push back with technical reasoning
@@ -0,0 +1,151 @@
1
+ ---
2
+ name: wz:reviewer
3
+ description: Run the review phase — adversarial review of implementation against the approved spec, plan, and verification evidence.
4
+ ---
5
+
6
+ # Reviewer
7
+
8
+ Run Phase 3 (Review) for the current project.
9
+
10
+ The reviewer role owns all review loops across the pipeline: research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions from `docs/reference/review-loop-pattern.md`.
11
+
12
+ ## Review Modes
13
+
14
+ The reviewer operates in different modes depending on the phase. Mode MUST be passed explicitly by the caller (`--mode <mode>`). The reviewer does NOT auto-detect mode from artifact availability. If `--mode` is not provided, ask the user which review to run.
15
+
16
+ | Mode | Invoked during | Prerequisites | Dimensions | Output |
17
+ |------|---------------|---------------|------------|--------|
18
+ | `final` | After execution + verification | Completed task artifacts, approved spec/plan/design | 7 final-review dims, scored 0-70 | Scored verdict (PASS/FAIL) |
19
+ | `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Pass/fix loop, no score |
20
+ | `design-review` | After design approval | Design artifact, approved spec | 5 design-review dims (canonical) | Pass/fix loop, no score |
21
+ | `plan-review` | After planning | Draft plan artifact | 7 plan dims | Pass/fix loop, no score |
22
+ | `task-review` | During execution, per task | Uncommitted changes or `--base` SHA | 5 task-execution dims (correctness, tests, wiring, drift, quality) | Pass/fix loop, no score |
23
+ | `research-review` | During discover | Research artifact | 5 research dims | Pass/fix loop, no score |
24
+ | `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Pass/fix loop, no score |
25
+
26
+ Each mode follows the review loop pattern in `docs/reference/review-loop-pattern.md`. Pass counts are fixed by depth (quick=3, standard=5, deep=7). No extension.
27
+
28
+ ### CHANGELOG Enforcement
29
+
30
+ In `task-review` and `final` modes, flag missing CHANGELOG entries for user-facing changes as **[warning]** severity. User-facing changes include new features, behavior changes, and bug fixes visible to users. Internal changes (refactors, tooling, tests) do not require CHANGELOG entries.
31
+
32
+ ## Prerequisites
33
+
34
+ Prerequisites depend on the review mode:
35
+
36
+ ### `final` mode
37
+ 1. Check `.wazir/runs/latest/artifacts/` has completed task artifacts. If not, tell the user to run `/wazir:executor` first.
38
+ 2. Read the approved spec, plan, and design from `.wazir/runs/latest/clarified/`.
39
+ 3. Read `.wazir/state/config.json` for depth and multi_tool settings.
40
+
41
+ ### `task-review` mode
42
+ 1. Uncommitted changes exist for the current task, or a `--base` SHA is provided for committed changes.
43
+ 2. Read `.wazir/state/config.json` for depth and multi_tool settings.
44
+
45
+ ### `spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes
46
+ 1. The appropriate input artifact for the mode exists.
47
+ 2. Read `.wazir/state/config.json` for depth and multi_tool settings.
48
+
49
+ ## Review Process (`final` mode)
50
+
51
+ Perform adversarial review across 7 dimensions:
52
+
53
+ 1. **Correctness** — Does the code do what the spec says?
54
+ 2. **Completeness** — Are all acceptance criteria met?
55
+ 3. **Wiring** — Are all paths connected end-to-end?
56
+ 4. **Verification** — Is there evidence (tests, type checks) for each claim?
57
+ 5. **Drift** — Does the implementation match the approved plan?
58
+ 6. **Quality** — Code style, naming, error handling, security
59
+ 7. **Documentation** — Changelog entries, commit messages, comments
60
+
61
+ ## Context Retrieval
62
+
63
+ - Read the diff first (primary input)
64
+ - Use `wazir index search-symbols <name>` to locate related code
65
+ - Use `wazir recall symbol <name-or-id> --tier L1` to check structural alignment
66
+ - Read files directly for: logic errors, missing edge cases, integration concerns
67
+
68
+ ## Scoring (`final` mode)
69
+
70
+ Score each dimension 0-10. Total out of 70.
71
+
72
+ | Verdict | Score | Action |
73
+ |---------|-------|--------|
74
+ | **PASS** | 56+ | Ready for PR or merge |
75
+ | **NEEDS MINOR FIXES** | 42-55 | Auto-fix and re-review |
76
+ | **NEEDS REWORK** | 28-41 | Re-run affected tasks |
77
+ | **FAIL** | 0-27 | Fundamental issues |
78
+
79
+ ## Secondary Review
80
+
81
+ Read `.wazir/state/config.json`. If `multi_tool.tools` includes external reviewers, run them **after** your own review and **before** producing the final verdict.
82
+
83
+ ### Codex Review
84
+
85
+ If `codex` is in `multi_tool.tools`:
86
+
87
+ 1. Run Codex review against the current changes:
88
+ ```bash
89
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
90
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
91
+ codex review -c model="$CODEX_MODEL" --uncommitted --title "Wazir review: <brief summary>" \
92
+ "Review against these acceptance criteria: <paste criteria from spec>" \
93
+ 2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
94
+ ```
95
+ Or if changes are committed:
96
+ ```bash
97
+ CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
98
+ CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
99
+ codex review -c model="$CODEX_MODEL" --base <base-branch> --title "Wazir review: <brief summary>" \
100
+ "Review against these acceptance criteria: <paste criteria from spec>" \
101
+ 2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
102
+ ```
103
+
104
+ 2. Read the Codex findings from `.wazir/runs/latest/reviews/codex-review.md`
105
+ 3. Incorporate Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
106
+
107
+ **Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use self-review findings only for that pass. Do NOT treat a Codex failure as a clean review. Do NOT skip the pass. The next pass still attempts Codex (transient failures may recover).
108
+
109
+ **Code review scoping by mode:**
110
+ - Use `--uncommitted` when reviewing uncommitted changes (`task-review` mode).
111
+ - Use `--base <sha>` when reviewing committed changes.
112
+ - Use `codex exec -c model="$CODEX_MODEL"` with stdin pipe for non-code artifacts (`spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes).
113
+ - See `docs/reference/review-loop-pattern.md` for code review scoping rules.
114
+
115
+ ### Gemini Review
116
+
117
+ If `gemini` is in `multi_tool.tools`, follow the same pattern using the Gemini CLI when available.
118
+
119
+ ### Merging Findings
120
+
121
+ The final review report must clearly attribute each finding:
122
+ - `[Wazir]` — found by primary review
123
+ - `[Codex]` — found by Codex secondary review
124
+ - `[Both]` — found independently by both
125
+
126
+ ## Task-Review Log Filenames
127
+
128
+ In `task-review` mode, use task-scoped log filenames and cap tracking:
129
+ - Log filenames: `.wazir/runs/latest/reviews/execute-task-<NNN>-review-pass-<N>.md`
130
+ - Cap tracking: `wazir capture loop-check --task-id <NNN>` (each task has its own independent cap counter)
131
+
132
+ ## Output
133
+
134
+ Save review results to `.wazir/runs/latest/reviews/review.md` with:
135
+ - Findings with severity (blocking, warning, note)
136
+ - Rationale tied to evidence
137
+ - Score breakdown
138
+ - Verdict
139
+
140
+ ## Done
141
+
142
+ Present the verdict and offer next steps:
143
+
144
+ > **Review complete: [VERDICT] ([score]/70)**
145
+ >
146
+ > [Score breakdown and findings summary]
147
+ >
148
+ > **What would you like to do?**
149
+ > 1. **Create a PR** (if PASS)
150
+ > 2. **Auto-fix and re-review** (if MINOR FIXES)
151
+ > 3. **Review findings in detail**
@@ -45,6 +45,7 @@ digraph process {
45
45
 
46
46
  subgraph cluster_per_task {
47
47
  label="Per Task";
48
+ "Capture PRE_TASK_SHA" [shape=box];
48
49
  "Dispatch implementer subagent (./implementer-prompt.md)" [shape=box];
49
50
  "Implementer subagent asks questions?" [shape=diamond];
50
51
  "Answer questions, provide context" [shape=box];
@@ -63,7 +64,8 @@ digraph process {
63
64
  "Dispatch final code reviewer subagent for entire implementation" [shape=box];
64
65
  "Use wz:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
65
66
 
66
- "Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Dispatch implementer subagent (./implementer-prompt.md)";
67
+ "Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Capture PRE_TASK_SHA";
68
+ "Capture PRE_TASK_SHA" -> "Dispatch implementer subagent (./implementer-prompt.md)";
67
69
  "Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
68
70
  "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
69
71
  "Answer questions, provide context" -> "Implementer subagent implements, tests, commits, self-reviews";
@@ -78,12 +80,32 @@ digraph process {
78
80
  "Code quality reviewer subagent approves?" -> "Implementer subagent fixes quality issues" [label="no"];
79
81
  "Implementer subagent fixes quality issues" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)";
80
82
  "Mark task complete in TodoWrite" -> "More tasks remain?";
81
- "More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
83
+ "More tasks remain?" -> "Capture PRE_TASK_SHA" [label="yes"];
82
84
  "More tasks remain?" -> "Dispatch final code reviewer subagent for entire implementation" [label="no"];
83
85
  "Dispatch final code reviewer subagent for entire implementation" -> "Use wz:finishing-a-development-branch";
84
86
  }
85
87
  ```
86
88
 
89
+ ### Code Review Scoping
90
+
91
+ The implementer subagent commits before review. The spec reviewer and code quality reviewer must use `codex review --base <pre-task-sha>` to scope their review to the task's changes. Capture `PRE_TASK_SHA=$(git rev-parse HEAD)` before dispatching the implementer.
92
+
93
+ ### Review Loop Alignment
94
+
95
+ Both review stages follow the review loop pattern in `docs/reference/review-loop-pattern.md` with explicit `--mode task-review`:
96
+ - **Spec compliance review:** Uses spec dimensions with `--mode task-review`
97
+ - **Code quality review:** Uses 5 task-execution dimensions with `--mode task-review`
98
+
99
+ Review logs use task-scoped filenames: `execute-task-<NNN>-review-pass-<N>.md`
100
+
101
+ Each review stage respects the loop cap via `wazir capture loop-check --task-id <NNN>`. If the cap is reached (exit 43), escalate to the controller (you) for a decision.
102
+
103
+ ### Codex Error Handling
104
+
105
+ If codex exits non-zero during review, log the error, mark the pass as codex-unavailable, and use self-review findings only. Do not treat a Codex failure as a clean pass.
106
+
107
+ **Standalone mode:** When no `.wazir/runs/latest/` exists, review logs go to `docs/plans/`.
108
+
87
109
  ## Prompt Templates
88
110
 
89
111
  - `./implementer-prompt.md` - Dispatch implementer subagent
@@ -137,6 +159,7 @@ digraph process {
137
159
  - Let implementer self-review replace actual review (both are needed)
138
160
  - **Start code quality review before spec compliance is PASS** (wrong order)
139
161
  - Move to next task while either review has open issues
162
+ - **Review the wrong diff -- always scope to the current task's changes using --base**
140
163
 
141
164
  **If subagent asks questions:**
142
165
  - Answer clearly and completely
@@ -10,6 +10,12 @@ Sequence:
10
10
  1. RED
11
11
  Write or update a test that expresses the new behavior or the bug being fixed, then run it and confirm failure.
12
12
 
13
+ **Test quality check (single-pass):** Before proceeding to GREEN, verify:
14
+ - Are these tests testing the right behavior?
15
+ - Are they real assertions, not tautologies?
16
+ - Do they fail for the right reason (not a syntax error or import failure)?
17
+ If any check fails, fix the test before moving on. This is a single-pass quality check, not a full review loop.
18
+
13
19
  2. GREEN
14
20
  Write the smallest implementation change that makes the failing test pass.
15
21
 
@@ -21,3 +27,5 @@ Rules:
21
27
  - do not skip the failing-test step when automated verification is feasible
22
28
  - do not rewrite tests to fit broken behavior
23
29
  - rerun verification after each meaningful refactor
30
+
31
+ For the full review loop pattern, see `docs/reference/review-loop-pattern.md`. TDD uses a single-pass quality check, not the full loop.