@wazir-dev/cli 1.0.0 → 1.1.0
- package/CHANGELOG.md +31 -2
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
- package/docs/reference/review-loop-pattern.md +429 -0
- package/docs/reference/tooling-cli.md +2 -0
- package/docs/truth-claims.yaml +6 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
- package/exports/hosts/claude/.claude/agents/designer.md +3 -0
- package/exports/hosts/claude/.claude/agents/executor.md +2 -0
- package/exports/hosts/claude/.claude/agents/planner.md +3 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
- package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
- package/exports/hosts/claude/.claude/commands/design.md +4 -0
- package/exports/hosts/claude/.claude/commands/discover.md +4 -0
- package/exports/hosts/claude/.claude/commands/execute.md +4 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
- package/exports/hosts/claude/.claude/commands/plan.md +4 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
- package/exports/hosts/claude/.claude/commands/specify.md +4 -0
- package/exports/hosts/claude/.claude/commands/verify.md +4 -0
- package/exports/hosts/claude/export.manifest.json +19 -19
- package/exports/hosts/codex/export.manifest.json +19 -19
- package/exports/hosts/cursor/export.manifest.json +19 -19
- package/exports/hosts/gemini/export.manifest.json +19 -19
- package/hooks/definitions/loop_cap_guard.yaml +1 -1
- package/hooks/hooks.json +18 -0
- package/package.json +3 -2
- package/roles/clarifier.md +3 -0
- package/roles/designer.md +3 -0
- package/roles/executor.md +2 -0
- package/roles/planner.md +3 -0
- package/roles/researcher.md +2 -0
- package/roles/reviewer.md +5 -1
- package/roles/specifier.md +3 -0
- package/skills/brainstorming/SKILL.md +139 -38
- package/skills/clarifier/SKILL.md +219 -0
- package/skills/debugging/SKILL.md +11 -1
- package/skills/executing-plans/SKILL.md +15 -2
- package/skills/executor/SKILL.md +76 -0
- package/skills/init-pipeline/SKILL.md +106 -17
- package/skills/receiving-code-review/SKILL.md +8 -0
- package/skills/requesting-code-review/SKILL.md +25 -5
- package/skills/reviewer/SKILL.md +151 -0
- package/skills/subagent-driven-development/SKILL.md +25 -2
- package/skills/tdd/SKILL.md +8 -0
- package/skills/wazir/SKILL.md +250 -43
- package/skills/writing-plans/SKILL.md +31 -4
- package/templates/examples/wazir-manifest.example.yaml +1 -1
- package/tooling/src/capture/command.js +87 -1
- package/tooling/src/capture/run-config.js +21 -0
- package/tooling/src/checks/brand-truth.js +3 -6
- package/tooling/src/checks/command-registry.js +1 -0
- package/tooling/src/checks/docs-truth.js +1 -1
- package/tooling/src/checks/runtime-surface.js +3 -7
- package/tooling/src/cli.js +8 -3
- package/tooling/src/init/command.js +201 -0
- package/wazir.manifest.yaml +0 -3
- package/workflows/clarify.md +4 -0
- package/workflows/design-review.md +4 -0
- package/workflows/design.md +4 -0
- package/workflows/discover.md +4 -0
- package/workflows/execute.md +4 -0
- package/workflows/plan-review.md +4 -0
- package/workflows/plan.md +4 -0
- package/workflows/spec-challenge.md +4 -0
- package/workflows/specify.md +4 -0
- package/workflows/verify.md +4 -0
package/CHANGELOG.md
CHANGED
@@ -1,9 +1,15 @@
-# 1.0.0 (2026-03-
+# [1.1.0](https://github.com/MohamedAbdallah-14/Wazir/compare/v1.0.0...v1.1.0) (2026-03-18)
+
+
+### Bug Fixes
+
+* address review findings — tests, Codex wiring, Teams, pipeline CLI integration ([0b03215](https://github.com/MohamedAbdallah-14/Wazir/commit/0b032150c4a7967ba070eccdced513f55343fc65))
+* CI changelog gate + CodeRabbit review findings ([0247941](https://github.com/MohamedAbdallah-14/Wazir/commit/024794136b7a44116ef2c4f5fcc23823bc72e7fc))
 
 
 ### Features
 
-*
+* add core review loop pattern across all pipeline phases ([aa4c1d8](https://github.com/MohamedAbdallah-14/Wazir/commit/aa4c1d8400e69ab4fe943043705a862f9e5861f3))
 
 # Changelog
 
@@ -12,3 +18,26 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/), and this project adheres to [Semantic Versioning](https://semver.org/).
 
 ## [Unreleased]
+
+### Added
+- Core review loop pattern across all pipeline phases with Codex CLI integration
+- `wazir capture loop-check` CLI subcommand with task-scoped cap tracking and run-config loader
+- `wazir init` interactive CLI command with arrow-key selection (depth, intent, teams, codex model)
+- `docs/reference/review-loop-pattern.md` canonical reference for the review loop pattern
+- Standalone skills: `/wazir:clarifier`, `/wazir:executor`, `/wazir:reviewer`
+- Agent Teams real implementation in brainstorming (TeamCreate, SendMessage, TeamDelete)
+- Codex prompt templates (artifact + code) with "Do NOT load skills" instruction
+- Git branch enforcement in `/wazir` runner (validates branch, offers to create feature branch)
+- CLI wiring across pipeline phases (doctor gate, index build/refresh, capture events, validate gates)
+- CHANGELOG enforcement in executor and reviewer skills
+- 10 new tests: 7 for handleLoopCheck, 4 for init command (406 total)
+
+### Changed
+- All Codex CLI calls now read model from `config.multi_tool.codex.model` with fallback to `gpt-5.4`
+- Producer-reviewer separation enforced: no role reviews its own output
+- Reviewer skill is phase-aware with 7 explicit modes (final, spec-challenge, design-review, plan-review, task-review, research-review, clarification-review)
+- Brainstorming design-review gate replaces direct handoff to writing-plans
+- Clarifier delegates research to discover workflow, spec to specify workflow, planning to writing-plans
+- `/wazir` runner pipeline rewritten with all manifest phases and review loops
+- Wazir CLI is now required (removed "Skip" option)
+- Fixed pass counts: quick=3, standard=5, deep=7 (no extension)
package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md
CHANGED
@@ -29,7 +29,7 @@ Before starting any implementation, verify all of the following:
 - [ ] **Node.js >= 20.0.0** installed
 - [ ] **`npm test` passes on the clean branch** with zero failures
 - [ ] **`wazir export --check` passes** on the clean branch (no pre-existing drift)
-- [ ] **All 13 task spec files reviewed** in `.
+- [ ] **All 13 task spec files reviewed** in `.wazir/tasks/clarified/` (004-016)
 - [ ] **`tooling/src/capture/command.js` imports confirmed:** `fs` (line 1) and `path` (line 2) are already imported -- no additional module imports needed for task 006
 - [ ] **`tooling/test/capture.test.js` fixture pattern confirmed:** `createCaptureFixture()` provides `fixtureRoot`, `stateRoot`, and `cleanup()` -- new tests must use unique run IDs
 - [ ] **`tooling/test/role-contracts.test.js` is in `test:active`** -- confirmed, so workflow and role structural tests can be added there without new test file registration
package/docs/reference/review-loop-pattern.md
ADDED
@@ -0,0 +1,429 @@
+# Review Loop Pattern Reference
+
+Canonical reference for the review loop pattern used across all Wazir pipeline phases. Skills and workflows link to this document rather than embedding loop logic inline.
+
+---
+
+## Core Principle: Producer-Reviewer Separation
+
+The producer skill (clarifier, planner, designer, etc.) **emits** an artifact and calls for review. The **reviewer role** owns the review loop. The producer receives findings and resolves them. No role reviews its own output.
+
+```
+Producer emits artifact
+  -> Reviewer runs review loop (N passes, Codex if available)
+  -> Findings returned to producer
+  -> Producer fixes and resubmits
+  -> Loop until all passes exhausted or cap reached
+  -> Escalate to user if cap exceeded
+```
+
+When Codex is available, the reviewer role delegates to `codex review` as a secondary input while maintaining its own independent primary verdict.
+
+---
+
+## Per-Task Review vs Final Review
+
+These are two structurally different constructs:
+
+| | Per-Task Review | Final Review |
+|---|---|---|
+| **When** | During execution, after each task | After all execution + verification complete |
+| **Dimensions** | 5 task-execution dims (correctness, tests, wiring, drift, quality) | 7 scored dims (correctness, completeness, wiring, verification, drift, quality, documentation) |
+| **Scope** | Single task's uncommitted changes | Entire implementation vs spec/plan |
+| **Output** | Pass/fix loop, no score | Scored verdict (0-70), PASS/FAIL |
+| **Workflow** | Inline in execution flow | `workflows/review.md` |
+| **Skill** | `wz:reviewer` in `task-review` mode | `wz:reviewer` in `final` mode |
+| **Log filename** | `<phase>-task-<NNN>-review-pass-<N>.md` | `final-review.md` |
+
+---
+
+## Standalone Mode
+
+When no `.wazir/runs/latest/` directory exists (standalone skill invocation outside a pipeline run):
+
+1. **Review loops still run** -- the review logic is embedded in the skill, not dependent on run state.
+2. **Artifact location** -- artifacts live in `docs/plans/`. This is the canonical standalone artifact path.
+3. **Review log location** -- review logs go alongside the artifact: `docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md`. No temp dir.
+4. **Loop cap is SKIPPED entirely** -- no `wazir capture loop-check` call. The loop runs for exactly `pass_counts[depth]` passes (3/5/7) and stops. No cap guard, no fallback constant.
+5. **`wazir capture loop-check`** -- not invoked in standalone mode. The standalone detection happens before the cap guard call.
+
+Detection logic:
+
+```
+if .wazir/runs/latest/ exists:
+  run_mode = "pipeline"
+  log_dir = .wazir/runs/latest/reviews/
+  cap_guard = wazir capture loop-check (full guard)
+else:
+  run_mode = "standalone"
+  artifact_dir = docs/plans/
+  log_dir = docs/plans/ (alongside artifact)
+  cap_guard = none (depth pass count is the only limit)
+```
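The detection logic above could be sketched in Python roughly as follows. This is a minimal illustration, not the package's implementation: the function name `detect_run_mode` is borrowed from the pseudocode in this reference, while the `project_root` parameter and the shape of the returned dictionary are assumptions made for the example.

```python
from pathlib import Path


def detect_run_mode(project_root: Path) -> dict:
    """Pick pipeline or standalone settings based on .wazir/runs/latest/."""
    latest = project_root / ".wazir" / "runs" / "latest"
    if latest.exists():
        # Pipeline run: logs go under the run directory and the cap guard applies.
        return {
            "run_mode": "pipeline",
            "log_dir": latest / "reviews",
            "cap_guard": True,
        }
    # Standalone: artifacts and logs share docs/plans/, with no cap guard.
    plans = project_root / "docs" / "plans"
    return {
        "run_mode": "standalone",
        "artifact_dir": plans,
        "log_dir": plans,
        "cap_guard": False,
    }
```

Returning a plain dictionary keeps the sketch close to the key/value layout of the pseudocode; a real caller might prefer a typed structure.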
+
+---
+
+## Review Loop Pseudocode
+
+```
+review_loop(artifact_path, phase, dimensions[], depth, config, options={}):
+
+  # options.mode -- explicit review mode (required)
+  # options.task_id -- task identifier for task-scoped reviews (optional)
+
+  # Standalone detection
+  run_mode = detect_run_mode() # "pipeline" or "standalone"
+
+  # Fixed pass counts -- no extension
+  pass_counts = { quick: 3, standard: 5, deep: 7 }
+  total_passes = pass_counts[depth]
+
+  # Depth-aware dimension subsets (coverage contract)
+  depth_dimensions = {
+    quick: dimensions[0:3], # first 3 dimensions only
+    standard: dimensions[0:5], # first 5
+    deep: dimensions, # all available
+  }
+  active_dims = depth_dimensions[depth]
+
+  codex_available = check_codex() # which codex && codex --version
+
+  for pass_number in 0..total_passes-1:
+
+    # --- Cap guard check (pipeline mode only, before each pass) ---
+    if run_mode == "pipeline":
+      loop_check_args = "--run <run-id> --phase <phase> --loop-count <pass_number+1>"
+      if options.task_id:
+        loop_check_args += " --task-id <task_id>"
+      wazir capture loop-check $loop_check_args
+      # loop-check wraps: event capture + evaluateLoopCapGuard
+      # If loop_cap_guard fires (exit 43), stop immediately:
+      if last_exit_code == 43:
+        log("Loop cap reached for phase: <phase>. Escalating to user.")
+        escalate_to_user(evidence_gathered_so_far)
+        return { pass_count: pass_number, escalated: true }
+    # Standalone mode: no cap guard. Loop runs for total_passes and stops.
+
+    dimension = active_dims[pass_number % len(active_dims)]
+
+    # --- Primary review (reviewer role, not producer) ---
+    # Mode is always explicit -- passed by caller via options.mode
+    findings = self_review(artifact_path, focus=dimension, mode=options.mode)
+
+    # --- Secondary review (Codex, if available) ---
+    if codex_available:
+      codex_exit_code, codex_output = run_codex_review(artifact_path, dimension)
+      if codex_exit_code != 0:
+        # Codex failed -- log error, fall back to self-review for this pass
+        log_error("Codex exited " + codex_exit_code + ": " + codex_output.stderr)
+        mark_pass_codex_unavailable(pass_number)
+        # Do NOT treat Codex failure as clean. Self-review findings stand alone.
+      else:
+        codex_findings = parse(codex_output.stdout)
+        merge(findings, codex_findings, preserve_attribution=true)
+
+    # --- Log the review pass ---
+    if run_mode == "pipeline":
+      if options.task_id:
+        log_path = .wazir/runs/latest/reviews/<phase>-task-<task_id>-review-pass-<N>.md
+      else:
+        log_path = .wazir/runs/latest/reviews/<phase>-review-pass-<N>.md
+      log(pass_number+1, dimension, findings) -> log_path
+    else:
+      log_path = docs/plans/YYYY-MM-DD-<topic>-review-pass-<N>.md
+      log(pass_number+1, dimension, findings) -> log_path
+
+    if findings.has_issues:
+      # --- Fix inline, do NOT return ---
+      producer_fix(artifact_path, findings)
+      # Continue to next pass -- the fix will be re-reviewed
+
+  return { pass_count: total_passes, issues_found, issues_fixed, remaining, attributions }
+```
+
+Key properties of this pseudocode:
+
+1. **Fixed pass counts** -- Quick is exactly 3, standard exactly 5, deep exactly 7. No `max_passes = min_passes + 3`. No clean-streak early-exit. No extension.
+2. **Task-scoped log filenames** -- `<phase>-task-<NNN>-review-pass-<N>.md` for per-task reviews, preventing log clobbering in parallel mode.
+3. **Task-scoped loop cap keys** -- `--task-id` flag on `loop-check` so each task gets its own counter in `phase_loop_counts`.
+4. **Explicit review mode** -- `options.mode` is always passed by the caller. No auto-detection.
+5. **Codex error handling** -- non-zero exit is logged, pass marked `codex-unavailable`, self-review findings used alone. Never treated as clean.
+6. **Standalone mode** -- uses `docs/plans/` for artifacts and logs. No temp dir. No cap guard at all.
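To make the fixed pass counts and the dimension rotation concrete, here is a small Python sketch of the scheduling part of the pseudocode only. The helper name `pass_schedule` and the constant names are hypothetical; the numbers (3/5/7 passes, the first 3 or first 5 dimensions, round-robin selection via modulo) are taken from the pseudocode above.

```python
PASS_COUNTS = {"quick": 3, "standard": 5, "deep": 7}
# How many leading dimensions each depth uses; None means all of them.
DIMENSION_LIMITS = {"quick": 3, "standard": 5, "deep": None}


def pass_schedule(dimensions: list[str], depth: str) -> list[tuple[int, str]]:
    """Return (1-based pass number, dimension) pairs for one review loop."""
    if not dimensions:
        raise ValueError("at least one review dimension is required")
    limit = DIMENSION_LIMITS[depth]
    active = dimensions if limit is None else dimensions[:limit]
    # Fixed pass count; dimensions repeat round-robin when passes outnumber them.
    return [
        (index + 1, active[index % len(active)])
        for index in range(PASS_COUNTS[depth])
    ]


task_dims = ["correctness", "tests", "wiring", "drift", "quality"]
for number, dimension in pass_schedule(task_dims, "deep"):
    print(number, dimension)
```

With the five task-execution dimensions at deep depth, passes 6 and 7 wrap around and revisit correctness and tests.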
+
+---
+
+## Codex Error Handling Contract
+
+```
+run_codex_review(artifact_path, dimension):
+  CODEX_MODEL = read_config('.wazir/state/config.json', '.multi_tool.codex.model') or "gpt-5.4"
+
+  if is_code_artifact:
+    cmd = codex review -c model="$CODEX_MODEL" --uncommitted --title "..." "Review for [dimension]..."
+    # or: codex review -c model="$CODEX_MODEL" --base <sha> for committed changes
+  else:
+    cmd = cat <artifact_path> | codex exec -c model="$CODEX_MODEL" "Review this [type] for [dimension]..."
+
+  result = execute(cmd, timeout=120s, capture_stderr=true)
+
+  if result.exit_code != 0:
+    return (result.exit_code, { stderr: result.stderr, stdout: "" })
+    # Caller handles: log error, mark codex-unavailable, use self-review only
+
+  return (0, { stdout: result.stdout, stderr: result.stderr })
+```
+
+Rules:
+
+- If Codex exits non-zero, log the full stderr.
+- Mark the pass as `codex-unavailable` in the review log metadata.
+- Fall back to self-review for that pass only. Do not skip the pass.
+- Do not retry Codex on the same pass. If Codex fails on pass 2, pass 3 still tries Codex (transient failures recover).
+- Never treat a Codex failure as a clean review pass.
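The rules above amount to a small error-handling wrapper. The sketch below is a hedged Python illustration using only the standard library; it runs an arbitrary command list rather than assuming any particular Codex invocation. The names `run_review_command` and `LAUNCH_FAILURE_EXIT_CODE` are hypothetical, and mapping a timeout or a missing binary to a synthetic non-zero code is an assumption of this example, not something the reference specifies.

```python
import subprocess

# Assumption for this sketch: timeouts and launch failures are reported as a
# synthetic non-zero code so callers take the same "codex-unavailable" path.
LAUNCH_FAILURE_EXIT_CODE = 124


def run_review_command(cmd, stdin_text=None, timeout_s=120):
    """Run a review command and return (exit_code, {"stdout", "stderr"})."""
    try:
        result = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return LAUNCH_FAILURE_EXIT_CODE, {"stdout": "", "stderr": str(exc)}
    if result.returncode != 0:
        # Never treat a failure as clean: discard stdout, surface stderr.
        return result.returncode, {"stdout": "", "stderr": result.stderr}
    return 0, {"stdout": result.stdout, "stderr": result.stderr}
```

Whenever the first element is non-zero, a caller following the contract would log `stderr`, mark the pass `codex-unavailable`, and keep only the self-review findings for that pass.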
+
+---
+
+## Codex Availability Probe
+
+Before any Codex call, verify availability once at loop start:
+
+```bash
+which codex >/dev/null 2>&1 && codex --version >/dev/null 2>&1
+```
+
+If the probe fails, set `codex_available = false` for the entire loop. Fall back to self-review only. Never error out.
+
+Per-invocation failures (Codex available but a single call fails) are handled separately by the error contract above.
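For callers that are not shell scripts, the same probe can be approximated in Python. This is a sketch under assumptions: `tool_available` is a hypothetical helper, and it simply mirrors `which <tool> && <tool> --version` for any executable name.

```python
import shutil
import subprocess


def tool_available(name: str, timeout_s: float = 10.0) -> bool:
    """Return True if `name` resolves to an executable and `--version` exits 0."""
    path = shutil.which(name)
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A probe failure means "unavailable", never an error for the caller.
        return False
    return result.returncode == 0


codex_available = tool_available("codex")  # computed once, at loop start
```

Because the helper swallows probe errors and returns `False`, the loop can fall back to self-review without any special-case handling.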
+
+---
+
+## Codex Artifact-Scoped Review
+
+Never use `codex review` for non-code artifacts (specs, plans, designs). Instead, pipe the artifact content via stdin:
+
+```bash
+CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+cat .wazir/runs/latest/clarified/spec-hardened.md | \
+  codex exec -c model="$CODEX_MODEL" "Review this specification for: [dimension]. Be specific, cite sections. Say CLEAN if no issues." \
+  2>&1 | tee .wazir/runs/latest/reviews/spec-challenge-review-pass-N.md
+```
+
+For code artifacts, use `codex review -c model="$CODEX_MODEL" --uncommitted` (or `--base` for committed changes). See the next section for details.
+
+---
+
+## Code Review Scoping
+
+**Rule: review BEFORE commit.**
+
+For each task during execution:
+
+1. Implement the task (changes are uncommitted).
+2. Review the uncommitted changes using the **5 task-execution dimensions** (NOT the 7 final-review dimensions):
+   ```bash
+   CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+   CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+   codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
+     "Review against acceptance criteria: <criteria>" \
+     2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
+   ```
+3. Fix any findings (still uncommitted).
+4. Re-review until all passes exhausted or cap reached.
+5. **Only after review passes:** commit with conventional commit format.
+
+**If changes are already committed** (e.g., subagent workflow where the implementer subagent commits before review):
+
+```bash
+# Capture the SHA before the task starts
+PRE_TASK_SHA=$(git rev-parse HEAD)
+
+# ... subagent implements and commits ...
+
+# Review the committed changes against the pre-task baseline
+CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+codex review -c model="$CODEX_MODEL" --base $PRE_TASK_SHA --title "Task NNN: <summary>" \
+  "Review against acceptance criteria: <criteria>" \
+  2>&1 | tee .wazir/runs/latest/reviews/execute-task-NNN-review-pass-N.md
+```
+
+---
+
+## Dimension Sets
+
+### Research Dimensions (5)
+
+1. **Coverage** -- all briefing topics researched
+2. **Source quality** -- authoritative, current sources
+3. **Relevance** -- research answers the actual questions
+4. **Gaps** -- missing info that blocks later phases
+5. **Contradictions** -- conflicting sources identified
+
+### Spec/Clarification Dimensions (5)
+
+1. **Completeness** -- all requirements covered
+2. **Testability** -- each criterion verifiable
+3. **Ambiguity** -- no dual-interpretation statements
+4. **Assumptions** -- hidden assumptions explicit
+5. **Scope creep** -- nothing beyond briefing
+
+### Design-Review Dimensions (5)
+
+Matches canonical `workflows/design-review.md`:
+
+1. **Spec coverage** -- does the design address every acceptance criterion with a visual component?
+2. **Design-spec consistency** -- does the design introduce anything not in the spec? (scope creep check)
+3. **Accessibility** -- color contrast ratios (WCAG 2.1 AA), focus states, touch target sizes (44x44px minimum)
+4. **Visual consistency** -- design tokens form a coherent system, dark/light mode alignment
+5. **Exported-code fidelity** -- do exported scaffolds match the designs? Mismatches are failures here, not implementation concerns.
+
+### Plan Dimensions (7)
+
+1. **Completeness** -- all design decisions mapped to tasks
+2. **Ordering** -- dependencies correct, parallelizable identified
+3. **Atomicity** -- each task fits one session
+4. **Testability** -- concrete verification per task
+5. **Edge cases** -- error paths covered
+6. **Security** -- auth, injection, data exposure
+7. **Integration** -- tasks connect end-to-end
+
+### Task Execution Dimensions (5)
+
+Used for per-task review during execution:
+
+1. **Correctness** -- code matches spec
+2. **Tests** -- real tests, not mocked/faked
+3. **Wiring** -- all paths connected
+4. **Drift** -- matches task spec
+5. **Quality** -- naming, error handling
+
+### Final Review Dimensions (7)
+
+Used for `workflows/review.md` scored gate:
+
+1. **Correctness** -- does the code do what the spec says?
+2. **Completeness** -- are all acceptance criteria met?
+3. **Wiring** -- are all paths connected end-to-end?
+4. **Verification** -- is there evidence (tests, type checks) for each claim?
+5. **Drift** -- does the implementation match the approved plan?
+6. **Quality** -- code style, naming, error handling, security
+7. **Documentation** -- changelog entries, commit messages, comments
+
+The final review dimensions are the existing 7 from `skills/reviewer/SKILL.md`. `workflows/review.md` is not modified by this pattern.
+
+---
+
+## Per-Depth Coverage Contract
+
+| Depth | Research | Spec | Design-Review | Plan | Task Execution | Final Review |
+|-------|----------|------|---------------|------|----------------|--------------|
+| Quick | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | dims 1-3, 3 passes | always 7 dims, 1 pass |
+| Standard | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | dims 1-5, 5 passes | always 7 dims, 1 pass |
+| Deep | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-5, 7 passes | dims 1-7, 7 passes | dims 1-5, 7 passes | always 7 dims, 1 pass |
+
+Pass counts are FIXED per depth. Quick = 3 passes, standard = 5 passes, deep = 7 passes. No extension. No early-exit. Final review is always a single scored pass across all 7 dimensions -- it is a gate, not a loop.
+
+---
+
+## Loop Cap Configuration
+
+The `phase_policy` section of `run-config.yaml` controls which phases are enabled and sets an absolute safety ceiling per phase. Only two fields exist: `enabled` and `loop_cap`. There is no `passes` field -- depth determines pass counts (3/5/7), not phase policy.
+
+```yaml
+phase_policy:
+  discover: { enabled: true, loop_cap: 10 }
+  clarify: { enabled: true, loop_cap: 10 }
+  specify: { enabled: true, loop_cap: 10 }
+  spec-challenge: { enabled: true, loop_cap: 10 }
+  author: { enabled: false, loop_cap: 10 }
+  design: { enabled: true, loop_cap: 10 }
+  design-review: { enabled: true, loop_cap: 10 }
+  plan: { enabled: true, loop_cap: 10 }
+  plan-review: { enabled: true, loop_cap: 10 }
+  execute: { enabled: true, loop_cap: 10 }
+  verify: { enabled: true, loop_cap: 5 }
+  review: { enabled: true, loop_cap: 10 }
+  learn: { enabled: false, loop_cap: 5 }
+  prepare_next: { enabled: false, loop_cap: 5 }
+  run_audit: { enabled: false, loop_cap: 10 }
+```
+
+**`loop_cap`** is an absolute safety ceiling that prevents runaway loops regardless of depth. It is checked by `wazir capture loop-check` in pipeline mode. It is NOT the same as pass count (which is determined by depth: 3/5/7). Example: depth=deep gives 7 passes, but if `loop_cap: 5`, the cap guard fires at pass 5 and escalates. This is intentional -- the operator can constrain expensive phases.
+
+**Adaptive phases** (`author`, `learn`, `prepare_next`, `run_audit`) default to `enabled: false`. They are activated by explicit operator config or intent detection. They do not participate in the standard review loop pattern because:
+
+- `author` has a human approval gate, not an iterative review loop.
+- `learn` extracts learnings from the completed run -- it is post-execution housekeeping.
+- `prepare_next` prepares context for the next run -- it is a handoff phase.
+- `run_audit` is an on-demand standalone audit, not part of the main pipeline flow.
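As an illustration of the two-field contract described for `phase_policy`, the following Python sketch checks an already-parsed mapping. The function name `validate_phase_policy` and the wording of its messages are hypothetical. The facts taken from this reference are that each phase accepts `enabled` and `loop_cap` and that there is no `passes` field; treating both fields as required and `loop_cap` as a positive integer is an assumption of the example.

```python
ALLOWED_FIELDS = {"enabled", "loop_cap"}


def validate_phase_policy(policy: dict) -> list[str]:
    """Return a list of problems found in a parsed phase_policy mapping."""
    problems = []
    for phase, settings in policy.items():
        extra = sorted(set(settings) - ALLOWED_FIELDS)
        if extra:
            problems.append(f"{phase}: unknown field(s) {', '.join(extra)}")
        if not isinstance(settings.get("enabled"), bool):
            problems.append(f"{phase}: 'enabled' must be true or false")
        cap = settings.get("loop_cap")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(cap, bool) or not isinstance(cap, int) or cap < 1:
            problems.append(f"{phase}: 'loop_cap' must be a positive integer")
    return problems


policy = {
    "verify": {"enabled": True, "loop_cap": 5},
    "plan": {"enabled": True, "loop_cap": 10, "passes": 7},
}
print(validate_phase_policy(policy))
```

Here the stray `passes` key on `plan` is reported, which matches the statement that depth, not phase policy, determines pass counts.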
+
+---
+
+## Reviewer Mode Table
+
+The reviewer skill operates in different modes depending on the phase. **Mode is always explicit** -- the caller passes `--mode <mode>`. There is no auto-detection based on artifact availability.
+
+| Mode | Invoked during | Prerequisites | Dimensions | Output |
+|------|---------------|---------------|------------|--------|
+| `final` | After execution + verification | Completed task artifacts in `.wazir/runs/latest/artifacts/` | 7 final-review dims, scored 0-70 | Verdict: PASS/NEEDS FIXES/NEEDS REWORK/FAIL |
+| `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Findings with severity, no score |
+| `design-review` | After design approval | Design artifact, approved spec, accessibility guidelines | 5 design-review dims (canonical) | Findings with severity (blocking/advisory) |
+| `plan-review` | After planning | Draft plan, approved spec, design artifact | 7 plan dims | Findings with severity, no score |
+| `task-review` | During execution, per task | Uncommitted changes (or committed with known base SHA) | 5 task-execution dims | Pass/fail per task, no score |
+| `research-review` | During discover | Research artifact | 5 research dims | Findings with severity, no score |
+| `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Findings with severity, no score |
+
+If `--mode` is not provided, the reviewer asks the user which review to run. Auto-detection based on artifact availability is NOT used -- it causes ambiguity in resumed/multi-phase runs where stale artifacts from prior phases exist.
+
+Each caller is responsible for passing the correct mode:
+
+- Clarifier passes `--mode clarification-review` after Phase 1A
+- Discover workflow passes `--mode research-review` after research
+- Specifier flow passes `--mode spec-challenge` after specify
+- Brainstorming passes `--mode design-review` after user approval
+- Writing-plans passes `--mode plan-review` after planning
+- Executor passes `--mode task-review` for each task
+- `/wazir` runner passes `--mode final` for the final review gate
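The mode table can be read as a lookup that refuses to guess. The Python sketch below is only an illustration of that idea: `REVIEW_MODES` and `resolve_mode` are hypothetical names, the dimension-set labels are shorthand for the sets listed in this reference, and raising an error when no mode is given stands in for the "ask the user" behaviour described above.

```python
from typing import Optional, Tuple

# Mode -> (dimension set, number of dimensions), copied from the mode table.
REVIEW_MODES = {
    "final": ("final-review", 7),
    "spec-challenge": ("spec/clarification", 5),
    "design-review": ("design-review", 5),
    "plan-review": ("plan", 7),
    "task-review": ("task-execution", 5),
    "research-review": ("research", 5),
    "clarification-review": ("spec/clarification", 5),
}


def resolve_mode(mode: Optional[str]) -> Tuple[str, int]:
    """Look up an explicitly supplied review mode; never infer one."""
    if mode is None:
        # The reference says the reviewer asks the user here; this sketch
        # raises instead so that a missing mode cannot go unnoticed.
        raise ValueError("review mode is required; ask the user which review to run")
    if mode not in REVIEW_MODES:
        raise ValueError(f"unknown review mode: {mode!r}")
    return REVIEW_MODES[mode]
```

For example, `resolve_mode("task-review")` yields the task-execution set with 5 dimensions, while `resolve_mode(None)` fails loudly rather than inspecting stale artifacts.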
+
+---
+
+## Codex Prompt Templates
+
+All Codex invocations read the model from config with a fallback:
+
+```bash
+CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
+CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
+```
+
+### Artifact Review (specs, plans, designs via stdin)
+
+Use this template with `codex exec` for non-code artifacts piped via stdin:
+
+```bash
+cat <artifact_path> | codex exec -c model="$CODEX_MODEL" \
+  "You are reviewing a [ARTIFACT_TYPE] for the Wazir engineering OS.
+Focus on [DIMENSION]: [dimension description].
+Rules: cite specific sections, be actionable, say CLEAN if no issues.
+Do NOT load or invoke any skills. Do NOT read the codebase.
+Review ONLY the content provided via stdin."
+```
+
+Replace `[ARTIFACT_TYPE]` with: `specification`, `implementation plan`, `design document`, `research brief`, or `clarification`.
+Replace `[DIMENSION]` and `[dimension description]` with the current review pass dimension from the relevant dimension set above.
+
+### Code Review (diffs via --uncommitted or --base)
+
+Use this template with `codex review` for code changes:
+
+```bash
+codex review -c model="$CODEX_MODEL" --uncommitted --title "Task NNN: <summary>" \
+  "Review the code changes for [DIMENSION]: [dimension description].
+Check against acceptance criteria: [criteria].
+Flag: correctness issues, missing tests, unwired paths, drift from spec.
+Do NOT load or invoke any skills."
+```
+
+For committed changes, replace `--uncommitted` with `--base <sha>`.
+Replace `[DIMENSION]`, `[dimension description]`, and `[criteria]` with the task-specific values from the execution plan and spec.
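Filling the artifact-review template by hand is error-prone, so a caller might build the prompt programmatically. The Python sketch below is one hedged way to do that: `build_artifact_prompt` and `ARTIFACT_TYPES` are hypothetical names, and the prompt text and the list of allowed artifact types are copied from the artifact-review template in this reference.

```python
ARTIFACT_TYPES = (
    "specification",
    "implementation plan",
    "design document",
    "research brief",
    "clarification",
)


def build_artifact_prompt(artifact_type: str, dimension: str, description: str) -> str:
    """Fill the artifact-review template for a single review dimension."""
    if artifact_type not in ARTIFACT_TYPES:
        raise ValueError(f"unsupported artifact type: {artifact_type!r}")
    return "\n".join([
        f"You are reviewing a {artifact_type} for the Wazir engineering OS.",
        f"Focus on {dimension}: {description}.",
        "Rules: cite specific sections, be actionable, say CLEAN if no issues.",
        "Do NOT load or invoke any skills. Do NOT read the codebase.",
        "Review ONLY the content provided via stdin.",
    ])


print(build_artifact_prompt("specification", "Testability", "each criterion verifiable"))
```

Validating the artifact type up front keeps an unexpected placeholder value from reaching the reviewer prompt.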
|
|
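The two review templates above differ only in their placeholder values and in how the diff scope is chosen, so filling them in can be scripted. The following is a minimal sketch and not part of the package: `build_artifact_prompt` and `diff_scope_flags` are hypothetical helper names, and the dimension text used in the demo is an illustrative value, not one of the project's dimension sets. The script only builds strings and never calls `codex`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with wazir): fill the review templates.

# build_artifact_prompt ARTIFACT_TYPE DIMENSION DESCRIPTION
# Prints the artifact-review prompt with the three placeholders substituted.
build_artifact_prompt() {
  local artifact_type=$1 dimension=$2 description=$3
  printf '%s\n' \
    "You are reviewing a ${artifact_type} for the Wazir engineering OS." \
    "Focus on ${dimension}: ${description}." \
    "Rules: cite specific sections, be actionable, say CLEAN if no issues." \
    "Do NOT load or invoke any skills. Do NOT read the codebase." \
    "Review ONLY the content provided via stdin."
}

# diff_scope_flags [BASE_SHA]
# Prints --uncommitted when no SHA is given, otherwise --base and the SHA,
# one argument per line.
diff_scope_flags() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' --base "$1"
  else
    printf '%s\n' --uncommitted
  fi
}

# Demo with illustrative values (assumptions, not project dimensions).
prompt=$(build_artifact_prompt "specification" "completeness" \
  "every requirement has acceptance criteria")
printf '%s\n' "$prompt"

# Intended use, left as a comment so the sketch runs without codex installed:
#   codex exec -c model="$CODEX_MODEL" "$prompt" < <artifact_path>
```

Keeping the prompt text in one function means each review pass changes only the dimension arguments, and the fixed rules stay identical across passes.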
@@ -35,12 +35,14 @@ The `wazir` CLI is minimal on purpose. It exists to validate and export the host
 | `wazir capture output` | implemented | Writes captured tool output to a run-local file and records a `post_tool_capture` event. |
 | `wazir capture summary` | implemented | Writes `summary.md` and records the chosen summary or handoff event. |
 | `wazir capture usage` | implemented | Generates a token savings report for a run, showing capture routing statistics and context window savings. |
+| `wazir capture loop-check` | implemented | Records a loop iteration event and evaluates the loop cap guard. Exits 43 if the phase loop cap is exceeded. Accepts `--task-id` for task-scoped cap tracking. In standalone mode (no status.json), exits 0. |
 
 ## Exit codes
 
 - `0`: requested check passed
 - `1`: invalid input or validation failure
 - `2`: command surface exists but the implementation is intentionally not complete yet
+- `43`: phase loop cap exceeded (returned by `wazir capture loop-check`)
 
 ## Root discovery
 
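A caller that drives a review loop has to distinguish exit code `43` from an ordinary failure. The sketch below shows one way to branch on the codes documented in this hunk. It is an assumption about caller-side usage, not code from the package: `describe_exit`, `run_and_describe`, and `fake_loop_check` are hypothetical names, and `fake_loop_check` stands in for `wazir capture loop-check` so the script runs without the CLI installed.

```bash
#!/usr/bin/env bash
# Hypothetical caller-side handling of the documented wazir exit codes.

# describe_exit CODE: print the documented meaning of an exit code.
describe_exit() {
  case "$1" in
    0)  echo "requested check passed" ;;
    1)  echo "invalid input or validation failure" ;;
    2)  echo "command surface exists but is intentionally not complete yet" ;;
    43) echo "phase loop cap exceeded" ;;
    *)  echo "undocumented exit code $1" ;;
  esac
}

# run_and_describe COMMAND [ARGS...]
# Runs the command, prints the meaning of its exit code, returns that code.
run_and_describe() {
  local status=0
  "$@" || status=$?
  describe_exit "$status"
  return "$status"
}

# Stand-in for `wazir capture loop-check` (assumption: simulates a hit cap).
fake_loop_check() { return 43; }

# In real use the stand-in would be replaced by the actual command, for
# example `wazir capture loop-check --task-id <task-id>` plus whatever other
# flags that command requires.
run_and_describe fake_loop_check || echo "stop iterating"
```

Capturing the status with `|| status=$?` keeps the helper usable in scripts that run under `set -e`.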
package/docs/truth-claims.yaml CHANGED
@@ -184,6 +184,12 @@
   subject: wazir capture usage
   verifier: command_registry
   required: true
+- id: command-capture-loop-check
+  file: docs/reference/tooling-cli.md
+  claim_type: command
+  subject: wazir capture loop-check
+  verifier: command_registry
+  required: true
 - id: command-validate-branches
   file: docs/reference/tooling-cli.md
   claim_type: command
@@ -30,6 +30,7 @@ Default approach: recall L1 (structural summaries)
 - clarification artifact
 - unresolved questions list
 - scope summary with cited sources
+- emits clarification artifact for reviewer loops
 
 ## Escalation Rules
 
@@ -40,3 +41,5 @@ Default approach: recall L1 (structural summaries)
 - leaves material ambiguity unresolved without escalation
 - mutates `input/`
 - invents constraints or facts without evidence
+- self-reviews own output instead of delegating to reviewer
+- performs substantial discovery research inline without delegating to the discover workflow when delegation is required
@@ -35,6 +35,9 @@ Default approach: recall L1 (structural summaries)
 - exported HTML + CSS scaffold
 - design tokens JSON (colors, spacing, typography)
 - screenshot PNGs of key frames
+- emits design artifact for design-review loop
+
+Design is not approved for planning until it survives the design-review loop owned by the reviewer role.
 
 ## Git-Flow Responsibilities
 
@@ -31,6 +31,7 @@ Default approach: direct file read (full content)
 - code and docs changes
 - execution notes
 - verification evidence
+- submits per-task output for review before commit
 
 ## Git-Flow Responsibilities
 
@@ -53,3 +54,4 @@ All text outputs (code comments, commit messages, PR descriptions, CHANGELOG ent
 - unwired paths
 - fake tests
 - writes to protected paths outside approved flows
+- commits before review passes
@@ -32,6 +32,9 @@ Default approach: recall L1 (structural summaries)
 - implementation plan artifact
 - ordered task list
 - verification plan per section
+- emits plan artifact for plan-review loop
+
+Plan is not approved until it survives the plan-review loop owned by the reviewer role.
 
 ## Git-Flow Responsibilities
 
@@ -31,6 +31,7 @@ Default approach: recall L1 (structural summaries)
 - research artifact with citations
 - finding summaries linked to sources
 - open risks and unknowns
+- submits research artifact for reviewer evaluation before it flows downstream
 
 ## Escalation Rules
 
@@ -41,3 +42,4 @@ Default approach: recall L1 (structural summaries)
 - unsupported claims
 - stale or missing citations
 - substituting confidence for evidence
+- research artifact used downstream without passing review
@@ -2,7 +2,7 @@
 
 ## Purpose
 
-Perform adversarial review to find correctness, scope, wiring, verification, and drift failures.
+Perform adversarial review to find correctness, scope, wiring, verification, and drift failures. Owns all review loops: research-review, clarification-review, spec-challenge, design-review, plan-review, task-review, and final review.
 
 ## Inputs
 
@@ -17,6 +17,7 @@ Perform adversarial review to find correctness, scope, wiring, verification, and
 - source-backed comparison to spec/plan
 - secondary model review when available
 - Wazir CLI recall and index commands (see Context retrieval)
+- review loop pattern (see docs/reference/review-loop-pattern.md)
 
 ## Context retrieval
 
@@ -32,6 +33,9 @@ Default approach: recall L1, escalate to direct read for flagged issues
 - findings with severity
 - rationale tied to evidence
 - explicit no-findings verdict when applicable
+- review loop pass logs with source attribution ([Wazir], [Codex], [Both])
+
+Review mode is always passed explicitly by the caller (--mode). The reviewer does not auto-detect mode from artifact availability.
 
 ## Git-Flow Responsibilities
 
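The rule that the review mode is always passed explicitly can be enforced with a small guard on the caller side. The following is a sketch under stated assumptions, not the package's implementation: `parse_review_mode` is a hypothetical function, and its allow-list contains only the three `--mode` values that appear verbatim in this diff (`research-review`, `clarification-review`, `design-review`). The other loops named in the reviewer's Purpose would need their own entries once their exact mode strings are confirmed.

```bash
#!/usr/bin/env bash
# Hypothetical guard: require an explicit --mode and never infer it.

# parse_review_mode [ARGS...]
# Prints the validated mode, or returns 1 with a message on stderr.
parse_review_mode() {
  local mode=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --mode)
        # Consume the value only if one was actually supplied.
        if [ $# -ge 2 ]; then mode=$2; shift; fi
        ;;
      --mode=*) mode=${1#--mode=} ;;
    esac
    shift
  done

  case "$mode" in
    # Assumption: only modes quoted in this diff are listed here.
    research-review|clarification-review|design-review)
      printf '%s\n' "$mode"
      ;;
    "")
      echo "error: --mode is required and is never inferred from artifacts" >&2
      return 1
      ;;
    *)
      echo "error: unknown review mode: $mode" >&2
      return 1
      ;;
  esac
}

# Demo: an explicit mode passes, a missing mode is rejected.
parse_review_mode --mode clarification-review
parse_review_mode 2>/dev/null || echo "rejected: no mode given"
```

Failing fast on a missing mode avoids the situation the reviewer text rules out, where the mode is guessed from whichever artifacts happen to exist.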
@@ -31,6 +31,9 @@ Default approach: recall L1 (structural summaries)
 - spec artifact
 - measurable acceptance criteria
 - explicit non-goals and assumptions
+- emits spec artifact for spec-challenge review loop
+
+Spec is not approved until it survives the spec-challenge loop owned by the reviewer role.
 
 ## Writing Quality
 
@@ -24,6 +24,10 @@ On entering this phase, run:
 - unresolved questions list
 - scope summary
 
+## Review Loop
+
+Clarification artifact is reviewed by the reviewer role using the review loop pattern with spec/clarification dimensions. The reviewer is invoked with `--mode clarification-review`. The clarifier resolves findings. Clarification does not flow to specify until all review passes complete.
+
 ## Approval Gate
 
 - no formal approval gate, but unresolved material ambiguity must be escalated
@@ -39,6 +39,10 @@ On rejection: `wazir capture event --run <run-id> --event gate_rejected --phase
 On completing this phase, run:
 `wazir capture event --run <run-id> --event phase_exit --phase <phase-name> --status completed`
 
+## Loop Structure
+
+Follows the review loop pattern in `docs/reference/review-loop-pattern.md` with the canonical design-review dimensions (spec coverage, design-spec consistency, accessibility, visual consistency, exported-code fidelity). The designer role resolves findings. Starts when the approved design artifact enters the `design_review` phase. Pass count determined by depth. No extension.
+
 ## Failure Conditions
 
 - vague findings without visual evidence
@@ -33,6 +33,10 @@ On entering this phase, run:
 
 - explicit human approval required before design-review
 
+## Review Loop
+
+After user approval, design artifact is reviewed via the design-review workflow (`workflows/design-review.md`) using the review loop pattern with the canonical design-review dimensions (spec coverage, design-spec consistency, accessibility, visual consistency, exported-code fidelity). The reviewer is invoked with `--mode design-review`. Design does not flow to planning until all review passes complete.
+
 ## Phase exit
 
 On completing this phase, run:
@@ -23,6 +23,10 @@ On entering this phase, run:
 - research artifact
 - cited findings
 
+## Review Loop
+
+Research artifact is reviewed by the reviewer role using the review loop pattern (`docs/reference/review-loop-pattern.md`) with research dimensions (coverage, source quality, relevance, gaps, contradictions). The reviewer is invoked with `--mode research-review`. The researcher resolves findings. Research does not flow to specify until all review passes complete.
+
 ## Approval Gate
 
 - no formal approval gate, but unsupported research cannot flow forward