@wazir-dev/cli 1.0.0 → 1.2.0
- package/CHANGELOG.md +100 -2
- package/README.md +6 -6
- package/docs/concepts/architecture.md +1 -1
- package/docs/concepts/roles-and-workflows.md +2 -0
- package/docs/concepts/why-wazir.md +59 -0
- package/docs/decisions/2026-03-19-deferred-items.md +564 -0
- package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
- package/docs/readmes/INDEX.md +21 -5
- package/docs/readmes/features/expertise/README.md +2 -2
- package/docs/readmes/features/exports/README.md +2 -2
- package/docs/readmes/features/schemas/README.md +3 -0
- package/docs/readmes/features/skills/README.md +17 -0
- package/docs/readmes/features/skills/clarifier.md +5 -0
- package/docs/readmes/features/skills/claude-cli.md +5 -0
- package/docs/readmes/features/skills/codex-cli.md +5 -0
- package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
- package/docs/readmes/features/skills/executing-plans.md +5 -0
- package/docs/readmes/features/skills/executor.md +5 -0
- package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
- package/docs/readmes/features/skills/gemini-cli.md +5 -0
- package/docs/readmes/features/skills/humanize.md +5 -0
- package/docs/readmes/features/skills/init-pipeline.md +5 -0
- package/docs/readmes/features/skills/receiving-code-review.md +5 -0
- package/docs/readmes/features/skills/requesting-code-review.md +5 -0
- package/docs/readmes/features/skills/reviewer.md +5 -0
- package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
- package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
- package/docs/readmes/features/skills/wazir.md +5 -0
- package/docs/readmes/features/skills/writing-skills.md +5 -0
- package/docs/readmes/features/workflows/prepare-next.md +1 -1
- package/docs/reference/configuration-reference.md +47 -6
- package/docs/reference/launch-checklist.md +4 -4
- package/docs/reference/review-loop-pattern.md +538 -0
- package/docs/reference/roles-reference.md +1 -0
- package/docs/reference/skill-tiers.md +147 -0
- package/docs/reference/tooling-cli.md +5 -1
- package/docs/truth-claims.yaml +18 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +97 -1
- package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
- package/exports/hosts/claude/.claude/agents/designer.md +3 -0
- package/exports/hosts/claude/.claude/agents/executor.md +2 -0
- package/exports/hosts/claude/.claude/agents/planner.md +3 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
- package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
- package/exports/hosts/claude/.claude/commands/design.md +4 -0
- package/exports/hosts/claude/.claude/commands/discover.md +4 -0
- package/exports/hosts/claude/.claude/commands/execute.md +4 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
- package/exports/hosts/claude/.claude/commands/plan.md +4 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
- package/exports/hosts/claude/.claude/commands/specify.md +4 -0
- package/exports/hosts/claude/.claude/commands/verify.md +4 -0
- package/exports/hosts/claude/.claude/settings.json +9 -0
- package/exports/hosts/claude/CLAUDE.md +1 -1
- package/exports/hosts/claude/export.manifest.json +22 -20
- package/exports/hosts/claude/host-package.json +3 -1
- package/exports/hosts/codex/AGENTS.md +1 -1
- package/exports/hosts/codex/export.manifest.json +22 -20
- package/exports/hosts/codex/host-package.json +3 -1
- package/exports/hosts/cursor/.cursor/hooks.json +4 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
- package/exports/hosts/cursor/export.manifest.json +22 -20
- package/exports/hosts/cursor/host-package.json +3 -1
- package/exports/hosts/gemini/GEMINI.md +1 -1
- package/exports/hosts/gemini/export.manifest.json +22 -20
- package/exports/hosts/gemini/host-package.json +3 -1
- package/hooks/context-mode-router +191 -0
- package/hooks/definitions/context_mode_router.yaml +19 -0
- package/hooks/definitions/loop_cap_guard.yaml +1 -1
- package/hooks/hooks.json +43 -0
- package/hooks/protected-path-write-guard +8 -0
- package/hooks/routing-matrix.json +45 -0
- package/hooks/session-start +62 -1
- package/llms-full.txt +905 -132
- package/package.json +3 -3
- package/roles/clarifier.md +3 -0
- package/roles/designer.md +3 -0
- package/roles/executor.md +2 -0
- package/roles/planner.md +3 -0
- package/roles/researcher.md +2 -0
- package/roles/reviewer.md +5 -1
- package/roles/specifier.md +3 -0
- package/schemas/hook.schema.json +2 -1
- package/schemas/phase-report.schema.json +80 -0
- package/schemas/usage.schema.json +25 -1
- package/schemas/wazir-manifest.schema.json +19 -0
- package/skills/brainstorming/SKILL.md +20 -56
- package/skills/clarifier/SKILL.md +243 -0
- package/skills/claude-cli/SKILL.md +320 -0
- package/skills/codex-cli/SKILL.md +260 -0
- package/skills/debugging/SKILL.md +24 -1
- package/skills/design/SKILL.md +13 -0
- package/skills/dispatching-parallel-agents/SKILL.md +13 -0
- package/skills/executing-plans/SKILL.md +28 -2
- package/skills/executor/SKILL.md +129 -0
- package/skills/finishing-a-development-branch/SKILL.md +13 -0
- package/skills/gemini-cli/SKILL.md +260 -0
- package/skills/humanize/SKILL.md +13 -0
- package/skills/init-pipeline/SKILL.md +76 -78
- package/skills/prepare-next/SKILL.md +81 -10
- package/skills/receiving-code-review/SKILL.md +21 -0
- package/skills/requesting-code-review/SKILL.md +38 -5
- package/skills/reviewer/SKILL.md +423 -0
- package/skills/run-audit/SKILL.md +13 -0
- package/skills/scan-project/SKILL.md +13 -0
- package/skills/self-audit/SKILL.md +197 -16
- package/skills/subagent-driven-development/SKILL.md +38 -2
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
- package/skills/subagent-driven-development/implementer-prompt.md +8 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
- package/skills/tdd/SKILL.md +21 -0
- package/skills/using-git-worktrees/SKILL.md +13 -0
- package/skills/using-skills/SKILL.md +13 -0
- package/skills/verification/SKILL.md +13 -0
- package/skills/wazir/SKILL.md +286 -262
- package/skills/writing-plans/SKILL.md +44 -4
- package/skills/writing-skills/SKILL.md +13 -0
- package/templates/artifacts/implementation-plan.md +3 -0
- package/templates/artifacts/tasks-template.md +133 -0
- package/templates/examples/phase-report.example.json +48 -0
- package/templates/examples/wazir-manifest.example.yaml +1 -1
- package/tooling/src/adapters/composition-engine.js +256 -0
- package/tooling/src/adapters/model-router.js +84 -0
- package/tooling/src/capture/command.js +111 -2
- package/tooling/src/capture/run-config.js +23 -0
- package/tooling/src/capture/store.js +24 -0
- package/tooling/src/capture/usage.js +106 -0
- package/tooling/src/checks/ac-matrix.js +256 -0
- package/tooling/src/checks/brand-truth.js +3 -6
- package/tooling/src/checks/command-registry.js +13 -0
- package/tooling/src/checks/docs-truth.js +1 -1
- package/tooling/src/checks/runtime-surface.js +3 -7
- package/tooling/src/checks/skills.js +111 -0
- package/tooling/src/cli.js +17 -3
- package/tooling/src/commands/stats.js +161 -0
- package/tooling/src/commands/validate.js +5 -1
- package/tooling/src/export/compiler.js +33 -37
- package/tooling/src/gating/agent.js +145 -0
- package/tooling/src/guards/phase-prerequisite-guard.js +127 -0
- package/tooling/src/hooks/routing-logic.js +69 -0
- package/tooling/src/init/auto-detect.js +260 -0
- package/tooling/src/init/command.js +161 -0
- package/tooling/src/input/scanner.js +46 -0
- package/tooling/src/reports/command.js +103 -0
- package/tooling/src/reports/phase-report.js +323 -0
- package/tooling/src/state/command.js +160 -0
- package/tooling/src/state/db.js +287 -0
- package/tooling/src/status/command.js +53 -1
- package/wazir.manifest.yaml +26 -17
- package/workflows/clarify.md +4 -0
- package/workflows/design-review.md +4 -0
- package/workflows/design.md +4 -0
- package/workflows/discover.md +4 -0
- package/workflows/execute.md +4 -0
- package/workflows/plan-review.md +4 -0
- package/workflows/plan.md +4 -0
- package/workflows/spec-challenge.md +4 -0
- package/workflows/specify.md +4 -0
- package/workflows/verify.md +4 -0
|
@@ -0,0 +1,423 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: wz:reviewer
|
|
3
|
+
description: Run the review phase — adversarial review of implementation against the approved spec, plan, and verification evidence.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Reviewer
|
|
7
|
+
|
|
8
|
+
## Model Annotation
|
|
9
|
+
When multi-model mode is enabled:
|
|
10
|
+
- **Sonnet** for internal review passes (internal-review)
|
|
11
|
+
- **Opus** for final review mode (final-review)
|
|
12
|
+
- **Opus** for spec-challenge mode (spec-harden)
|
|
13
|
+
- **Opus** for design-review mode (design)
|
|
14
|
+
|
|
15
|
+
## Command Routing
|
|
16
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
17
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
18
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
19
|
+
- If context-mode is unavailable, fall back to native Bash with a warning
|
|
20
|
+
|
|
21
|
+
## Codebase Exploration
|
|
22
|
+
1. Query `wazir index search-symbols <query>` first
|
|
23
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
24
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
25
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
26
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
27
|
+
|
|
28
|
+
Run the Final Review phase — or any review mode invoked by other phases.
|
|
29
|
+
|
|
30
|
+
The reviewer role owns all review loops across the pipeline: research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions from `docs/reference/review-loop-pattern.md`.
|
|
31
|
+
|
|
32
|
+
**Key principle for `final` mode:** Compare implementation against the **ORIGINAL INPUT** (briefing + input files), NOT the task specs. The executor's per-task reviewer already validated against task specs — that concern is covered. The final reviewer catches drift: does what we built match what the user actually asked for?
|
|
33
|
+
|
|
34
|
+
**Reviewer-owned responsibilities** (callers must NOT replicate these):
|
|
35
|
+
1. **Two-tier review** — internal review first (fast, cheap, expertise-loaded), Codex second (fresh eyes on clean code)
|
|
36
|
+
2. **Dimension selection** — the reviewer selects the correct dimension set for the review mode and depth
|
|
37
|
+
3. **Pass counting** — the reviewer tracks pass numbers and enforces the depth-based cap (quick=3, standard=5, deep=7)
|
|
38
|
+
4. **Finding attribution** — each finding is tagged `[Internal]`, `[Codex]`, `[Gemini]`, or `[Both]` based on source
|
|
39
|
+
5. **Dimension set recording** — each review pass file records which canonical dimension set was used, enabling Phase Scoring (first vs final delta)
|
|
40
|
+
6. **Learning pipeline** — ALL findings (internal + Codex) feed into `state.sqlite` and the learning system
|
|
41
|
+
|
|
42
|
+
## Review Modes
|
|
43
|
+
|
|
44
|
+
The reviewer operates in different modes depending on the phase. Mode MUST be passed explicitly by the caller (`--mode <mode>`). The reviewer does NOT auto-detect mode from artifact availability. If `--mode` is not provided, ask the user which review to run.
|
|
45
|
+
|
|
46
|
+
| Mode | Invoked during | Prerequisites | Dimensions | Output |
|
|
47
|
+
|------|---------------|---------------|------------|--------|
|
|
48
|
+
| `final` | After execution + verification | Completed task artifacts, approved spec/plan/design | 7 final-review dims, scored 0-70 | Scored verdict (PASS / NEEDS MINOR FIXES / NEEDS REWORK / FAIL) |
|
|
49
|
+
| `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Pass/fix loop, no score |
|
|
50
|
+
| `design-review` | After design approval | Design artifact, approved spec | 5 design-review dims (canonical) | Pass/fix loop, no score |
|
|
51
|
+
| `plan-review` | After planning | Draft plan artifact | 7 plan dims | Pass/fix loop, no score |
|
|
52
|
+
| `task-review` | During execution, per task | Uncommitted changes or `--base` SHA | 5 task-execution dims (correctness, tests, wiring, drift, quality) | Pass/fix loop, no score |
|
|
53
|
+
| `research-review` | During discover | Research artifact | 5 research dims | Pass/fix loop, no score |
|
|
54
|
+
| `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Pass/fix loop, no score |
|
|
55
|
+
|
|
56
|
+
Each mode follows the review loop pattern in `docs/reference/review-loop-pattern.md`. Pass counts are fixed by depth (quick=3, standard=5, deep=7). No extension.
|
|
57
|
+
|
|
58
|
+
### CHANGELOG Enforcement
|
|
59
|
+
|
|
60
|
+
In `task-review` and `final` modes, flag missing CHANGELOG entries for user-facing changes as **[warning]** severity. User-facing changes include new features, behavior changes, and bug fixes visible to users. Internal changes (refactors, tooling, tests) do not require CHANGELOG entries.
|
|
61
|
+
|
|
62
|
+
## Prerequisites
|
|
63
|
+
|
|
64
|
+
Prerequisites depend on the review mode:
|
|
65
|
+
|
|
66
|
+
### `final` mode
|
|
67
|
+
|
|
68
|
+
**Phase Prerequisites (Hard Gate):** Before proceeding, verify ALL of these artifacts exist. If ANY is missing, **STOP** and report which are missing.
|
|
69
|
+
|
|
70
|
+
- [ ] `.wazir/runs/latest/clarified/clarification.md`
|
|
71
|
+
- [ ] `.wazir/runs/latest/clarified/spec-hardened.md`
|
|
72
|
+
- [ ] `.wazir/runs/latest/clarified/design.md`
|
|
73
|
+
- [ ] `.wazir/runs/latest/clarified/execution-plan.md`
|
|
74
|
+
- [ ] `.wazir/runs/latest/artifacts/verification-proof.md`
|
|
75
|
+
|
|
76
|
+
If any file is missing:
|
|
77
|
+
|
|
78
|
+
> **Cannot run final review: missing prerequisite artifacts.**
|
|
79
|
+
>
|
|
80
|
+
> Missing: [list missing files]
|
|
81
|
+
>
|
|
82
|
+
> Run `/wazir:clarifier` (for clarified/* files) or `/wazir:executor` (for verification-proof.md) first.
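The hard gate above can be sketched as a small file-existence check. This is a minimal illustration, not the Wazir implementation: `checkFinalPrerequisites` and its injectable `exists` parameter are hypothetical names, and only the artifact paths come from the checklist.

```javascript
// Sketch of the `final` mode hard gate. `checkFinalPrerequisites` is a
// hypothetical helper name; only the paths below come from the checklist.
const fs = require('node:fs');
const path = require('node:path');

const FINAL_PREREQUISITES = [
  '.wazir/runs/latest/clarified/clarification.md',
  '.wazir/runs/latest/clarified/spec-hardened.md',
  '.wazir/runs/latest/clarified/design.md',
  '.wazir/runs/latest/clarified/execution-plan.md',
  '.wazir/runs/latest/artifacts/verification-proof.md',
];

// `exists` is injectable so the gate can be exercised without touching disk.
function checkFinalPrerequisites(projectRoot, exists = fs.existsSync) {
  const missing = FINAL_PREREQUISITES.filter(
    (relPath) => !exists(path.join(projectRoot, relPath)),
  );
  // ok === false means STOP and report `missing` to the user.
  return { ok: missing.length === 0, missing };
}
```

A caller would stop and print the `missing` list whenever `ok` is false, rather than continuing with a partial artifact set.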
|
|
83
|
+
|
|
84
|
+
1. Check `.wazir/runs/latest/artifacts/` has completed task artifacts. If not, tell the user to run `/wazir:executor` first.
|
|
85
|
+
2. Read the approved spec, plan, and design from `.wazir/runs/latest/clarified/`.
|
|
86
|
+
3. Read `.wazir/state/config.json` for depth and multi_tool settings.
|
|
87
|
+
|
|
88
|
+
### `task-review` mode
|
|
89
|
+
1. Uncommitted changes exist for the current task, or a `--base` SHA is provided for committed changes.
|
|
90
|
+
2. Read `.wazir/state/config.json` for depth and multi_tool settings.
|
|
91
|
+
|
|
92
|
+
### `spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes
|
|
93
|
+
1. The appropriate input artifact for the mode exists.
|
|
94
|
+
2. Read `.wazir/state/config.json` for depth and multi_tool settings.
|
|
95
|
+
|
|
96
|
+
## Review Process (`final` mode)
|
|
97
|
+
|
|
98
|
+
**Input:** Read the ORIGINAL user input (`.wazir/input/briefing.md`, `input/` directory files) and compare against what was built. This catches intent drift that task-level review misses.
|
|
99
|
+
|
|
100
|
+
Perform adversarial review across 7 dimensions:
|
|
101
|
+
|
|
102
|
+
1. **Correctness** — Does the code do what the original input asked for?
|
|
103
|
+
2. **Completeness** — Are all requirements from the original input met?
|
|
104
|
+
3. **Wiring** — Are all paths connected end-to-end?
|
|
105
|
+
4. **Verification** — Is there evidence (tests, type checks) for each claim?
|
|
106
|
+
5. **Drift** — Does the implementation match what the user originally requested? (not just the plan — the INPUT)
|
|
107
|
+
6. **Quality** — Code style, naming, error handling, security
|
|
108
|
+
7. **Documentation** — Changelog entries, commit messages, comments
|
|
109
|
+
|
|
110
|
+
## Context Retrieval
|
|
111
|
+
|
|
112
|
+
- Read the diff first (primary input)
|
|
113
|
+
- Use `wazir index search-symbols <name>` to locate related code
|
|
114
|
+
- Use `wazir recall symbol <name-or-id> --tier L1` to check structural alignment
|
|
115
|
+
- Read files directly for: logic errors, missing edge cases, integration concerns
|
|
116
|
+
|
|
117
|
+
## Scoring (`final` mode)
|
|
118
|
+
|
|
119
|
+
Score each dimension 0-10. Total out of 70.
|
|
120
|
+
|
|
121
|
+
| Verdict | Score | Action |
|
|
122
|
+
|---------|-------|--------|
|
|
123
|
+
| **PASS** | 56+ | Ready for PR or merge |
|
|
124
|
+
| **NEEDS MINOR FIXES** | 42-55 | Auto-fix and re-review |
|
|
125
|
+
| **NEEDS REWORK** | 28-41 | Re-run affected tasks |
|
|
126
|
+
| **FAIL** | 0-27 | Fundamental issues |
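The scoring table can be read as a simple threshold lookup. The sketch below assumes seven integer dimension scores, which the source does not state explicitly, and `scoreFinalReview` is a hypothetical helper name rather than part of the Wazir tooling.

```javascript
// Verdict bands copied from the scoring table, ordered highest to lowest.
const VERDICT_BANDS = [
  { min: 56, verdict: 'PASS' },
  { min: 42, verdict: 'NEEDS MINOR FIXES' },
  { min: 28, verdict: 'NEEDS REWORK' },
  { min: 0, verdict: 'FAIL' },
];

// Hypothetical helper: sums 7 dimension scores (assumed integers, 0-10 each)
// and returns the total with the first band whose minimum it reaches.
function scoreFinalReview(dimensionScores) {
  if (dimensionScores.length !== 7) {
    throw new RangeError('final mode expects exactly 7 dimension scores');
  }
  for (const score of dimensionScores) {
    if (!Number.isInteger(score) || score < 0 || score > 10) {
      throw new RangeError(`dimension score out of range: ${score}`);
    }
  }
  const total = dimensionScores.reduce((sum, score) => sum + score, 0);
  const { verdict } = VERDICT_BANDS.find((band) => total >= band.min);
  return { total, verdict };
}
```

For instance, seven scores of 8 total 56, which sits exactly on the PASS boundary.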
|
|
127
|
+
|
|
128
|
+
## Two-Tier Review Flow
|
|
129
|
+
|
|
130
|
+
The review process has two tiers. Internal review catches ~80% of issues quickly and cheaply. Codex review provides fresh eyes on clean code.
|
|
131
|
+
|
|
132
|
+
### Tier 1: Internal Review (Fast, Cheap, Expertise-Loaded)
|
|
133
|
+
|
|
134
|
+
1. **Compose expertise:** Load relevant expertise modules from `expertise/composition-map.yaml` into context based on the review mode and detected stack. This gives the internal reviewer domain-specific knowledge.
|
|
135
|
+
2. **Run internal review** using the dimension set for the current mode. When multi-model is enabled, use **Sonnet** (not Opus) for internal review passes — it's fast and good enough for pattern matching against expertise.
|
|
136
|
+
3. **Produce findings:** Each finding is tagged `[Internal]` with severity (blocking, warning, note).
|
|
137
|
+
4. **Fix cycle:** If blocking findings exist, the executor fixes them. Re-run internal review. Repeat until clean or cap reached.
|
|
138
|
+
|
|
139
|
+
Internal review passes are logged to `.wazir/runs/latest/reviews/<mode>-internal-pass-<N>.md`.
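One way to picture the Tier 1 loop is a capped loop that stops at the first pass with no blocking findings. This is a sketch under assumptions: `runInternalReviewLoop`, `reviewPass`, and `applyFixes` are hypothetical names, and only the depth caps and the log filename pattern come from the text.

```javascript
// Depth-based pass caps as stated in this skill.
const PASS_CAP = { quick: 3, standard: 5, deep: 7 };

// `reviewPass(n)` returns the findings for pass n; `applyFixes(blocking)`
// stands in for the executor's fix step. Both are hypothetical callbacks.
function runInternalReviewLoop({ mode, depth, reviewPass, applyFixes }) {
  const cap = PASS_CAP[depth];
  if (cap === undefined) throw new Error(`unknown depth: ${depth}`);
  const logs = [];
  for (let pass = 1; pass <= cap; pass += 1) {
    const findings = reviewPass(pass);
    logs.push(`.wazir/runs/latest/reviews/${mode}-internal-pass-${pass}.md`);
    const blocking = findings.filter((f) => f.severity === 'blocking');
    if (blocking.length === 0) return { clean: true, passes: pass, logs };
    // Only fix when another pass remains to re-check the result.
    if (pass < cap) applyFixes(blocking);
  }
  return { clean: false, passes: cap, logs };
}
```

Skipping the fix step on the last pass is a design choice of this sketch: a fix that no later pass reviews would go unverified.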
|
|
140
|
+
|
|
141
|
+
### Tier 2: External Review (Fresh Eyes on Clean Code)
|
|
142
|
+
|
|
143
|
+
Only runs AFTER Tier 1 produces a clean pass (no blocking findings).
|
|
144
|
+
|
|
145
|
+
Read `.wazir/state/config.json`. If `multi_tool.tools` includes external reviewers:
|
|
146
|
+
|
|
147
|
+
#### Codex Review
|
|
148
|
+
|
|
149
|
+
**For detailed Codex CLI usage, see `wz:codex-cli` skill.**
|
|
150
|
+
|
|
151
|
+
If `codex` is in `multi_tool.tools`:
|
|
152
|
+
|
|
153
|
+
1. Run Codex review against the current changes:
|
|
154
|
+
```bash
|
|
155
|
+
CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
|
|
156
|
+
CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
|
|
157
|
+
codex review -c model="$CODEX_MODEL" --uncommitted --title "Wazir review: <brief summary>" \
|
|
158
|
+
"Review against these acceptance criteria: <paste criteria from spec>" \
|
|
159
|
+
2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
|
|
160
|
+
```
|
|
161
|
+
Or if changes are committed:
|
|
162
|
+
```bash
|
|
163
|
+
CODEX_MODEL=$(jq -r '.multi_tool.codex.model // empty' .wazir/state/config.json 2>/dev/null)
|
|
164
|
+
CODEX_MODEL=${CODEX_MODEL:-gpt-5.4}
|
|
165
|
+
codex review -c model="$CODEX_MODEL" --base <base-branch> --title "Wazir review: <brief summary>" \
|
|
166
|
+
"Review against these acceptance criteria: <paste criteria from spec>" \
|
|
167
|
+
2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
2. **Extract findings only** (context protection): After tee, use `execute_file` to extract only the final findings from the Codex output (everything after the last `codex` marker). If context-mode is unavailable, use `tac <file> | sed '/^codex$/q' | tac | tail -n +2`. If no marker found, fail closed (0 findings, warn user). See `docs/reference/review-loop-pattern.md` "Codex Output Context Protection" for full protocol.
|
|
171
|
+
3. Incorporate extracted Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
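The extraction rule in step 2 (keep only what follows the last `codex` marker line, and fail closed when no marker exists) can be sketched in JavaScript as a rough equivalent of the `tac`/`sed` fallback. `extractCodexFindings` is a hypothetical name, and the exact marker format is taken from the shell command above rather than from Codex documentation.

```javascript
// Hypothetical helper mirroring `tac <file> | sed '/^codex$/q' | tac | tail -n +2`.
function extractCodexFindings(output) {
  const lines = output.split(/\r?\n/);
  // The marker is a line consisting solely of "codex"; take the last one.
  const markerIndex = lines.lastIndexOf('codex');
  if (markerIndex === -1) {
    // Fail closed: report zero findings and surface a warning to the user.
    return { findings: '', warning: 'no codex marker found; treating as 0 findings' };
  }
  const findings = lines.slice(markerIndex + 1).join('\n').trim();
  return { findings, warning: null };
}
```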
|
|
172
|
+
|
|
173
|
+
**Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use internal review findings only for that pass. Do NOT treat a Codex failure as a clean review. Do NOT skip the pass. The next pass still attempts Codex (transient failures may recover).
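The error-handling rule can be sketched as a pure function over the Codex exit code. `resolveCodexPass` and the `'ok'` status string are assumptions made for illustration; only `codex-unavailable` and the fall-back to internal findings come from the text.

```javascript
// Hypothetical helper: decides what a pass records when Codex may have failed.
function resolveCodexPass({ exitCode, stderr, codexFindings, internalFindings }) {
  if (exitCode !== 0) {
    return {
      codexStatus: 'codex-unavailable', // recorded in the review log
      stderrLog: stderr,                // full stderr is kept, not summarized
      findings: internalFindings,       // internal findings only for this pass
      codexReviewed: false,             // a failure is never a clean Codex review
    };
  }
  return {
    codexStatus: 'ok',
    stderrLog: '',
    findings: [...internalFindings, ...codexFindings],
    codexReviewed: true,
  };
}
```

Because the function keeps no state between calls, the next pass attempts Codex again, which matches the note that transient failures may recover.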
|
|
174
|
+
|
|
175
|
+
**Code review scoping by mode:**
|
|
176
|
+
- Use `--uncommitted` when reviewing uncommitted changes (`task-review` mode).
|
|
177
|
+
- Use `--base <sha>` when reviewing committed changes.
|
|
178
|
+
- Use `codex exec -c model="$CODEX_MODEL"` with stdin pipe for non-code artifacts (`spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes).
|
|
179
|
+
- See `docs/reference/review-loop-pattern.md` for code review scoping rules.
|
|
180
|
+
|
|
181
|
+
#### Gemini Review
|
|
182
|
+
|
|
183
|
+
If `gemini` is in `multi_tool.tools`, follow the same pattern using the Gemini CLI when available. **For detailed Gemini CLI usage, see `wz:gemini-cli` skill.**
|
|
184
|
+
|
|
185
|
+
### Fix Cycle (Codex Findings)
|
|
186
|
+
|
|
187
|
+
If Codex produces blocking findings:
|
|
188
|
+
1. Executor fixes the Codex findings
|
|
189
|
+
2. Re-run internal review (quick pass) to verify fixes didn't introduce regressions
|
|
190
|
+
3. Optionally re-run Codex for a clean pass
|
|
191
|
+
|
|
192
|
+
### Merging Findings
|
|
193
|
+
|
|
194
|
+
The final review report must clearly attribute each finding:
|
|
195
|
+
- `[Internal]` — found by Tier 1 internal review
|
|
196
|
+
- `[Codex]` — found by Tier 2 Codex review
|
|
197
|
+
- `[Gemini]` — found by Tier 2 Gemini review
|
|
198
|
+
- `[Both]` — found independently by multiple sources
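A minimal sketch of the attribution step follows. It assumes two findings are "the same" when their descriptions match after trimming and lowercasing, which is a simplification; the source does not say how duplicates are detected. `attributeFindings` is a hypothetical name.

```javascript
// Attribution tags as listed above.
const SOURCE_TAGS = { internal: '[Internal]', codex: '[Codex]', gemini: '[Gemini]' };

// Hypothetical helper: merges findings reported by several sources and tags
// each merged finding with its source, or `[Both]` when more than one found it.
function attributeFindings(findings) {
  const byKey = new Map();
  for (const { source, description, severity } of findings) {
    const key = description.trim().toLowerCase(); // assumed matching rule
    const entry = byKey.get(key) ?? { description: description.trim(), severity, sources: new Set() };
    entry.sources.add(source);
    byKey.set(key, entry);
  }
  return [...byKey.values()].map(({ description, severity, sources }) => ({
    description,
    severity,
    attribution: sources.size > 1 ? '[Both]' : SOURCE_TAGS[[...sources][0]],
  }));
}
```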
|
|
199
|
+
|
|
200
|
+
### Finding Persistence (Learning Pipeline)
|
|
201
|
+
|
|
202
|
+
ALL findings from both tiers are persisted to `state.sqlite` for cross-run learning:
|
|
203
|
+
|
|
204
|
+
```javascript
|
|
205
|
+
// After each review pass
|
|
206
|
+
// Assumes `openStateDb` is exported by the same module as the other helpers.
const { openStateDb, insertFinding, getRecurringFindingHashes } = require('tooling/src/state/db');
|
|
207
|
+
const db = openStateDb(stateRoot);
|
|
208
|
+
|
|
209
|
+
for (const finding of allFindings) {
|
|
210
|
+
insertFinding(db, {
|
|
211
|
+
run_id: runId,
|
|
212
|
+
phase: reviewMode,
|
|
213
|
+
source: finding.attribution, // 'internal', 'codex', 'gemini'
|
|
214
|
+
severity: finding.severity,
|
|
215
|
+
description: finding.description,
|
|
216
|
+
finding_hash: hashFinding(finding.description),
|
|
217
|
+
});
|
|
218
|
+
}
|
|
219
|
+
|
|
220
|
+
// Check for recurring patterns
|
|
221
|
+
const recurring = getRecurringFindingHashes(db, 2);
|
|
222
|
+
// Recurring findings → auto-propose as learnings in the learn phase
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
This is how Wazir evolves — findings that recur across runs become accepted learnings injected into future executor context, preventing the same mistakes.
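The snippet above calls `hashFinding` without defining it. One possible shape, shown purely as an assumption, is a SHA-256 digest over a normalized description so that trivially different wordings of the same finding share a hash across runs. The real implementation in the Wazir tooling may differ.

```javascript
// Hypothetical `hashFinding`: normalizes case and whitespace, then hashes.
const { createHash } = require('node:crypto');

function hashFinding(description) {
  const normalized = description.trim().toLowerCase().replace(/\s+/g, ' ');
  return createHash('sha256').update(normalized, 'utf8').digest('hex');
}
```

Normalizing before hashing is what lets `getRecurringFindingHashes` count a finding as recurring even when its capitalization or spacing differs between passes.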
|
|
226
|
+
|
|
227
|
+
## Task-Review Log Filenames
|
|
228
|
+
|
|
229
|
+
In `task-review` mode, use task-scoped log filenames and cap tracking:
|
|
230
|
+
- Log filenames: `.wazir/runs/latest/reviews/execute-task-<NNN>-review-pass-<N>.md`
|
|
231
|
+
- Cap tracking: `wazir capture loop-check --task-id <NNN>` (each task has its own independent cap counter)
|
|
232
|
+
|
|
233
|
+
## Output
|
|
234
|
+
|
|
235
|
+
Save review results to `.wazir/runs/latest/reviews/review.md` with:
|
|
236
|
+
- Findings with severity (blocking, warning, note)
|
|
237
|
+
- Rationale tied to evidence
|
|
238
|
+
- Score breakdown
|
|
239
|
+
- Verdict
|
|
240
|
+
|
|
241
|
+
## Phase Report Generation
|
|
242
|
+
|
|
243
|
+
After completing any review pass, generate a phase report following `schemas/phase-report.schema.json`:
|
|
244
|
+
|
|
245
|
+
1. **`attempted_actions`** — Populate from the review findings. Each finding becomes an action entry:
|
|
246
|
+
- `description`: the finding summary
|
|
247
|
+
- `outcome`: `"success"` if the finding passed, `"fail"` if it is a blocking issue, `"uncertain"` if ambiguous
|
|
248
|
+
- `evidence`: the rationale or evidence supporting the outcome
|
|
249
|
+
|
|
250
|
+
2. **`drift_analysis`** — Compare review findings against the approved spec:
|
|
251
|
+
- `delta`: count of deviations between implementation and spec (0 = no drift)
|
|
252
|
+
- `description`: summary of any drift detected and its impact
|
|
253
|
+
|
|
254
|
+
3. **`quality_metrics`** — Populate from test, lint, and type-check results gathered during review:
|
|
255
|
+
- `test_pass_count`, `test_fail_count`: from test runner output
|
|
256
|
+
- `lint_errors`: from linter output
|
|
257
|
+
- `type_errors`: from type checker output
|
|
258
|
+
|
|
259
|
+
4. **`risk_flags`** — Populate from any high-severity findings:
|
|
260
|
+
- `severity`: `"low"`, `"medium"`, or `"high"`
|
|
261
|
+
- `description`: what the risk is
|
|
262
|
+
- `mitigation`: recommended mitigation (if known)
|
|
263
|
+
|
|
264
|
+
5. **`decisions`** — Populate from any scope or approach decisions made during the review:
|
|
265
|
+
- `description`: what was decided
|
|
266
|
+
- `rationale`: why
|
|
267
|
+
- `alternatives_considered`: other options evaluated (optional)
|
|
268
|
+
- `source`: `"[Wazir]"`, `"[Codex]"`, or `"[Both]"` (optional)
|
|
269
|
+
|
|
270
|
+
6. **`verdict_recommendation`** — Set based on the gating rules in `config/gating-rules.yaml`:
|
|
271
|
+
- `verdict`: `"continue"` (PASS), `"loop_back"` (NEEDS MINOR FIXES / NEEDS REWORK), or `"escalate"` (FAIL with fundamental issues)
|
|
272
|
+
- `reasoning`: brief explanation of why this verdict was chosen
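The verdict mapping in item 6 can be sketched as a lookup table. The mapping itself is taken from the bullet above; the authoritative rules live in `config/gating-rules.yaml`, which is not shown here, and `toVerdictRecommendation` is a hypothetical helper name.

```javascript
// Review verdict → gating verdict, as described in item 6.
const GATING_VERDICT = {
  'PASS': 'continue',
  'NEEDS MINOR FIXES': 'loop_back',
  'NEEDS REWORK': 'loop_back',
  'FAIL': 'escalate',
};

// Hypothetical helper that builds the `verdict_recommendation` object.
function toVerdictRecommendation(reviewVerdict, reasoning) {
  const verdict = GATING_VERDICT[reviewVerdict];
  if (verdict === undefined) {
    throw new Error(`unknown review verdict: ${reviewVerdict}`);
  }
  return { verdict, reasoning };
}
```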
|
|
273
|
+
|
|
274
|
+
### Report Output Paths
|
|
275
|
+
|
|
276
|
+
Save reports in two formats under the run directory:
|
|
277
|
+
- `.wazir/runs/<id>/reports/phase-<name>-report.json` — machine-readable, validated against `schemas/phase-report.schema.json`
|
|
278
|
+
- `.wazir/runs/<id>/reports/phase-<name>-report.md` — human-readable Markdown summary
|
|
279
|
+
|
|
280
|
+
The gating agent (`tooling/src/gating/agent.js`) consumes the JSON report to decide: **continue**, **loop_back**, or **escalate**.
|
|
281
|
+
|
|
282
|
+
### Report Fields Reference
|
|
283
|
+
|
|
284
|
+
Top-level fields per `schemas/phase-report.schema.json` (the Required column marks which are mandatory):
|
|
285
|
+
|
|
286
|
+
| Field | Type | Required | Description |
|
|
287
|
+
|-------|------|----------|-------------|
|
|
288
|
+
| `phase_name` | string | yes | Review mode name (e.g., `"final"`, `"task-review"`) |
|
|
289
|
+
| `run_id` | string | yes | Current run identifier |
|
|
290
|
+
| `timestamp` | string (date-time) | yes | ISO 8601 timestamp of report generation |
|
|
291
|
+
| `attempted_actions` | array | yes | Findings mapped to action outcomes |
|
|
292
|
+
| `drift_analysis` | object | yes | Spec-vs-implementation drift summary |
|
|
293
|
+
| `quality_metrics` | object | yes | Test/lint/type results |
|
|
294
|
+
| `risk_flags` | array | yes | High-severity risk items |
|
|
295
|
+
| `decisions` | array | yes | Scope/approach decisions made |
|
|
296
|
+
| `verdict_recommendation` | object | no | Gating verdict based on `config/gating-rules.yaml` |
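As an illustration of the field table, the sketch below builds a report object and checks that the fields marked required are present. All values are made up for the example, `missingRequiredFields` is a hypothetical helper, and real validation should go through `schemas/phase-report.schema.json` rather than this presence check.

```javascript
// Fields marked "yes" in the Required column above.
const REQUIRED_FIELDS = [
  'phase_name', 'run_id', 'timestamp', 'attempted_actions',
  'drift_analysis', 'quality_metrics', 'risk_flags', 'decisions',
];

// Hypothetical presence check; it does not validate types or nested shapes.
function missingRequiredFields(report) {
  return REQUIRED_FIELDS.filter((field) => !(field in report));
}

// Illustrative report; every value here is invented for the example.
const exampleReport = {
  phase_name: 'final',
  run_id: 'run-example',
  timestamp: '2026-03-19T00:00:00.000Z',
  attempted_actions: [
    { description: 'All spec requirements implemented', outcome: 'success', evidence: 'tests pass' },
  ],
  drift_analysis: { delta: 0, description: 'No drift detected' },
  quality_metrics: { test_pass_count: 12, test_fail_count: 0, lint_errors: 0, type_errors: 0 },
  risk_flags: [],
  decisions: [],
};
```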
|
|
297
|
+
|
|
298
|
+
## Post-Review: Learn (final mode only)
|
|
299
|
+
|
|
300
|
+
After the final review verdict, extract durable learnings using the **learner role** (`roles/learner.md`).
|
|
301
|
+
|
|
302
|
+
### Step 1: Gather all findings
|
|
303
|
+
|
|
304
|
+
Collect review findings from ALL sources in this run:
|
|
305
|
+
- `.wazir/runs/<run-id>/reviews/` — all review pass logs (task-review, final review)
|
|
306
|
+
- Codex findings (attributed `[Codex]` or `[Both]`)
|
|
307
|
+
- Self-audit findings (if `run_audit` was enabled)
|
|
308
|
+
|
|
309
|
+
### Step 2: Identify learning candidates
|
|
310
|
+
|
|
311
|
+
A finding becomes a learning candidate if:
|
|
312
|
+
- It recurred across 2+ review passes within this run (same issue found repeatedly)
|
|
313
|
+
- It matches a finding from a prior run (check `memory/learnings/proposed/` and `accepted/` for similar patterns)
|
|
314
|
+
- It represents a class of mistake, not just a single instance (e.g., "missing error handling in async functions" vs "missing try-catch on line 42")
|
|
315
|
+
|
|
316
|
+
### Step 3: Write learning proposals
|
|
317
|
+
|
|
318
|
+
For each candidate, write a proposal to `memory/learnings/proposed/<run-id>-<NNN>.md`:
|
|
319
|
+
|
|
320
|
+
```markdown
|
|
321
|
+
---
|
|
322
|
+
artifact_type: proposed_learning
|
|
323
|
+
phase: learn
|
|
324
|
+
role: learner
|
|
325
|
+
run_id: <run-id>
|
|
326
|
+
status: proposed
|
|
327
|
+
sources:
|
|
328
|
+
- <review-file-1>
|
|
329
|
+
- <review-file-2>
|
|
330
|
+
approval_status: required
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
# Proposed Learning: <title>
|
|
334
|
+
|
|
335
|
+
## Scope
|
|
336
|
+
- **Roles:** [which roles should receive this learning — e.g., executor, reviewer]
|
|
337
|
+
- **Stacks:** [which tech stacks — e.g., node, react, or "all"]
|
|
338
|
+
- **Concerns:** [which concerns — e.g., error-handling, testing, security]
|
|
339
|
+
|
|
340
|
+
## Evidence
|
|
341
|
+
- [finding from review pass N: description]
|
|
342
|
+
- [finding from review pass M: same pattern]
|
|
343
|
+
- [optional: similar finding from prior run <run-id>]
|
|
344
|
+
|
|
345
|
+
## Learning
|
|
346
|
+
[The concrete, actionable instruction that should be injected into future executor context]
|
|
347
|
+
|
|
348
|
+
## Expected Benefit
|
|
349
|
+
[What this prevents in future runs]
|
|
350
|
+
|
|
351
|
+
## Confidence
|
|
352
|
+
- **Level:** low | medium | high
|
|
353
|
+
- **Basis:** [single run observation | multi-run recurrence | user correction]
|
|
354
|
+
```
|
|
355
|
+
|
|
356
|
+
### Step 4: Report
|
|
357
|
+
|
|
358
|
+
Present proposed learnings to the user:
|
|
359
|
+
|
|
360
|
+
> **Learnings proposed:** [count]
|
|
361
|
+
> - [title 1] (confidence: high, scope: executor/node)
|
|
362
|
+
> - [title 2] (confidence: medium, scope: reviewer/all)
|
|
363
|
+
>
|
|
364
|
+
> Proposals saved to `memory/learnings/proposed/`. Review and accept with `/wazir audit learnings`.
|
|
365
|
+
|
|
366
|
+
Learnings are NEVER auto-applied. They require explicit user acceptance before being injected into future runs.
|
|
367
|
+
|
|
368
|
+
## Post-Review: Prepare Next (final mode only)
|
|
369
|
+
|
|
370
|
+
After learning extraction, invoke the `prepare-next` skill to prepare the handoff:
|
|
371
|
+
|
|
372
|
+
### Handoff document
|
|
373
|
+
|
|
374
|
+
Write to `.wazir/runs/<run-id>/handoff.md`:
|
|
375
|
+
|
|
376
|
+
```markdown
|
|
377
|
+
# Handoff — <run-id>
|
|
378
|
+
|
|
379
|
+
**Status:** [Completed | Partial]
|
|
380
|
+
**Branch:** <branch-name>
|
|
381
|
+
**Date:** YYYY-MM-DD
|
|
382
|
+
|
|
383
|
+
## What Was Done
|
|
384
|
+
[List of completed tasks with commit hashes]
|
|
385
|
+
|
|
386
|
+
## Test Results
|
|
387
|
+
[Test count, pass/fail, validator status]
|
|
388
|
+
|
|
389
|
+
## Review Score
|
|
390
|
+
[Final review verdict and score]
|
|
391
|
+
|
|
392
|
+
## What's Next
|
|
393
|
+
[Pending items, deferred work, follow-up tasks]
|
|
394
|
+
|
|
395
|
+
## Open Bugs
|
|
396
|
+
[Any known issues discovered during this run]
|
|
397
|
+
|
|
398
|
+
## Learnings From This Run
|
|
399
|
+
[Key insights — what worked, what didn't, what to change]
|
|
400
|
+
```
|
|
401
|
+
|
|
402
|
+
### Cleanup
|
|
403
|
+
|
|
404
|
+
- Archive verbose intermediate review logs (compress to summary)
|
|
405
|
+
- Update `.wazir/runs/latest` symlink if creating a new run
|
|
406
|
+
- Do NOT mutate `input/` — it belongs to the user
|
|
407
|
+
- Do NOT auto-load proposed learnings into the next run
|
|
408
|
+
|
|
409
|
+
## Done
|
|
410
|
+
|
|
411
|
+
Present the verdict and offer next steps:
|
|
412
|
+
|
|
413
|
+
> **Review complete: [VERDICT] ([score]/70)**
|
|
414
|
+
>
|
|
415
|
+
> [Score breakdown and findings summary]
|
|
416
|
+
>
|
|
417
|
+
> **Learnings proposed:** [count] (see `memory/learnings/proposed/`)
|
|
418
|
+
> **Handoff:** `.wazir/runs/<run-id>/handoff.md`
|
|
419
|
+
>
|
|
420
|
+
> **What would you like to do?**
|
|
421
|
+
> 1. **Create a PR** (if PASS)
|
|
422
|
+
> 2. **Auto-fix and re-review** (if MINOR FIXES)
|
|
423
|
+
> 3. **Review findings in detail**
|
|
@@ -5,6 +5,19 @@ description: Run a structured audit on your codebase — security, code quality,
|
|
|
5
5
|
|
|
6
6
|
# Run Audit — Structured Codebase Audit Pipeline
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode is unavailable, fall back to native Bash with a warning
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
20
|
+
|
|
8
21
|
## Overview
|
|
9
22
|
|
|
10
23
|
This skill runs a structured audit on your codebase. It collects three parameters interactively (audit type, scope, output mode), then feeds them through the pipeline: Research → Audit → Report or Plan.
|
|
@@ -5,6 +5,19 @@ description: Build a project profile from manifests, docs, tests, and `input/` s
|
|
|
5
5
|
|
|
6
6
|
# Scan Project
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode is unavailable, fall back to native Bash with a warning
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
20
|
+
|
|
8
21
|
Inspect the smallest set of repo surfaces needed to answer:
|
|
9
22
|
|
|
10
23
|
- what kind of project this is
|