@wazir-dev/cli 1.1.0 → 1.2.0
- package/CHANGELOG.md +73 -4
- package/README.md +6 -6
- package/docs/concepts/architecture.md +1 -1
- package/docs/concepts/roles-and-workflows.md +2 -0
- package/docs/concepts/why-wazir.md +59 -0
- package/docs/decisions/2026-03-19-deferred-items.md +564 -0
- package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
- package/docs/readmes/INDEX.md +21 -5
- package/docs/readmes/features/expertise/README.md +2 -2
- package/docs/readmes/features/exports/README.md +2 -2
- package/docs/readmes/features/schemas/README.md +3 -0
- package/docs/readmes/features/skills/README.md +17 -0
- package/docs/readmes/features/skills/clarifier.md +5 -0
- package/docs/readmes/features/skills/claude-cli.md +5 -0
- package/docs/readmes/features/skills/codex-cli.md +5 -0
- package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
- package/docs/readmes/features/skills/executing-plans.md +5 -0
- package/docs/readmes/features/skills/executor.md +5 -0
- package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
- package/docs/readmes/features/skills/gemini-cli.md +5 -0
- package/docs/readmes/features/skills/humanize.md +5 -0
- package/docs/readmes/features/skills/init-pipeline.md +5 -0
- package/docs/readmes/features/skills/receiving-code-review.md +5 -0
- package/docs/readmes/features/skills/requesting-code-review.md +5 -0
- package/docs/readmes/features/skills/reviewer.md +5 -0
- package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
- package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
- package/docs/readmes/features/skills/wazir.md +5 -0
- package/docs/readmes/features/skills/writing-skills.md +5 -0
- package/docs/readmes/features/workflows/prepare-next.md +1 -1
- package/docs/reference/configuration-reference.md +47 -6
- package/docs/reference/launch-checklist.md +4 -4
- package/docs/reference/review-loop-pattern.md +117 -8
- package/docs/reference/roles-reference.md +1 -0
- package/docs/reference/skill-tiers.md +147 -0
- package/docs/reference/tooling-cli.md +3 -1
- package/docs/truth-claims.yaml +12 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +97 -1
- package/exports/hosts/claude/.claude/settings.json +9 -0
- package/exports/hosts/claude/CLAUDE.md +1 -1
- package/exports/hosts/claude/export.manifest.json +4 -2
- package/exports/hosts/claude/host-package.json +3 -1
- package/exports/hosts/codex/AGENTS.md +1 -1
- package/exports/hosts/codex/export.manifest.json +4 -2
- package/exports/hosts/codex/host-package.json +3 -1
- package/exports/hosts/cursor/.cursor/hooks.json +4 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
- package/exports/hosts/cursor/export.manifest.json +4 -2
- package/exports/hosts/cursor/host-package.json +3 -1
- package/exports/hosts/gemini/GEMINI.md +1 -1
- package/exports/hosts/gemini/export.manifest.json +4 -2
- package/exports/hosts/gemini/host-package.json +3 -1
- package/hooks/context-mode-router +191 -0
- package/hooks/definitions/context_mode_router.yaml +19 -0
- package/hooks/hooks.json +31 -6
- package/hooks/protected-path-write-guard +8 -0
- package/hooks/routing-matrix.json +45 -0
- package/hooks/session-start +62 -1
- package/llms-full.txt +905 -132
- package/package.json +2 -3
- package/schemas/hook.schema.json +2 -1
- package/schemas/phase-report.schema.json +80 -0
- package/schemas/usage.schema.json +25 -1
- package/schemas/wazir-manifest.schema.json +19 -0
- package/skills/brainstorming/SKILL.md +18 -155
- package/skills/clarifier/SKILL.md +122 -98
- package/skills/claude-cli/SKILL.md +320 -0
- package/skills/codex-cli/SKILL.md +260 -0
- package/skills/debugging/SKILL.md +13 -0
- package/skills/design/SKILL.md +13 -0
- package/skills/dispatching-parallel-agents/SKILL.md +13 -0
- package/skills/executing-plans/SKILL.md +13 -0
- package/skills/executor/SKILL.md +72 -19
- package/skills/finishing-a-development-branch/SKILL.md +13 -0
- package/skills/gemini-cli/SKILL.md +260 -0
- package/skills/humanize/SKILL.md +13 -0
- package/skills/init-pipeline/SKILL.md +73 -164
- package/skills/prepare-next/SKILL.md +81 -10
- package/skills/receiving-code-review/SKILL.md +13 -0
- package/skills/requesting-code-review/SKILL.md +13 -0
- package/skills/reviewer/SKILL.md +287 -15
- package/skills/run-audit/SKILL.md +13 -0
- package/skills/scan-project/SKILL.md +13 -0
- package/skills/self-audit/SKILL.md +197 -16
- package/skills/subagent-driven-development/SKILL.md +13 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
- package/skills/subagent-driven-development/implementer-prompt.md +8 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
- package/skills/tdd/SKILL.md +13 -0
- package/skills/using-git-worktrees/SKILL.md +13 -0
- package/skills/using-skills/SKILL.md +13 -0
- package/skills/verification/SKILL.md +13 -0
- package/skills/wazir/SKILL.md +194 -377
- package/skills/writing-plans/SKILL.md +14 -1
- package/skills/writing-skills/SKILL.md +13 -0
- package/templates/artifacts/implementation-plan.md +3 -0
- package/templates/artifacts/tasks-template.md +133 -0
- package/templates/examples/phase-report.example.json +48 -0
- package/tooling/src/adapters/composition-engine.js +256 -0
- package/tooling/src/adapters/model-router.js +84 -0
- package/tooling/src/capture/command.js +24 -1
- package/tooling/src/capture/run-config.js +3 -1
- package/tooling/src/capture/store.js +24 -0
- package/tooling/src/capture/usage.js +106 -0
- package/tooling/src/checks/ac-matrix.js +256 -0
- package/tooling/src/checks/command-registry.js +12 -0
- package/tooling/src/checks/docs-truth.js +1 -1
- package/tooling/src/checks/skills.js +111 -0
- package/tooling/src/cli.js +9 -0
- package/tooling/src/commands/stats.js +161 -0
- package/tooling/src/commands/validate.js +5 -1
- package/tooling/src/export/compiler.js +33 -37
- package/tooling/src/gating/agent.js +145 -0
- package/tooling/src/guards/phase-prerequisite-guard.js +127 -0
- package/tooling/src/hooks/routing-logic.js +69 -0
- package/tooling/src/init/auto-detect.js +260 -0
- package/tooling/src/init/command.js +95 -135
- package/tooling/src/input/scanner.js +46 -0
- package/tooling/src/reports/command.js +103 -0
- package/tooling/src/reports/phase-report.js +323 -0
- package/tooling/src/state/command.js +160 -0
- package/tooling/src/state/db.js +287 -0
- package/tooling/src/status/command.js +53 -1
- package/wazir.manifest.yaml +26 -14
@@ -5,6 +5,19 @@ description: Use when completing tasks, implementing major features, or before m
 
 # Requesting Code Review
 
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
 Dispatch wz:code-reviewer subagent to catch issues before they cascade. The reviewer gets precisely crafted context for evaluation — never your session's history. This keeps the reviewer focused on the work product, not your thought process, and preserves your own context for continued work.
 
 **Core principle:** Review early, review often. Review follows the loop pattern in `docs/reference/review-loop-pattern.md`. Dispatch the reviewer with explicit `--mode` and depth-aware loop parameters.
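The Command Routing rules added in this hunk can be sketched as a small classifier. This is only an illustration under stated assumptions: the format of `hooks/routing-matrix.json` is not shown in this diff, so the `LARGE_PATTERNS` list, the `routeCommand` name, and the return shape are all hypothetical.

```javascript
// Hypothetical patterns for "large" commands (test runners, builds, diffs,
// dependency trees, linting). The real matrix lives in
// hooks/routing-matrix.json, whose format is not visible in this diff.
const LARGE_PATTERNS = [
  /^npm (test|run build|ls)\b/,
  /^git diff\b/,
  /^eslint\b/,
];

// Decide where a command should run, following the three routing rules.
function routeCommand(command, contextModeAvailable) {
  const isLarge = LARGE_PATTERNS.some((pattern) => pattern.test(command));
  if (!isLarge) {
    return { target: 'native-bash', warning: null };
  }
  if (contextModeAvailable) {
    return { target: 'context-mode', warning: null };
  }
  // Fallback: still run the command natively, but surface a warning.
  return {
    target: 'native-bash',
    warning: `context-mode unavailable; running "${command}" in native Bash`,
  };
}
```

For instance, `routeCommand('git status', true)` stays on native Bash, while `routeCommand('npm test', false)` falls back to native Bash and carries a warning string.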
package/skills/reviewer/SKILL.md
CHANGED
@@ -5,10 +5,40 @@ description: Run the review phase — adversarial review of implementation again
 
 # Reviewer
 
-
+## Model Annotation
+When multi-model mode is enabled:
+- **Sonnet** for internal review passes (internal-review)
+- **Opus** for final review mode (final-review)
+- **Opus** for spec-challenge mode (spec-harden)
+- **Opus** for design-review mode (design)
+
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
+Run the Final Review phase — or any review mode invoked by other phases.
 
 The reviewer role owns all review loops across the pipeline: research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions from `docs/reference/review-loop-pattern.md`.
 
+**Key principle for `final` mode:** Compare implementation against the **ORIGINAL INPUT** (briefing + input files), NOT the task specs. The executor's per-task reviewer already validated against task specs — that concern is covered. The final reviewer catches drift: does what we built match what the user actually asked for?
+
+**Reviewer-owned responsibilities** (callers must NOT replicate these):
+1. **Two-tier review** — internal review first (fast, cheap, expertise-loaded), Codex second (fresh eyes on clean code)
+2. **Dimension selection** — the reviewer selects the correct dimension set for the review mode and depth
+3. **Pass counting** — the reviewer tracks pass numbers and enforces the depth-based cap (quick=3, standard=5, deep=7)
+4. **Finding attribution** — each finding is tagged `[Internal]`, `[Codex]`, or `[Both]` based on source
+5. **Dimension set recording** — each review pass file records which canonical dimension set was used, enabling Phase Scoring (first vs final delta)
+6. **Learning pipeline** — ALL findings (internal + Codex) feed into `state.sqlite` and the learning system
+
 ## Review Modes
 
 The reviewer operates in different modes depending on the phase. Mode MUST be passed explicitly by the caller (`--mode <mode>`). The reviewer does NOT auto-detect mode from artifact availability. If `--mode` is not provided, ask the user which review to run.
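The depth-based pass cap and the model annotations introduced in this hunk could be expressed as two lookup tables. A minimal sketch, assuming hypothetical helper names (`passCap`, `mayRunPass`, `modelFor`); only the cap values and the annotation-to-model pairs come from the diff itself.

```javascript
// Depth-based pass caps as stated under "Pass counting".
const PASS_CAPS = new Map([
  ['quick', 3],
  ['standard', 5],
  ['deep', 7],
]);

// Annotation-to-model pairs from the "Model Annotation" section. They only
// apply when multi-model mode is enabled.
const MODEL_BY_ANNOTATION = new Map([
  ['internal-review', 'Sonnet'],
  ['final-review', 'Opus'],
  ['spec-harden', 'Opus'],
  ['design', 'Opus'],
]);

function passCap(depth) {
  if (!PASS_CAPS.has(depth)) throw new Error(`unknown depth: ${depth}`);
  return PASS_CAPS.get(depth);
}

// Returns true while another pass is still allowed (passNumber is 1-based).
function mayRunPass(depth, passNumber) {
  return passNumber <= passCap(depth);
}

// Returns the annotated model, or null when no annotation applies.
function modelFor(annotation, multiModelEnabled) {
  if (!multiModelEnabled) return null;
  return MODEL_BY_ANNOTATION.get(annotation) ?? null;
}
```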
@@ -34,6 +64,23 @@ In `task-review` and `final` modes, flag missing CHANGELOG entries for user-faci
 Prerequisites depend on the review mode:
 
 ### `final` mode
+
+**Phase Prerequisites (Hard Gate):** Before proceeding, verify ALL of these artifacts exist. If ANY is missing, **STOP** and report which are missing.
+
+- [ ] `.wazir/runs/latest/clarified/clarification.md`
+- [ ] `.wazir/runs/latest/clarified/spec-hardened.md`
+- [ ] `.wazir/runs/latest/clarified/design.md`
+- [ ] `.wazir/runs/latest/clarified/execution-plan.md`
+- [ ] `.wazir/runs/latest/artifacts/verification-proof.md`
+
+If any file is missing:
+
+> **Cannot run final review: missing prerequisite artifacts.**
+>
+> Missing: [list missing files]
+>
+> Run `/wazir:clarifier` (for clarified/* files) or `/wazir:executor` (for verification-proof.md) first.
+
 1. Check `.wazir/runs/latest/artifacts/` has completed task artifacts. If not, tell the user to run `/wazir:executor` first.
 2. Read the approved spec, plan, and design from `.wazir/runs/latest/clarified/`.
 3. Read `.wazir/state/config.json` for depth and multi_tool settings.
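The hard gate for `final` mode amounts to an existence check over five paths. The sketch below is not the package's `tooling/src/guards/phase-prerequisite-guard.js`, whose contents are not shown here; `missingPrerequisites` and `finalReviewGate` are hypothetical names, and only the artifact paths and the stop message are taken from the hunk.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Artifact paths copied from the prerequisite checklist.
const FINAL_PREREQUISITES = [
  '.wazir/runs/latest/clarified/clarification.md',
  '.wazir/runs/latest/clarified/spec-hardened.md',
  '.wazir/runs/latest/clarified/design.md',
  '.wazir/runs/latest/clarified/execution-plan.md',
  '.wazir/runs/latest/artifacts/verification-proof.md',
];

// Hypothetical helper: list the prerequisites missing under projectRoot.
function missingPrerequisites(projectRoot) {
  return FINAL_PREREQUISITES.filter(
    (relativePath) => !fs.existsSync(path.join(projectRoot, relativePath)),
  );
}

// Hypothetical helper: return the stop message, or null when the gate passes.
function finalReviewGate(projectRoot) {
  const missing = missingPrerequisites(projectRoot);
  if (missing.length === 0) return null;
  return [
    'Cannot run final review: missing prerequisite artifacts.',
    `Missing: ${missing.join(', ')}`,
  ].join('\n');
}
```

A caller would stop the review whenever `finalReviewGate(process.cwd())` returns a non-null message.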
@@ -48,13 +95,15 @@ Prerequisites depend on the review mode:
 
 ## Review Process (`final` mode)
 
+**Input:** Read the ORIGINAL user input (`.wazir/input/briefing.md`, `input/` directory files) and compare against what was built. This catches intent drift that task-level review misses.
+
 Perform adversarial review across 7 dimensions:
 
-1. **Correctness** — Does the code do what the
-2. **Completeness** — Are all
+1. **Correctness** — Does the code do what the original input asked for?
+2. **Completeness** — Are all requirements from the original input met?
 3. **Wiring** — Are all paths connected end-to-end?
 4. **Verification** — Is there evidence (tests, type checks) for each claim?
-5. **Drift** — Does the implementation match the
+5. **Drift** — Does the implementation match what the user originally requested? (not just the plan — the INPUT)
 6. **Quality** — Code style, naming, error handling, security
 7. **Documentation** — Changelog entries, commit messages, comments
 
@@ -76,11 +125,28 @@ Score each dimension 0-10. Total out of 70.
 | **NEEDS REWORK** | 28-41 | Re-run affected tasks |
 | **FAIL** | 0-27 | Fundamental issues |
 
-##
+## Two-Tier Review Flow
+
+The review process has two tiers. Internal review catches ~80% of issues quickly and cheaply. Codex review provides fresh eyes on clean code.
+
+### Tier 1: Internal Review (Fast, Cheap, Expertise-Loaded)
+
+1. **Compose expertise:** Load relevant expertise modules from `expertise/composition-map.yaml` into context based on the review mode and detected stack. This gives the internal reviewer domain-specific knowledge.
+2. **Run internal review** using the dimension set for the current mode. When multi-model is enabled, use **Sonnet** (not Opus) for internal review passes — it's fast and good enough for pattern matching against expertise.
+3. **Produce findings:** Each finding is tagged `[Internal]` with severity (blocking, warning, note).
+4. **Fix cycle:** If blocking findings exist, the executor fixes them. Re-run internal review. Repeat until clean or cap reached.
+
+Internal review passes are logged to `.wazir/runs/latest/reviews/<mode>-internal-pass-<N>.md`.
 
-
+### Tier 2: External Review (Fresh Eyes on Clean Code)
 
-
+Only runs AFTER Tier 1 produces a clean pass (no blocking findings).
+
+Read `.wazir/state/config.json`. If `multi_tool.tools` includes external reviewers:
+
+#### Codex Review
+
+**For detailed Codex CLI usage, see `wz:codex-cli` skill.**
 
 If `codex` is in `multi_tool.tools`:
 
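The ordering constraint in the Two-Tier Review Flow (repeat Tier 1 until clean or capped, and run Tier 2 only after a clean Tier 1 pass) can be sketched as a small driver. Everything here is an assumption for illustration: `twoTierReview` and the caller-supplied `runInternal`, `runExternal`, and `applyFixes` functions are hypothetical, and findings are assumed to be objects with a `severity` field.

```javascript
// Hypothetical driver for the two-tier flow. Each review function returns an
// array of findings shaped like { severity, description }.
function twoTierReview({ cap, runInternal, runExternal, applyFixes }) {
  const hasBlocking = (findings) =>
    findings.some((finding) => finding.severity === 'blocking');

  // Tier 1: repeat internal review until a pass has no blocking findings
  // or the depth-based cap is reached.
  let internalClean = false;
  let passes = 0;
  while (passes < cap && !internalClean) {
    passes += 1;
    const findings = runInternal(passes);
    if (hasBlocking(findings)) {
      applyFixes(findings);
    } else {
      internalClean = true;
    }
  }

  // Tier 2 runs only after Tier 1 produced a clean pass.
  const external = internalClean ? runExternal() : null;
  return { passes, internalClean, external };
}
```

When the cap is hit while blocking findings remain, the sketch returns `external: null` and leaves the escalation decision to the caller.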
@@ -101,10 +167,10 @@ If `codex` is in `multi_tool.tools`:
 2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
 ```
 
-2.
-3. Incorporate Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
+2. **Extract findings only** (context protection): After tee, use `execute_file` to extract only the final findings from the Codex output (everything after the last `codex` marker). If context-mode is unavailable, use `tac <file> | sed '/^codex$/q' | tac | tail -n +2`. If no marker found, fail closed (0 findings, warn user). See `docs/reference/review-loop-pattern.md` "Codex Output Context Protection" for full protocol.
+3. Incorporate extracted Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
 
-**Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use
+**Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use internal review findings only for that pass. Do NOT treat a Codex failure as a clean review. Do NOT skip the pass. The next pass still attempts Codex (transient failures may recover).
 
 **Code review scoping by mode:**
 - Use `--uncommitted` when reviewing uncommitted changes (`task-review` mode).
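The extraction step (keep everything after the last line that is exactly `codex`, and fail closed when no marker exists) can be sketched in a few lines. The function name `extractAfterLastMarker` is hypothetical. The `tac | sed | tac | tail` pipeline does not itself detect a missing marker, so this sketch makes that check explicit by returning `null`.

```javascript
// Return the text after the last line that is exactly the marker, or null
// when no marker exists so the caller can fail closed (0 findings, warn user).
function extractAfterLastMarker(output, marker = 'codex') {
  const lines = output.split('\n');
  const lastMarkerIndex = lines.lastIndexOf(marker);
  if (lastMarkerIndex === -1) return null;
  return lines.slice(lastMarkerIndex + 1).join('\n');
}
```

A caller would treat a `null` result as zero findings plus a user-facing warning, never as a clean review.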
@@ -112,16 +178,51 @@ If `codex` is in `multi_tool.tools`:
 - Use `codex exec -c model="$CODEX_MODEL"` with stdin pipe for non-code artifacts (`spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes).
 - See `docs/reference/review-loop-pattern.md` for code review scoping rules.
 
-
+#### Gemini Review
+
+If `gemini` is in `multi_tool.tools`, follow the same pattern using the Gemini CLI when available. **For detailed Gemini CLI usage, see `wz:gemini-cli` skill.**
 
-
+### Fix Cycle (Codex Findings)
+
+If Codex produces blocking findings:
+1. Executor fixes the Codex findings
+2. Re-run internal review (quick pass) to verify fixes didn't introduce regressions
+3. Optionally re-run Codex for a clean pass
 
 ### Merging Findings
 
 The final review report must clearly attribute each finding:
-- `[
-- `[Codex]` — found by Codex
-- `[
+- `[Internal]` — found by Tier 1 internal review
+- `[Codex]` — found by Tier 2 Codex review
+- `[Gemini]` — found by Tier 2 Gemini review
+- `[Both]` — found independently by multiple sources
+
+### Finding Persistence (Learning Pipeline)
+
+ALL findings from both tiers are persisted to `state.sqlite` for cross-run learning:
+
+```javascript
+// After each review pass
+const { insertFinding, getRecurringFindingHashes } = require('tooling/src/state/db');
+const db = openStateDb(stateRoot);
+
+for (const finding of allFindings) {
+  insertFinding(db, {
+    run_id: runId,
+    phase: reviewMode,
+    source: finding.attribution, // 'internal', 'codex', 'gemini'
+    severity: finding.severity,
+    description: finding.description,
+    finding_hash: hashFinding(finding.description),
+  });
+}
+
+// Check for recurring patterns
+const recurring = getRecurringFindingHashes(db, 2);
+// Recurring findings → auto-propose as learnings in the learn phase
+```
+
+This is how Wazir evolves — findings that recur across runs become accepted learnings injected into future executor context, preventing the same mistakes.
 
 ## Task-Review Log Filenames
 
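The snippet in this hunk calls a `hashFinding` helper without showing it, and the Merging Findings list implies that duplicates across sources collapse into a single `[Both]` entry. The sketch below is one possible reading, not the package's code: the normalization rule, the SHA-256 choice, and the `mergeFindings` name are all assumptions.

```javascript
const crypto = require('node:crypto');

// Hypothetical stand-in for `hashFinding`: normalize case and whitespace,
// then take a SHA-256 digest of the description.
function hashFinding(description) {
  const normalized = description.trim().toLowerCase().replace(/\s+/g, ' ');
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

// Merge findings keyed by source name and tag each with its attribution:
// the single source name, or "Both" when several sources reported it.
function mergeFindings(findingsBySource) {
  const merged = new Map();
  for (const [source, findings] of Object.entries(findingsBySource)) {
    for (const finding of findings) {
      const hash = hashFinding(finding.description);
      const entry =
        merged.get(hash) ?? { ...finding, finding_hash: hash, sources: new Set() };
      entry.sources.add(source);
      merged.set(hash, entry);
    }
  }
  return [...merged.values()].map(({ sources, ...rest }) => ({
    ...rest,
    attribution: sources.size > 1 ? 'Both' : [...sources][0],
  }));
}
```

Hashing a normalized description means two reviewers who phrase a finding differently will not be merged; a real implementation might key on file and rule instead.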
@@ -137,6 +238,174 @@ Save review results to `.wazir/runs/latest/reviews/review.md` with:
 - Score breakdown
 - Verdict
 
+## Phase Report Generation
+
+After completing any review pass, generate a phase report following `schemas/phase-report.schema.json`:
+
+1. **`attempted_actions`** — Populate from the review findings. Each finding becomes an action entry:
+- `description`: the finding summary
+- `outcome`: `"success"` if the finding passed, `"fail"` if it is a blocking issue, `"uncertain"` if ambiguous
+- `evidence`: the rationale or evidence supporting the outcome
+
+2. **`drift_analysis`** — Compare review findings against the approved spec:
+- `delta`: count of deviations between implementation and spec (0 = no drift)
+- `description`: summary of any drift detected and its impact
+
+3. **`quality_metrics`** — Populate from test, lint, and type-check results gathered during review:
+- `test_pass_count`, `test_fail_count`: from test runner output
+- `lint_errors`: from linter output
+- `type_errors`: from type checker output
+
+4. **`risk_flags`** — Populate from any high-severity findings:
+- `severity`: `"low"`, `"medium"`, or `"high"`
+- `description`: what the risk is
+- `mitigation`: recommended mitigation (if known)
+
+5. **`decisions`** — Populate from any scope or approach decisions made during the review:
+- `description`: what was decided
+- `rationale`: why
+- `alternatives_considered`: other options evaluated (optional)
+- `source`: `"[Wazir]"`, `"[Codex]"`, or `"[Both]"` (optional)
+
+6. **`verdict_recommendation`** — Set based on the gating rules in `config/gating-rules.yaml`:
+- `verdict`: `"continue"` (PASS), `"loop_back"` (NEEDS MINOR FIXES / NEEDS REWORK), or `"escalate"` (FAIL with fundamental issues)
+- `reasoning`: brief explanation of why this verdict was chosen
+
+### Report Output Paths
+
+Save reports to two formats under the run directory:
+- `.wazir/runs/<id>/reports/phase-<name>-report.json` — machine-readable, validated against `schemas/phase-report.schema.json`
+- `.wazir/runs/<id>/reports/phase-<name>-report.md` — human-readable Markdown summary
+
+The gating agent (`tooling/src/gating/agent.js`) consumes the JSON report to decide: **continue**, **loop_back**, or **escalate**.
+
+### Report Fields Reference
+
+All required fields per `schemas/phase-report.schema.json`:
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `phase_name` | string | yes | Review mode name (e.g., `"final"`, `"task-review"`) |
+| `run_id` | string | yes | Current run identifier |
+| `timestamp` | string (date-time) | yes | ISO 8601 timestamp of report generation |
+| `attempted_actions` | array | yes | Findings mapped to action outcomes |
+| `drift_analysis` | object | yes | Spec-vs-implementation drift summary |
+| `quality_metrics` | object | yes | Test/lint/type results |
+| `risk_flags` | array | yes | High-severity risk items |
+| `decisions` | array | yes | Scope/approach decisions made |
+| `verdict_recommendation` | object | no | Gating verdict based on `config/gating-rules.yaml` |
+
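The field list and verdict mapping in the Phase Report Generation section suggest a report object along the following lines. This is a sketch under assumptions: `buildPhaseReport` and `missingRequiredFields` are hypothetical names, the presence check is not a substitute for validating against `schemas/phase-report.schema.json` (whose contents are not shown in this diff), and the real mapping is governed by `config/gating-rules.yaml`.

```javascript
// Review result to gating verdict, as described for `verdict_recommendation`.
const VERDICT_BY_REVIEW_RESULT = new Map([
  ['PASS', 'continue'],
  ['NEEDS MINOR FIXES', 'loop_back'],
  ['NEEDS REWORK', 'loop_back'],
  ['FAIL', 'escalate'],
]);

// Fields marked "yes" in the Required column of the fields table.
const REQUIRED_FIELDS = [
  'phase_name', 'run_id', 'timestamp', 'attempted_actions',
  'drift_analysis', 'quality_metrics', 'risk_flags', 'decisions',
];

// Hypothetical builder; every argument value is supplied by the caller.
function buildPhaseReport({ phaseName, runId, reviewResult, actions, metrics, drift }) {
  const verdict = VERDICT_BY_REVIEW_RESULT.get(reviewResult);
  if (!verdict) throw new Error(`unknown review result: ${reviewResult}`);
  return {
    phase_name: phaseName,
    run_id: runId,
    timestamp: new Date().toISOString(), // ISO 8601
    attempted_actions: actions, // [{ description, outcome, evidence }]
    drift_analysis: drift, // { delta, description }
    quality_metrics: metrics, // test/lint/type counts
    risk_flags: [],
    decisions: [],
    verdict_recommendation: {
      verdict,
      reasoning: `review result was ${reviewResult}`,
    },
  };
}

// Presence check only; it does not validate types or nested shapes.
function missingRequiredFields(report) {
  return REQUIRED_FIELDS.filter((field) => !(field in report));
}
```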
+## Post-Review: Learn (final mode only)
+
+After the final review verdict, extract durable learnings using the **learner role** (`roles/learner.md`).
+
+### Step 1: Gather all findings
+
+Collect review findings from ALL sources in this run:
+- `.wazir/runs/<run-id>/reviews/` — all review pass logs (task-review, final review)
+- Codex findings (attributed `[Codex]` or `[Both]`)
+- Self-audit findings (if `run_audit` was enabled)
+
+### Step 2: Identify learning candidates
+
+A finding becomes a learning candidate if:
+- It recurred across 2+ review passes within this run (same issue found repeatedly)
+- It matches a finding from a prior run (check `memory/learnings/proposed/` and `accepted/` for similar patterns)
+- It represents a class of mistake, not just a single instance (e.g., "missing error handling in async functions" vs "missing try-catch on line 42")
+
+### Step 3: Write learning proposals
+
+For each candidate, write a proposal to `memory/learnings/proposed/<run-id>-<NNN>.md`:
+
+```markdown
+---
+artifact_type: proposed_learning
+phase: learn
+role: learner
+run_id: <run-id>
+status: proposed
+sources:
+- <review-file-1>
+- <review-file-2>
+approval_status: required
+---
+
+# Proposed Learning: <title>
+
+## Scope
+- **Roles:** [which roles should receive this learning — e.g., executor, reviewer]
+- **Stacks:** [which tech stacks — e.g., node, react, or "all"]
+- **Concerns:** [which concerns — e.g., error-handling, testing, security]
+
+## Evidence
+- [finding from review pass N: description]
+- [finding from review pass M: same pattern]
+- [optional: similar finding from prior run <run-id>]
+
+## Learning
+[The concrete, actionable instruction that should be injected into future executor context]
+
+## Expected Benefit
+[What this prevents in future runs]
+
+## Confidence
+- **Level:** low | medium | high
+- **Basis:** [single run observation | multi-run recurrence | user correction]
+```
+
+### Step 4: Report
+
+Present proposed learnings to the user:
+
+> **Learnings proposed:** [count]
+> - [title 1] (confidence: high, scope: executor/node)
+> - [title 2] (confidence: medium, scope: reviewer/all)
+>
+> Proposals saved to `memory/learnings/proposed/`. Review and accept with `/wazir audit learnings`.
+
+Learnings are NEVER auto-applied. They require explicit user acceptance before being injected into future runs.
+
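The first candidate rule in Step 2 (a finding recurring across 2+ review passes in the same run) and the proposal path from Step 3 can be sketched as follows. Both helpers are hypothetical, the input shape (one array of finding hashes per pass) is an assumption, and zero-padding `<NNN>` to three digits is a guess at the naming scheme rather than something the diff states.

```javascript
// Hypothetical input: one array of finding hashes per review pass.
// Returns the hashes seen in at least `minPasses` distinct passes.
function recurringWithinRun(passes, minPasses = 2) {
  const passCount = new Map();
  for (const hashes of passes) {
    // A Set ensures a hash repeated inside one pass is counted once.
    for (const hash of new Set(hashes)) {
      passCount.set(hash, (passCount.get(hash) ?? 0) + 1);
    }
  }
  return [...passCount]
    .filter(([, count]) => count >= minPasses)
    .map(([hash]) => hash);
}

// Build `memory/learnings/proposed/<run-id>-<NNN>.md`.
// Assumption: NNN is a sequence number zero-padded to three digits.
function proposalPath(runId, sequence) {
  const padded = String(sequence).padStart(3, '0');
  return `memory/learnings/proposed/${runId}-${padded}.md`;
}
```

The other two candidate rules (a match against prior runs, and "class of mistake" judgments) need stored history or human judgment and are left out of the sketch.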
+## Post-Review: Prepare Next (final mode only)
+
+After learning extraction, invoke the `prepare-next` skill to prepare the handoff:
+
+### Handoff document
+
+Write to `.wazir/runs/<run-id>/handoff.md`:
+
+```markdown
+# Handoff — <run-id>
+
+**Status:** [Completed | Partial]
+**Branch:** <branch-name>
+**Date:** YYYY-MM-DD
+
+## What Was Done
+[List of completed tasks with commit hashes]
+
+## Test Results
+[Test count, pass/fail, validator status]
+
+## Review Score
+[Final review verdict and score]
+
+## What's Next
+[Pending items, deferred work, follow-up tasks]
+
+## Open Bugs
+[Any known issues discovered during this run]
+
+## Learnings From This Run
+[Key insights — what worked, what didn't, what to change]
+```
+
+### Cleanup
+
+- Archive verbose intermediate review logs (compress to summary)
+- Update `.wazir/runs/latest` symlink if creating a new run
+- Do NOT mutate `input/` — it belongs to the user
+- Do NOT auto-load proposed learnings into the next run
+
 ## Done
 
 Present the verdict and offer next steps:
@@ -145,6 +414,9 @@ Present the verdict and offer next steps:
 >
 > [Score breakdown and findings summary]
 >
+> **Learnings proposed:** [count] (see `memory/learnings/proposed/`)
+> **Handoff:** `.wazir/runs/<run-id>/handoff.md`
+>
 > **What would you like to do?**
 > 1. **Create a PR** (if PASS)
 > 2. **Auto-fix and re-review** (if MINOR FIXES)
@@ -5,6 +5,19 @@ description: Run a structured audit on your codebase — security, code quality,
 
 # Run Audit — Structured Codebase Audit Pipeline
 
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
 ## Overview
 
 This skill runs a structured audit on your codebase. It collects three parameters interactively (audit type, scope, output mode), then feeds them through the pipeline: Research → Audit → Report or Plan.
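Rules 3 and 4 of the Codebase Exploration section added in this hunk read like a budget: direct reads of files surfaced by an index query are allowed, and at most 10 other direct reads are tolerated. The tracker below is one hypothetical interpretation; how Wazir actually enforces the limit is not shown in this diff, and the class and method names are invented for illustration.

```javascript
// Hypothetical tracker for the "maximum 10 direct file reads without a
// justifying index query" rule.
class ExplorationBudget {
  constructor(maxUnjustifiedReads = 10) {
    this.maxUnjustifiedReads = maxUnjustifiedReads;
    this.indexedFiles = new Set();
    this.unjustifiedReads = 0;
  }

  // Record files returned by an index query such as
  // `wazir index search-symbols <query>`.
  recordIndexResult(files) {
    for (const file of files) this.indexedFiles.add(file);
  }

  // Returns true if the direct read is allowed, false once the budget of
  // unjustified reads is spent.
  requestDirectRead(file) {
    if (this.indexedFiles.has(file)) return true;
    if (this.unjustifiedReads >= this.maxUnjustifiedReads) return false;
    this.unjustifiedReads += 1;
    return true;
  }
}
```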
@@ -5,6 +5,19 @@ description: Build a project profile from manifests, docs, tests, and `input/` s
 
 # Scan Project
 
+## Command Routing
+Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+- Small commands (git status, ls, pwd, wazir CLI) → native Bash
+- If context-mode unavailable, fall back to native Bash with warning
+
+## Codebase Exploration
+1. Query `wazir index search-symbols <query>` first
+2. Use `wazir recall file <path> --tier L1` for targeted reads
+3. Fall back to direct file reads ONLY for files identified by index queries
+4. Maximum 10 direct file reads without a justifying index query
+5. If no index exists: `wazir index build && wazir index summarize --tier all`
+
 Inspect the smallest set of repo surfaces needed to answer:
 
 - what kind of project this is