@wazir-dev/cli 1.1.0 → 1.3.0
- package/CHANGELOG.md +74 -10
- package/README.md +15 -15
- package/assets/demo.cast +47 -0
- package/assets/demo.gif +0 -0
- package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
- package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
- package/docs/concepts/architecture.md +1 -1
- package/docs/concepts/roles-and-workflows.md +2 -0
- package/docs/concepts/why-wazir.md +59 -0
- package/docs/decisions/2026-03-19-deferred-items.md +564 -0
- package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
- package/docs/readmes/INDEX.md +21 -5
- package/docs/readmes/features/expertise/README.md +2 -2
- package/docs/readmes/features/exports/README.md +2 -2
- package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
- package/docs/readmes/features/schemas/README.md +3 -0
- package/docs/readmes/features/skills/README.md +17 -0
- package/docs/readmes/features/skills/clarifier.md +5 -0
- package/docs/readmes/features/skills/claude-cli.md +5 -0
- package/docs/readmes/features/skills/codex-cli.md +5 -0
- package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
- package/docs/readmes/features/skills/executing-plans.md +5 -0
- package/docs/readmes/features/skills/executor.md +5 -0
- package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
- package/docs/readmes/features/skills/gemini-cli.md +5 -0
- package/docs/readmes/features/skills/humanize.md +5 -0
- package/docs/readmes/features/skills/init-pipeline.md +5 -0
- package/docs/readmes/features/skills/receiving-code-review.md +5 -0
- package/docs/readmes/features/skills/requesting-code-review.md +5 -0
- package/docs/readmes/features/skills/reviewer.md +5 -0
- package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
- package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
- package/docs/readmes/features/skills/wazir.md +5 -0
- package/docs/readmes/features/skills/writing-skills.md +5 -0
- package/docs/readmes/features/workflows/prepare-next.md +1 -1
- package/docs/reference/configuration-reference.md +47 -6
- package/docs/reference/hooks.md +1 -0
- package/docs/reference/launch-checklist.md +4 -4
- package/docs/reference/review-loop-pattern.md +119 -9
- package/docs/reference/roles-reference.md +1 -0
- package/docs/reference/skill-tiers.md +147 -0
- package/docs/reference/tooling-cli.md +3 -1
- package/docs/truth-claims.yaml +12 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +214 -1
- package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
- package/exports/hosts/claude/.claude/commands/verify.md +30 -1
- package/exports/hosts/claude/.claude/settings.json +9 -0
- package/exports/hosts/claude/CLAUDE.md +1 -1
- package/exports/hosts/claude/export.manifest.json +6 -4
- package/exports/hosts/claude/host-package.json +3 -1
- package/exports/hosts/codex/AGENTS.md +1 -1
- package/exports/hosts/codex/export.manifest.json +6 -4
- package/exports/hosts/codex/host-package.json +3 -1
- package/exports/hosts/cursor/.cursor/hooks.json +4 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
- package/exports/hosts/cursor/export.manifest.json +6 -4
- package/exports/hosts/cursor/host-package.json +3 -1
- package/exports/hosts/gemini/GEMINI.md +1 -1
- package/exports/hosts/gemini/export.manifest.json +6 -4
- package/exports/hosts/gemini/host-package.json +3 -1
- package/hooks/context-mode-router +191 -0
- package/hooks/definitions/context_mode_router.yaml +19 -0
- package/hooks/hooks.json +31 -6
- package/hooks/protected-path-write-guard +8 -0
- package/hooks/routing-matrix.json +45 -0
- package/hooks/session-start +62 -1
- package/llms-full.txt +937 -134
- package/package.json +2 -4
- package/schemas/hook.schema.json +2 -1
- package/schemas/phase-report.schema.json +89 -0
- package/schemas/usage.schema.json +25 -1
- package/schemas/wazir-manifest.schema.json +19 -0
- package/skills/brainstorming/SKILL.md +32 -157
- package/skills/clarifier/SKILL.md +289 -111
- package/skills/claude-cli/SKILL.md +320 -0
- package/skills/codex-cli/SKILL.md +260 -0
- package/skills/debugging/SKILL.md +13 -0
- package/skills/design/SKILL.md +13 -0
- package/skills/dispatching-parallel-agents/SKILL.md +13 -0
- package/skills/executing-plans/SKILL.md +13 -0
- package/skills/executor/SKILL.md +139 -19
- package/skills/finishing-a-development-branch/SKILL.md +13 -0
- package/skills/gemini-cli/SKILL.md +260 -0
- package/skills/humanize/SKILL.md +13 -0
- package/skills/init-pipeline/SKILL.md +72 -164
- package/skills/prepare-next/SKILL.md +81 -10
- package/skills/receiving-code-review/SKILL.md +13 -0
- package/skills/requesting-code-review/SKILL.md +13 -0
- package/skills/reviewer/SKILL.md +369 -24
- package/skills/run-audit/SKILL.md +13 -0
- package/skills/scan-project/SKILL.md +13 -0
- package/skills/self-audit/SKILL.md +217 -16
- package/skills/skill-research/SKILL.md +188 -0
- package/skills/subagent-driven-development/SKILL.md +13 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
- package/skills/subagent-driven-development/implementer-prompt.md +8 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
- package/skills/tdd/SKILL.md +13 -0
- package/skills/using-git-worktrees/SKILL.md +13 -0
- package/skills/using-skills/SKILL.md +13 -0
- package/skills/verification/SKILL.md +54 -3
- package/skills/wazir/SKILL.md +464 -381
- package/skills/writing-plans/SKILL.md +14 -1
- package/skills/writing-skills/SKILL.md +13 -0
- package/templates/artifacts/implementation-plan.md +3 -0
- package/templates/artifacts/tasks-template.md +133 -0
- package/templates/examples/phase-report.example.json +48 -0
- package/tooling/src/adapters/composition-engine.js +256 -0
- package/tooling/src/adapters/model-router.js +84 -0
- package/tooling/src/capture/command.js +41 -2
- package/tooling/src/capture/run-config.js +3 -1
- package/tooling/src/capture/store.js +56 -0
- package/tooling/src/capture/usage.js +106 -0
- package/tooling/src/capture/user-input.js +66 -0
- package/tooling/src/checks/ac-matrix.js +256 -0
- package/tooling/src/checks/command-registry.js +12 -0
- package/tooling/src/checks/docs-truth.js +1 -1
- package/tooling/src/checks/security-sensitivity.js +69 -0
- package/tooling/src/checks/skills.js +111 -0
- package/tooling/src/cli.js +31 -20
- package/tooling/src/commands/stats.js +161 -0
- package/tooling/src/commands/validate.js +5 -1
- package/tooling/src/export/compiler.js +33 -37
- package/tooling/src/gating/agent.js +145 -0
- package/tooling/src/guards/phase-prerequisite-guard.js +185 -0
- package/tooling/src/hooks/routing-logic.js +69 -0
- package/tooling/src/init/auto-detect.js +258 -0
- package/tooling/src/init/command.js +38 -170
- package/tooling/src/input/scanner.js +46 -0
- package/tooling/src/reports/command.js +103 -0
- package/tooling/src/reports/phase-report.js +323 -0
- package/tooling/src/state/command.js +160 -0
- package/tooling/src/state/db.js +287 -0
- package/tooling/src/status/command.js +58 -1
- package/tooling/src/verify/proof-collector.js +299 -0
- package/wazir.manifest.yaml +26 -14
- package/workflows/plan-review.md +3 -1
- package/workflows/verify.md +30 -1
package/skills/reviewer/SKILL.md
CHANGED
|
@@ -5,10 +5,40 @@ description: Run the review phase — adversarial review of implementation again
|
|
|
5
5
|
|
|
6
6
|
# Reviewer
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
## Model Annotation
|
|
9
|
+
When multi-model mode is enabled:
|
|
10
|
+
- **Sonnet** for internal review passes (internal-review)
|
|
11
|
+
- **Opus** for final review mode (final-review)
|
|
12
|
+
- **Opus** for spec-challenge mode (spec-harden)
|
|
13
|
+
- **Opus** for design-review mode (design)
|
|
14
|
+
|
|
15
|
+
## Command Routing
|
|
16
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
17
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
18
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
19
|
+
- If context-mode is unavailable, fall back to native Bash with a warning
|
|
20
|
+
|
|
21
|
+
## Codebase Exploration
|
|
22
|
+
1. Query `wazir index search-symbols <query>` first
|
|
23
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
24
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
25
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
26
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
27
|
+
|
|
28
|
+
Run the Final Review phase — or any review mode invoked by other phases.
|
|
9
29
|
|
|
10
30
|
The reviewer role owns all review loops across the pipeline: research-review, clarification-review, spec-challenge, design-review, plan-review, per-task execution review, and final review. Each uses phase-specific dimensions from `docs/reference/review-loop-pattern.md`.
|
|
11
31
|
|
|
32
|
+
**Key principle for `final` mode:** Compare implementation against the **ORIGINAL INPUT** (briefing + input files), NOT the task specs. The executor's per-task reviewer already validated against task specs — that concern is covered. The final reviewer catches drift: does what we built match what the user actually asked for?
|
|
33
|
+
|
|
34
|
+
**Reviewer-owned responsibilities** (callers must NOT replicate these):
|
|
35
|
+
1. **Two-tier review** — internal review first (fast, cheap, expertise-loaded), Codex second (fresh eyes on clean code)
|
|
36
|
+
2. **Dimension selection** — the reviewer selects the correct dimension set for the review mode and depth
|
|
37
|
+
3. **Pass counting** — the reviewer tracks pass numbers and enforces the depth-based cap (quick=3, standard=5, deep=7)
|
|
38
|
+
4. **Finding attribution** — each finding is tagged `[Internal]`, `[Codex]`, or `[Both]` based on source
|
|
39
|
+
5. **Dimension set recording** — each review pass file records which canonical dimension set was used, enabling Phase Scoring (first vs final delta)
|
|
40
|
+
6. **Learning pipeline** — ALL findings (internal + Codex) feed into `state.sqlite` and the learning system
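
As an illustration of items 3 and 4 above, here is a minimal sketch of the depth-based pass cap and the attribution tag. The function names `canRunAnotherPass` and `attributionTag` are hypothetical and do not come from the Wazir tooling. Only the cap values and the three tags are taken from the list.

```javascript
// Depth-based caps as stated above: quick=3, standard=5, deep=7.
const PASS_CAPS = { quick: 3, standard: 5, deep: 7 };

// Hypothetical helper: true while the reviewer may still start another pass.
function canRunAnotherPass(depth, passesCompleted) {
  const cap = PASS_CAPS[depth];
  if (cap === undefined) {
    throw new Error(`Unknown depth: ${depth}`);
  }
  return passesCompleted < cap;
}

// Hypothetical helper: pick the tag from which tiers reported the finding.
function attributionTag(foundByInternal, foundByCodex) {
  if (foundByInternal && foundByCodex) return '[Both]';
  if (foundByInternal) return '[Internal]';
  if (foundByCodex) return '[Codex]';
  throw new Error('A finding needs at least one source');
}
```
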
|
|
41
|
+
|
|
12
42
|
## Review Modes
|
|
13
43
|
|
|
14
44
|
The reviewer operates in different modes depending on the phase. Mode MUST be passed explicitly by the caller (`--mode <mode>`). The reviewer does NOT auto-detect mode from artifact availability. If `--mode` is not provided, ask the user which review to run.
|
|
@@ -18,7 +48,7 @@ The reviewer operates in different modes depending on the phase. Mode MUST be pa
|
|
|
18
48
|
| `final` | After execution + verification | Completed task artifacts, approved spec/plan/design | 7 final-review dims, scored 0-70 | Scored verdict (PASS/FAIL) |
|
|
19
49
|
| `spec-challenge` | After specify | Draft spec artifact | 5 spec/clarification dims | Pass/fix loop, no score |
|
|
20
50
|
| `design-review` | After design approval | Design artifact, approved spec | 5 design-review dims (canonical) | Pass/fix loop, no score |
|
|
21
|
-
| `plan-review` | After planning | Draft plan artifact |
|
|
51
|
+
| `plan-review` | After planning | Draft plan artifact | 8 plan dims (7 + input coverage) | Pass/fix loop, no score |
|
|
22
52
|
| `task-review` | During execution, per task | Uncommitted changes or `--base` SHA | 5 task-execution dims (correctness, tests, wiring, drift, quality) | Pass/fix loop, no score |
|
|
23
53
|
| `research-review` | During discover | Research artifact | 5 research dims | Pass/fix loop, no score |
|
|
24
54
|
| `clarification-review` | During clarify | Clarification artifact | 5 spec/clarification dims | Pass/fix loop, no score |
|
|
@@ -34,6 +64,23 @@ In `task-review` and `final` modes, flag missing CHANGELOG entries for user-faci
|
|
|
34
64
|
Prerequisites depend on the review mode:
|
|
35
65
|
|
|
36
66
|
### `final` mode
|
|
67
|
+
|
|
68
|
+
**Phase Prerequisites (Hard Gate):** Before proceeding, verify ALL of these artifacts exist. If ANY is missing, **STOP** and report which are missing.
|
|
69
|
+
|
|
70
|
+
- [ ] `.wazir/runs/latest/clarified/clarification.md`
|
|
71
|
+
- [ ] `.wazir/runs/latest/clarified/spec-hardened.md`
|
|
72
|
+
- [ ] `.wazir/runs/latest/clarified/design.md`
|
|
73
|
+
- [ ] `.wazir/runs/latest/clarified/execution-plan.md`
|
|
74
|
+
- [ ] `.wazir/runs/latest/artifacts/verification-proof.md`
|
|
75
|
+
|
|
76
|
+
If any file is missing:
|
|
77
|
+
|
|
78
|
+
> **Cannot run final review: missing prerequisite artifacts.**
|
|
79
|
+
>
|
|
80
|
+
> Missing: [list missing files]
|
|
81
|
+
>
|
|
82
|
+
> Run `/wazir:clarifier` (for clarified/* files) or `/wazir:executor` (for verification-proof.md) first.
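
The hard gate above could be sketched roughly as follows. This is an assumption-laden illustration, not the package's `phase-prerequisite-guard.js`: the function name `checkFinalPrerequisites` is hypothetical, and the existence check is injected so the logic can be exercised without touching disk.

```javascript
// Artifact paths copied from the checklist above.
const FINAL_REVIEW_PREREQUISITES = [
  '.wazir/runs/latest/clarified/clarification.md',
  '.wazir/runs/latest/clarified/spec-hardened.md',
  '.wazir/runs/latest/clarified/design.md',
  '.wazir/runs/latest/clarified/execution-plan.md',
  '.wazir/runs/latest/artifacts/verification-proof.md',
];

// Hypothetical helper. `exists` is any predicate taking a path and
// returning a boolean; in a Node script it could be `fs.existsSync`.
function checkFinalPrerequisites(exists) {
  const missing = FINAL_REVIEW_PREREQUISITES.filter((path) => !exists(path));
  return { ok: missing.length === 0, missing };
}
```

If `ok` is false, the reviewer would stop and report the `missing` list rather than continue with a partial review.
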
|
|
83
|
+
|
|
37
84
|
1. Check `.wazir/runs/latest/artifacts/` has completed task artifacts. If not, tell the user to run `/wazir:executor` first.
|
|
38
85
|
2. Read the approved spec, plan, and design from `.wazir/runs/latest/clarified/`.
|
|
39
86
|
3. Read `.wazir/state/config.json` for depth and multi_tool settings.
|
|
@@ -41,22 +88,42 @@ Prerequisites depend on the review mode:
|
|
|
41
88
|
### `task-review` mode
|
|
42
89
|
1. Uncommitted changes exist for the current task, or a `--base` SHA is provided for committed changes.
|
|
43
90
|
2. Read `.wazir/state/config.json` for depth and multi_tool settings.
|
|
91
|
+
3. **Commit discipline check:** If uncommitted changes span work from multiple tasks (e.g., files from task N and task N+1 are both modified), REJECT immediately: "REJECTED: Multiple tasks in single commit. Split into per-task commits before review." This is a blocking finding — no other dimensions are evaluated until resolved.
|
|
92
|
+
4. **Security sensitivity check:** Run `detectSecurityPatterns` from `tooling/src/checks/security-sensitivity.js` against the diff. If `triggered === true`, add the 6 security review dimensions (injection, auth bypass, data exposure, CSRF/SSRF, XSS, secrets leakage) to the standard 5 task-execution dimensions for this review pass. Security findings use severity levels: critical (exploitable), high (likely exploitable), medium (defense-in-depth gap), low (best-practice deviation).
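
To make step 4 concrete, the sketch below shows how the dimension list might grow when the security check fires. It assumes only that the check returns an object with a `triggered` boolean, as the text states. The helper name `taskReviewDimensions` is hypothetical, and the real `detectSecurityPatterns` signature is not shown here.

```javascript
// The 5 task-execution dimensions and 6 security dimensions named in this document.
const TASK_DIMENSIONS = ['correctness', 'tests', 'wiring', 'drift', 'quality'];
const SECURITY_DIMENSIONS = [
  'injection',
  'auth bypass',
  'data exposure',
  'CSRF/SSRF',
  'XSS',
  'secrets leakage',
];

// Hypothetical helper: extend the dimension set only when the check triggered.
function taskReviewDimensions(securityResult) {
  const triggered = Boolean(securityResult) && securityResult.triggered === true;
  return triggered
    ? [...TASK_DIMENSIONS, ...SECURITY_DIMENSIONS]
    : [...TASK_DIMENSIONS];
}
```
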
|
|
44
93
|
|
|
45
94
|
### `spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes
|
|
46
95
|
1. The appropriate input artifact for the mode exists.
|
|
47
96
|
2. Read `.wazir/state/config.json` for depth and multi_tool settings.
|
|
97
|
+
3. **`plan-review` additional dimension — Input Coverage:**
|
|
98
|
+
- Read the original input/briefing from `.wazir/input/briefing.md` and any `input/*.md` files
|
|
99
|
+
- Count distinct items/requirements in the input
|
|
100
|
+
- Count tasks in the execution plan
|
|
101
|
+
- If `tasks_in_plan < items_in_input` → **HIGH** finding: "Plan covers [N] of [M] input items. Missing: [list of uncovered items]"
|
|
102
|
+
- If `tasks_in_plan >= items_in_input` → dimension passes
|
|
103
|
+
- One task MAY cover multiple input items if justified in the task description
|
|
104
|
+
- This is the review-level enforcement of the "no scope reduction" rule
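
One possible reading of the Input Coverage rule, sketched under assumptions: each task is assumed to carry a hypothetical `covers` array listing the input items it addresses, and a plan with fewer tasks than items still passes when every item is explicitly covered (the multi-item allowance). Neither the `covers` field nor the function name comes from the Wazir tooling.

```javascript
// Hypothetical helper: returns null when the dimension passes,
// otherwise a HIGH finding in the message format quoted above.
function inputCoverageFinding(inputItems, tasks) {
  // Count rule: at least as many tasks as input items passes outright.
  if (tasks.length >= inputItems.length) return null;

  // Fewer tasks than items: only acceptable if tasks jointly cover every item.
  const covered = new Set(tasks.flatMap((task) => task.covers || []));
  const missing = inputItems.filter((item) => !covered.has(item));
  if (missing.length === 0) return null;

  const coveredCount = inputItems.length - missing.length;
  return {
    severity: 'HIGH',
    message: `Plan covers ${coveredCount} of ${inputItems.length} input items. Missing: ${missing.join(', ')}`,
  };
}
```
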
|
|
48
105
|
|
|
49
106
|
## Review Process (`final` mode)
|
|
50
107
|
|
|
108
|
+
**Before starting this phase, output to the user:**
|
|
109
|
+
|
|
110
|
+
> **Final Review** — About to run adversarial 7-dimension review comparing your implementation against the original input, not just the task specs. The executor's per-task reviewer already validated correctness per-task — this catches drift between what you asked for and what was actually built.
|
|
111
|
+
>
|
|
112
|
+
> **Why this matters:** Without this, implementation drift ships undetected. Per-task review confirms each task matches its spec, but cannot catch: tasks that collectively miss the original intent, scope creep that added unrequested features, or acceptance criteria that were rewritten to match implementation instead of input.
|
|
113
|
+
>
|
|
114
|
+
> **Looking for:** Logic errors, missing features, dead code, unsubstantiated "it works" claims, scope creep, security gaps, stale documentation
|
|
115
|
+
|
|
116
|
+
**Input:** Read the ORIGINAL user input (`.wazir/input/briefing.md`, `input/` directory files) and compare against what was built. This catches intent drift that task-level review misses.
|
|
117
|
+
|
|
51
118
|
Perform adversarial review across 7 dimensions:
|
|
52
119
|
|
|
53
|
-
1. **Correctness** — Does the code do what the spec
|
|
54
|
-
2. **Completeness** — Are all acceptance criteria
|
|
55
|
-
3. **Wiring** — Are all paths connected end-to-end?
|
|
56
|
-
4. **Verification** — Is there evidence (tests, type checks) for each claim?
|
|
57
|
-
5. **Drift** — Does the implementation match the
|
|
58
|
-
6. **Quality** — Code style, naming, error handling, security
|
|
59
|
-
7. **Documentation** — Changelog entries, commit messages, comments
|
|
120
|
+
1. **Correctness** — Does the code do what the original input asked for? *(catches: logic errors, wrong behavior, spec violations)*
|
|
121
|
+
2. **Completeness** — Are all requirements from the original input met? *(catches: missing features, unimplemented acceptance criteria, partially delivered items)*
|
|
122
|
+
3. **Wiring** — Are all paths connected end-to-end? *(catches: dead code, disconnected paths, missing imports, orphaned routes)*
|
|
123
|
+
4. **Verification** — Is there evidence (tests, type checks) for each claim? *(catches: false claims of "it works" without evidence, untested code paths, missing type coverage)*
|
|
124
|
+
5. **Drift** — Does the implementation match what the user originally requested? (not just the plan — the INPUT) *(catches: scope creep, plan deviations, unauthorized changes, gold-plating)*
|
|
125
|
+
6. **Quality** — Code style, naming, error handling, security *(catches: security vulnerabilities, poor error handling, inconsistent naming, missing input validation)*
|
|
126
|
+
7. **Documentation** — Changelog entries, commit messages, comments *(catches: missing changelogs, wrong commit messages, stale comments, undocumented breaking changes)*
|
|
60
127
|
|
|
61
128
|
## Context Retrieval
|
|
62
129
|
|
|
@@ -76,11 +143,28 @@ Score each dimension 0-10. Total out of 70.
|
|
|
76
143
|
| **NEEDS REWORK** | 28-41 | Re-run affected tasks |
|
|
77
144
|
| **FAIL** | 0-27 | Fundamental issues |
|
|
78
145
|
|
|
79
|
-
##
|
|
146
|
+
## Two-Tier Review Flow
|
|
147
|
+
|
|
148
|
+
The review process has two tiers. Internal review catches ~80% of issues quickly and cheaply. Codex review provides fresh eyes on clean code.
|
|
149
|
+
|
|
150
|
+
### Tier 1: Internal Review (Fast, Cheap, Expertise-Loaded)
|
|
151
|
+
|
|
152
|
+
1. **Compose expertise:** Load relevant expertise modules from `expertise/composition-map.yaml` into context based on the review mode and detected stack. This gives the internal reviewer domain-specific knowledge.
|
|
153
|
+
2. **Run internal review** using the dimension set for the current mode. When multi-model is enabled, use **Sonnet** (not Opus) for internal review passes — it's fast and good enough for pattern matching against expertise.
|
|
154
|
+
3. **Produce findings:** Each finding is tagged `[Internal]` with severity (blocking, warning, note).
|
|
155
|
+
4. **Fix cycle:** If blocking findings exist, the executor fixes them. Re-run internal review. Repeat until clean or cap reached.
|
|
156
|
+
|
|
157
|
+
Internal review passes are logged to `.wazir/runs/latest/reviews/<mode>-internal-pass-<N>.md`.
|
|
158
|
+
|
|
159
|
+
### Tier 2: External Review (Fresh Eyes on Clean Code)
|
|
160
|
+
|
|
161
|
+
Only runs AFTER Tier 1 produces a clean pass (no blocking findings).
|
|
80
162
|
|
|
81
|
-
Read `.wazir/state/config.json`. If `multi_tool.tools` includes external reviewers
|
|
163
|
+
Read `.wazir/state/config.json`. If `multi_tool.tools` includes external reviewers:
|
|
82
164
|
|
|
83
|
-
|
|
165
|
+
#### Codex Review
|
|
166
|
+
|
|
167
|
+
**For detailed Codex CLI usage, see `wz:codex-cli` skill.**
|
|
84
168
|
|
|
85
169
|
If `codex` is in `multi_tool.tools`:
|
|
86
170
|
|
|
@@ -101,10 +185,10 @@ If `codex` is in `multi_tool.tools`:
|
|
|
101
185
|
2>&1 | tee .wazir/runs/latest/reviews/codex-review.md
|
|
102
186
|
```
|
|
103
187
|
|
|
104
|
-
2.
|
|
105
|
-
3. Incorporate Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
|
|
188
|
+
2. **Extract findings only** (context protection): After the `tee` step, use `execute_file` to extract only the final findings from the Codex output (everything after the last `codex` marker). If context-mode is unavailable, use `tac <file> | sed '/^codex$/q' | tac | tail -n +2`. If no marker is found, fail closed (report 0 findings and warn the user). See `docs/reference/review-loop-pattern.md` "Codex Output Context Protection" for the full protocol.
|
|
189
|
+
3. Incorporate extracted Codex findings into your scoring — if Codex flags something you missed, add it. If you disagree with a Codex finding, note it with your rationale.
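
The extraction step can also be expressed in JavaScript. The sketch below mirrors the shell pipeline's intent (keep only what follows the last line that is exactly `codex`) and adds the fail-closed branch explicitly. The function name `extractCodexFindings` and the return shape are hypothetical.

```javascript
// Hypothetical helper: keep only the text after the last `codex` marker line.
function extractCodexFindings(output) {
  const lines = output.split('\n');
  const markerIndex = lines.lastIndexOf('codex');

  if (markerIndex === -1) {
    // Fail closed: no marker means no findings are trusted.
    return { findings: '', warning: 'No codex marker found; treating as 0 findings.' };
  }

  return { findings: lines.slice(markerIndex + 1).join('\n'), warning: null };
}
```

Note that the shell pipeline on its own prints the whole file when no marker exists, so a separate check for the marker is still needed there to fail closed.
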
|
|
106
190
|
|
|
107
|
-
**Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use
|
|
191
|
+
**Codex error handling:** If codex exits non-zero (auth/rate-limit/transport failure), log the full stderr, mark the pass as `codex-unavailable` in the review log, and use internal review findings only for that pass. Do NOT treat a Codex failure as a clean review. Do NOT skip the pass. The next pass still attempts Codex (transient failures may recover).
|
|
108
192
|
|
|
109
193
|
**Code review scoping by mode:**
|
|
110
194
|
- Use `--uncommitted` when reviewing uncommitted changes (`task-review` mode).
|
|
@@ -112,16 +196,69 @@ If `codex` is in `multi_tool.tools`:
|
|
|
112
196
|
- Use `codex exec -c model="$CODEX_MODEL"` with stdin pipe for non-code artifacts (`spec-challenge`, `design-review`, `plan-review`, `research-review`, `clarification-review` modes).
|
|
113
197
|
- See `docs/reference/review-loop-pattern.md` for code review scoping rules.
|
|
114
198
|
|
|
115
|
-
|
|
199
|
+
#### Gemini Review
|
|
200
|
+
|
|
201
|
+
If `gemini` is in `multi_tool.tools`, follow the same pattern using the Gemini CLI when available. **For detailed Gemini CLI usage, see `wz:gemini-cli` skill.**
|
|
202
|
+
|
|
203
|
+
### Fix Cycle (Codex Findings)
|
|
116
204
|
|
|
117
|
-
If
|
|
205
|
+
If Codex produces blocking findings:
|
|
206
|
+
1. Executor fixes the Codex findings
|
|
207
|
+
2. Re-run internal review (quick pass) to verify fixes didn't introduce regressions
|
|
208
|
+
3. Optionally re-run Codex for a clean pass
|
|
118
209
|
|
|
119
210
|
### Merging Findings
|
|
120
211
|
|
|
121
212
|
The final review report must clearly attribute each finding:
|
|
122
|
-
- `[
|
|
123
|
-
- `[Codex]` — found by Codex
|
|
124
|
-
- `[
|
|
213
|
+
- `[Internal]` — found by Tier 1 internal review
|
|
214
|
+
- `[Codex]` — found by Tier 2 Codex review
|
|
215
|
+
- `[Gemini]` — found by Tier 2 Gemini review
|
|
216
|
+
- `[Both]` — found independently by multiple sources
|
|
217
|
+
|
|
218
|
+
### Finding Persistence (Learning Pipeline)
|
|
219
|
+
|
|
220
|
+
ALL findings from both tiers are persisted to `state.sqlite` for cross-run learning:
|
|
221
|
+
|
|
222
|
+
```javascript
|
|
223
|
+
// After each review pass
|
|
224
|
+
const { insertFinding, getRecurringFindingHashes } = require('tooling/src/state/db');
|
|
225
|
+
const db = openStateDb(stateRoot);
|
|
226
|
+
|
|
227
|
+
for (const finding of allFindings) {
|
|
228
|
+
insertFinding(db, {
|
|
229
|
+
run_id: runId,
|
|
230
|
+
phase: reviewMode,
|
|
231
|
+
source: finding.attribution, // 'internal', 'codex', 'gemini'
|
|
232
|
+
severity: finding.severity,
|
|
233
|
+
description: finding.description,
|
|
234
|
+
finding_hash: hashFinding(finding.description),
|
|
235
|
+
});
|
|
236
|
+
}
|
|
237
|
+
|
|
238
|
+
// Check for recurring patterns
|
|
239
|
+
const recurring = getRecurringFindingHashes(db, 2);
|
|
240
|
+
// Recurring findings → auto-propose as learnings in the learn phase
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
This is how Wazir evolves — findings that recur across runs become accepted learnings injected into future executor context, preventing the same mistakes.
|
|
244
|
+
|
|
245
|
+
## Interaction Mode Awareness
|
|
246
|
+
|
|
247
|
+
Read `interaction_mode` from run-config:
|
|
248
|
+
|
|
249
|
+
- **`auto`:** No user checkpoints. Present verdict and let gating agent decide. On escalation, write reason and STOP.
|
|
250
|
+
- **`guided`:** Standard behavior — present verdict, ask user how to proceed.
|
|
251
|
+
- **`interactive`:** Discuss findings with user: "I found a potential auth bypass in `src/auth.js:42` — here's why I rated it high severity. Do you agree, or is there context I'm missing?" Show detailed reasoning for each dimension score.
|
|
252
|
+
|
|
253
|
+
## CLI/Context-Mode Enforcement
|
|
254
|
+
|
|
255
|
+
In ALL review modes, check for these violations:
|
|
256
|
+
|
|
257
|
+
1. **Index usage enforcement:** If the agent performed >5 direct file reads (Read tool) without a preceding `wazir index search-symbols` query, flag as **[warning]** finding: "Agent performed [N] direct file reads without using wazir index. Use `wazir index search-symbols <query>` before reading files to reduce context consumption."
|
|
258
|
+
|
|
259
|
+
2. **Context-mode enforcement:** If the agent ran a large-category command (test runners, builds, diffs, dependency trees, linting — as classified by `hooks/routing-matrix.json`) using native Bash instead of context-mode tools (when context-mode is available), flag as **[warning]** finding: "Large command `[cmd]` run without context-mode. Route through `mcp__plugin_context-mode_context-mode__execute` to reduce context usage."
|
|
260
|
+
|
|
261
|
+
These are warnings, not blocking findings — they improve efficiency but don't affect correctness.
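
A rough sketch of the index-usage check, assuming the reviewer has an ordered log of tool events. The event shape (`type: 'read'` or `type: 'index-search'`) and the function name are hypothetical, and "without a preceding query" is interpreted here as reads that happen before the first index search.

```javascript
// Hypothetical helper: returns a warning finding or null.
function indexUsageWarning(events, threshold = 5) {
  let unindexedReads = 0;
  for (const event of events) {
    if (event.type === 'index-search') break; // later reads are preceded by a query
    if (event.type === 'read') unindexedReads += 1;
  }

  if (unindexedReads <= threshold) return null;
  return {
    severity: 'warning',
    message: `Agent performed ${unindexedReads} direct file reads without using wazir index.`,
  };
}
```
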
|
|
125
262
|
|
|
126
263
|
## Task-Review Log Filenames
|
|
127
264
|
|
|
@@ -137,15 +274,223 @@ Save review results to `.wazir/runs/latest/reviews/review.md` with:
|
|
|
137
274
|
- Score breakdown
|
|
138
275
|
- Verdict
|
|
139
276
|
|
|
277
|
+
Run the phase report and display it to the user:
|
|
278
|
+
```bash
|
|
279
|
+
wazir report phase --run <run-id> --phase <review-mode>
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
Output the report content to the user in the conversation.
|
|
283
|
+
|
|
284
|
+
## Phase Report Generation
|
|
285
|
+
|
|
286
|
+
After completing any review pass, generate a phase report following `schemas/phase-report.schema.json`:
|
|
287
|
+
|
|
288
|
+
1. **`attempted_actions`** — Populate from the review findings. Each finding becomes an action entry:
|
|
289
|
+
- `description`: the finding summary
|
|
290
|
+
- `outcome`: `"success"` if the finding passed, `"fail"` if it is a blocking issue, `"uncertain"` if ambiguous
|
|
291
|
+
- `evidence`: the rationale or evidence supporting the outcome
|
|
292
|
+
|
|
293
|
+
2. **`drift_analysis`** — Compare review findings against the approved spec:
|
|
294
|
+
- `delta`: count of deviations between implementation and spec (0 = no drift)
|
|
295
|
+
- `description`: summary of any drift detected and its impact
|
|
296
|
+
|
|
297
|
+
3. **`quality_metrics`** — Populate from test, lint, and type-check results gathered during review:
|
|
298
|
+
- `test_pass_count`, `test_fail_count`: from test runner output
|
|
299
|
+
- `lint_errors`: from linter output
|
|
300
|
+
- `type_errors`: from type checker output
|
|
301
|
+
|
|
302
|
+
4. **`risk_flags`** — Populate from any high-severity findings:
|
|
303
|
+
- `severity`: `"low"`, `"medium"`, or `"high"`
|
|
304
|
+
- `description`: what the risk is
|
|
305
|
+
- `mitigation`: recommended mitigation (if known)
|
|
306
|
+
|
|
307
|
+
5. **`decisions`** — Populate from any scope or approach decisions made during the review:
|
|
308
|
+
- `description`: what was decided
|
|
309
|
+
- `rationale`: why
|
|
310
|
+
- `alternatives_considered`: other options evaluated (optional)
|
|
311
|
+
- `source`: `"[Wazir]"`, `"[Codex]"`, or `"[Both]"` (optional)
|
|
312
|
+
|
|
313
|
+
6. **`verdict_recommendation`** — Set based on the gating rules in `config/gating-rules.yaml`:
|
|
314
|
+
- `verdict`: `"continue"` (PASS), `"loop_back"` (NEEDS MINOR FIXES / NEEDS REWORK), or `"escalate"` (FAIL with fundamental issues)
|
|
315
|
+
- `reasoning`: brief explanation of why this verdict was chosen
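
The verdict mapping in item 6 can be pictured as a small lookup. This is only a sketch of the mapping as described in the text. The real rules live in `config/gating-rules.yaml`, which is not shown here, and the helper name is hypothetical.

```javascript
// Mapping as described above: review verdict -> gating recommendation.
const VERDICT_TO_RECOMMENDATION = {
  'PASS': 'continue',
  'NEEDS MINOR FIXES': 'loop_back',
  'NEEDS REWORK': 'loop_back',
  'FAIL': 'escalate',
};

// Hypothetical helper: build the `verdict_recommendation` object.
function verdictRecommendation(reviewVerdict, reasoning) {
  const verdict = VERDICT_TO_RECOMMENDATION[reviewVerdict];
  if (verdict === undefined) {
    throw new Error(`Unknown review verdict: ${reviewVerdict}`);
  }
  return { verdict, reasoning };
}
```
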
|
|
316
|
+
|
|
317
|
+
### Report Output Paths
|
|
318
|
+
|
|
319
|
+
Save reports in two formats under the run directory:
|
|
320
|
+
- `.wazir/runs/<id>/reports/phase-<name>-report.json` — machine-readable, validated against `schemas/phase-report.schema.json`
|
|
321
|
+
- `.wazir/runs/<id>/reports/phase-<name>-report.md` — human-readable Markdown summary
|
|
322
|
+
|
|
323
|
+
The gating agent (`tooling/src/gating/agent.js`) consumes the JSON report to decide: **continue**, **loop_back**, or **escalate**.
|
|
324
|
+
|
|
325
|
+
### Report Fields Reference
|
|
326
|
+
|
|
327
|
+
Report fields per `schemas/phase-report.schema.json`:
|
|
328
|
+
|
|
329
|
+
| Field | Type | Required | Description |
|
|
330
|
+
|-------|------|----------|-------------|
|
|
331
|
+
| `phase_name` | string | yes | Review mode name (e.g., `"final"`, `"task-review"`) |
|
|
332
|
+
| `run_id` | string | yes | Current run identifier |
|
|
333
|
+
| `timestamp` | string (date-time) | yes | ISO 8601 timestamp of report generation |
|
|
334
|
+
| `attempted_actions` | array | yes | Findings mapped to action outcomes |
|
|
335
|
+
| `drift_analysis` | object | yes | Spec-vs-implementation drift summary |
|
|
336
|
+
| `quality_metrics` | object | yes | Test/lint/type results |
|
|
337
|
+
| `risk_flags` | array | yes | High-severity risk items |
|
|
338
|
+
| `decisions` | array | yes | Scope/approach decisions made |
|
|
339
|
+
| `verdict_recommendation` | object | no | Gating verdict based on `config/gating-rules.yaml` |
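
For orientation, a report object matching the table above might look like the following. All values are illustrative placeholders, and the helper `missingReportFields` is hypothetical. It only checks that the required top-level keys are present and is no substitute for validating against the actual JSON schema.

```javascript
// Required top-level fields, taken from the table above.
const REQUIRED_REPORT_FIELDS = [
  'phase_name', 'run_id', 'timestamp', 'attempted_actions',
  'drift_analysis', 'quality_metrics', 'risk_flags', 'decisions',
];

// Hypothetical helper: list required keys that are absent from a report.
function missingReportFields(report) {
  return REQUIRED_REPORT_FIELDS.filter((field) => !(field in report));
}

// Illustrative example only; every value here is made up.
const exampleReport = {
  phase_name: 'final',
  run_id: 'example-run',
  timestamp: '2026-03-19T12:00:00Z',
  attempted_actions: [
    { description: 'Checked wiring of new route', outcome: 'success', evidence: 'Route is registered and tested' },
  ],
  drift_analysis: { delta: 0, description: 'No drift detected' },
  quality_metrics: { test_pass_count: 10, test_fail_count: 0, lint_errors: 0, type_errors: 0 },
  risk_flags: [],
  decisions: [],
  verdict_recommendation: { verdict: 'continue', reasoning: 'No blocking findings' },
};
```
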
|
|
340
|
+
|
|
341
|
+
## Post-Review: Learn (final mode only)
|
|
342
|
+
|
|
343
|
+
After the final review verdict, extract durable learnings using the **learner role** (`roles/learner.md`).
|
|
344
|
+
|
|
345
|
+
### Step 1: Gather all findings
|
|
346
|
+
|
|
347
|
+
Collect review findings from ALL sources in this run:
|
|
348
|
+
- `.wazir/runs/<run-id>/reviews/` — all review pass logs (task-review, final review)
|
|
349
|
+
- Codex findings (attributed `[Codex]` or `[Both]`)
|
|
350
|
+
- Self-audit findings (if `run_audit` was enabled)
|
|
351
|
+
|
|
352
|
+
### Step 2: Identify learning candidates
|
|
353
|
+
|
|
354
|
+
A finding becomes a learning candidate if:
|
|
355
|
+
- It recurred across 2+ review passes within this run (same issue found repeatedly)
|
|
356
|
+
- It matches a finding from a prior run (check `memory/learnings/proposed/` and `accepted/` for similar patterns)
|
|
357
|
+
- It represents a class of mistake, not just a single instance (e.g., "missing error handling in async functions" vs "missing try-catch on line 42")
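
The first criterion (recurrence across 2+ review passes within a run) could be sketched as a grouping step. The `finding_hash` field appears in the persistence snippet earlier in this file. The `pass` field and the function name `recurringWithinRun` are hypothetical.

```javascript
// Hypothetical helper: hashes of findings seen in at least `minPasses` distinct passes.
function recurringWithinRun(findings, minPasses = 2) {
  const passesByHash = new Map();
  for (const finding of findings) {
    if (!passesByHash.has(finding.finding_hash)) {
      passesByHash.set(finding.finding_hash, new Set());
    }
    passesByHash.get(finding.finding_hash).add(finding.pass);
  }

  return [...passesByHash]
    .filter(([, passes]) => passes.size >= minPasses)
    .map(([hash]) => hash);
}
```

A hash that shows up twice in the same pass counts once, since the criterion is about distinct passes rather than raw occurrences.
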
|
|
358
|
+
|
|
359
|
+
### Step 3: Write learning proposals
|
|
360
|
+
|
|
361
|
+
For each candidate, write a proposal to `memory/learnings/proposed/<run-id>-<NNN>.md`:
|
|
362
|
+
|
|
363
|
+
```markdown
|
|
364
|
+
---
|
|
365
|
+
artifact_type: proposed_learning
|
|
366
|
+
phase: learn
|
|
367
|
+
role: learner
|
|
368
|
+
run_id: <run-id>
|
|
369
|
+
status: proposed
|
|
370
|
+
sources:
|
|
371
|
+
- <review-file-1>
|
|
372
|
+
- <review-file-2>
|
|
373
|
+
approval_status: required
|
|
374
|
+
---
|
|
375
|
+
|
|
376
|
+
# Proposed Learning: <title>
|
|
377
|
+
|
|
378
|
+
## Scope
|
|
379
|
+
- **Roles:** [which roles should receive this learning — e.g., executor, reviewer]
|
|
380
|
+
- **Stacks:** [which tech stacks — e.g., node, react, or "all"]
|
|
381
|
+
- **Concerns:** [which concerns — e.g., error-handling, testing, security]
|
|
382
|
+
|
|
383
|
+
## Evidence
|
|
384
|
+
- [finding from review pass N: description]
|
|
385
|
+
- [finding from review pass M: same pattern]
|
|
386
|
+
- [optional: similar finding from prior run <run-id>]
|
|
387
|
+
|
|
388
|
+
## Learning
|
|
389
|
+
[The concrete, actionable instruction that should be injected into future executor context]
|
|
390
|
+
|
|
391
|
+
## Expected Benefit
|
|
392
|
+
[What this prevents in future runs]
|
|
393
|
+
|
|
394
|
+
## Confidence
|
|
395
|
+
- **Level:** low | medium | high
|
|
396
|
+
- **Basis:** [single run observation | multi-run recurrence | user correction]
|
|
397
|
+
```
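One way to pick the `<run-id>-<NNN>.md` filename for a new proposal is sketched below. The helper name `next_proposal_path` is hypothetical, and the three-digit zero padding and "highest existing number plus one" rule are assumptions; the skill text only gives the `<NNN>` placeholder.

```python
from pathlib import Path


def next_proposal_path(proposed_dir, run_id):
    """Return the next unused <run-id>-<NNN>.md path inside proposed_dir.

    Assumption: NNN is a zero-padded sequence number that starts at 001.
    """
    proposed_dir = Path(proposed_dir)
    prefix = f"{run_id}-"
    used = []
    for path in proposed_dir.glob(f"{run_id}-*.md"):
        suffix = path.stem[len(prefix):]
        if suffix.isdigit():  # skip files that only share the prefix
            used.append(int(suffix))
    return proposed_dir / f"{run_id}-{max(used, default=0) + 1:03d}.md"
```

For example, `next_proposal_path("memory/learnings/proposed", "<run-id>")` would be called once per candidate before writing the front matter shown above.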
|
|
398
|
+
|
|
399
|
+
### Step 4: Report
|
|
400
|
+
|
|
401
|
+
Present proposed learnings to the user:
|
|
402
|
+
|
|
403
|
+
> **Learnings proposed:** [count]
|
|
404
|
+
> - [title 1] (confidence: high, scope: executor/node)
|
|
405
|
+
> - [title 2] (confidence: medium, scope: reviewer/all)
|
|
406
|
+
>
|
|
407
|
+
> Proposals saved to `memory/learnings/proposed/`. Review and accept with `/wazir audit learnings`.
|
|
408
|
+
|
|
409
|
+
Learnings are NEVER auto-applied. They require explicit user acceptance before being injected into future runs.
|
|
410
|
+
|
|
411
|
+
## Post-Review: Prepare Next (final mode only)
|
|
412
|
+
|
|
413
|
+
After learning extraction, invoke the `prepare-next` skill to prepare the handoff:
|
|
414
|
+
|
|
415
|
+
### Handoff document
|
|
416
|
+
|
|
417
|
+
Write to `.wazir/runs/<run-id>/handoff.md`:
|
|
418
|
+
|
|
419
|
+
```markdown
|
|
420
|
+
# Handoff — <run-id>
|
|
421
|
+
|
|
422
|
+
**Status:** [Completed | Partial]
|
|
423
|
+
**Branch:** <branch-name>
|
|
424
|
+
**Date:** YYYY-MM-DD
|
|
425
|
+
|
|
426
|
+
## What Was Done
|
|
427
|
+
[List of completed tasks with commit hashes]
|
|
428
|
+
|
|
429
|
+
## Test Results
|
|
430
|
+
[Test count, pass/fail, validator status]
|
|
431
|
+
|
|
432
|
+
## Review Score
|
|
433
|
+
[Final review verdict and score]
|
|
434
|
+
|
|
435
|
+
## What's Next
|
|
436
|
+
[Pending items, deferred work, follow-up tasks]
|
|
437
|
+
|
|
438
|
+
## Open Bugs
|
|
439
|
+
[Any known issues discovered during this run]
|
|
440
|
+
|
|
441
|
+
## Learnings From This Run
|
|
442
|
+
[Key insights — what worked, what didn't, what to change]
|
|
443
|
+
```
|
|
444
|
+
|
|
445
|
+
### Cleanup
|
|
446
|
+
|
|
447
|
+
- Archive verbose intermediate review logs (compress to summary)
|
|
448
|
+
- Update `.wazir/runs/latest` symlink if creating a new run
|
|
449
|
+
- Do NOT mutate `input/` — it belongs to the user
|
|
450
|
+
- Do NOT auto-load proposed learnings into the next run
|
|
451
|
+
|
|
452
|
+
## Reasoning Output
|
|
453
|
+
|
|
454
|
+
Throughout the reviewer phase, produce reasoning at two layers:
|
|
455
|
+
|
|
456
|
+
**Conversation (Layer 1):** Before each review pass, explain what dimensions are being checked and why. After findings, explain the reasoning behind severity assignments.
|
|
457
|
+
|
|
458
|
+
**File (Layer 2):** Write `.wazir/runs/<run-id>/reasoning/phase-reviewer-reasoning.md` with structured entries:
|
|
459
|
+
- **Trigger** — what prompted the finding (e.g., "diff adds SQL query without parameterization")
|
|
460
|
+
- **Options considered** — severity options, fix approaches
|
|
461
|
+
- **Chosen** — assigned severity and recommendation
|
|
462
|
+
- **Reasoning** — why this severity level
|
|
463
|
+
- **Confidence** — high/medium/low
|
|
464
|
+
- **Counterfactual** — what would ship if this finding were missed
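The six fields above map naturally onto a small record type. The sketch below shows one possible way to render an entry as Markdown bullets. The `ReasoningEntry` class and its `to_markdown` method are hypothetical, and the exact on-disk layout of the reasoning file is an assumption, since the skill only names the fields.

```python
from dataclasses import dataclass

ALLOWED_CONFIDENCE = {"high", "medium", "low"}


@dataclass
class ReasoningEntry:
    trigger: str
    options_considered: str
    chosen: str
    reasoning: str
    confidence: str
    counterfactual: str

    def to_markdown(self):
        """Render the entry as one bullet per field (assumed layout)."""
        if self.confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"confidence must be one of {sorted(ALLOWED_CONFIDENCE)}")
        fields = [
            ("Trigger", self.trigger),
            ("Options considered", self.options_considered),
            ("Chosen", self.chosen),
            ("Reasoning", self.reasoning),
            ("Confidence", self.confidence),
            ("Counterfactual", self.counterfactual),
        ]
        return "\n".join(f"- **{label}:** {value}" for label, value in fields)
```

Rejecting unknown confidence values at render time keeps the file consistent with the high/medium/low scale listed above.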
|
|
465
|
+
|
|
466
|
+
Key reviewer reasoning moments: severity assignments, PASS/FAIL decisions, dimension score justifications, and escalation decisions.
|
|
467
|
+
|
|
140
468
|
## Done
|
|
141
469
|
|
|
470
|
+
**After completing this phase, output to the user:**
|
|
471
|
+
|
|
472
|
+
> **Final Review complete.**
|
|
473
|
+
>
|
|
474
|
+
> **Found:** [N] findings across 7 dimensions — [N] blocking, [N] warnings, [N] notes. Score: [score]/70 ([VERDICT]).
|
|
475
|
+
>
|
|
476
|
+
> **Without this phase:** [N] blocking issues would have shipped — including [specific examples: e.g., "missing error handler on /api/users endpoint", "auth middleware not wired to 3 routes", "CHANGELOG missing entry for breaking API change"]
|
|
477
|
+
>
|
|
478
|
+
> **Changed because of this work:** [List of issues caught and fixed during review passes, score improvement from first to final pass]
|
|
479
|
+
|
|
142
480
|
Present the verdict and offer next steps:
|
|
143
481
|
|
|
144
482
|
> **Review complete: [VERDICT] ([score]/70)**
|
|
145
483
|
>
|
|
146
484
|
> [Score breakdown and findings summary]
|
|
147
485
|
>
|
|
148
|
-
> **
|
|
149
|
-
>
|
|
150
|
-
|
|
151
|
-
|
|
486
|
+
> **Learnings proposed:** [count] (see `memory/learnings/proposed/`)
|
|
487
|
+
> **Handoff:** `.wazir/runs/<run-id>/handoff.md`
|
|
488
|
+
|
|
489
|
+
Ask the user via AskUserQuestion:
|
|
490
|
+
- **Question:** "How would you like to proceed with the review results?"
|
|
491
|
+
- **Options:**
|
|
492
|
+
1. "Create a PR" *(Recommended if PASS)*
|
|
493
|
+
2. "Auto-fix and re-review" *(Recommended if MINOR FIXES)*
|
|
494
|
+
3. "Review findings in detail"
|
|
495
|
+
|
|
496
|
+
Wait for the user's selection before continuing.
|
|
@@ -5,6 +5,19 @@ description: Run a structured audit on your codebase — security, code quality,
|
|
|
5
5
|
|
|
6
6
|
# Run Audit — Structured Codebase Audit Pipeline
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode is unavailable, fall back to native Bash with a warning
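The routing rule above can be sketched as a small decision function. This is an illustration only: the real matrix lives in `hooks/routing-matrix.json`, whose format is not shown here, so the `SMALL_COMMANDS` list, the `route` function, and the choice to treat every unlisted command as large are all assumptions.

```python
# Hypothetical allowlist taken from the "small commands" examples above.
SMALL_COMMANDS = ("git status", "ls", "pwd", "wazir")


def route(command, context_mode_available=True):
    """Return (target, warning) for a shell command.

    Assumption: anything not on the small-command allowlist is treated as large.
    """
    words = command.split()
    is_small = any(words[: len(p.split())] == p.split() for p in SMALL_COMMANDS)
    if is_small:
        return ("native-bash", None)
    if context_mode_available:
        return ("context-mode", None)
    return ("native-bash", "context-mode unavailable; falling back to native Bash")
```

Matching on whole words rather than string prefixes keeps a command such as `lsof` from being mistaken for `ls`.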
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
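Rules 3 and 4 above amount to a read budget: reads of files surfaced by an index query are free, and at most 10 other direct reads are allowed. A minimal sketch of that bookkeeping follows. The `ReadBudget` class and its method names are hypothetical and are not part of the `wazir` CLI.

```python
class ReadBudget:
    """Track direct file reads against the index-first exploration rule."""

    def __init__(self, limit=10):
        self.limit = limit      # maximum reads without a justifying index query
        self.indexed = set()    # paths returned by earlier index queries
        self.unjustified = 0    # direct reads made without an index hit

    def record_index_hit(self, path):
        """Remember that an index query identified this path."""
        self.indexed.add(path)

    def allow_direct_read(self, path):
        """Return True if the read is permitted, charging the budget if needed."""
        if path in self.indexed:
            return True
        if self.unjustified >= self.limit:
            return False
        self.unjustified += 1
        return True
```

A caller would invoke `record_index_hit` for each path returned by a symbol search and check `allow_direct_read` before opening any file directly.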
|
|
20
|
+
|
|
8
21
|
## Overview
|
|
9
22
|
|
|
10
23
|
This skill runs a structured audit on your codebase. It collects three parameters interactively (audit type, scope, output mode), then feeds them through the pipeline: Research → Audit → Report or Plan.
|
|
@@ -5,6 +5,19 @@ description: Build a project profile from manifests, docs, tests, and `input/` s
|
|
|
5
5
|
|
|
6
6
|
# Scan Project
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode is unavailable, fall back to native Bash with a warning
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
20
|
+
|
|
8
21
|
Inspect the smallest set of repo surfaces needed to answer:
|
|
9
22
|
|
|
10
23
|
- what kind of project this is
|