@wazir-dev/cli 1.0.0 → 1.2.0
- package/CHANGELOG.md +100 -2
- package/README.md +6 -6
- package/docs/concepts/architecture.md +1 -1
- package/docs/concepts/roles-and-workflows.md +2 -0
- package/docs/concepts/why-wazir.md +59 -0
- package/docs/decisions/2026-03-19-deferred-items.md +564 -0
- package/docs/decisions/2026-03-19-enhancement-decisions.md +300 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +1 -1
- package/docs/readmes/INDEX.md +21 -5
- package/docs/readmes/features/expertise/README.md +2 -2
- package/docs/readmes/features/exports/README.md +2 -2
- package/docs/readmes/features/schemas/README.md +3 -0
- package/docs/readmes/features/skills/README.md +17 -0
- package/docs/readmes/features/skills/clarifier.md +5 -0
- package/docs/readmes/features/skills/claude-cli.md +5 -0
- package/docs/readmes/features/skills/codex-cli.md +5 -0
- package/docs/readmes/features/skills/dispatching-parallel-agents.md +5 -0
- package/docs/readmes/features/skills/executing-plans.md +5 -0
- package/docs/readmes/features/skills/executor.md +5 -0
- package/docs/readmes/features/skills/finishing-a-development-branch.md +5 -0
- package/docs/readmes/features/skills/gemini-cli.md +5 -0
- package/docs/readmes/features/skills/humanize.md +5 -0
- package/docs/readmes/features/skills/init-pipeline.md +5 -0
- package/docs/readmes/features/skills/receiving-code-review.md +5 -0
- package/docs/readmes/features/skills/requesting-code-review.md +5 -0
- package/docs/readmes/features/skills/reviewer.md +5 -0
- package/docs/readmes/features/skills/subagent-driven-development.md +5 -0
- package/docs/readmes/features/skills/using-git-worktrees.md +5 -0
- package/docs/readmes/features/skills/wazir.md +5 -0
- package/docs/readmes/features/skills/writing-skills.md +5 -0
- package/docs/readmes/features/workflows/prepare-next.md +1 -1
- package/docs/reference/configuration-reference.md +47 -6
- package/docs/reference/launch-checklist.md +4 -4
- package/docs/reference/review-loop-pattern.md +538 -0
- package/docs/reference/roles-reference.md +1 -0
- package/docs/reference/skill-tiers.md +147 -0
- package/docs/reference/tooling-cli.md +5 -1
- package/docs/truth-claims.yaml +18 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +97 -1
- package/exports/hosts/claude/.claude/agents/clarifier.md +3 -0
- package/exports/hosts/claude/.claude/agents/designer.md +3 -0
- package/exports/hosts/claude/.claude/agents/executor.md +2 -0
- package/exports/hosts/claude/.claude/agents/planner.md +3 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +2 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +5 -1
- package/exports/hosts/claude/.claude/agents/specifier.md +3 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +4 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +4 -0
- package/exports/hosts/claude/.claude/commands/design.md +4 -0
- package/exports/hosts/claude/.claude/commands/discover.md +4 -0
- package/exports/hosts/claude/.claude/commands/execute.md +4 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +4 -0
- package/exports/hosts/claude/.claude/commands/plan.md +4 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +4 -0
- package/exports/hosts/claude/.claude/commands/specify.md +4 -0
- package/exports/hosts/claude/.claude/commands/verify.md +4 -0
- package/exports/hosts/claude/.claude/settings.json +9 -0
- package/exports/hosts/claude/CLAUDE.md +1 -1
- package/exports/hosts/claude/export.manifest.json +22 -20
- package/exports/hosts/claude/host-package.json +3 -1
- package/exports/hosts/codex/AGENTS.md +1 -1
- package/exports/hosts/codex/export.manifest.json +22 -20
- package/exports/hosts/codex/host-package.json +3 -1
- package/exports/hosts/cursor/.cursor/hooks.json +4 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +1 -1
- package/exports/hosts/cursor/export.manifest.json +22 -20
- package/exports/hosts/cursor/host-package.json +3 -1
- package/exports/hosts/gemini/GEMINI.md +1 -1
- package/exports/hosts/gemini/export.manifest.json +22 -20
- package/exports/hosts/gemini/host-package.json +3 -1
- package/hooks/context-mode-router +191 -0
- package/hooks/definitions/context_mode_router.yaml +19 -0
- package/hooks/definitions/loop_cap_guard.yaml +1 -1
- package/hooks/hooks.json +43 -0
- package/hooks/protected-path-write-guard +8 -0
- package/hooks/routing-matrix.json +45 -0
- package/hooks/session-start +62 -1
- package/llms-full.txt +905 -132
- package/package.json +3 -3
- package/roles/clarifier.md +3 -0
- package/roles/designer.md +3 -0
- package/roles/executor.md +2 -0
- package/roles/planner.md +3 -0
- package/roles/researcher.md +2 -0
- package/roles/reviewer.md +5 -1
- package/roles/specifier.md +3 -0
- package/schemas/hook.schema.json +2 -1
- package/schemas/phase-report.schema.json +80 -0
- package/schemas/usage.schema.json +25 -1
- package/schemas/wazir-manifest.schema.json +19 -0
- package/skills/brainstorming/SKILL.md +20 -56
- package/skills/clarifier/SKILL.md +243 -0
- package/skills/claude-cli/SKILL.md +320 -0
- package/skills/codex-cli/SKILL.md +260 -0
- package/skills/debugging/SKILL.md +24 -1
- package/skills/design/SKILL.md +13 -0
- package/skills/dispatching-parallel-agents/SKILL.md +13 -0
- package/skills/executing-plans/SKILL.md +28 -2
- package/skills/executor/SKILL.md +129 -0
- package/skills/finishing-a-development-branch/SKILL.md +13 -0
- package/skills/gemini-cli/SKILL.md +260 -0
- package/skills/humanize/SKILL.md +13 -0
- package/skills/init-pipeline/SKILL.md +76 -78
- package/skills/prepare-next/SKILL.md +81 -10
- package/skills/receiving-code-review/SKILL.md +21 -0
- package/skills/requesting-code-review/SKILL.md +38 -5
- package/skills/reviewer/SKILL.md +423 -0
- package/skills/run-audit/SKILL.md +13 -0
- package/skills/scan-project/SKILL.md +13 -0
- package/skills/self-audit/SKILL.md +197 -16
- package/skills/subagent-driven-development/SKILL.md +38 -2
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +2 -0
- package/skills/subagent-driven-development/implementer-prompt.md +8 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +7 -0
- package/skills/tdd/SKILL.md +21 -0
- package/skills/using-git-worktrees/SKILL.md +13 -0
- package/skills/using-skills/SKILL.md +13 -0
- package/skills/verification/SKILL.md +13 -0
- package/skills/wazir/SKILL.md +286 -262
- package/skills/writing-plans/SKILL.md +44 -4
- package/skills/writing-skills/SKILL.md +13 -0
- package/templates/artifacts/implementation-plan.md +3 -0
- package/templates/artifacts/tasks-template.md +133 -0
- package/templates/examples/phase-report.example.json +48 -0
- package/templates/examples/wazir-manifest.example.yaml +1 -1
- package/tooling/src/adapters/composition-engine.js +256 -0
- package/tooling/src/adapters/model-router.js +84 -0
- package/tooling/src/capture/command.js +111 -2
- package/tooling/src/capture/run-config.js +23 -0
- package/tooling/src/capture/store.js +24 -0
- package/tooling/src/capture/usage.js +106 -0
- package/tooling/src/checks/ac-matrix.js +256 -0
- package/tooling/src/checks/brand-truth.js +3 -6
- package/tooling/src/checks/command-registry.js +13 -0
- package/tooling/src/checks/docs-truth.js +1 -1
- package/tooling/src/checks/runtime-surface.js +3 -7
- package/tooling/src/checks/skills.js +111 -0
- package/tooling/src/cli.js +17 -3
- package/tooling/src/commands/stats.js +161 -0
- package/tooling/src/commands/validate.js +5 -1
- package/tooling/src/export/compiler.js +33 -37
- package/tooling/src/gating/agent.js +145 -0
- package/tooling/src/guards/phase-prerequisite-guard.js +127 -0
- package/tooling/src/hooks/routing-logic.js +69 -0
- package/tooling/src/init/auto-detect.js +260 -0
- package/tooling/src/init/command.js +161 -0
- package/tooling/src/input/scanner.js +46 -0
- package/tooling/src/reports/command.js +103 -0
- package/tooling/src/reports/phase-report.js +323 -0
- package/tooling/src/state/command.js +160 -0
- package/tooling/src/state/db.js +287 -0
- package/tooling/src/status/command.js +53 -1
- package/wazir.manifest.yaml +26 -17
- package/workflows/clarify.md +4 -0
- package/workflows/design-review.md +4 -0
- package/workflows/design.md +4 -0
- package/workflows/discover.md +4 -0
- package/workflows/execute.md +4 -0
- package/workflows/plan-review.md +4 -0
- package/workflows/plan.md +4 -0
- package/workflows/spec-challenge.md +4 -0
- package/workflows/specify.md +4 -0
- package/workflows/verify.md +4 -0
|
@@ -5,16 +5,101 @@ description: Run a self-audit loop in an isolated git worktree — validates, au
|
|
|
5
5
|
|
|
6
6
|
# Self-Audit — Worktree-Isolated Audit-Fix Loop
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode unavailable, fall back to native Bash with warning
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
20
|
+
|
|
8
21
|
## Overview
|
|
9
22
|
|
|
10
23
|
This skill runs a structured self-audit of the Wazir project itself, operating entirely in an isolated git worktree. It validates the project against all canonical checks, performs deeper structural analysis, fixes issues found, verifies the fixes pass, and only merges back on all-green.
|
|
11
24
|
|
|
12
25
|
**Safety guarantee:** The main worktree is never modified until all checks pass in isolation.
|
|
13
26
|
|
|
27
|
+
## Severity Levels
|
|
28
|
+
|
|
29
|
+
Every finding is assigned a severity that determines handling:
|
|
30
|
+
|
|
31
|
+
| Severity | Action | Description |
|
|
32
|
+
|----------|--------|-------------|
|
|
33
|
+
| **critical** | Abort loop | Structural integrity threat — cannot safely continue. Discard worktree. |
|
|
34
|
+
| **high** | Fix now | Must be resolved in this loop before proceeding to verify. |
|
|
35
|
+
| **medium** | Fix if time | Fix within remaining loop budget. Skip if loop cap approaching. |
|
|
36
|
+
| **low** | Log and skip | Record in report. No fix attempted. |
|
|
37
|
+
|
|
38
|
+
Severity assignment rules:
|
|
39
|
+
- Protected-path violation → **critical**
|
|
40
|
+
- Test failure, broken hook, missing manifest entry → **high**
|
|
41
|
+
- Documentation drift, stale export, missing schema → **medium**
|
|
42
|
+
- Style issues, minor inconsistency, cosmetic → **low**
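The severity table and assignment rules above can be sketched as a lookup. This is a minimal illustration, not the package's implementation: the category keys, the action labels, and the `classifyFinding` name are all hypothetical, and rejecting unknown categories is an assumption the skill text does not state.

```javascript
// Hypothetical category keys; the skill text lists these rules only in prose.
const SEVERITY_BY_CATEGORY = new Map([
  ['protected-path-violation', 'critical'],
  ['test-failure', 'high'],
  ['broken-hook', 'high'],
  ['missing-manifest-entry', 'high'],
  ['documentation-drift', 'medium'],
  ['stale-export', 'medium'],
  ['missing-schema', 'medium'],
  ['style-issue', 'low'],
  ['minor-inconsistency', 'low'],
  ['cosmetic', 'low'],
]);

// Hypothetical action labels mirroring the "Action" column of the table.
const ACTION_BY_SEVERITY = {
  critical: 'abort-loop',
  high: 'fix-now',
  medium: 'fix-if-time',
  low: 'log-and-skip',
};

function classifyFinding(category) {
  const severity = SEVERITY_BY_CATEGORY.get(category);
  if (severity === undefined) {
    // Assumption: unknown categories are rejected rather than silently downgraded.
    throw new Error(`no severity rule for category: ${category}`);
  }
  return { severity, action: ACTION_BY_SEVERITY[severity] };
}

console.log(classifyFinding('stale-export')); // { severity: 'medium', action: 'fix-if-time' }
```

Keeping the category-to-severity mapping separate from the severity-to-action mapping means a rule change touches only one table.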
|
|
43
|
+
|
|
44
|
+
## Quality Scoring
|
|
45
|
+
|
|
46
|
+
Each loop measures a quality score **before** and **after** fixes:
|
|
47
|
+
|
|
48
|
+
```
|
|
49
|
+
quality_score = (checks_passing / total_checks) * 100
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
Track per loop:
|
|
53
|
+
- `quality_score_before` — score at start of loop (after Phase 1)
|
|
54
|
+
- `quality_score_after` — score at end of loop (after Phase 4 verify)
|
|
55
|
+
- `delta` — improvement from this loop's fixes
|
|
56
|
+
|
|
57
|
+
**Effectiveness threshold:** If 3 consecutive loops show `delta < 2%`, the audit has converged — skip remaining loops.
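As a worked example of the formula above, the following sketch computes the score before and after a loop and the resulting delta. The `{ name, passed }` result shape and the function names are assumptions for illustration. The delta is read here as a difference in percentage points, which is one plausible interpretation of `delta < 2%`.

```javascript
// results: one { name, passed } object per check (assumed shape).
function qualityScore(results) {
  if (results.length === 0) {
    throw new Error('cannot score an empty check list');
  }
  const passing = results.filter((r) => r.passed).length;
  return (passing / results.length) * 100;
}

// Improvement attributable to one loop, in percentage points.
function loopDelta(before, after) {
  return qualityScore(after) - qualityScore(before);
}

const before = [
  { name: 'manifest', passed: true },
  { name: 'hooks', passed: false },
  { name: 'docs', passed: false },
  { name: 'export', passed: true },
];
// Simulate a loop that fixes only the failing hooks check.
const after = before.map((r) => (r.name === 'hooks' ? { ...r, passed: true } : r));

console.log(qualityScore(before)); // 50
console.log(qualityScore(after)); // 75
console.log(loopDelta(before, after)); // 25
```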
|
|
58
|
+
|
|
59
|
+
## Learning Integration
|
|
60
|
+
|
|
61
|
+
After each audit loop, findings feed the learning pipeline:
|
|
62
|
+
|
|
63
|
+
1. **Propose learnings:** For each finding category that appeared in this loop:
|
|
64
|
+
- Check `state.sqlite` findings table for the same `finding_hash` in previous runs
|
|
65
|
+
- If `recurrence_count >= 2`: auto-propose a learning to `memory/learnings/proposed/`
|
|
66
|
+
- Learning scope tags: `scope_roles: [executor]`, `scope_concerns: [quality]`
|
|
67
|
+
2. **Store findings:** Insert all findings into `state.sqlite` via `insertFinding()` with severity and finding_hash
|
|
68
|
+
3. **Store audit record:** Insert `{run_id, finding_count, fix_count, manual_count, quality_score_before, quality_score_after}` into `audit_history`
|
|
69
|
+
|
|
70
|
+
## Trend Tracking
|
|
71
|
+
|
|
72
|
+
Before starting Loop 1, query previous audit results:
|
|
73
|
+
|
|
74
|
+
```javascript
|
|
75
|
+
const trend = getAuditTrend(db, 5); // last 5 audits
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
Present trend in the report:
|
|
79
|
+
- Are finding counts trending up or down?
|
|
80
|
+
- Are the same finding_hashes recurring? If so, the fixes aren't preventing recurrence — escalate.
|
|
81
|
+
- Has quality_score improved across runs?
|
|
82
|
+
|
|
83
|
+
## Escalation Path
|
|
84
|
+
|
|
85
|
+
Manual-required findings that cannot be auto-fixed are escalated:
|
|
86
|
+
|
|
87
|
+
1. **Within the audit:** Logged in the report with remediation guidance
|
|
88
|
+
2. **Cross-run recurrence:** If the same manual finding recurs across 3+ audits, escalate to:
|
|
89
|
+
- Create a task spec in `.wazir/runs/latest/tasks/` describing the fix
|
|
90
|
+
- Flag in the audit report as **RECURRING — needs dedicated task**
|
|
91
|
+
3. **Critical findings:** Immediately logged. If 2+ critical findings in a single loop, abort the entire audit run.
|
|
92
|
+
|
|
14
93
|
## Trigger
|
|
15
94
|
|
|
16
95
|
On-demand: operator invokes `/self-audit` or requests a self-audit loop.
|
|
17
96
|
|
|
97
|
+
### Parameters
|
|
98
|
+
|
|
99
|
+
| Flag | Default | Max | Description |
|
|
100
|
+
|------|---------|-----|-------------|
|
|
101
|
+
| `--loops N` | 5 | 10 | Number of audit-fix loops to run. Each loop executes the full Phase 1-5 cycle. If a loop finds 0 new issues, subsequent loops are skipped (convergence detection). |
|
|
102
|
+
|
|
18
103
|
## Worktree Isolation Model
|
|
19
104
|
|
|
20
105
|
```
|
|
@@ -45,7 +130,9 @@ node tooling/src/cli.js doctor --json
|
|
|
45
130
|
node tooling/src/cli.js export --check
|
|
46
131
|
```
|
|
47
132
|
|
|
48
|
-
Collect pass/fail for each. Any failure is a finding.
|
|
133
|
+
Collect pass/fail for each. Any failure is a finding. Assign severity per the severity table above.
|
|
134
|
+
|
|
135
|
+
Calculate `quality_score_before` from the pass/fail results.
|
|
49
136
|
|
|
50
137
|
## Phase 2: Deep Structural Audit
|
|
51
138
|
|
|
@@ -79,10 +166,52 @@ Beyond CLI checks, inspect for:
|
|
|
79
166
|
- Each skill dir under `skills/` has a well-formed `SKILL.md` with frontmatter
|
|
80
167
|
- Skills referenced in documentation actually exist
|
|
81
168
|
|
|
169
|
+
7. **Code Quality**
|
|
170
|
+
- Run `node tooling/src/cli.js validate` (all subcommands) and capture exit codes
|
|
171
|
+
- If `eslint` is present in `package.json` scripts or devDependencies, run `npx eslint .` and capture results
|
|
172
|
+
- If `tsc` is present in `package.json` scripts or devDependencies, run `npx tsc --noEmit` and capture results
|
|
173
|
+
- Tools not found in `package.json` are skipped with a note in the report
|
|
174
|
+
|
|
175
|
+
8. **Test Coverage**
|
|
176
|
+
- Run `npm test` and capture pass/fail counts from output
|
|
177
|
+
- Any test failure is a finding
|
|
178
|
+
|
|
179
|
+
9. **Expertise Coverage**
|
|
180
|
+
- Read `expertise/composition-map.yaml`
|
|
181
|
+
- For every module path referenced, check that the file exists under `expertise/`
|
|
182
|
+
- Missing files are findings
|
|
183
|
+
|
|
184
|
+
10. **Export Freshness**
|
|
185
|
+
- Run `wazir export --check`
|
|
186
|
+
- Any drift detected is a finding
|
|
187
|
+
|
|
188
|
+
## Protected-Path Safety Rails
|
|
189
|
+
|
|
190
|
+
Before applying ANY fix in Phase 3, check if the target file is in a protected path. The self-audit loop MUST NOT modify files in:
|
|
191
|
+
|
|
192
|
+
- `skills/`
|
|
193
|
+
- `workflows/`
|
|
194
|
+
- `roles/`
|
|
195
|
+
- `schemas/`
|
|
196
|
+
- `wazir.manifest.yaml`
|
|
197
|
+
- `docs/concepts/`
|
|
198
|
+
- `docs/reference/`
|
|
199
|
+
- `expertise/composition-map.yaml`
|
|
200
|
+
- `docs/plans/`
|
|
201
|
+
- `program.md`
|
|
202
|
+
|
|
203
|
+
If a fix would touch a protected path, log it as a **manual-required** finding and skip. If `git diff --name-only` shows any protected path was modified during a loop iteration, **ABORT** the loop and discard the worktree.
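The pre-fix check and the post-loop `git diff --name-only` check described above could share one predicate. The sketch below assumes repo-relative paths and treats entries ending in `/` as directory prefixes. The function names are hypothetical and do not come from the package.

```javascript
// Protected entries copied from the list above; a trailing slash marks a directory.
const PROTECTED_PATHS = [
  'skills/',
  'workflows/',
  'roles/',
  'schemas/',
  'wazir.manifest.yaml',
  'docs/concepts/',
  'docs/reference/',
  'expertise/composition-map.yaml',
  'docs/plans/',
  'program.md',
];

// True when a repo-relative path falls under any protected entry.
function isProtected(filePath) {
  const normalized = filePath.replace(/\\/g, '/').replace(/^\.\//, '');
  return PROTECTED_PATHS.some((entry) =>
    entry.endsWith('/') ? normalized.startsWith(entry) : normalized === entry
  );
}

// gitDiffOutput: raw text printed by `git diff --name-only`, one path per line.
function findProtectedChanges(gitDiffOutput) {
  return gitDiffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .filter(isProtected);
}

console.log(findProtectedChanges('README.md\nroles/reviewer.md\n')); // [ 'roles/reviewer.md' ]
```

A non-empty result from `findProtectedChanges` corresponds to the abort condition; a `true` from `isProtected` before a fix corresponds to logging the finding as manual-required.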
|
|
204
|
+
|
|
82
205
|
## Phase 3: Fix
|
|
83
206
|
|
|
84
|
-
For each finding from Phases 1-2:
|
|
207
|
+
For each finding from Phases 1-2, ordered by severity (critical first):
|
|
208
|
+
|
|
209
|
+
1. **Critical findings:** Abort immediately. Discard worktree. Report the critical finding.
|
|
210
|
+
2. **High findings:** Must fix now.
|
|
211
|
+
3. **Medium findings:** Fix if loop budget allows (remaining loops > 1).
|
|
212
|
+
4. **Low findings:** Log only. No fix attempted.
|
|
85
213
|
|
|
214
|
+
For high/medium findings:
|
|
86
215
|
1. Categorize as **auto-fixable** or **manual-required**
|
|
87
216
|
2. Auto-fixable issues: apply the fix directly
|
|
88
217
|
- Missing files → create stubs or fix references
|
|
@@ -90,7 +219,7 @@ For each finding from Phases 1-2:
|
|
|
90
219
|
- Documentation drift → update docs to match reality
|
|
91
220
|
- Permission issues → `chmod +x` hook scripts
|
|
92
221
|
- Schema formatting → auto-format
|
|
93
|
-
3. Manual-required issues: document in the audit report with remediation guidance
|
|
222
|
+
3. Manual-required issues: document in the audit report with remediation guidance. Check escalation path for recurrence.
|
|
94
223
|
|
|
95
224
|
**Fix constraints:**
|
|
96
225
|
- Never modify `input/` (read-only operator surface)
|
|
@@ -106,27 +235,61 @@ If any check fails after fixes:
|
|
|
106
235
|
- Document the revert and the root cause
|
|
107
236
|
- Re-verify
|
|
108
237
|
|
|
109
|
-
## Phase 5: Report & Commit
|
|
238
|
+
## Phase 5: Report, Learn & Commit
|
|
239
|
+
|
|
240
|
+
### Quality Score
|
|
241
|
+
|
|
242
|
+
Re-run Phase 1 checks and calculate `quality_score_after`. Compute delta.
|
|
243
|
+
|
|
244
|
+
### Learning Extraction
|
|
245
|
+
|
|
246
|
+
1. Hash each finding description → `finding_hash`
|
|
247
|
+
2. Store all findings in `state.sqlite` via `insertFinding(db, {run_id, phase: 'self-audit', source: 'self-audit', severity, description, finding_hash})`
|
|
248
|
+
3. Query `getRecurringFindingHashes(db, 2)` — findings occurring 2+ times across runs
|
|
249
|
+
4. For each recurring finding not already in `memory/learnings/proposed/`:
|
|
250
|
+
- Write a learning proposal: `memory/learnings/proposed/self-audit-<hash-prefix>.md`
|
|
251
|
+
- Content: what the issue is, how often it recurs, recommended prevention
|
|
252
|
+
5. Store audit record: `insertAuditRecord(db, {run_id, finding_count, fix_count, manual_count, quality_score_before, quality_score_after})`
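Step 1 does not say how the hash is computed. One possible approach, shown purely as an assumption, is a SHA-256 digest of a whitespace- and case-normalized description, so that cosmetic differences in wording do not defeat recurrence detection. The 12-character prefix used for the proposal filename is likewise an illustrative choice.

```javascript
const { createHash } = require('node:crypto');

// Assumption: SHA-256 over a normalized description; the package may hash differently.
function findingHash(description) {
  const normalized = description.trim().toLowerCase().replace(/\s+/g, ' ');
  return createHash('sha256').update(normalized, 'utf8').digest('hex');
}

// Builds the proposal path from step 4; the prefix length is hypothetical.
function proposalPath(hash, prefixLength = 12) {
  return `memory/learnings/proposed/self-audit-${hash.slice(0, prefixLength)}.md`;
}

const hash = findingHash('Stale export for claude host');
console.log(hash.length); // 64
console.log(proposalPath(hash));
```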
|
|
253
|
+
|
|
254
|
+
### Report
|
|
110
255
|
|
|
111
256
|
Produce a structured report:
|
|
112
257
|
|
|
113
258
|
```markdown
|
|
114
259
|
# Self-Audit Report — Loop N — <date>
|
|
115
260
|
|
|
261
|
+
## Quality Score
|
|
262
|
+
- Before: X% → After: Y% (delta: +Z%)
|
|
263
|
+
|
|
264
|
+
## Trend (last 5 audits)
|
|
265
|
+
| Run | Date | Findings | Fixes | Quality |
|
|
266
|
+
|-----|------|----------|-------|---------|
|
|
267
|
+
| ... | ... | ... | ... | ... |
|
|
268
|
+
|
|
116
269
|
## Validation Sweep
|
|
117
|
-
| Check | Before | After |
|
|
118
|
-
|
|
119
|
-
| manifest | PASS/FAIL | PASS |
|
|
120
|
-
| hooks | PASS/FAIL | PASS |
|
|
121
|
-
| ... | ... | ... |
|
|
122
|
-
|
|
123
|
-
## Findings
|
|
124
|
-
###
|
|
125
|
-
|
|
270
|
+
| Check | Severity | Before | After |
|
|
271
|
+
|-------|----------|--------|-------|
|
|
272
|
+
| manifest | high | PASS/FAIL | PASS |
|
|
273
|
+
| hooks | high | PASS/FAIL | PASS |
|
|
274
|
+
| ... | ... | ... | ... |
|
|
275
|
+
|
|
276
|
+
## Findings by Severity
|
|
277
|
+
### Critical (N) — loop aborted if any
|
|
278
|
+
### High (N) — fixed
|
|
279
|
+
### Medium (N) — fixed if budget allowed
|
|
280
|
+
### Low (N) — logged only
|
|
281
|
+
|
|
282
|
+
## Auto-Fixed (N)
|
|
283
|
+
- [F-001] [high] <description> — fixed by <change>
|
|
126
284
|
- ...
|
|
127
285
|
|
|
128
|
-
|
|
129
|
-
- [M-001] <description> — remediation: <guidance>
|
|
286
|
+
## Manual Required (N)
|
|
287
|
+
- [M-001] [medium] <description> — remediation: <guidance>
|
|
288
|
+
- [M-002] [medium] <description> — **RECURRING (3x)** — needs dedicated task
|
|
289
|
+
- ...
|
|
290
|
+
|
|
291
|
+
## Proposed Learnings (N)
|
|
292
|
+
- <learning-file>: <summary>
|
|
130
293
|
- ...
|
|
131
294
|
|
|
132
295
|
## Changes Made
|
|
@@ -146,8 +309,26 @@ The worktree agent returns its results. If changes were made, the caller can mer
|
|
|
146
309
|
|
|
147
310
|
## Loop Behavior
|
|
148
311
|
|
|
312
|
+
Default: **5 loops** (override with `--loops N`, max 10).
|
|
313
|
+
|
|
149
314
|
When running multiple loops:
|
|
150
315
|
- Loop 1 audits the current state, fixes what it finds
|
|
151
316
|
- Loop 2 audits the result of Loop 1, catches anything missed or introduced
|
|
152
317
|
- Each loop is independent and runs in its own fresh worktree
|
|
153
|
-
- Convergence
|
|
318
|
+
- **Convergence detection:** if Loop N finds **0 new issues** (no new findings beyond what previous loops already reported), all subsequent loops are skipped and the audit terminates early
|
|
319
|
+
- **Effectiveness convergence:** if 3 consecutive loops show `quality_score delta < 2%`, skip remaining loops
|
|
320
|
+
- **Critical abort:** if any loop encounters 2+ critical findings, abort the entire audit run
|
|
321
|
+
- If a loop modifies a protected path (see Protected-Path Safety Rails above), the loop is aborted and the worktree is discarded
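The termination rules in this list can be read as a single decision taken after each loop. The sketch below is one interpretation under stated assumptions: the per-loop record shape `{ newFindings, criticalCount, delta }`, the return labels, and the order in which the rules are checked are all hypothetical.

```javascript
// history: one { newFindings, criticalCount, delta } record per completed loop (assumed shape).
function nextAction(history, requestedLoops = 5) {
  const maxLoops = Math.min(requestedLoops, 10); // --loops N is capped at 10
  const last = history[history.length - 1];
  if (!last) return 'continue';

  // Critical abort: 2+ critical findings in a single loop ends the whole run.
  if (last.criticalCount >= 2) return 'abort-run';

  // Convergence detection: the latest loop found nothing new.
  if (last.newFindings === 0) return 'stop-converged';

  // Effectiveness convergence: three consecutive loops each improved by under 2 points.
  const recent = history.slice(-3);
  if (recent.length === 3 && recent.every((loop) => loop.delta < 2)) {
    return 'stop-ineffective';
  }

  if (history.length >= maxLoops) return 'stop-loop-cap';
  return 'continue';
}

console.log(nextAction([{ newFindings: 4, criticalCount: 0, delta: 12 }])); // continue
console.log(nextAction([{ newFindings: 0, criticalCount: 0, delta: 0 }])); // stop-converged
```

The protected-path abort is deliberately left out of this function because it discards a single loop's worktree rather than deciding whether further loops run.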
|
|
322
|
+
|
|
323
|
+
The final branch is **NOT auto-merged** — it requires human review.
|
|
324
|
+
|
|
325
|
+
## State Database Integration
|
|
326
|
+
|
|
327
|
+
The self-audit skill requires `state.sqlite` (see `tooling/src/state/db.js`). At audit start:
|
|
328
|
+
|
|
329
|
+
```javascript
|
|
330
|
+
const { openStateDb, getAuditTrend, insertFinding, insertAuditRecord, getRecurringFindingHashes } = require('../../tooling/src/state/db');
|
|
331
|
+
const db = openStateDb(stateRoot);
|
|
332
|
+
```
|
|
333
|
+
|
|
334
|
+
All findings are persisted across runs, enabling trend detection and learning extraction.
|
|
@@ -5,6 +5,19 @@ description: Use when executing implementation plans with independent tasks in t
|
|
|
5
5
|
|
|
6
6
|
# Subagent-Driven Development
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode unavailable, fall back to native Bash with warning
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
20
|
+
|
|
8
21
|
Execute plan by dispatching fresh subagent per task, with two-stage review after each: spec compliance review first, then code quality review.
|
|
9
22
|
|
|
10
23
|
**Why subagents:** You delegate tasks to specialized agents with isolated context. By precisely crafting their instructions and context, you ensure they stay focused and succeed at their task. They should never inherit your session's context or history — you construct exactly what they need. This also preserves your own context for coordination work.
|
|
@@ -45,6 +58,7 @@ digraph process {
|
|
|
45
58
|
|
|
46
59
|
subgraph cluster_per_task {
|
|
47
60
|
label="Per Task";
|
|
61
|
+
"Capture PRE_TASK_SHA" [shape=box];
|
|
48
62
|
"Dispatch implementer subagent (./implementer-prompt.md)" [shape=box];
|
|
49
63
|
"Implementer subagent asks questions?" [shape=diamond];
|
|
50
64
|
"Answer questions, provide context" [shape=box];
|
|
@@ -63,7 +77,8 @@ digraph process {
|
|
|
63
77
|
"Dispatch final code reviewer subagent for entire implementation" [shape=box];
|
|
64
78
|
"Use wz:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
|
|
65
79
|
|
|
66
|
-
"Read plan, extract all tasks with full text, note context, create TodoWrite" -> "
|
|
80
|
+
"Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Capture PRE_TASK_SHA";
|
|
81
|
+
"Capture PRE_TASK_SHA" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
|
67
82
|
"Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
|
|
68
83
|
"Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
|
|
69
84
|
"Answer questions, provide context" -> "Implementer subagent implements, tests, commits, self-reviews";
|
|
@@ -78,12 +93,32 @@ digraph process {
|
|
|
78
93
|
"Code quality reviewer subagent approves?" -> "Implementer subagent fixes quality issues" [label="no"];
|
|
79
94
|
"Implementer subagent fixes quality issues" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)";
|
|
80
95
|
"Mark task complete in TodoWrite" -> "More tasks remain?";
|
|
81
|
-
"More tasks remain?" -> "
|
|
96
|
+
"More tasks remain?" -> "Capture PRE_TASK_SHA" [label="yes"];
|
|
82
97
|
"More tasks remain?" -> "Dispatch final code reviewer subagent for entire implementation" [label="no"];
|
|
83
98
|
"Dispatch final code reviewer subagent for entire implementation" -> "Use wz:finishing-a-development-branch";
|
|
84
99
|
}
|
|
85
100
|
```
|
|
86
101
|
|
|
102
|
+
### Code Review Scoping
|
|
103
|
+
|
|
104
|
+
The implementer subagent commits before review. The spec reviewer and code quality reviewer must use `codex review --base <pre-task-sha>` to scope their review to the task's changes. Capture `PRE_TASK_SHA=$(git rev-parse HEAD)` before dispatching the implementer.
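For a controller written in Node, the capture-then-scope sequence might look like the sketch below. The function names and the injectable `runGit` parameter are hypothetical. Only `git rev-parse HEAD` and the `codex review --base <pre-task-sha>` argument shape come from the text above.

```javascript
const { execFileSync } = require('node:child_process');

// runGit is injectable so the sketch can be exercised without a real repository.
function capturePreTaskSha(
  runGit = (args) => execFileSync('git', args, { encoding: 'utf8' })
) {
  const sha = runGit(['rev-parse', 'HEAD']).trim();
  // Accept a 40-character (SHA-1) or 64-character (SHA-256) object name.
  if (!/^[0-9a-f]{40}([0-9a-f]{24})?$/.test(sha)) {
    throw new Error(`unexpected rev-parse output: ${sha}`);
  }
  return sha;
}

// Argument vector for `codex review --base <pre-task-sha>`.
function reviewArgs(preTaskSha) {
  return ['review', '--base', preTaskSha];
}

// Usage with a stubbed git runner (no repository required):
const fakeGit = () => 'a'.repeat(40) + '\n';
const preTaskSha = capturePreTaskSha(fakeGit);
console.log(reviewArgs(preTaskSha).join(' '));
```

Capturing the SHA before dispatch matters because the implementer commits during the task, so `HEAD` no longer marks the start of the task once review begins.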
|
|
105
|
+
|
|
106
|
+
### Review Loop Alignment
|
|
107
|
+
|
|
108
|
+
Both review stages follow the review loop pattern in `docs/reference/review-loop-pattern.md` with explicit `--mode task-review`:
|
|
109
|
+
- **Spec compliance review:** Uses spec dimensions with `--mode task-review`
|
|
110
|
+
- **Code quality review:** Uses 5 task-execution dimensions with `--mode task-review`
|
|
111
|
+
|
|
112
|
+
Review logs use task-scoped filenames: `execute-task-<NNN>-review-pass-<N>.md`
|
|
113
|
+
|
|
114
|
+
Each review stage respects the loop cap via `wazir capture loop-check --task-id <NNN>`. If the cap is reached (exit 43), escalate to the controller (you) for a decision.
|
|
115
|
+
|
|
116
|
+
### Codex Error Handling
|
|
117
|
+
|
|
118
|
+
If codex exits non-zero during review, log the error, mark the pass as codex-unavailable, and use self-review findings only. Do not treat a Codex failure as a clean pass.
|
|
119
|
+
|
|
120
|
+
**Standalone mode:** When no `.wazir/runs/latest/` exists, review logs go to `docs/plans/`.
|
|
121
|
+
|
|
87
122
|
## Prompt Templates
|
|
88
123
|
|
|
89
124
|
- `./implementer-prompt.md` - Dispatch implementer subagent
|
|
@@ -137,6 +172,7 @@ digraph process {
|
|
|
137
172
|
- Let implementer self-review replace actual review (both are needed)
|
|
138
173
|
- **Start code quality review before spec compliance is PASS** (wrong order)
|
|
139
174
|
- Move to next task while either review has open issues
|
|
175
|
+
- **Review the wrong diff -- always scope to the current task's changes using --base**
|
|
140
176
|
|
|
141
177
|
**If subagent asks questions:**
|
|
142
178
|
- Answer clearly and completely
|
|
@@ -17,6 +17,8 @@ Task tool (wz:code-reviewer):
|
|
|
17
17
|
DESCRIPTION: [task summary]
|
|
18
18
|
```
|
|
19
19
|
|
|
20
|
+
**Codebase Exploration:** Use wazir index search-symbols before direct file reads. Query `wazir index search-symbols <query>` to locate relevant code, then use `wazir recall file <path> --tier L1` for targeted reads.
|
|
21
|
+
|
|
20
22
|
**In addition to standard code quality concerns, the reviewer should check:**
|
|
21
23
|
- Does each file have one clear responsibility with a well-defined interface?
|
|
22
24
|
- Are units decomposed so they can be understood and tested independently?
|
|
@@ -26,6 +26,14 @@ Task tool (general-purpose):
|
|
|
26
26
|
|
|
27
27
|
**Ask them now.** Raise any concerns before starting work.
|
|
28
28
|
|
|
29
|
+
## Codebase Exploration
|
|
30
|
+
|
|
31
|
+
Use wazir index search-symbols before direct file reads.
|
|
32
|
+
1. Query `wazir index search-symbols <query>` to locate relevant code
|
|
33
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
34
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
35
|
+
4. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
36
|
+
|
|
29
37
|
## Your Job
|
|
30
38
|
|
|
31
39
|
Once you're clear on requirements:
|
|
@@ -34,6 +34,13 @@ Task tool (general-purpose):
|
|
|
34
34
|
- Check for missing pieces they claimed to implement
|
|
35
35
|
- Look for extra features they didn't mention
|
|
36
36
|
|
|
37
|
+
## Codebase Exploration
|
|
38
|
+
|
|
39
|
+
Use wazir index search-symbols before direct file reads.
|
|
40
|
+
1. Query `wazir index search-symbols <query>` to locate relevant code
|
|
41
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
42
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
43
|
+
|
|
37
44
|
## Your Job
|
|
38
45
|
|
|
39
46
|
Read the implementation code and verify:
|
package/skills/tdd/SKILL.md
CHANGED
|
@@ -5,11 +5,30 @@ description: Use for implementation work that changes behavior. Follow RED -> GR
|
|
|
5
5
|
|
|
6
6
|
# Test-Driven Development
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode unavailable, fall back to native Bash with warning
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
20
|
+
|
|
8
21
|
Sequence:
|
|
9
22
|
|
|
10
23
|
1. RED
|
|
11
24
|
Write or update a test that expresses the new behavior or the bug being fixed, then run it and confirm failure.
|
|
12
25
|
|
|
26
|
+
**Test quality check (single-pass):** Before proceeding to GREEN, verify:
|
|
27
|
+
- Are these tests testing the right behavior?
|
|
28
|
+
- Are they real assertions, not tautologies?
|
|
29
|
+
- Do they fail for the right reason (not a syntax error or import failure)?
|
|
30
|
+
If any check fails, fix the test before moving on. This is a single-pass quality check, not a full review loop.
|
|
31
|
+
|
|
13
32
|
2. GREEN
|
|
14
33
|
Write the smallest implementation change that makes the failing test pass.
|
|
15
34
|
|
|
@@ -21,3 +40,5 @@ Rules:
|
|
|
21
40
|
- do not skip the failing-test step when automated verification is feasible
|
|
22
41
|
- do not rewrite tests to fit broken behavior
|
|
23
42
|
- rerun verification after each meaningful refactor
|
|
43
|
+
|
|
44
|
+
For the full review loop pattern, see `docs/reference/review-loop-pattern.md`. TDD uses a single-pass quality check, not the full loop.
|
|
@@ -5,6 +5,19 @@ description: Use when starting feature work that needs isolation from current wo
|
|
|
5
5
|
|
|
6
6
|
# Using Git Worktrees
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode unavailable, fall back to native Bash with warning
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
20
|
+
|
|
8
21
|
## Overview
|
|
9
22
|
|
|
10
23
|
Git worktrees create isolated workspaces sharing the same repository, allowing work on multiple branches simultaneously without switching.
|
|
@@ -3,6 +3,19 @@ name: wz:using-skills
|
|
|
3
3
|
description: Use when starting any conversation — establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
+
## Command Routing
|
|
7
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
8
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
9
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
10
|
+
- If context-mode unavailable, fall back to native Bash with warning
|
|
11
|
+
|
|
12
|
+
## Codebase Exploration
|
|
13
|
+
1. Query `wazir index search-symbols <query>` first
|
|
14
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
15
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
16
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
17
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
18
|
+
|
|
6
19
|
<EXTREMELY_IMPORTANT>
|
|
7
20
|
If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill.
|
|
8
21
|
|
|
@@ -5,6 +5,19 @@ description: Use before claiming work is complete. Every completion claim needs
|
|
|
5
5
|
|
|
6
6
|
# Verification
|
|
7
7
|
|
|
8
|
+
## Command Routing
|
|
9
|
+
Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
|
|
10
|
+
- Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
|
|
11
|
+
- Small commands (git status, ls, pwd, wazir CLI) → native Bash
|
|
12
|
+
- If context-mode unavailable, fall back to native Bash with warning
|
|
13
|
+
|
|
14
|
+
## Codebase Exploration
|
|
15
|
+
1. Query `wazir index search-symbols <query>` first
|
|
16
|
+
2. Use `wazir recall file <path> --tier L1` for targeted reads
|
|
17
|
+
3. Fall back to direct file reads ONLY for files identified by index queries
|
|
18
|
+
4. Maximum 10 direct file reads without a justifying index query
|
|
19
|
+
5. If no index exists: `wazir index build && wazir index summarize --tier all`
|
|
20
|
+
|
|
8
21
|
Every completion claim must include:
|
|
9
22
|
|
|
10
23
|
- what was verified
|