@wazir-dev/cli 1.2.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (46)
  1. package/CHANGELOG.md +39 -44
  2. package/README.md +13 -13
  3. package/assets/demo.cast +47 -0
  4. package/assets/demo.gif +0 -0
  5. package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
  6. package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
  7. package/docs/concepts/architecture.md +1 -1
  8. package/docs/concepts/why-wazir.md +1 -1
  9. package/docs/readmes/INDEX.md +1 -1
  10. package/docs/readmes/features/expertise/README.md +1 -1
  11. package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
  12. package/docs/reference/hooks.md +1 -0
  13. package/docs/reference/launch-checklist.md +3 -3
  14. package/docs/reference/review-loop-pattern.md +3 -2
  15. package/docs/reference/skill-tiers.md +2 -2
  16. package/expertise/antipatterns/process/ai-coding-antipatterns.md +117 -0
  17. package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
  18. package/exports/hosts/claude/.claude/commands/verify.md +30 -1
  19. package/exports/hosts/claude/export.manifest.json +2 -2
  20. package/exports/hosts/codex/export.manifest.json +2 -2
  21. package/exports/hosts/cursor/export.manifest.json +2 -2
  22. package/exports/hosts/gemini/export.manifest.json +2 -2
  23. package/llms-full.txt +48 -18
  24. package/package.json +2 -3
  25. package/schemas/phase-report.schema.json +9 -0
  26. package/skills/brainstorming/SKILL.md +14 -2
  27. package/skills/clarifier/SKILL.md +189 -35
  28. package/skills/executor/SKILL.md +67 -0
  29. package/skills/init-pipeline/SKILL.md +0 -1
  30. package/skills/reviewer/SKILL.md +86 -13
  31. package/skills/self-audit/SKILL.md +20 -0
  32. package/skills/skill-research/SKILL.md +188 -0
  33. package/skills/verification/SKILL.md +41 -3
  34. package/skills/wazir/SKILL.md +304 -38
  35. package/tooling/src/capture/command.js +17 -1
  36. package/tooling/src/capture/store.js +32 -0
  37. package/tooling/src/capture/user-input.js +66 -0
  38. package/tooling/src/checks/security-sensitivity.js +69 -0
  39. package/tooling/src/cli.js +28 -26
  40. package/tooling/src/guards/phase-prerequisite-guard.js +58 -0
  41. package/tooling/src/init/auto-detect.js +0 -2
  42. package/tooling/src/init/command.js +3 -95
  43. package/tooling/src/status/command.js +6 -1
  44. package/tooling/src/verify/proof-collector.js +299 -0
  45. package/workflows/plan-review.md +3 -1
  46. package/workflows/verify.md +30 -1
@@ -0,0 +1,188 @@
+ ---
+ name: wz:skill-research
+ description: Deep competitive analysis of Wazir skills against the ecosystem. Research only — never auto-applies changes.
+ ---
+
+ # Skill Research — Overnight Competitive Analysis
+
+ Deeply analyzes Wazir skills against equivalent skills in other frameworks and produces comparison reports with ratings and recommendations. **Research only — never modifies skill files.**
+
+ ## Invocation
+
+ ```
+ /wazir audit skills --all                   # Analyze all skills
+ /wazir audit skills --skill tdd,debugging   # Analyze specific skills
+ /wazir audit skills --skill executor --deep # Deep analysis of one skill
+ ```
+
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode is unavailable, fall back to native Bash with a warning
+
+ ## Isolation
+
+ This skill MUST run in an isolated git worktree:
+
+ 1. Create the worktree: `git worktree add .worktrees/skill-research-<date> -b skill-research-<date>`
+ 2. All report files are written inside the worktree
+ 3. Commits contain ONLY report files — never skill changes
+ 4. On completion, present the branch for user review
+
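The isolation flow can be exercised end-to-end as a shell session. This is a sketch only: the throwaway repo and the hard-coded date are illustrative, and a real run operates on the actual Wazir checkout with today's date.

```shell
set -euo pipefail

# Illustrative only: stand up a throwaway repo so the worktree flow is runnable.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=wazir -c user.email=wazir@example.com commit -q --allow-empty -m "init"

date_tag="2026-03-20"   # a real run substitutes today's date

# 1. Create the isolated worktree on its own branch
git worktree add ".worktrees/skill-research-$date_tag" -b "skill-research-$date_tag"

# 2. Write report files only inside the worktree
cd ".worktrees/skill-research-$date_tag"
mkdir -p "reports/skill-audit-$date_tag"
echo "# Skill Audit — $date_tag" > "reports/skill-audit-$date_tag/README.md"

# 3. The commit contains ONLY report files
git add reports
git -c user.name=wazir -c user.email=wazir@example.com commit -q -m "feat(reports): skill audit $date_tag"

# 4. Present the branch for review
git rev-parse --abbrev-ref HEAD
```

Because the branch never touches skill files, merging it later brings in nothing but reports.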
+ ## Per-Skill Research Process
+
+ For each skill being analyzed:
+
+ ### Step 1: Read the Wazir Skill
+
+ Read the full `SKILL.md` for the skill being analyzed. Extract:
+ - Purpose and trigger conditions
+ - Enforcement mechanisms (hard gates, checks, rules)
+ - Anti-rationalization coverage (how does it prevent agents from skipping steps?)
+ - Token cost estimate (how many tokens does this skill add to context?)
+
+ ### Step 2: Research Competitors
+
+ Fetch and analyze equivalent skills from:
+
+ 1. **superpowers** — the primary competitor. Fetch the equivalent skill from GitHub.
+ 2. **2-3 other frameworks** — depending on the skill type:
+    - For TDD: cursor-rules TDD patterns, aider commit conventions
+    - For debugging: rubber-duck debugging frameworks, systematic debugging methodologies
+    - For planning: software architecture patterns, agile story mapping tools
+    - For review: CodeRabbit, GitHub Copilot review, PR review best practices
+
+ Use `WebFetch` or context-mode `fetch_and_index` to retrieve competitor content.
+
+ ### Step 3: Side-by-Side Comparison
+
+ Produce a comparison table:
+
+ ```markdown
+ | Dimension | Wazir | superpowers | Competitor B | Competitor C |
+ |-----------|-------|-------------|--------------|--------------|
+ | Completeness | ... | ... | ... | ... |
+ | Enforcement | ... | ... | ... | ... |
+ | Token efficiency | ... | ... | ... | ... |
+ | Anti-rationalization | ... | ... | ... | ... |
+ ```
+
+ For each dimension, note:
+ - **Wazir strengths** — what Wazir does better
+ - **Wazir weaknesses** — what competitors do better
+ - **Gaps** — things competitors have that Wazir lacks entirely
+
+ ### Step 4: Rate
+
+ Rate each skill on 4 dimensions (1-5 scale):
+
+ 1. **Completeness** (1-5) — Does the skill cover all necessary cases? Are there gaps in the workflow?
+ 2. **Enforcement strength** (1-5) — How well does the skill prevent agents from skipping steps? Are there hard gates or just suggestions?
+ 3. **Token efficiency** (1-5) — How concise is the skill? Could it achieve the same enforcement with fewer tokens?
+ 4. **Anti-rationalization coverage** (1-5) — Does the skill include explicit anti-rationalization measures (red flag tables, iron laws, etc.)?
+
+ Each rating must include a 1-2 sentence justification.
+
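The four 1-5 scores roll up into a per-skill X/20 total and a cross-skill average for the summary README. A minimal sketch of that arithmetic (a hypothetical helper, not part of the package tooling):

```javascript
// Hypothetical helper, not shipped with Wazir: rolls the four 1-5
// dimension scores into the X/20 totals and the cross-skill average
// reported in the summary README.
function aggregateRatings(skills) {
  const rows = skills.map(({ name, scores }) => ({
    name,
    total:
      scores.completeness +
      scores.enforcement +
      scores.efficiency +
      scores.antiRationalization,
  }));
  const average = rows.reduce((sum, r) => sum + r.total, 0) / rows.length;
  return { rows, average };
}

// Using the example scores from the summary README template:
const { rows, average } = aggregateRatings([
  { name: "tdd", scores: { completeness: 4, enforcement: 5, efficiency: 3, antiRationalization: 4 } },
  { name: "debugging", scores: { completeness: 3, enforcement: 3, efficiency: 4, antiRationalization: 2 } },
]);
// rows[0].total === 16, rows[1].total === 12, average === 14
```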
+ ### Step 5: Recommend
+
+ For each skill, produce specific, actionable recommendations:
+
+ - What to add (with reasoning from competitor analysis)
+ - What to remove (token bloat without enforcement value)
+ - What to restructure (better organization for the same content)
+ - Priority: high / medium / low
+
+ **Recommendations are NEVER auto-applied.** They go in the report for human review.
+
+ ## Output Format
+
+ Reports are saved to `reports/skill-audit-<YYYY-MM-DD>/`:
+
+ ```
+ reports/skill-audit-2026-03-20/
+ ├── README.md              # Summary with aggregate ratings
+ ├── skill-tdd.md           # Per-skill report
+ ├── skill-debugging.md
+ ├── skill-executor.md
+ └── ...
+ ```
+
+ ### Per-Skill Report Template
+
+ ```markdown
+ # Skill Research: [skill name]
+
+ **Date:** YYYY-MM-DD
+ **Wazir version:** [commit hash]
+ **Competitors analyzed:** [list]
+
+ ## Current State
+ [Summary of what the Wazir skill does, its enforcement mechanisms, and token cost]
+
+ ## Competitor Analysis
+ [Side-by-side comparison table]
+
+ ## Ratings
+
+ | Dimension | Score | Justification |
+ |-----------|-------|---------------|
+ | Completeness | X/5 | ... |
+ | Enforcement | X/5 | ... |
+ | Token efficiency | X/5 | ... |
+ | Anti-rationalization | X/5 | ... |
+ | **Overall** | **X/20** | |
+
+ ## Strengths
+ [What Wazir does well]
+
+ ## Weaknesses
+ [What competitors do better]
+
+ ## Recommendations
+ | # | Priority | Recommendation | Reasoning |
+ |---|----------|----------------|-----------|
+ | 1 | high | ... | Based on [competitor] analysis |
+ | 2 | medium | ... | ... |
+
+ ## Sources
+ [URLs and references for all competitor content analyzed]
+ ```
+
+ ### Summary README Template
+
+ ```markdown
+ # Skill Audit — YYYY-MM-DD
+
+ **Skills analyzed:** N
+ **Average score:** X/20
+
+ | Skill | Completeness | Enforcement | Efficiency | Anti-rational | Total |
+ |-------|--------------|-------------|------------|---------------|-------|
+ | tdd | 4 | 5 | 3 | 4 | 16/20 |
+ | debugging | 3 | 3 | 4 | 2 | 12/20 |
+ | ... | | | | | |
+
+ ## Top Recommendations (cross-skill)
+ 1. ...
+ 2. ...
+ 3. ...
+ ```
+
+ ## Completion
+
+ After all skills are analyzed:
+
+ 1. Commit reports in the worktree: `feat(reports): skill audit YYYY-MM-DD`
+ 2. Present the branch name and summary to the user
+ 3. Do NOT merge — the user reviews and decides what to implement
+ 4. Do NOT modify any skill files — reports only
+
+ > **Skill research complete.**
+ >
+ > - Skills analyzed: [N]
+ > - Reports: `reports/skill-audit-<date>/` on branch `skill-research-<date>`
+ > - Average score: [X]/20
+ > - Top recommendations: [list top 3]
+ >
+ > **Next:** Review reports and decide which recommendations to implement.
@@ -18,18 +18,56 @@ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
  4. Maximum 10 direct file reads without a justifying index query
  5. If no index exists: `wazir index build && wazir index summarize --tier all`

+ ## Proof of Implementation
+
+ 1. Detect the project type: `detectRunnableType(projectRoot)` → web | api | cli | library
+ 2. Collect evidence: `collectProof(projectRoot, opts)`
+ 3. Save the evidence to `.wazir/runs/<id>/artifacts/proof-<task>.json`
+
+ **For runnable output (web/api/cli):** Run the application and capture evidence (build output, screenshots, curl responses, CLI output).
+
+ **For non-runnable output (library/config/skills):** Run lint, format check, type check, and tests. All must pass.
+
+ Evidence collection uses `tooling/src/verify/proof-collector.js`.
+
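The saved artifact is a small JSON document. An illustrative example for a library-type task follows; only the `{ type, evidence: [{ check, ok, output }] }` shape comes from the docs, and the field values are placeholders:

```json
{
  "type": "library",
  "evidence": [
    { "check": "npm test", "ok": true, "output": "..." },
    { "check": "tsc --noEmit", "ok": true, "output": "" }
  ]
}
```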
+ ## Verification Requirements
+
  Every completion claim must include:

  - what was verified
  - the exact command or deterministic check
  - the actual result

- Minimum rule:
+ ## Proof Collection
+
+ Use `proof-collector` (`tooling/src/verify/proof-collector.js`) for automated evidence gathering:
+
+ 1. **`detectRunnableType(projectRoot)`** — detects whether the project is `web`, `api`, `cli`, or `library` from `package.json`. Detection order: `pkg.bin` (cli), web framework deps (web), API framework deps (api), default (library).
+
+ 2. **`collectProof(projectRoot, opts?)`** — runs type-appropriate verification commands and returns structured evidence:
+    - **web:** `npm run build` + library checks
+    - **api:** library checks (test, tsc, eslint, prettier)
+    - **cli:** `<bin> --help` + library checks
+    - **library:** `npm test`, `tsc --noEmit`, `eslint .`, `prettier --check .`
+
+ All commands use `execFileSync` (never shell `exec`) for security. Evidence is returned as `{ type, evidence: [{ check, ok, output }] }`.
+
+ ## Minimum Rules

  - no success claim without fresh evidence from the current change
+ - always use `proof-collector` for Node.js projects to gather deterministic evidence
+ - attach the evidence array to the verification proof artifact

  When verification fails:

  - do not mark the work complete
- - fix the issue or report the gap honestly
- - rerun verification after the fix
+ - report the gap honestly
+
+ Ask the user via AskUserQuestion:
+ - **Question:** "Verification failed for [specific criteria]. How should we proceed?"
+ - **Options:**
+   1. "Fix the issue and re-verify" *(Recommended)*
+   2. "Accept partial verification with documented gaps"
+   3. "Abort and review what went wrong"
+
+ Wait for the user's selection before continuing.