@wazir-dev/cli 1.2.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (46)
  1. package/CHANGELOG.md +39 -44
  2. package/README.md +13 -13
  3. package/assets/demo.cast +47 -0
  4. package/assets/demo.gif +0 -0
  5. package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
  6. package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
  7. package/docs/concepts/architecture.md +1 -1
  8. package/docs/concepts/why-wazir.md +1 -1
  9. package/docs/readmes/INDEX.md +1 -1
  10. package/docs/readmes/features/expertise/README.md +1 -1
  11. package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
  12. package/docs/reference/hooks.md +1 -0
  13. package/docs/reference/launch-checklist.md +3 -3
  14. package/docs/reference/review-loop-pattern.md +3 -2
  15. package/docs/reference/skill-tiers.md +2 -2
  16. package/expertise/antipatterns/process/ai-coding-antipatterns.md +117 -0
  17. package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
  18. package/exports/hosts/claude/.claude/commands/verify.md +30 -1
  19. package/exports/hosts/claude/export.manifest.json +2 -2
  20. package/exports/hosts/codex/export.manifest.json +2 -2
  21. package/exports/hosts/cursor/export.manifest.json +2 -2
  22. package/exports/hosts/gemini/export.manifest.json +2 -2
  23. package/llms-full.txt +48 -18
  24. package/package.json +2 -3
  25. package/schemas/phase-report.schema.json +9 -0
  26. package/skills/brainstorming/SKILL.md +14 -2
  27. package/skills/clarifier/SKILL.md +189 -35
  28. package/skills/executor/SKILL.md +67 -0
  29. package/skills/init-pipeline/SKILL.md +0 -1
  30. package/skills/reviewer/SKILL.md +86 -13
  31. package/skills/self-audit/SKILL.md +20 -0
  32. package/skills/skill-research/SKILL.md +188 -0
  33. package/skills/verification/SKILL.md +41 -3
  34. package/skills/wazir/SKILL.md +304 -38
  35. package/tooling/src/capture/command.js +17 -1
  36. package/tooling/src/capture/store.js +32 -0
  37. package/tooling/src/capture/user-input.js +66 -0
  38. package/tooling/src/checks/security-sensitivity.js +69 -0
  39. package/tooling/src/cli.js +28 -26
  40. package/tooling/src/guards/phase-prerequisite-guard.js +58 -0
  41. package/tooling/src/init/auto-detect.js +0 -2
  42. package/tooling/src/init/command.js +3 -95
  43. package/tooling/src/status/command.js +6 -1
  44. package/tooling/src/verify/proof-collector.js +299 -0
  45. package/workflows/plan-review.md +3 -1
  46. package/workflows/verify.md +30 -1
@@ -0,0 +1,188 @@
+ ---
+ name: wz:skill-research
+ description: Deep competitive analysis of Wazir skills against the ecosystem. Research only — never auto-applies changes.
+ ---
+
+ # Skill Research — Overnight Competitive Analysis
+
+ Deeply analyzes Wazir skills against equivalent skills in other frameworks and produces comparison reports with ratings and recommendations. **Research only — never modifies skill files.**
+
+ ## Invocation
+
+ ```
+ /wazir audit skills --all                   # Analyze all skills
+ /wazir audit skills --skill tdd,debugging   # Analyze specific skills
+ /wazir audit skills --skill executor --deep # Deep analysis of one skill
+ ```
+
+ ## Command Routing
+ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
+ - Large commands (test runners, builds, diffs, dependency trees, linting) → context-mode tools
+ - Small commands (git status, ls, pwd, wazir CLI) → native Bash
+ - If context-mode is unavailable, fall back to native Bash with a warning
+
+ ## Isolation
+
+ This skill MUST run in an isolated git worktree:
+
+ 1. Create the worktree: `git worktree add .worktrees/skill-research-<date> -b skill-research-<date>`
+ 2. All report files are written inside the worktree
+ 3. Commits contain ONLY report files — never skill changes
+ 4. On completion, present the branch for user review
+
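The isolation flow can be exercised end-to-end as a shell session. This is a sketch only: the throwaway repo and the hard-coded date are illustrative, and a real run operates on the actual Wazir checkout with today's date.

```shell
set -euo pipefail

# Illustrative only: stand up a throwaway repo so the worktree flow is runnable.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=wazir -c user.email=wazir@example.com commit -q --allow-empty -m "init"

date_tag="2026-03-20"   # a real run substitutes today's date

# 1. Create the isolated worktree on its own branch
git worktree add ".worktrees/skill-research-$date_tag" -b "skill-research-$date_tag"

# 2. Write report files only inside the worktree
cd ".worktrees/skill-research-$date_tag"
mkdir -p "reports/skill-audit-$date_tag"
echo "# Skill Audit — $date_tag" > "reports/skill-audit-$date_tag/README.md"

# 3. The commit contains ONLY report files
git add reports
git -c user.name=wazir -c user.email=wazir@example.com commit -q -m "feat(reports): skill audit $date_tag"

# 4. Present the branch for review
git rev-parse --abbrev-ref HEAD
```

Because the branch never touches skill files, merging it later brings in nothing but reports.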
+ ## Per-Skill Research Process
+
+ For each skill being analyzed:
+
+ ### Step 1: Read the Wazir Skill
+
+ Read the full `SKILL.md` for the skill being analyzed. Extract:
+ - Purpose and trigger conditions
+ - Enforcement mechanisms (hard gates, checks, rules)
+ - Anti-rationalization coverage (how does it prevent agents from skipping steps?)
+ - Token cost estimate (how many tokens does this skill add to context?)
+
+ ### Step 2: Research Competitors
+
+ Fetch and analyze equivalent skills from:
+
+ 1. **superpowers** — the primary competitor. Fetch the equivalent skill from GitHub.
+ 2. **2-3 other frameworks** — depending on the skill type:
+    - For TDD: cursor-rules TDD patterns, aider commit conventions
+    - For debugging: rubber-duck debugging frameworks, systematic debugging methodologies
+    - For planning: software architecture patterns, agile story mapping tools
+    - For review: CodeRabbit, GitHub Copilot review, PR review best practices
+
+ Use `WebFetch` or context-mode `fetch_and_index` to retrieve competitor content.
+
+ ### Step 3: Side-by-Side Comparison
+
+ Produce a comparison table:
+
+ ```markdown
+ | Dimension | Wazir | superpowers | Competitor B | Competitor C |
+ |-----------|-------|-------------|--------------|--------------|
+ | Completeness | ... | ... | ... | ... |
+ | Enforcement | ... | ... | ... | ... |
+ | Token efficiency | ... | ... | ... | ... |
+ | Anti-rationalization | ... | ... | ... | ... |
+ ```
+
+ For each dimension, note:
+ - **Wazir strengths** — what Wazir does better
+ - **Wazir weaknesses** — what competitors do better
+ - **Gaps** — things competitors have that Wazir lacks entirely
+
+ ### Step 4: Rate
+
+ Rate each skill on 4 dimensions (1-5 scale):
+
+ 1. **Completeness** (1-5) — Does the skill cover all necessary cases? Are there gaps in the workflow?
+ 2. **Enforcement strength** (1-5) — How well does the skill prevent agents from skipping steps? Are there hard gates or just suggestions?
+ 3. **Token efficiency** (1-5) — How concise is the skill? Could it achieve the same enforcement with fewer tokens?
+ 4. **Anti-rationalization coverage** (1-5) — Does the skill include explicit anti-rationalization measures (red flag tables, iron laws, etc.)?
+
+ Each rating must include a 1-2 sentence justification.
+
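The four 1-5 scores roll up into a per-skill X/20 total and a cross-skill average for the summary README. A minimal sketch of that arithmetic (a hypothetical helper, not part of the package tooling):

```javascript
// Hypothetical helper, not shipped with Wazir: rolls the four 1-5
// dimension scores into the X/20 totals and the cross-skill average
// reported in the summary README.
function aggregateRatings(skills) {
  const rows = skills.map(({ name, scores }) => ({
    name,
    total:
      scores.completeness +
      scores.enforcement +
      scores.efficiency +
      scores.antiRationalization,
  }));
  const average = rows.reduce((sum, r) => sum + r.total, 0) / rows.length;
  return { rows, average };
}

// Using the example scores from the summary README template:
const { rows, average } = aggregateRatings([
  { name: "tdd", scores: { completeness: 4, enforcement: 5, efficiency: 3, antiRationalization: 4 } },
  { name: "debugging", scores: { completeness: 3, enforcement: 3, efficiency: 4, antiRationalization: 2 } },
]);
// rows[0].total === 16, rows[1].total === 12, average === 14
```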
+ ### Step 5: Recommend
+
+ For each skill, produce specific, actionable recommendations:
+
+ - What to add (with reasoning from competitor analysis)
+ - What to remove (token bloat without enforcement value)
+ - What to restructure (better organization for the same content)
+ - Priority: high / medium / low
+
+ **Recommendations are NEVER auto-applied.** They go in the report for human review.
+
+ ## Output Format
+
+ Reports are saved to `reports/skill-audit-<YYYY-MM-DD>/`:
+
+ ```
+ reports/skill-audit-2026-03-20/
+ ├── README.md              # Summary with aggregate ratings
+ ├── skill-tdd.md           # Per-skill report
+ ├── skill-debugging.md
+ ├── skill-executor.md
+ └── ...
+ ```
+
+ ### Per-Skill Report Template
+
+ ```markdown
+ # Skill Research: [skill name]
+
+ **Date:** YYYY-MM-DD
+ **Wazir version:** [commit hash]
+ **Competitors analyzed:** [list]
+
+ ## Current State
+ [Summary of what the Wazir skill does, its enforcement mechanisms, and token cost]
+
+ ## Competitor Analysis
+ [Side-by-side comparison table]
+
+ ## Ratings
+
+ | Dimension | Score | Justification |
+ |-----------|-------|---------------|
+ | Completeness | X/5 | ... |
+ | Enforcement | X/5 | ... |
+ | Token efficiency | X/5 | ... |
+ | Anti-rationalization | X/5 | ... |
+ | **Overall** | **X/20** | |
+
+ ## Strengths
+ [What Wazir does well]
+
+ ## Weaknesses
+ [What competitors do better]
+
+ ## Recommendations
+ | # | Priority | Recommendation | Reasoning |
+ |---|----------|----------------|-----------|
+ | 1 | high | ... | Based on [competitor] analysis |
+ | 2 | medium | ... | ... |
+
+ ## Sources
+ [URLs and references for all competitor content analyzed]
+ ```
+
+ ### Summary README Template
+
+ ```markdown
+ # Skill Audit — YYYY-MM-DD
+
+ **Skills analyzed:** N
+ **Average score:** X/20
+
+ | Skill | Completeness | Enforcement | Efficiency | Anti-rational | Total |
+ |-------|--------------|-------------|------------|---------------|-------|
+ | tdd | 4 | 5 | 3 | 4 | 16/20 |
+ | debugging | 3 | 3 | 4 | 2 | 12/20 |
+ | ... | | | | | |
+
+ ## Top Recommendations (cross-skill)
+ 1. ...
+ 2. ...
+ 3. ...
+ ```
+
+ ## Completion
+
+ After all skills are analyzed:
+
+ 1. Commit reports in the worktree: `feat(reports): skill audit YYYY-MM-DD`
+ 2. Present the branch name and summary to the user
+ 3. Do NOT merge — the user reviews and decides what to implement
+ 4. Do NOT modify any skill files — reports only
+
+ > **Skill research complete.**
+ >
+ > - Skills analyzed: [N]
+ > - Reports: `reports/skill-audit-<date>/` on branch `skill-research-<date>`
+ > - Average score: [X]/20
+ > - Top recommendations: [list top 3]
+ >
+ > **Next:** Review reports and decide which recommendations to implement.
@@ -18,18 +18,56 @@ Follow the Canonical Command Matrix in `hooks/routing-matrix.json`.
  4. Maximum 10 direct file reads without a justifying index query
  5. If no index exists: `wazir index build && wazir index summarize --tier all`

+ ## Proof of Implementation
+
+ 1. Detect the project type: `detectRunnableType(projectRoot)` → web | api | cli | library
+ 2. Collect evidence: `collectProof(projectRoot, opts)`
+ 3. Save the evidence to `.wazir/runs/<id>/artifacts/proof-<task>.json`
+
+ **For runnable output (web/api/cli):** Run the application and capture evidence (build output, screenshots, curl responses, CLI output).
+
+ **For non-runnable output (library/config/skills):** Run lint, format check, type check, and tests. All must pass.
+
+ Evidence collection uses `tooling/src/verify/proof-collector.js`.
+
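The saved artifact is a small JSON document. An illustrative example for a library-type task follows; only the `{ type, evidence: [{ check, ok, output }] }` shape comes from the docs, and the field values are placeholders:

```json
{
  "type": "library",
  "evidence": [
    { "check": "npm test", "ok": true, "output": "..." },
    { "check": "tsc --noEmit", "ok": true, "output": "" }
  ]
}
```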
+ ## Verification Requirements
+
  Every completion claim must include:

  - what was verified
  - the exact command or deterministic check
  - the actual result

- Minimum rule:
+ ## Proof Collection
+
+ Use `proof-collector` (`tooling/src/verify/proof-collector.js`) for automated evidence gathering:
+
+ 1. **`detectRunnableType(projectRoot)`** — detects whether the project is `web`, `api`, `cli`, or `library` from `package.json`. Detection order: `pkg.bin` (cli), web framework deps (web), API framework deps (api), default (library).
+
+ 2. **`collectProof(projectRoot, opts?)`** — runs type-appropriate verification commands and returns structured evidence:
+    - **web:** `npm run build` + library checks
+    - **api:** library checks (test, tsc, eslint, prettier)
+    - **cli:** `<bin> --help` + library checks
+    - **library:** `npm test`, `tsc --noEmit`, `eslint .`, `prettier --check .`
+
+ All commands use `execFileSync` (never shell `exec`) for security. Evidence is returned as `{ type, evidence: [{ check, ok, output }] }`.
+
+ ## Minimum Rules

  - no success claim without fresh evidence from the current change
+ - always use `proof-collector` for Node.js projects to gather deterministic evidence
+ - attach the evidence array to the verification proof artifact

  When verification fails:

  - do not mark the work complete
- - fix the issue or report the gap honestly
- - rerun verification after the fix
+ - report the gap honestly
+
+ Ask the user via AskUserQuestion:
+ - **Question:** "Verification failed for [specific criteria]. How should we proceed?"
+ - **Options:**
+   1. "Fix the issue and re-verify" *(Recommended)*
+   2. "Accept partial verification with documented gaps"
+   3. "Abort and review what went wrong"
+
+ Wait for the user's selection before continuing.