@wazir-dev/cli 1.2.0 → 1.4.0

Files changed (161)
  1. package/CHANGELOG.md +54 -44
  2. package/README.md +13 -13
  3. package/assets/demo.cast +47 -0
  4. package/assets/demo.gif +0 -0
  5. package/docs/anti-patterns/AP-23-skipping-enabled-workflows.md +28 -0
  6. package/docs/anti-patterns/AP-24-clarifier-deciding-scope.md +34 -0
  7. package/docs/concepts/architecture.md +1 -1
  8. package/docs/concepts/why-wazir.md +1 -1
  9. package/docs/readmes/INDEX.md +1 -1
  10. package/docs/readmes/features/expertise/README.md +1 -1
  11. package/docs/readmes/features/hooks/pre-compact-summary.md +1 -1
  12. package/docs/reference/hooks.md +1 -0
  13. package/docs/reference/launch-checklist.md +3 -3
  14. package/docs/reference/review-loop-pattern.md +3 -2
  15. package/docs/reference/skill-tiers.md +2 -2
  16. package/docs/research/2026-03-20-agents/a18fb002157904af5.txt +187 -0
  17. package/docs/research/2026-03-20-agents/a1d0ac79ac2f11e6f.txt +2 -0
  18. package/docs/research/2026-03-20-agents/a324079de037abd7c.txt +198 -0
  19. package/docs/research/2026-03-20-agents/a357586bccfafb0e5.txt +256 -0
  20. package/docs/research/2026-03-20-agents/a4365394e4d753105.txt +137 -0
  21. package/docs/research/2026-03-20-agents/a492af28bc52d3613.txt +136 -0
  22. package/docs/research/2026-03-20-agents/a4984db0b6a8eee07.txt +124 -0
  23. package/docs/research/2026-03-20-agents/a5b30e59d34bbb062.txt +214 -0
  24. package/docs/research/2026-03-20-agents/a5cf7829dab911586.txt +165 -0
  25. package/docs/research/2026-03-20-agents/a607157c30dd97c9e.txt +96 -0
  26. package/docs/research/2026-03-20-agents/a60b68b1e19d1e16b.txt +115 -0
  27. package/docs/research/2026-03-20-agents/a722af01c5594aba0.txt +166 -0
  28. package/docs/research/2026-03-20-agents/a787bdc516faa5829.txt +181 -0
  29. package/docs/research/2026-03-20-agents/a7c46d1bba1056ed2.txt +132 -0
  30. package/docs/research/2026-03-20-agents/a7e5abbab2b281a0d.txt +100 -0
  31. package/docs/research/2026-03-20-agents/a8dbadc66cd0d7d5a.txt +95 -0
  32. package/docs/research/2026-03-20-agents/a904d9f45d6b86a6d.txt +75 -0
  33. package/docs/research/2026-03-20-agents/a927659a942ee7f60.txt +102 -0
  34. package/docs/research/2026-03-20-agents/a962cb569191f7583.txt +125 -0
  35. package/docs/research/2026-03-20-agents/aab6decea538aac41.txt +148 -0
  36. package/docs/research/2026-03-20-agents/abd58b853dd938a1b.txt +295 -0
  37. package/docs/research/2026-03-20-agents/ac009da573eff7f65.txt +100 -0
  38. package/docs/research/2026-03-20-agents/ac1bc783364405e5f.txt +190 -0
  39. package/docs/research/2026-03-20-agents/aca5e2b57fde152a0.txt +132 -0
  40. package/docs/research/2026-03-20-agents/ad849b8c0a7e95b8b.txt +176 -0
  41. package/docs/research/2026-03-20-agents/adc2b12a4da32c962.txt +258 -0
  42. package/docs/research/2026-03-20-agents/af97caaaa9a80e4cb.txt +146 -0
  43. package/docs/research/2026-03-20-agents/afc5faceee368b3ca.txt +111 -0
  44. package/docs/research/2026-03-20-agents/afdb282d866e3c1e4.txt +164 -0
  45. package/docs/research/2026-03-20-agents/afe9d1f61c02b1e8d.txt +299 -0
  46. package/docs/research/2026-03-20-agents/b4hmkwril.txt +1856 -0
  47. package/docs/research/2026-03-20-agents/b80ptk89g.txt +1856 -0
  48. package/docs/research/2026-03-20-agents/bf54s1jss.txt +1150 -0
  49. package/docs/research/2026-03-20-agents/bhd6kq2kx.txt +1856 -0
  50. package/docs/research/2026-03-20-agents/bmb2fodyr.txt +988 -0
  51. package/docs/research/2026-03-20-agents/bmmsrij8i.txt +826 -0
  52. package/docs/research/2026-03-20-agents/bn4t2ywpu.txt +2175 -0
  53. package/docs/research/2026-03-20-agents/bu22t9f1z.txt +0 -0
  54. package/docs/research/2026-03-20-agents/bwvl98v2p.txt +738 -0
  55. package/docs/research/2026-03-20-agents/psych-a3697a7fd06eb64fd.txt +135 -0
  56. package/docs/research/2026-03-20-agents/psych-a37776fabc870feae.txt +123 -0
  57. package/docs/research/2026-03-20-agents/psych-a5b1fe05c0589efaf.txt +2 -0
  58. package/docs/research/2026-03-20-agents/psych-a95c15b1f29424435.txt +76 -0
  59. package/docs/research/2026-03-20-agents/psych-a9c26f4d9172dde7c.txt +2 -0
  60. package/docs/research/2026-03-20-agents/psych-aa19c69f0ca2c5ad3.txt +2 -0
  61. package/docs/research/2026-03-20-agents/psych-aa4e4cb70e1be5ecb.txt +95 -0
  62. package/docs/research/2026-03-20-agents/psych-ab5b302f26a554663.txt +102 -0
  63. package/docs/research/2026-03-20-deep-research-complete.md +101 -0
  64. package/docs/research/2026-03-20-deep-research-status.md +38 -0
  65. package/docs/research/2026-03-20-enforcement-research.md +107 -0
  66. package/expertise/antipatterns/process/ai-coding-antipatterns.md +117 -0
  67. package/expertise/composition-map.yaml +27 -8
  68. package/expertise/digests/reviewer/ai-coding-digest.md +83 -0
  69. package/expertise/digests/reviewer/architectural-thinking-digest.md +63 -0
  70. package/expertise/digests/reviewer/architecture-antipatterns-digest.md +49 -0
  71. package/expertise/digests/reviewer/code-smells-digest.md +53 -0
  72. package/expertise/digests/reviewer/coupling-cohesion-digest.md +54 -0
  73. package/expertise/digests/reviewer/ddd-digest.md +60 -0
  74. package/expertise/digests/reviewer/dependency-risk-digest.md +40 -0
  75. package/expertise/digests/reviewer/error-handling-digest.md +55 -0
  76. package/expertise/digests/reviewer/review-methodology-digest.md +49 -0
  77. package/exports/hosts/claude/.claude/commands/learn.md +61 -8
  78. package/exports/hosts/claude/.claude/commands/plan-review.md +3 -1
  79. package/exports/hosts/claude/.claude/commands/verify.md +30 -1
  80. package/exports/hosts/claude/.claude/settings.json +7 -6
  81. package/exports/hosts/claude/export.manifest.json +8 -5
  82. package/exports/hosts/claude/host-package.json +3 -0
  83. package/exports/hosts/codex/export.manifest.json +8 -5
  84. package/exports/hosts/codex/host-package.json +3 -0
  85. package/exports/hosts/cursor/.cursor/hooks.json +6 -6
  86. package/exports/hosts/cursor/export.manifest.json +8 -5
  87. package/exports/hosts/cursor/host-package.json +3 -0
  88. package/exports/hosts/gemini/export.manifest.json +8 -5
  89. package/exports/hosts/gemini/host-package.json +3 -0
  90. package/hooks/definitions/pretooluse_dispatcher.yaml +26 -0
  91. package/hooks/definitions/pretooluse_pipeline_guard.yaml +22 -0
  92. package/hooks/definitions/stop_pipeline_gate.yaml +22 -0
  93. package/hooks/hooks.json +7 -6
  94. package/hooks/pretooluse-dispatcher +84 -0
  95. package/hooks/pretooluse-pipeline-guard +9 -0
  96. package/hooks/stop-pipeline-gate +9 -0
  97. package/llms-full.txt +48 -18
  98. package/package.json +2 -3
  99. package/schemas/decision.schema.json +15 -0
  100. package/schemas/hook.schema.json +4 -1
  101. package/schemas/phase-report.schema.json +9 -0
  102. package/skills/TEMPLATE-3-ZONE.md +160 -0
  103. package/skills/brainstorming/SKILL.md +137 -21
  104. package/skills/clarifier/SKILL.md +364 -53
  105. package/skills/claude-cli/SKILL.md +91 -12
  106. package/skills/codex-cli/SKILL.md +91 -12
  107. package/skills/debugging/SKILL.md +133 -38
  108. package/skills/design/SKILL.md +173 -37
  109. package/skills/dispatching-parallel-agents/SKILL.md +129 -31
  110. package/skills/executing-plans/SKILL.md +113 -25
  111. package/skills/executor/SKILL.md +252 -21
  112. package/skills/finishing-a-development-branch/SKILL.md +107 -18
  113. package/skills/gemini-cli/SKILL.md +91 -12
  114. package/skills/humanize/SKILL.md +92 -13
  115. package/skills/init-pipeline/SKILL.md +90 -18
  116. package/skills/prepare-next/SKILL.md +93 -24
  117. package/skills/receiving-code-review/SKILL.md +90 -16
  118. package/skills/requesting-code-review/SKILL.md +100 -24
  119. package/skills/requesting-code-review/code-reviewer.md +29 -17
  120. package/skills/reviewer/SKILL.md +270 -57
  121. package/skills/run-audit/SKILL.md +92 -15
  122. package/skills/scan-project/SKILL.md +93 -14
  123. package/skills/self-audit/SKILL.md +133 -39
  124. package/skills/skill-research/SKILL.md +275 -0
  125. package/skills/subagent-driven-development/SKILL.md +129 -30
  126. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +30 -2
  127. package/skills/subagent-driven-development/implementer-prompt.md +40 -27
  128. package/skills/subagent-driven-development/spec-reviewer-prompt.md +25 -12
  129. package/skills/tdd/SKILL.md +125 -20
  130. package/skills/using-git-worktrees/SKILL.md +118 -28
  131. package/skills/using-skills/SKILL.md +116 -29
  132. package/skills/verification/SKILL.md +160 -17
  133. package/skills/wazir/SKILL.md +750 -120
  134. package/skills/writing-plans/SKILL.md +134 -28
  135. package/skills/writing-skills/SKILL.md +91 -13
  136. package/skills/writing-skills/anthropic-best-practices.md +104 -64
  137. package/skills/writing-skills/persuasion-principles.md +100 -34
  138. package/tooling/src/capture/command.js +46 -2
  139. package/tooling/src/capture/decision.js +40 -0
  140. package/tooling/src/capture/store.js +33 -0
  141. package/tooling/src/capture/user-input.js +66 -0
  142. package/tooling/src/checks/security-sensitivity.js +69 -0
  143. package/tooling/src/cli.js +28 -26
  144. package/tooling/src/config/depth-table.js +60 -0
  145. package/tooling/src/export/compiler.js +7 -8
  146. package/tooling/src/guards/guardrail-functions.js +131 -0
  147. package/tooling/src/guards/phase-prerequisite-guard.js +97 -3
  148. package/tooling/src/hooks/pretooluse-dispatcher.js +300 -0
  149. package/tooling/src/hooks/pretooluse-pipeline-guard.js +141 -0
  150. package/tooling/src/hooks/stop-pipeline-gate.js +92 -0
  151. package/tooling/src/init/auto-detect.js +0 -2
  152. package/tooling/src/init/command.js +3 -95
  153. package/tooling/src/learn/pipeline.js +177 -0
  154. package/tooling/src/state/db.js +251 -2
  155. package/tooling/src/state/pipeline-state.js +262 -0
  156. package/tooling/src/status/command.js +6 -1
  157. package/tooling/src/verify/proof-collector.js +299 -0
  158. package/wazir.manifest.yaml +3 -0
  159. package/workflows/learn.md +61 -8
  160. package/workflows/plan-review.md +3 -1
  161. package/workflows/verify.md +30 -1
@@ -0,0 +1,101 @@
1
+ # Deep Research Complete — 2026-03-20
2
+
3
+ ## 25 Research Agents — ALL COMPLETED
4
+
5
+ Total output: ~6.8MB across 25 agents. Full transcripts at:
6
+ `/private/tmp/claude-501/-Users-mohamedabdallah-Work-Wazir/96398c18-9868-43bc-a4d6-d7f388880d4a/tasks/`
7
+
8
+ ## Executive Summary
9
+
10
+ ### The Architecture for Wazir v2
11
+
12
+ **Three-layer enforcement pyramid:**
13
+ 1. **Hooks** (mechanical, can't bypass): Stop blocks completion, PreToolUse blocks writes/commits/pushes
14
+ 2. **Subagent isolation** (architectural, can't see full pipeline): one agent per phase, controller holds the loop
15
+ 3. **Persuasion engineering** (behavioral, won't bypass): superpowers-style rationalization tables, red flags, authority language
16
+
17
+ ### Key Findings
18
+
19
+ **Hooks:**
20
+ - Stop hook CAN block completion (`{"decision": "block"}`) — proven by ralph-loop (492+ iterations)
21
+ - PreToolUse has 7 decision patterns: silent allow, advisory, systemMessage, modify, JSON deny, exit-code deny, echo-trick redirect
22
+ - State tracking via `pipeline-state.json` — hooks read, CLI writes, atomic temp+rename
23
+ - Critical limitations: hooks can block but not compel; "hook error" labels poison the model; SubagentStop is broken; agent can escape via AskUserQuestion
24
+ - Must check `stop_hook_active` to prevent infinite loops; must allow context-limit and user-abort stops
25
+
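A minimal sketch of the Stop-hook gate described in the bullets above, written for illustration and not taken from the package's `stop-pipeline-gate.js`. The `stop_hook_active` field and the `{"decision": "block"}` response come from the text; the `phases`, `enabled`, and `completed` fields of the state object are hypothetical, since the real `pipeline-state.json` schema is not shown in this diff.

```javascript
// Hypothetical Stop-hook decision function (illustrative only).
// `input` is the JSON the hook receives; `state` is the parsed pipeline-state.json.
// The state fields used here (phases, enabled, completed) are assumptions.
function decideStop(input, state) {
  // Prevent infinite loops: a stop that was already forced by a Stop hook passes through.
  if (input.stop_hook_active) return {};

  // No readable state means no pipeline is running, so there is nothing to gate.
  if (!state || !Array.isArray(state.phases)) return {};

  // Collect every enabled phase that has not finished yet.
  const incomplete = state.phases
    .filter((phase) => phase.enabled && !phase.completed)
    .map((phase) => phase.name);

  if (incomplete.length === 0) return {};

  return {
    decision: "block",
    reason: `Pipeline incomplete. Remaining phases: ${incomplete.join(", ")}`,
  };
}

// Example: one enabled phase is still open, so completion is blocked.
const result = decideStop(
  { stop_hook_active: false },
  {
    phases: [
      { name: "clarify", enabled: true, completed: true },
      { name: "review", enabled: true, completed: false },
      { name: "design", enabled: false, completed: false },
    ],
  }
);
console.log(JSON.stringify(result));
```

The sketch leaves out the context-limit and user-abort cases mentioned above, because the text does not say how a hook would detect them.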
26
+ **Subagent Architecture:**
27
+ - Controller-as-orchestrator: wz:wazir holds the loop, dispatches one subagent per phase
28
+ - Each subagent gets fresh 200K context (~165K usable after overhead)
29
+ - No nesting (depth=1) — controller dispatches ALL subagents directly
30
+ - File-mediated handoff (MetaGPT pattern): artifacts on disk, not in context
31
+ - Artifact dependency: each artifact has `requires` block with predecessor digest for staleness detection
32
+ - Guardrail functions per phase boundary with concrete pass/fail criteria
33
+ - Retry ladder: same-model×2 → model-escalation×1 → human escalation
34
+ - Error classification: transient (retry), quality (retry+feedback), deterministic (escalate), resource (model-escalate)
35
+
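The retry ladder and error classes listed above could be combined roughly as follows. This is a sketch under assumptions: the function name, the shape of the `used` counter, and the reading of "escalate" for deterministic errors as "go straight to a human" are all hypothetical and not confirmed by the diff.

```javascript
// Hypothetical retry-ladder sketch: same-model x2, then model-escalation x1, then human.
const LADDER = { sameModel: 2, escalated: 1 };

// `used` counts retries already spent on each rung, e.g. { sameModel: 1, escalated: 0 }.
function nextAction(errorClass, used) {
  const canRetrySame = used.sameModel < LADDER.sameModel;
  const canEscalate = used.escalated < LADDER.escalated;

  switch (errorClass) {
    case "transient":
      // Transient failures are retried as-is on the same model.
      if (canRetrySame) return { action: "retry", model: "same", feedback: false };
      break;
    case "quality":
      // Quality failures are retried with the reviewer's feedback attached.
      if (canRetrySame) return { action: "retry", model: "same", feedback: true };
      break;
    case "resource":
      // Resource failures skip the same-model rung (assumed reading of "model-escalate").
      break;
    case "deterministic":
      // Retrying cannot change the outcome, so hand over to a human immediately (assumption).
      return { action: "escalate", to: "human" };
    default:
      throw new Error(`unknown error class: ${errorClass}`);
  }

  // Same-model rung exhausted or skipped: try the stronger model once, then give up.
  if (canEscalate) {
    return { action: "retry", model: "escalated", feedback: errorClass === "quality" };
  }
  return { action: "escalate", to: "human" };
}

console.log(nextAction("quality", { sameModel: 2, escalated: 0 }));
```

Keeping the rung limits in one `LADDER` object means the ladder can be tuned without touching the classification logic.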
36
+ **Persuasion Engineering:**
37
+ - Superpowers is 100% prompt engineering, zero mechanical enforcement — and agents STILL skip (issue #463)
38
+ - Meincke et al. 2025: persuasion doubles compliance (33%→72%, N=28,000, p<.001)
39
+ - Best combination: Authority + Commitment + Scarcity
40
+ - CSO critical: skill descriptions must be triggers only, never process summaries
41
+ - 47 rationalization entries across 5 superpowers skills — Wazir has ZERO
42
+ - "Violating the letter is violating the spirit" — single most impactful sentence
43
+ - TDD for skills: RED (observe baseline failures) → GREEN (write skill addressing those) → REFACTOR (close new loopholes)
44
+
45
+ **Learning System:**
46
+ - 4-stage pipeline: Tally → Candidate → Promote → Active
47
+ - Findings classified by 8 categories × 4 severity levels
48
+ - Recurrence detection via finding_hash dedup (PagerDuty pattern)
49
+ - Semi-automatic promotion: auto-propose, human-approve (CodeGuru + Snyk model)
50
+ - Drift prevention: 30 active project learnings max, 90-day TTL, 5% hit-rate demotion, principle consolidation at 25+ entries
51
+ - Decision audit trail: v2 schema with category, alternatives, confidence, outcome_ref, supersedes
52
+ - User feedback: capture corrections/approvals in ndjson, classify signal vs noise
53
+
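Recurrence detection via `finding_hash` might look like the sketch below. The hash inputs (category plus a whitespace- and case-normalized description) and the stage names are assumptions made for illustration; the threshold of 3 follows the "3+ occurrence threshold" in the agent index of this file, and human approval is deliberately left out.

```javascript
const { createHash } = require("node:crypto");

// Assumption: a finding is identified by its category and a normalized description.
function findingHash(finding) {
  const normalized = finding.description.toLowerCase().replace(/\s+/g, " ").trim();
  return createHash("sha256").update(`${finding.category}\n${normalized}`).digest("hex");
}

const PROMOTION_THRESHOLD = 3; // occurrences before a finding becomes a candidate

// Groups findings by hash and marks those that recur often enough as candidates.
function tally(findings) {
  const byHash = new Map();
  for (const finding of findings) {
    const hash = findingHash(finding);
    const entry = byHash.get(hash) ?? {
      hash,
      category: finding.category,
      description: finding.description,
      occurrences: 0,
    };
    entry.occurrences += 1;
    byHash.set(hash, entry);
  }
  return [...byHash.values()].map((entry) => ({
    ...entry,
    stage: entry.occurrences >= PROMOTION_THRESHOLD ? "candidate" : "tally",
  }));
}

// Three differently formatted copies of one finding collapse into a single entry.
const tallied = tally([
  { category: "wiring", description: "Hook not registered in hooks.json" },
  { category: "wiring", description: "hook  not registered in hooks.json " },
  { category: "wiring", description: "HOOK NOT REGISTERED IN HOOKS.JSON" },
  { category: "style", description: "Inconsistent heading case" },
]);
console.log(tallied.map((entry) => `${entry.stage}: ${entry.description} (${entry.occurrences})`));
```

A candidate would still need the human-approve step before it is promoted.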
54
+ **Review Architecture:**
55
+ - Two-tier: internal (Sonnet, expertise-loaded, pattern-matching) → external (Codex, fresh eyes, unknown-unknowns)
56
+ - Critical finding: reviewer always-layer is 99K tokens against 50K ceiling — 5 of 8 modules are dropped
57
+ - Fix: mode-specific reviewer composition (different modules per review mode)
58
+ - Reviewer digest modules: 3-5K tokens each (not 12K originals)
59
+ - Findings classified by 8 categories (correctness, security, completeness, wiring, verification, drift, performance, style)
60
+ - Auto-classification rules per category with severity floors
61
+ - Feedback-to-learning: 7-step loop, LLM-assisted clustering for pattern detection
62
+
63
+ **Interactive UX:**
64
+ - AskUserQuestion: 1-4 questions, 2-4 options each, arrow-key selection, multiSelect supported
65
+ - Bug: DO NOT list in skill's allowed-tools (causes empty answers)
66
+ - Progressive disclosure: status line (what) → paragraph (why) → full report (everything)
67
+ - Key formula: "Name the action. State the dependency. Omit the journey."
68
+ - 5 progress patterns: phase map, meaningful updates, artifact previews, time estimates, heartbeat
69
+ - Heartbeat: never >2min silence (standard), >90s (deep), >3min (quick)
70
+ - Steerability: classify mutation level → show impact → selective regeneration → preserve completed work
71
+ - Three modes: auto (gating agent steers) / guided (checkpoints steer) / interactive (continuous steer)
72
+
73
+ ## Agent Output Index
74
+
75
+ | # | Agent | Key Deliverable |
76
+ |---|-------|----------------|
77
+ | 1 | Stop hook patterns | Complete blueprint for pipeline-gate Stop hook with 10 edge cases |
78
+ | 2 | PreToolUse catalog | 7 decision patterns with code examples from 4 real plugins |
79
+ | 3 | State machine design | pipeline-state.json schema with 30+ fields, update rules, session isolation |
80
+ | 4 | Hook limitations | 13 limitations with workarounds, including "hook error" label poisoning |
81
+ | 5 | Persuasion playbook | 10 patterns, 47 rationalization entries, CSO rules, implementation checklist |
82
+ | 6 | Controller pattern | Hybrid architecture: flat orchestration with file-mediated handoff |
83
+ | 7 | Artifact dependencies | Per-phase schemas with requires/digest, write-time validation |
84
+ | 8 | Context isolation | 200K per subagent, no nesting, MCP tool caveats, MetaGPT pub-sub |
85
+ | 9 | Guardrail validation | 6 guardrail functions with concrete pass/fail criteria per phase |
86
+ | 10 | Failure + retry | 3-tier ladder (same-model→escalate→human), error classification |
87
+ | 11 | AskUserQuestion API | Full schema, 2-4 options, multiSelect, known bugs, plugin examples |
88
+ | 12 | Showing reasoning | Progressive disclosure templates at 3 levels with anti-patterns |
89
+ | 13 | Depth parameters | (in bladnman analysis) 4 depth levels with per-parameter tables |
90
+ | 14 | Steerability | Mutation classification, impact assessment, selective regeneration |
91
+ | 15 | Progress reporting | 5 patterns (phase map, finding updates, previews, time estimates, heartbeat) |
92
+ | 16 | Findings → antipatterns | 4-stage promotion pipeline, 3+ occurrence threshold, human gate |
93
+ | 17 | Cumulative tracking | SQLite schema (5 tables), dedup algorithm, recurrence detection |
94
+ | 18 | Drift prevention | 7 mechanisms with concrete limits (30 active, 90-day TTL, 5% demotion) |
95
+ | 19 | Decision audit trail | v2 schema with alternatives, confidence, outcome correlation |
96
+ | 20 | User feedback capture | Signal classification, correction weighting, ndjson format |
97
+ | 21 | Two-tier review | Internal→external, critical asymmetry (known vs unknown unknowns) |
98
+ | 22 | Reviewer composition | Mode-specific modules, 3-5K digests, 50K budget analysis |
99
+ | 23 | Findings classification | 8 categories × 4 severities, auto-classification rules |
100
+ | 24 | Feedback-to-learning | 7-step loop, LLM clustering, minimum viable phases A-D |
101
+ | 25 | Proof-of-implementation | Per-type matrix (web/API/CLI/library), Playwright MCP, Symphony model |
@@ -0,0 +1,38 @@
1
+ # Deep Research Status — 2026-03-20
2
+
3
+ ## 25 Research Agents — Progress
4
+
5
+ ### Completed (14/25)
6
+ 1. Hook: Stop hook patterns (ralph-loop analysis) ✅
7
+ 2. Hook: PreToolUse catalog (7 decision patterns) ✅
8
+ 3. Hook: State machine design (pipeline-state.json) ✅
9
+ 4. Subagent: Artifact dependencies (per-phase schemas) ✅
10
+ 5. Subagent: Guardrail validation (per-phase functions) ✅
11
+ 6. Subagent: Failure + retry (3-tier ladder) ✅
12
+ 7. Interactive: Showing reasoning (progressive disclosure) ✅
13
+ 8. Learning: Findings → antipatterns (4-stage pipeline) ✅
14
+ 9. Learning: Cumulative tracking (SQLite schema) ✅
15
+ 10. Learning: Drift prevention (7 mechanisms) ✅
16
+ 11. Learning: User feedback capture ✅
17
+ 12. Review: Feedback-to-learning pipeline (7-step loop) ✅
18
+ 13. Review: Proof-of-implementation (per-type matrix) ✅
19
+ 14. Hook: Persuasion engineering (superpowers analysis) — in first batch ✅
20
+
21
+ ### Pending (11/25)
22
+ 15. Hook: Limitations + workarounds
23
+ 16. Subagent: Controller pattern
24
+ 17. Subagent: Context isolation
25
+ 18. Interactive: AskUserQuestion API
26
+ 19. Interactive: Depth parameters
27
+ 20. Interactive: Steerability
28
+ 21. Interactive: Progress reporting
29
+ 22. Review: Two-tier architecture
30
+ 23. Review: Reviewer composition
31
+ 24. Review: Findings classification
32
+ 25. Learning: Decision audit trail
33
+
34
+ ## Key Findings So Far
35
+
36
+ All research output files at: /private/tmp/claude-501/-Users-mohamedabdallah-Work-Wazir/96398c18-9868-43bc-a4d6-d7f388880d4a/tasks/
37
+
38
+ Full synthesis will be compiled when all 25 agents complete.
@@ -0,0 +1,107 @@
1
+ # Enforcement Research — 2026-03-20
2
+
3
+ ## The Answer
4
+
5
+ **Prose instructions don't work. The agent will always rationalize skipping them.** Every framework that achieves reliable enforcement uses the same pattern: **the framework holds the loop, not the agent.**
6
+
7
+ ## The Three-Layer Strategy
8
+
9
+ ### Layer 1: Mechanical Hooks (agent CANNOT bypass)
10
+
11
+ **Stop hook** blocks completion: `{"decision": "block", "reason": "..."}` — proven by ralph-loop plugin (official marketplace). The agent literally cannot stop until all artifacts exist.
12
+
13
+ **PreToolUse hooks** block actions:
14
+ - `PreToolUse:Write|Edit` — blocks implementation code if no plan artifact exists
15
+ - `PreToolUse:Bash` — blocks `git commit` if no tests run, blocks `git push` if no review
16
+ - Returns `permissionDecision: "deny"` — the tool call is prevented entirely
17
+
18
+ **State tracking** via `pipeline-state.json` — hooks READ state, CLI WRITES state. No race conditions.
19
+
20
+ **Key: command hooks only, never prompt hooks.** Prompt hooks re-introduce the rationalization problem.
21
+
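The PreToolUse rules in Layer 1 can be sketched as a pure decision function. This is illustrative and not the package's `pretooluse-pipeline-guard.js`. The state fields `planApproved`, `testsRun`, and `reviewPassed` are hypothetical, and the `hookSpecificOutput` wrapper around `permissionDecision` reflects my understanding of the Claude Code hook output format, so treat it as an assumption to check against the hooks reference.

```javascript
// Hypothetical PreToolUse guard (illustrative only).
// `input` carries the tool name and arguments; `state` is the parsed pipeline-state.json.
function guardToolUse(input, state) {
  const deny = (reason) => ({
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: reason,
    },
  });

  const tool = input.tool_name;

  // Write|Edit: no implementation code before a plan artifact exists.
  // A real guard would need to exempt the artifact paths themselves.
  if ((tool === "Write" || tool === "Edit") && !state.planApproved) {
    return deny("No plan artifact exists yet.");
  }

  if (tool === "Bash") {
    const command = (input.tool_input && input.tool_input.command) || "";
    if (/\bgit\s+commit\b/.test(command) && !state.testsRun) {
      return deny("Tests have not been run; git commit is blocked.");
    }
    if (/\bgit\s+push\b/.test(command) && !state.reviewPassed) {
      return deny("Review has not passed; git push is blocked.");
    }
  }

  return null; // silent allow: the hook prints nothing
}

const decision = guardToolUse(
  { tool_name: "Bash", tool_input: { command: "git commit -m 'wip'" } },
  { planApproved: true, testsRun: false, reviewPassed: false }
);
console.log(JSON.stringify(decision));
```

Because the function only reads `state` and never writes it, it matches the "hooks READ state, CLI WRITES state" split described above.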
22
+ ### Layer 2: Subagent Isolation (agent CANNOT see full pipeline)
23
+
24
+ From every framework (CrewAI, LangGraph, Symphony, ideation_team_skill): **give the agent a task, not a plan.**
25
+
26
+ - Each phase is a separate subagent invocation
27
+ - Phase N+1 receives phase N's artifact as input — if it doesn't exist, the call fails
28
+ - The controller (wazir skill) holds the loop and decides what runs next
29
+ - No single agent can rationalize skipping from research to code
30
+
31
+ ### Layer 3: Persuasion Engineering (agent WON'T bypass — 72% compliance)
32
+
33
+ From superpowers (100K stars, backed by Meincke et al. 2025, N=28,000):
34
+
35
+ - **Rationalization tables** — enumerate exact thoughts the agent has when skipping, with rebuttals
36
+ - **"Violating the letter is violating the spirit"** — kills the #1 escape pattern
37
+ - **Red flags lists** — specific phrases that mean STOP
38
+ - **Authority + Commitment + Social Proof** — doubles compliance (33% → 72%)
39
+ - **CSO (Claude Search Optimization)** — skill descriptions must be triggers, never process summaries
40
+
41
+ ## Key Findings Per Source
42
+
43
+ ### Claude Code Hooks
44
+ - Stop hook CAN block (`{"decision": "block"}`) — proven by ralph-loop
45
+ - PreToolUse CAN deny AND modify tool calls — proven by context-mode plugin
46
+ - Hooks are stateless but can read/write files for state
47
+ - Hooks loaded at session start, can't be added mid-session
48
+ - **Limitation: hooks block actions but can't compel them**
49
+
50
+ ### Superpowers (100K stars)
51
+ - 100% prompt engineering, zero mechanical enforcement
52
+ - Single SessionStart hook injects meta-skill in `<EXTREMELY_IMPORTANT>` tags
53
+ - **Issue #463: agents STILL skip reviews** — the author knows it's unsolved
54
+ - Commenter: "The only reliable fix is making reviews structural, not instructional"
55
+ - TDD skill is best-in-class prompt engineering but still fails sometimes
56
+ - Persuasion research: authority language doubles compliance but doesn't reach 100%
57
+
58
+ ### Framework Enforcement Patterns
59
+ - **CrewAI:** Python for-loop + guardrail functions. Agent produces output, framework validates.
60
+ - **LangGraph:** Channel triggers + NamedBarrierValue. Node can't fire until inputs ready.
61
+ - **Temporal:** `await` keyword is the enforcement. Language-level blocking.
62
+ - **Symphony:** State machine + data dependencies. Each phase produces data the next requires.
63
+ - **GitHub Actions:** `needs:` DAG. Scheduler prevents jobs from starting without dependencies.
64
+ - **Universal pattern:** framework holds program counter, not agent.
65
+
66
+ ### UX / User Engagement
67
+ - **bladnman/ideation_team_skill:** AskUserQuestion for pre-flight interview, depth-aware parameters, cognitive role separation across agents
68
+ - **Devin:** PR-as-proof, screen recordings, conversational Slack updates, async delegation
69
+ - **Copilot Workspace:** Spec → Plan → Code, each editable. Steerability = trust.
70
+ - **Anthropic:** Show planning steps explicitly, programmatic checks at intermediate steps
71
+
72
+ ## What Wazir Must Build
73
+
74
+ ### 1. Pipeline State Machine (hooks + state file)
75
+
76
+ ```
77
+ SessionStart → initialize pipeline-state.json
78
+ PreToolUse:Write|Edit → deny if phase gate not passed
79
+ PreToolUse:Bash → deny git commit/push without tests/review
80
+ Stop → deny if any enabled workflow incomplete or proof missing
81
+ ```
82
+
83
+ ### 2. Subagent-Per-Phase Architecture
84
+
85
+ The `/wazir` skill becomes a CONTROLLER that:
86
+ - Spawns a clarifier subagent → receives clarification artifact
87
+ - Spawns a spec subagent → receives spec artifact
88
+ - Spawns a design subagent → receives design artifact
89
+ - Spawns an executor subagent → receives implementation
90
+ - Spawns a reviewer subagent → receives review verdict
91
+ - Each subagent sees ONLY its phase, not the full pipeline
92
+
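The controller idea above, where the framework rather than the agent holds the loop, could be sketched like this. `runSubagent` is a hypothetical callback standing in for whatever actually dispatches a subagent; it is synchronous here only to keep the example short, and the phase names simply mirror the list above.

```javascript
// Hypothetical controller loop: each phase only ever sees the previous phase's artifact.
const PHASES = ["clarifier", "spec", "design", "executor", "reviewer"];

// runSubagent(phase, inputArtifact) returns the phase's artifact, or null/undefined on failure.
function runPipeline(runSubagent, request) {
  let artifact = request;
  const completed = [];

  for (const phase of PHASES) {
    const output = runSubagent(phase, artifact);
    // A missing artifact halts the pipeline: the next phase has nothing to consume.
    if (output == null) {
      return { status: "halted", failedPhase: phase, completed };
    }
    completed.push(phase);
    artifact = output;
  }

  return { status: "done", completed, result: artifact };
}

// Example: a fake dispatcher whose design phase produces no artifact.
const halted = runPipeline(
  (phase) => (phase === "design" ? null : `${phase}-artifact`),
  "user request"
);
console.log(halted);
```

Because the loop lives in the controller, no subagent gets the chance to jump from an early phase straight to code.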
93
+ ### 3. Superpowers-Style Persuasion on Every Skill
94
+
95
+ For each discipline rule:
96
+ - Iron Law statement
97
+ - Rationalization table (empirically derived)
98
+ - Red flags list
99
+ - "Violating the letter is violating the spirit"
100
+ - `<EXTREMELY_IMPORTANT>` wrapper on session injection
101
+
102
+ ### 4. User Engagement Templates
103
+
104
+ - Pre-flight interview via AskUserQuestion (batched, not serial)
105
+ - Three-tier progress reporting (status line / key decisions / full record)
106
+ - Artifacts as proof (self-describing, contain lineage and reasoning)
107
+ - Steerability at phase boundaries (edit upstream, regenerate downstream)
@@ -917,6 +917,121 @@ Plan: "10 tasks covering all 10 items. Suggested order: [...]"
917
917
 
918
918
  ---
919
919
 
920
+ ### AP-23: Stale Documentation Counts
921
+
922
+ **Also known as:** Count Drift, Number Rot, Metric Desync
923
+ **Frequency:** Very Common
924
+ **Severity:** Medium
925
+ **Detection difficulty:** Low (mechanical)
926
+
927
+ **What it looks like:**
928
+
929
+ Documentation claims "268 expertise modules" when the actual count is 315. README says "7 hooks" when 8 exist. Counts in multiple files diverge from each other and from reality. The numbers were correct when written but drifted as the project grew.
930
+
931
+ **Why AI agents do it:**
932
+
933
+ Agents update the source of truth (add a new hook, write new expertise modules) but do not grep for every downstream reference. Each file is edited in isolation. No automated check enforces that prose counts match filesystem reality.
934
+
935
+ **What goes wrong:**
936
+
937
+ Users see contradictory numbers across docs and lose trust. Reviewers waste time verifying which number is correct. Launch materials ship with wrong counts, creating a first impression of sloppiness.
938
+
939
+ **Detection signals:**
940
+
941
+ - `find expertise -name '*.md' | wc -l` disagrees with counts in README, architecture docs, and readmes
942
+ - `ls hooks/definitions/ | wc -l` disagrees with hook count claims
943
+ - Different files claim different counts for the same metric
944
+
945
+ **The fix:**
946
+
947
+ 1. **Self-audit loop** — run `wazir validate docs` which cross-references prose claims against filesystem counts
948
+ 2. **Single source of truth** — reference manifest counts programmatically where possible; avoid hardcoding counts in prose
949
+ 3. **Grep sweep on every addition** — when adding a new module, hook, or skill, grep for the old count and update all references
950
+ 4. **CI enforcement** — `wazir validate docs` in CI catches drift before merge
951
+
952
+ **Example:**
953
+
954
+ Bad:
955
+ ```
956
+ README.md: "268 expertise modules"
957
+ architecture.md: "268 curated knowledge modules"
958
+ expertise/README.md: "268 knowledge modules"
959
+ Actual count: 315
960
+ ```
961
+
962
+ Good:
963
+ ```
964
+ README.md: "315 expertise modules"
965
+ architecture.md: "315 curated knowledge modules"
966
+ expertise/README.md: "315 knowledge modules"
967
+ Actual count: 315
968
+ All references match.
969
+ ```
970
+
971
+ **Related:** AP-06 (Partial Updates — same root cause applied to code), `wazir validate docs`, self-audit skill
972
+
973
+ ---
974
+
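A count-drift check in the spirit of AP-23 can be sketched as below. This is not the implementation of `wazir validate docs`, whose internals are not shown in this diff; the function name, the input shape, and the regular expression are assumptions chosen to match the Bad/Good example above.

```javascript
// Hypothetical drift check: compares a count claimed in prose with the real count.
// `docs` maps file paths to their text; `pattern` must capture the number in group 1.
function findCountDrift(docs, pattern, actual) {
  const drift = [];
  for (const [path, text] of Object.entries(docs)) {
    const match = text.match(pattern);
    if (!match) continue; // this file makes no claim about the metric
    const claimed = Number(match[1]);
    if (claimed !== actual) drift.push({ path, claimed, actual });
  }
  return drift;
}

// In practice `actual` would come from the filesystem, e.g. the number of
// markdown files under expertise/; here it is passed in directly.
const drift = findCountDrift(
  {
    "README.md": "268 expertise modules",
    "architecture.md": "315 curated knowledge modules",
  },
  /(\d+) (?:expertise|curated knowledge|knowledge) modules/,
  315
);
console.log(drift);
```

A non-empty result is the signal to fail CI or to trigger the grep sweep described in the fix list.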
975
+ ### AP-24: Silent Checkpoint Bypass
976
+
977
+ **Also known as:** Gate Ghosting, Approval Amnesia, Review Skipping
978
+ **Frequency:** Common
979
+ **Severity:** Critical
980
+ **Detection difficulty:** Moderate
981
+
982
+ **What it looks like:**
983
+
984
+ The agent reaches an approval gate (spec-challenge, plan-review, or final review) and proceeds without obtaining explicit reviewer approval. The gate exists in the workflow definition but the agent treats it as advisory, not blocking. Review artifacts are either missing or contain self-generated approvals.
985
+
986
+ **Why AI agents do it:**
987
+
988
+ The agent conflates "review" with "self-review." Without a hard external gate (different model, different session, or user confirmation), the agent reviews its own work and approves it. Optimism bias means self-review almost never rejects. The agent also optimizes for speed, and gates are the slowest part of the pipeline.
989
+
990
+ **What goes wrong:**
991
+
992
+ Spec errors propagate to implementation. Design flaws survive to production. The entire adversarial review structure becomes theater — gates exist on paper but provide no actual quality assurance. Bugs caught in final review could have been caught in spec-challenge at 10x lower cost.
993
+
994
+ **Detection signals:**
995
+
996
+ - Review pass files authored by the same agent that authored the reviewed artifact
997
+ - Approval granted on the first pass with zero findings
998
+ - Missing review artifacts in the run state directory
999
+ - `wazir capture loop-check` shows 0 review iterations for a gate phase
1000
+
1001
+ **The fix:**
1002
+
1003
+ 1. **External reviewer enforcement** — gate phases must invoke a different model or require user confirmation via `AskUserQuestion`
1004
+ 2. **Minimum findings threshold** — first-pass reviews that report zero findings trigger a warning; real adversarial review almost always finds something
1005
+ 3. **Artifact validation** — `wazir validate runtime` checks that review artifacts exist and were not authored by the same role as the reviewed artifact
1006
+ 4. **Loop cap guard** — `hooks/loop-cap-guard` tracks review iterations; zero iterations at a gate phase is a validation failure
1007
+
1008
+ **Example:**
1009
+
1010
+ Bad:
1011
+ ```
1012
+ # spec-challenge pass 1
1013
+ Reviewer: executor (same agent)
1014
+ Findings: 0
1015
+ Decision: APPROVED
1016
+ ```
1017
+
1018
+ Good:
1019
+ ```
1020
+ # spec-challenge pass 1
1021
+ Reviewer: codex-cli (external model)
1022
+ Findings: 3 (ambiguous acceptance criteria, missing edge case, unclear priority)
1023
+ Decision: REVISE
1024
+
1025
+ # spec-challenge pass 2
1026
+ Reviewer: codex-cli (external model)
1027
+ Findings: 0 (all 3 resolved)
1028
+ Decision: APPROVED
1029
+ ```
1030
+
1031
+ **Related:** AP-21 (Pipeline Phase Skipping — bypassing the gate entirely vs. rubber-stamping it), AP-08 (Test Theater — similar pattern of going through motions without rigor), `docs/reference/review-loop-pattern.md`, `hooks/loop-cap-guard`
1032
+
1033
+ ---
1034
+
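The detection signals for AP-24 lend themselves to a small validator. The sketch below is illustrative only: the record shape (`reviewer`, `findings`, `decision`) is borrowed from the Bad/Good example above, and the function is not the logic behind `wazir validate runtime` or `hooks/loop-cap-guard`.

```javascript
// Hypothetical gate validator: flags self-review, missing iterations,
// and first-pass approvals with zero findings.
function validateGate(author, passes) {
  const problems = [];

  if (passes.length === 0) {
    problems.push({ level: "error", message: "no review iterations recorded" });
    return problems;
  }

  for (const [index, pass] of passes.entries()) {
    if (pass.reviewer === author) {
      problems.push({
        level: "error",
        message: `pass ${index + 1} was reviewed by its own author (${author})`,
      });
    }
  }

  // A zero-finding approval on the first pass is suspicious but not proof of a bypass.
  const first = passes[0];
  if (first.decision === "APPROVED" && first.findings === 0) {
    problems.push({ level: "warning", message: "first pass approved with zero findings" });
  }

  return problems;
}

// The "Bad" example above: the executor approves its own spec with no findings.
const bad = validateGate("executor", [
  { reviewer: "executor", findings: 0, decision: "APPROVED" },
]);
console.log(bad);
```

Run against the "Good" example (two passes by `codex-cli`, three findings then zero), the same function returns an empty list.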
  ## Code Smell Quick Reference
 
  | Anti-Pattern | Severity | Frequency | Key Signal | First Action |
@@ -943,6 +1058,8 @@ Plan: "10 tasks covering all 10 items. Suggested order: [...]"
  | AP-20 Resumption Errors | High | Common | Mixed ID types across files | Architecture file in every session |
  | AP-21 Pipeline Phase Skipping | Critical | Common | Missing clarified/* artifacts | Enforce hard gates in skills + CLI |
  | AP-22 Autonomous Scope Reduction | Critical | Common | Plan has fewer tasks than input items | Scope coverage guard + user approval |
+ | AP-23 Stale Documentation Counts | Medium | Very Common | Doc counts disagree with filesystem | Grep sweep + `wazir validate docs` |
+ | AP-24 Silent Checkpoint Bypass | Critical | Common | Self-approved gate with 0 findings | External reviewer + minimum findings |
 
  ---
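
Reviewer note: the two gate rules described for AP-24 (a review must have happened, and it must not come from the role that authored the artifact) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Wazir's implementation: `ReviewPass`, `gate_violations`, and the field names are hypothetical, and the real behavior of `wazir validate runtime` and `hooks/loop-cap-guard` may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPass:
    """One review iteration at a gate (hypothetical record shape)."""
    reviewer_role: str  # role that produced the review artifact
    findings: int       # number of findings reported in this pass
    decision: str       # "APPROVED" or "REVISE"


def gate_violations(author_role: str, passes: list[ReviewPass]) -> list[str]:
    """Return human-readable violations for a gate phase; empty means it passes."""
    violations: list[str] = []
    # Rule 1: zero iterations at a gate phase is a validation failure.
    if not passes:
        violations.append("no review iterations recorded at gate phase")
    # Rule 2: a review may not be authored by the role that wrote the artifact.
    for number, review in enumerate(passes, start=1):
        if review.reviewer_role == author_role:
            violations.append(
                f"pass {number}: reviewer role '{review.reviewer_role}' "
                "matches the author role"
            )
    return violations


bad = [ReviewPass("executor", 0, "APPROVED")]
good = [
    ReviewPass("codex-cli", 3, "REVISE"),
    ReviewPass("codex-cli", 0, "APPROVED"),
]
print(gate_violations("executor", bad))
print(gate_violations("executor", good))
```

Applied to the Bad example, the sketch reports a single self-review violation; the Good example reports none.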
 
@@ -27,19 +27,38 @@ always:
      - antipatterns/code/state-management-antipatterns.md
      - quality/evidence-based-verification.md
    reviewer:
-     - antipatterns/process/ai-coding-antipatterns.md
-     - antipatterns/code/code-smells.md
-     - antipatterns/process/code-review-antipatterns.md
-     - antipatterns/code/dependency-antipatterns.md
-     - architecture/foundations/architectural-thinking.md
-     - architecture/foundations/coupling-and-cohesion.md
-     - antipatterns/code/architecture-antipatterns.md
-     - architecture/foundations/domain-driven-design.md
+     # Mode-agnostic core — loaded for ALL review modes (~6K tokens)
+     - digests/reviewer/review-methodology-digest.md
+     - digests/reviewer/ai-coding-digest.md
    content-author:
      - i18n/content/translation-management.md
      - i18n/foundations/string-externalization.md
      - i18n/foundations/pluralization-and-gender.md
 
+ # Mode-specific reviewer composition
+ # Loaded ON TOP of always.reviewer based on the --mode flag
+ # Total budget per mode: ~15-25K tokens (digests + auto + stack modules)
+ reviewer_modes:
+   task-review:
+     - digests/reviewer/code-smells-digest.md
+     - digests/reviewer/error-handling-digest.md
+   spec-challenge:
+     - digests/reviewer/architectural-thinking-digest.md
+     - digests/reviewer/ddd-digest.md
+   design-review:
+     - digests/reviewer/architectural-thinking-digest.md
+     - digests/reviewer/coupling-cohesion-digest.md
+   plan-review:
+     - digests/reviewer/architectural-thinking-digest.md
+     - digests/reviewer/coupling-cohesion-digest.md
+     - digests/reviewer/ai-coding-digest.md
+   final:
+     - digests/reviewer/code-smells-digest.md
+     - digests/reviewer/architecture-antipatterns-digest.md
+     - digests/reviewer/dependency-risk-digest.md
+   research-review: []
+   clarification-review: []
+
  auto:
    all-stacks:
      all-roles:
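
Reviewer note: the composition rule stated in the comments of this hunk (mode-specific digests are loaded on top of `always.reviewer`) might look like the Python sketch below. The paths are copied from the hunk; the names `ALWAYS_REVIEWER`, `REVIEWER_MODES`, and `modules_for` are hypothetical, and the de-duplication step is an assumption (the diff does not say how the repeated `ai-coding-digest.md` entry under `plan-review` is handled).

```python
from __future__ import annotations

# Mode-agnostic core, copied from always.reviewer in the hunk above.
ALWAYS_REVIEWER = [
    "digests/reviewer/review-methodology-digest.md",
    "digests/reviewer/ai-coding-digest.md",
]

# Mode-specific additions, copied from reviewer_modes in the hunk above.
REVIEWER_MODES: dict[str, list[str]] = {
    "task-review": [
        "digests/reviewer/code-smells-digest.md",
        "digests/reviewer/error-handling-digest.md",
    ],
    "spec-challenge": [
        "digests/reviewer/architectural-thinking-digest.md",
        "digests/reviewer/ddd-digest.md",
    ],
    "design-review": [
        "digests/reviewer/architectural-thinking-digest.md",
        "digests/reviewer/coupling-cohesion-digest.md",
    ],
    "plan-review": [
        "digests/reviewer/architectural-thinking-digest.md",
        "digests/reviewer/coupling-cohesion-digest.md",
        "digests/reviewer/ai-coding-digest.md",
    ],
    "final": [
        "digests/reviewer/code-smells-digest.md",
        "digests/reviewer/architecture-antipatterns-digest.md",
        "digests/reviewer/dependency-risk-digest.md",
    ],
    "research-review": [],
    "clarification-review": [],
}


def modules_for(mode: str) -> list[str]:
    """Core digests first, then the mode's digests, keeping first occurrences only."""
    if mode not in REVIEWER_MODES:
        raise KeyError(f"unknown review mode: {mode}")
    ordered: list[str] = []
    for path in ALWAYS_REVIEWER + REVIEWER_MODES[mode]:
        if path not in ordered:  # assumed: a digest is loaded at most once
            ordered.append(path)
    return ordered


for path in modules_for("plan-review"):
    print(path)
```

Modes with an empty list, such as `research-review`, resolve to the core digests alone in this sketch.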
@@ -0,0 +1,83 @@
+ # AI Coding Antipatterns — Reviewer Digest
+
+ > Detection-focused extract for reviewer context. For full analysis, see `antipatterns/process/ai-coding-antipatterns.md`.
+
+ ## Specification Drift (AP-01)
+ - **Signal:** Implementation differs from stated requirements without documented reason
+ - **Check:** Compare task spec acceptance criteria against actual code behavior
+ - **Severity:** high
+
+ ## Hallucinated APIs (AP-02)
+ - **Signal:** Import or call to function/class/module that doesn't exist in the dependency tree
+ - **Check:** Verify every imported symbol resolves to an actual export
+ - **Severity:** critical
+
+ ## Outdated Patterns (AP-03)
+ - **Signal:** Using deprecated APIs, class components in React 2025, callback-based async when promises are standard
+ - **Check:** Compare patterns against current library version best practices
+ - **Severity:** high
+
+ ## Premature Abstraction (AP-04)
+ - **Signal:** Generic utility/helper that is used exactly once
+ - **Check:** Count call sites for each abstraction introduced
+ - **Severity:** medium
+
+ ## Context Window Stuffing (AP-05)
+ - **Signal:** Agent reads 10+ files without index queries; loads entire modules instead of targeted slices
+ - **Check:** Review tool call patterns — excessive Read calls without preceding search
+ - **Severity:** low (efficiency, not correctness)
+
+ ## Fake Testing (AP-06)
+ - **Signal:** Tests that assert implementation details, use mocks that mirror the implementation, or test tautologies
+ - **Check:** Would the test fail if the implementation had a real bug? If not, it's fake.
+ - **Severity:** high
+
+ ## Scope Creep (AP-07)
+ - **Signal:** Files modified or features added that were not in the task spec
+ - **Check:** Diff includes changes outside the task's specified file scope
+ - **Severity:** medium
+
+ ## Optimistic Error Handling (AP-08)
+ - **Signal:** Missing try/catch around I/O operations, network calls, file operations, JSON parsing
+ - **Check:** Every async operation and external call has error handling
+ - **Severity:** high
+
+ ## Stale Dependency (AP-09)
+ - **Signal:** Importing deprecated APIs, using outdated package versions with known CVEs
+ - **Check:** Package versions against known vulnerability databases
+ - **Severity:** medium-high
+
+ ## Cargo-Cult Patterns (AP-10)
+ - **Signal:** Design patterns applied without the problem they solve (Factory for single type, Observer for single listener)
+ - **Check:** Does the pattern's complexity serve a real need?
+ - **Severity:** medium
+
+ ## Gold Plating (AP-11)
+ - **Signal:** Extra configuration, extensibility points, or features not in the spec
+ - **Check:** Is every public API/config option traceable to a requirement?
+ - **Severity:** medium
+
+ ## Sycophantic Compliance (AP-12)
+ - **Signal:** Agent implements exactly what was asked even when the request contains contradictions or obvious errors
+ - **Check:** Look for requirements that conflict with each other or with the codebase's existing contracts
+ - **Severity:** high
+
+ ## Phantom Error Handling (AP-13)
+ - **Signal:** Error handling code that looks comprehensive but handles errors incorrectly (swallows, retries without backoff, logs without propagating)
+ - **Check:** Trace each error path — does it actually reach a handler that does the right thing?
+ - **Severity:** high
+
+ ## Inconsistent State After Failure (AP-14)
+ - **Signal:** Multi-step operations where a failure in step N leaves steps 1..N-1 committed
+ - **Check:** Are multi-step mutations wrapped in transactions or compensating actions?
+ - **Severity:** high
+
+ ## Over-Confident Comments (AP-15)
+ - **Signal:** Comments claiming "this handles all edge cases" or "this is thread-safe" without evidence
+ - **Check:** Does the code actually handle what the comment claims?
+ - **Severity:** medium
+
+ ## Training Data Leakage (AP-16)
+ - **Signal:** Code that closely mirrors common training examples but doesn't fit the actual use case
+ - **Check:** Does the implementation structure match the problem, or does it match a textbook example?
+ - **Severity:** medium
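
Reviewer note: as one illustration of the checks in this digest, the AP-02 check ("verify every imported symbol resolves to an actual export") could be approximated for Python sources with the standard `ast` and `importlib` modules, as sketched below. The function name `unresolved_imports` is hypothetical, relative imports are skipped, and the sketch actually imports the named modules, so it should only be pointed at code whose import side effects are acceptable. The digest itself is language-agnostic; an equivalent check for JavaScript would need a different resolver.

```python
from __future__ import annotations

import ast
import importlib


def unresolved_imports(source: str) -> list[str]:
    """List imported modules or symbols in `source` that fail to resolve."""
    missing: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b` style: the module itself must be importable.
            for alias in node.names:
                try:
                    importlib.import_module(alias.name)
                except ImportError:
                    missing.append(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a import b` style: the module must import and export `b`.
            try:
                module = importlib.import_module(node.module)
            except ImportError:
                missing.append(node.module)
                continue
            for alias in node.names:
                if alias.name == "*" or hasattr(module, alias.name):
                    continue
                # `b` may be a submodule that has not been loaded yet.
                try:
                    importlib.import_module(f"{node.module}.{alias.name}")
                except ImportError:
                    missing.append(f"{node.module}.{alias.name}")
    return missing


sample = (
    "import json\n"
    "import not_a_real_module_xyz\n"
    "from json import dumps, definitely_missing\n"
)
print(unresolved_imports(sample))
```

The sample flags the nonexistent module and the nonexistent `json` attribute while leaving the valid imports alone.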
@@ -0,0 +1,63 @@
+ # Architectural Thinking — Reviewer Digest
+
+ > Evaluation-focused extract for reviewer context. For full guidance, see `architecture/foundations/architectural-thinking.md`.
+
+ ## Architecture Review Checklist
+
+ ### Separation of Concerns
+ - Does each module/file have a single clear responsibility?
+ - Are business logic, data access, and presentation in separate layers?
+ - Can you describe what a module does in one sentence without "and"?
+
+ ### Dependency Direction
+ - Do dependencies point inward (toward core domain), not outward?
+ - Are infrastructure details (DB, HTTP, filesystem) behind abstractions?
+ - Could you swap the database without changing business logic?
+
+ ### Interface Design
+ - Are public APIs minimal (expose only what is needed)?
+ - Are contracts (types, schemas, interfaces) explicit and documented?
+ - Do functions have clear input/output contracts without hidden side effects?
+
+ ### Change Impact
+ - Can you add a feature without modifying existing code (Open-Closed)?
+ - Are changes localized (changing one feature doesn't cascade across modules)?
+ - Is the dependency graph shallow (max 3-4 levels deep)?
+
+ ### Reversibility Assessment
+ - Which decisions in this diff are hard to reverse?
+ - Are irreversible decisions (data models, service boundaries, consistency models) justified with documented reasoning?
+ - Are reversible decisions (naming, folder structure, library choices) made quickly without over-analysis?
+
+ ### Trade-off Reasoning
+ Every architectural decision involves trade-offs. During review, check:
+ - Is the trade-off acknowledged? ("We chose X because Y, accepting Z")
+ - Is the trade-off appropriate for the context? (startup vs. enterprise, prototype vs. production)
+ - Are rejected alternatives documented?
+
+ ## Architecture Smells (Quick Detection)
+
+ | Smell | Signal | Severity |
+ |-------|--------|----------|
+ | **Big Ball of Mud** | No discernible module boundaries; any module calls any other | critical |
+ | **Layering Violation** | UI code calling database directly; domain importing from infrastructure | high |
+ | **Circular Module Dependency** | Module A depends on Module B depends on Module A | high |
+ | **God Module** | One module >1000 LOC handling multiple concerns | medium |
+ | **Leaky Abstraction** | Internal implementation details exposed in public interface | medium |
+ | **Distributed Monolith** | Multiple services that must be deployed together | high |
+ | **Accidental Complexity** | Architecture complexity not justified by problem complexity | medium |
+ | **Architecture Astronaut** | Abstractions solving problems no one has yet | medium |
+ | **Dead End Architecture** | Design choices that prevent future evolution (no extension points, hardcoded assumptions) | high |
+
+ ## Quality Attribute Checklist
+
+ When reviewing architectural decisions, verify the relevant quality attributes are addressed:
+
+ | Attribute | Review Question |
+ |-----------|----------------|
+ | **Performance** | Are there obvious bottlenecks? N+1 queries? Unbounded loops? |
+ | **Scalability** | Can this handle 10x load without structural changes? |
+ | **Security** | Are trust boundaries enforced? Input validated at boundaries? |
+ | **Availability** | What happens when a dependency fails? Is there a fallback? |
+ | **Modifiability** | How many files change to add a typical feature? |
+ | **Testability** | Can components be tested in isolation without complex setup? |
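
Reviewer note: the "Circular Module Dependency" smell in this digest (Module A depends on Module B depends on Module A) lends itself to a mechanical check. The Python sketch below runs a depth-first search over a module dependency graph and returns the first cycle it finds. The graph shape (a dict from module name to the modules it imports) and the name `find_cycle` are assumptions for illustration; building that graph from real source files is out of scope here.

```python
from __future__ import annotations

from typing import Optional


def find_cycle(deps: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a path (first node repeated at the end), or None."""
    in_progress: set[str] = set()  # modules on the current DFS path
    done: set[str] = set()         # modules fully explored without finding a cycle
    path: list[str] = []

    def visit(module: str) -> Optional[list[str]]:
        in_progress.add(module)
        path.append(module)
        for dep in deps.get(module, []):
            if dep in in_progress:
                # Back edge: the cycle is the path from `dep` to here, closed.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        in_progress.discard(module)
        done.add(module)
        return None

    for module in deps:
        if module not in done:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


layered = {"ui": ["service"], "service": ["domain"], "domain": []}
tangled = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(find_cycle(layered))
print(find_cycle(tangled))
```

A `None` result only says no cycle exists in the graph supplied; it says nothing about the other smells in the table.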
@@ -0,0 +1,49 @@
+ # Architecture Antipatterns — Reviewer Digest
+
+ > Detection-focused extract for reviewer context. For full analysis, see `antipatterns/code/architecture-antipatterns.md`.
+
+ ## Structural Antipatterns
+
+ | Antipattern | Detection Signal | Severity |
+ |-------------|-----------------|----------|
+ | **Big Ball of Mud** | No discernible module boundaries; any module calls any other; package diagram is fully connected | critical |
+ | **God Object / God Service** | Class/module with >10 public methods touching >3 concerns; single service handling unrelated domains | high |
+ | **Golden Hammer** | Same pattern/library used for every problem regardless of fit (everything is a microservice, everything uses Redux) | medium |
+ | **Architecture Astronaut** | Layers of abstraction solving problems no one has; meta-frameworks, plugin systems with zero plugins | medium |
+ | **Dead Code / Lava Flow** | Unreachable code paths, unused exports, commented-out blocks; code preserved "because it might be needed" | medium |
+ | **Copy-Paste Architecture** | Duplicated modules with minor variations instead of shared abstraction | high |
+ | **Boat Anchor** | Unused infrastructure "for future use" (empty interfaces, unused config, skeleton services) | medium |
+ | **Accidental Complexity** | System complexity far exceeds problem complexity; over-engineered for the actual requirements | medium |
+ | **Stovepipe System** | Modules built in isolation with no integration architecture; each uses different patterns, different data formats | high |
+ | **Swiss Army Knife** | One component tries to serve every use case; endlessly configurable but hard to use for any single purpose | medium |
+
+ ## Integration Antipatterns
+
+ | Antipattern | Detection Signal | Severity |
+ |-------------|-----------------|----------|
+ | **Distributed Monolith** | Multiple services that must be deployed together; shared database; lock-step releases | critical |
+ | **Chatty Interface** | >5 sequential API calls to complete one logical operation | medium |
+ | **Shared Database** | Multiple services reading/writing the same database tables directly | critical |
+ | **Circular Dependency** | Service A calls B calls C calls A (or module-level equivalent) | high |
+ | **Hardcoded Endpoints** | URLs, hostnames, or ports as string literals in source code | medium |
+ | **Missing Circuit Breaker** | External service calls without timeout or failure handling | high |
+ | **Sinkhole Anti-pattern** | Requests pass through multiple layers that add no value (pure pass-through) | medium |
+
+ ## Layering Antipatterns
+
+ | Antipattern | Detection Signal | Severity |
+ |-------------|-----------------|----------|
+ | **Upward Dependency** | Core/domain module imports from UI/API layer | critical |
+ | **Layer Bypass** | UI code calling database/repository directly, skipping service layer | high |
+ | **Anemic Domain** | Domain objects are pure data holders; all logic in services | medium |
+ | **Fat Controller** | Controller/handler contains business logic instead of delegating | high |
+ | **Inner Platform Effect** | Building a general-purpose engine inside the application that reimplements what the platform already provides | high |
+
+ ## Root Cause Patterns
+
+ Most architecture antipatterns share a few root causes:
+ - **Shipping pressure:** Shortcuts that accumulate into structural debt
+ - **Missing boundaries:** No enforced module boundaries in build tooling
+ - **Conway's Law misalignment:** Architecture doesn't match team structure
+ - **Premature optimization:** Distributed complexity without proven need
+ - **BDUF backlash:** Avoiding all upfront design, resulting in no design
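
Reviewer note: the "Missing boundaries" root cause suggests enforcing layer rules in tooling. A minimal Python sketch of such a rule is below, covering two rows of the Layering Antipatterns table: Upward Dependency and Layer Bypass. The three-layer ordering in `LAYERS` and the function `classify_import` are hypothetical; a real project would supply its own layers and derive the import pairs from source.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical layer order, outermost first. Each layer may depend on itself
# or on the layer directly beneath it.
LAYERS = ["ui", "service", "repository"]


def classify_import(source_layer: str, target_layer: str) -> Optional[str]:
    """Name the layering antipattern for one import edge, or None if allowed."""
    src = LAYERS.index(source_layer)
    dst = LAYERS.index(target_layer)
    if dst < src:
        # An inner layer reaching outward, e.g. repository importing ui.
        return "upward-dependency"
    if dst - src > 1:
        # Skipping an intermediate layer, e.g. ui calling repository directly.
        return "layer-bypass"
    return None


edges = [
    ("ui", "service"),
    ("ui", "repository"),
    ("repository", "ui"),
    ("service", "service"),
]
for source_layer, target_layer in edges:
    print(source_layer, "->", target_layer, ":", classify_import(source_layer, target_layer))
```

A check of this kind could run in CI so that violations fail the build instead of relying on reviewers to spot them.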