@wazir-dev/cli 1.3.0 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (133)
  1. package/CHANGELOG.md +17 -2
  2. package/docs/research/2026-03-20-agents/a18fb002157904af5.txt +187 -0
  3. package/docs/research/2026-03-20-agents/a1d0ac79ac2f11e6f.txt +2 -0
  4. package/docs/research/2026-03-20-agents/a324079de037abd7c.txt +198 -0
  5. package/docs/research/2026-03-20-agents/a357586bccfafb0e5.txt +256 -0
  6. package/docs/research/2026-03-20-agents/a4365394e4d753105.txt +137 -0
  7. package/docs/research/2026-03-20-agents/a492af28bc52d3613.txt +136 -0
  8. package/docs/research/2026-03-20-agents/a4984db0b6a8eee07.txt +124 -0
  9. package/docs/research/2026-03-20-agents/a5b30e59d34bbb062.txt +214 -0
  10. package/docs/research/2026-03-20-agents/a5cf7829dab911586.txt +165 -0
  11. package/docs/research/2026-03-20-agents/a607157c30dd97c9e.txt +96 -0
  12. package/docs/research/2026-03-20-agents/a60b68b1e19d1e16b.txt +115 -0
  13. package/docs/research/2026-03-20-agents/a722af01c5594aba0.txt +166 -0
  14. package/docs/research/2026-03-20-agents/a787bdc516faa5829.txt +181 -0
  15. package/docs/research/2026-03-20-agents/a7c46d1bba1056ed2.txt +132 -0
  16. package/docs/research/2026-03-20-agents/a7e5abbab2b281a0d.txt +100 -0
  17. package/docs/research/2026-03-20-agents/a8dbadc66cd0d7d5a.txt +95 -0
  18. package/docs/research/2026-03-20-agents/a904d9f45d6b86a6d.txt +75 -0
  19. package/docs/research/2026-03-20-agents/a927659a942ee7f60.txt +102 -0
  20. package/docs/research/2026-03-20-agents/a962cb569191f7583.txt +125 -0
  21. package/docs/research/2026-03-20-agents/aab6decea538aac41.txt +148 -0
  22. package/docs/research/2026-03-20-agents/abd58b853dd938a1b.txt +295 -0
  23. package/docs/research/2026-03-20-agents/ac009da573eff7f65.txt +100 -0
  24. package/docs/research/2026-03-20-agents/ac1bc783364405e5f.txt +190 -0
  25. package/docs/research/2026-03-20-agents/aca5e2b57fde152a0.txt +132 -0
  26. package/docs/research/2026-03-20-agents/ad849b8c0a7e95b8b.txt +176 -0
  27. package/docs/research/2026-03-20-agents/adc2b12a4da32c962.txt +258 -0
  28. package/docs/research/2026-03-20-agents/af97caaaa9a80e4cb.txt +146 -0
  29. package/docs/research/2026-03-20-agents/afc5faceee368b3ca.txt +111 -0
  30. package/docs/research/2026-03-20-agents/afdb282d866e3c1e4.txt +164 -0
  31. package/docs/research/2026-03-20-agents/afe9d1f61c02b1e8d.txt +299 -0
  32. package/docs/research/2026-03-20-agents/b4hmkwril.txt +1856 -0
  33. package/docs/research/2026-03-20-agents/b80ptk89g.txt +1856 -0
  34. package/docs/research/2026-03-20-agents/bf54s1jss.txt +1150 -0
  35. package/docs/research/2026-03-20-agents/bhd6kq2kx.txt +1856 -0
  36. package/docs/research/2026-03-20-agents/bmb2fodyr.txt +988 -0
  37. package/docs/research/2026-03-20-agents/bmmsrij8i.txt +826 -0
  38. package/docs/research/2026-03-20-agents/bn4t2ywpu.txt +2175 -0
  39. package/docs/research/2026-03-20-agents/bu22t9f1z.txt +0 -0
  40. package/docs/research/2026-03-20-agents/bwvl98v2p.txt +738 -0
  41. package/docs/research/2026-03-20-agents/psych-a3697a7fd06eb64fd.txt +135 -0
  42. package/docs/research/2026-03-20-agents/psych-a37776fabc870feae.txt +123 -0
  43. package/docs/research/2026-03-20-agents/psych-a5b1fe05c0589efaf.txt +2 -0
  44. package/docs/research/2026-03-20-agents/psych-a95c15b1f29424435.txt +76 -0
  45. package/docs/research/2026-03-20-agents/psych-a9c26f4d9172dde7c.txt +2 -0
  46. package/docs/research/2026-03-20-agents/psych-aa19c69f0ca2c5ad3.txt +2 -0
  47. package/docs/research/2026-03-20-agents/psych-aa4e4cb70e1be5ecb.txt +95 -0
  48. package/docs/research/2026-03-20-agents/psych-ab5b302f26a554663.txt +102 -0
  49. package/docs/research/2026-03-20-deep-research-complete.md +101 -0
  50. package/docs/research/2026-03-20-deep-research-status.md +38 -0
  51. package/docs/research/2026-03-20-enforcement-research.md +107 -0
  52. package/expertise/composition-map.yaml +27 -8
  53. package/expertise/digests/reviewer/ai-coding-digest.md +83 -0
  54. package/expertise/digests/reviewer/architectural-thinking-digest.md +63 -0
  55. package/expertise/digests/reviewer/architecture-antipatterns-digest.md +49 -0
  56. package/expertise/digests/reviewer/code-smells-digest.md +53 -0
  57. package/expertise/digests/reviewer/coupling-cohesion-digest.md +54 -0
  58. package/expertise/digests/reviewer/ddd-digest.md +60 -0
  59. package/expertise/digests/reviewer/dependency-risk-digest.md +40 -0
  60. package/expertise/digests/reviewer/error-handling-digest.md +55 -0
  61. package/expertise/digests/reviewer/review-methodology-digest.md +49 -0
  62. package/exports/hosts/claude/.claude/commands/learn.md +61 -8
  63. package/exports/hosts/claude/.claude/settings.json +7 -6
  64. package/exports/hosts/claude/export.manifest.json +6 -3
  65. package/exports/hosts/claude/host-package.json +3 -0
  66. package/exports/hosts/codex/export.manifest.json +6 -3
  67. package/exports/hosts/codex/host-package.json +3 -0
  68. package/exports/hosts/cursor/.cursor/hooks.json +6 -6
  69. package/exports/hosts/cursor/export.manifest.json +6 -3
  70. package/exports/hosts/cursor/host-package.json +3 -0
  71. package/exports/hosts/gemini/export.manifest.json +6 -3
  72. package/exports/hosts/gemini/host-package.json +3 -0
  73. package/hooks/definitions/pretooluse_dispatcher.yaml +26 -0
  74. package/hooks/definitions/pretooluse_pipeline_guard.yaml +22 -0
  75. package/hooks/definitions/stop_pipeline_gate.yaml +22 -0
  76. package/hooks/hooks.json +7 -6
  77. package/hooks/pretooluse-dispatcher +84 -0
  78. package/hooks/pretooluse-pipeline-guard +9 -0
  79. package/hooks/stop-pipeline-gate +9 -0
  80. package/package.json +2 -2
  81. package/schemas/decision.schema.json +15 -0
  82. package/schemas/hook.schema.json +4 -1
  83. package/skills/TEMPLATE-3-ZONE.md +160 -0
  84. package/skills/brainstorming/SKILL.md +127 -23
  85. package/skills/clarifier/SKILL.md +175 -18
  86. package/skills/claude-cli/SKILL.md +91 -12
  87. package/skills/codex-cli/SKILL.md +91 -12
  88. package/skills/debugging/SKILL.md +133 -38
  89. package/skills/design/SKILL.md +173 -37
  90. package/skills/dispatching-parallel-agents/SKILL.md +129 -31
  91. package/skills/executing-plans/SKILL.md +113 -25
  92. package/skills/executor/SKILL.md +185 -21
  93. package/skills/finishing-a-development-branch/SKILL.md +107 -18
  94. package/skills/gemini-cli/SKILL.md +91 -12
  95. package/skills/humanize/SKILL.md +92 -13
  96. package/skills/init-pipeline/SKILL.md +90 -17
  97. package/skills/prepare-next/SKILL.md +93 -24
  98. package/skills/receiving-code-review/SKILL.md +90 -16
  99. package/skills/requesting-code-review/SKILL.md +100 -24
  100. package/skills/requesting-code-review/code-reviewer.md +29 -17
  101. package/skills/reviewer/SKILL.md +190 -50
  102. package/skills/run-audit/SKILL.md +92 -15
  103. package/skills/scan-project/SKILL.md +93 -14
  104. package/skills/self-audit/SKILL.md +113 -39
  105. package/skills/skill-research/SKILL.md +94 -7
  106. package/skills/subagent-driven-development/SKILL.md +129 -30
  107. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +30 -2
  108. package/skills/subagent-driven-development/implementer-prompt.md +40 -27
  109. package/skills/subagent-driven-development/spec-reviewer-prompt.md +25 -12
  110. package/skills/tdd/SKILL.md +125 -20
  111. package/skills/using-git-worktrees/SKILL.md +118 -28
  112. package/skills/using-skills/SKILL.md +116 -29
  113. package/skills/verification/SKILL.md +127 -22
  114. package/skills/wazir/SKILL.md +517 -153
  115. package/skills/writing-plans/SKILL.md +134 -28
  116. package/skills/writing-skills/SKILL.md +91 -13
  117. package/skills/writing-skills/anthropic-best-practices.md +104 -64
  118. package/skills/writing-skills/persuasion-principles.md +100 -34
  119. package/tooling/src/capture/command.js +29 -1
  120. package/tooling/src/capture/decision.js +40 -0
  121. package/tooling/src/capture/store.js +1 -0
  122. package/tooling/src/config/depth-table.js +60 -0
  123. package/tooling/src/export/compiler.js +7 -8
  124. package/tooling/src/guards/guardrail-functions.js +131 -0
  125. package/tooling/src/guards/phase-prerequisite-guard.js +39 -3
  126. package/tooling/src/hooks/pretooluse-dispatcher.js +300 -0
  127. package/tooling/src/hooks/pretooluse-pipeline-guard.js +141 -0
  128. package/tooling/src/hooks/stop-pipeline-gate.js +92 -0
  129. package/tooling/src/learn/pipeline.js +177 -0
  130. package/tooling/src/state/db.js +251 -2
  131. package/tooling/src/state/pipeline-state.js +262 -0
  132. package/wazir.manifest.yaml +3 -0
  133. package/workflows/learn.md +61 -8
@@ -0,0 +1,101 @@
1
+ # Deep Research Complete — 2026-03-20
2
+
3
+ ## 25 Research Agents — ALL COMPLETED
4
+
5
+ Total output: ~6.8MB across 25 agents. Full transcripts at:
6
+ `/private/tmp/claude-501/-Users-mohamedabdallah-Work-Wazir/96398c18-9868-43bc-a4d6-d7f388880d4a/tasks/`
7
+
8
+ ## Executive Summary
9
+
10
+ ### The Architecture for Wazir v2
11
+
12
+ **Three-layer enforcement pyramid:**
13
+ 1. **Hooks** (mechanical, can't bypass): Stop blocks completion, PreToolUse blocks writes/commits/pushes
14
+ 2. **Subagent isolation** (architectural, can't see full pipeline): one agent per phase, controller holds the loop
15
+ 3. **Persuasion engineering** (behavioral, makes bypass less likely): superpowers-style rationalization tables, red flags, authority language
16
+
17
+ ### Key Findings
18
+
19
+ **Hooks:**
20
+ - Stop hook CAN block completion (`{"decision": "block"}`) — proven by ralph-loop (492+ iterations)
21
+ - PreToolUse has 7 decision patterns: silent allow, advisory, systemMessage, modify, JSON deny, exit-code deny, echo-trick redirect
22
+ - State tracking via `pipeline-state.json` — hooks read, CLI writes, atomic temp+rename
23
+ - Critical limitations: hooks can block but not compel; "hook error" labels poison the model; SubagentStop is broken; agent can escape via AskUserQuestion
24
+ - Must check `stop_hook_active` to prevent infinite loops; must allow context-limit and user-abort stops
25
+
26
+ **Subagent Architecture:**
27
+ - Controller-as-orchestrator: wz:wazir holds the loop, dispatches one subagent per phase
28
+ - Each subagent gets a fresh 200K-token context (~165K usable after overhead)
29
+ - No nesting (depth=1) — controller dispatches ALL subagents directly
30
+ - File-mediated handoff (MetaGPT pattern): artifacts on disk, not in context
31
+ - Artifact dependency: each artifact has a `requires` block holding its predecessor's digest, used for staleness detection
32
+ - Guardrail functions per phase boundary with concrete pass/fail criteria
33
+ - Retry ladder: same-model×2 → model-escalation×1 → human escalation
34
+ - Error classification: transient (retry), quality (retry+feedback), deterministic (escalate), resource (model-escalate)
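The retry ladder and error classes above can be sketched as a single decision function. This is an illustration under stated assumptions: the action names are hypothetical, and "escalate" for deterministic errors is read here as escalation to a human, since resource errors are the ones listed as model-escalated.

```python
from enum import Enum

class ErrorClass(Enum):
    TRANSIENT = "transient"          # e.g. timeout: retry as-is
    QUALITY = "quality"              # guardrail failed: retry with feedback
    DETERMINISTIC = "deterministic"  # same input fails again: go to a human (assumed)
    RESOURCE = "resource"            # e.g. context exhausted: try a stronger model

SAME_MODEL_RETRIES = 2  # same-model x2
ESCALATED_RETRIES = 1   # model-escalation x1

def next_action(err: ErrorClass, same_model_attempts: int, escalated_attempts: int) -> str:
    if err is ErrorClass.DETERMINISTIC:
        return "human"
    if err is ErrorClass.RESOURCE:
        # Retrying the same model cannot help, so skip straight to escalation.
        return "escalate_model" if escalated_attempts < ESCALATED_RETRIES else "human"
    if same_model_attempts < SAME_MODEL_RETRIES:
        return "retry_with_feedback" if err is ErrorClass.QUALITY else "retry"
    if escalated_attempts < ESCALATED_RETRIES:
        return "escalate_model"
    return "human"
```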
35
+
36
+ **Persuasion Engineering:**
37
+ - Superpowers is 100% prompt engineering, zero mechanical enforcement — and agents STILL skip (issue #463)
38
+ - Meincke et al. 2025: persuasion doubles compliance (33%→72%, N=28,000, p<.001)
39
+ - Best combination: Authority + Commitment + Scarcity
40
+ - CSO (Claude Search Optimization) is critical: skill descriptions must be triggers only, never process summaries
41
+ - 47 rationalization entries across 5 superpowers skills — Wazir has ZERO
42
+ - "Violating the letter is violating the spirit" — single most impactful sentence
43
+ - TDD for skills: RED (observe baseline failures) → GREEN (write skill addressing those) → REFACTOR (close new loopholes)
44
+
45
+ **Learning System:**
46
+ - 4-stage pipeline: Tally → Candidate → Promote → Active
47
+ - Findings classified by 8 categories × 4 severity levels
48
+ - Recurrence detection via finding_hash dedup (PagerDuty pattern)
49
+ - Semi-automatic promotion: auto-propose, human-approve (CodeGuru + Snyk model)
50
+ - Drift prevention: 30 active project learnings max, 90-day TTL, 5% hit-rate demotion, principle consolidation at 25+ entries
51
+ - Decision audit trail: v2 schema with category, alternatives, confidence, outcome_ref, supersedes
52
+ - User feedback: capture corrections/approvals in ndjson, classify signal vs noise
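A rough sketch of `finding_hash` dedup feeding the Tally → Candidate step, assuming the 3+ occurrence threshold listed in the agent index. The normalization rules, the hashed fields (category, rule, message) and the `Tally` class are hypothetical; promotion past Candidate still waits for human approval.

```python
import hashlib
import re
from collections import Counter

PROMOTION_THRESHOLD = 3  # "3+ occurrence threshold"

def finding_hash(category: str, rule: str, message: str) -> str:
    # Normalize volatile details (case, whitespace, numbers such as line
    # numbers) so the same underlying finding hashes identically across runs.
    norm = re.sub(r"\d+", "N", message.lower())
    norm = re.sub(r"\s+", " ", norm).strip()
    key = f"{category}|{rule}|{norm}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

class Tally:
    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.proposed: set = set()

    def record(self, category: str, rule: str, message: str) -> bool:
        """Count one occurrence; return True the first time it becomes a candidate."""
        h = finding_hash(category, rule, message)
        self.counts[h] += 1
        if self.counts[h] >= PROMOTION_THRESHOLD and h not in self.proposed:
            self.proposed.add(h)  # auto-propose only; a human approves promotion
            return True
        return False
```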
53
+
54
+ **Review Architecture:**
55
+ - Two-tier: internal (Sonnet, expertise-loaded, pattern-matching) → external (Codex, fresh eyes, unknown-unknowns)
56
+ - Critical finding: the reviewer always-layer is 99K tokens against a 50K ceiling, so 5 of 8 modules are dropped
57
+ - Fix: mode-specific reviewer composition (different modules per review mode)
58
+ - Reviewer digest modules: 3-5K tokens each (not 12K originals)
59
+ - Findings classified by 8 categories (correctness, security, completeness, wiring, verification, drift, performance, style)
60
+ - Auto-classification rules per category with severity floors
61
+ - Feedback-to-learning: 7-step loop, LLM-assisted clustering for pattern detection
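To make "severity floors" concrete, a small sketch using the 8 categories listed above. The four severity names are assumed (they match the labels used in the reviewer digests), and the floor values in `SEVERITY_FLOORS` are hypothetical placeholders, not Wazir's actual rules.

```python
CATEGORIES = {"correctness", "security", "completeness", "wiring",
              "verification", "drift", "performance", "style"}
SEVERITIES = ["low", "medium", "high", "critical"]  # assumed 4 levels, ascending

# Hypothetical floors: a finding in these categories is never rated below this.
SEVERITY_FLOORS = {"security": "high", "correctness": "medium", "wiring": "medium"}

def apply_floor(category: str, proposed: str) -> str:
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    if proposed not in SEVERITIES:
        raise ValueError(f"unknown severity: {proposed}")
    floor = SEVERITY_FLOORS.get(category, "low")
    # Keep whichever of the proposed severity and the floor ranks higher.
    return max(proposed, floor, key=SEVERITIES.index)
```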
62
+
63
+ **Interactive UX:**
64
+ - AskUserQuestion: 1-4 questions, 2-4 options each, arrow-key selection, multiSelect supported
65
+ - Bug: do NOT list AskUserQuestion in a skill's `allowed-tools` (it causes empty answers)
66
+ - Progressive disclosure: status line (what) → paragraph (why) → full report (everything)
67
+ - Key formula: "Name the action. State the dependency. Omit the journey."
68
+ - 5 progress patterns: phase map, meaningful updates, artifact previews, time estimates, heartbeat
69
+ - Heartbeat: maximum silence of 2 min (standard), 90 s (deep), 3 min (quick)
70
+ - Steerability: classify mutation level → show impact → selective regeneration → preserve completed work
71
+ - Three modes: auto (gating agent steers) / guided (checkpoints steer) / interactive (continuous steer)
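The AskUserQuestion limits and heartbeat intervals above can be checked mechanically before a prompt or progress update goes out. A sketch, assuming a payload shaped like `{"questions": [{"question": ..., "options": [...], "multiSelect": ...}]}`; the field names and helper functions are assumptions for illustration, not a verified copy of the tool schema.

```python
MAX_SILENCE_SECONDS = {"quick": 180, "standard": 120, "deep": 90}

def validate_questions(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload fits the limits."""
    problems = []
    questions = payload.get("questions", [])
    if not 1 <= len(questions) <= 4:
        problems.append(f"expected 1-4 questions, got {len(questions)}")
    for i, q in enumerate(questions):
        n = len(q.get("options", []))
        if not 2 <= n <= 4:
            problems.append(f"question {i}: expected 2-4 options, got {n}")
    return problems

def heartbeat_due(depth: str, seconds_since_last_update: float) -> bool:
    """True once the silence limit for this depth has been reached."""
    return seconds_since_last_update >= MAX_SILENCE_SECONDS[depth]
```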
72
+
73
+ ## Agent Output Index
74
+
75
+ | # | Agent | Key Deliverable |
76
+ |---|-------|----------------|
77
+ | 1 | Stop hook patterns | Complete blueprint for pipeline-gate Stop hook with 10 edge cases |
78
+ | 2 | PreToolUse catalog | 7 decision patterns with code examples from 4 real plugins |
79
+ | 3 | State machine design | pipeline-state.json schema with 30+ fields, update rules, session isolation |
80
+ | 4 | Hook limitations | 13 limitations with workarounds, including "hook error" label poisoning |
81
+ | 5 | Persuasion playbook | 10 patterns, 47 rationalization entries, CSO rules, implementation checklist |
82
+ | 6 | Controller pattern | Hybrid architecture: flat orchestration with file-mediated handoff |
83
+ | 7 | Artifact dependencies | Per-phase schemas with requires/digest, write-time validation |
84
+ | 8 | Context isolation | 200K per subagent, no nesting, MCP tool caveats, MetaGPT pub-sub |
85
+ | 9 | Guardrail validation | 6 guardrail functions with concrete pass/fail criteria per phase |
86
+ | 10 | Failure + retry | 3-tier ladder (same-model→escalate→human), error classification |
87
+ | 11 | AskUserQuestion API | Full schema, 2-4 options, multiSelect, known bugs, plugin examples |
88
+ | 12 | Showing reasoning | Progressive disclosure templates at 3 levels with anti-patterns |
89
+ | 13 | Depth parameters | 4 depth levels with per-parameter tables (from the bladnman analysis) |
90
+ | 14 | Steerability | Mutation classification, impact assessment, selective regeneration |
91
+ | 15 | Progress reporting | 5 patterns (phase map, finding updates, previews, time estimates, heartbeat) |
92
+ | 16 | Findings → antipatterns | 4-stage promotion pipeline, 3+ occurrence threshold, human gate |
93
+ | 17 | Cumulative tracking | SQLite schema (5 tables), dedup algorithm, recurrence detection |
94
+ | 18 | Drift prevention | 7 mechanisms with concrete limits (30 active, 90-day TTL, 5% demotion) |
95
+ | 19 | Decision audit trail | v2 schema with alternatives, confidence, outcome correlation |
96
+ | 20 | User feedback capture | Signal classification, correction weighting, ndjson format |
97
+ | 21 | Two-tier review | Internal→external, critical asymmetry (known vs unknown unknowns) |
98
+ | 22 | Reviewer composition | Mode-specific modules, 3-5K digests, 50K budget analysis |
99
+ | 23 | Findings classification | 8 categories × 4 severities, auto-classification rules |
100
+ | 24 | Feedback-to-learning | 7-step loop, LLM clustering, minimum viable phases A-D |
101
+ | 25 | Proof-of-implementation | Per-type matrix (web/API/CLI/library), Playwright MCP, Symphony model |
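For agent 7 (artifact dependencies), the `requires`/digest idea reduces to comparing recorded predecessor digests with current ones. A minimal sketch assuming SHA-256 digests and an in-memory map of predecessor contents; the function names and the `requires` shape are hypothetical.

```python
import hashlib

def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def stale_requirements(requires: dict, current: dict) -> list:
    """requires: {name: digest recorded when the artifact was written}
    current:  {name: bytes of the predecessor as it exists now}
    Return predecessors that are missing or have changed since then."""
    stale = []
    for name, recorded in requires.items():
        content = current.get(name)
        if content is None or digest(content) != recorded:
            stale.append(name)
    return stale
```

An artifact with a non-empty result would be treated as stale and regenerated before downstream phases consume it.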
@@ -0,0 +1,38 @@
1
+ # Deep Research Status — 2026-03-20
2
+
3
+ ## 25 Research Agents — Progress
4
+
5
+ ### Completed (14/25)
6
+ 1. Hook: Stop hook patterns (ralph-loop analysis) ✅
7
+ 2. Hook: PreToolUse catalog (7 decision patterns) ✅
8
+ 3. Hook: State machine design (pipeline-state.json) ✅
9
+ 4. Subagent: Artifact dependencies (per-phase schemas) ✅
10
+ 5. Subagent: Guardrail validation (per-phase functions) ✅
11
+ 6. Subagent: Failure + retry (3-tier ladder) ✅
12
+ 7. Interactive: Showing reasoning (progressive disclosure) ✅
13
+ 8. Learning: Findings → antipatterns (4-stage pipeline) ✅
14
+ 9. Learning: Cumulative tracking (SQLite schema) ✅
15
+ 10. Learning: Drift prevention (7 mechanisms) ✅
16
+ 11. Learning: User feedback capture ✅
17
+ 12. Review: Feedback-to-learning pipeline (7-step loop) ✅
18
+ 13. Review: Proof-of-implementation (per-type matrix) ✅
19
+ 14. Hook: Persuasion engineering (superpowers analysis) — in first batch ✅
20
+
21
+ ### Pending (11/25)
22
+ 15. Hook: Limitations + workarounds
23
+ 16. Subagent: Controller pattern
24
+ 17. Subagent: Context isolation
25
+ 18. Interactive: AskUserQuestion API
26
+ 19. Interactive: Depth parameters
27
+ 20. Interactive: Steerability
28
+ 21. Interactive: Progress reporting
29
+ 22. Review: Two-tier architecture
30
+ 23. Review: Reviewer composition
31
+ 24. Review: Findings classification
32
+ 25. Learning: Decision audit trail
33
+
34
+ ## Key Findings So Far
35
+
36
+ All research output files at: /private/tmp/claude-501/-Users-mohamedabdallah-Work-Wazir/96398c18-9868-43bc-a4d6-d7f388880d4a/tasks/
37
+
38
+ Full synthesis will be compiled when all 25 agents complete.
@@ -0,0 +1,107 @@
1
+ # Enforcement Research — 2026-03-20
2
+
3
+ ## The Answer
4
+
5
+ **Prose instructions alone don't work: the agent eventually rationalizes skipping them.** Every framework that achieves reliable enforcement uses the same pattern: **the framework holds the loop, not the agent.**
6
+
7
+ ## The Three-Layer Strategy
8
+
9
+ ### Layer 1: Mechanical Hooks (agent CANNOT bypass)
10
+
11
+ **Stop hook** blocks completion: `{"decision": "block", "reason": "..."}`, proven by the ralph-loop plugin (official marketplace). The agent cannot end its turn until all artifacts exist, apart from the context-limit and user-abort stops that the hook must still allow.
12
+
13
+ **PreToolUse hooks** block actions:
14
+ - `PreToolUse:Write|Edit` — blocks implementation code if no plan artifact exists
15
+ - `PreToolUse:Bash` — blocks `git commit` if no tests run, blocks `git push` if no review
16
+ - Returns `permissionDecision: "deny"` — the tool call is prevented entirely
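A sketch of the two PreToolUse gates above as one pure function. It assumes the hook input carries `tool_name` and `tool_input` (with `command` for Bash and `file_path` for Write/Edit) and that a deny is expressed through `hookSpecificOutput.permissionDecision`. The state flags and `ARTIFACT_PREFIX` are hypothetical; the prefix exempts pipeline artifacts so the plan itself can still be written.

```python
import re
from typing import Optional

ARTIFACT_PREFIX = ".wazir/runs/"  # hypothetical location of pipeline artifacts

def deny(reason: str) -> dict:
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    }

def pretooluse_decision(hook_input: dict, state: dict) -> Optional[dict]:
    tool = hook_input.get("tool_name", "")
    tool_input = hook_input.get("tool_input", {})
    if tool in ("Write", "Edit"):
        path = tool_input.get("file_path", "")
        if ARTIFACT_PREFIX not in path and not state.get("plan_approved"):
            return deny("No approved plan artifact yet; finish planning first.")
    if tool == "Bash":
        cmd = tool_input.get("command", "")
        # Simple pattern match; forms like `git -C dir commit` would slip past it.
        if re.search(r"\bgit\s+commit\b", cmd) and not state.get("tests_passed"):
            return deny("Run the test suite before committing.")
        if re.search(r"\bgit\s+push\b", cmd) and not state.get("review_passed"):
            return deny("Complete review before pushing.")
    return None  # silent allow
```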
17
+
18
+ **State tracking** via `pipeline-state.json`: hooks READ state and only the CLI WRITES it, using an atomic temp-file rename, so there is a single writer and hooks never read a half-written file.
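The temp-file-plus-rename write can be sketched with the standard library. `os.replace` is atomic on POSIX when source and target are on the same filesystem, which is why the temp file is created in the target's directory. Function names are illustrative. Atomic rename only prevents torn reads; two concurrent writers would still be last-writer-wins, which the single-writer rule avoids.

```python
import json
import os
import tempfile

def write_state_atomic(path: str, state: dict) -> None:
    """Write JSON to a temp file beside `path`, then rename it over `path`.
    A reader sees either the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # do not leave stray temp files behind on failure
        raise

def read_state(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```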
19
+
20
+ **Key: command hooks only, never prompt hooks.** Prompt hooks re-introduce the rationalization problem.
21
+
22
+ ### Layer 2: Subagent Isolation (agent CANNOT see full pipeline)
23
+
24
+ Every framework studied (CrewAI, LangGraph, Symphony, ideation_team_skill) converges on one rule: **give the agent a task, not a plan.**
25
+
26
+ - Each phase is a separate subagent invocation
27
+ - Phase N+1 receives phase N's artifact as input — if it doesn't exist, the call fails
28
+ - The controller (wazir skill) holds the loop and decides what runs next
29
+ - No single agent can rationalize skipping from research to code
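A minimal sketch of a controller that holds the loop and hands off through files. The phase list, artifact names and `dispatch` callback are hypothetical stand-ins: `dispatch(phase, input_text)` represents spawning one subagent that sees only its own phase input and returns the artifact it produced.

```python
from pathlib import Path
from typing import Callable

# Hypothetical phases: (name, required input artifact, produced artifact).
PHASES = [
    ("clarify", None, "clarification.md"),
    ("spec", "clarification.md", "spec.md"),
    ("design", "spec.md", "design.md"),
]

class MissingArtifact(Exception):
    pass

def run_pipeline(run_dir: Path, dispatch: Callable[[str, str], str]) -> list:
    """The controller owns the loop; no subagent decides what runs next."""
    completed = []
    for phase, needs, produces in PHASES:
        input_text = ""
        if needs is not None:
            src = run_dir / needs
            if not src.exists():  # phase N+1 cannot start without phase N's artifact
                raise MissingArtifact(f"{phase} requires {needs}")
            input_text = src.read_text(encoding="utf-8")
        output = dispatch(phase, input_text)
        if not output.strip():
            raise MissingArtifact(f"{phase} produced an empty {produces}")
        (run_dir / produces).write_text(output, encoding="utf-8")
        completed.append(phase)
    return completed
```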
30
+
31
+ ### Layer 3: Persuasion Engineering (agent USUALLY WON'T bypass: ~72% compliance)
32
+
33
+ From superpowers (100K stars, backed by Meincke et al. 2025, N=28,000):
34
+
35
+ - **Rationalization tables** — enumerate exact thoughts the agent has when skipping, with rebuttals
36
+ - **"Violating the letter is violating the spirit"** — kills the #1 escape pattern
37
+ - **Red flags lists** — specific phrases that mean STOP
38
+ - **Authority + Commitment + Social Proof** — doubles compliance (33% → 72%)
39
+ - **CSO (Claude Search Optimization)** — skill descriptions must be triggers, never process summaries
40
+
41
+ ## Key Findings Per Source
42
+
43
+ ### Claude Code Hooks
44
+ - Stop hook CAN block (`{"decision": "block"}`) — proven by ralph-loop
45
+ - PreToolUse CAN deny AND modify tool calls — proven by context-mode plugin
46
+ - Hooks are stateless but can read/write files for state
47
+ - Hooks are loaded at session start and can't be added mid-session
48
+ - **Limitation: hooks block actions but can't compel them**
49
+
50
+ ### Superpowers (100K stars)
51
+ - 100% prompt engineering, zero mechanical enforcement
52
+ - Single SessionStart hook injects meta-skill in `<EXTREMELY_IMPORTANT>` tags
53
+ - **Issue #463: agents STILL skip reviews**; the author acknowledges it is unsolved
54
+ - Commenter: "The only reliable fix is making reviews structural, not instructional"
55
+ - TDD skill is best-in-class prompt engineering but still fails sometimes
56
+ - Persuasion research: authority language doubles compliance but doesn't reach 100%
57
+
58
+ ### Framework Enforcement Patterns
59
+ - **CrewAI:** Python for-loop + guardrail functions. Agent produces output, framework validates.
60
+ - **LangGraph:** Channel triggers + NamedBarrierValue. Node can't fire until inputs ready.
61
+ - **Temporal:** `await` keyword is the enforcement. Language-level blocking.
62
+ - **Symphony:** State machine + data dependencies. Each phase produces data the next requires.
63
+ - **GitHub Actions:** `needs:` DAG. Scheduler prevents jobs from starting without dependencies.
64
+ - **Universal pattern:** framework holds program counter, not agent.
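The `needs:` DAG pattern can be illustrated with Python's standard `graphlib.TopologicalSorter` (Python 3.9+): a job is released only once everything it needs is marked done, so the scheduler holds the program counter. The job names and the `schedule` helper are illustrative, not taken from any of the frameworks above.

```python
from graphlib import TopologicalSorter

# GitHub Actions-style `needs:`; job names are illustrative.
NEEDS = {
    "spec": {"clarify"},
    "design": {"spec"},
    "tests": {"design"},
    "implement": {"design"},
    "review": {"tests", "implement"},
}

def schedule(needs: dict) -> list:
    """Return batches of jobs; a job appears only after all its needs are done."""
    ts = TopologicalSorter(needs)
    ts.prepare()  # raises graphlib.CycleError on circular needs
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)  # in practice, called only after each job reports success
    return batches
```

Here `schedule(NEEDS)` yields `[["clarify"], ["spec"], ["design"], ["implement", "tests"], ["review"]]`; jobs in the same batch could run in parallel.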
65
+
66
+ ### UX / User Engagement
67
+ - **bladnman/ideation_team_skill:** AskUserQuestion for pre-flight interview, depth-aware parameters, cognitive role separation across agents
68
+ - **Devin:** PR-as-proof, screen recordings, conversational Slack updates, async delegation
69
+ - **Copilot Workspace:** Spec → Plan → Code, each editable. Steerability = trust.
70
+ - **Anthropic:** Show planning steps explicitly, programmatic checks at intermediate steps
71
+
72
+ ## What Wazir Must Build
73
+
74
+ ### 1. Pipeline State Machine (hooks + state file)
75
+
76
+ ```
77
+ SessionStart → initialize pipeline-state.json
78
+ PreToolUse:Write|Edit → deny if phase gate not passed
79
+ PreToolUse:Bash → deny git commit/push without tests/review
80
+ Stop → deny if any enabled workflow incomplete or proof missing
81
+ ```
82
+
83
+ ### 2. Subagent-Per-Phase Architecture
84
+
85
+ The `/wazir` skill becomes a CONTROLLER that:
86
+ - Spawns a clarifier subagent → receives clarification artifact
87
+ - Spawns a spec subagent → receives spec artifact
88
+ - Spawns a design subagent → receives design artifact
89
+ - Spawns an executor subagent → receives implementation
90
+ - Spawns a reviewer subagent → receives review verdict
91
+ - Each subagent sees ONLY its phase, not the full pipeline
92
+
93
+ ### 3. Superpowers-Style Persuasion on Every Skill
94
+
95
+ For each discipline rule:
96
+ - Iron Law statement
97
+ - Rationalization table (empirically derived)
98
+ - Red flags list
99
+ - "Violating the letter is violating the spirit"
100
+ - `<EXTREMELY_IMPORTANT>` wrapper on session injection
101
+
102
+ ### 4. User Engagement Templates
103
+
104
+ - Pre-flight interview via AskUserQuestion (batched, not serial)
105
+ - Three-tier progress reporting (status line / key decisions / full record)
106
+ - Artifacts as proof (self-describing, contain lineage and reasoning)
107
+ - Steerability at phase boundaries (edit upstream, regenerate downstream)
@@ -27,19 +27,38 @@ always:
27
27
  - antipatterns/code/state-management-antipatterns.md
28
28
  - quality/evidence-based-verification.md
29
29
  reviewer:
30
- - antipatterns/process/ai-coding-antipatterns.md
31
- - antipatterns/code/code-smells.md
32
- - antipatterns/process/code-review-antipatterns.md
33
- - antipatterns/code/dependency-antipatterns.md
34
- - architecture/foundations/architectural-thinking.md
35
- - architecture/foundations/coupling-and-cohesion.md
36
- - antipatterns/code/architecture-antipatterns.md
37
- - architecture/foundations/domain-driven-design.md
30
+ # Mode-agnostic core — loaded for ALL review modes (~6K tokens)
31
+ - digests/reviewer/review-methodology-digest.md
32
+ - digests/reviewer/ai-coding-digest.md
38
33
  content-author:
39
34
  - i18n/content/translation-management.md
40
35
  - i18n/foundations/string-externalization.md
41
36
  - i18n/foundations/pluralization-and-gender.md
42
37
 
38
+ # Mode-specific reviewer composition
39
+ # Loaded ON TOP of always.reviewer based on the --mode flag
40
+ # Total budget per mode: ~15-25K tokens (digests + auto + stack modules)
41
+ reviewer_modes:
42
+ task-review:
43
+ - digests/reviewer/code-smells-digest.md
44
+ - digests/reviewer/error-handling-digest.md
45
+ spec-challenge:
46
+ - digests/reviewer/architectural-thinking-digest.md
47
+ - digests/reviewer/ddd-digest.md
48
+ design-review:
49
+ - digests/reviewer/architectural-thinking-digest.md
50
+ - digests/reviewer/coupling-cohesion-digest.md
51
+ plan-review:
52
+ - digests/reviewer/architectural-thinking-digest.md
53
+ - digests/reviewer/coupling-cohesion-digest.md
54
+ - digests/reviewer/ai-coding-digest.md
55
+ final:
56
+ - digests/reviewer/code-smells-digest.md
57
+ - digests/reviewer/architecture-antipatterns-digest.md
58
+ - digests/reviewer/dependency-risk-digest.md
59
+ research-review: []
60
+ clarification-review: []
61
+
43
62
  auto:
44
63
  all-stacks:
45
64
  all-roles:
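A sketch of how the `always.reviewer` and `reviewer_modes` entries could be resolved for one `--mode`, deduplicated and checked against a budget. The token counts are hypothetical estimates, the 25K default is the upper end of the "~15-25K" comment, and auto and stack modules are left out. Note that `plan-review` repeats `ai-coding-digest.md`, which the core already loads, so deduplication matters.

```python
# Mirrors part of the YAML above; token counts passed in are estimates.
ALWAYS_REVIEWER = [
    "digests/reviewer/review-methodology-digest.md",
    "digests/reviewer/ai-coding-digest.md",
]
REVIEWER_MODES = {
    "task-review": ["digests/reviewer/code-smells-digest.md",
                    "digests/reviewer/error-handling-digest.md"],
    "plan-review": ["digests/reviewer/architectural-thinking-digest.md",
                    "digests/reviewer/coupling-cohesion-digest.md",
                    "digests/reviewer/ai-coding-digest.md"],
    "research-review": [],
}

def compose(mode: str, token_estimates: dict, budget: int = 25_000) -> tuple:
    if mode not in REVIEWER_MODES:
        raise KeyError(f"unknown review mode: {mode}")
    modules = []
    for m in ALWAYS_REVIEWER + REVIEWER_MODES[mode]:
        if m not in modules:  # load each module once, keeping first position
            modules.append(m)
    total = sum(token_estimates.get(m, 0) for m in modules)
    if total > budget:
        # Fail loudly instead of silently dropping modules past the ceiling.
        raise ValueError(f"{mode}: {total} tokens exceeds budget {budget}")
    return modules, total
```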
@@ -0,0 +1,83 @@
1
+ # AI Coding Antipatterns — Reviewer Digest
2
+
3
+ > Detection-focused extract for reviewer context. For full analysis, see `antipatterns/process/ai-coding-antipatterns.md`.
4
+
5
+ ## Specification Drift (AP-01)
6
+ - **Signal:** Implementation differs from stated requirements without documented reason
7
+ - **Check:** Compare task spec acceptance criteria against actual code behavior
8
+ - **Severity:** high
9
+
10
+ ## Hallucinated APIs (AP-02)
11
+ - **Signal:** Import or call to function/class/module that doesn't exist in the dependency tree
12
+ - **Check:** Verify every imported symbol resolves to an actual export
13
+ - **Severity:** critical
14
+
15
+ ## Outdated Patterns (AP-03)
16
+ - **Signal:** Using deprecated APIs, class components in modern (2025) React code, or callback-based async where promises are standard
17
+ - **Check:** Compare patterns against current library version best practices
18
+ - **Severity:** high
19
+
20
+ ## Premature Abstraction (AP-04)
21
+ - **Signal:** Generic utility/helper that is used exactly once
22
+ - **Check:** Count call sites for each abstraction introduced
23
+ - **Severity:** medium
24
+
25
+ ## Context Window Stuffing (AP-05)
26
+ - **Signal:** Agent reads 10+ files without index queries; loads entire modules instead of targeted slices
27
+ - **Check:** Review tool call patterns — excessive Read calls without preceding search
28
+ - **Severity:** low (efficiency, not correctness)
29
+
30
+ ## Fake Testing (AP-06)
31
+ - **Signal:** Tests that assert implementation details, use mocks that mirror the implementation, or test tautologies
32
+ - **Check:** Would the test fail if the implementation had a real bug? If not, it's fake.
33
+ - **Severity:** high
34
+
35
+ ## Scope Creep (AP-07)
36
+ - **Signal:** Files modified or features added that were not in the task spec
37
+ - **Check:** Diff includes changes outside the task's specified file scope
38
+ - **Severity:** medium
39
+
40
+ ## Optimistic Error Handling (AP-08)
41
+ - **Signal:** Missing try/catch around I/O operations, network calls, file operations, JSON parsing
42
+ - **Check:** Every async operation and external call has error handling
43
+ - **Severity:** high
44
+
45
+ ## Stale Dependency (AP-09)
46
+ - **Signal:** Importing deprecated APIs, using outdated package versions with known CVEs
47
+ - **Check:** Package versions against known vulnerability databases
48
+ - **Severity:** medium-high
49
+
50
+ ## Cargo-Cult Patterns (AP-10)
51
+ - **Signal:** Design patterns applied without the problem they solve (Factory for single type, Observer for single listener)
52
+ - **Check:** Does the pattern's complexity serve a real need?
53
+ - **Severity:** medium
54
+
55
+ ## Gold Plating (AP-11)
56
+ - **Signal:** Extra configuration, extensibility points, or features not in the spec
57
+ - **Check:** Is every public API/config option traceable to a requirement?
58
+ - **Severity:** medium
59
+
60
+ ## Sycophantic Compliance (AP-12)
61
+ - **Signal:** Agent implements exactly what was asked even when the request contains contradictions or obvious errors
62
+ - **Check:** Look for requirements that conflict with each other or with the codebase's existing contracts
63
+ - **Severity:** high
64
+
65
+ ## Phantom Error Handling (AP-13)
66
+ - **Signal:** Error handling code that looks comprehensive but handles errors incorrectly (swallows, retries without backoff, logs without propagating)
67
+ - **Check:** Trace each error path — does it actually reach a handler that does the right thing?
68
+ - **Severity:** high
69
+
70
+ ## Inconsistent State After Failure (AP-14)
71
+ - **Signal:** Multi-step operations where a failure in step N leaves steps 1..N-1 committed
72
+ - **Check:** Are multi-step mutations wrapped in transactions or compensating actions?
73
+ - **Severity:** high
74
+
75
+ ## Over-Confident Comments (AP-15)
76
+ - **Signal:** Comments claiming "this handles all edge cases" or "this is thread-safe" without evidence
77
+ - **Check:** Does the code actually handle what the comment claims?
78
+ - **Severity:** medium
79
+
80
+ ## Training Data Leakage (AP-16)
81
+ - **Signal:** Code that closely mirrors common training examples but doesn't fit the actual use case
82
+ - **Check:** Does the implementation structure match the problem, or does it match a textbook example?
83
+ - **Severity:** medium
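As one example of automating a check from this digest, the AP-02 rule ("verify every imported symbol resolves to an actual export") can be sketched for Python sources with `ast` and `importlib`. This is an illustration only: the function name is hypothetical, it checks absolute imports only, and importing a module runs its top-level code, so it belongs in a sandbox. A JavaScript or TypeScript codebase would need a resolver for its own module system instead.

```python
import ast
import importlib

def unresolved_imports(source: str) -> list:
    """Return (module, name) pairs that do not resolve; name is "*" when
    the module itself cannot be imported."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            try:
                mod = importlib.import_module(node.module)
            except ImportError:
                missing.append((node.module, "*"))
                continue
            for alias in node.names:
                if alias.name == "*" or hasattr(mod, alias.name):
                    continue
                try:  # `from pkg import submodule` is valid even if not yet loaded
                    importlib.import_module(f"{node.module}.{alias.name}")
                except ImportError:
                    missing.append((node.module, alias.name))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                try:
                    importlib.import_module(alias.name)
                except ImportError:
                    missing.append((alias.name, "*"))
    return missing
```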
@@ -0,0 +1,63 @@
1
+ # Architectural Thinking — Reviewer Digest
2
+
3
+ > Evaluation-focused extract for reviewer context. For full guidance, see `architecture/foundations/architectural-thinking.md`.
4
+
5
+ ## Architecture Review Checklist
6
+
7
+ ### Separation of Concerns
8
+ - Does each module/file have a single clear responsibility?
9
+ - Are business logic, data access, and presentation in separate layers?
10
+ - Can you describe what a module does in one sentence without "and"?
11
+
12
+ ### Dependency Direction
13
+ - Do dependencies point inward (toward core domain), not outward?
14
+ - Are infrastructure details (DB, HTTP, filesystem) behind abstractions?
15
+ - Could you swap the database without changing business logic?
16
+
17
+ ### Interface Design
18
+ - Are public APIs minimal (expose only what is needed)?
19
+ - Are contracts (types, schemas, interfaces) explicit and documented?
20
+ - Do functions have clear input/output contracts without hidden side effects?
21
+
22
+ ### Change Impact
23
+ - Can you add a feature without modifying existing code (Open-Closed)?
24
+ - Are changes localized (changing one feature doesn't cascade across modules)?
25
+ - Is the dependency graph shallow (max 3-4 levels deep)?
26
+
27
+ ### Reversibility Assessment
28
+ - Which decisions in this diff are hard to reverse?
29
+ - Are irreversible decisions (data models, service boundaries, consistency models) justified with documented reasoning?
30
+ - Are reversible decisions (naming, folder structure, library choices) made quickly without over-analysis?
31
+
32
+ ### Trade-off Reasoning
33
+ Every architectural decision involves trade-offs. During review, check:
34
+ - Is the trade-off acknowledged? ("We chose X because Y, accepting Z")
35
+ - Is the trade-off appropriate for the context? (startup vs. enterprise, prototype vs. production)
36
+ - Are rejected alternatives documented?
37
+
38
+ ## Architecture Smells (Quick Detection)
39
+
40
+ | Smell | Signal | Severity |
41
+ |-------|--------|----------|
42
+ | **Big Ball of Mud** | No discernible module boundaries; any module calls any other | critical |
43
+ | **Layering Violation** | UI code calling database directly; domain importing from infrastructure | high |
44
+ | **Circular Module Dependency** | Module A depends on Module B depends on Module A | high |
45
+ | **God Module** | One module >1000 LOC handling multiple concerns | medium |
46
+ | **Leaky Abstraction** | Internal implementation details exposed in public interface | medium |
47
+ | **Distributed Monolith** | Multiple services that must be deployed together | high |
48
+ | **Accidental Complexity** | Architecture complexity not justified by problem complexity | medium |
49
+ | **Architecture Astronaut** | Abstractions solving problems no one has yet | medium |
50
+ | **Dead End Architecture** | Design choices that prevent future evolution (no extension points, hardcoded assumptions) | high |
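**Circular Module Dependency** is one of the few smells in this table that can be detected mechanically. A depth-first search that remembers the current path finds a cycle as soon as it revisits a module on that path. The sketch assumes the same kind of adjacency map (module name to imported module names). `findCycle` is a hypothetical helper that returns the first cycle found, or `null` for an acyclic graph.

```typescript
// Adjacency map: module -> modules it imports (assumed input shape).
type ImportGraph = Record<string, string[]>;

// Depth-first search; `onPath` holds the modules on the current DFS stack.
// Reaching a module that is already on the stack closes a cycle.
function findCycle(graph: ImportGraph): string[] | null {
  const done = new Set<string>(); // fully explored, known cycle-free
  const onPath: string[] = [];

  const visit = (mod: string): string[] | null => {
    const idx = onPath.indexOf(mod);
    if (idx !== -1) return [...onPath.slice(idx), mod];
    if (done.has(mod)) return null;
    onPath.push(mod);
    for (const dep of graph[mod] ?? []) {
      const cycle = visit(dep);
      if (cycle !== null) return cycle;
    }
    onPath.pop();
    done.add(mod);
    return null;
  };

  for (const mod of Object.keys(graph)) {
    const cycle = visit(mod);
    if (cycle !== null) return cycle;
  }
  return null;
}
```

Run in CI, a check like this reports the full cycle (for example `a -> b -> c -> a`) rather than just a yes/no, which makes the review comment actionable.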
+
+ ## Quality Attribute Checklist
+
+ When reviewing architectural decisions, verify that the relevant quality attributes are addressed:
+
+ | Attribute | Review Question |
+ |-----------|----------------|
+ | **Performance** | Are there obvious bottlenecks? N+1 queries? Unbounded loops? |
+ | **Scalability** | Can this handle 10x load without structural changes? |
+ | **Security** | Are trust boundaries enforced? Is input validated at boundaries? |
+ | **Availability** | What happens when a dependency fails? Is there a fallback? |
+ | **Modifiability** | How many files change to add a typical feature? |
+ | **Testability** | Can components be tested in isolation without complex setup? |
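The N+1 question in the Performance row is easiest to see by counting round trips. The sketch below uses a hypothetical `FakeDb` that stores rows in memory and only counts calls. The point is the access pattern (one lookup per post versus one batched lookup), not any real database client API.

```typescript
interface Author {
  id: number;
  name: string;
}

interface Post {
  id: number;
  authorId: number;
  title: string;
}

// Hypothetical stand-in for a database client; it only counts round trips.
class FakeDb {
  queryCount = 0;
  private readonly authors: Author[];

  constructor(authors: Author[]) {
    this.authors = authors;
  }

  // Simulates one query per call, e.g. SELECT ... WHERE id = ?
  authorById(id: number): Author | undefined {
    this.queryCount++;
    return this.authors.find((a) => a.id === id);
  }

  // Simulates one batched query, e.g. SELECT ... WHERE id IN (...)
  authorsByIds(ids: number[]): Author[] {
    this.queryCount++;
    const wanted = new Set(ids);
    return this.authors.filter((a) => wanted.has(a.id));
  }
}

// N+1 shape: one extra query per post (the "+1" is whatever loaded `posts`).
function authorNamesNPlusOne(db: FakeDb, posts: Post[]): string[] {
  return posts.map((p) => db.authorById(p.authorId)?.name ?? "unknown");
}

// Batched shape: collect the ids, fetch once, join in memory.
function authorNamesBatched(db: FakeDb, posts: Post[]): string[] {
  const ids = Array.from(new Set(posts.map((p) => p.authorId)));
  const byId = new Map(db.authorsByIds(ids).map((a) => [a.id, a.name] as const));
  return posts.map((p) => byId.get(p.authorId) ?? "unknown");
}
```

With three posts, the first version issues three author queries and the batched version issues one. The gap grows linearly with the number of posts.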
@@ -0,0 +1,49 @@
+ # Architecture Antipatterns — Reviewer Digest
+
+ > Detection-focused extract for reviewer context. For full analysis, see `antipatterns/code/architecture-antipatterns.md`.
+
+ ## Structural Antipatterns
+
+ | Antipattern | Detection Signal | Severity |
+ |-------------|-----------------|----------|
+ | **Big Ball of Mud** | No discernible module boundaries; any module calls any other; package diagram is fully connected | critical |
+ | **God Object / God Service** | Class/module with >10 public methods touching >3 concerns; single service handling unrelated domains | high |
+ | **Golden Hammer** | Same pattern/library used for every problem regardless of fit (everything is a microservice, everything uses Redux) | medium |
+ | **Architecture Astronaut** | Layers of abstraction solving problems no one has; meta-frameworks, plugin systems with zero plugins | medium |
+ | **Dead Code / Lava Flow** | Unreachable code paths, unused exports, commented-out blocks; code preserved "because it might be needed" | medium |
+ | **Copy-Paste Architecture** | Duplicated modules with minor variations instead of a shared abstraction | high |
+ | **Boat Anchor** | Unused infrastructure "for future use" (empty interfaces, unused config, skeleton services) | medium |
+ | **Accidental Complexity** | System complexity far exceeds problem complexity; over-engineered for the actual requirements | medium |
+ | **Stovepipe System** | Modules built in isolation with no integration architecture; each uses its own patterns and data formats | high |
+ | **Swiss Army Knife** | One component tries to serve every use case; endlessly configurable but hard to use for any single purpose | medium |
+
+ ## Integration Antipatterns
+
+ | Antipattern | Detection Signal | Severity |
+ |-------------|-----------------|----------|
+ | **Distributed Monolith** | Multiple services that must be deployed together; shared database; lock-step releases | critical |
+ | **Chatty Interface** | >5 sequential API calls to complete one logical operation | medium |
+ | **Shared Database** | Multiple services reading/writing the same database tables directly | critical |
+ | **Circular Dependency** | Service A calls B, B calls C, and C calls A (or the module-level equivalent) | high |
+ | **Hardcoded Endpoints** | URLs, hostnames, or ports as string literals in source code | medium |
+ | **Missing Circuit Breaker** | External service calls without timeout or failure handling | high |
+ | **Architecture Sinkhole** | Requests pass through multiple layers that add no value (pure pass-through) | medium |
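Below is a minimal sketch of the circuit-breaker pattern behind the **Missing Circuit Breaker** row. It assumes synchronous calls and an injectable clock so it can be exercised without real timers. It deliberately omits per-call timeouts, async support, and metrics, which a production breaker (or an existing library) would provide. `CircuitBreaker` and its constructor parameters are hypothetical.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Minimal circuit breaker: opens after `failureThreshold` consecutive
// failures, fails fast while open, and lets one trial call through
// (half-open) once `cooldownMs` has elapsed.
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  call<T>(fn: () => T): T {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: failing fast");
      }
      this.state = "half-open"; // cooldown elapsed: allow one trial call
    }
    try {
      const result = fn();
      this.state = "closed"; // success resets the breaker
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

Once `failureThreshold` consecutive calls fail, callers get an immediate error instead of waiting on a dead dependency. After `cooldownMs`, a single trial call decides whether the circuit closes again.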
+
+ ## Layering Antipatterns
+
+ | Antipattern | Detection Signal | Severity |
+ |-------------|-----------------|----------|
+ | **Upward Dependency** | Core/domain module imports from UI/API layer | critical |
+ | **Layer Bypass** | UI code calling the database/repository directly, skipping the service layer | high |
+ | **Anemic Domain Model** | Domain objects are pure data holders; all logic lives in services | medium |
+ | **Fat Controller** | Controller/handler contains business logic instead of delegating it | high |
+ | **Inner Platform Effect** | Building a general-purpose engine inside the application that reimplements what the platform already provides | high |
+
42
+ ## Root Cause Patterns
43
+
44
+ Most architecture antipatterns share a few root causes:
45
+ - **Shipping pressure:** Shortcuts that accumulate into structural debt
46
+ - **Missing boundaries:** No enforced module boundaries in build tooling
47
+ - **Conway's Law misalignment:** Architecture doesn't match team structure
48
+ - **Premature optimization:** Distributed complexity without proven need
49
+ - **BDUF backlash:** Avoiding all upfront design, resulting in no design
@@ -0,0 +1,53 @@
+ # Code Smells — Reviewer Digest
+
+ > Detection-focused extract for reviewer context. For full remediation guidance, see `antipatterns/code/code-smells.md`.
+
+ ## Method-Level Smells
+
+ | Smell | Detection Signal | Severity |
+ |-------|-----------------|----------|
+ | **Long Method** | >30 lines or >3 levels of nesting | medium |
+ | **Long Parameter List** | >4 parameters | medium |
+ | **Feature Envy** | Method accesses another object's data more than its own | high |
+ | **Message Chains** | `a.b().c().d()` — 3+ chained calls | medium |
+ | **Inappropriate Intimacy** | Class reaches into another's private/internal state | high |
+ | **Refused Bequest** | Subclass overrides inherited methods to do nothing | medium |
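Here is a small before/after for **Feature Envy**, using hypothetical `LineItemData`, `LineItem`, and `Cart` types. The first function reads three fields of another object to compute something that object could compute itself. The refactor moves the calculation next to the data, in the spirit of the *Move Method* refactoring.

```typescript
// Before: this function "envies" the line item, reading its fields to do its job.
interface LineItemData {
  unitPriceCents: number;
  quantity: number;
  discountPercent: number;
}

function lineTotalEnvious(item: LineItemData): number {
  const gross = item.unitPriceCents * item.quantity;
  return Math.round(gross * (1 - item.discountPercent / 100));
}

// After: the calculation lives with the data it uses.
class LineItem {
  readonly unitPriceCents: number;
  readonly quantity: number;
  readonly discountPercent: number;

  constructor(unitPriceCents: number, quantity: number, discountPercent: number) {
    this.unitPriceCents = unitPriceCents;
    this.quantity = quantity;
    this.discountPercent = discountPercent;
  }

  total(): number {
    const gross = this.unitPriceCents * this.quantity;
    return Math.round(gross * (1 - this.discountPercent / 100));
  }
}

class Cart {
  private readonly items: LineItem[];

  constructor(items: LineItem[]) {
    this.items = items;
  }

  // Cart asks each item for its total instead of computing it from item fields.
  total(): number {
    return this.items.reduce((sum, item) => sum + item.total(), 0);
  }
}
```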
+
+ ## Class-Level Smells
+
+ | Smell | Detection Signal | Severity |
+ |-------|-----------------|----------|
+ | **Large Class** | >300 lines or >10 public methods | medium |
+ | **God Class** | Handles >3 unrelated responsibilities | high |
+ | **Data Class** | Only getters/setters, no behavior | low |
+ | **Lazy Class** | <3 methods; does too little to justify a separate class | low |
+ | **Speculative Generality** | Abstract classes/interfaces with a single implementation | medium |
+ | **Middle Man** | Class delegates >80% of its methods to another class | medium |
+
+ ## Structural Smells
+
+ | Smell | Detection Signal | Severity |
+ |-------|-----------------|----------|
+ | **Shotgun Surgery** | One change requires edits to 5+ files | high |
+ | **Divergent Change** | One file changes for multiple unrelated reasons | high |
+ | **Parallel Inheritance** | Adding a subclass in one hierarchy requires adding one in another | medium |
+ | **Data Clumps** | Same 3+ fields appear together in multiple places | medium |
+ | **Primitive Obsession** | Using primitives where a domain type would be clearer | low |
+ | **Switch Statements** | Repeated switch/if-else on the same discriminant | medium |
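For **Switch Statements**, one TypeScript-specific remedy is to gather the per-case behavior into a single table keyed by the discriminant. Typing the table as `Record<ShapeKind, ShapeBehavior>` means a new variant fails to compile until its behavior is supplied, so the case analysis lives in one place instead of in several repeated switches. The shape types are hypothetical. A class hierarchy with polymorphic methods is the more traditional alternative.

```typescript
type ShapeKind = "circle" | "square";

interface Shape {
  kind: ShapeKind;
  size: number; // radius for circles, side length for squares
}

// All per-kind behavior in one place instead of one switch per operation.
interface ShapeBehavior {
  area(size: number): number;
  label: string;
}

const shapeBehaviors: Record<ShapeKind, ShapeBehavior> = {
  circle: { area: (r) => Math.PI * r * r, label: "Circle" },
  square: { area: (s) => s * s, label: "Square" },
};

function shapeArea(shape: Shape): number {
  return shapeBehaviors[shape.kind].area(shape.size);
}

function describeShape(shape: Shape): string {
  return `${shapeBehaviors[shape.kind].label} (${shapeArea(shape).toFixed(2)})`;
}
```

Adding `"triangle"` to `ShapeKind` produces a type error at `shapeBehaviors` until the new entry is written.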
+
+ ## Code Duplication
+
+ | Smell | Detection Signal | Severity |
+ |-------|-----------------|----------|
+ | **Exact Duplication** | Identical blocks >5 lines | high |
+ | **Structural Duplication** | Same algorithm with different types/names | medium |
+ | **Semantic Duplication** | Different code doing the same thing | medium |
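Here is a rough detector for the **Exact Duplication** row. It trims each line, drops blank lines, slides a fixed-size window over every file, and reports windows that occur more than once. This simplified sketch assumes files are available as in-memory strings. It does not tokenize or merge overlapping hits the way dedicated copy-paste detectors do. The default window of 6 lines mirrors the digest's ">5 lines" threshold, and `findExactDuplicates` is a hypothetical name.

```typescript
interface DuplicateHit {
  block: string; // the normalized duplicated lines
  locations: string[]; // "file:line" of each occurrence's first line (1-based)
}

// Reports every run of `windowSize` consecutive non-blank lines (after trimming
// whitespace) that occurs more than once. Overlapping hits are not merged.
function findExactDuplicates(files: Record<string, string>, windowSize = 6): DuplicateHit[] {
  const seen = new Map<string, string[]>();
  for (const file of Object.keys(files)) {
    const lines = (files[file] ?? "")
      .split("\n")
      .map((raw, i) => ({ code: raw.trim(), lineNo: i + 1 }))
      .filter((l) => l.code !== "");
    for (let start = 0; start + windowSize <= lines.length; start++) {
      const chunk = lines.slice(start, start + windowSize);
      const key = chunk.map((l) => l.code).join("\n");
      const where = `${file}:${chunk[0].lineNo}`;
      const existing = seen.get(key);
      if (existing) {
        existing.push(where);
      } else {
        seen.set(key, [where]);
      }
    }
  }
  const hits: DuplicateHit[] = [];
  seen.forEach((locations, block) => {
    if (locations.length > 1) hits.push({ block, locations });
  });
  return hits;
}
```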
+
+ ## Naming Smells
+
+ | Smell | Detection Signal | Severity |
+ |-------|-----------------|----------|
+ | **Misleading Name** | Name implies behavior different from what the code actually does | high |
+ | **Inconsistent Naming** | Same concept has different names across files | medium |
+ | **Generic Name** | `data`, `info`, `handler`, `manager`, `utils` without a qualifier | medium |
+ | **Encoded Type** | Hungarian notation or type in name (`strName`, `arrList`) | low |
@@ -0,0 +1,54 @@
+ # Coupling & Cohesion — Reviewer Digest
+
+ > Evaluation-focused extract for reviewer context. For full guidance, see `architecture/foundations/coupling-and-cohesion.md`.
+
+ ## Coupling Assessment
+
+ | Coupling Type | Detection Signal | Severity |
+ |---------------|-----------------|----------|
+ | **Content Coupling** | One module directly modifies another's internal state (private fields, internal data structures) | critical |
+ | **Common Coupling** | Multiple modules read/write shared global state (global config they all mutate, shared DB table without coordination) | high |
+ | **Control Coupling** | One module passes a parameter (e.g., a boolean flag) that selects another module's execution path | medium |
+ | **Stamp Coupling** | Passing a large object when only 1-2 fields are needed (an entire User object passed to a function that only needs the email) | low |
+ | **Data Coupling** | Passing only needed data — this is GOOD coupling | none (target) |
+ | **Message Coupling** | Modules communicate through events/messages without knowing each other's identity — this is BEST coupling | none (target) |
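Here is a before/after for **Control Coupling**. In the first function, the caller passes a boolean that selects the callee's branch, so every caller has to know how the callee is structured inside. Splitting on the flag into two intention-revealing functions leaves only data coupling. The `Row` type and report functions are hypothetical.

```typescript
interface Row {
  label: string;
  valueCents: number;
}

// Before: `asCsv` is a control flag; callers must know it flips the output format.
function renderReport(rows: Row[], asCsv: boolean): string {
  if (asCsv) {
    return rows.map((r) => `${r.label},${r.valueCents}`).join("\n");
  }
  return rows.map((r) => `${r.label}: $${(r.valueCents / 100).toFixed(2)}`).join("\n");
}

// After: one function per behavior; each receives only the data it needs.
function renderReportCsv(rows: Row[]): string {
  return rows.map((r) => `${r.label},${r.valueCents}`).join("\n");
}

function renderReportText(rows: Row[]): string {
  return rows.map((r) => `${r.label}: $${(r.valueCents / 100).toFixed(2)}`).join("\n");
}
```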
+
+ ## Cohesion Assessment
+
+ | Cohesion Type | Detection Signal | Quality |
+ |---------------|-----------------|---------|
+ | **Functional** | Module does one thing and does it completely | best |
+ | **Sequential** | Output of one operation feeds the input of the next (e.g., an ETL pipeline) | good |
+ | **Communicational** | Operations work on the same data (read, compute, and format the same record) | acceptable |
+ | **Temporal** | Operations are grouped because they run at the same time, but serve different purposes (init, cleanup) | poor |
+ | **Logical** | Operations are grouped by category rather than by a shared purpose (util files, catch-all handlers) | poor |
+ | **Coincidental** | No relationship between operations (random grab-bag module) | worst |
+
+ ## Quick Check
+
+ For each module in the diff, ask:
+ 1. **Cohesion:** Can I describe this module's purpose in one sentence without "and"? If not, cohesion is low.
+ 2. **Coupling:** If I change this module's internals, how many other files need to change? If more than 2, coupling is high.
+ 3. **Direction:** Do dependencies flow from unstable (UI, API handlers) toward stable (domain, utilities)? If reversed, that is structural debt.
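The **Direction** question can be approximated with Robert C. Martin's instability metric, `I = Ce / (Ca + Ce)`, where `Ce` counts a module's outgoing dependencies and `Ca` its incoming ones. `I = 0` is maximally stable and `I = 1` maximally unstable. An edge from a more stable module to a less stable one violates the Stable Dependencies Principle. The sketch assumes a hypothetical adjacency-map input (module name to imported module names) and treats modules with no edges as stable.

```typescript
// Adjacency map: module -> modules it depends on (assumed input shape).
type DepGraph = Record<string, string[]>;

// Instability I = Ce / (Ca + Ce): Ce = outgoing edges, Ca = incoming edges.
function instability(graph: DepGraph): Map<string, number> {
  const ca = new Map<string, number>(); // afferent (incoming)
  const ce = new Map<string, number>(); // efferent (outgoing)
  const modules = new Set<string>();
  for (const from of Object.keys(graph)) {
    const deps = graph[from] ?? [];
    modules.add(from);
    ce.set(from, deps.length);
    for (const to of deps) {
      modules.add(to);
      ca.set(to, (ca.get(to) ?? 0) + 1);
    }
  }
  const result = new Map<string, number>();
  modules.forEach((m) => {
    const out = ce.get(m) ?? 0;
    const inc = ca.get(m) ?? 0;
    // Isolated modules (no edges at all) are treated as stable (I = 0).
    result.set(m, out + inc === 0 ? 0 : out / (out + inc));
  });
  return result;
}

// Stable Dependencies Principle: flag edges that point toward a less stable module.
function unstableDependencies(graph: DepGraph): string[] {
  const inst = instability(graph);
  const flagged: string[] = [];
  for (const from of Object.keys(graph)) {
    for (const to of graph[from] ?? []) {
      if ((inst.get(to) ?? 0) > (inst.get(from) ?? 0)) {
        flagged.push(`${from} -> ${to}`);
      }
    }
  }
  return flagged;
}
```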
33
+
34
+ ## Connascence Quick Reference
35
+
36
+ Connascence refines coupling into a more granular classification. Ordered from weakest (acceptable) to strongest (most harmful):
37
+
38
+ | Connascence | Description | Acceptable? |
39
+ |-------------|-------------|-------------|
40
+ | **Name** | Two components must agree on a name (function name, variable name) | Yes — unavoidable and cheaply refactored |
41
+ | **Type** | Two components must agree on a type (parameter type, return type) | Yes — enforced by type systems |
42
+ | **Meaning** | Two components must agree on the meaning of a value (true = active, 0 = success) | Caution — use enums/constants instead of magic values |
43
+ | **Position** | Two components must agree on parameter order | Caution — use named parameters or option objects |
44
+ | **Algorithm** | Two components must use the same algorithm (hashing, encoding) | Risky — extract shared algorithm to single location |
45
+ | **Execution** | Two components must execute in a specific order | Risky — make ordering explicit in control flow |
46
+ | **Timing** | Two components must execute at the same time or within a window | High risk — source of race conditions |
47
+ | **Value** | Two components must have correlated values (e.g., two arrays that must be same length) | High risk — encapsulate into a single data structure |
48
+ | **Identity** | Two components must reference the same object instance | High risk — shared mutable state |
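This sketch shows two of the cheaper upgrades in the table, applied to a hypothetical retry-delay helper. First, a named constant object replaces a magic number, turning connascence of meaning into connascence of name. Second, an options object replaces positional arguments, turning connascence of position into connascence of name. `BackoffMode`, `RetryOptions`, and both functions are illustrative names.

```typescript
// Before: callers must remember the argument order and that 0 means "fixed delay".
function scheduleRetryPositional(attempt: number, baseDelayMs: number, backoffMode: number): number {
  return backoffMode === 0 ? baseDelayMs : baseDelayMs * Math.pow(2, attempt);
}

// After: named constants replace the magic number (meaning -> name)...
const BackoffMode = {
  Fixed: "fixed",
  Exponential: "exponential",
} as const;
type BackoffMode = (typeof BackoffMode)[keyof typeof BackoffMode];

// ...and an options object replaces positional arguments (position -> name).
interface RetryOptions {
  attempt: number;
  baseDelayMs: number;
  backoff: BackoffMode;
}

function scheduleRetry(opts: RetryOptions): number {
  return opts.backoff === BackoffMode.Fixed
    ? opts.baseDelayMs
    : opts.baseDelayMs * Math.pow(2, opts.attempt);
}
```

Callers can now list the options in any order. A misspelled mode such as `"fixd"` is a compile-time error rather than a silent fallback to exponential backoff.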
49
+
50
+ ## Module Boundary Health
51
+
52
+ - Modules should have narrow interfaces (few public exports relative to total code)
53
+ - Changes should be localized: a bugfix in module A should not require changes in modules B, C, D
54
+ - Test for boundary health: can you write a unit test for this module without importing 5+ other modules?