npm - @wazir-dev/cli - Versions diffs - 1.3.0 → 1.4.0 - Mend

@wazir-dev/cli 1.3.0 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (133)

package/CHANGELOG.md +17 -2
package/docs/research/2026-03-20-agents/a18fb002157904af5.txt +187 -0
package/docs/research/2026-03-20-agents/a1d0ac79ac2f11e6f.txt +2 -0
package/docs/research/2026-03-20-agents/a324079de037abd7c.txt +198 -0
package/docs/research/2026-03-20-agents/a357586bccfafb0e5.txt +256 -0
package/docs/research/2026-03-20-agents/a4365394e4d753105.txt +137 -0
package/docs/research/2026-03-20-agents/a492af28bc52d3613.txt +136 -0
package/docs/research/2026-03-20-agents/a4984db0b6a8eee07.txt +124 -0
package/docs/research/2026-03-20-agents/a5b30e59d34bbb062.txt +214 -0
package/docs/research/2026-03-20-agents/a5cf7829dab911586.txt +165 -0
package/docs/research/2026-03-20-agents/a607157c30dd97c9e.txt +96 -0
package/docs/research/2026-03-20-agents/a60b68b1e19d1e16b.txt +115 -0
package/docs/research/2026-03-20-agents/a722af01c5594aba0.txt +166 -0
package/docs/research/2026-03-20-agents/a787bdc516faa5829.txt +181 -0
package/docs/research/2026-03-20-agents/a7c46d1bba1056ed2.txt +132 -0
package/docs/research/2026-03-20-agents/a7e5abbab2b281a0d.txt +100 -0
package/docs/research/2026-03-20-agents/a8dbadc66cd0d7d5a.txt +95 -0
package/docs/research/2026-03-20-agents/a904d9f45d6b86a6d.txt +75 -0
package/docs/research/2026-03-20-agents/a927659a942ee7f60.txt +102 -0
package/docs/research/2026-03-20-agents/a962cb569191f7583.txt +125 -0
package/docs/research/2026-03-20-agents/aab6decea538aac41.txt +148 -0
package/docs/research/2026-03-20-agents/abd58b853dd938a1b.txt +295 -0
package/docs/research/2026-03-20-agents/ac009da573eff7f65.txt +100 -0
package/docs/research/2026-03-20-agents/ac1bc783364405e5f.txt +190 -0
package/docs/research/2026-03-20-agents/aca5e2b57fde152a0.txt +132 -0
package/docs/research/2026-03-20-agents/ad849b8c0a7e95b8b.txt +176 -0
package/docs/research/2026-03-20-agents/adc2b12a4da32c962.txt +258 -0
package/docs/research/2026-03-20-agents/af97caaaa9a80e4cb.txt +146 -0
package/docs/research/2026-03-20-agents/afc5faceee368b3ca.txt +111 -0
package/docs/research/2026-03-20-agents/afdb282d866e3c1e4.txt +164 -0
package/docs/research/2026-03-20-agents/afe9d1f61c02b1e8d.txt +299 -0
package/docs/research/2026-03-20-agents/b4hmkwril.txt +1856 -0
package/docs/research/2026-03-20-agents/b80ptk89g.txt +1856 -0
package/docs/research/2026-03-20-agents/bf54s1jss.txt +1150 -0
package/docs/research/2026-03-20-agents/bhd6kq2kx.txt +1856 -0
package/docs/research/2026-03-20-agents/bmb2fodyr.txt +988 -0
package/docs/research/2026-03-20-agents/bmmsrij8i.txt +826 -0
package/docs/research/2026-03-20-agents/bn4t2ywpu.txt +2175 -0
package/docs/research/2026-03-20-agents/bu22t9f1z.txt +0 -0
package/docs/research/2026-03-20-agents/bwvl98v2p.txt +738 -0
package/docs/research/2026-03-20-agents/psych-a3697a7fd06eb64fd.txt +135 -0
package/docs/research/2026-03-20-agents/psych-a37776fabc870feae.txt +123 -0
package/docs/research/2026-03-20-agents/psych-a5b1fe05c0589efaf.txt +2 -0
package/docs/research/2026-03-20-agents/psych-a95c15b1f29424435.txt +76 -0
package/docs/research/2026-03-20-agents/psych-a9c26f4d9172dde7c.txt +2 -0
package/docs/research/2026-03-20-agents/psych-aa19c69f0ca2c5ad3.txt +2 -0
package/docs/research/2026-03-20-agents/psych-aa4e4cb70e1be5ecb.txt +95 -0
package/docs/research/2026-03-20-agents/psych-ab5b302f26a554663.txt +102 -0
package/docs/research/2026-03-20-deep-research-complete.md +101 -0
package/docs/research/2026-03-20-deep-research-status.md +38 -0
package/docs/research/2026-03-20-enforcement-research.md +107 -0
package/expertise/composition-map.yaml +27 -8
package/expertise/digests/reviewer/ai-coding-digest.md +83 -0
package/expertise/digests/reviewer/architectural-thinking-digest.md +63 -0
package/expertise/digests/reviewer/architecture-antipatterns-digest.md +49 -0
package/expertise/digests/reviewer/code-smells-digest.md +53 -0
package/expertise/digests/reviewer/coupling-cohesion-digest.md +54 -0
package/expertise/digests/reviewer/ddd-digest.md +60 -0
package/expertise/digests/reviewer/dependency-risk-digest.md +40 -0
package/expertise/digests/reviewer/error-handling-digest.md +55 -0
package/expertise/digests/reviewer/review-methodology-digest.md +49 -0
package/exports/hosts/claude/.claude/commands/learn.md +61 -8
package/exports/hosts/claude/.claude/settings.json +7 -6
package/exports/hosts/claude/export.manifest.json +6 -3
package/exports/hosts/claude/host-package.json +3 -0
package/exports/hosts/codex/export.manifest.json +6 -3
package/exports/hosts/codex/host-package.json +3 -0
package/exports/hosts/cursor/.cursor/hooks.json +6 -6
package/exports/hosts/cursor/export.manifest.json +6 -3
package/exports/hosts/cursor/host-package.json +3 -0
package/exports/hosts/gemini/export.manifest.json +6 -3
package/exports/hosts/gemini/host-package.json +3 -0
package/hooks/definitions/pretooluse_dispatcher.yaml +26 -0
package/hooks/definitions/pretooluse_pipeline_guard.yaml +22 -0
package/hooks/definitions/stop_pipeline_gate.yaml +22 -0
package/hooks/hooks.json +7 -6
package/hooks/pretooluse-dispatcher +84 -0
package/hooks/pretooluse-pipeline-guard +9 -0
package/hooks/stop-pipeline-gate +9 -0
package/package.json +2 -2
package/schemas/decision.schema.json +15 -0
package/schemas/hook.schema.json +4 -1
package/skills/TEMPLATE-3-ZONE.md +160 -0
package/skills/brainstorming/SKILL.md +127 -23
package/skills/clarifier/SKILL.md +175 -18
package/skills/claude-cli/SKILL.md +91 -12
package/skills/codex-cli/SKILL.md +91 -12
package/skills/debugging/SKILL.md +133 -38
package/skills/design/SKILL.md +173 -37
package/skills/dispatching-parallel-agents/SKILL.md +129 -31
package/skills/executing-plans/SKILL.md +113 -25
package/skills/executor/SKILL.md +185 -21
package/skills/finishing-a-development-branch/SKILL.md +107 -18
package/skills/gemini-cli/SKILL.md +91 -12
package/skills/humanize/SKILL.md +92 -13
package/skills/init-pipeline/SKILL.md +90 -17
package/skills/prepare-next/SKILL.md +93 -24
package/skills/receiving-code-review/SKILL.md +90 -16
package/skills/requesting-code-review/SKILL.md +100 -24
package/skills/requesting-code-review/code-reviewer.md +29 -17
package/skills/reviewer/SKILL.md +190 -50
package/skills/run-audit/SKILL.md +92 -15
package/skills/scan-project/SKILL.md +93 -14
package/skills/self-audit/SKILL.md +113 -39
package/skills/skill-research/SKILL.md +94 -7
package/skills/subagent-driven-development/SKILL.md +129 -30
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +30 -2
package/skills/subagent-driven-development/implementer-prompt.md +40 -27
package/skills/subagent-driven-development/spec-reviewer-prompt.md +25 -12
package/skills/tdd/SKILL.md +125 -20
package/skills/using-git-worktrees/SKILL.md +118 -28
package/skills/using-skills/SKILL.md +116 -29
package/skills/verification/SKILL.md +127 -22
package/skills/wazir/SKILL.md +517 -153
package/skills/writing-plans/SKILL.md +134 -28
package/skills/writing-skills/SKILL.md +91 -13
package/skills/writing-skills/anthropic-best-practices.md +104 -64
package/skills/writing-skills/persuasion-principles.md +100 -34
package/tooling/src/capture/command.js +29 -1
package/tooling/src/capture/decision.js +40 -0
package/tooling/src/capture/store.js +1 -0
package/tooling/src/config/depth-table.js +60 -0
package/tooling/src/export/compiler.js +7 -8
package/tooling/src/guards/guardrail-functions.js +131 -0
package/tooling/src/guards/phase-prerequisite-guard.js +39 -3
package/tooling/src/hooks/pretooluse-dispatcher.js +300 -0
package/tooling/src/hooks/pretooluse-pipeline-guard.js +141 -0
package/tooling/src/hooks/stop-pipeline-gate.js +92 -0
package/tooling/src/learn/pipeline.js +177 -0
package/tooling/src/state/db.js +251 -2
package/tooling/src/state/pipeline-state.js +262 -0
package/wazir.manifest.yaml +3 -0
package/workflows/learn.md +61 -8

package/docs/research/2026-03-20-deep-research-complete.md ADDED

@@ -0,0 +1,101 @@
+# Deep Research Complete — 2026-03-20
+## 25 Research Agents — ALL COMPLETED
+Total output: ~6.8MB across 25 agents. Full transcripts at:
+`/private/tmp/claude-501/-Users-mohamedabdallah-Work-Wazir/96398c18-9868-43bc-a4d6-d7f388880d4a/tasks/`
+## Executive Summary
+### The Architecture for Wazir v2
+**Three-layer enforcement pyramid:**
+1. **Hooks** (mechanical, can't bypass): Stop blocks completion, PreToolUse blocks writes/commits/pushes
+2. **Subagent isolation** (architectural, can't see full pipeline): one agent per phase, controller holds the loop
+3. **Persuasion engineering** (behavioral, won't bypass): superpowers-style rationalization tables, red flags, authority language
+### Key Findings
+**Hooks:**
+- Stop hook CAN block completion (`{"decision": "block"}`) — proven by ralph-loop (492+ iterations)
+- PreToolUse has 7 decision patterns: silent allow, advisory, systemMessage, modify, JSON deny, exit-code deny, echo-trick redirect
+- State tracking via `pipeline-state.json` — hooks read, CLI writes, atomic temp+rename
+- Critical limitations: hooks can block but not compel; "hook error" labels poison the model; SubagentStop is broken; agent can escape via AskUserQuestion
+- Must check `stop_hook_active` to prevent infinite loops; must allow context-limit and user-abort stops
+**Subagent Architecture:**
+- Controller-as-orchestrator: wz:wazir holds the loop, dispatches one subagent per phase
+- Each subagent gets fresh 200K context (~165K usable after overhead)
+- No nesting (depth=1) — controller dispatches ALL subagents directly
+- File-mediated handoff (MetaGPT pattern): artifacts on disk, not in context
+- Artifact dependency: each artifact has `requires` block with predecessor digest for staleness detection
+- Guardrail functions per phase boundary with concrete pass/fail criteria
+- Retry ladder: same-model×2 → model-escalation×1 → human escalation
+- Error classification: transient (retry), quality (retry+feedback), deterministic (escalate), resource (model-escalate)
+**Persuasion Engineering:**
+- Superpowers is 100% prompt engineering, zero mechanical enforcement — and agents STILL skip (issue #463)
+- Meincke et al. 2025: persuasion doubles compliance (33%→72%, N=28,000, p<.001)
+- Best combination: Authority + Commitment + Scarcity
+- CSO critical: skill descriptions must be triggers only, never process summaries
+- 47 rationalization entries across 5 superpowers skills — Wazir has ZERO
+- "Violating the letter is violating the spirit" — single most impactful sentence
+- TDD for skills: RED (observe baseline failures) → GREEN (write skill addressing those) → REFACTOR (close new loopholes)
+**Learning System:**
+- 4-stage pipeline: Tally → Candidate → Promote → Active
+- Findings classified by 8 categories × 4 severity levels
+- Recurrence detection via finding_hash dedup (PagerDuty pattern)
+- Semi-automatic promotion: auto-propose, human-approve (CodeGuru + Snyk model)
+- Drift prevention: 30 active project learnings max, 90-day TTL, 5% hit-rate demotion, principle consolidation at 25+ entries
+- Decision audit trail: v2 schema with category, alternatives, confidence, outcome_ref, supersedes
+- User feedback: capture corrections/approvals in ndjson, classify signal vs noise
+**Review Architecture:**
+- Two-tier: internal (Sonnet, expertise-loaded, pattern-matching) → external (Codex, fresh eyes, unknown-unknowns)
+- Critical finding: reviewer always-layer is 99K tokens against 50K ceiling — 5 of 8 modules are dropped
+- Fix: mode-specific reviewer composition (different modules per review mode)
+- Reviewer digest modules: 3-5K tokens each (not 12K originals)
+- Findings classified by 8 categories (correctness, security, completeness, wiring, verification, drift, performance, style)
+- Auto-classification rules per category with severity floors
+- Feedback-to-learning: 7-step loop, LLM-assisted clustering for pattern detection
+**Interactive UX:**
+- AskUserQuestion: 1-4 questions, 2-4 options each, arrow-key selection, multiSelect supported
+- Bug: DO NOT list in skill's allowed-tools (causes empty answers)
+- Progressive disclosure: status line (what) → paragraph (why) → full report (everything)
+- Key formula: "Name the action. State the dependency. Omit the journey."
+- 5 progress patterns: phase map, meaningful updates, artifact previews, time estimates, heartbeat
+- Heartbeat: never >2min silence (standard), >90s (deep), >3min (quick)
+- Steerability: classify mutation level → show impact → selective regeneration → preserve completed work
+- Three modes: auto (gating agent steers) / guided (checkpoints steer) / interactive (continuous steer)
+## Agent Output Index
+| # | Agent | Key Deliverable |
+|---|-------|----------------|
+| 1 | Stop hook patterns | Complete blueprint for pipeline-gate Stop hook with 10 edge cases |
+| 2 | PreToolUse catalog | 7 decision patterns with code examples from 4 real plugins |
+| 3 | State machine design | pipeline-state.json schema with 30+ fields, update rules, session isolation |
+| 4 | Hook limitations | 13 limitations with workarounds, including "hook error" label poisoning |
+| 5 | Persuasion playbook | 10 patterns, 47 rationalization entries, CSO rules, implementation checklist |
+| 6 | Controller pattern | Hybrid architecture: flat orchestration with file-mediated handoff |
+| 7 | Artifact dependencies | Per-phase schemas with requires/digest, write-time validation |
+| 8 | Context isolation | 200K per subagent, no nesting, MCP tool caveats, MetaGPT pub-sub |
+| 9 | Guardrail validation | 6 guardrail functions with concrete pass/fail criteria per phase |
+| 10 | Failure + retry | 3-tier ladder (same-model→escalate→human), error classification |
+| 11 | AskUserQuestion API | Full schema, 2-4 options, multiSelect, known bugs, plugin examples |
+| 12 | Showing reasoning | Progressive disclosure templates at 3 levels with anti-patterns |
+| 13 | Depth parameters | (in bladnman analysis) 4 depth levels with per-parameter tables |
+| 14 | Steerability | Mutation classification, impact assessment, selective regeneration |
+| 15 | Progress reporting | 5 patterns (phase map, finding updates, previews, time estimates, heartbeat) |
+| 16 | Findings → antipatterns | 4-stage promotion pipeline, 3+ occurrence threshold, human gate |
+| 17 | Cumulative tracking | SQLite schema (5 tables), dedup algorithm, recurrence detection |
+| 18 | Drift prevention | 7 mechanisms with concrete limits (30 active, 90-day TTL, 5% demotion) |
+| 19 | Decision audit trail | v2 schema with alternatives, confidence, outcome correlation |
+| 20 | User feedback capture | Signal classification, correction weighting, ndjson format |
+| 21 | Two-tier review | Internal→external, critical asymmetry (known vs unknown unknowns) |
+| 22 | Reviewer composition | Mode-specific modules, 3-5K digests, 50K budget analysis |
+| 23 | Findings classification | 8 categories × 4 severities, auto-classification rules |
+| 24 | Feedback-to-learning | 7-step loop, LLM clustering, minimum viable phases A-D |
+| 25 | Proof-of-implementation | Per-type matrix (web/API/CLI/library), Playwright MCP, Symphony model |
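
Note on the retry ladder described in the file above (same-model×2 → model-escalation×1 → human escalation, with four error classes): the following is a minimal sketch of how a controller might pick the next step. The function name `nextStep`, the `tried` counter object, and the action strings are hypothetical and not taken from the package. The summary only says "escalate" for deterministic errors; this sketch assumes that means escalating straight to a human.

```javascript
// Hypothetical sketch: only the ladder shape and the four error classes
// come from the research summary; every identifier here is illustrative.
const MAX_SAME_MODEL_RETRIES = 2;
const MAX_MODEL_ESCALATIONS = 1;

function nextStep(errorClass, tried = { sameModel: 0, escalated: 0 }) {
  const canRetry = tried.sameModel < MAX_SAME_MODEL_RETRIES;
  const canEscalate = tried.escalated < MAX_MODEL_ESCALATIONS;

  switch (errorClass) {
    case "transient":
    case "quality":
      // Quality failures retry with reviewer feedback attached.
      if (canRetry) {
        return { action: "retry-same-model", withFeedback: errorClass === "quality" };
      }
      break;
    case "resource":
      // Retrying the same model would not help; fall through to the
      // escalation check below.
      break;
    case "deterministic":
      // Assumption: a deterministic failure goes straight to a human.
      return { action: "escalate-human" };
    default:
      throw new Error(`unknown error class: ${errorClass}`);
  }

  return canEscalate ? { action: "escalate-model" } : { action: "escalate-human" };
}
```

The caller is expected to increment `tried.sameModel` or `tried.escalated` after each attempt, so the ladder is exhausted after at most three automated tries.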

package/docs/research/2026-03-20-deep-research-status.md ADDED

@@ -0,0 +1,38 @@
+# Deep Research Status — 2026-03-20
+## 25 Research Agents — Progress
+### Completed (14/25)
+1. Hook: Stop hook patterns (ralph-loop analysis) ✅
+2. Hook: PreToolUse catalog (7 decision patterns) ✅
+3. Hook: State machine design (pipeline-state.json) ✅
+4. Subagent: Artifact dependencies (per-phase schemas) ✅
+5. Subagent: Guardrail validation (per-phase functions) ✅
+6. Subagent: Failure + retry (3-tier ladder) ✅
+7. Interactive: Showing reasoning (progressive disclosure) ✅
+8. Learning: Findings → antipatterns (4-stage pipeline) ✅
+9. Learning: Cumulative tracking (SQLite schema) ✅
+10. Learning: Drift prevention (7 mechanisms) ✅
+11. Learning: User feedback capture ✅
+12. Review: Feedback-to-learning pipeline (7-step loop) ✅
+13. Review: Proof-of-implementation (per-type matrix) ✅
+14. Hook: Persuasion engineering (superpowers analysis) — in first batch ✅
+### Pending (11/25)
+15. Hook: Limitations + workarounds
+16. Subagent: Controller pattern
+17. Subagent: Context isolation
+18. Interactive: AskUserQuestion API
+19. Interactive: Depth parameters
+20. Interactive: Steerability
+21. Interactive: Progress reporting
+22. Review: Two-tier architecture
+23. Review: Reviewer composition
+24. Review: Findings classification
+25. Learning: Decision audit trail
+## Key Findings So Far
+All research output files at: /private/tmp/claude-501/-Users-mohamedabdallah-Work-Wazir/96398c18-9868-43bc-a4d6-d7f388880d4a/tasks/
+Full synthesis will be compiled when all 25 agents complete.

package/docs/research/2026-03-20-enforcement-research.md ADDED

@@ -0,0 +1,107 @@
+# Enforcement Research — 2026-03-20
+## The Answer
+**Prose instructions don't work. The agent will always rationalize skipping them.** Every framework that achieves reliable enforcement uses the same pattern: **the framework holds the loop, not the agent.**
+## The Three-Layer Strategy
+### Layer 1: Mechanical Hooks (agent CANNOT bypass)
+**Stop hook** blocks completion: `{"decision": "block", "reason": "..."}` — proven by ralph-loop plugin (official marketplace). The agent literally cannot stop until all artifacts exist.
+**PreToolUse hooks** block actions:
+- `PreToolUse:Write|Edit` — blocks implementation code if no plan artifact exists
+- `PreToolUse:Bash` — blocks `git commit` if no tests run, blocks `git push` if no review
+- Returns `permissionDecision: "deny"` — the tool call is prevented entirely
+**State tracking** via `pipeline-state.json` — hooks READ state, CLI WRITES state. No race conditions.
+**Key: command hooks only, never prompt hooks.** Prompt hooks re-introduce the rationalization problem.
+### Layer 2: Subagent Isolation (agent CANNOT see full pipeline)
+From every framework (CrewAI, LangGraph, Symphony, ideation_team_skill): **give the agent a task, not a plan.**
+- Each phase is a separate subagent invocation
+- Phase N+1 receives phase N's artifact as input — if it doesn't exist, the call fails
+- The controller (wazir skill) holds the loop and decides what runs next
+- No single agent can rationalize skipping from research to code
+### Layer 3: Persuasion Engineering (agent WON'T bypass — 72% compliance)
+From superpowers (100K stars, backed by Meincke et al. 2025, N=28,000):
+- **Rationalization tables** — enumerate exact thoughts the agent has when skipping, with rebuttals
+- **"Violating the letter is violating the spirit"** — kills the #1 escape pattern
+- **Red flags lists** — specific phrases that mean STOP
+- **Authority + Commitment + Social Proof** — doubles compliance (33% → 72%)
+- **CSO (Claude Search Optimization)** — skill descriptions must be triggers, never process summaries
+## Key Findings Per Source
+### Claude Code Hooks
+- Stop hook CAN block (`{"decision": "block"}`) — proven by ralph-loop
+- PreToolUse CAN deny AND modify tool calls — proven by context-mode plugin
+- Hooks are stateless but can read/write files for state
+- Hooks loaded at session start, can't be added mid-session
+- **Limitation: hooks block actions but can't compel them**
+### Superpowers (100K stars)
+- 100% prompt engineering, zero mechanical enforcement
+- Single SessionStart hook injects meta-skill in `<EXTREMELY_IMPORTANT>` tags
+- **Issue #463: agents STILL skip reviews** — the author knows it's unsolved
+- Commenter: "The only reliable fix is making reviews structural, not instructional"
+- TDD skill is best-in-class prompt engineering but still fails sometimes
+- Persuasion research: authority language doubles compliance but doesn't reach 100%
+### Framework Enforcement Patterns
+- **CrewAI:** Python for-loop + guardrail functions. Agent produces output, framework validates.
+- **LangGraph:** Channel triggers + NamedBarrierValue. Node can't fire until inputs ready.
+- **Temporal:** `await` keyword is the enforcement. Language-level blocking.
+- **Symphony:** State machine + data dependencies. Each phase produces data the next requires.
+- **GitHub Actions:** `needs:` DAG. Scheduler prevents jobs from starting without dependencies.
+- **Universal pattern:** framework holds program counter, not agent.
+### UX / User Engagement
+- **bladnman/ideation_team_skill:** AskUserQuestion for pre-flight interview, depth-aware parameters, cognitive role separation across agents
+- **Devin:** PR-as-proof, screen recordings, conversational Slack updates, async delegation
+- **Copilot Workspace:** Spec → Plan → Code, each editable. Steerability = trust.
+- **Anthropic:** Show planning steps explicitly, programmatic checks at intermediate steps
+## What Wazir Must Build
+### 1. Pipeline State Machine (hooks + state file)
+```
+SessionStart → initialize pipeline-state.json
+PreToolUse:Write|Edit → deny if phase gate not passed
+PreToolUse:Bash → deny git commit/push without tests/review
+Stop → deny if any enabled workflow incomplete or proof missing
+```
+### 2. Subagent-Per-Phase Architecture
+The `/wazir` skill becomes a CONTROLLER that:
+- Spawns a clarifier subagent → receives clarification artifact
+- Spawns a spec subagent → receives spec artifact
+- Spawns a design subagent → receives design artifact
+- Spawns an executor subagent → receives implementation
+- Spawns a reviewer subagent → receives review verdict
+- Each subagent sees ONLY its phase, not the full pipeline
+### 3. Superpowers-Style Persuasion on Every Skill
+For each discipline rule:
+- Iron Law statement
+- Rationalization table (empirically derived)
+- Red flags list
+- "Violating the letter is violating the spirit"
+- `<EXTREMELY_IMPORTANT>` wrapper on session injection
+### 4. User Engagement Templates
+- Pre-flight interview via AskUserQuestion (batched, not serial)
+- Three-tier progress reporting (status line / key decisions / full record)
+- Artifacts as proof (self-describing, contain lineage and reasoning)
+- Steerability at phase boundaries (edit upstream, regenerate downstream)
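
Note on the hook gates described in the file above: the sketch below shows how the PreToolUse and Stop decisions could be computed as pure functions over the hook input and a state object. The `{"decision": "block", "reason": ...}` shape and `permissionDecision: "deny"` come from the research notes; the surrounding `hookSpecificOutput` envelope reflects one reading of the Claude Code hook output format and should be checked against the current hooks reference. The state fields (`planApproved`, `testsRun`, `reviewPassed`, `pendingWorkflows`) are hypothetical and not the package's actual `pipeline-state.json` schema. Context-limit and user-abort stops, which the notes say must be allowed, are not handled here.

```javascript
// Hypothetical sketch: state field names are illustrative only.

// Decide whether a tool call may proceed. Returns null for a silent allow.
function preToolUseDecision(input, state) {
  const deny = (reason) => ({
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: reason,
    },
  });

  const tool = input.tool_name;
  const toolInput = input.tool_input ?? {};

  if ((tool === "Write" || tool === "Edit") && !state.planApproved) {
    return deny("No plan artifact exists yet.");
  }
  if (tool === "Bash") {
    const command = toolInput.command ?? "";
    if (/\bgit\s+commit\b/.test(command) && !state.testsRun) {
      return deny("Run the tests before committing.");
    }
    if (/\bgit\s+push\b/.test(command) && !state.reviewPassed) {
      return deny("A review is required before pushing.");
    }
  }
  return null;
}

// Decide whether the agent may stop. Returns null to allow the stop.
function stopDecision(input, state) {
  // If a previous Stop hook already forced a continuation, allow the stop
  // so the gate cannot loop forever.
  if (input.stop_hook_active) return null;
  if (state.pendingWorkflows.length === 0) return null;
  return {
    decision: "block",
    reason: `Incomplete workflows: ${state.pendingWorkflows.join(", ")}`,
  };
}
```

A hook script built on this would read the hook input as JSON from stdin, load the state file, and print the returned object (if any) as JSON on stdout.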

package/expertise/composition-map.yaml CHANGED

@@ -27,19 +27,38 @@ always:
     - antipatterns/code/state-management-antipatterns.md
     - quality/evidence-based-verification.md
   reviewer:
-    - antipatterns/process/ai-coding-antipatterns.md
-    - antipatterns/code/code-smells.md
-    - antipatterns/process/code-review-antipatterns.md
-    - antipatterns/code/dependency-antipatterns.md
-    - architecture/foundations/architectural-thinking.md
-    - architecture/foundations/coupling-and-cohesion.md
-    - antipatterns/code/architecture-antipatterns.md
-    - architecture/foundations/domain-driven-design.md
+    # Mode-agnostic core — loaded for ALL review modes (~6K tokens)
+    - digests/reviewer/review-methodology-digest.md
+    - digests/reviewer/ai-coding-digest.md
   content-author:
     - i18n/content/translation-management.md
     - i18n/foundations/string-externalization.md
     - i18n/foundations/pluralization-and-gender.md
+# Mode-specific reviewer composition
+# Loaded ON TOP of always.reviewer based on the --mode flag
+# Total budget per mode: ~15-25K tokens (digests + auto + stack modules)
+reviewer_modes:
+  task-review:
+    - digests/reviewer/code-smells-digest.md
+    - digests/reviewer/error-handling-digest.md
+  spec-challenge:
+    - digests/reviewer/architectural-thinking-digest.md
+    - digests/reviewer/ddd-digest.md
+  design-review:
+    - digests/reviewer/architectural-thinking-digest.md
+    - digests/reviewer/coupling-cohesion-digest.md
+  plan-review:
+    - digests/reviewer/architectural-thinking-digest.md
+    - digests/reviewer/coupling-cohesion-digest.md
+    - digests/reviewer/ai-coding-digest.md
+  final:
+    - digests/reviewer/code-smells-digest.md
+    - digests/reviewer/architecture-antipatterns-digest.md
+    - digests/reviewer/dependency-risk-digest.md
+  research-review: []
+  clarification-review: []
 auto:
   all-stacks:
     all-roles:
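
Note on the new `reviewer_modes` block above: the YAML comments say mode-specific modules are loaded on top of `always.reviewer`. A minimal sketch of that resolution follows, assuming the YAML has already been parsed into a plain object. The function name `reviewerModules` is hypothetical, and only two of the modes are reproduced. Deduplication is included because `plan-review` lists `ai-coding-digest.md`, which is already in the always-loaded core.

```javascript
// Hypothetical sketch: a hand-written object mirroring part of the YAML above.
const compositionMap = {
  always: {
    reviewer: [
      "digests/reviewer/review-methodology-digest.md",
      "digests/reviewer/ai-coding-digest.md",
    ],
  },
  reviewer_modes: {
    "plan-review": [
      "digests/reviewer/architectural-thinking-digest.md",
      "digests/reviewer/coupling-cohesion-digest.md",
      "digests/reviewer/ai-coding-digest.md",
    ],
    "research-review": [],
  },
};

// Return the core modules followed by the mode's modules, without duplicates.
function reviewerModules(map, mode) {
  const modeModules = map.reviewer_modes[mode];
  if (modeModules === undefined) {
    throw new Error(`unknown review mode: ${mode}`);
  }
  // A Set keeps first-seen order, so core modules stay at the front.
  return [...new Set([...map.always.reviewer, ...modeModules])];
}
```

Throwing on an unknown mode, instead of silently falling back to the core list, is a design choice of this sketch; the package's own loader may behave differently.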

package/expertise/digests/reviewer/ai-coding-digest.md ADDED

@@ -0,0 +1,83 @@
+# AI Coding Antipatterns — Reviewer Digest
+> Detection-focused extract for reviewer context. For full analysis, see `antipatterns/process/ai-coding-antipatterns.md`.
+## Specification Drift (AP-01)
+- **Signal:** Implementation differs from stated requirements without documented reason
+- **Check:** Compare task spec acceptance criteria against actual code behavior
+- **Severity:** high
+## Hallucinated APIs (AP-02)
+- **Signal:** Import or call to function/class/module that doesn't exist in the dependency tree
+- **Check:** Verify every imported symbol resolves to an actual export
+- **Severity:** critical
+## Outdated Patterns (AP-03)
+- **Signal:** Using deprecated APIs, class components in React 2025, callback-based async when promises are standard
+- **Check:** Compare patterns against current library version best practices
+- **Severity:** high
+## Premature Abstraction (AP-04)
+- **Signal:** Generic utility/helper that is used exactly once
+- **Check:** Count call sites for each abstraction introduced
+- **Severity:** medium
+## Context Window Stuffing (AP-05)
+- **Signal:** Agent reads 10+ files without index queries; loads entire modules instead of targeted slices
+- **Check:** Review tool call patterns — excessive Read calls without preceding search
+- **Severity:** low (efficiency, not correctness)
+## Fake Testing (AP-06)
+- **Signal:** Tests that assert implementation details, use mocks that mirror the implementation, or test tautologies
+- **Check:** Would the test fail if the implementation had a real bug? If not, it's fake.
+- **Severity:** high
+## Scope Creep (AP-07)
+- **Signal:** Files modified or features added that were not in the task spec
+- **Check:** Diff includes changes outside the task's specified file scope
+- **Severity:** medium
+## Optimistic Error Handling (AP-08)
+- **Signal:** Missing try/catch around I/O operations, network calls, file operations, JSON parsing
+- **Check:** Every async operation and external call has error handling
+- **Severity:** high
+## Stale Dependency (AP-09)
+- **Signal:** Importing deprecated APIs, using outdated package versions with known CVEs
+- **Check:** Package versions against known vulnerability databases
+- **Severity:** medium-high
+## Cargo-Cult Patterns (AP-10)
+- **Signal:** Design patterns applied without the problem they solve (Factory for single type, Observer for single listener)
+- **Check:** Does the pattern's complexity serve a real need?
+- **Severity:** medium
+## Gold Plating (AP-11)
+- **Signal:** Extra configuration, extensibility points, or features not in the spec
+- **Check:** Is every public API/config option traceable to a requirement?
+- **Severity:** medium
+## Sycophantic Compliance (AP-12)
+- **Signal:** Agent implements exactly what was asked even when the request contains contradictions or obvious errors
+- **Check:** Look for requirements that conflict with each other or with the codebase's existing contracts
+- **Severity:** high
+## Phantom Error Handling (AP-13)
+- **Signal:** Error handling code that looks comprehensive but handles errors incorrectly (swallows, retries without backoff, logs without propagating)
+- **Check:** Trace each error path — does it actually reach a handler that does the right thing?
+- **Severity:** high
+## Inconsistent State After Failure (AP-14)
+- **Signal:** Multi-step operations where a failure in step N leaves steps 1..N-1 committed
+- **Check:** Are multi-step mutations wrapped in transactions or compensating actions?
+- **Severity:** high
+## Over-Confident Comments (AP-15)
+- **Signal:** Comments claiming "this handles all edge cases" or "this is thread-safe" without evidence
+- **Check:** Does the code actually handle what the comment claims?
+- **Severity:** medium
+## Training Data Leakage (AP-16)
+- **Signal:** Code that closely mirrors common training examples but doesn't fit the actual use case
+- **Check:** Does the implementation structure match the problem, or does it match a textbook example?
+- **Severity:** medium
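
Note on the Scope Creep check (AP-07) in the digest above: "diff includes changes outside the task's specified file scope" can be approximated mechanically. The sketch below assumes the task spec provides a list of allowed files or directories and that the changed file list is already available (for example from a diff). The function name `outOfScopeFiles` and both parameters are hypothetical.

```javascript
// Hypothetical sketch: returns the changed files that fall outside the
// task's allowed paths. An allowed path matches itself exactly or, when
// treated as a directory, anything beneath it.
function outOfScopeFiles(changedFiles, allowedPaths) {
  return changedFiles.filter((file) => {
    const inScope = allowedPaths.some((allowed) => {
      const dirPrefix = allowed.endsWith("/") ? allowed : `${allowed}/`;
      return file === allowed || file.startsWith(dirPrefix);
    });
    return !inScope;
  });
}
```

Matching on a directory prefix with a trailing slash avoids treating `src/library/x.js` as inside `src/lib`. A non-empty result is only a signal for the reviewer, not proof of scope creep, since a task can legitimately touch a file the spec forgot to list.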

package/expertise/digests/reviewer/architectural-thinking-digest.md ADDED

@@ -0,0 +1,63 @@
+# Architectural Thinking — Reviewer Digest
+> Evaluation-focused extract for reviewer context. For full guidance, see `architecture/foundations/architectural-thinking.md`.
+## Architecture Review Checklist
+### Separation of Concerns
+- Does each module/file have a single clear responsibility?
+- Are business logic, data access, and presentation in separate layers?
+- Can you describe what a module does in one sentence without "and"?
+### Dependency Direction
+- Do dependencies point inward (toward core domain), not outward?
+- Are infrastructure details (DB, HTTP, filesystem) behind abstractions?
+- Could you swap the database without changing business logic?
+### Interface Design
+- Are public APIs minimal (expose only what is needed)?
+- Are contracts (types, schemas, interfaces) explicit and documented?
+- Do functions have clear input/output contracts without hidden side effects?
+### Change Impact
+- Can you add a feature without modifying existing code (Open-Closed)?
+- Are changes localized (changing one feature doesn't cascade across modules)?
+- Is the dependency graph shallow (max 3-4 levels deep)?
+### Reversibility Assessment
+- Which decisions in this diff are hard to reverse?
+- Are irreversible decisions (data models, service boundaries, consistency models) justified with documented reasoning?
+- Are reversible decisions (naming, folder structure, library choices) made quickly without over-analysis?
+### Trade-off Reasoning
+Every architectural decision involves trade-offs. During review, check:
+- Is the trade-off acknowledged? ("We chose X because Y, accepting Z")
+- Is the trade-off appropriate for the context? (startup vs. enterprise, prototype vs. production)
+- Are rejected alternatives documented?
+## Architecture Smells (Quick Detection)
+| Smell | Signal | Severity |
+|-------|--------|----------|
+| **Big Ball of Mud** | No discernible module boundaries; any module calls any other | critical |
+| **Layering Violation** | UI code calling database directly; domain importing from infrastructure | high |
+| **Circular Module Dependency** | Module A depends on Module B depends on Module A | high |
+| **God Module** | One module >1000 LOC handling multiple concerns | medium |
+| **Leaky Abstraction** | Internal implementation details exposed in public interface | medium |
+| **Distributed Monolith** | Multiple services that must be deployed together | high |
+| **Accidental Complexity** | Architecture complexity not justified by problem complexity | medium |
+| **Architecture Astronaut** | Abstractions solving problems no one has yet | medium |
+| **Dead End Architecture** | Design choices that prevent future evolution (no extension points, hardcoded assumptions) | high |
+## Quality Attribute Checklist
+When reviewing architectural decisions, verify the relevant quality attributes are addressed:
+| Attribute | Review Question |
+|-----------|----------------|
+| **Performance** | Are there obvious bottlenecks? N+1 queries? Unbounded loops? |
+| **Scalability** | Can this handle 10x load without structural changes? |
+| **Security** | Are trust boundaries enforced? Input validated at boundaries? |
+| **Availability** | What happens when a dependency fails? Is there a fallback? |
+| **Modifiability** | How many files change to add a typical feature? |
+| **Testability** | Can components be tested in isolation without complex setup? |
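
Note on the Circular Module Dependency smell in the digest above ("Module A depends on Module B depends on Module A"): given a module dependency graph, a cycle can be found with a depth-first search that tracks which modules are on the current path. The sketch below assumes the graph is a plain object mapping a module name to the list of modules it imports; how that graph is extracted from source is out of scope, and `findCycle` is a hypothetical name.

```javascript
// Hypothetical sketch: returns one dependency cycle as a list of module
// names (first and last entries equal), or null if the graph is acyclic.
function findCycle(graph) {
  const state = new Map(); // module -> "visiting" | "done"
  const path = [];         // modules on the current DFS path

  function visit(node) {
    state.set(node, "visiting");
    path.push(node);
    for (const dep of graph[node] ?? []) {
      if (state.get(dep) === "visiting") {
        // dep is already on the current path, so the path from dep
        // back to dep is a cycle.
        return [...path.slice(path.indexOf(dep)), dep];
      }
      if (!state.has(dep)) {
        const cycle = visit(dep);
        if (cycle) return cycle;
      }
    }
    path.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}
```

The search is recursive, so a very deep dependency chain could exhaust the call stack; an explicit stack would be the safer choice for large graphs.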

package/expertise/digests/reviewer/architecture-antipatterns-digest.md ADDED

@@ -0,0 +1,49 @@
+# Architecture Antipatterns — Reviewer Digest
+> Detection-focused extract for reviewer context. For full analysis, see `antipatterns/code/architecture-antipatterns.md`.
+## Structural Antipatterns
+| Antipattern | Detection Signal | Severity |
+|-------------|-----------------|----------|
+| **Big Ball of Mud** | No discernible module boundaries; any module calls any other; package diagram is fully connected | critical |
+| **God Object / God Service** | Class/module with >10 public methods touching >3 concerns; single service handling unrelated domains | high |
+| **Golden Hammer** | Same pattern/library used for every problem regardless of fit (everything is a microservice, everything uses Redux) | medium |
+| **Architecture Astronaut** | Layers of abstraction solving problems no one has; meta-frameworks, plugin systems with zero plugins | medium |
+| **Dead Code / Lava Flow** | Unreachable code paths, unused exports, commented-out blocks; code preserved "because it might be needed" | medium |
+| **Copy-Paste Architecture** | Duplicated modules with minor variations instead of shared abstraction | high |
+| **Boat Anchor** | Unused infrastructure "for future use" (empty interfaces, unused config, skeleton services) | medium |
+| **Accidental Complexity** | System complexity far exceeds problem complexity; over-engineered for the actual requirements | medium |
+| **Stovepipe System** | Modules built in isolation with no integration architecture; each uses different patterns, different data formats | high |
+| **Swiss Army Knife** | One component tries to serve every use case; endlessly configurable but hard to use for any single purpose | medium |
+## Integration Antipatterns
+| Antipattern | Detection Signal | Severity |
+|-------------|-----------------|----------|
+| **Distributed Monolith** | Multiple services that must be deployed together; shared database; lock-step releases | critical |
+| **Chatty Interface** | >5 sequential API calls to complete one logical operation | medium |
+| **Shared Database** | Multiple services reading/writing the same database tables directly | critical |
+| **Circular Dependency** | Service A calls B calls C calls A (or module-level equivalent) | high |
+| **Hardcoded Endpoints** | URLs, hostnames, or ports as string literals in source code | medium |
+| **Missing Circuit Breaker** | External service calls without timeout or failure handling | high |
+| **Sinkhole** | Requests pass through multiple layers that add no value (pure pass-through) | medium |
+## Layering Antipatterns
+| Antipattern | Detection Signal | Severity |
+|-------------|-----------------|----------|
+| **Upward Dependency** | Core/domain module imports from UI/API layer | critical |
+| **Layer Bypass** | UI code calling database/repository directly, skipping service layer | high |
+| **Anemic Domain** | Domain objects are pure data holders; all logic in services | medium |
+| **Fat Controller** | Controller/handler contains business logic instead of delegating | high |
+| **Inner Platform Effect** | Building a general-purpose engine inside the application that reimplements what the platform already provides | high |
+## Root Cause Patterns
+Most architecture antipatterns share a few root causes:
+- **Shipping pressure:** Shortcuts that accumulate into structural debt
+- **Missing boundaries:** No enforced module boundaries in build tooling
+- **Conway's Law misalignment:** Architecture doesn't match team structure
+- **Premature optimization:** Distributed complexity without proven need
+- **BDUF backlash:** Avoiding all upfront design, resulting in no design
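
The Upward Dependency and Layer Bypass signals from the Layering Antipatterns table can be approximated with a simple rank comparison. This is a rough sketch under stated assumptions: a strict three-layer model whose layer names (`ui`, `service`, `repository`) are hypothetical, and import edges that have already been mapped to layers by some other tool.

```python
from typing import List, Tuple

# Hypothetical layer order, outermost first. A strict layered model is assumed:
# each layer may depend only on itself or on the layer directly below it.
LAYERS = ["ui", "service", "repository"]


def classify_import(src_layer: str, dst_layer: str) -> str:
    """Classify one import edge between two layers."""
    src, dst = LAYERS.index(src_layer), LAYERS.index(dst_layer)
    if dst < src:
        return "upward-dependency"  # inner layer importing from an outer one
    if dst - src > 1:
        return "layer-bypass"  # skips at least one intermediate layer
    return "ok"


def review_imports(edges: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return only the edges that violate the assumed layering."""
    findings = []
    for src, dst in edges:
        verdict = classify_import(src, dst)
        if verdict != "ok":
            findings.append((src, dst, verdict))
    return findings
```

Projects that allow relaxed layering (any layer may call any lower layer) would drop the `layer-bypass` branch and keep only the upward check.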

package/expertise/digests/reviewer/code-smells-digest.md ADDED

@@ -0,0 +1,53 @@
+# Code Smells — Reviewer Digest
+> Detection-focused extract for reviewer context. For full remediation guidance, see `antipatterns/code/code-smells.md`.
+## Method-Level Smells
+| Smell | Detection Signal | Severity |
+|-------|-----------------|----------|
+| **Long Method** | >30 lines or >3 levels of nesting | medium |
+| **Long Parameter List** | >4 parameters | medium |
+| **Feature Envy** | Method accesses another object's data more than its own | high |
+| **Message Chains** | `a.b().c().d()` — 3+ chained calls | medium |
+| **Inappropriate Intimacy** | Class reaches into another's private/internal state | high |
+| **Refused Bequest** | Subclass overrides parent methods to do nothing | medium |
+## Class-Level Smells
+| Smell | Detection Signal | Severity |
+|-------|-----------------|----------|
+| **Large Class** | >300 lines or >10 public methods | medium |
+| **God Class** | Handles >3 unrelated responsibilities | high |
+| **Data Class** | Only getters/setters, no behavior | low |
+| **Lazy Class** | <3 methods, delegating everything | low |
+| **Speculative Generality** | Abstract classes/interfaces with single implementation | medium |
+| **Middle Man** | Class delegates >80% of methods to another | medium |
+## Structural Smells
+| Smell | Detection Signal | Severity |
+|-------|-----------------|----------|
+| **Shotgun Surgery** | One change requires edits to 5+ files | high |
+| **Divergent Change** | One file changes for multiple unrelated reasons | high |
+| **Parallel Inheritance** | Adding a subclass in one hierarchy requires adding one in another | medium |
+| **Data Clumps** | Same 3+ fields appear together in multiple places | medium |
+| **Primitive Obsession** | Using primitives where a domain type would be clearer | low |
+| **Switch Statements** | Repeated switch/if-else on the same discriminant | medium |
+## Code Duplication
+| Smell | Detection Signal | Severity |
+|-------|-----------------|----------|
+| **Exact Duplication** | Identical blocks >5 lines | high |
+| **Structural Duplication** | Same algorithm with different types/names | medium |
+| **Semantic Duplication** | Different code doing the same thing | medium |
+## Naming Smells
+| Smell | Detection Signal | Severity |
+|-------|-----------------|----------|
+| **Misleading Name** | Name implies different behavior than actual | high |
+| **Inconsistent Naming** | Same concept has different names across files | medium |
+| **Generic Name** | `data`, `info`, `handler`, `manager`, `utils` without qualifier | medium |
+| **Encoded Type** | Hungarian notation or type in name (`strName`, `arrList`) | low |
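
The numeric thresholds in the Method-Level Smells table (more than 4 parameters, more than 30 lines) are easy to check automatically. The sketch below is illustrative only and assumes the code under review is Python; it uses the standard library `ast` module, and the function name `find_method_smells` and the constant names are hypothetical.

```python
import ast
from typing import List, Tuple

MAX_PARAMS = 4   # threshold taken from the digest's parameter-count row
MAX_LINES = 30   # threshold taken from the digest's Long Method row


def find_method_smells(source: str) -> List[Tuple[str, str]]:
    """Return (function name, smell) pairs for functions over the thresholds.

    Simplifications: `self` counts as a parameter, *args/**kwargs do not,
    and blank lines and comments inside the body count toward length.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            if len(params) > MAX_PARAMS:
                findings.append((node.name, "long parameter list"))
            # end_lineno is populated by ast.parse on Python 3.8+
            length = node.end_lineno - node.lineno + 1
            if length > MAX_LINES:
                findings.append((node.name, "long method"))
    return findings
```

The nesting-depth half of the Long Method signal is not covered here; it would need a recursive walk that tracks depth through `if`, `for`, `while`, `with`, and `try` nodes.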

package/expertise/digests/reviewer/coupling-cohesion-digest.md ADDED

@@ -0,0 +1,54 @@
+# Coupling & Cohesion — Reviewer Digest
+> Evaluation-focused extract for reviewer context. For full guidance, see `architecture/foundations/coupling-and-cohesion.md`.
+## Coupling Assessment
+| Coupling Type | Detection Signal | Severity |
+|---------------|-----------------|----------|
+| **Content Coupling** | One module directly modifies another's internal state (private fields, internal data structures) | critical |
+| **Common Coupling** | Multiple modules read/write shared global state (global config they all mutate, shared DB table without coordination) | high |
+| **Control Coupling** | Function parameter controls another module's execution flow (boolean flag argument that switches behavior) | medium |
+| **Stamp Coupling** | Passing a large object when only 1-2 fields are needed (entire User object for a function that needs email) | low |
+| **Data Coupling** | Passing only needed data — this is GOOD coupling | none (target) |
+| **Message Coupling** | Modules communicate through events/messages with no identity knowledge — this is BEST coupling | none (target) |
+## Cohesion Assessment
+| Cohesion Type | Detection Signal | Quality |
+|---------------|-----------------|---------|
+| **Functional** | Module does one thing and does it completely | best |
+| **Sequential** | Output of one operation feeds input of the next (ETL pipeline) | good |
+| **Communicational** | Operations work on the same data (read, compute, format same record) | acceptable |
+| **Temporal** | Operations happen at the same time but serve different purposes (init, cleanup) | poor |
+| **Logical** | Operations share control flow but not purpose (util files, catch-all handlers) | poor |
+| **Coincidental** | No relationship between operations (random grab-bag module) | worst |
+## Quick Check
+For each module in the diff, ask:
+1. **Cohesion:** Can I describe this module's purpose in one sentence without "and"? If no, low cohesion.
+2. **Coupling:** If I change this module's internals, how many other files need to change? If >2, high coupling.
+3. **Direction:** Do dependencies flow from unstable (UI, API handlers) toward stable (domain, utilities)? If reversed, structural debt.
+## Connascence Quick Reference
+Connascence refines coupling into a more granular classification. Ordered from weakest (acceptable) to strongest (most harmful):
+| Connascence | Description | Acceptable? |
+|-------------|-------------|-------------|
+| **Name** | Two components must agree on a name (function name, variable name) | Yes — unavoidable and cheaply refactored |
+| **Type** | Two components must agree on a type (parameter type, return type) | Yes — enforced by type systems |
+| **Meaning** | Two components must agree on the meaning of a value (true = active, 0 = success) | Caution — use enums/constants instead of magic values |
+| **Position** | Two components must agree on parameter order | Caution — use named parameters or option objects |
+| **Algorithm** | Two components must use the same algorithm (hashing, encoding) | Risky — extract shared algorithm to single location |
+| **Execution** | Two components must execute in a specific order | Risky — make ordering explicit in control flow |
+| **Timing** | Two components must execute at the same time or within a window | High risk — source of race conditions |
+| **Value** | Two components must have correlated values (e.g., two arrays that must be same length) | High risk — encapsulate into a single data structure |
+| **Identity** | Two components must reference the same object instance | High risk — shared mutable state |
+## Module Boundary Health
+- Modules should have narrow interfaces (few public exports relative to total code)
+- Changes should be localized: a bugfix in module A should not require changes in modules B, C, D
+- Test for boundary health: can you write a unit test for this module without importing 5+ other modules?
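
To make the Connascence Quick Reference above concrete, here is a small before/after sketch of the two "Caution" rows, Meaning and Position, being weakened to connascence of Name. All names (`set_account_legacy`, `set_account`, `AccountStatus`) are hypothetical and chosen only for illustration.

```python
from enum import Enum


# Before: the caller must know that status 1 means "active" (connascence of
# meaning) and must pass arguments in the right order (connascence of position).
def set_account_legacy(user_id, status, notify):
    return {"user_id": user_id, "active": status == 1, "notify": notify}


class AccountStatus(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"


# After: an enum replaces the magic value and keyword-only parameters
# (the bare `*`) remove the dependence on argument order. Caller and callee
# now only need to agree on names.
def set_account(*, user_id: int, status: AccountStatus, notify: bool) -> dict:
    return {
        "user_id": user_id,
        "active": status is AccountStatus.ACTIVE,
        "notify": notify,
    }
```

A call such as `set_account(user_id=7, status=AccountStatus.ACTIVE, notify=False)` reads unambiguously at the call site, whereas `set_account_legacy(7, 1, False)` depends on the reader remembering both the order and the meaning of `1`.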