@torka/claude-workflows 0.13.1 → 0.13.2

package/README.md CHANGED
@@ -43,6 +43,36 @@ After installation, try running one of the commands to test:
43
43
  /designer-founder
44
44
  ```
45
45
 
46
+ ### Auto-Format on Edit (optional)
47
+
48
+ Auto-run linters and formatters after Claude edits files. Add to your project's `.claude/settings.local.json`:
49
+
50
+ ```json
51
+ {
52
+ "hooks": {
53
+ "PostToolUse": [
54
+ {
55
+ "matcher": "Edit|MultiEdit",
56
+ "hooks": [
57
+ {
58
+ "type": "command",
59
+ "command": "if [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.js || \"$CLAUDE_TOOL_FILE_PATH\" == *.ts || \"$CLAUDE_TOOL_FILE_PATH\" == *.jsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.tsx ]]; then npx eslint \"$CLAUDE_TOOL_FILE_PATH\" --fix 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.py ]]; then pylint \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; fi"
60
+ },
61
+ {
62
+ "type": "command",
63
+ "command": "if [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.js || \"$CLAUDE_TOOL_FILE_PATH\" == *.ts || \"$CLAUDE_TOOL_FILE_PATH\" == *.jsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.tsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.json || \"$CLAUDE_TOOL_FILE_PATH\" == *.css || \"$CLAUDE_TOOL_FILE_PATH\" == *.html ]]; then npx prettier --write \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.py ]]; then black \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.go ]]; then gofmt -w \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.rs ]]; then rustfmt \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.php ]]; then php-cs-fixer fix \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; fi"
64
+ }
65
+ ]
66
+ }
67
+ ]
68
+ }
69
+ }
70
+ ```
71
+
72
+ **Supported languages:** JavaScript/TypeScript (ESLint + Prettier), Python (Pylint + Black), Go (gofmt), Rust (rustfmt), PHP (php-cs-fixer)
73
+
74
+ > **Note:** This is a project-level setting — formatter choice varies per project. Commands fail silently (`|| true`) if tools aren't installed.
75
+
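Before relying on the hook, you can sanity-check the extension dispatch by hand with a made-up path. The snippet below is an illustration only: the path is hypothetical, the `case` form is a portable stand-in for the hook's bash `[[ ]]` tests, and nothing is actually formatted (only `echo` runs):

```shell
# Dry run of the hook's extension dispatch with a hypothetical file path.
# In a real hook, Claude Code sets CLAUDE_TOOL_FILE_PATH; here we set it by hand.
CLAUDE_TOOL_FILE_PATH="src/example.ts"
case "$CLAUDE_TOOL_FILE_PATH" in
  *.js|*.ts|*.jsx|*.tsx) echo "would run: npx prettier --write $CLAUDE_TOOL_FILE_PATH" ;;
  *.py)                  echo "would run: black $CLAUDE_TOOL_FILE_PATH" ;;
  *.go)                  echo "would run: gofmt -w $CLAUDE_TOOL_FILE_PATH" ;;
  *)                     echo "no formatter matched" ;;
esac
```

If the wrong branch fires for a given extension, fix the pattern here first, then copy the corrected condition back into the hook command.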
46
76
  ## Usage
47
77
 
48
78
  ### Commands
@@ -8,12 +8,15 @@ Comprehensive multi-agent codebase audit. Spawns parallel review agents across m
8
8
 
9
9
  **Usage:**
10
10
  ```
11
- /deep-audit # Quick mode (3 agents) on full project
12
- /deep-audit --full # Full mode (9 agents) on full project
13
- /deep-audit --pr 42 # Audit a specific PR diff
14
- /deep-audit --since abc123f # Audit changes since a commit hash
15
- /deep-audit --since 2025-01-15 # Audit changes since a date
16
- /deep-audit --full --pr 42 # Full mode on a specific PR
11
+ /deep-audit # Quick mode + auto refactoring plan
12
+ /deep-audit --full # Full mode + auto refactoring plan
13
+ /deep-audit --review-before-plan # Pause after findings, ask before plan
14
+ /deep-audit --pr 42 # Audit a specific PR diff
15
+ /deep-audit --since abc123f # Audit changes since a commit hash
16
+ /deep-audit --since 2025-01-15 # Audit changes since a date
17
+ /deep-audit --full --pr 42 # Full mode on a specific PR
18
+ /deep-audit --agent security-and-error-handling # Run only one agent
19
+ /deep-audit --agent performance-profiler --pr 42 # Single agent on a PR
17
20
  ```
18
21
 
19
22
  <workflow CRITICAL="TRUE">
@@ -25,10 +28,20 @@ IT IS CRITICAL THAT YOU FOLLOW THIS WORKFLOW EXACTLY.
25
28
  Parse `$ARGUMENTS` to determine:
26
29
 
27
30
  1. **Mode**: Check for `--full` flag
28
- - If `--full` present → `mode = "full"` (9 agents)
31
+ - If `--full` present → `mode = "full"` (10 agents)
29
32
  - Otherwise → `mode = "quick"` (3 agents)
30
33
 
31
- 2. **Scope**: Check for scope flags (mutually exclusive)
34
+ 2. **Single Agent**: Check for `--agent <name>` flag
35
+ - If `--agent <name>` present → `single_agent = "<name>"`
36
+ - Otherwise → `single_agent = null`
37
+ - If both `--full` and `--agent` are present: warn that `--full` is ignored in single-agent mode and set `mode = "single"`
38
+ - If `--agent` is present without `--full`: set `mode = "single"`
39
+
40
+ 3. **Review Before Plan**: Check for `--review-before-plan` flag
41
+ - If `--review-before-plan` present → `review_before_plan = true`
42
+ - Otherwise → `review_before_plan = false`
43
+
44
+ 4. **Scope**: Check for scope flags (mutually exclusive)
32
45
  - `--pr <number>` → `scope = "pr"`, `scope_value = <number>`
33
46
  - `--since <value>` → detect format:
34
47
  - If matches date pattern (YYYY-MM-DD) → `scope = "since-date"`, `scope_value = <date>`
@@ -116,29 +129,52 @@ Read the SKILL.md file at the path relative to this command:
116
129
  skills/deep-audit/SKILL.md
117
130
  ```
118
131
 
119
- From SKILL.md, build the agent list based on mode:
132
+ From SKILL.md, build the complete agent roster (used for both mode selection and `--agent` validation):
120
133
 
121
134
  **Quick mode agents:**
122
135
  ```
123
- agents = [
136
+ quick_agents = [
124
137
  { file: "security-and-error-handling.md", model: "opus", dimensions: ["Security", "Error Handling"] },
125
138
  { file: "architecture-and-complexity.md", model: "opus", dimensions: ["Architecture", "Simplification"] },
126
139
  { file: "code-health.md", model: "sonnet", dimensions: ["AI Slop Detection", "Dependency Health"] }
127
140
  ]
128
141
  ```
129
142
 
130
- **Full mode** add these to the quick list:
143
+ **Full mode additional agents:**
131
144
  ```
132
- additional_agents = [
145
+ full_agents = [
133
146
  { file: "performance-profiler.md", model: "sonnet", dimensions: ["Performance"] },
134
- { file: "test-coverage-analyst.md", model: "sonnet", dimensions: ["Test Coverage"] },
147
+ { file: "test-strategy-analyst.md", model: "opus", dimensions: ["Test Coverage", "Test Efficiency"] },
135
148
  { file: "type-design-analyzer.md", model: "sonnet", dimensions: ["Type Design"] },
136
149
  { file: "data-layer-reviewer.md", model: "opus", dimensions: ["Data Layer & Database"] },
137
150
  { file: "api-contract-reviewer.md", model: "sonnet", dimensions: ["API Contracts & Interface Consistency"] },
138
- { file: "seo-accessibility-auditor.md", model: "sonnet", dimensions: ["SEO & Accessibility"] }
151
+ { file: "seo-accessibility-auditor.md", model: "sonnet", dimensions: ["SEO & Accessibility"] },
152
+ { file: "documentation-health.md", model: "sonnet", dimensions: ["Documentation Health"] }
139
153
  ]
140
154
  ```
141
155
 
156
+ **Build the active agent list based on mode:**
157
+
158
+ - **If `mode = "single"`**: Search both `quick_agents` and `full_agents` for an agent whose filename (without `.md`) matches `single_agent`. If found → `agents = [matched_agent]`. If NOT found → print the error below and **STOP** (do not proceed):
159
+ ```
160
+ Unknown agent: "<single_agent>"
161
+
162
+ Available agents:
163
+ Quick mode: security-and-error-handling, architecture-and-complexity, code-health
164
+ Full mode: performance-profiler, test-strategy-analyst, type-design-analyzer,
165
+ data-layer-reviewer, api-contract-reviewer, seo-accessibility-auditor,
166
+ documentation-health
167
+ ```
168
+ - **If `mode = "quick"`**: `agents = quick_agents`
169
+ - **If `mode = "full"`**: `agents = quick_agents + full_agents`
170
+
171
+ **Refactoring planner** — added when mode is NOT "single" (runs separately in Phase 6, NOT in Phase 4):
172
+ ```
173
+ planner_agent = { file: "refactoring-planner.md", model: "opus", dimensions: ["Refactoring"] }
174
+ ```
175
+
176
+ Include this agent in `state.agents` for tracking when mode is not "single", but do NOT spawn it in Phase 4.
177
+
142
178
  If resuming: filter `agents` to only those with status "pending" in `previous_state.agents`.
143
179
 
144
180
  ---
@@ -152,9 +188,10 @@ Write `_bmad-output/deep-audit/state.json`:
152
188
  ```json
153
189
  {
154
190
  "status": "in_progress",
155
- "mode": "<quick|full>",
191
+ "mode": "<quick|full|single>",
156
192
  "scope": "<scope type>",
157
193
  "scope_value": "<scope value or null>",
194
+ "review_before_plan": false,
158
195
  "start_commit": "<current_commit>",
159
196
  "start_time": "<ISO timestamp>",
160
197
  "detected_stack": "<detected_stack>",
@@ -170,10 +207,13 @@ Write `_bmad-output/deep-audit/state.json`:
170
207
  }
171
208
  },
172
209
  "findings": [],
210
+ "refactoring_plan": null,
173
211
  "report_path": null
174
212
  }
175
213
  ```
176
214
 
215
+ Note: The `agents` object includes the `refactoring-planner` entry with `status: "pending"`. It is tracked like all agents but spawned in Phase 6, not Phase 4.
216
+
177
217
  If resuming, merge pending agent statuses into the existing state (keep completed agents' data intact).
178
218
 
179
219
  Print status:
@@ -181,7 +221,7 @@ Print status:
181
221
  Deep Audit — <mode> mode
182
222
  Scope: <scope description>
183
223
  Stack: <detected_stack>
184
- Agents: <count> (<list of agent names>)
224
+ Agent(s): <count> (<list of agent names>)
185
225
  Commit: <short hash>
186
226
  ```
187
227
 
@@ -263,7 +303,88 @@ Deduplication: <original count> findings → <deduped count> findings (<removed
263
303
 
264
304
  ---
265
305
 
266
- ## Phase 6: Generate Report
306
+ ## Phase 6: Refactoring Planner
307
+
308
+ Skip this phase if `mode = "single"` (single-agent audits don't warrant cross-cutting refactoring plans). There is no planner agent in state.json to update.
309
+
310
+ Also skip this phase if the deduplicated findings count is 0. Set the planner agent status to "skipped" in state.json and continue.
311
+
312
+ ### Step 1: Confirm (if --review-before-plan)
313
+
314
+ If `review_before_plan` is true:
315
+
316
+ 1. Print a findings summary to the user:
317
+ ```
318
+ FINDINGS SUMMARY: X total (Y critical, Z important, W minor)
319
+
320
+ Top findings:
321
+ 1. F-001: <title> (P1)
322
+ 2. F-002: <title> (P1)
323
+ 3. F-003: <title> (P2)
324
+ ```
325
+
326
+ 2. Ask the user: **"Generate refactoring plan from these findings? (Y/n)"**
327
+
328
+ 3. If the user says no → set planner agent status to "skipped" in state.json, skip to Phase 7.
329
+
330
+ If `review_before_plan` is false, proceed directly to Step 2.
331
+
332
+ ### Step 2: Generate plan
333
+
334
+ 1. Serialize all deduplicated findings from Phase 5 into a single text block using the `=== FINDING ===` format. Include the assigned `id` field (F-001, etc.) so the planner can reference them.
335
+
336
+ 2. Read the agent prompt from `skills/deep-audit/agents/refactoring-planner.md`
337
+
338
+ 3. Spawn via Task tool (same pattern as Phase 4):
339
+ ```
340
+ Tool: Task
341
+ subagent_type: general-purpose
342
+ model: opus
343
+ description: "deep-audit: refactoring-planner"
344
+ prompt: |
345
+ <agent prompt content>
346
+
347
+ ---
348
+ ## Input Findings (injected by orchestrator)
349
+
350
+ <serialized findings payload>
351
+
352
+ ---
353
+ ## Output Format Reminder
354
+ You MUST produce output using the exact format defined above:
355
+ - === THEME === blocks for each refactoring theme
356
+ - Exactly one === EXECUTION ORDER === block at the end
357
+ Produce NO other output besides these blocks.
358
+ ```
359
+
360
+ 4. Parse the response:
361
+ - Extract all `=== THEME ===` blocks: id, name, effort, risk, finding_ids, dependencies, coverage_gate, blast_radius, warnings, phase, summary, steps, files, tests_before, tests_after
362
+ - Extract the single `=== EXECUTION ORDER ===` block: phase_1 through phase_4, quick_wins, total_effort, summary
363
+
364
+ 5. Store parsed data in `state.refactoring_plan`:
365
+ ```json
366
+ {
367
+ "themes": [ ...parsed theme objects... ],
368
+ "execution_order": { ...parsed execution order... }
369
+ }
370
+ ```
371
+
372
+ 6. Update planner agent status in `state.agents` to "completed" with timestamps and raw_output.
373
+
374
+ 7. Write updated state.json to disk.
375
+
376
+ If the planner agent fails (Task tool returns error):
377
+ - Set planner agent status to "failed" in state.json
378
+ - Log a warning but continue to Phase 7 (the report generates without the roadmap section)
379
+
380
+ Print progress:
381
+ ```
382
+ Refactoring Planner: <theme_count> themes (<quick_win_count> quick wins), total effort: <total_effort>
383
+ ```
384
+
385
+ ---
386
+
387
+ ## Phase 7: Generate Report
267
388
 
268
389
  Read the report template from:
269
390
  ```
@@ -303,7 +424,14 @@ Fill in the template:
303
424
 
304
425
  4. **Action Plan**: Select the top 5 findings (by severity, then confidence) and format as a numbered action list with a brief description of what to fix and why.
305
426
 
306
- 5. **Statistics**: Total findings, per-severity counts, agent count, dimension count, per-agent breakdown table.
427
+ 5. **Refactoring Roadmap** (only if `state.refactoring_plan` is not null):
428
+ - Fill the `{{#IF_REFACTOR_PLAN}}` conditional block
429
+ - Set `{{THEME_COUNT}}`, `{{QUICK_WIN_COUNT}}`, `{{TOTAL_EFFORT}}`, `{{EXECUTION_SUMMARY}}`
430
+ - Render `{{QUICK_WIN_ITEMS}}`: themes flagged as quick wins
431
+ - Render `{{PHASE_1_THEMES}}` through `{{PHASE_4_THEMES}}`: themes grouped by phase
432
+ - For each theme, render using the Theme Detail Template (see report-template.md)
433
+
434
+ 6. **Statistics**: Total findings, per-severity counts, agent count, dimension count, per-agent breakdown table.
307
435
 
308
436
  ### Write the report
309
437
 
@@ -317,7 +445,7 @@ Update state.json with `report_path`.
317
445
 
318
446
  ---
319
447
 
320
- ## Phase 7: Finalize State
448
+ ## Phase 8: Finalize State
321
449
 
322
450
  Update `_bmad-output/deep-audit/state.json`:
323
451
  - Set `status = "completed"`
@@ -326,7 +454,7 @@ Update `_bmad-output/deep-audit/state.json`:
326
454
 
327
455
  ---
328
456
 
329
- ## Phase 8: Present Summary
457
+ ## Phase 9: Present Summary
330
458
 
331
459
  Print a concise summary to the user:
332
460
 
@@ -335,7 +463,7 @@ Print a concise summary to the user:
335
463
  DEEP AUDIT COMPLETE
336
464
  ═══════════════════════════════════════════════════
337
465
 
338
- Mode: <quick|full> (<agent_count> agents)
466
+ Mode: <quick|full|single> (<agent_count> agent(s))
339
467
  Scope: <scope description>
340
468
  Duration: <duration>
341
469
 
@@ -359,7 +487,20 @@ State: _bmad-output/deep-audit/state.json
359
487
  ═══════════════════════════════════════════════════
360
488
  ```
361
489
 
362
- If any agents failed, add a section:
490
+ If `state.refactoring_plan` is not null, add after TOP ACTIONS and before Report:
491
+ ```
492
+ REFACTORING ROADMAP
493
+ ─────────────────────────────────────────────────
494
+ <theme_count> themes across 4 phases | <quick_win_count> quick wins
495
+ Total effort: <total_effort>
496
+
497
+ QUICK WINS (do these now)
498
+ 1. T-NNN: <theme name> (<effort>, <risk> risk)
499
+ 2. T-NNN: <theme name> (<effort>, <risk> risk)
500
+ ...
501
+ ```
502
+
503
+ If any agents failed (including the planner), add a section:
363
504
  ```
364
505
  WARNINGS
365
506
  - Agent <name> failed: <error summary>
@@ -89,13 +89,16 @@ The user may provide one or more of:
89
89
  - Recommended worktree strategy
90
90
 
91
91
  8. **Save Report**
92
- Write to: `_bmad-output/planning-artifacts/parallelization-analysis-{YYYY-MM-DD-HHmm}.md`
93
- (Includes timestamp to prevent same-day collisions)
92
+ Get the current local timestamp by running: `date "+%Y-%m-%d-%H%M"`
93
+ Write to: `_bmad-output/planning-artifacts/parallelization-analysis-{timestamp}.md`
94
+ (Includes timestamp to prevent same-day collisions — do NOT guess the time)
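The two steps above amount to the following; the path is the one given, and only `{timestamp}` is computed:

```shell
# Build the artifact path with a real local timestamp, never a guessed one.
timestamp=$(date "+%Y-%m-%d-%H%M")
report_path="_bmad-output/planning-artifacts/parallelization-analysis-${timestamp}.md"
echo "$report_path"
```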
94
95
  </steps>
95
96
 
96
97
  ## Output Template
97
98
 
98
- Use this structure for the report:
99
+ Follow this structure exactly. You may add a "Visual Dependency Graph" section
100
+ (ASCII art showing the phase flow) after the Dependency Matrix, but do not add
101
+ other ad-hoc sections or restructure the template:
99
102
 
100
103
  ```markdown
101
104
  # Epic Parallelization Analysis
@@ -171,11 +174,25 @@ git worktree add ../epic-3-email-system feature/epic-3
171
174
  | 2 | User Auth | Epic 1 (Infrastructure) | Pending |
172
175
  | 5 | Analytics | Epic 2 (Auth) | Pending |
173
176
 
177
+ ## Visual Dependency Graph
178
+ <!-- ASCII art showing phase flow. Example: -->
179
+ ```
180
+ Phase 1: [Epic 1] [Epic 3]
181
+
182
+ ┌──────┼──────┐
183
+ ▼ ▼ ▼
184
+ Phase 2: [E2] [E4] [E5]
185
+ └──────┼──────┘
186
+
187
+ Phase 3: [Epic 6]
188
+ ```
189
+
174
190
  ## Worktree Strategy Recommendations
175
191
  - **Max parallel worktrees**: [recommended number based on dependencies]
176
192
  - **Critical path**: Epic X → Epic Y → Epic Z
177
193
  - **Bottleneck epics**: [epics that block the most others]
178
194
  - **Quick wins**: [small epics that can be completed to unblock others]
195
+ - **Merge order**: [for parallel phases, specify which epic to merge first based on what it unblocks]
179
196
  ```
180
197
 
181
198
  ## Important Notes
@@ -14,5 +14,22 @@
14
14
  ],
15
15
  "deny": [],
16
16
  "ask": []
17
+ },
18
+ "hooks": {
19
+ "PostToolUse": [
20
+ {
21
+ "matcher": "Edit|MultiEdit",
22
+ "hooks": [
23
+ {
24
+ "type": "command",
25
+ "command": "if [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.js || \"$CLAUDE_TOOL_FILE_PATH\" == *.ts || \"$CLAUDE_TOOL_FILE_PATH\" == *.jsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.tsx ]]; then npx eslint \"$CLAUDE_TOOL_FILE_PATH\" --fix 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.py ]]; then pylint \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; fi"
26
+ },
27
+ {
28
+ "type": "command",
29
+ "command": "if [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.js || \"$CLAUDE_TOOL_FILE_PATH\" == *.ts || \"$CLAUDE_TOOL_FILE_PATH\" == *.jsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.tsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.json || \"$CLAUDE_TOOL_FILE_PATH\" == *.css || \"$CLAUDE_TOOL_FILE_PATH\" == *.html ]]; then npx prettier --write \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.py ]]; then black \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.go ]]; then gofmt -w \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.rs ]]; then rustfmt \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.php ]]; then php-cs-fixer fix \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; fi"
30
+ }
31
+ ]
32
+ }
33
+ ]
17
34
  }
18
35
  }
package/install.js CHANGED
@@ -261,6 +261,18 @@ function install() {
261
261
  }
262
262
  }
263
263
 
264
+ // Migration: remove renamed agent files from previous versions
265
+ const migrations = [
266
+ { old: 'skills/deep-audit/agents/test-coverage-analyst.md', renamed: 'test-strategy-analyst.md' },
267
+ ];
268
+ for (const { old: oldFile, renamed } of migrations) {
269
+ const oldPath = path.join(targetBase, oldFile);
270
+ if (fs.existsSync(oldPath)) {
271
+ fs.unlinkSync(oldPath);
272
+ log(` Migrated: removed old ${path.basename(oldFile)} (renamed to ${renamed})`, 'blue');
273
+ }
274
+ }
275
+
264
276
  // Ensure gitignore entries for BMAD workflow
265
277
  const bmadDir = path.join(targetBase, '../_bmad');
266
278
  if (fs.existsSync(bmadDir)) {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@torka/claude-workflows",
3
- "version": "0.13.1",
3
+ "version": "0.13.2",
4
4
  "description": "Claude Code workflow helpers: epic automation, git cleanup, agents, and design workflows",
5
5
  "keywords": [
6
6
  "claude-code",
@@ -17,11 +17,22 @@ This file is the single source of truth for agent roster, dimension boundaries,
17
17
  | Agent File | Dimension | Model |
18
18
  |------------|-----------|-------|
19
19
  | `performance-profiler.md` | Performance | sonnet |
20
- | `test-coverage-analyst.md` | Test Coverage | sonnet |
20
+ | `test-strategy-analyst.md` | Test Coverage, Test Efficiency | opus |
21
21
  | `type-design-analyzer.md` | Type Design | sonnet |
22
22
  | `data-layer-reviewer.md` | Data Layer & Database | opus |
23
23
  | `api-contract-reviewer.md` | API Contracts & Interface Consistency | sonnet |
24
24
  | `seo-accessibility-auditor.md` | SEO & Accessibility | sonnet |
25
+ | `documentation-health.md` | Documentation Health | sonnet |
26
+
27
+ ### Refactoring Planner (runs by default after all audit agents)
28
+
29
+ | Agent File | Purpose | Model |
30
+ |------------|---------|-------|
31
+ | `refactoring-planner.md` | Synthesizes findings into refactoring themes and execution plan | opus |
32
+
33
+ This agent runs in Phase 6 AFTER deduplication. It receives findings as input (not the codebase). It is skipped when there are 0 findings or the user declines via `--review-before-plan`. See the command file for details.
34
+
35
+ Each theme includes: `coverage_gate` (REQUIRED/ADEQUATE), `blast_radius` (CONTAINED/MODERATE/WIDE), and `warnings` (anti-pattern flags). See `refactoring-planner.md` for full output format.
25
36
 
26
37
  ## Dimension Boundaries
27
38
 
@@ -62,7 +73,7 @@ Each dimension has a clear scope. Agents MUST stay within their assigned dimensi
62
73
  - Unnecessary indirection (wrapper functions that just pass through)
63
74
  - Configuration for things that never change
64
75
  - Dead code, unused exports, orphaned files
65
- - **NOT**: intentional design patterns, library APIs (they need flexibility)
76
+ - **NOT**: intentional design patterns, library APIs (they need flexibility), dead/skipped test files and orphaned test utilities (that's Test Efficiency)
66
77
 
67
78
  ### AI Slop Detection
68
79
  - Excessive/unnecessary comments explaining obvious code
@@ -100,7 +111,19 @@ Each dimension has a clear scope. Agents MUST stay within their assigned dimensi
100
111
  - Tests that test implementation rather than behavior
101
112
  - Missing integration tests for API endpoints
102
113
  - Test fixtures with hardcoded secrets or PII
103
- - **NOT**: 100% coverage goals, testing trivial getters/setters
114
+ - **NOT**: 100% coverage goals, testing trivial getters/setters, test efficiency/waste (that's Test Efficiency)
115
+
116
+ ### Test Efficiency
117
+ - Trivial tests that provide no signal (render-only, getter/setter, library wrapper tests)
118
+ - Tests that mirror implementation instead of asserting behavior (zero-signal mock tests)
119
+ - Dead tests: skipped tests, orphaned test utilities, tests excluded by runner config
120
+ - Redundant coverage: E2E tests duplicating unit-level assertions
121
+ - CI pipeline design: missing regression gate, missing caching, excessive pipeline duration
122
+ - Test suite shape (testing diamond): over-testing trivial code, under-testing critical paths at the right layer
123
+ - Snapshot test overuse (large snapshots, frequently-changing snapshots)
124
+ - Test fixture bloat and duplication
125
+ - Test-to-source code ratio indicating maintenance burden
126
+ - **NOT**: missing tests, test correctness issues, flaky tests (all of those are Test Coverage)
104
127
 
105
128
  ### Type Design
106
129
  - `any` types that should be specific
@@ -144,6 +167,19 @@ Each dimension has a clear scope. Agents MUST stay within their assigned dimensi
144
167
  - Missing Open Graph / social sharing metadata
145
168
  - **NOT**: content quality, marketing strategy, visual design choices
146
169
 
170
+ ### Documentation Health
171
+ - README completeness (description, install, usage, quickstart)
172
+ - Setup and onboarding documentation accuracy
173
+ - Configuration documentation (env vars, config files, feature flags)
174
+ - Public/exported API documentation for complex interfaces
175
+ - Inline documentation for non-obvious logic (complex algorithms, regexes, magic numbers)
176
+ - Doc structure, navigation, and discoverability
177
+ - Doc-code synchronization (stale references, outdated examples)
178
+ - Dead links and broken internal references
179
+ - CLAUDE.md and AI assistant documentation
180
+ - Contributing, licensing, and maintenance docs
181
+ - **NOT**: trivial JSDoc/docstrings (AI Slop dimension), undocumented API endpoints (API Contracts dimension), git-diff-based staleness (/docs-quick-update command), prose style or grammar quality
182
+
147
183
  ## Scoring Rubric
148
184
 
149
185
  Each dimension is scored 1–10:
@@ -165,10 +201,12 @@ Each dimension is scored 1–10:
165
201
  - Dependency Health: weight 1
166
202
  - Performance: weight 2 (full mode only)
167
203
  - Test Coverage: weight 2 (full mode only)
204
+ - Test Efficiency: weight 1 (full mode only)
168
205
  - Type Design: weight 1 (full mode only)
169
206
  - Data Layer: weight 2 (full mode only)
170
207
  - API Contracts: weight 1 (full mode only)
171
208
  - SEO & Accessibility: weight 1 (full mode only)
209
+ - Documentation Health: weight 1 (full mode only)
172
210
 
173
211
  ## Severity Definitions
174
212
 
@@ -251,3 +289,25 @@ assessment: |
251
289
  - Do NOT report the same issue multiple times across different files — report the pattern once and list affected files
252
290
  - If no findings for a dimension, still include the DIMENSION SUMMARY with score and assessment
253
291
  - Keep descriptions factual and evidence-based; avoid vague language like "could potentially" or "might cause issues"
292
+
293
+ ## Tool Usage Strategy
294
+
295
+ ### When Serena MCP is Available
296
+
297
+ If `find_symbol`, `find_referencing_symbols`, or other Serena MCP tools are available in your tool list, prefer them over Read/Grep for targeted code exploration:
298
+
299
+ | Task | Without Serena | With Serena |
300
+ |------|---------------|-------------|
301
+ | Find all usages of a function | Grep for function name | `find_referencing_symbols` |
302
+ | Understand module dependencies | Read import statements across files | `find_symbol` + references |
303
+ | Check type definitions | Grep for `interface`/`type` keywords | `find_symbol` with type filter |
304
+ | Trace call chains | Read multiple files following imports | `find_referencing_symbols` recursively |
305
+ | Find implementations | Grep for class/function names | `find_symbol` with implementation filter |
306
+
307
+ **Fallback**: If Serena tools are not available or return errors, fall back to Read/Grep. Do not fail the audit because an MCP tool is unavailable.
308
+
309
+ ### General Tool Guidelines
310
+
311
+ - **Prefer targeted reads**: Read specific functions/sections rather than entire files when possible
312
+ - **Use Glob first**: Find relevant files before reading them
313
+ - **Batch searches**: Make parallel Grep calls when checking for multiple patterns
@@ -28,6 +28,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
28
28
  3. **Compare similar endpoints**: Group endpoints by resource type. Verify they follow the same conventions (naming, pagination, error format, status codes).
29
29
  4. **Check internal contracts**: Look at service-to-service function calls. Verify that parameter types, return types, and error handling patterns are consistent across similar services.
30
30
 
31
+ ## Tool Usage
32
+
33
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
34
+
31
35
  ## Output Rules
32
36
 
33
37
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -38,6 +38,17 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
38
38
  3. **Look for patterns**: Don't review files in isolation. Look for inconsistencies ACROSS similar files. If 8 out of 10 route handlers follow one pattern but 2 follow a different pattern, that's a finding.
39
39
  4. **Assess value per complexity**: For each abstraction layer, ask: "Does this indirection add value or just make the code harder to follow?" If removing the abstraction would make the code simpler AND not harder to change, it's over-engineering.
40
40
 
41
+ ## Tool Usage
42
+
43
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools are available, prefer them for this agent's core tasks:
44
+
45
+ - **Circular dependency detection**: Use `find_referencing_symbols` to trace import chains between modules instead of reading every file's import block
46
+ - **God object identification**: Use `find_symbol` to enumerate symbols per module and count responsibilities
47
+ - **Module boundary mapping**: Use `find_referencing_symbols` to map which modules depend on which, revealing tight coupling and incorrect dependency direction
48
+ - **Dead code detection**: Use `find_referencing_symbols` on exported functions/types; zero in-repo references suggests dead code (confirm the symbol isn't a public entry point first)
49
+
50
+ If Serena tools are not available, fall back to Glob + Grep + Read.
51
+
41
52
  ## Output Rules
42
53
 
43
54
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -41,6 +41,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
41
41
  3. **Check dependency manifest**: Read `package.json` (and lock file if present). For each dependency, assess: Is it still needed? Is it maintained? Is there a lighter alternative? Is it in the right section (dependencies vs devDependencies)?
42
42
  4. **Look for patterns, not individual instances**: Don't report every unnecessary comment — identify the PATTERN (e.g., "all service files have redundant JSDoc on every method") and report it once with affected file list.
43
43
 
44
+ ## Tool Usage
45
+
46
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
47
+
44
48
  ## Output Rules
45
49
 
46
50
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -28,6 +28,17 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
28
28
  3. **Check migration history**: Read migration files in order. Look for risky migrations (data loss, long locks, irreversible changes). Check that each migration has a reasonable rollback strategy.
29
29
  4. **Review query patterns**: Look at how the application queries data. Check for missing indexes, N+1 patterns, and unbounded queries. Focus on queries in hot paths (frequently executed endpoints).
30
30
 
31
+ ## Tool Usage
32
+
33
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools are available, prefer them for this agent's core tasks:
34
+
35
+ - **Tracing query patterns**: Use `find_referencing_symbols` to trace how queries flow from route handlers through services to the data layer
36
+ - **Finding ORM model usage**: Use `find_symbol` to locate model definitions, then `find_referencing_symbols` to see where they're queried
37
+ - **Missing transaction detection**: Use `find_referencing_symbols` on mutation functions to check if callers wrap them in transactions
38
+ - **Schema/code mismatches**: Use `find_symbol` to compare ORM model definitions against migration files
39
+
40
+ If Serena tools are not available, fall back to Glob + Grep + Read.
41
+
31
42
  ## Output Rules
32
43
 
33
44
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -0,0 +1,44 @@
1
+ # Documentation Health Auditor
2
+
3
+ You are a **senior technical writer and developer experience specialist** performing a focused codebase audit. You evaluate whether the project's documentation enables a new contributor to understand, configure, and contribute to the project without reading source code.
4
+
5
+ ## Dimensions
6
+
7
+ You cover **Documentation Health** from SKILL.md. Focus on documentation that is missing, misleading, or structurally broken — not on prose style or formatting preferences.
8
+
9
+ Read SKILL.md for exact dimension boundaries and output format requirements.
10
+
11
+ ## What to Check
12
+
13
+ 1. **README completeness**: Missing or empty README.md. README lacks project description (what the project does and why). README missing setup/installation instructions. README missing usage examples or a quick-start section. README references a tech stack or architecture that no longer matches the codebase.
14
+ 2. **Setup and onboarding docs**: Missing environment setup instructions (required env vars, external services, database setup). Missing prerequisites section (Node version, system dependencies). No "getting started" flow that takes a new developer from clone to running application. Setup instructions that reference commands or scripts that do not exist.
15
+ 3. **Configuration documentation**: Environment variables used in code but not documented anywhere. Config files (.env.example, settings files) missing or incomplete. Feature flags or toggles without explanation of what they control. Missing documentation for deployment or CI/CD configuration.
16
+ 4. **Exported/public API documentation**: Public modules or packages with no top-level doc comments or README. Exported functions with complex signatures (3+ params, generics, union types) lacking any description. SDK or library code intended for external consumers without usage examples. Missing changelog or migration guide for versioned libraries.
17
+ 5. **Inline documentation gaps**: Complex algorithms or business logic (20+ lines of non-obvious logic) without any explaining comment. Regex patterns without a comment explaining what they match. Magic numbers or hardcoded thresholds without explanation. Workarounds or hacks without a comment explaining why the straightforward approach was avoided.
18
+ 6. **Doc structure and navigation**: docs/ folder exists but has no index or table of contents. Documentation spread across multiple locations with no cross-references. Orphaned doc files not linked from any entry point. Deeply nested doc structure with no navigation aid.
19
+ 7. **Doc-code synchronization**: Code examples in docs that use API signatures or function names that no longer exist. Architecture diagrams or descriptions that contradict the actual directory structure. Version numbers in docs that do not match package.json or recent releases. CLI usage docs that reference flags or subcommands that have been removed.
20
+ 8. **Dead links and broken references**: Internal doc links pointing to files that do not exist. Image references pointing to missing files. Links to external resources that are clearly stale (e.g., referencing archived repos or old domain names). Anchor links within markdown that point to headings that do not exist.
21
+ 9. **CLAUDE.md / AI assistant docs**: Missing CLAUDE.md in a project that clearly uses Claude Code (presence of .claude/ directory). CLAUDE.md that is a stub or template with no project-specific content. CLAUDE.md with outdated directory structure, stale command references, or wrong technology stack. Missing development commands section when the project has build/test/lint scripts.
22
+ 10. **Contributing and maintenance docs**: Missing CONTRIBUTING.md in open-source or team projects. Missing LICENSE file for published packages. No code of conduct for community projects. Missing ADR (Architecture Decision Records) when the codebase contains non-obvious architectural choices.
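The dead-link scan in check 8 can be sketched as follows — a minimal sketch assuming the markdown content and the set of repo files are already loaded; a real version would walk the docs/ tree and read files from disk:

```typescript
// Minimal sketch: find internal markdown links whose target file does not exist.
const existingFiles = new Set(["README.md", "docs/setup.md"]);

function brokenLinks(markdown: string): string[] {
  const broken: string[] = [];
  // Match [text](target), capturing the target path without any #anchor.
  const re = /\[[^\]]*\]\(([^)#]+)(?:#[^)]*)?\)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    const target = m[1];
    // External http(s) links are out of scope for this pass.
    if (!/^https?:\/\//.test(target) && !existingFiles.has(target)) {
      broken.push(target);
    }
  }
  return broken;
}

const doc =
  "See [setup](docs/setup.md), [old guide](docs/old.md), [site](https://example.com).";
console.log(brokenLinks(doc)); // ["docs/old.md"]
```

Anchor links to missing headings (also check 8) would need a second pass that collects headings per file and compares slugs.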
23
+
24
+ ## How to Review
25
+
26
+ 1. **Start from the entry point**: Read README.md first. Can you understand what this project does, how to install it, and how to run it? Note every gap or outdated reference.
27
+ 2. **Walk the new-contributor path**: Mentally simulate: clone, install dependencies, configure environment, run the app, run tests. At each step, check if documentation exists and is accurate.
28
+ 3. **Cross-reference docs with code**: For each claim in the docs (file paths, function names, commands, config keys), verify it actually exists in the codebase. Flag any mismatch.
29
+ 4. **Check doc discoverability**: Is there a clear path from README to deeper docs? Can someone find the information they need without reading every file?
30
+
31
+ ## Tool Usage
32
+
33
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
34
+
35
+ ## Output Rules
36
+
37
+ - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
38
+ - Sort findings by severity (P1 first)
39
+ - Only report findings with confidence >= 80
40
+ - For doc-code sync findings, quote the specific stale reference from the doc and what the code actually shows
41
+ - Do NOT flag: absence of JSDoc on trivial functions (that is AI Slop territory), undocumented API endpoints (that is API Contracts territory), or prose style/grammar issues
42
+ - Do NOT duplicate what /docs-quick-update does — that tool is reactive (git-diff driven). You are proactive (comprehensive health check of all documentation regardless of recent changes)
43
+ - Skip this entire audit if the project has no documentation files at all AND no README — produce a single DIMENSION SUMMARY with score 1 and note "No documentation found"
44
+ - Produce one DIMENSION SUMMARY for "Documentation Health"
@@ -29,6 +29,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
29
29
  3. **Check resource lifecycle**: For every resource created (connections, listeners, subscriptions, timers), verify there's a corresponding cleanup path. Check error paths too — resources must be cleaned up even when operations fail.
30
30
  4. **Assess impact**: Only report findings that would cause noticeable performance degradation (>100ms latency increase, >10MB memory growth, visible UI jank). Skip micro-optimizations.
31
31
 
32
+ ## Tool Usage
33
+
34
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
35
+
32
36
  ## Output Rules
33
37
 
34
38
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -0,0 +1,161 @@
1
+ # Refactoring Planner
2
+
3
+ You are a **principal software architect and tech lead** specializing in incremental refactoring strategy. You receive the complete set of deduplicated audit findings from a multi-agent codebase audit and synthesize them into an actionable refactoring roadmap.
4
+
5
+ You do NOT review code directly. Your input is the findings produced by other agents. Your job is synthesis, prioritization, and sequencing.
6
+
7
+ ## Your Input
8
+
9
+ You will receive deduplicated findings in this format:
10
+
11
+ ```
12
+ === FINDING ===
13
+ id: F-NNN
14
+ agent: <name>
15
+ severity: P1|P2|P3
16
+ confidence: <80-100>
17
+ file: <relative file path>
18
+ line: <line number or range>
19
+ dimension: <dimension name>
20
+ title: <one-line>
21
+ description: |
22
+ <2-4 sentences>
23
+ suggestion: |
24
+ <specific fix>
25
+ === END FINDING ===
26
+ ```
27
+
28
+ ## What You Must Produce
29
+
30
+ ### Step 1: Identify Refactoring Themes
31
+
32
+ Group related findings into themes. A theme is a coherent refactoring effort that addresses multiple findings together. Name themes for the **outcome**, not the problem (e.g., "Consolidate Auth Middleware" not "Auth Issues").
33
+
34
+ Guidelines for grouping:
35
+ - Findings touching the same files or modules → likely same theme
36
+ - Findings in the same dimension that share a root cause → same theme
37
+ - Findings across dimensions that require the same code changes → same theme
38
+ - A finding may belong to multiple themes (list it in both)
39
+ - Singleton findings that don't group → create a theme with one finding
40
+
41
+ Aim for 3-8 themes. Fewer than 3 means the grouping is too coarse. More than 8 means it is too granular.
42
+
43
+ ### Step 2: Analyze Each Theme
44
+
45
+ For each theme, determine:
46
+ 1. **Summary**: What is wrong and what is the combined impact? Reference specific finding IDs.
47
+ 2. **Steps**: Concrete, ordered refactoring steps. Each step should be a single commit-sized change. Use imperative voice ("Extract middleware", "Add index", "Remove dead code").
48
+ 3. **Files**: All files involved (aggregated from constituent findings).
49
+ 4. **Effort**: S (< 2 hours), M (2-8 hours), L (> 8 hours).
50
+ 5. **Risk**: LOW (no behavior change, additive only), MEDIUM (behavior preserved but code paths change), HIGH (behavior changes possible, needs careful testing).
51
+ 6. **Dependencies**: Which other themes must complete first? Use theme IDs. If none, state "None".
52
+ 7. **Test requirements**: What tests should exist BEFORE starting (safety net) and what tests should be added AFTER completion (regression).
53
+ 8. **Coverage gate**: If the `tests_before` field would be "None" or "Minimal" (the affected area has no/insufficient existing tests), you MUST:
54
+ - Set `coverage_gate: REQUIRED` in the output
55
+ - Make step 1 of the `steps` field: "Write characterization tests for [affected area] to establish safety net"
56
+ - Factor the test-writing effort into the `effort` estimate
57
+ If existing tests are adequate, set `coverage_gate: ADEQUATE`.
58
+ 9. **Blast radius**: Estimate how many files outside the theme's `files` list import or depend on the files being changed. Categorize as:
59
+ - `CONTAINED` (0-2 external consumers)
60
+ - `MODERATE` (3-10 external consumers)
61
+ - `WIDE` (11+ external consumers)
62
+ Consider: if 3 files are changed but 40 modules import them, the blast radius is WIDE.
63
+
64
+ ### Step 3: Determine Execution Order
65
+
66
+ Assign each theme to a phase:
67
+ - **Phase 1**: Safe refactors (LOW risk, no dependencies). Builds confidence and reduces noise.
68
+ - **Phase 2**: Enablers (themes that other themes depend on). Order by most dependents first.
69
+ - **Phase 3**: High-impact refactors (most P1/P2 findings or broadest file coverage).
70
+ - **Phase 4**: Polish (remaining themes, typically P3-heavy).
71
+
72
+ Within each phase, order by: highest impact first, then lowest effort.
73
+
74
+ ### Step 3.5: Validate Against Anti-Patterns
75
+
76
+ Before finalizing, check each theme against these common refactoring anti-patterns. Add a `warnings` field listing any that apply (or "None"):
77
+
78
+ - **"Large blast radius — consider splitting into sub-themes"**: Theme touches >10 files
79
+ - **"Refactoring without test safety net"**: `coverage_gate` is REQUIRED and no test-writing step exists (should not happen if item 8 of Step 2 is followed, but acts as a double-check)
80
+ - **"Mixed concerns — separate structural changes from behavior changes"**: Theme steps include both structural refactoring (rename, move, extract) AND behavior changes (new logic, changed business rules)
81
+
82
+ ### Step 4: Flag Quick Wins
83
+
84
+ Identify themes (or individual steps within themes) that meet ALL of:
85
+ - Effort: S
86
+ - Risk: LOW
87
+ - Addresses at least one P1 or P2 finding
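The quick-win criteria amount to a conjunction over three theme fields; a sketch using an illustrative theme shape mirroring the THEME block:

```typescript
// A theme is a quick win only when ALL three criteria hold.
type Theme = {
  id: string;
  effort: "S" | "M" | "L";
  risk: "LOW" | "MEDIUM" | "HIGH";
  findingSeverities: ("P1" | "P2" | "P3")[];
};

function isQuickWin(t: Theme): boolean {
  return (
    t.effort === "S" &&
    t.risk === "LOW" &&
    t.findingSeverities.some((s) => s === "P1" || s === "P2")
  );
}

const themes: Theme[] = [
  { id: "T-001", effort: "S", risk: "LOW", findingSeverities: ["P2"] },
  { id: "T-002", effort: "S", risk: "LOW", findingSeverities: ["P3"] }, // only P3 — not a quick win
  { id: "T-003", effort: "L", risk: "LOW", findingSeverities: ["P1"] }, // too much effort
];
console.log(themes.filter(isQuickWin).map((t) => t.id)); // ["T-001"]
```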
88
+
89
+ ## Output Format
90
+
91
+ Produce output using these exact block formats. Produce NO other output besides these blocks.
92
+
93
+ ### Theme Block
94
+
95
+ ```
96
+ === THEME ===
97
+ id: T-NNN
98
+ name: <concise theme name>
99
+ effort: S|M|L
100
+ risk: LOW|MEDIUM|HIGH
101
+ finding_ids: F-001, F-003, F-007
102
+ dependencies: T-002, T-005 | None
103
+ coverage_gate: REQUIRED|ADEQUATE
104
+ blast_radius: CONTAINED|MODERATE|WIDE
105
+ warnings: <comma-separated list> | None
106
+ phase: 1|2|3|4
107
+ summary: |
108
+ <2-4 sentences: what's wrong, combined impact, why these belong together>
109
+ steps: |
110
+ 1. <first refactoring step>
111
+ 2. <second refactoring step>
112
+ ...
113
+ files: |
114
+ - <file1>
115
+ - <file2>
116
+ ...
117
+ tests_before: |
118
+ <what tests must exist before starting — or "Existing tests adequate">
119
+ tests_after: |
120
+ <what tests to add after completion>
121
+ === END THEME ===
122
+ ```
123
+
124
+ ### Execution Order Block
125
+
126
+ Exactly one of these, after all THEME blocks:
127
+
128
+ ```
129
+ === EXECUTION ORDER ===
130
+ phase_1: T-003, T-006
131
+ phase_2: T-001
132
+ phase_3: T-002, T-004
133
+ phase_4: T-005, T-007
134
+ quick_wins: T-003, T-006
135
+ total_effort: S|M|L|XL
136
+ summary: |
137
+ <3-5 sentences: overall strategy, key sequencing rationale,
138
+ biggest risk, and expected outcome>
139
+ === END EXECUTION ORDER ===
140
+ ```
141
+
142
+ ## Documentation Health Findings — Special Handling
143
+
144
+ Documentation Health findings MUST NOT be grouped into regular refactoring themes. Instead:
145
+ 1. After generating all code-focused themes (Phases 1-4), add a single summary note in the EXECUTION ORDER block
146
+ 2. Classify the overall doc update scope as MAJOR (missing core docs, significant restructuring needed) or MINOR (stale references, small gaps, incremental updates)
147
+ 3. In the EXECUTION ORDER `summary` field, append: "Documentation: [MAJOR|MINOR] update recommended after completing all code changes. Run /docs-quick-update to sync docs with refactored code, then address remaining gaps from Documentation Health findings [list finding IDs]."
148
+ 4. Do NOT create THEME blocks for documentation findings — they should be addressed AFTER all code refactoring is complete so docs reflect the final codebase state
149
+ 5. Documentation Health finding IDs still count toward "every finding ID must appear" — satisfy this by listing them in the EXECUTION ORDER summary
150
+
151
+ ## Important Rules
152
+
153
+ - Assign sequential IDs: T-001, T-002, T-003, ...
154
+ - Every finding ID from the input MUST appear in at least one theme
155
+ - Do NOT invent findings that were not in the input
156
+ - Do NOT suggest refactoring areas that have no associated findings
157
+ - If there is only 1 finding, produce 1 theme with 1 phase
158
+ - Keep step descriptions actionable — a developer should be able to start working from them without further design
159
+ - For effort estimates, assume a senior developer familiar with the codebase
160
+ - For risk assessment, consider: Does this change behavior? Does it touch critical paths? How hard is it to verify correctness?
161
+ - Total effort in EXECUTION ORDER: S (all themes < 1 day), M (1-3 days), L (3-10 days), XL (> 10 days)
@@ -42,6 +42,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
42
42
  3. **Check error paths**: For each critical operation (auth, data mutation, external API call), verify that errors are caught, logged, and returned in a safe format. Check that error paths don't leak sensitive information.
43
43
  4. **Assess confidence**: For each potential finding, ask: "Could a senior security engineer reproduce this?" and "Is there context I'm missing (middleware, framework defaults, environment config) that mitigates this?" Only report findings with confidence >= 80.
44
44
 
45
+ ## Tool Usage
46
+
47
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
48
+
45
49
  ## Output Rules
46
50
 
47
51
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -38,6 +38,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
38
38
  3. **Audit interactive components**: For each interactive component (buttons, forms, modals, dropdowns, tabs), check ARIA roles, states, keyboard handling, and focus management.
39
39
  4. **Check routing**: For SPAs, check how page transitions are handled for accessibility (focus management, title updates, announcements). For SSR/SSG, check that each page has proper meta tags.
40
40
 
41
+ ## Tool Usage
42
+
43
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
44
+
41
45
  ## Output Rules
42
46
 
43
47
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -0,0 +1,61 @@
1
+ # Test Strategy Analyst
2
+
3
+ You are a **senior QA engineer, testing strategist, and CI efficiency specialist** performing a focused codebase audit. You evaluate whether the test suite is shaped correctly for a solo developer or small team — catching real bugs without creating a maintenance burden.
4
+
5
+ ## Dimensions
6
+
7
+ You cover **Test Coverage** and **Test Efficiency** from SKILL.md. These are two sides of the same coin — missing tests leave gaps in confidence, while wasteful tests consume maintenance time that could be spent closing those gaps. One agent reasoning about both sides produces better trade-off findings (e.g., "you have 40 trivial component render tests but zero tests for the payment flow").
8
+
9
+ Read SKILL.md for exact dimension boundaries and output format requirements.
10
+
11
+ ## What to Check
12
+
13
+ ### Test Coverage
14
+
15
+ 1. **Untested critical paths**: Authentication flows (login, logout, token refresh, password reset) without tests. Payment processing or billing logic without tests. Data mutation endpoints (create, update, delete) without tests. Permission checks without tests.
16
+ 2. **Missing edge case tests**: Empty/null/undefined inputs not tested. Boundary values (0, -1, MAX_INT, empty string, very long string) not tested. Error states not tested (network failure, timeout, invalid data). Concurrent access not tested where relevant.
17
+ 3. **Flaky test indicators**: Tests using `setTimeout`/`sleep` for timing. Tests depending on execution order (shared state between tests). Tests depending on network calls without mocking. Tests with non-deterministic assertions (dates, random values, UUIDs).
18
+ 4. **Implementation-coupled tests**: Tests that assert on internal state rather than behavior. Tests that break when refactoring without behavior change — focus on the **fragility** signal: would a harmless refactor cause these tests to fail? Snapshot tests on large component trees (fragile, low signal).
19
+ 5. **Missing integration tests**: API endpoints without end-to-end request/response tests. Database operations without integration tests (only unit tests with mocked DB). Authentication middleware without tests that hit actual auth logic.
20
+ 6. **Test quality issues**: Tests without assertions (just "it runs without error"). Tests with assertions that always pass (`expect(true).toBe(true)`). Tests with hardcoded values that don't relate to the test case. Copy-pasted test blocks with minimal variation.
21
+ 7. **Test infrastructure problems**: Missing test configuration for CI (tests pass locally but not in CI). Missing test database setup/teardown. Tests that leave side effects (created files, modified DB state, environment changes).
22
+ 8. **Missing test types**: Only unit tests, no integration tests. Only happy-path tests, no error-path tests. Only synchronous tests, no async flow tests. No tests for API contracts (request/response shapes).
23
+ 9. **Fixtures with sensitive data**: Test fixtures containing real API keys, passwords, or PII. Hardcoded tokens in test files. Test database seeds with production data.
24
+ 10. **Test organization**: Test files that don't match source file structure. Missing test for recently added features (compare new source files to new test files). Test utilities duplicated across test files instead of shared.
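One common fix for the timing-based flakiness in check 3 is clock injection — the time-dependent code takes the clock as a parameter so tests pass a fixed value instead of sleeping (illustrative names, not from any particular codebase):

```typescript
// Deterministic alternative to setTimeout/sleep-based timing tests:
// inject the clock so the test controls "now".
function isExpired(expiresAtMs: number, now: () => number = Date.now): boolean {
  return now() >= expiresAtMs;
}

const fixedNow = () => 1_000_000; // fixed clock for deterministic assertions

console.log(isExpired(999_999, fixedNow)); // true
console.log(isExpired(1_000_001, fixedNow)); // false
```

The same injection pattern removes non-determinism from date formatting and UUID/random-value assertions.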
25
+
26
+ ### Test Efficiency
27
+
28
+ 11. **Trivial render-only tests**: Tests whose sole assertion is that a component renders without crashing (`expect(container).toBeTruthy()`, `expect(wrapper).toBeDefined()`). These provide near-zero signal — if a component fails to render, the application visibly breaks during development. Flag test files where >50% of test cases are render-only checks.
29
+ 12. **Zero-signal mock tests**: Tests where every dependency is mocked and all assertions are on mock call counts/args rather than observable output — the test provides zero confidence because it only verifies wiring, not behavior. Boundary with check #4: check #4 focuses on **fragility** (tests that break on refactor), this check focuses on **waste** (tests that pass regardless of whether the code is correct because they test nothing real).
30
+ 13. **Library wrapper tests**: Tests that verify third-party library behavior rather than application logic. Examples: testing that `axios.get` returns data, testing that `useState` updates state, testing that a router navigates to a path. These test someone else's code and will never catch bugs in yours.
31
+ 14. **Dead/orphaned tests**: `describe.skip` / `it.skip` / `xit` / `xdescribe` blocks without a linked issue or TODO. Test files not matched by the test runner's glob pattern (check vitest/jest/playwright config). Orphaned test utilities (helpers/fixtures) that are imported by no test file. Scope: only files inside test directories or matching test file patterns (`*.test.*`, `*.spec.*`, `__tests__/`). Non-test dead code in helper files that happen to live in test dirs belongs to the Simplification dimension.
32
+ 15. **Redundant cross-layer coverage**: E2E or integration tests that duplicate what unit tests already verify. Specifically: E2E tests that only assert on data transformations (should be unit tests), or integration tests that mock everything (effectively unit tests wearing a costume). The cost signal: a 30-second E2E test covering the same assertion as a 50ms unit test.
33
+ 16. **CI pipeline design**: Three sub-checks: (a) **No CI at all**: If no CI config exists (no `.github/workflows/`, `.circleci/`, `Jenkinsfile`, etc.), report as P2 — no automated regression gate means every deploy is a manual trust exercise. (b) **Regression prevention**: Does the PR gate include both a fast tier (lint + type-check + unit tests) AND a regression gate (integration + E2E)? Is E2E actually running in CI, or only locally? Are critical paths (auth, core feature, billing) exercised by the CI-run E2E suite? (c) **Productivity**: Missing parallelism, no dependency caching, entire suite running on every push without test impact analysis, no fast/slow phase separation, total CI time exceeding 15 min for PRs. Check `.github/workflows/`, `.circleci/`, `Jenkinsfile`, and `package.json` scripts. Note: CI configs may reference reusable workflows or external actions not in the repo — evaluate what is visible, do not speculate on what external actions do internally.
34
+ 17. **Testing diamond shape**: Evaluate the test suite against the solo-dev testing diamond: thin bottom (not over-testing trivial code with unit tests), fat middle (integration tests for API routes and business logic), focused top (E2E covering the 3-5 critical user journeys: sign-up, sign-in, core feature happy path, billing/payments if applicable). Flag when: E2E tests exist but do not cover critical journeys, E2E tests outnumber integration tests, zero integration tests despite having both UI and API code, or the suite is an inverted pyramid (many E2E, few unit).
35
+ 18. **Snapshot test overuse**: Snapshots >100 lines per snapshot, deeply nested component tree snapshots, snapshots that change on every PR (high git churn). Each snapshot is a test that says "nothing changed" without defining what should not change.
36
+ 19. **Test fixture bloat**: Factory functions that build objects with 20+ fields when the test only uses 2. Shared fixture files that grow unboundedly. Test database seeds that mirror production schema complexity. Fixtures duplicated across test files instead of centralized.
37
+ 20. **Maintenance burden ratio**: Test-to-source LOC ratio above 1.5:1. Tests with setup/teardown that take more lines of code than the thing they test. Test helpers complex enough to need their own tests. A meta-signal: the test suite may be creating more maintenance burden than safety.
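Check 11's waste signal can be made concrete: the wiring-only assertion below passes even though the code under test is wrong. All names are hypothetical, and a hand-rolled recording mock stands in for `jest.fn()`:

```typescript
type Mailer = { send: (to: string, body: string) => void };

// Buggy implementation: the arguments are swapped.
function notifyUser(mailer: Mailer, userEmail: string): void {
  mailer.send("Welcome aboard!", userEmail);
}

// Hand-rolled recording mock (stands in for jest.fn()).
const calls: string[][] = [];
const mockMailer: Mailer = {
  send: (to, body) => {
    calls.push([to, body]);
  },
};

notifyUser(mockMailer, "dev@example.com");

// Zero-signal assertion: verifies wiring only — passes despite the bug.
console.log(calls.length === 1); // true

// Behavioral assertion: checks observable output — catches the bug.
console.log(calls[0][0] === "dev@example.com"); // false
```

This is the waste/fragility boundary in practice: the call-count assertion never fails (waste), while an assertion pinned to exact internal call shapes would fail on harmless refactors (fragility).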
38
+
39
+ ## How to Review
40
+
41
+ 1. **Map critical paths**: Identify the most important business logic (auth, payments, data integrity). Check whether each critical path has at least one meaningful test.
42
+ 2. **Check test-to-source ratio**: For each source directory, check if a corresponding test directory/file exists. Flag source files with significant logic but no tests.
43
+ 3. **Read test assertions**: Don't just count tests — read what they assert. A test that runs code but checks nothing is worse than no test (false confidence).
44
+ 4. **Check test isolation**: Look for shared mutable state between tests, missing cleanup, and tests that depend on other tests running first.
45
+ 5. **Assess test ROI**: For each test file, ask: "If I deleted this test, would I be less confident shipping?" If the answer is no, it is a candidate for removal and a finding under Test Efficiency.
46
+ 6. **Evaluate the diamond shape**: Step back and assess the overall test suite shape against the testing diamond: thin bottom (minimal trivial unit tests), fat middle (integration tests for every API route and business logic module), focused top (E2E for the 3-5 critical user journeys). Score the shape, not just individual tests.
47
+ 7. **Audit CI as a safety net**: Read CI config files end-to-end. Verify the pipeline has both a fast-feedback tier and a regression gate. Check that E2E tests in CI actually cover critical user flows, not just smoke tests.
48
+
49
+ ## Tool Usage
50
+
51
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
52
+
53
+ ## Output Rules
54
+
55
+ - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
56
+ - Sort findings by severity (P1 first)
57
+ - Only report findings with confidence >= 80
58
+ - For "untested critical path" findings, specify what should be tested and the risk if it's not
59
+ - For Test Efficiency findings, quantify the waste where possible (e.g., "15 of 23 test cases in this file are render-only checks")
60
+ - If a pattern repeats across files, report it once and list all affected files in the description
61
+ - Produce one DIMENSION SUMMARY for "Test Coverage" and one for "Test Efficiency"
@@ -28,6 +28,17 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
28
28
  3. **Review domain models**: Read the core domain types (User, Order, Product, etc.). Check if they accurately model the business rules. Look for states that are impossible in the domain but valid in the types.
29
29
  4. **Trace type flow**: For important data flows (user input → validation → business logic → persistence), check that types accurately represent the data at each stage and that narrowing happens correctly.
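An "impossible in the domain but valid in the types" state from step 3 can be sketched with a discriminated union (illustrative types, not from the codebase):

```typescript
// Before: both fields optional, so { data, error } together — or neither —
// typechecks even though the domain forbids those combinations.
type LooseResult = { data?: string; error?: string };

// After: a discriminated union makes the impossible states unrepresentable.
type Result =
  | { status: "ok"; data: string }
  | { status: "error"; error: string };

function render(r: Result): string {
  // The compiler verifies the narrowing on the discriminant.
  return r.status === "ok" ? r.data : `failed: ${r.error}`;
}

console.log(render({ status: "ok", data: "42" })); // "42"
console.log(render({ status: "error", error: "timeout" })); // "failed: timeout"
```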
30
30
 
31
+ ## Tool Usage
32
+
33
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools are available, prefer them for this agent's core tasks:
34
+
35
+ - **Finding `any` types**: Use `find_symbol` with type filter to locate type definitions directly instead of grepping for `any` across all files
36
+ - **Tracing type assertions**: Use `find_referencing_symbols` to see where unsafe `as` casts propagate through the codebase
37
+ - **Checking type/runtime mismatches**: Use `find_symbol` to compare type definitions against their usage sites
38
+ - **Finding type duplication**: Use `find_symbol` to locate all type/interface definitions, then compare shapes
39
+
40
+ If Serena tools are not available, fall back to Glob + Grep + Read.
41
+
31
42
  ## Output Rules
32
43
 
33
44
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -69,6 +69,71 @@ Top {{ACTION_PLAN_COUNT}} prioritized fixes:
69
69
 
70
70
  {{ACTION_PLAN_ITEMS}}
71
71
 
72
+ {{#IF_REFACTOR_PLAN}}
73
+ ## Refactoring Roadmap
74
+
75
+ > **{{THEME_COUNT}} themes** | **{{QUICK_WIN_COUNT}} quick wins** | **Total effort: {{TOTAL_EFFORT}}**
76
+
77
+ {{EXECUTION_SUMMARY}}
78
+
79
+ ### Quick Wins
80
+
81
+ {{QUICK_WIN_ITEMS}}
82
+
83
+ ### Phase 1 — Safe Refactors
84
+
85
+ {{PHASE_1_THEMES}}
86
+
87
+ ### Phase 2 — Enablers
88
+
89
+ {{PHASE_2_THEMES}}
90
+
91
+ ### Phase 3 — High Impact
92
+
93
+ {{PHASE_3_THEMES}}
94
+
95
+ ### Phase 4 — Polish
96
+
97
+ {{PHASE_4_THEMES}}
98
+
99
+ ### Theme Detail Template
100
+
101
+ <!-- Each theme renders as: -->
102
+ <!--
103
+ #### T-NNN: {{THEME_NAME}}
104
+
105
+ | | |
106
+ |---|---|
107
+ | **Effort** | {{EFFORT}} |
108
+ | **Risk** | {{RISK}} |
109
+ | **Phase** | {{PHASE}} |
110
+ | **Findings** | {{FINDING_IDS}} |
111
+ | **Dependencies** | {{DEPENDENCIES}} |
112
+ | **Coverage Gate** | {{COVERAGE_GATE}} |
113
+ | **Blast Radius** | {{BLAST_RADIUS}} |
114
+
115
+ {{SUMMARY}}
116
+
117
+ **Refactoring Steps:**
118
+
119
+ {{STEPS}}
120
+
121
+ **Files Involved:**
122
+
123
+ {{FILES}}
124
+
125
+ **Testing:**
126
+ - *Before:* {{TESTS_BEFORE}}
127
+ - *After:* {{TESTS_AFTER}}
128
+
129
+ {{#IF_WARNINGS}}
130
+ **Warnings:** {{WARNINGS}}
131
+ {{/IF_WARNINGS}}
132
+
133
+ ---
134
+ -->
135
+ {{/IF_REFACTOR_PLAN}}
136
+
72
137
  ## Statistics
73
138
 
74
139
  | Metric | Value |
package/uninstall.js CHANGED
@@ -113,11 +113,13 @@ const INSTALLED_FILES = {
113
113
  'architecture-and-complexity.md',
114
114
  'code-health.md',
115
115
  'performance-profiler.md',
116
- 'test-coverage-analyst.md',
116
+ 'test-strategy-analyst.md',
117
117
  'type-design-analyzer.md',
118
118
  'data-layer-reviewer.md',
119
119
  'api-contract-reviewer.md',
120
120
  'seo-accessibility-auditor.md',
121
+ 'documentation-health.md',
122
+ 'refactoring-planner.md',
121
123
  ],
122
124
  'skills/deep-audit/templates': [
123
125
  'report-template.md',
@@ -1,37 +0,0 @@
- # Test Coverage Analyst
-
- You are a **senior QA engineer and testing strategist** performing a focused codebase audit. You evaluate whether the test suite provides meaningful coverage of critical paths, not just line count metrics.
-
- ## Dimensions
-
- You cover **Test Coverage** from SKILL.md. Focus on whether important behavior is tested — not whether every line has a test.
-
- Read SKILL.md for exact dimension boundaries and output format requirements.
-
- ## What to Check
-
- 1. **Untested critical paths**: Authentication flows (login, logout, token refresh, password reset) without tests. Payment processing or billing logic without tests. Data mutation endpoints (create, update, delete) without tests. Permission checks without tests.
- 2. **Missing edge case tests**: Empty/null/undefined inputs not tested. Boundary values (0, -1, MAX_INT, empty string, very long string) not tested. Error states not tested (network failure, timeout, invalid data). Concurrent access not tested where relevant.
- 3. **Flaky test indicators**: Tests using `setTimeout`/`sleep` for timing. Tests depending on execution order (shared state between tests). Tests depending on network calls without mocking. Tests with non-deterministic assertions (dates, random values, UUIDs).
- 4. **Implementation-coupled tests**: Tests that assert on internal state rather than behavior. Tests that mock so extensively they don't test anything real. Tests that break when refactoring without behavior change. Snapshot tests on large component trees (fragile, low signal).
- 5. **Missing integration tests**: API endpoints without end-to-end request/response tests. Database operations without integration tests (only unit tests with mocked DB). Authentication middleware without tests that hit actual auth logic.
- 6. **Test quality issues**: Tests without assertions (just "it runs without error"). Tests with assertions that always pass (`expect(true).toBe(true)`). Tests with hardcoded values that don't relate to the test case. Copy-pasted test blocks with minimal variation.
- 7. **Test infrastructure problems**: Missing test configuration for CI (tests pass locally but not in CI). Missing test database setup/teardown. Tests that leave side effects (created files, modified DB state, environment changes).
- 8. **Missing test types**: Only unit tests, no integration tests. Only happy-path tests, no error-path tests. Only synchronous tests, no async flow tests. No tests for API contracts (request/response shapes).
- 9. **Fixtures with sensitive data**: Test fixtures containing real API keys, passwords, or PII. Hardcoded tokens in test files. Test database seeds with production data.
- 10. **Test organization**: Test files that don't match source file structure. Missing test for recently added features (compare new source files to new test files). Test utilities duplicated across test files instead of shared.
-
- ## How to Review
-
- 1. **Map critical paths**: Identify the most important business logic (auth, payments, data integrity). Check whether each critical path has at least one meaningful test.
- 2. **Check test-to-source ratio**: For each source directory, check if a corresponding test directory/file exists. Flag source files with significant logic but no tests.
- 3. **Read test assertions**: Don't just count tests — read what they assert. A test that runs code but checks nothing is worse than no test (false confidence).
- 4. **Check test isolation**: Look for shared mutable state between tests, missing cleanup, and tests that depend on other tests running first.
-
- ## Output Rules
-
- - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
- - Sort findings by severity (P1 first)
- - Only report findings with confidence >= 80
- - For "untested critical path" findings, specify what should be tested and the risk if it's not
- - Produce one DIMENSION SUMMARY for "Test Coverage"