@torka/claude-workflows 0.13.1 → 0.13.2

package/README.md CHANGED
@@ -43,6 +43,36 @@ After installation, try running one of the commands to test:
43
43
  /designer-founder
44
44
  ```
45
45
 
46
+ ### Auto-Format on Edit (optional)
47
+
48
+ Auto-run linters and formatters after Claude edits files. Add to your project's `.claude/settings.local.json`:
49
+
50
+ ```json
51
+ {
52
+ "hooks": {
53
+ "PostToolUse": [
54
+ {
55
+ "matcher": "Edit|MultiEdit",
56
+ "hooks": [
57
+ {
58
+ "type": "command",
59
+ "command": "if [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.js || \"$CLAUDE_TOOL_FILE_PATH\" == *.ts || \"$CLAUDE_TOOL_FILE_PATH\" == *.jsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.tsx ]]; then npx eslint \"$CLAUDE_TOOL_FILE_PATH\" --fix 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.py ]]; then pylint \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; fi"
60
+ },
61
+ {
62
+ "type": "command",
63
+ "command": "if [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.js || \"$CLAUDE_TOOL_FILE_PATH\" == *.ts || \"$CLAUDE_TOOL_FILE_PATH\" == *.jsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.tsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.json || \"$CLAUDE_TOOL_FILE_PATH\" == *.css || \"$CLAUDE_TOOL_FILE_PATH\" == *.html ]]; then npx prettier --write \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.py ]]; then black \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.go ]]; then gofmt -w \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.rs ]]; then rustfmt \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.php ]]; then php-cs-fixer fix \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; fi"
64
+ }
65
+ ]
66
+ }
67
+ ]
68
+ }
69
+ }
70
+ ```
71
+
72
+ **Supported languages:** JavaScript/TypeScript (ESLint + Prettier), Python (Pylint + Black), Go (gofmt), Rust (rustfmt), PHP (php-cs-fixer)
73
+
74
+ > **Note:** This is a project-level setting — formatter choice varies per project. Commands fail silently (`|| true`) if tools aren't installed.
75
+
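Before relying on the hook, you can sanity-check the extension dispatch by hand with a made-up path. The snippet below is an illustration only: the path is hypothetical, the `case` form is a portable stand-in for the hook's bash `[[ ]]` tests, and nothing is actually formatted (only `echo` runs):

```shell
# Dry run of the hook's extension dispatch with a hypothetical file path.
# In a real hook, Claude Code sets CLAUDE_TOOL_FILE_PATH; here we set it by hand.
CLAUDE_TOOL_FILE_PATH="src/example.ts"
case "$CLAUDE_TOOL_FILE_PATH" in
  *.js|*.ts|*.jsx|*.tsx) echo "would run: npx prettier --write $CLAUDE_TOOL_FILE_PATH" ;;
  *.py)                  echo "would run: black $CLAUDE_TOOL_FILE_PATH" ;;
  *.go)                  echo "would run: gofmt -w $CLAUDE_TOOL_FILE_PATH" ;;
  *)                     echo "no formatter matched" ;;
esac
```

If the wrong branch fires for a given extension, fix the pattern here first, then copy the corrected condition back into the hook command.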
46
76
  ## Usage
47
77
 
48
78
  ### Commands
@@ -8,12 +8,15 @@ Comprehensive multi-agent codebase audit. Spawns parallel review agents across m
8
8
 
9
9
  **Usage:**
10
10
  ```
11
- /deep-audit # Quick mode (3 agents) on full project
12
- /deep-audit --full # Full mode (9 agents) on full project
13
- /deep-audit --pr 42 # Audit a specific PR diff
14
- /deep-audit --since abc123f # Audit changes since a commit hash
15
- /deep-audit --since 2025-01-15 # Audit changes since a date
16
- /deep-audit --full --pr 42 # Full mode on a specific PR
11
+ /deep-audit # Quick mode + auto refactoring plan
12
+ /deep-audit --full # Full mode + auto refactoring plan
13
+ /deep-audit --review-before-plan # Pause after findings, ask before plan
14
+ /deep-audit --pr 42 # Audit a specific PR diff
15
+ /deep-audit --since abc123f # Audit changes since a commit hash
16
+ /deep-audit --since 2025-01-15 # Audit changes since a date
17
+ /deep-audit --full --pr 42 # Full mode on a specific PR
18
+ /deep-audit --agent security-and-error-handling # Run only one agent
19
+ /deep-audit --agent performance-profiler --pr 42 # Single agent on a PR
17
20
  ```
18
21
 
19
22
  <workflow CRITICAL="TRUE">
@@ -25,10 +28,20 @@ IT IS CRITICAL THAT YOU FOLLOW THIS WORKFLOW EXACTLY.
25
28
  Parse `$ARGUMENTS` to determine:
26
29
 
27
30
  1. **Mode**: Check for `--full` flag
28
- - If `--full` present → `mode = "full"` (9 agents)
31
+ - If `--full` present → `mode = "full"` (10 agents)
29
32
  - Otherwise → `mode = "quick"` (3 agents)
30
33
 
31
- 2. **Scope**: Check for scope flags (mutually exclusive)
34
+ 2. **Single Agent**: Check for `--agent <name>` flag
35
+ - If `--agent <name>` present → `single_agent = "<name>"`
36
+ - Otherwise → `single_agent = null`
37
+ - If both `--full` and `--agent` are present: warn that `--full` is ignored in single-agent mode and set `mode = "single"`
38
+ - If `--agent` is present without `--full`: set `mode = "single"`
39
+
40
+ 3. **Review Before Plan**: Check for `--review-before-plan` flag
41
+ - If `--review-before-plan` present → `review_before_plan = true`
42
+ - Otherwise → `review_before_plan = false`
43
+
44
+ 4. **Scope**: Check for scope flags (mutually exclusive)
32
45
  - `--pr <number>` → `scope = "pr"`, `scope_value = <number>`
33
46
  - `--since <value>` → detect format:
34
47
  - If matches date pattern (YYYY-MM-DD) → `scope = "since-date"`, `scope_value = <date>`
@@ -116,29 +129,52 @@ Read the SKILL.md file at the path relative to this command:
116
129
  skills/deep-audit/SKILL.md
117
130
  ```
118
131
 
119
- From SKILL.md, build the agent list based on mode:
132
+ From SKILL.md, build the complete agent roster (used for both mode selection and `--agent` validation):
120
133
 
121
134
  **Quick mode agents:**
122
135
  ```
123
- agents = [
136
+ quick_agents = [
124
137
  { file: "security-and-error-handling.md", model: "opus", dimensions: ["Security", "Error Handling"] },
125
138
  { file: "architecture-and-complexity.md", model: "opus", dimensions: ["Architecture", "Simplification"] },
126
139
  { file: "code-health.md", model: "sonnet", dimensions: ["AI Slop Detection", "Dependency Health"] }
127
140
  ]
128
141
  ```
129
142
 
130
- **Full mode** add these to the quick list:
143
+ **Full mode additional agents:**
131
144
  ```
132
- additional_agents = [
145
+ full_agents = [
133
146
  { file: "performance-profiler.md", model: "sonnet", dimensions: ["Performance"] },
134
- { file: "test-coverage-analyst.md", model: "sonnet", dimensions: ["Test Coverage"] },
147
+ { file: "test-strategy-analyst.md", model: "opus", dimensions: ["Test Coverage", "Test Efficiency"] },
135
148
  { file: "type-design-analyzer.md", model: "sonnet", dimensions: ["Type Design"] },
136
149
  { file: "data-layer-reviewer.md", model: "opus", dimensions: ["Data Layer & Database"] },
137
150
  { file: "api-contract-reviewer.md", model: "sonnet", dimensions: ["API Contracts & Interface Consistency"] },
138
- { file: "seo-accessibility-auditor.md", model: "sonnet", dimensions: ["SEO & Accessibility"] }
151
+ { file: "seo-accessibility-auditor.md", model: "sonnet", dimensions: ["SEO & Accessibility"] },
152
+ { file: "documentation-health.md", model: "sonnet", dimensions: ["Documentation Health"] }
139
153
  ]
140
154
  ```
141
155
 
156
+ **Build the active agent list based on mode:**
157
+
158
+ - **If `mode = "single"`**: Search both `quick_agents` and `full_agents` for an agent whose filename (without `.md`) matches `single_agent`. If found → `agents = [matched_agent]`. If NOT found → print the error below and **STOP** (do not proceed):
159
+ ```
160
+ Unknown agent: "<single_agent>"
161
+
162
+ Available agents:
163
+ Quick mode: security-and-error-handling, architecture-and-complexity, code-health
164
+ Full mode: performance-profiler, test-strategy-analyst, type-design-analyzer,
165
+ data-layer-reviewer, api-contract-reviewer, seo-accessibility-auditor,
166
+ documentation-health
167
+ ```
168
+ - **If `mode = "quick"`**: `agents = quick_agents`
169
+ - **If `mode = "full"`**: `agents = quick_agents + full_agents`
170
+
171
+ **Refactoring planner** — added when mode is NOT "single" (runs separately in Phase 6, NOT in Phase 4):
172
+ ```
173
+ planner_agent = { file: "refactoring-planner.md", model: "opus", dimensions: ["Refactoring"] }
174
+ ```
175
+
176
+ Include this agent in `state.agents` for tracking when mode is not "single", but do NOT spawn it in Phase 4.
177
+
142
178
  If resuming: filter `agents` to only those with status "pending" in `previous_state.agents`.
143
179
 
144
180
  ---
@@ -152,9 +188,10 @@ Write `_bmad-output/deep-audit/state.json`:
152
188
  ```json
153
189
  {
154
190
  "status": "in_progress",
155
- "mode": "<quick|full>",
191
+ "mode": "<quick|full|single>",
156
192
  "scope": "<scope type>",
157
193
  "scope_value": "<scope value or null>",
194
+ "review_before_plan": false,
158
195
  "start_commit": "<current_commit>",
159
196
  "start_time": "<ISO timestamp>",
160
197
  "detected_stack": "<detected_stack>",
@@ -170,10 +207,13 @@ Write `_bmad-output/deep-audit/state.json`:
170
207
  }
171
208
  },
172
209
  "findings": [],
210
+ "refactoring_plan": null,
173
211
  "report_path": null
174
212
  }
175
213
  ```
176
214
 
215
+ Note: The `agents` object includes the `refactoring-planner` entry with `status: "pending"`. It is tracked like all agents but spawned in Phase 6, not Phase 4.
216
+
177
217
  If resuming, merge pending agent statuses into the existing state (keep completed agents' data intact).
178
218
 
179
219
  Print status:
@@ -181,7 +221,7 @@ Print status:
181
221
  Deep Audit — <mode> mode
182
222
  Scope: <scope description>
183
223
  Stack: <detected_stack>
184
- Agents: <count> (<list of agent names>)
224
+ Agent(s): <count> (<list of agent names>)
185
225
  Commit: <short hash>
186
226
  ```
187
227
 
@@ -263,7 +303,88 @@ Deduplication: <original count> findings → <deduped count> findings (<removed
263
303
 
264
304
  ---
265
305
 
266
- ## Phase 6: Generate Report
306
+ ## Phase 6: Refactoring Planner
307
+
308
+ Skip this phase if `mode = "single"` (single-agent audits don't warrant cross-cutting refactoring plans). There is no planner agent in state.json to update.
309
+
310
+ Also skip this phase if the deduplicated findings count is 0. Set the planner agent status to "skipped" in state.json and continue.
311
+
312
+ ### Step 1: Confirm (if --review-before-plan)
313
+
314
+ If `review_before_plan` is true:
315
+
316
+ 1. Print a findings summary to the user:
317
+ ```
318
+ FINDINGS SUMMARY: X total (Y critical, Z important, W minor)
319
+
320
+ Top findings:
321
+ 1. F-001: <title> (P1)
322
+ 2. F-002: <title> (P1)
323
+ 3. F-003: <title> (P2)
324
+ ```
325
+
326
+ 2. Ask the user: **"Generate refactoring plan from these findings? (Y/n)"**
327
+
328
+ 3. If the user says no → set planner agent status to "skipped" in state.json, skip to Phase 7.
329
+
330
+ If `review_before_plan` is false, proceed directly to Step 2.
331
+
332
+ ### Step 2: Generate plan
333
+
334
+ 1. Serialize all deduplicated findings from Phase 5 into a single text block using the `=== FINDING ===` format. Include the assigned `id` field (F-001, etc.) so the planner can reference them.
335
+
336
+ 2. Read the agent prompt from `skills/deep-audit/agents/refactoring-planner.md`
337
+
338
+ 3. Spawn via Task tool (same pattern as Phase 4):
339
+ ```
340
+ Tool: Task
341
+ subagent_type: general-purpose
342
+ model: opus
343
+ description: "deep-audit: refactoring-planner"
344
+ prompt: |
345
+ <agent prompt content>
346
+
347
+ ---
348
+ ## Input Findings (injected by orchestrator)
349
+
350
+ <serialized findings payload>
351
+
352
+ ---
353
+ ## Output Format Reminder
354
+ You MUST produce output using the exact format defined above:
355
+ - === THEME === blocks for each refactoring theme
356
+ - Exactly one === EXECUTION ORDER === block at the end
357
+ Produce NO other output besides these blocks.
358
+ ```
359
+
360
+ 4. Parse the response:
361
+ - Extract all `=== THEME ===` blocks: id, name, effort, risk, finding_ids, dependencies, coverage_gate, blast_radius, warnings, phase, summary, steps, files, tests_before, tests_after
362
+ - Extract the single `=== EXECUTION ORDER ===` block: phase_1 through phase_4, quick_wins, total_effort, summary
363
+
364
+ 5. Store parsed data in `state.refactoring_plan`:
365
+ ```json
366
+ {
367
+ "themes": [ ...parsed theme objects... ],
368
+ "execution_order": { ...parsed execution order... }
369
+ }
370
+ ```
371
+
372
+ 6. Update planner agent status in `state.agents` to "completed" with timestamps and raw_output.
373
+
374
+ 7. Write updated state.json to disk.
375
+
376
+ If the planner agent fails (Task tool returns error):
377
+ - Set planner agent status to "failed" in state.json
378
+ - Log a warning but continue to Phase 7 (the report generates without the roadmap section)
379
+
380
+ Print progress:
381
+ ```
382
+ Refactoring Planner: <theme_count> themes (<quick_win_count> quick wins), total effort: <total_effort>
383
+ ```
384
+
385
+ ---
386
+
387
+ ## Phase 7: Generate Report
267
388
 
268
389
  Read the report template from:
269
390
  ```
@@ -303,7 +424,14 @@ Fill in the template:
303
424
 
304
425
  4. **Action Plan**: Select the top 5 findings (by severity, then confidence) and format as a numbered action list with a brief description of what to fix and why.
305
426
 
306
- 5. **Statistics**: Total findings, per-severity counts, agent count, dimension count, per-agent breakdown table.
427
+ 5. **Refactoring Roadmap** (only if `state.refactoring_plan` is not null):
428
+ - Fill the `{{#IF_REFACTOR_PLAN}}` conditional block
429
+ - Set `{{THEME_COUNT}}`, `{{QUICK_WIN_COUNT}}`, `{{TOTAL_EFFORT}}`, `{{EXECUTION_SUMMARY}}`
430
+ - Render `{{QUICK_WIN_ITEMS}}`: themes flagged as quick wins
431
+ - Render `{{PHASE_1_THEMES}}` through `{{PHASE_4_THEMES}}`: themes grouped by phase
432
+ - For each theme, render using the Theme Detail Template (see report-template.md)
433
+
434
+ 6. **Statistics**: Total findings, per-severity counts, agent count, dimension count, per-agent breakdown table.
307
435
 
308
436
  ### Write the report
309
437
 
@@ -317,7 +445,7 @@ Update state.json with `report_path`.
317
445
 
318
446
  ---
319
447
 
320
- ## Phase 7: Finalize State
448
+ ## Phase 8: Finalize State
321
449
 
322
450
  Update `_bmad-output/deep-audit/state.json`:
323
451
  - Set `status = "completed"`
@@ -326,7 +454,7 @@ Update `_bmad-output/deep-audit/state.json`:
326
454
 
327
455
  ---
328
456
 
329
- ## Phase 8: Present Summary
457
+ ## Phase 9: Present Summary
330
458
 
331
459
  Print a concise summary to the user:
332
460
 
@@ -335,7 +463,7 @@ Print a concise summary to the user:
335
463
  DEEP AUDIT COMPLETE
336
464
  ═══════════════════════════════════════════════════
337
465
 
338
- Mode: <quick|full> (<agent_count> agents)
466
+ Mode: <quick|full|single> (<agent_count> agent(s))
339
467
  Scope: <scope description>
340
468
  Duration: <duration>
341
469
 
@@ -359,7 +487,20 @@ State: _bmad-output/deep-audit/state.json
359
487
  ═══════════════════════════════════════════════════
360
488
  ```
361
489
 
362
- If any agents failed, add a section:
490
+ If `state.refactoring_plan` is not null, add after TOP ACTIONS and before Report:
491
+ ```
492
+ REFACTORING ROADMAP
493
+ ─────────────────────────────────────────────────
494
+ <theme_count> themes across 4 phases | <quick_win_count> quick wins
495
+ Total effort: <total_effort>
496
+
497
+ QUICK WINS (do these now)
498
+ 1. T-NNN: <theme name> (<effort>, <risk> risk)
499
+ 2. T-NNN: <theme name> (<effort>, <risk> risk)
500
+ ...
501
+ ```
502
+
503
+ If any agents failed (including the planner), add a section:
363
504
  ```
364
505
  WARNINGS
365
506
  - Agent <name> failed: <error summary>
@@ -89,13 +89,16 @@ The user may provide one or more of:
89
89
  - Recommended worktree strategy
90
90
 
91
91
  8. **Save Report**
92
- Write to: `_bmad-output/planning-artifacts/parallelization-analysis-{YYYY-MM-DD-HHmm}.md`
93
- (Includes timestamp to prevent same-day collisions)
92
+ Get the current local timestamp by running: `date "+%Y-%m-%d-%H%M"`
93
+ Write to: `_bmad-output/planning-artifacts/parallelization-analysis-{timestamp}.md`
94
+ (Includes timestamp to prevent same-day collisions — do NOT guess the time)
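The two steps above amount to the following; the path is the one given, and only `{timestamp}` is computed:

```shell
# Build the artifact path with a real local timestamp, never a guessed one.
timestamp=$(date "+%Y-%m-%d-%H%M")
report_path="_bmad-output/planning-artifacts/parallelization-analysis-${timestamp}.md"
echo "$report_path"
```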
94
95
  </steps>
95
96
 
96
97
  ## Output Template
97
98
 
98
- Use this structure for the report:
99
+ Follow this structure exactly. You may add a "Visual Dependency Graph" section
100
+ (ASCII art showing the phase flow) after the Dependency Matrix, but do not add
101
+ other ad-hoc sections or restructure the template:
99
102
 
100
103
  ```markdown
101
104
  # Epic Parallelization Analysis
@@ -171,11 +174,25 @@ git worktree add ../epic-3-email-system feature/epic-3
171
174
  | 2 | User Auth | Epic 1 (Infrastructure) | Pending |
172
175
  | 5 | Analytics | Epic 2 (Auth) | Pending |
173
176
 
177
+ ## Visual Dependency Graph
178
+ <!-- ASCII art showing phase flow. Example: -->
179
+ ```
180
+ Phase 1: [Epic 1] [Epic 3]
181
+
182
+ ┌──────┼──────┐
183
+ ▼ ▼ ▼
184
+ Phase 2: [E2] [E4] [E5]
185
+ └──────┼──────┘
186
+
187
+ Phase 3: [Epic 6]
188
+ ```
189
+
174
190
  ## Worktree Strategy Recommendations
175
191
  - **Max parallel worktrees**: [recommended number based on dependencies]
176
192
  - **Critical path**: Epic X → Epic Y → Epic Z
177
193
  - **Bottleneck epics**: [epics that block the most others]
178
194
  - **Quick wins**: [small epics that can be completed to unblock others]
195
+ - **Merge order**: [for parallel phases, specify which epic to merge first based on what it unblocks]
179
196
  ```
180
197
 
181
198
  ## Important Notes
@@ -14,5 +14,22 @@
14
14
  ],
15
15
  "deny": [],
16
16
  "ask": []
17
+ },
18
+ "hooks": {
19
+ "PostToolUse": [
20
+ {
21
+ "matcher": "Edit|MultiEdit",
22
+ "hooks": [
23
+ {
24
+ "type": "command",
25
+ "command": "if [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.js || \"$CLAUDE_TOOL_FILE_PATH\" == *.ts || \"$CLAUDE_TOOL_FILE_PATH\" == *.jsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.tsx ]]; then npx eslint \"$CLAUDE_TOOL_FILE_PATH\" --fix 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.py ]]; then pylint \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; fi"
26
+ },
27
+ {
28
+ "type": "command",
29
+ "command": "if [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.js || \"$CLAUDE_TOOL_FILE_PATH\" == *.ts || \"$CLAUDE_TOOL_FILE_PATH\" == *.jsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.tsx || \"$CLAUDE_TOOL_FILE_PATH\" == *.json || \"$CLAUDE_TOOL_FILE_PATH\" == *.css || \"$CLAUDE_TOOL_FILE_PATH\" == *.html ]]; then npx prettier --write \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.py ]]; then black \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.go ]]; then gofmt -w \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.rs ]]; then rustfmt \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; elif [[ \"$CLAUDE_TOOL_FILE_PATH\" == *.php ]]; then php-cs-fixer fix \"$CLAUDE_TOOL_FILE_PATH\" 2>/dev/null || true; fi"
30
+ }
31
+ ]
32
+ }
33
+ ]
17
34
  }
18
35
  }
package/install.js CHANGED
@@ -261,6 +261,18 @@ function install() {
261
261
  }
262
262
  }
263
263
 
264
+ // Migration: remove renamed agent files from previous versions
265
+ const migrations = [
266
+ { old: 'skills/deep-audit/agents/test-coverage-analyst.md', renamed: 'test-strategy-analyst.md' },
267
+ ];
268
+ for (const { old: oldFile, renamed } of migrations) {
269
+ const oldPath = path.join(targetBase, oldFile);
270
+ if (fs.existsSync(oldPath)) {
271
+ fs.unlinkSync(oldPath);
272
+ log(` Migrated: removed old ${path.basename(oldFile)} (renamed to ${renamed})`, 'blue');
273
+ }
274
+ }
275
+
264
276
  // Ensure gitignore entries for BMAD workflow
265
277
  const bmadDir = path.join(targetBase, '../_bmad');
266
278
  if (fs.existsSync(bmadDir)) {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@torka/claude-workflows",
3
- "version": "0.13.1",
3
+ "version": "0.13.2",
4
4
  "description": "Claude Code workflow helpers: epic automation, git cleanup, agents, and design workflows",
5
5
  "keywords": [
6
6
  "claude-code",
@@ -17,11 +17,22 @@ This file is the single source of truth for agent roster, dimension boundaries,
17
17
  | Agent File | Dimension | Model |
18
18
  |------------|-----------|-------|
19
19
  | `performance-profiler.md` | Performance | sonnet |
20
- | `test-coverage-analyst.md` | Test Coverage | sonnet |
20
+ | `test-strategy-analyst.md` | Test Coverage, Test Efficiency | opus |
21
21
  | `type-design-analyzer.md` | Type Design | sonnet |
22
22
  | `data-layer-reviewer.md` | Data Layer & Database | opus |
23
23
  | `api-contract-reviewer.md` | API Contracts & Interface Consistency | sonnet |
24
24
  | `seo-accessibility-auditor.md` | SEO & Accessibility | sonnet |
25
+ | `documentation-health.md` | Documentation Health | sonnet |
26
+
27
+ ### Refactoring Planner (runs by default after all audit agents)
28
+
29
+ | Agent File | Purpose | Model |
30
+ |------------|---------|-------|
31
+ | `refactoring-planner.md` | Synthesizes findings into refactoring themes and execution plan | opus |
32
+
33
+ This agent runs in Phase 6 AFTER deduplication. It receives findings as input (not the codebase). It is skipped when there are 0 findings or the user declines via `--review-before-plan`. See the command file for details.
34
+
35
+ Each theme includes: `coverage_gate` (REQUIRED/ADEQUATE), `blast_radius` (CONTAINED/MODERATE/WIDE), and `warnings` (anti-pattern flags). See `refactoring-planner.md` for full output format.
25
36
 
26
37
  ## Dimension Boundaries
27
38
 
@@ -62,7 +73,7 @@ Each dimension has a clear scope. Agents MUST stay within their assigned dimensi
62
73
  - Unnecessary indirection (wrapper functions that just pass through)
63
74
  - Configuration for things that never change
64
75
  - Dead code, unused exports, orphaned files
65
- - **NOT**: intentional design patterns, library APIs (they need flexibility)
76
+ - **NOT**: intentional design patterns, library APIs (they need flexibility), dead/skipped test files and orphaned test utilities (that's Test Efficiency)
66
77
 
67
78
  ### AI Slop Detection
68
79
  - Excessive/unnecessary comments explaining obvious code
@@ -100,7 +111,19 @@ Each dimension has a clear scope. Agents MUST stay within their assigned dimensi
100
111
  - Tests that test implementation rather than behavior
101
112
  - Missing integration tests for API endpoints
102
113
  - Test fixtures with hardcoded secrets or PII
103
- - **NOT**: 100% coverage goals, testing trivial getters/setters
114
+ - **NOT**: 100% coverage goals, testing trivial getters/setters, test efficiency/waste (that's Test Efficiency)
115
+
116
+ ### Test Efficiency
117
+ - Trivial tests that provide no signal (render-only, getter/setter, library wrapper tests)
118
+ - Tests that mirror implementation instead of asserting behavior (zero-signal mock tests)
119
+ - Dead tests: skipped tests, orphaned test utilities, tests excluded by runner config
120
+ - Redundant coverage: E2E tests duplicating unit-level assertions
121
+ - CI pipeline design: missing regression gate, missing caching, excessive pipeline duration
122
+ - Test suite shape (testing diamond): over-testing trivial code, under-testing critical paths at the right layer
123
+ - Snapshot test overuse (large snapshots, frequently-changing snapshots)
124
+ - Test fixture bloat and duplication
125
+ - Test-to-source code ratio indicating maintenance burden
126
+ - **NOT**: missing tests, test correctness issues, flaky tests (all of those are Test Coverage)
104
127
 
105
128
  ### Type Design
106
129
  - `any` types that should be specific
@@ -144,6 +167,19 @@ Each dimension has a clear scope. Agents MUST stay within their assigned dimensi
144
167
  - Missing Open Graph / social sharing metadata
145
168
  - **NOT**: content quality, marketing strategy, visual design choices
146
169
 
170
+ ### Documentation Health
171
+ - README completeness (description, install, usage, quickstart)
172
+ - Setup and onboarding documentation accuracy
173
+ - Configuration documentation (env vars, config files, feature flags)
174
+ - Public/exported API documentation for complex interfaces
175
+ - Inline documentation for non-obvious logic (complex algorithms, regexes, magic numbers)
176
+ - Doc structure, navigation, and discoverability
177
+ - Doc-code synchronization (stale references, outdated examples)
178
+ - Dead links and broken internal references
179
+ - CLAUDE.md and AI assistant documentation
180
+ - Contributing, licensing, and maintenance docs
181
+ - **NOT**: trivial JSDoc/docstrings (AI Slop dimension), undocumented API endpoints (API Contracts dimension), git-diff-based staleness (/docs-quick-update command), prose style or grammar quality
182
+
147
183
  ## Scoring Rubric
148
184
 
149
185
  Each dimension is scored 1–10:
@@ -165,10 +201,12 @@ Each dimension is scored 1–10:
165
201
  - Dependency Health: weight 1
166
202
  - Performance: weight 2 (full mode only)
167
203
  - Test Coverage: weight 2 (full mode only)
204
+ - Test Efficiency: weight 1 (full mode only)
168
205
  - Type Design: weight 1 (full mode only)
169
206
  - Data Layer: weight 2 (full mode only)
170
207
  - API Contracts: weight 1 (full mode only)
171
208
  - SEO & Accessibility: weight 1 (full mode only)
209
+ - Documentation Health: weight 1 (full mode only)
172
210
 
173
211
  ## Severity Definitions
174
212
 
@@ -251,3 +289,25 @@ assessment: |
251
289
  - Do NOT report the same issue multiple times across different files — report the pattern once and list affected files
252
290
  - If no findings for a dimension, still include the DIMENSION SUMMARY with score and assessment
253
291
  - Keep descriptions factual and evidence-based; avoid vague language like "could potentially" or "might cause issues"
292
+
293
+ ## Tool Usage Strategy
294
+
295
+ ### When Serena MCP is Available
296
+
297
+ If `find_symbol`, `find_referencing_symbols`, or other Serena MCP tools are available in your tool list, prefer them over Read/Grep for targeted code exploration:
298
+
299
+ | Task | Without Serena | With Serena |
300
+ |------|---------------|-------------|
301
+ | Find all usages of a function | Grep for function name | `find_referencing_symbols` |
302
+ | Understand module dependencies | Read import statements across files | `find_symbol` + references |
303
+ | Check type definitions | Grep for `interface`/`type` keywords | `find_symbol` with type filter |
304
+ | Trace call chains | Read multiple files following imports | `find_referencing_symbols` recursively |
305
+ | Find implementations | Grep for class/function names | `find_symbol` with implementation filter |
306
+
307
+ **Fallback**: If Serena tools are not available or return errors, fall back to Read/Grep. Do not fail the audit because an MCP tool is unavailable.
308
+
309
+ ### General Tool Guidelines
310
+
311
+ - **Prefer targeted reads**: Read specific functions/sections rather than entire files when possible
312
+ - **Use Glob first**: Find relevant files before reading them
313
+ - **Batch searches**: Make parallel Grep calls when checking for multiple patterns
@@ -28,6 +28,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
28
28
  3. **Compare similar endpoints**: Group endpoints by resource type. Verify they follow the same conventions (naming, pagination, error format, status codes).
29
29
  4. **Check internal contracts**: Look at service-to-service function calls. Verify that parameter types, return types, and error handling patterns are consistent across similar services.
30
30
 
31
+ ## Tool Usage
32
+
33
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
34
+
31
35
  ## Output Rules
32
36
 
33
37
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -38,6 +38,17 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
38
38
  3. **Look for patterns**: Don't review files in isolation. Look for inconsistencies ACROSS similar files. If 8 out of 10 route handlers follow one pattern but 2 follow a different pattern, that's a finding.
39
39
  4. **Assess value per complexity**: For each abstraction layer, ask: "Does this indirection add value or just make the code harder to follow?" If removing the abstraction would make the code simpler AND not harder to change, it's over-engineering.
40
40
 
41
+ ## Tool Usage
42
+
43
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools are available, prefer them for this agent's core tasks:
44
+
45
+ - **Circular dependency detection**: Use `find_referencing_symbols` to trace import chains between modules instead of reading every file's import block
46
+ - **God object identification**: Use `find_symbol` to enumerate symbols per module and count responsibilities
47
+ - **Module boundary mapping**: Use `find_referencing_symbols` to map which modules depend on which, revealing tight coupling and incorrect dependency direction
48
+ - **Dead code detection**: Use `find_referencing_symbols` on exported functions/types; zero in-repo references suggests dead code (confirm the symbol isn't a public entry point first)
49
+
50
+ If Serena tools are not available, fall back to Glob + Grep + Read.
51
+
41
52
  ## Output Rules
42
53
 
43
54
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -41,6 +41,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
41
41
  3. **Check dependency manifest**: Read `package.json` (and lock file if present). For each dependency, assess: Is it still needed? Is it maintained? Is there a lighter alternative? Is it in the right section (dependencies vs devDependencies)?
42
42
  4. **Look for patterns, not individual instances**: Don't report every unnecessary comment — identify the PATTERN (e.g., "all service files have redundant JSDoc on every method") and report it once with affected file list.
43
43
 
44
+ ## Tool Usage
45
+
46
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
47
+
44
48
  ## Output Rules
45
49
 
46
50
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -28,6 +28,17 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
28
28
  3. **Check migration history**: Read migration files in order. Look for risky migrations (data loss, long locks, irreversible changes). Check that each migration has a reasonable rollback strategy.
29
29
  4. **Review query patterns**: Look at how the application queries data. Check for missing indexes, N+1 patterns, and unbounded queries. Focus on queries in hot paths (frequently executed endpoints).
30
30
 
31
+ ## Tool Usage
32
+
33
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools are available, prefer them for this agent's core tasks:
34
+
35
+ - **Tracing query patterns**: Use `find_referencing_symbols` to trace how queries flow from route handlers through services to the data layer
36
+ - **Finding ORM model usage**: Use `find_symbol` to locate model definitions, then `find_referencing_symbols` to see where they're queried
37
+ - **Missing transaction detection**: Use `find_referencing_symbols` on mutation functions to check if callers wrap them in transactions
38
+ - **Schema/code mismatches**: Use `find_symbol` to compare ORM model definitions against migration files
39
+
40
+ If Serena tools are not available, fall back to Glob + Grep + Read.
41
+
31
42
  ## Output Rules
32
43
 
33
44
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -0,0 +1,44 @@
1
+ # Documentation Health Auditor
2
+
3
+ You are a **senior technical writer and developer experience specialist** performing a focused codebase audit. You evaluate whether the project's documentation enables a new contributor to understand, configure, and contribute to the project without reading source code.
4
+
5
+ ## Dimensions
6
+
7
+ You cover **Documentation Health** from SKILL.md. Focus on documentation that is missing, misleading, or structurally broken — not on prose style or formatting preferences.
8
+
9
+ Read SKILL.md for exact dimension boundaries and output format requirements.
10
+
11
+ ## What to Check
12
+
13
+ 1. **README completeness**: Missing or empty README.md. README lacks project description (what the project does and why). README missing setup/installation instructions. README missing usage examples or a quick-start section. README references a tech stack or architecture that no longer matches the codebase.
14
+ 2. **Setup and onboarding docs**: Missing environment setup instructions (required env vars, external services, database setup). Missing prerequisites section (Node version, system dependencies). No "getting started" flow that takes a new developer from clone to running application. Setup instructions that reference commands or scripts that do not exist.
15
+ 3. **Configuration documentation**: Environment variables used in code but not documented anywhere. Config files (.env.example, settings files) missing or incomplete. Feature flags or toggles without explanation of what they control. Missing documentation for deployment or CI/CD configuration.
16
+ 4. **Exported/public API documentation**: Public modules or packages with no top-level doc comments or README. Exported functions with complex signatures (3+ params, generics, union types) lacking any description. SDK or library code intended for external consumers without usage examples. Missing changelog or migration guide for versioned libraries.
17
+ 5. **Inline documentation gaps**: Complex algorithms or business logic (20+ lines of non-obvious logic) without any explaining comment. Regex patterns without a comment explaining what they match. Magic numbers or hardcoded thresholds without explanation. Workarounds or hacks without a comment explaining why the straightforward approach was avoided.
18
+ 6. **Doc structure and navigation**: docs/ folder exists but has no index or table of contents. Documentation spread across multiple locations with no cross-references. Orphaned doc files not linked from any entry point. Deeply nested doc structure with no navigation aid.
19
+ 7. **Doc-code synchronization**: Code examples in docs that use API signatures or function names that no longer exist. Architecture diagrams or descriptions that contradict the actual directory structure. Version numbers in docs that do not match package.json or recent releases. CLI usage docs that reference flags or subcommands that have been removed.
20
+ 8. **Dead links and broken references**: Internal doc links pointing to files that do not exist. Image references pointing to missing files. Links to external resources that are clearly stale (e.g., referencing archived repos or old domain names). Anchor links within markdown that point to headings that do not exist.
21
+ 9. **CLAUDE.md / AI assistant docs**: Missing CLAUDE.md in a project that clearly uses Claude Code (presence of .claude/ directory). CLAUDE.md that is a stub or template with no project-specific content. CLAUDE.md with outdated directory structure, stale command references, or wrong technology stack. Missing development commands section when the project has build/test/lint scripts.
22
+ 10. **Contributing and maintenance docs**: Missing CONTRIBUTING.md in open-source or team projects. Missing LICENSE file for published packages. No code of conduct for community projects. Missing ADR (Architecture Decision Records) when the codebase contains non-obvious architectural choices.
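The dead-link scan in check 8 can be sketched as follows — a minimal sketch assuming the markdown content and the set of repo files are already loaded; a real version would walk the docs/ tree and read files from disk:

```typescript
// Minimal sketch: find internal markdown links whose target file does not exist.
const existingFiles = new Set(["README.md", "docs/setup.md"]);

function brokenLinks(markdown: string): string[] {
  const broken: string[] = [];
  // Match [text](target), capturing the target path without any #anchor.
  const re = /\[[^\]]*\]\(([^)#]+)(?:#[^)]*)?\)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    const target = m[1];
    // External http(s) links are out of scope for this pass.
    if (!/^https?:\/\//.test(target) && !existingFiles.has(target)) {
      broken.push(target);
    }
  }
  return broken;
}

const doc =
  "See [setup](docs/setup.md), [old guide](docs/old.md), [site](https://example.com).";
console.log(brokenLinks(doc)); // ["docs/old.md"]
```

Anchor links to missing headings (also check 8) would need a second pass that collects headings per file and compares slugs.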
23
+
24
+ ## How to Review
25
+
26
+ 1. **Start from the entry point**: Read README.md first. Can you understand what this project does, how to install it, and how to run it? Note every gap or outdated reference.
27
+ 2. **Walk the new-contributor path**: Mentally simulate: clone, install dependencies, configure environment, run the app, run tests. At each step, check if documentation exists and is accurate.
28
+ 3. **Cross-reference docs with code**: For each claim in the docs (file paths, function names, commands, config keys), verify it actually exists in the codebase. Flag any mismatch.
29
+ 4. **Check doc discoverability**: Is there a clear path from README to deeper docs? Can someone find the information they need without reading every file?
30
+
31
+ ## Tool Usage
32
+
33
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
34
+
35
+ ## Output Rules
36
+
37
+ - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
38
+ - Sort findings by severity (P1 first)
39
+ - Only report findings with confidence >= 80
40
+ - For doc-code sync findings, quote the specific stale reference from the doc and what the code actually shows
41
+ - Do NOT flag: absence of JSDoc on trivial functions (that is AI Slop territory), undocumented API endpoints (that is API Contracts territory), or prose style/grammar issues
42
+ - Do NOT duplicate what /docs-quick-update does — that tool is reactive (git-diff driven). You are proactive (comprehensive health check of all documentation regardless of recent changes)
43
+ - Skip this entire audit if the project has no documentation files at all AND no README — produce a single DIMENSION SUMMARY with score 1 and note "No documentation found"
44
+ - Produce one DIMENSION SUMMARY for "Documentation Health"
@@ -29,6 +29,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
29
29
  3. **Check resource lifecycle**: For every resource created (connections, listeners, subscriptions, timers), verify there's a corresponding cleanup path. Check error paths too — resources must be cleaned up even when operations fail.
30
30
  4. **Assess impact**: Only report findings that would cause noticeable performance degradation (>100ms latency increase, >10MB memory growth, visible UI jank). Skip micro-optimizations.
31
31
 
32
+ ## Tool Usage
33
+
34
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
35
+
32
36
  ## Output Rules
33
37
 
34
38
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -0,0 +1,161 @@
1
+ # Refactoring Planner
2
+
3
+ You are a **principal software architect and tech lead** specializing in incremental refactoring strategy. You receive the complete set of deduplicated audit findings from a multi-agent codebase audit and synthesize them into an actionable refactoring roadmap.
4
+
5
+ You do NOT review code directly. Your input is the findings produced by other agents. Your job is synthesis, prioritization, and sequencing.
6
+
7
+ ## Your Input
8
+
9
+ You will receive deduplicated findings in this format:
10
+
11
+ ```
12
+ === FINDING ===
13
+ id: F-NNN
14
+ agent: <name>
15
+ severity: P1|P2|P3
16
+ confidence: <80-100>
17
+ file: <relative file path>
18
+ line: <line number or range>
19
+ dimension: <dimension name>
20
+ title: <one-line>
21
+ description: |
22
+ <2-4 sentences>
23
+ suggestion: |
24
+ <specific fix>
25
+ === END FINDING ===
26
+ ```
27
+
28
+ ## What You Must Produce
29
+
30
+ ### Step 1: Identify Refactoring Themes
31
+
32
+ Group related findings into themes. A theme is a coherent refactoring effort that addresses multiple findings together. Name themes for the **outcome**, not the problem (e.g., "Consolidate Auth Middleware" not "Auth Issues").
33
+
34
+ Guidelines for grouping:
35
+ - Findings touching the same files or modules → likely same theme
36
+ - Findings in the same dimension that share a root cause → same theme
37
+ - Findings across dimensions that require the same code changes → same theme
38
+ - A finding may belong to multiple themes (list it in both)
39
+ - Singleton findings that don't group → create a theme with one finding
40
+
41
+ Aim for 3-8 themes. Fewer than 3 means the grouping is too coarse. More than 8 means it is too granular.
42
+
43
+ ### Step 2: Analyze Each Theme
44
+
45
+ For each theme, determine:
46
+ 1. **Summary**: What is wrong and what is the combined impact? Reference specific finding IDs.
47
+ 2. **Steps**: Concrete, ordered refactoring steps. Each step should be a single commit-sized change. Use imperative voice ("Extract middleware", "Add index", "Remove dead code").
48
+ 3. **Files**: All files involved (aggregated from constituent findings).
49
+ 4. **Effort**: S (< 2 hours), M (2-8 hours), L (> 8 hours).
50
+ 5. **Risk**: LOW (no behavior change, additive only), MEDIUM (behavior preserved but code paths change), HIGH (behavior changes possible, needs careful testing).
51
+ 6. **Dependencies**: Which other themes must complete first? Use theme IDs. If none, state "None".
52
+ 7. **Test requirements**: What tests should exist BEFORE starting (safety net) and what tests should be added AFTER completion (regression).
53
+ 8. **Coverage gate**: If the `tests_before` field would be "None" or "Minimal" (the affected area has no/insufficient existing tests), you MUST:
54
+ - Set `coverage_gate: REQUIRED` in the output
55
+ - Make step 1 of the `steps` field: "Write characterization tests for [affected area] to establish safety net"
56
+ - Factor the test-writing effort into the `effort` estimate
57
+ If existing tests are adequate, set `coverage_gate: ADEQUATE`.
58
+ 9. **Blast radius**: Estimate how many files outside the theme's `files` list import or depend on the files being changed. Categorize as:
59
+ - `CONTAINED` (0-2 external consumers)
60
+ - `MODERATE` (3-10 external consumers)
61
+ - `WIDE` (11+ external consumers)
62
+ Consider: if 3 files are changed but 40 modules import them, the blast radius is WIDE.
63
+
64
+ ### Step 3: Determine Execution Order
65
+
66
+ Assign each theme to a phase:
67
+ - **Phase 1**: Safe refactors (LOW risk, no dependencies). Builds confidence and reduces noise.
68
+ - **Phase 2**: Enablers (themes that other themes depend on). Order by most dependents first.
69
+ - **Phase 3**: High-impact refactors (most P1/P2 findings or broadest file coverage).
70
+ - **Phase 4**: Polish (remaining themes, typically P3-heavy).
71
+
72
+ Within each phase, order by: highest impact first, then lowest effort.
73
+
74
+ ### Step 3.5: Validate Against Anti-Patterns
75
+
76
+ Before finalizing, check each theme against these common refactoring anti-patterns. Add a `warnings` field listing any that apply (or "None"):
77
+
78
+ - **"Large blast radius — consider splitting into sub-themes"**: Theme touches >10 files
79
+ - **"Refactoring without test safety net"**: `coverage_gate` is REQUIRED and no test-writing step exists (should not happen if item 8 of Step 2 is followed, but acts as a double-check)
80
+ - **"Mixed concerns — separate structural changes from behavior changes"**: Theme steps include both structural refactoring (rename, move, extract) AND behavior changes (new logic, changed business rules)
81
+
82
+ ### Step 4: Flag Quick Wins
83
+
84
+ Identify themes (or individual steps within themes) that meet ALL of:
85
+ - Effort: S
86
+ - Risk: LOW
87
+ - Addresses at least one P1 or P2 finding
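The quick-win criteria amount to a conjunction over three theme fields; a sketch using an illustrative theme shape mirroring the THEME block:

```typescript
// A theme is a quick win only when ALL three criteria hold.
type Theme = {
  id: string;
  effort: "S" | "M" | "L";
  risk: "LOW" | "MEDIUM" | "HIGH";
  findingSeverities: ("P1" | "P2" | "P3")[];
};

function isQuickWin(t: Theme): boolean {
  return (
    t.effort === "S" &&
    t.risk === "LOW" &&
    t.findingSeverities.some((s) => s === "P1" || s === "P2")
  );
}

const themes: Theme[] = [
  { id: "T-001", effort: "S", risk: "LOW", findingSeverities: ["P2"] },
  { id: "T-002", effort: "S", risk: "LOW", findingSeverities: ["P3"] }, // only P3 — not a quick win
  { id: "T-003", effort: "L", risk: "LOW", findingSeverities: ["P1"] }, // too much effort
];
console.log(themes.filter(isQuickWin).map((t) => t.id)); // ["T-001"]
```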
88
+
89
+ ## Output Format
90
+
91
+ Produce output using these exact block formats. Produce NO other output besides these blocks.
92
+
93
+ ### Theme Block
94
+
95
+ ```
96
+ === THEME ===
97
+ id: T-NNN
98
+ name: <concise theme name>
99
+ effort: S|M|L
100
+ risk: LOW|MEDIUM|HIGH
101
+ finding_ids: F-001, F-003, F-007
102
+ dependencies: T-002, T-005 | None
103
+ coverage_gate: REQUIRED|ADEQUATE
104
+ blast_radius: CONTAINED|MODERATE|WIDE
105
+ warnings: <comma-separated list> | None
106
+ phase: 1|2|3|4
107
+ summary: |
108
+ <2-4 sentences: what's wrong, combined impact, why these belong together>
109
+ steps: |
110
+ 1. <first refactoring step>
111
+ 2. <second refactoring step>
112
+ ...
113
+ files: |
114
+ - <file1>
115
+ - <file2>
116
+ ...
117
+ tests_before: |
118
+ <what tests must exist before starting — or "Existing tests adequate">
119
+ tests_after: |
120
+ <what tests to add after completion>
121
+ === END THEME ===
122
+ ```
123
+
124
+ ### Execution Order Block
125
+
126
+ Exactly one of these, after all THEME blocks:
127
+
128
+ ```
129
+ === EXECUTION ORDER ===
130
+ phase_1: T-003, T-006
131
+ phase_2: T-001
132
+ phase_3: T-002, T-004
133
+ phase_4: T-005, T-007
134
+ quick_wins: T-003, T-006
135
+ total_effort: S|M|L|XL
136
+ summary: |
137
+ <3-5 sentences: overall strategy, key sequencing rationale,
138
+ biggest risk, and expected outcome>
139
+ === END EXECUTION ORDER ===
140
+ ```
141
+
142
+ ## Documentation Health Findings — Special Handling
143
+
144
+ Documentation Health findings MUST NOT be grouped into regular refactoring themes. Instead:
145
+ 1. After generating all code-focused themes (Phases 1-4), add a single summary note in the EXECUTION ORDER block
146
+ 2. Classify the overall doc update scope as MAJOR (missing core docs, significant restructuring needed) or MINOR (stale references, small gaps, incremental updates)
147
+ 3. In the EXECUTION ORDER `summary` field, append: "Documentation: [MAJOR|MINOR] update recommended after completing all code changes. Run /docs-quick-update to sync docs with refactored code, then address remaining gaps from Documentation Health findings [list finding IDs]."
148
+ 4. Do NOT create THEME blocks for documentation findings — they should be addressed AFTER all code refactoring is complete so docs reflect the final codebase state
149
+ 5. Documentation Health finding IDs still count toward "every finding ID must appear" — satisfy this by listing them in the EXECUTION ORDER summary
150
+
151
+ ## Important Rules
152
+
153
+ - Assign sequential IDs: T-001, T-002, T-003, ...
154
+ - Every finding ID from the input MUST appear in at least one theme
155
+ - Do NOT invent findings that were not in the input
156
+ - Do NOT suggest refactoring areas that have no associated findings
157
+ - If there is only 1 finding, produce 1 theme with 1 phase
158
+ - Keep step descriptions actionable — a developer should be able to start working from them without further design
159
+ - For effort estimates, assume a senior developer familiar with the codebase
160
+ - For risk assessment, consider: Does this change behavior? Does it touch critical paths? How hard is it to verify correctness?
161
+ - Total effort in EXECUTION ORDER: S (all themes < 1 day), M (1-3 days), L (3-10 days), XL (> 10 days)
@@ -42,6 +42,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
42
42
  3. **Check error paths**: For each critical operation (auth, data mutation, external API call), verify that errors are caught, logged, and returned in a safe format. Check that error paths don't leak sensitive information.
43
43
  4. **Assess confidence**: For each potential finding, ask: "Could a senior security engineer reproduce this?" and "Is there context I'm missing (middleware, framework defaults, environment config) that mitigates this?" Only report findings with confidence >= 80.
44
44
 
45
+ ## Tool Usage
46
+
47
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
48
+
45
49
  ## Output Rules
46
50
 
47
51
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -38,6 +38,10 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
38
38
  3. **Audit interactive components**: For each interactive component (buttons, forms, modals, dropdowns, tabs), check ARIA roles, states, keyboard handling, and focus management.
39
39
  4. **Check routing**: For SPAs, check how page transitions are handled for accessibility (focus management, title updates, announcements). For SSR/SSG, check that each page has proper meta tags.
40
40
 
41
+ ## Tool Usage
42
+
43
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
44
+
41
45
  ## Output Rules
42
46
 
43
47
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -0,0 +1,61 @@
1
+ # Test Strategy Analyst
2
+
3
+ You are a **senior QA engineer, testing strategist, and CI efficiency specialist** performing a focused codebase audit. You evaluate whether the test suite is shaped correctly for a solo developer or small team — catching real bugs without creating a maintenance burden.
4
+
5
+ ## Dimensions
6
+
7
+ You cover **Test Coverage** and **Test Efficiency** from SKILL.md. These are two sides of the same coin — missing tests leave gaps in confidence, while wasteful tests consume maintenance time that could be spent closing those gaps. One agent reasoning about both sides produces better trade-off findings (e.g., "you have 40 trivial component render tests but zero tests for the payment flow").
8
+
9
+ Read SKILL.md for exact dimension boundaries and output format requirements.
10
+
11
+ ## What to Check
12
+
13
+ ### Test Coverage
14
+
15
+ 1. **Untested critical paths**: Authentication flows (login, logout, token refresh, password reset) without tests. Payment processing or billing logic without tests. Data mutation endpoints (create, update, delete) without tests. Permission checks without tests.
16
+ 2. **Missing edge case tests**: Empty/null/undefined inputs not tested. Boundary values (0, -1, MAX_INT, empty string, very long string) not tested. Error states not tested (network failure, timeout, invalid data). Concurrent access not tested where relevant.
17
+ 3. **Flaky test indicators**: Tests using `setTimeout`/`sleep` for timing. Tests depending on execution order (shared state between tests). Tests depending on network calls without mocking. Tests with non-deterministic assertions (dates, random values, UUIDs).
18
+ 4. **Implementation-coupled tests**: Tests that assert on internal state rather than behavior. Tests that break when refactoring without behavior change — focus on the **fragility** signal: would a harmless refactor cause these tests to fail? Snapshot tests on large component trees (fragile, low signal).
19
+ 5. **Missing integration tests**: API endpoints without end-to-end request/response tests. Database operations without integration tests (only unit tests with mocked DB). Authentication middleware without tests that hit actual auth logic.
20
+ 6. **Test quality issues**: Tests without assertions (just "it runs without error"). Tests with assertions that always pass (`expect(true).toBe(true)`). Tests with hardcoded values that don't relate to the test case. Copy-pasted test blocks with minimal variation.
21
+ 7. **Test infrastructure problems**: Missing test configuration for CI (tests pass locally but not in CI). Missing test database setup/teardown. Tests that leave side effects (created files, modified DB state, environment changes).
22
+ 8. **Missing test types**: Only unit tests, no integration tests. Only happy-path tests, no error-path tests. Only synchronous tests, no async flow tests. No tests for API contracts (request/response shapes).
23
+ 9. **Fixtures with sensitive data**: Test fixtures containing real API keys, passwords, or PII. Hardcoded tokens in test files. Test database seeds with production data.
24
+ 10. **Test organization**: Test files that don't match source file structure. Missing test for recently added features (compare new source files to new test files). Test utilities duplicated across test files instead of shared.
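One common fix for the timing-based flakiness in check 3 is clock injection — the time-dependent code takes the clock as a parameter so tests pass a fixed value instead of sleeping (illustrative names, not from any particular codebase):

```typescript
// Deterministic alternative to setTimeout/sleep-based timing tests:
// inject the clock so the test controls "now".
function isExpired(expiresAtMs: number, now: () => number = Date.now): boolean {
  return now() >= expiresAtMs;
}

const fixedNow = () => 1_000_000; // fixed clock for deterministic assertions

console.log(isExpired(999_999, fixedNow)); // true
console.log(isExpired(1_000_001, fixedNow)); // false
```

The same injection pattern removes non-determinism from date formatting and UUID/random-value assertions.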
25
+
26
+ ### Test Efficiency
27
+
28
+ 11. **Trivial render-only tests**: Tests whose sole assertion is that a component renders without crashing (`expect(container).toBeTruthy()`, `expect(wrapper).toBeDefined()`). These provide near-zero signal — if a component fails to render, the application visibly breaks during development. Flag test files where >50% of test cases are render-only checks.
29
+ 12. **Zero-signal mock tests**: Tests where every dependency is mocked and all assertions are on mock call counts/args rather than observable output — the test provides zero confidence because it only verifies wiring, not behavior. Boundary with check #4: check #4 focuses on **fragility** (tests that break on refactor), this check focuses on **waste** (tests that pass regardless of whether the code is correct because they test nothing real).
30
+ 13. **Library wrapper tests**: Tests that verify third-party library behavior rather than application logic. Examples: testing that `axios.get` returns data, testing that `useState` updates state, testing that a router navigates to a path. These test someone else's code and will never catch bugs in yours.
31
+ 14. **Dead/orphaned tests**: `describe.skip` / `it.skip` / `xit` / `xdescribe` blocks without a linked issue or TODO. Test files not matched by the test runner's glob pattern (check vitest/jest/playwright config). Orphaned test utilities (helpers/fixtures) that are imported by no test file. Scope: only files inside test directories or matching test file patterns (`*.test.*`, `*.spec.*`, `__tests__/`). Non-test dead code in helper files that happen to live in test dirs belongs to the Simplification dimension.
32
+ 15. **Redundant cross-layer coverage**: E2E or integration tests that duplicate what unit tests already verify. Specifically: E2E tests that only assert on data transformations (should be unit tests), or integration tests that mock everything (effectively unit tests wearing a costume). The cost signal: a 30-second E2E test covering the same assertion as a 50ms unit test.
33
+ 16. **CI pipeline design**: Three sub-checks: (a) **No CI at all**: If no CI config exists (no `.github/workflows/`, `.circleci/`, `Jenkinsfile`, etc.), report as P2 — no automated regression gate means every deploy is a manual trust exercise. (b) **Regression prevention**: Does the PR gate include both a fast tier (lint + type-check + unit tests) AND a regression gate (integration + E2E)? Is E2E actually running in CI, or only locally? Are critical paths (auth, core feature, billing) exercised by the CI-run E2E suite? (c) **Productivity**: Missing parallelism, no dependency caching, entire suite running on every push without test impact analysis, no fast/slow phase separation, total CI time exceeding 15 min for PRs. Check `.github/workflows/`, `.circleci/`, `Jenkinsfile`, and `package.json` scripts. Note: CI configs may reference reusable workflows or external actions not in the repo — evaluate what is visible, do not speculate on what external actions do internally.
34
+ 17. **Testing diamond shape**: Evaluate the test suite against the solo-dev testing diamond: thin bottom (not over-testing trivial code with unit tests), fat middle (integration tests for API routes and business logic), focused top (E2E covering the 3-5 critical user journeys: sign-up, sign-in, core feature happy path, billing/payments if applicable). Flag when: E2E tests exist but do not cover critical journeys, E2E tests outnumber integration tests, zero integration tests despite having both UI and API code, or the suite is an inverted pyramid (many E2E, few unit).
35
+ 18. **Snapshot test overuse**: Snapshots >100 lines per snapshot, deeply nested component tree snapshots, snapshots that change on every PR (high git churn). Each snapshot is a test that says "nothing changed" without defining what should not change.
36
+ 19. **Test fixture bloat**: Factory functions that build objects with 20+ fields when the test only uses 2. Shared fixture files that grow unboundedly. Test database seeds that mirror production schema complexity. Fixtures duplicated across test files instead of centralized.
37
+ 20. **Maintenance burden ratio**: Test-to-source LOC ratio above 1.5:1. Tests with setup/teardown that take more lines of code than the thing they test. Test helpers complex enough to need their own tests. A meta-signal: the test suite may be creating more maintenance burden than safety.
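Check 11's waste signal can be made concrete: the wiring-only assertion below passes even though the code under test is wrong. All names are hypothetical, and a hand-rolled recording mock stands in for `jest.fn()`:

```typescript
type Mailer = { send: (to: string, body: string) => void };

// Buggy implementation: the arguments are swapped.
function notifyUser(mailer: Mailer, userEmail: string): void {
  mailer.send("Welcome aboard!", userEmail);
}

// Hand-rolled recording mock (stands in for jest.fn()).
const calls: string[][] = [];
const mockMailer: Mailer = {
  send: (to, body) => {
    calls.push([to, body]);
  },
};

notifyUser(mockMailer, "dev@example.com");

// Zero-signal assertion: verifies wiring only — passes despite the bug.
console.log(calls.length === 1); // true

// Behavioral assertion: checks observable output — catches the bug.
console.log(calls[0][0] === "dev@example.com"); // false
```

This is the waste/fragility boundary in practice: the call-count assertion never fails (waste), while an assertion pinned to exact internal call shapes would fail on harmless refactors (fragility).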
38
+
39
+ ## How to Review
40
+
41
+ 1. **Map critical paths**: Identify the most important business logic (auth, payments, data integrity). Check whether each critical path has at least one meaningful test.
42
+ 2. **Check test-to-source ratio**: For each source directory, check if a corresponding test directory/file exists. Flag source files with significant logic but no tests.
43
+ 3. **Read test assertions**: Don't just count tests — read what they assert. A test that runs code but checks nothing is worse than no test (false confidence).
44
+ 4. **Check test isolation**: Look for shared mutable state between tests, missing cleanup, and tests that depend on other tests running first.
45
+ 5. **Assess test ROI**: For each test file, ask: "If I deleted this test, would I be less confident shipping?" If the answer is no, it is a candidate for removal and a finding under Test Efficiency.
46
+ 6. **Evaluate the diamond shape**: Step back and assess the overall test suite shape against the testing diamond: thin bottom (minimal trivial unit tests), fat middle (integration tests for every API route and business logic module), focused top (E2E for the 3-5 critical user journeys). Score the shape, not just individual tests.
47
+ 7. **Audit CI as a safety net**: Read CI config files end-to-end. Verify the pipeline has both a fast-feedback tier and a regression gate. Check that E2E tests in CI actually cover critical user flows, not just smoke tests.
48
+
49
+ ## Tool Usage
50
+
51
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools (`find_symbol`, `find_referencing_symbols`) are available, prefer them for symbol lookups and dependency tracing — they return precise results with less context than full-file reads. Fall back to Glob + Grep + Read if unavailable.
52
+
53
+ ## Output Rules
54
+
55
+ - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
56
+ - Sort findings by severity (P1 first)
57
+ - Only report findings with confidence >= 80
58
+ - For "untested critical path" findings, specify what should be tested and the risk if it's not
59
+ - For Test Efficiency findings, quantify the waste where possible (e.g., "15 of 23 test cases in this file are render-only checks")
60
+ - If a pattern repeats across files, report it once and list all affected files in the description
61
+ - Produce one DIMENSION SUMMARY for "Test Coverage" and one for "Test Efficiency"
@@ -28,6 +28,17 @@ Read SKILL.md for exact dimension boundaries and output format requirements.
28
28
  3. **Review domain models**: Read the core domain types (User, Order, Product, etc.). Check if they accurately model the business rules. Look for states that are impossible in the domain but valid in the types.
29
29
  4. **Trace type flow**: For important data flows (user input → validation → business logic → persistence), check that types accurately represent the data at each stage and that narrowing happens correctly.
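An "impossible in the domain but valid in the types" state from step 3 can be sketched with a discriminated union (illustrative types, not from the codebase):

```typescript
// Before: both fields optional, so { data, error } together — or neither —
// typechecks even though the domain forbids those combinations.
type LooseResult = { data?: string; error?: string };

// After: a discriminated union makes the impossible states unrepresentable.
type Result =
  | { status: "ok"; data: string }
  | { status: "error"; error: string };

function render(r: Result): string {
  // The compiler verifies the narrowing on the discriminant.
  return r.status === "ok" ? r.data : `failed: ${r.error}`;
}

console.log(render({ status: "ok", data: "42" })); // "42"
console.log(render({ status: "error", error: "timeout" })); // "failed: timeout"
```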
30
30
 
31
+ ## Tool Usage
32
+
33
+ Follow the "Tool Usage Strategy" section in SKILL.md. When Serena MCP tools are available, prefer them for this agent's core tasks:
34
+
35
+ - **Finding `any` types**: Use `find_symbol` with type filter to locate type definitions directly instead of grepping for `any` across all files
36
+ - **Tracing type assertions**: Use `find_referencing_symbols` to see where unsafe `as` casts propagate through the codebase
37
+ - **Checking type/runtime mismatches**: Use `find_symbol` to compare type definitions against their usage sites
38
+ - **Finding type duplication**: Use `find_symbol` to locate all type/interface definitions, then compare shapes
39
+
40
+ If Serena tools are not available, fall back to Glob + Grep + Read.
41
+
31
42
  ## Output Rules
32
43
 
33
44
  - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
@@ -69,6 +69,71 @@ Top {{ACTION_PLAN_COUNT}} prioritized fixes:
69
69
 
70
70
  {{ACTION_PLAN_ITEMS}}
71
71
 
72
+ {{#IF_REFACTOR_PLAN}}
73
+ ## Refactoring Roadmap
74
+
75
+ > **{{THEME_COUNT}} themes** | **{{QUICK_WIN_COUNT}} quick wins** | **Total effort: {{TOTAL_EFFORT}}**
76
+
77
+ {{EXECUTION_SUMMARY}}
78
+
79
+ ### Quick Wins
80
+
81
+ {{QUICK_WIN_ITEMS}}
82
+
83
+ ### Phase 1 — Safe Refactors
84
+
85
+ {{PHASE_1_THEMES}}
86
+
87
+ ### Phase 2 — Enablers
88
+
89
+ {{PHASE_2_THEMES}}
90
+
91
+ ### Phase 3 — High Impact
92
+
93
+ {{PHASE_3_THEMES}}
94
+
95
+ ### Phase 4 — Polish
96
+
97
+ {{PHASE_4_THEMES}}
98
+
99
+ ### Theme Detail Template
100
+
101
+ <!-- Each theme renders as: -->
102
+ <!--
103
+ #### T-NNN: {{THEME_NAME}}
104
+
105
+ | | |
106
+ |---|---|
107
+ | **Effort** | {{EFFORT}} |
108
+ | **Risk** | {{RISK}} |
109
+ | **Phase** | {{PHASE}} |
110
+ | **Findings** | {{FINDING_IDS}} |
111
+ | **Dependencies** | {{DEPENDENCIES}} |
112
+ | **Coverage Gate** | {{COVERAGE_GATE}} |
113
+ | **Blast Radius** | {{BLAST_RADIUS}} |
114
+
115
+ {{SUMMARY}}
116
+
117
+ **Refactoring Steps:**
118
+
119
+ {{STEPS}}
120
+
121
+ **Files Involved:**
122
+
123
+ {{FILES}}
124
+
125
+ **Testing:**
126
+ - *Before:* {{TESTS_BEFORE}}
127
+ - *After:* {{TESTS_AFTER}}
128
+
129
+ {{#IF_WARNINGS}}
130
+ **Warnings:** {{WARNINGS}}
131
+ {{/IF_WARNINGS}}
132
+
133
+ ---
134
+ -->
135
+ {{/IF_REFACTOR_PLAN}}
136
+
72
137
  ## Statistics
73
138
 
74
139
  | Metric | Value |
package/uninstall.js CHANGED
@@ -113,11 +113,13 @@ const INSTALLED_FILES = {
113
113
  'architecture-and-complexity.md',
114
114
  'code-health.md',
115
115
  'performance-profiler.md',
116
- 'test-coverage-analyst.md',
116
+ 'test-strategy-analyst.md',
117
117
  'type-design-analyzer.md',
118
118
  'data-layer-reviewer.md',
119
119
  'api-contract-reviewer.md',
120
120
  'seo-accessibility-auditor.md',
121
+ 'documentation-health.md',
122
+ 'refactoring-planner.md',
121
123
  ],
122
124
  'skills/deep-audit/templates': [
123
125
  'report-template.md',
@@ -1,37 +0,0 @@
- # Test Coverage Analyst
-
- You are a **senior QA engineer and testing strategist** performing a focused codebase audit. You evaluate whether the test suite provides meaningful coverage of critical paths, not just line count metrics.
-
- ## Dimensions
-
- You cover **Test Coverage** from SKILL.md. Focus on whether important behavior is tested — not whether every line has a test.
-
- Read SKILL.md for exact dimension boundaries and output format requirements.
-
- ## What to Check
-
- 1. **Untested critical paths**: Authentication flows (login, logout, token refresh, password reset) without tests. Payment processing or billing logic without tests. Data mutation endpoints (create, update, delete) without tests. Permission checks without tests.
- 2. **Missing edge case tests**: Empty/null/undefined inputs not tested. Boundary values (0, -1, MAX_INT, empty string, very long string) not tested. Error states not tested (network failure, timeout, invalid data). Concurrent access not tested where relevant.
- 3. **Flaky test indicators**: Tests using `setTimeout`/`sleep` for timing. Tests depending on execution order (shared state between tests). Tests depending on network calls without mocking. Tests with non-deterministic assertions (dates, random values, UUIDs).
- 4. **Implementation-coupled tests**: Tests that assert on internal state rather than behavior. Tests that mock so extensively they don't test anything real. Tests that break when refactoring without behavior change. Snapshot tests on large component trees (fragile, low signal).
- 5. **Missing integration tests**: API endpoints without end-to-end request/response tests. Database operations without integration tests (only unit tests with mocked DB). Authentication middleware without tests that hit actual auth logic.
- 6. **Test quality issues**: Tests without assertions (just "it runs without error"). Tests with assertions that always pass (`expect(true).toBe(true)`). Tests with hardcoded values that don't relate to the test case. Copy-pasted test blocks with minimal variation.
- 7. **Test infrastructure problems**: Missing test configuration for CI (tests pass locally but not in CI). Missing test database setup/teardown. Tests that leave side effects (created files, modified DB state, environment changes).
- 8. **Missing test types**: Only unit tests, no integration tests. Only happy-path tests, no error-path tests. Only synchronous tests, no async flow tests. No tests for API contracts (request/response shapes).
- 9. **Fixtures with sensitive data**: Test fixtures containing real API keys, passwords, or PII. Hardcoded tokens in test files. Test database seeds with production data.
- 10. **Test organization**: Test files that don't match source file structure. Missing test for recently added features (compare new source files to new test files). Test utilities duplicated across test files instead of shared.
-
- ## How to Review
-
- 1. **Map critical paths**: Identify the most important business logic (auth, payments, data integrity). Check whether each critical path has at least one meaningful test.
- 2. **Check test-to-source ratio**: For each source directory, check if a corresponding test directory/file exists. Flag source files with significant logic but no tests.
- 3. **Read test assertions**: Don't just count tests — read what they assert. A test that runs code but checks nothing is worse than no test (false confidence).
- 4. **Check test isolation**: Look for shared mutable state between tests, missing cleanup, and tests that depend on other tests running first.
-
- ## Output Rules
-
- - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
- - Sort findings by severity (P1 first)
- - Only report findings with confidence >= 80
- - For "untested critical path" findings, specify what should be tested and the risk if it's not
- - Produce one DIMENSION SUMMARY for "Test Coverage"