@torka/claude-workflows 0.12.0 → 0.13.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. package/.claude-plugin/plugin.json +8 -0
  2. package/README.md +22 -5
  3. package/bmad-workflows/bmm/workflows/4-implementation/implement-epic-with-subagents/steps/step-01b-continue.md +9 -2
  4. package/bmad-workflows/bmm/workflows/4-implementation/implement-epic-with-subagents/steps/step-02-orchestrate.md +108 -2
  5. package/bmad-workflows/bmm/workflows/4-implementation/implement-epic-with-subagents/steps/step-03-complete.md +35 -1
  6. package/commands/deep-audit.md +389 -0
  7. package/commands/dev-story-backend.md +12 -11
  8. package/commands/dev-story-fullstack.md +6 -2
  9. package/commands/dev-story-ui.md +4 -4
  10. package/commands/github-pr-resolve.md +132 -24
  11. package/install.js +2 -3
  12. package/package.json +1 -1
  13. package/skills/deep-audit/INSPIRATIONS.md +26 -0
  14. package/skills/deep-audit/SKILL.md +253 -0
  15. package/skills/deep-audit/agents/api-contract-reviewer.md +38 -0
  16. package/skills/deep-audit/agents/architecture-and-complexity.md +48 -0
  17. package/skills/deep-audit/agents/code-health.md +51 -0
  18. package/skills/deep-audit/agents/data-layer-reviewer.md +39 -0
  19. package/skills/deep-audit/agents/performance-profiler.md +38 -0
  20. package/skills/deep-audit/agents/security-and-error-handling.md +52 -0
  21. package/skills/deep-audit/agents/seo-accessibility-auditor.md +49 -0
  22. package/skills/deep-audit/agents/test-coverage-analyst.md +37 -0
  23. package/skills/deep-audit/agents/type-design-analyzer.md +38 -0
  24. package/skills/deep-audit/templates/report-template.md +87 -0
  25. package/skills/designer-founder/SKILL.md +8 -7
  26. package/skills/designer-founder/steps/step-01-context.md +94 -45
  27. package/skills/designer-founder/steps/step-02-scope.md +6 -23
  28. package/skills/designer-founder/steps/step-03-design.md +29 -58
  29. package/skills/designer-founder/steps/step-04-artifacts.md +137 -113
  30. package/skills/designer-founder/steps/step-05-epic-linking.md +81 -53
  31. package/skills/designer-founder/steps/step-06-validate.md +181 -0
  32. package/skills/designer-founder/templates/component-strategy.md +4 -0
  33. package/skills/designer-founder/tools/magicpatterns.md +52 -19
  34. package/skills/designer-founder/tools/stitch.md +97 -67
  35. package/uninstall.js +22 -0
@@ -33,7 +33,7 @@ Before any operations, verify the environment is safe:
33
33
  - If output is non-empty, WARN the user but continue
34
34
 
35
35
  8. **Initialize tracking**:
36
- - Create counters: merged=0, skipped=0, failed=0, auto_fixed=0
36
+ - Create counters: merged=0, skipped=0, failed=0, auto_fixed_lint=0, review_comments_assessed=0, review_comments_fixed=0, review_comments_logged=0
37
37
  - Create lists: merged_prs[], skipped_prs[], failed_prs[]
38
38
 
39
39
  If gh authentication fails, STOP and clearly explain what the user needs to do.
@@ -46,7 +46,7 @@ If gh authentication fails, STOP and clearly explain what the user needs to do.
46
46
 
47
47
  1. **Fetch all open PRs**:
48
48
  ```bash
49
- gh pr list --state open --json number,title,headRefName,baseRefName,statusCheckRollup,reviewDecision,isDraft,url,author,mergeStateStatus,mergeable,state
49
+ gh pr list --state open --json number,title,headRefName,baseRefName,statusCheckRollup,reviewDecision,isDraft,url,author,mergeStateStatus,mergeable,state,reviews,latestReviews
50
50
  ```
51
51
 
52
52
  2. **For each PR, categorize**:
@@ -61,26 +61,28 @@ If gh authentication fails, STOP and clearly explain what the user needs to do.
61
61
  | Conflicts | mergeable=CONFLICTING | Skip, warn user |
62
62
  | Draft | isDraft=true | Skip |
63
63
  | Already Merged | state=MERGED | Log and skip |
64
+ | Has Review Comments | reviews/latestReviews non-empty OR reviewDecision set | Process in Phase 2 |
64
65
 
65
- 3. **Present summary** (informational only, no blocking):
66
+ > **Note**: "Has Review Comments" is an **overlay** category: a PR can be both "CI Passed" and "Has Review Comments". Phase 2 runs before merging regardless of CI status.
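A minimal JavaScript sketch of the categorization above. It assumes the JSON shape that `gh pr list --json` returns, where each `statusCheckRollup` entry carries either a `conclusion` (check runs; empty while still running) or a `state` (commit statuses). The precedence order (merged, draft, conflicts, then CI) and the helper names `ciStatus` and `categorize` are illustrative assumptions, not part of the workflow.

```javascript
// Illustrative only: maps one `gh pr list --json ...` entry to a base
// category plus the "Has Review Comments" overlay flag.
const FAILED = new Set(["FAILURE", "ERROR", "TIMED_OUT", "CANCELLED", "ACTION_REQUIRED"]);
const PASSED = new Set(["SUCCESS", "NEUTRAL", "SKIPPED"]);

function ciStatus(rollup) {
  // A check run that is still running has an empty `conclusion` and no `state`.
  const outcomes = (rollup || []).map((c) => c.conclusion || c.state || "PENDING");
  if (outcomes.some((o) => FAILED.has(o))) return "CI Failed";
  // Note: a PR with no checks at all counts as passed in this sketch.
  if (outcomes.every((o) => PASSED.has(o))) return "CI Passed";
  return "CI Pending";
}

function categorize(pr) {
  let base;
  if (pr.state === "MERGED") base = "Already Merged";
  else if (pr.isDraft) base = "Draft";
  else if (pr.mergeable === "CONFLICTING") base = "Conflicts";
  else base = ciStatus(pr.statusCheckRollup);

  // Overlay: computed independently of the base category, so a PR can be both.
  const hasReviewComments =
    (pr.reviews || []).length > 0 ||
    (pr.latestReviews || []).length > 0 ||
    Boolean(pr.reviewDecision);

  return { base, hasReviewComments };
}
```

Keeping the overlay as a separate boolean, rather than another value of `base`, is what lets a "CI Passed" PR still be routed through Phase 2.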
67
+
68
+ 3. **Present summary** (does NOT pause for confirmation):
66
69
  ```
67
70
  Processing X open PRs...
68
71
 
69
- | # | Title | Author | Status | Action |
70
- |---|-------|--------|--------|--------|
71
- | 42 | bump cross-env | dependabot | CI Passed | Will merge |
72
- | 35 | bump sharp | dependabot | CI Pending | Will wait |
73
- | 32 | bump linting | dependabot | CI Failed | Will auto-fix |
74
- | 28 | new feature | user | Draft | Will skip |
75
- | 25 | refactor auth | user | Conflicts | Will skip |
72
+ | # | Title | Author | Status | Reviews | Action |
73
+ |---|-------|--------|--------|---------|--------|
74
+ | 42 | bump cross-env | dependabot | CI Passed | - | Will merge |
75
+ | 35 | bump sharp | dependabot | CI Pending | 2 bot comments | Will wait + assess reviews |
76
+ | 32 | bump linting | dependabot | CI Failed | - | Will auto-fix |
77
+ | 28 | new feature | user | Draft | 1 human review | Will skip |
78
+ | 25 | refactor auth | user | Conflicts | - | Will skip |
76
79
  ```
77
80
 
78
- 4. **Check for blockers** - ONLY pause if:
79
- - There are PRs with merge conflicts (warn user which ones will be skipped)
80
- - There are draft PRs (inform user they will be skipped)
81
- - All PRs have non-auto-fixable failures
82
-
83
- Otherwise, proceed automatically.
81
+ 4. **Log non-processable PRs** (no user prompt):
82
+ - PRs with merge conflicts are logged as skipped
83
+ - Draft PRs are logged as skipped
84
+ - If ALL PRs are non-processable (conflicts, drafts, non-fixable failures) → inform user and end workflow
85
+ - Otherwise, proceed automatically with processable PRs.
84
86
 
85
87
  5. **Sort PRs by priority**:
86
88
  1. Infrastructure PRs first (CI/workflow changes)
@@ -90,16 +92,92 @@ If gh authentication fails, STOP and clearly explain what the user needs to do.
90
92
 
91
93
  ---
92
94
 
93
- ## Phase 2: Review Comments Handling
95
+ ## Phase 2: Review Comments Handling (Autonomous)
96
+
97
+ **Goal**: Assess and auto-fix review comments from ALL sources (bot and human inline comments). No user prompts in this phase.
98
+
99
+ For each PR with review data (reviews/latestReviews non-empty, or reviewDecision set), run the following steps:
100
+
101
+ ### Step 1: Fetch All Review Data
102
+
103
+ GitHub PRs have three distinct comment endpoints. Fetch all three:
104
+
105
+ ```bash
106
+ # Inline review comments (code-level, attached to diff hunks)
107
+ gh api repos/{owner}/{repo}/pulls/{number}/comments --paginate
108
+
109
+ # Top-level review submissions (APPROVE, CHANGES_REQUESTED, COMMENTED bodies)
110
+ gh api repos/{owner}/{repo}/pulls/{number}/reviews --paginate
111
+
112
+ # General PR conversation comments (some bots post here instead of inline)
113
+ gh api repos/{owner}/{repo}/issues/{number}/comments --paginate
114
+ ```
115
+
116
+ Never rely on `reviewDecision` alone. If all three return empty arrays, skip to Phase 3 for this PR.
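To illustrate why all three endpoints matter, here is a JavaScript sketch that merges the three REST payloads into one list for Step 3. The field names (`user.login`, `user.type`, `path`, `line`, `original_line`, `diff_hunk`, `body`) come from the GitHub REST responses for these endpoints; the function names, the `[bot]` login fallback, and skipping review submissions with an empty body are assumptions. Step 2 would still inspect the raw reviews, since a bodiless CHANGES_REQUESTED matters there.

```javascript
// Treats GitHub App accounts (type "Bot") and "...[bot]" logins as bots.
function isBot(user) {
  return Boolean(user) && (user.type === "Bot" || /\[bot\]$/.test(user.login || ""));
}

// Merges the three REST payloads into one list of comments to assess.
function normalizeReviewData({ inline = [], reviews = [], conversation = [] }) {
  const out = [];
  for (const c of inline) {
    out.push({
      source: "inline",
      author: c.user && c.user.login,
      bot: isBot(c.user),
      path: c.path,
      // `line` can be null on outdated comments; fall back to original_line.
      line: c.line ?? c.original_line ?? null,
      body: c.body,
      diffHunk: c.diff_hunk,
    });
  }
  for (const r of reviews) {
    // A review submission without a body has no text to assess.
    if (!r.body || !r.body.trim()) continue;
    out.push({ source: "review", author: r.user && r.user.login, bot: isBot(r.user),
      path: null, line: null, body: r.body, diffHunk: null });
  }
  for (const c of conversation) {
    out.push({ source: "conversation", author: c.user && c.user.login, bot: isBot(c.user),
      path: null, line: null, body: c.body, diffHunk: null });
  }
  return out;
}
```

An empty result from `normalizeReviewData` corresponds to the "skip to Phase 3" case, except that bodiless reviews are dropped here as well.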
117
+
118
+ ### Step 2: Handle Formal CHANGES_REQUESTED
119
+
120
+ If `reviewDecision: CHANGES_REQUESTED` AND the reviewer who requested changes is a human (not a bot) → **auto-skip PR entirely**. Add to `skipped_prs` with reason "human requested changes". This is the only hard stop.
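`reviewDecision` is an aggregate, so deciding whether a human requested changes means looking at individual reviews. A sketch, assuming the REST `/reviews` array is in chronological order (as the GitHub docs describe) and that a later APPROVED or DISMISSED review from the same person supersedes an earlier CHANGES_REQUESTED. `humanRequestedChanges` is a hypothetical helper name.

```javascript
function isBot(user) {
  return Boolean(user) && (user.type === "Bot" || /\[bot\]$/.test(user.login || ""));
}

// True only when GitHub reports CHANGES_REQUESTED and at least one human
// reviewer's most recent decisive review is still CHANGES_REQUESTED.
function humanRequestedChanges(reviewDecision, reviews) {
  if (reviewDecision !== "CHANGES_REQUESTED") return false;
  const latestByHuman = new Map();
  for (const r of reviews) {
    if (!r.user || isBot(r.user)) continue;
    // Plain comments and unsubmitted drafts do not change a reviewer's decision.
    if (r.state === "COMMENTED" || r.state === "PENDING") continue;
    latestByHuman.set(r.user.login, r.state);
  }
  return [...latestByHuman.values()].includes("CHANGES_REQUESTED");
}
```

A bot that requests changes therefore never triggers the hard stop; its comments flow into Step 3 like any other.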
121
+
122
+ ### Step 3: Assess ALL Comments Uniformly
123
+
124
+ Read the `diff_hunk` context for each comment. Classify into one of four buckets:
125
+
126
+ **Valid + straightforward** → auto-fix:
127
+ - "Unused import `fs` on line 12" → remove it
128
+ - "Missing null check before accessing `user.name`" → add the guard
129
+ - "Variable `x` should be `userCount`" → rename it
130
+ - "Missing `await` on async call" → add it
131
+ - "Typo: `recieve` → `receive`" → fix it
132
+
133
+ **Valid + complex** → log, proceed to merge:
134
+ - "Consider refactoring this into a strategy pattern"
135
+ - "This function does too much — split into smaller functions"
136
+ - "Should we add retry logic here?"
137
+ - Any comment phrased as a question requiring design decisions
138
+ - Any suggestion touching multiple files or changing control flow
139
+
140
+ **Invalid / false positive** → dismiss, log:
141
+ - Bot references code that no longer exists in the current diff
142
+ - Suggestion would introduce a bug or break existing behavior
143
+ - Comment is about a different file/context than the one it's attached to
94
144
 
95
- **Note**: This phase only runs if PRs have pending review comments. For batch processing of simple dependency bumps, this phase is typically skipped.
145
+ **Cosmetic only** → auto-fix if single-line, otherwise log:
146
+ - "Add trailing comma" → auto-fix
147
+ - "Reorder imports alphabetically" → auto-fix if simple, log if many files
96
148
 
97
- For PRs with `reviewDecision: CHANGES_REQUESTED`:
149
+ **Conservative default**: when uncertain, always classify as "valid + complex" (log, don't fix). The cost of logging instead of fixing is near zero. The cost of botching an architectural change is a broken PR.
98
150
 
99
- 1. Fetch review comments
100
- 2. If all comments are trivial (typos, formatting, docstrings) - auto-fix
101
- 3. If comments require logic changes - skip PR and inform user
102
- 4. Push fixes without asking for confirmation
151
+ Bot and human inline comments are treated identically. The only distinction is formal `CHANGES_REQUESTED` (Step 2).
152
+
153
+ Increment `review_comments_assessed` for each comment processed.
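The four buckets and the conservative default reduce to a small decision table. In this sketch, `bucket`, `confident`, and `singleLine` are hypothetical fields the assessor would attach to each comment; GitHub does not return them.

```javascript
const BUCKETS = ["valid_straightforward", "valid_complex", "invalid", "cosmetic"];

// Maps a bucket to "fix", "log", or "dismiss".
function decideAction(bucket, { confident = true, singleLine = false } = {}) {
  // Conservative default: uncertain or unrecognized means valid + complex.
  if (!confident || !BUCKETS.includes(bucket)) bucket = "valid_complex";
  switch (bucket) {
    case "valid_straightforward":
      return "fix";
    case "cosmetic":
      return singleLine ? "fix" : "log";
    case "invalid":
      return "dismiss";
    default:
      return "log";
  }
}

// Assesses every comment (bot or human alike) and counts each one.
function assessAll(comments, counters) {
  return comments.map((c) => {
    counters.review_comments_assessed += 1;
    const action = decideAction(c.bucket, { confident: c.confident, singleLine: c.singleLine });
    return { ...c, action };
  });
}
```

Note that the author's bot/human status never appears as an input, matching the rule that both are treated identically here.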
154
+
155
+ ### Step 4: Auto-fix via Sub-agent
156
+
157
+ For PRs with fixable comments, launch a **Task sub-agent** to apply fixes. This preserves the main workflow's context window for orchestration.
158
+
159
+ Sub-agent receives:
160
+ - PR number, branch name, repo owner/name
161
+ - List of fixable comments (file path, line number, `diff_hunk`, suggestion text, classification)
162
+ - Instruction: checkout branch, read relevant files, apply fixes, commit as `"fix: address review feedback"`, push
163
+
164
+ Sub-agent returns: summary of changes made (files modified, comments addressed).
165
+
166
+ Main workflow waits for sub-agent completion, then marks PR for CI re-check in Phase 3.
167
+
168
+ Process PRs with fixable comments **sequentially** (one sub-agent at a time). Parallel sub-agents deferred to v2 — multiple git checkouts risk worktree conflicts and harder debugging.
169
+
170
+ Increment `review_comments_fixed` for each comment addressed by the sub-agent.
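A sketch of the sequential hand-off, assuming a hypothetical `runSubAgent(payload)` function that launches the Task sub-agent and resolves with `{ commentsAddressed }`. The payload mirrors the fields listed above; awaiting each call inside `for...of` is what keeps only one checkout active at a time.

```javascript
// Builds the Step 4 payload for one PR (field names are illustrative).
function buildPayload(pr, repo, fixable) {
  return {
    pr: pr.number,
    branch: pr.headRefName,
    repo: `${repo.owner}/${repo.name}`,
    comments: fixable.map((c) => ({
      path: c.path,
      line: c.line,
      diffHunk: c.diffHunk,
      suggestion: c.body,
      classification: c.bucket,
    })),
    instruction:
      "Checkout the branch, read the relevant files, apply the fixes, " +
      'commit as "fix: address review feedback", and push.',
  };
}

// Runs one sub-agent at a time; returns PR numbers that need a CI re-check.
async function fixSequentially(queue, repo, runSubAgent, counters) {
  const needsCiRecheck = [];
  for (const { pr, fixable } of queue) {
    if (fixable.length === 0) continue;
    const summary = await runSubAgent(buildPayload(pr, repo, fixable));
    counters.review_comments_fixed += summary.commentsAddressed;
    needsCiRecheck.push(pr.number);
  }
  return needsCiRecheck;
}
```

Switching to `Promise.all` would be the parallel variant deferred to v2; the sequential loop trades speed for a single working tree.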
171
+
172
+ ### Step 5: Log Remaining Comments
173
+
174
+ For all non-fixed comments (valid + complex, invalid/dismissed, cosmetic-but-skipped):
175
+ - Record: PR number, comment author, file path, snippet of comment body, classification reason
176
+ - Increment `review_comments_logged` for each
177
+
178
+ These are included in the Phase 5 final summary.
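A sketch of the Step 5 record and the summary line it becomes, following the `- PR #35 [codex-bot] src/utils.ts: "..." (valid+complex)` format used in the Phase 5 example. The 60-character snippet length and the `(conversation)` placeholder for comments without a file path are assumptions.

```javascript
// Collapses whitespace and truncates long comment bodies for the summary.
function snippet(body, max = 60) {
  const oneLine = String(body || "").replace(/\s+/g, " ").trim();
  return oneLine.length > max ? oneLine.slice(0, max - 3) + "..." : oneLine;
}

// Records one non-fixed comment and bumps the logged counter.
function logRecord(prNumber, comment, reason, counters) {
  counters.review_comments_logged += 1;
  return {
    pr: prNumber,
    author: comment.author,
    path: comment.path || "(conversation)",
    snippet: snippet(comment.body),
    reason,
  };
}

// Formats a record as one line of "REVIEW COMMENTS LOGGED".
function formatLogLine(r) {
  return `- PR #${r.pr} [${r.author}] ${r.path}: "${r.snippet}" (${r.reason})`;
}
```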
179
+
180
+ **No user prompts in this phase. Fully autonomous.**
103
181
 
104
182
  ---
105
183
 
@@ -131,6 +209,19 @@ gh api repos/{owner}/{repo}/pulls/{number}/update-branch -X PUT
131
209
  ```
132
210
  Then wait 30s and re-check CI.
133
211
 
212
+ ### Step 2b: Rerun vs Update-branch Decision
213
+
214
+ When CI needs re-triggering, choose the right action:
215
+
216
+ | Scenario | Action |
217
+ |----------|--------|
218
+ | Branch behind main | `gh api repos/{owner}/{repo}/pulls/{number}/update-branch -X PUT` |
219
+ | Flaky test (same code passed before) | `gh run rerun {run_id} --failed` |
220
+ | CI workflow changed on main | `update-branch` (picks up new workflow) |
221
+ | Secrets unavailable (Dependabot) | `update-branch` first |
222
+
223
+ **Rule**: prefer `update-branch` over `rerun` unless confirmed flaky test on an up-to-date branch.
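The Step 2b rule as a JavaScript function. The boolean inputs are hypothetical signals the orchestrator would have to gather (for example by comparing the PR branch against main); the function only encodes the preference order from the table.

```javascript
// Returns "rerun" or "update-branch" following the Step 2b rule:
// prefer update-branch unless a flaky test is confirmed on an up-to-date branch.
function chooseRetrigger({
  behindMain = false,
  workflowChangedOnMain = false,
  dependabotSecretsMissing = false,
  confirmedFlaky = false,
} = {}) {
  const upToDate = !behindMain && !workflowChangedOnMain;
  if (confirmedFlaky && upToDate && !dependabotSecretsMissing) return "rerun";
  return "update-branch";
}
```

Every branch except the confirmed-flaky case falls through to `update-branch`, so missing information defaults to the safer action.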
224
+
134
225
  ### Step 3: Wait for CI
135
226
 
136
227
  While CI is pending and elapsed < 10 minutes:
@@ -288,10 +379,17 @@ RESULTS:
288
379
 
289
380
  AUTO-FIXES APPLIED:
290
381
  - X lint errors fixed automatically
382
+ - X review comments auto-resolved (from Y assessed)
383
+ - Z review comments logged (not auto-fixable)
384
+
385
+ REVIEW COMMENTS LOGGED (not auto-fixed):
386
+ - PR #35 [codex-bot] src/utils.ts: "Consider extracting to helper" (valid+complex)
387
+ - PR #35 [codex-bot] src/index.ts: "Stale reference to removed fn" (invalid/dismissed)
291
388
 
292
389
  REMAINING WORK:
293
390
  - PR #25 has merge conflicts at: src/auth.ts
294
391
  - PR #20 needs manual fix for type errors
392
+ - PR #18 skipped: human reviewer formally requested changes
295
393
 
296
394
  =====================================================
297
395
  ```
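A sketch of the tracking state from the "Initialize tracking" step and how the AUTO-FIXES APPLIED lines could be rendered from it. The counter and list names come from the workflow; `initTracking` and `renderAutoFixes` are illustrative names.

```javascript
// Counters and lists from the "Initialize tracking" step.
function initTracking() {
  return {
    merged: 0,
    skipped: 0,
    failed: 0,
    auto_fixed_lint: 0,
    review_comments_assessed: 0,
    review_comments_fixed: 0,
    review_comments_logged: 0,
    merged_prs: [],
    skipped_prs: [],
    failed_prs: [],
  };
}

// Renders the AUTO-FIXES APPLIED section of the final summary.
function renderAutoFixes(t) {
  return [
    "AUTO-FIXES APPLIED:",
    `- ${t.auto_fixed_lint} lint errors fixed automatically`,
    `- ${t.review_comments_fixed} review comments auto-resolved (from ${t.review_comments_assessed} assessed)`,
    `- ${t.review_comments_logged} review comments logged (not auto-fixable)`,
  ].join("\n");
}
```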
@@ -303,12 +401,14 @@ REMAINING WORK:
303
401
  ### ALWAYS Auto-fix
304
402
  - Lint errors (`npm run lint -- --fix`)
305
403
  - Formatting errors
404
+ - Review comments (bot or human) assessed as valid + straightforward
306
405
 
307
406
  ### NEVER Auto-fix
308
407
  - Type errors (TypeScript)
309
408
  - Test failures
310
409
  - Build errors
311
- - Logic changes in review comments
410
+ - Logic/architectural changes in review comments
411
+ - PRs where a human reviewer formally requested changes (auto-skip entire PR)
312
412
 
313
413
  ### Branch Safety
314
414
  - NEVER use force push
@@ -320,6 +420,14 @@ REMAINING WORK:
320
420
  - 10 minute max wait per PR
321
421
  - 2 max fix attempts before skipping
322
422
 
423
+ ### Review Comment Handling
424
+ - ALWAYS fetch inline comments via API (never rely solely on `reviewDecision`)
425
+ - NEVER merge a PR where a human formally requested changes (auto-skip)
426
+ - ALWAYS assess ALL comments (bot and human) for validity before applying
427
+ - Treat bot and human inline comments identically — assess, auto-fix if straightforward
428
+ - NEVER prompt the user for review comment decisions
429
+ - ALWAYS log dismissed/skipped comments for the final summary
430
+
323
431
  ### Error Handling
324
432
  - "Already merged" is a success, not an error
325
433
  - Continue to next PR on any error
package/install.js CHANGED
@@ -215,6 +215,8 @@ function install() {
215
215
  'skills/agent-creator/',
216
216
  'skills/designer-founder/',
217
217
  'skills/product-architect/',
218
+ 'commands/deep-audit.md',
219
+ 'skills/deep-audit/',
218
220
  '*.backup',
219
221
  ];
220
222
  const addedCount = ensureGitignoreEntries(
@@ -392,9 +394,6 @@ async function installStitchSkills(isGlobal) {
392
394
  };
393
395
 
394
396
  try {
395
- // Install design-md skill (required for Stitch)
396
- execSync(`npx skills add google-labs-code/stitch-skills --skill design-md${globalFlag} -a claude-code -y`, execOptions);
397
-
398
397
  // Install react-components skill (for HTML→React conversion)
399
398
  execSync(`npx skills add google-labs-code/stitch-skills --skill react-components${globalFlag} -a claude-code -y`, execOptions);
400
399
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@torka/claude-workflows",
3
- "version": "0.12.0",
3
+ "version": "0.13.1",
4
4
  "description": "Claude Code workflow helpers: epic automation, git cleanup, agents, and design workflows",
5
5
  "keywords": [
6
6
  "claude-code",
@@ -0,0 +1,26 @@
1
+ # Deep Audit — Design Inspirations
2
+
3
+ Reference sources consulted during agent prompt design. Not loaded at runtime — for development reference only.
4
+
5
+ ## Tier 1 — Core Patterns
6
+
7
+ | Source | URL | Pattern Used |
8
+ |--------|-----|-------------|
9
+ | Anthropic `code-review` plugin | https://github.com/anthropics/claude-code/tree/main/plugins/code-review | Confidence scoring (0-100), only report 80+, false-positive filtering |
10
+ | Anthropic `pr-review-toolkit` | https://github.com/anthropics/claude-code/tree/main/plugins/pr-review-toolkit | 6 specialized agents, smart orchestration, parallel spawning, conditional agent selection |
11
+ | EveryInc compound-engineering | https://github.com/EveryInc/compound-engineering-plugin/blob/main/plugins/compound-engineering/commands/workflows/review.md | 15 review agents, P1/P2/P3 severity, file-based output |
12
+ | EveryInc review agents | https://github.com/EveryInc/compound-engineering-plugin/tree/main/plugins/compound-engineering/agents/review | Agent-per-dimension pattern, structured handoff |
13
+ | wshobson code-reviewer | https://github.com/wshobson/agents/blob/main/plugins/comprehensive-review/agents/code-reviewer.md | 5-phase review structure, numbered output artifacts |
14
+ | wshobson security-auditor | https://github.com/wshobson/agents/blob/main/plugins/comprehensive-review/agents/security-auditor.md | OWASP checklist integration, security-specific review flow |
15
+ | Deslop/Anti-slop SKILL.md | https://github.com/avifenesh/agentsys/blob/main/plugins/deslop/skills/deslop/SKILL.md | Certainty-gated approach, minimal diff principle, deterministic patterns first |
16
+
17
+ ## Tier 2 — Cherry-Picked Ideas
18
+
19
+ | Source | URL | Pattern Used |
20
+ |--------|-----|-------------|
21
+ | undeadlist agents | https://github.com/undeadlist/claude-code-agents/tree/main/.claude/agents | SEO auditor framework detection + 6 audit categories |
22
+ | davila7 code-review | https://github.com/davila7/claude-code-templates/blob/main/cli-tool/components/commands/utilities/code-review.md | Multi-file analysis patterns |
23
+ | Premortem skill | https://www.aimcp.info/en/skills/ee6aa52d-1d52-45b7-8f10-c1b238772acd | "Imagine this already failed — why?" reframing for risk identification |
24
+ | Evaluation rubrics skill | https://www.aimcp.info/en/skills/4090c675-54f6-42b8-96a6-74417f6db77c | 4-8 criteria, explicit descriptors per level, analytic scoring |
25
+ | Stop-slop (find-bugs) | https://www.aimcp.info/en/skills/390a1fbf-b55b-466c-85a6-be818e33bb28 | 5-dimension scoring (1-10), threshold triggers, diff analysis |
26
+ | AI Prompt Library blog | https://www.aipromptlibrary.app/blog/claude-code-prompt-library | Claude Code prompt patterns and agent design conventions |
@@ -0,0 +1,253 @@
1
+ # Deep Audit — Skill Reference
2
+
3
+ This file is the single source of truth for agent roster, dimension boundaries, scoring rubric, and output format. Every audit agent reads this file to understand its scope and output requirements.
4
+
5
+ ## Agent Roster
6
+
7
+ ### Quick Mode (default — 3 agents)
8
+
9
+ | Agent File | Dimensions | Model | Rationale |
10
+ |------------|-----------|-------|-----------|
11
+ | `security-and-error-handling.md` | Security, Error Handling | opus | Unhandled errors ARE security issues; one agent reasoning about both produces better findings |
12
+ | `architecture-and-complexity.md` | Architecture, Simplification | opus | Architecture decisions need deepest reasoning; over-engineering IS an architecture problem |
13
+ | `code-health.md` | AI Slop Detection, Dependency Health | sonnet | Both smell neglect — slop detection and dependency rot share the same instinct |
14
+
15
+ ### Full Mode (adds 6 more agents — `--full` flag)
16
+
17
+ | Agent File | Dimension | Model |
18
+ |------------|-----------|-------|
19
+ | `performance-profiler.md` | Performance | sonnet |
20
+ | `test-coverage-analyst.md` | Test Coverage | sonnet |
21
+ | `type-design-analyzer.md` | Type Design | sonnet |
22
+ | `data-layer-reviewer.md` | Data Layer & Database | opus |
23
+ | `api-contract-reviewer.md` | API Contracts & Interface Consistency | sonnet |
24
+ | `seo-accessibility-auditor.md` | SEO & Accessibility | sonnet |
25
+
26
+ ## Dimension Boundaries
27
+
28
+ Each dimension has a clear scope. Agents MUST stay within their assigned dimensions and NOT report findings that belong to another dimension.
29
+
30
+ ### Security
31
+ - Authentication & authorization flaws
32
+ - Injection vulnerabilities (SQL, XSS, command injection, path traversal)
33
+ - Secrets/credentials in code or config
34
+ - Insecure cryptographic usage
35
+ - CSRF, SSRF, open redirects
36
+ - Unsafe deserialization
37
+ - Missing rate limiting on auth endpoints
38
+ - **NOT**: general error handling, performance, code style
39
+
40
+ ### Error Handling
41
+ - Unhandled promise rejections and uncaught exceptions
42
+ - Empty catch blocks or swallowed errors
43
+ - Missing error boundaries (React) or global error handlers
44
+ - Inconsistent error response formats
45
+ - Missing validation at system boundaries (user input, external APIs)
46
+ - Error messages leaking internal details (overlaps security — report under Security if exploitable)
47
+ - **NOT**: business logic validation, type safety, test assertions
48
+
49
+ ### Architecture
50
+ - Separation of concerns violations
51
+ - Circular dependencies
52
+ - God objects / god modules
53
+ - Missing abstraction layers (e.g., direct DB calls in route handlers)
54
+ - Inconsistent patterns across similar components
55
+ - Tight coupling between modules that should be independent
56
+ - **NOT**: code style, naming conventions, dependency versions
57
+
58
+ ### Simplification
59
+ - Over-abstracted code (abstractions used once)
60
+ - Premature optimization
61
+ - Feature flags or backwards-compatibility shims for dead code
62
+ - Unnecessary indirection (wrapper functions that just pass through)
63
+ - Configuration for things that never change
64
+ - Dead code, unused exports, orphaned files
65
+ - **NOT**: intentional design patterns, library APIs (they need flexibility)
66
+
67
+ ### AI Slop Detection
68
+ - Excessive/unnecessary comments explaining obvious code
69
+ - Redundant docstrings on trivial functions
70
+ - Over-verbose variable names (e.g., `resultOfDatabaseQuery`)
71
+ - Defensive error handling for impossible scenarios
72
+ - Unnecessary type annotations that TypeScript can infer
73
+ - "Just in case" fallbacks that mask real bugs
74
+ - Boilerplate that adds no value
75
+ - **NOT**: intentional documentation, public API docs, complex logic comments
76
+
77
+ ### Dependency Health
78
+ - Outdated packages with known vulnerabilities
79
+ - Abandoned/unmaintained dependencies (no commits in 2+ years)
80
+ - Duplicate dependencies serving the same purpose
81
+ - Pinned versions preventing security updates
82
+ - Missing lock file or lock file drift
83
+ - Oversized dependencies for simple tasks (e.g., lodash for one function)
84
+ - **NOT**: architecture decisions about which library to use
85
+
86
+ ### Performance
87
+ - N+1 query patterns
88
+ - Missing pagination on unbounded queries
89
+ - Synchronous operations blocking the event loop
90
+ - Unnecessary re-renders (React) or DOM thrashing
91
+ - Missing caching for expensive operations
92
+ - Memory leaks (event listeners, subscriptions, closures)
93
+ - Large bundle imports that could be tree-shaken or lazy-loaded
94
+ - **NOT**: micro-optimizations, premature optimization
95
+
96
+ ### Test Coverage
97
+ - Untested critical paths (auth, payments, data mutations)
98
+ - Missing edge case tests (empty inputs, boundary values, error states)
99
+ - Flaky tests (timing-dependent, order-dependent, environment-dependent)
100
+ - Tests that test implementation rather than behavior
101
+ - Missing integration tests for API endpoints
102
+ - Test fixtures with hardcoded secrets or PII
103
+ - **NOT**: 100% coverage goals, testing trivial getters/setters
104
+
105
+ ### Type Design
106
+ - `any` types that should be specific
107
+ - Overly complex generic types that hurt readability
108
+ - Missing discriminated unions for state machines
109
+ - Inconsistent type naming conventions
110
+ - Type assertions (`as`) hiding real type errors
111
+ - Missing null/undefined handling in types
112
+ - **NOT**: library type definitions, auto-generated types
113
+
114
+ ### Data Layer & Database
115
+ - Missing database indexes on frequently queried columns
116
+ - Schema design issues (denormalization problems, missing constraints)
117
+ - Raw SQL without parameterized queries
118
+ - Missing transactions for multi-step mutations
119
+ - ORM misuse (eager loading everything, N+1 queries)
120
+ - Missing data validation at the persistence layer
121
+ - Migration safety (irreversible migrations without rollback plan)
122
+ - **NOT**: query performance (belongs to Performance), API response shapes
123
+
124
+ ### API Contracts & Interface Consistency
125
+ - Inconsistent naming across endpoints (camelCase vs snake_case)
126
+ - Missing or inconsistent error response formats
127
+ - Breaking changes without versioning
128
+ - Undocumented endpoints or parameters
129
+ - Inconsistent pagination patterns
130
+ - Missing Content-Type headers or wrong status codes
131
+ - Internal function signatures inconsistent with external API patterns
132
+ - **NOT**: implementation details behind the API, database schema
133
+
134
+ ### SEO & Accessibility
135
+ - Missing or duplicate meta tags (title, description, canonical)
136
+ - Missing alt text on images
137
+ - Insufficient color contrast
138
+ - Missing ARIA labels on interactive elements
139
+ - Non-semantic HTML (div soup)
140
+ - Missing heading hierarchy (h1 → h2 → h3)
141
+ - Missing keyboard navigation support
142
+ - Missing `prefers-reduced-motion` support for animations
143
+ - Touch target minimum size (24x24 CSS px)
144
+ - Missing Open Graph / social sharing metadata
145
+ - **NOT**: content quality, marketing strategy, visual design choices
146
+
147
+ ## Scoring Rubric
148
+
149
+ Each dimension is scored 1–10:
150
+
151
+ | Score | Label | Meaning |
152
+ |-------|-------|---------|
153
+ | 9–10 | Excellent | No findings or only minor nitpicks; production-ready |
154
+ | 7–8 | Good | Minor issues; low risk, easy fixes |
155
+ | 5–6 | Adequate | Notable gaps; some P2 findings that should be addressed |
156
+ | 3–4 | Concerning | Significant issues; P1 findings present; needs attention before next release |
157
+ | 1–2 | Critical | Severe problems; multiple P1 findings; immediate action required |
158
+
159
+ **Overall Health Score** = weighted average:
160
+ - Security: weight 3
161
+ - Error Handling: weight 2
162
+ - Architecture: weight 2
163
+ - Simplification: weight 1
164
+ - AI Slop: weight 1
165
+ - Dependency Health: weight 1
166
+ - Performance: weight 2 (full mode only)
167
+ - Test Coverage: weight 2 (full mode only)
168
+ - Type Design: weight 1 (full mode only)
169
+ - Data Layer: weight 2 (full mode only)
170
+ - API Contracts: weight 1 (full mode only)
171
+ - SEO & Accessibility: weight 1 (full mode only)
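A worked sketch of the rubric labels and the weighted average. Two assumptions: only dimensions that were actually scored are averaged (so quick mode uses the first six weights), and a fractional overall score takes the label of the band whose lower bound it has reached (8.5 reads as Good). Keys use the dimension names from Dimension Boundaries, since agents report those names; the weight list above abbreviates a few of them.

```javascript
const WEIGHTS = {
  "Security": 3,
  "Error Handling": 2,
  "Architecture": 2,
  "Simplification": 1,
  "AI Slop Detection": 1,
  "Dependency Health": 1,
  "Performance": 2,
  "Test Coverage": 2,
  "Type Design": 1,
  "Data Layer & Database": 2,
  "API Contracts & Interface Consistency": 1,
  "SEO & Accessibility": 1,
};

function labelFor(score) {
  if (score >= 9) return "Excellent";
  if (score >= 7) return "Good";
  if (score >= 5) return "Adequate";
  if (score >= 3) return "Concerning";
  return "Critical";
}

// Weighted average over the dimensions present in `scores`.
function overallHealth(scores) {
  let weighted = 0;
  let total = 0;
  for (const [dimension, score] of Object.entries(scores)) {
    const w = WEIGHTS[dimension];
    if (w === undefined) throw new Error(`Unknown dimension: ${dimension}`);
    weighted += w * score;
    total += w;
  }
  return total === 0 ? null : weighted / total;
}
```

For example, quick-mode scores of Security 8, Error Handling 6, Architecture 7, Simplification 9, AI Slop Detection 5, and Dependency Health 10 give (24 + 12 + 14 + 9 + 5 + 10) / 10 = 7.4, which falls in the Good band. Security's weight of 3 means a drop of one point there moves the overall score three times as much as the same drop in Simplification.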
172
+
173
+ ## Severity Definitions
174
+
175
+ | Level | Label | Meaning | Action |
176
+ |-------|-------|---------|--------|
177
+ | **P1** | Critical | Security vulnerability, data loss risk, or production blocker | Fix before next deploy |
178
+ | **P2** | Important | Significant quality issue that degrades maintainability or reliability | Fix within current sprint |
179
+ | **P3** | Minor | Code quality improvement; low risk but worth addressing | Fix when touching the file |
180
+
181
+ ## Confidence Threshold
182
+
183
+ Agents MUST only report findings with **confidence >= 80%** (on a 0-100 scale).
184
+
185
+ - **90-100**: Very high confidence — clear violation with concrete evidence
186
+ - **80-89**: High confidence — strong signal with reasonable certainty
187
+ - **Below 80**: Do NOT report — risk of false positive outweighs value
188
+
189
+ When assessing confidence, consider:
190
+ - Is this a definitive violation or a judgment call?
191
+ - Could there be a valid reason for this pattern you can't see?
192
+ - Would a senior engineer agree this is an issue?
193
+
194
+ ## False Positive Prevention
195
+
196
+ Agents MUST NOT report:
197
+ - Issues a linter or formatter would catch (eslint, prettier, stylelint)
198
+ - Subjective style preferences that a senior engineer might reasonably disagree with
199
+ - Pre-existing patterns the codebase uses consistently (these are intentional conventions, not bugs)
200
+ - Potential issues that depend on runtime state, specific inputs, or environment config you cannot verify
201
+ - Micro-optimizations with negligible real-world impact
202
+
203
+ ## Agent Output Format
204
+
205
+ Every agent MUST produce output in exactly this format.
206
+
207
+ ### Finding Block
208
+
209
+ ```
210
+ === FINDING ===
211
+ agent: <agent-file-name without .md>
212
+ severity: P1|P2|P3
213
+ confidence: <80-100>
214
+ file: <relative file path>
215
+ line: <line number or range, e.g., 42 or 42-58>
216
+ dimension: <dimension name from boundaries above>
217
+ title: <concise one-line title>
218
+ description: |
219
+ <2-4 sentences explaining the issue, why it matters, and concrete evidence>
220
+ suggestion: |
221
+ <specific fix or approach — code snippet if helpful, but keep it brief>
222
+ === END FINDING ===
223
+ ```
224
+
225
+ ### Dimension Summary Block
226
+
227
+ One per dimension the agent covers:
228
+
229
+ ```
230
+ === DIMENSION SUMMARY ===
231
+ dimension: <dimension name>
232
+ score: <1-10>
233
+ p1_count: <number>
234
+ p2_count: <number>
235
+ p3_count: <number>
236
+ assessment: |
237
+ <2-3 sentences summarizing the dimension's health and key patterns observed>
238
+ === END DIMENSION SUMMARY ===
239
+ ```
240
+
241
+ ### Output Order
242
+
243
+ 1. All `=== FINDING ===` blocks (sorted by severity: P1 first, then P2, then P3)
244
+ 2. All `=== DIMENSION SUMMARY ===` blocks
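To show how the Finding Block format, the confidence threshold, and the output order fit together, here is a JavaScript sketch of an orchestrator-side parser. It assumes each key starts at column 0 and that `description`/`suggestion` continuation lines are indented; as a sketch, it would misread an unindented continuation line that happens to start with a known key.

```javascript
const KEYS = new Set(["agent", "severity", "confidence", "file", "line",
  "dimension", "title", "description", "suggestion"]);
const SEVERITY_RANK = { P1: 0, P2: 1, P3: 2 };

// Extracts every === FINDING === block into a plain object.
function parseFindings(text) {
  const findings = [];
  const blockRe = /=== FINDING ===\n([\s\S]*?)\n=== END FINDING ===/g;
  for (const match of text.replace(/\r\n/g, "\n").matchAll(blockRe)) {
    const finding = {};
    let key = null;
    for (const line of match[1].split("\n")) {
      const m = /^([a-z]+):\s?(.*)$/.exec(line);
      if (m && KEYS.has(m[1])) {
        key = m[1];
        // "|" starts a multi-line value (description, suggestion).
        finding[key] = m[2] === "|" ? "" : m[2];
      } else if (key && line.trim()) {
        finding[key] += (finding[key] ? "\n" : "") + line.trim();
      }
    }
    finding.confidence = Number(finding.confidence);
    findings.push(finding);
  }
  return findings;
}

// Applies the >= 80 confidence rule and the P1 -> P2 -> P3 output order.
function reportable(findings) {
  return findings
    .filter((f) => f.confidence >= 80 && f.severity in SEVERITY_RANK)
    .sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}
```

Re-applying the threshold on the orchestrator side is a cheap guard in case an agent ignores the "Do NOT include findings below 80% confidence" rule.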
245
+
246
+ ### Important Rules
247
+
248
+ - Do NOT include findings below 80% confidence
249
+ - Do NOT report findings outside your assigned dimensions
250
+ - Do NOT suggest fixes that introduce new problems
251
+ - Do NOT report the same issue multiple times across different files — report the pattern once and list affected files
252
+ - If no findings for a dimension, still include the DIMENSION SUMMARY with score and assessment
253
+ - Keep descriptions factual and evidence-based; avoid vague language like "could potentially" or "might cause issues"
@@ -0,0 +1,38 @@
1
+ # API Contract Reviewer
2
+
3
+ You are a **senior API architect** performing a focused codebase audit. You specialize in API design consistency, interface contracts, and communication patterns between modules, services, and clients.
4
+
5
+ ## Dimensions
6
+
7
+ You cover **API Contracts & Interface Consistency** from SKILL.md. Focus on inconsistencies that confuse consumers and contracts that break without warning.
8
+
9
+ Read SKILL.md for exact dimension boundaries and output format requirements.
10
+
11
+ ## What to Check
12
+
13
+ 1. **Naming inconsistency across endpoints**: Mix of camelCase and snake_case in response fields. Inconsistent resource naming (plural vs singular: `/user/1` vs `/orders/1`). Inconsistent URL patterns (`/getUser` vs `/orders` — verb-based vs resource-based). Inconsistent query parameter naming.
14
+ 2. **Error response inconsistency**: Different error shapes from different endpoints (some return `{ error: msg }`, others `{ message: msg }`, others `{ errors: [...] }`). Inconsistent HTTP status codes for similar errors (some return 400 for validation errors, others return 422). Missing error codes for programmatic error handling.
15
+ 3. **Missing versioning**: Breaking changes to API responses without version bump. Removed or renamed fields without deprecation. Changed response types (string to number, object to array) without versioning.
16
+ 4. **Undocumented contracts**: API endpoints without corresponding type definitions. Response shapes that differ from documented types. Query parameters that are accepted but not documented. Endpoints that return different shapes based on undocumented conditions.
17
+ 5. **Pagination inconsistency**: Multiple pagination patterns in the same API (cursor-based and offset-based). Missing pagination on endpoints that return collections. Inconsistent pagination parameter names (`page`/`pageSize` vs `offset`/`limit` vs `cursor`/`count`).
18
+ 6. **Status code misuse**: Using 200 for errors (embedding the error in the response body). Using 404 for authorization failures. Using 500 for client errors. Missing proper status codes for creation (201), no content (204), or accepted (202).
19
+ 7. **Internal interface inconsistency**: Function signatures that don't follow project conventions. Service methods with inconsistent parameter ordering (some take `id` first, others take `options` first). Inconsistent return types (some return raw data, some return wrapped responses).
20
+ 8. **Request/response shape mismatches**: Create endpoint accepting different field names than the read endpoint returns. Update endpoint not accepting all fields that exist on the resource. Batch endpoints returning different shapes than single-resource endpoints.
21
+ 9. **Missing headers**: Missing `Content-Type` headers on responses. Missing `Cache-Control` headers on cacheable resources. Missing CORS headers on public APIs. Inconsistent content negotiation.
22
+ 10. **Breaking change risks**: Required fields added to request bodies (breaks existing clients). Enum values added to response fields (breaks strict client parsers). Nested object shapes changed (breaks destructuring patterns).
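A minimal TypeScript sketch of how checks 3 and 10 could be made mechanical when two versions of a schema are available (for example, a type definition before and after a change). The flattened `FieldSpec`/`Shape` types, the `diffResponse`/`diffRequest` helpers, and the example endpoints are all hypothetical; real schemas are nested and would need a recursive comparison.

```typescript
// Sketch only: FieldSpec/Shape are hypothetical, flattened schema types.
// Real response shapes are nested; a full check would recurse into objects.
type FieldSpec = { type: string; required: boolean };
type Shape = Record<string, FieldSpec>;

interface BreakingChange {
  field: string;
  kind: "removed" | "type-changed" | "new-required";
  detail: string;
}

// Response side: existing clients may read any field the old version returned.
function diffResponse(oldShape: Shape, newShape: Shape): BreakingChange[] {
  const changes: BreakingChange[] = [];
  for (const [field, spec] of Object.entries(oldShape)) {
    const next = newShape[field];
    if (next === undefined) {
      changes.push({ field, kind: "removed", detail: `${field} is no longer returned` });
    } else if (next.type !== spec.type) {
      changes.push({ field, kind: "type-changed", detail: `${spec.type} -> ${next.type}` });
    }
  }
  return changes;
}

// Request side: a field that becomes required breaks clients that omit it.
function diffRequest(oldShape: Shape, newShape: Shape): BreakingChange[] {
  const changes: BreakingChange[] = [];
  for (const [field, spec] of Object.entries(newShape)) {
    const prev = oldShape[field];
    if (spec.required && (prev === undefined || !prev.required)) {
      changes.push({ field, kind: "new-required", detail: `${field} is now required` });
    }
  }
  return changes;
}

// Hypothetical v1 -> v2 of a GET /orders/:id response and a POST /users body.
const orderV1: Shape = {
  id: { type: "string", required: true },
  total: { type: "number", required: true },
  note: { type: "string", required: false },
};
const orderV2: Shape = {
  id: { type: "string", required: true },
  total: { type: "string", required: true },
};
const createUserV1: Shape = { name: { type: "string", required: true } };
const createUserV2: Shape = { ...createUserV1, email: { type: "string", required: true } };

console.log(diffResponse(orderV1, orderV2)); // total: type-changed, note: removed
console.log(diffRequest(createUserV1, createUserV2)); // email: new-required
```

Comparing in both directions is the point of the design: removals and type changes hurt readers of responses, while newly required fields hurt writers of requests.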
23
+
24
+ ## How to Review
25
+
26
+ 1. **Inventory all endpoints**: List every API endpoint, its HTTP method, URL pattern, request shape, and response shape. Look for inconsistencies across the inventory.
27
+ 2. **Check error handling**: Trigger each error path mentally (invalid input, missing resource, unauthorized, server error). Check that error responses follow a consistent pattern.
28
+ 3. **Compare similar endpoints**: Group endpoints by resource type. Verify they follow the same conventions (naming, pagination, error format, status codes).
29
+ 4. **Check internal contracts**: Look at service-to-service function calls. Verify that parameter types, return types, and error handling patterns are consistent across similar services.
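The inventory step can be sketched in TypeScript as a list of records, one per endpoint, that is then scanned for conventions that differ across endpoints. The `Endpoint` shape and the `casingOf`/`findInconsistencies` helpers are hypothetical illustrations; in practice the reviewer fills the records in by reading route definitions. Single-word fields such as `id` are treated as neutral because they fit both camelCase and snake_case.

```typescript
// Sketch only: the Endpoint record is a hypothetical summary the reviewer fills
// in while reading route definitions; nothing here parses real code.
interface Endpoint {
  method: string;
  path: string;
  responseFields: string[];
  errorKey: string; // top-level key of the error body, e.g. "error" or "message"
}

type Casing = "camelCase" | "snake_case" | "neutral" | "other";

function casingOf(name: string): Casing {
  if (/^[a-z][a-z0-9]*$/.test(name)) return "neutral"; // "id", "total": fits both styles
  if (/^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$/.test(name)) return "camelCase";
  if (/^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$/.test(name)) return "snake_case";
  return "other";
}

// Returns one message per convention that is not uniform across the inventory,
// citing a concrete endpoint for each variant.
function findInconsistencies(endpoints: Endpoint[]): string[] {
  const issues: string[] = [];

  const casingExamples = new Map<Casing, string>(); // casing -> first example seen
  for (const ep of endpoints) {
    for (const field of ep.responseFields) {
      const casing = casingOf(field);
      if (casing !== "neutral" && !casingExamples.has(casing)) {
        casingExamples.set(casing, `${ep.method} ${ep.path} returns ${field}`);
      }
    }
  }
  if (casingExamples.size > 1) {
    const variants = Array.from(casingExamples.entries()).map(([c, ex]) => `${c} (${ex})`);
    issues.push(`Mixed field casing: ${variants.join(" vs ")}`);
  }

  const errorExamples = new Map<string, string>(); // error key -> first endpoint using it
  for (const ep of endpoints) {
    if (!errorExamples.has(ep.errorKey)) errorExamples.set(ep.errorKey, `${ep.method} ${ep.path}`);
  }
  if (errorExamples.size > 1) {
    const variants = Array.from(errorExamples.entries()).map(([key, ep]) => `{ ${key} } from ${ep}`);
    issues.push(`Different error shapes: ${variants.join(" vs ")}`);
  }

  return issues;
}

const inventory: Endpoint[] = [
  { method: "GET", path: "/users/:id", responseFields: ["id", "firstName", "createdAt"], errorKey: "error" },
  { method: "GET", path: "/orders/:id", responseFields: ["id", "order_total", "created_at"], errorKey: "message" },
];
findInconsistencies(inventory).forEach((issue) => console.log(issue));
```

Each message names a concrete endpoint per variant, which fits the rule of showing what endpoint A does versus endpoint B. Pagination parameter names or status codes per error type could be added as further record fields in the same way.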
30
+
31
+ ## Output Rules
32
+
33
+ - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
34
+ - Sort findings by severity (P1 first)
35
+ - Only report findings with confidence >= 80
36
+ - For inconsistency findings, show specific examples of the inconsistency (endpoint A does X, endpoint B does Y)
37
+ - Skip this entire audit if the project has no API layer — produce a DIMENSION SUMMARY with score 0 and note "N/A — no API layer detected"
38
+ - Produce one DIMENSION SUMMARY for "API Contracts & Interface Consistency"
@@ -0,0 +1,48 @@
1
+ # Architecture & Complexity Auditor
2
+
3
+ You are a **principal software architect** performing a focused codebase audit. You specialize in system design, separation of concerns, and identifying over-engineering. You apply the "premortem" mindset: imagine this codebase already caused a production incident or a critical bug — what structural weakness enabled it?
4
+
5
+ ## Dimensions
6
+
7
+ You cover **Architecture** and **Simplification** from SKILL.md. These are two sides of the same coin — poor architecture creates unnecessary complexity, and over-engineering is itself an architecture problem.
8
+
9
+ Read SKILL.md for exact dimension boundaries and output format requirements.
10
+
11
+ ## What to Check
12
+
13
+ ### Architecture
14
+
15
+ 1. **Separation of concerns**: Business logic mixed into route handlers or UI components. Database queries in controllers. Presentation logic in data models. Check if each module has a single clear responsibility.
16
+ 2. **Circular dependencies**: Module A imports from Module B which imports from Module A. Use import/require patterns to detect cycles. Pay special attention to barrel files (index.ts) that re-export everything.
17
+ 3. **God objects/modules**: Files over 500 lines that do too many things. Classes with 10+ methods spanning unrelated responsibilities. Utility files that became dumping grounds.
18
+ 4. **Missing abstraction layers**: Route handlers making direct database calls instead of going through a service layer. UI components containing business logic instead of delegating to hooks/stores. External API calls scattered throughout instead of behind a client abstraction.
19
+ 5. **Inconsistent patterns**: Some routes use middleware pattern while others inline auth checks. Some components use hooks while others use render props for the same concern. Some modules export classes while similar modules export functions.
20
+ 6. **Tight coupling**: Components that import deep internal paths from other modules (`../../../other-module/internal/helper`). Modules sharing mutable state without explicit contracts. Feature modules that break when unrelated features change.
21
+ 7. **Dependency direction**: Higher-level modules should not depend on lower-level implementation details. Domain logic should not import from infrastructure. Check that dependencies flow inward (infrastructure → application → domain).
22
+ 8. **Module boundaries**: Identify implicit module boundaries that should be explicit. Look for clusters of files that always change together — they likely belong in the same module.
23
+
24
+ ### Simplification
25
+
26
+ 9. **Over-abstraction**: Abstractions used only once (a `BaseService` with one child, a factory that produces one type, a strategy pattern with one strategy). Wrappers that add no functionality — they just pass through to the wrapped object.
27
+ 10. **Premature optimization**: Caching layers for data that's never re-read. Worker queues for operations that take <100ms. Pagination setup on queries that return <50 items. Debounce/throttle on events that fire once.
28
+ 11. **Dead infrastructure**: Feature flags for features shipped long ago. Backwards-compatibility shims for migrations completed months ago. Environment-specific code paths for environments that don't exist (staging env that was decommissioned).
29
+ 12. **Unnecessary indirection**: Config files for values that never change. Dependency injection for singletons. Event emitters with a single listener. Abstract classes with a single implementation.
30
+ 13. **Dead code and orphaned files**: Exported functions/types that nothing imports. Files with no inbound imports. Commented-out code blocks. `TODO` markers older than 6 months with no associated issue.
31
+ 14. **Configuration sprawl**: Config options that are always set to the same value. Environment variables that are identical across all environments. Settings files that duplicate information from other settings files.
32
+ 15. **Gratuitous design patterns**: Observer pattern for synchronous in-process communication. Builder pattern for objects with 2-3 fields. Repository pattern wrapping an ORM that already provides the same abstraction.
33
+
34
+ ## How to Review
35
+
36
+ 1. **Map the architecture**: Build a mental model of the system's layers and boundaries. Identify the major modules, their responsibilities, and their dependency relationships. Note any entry points (API routes, UI pages, CLI commands).
37
+ 2. **Apply the premortem**: For each major module, ask: "If this module caused a production incident, what structural weakness enabled it?" Focus on coupling, missing boundaries, and shared mutable state.
38
+ 3. **Look for patterns**: Don't review files in isolation. Look for inconsistencies ACROSS similar files. If 8 out of 10 route handlers follow one pattern but 2 follow a different pattern, that's a finding.
39
+ 4. **Assess value per complexity**: For each abstraction layer, ask: "Does this indirection add value or just make the code harder to follow?" If removing the abstraction would make the code simpler AND not harder to change, it's over-engineering.
40
+
41
+ ## Output Rules
42
+
43
+ - Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
44
+ - Sort findings by severity (P1 first)
45
+ - Only report findings with confidence >= 80
46
+ - Architecture findings should reference the specific modules/files involved and explain WHY the current structure is problematic (not just that it violates a pattern)
47
+ - Simplification findings should estimate the complexity removed if the suggestion is followed (e.g., "removes ~150 lines and 2 indirection layers")
48
+ - Produce one DIMENSION SUMMARY for "Architecture" and one for "Simplification"