npm - @torka/claude-workflows - Versions diffs - 0.12.0 → 0.13.1 - Mend

@torka/claude-workflows 0.12.0 → 0.13.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

package/.claude-plugin/plugin.json +8 -0
package/README.md +22 -5
package/bmad-workflows/bmm/workflows/4-implementation/implement-epic-with-subagents/steps/step-01b-continue.md +9 -2
package/bmad-workflows/bmm/workflows/4-implementation/implement-epic-with-subagents/steps/step-02-orchestrate.md +108 -2
package/bmad-workflows/bmm/workflows/4-implementation/implement-epic-with-subagents/steps/step-03-complete.md +35 -1
package/commands/deep-audit.md +389 -0
package/commands/dev-story-backend.md +12 -11
package/commands/dev-story-fullstack.md +6 -2
package/commands/dev-story-ui.md +4 -4
package/commands/github-pr-resolve.md +132 -24
package/install.js +2 -3
package/package.json +1 -1
package/skills/deep-audit/INSPIRATIONS.md +26 -0
package/skills/deep-audit/SKILL.md +253 -0
package/skills/deep-audit/agents/api-contract-reviewer.md +38 -0
package/skills/deep-audit/agents/architecture-and-complexity.md +48 -0
package/skills/deep-audit/agents/code-health.md +51 -0
package/skills/deep-audit/agents/data-layer-reviewer.md +39 -0
package/skills/deep-audit/agents/performance-profiler.md +38 -0
package/skills/deep-audit/agents/security-and-error-handling.md +52 -0
package/skills/deep-audit/agents/seo-accessibility-auditor.md +49 -0
package/skills/deep-audit/agents/test-coverage-analyst.md +37 -0
package/skills/deep-audit/agents/type-design-analyzer.md +38 -0
package/skills/deep-audit/templates/report-template.md +87 -0
package/skills/designer-founder/SKILL.md +8 -7
package/skills/designer-founder/steps/step-01-context.md +94 -45
package/skills/designer-founder/steps/step-02-scope.md +6 -23
package/skills/designer-founder/steps/step-03-design.md +29 -58
package/skills/designer-founder/steps/step-04-artifacts.md +137 -113
package/skills/designer-founder/steps/step-05-epic-linking.md +81 -53
package/skills/designer-founder/steps/step-06-validate.md +181 -0
package/skills/designer-founder/templates/component-strategy.md +4 -0
package/skills/designer-founder/tools/magicpatterns.md +52 -19
package/skills/designer-founder/tools/stitch.md +97 -67
package/uninstall.js +22 -0

package/commands/github-pr-resolve.md CHANGED

@@ -33,7 +33,7 @@ Before any operations, verify the environment is safe:
    - If output is non-empty, WARN the user but continue
 8. **Initialize tracking**:
-   - Create counters: merged=0, skipped=0, failed=0, auto_fixed=0
+   - Create counters: merged=0, skipped=0, failed=0, auto_fixed_lint=0, review_comments_assessed=0, review_comments_fixed=0, review_comments_logged=0
    - Create lists: merged_prs[], skipped_prs[], failed_prs[]
 If gh authentication fails, STOP and clearly explain what the user needs to do.
@@ -46,7 +46,7 @@ If gh authentication fails, STOP and clearly explain what the user needs to do.
 1. **Fetch all open PRs**:
    ```bash
-   gh pr list --state open --json number,title,headRefName,baseRefName,statusCheckRollup,reviewDecision,isDraft,url,author,mergeStateStatus,mergeable,state
+   gh pr list --state open --json number,title,headRefName,baseRefName,statusCheckRollup,reviewDecision,isDraft,url,author,mergeStateStatus,mergeable,state,reviews,latestReviews
    ```
 2. **For each PR, categorize**:
@@ -61,26 +61,28 @@ If gh authentication fails, STOP and clearly explain what the user needs to do.
    | Conflicts | mergeable=CONFLICTING | Skip, warn user |
    | Draft | isDraft=true | Skip |
    | Already Merged | state=MERGED | Log and skip |
+   | Has Review Comments | reviews/latestReviews non-empty OR reviewDecision set | Process in Phase 2 |
-3. **Present summary** (informational only, no blocking):
+   > **Note**: "Has Review Comments" is an **overlay** category — a PR can be both "CI Passed" and "Has Review Comments". Phase 2 runs before merging regardless of CI status.
+3. **Present summary** (does NOT pause for confirmation):
    ```
    Processing X open PRs...
-   | # | Title | Author | Status | Action |
-   |---|-------|--------|--------|--------|
-   | 42 | bump cross-env | dependabot | CI Passed | Will merge |
-   | 35 | bump sharp | dependabot | CI Pending | Will wait |
-   | 32 | bump linting | dependabot | CI Failed | Will auto-fix |
-   | 28 | new feature | user | Draft | Will skip |
-   | 25 | refactor auth | user | Conflicts | Will skip |
+   | # | Title | Author | Status | Reviews | Action |
+   |---|-------|--------|--------|---------|--------|
+   | 42 | bump cross-env | dependabot | CI Passed | - | Will merge |
+   | 35 | bump sharp | dependabot | CI Pending | 2 bot comments | Will wait + assess reviews |
+   | 32 | bump linting | dependabot | CI Failed | - | Will auto-fix |
+   | 28 | new feature | user | Draft | 1 human review | Will skip |
+   | 25 | refactor auth | user | Conflicts | - | Will skip |
    ```
-4. **Check for blockers** - ONLY pause if:
-   - There are PRs with merge conflicts (warn user which ones will be skipped)
-   - There are draft PRs (inform user they will be skipped)
-   - All PRs have non-auto-fixable failures
-   Otherwise, proceed automatically.
+4. **Log non-processable PRs** (no user prompt):
+   - PRs with merge conflicts → log as skipped
+   - Draft PRs → log as skipped
+   - If ALL PRs are non-processable (conflicts, drafts, non-fixable failures) → inform user and end workflow
+   - Otherwise, proceed automatically with processable PRs.
 5. **Sort PRs by priority**:
    1. Infrastructure PRs first (CI/workflow changes)
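The Phase 1 categorization, including the "Has Review Comments" overlay, can be sketched as below. This is a minimal sketch, not the package's code. The `ciState` helper and its reading of `statusCheckRollup` (check runs expose `status`/`conclusion`, commit statuses expose `state`) are assumptions, the "No Checks" label is hypothetical, and the precedence Merged, then Draft, then Conflicts is a guess because the table does not fix an order.

```javascript
// Minimal sketch of Phase 1 categorization for one entry of
// `gh pr list --json ...` output. Not the package's implementation.

// Assumed failure values for check runs (conclusion) and commit statuses (state).
const FAILED = new Set([
  "FAILURE", "ERROR", "TIMED_OUT", "CANCELLED", "ACTION_REQUIRED", "STARTUP_FAILURE",
]);

// Hypothetical helper: collapse statusCheckRollup into one CI label.
function ciState(rollup) {
  if (!Array.isArray(rollup) || rollup.length === 0) return "No Checks";
  if (rollup.some((c) => FAILED.has(c.conclusion || c.state))) return "CI Failed";
  const pending = rollup.some(
    (c) => (c.status && c.status !== "COMPLETED") || c.state === "PENDING" || c.state === "EXPECTED",
  );
  return pending ? "CI Pending" : "CI Passed";
}

function categorizePr(pr) {
  // Overlay flag: computed independently of the merge/CI status below.
  const hasReviews =
    (pr.reviews?.length ?? 0) > 0 ||
    (pr.latestReviews?.length ?? 0) > 0 ||
    Boolean(pr.reviewDecision);
  let status;
  if (pr.state === "MERGED") status = "Already Merged";
  else if (pr.isDraft) status = "Draft";
  else if (pr.mergeable === "CONFLICTING") status = "Conflicts";
  else status = ciState(pr.statusCheckRollup);
  return { number: pr.number, status, hasReviews };
}

// Statuses logged as skipped instead of processed.
const NON_PROCESSABLE = new Set(["Already Merged", "Draft", "Conflicts"]);

// An empty result corresponds to "inform user and end workflow".
function processablePrs(prs) {
  return prs.map(categorizePr).filter((c) => !NON_PROCESSABLE.has(c.status));
}
```

Returning `hasReviews` separately from `status` lets a PR be both "CI Passed" and queued for Phase 2. Non-fixable CI failures, which also count as non-processable, are not modeled here.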
@@ -90,16 +92,92 @@ If gh authentication fails, STOP and clearly explain what the user needs to do.
 ---
-## Phase 2: Review Comments Handling
+## Phase 2: Review Comments Handling (Autonomous)
+**Goal**: Assess and auto-fix review comments from ALL sources (bot and human inline comments). No user prompts in this phase.
+For each PR with review data (reviews/latestReviews non-empty, or reviewDecision set), run the following steps:
+### Step 1: Fetch All Review Data
+GitHub PRs have three distinct comment endpoints. Fetch all three:
+```bash
+# Inline review comments (code-level, attached to diff hunks)
+gh api repos/{owner}/{repo}/pulls/{number}/comments --paginate
+# Top-level review submissions (APPROVE, CHANGES_REQUESTED, COMMENTED bodies)
+gh api repos/{owner}/{repo}/pulls/{number}/reviews --paginate
+# General PR conversation comments (some bots post here instead of inline)
+gh api repos/{owner}/{repo}/issues/{number}/comments --paginate
+```
+Never rely on `reviewDecision` alone. If all three return empty arrays, skip to Phase 3 for this PR.
+### Step 2: Handle Formal CHANGES_REQUESTED
+If `reviewDecision: CHANGES_REQUESTED` AND the reviewer is a human (not a bot) → **auto-skip PR entirely**. Add to `skipped_prs` with reason "human requested changes". This is the only hard stop.
+### Step 3: Assess ALL Comments Uniformly
+Read the `diff_hunk` context for each comment. Classify into one of four buckets:
+**Valid + straightforward** → auto-fix:
+- "Unused import `fs` on line 12" → remove it
+- "Missing null check before accessing `user.name`" → add the guard
+- "Variable `x` should be `userCount`" → rename it
+- "Missing `await` on async call" → add it
+- "Typo: `recieve` → `receive`" → fix it
+**Valid + complex** → log, proceed to merge:
+- "Consider refactoring this into a strategy pattern"
+- "This function does too much — split into smaller functions"
+- "Should we add retry logic here?"
+- Any comment phrased as a question requiring design decisions
+- Any suggestion touching multiple files or changing control flow
+**Invalid / false positive** → dismiss, log:
+- Bot references code that no longer exists in the current diff
+- Suggestion would introduce a bug or break existing behavior
+- Comment is about a different file/context than the one it's attached to
-**Note**: This phase only runs if PRs have pending review comments. For batch processing of simple dependency bumps, this phase is typically skipped.
+**Cosmetic only** → auto-fix if single-line, otherwise log:
+- "Add trailing comma" → auto-fix
+- "Reorder imports alphabetically" → auto-fix if simple, log if many files
-For PRs with `reviewDecision: CHANGES_REQUESTED`:
+**Conservative default**: when uncertain, always classify as "valid + complex" (log, don't fix). The cost of logging instead of fixing is near zero. The cost of botching an architectural change is a broken PR.
-1. Fetch review comments
-2. If all comments are trivial (typos, formatting, docstrings) - auto-fix
-3. If comments require logic changes - skip PR and inform user
-4. Push fixes without asking for confirmation
+Bot and human inline comments are treated identically. The only distinction is formal `CHANGES_REQUESTED` (Step 2).
+Increment `review_comments_assessed` for each comment processed.
+### Step 4: Auto-fix via Sub-agent
+For PRs with fixable comments, launch a **Task sub-agent** to apply fixes. This preserves the main workflow's context window for orchestration.
+Sub-agent receives:
+- PR number, branch name, repo owner/name
+- List of fixable comments (file path, line number, `diff_hunk`, suggestion text, classification)
+- Instruction: checkout branch, read relevant files, apply fixes, commit as `"fix: address review feedback"`, push
+Sub-agent returns: summary of changes made (files modified, comments addressed).
+Main workflow waits for sub-agent completion, then marks PR for CI re-check in Phase 3.
+Process PRs with fixable comments **sequentially** (one sub-agent at a time). Parallel sub-agents deferred to v2 — multiple git checkouts risk worktree conflicts and harder debugging.
+Increment `review_comments_fixed` for each comment addressed by the sub-agent.
+### Step 5: Log Remaining Comments
+For all non-fixed comments (valid + complex, invalid/dismissed, cosmetic-but-skipped):
+- Record: PR number, comment author, file path, snippet of comment body, classification reason
+- Increment `review_comments_logged` for each
+These are included in the Phase 5 final summary.
+**No user prompts in this phase. Fully autonomous.**
 ---
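A hypothetical sketch of the Phase 2 Step 2 hard stop and the Step 3 routing from bucket to action. It assumes review objects shaped like the GitHub REST `pulls/{number}/reviews` response (`user.login`, `user.type` of `"User"` or `"Bot"`, `state`) returned in chronological order, and that the agent has already assigned each comment to a bucket. The bucket identifiers and the `singleLine` flag are invented for illustration.

```javascript
// Hypothetical sketch of Phase 2 Steps 2-3.
// Reviews are assumed chronological: { user: { login, type }, state }.

// Step 2: skip the PR only if the decision is CHANGES_REQUESTED and a human
// (not a Bot account) still has changes requested as their latest decision.
function humanRequestedChanges(reviewDecision, reviews) {
  if (reviewDecision !== "CHANGES_REQUESTED") return false;
  const latestByHuman = new Map();
  for (const r of reviews) {
    if (!r.user || r.user.type === "Bot") continue;
    // Plain comments do not replace an earlier approve/request-changes.
    if (r.state === "COMMENTED" || r.state === "PENDING") continue;
    latestByHuman.set(r.user.login, r.state);
  }
  return [...latestByHuman.values()].includes("CHANGES_REQUESTED");
}

// Step 3: map an already-made classification to an action.
function actionFor(bucket, { singleLine = false } = {}) {
  switch (bucket) {
    case "valid-straightforward":
      return "fix";
    case "cosmetic":
      return singleLine ? "fix" : "log";
    case "invalid":
      return "dismiss";
    default:
      // "valid-complex" and anything uncertain: log, don't fix.
      return "log";
  }
}

function triage(classified) {
  const result = { assessed: 0, toFix: [], logged: [] };
  for (const c of classified) {
    result.assessed += 1; // feeds review_comments_assessed
    const action = actionFor(c.bucket, c);
    if (action === "fix") result.toFix.push(c); // handed to the fix sub-agent
    else result.logged.push({ ...c, action }); // feeds review_comments_logged
  }
  return result;
}
```

Any bucket the classifier cannot name falls through to `log`, which mirrors the conservative default.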
@@ -131,6 +209,19 @@ gh api repos/{owner}/{repo}/pulls/{number}/update-branch -X PUT
 ```
 Then wait 30s and re-check CI.
+### Step 2b: Rerun vs Update-branch Decision
+When CI needs re-triggering, choose the right action:
+| Scenario | Action |
+|----------|--------|
+| Branch behind main | `gh api repos/{owner}/{repo}/pulls/{number}/update-branch -X PUT` |
+| Flaky test (same code passed before) | `gh run rerun --failed` |
+| CI workflow changed on main | `update-branch` (picks up new workflow) |
+| Secrets unavailable (Dependabot) | `update-branch` first |
+**Rule**: prefer `update-branch` over `rerun` unless confirmed flaky test on an up-to-date branch.
 ### Step 3: Wait for CI
 While CI is pending and elapsed < 10 minutes:
@@ -288,10 +379,17 @@ RESULTS:
 AUTO-FIXES APPLIED:
   - X lint errors fixed automatically
+  - X review comments auto-resolved (from Y assessed)
+  - Z review comments logged (not auto-fixable)
+REVIEW COMMENTS LOGGED (not auto-fixed):
+  - PR #35 [codex-bot] src/utils.ts: "Consider extracting to helper" (valid+complex)
+  - PR #35 [codex-bot] src/index.ts: "Stale reference to removed fn" (invalid/dismissed)
 REMAINING WORK:
   - PR #25 has merge conflicts at: src/auth.ts
   - PR #20 needs manual fix for type errors
+  - PR #18 skipped: human reviewer formally requested changes
 =====================================================
 ```
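The new summary lines are plain string formatting over the tracking counters. A sketch, assuming each logged comment was recorded as `{ pr, author, path, body, reason }` (hypothetical field names) and that snippets are truncated to 40 characters (an arbitrary choice):

```javascript
// Hypothetical formatter for the Phase 5 review-comment summary lines.
function formatLoggedComment({ pr, author, path, body, reason }, maxLen = 40) {
  const snippet = body.length > maxLen ? `${body.slice(0, maxLen - 3)}...` : body;
  return `  - PR #${pr} [${author}] ${path}: "${snippet}" (${reason})`;
}

// Counter names match those created in the "Initialize tracking" step.
function formatAutoFixes(c) {
  return [
    "AUTO-FIXES APPLIED:",
    `  - ${c.auto_fixed_lint} lint errors fixed automatically`,
    `  - ${c.review_comments_fixed} review comments auto-resolved (from ${c.review_comments_assessed} assessed)`,
    `  - ${c.review_comments_logged} review comments logged (not auto-fixable)`,
  ].join("\n");
}
```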
@@ -303,12 +401,14 @@ REMAINING WORK:
 ### ALWAYS Auto-fix
 - Lint errors (`npm run lint -- --fix`)
 - Formatting errors
+- Review comments (bot or human) assessed as valid + straightforward
 ### NEVER Auto-fix
 - Type errors (TypeScript)
 - Test failures
 - Build errors
-- Logic changes in review comments
+- Logic/architectural changes in review comments
+- PRs where a human reviewer formally requested changes (auto-skip entire PR)
 ### Branch Safety
 - NEVER use force push
@@ -320,6 +420,14 @@ REMAINING WORK:
 - 10 minute max wait per PR
 - 2 max fix attempts before skipping
+### Review Comment Handling
+- ALWAYS fetch inline comments via API (never rely solely on `reviewDecision`)
+- NEVER merge a PR where a human formally requested changes (auto-skip)
+- ALWAYS assess ALL comments (bot and human) for validity before applying
+- Treat bot and human inline comments identically — assess, auto-fix if straightforward
+- NEVER prompt the user for review comment decisions
+- ALWAYS log dismissed/skipped comments for the final summary
 ### Error Handling
 - "Already merged" is a success, not an error
 - Continue to next PR on any error
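The Step 2b rerun-vs-update-branch table reduces to a small decision function. The sketch is hypothetical: the boolean inputs stand in for checks the workflow would have to make, and it only builds the command strings already quoted in the table. The `gh run rerun` form is given a run ID, which is assumed to be known at that point.

```javascript
// Hypothetical encoding of the Step 2b table.
function chooseRetrigger({
  behindMain = false,
  workflowChangedOnMain = false,
  dependabotSecretsMissing = false,
  confirmedFlaky = false,
} = {}) {
  // Each of these means the branch needs main's latest state.
  if (behindMain || workflowChangedOnMain || dependabotSecretsMissing) return "update-branch";
  // Rerun only for a confirmed flaky test on an up-to-date branch.
  return confirmedFlaky ? "rerun-failed" : "update-branch";
}

function retriggerCommand(action, { owner, repo, number, runId }) {
  return action === "rerun-failed"
    ? `gh run rerun ${runId} --failed`
    : `gh api repos/${owner}/${repo}/pulls/${number}/update-branch -X PUT`;
}
```

Defaulting to `update-branch` when nothing is known encodes the stated rule that `update-branch` is preferred unless flakiness is confirmed.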

package/install.js CHANGED

@@ -215,6 +215,8 @@ function install() {
     'skills/agent-creator/',
     'skills/designer-founder/',
     'skills/product-architect/',
+    'commands/deep-audit.md',
+    'skills/deep-audit/',
     '*.backup',
   ];
   const addedCount = ensureGitignoreEntries(
@@ -392,9 +394,6 @@ async function installStitchSkills(isGlobal) {
   };
   try {
-    // Install design-md skill (required for Stitch)
-    execSync(`npx skills add google-labs-code/stitch-skills --skill design-md${globalFlag} -a claude-code -y`, execOptions);
     // Install react-components skill (for HTML→React conversion)
     execSync(`npx skills add google-labs-code/stitch-skills --skill react-components${globalFlag} -a claude-code -y`, execOptions);
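install.js adds the two deep-audit paths to the list passed to `ensureGitignoreEntries`, whose body is not part of this diff. A hypothetical pure-function equivalent, assuming the helper appends only missing lines and reports how many it added (consistent with the `addedCount` name, but not confirmed by the source):

```javascript
// Hypothetical stand-in for ensureGitignoreEntries: works on file content
// instead of the filesystem so it can be tested in isolation.
function addMissingGitignoreEntries(content, entries) {
  const existing = new Set(content.split(/\r?\n/).map((line) => line.trim()));
  const missing = entries.filter((entry) => !existing.has(entry));
  if (missing.length === 0) return { content, added: 0 };
  // Make sure appended entries start on a fresh line.
  const base = content === "" || content.endsWith("\n") ? content : `${content}\n`;
  return { content: `${base}${missing.join("\n")}\n`, added: missing.length };
}
```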

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@torka/claude-workflows",
-  "version": "0.12.0",
+  "version": "0.13.1",
   "description": "Claude Code workflow helpers: epic automation, git cleanup, agents, and design workflows",
   "keywords": [
     "claude-code",

package/skills/deep-audit/INSPIRATIONS.md ADDED

@@ -0,0 +1,26 @@
+# Deep Audit — Design Inspirations
+Reference sources consulted during agent prompt design. Not loaded at runtime — for development reference only.
+## Tier 1 — Core Patterns
+| Source | URL | Pattern Used |
+|--------|-----|-------------|
+| Anthropic `code-review` plugin | https://github.com/anthropics/claude-code/tree/main/plugins/code-review | Confidence scoring (0-100), only report 80+, false-positive filtering |
+| Anthropic `pr-review-toolkit` | https://github.com/anthropics/claude-code/tree/main/plugins/pr-review-toolkit | 6 specialized agents, smart orchestration, parallel spawning, conditional agent selection |
+| EveryInc compound-engineering | https://github.com/EveryInc/compound-engineering-plugin/blob/main/plugins/compound-engineering/commands/workflows/review.md | 15 review agents, P1/P2/P3 severity, file-based output |
+| EveryInc review agents | https://github.com/EveryInc/compound-engineering-plugin/tree/main/plugins/compound-engineering/agents/review | Agent-per-dimension pattern, structured handoff |
+| wshobson code-reviewer | https://github.com/wshobson/agents/blob/main/plugins/comprehensive-review/agents/code-reviewer.md | 5-phase review structure, numbered output artifacts |
+| wshobson security-auditor | https://github.com/wshobson/agents/blob/main/plugins/comprehensive-review/agents/security-auditor.md | OWASP checklist integration, security-specific review flow |
+| Deslop/Anti-slop SKILL.md | https://github.com/avifenesh/agentsys/blob/main/plugins/deslop/skills/deslop/SKILL.md | Certainty-gated approach, minimal diff principle, deterministic patterns first |
+## Tier 2 — Cherry-Picked Ideas
+| Source | URL | Pattern Used |
+|--------|-----|-------------|
+| undeadlist agents | https://github.com/undeadlist/claude-code-agents/tree/main/.claude/agents | SEO auditor framework detection + 6 audit categories |
+| davila7 code-review | https://github.com/davila7/claude-code-templates/blob/main/cli-tool/components/commands/utilities/code-review.md | Multi-file analysis patterns |
+| Premortem skill | https://www.aimcp.info/en/skills/ee6aa52d-1d52-45b7-8f10-c1b238772acd | "Imagine this already failed — why?" reframing for risk identification |
+| Evaluation rubrics skill | https://www.aimcp.info/en/skills/4090c675-54f6-42b8-96a6-74417f6db77c | 4-8 criteria, explicit descriptors per level, analytic scoring |
+| Stop-slop (find-bugs) | https://www.aimcp.info/en/skills/390a1fbf-b55b-466c-85a6-be818e33bb28 | 5-dimension scoring (1-10), threshold triggers, diff analysis |
+| AI Prompt Library blog | https://www.aipromptlibrary.app/blog/claude-code-prompt-library | Claude Code prompt patterns and agent design conventions |

package/skills/deep-audit/SKILL.md ADDED

@@ -0,0 +1,253 @@
+# Deep Audit — Skill Reference
+This file is the single source of truth for agent roster, dimension boundaries, scoring rubric, and output format. Every audit agent reads this file to understand its scope and output requirements.
+## Agent Roster
+### Quick Mode (default — 3 agents)
+| Agent File | Dimensions | Model | Rationale |
+|------------|-----------|-------|-----------|
+| `security-and-error-handling.md` | Security, Error Handling | opus | Unhandled errors ARE security issues; one agent reasoning about both produces better findings |
+| `architecture-and-complexity.md` | Architecture, Simplification | opus | Architecture decisions need deepest reasoning; over-engineering IS an architecture problem |
+| `code-health.md` | AI Slop Detection, Dependency Health | sonnet | Both smell neglect — slop detection and dependency rot share the same instinct |
+### Full Mode (adds 6 more agents — `--full` flag)
+| Agent File | Dimension | Model |
+|------------|-----------|-------|
+| `performance-profiler.md` | Performance | sonnet |
+| `test-coverage-analyst.md` | Test Coverage | sonnet |
+| `type-design-analyzer.md` | Type Design | sonnet |
+| `data-layer-reviewer.md` | Data Layer & Database | opus |
+| `api-contract-reviewer.md` | API Contracts & Interface Consistency | sonnet |
+| `seo-accessibility-auditor.md` | SEO & Accessibility | sonnet |
+## Dimension Boundaries
+Each dimension has a clear scope. Agents MUST stay within their assigned dimensions and NOT report findings that belong to another dimension.
+### Security
+- Authentication & authorization flaws
+- Injection vulnerabilities (SQL, XSS, command injection, path traversal)
+- Secrets/credentials in code or config
+- Insecure cryptographic usage
+- CSRF, SSRF, open redirects
+- Unsafe deserialization
+- Missing rate limiting on auth endpoints
+- **NOT**: general error handling, performance, code style
+### Error Handling
+- Unhandled promise rejections and uncaught exceptions
+- Empty catch blocks or swallowed errors
+- Missing error boundaries (React) or global error handlers
+- Inconsistent error response formats
+- Missing validation at system boundaries (user input, external APIs)
+- Error messages leaking internal details (overlaps security — report under Security if exploitable)
+- **NOT**: business logic validation, type safety, test assertions
+### Architecture
+- Separation of concerns violations
+- Circular dependencies
+- God objects / god modules
+- Missing abstraction layers (e.g., direct DB calls in route handlers)
+- Inconsistent patterns across similar components
+- Tight coupling between modules that should be independent
+- **NOT**: code style, naming conventions, dependency versions
+### Simplification
+- Over-abstracted code (abstractions used once)
+- Premature optimization
+- Feature flags or backwards-compatibility shims for dead code
+- Unnecessary indirection (wrapper functions that just pass through)
+- Configuration for things that never change
+- Dead code, unused exports, orphaned files
+- **NOT**: intentional design patterns, library APIs (they need flexibility)
+### AI Slop Detection
+- Excessive/unnecessary comments explaining obvious code
+- Redundant docstrings on trivial functions
+- Over-verbose variable names (e.g., `resultOfDatabaseQuery`)
+- Defensive error handling for impossible scenarios
+- Unnecessary type annotations that TypeScript can infer
+- "Just in case" fallbacks that mask real bugs
+- Boilerplate that adds no value
+- **NOT**: intentional documentation, public API docs, complex logic comments
+### Dependency Health
+- Outdated packages with known vulnerabilities
+- Abandoned/unmaintained dependencies (no commits in 2+ years)
+- Duplicate dependencies serving the same purpose
+- Pinned versions preventing security updates
+- Missing lock file or lock file drift
+- Oversized dependencies for simple tasks (e.g., lodash for one function)
+- **NOT**: architecture decisions about which library to use
+### Performance
+- N+1 query patterns
+- Missing pagination on unbounded queries
+- Synchronous operations blocking the event loop
+- Unnecessary re-renders (React) or DOM thrashing
+- Missing caching for expensive operations
+- Memory leaks (event listeners, subscriptions, closures)
+- Large bundle imports that could be tree-shaken or lazy-loaded
+- **NOT**: micro-optimizations, premature optimization
+### Test Coverage
+- Untested critical paths (auth, payments, data mutations)
+- Missing edge case tests (empty inputs, boundary values, error states)
+- Flaky tests (timing-dependent, order-dependent, environment-dependent)
+- Tests that test implementation rather than behavior
+- Missing integration tests for API endpoints
+- Test fixtures with hardcoded secrets or PII
+- **NOT**: 100% coverage goals, testing trivial getters/setters
+### Type Design
+- `any` types that should be specific
+- Overly complex generic types that hurt readability
+- Missing discriminated unions for state machines
+- Inconsistent type naming conventions
+- Type assertions (`as`) hiding real type errors
+- Missing null/undefined handling in types
+- **NOT**: library type definitions, auto-generated types
+### Data Layer & Database
+- Missing database indexes on frequently queried columns
+- Schema design issues (denormalization problems, missing constraints)
+- Raw SQL without parameterized queries
+- Missing transactions for multi-step mutations
+- ORM misuse (eager loading everything, N+1 queries)
+- Missing data validation at the persistence layer
+- Migration safety (irreversible migrations without rollback plan)
+- **NOT**: query performance (belongs to Performance), API response shapes
+### API Contracts & Interface Consistency
+- Inconsistent naming across endpoints (camelCase vs snake_case)
+- Missing or inconsistent error response formats
+- Breaking changes without versioning
+- Undocumented endpoints or parameters
+- Inconsistent pagination patterns
+- Missing Content-Type headers or wrong status codes
+- Internal function signatures inconsistent with external API patterns
+- **NOT**: implementation details behind the API, database schema
+### SEO & Accessibility
+- Missing or duplicate meta tags (title, description, canonical)
+- Missing alt text on images
+- Insufficient color contrast
+- Missing ARIA labels on interactive elements
+- Non-semantic HTML (div soup)
+- Missing heading hierarchy (h1 → h2 → h3)
+- Missing keyboard navigation support
+- Missing `prefers-reduced-motion` support for animations
+- Touch target minimum size (24x24 CSS px)
+- Missing Open Graph / social sharing metadata
+- **NOT**: content quality, marketing strategy, visual design choices
+## Scoring Rubric
+Each dimension is scored 1–10:
+| Score | Label | Meaning |
+|-------|-------|---------|
+| 9–10 | Excellent | No findings or only minor nitpicks; production-ready |
+| 7–8 | Good | Minor issues; low risk, easy fixes |
+| 5–6 | Adequate | Notable gaps; some P2 findings that should be addressed |
+| 3–4 | Concerning | Significant issues; P1 findings present; needs attention before next release |
+| 1–2 | Critical | Severe problems; multiple P1 findings; immediate action required |
+**Overall Health Score** = weighted average:
+- Security: weight 3
+- Error Handling: weight 2
+- Architecture: weight 2
+- Simplification: weight 1
+- AI Slop: weight 1
+- Dependency Health: weight 1
+- Performance: weight 2 (full mode only)
+- Test Coverage: weight 2 (full mode only)
+- Type Design: weight 1 (full mode only)
+- Data Layer: weight 2 (full mode only)
+- API Contracts: weight 1 (full mode only)
+- SEO & Accessibility: weight 1 (full mode only)
+## Severity Definitions
+| Level | Label | Meaning | Action |
+|-------|-------|---------|--------|
+| **P1** | Critical | Security vulnerability, data loss risk, or production blocker | Fix before next deploy |
+| **P2** | Important | Significant quality issue that degrades maintainability or reliability | Fix within current sprint |
+| **P3** | Minor | Code quality improvement; low risk but worth addressing | Fix when touching the file |
+## Confidence Threshold
+Agents MUST only report findings with **confidence >= 80%** (on a 0-100 scale).
+- **90-100**: Very high confidence — clear violation with concrete evidence
+- **80-89**: High confidence — strong signal with reasonable certainty
+- **Below 80**: Do NOT report — risk of false positive outweighs value
+When assessing confidence, consider:
+- Is this a definitive violation or a judgment call?
+- Could there be a valid reason for this pattern you can't see?
+- Would a senior engineer agree this is an issue?
+## False Positive Prevention
+Agents MUST NOT report:
+- Issues a linter or formatter would catch (eslint, prettier, stylelint)
+- Subjective style preferences that a senior engineer might reasonably disagree with
+- Pre-existing patterns the codebase uses consistently (these are intentional conventions, not bugs)
+- Potential issues that depend on runtime state, specific inputs, or environment config you cannot verify
+- Micro-optimizations with negligible real-world impact
+## Agent Output Format
+Every agent MUST produce output in exactly this format.
+### Finding Block
+```
+=== FINDING ===
+agent: <agent-file-name without .md>
+severity: P1|P2|P3
+confidence: <80-100>
+file: <relative file path>
+line: <line number or range, e.g., 42 or 42-58>
+dimension: <dimension name from boundaries above>
+title: <concise one-line title>
+description: |
+  <2-4 sentences explaining the issue, why it matters, and concrete evidence>
+suggestion: |
+  <specific fix or approach — code snippet if helpful, but keep it brief>
+=== END FINDING ===
+```
+### Dimension Summary Block
+One per dimension the agent covers:
+```
+=== DIMENSION SUMMARY ===
+dimension: <dimension name>
+score: <1-10>
+p1_count: <number>
+p2_count: <number>
+p3_count: <number>
+assessment: |
+  <2-3 sentences summarizing the dimension's health and key patterns observed>
+=== END DIMENSION SUMMARY ===
+```
+### Output Order
+1. All `=== FINDING ===` blocks (sorted by severity: P1 first, then P2, then P3)
+2. All `=== DIMENSION SUMMARY ===` blocks
+### Important Rules
+- Do NOT include findings below 80% confidence
+- Do NOT report findings outside your assigned dimensions
+- Do NOT suggest fixes that introduce new problems
+- Do NOT report the same issue multiple times across different files — report the pattern once and list affected files
+- If no findings for a dimension, still include the DIMENSION SUMMARY with score and assessment
+- Keep descriptions factual and evidence-based; avoid vague language like "could potentially" or "might cause issues"
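The Overall Health Score is the sum of weight × score divided by the sum of weights over the scored dimensions. In this sketch the weight keys use the full dimension names from "Dimension Boundaries" (mapping the short names in the weight list onto them is an assumption). A score of 0 or `null` is treated as N/A and excluded, following the api-contract-reviewer convention of scoring 0 when no API layer exists. Rounding to one decimal place is also an assumption.

```javascript
// Weights from the Overall Health Score list (name mapping is assumed).
const WEIGHTS = {
  "Security": 3,
  "Error Handling": 2,
  "Architecture": 2,
  "Simplification": 1,
  "AI Slop Detection": 1,
  "Dependency Health": 1,
  // full mode only
  "Performance": 2,
  "Test Coverage": 2,
  "Type Design": 1,
  "Data Layer & Database": 2,
  "API Contracts & Interface Consistency": 1,
  "SEO & Accessibility": 1,
};

// overall = sum(weight * score) / sum(weight) over scored dimensions.
function overallHealth(scores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [dimension, score] of Object.entries(scores)) {
    const weight = WEIGHTS[dimension];
    if (weight === undefined) throw new Error(`Unknown dimension: ${dimension}`);
    if (score === null || score === 0) continue; // N/A, excluded (assumption)
    weighted += weight * score;
    totalWeight += weight;
  }
  return totalWeight === 0 ? null : Math.round((weighted / totalWeight) * 10) / 10;
}
```

For a quick-mode run scoring Security 6, Error Handling 7, Architecture 8, Simplification 9, AI Slop 7 and Dependency Health 8, the weighted sum is 18 + 14 + 16 + 9 + 7 + 8 = 72 over a total weight of 10, giving 7.2.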

package/skills/deep-audit/agents/api-contract-reviewer.md ADDED

@@ -0,0 +1,38 @@
+# API Contract Reviewer
+You are a **senior API architect** performing a focused codebase audit. You specialize in API design consistency, interface contracts, and communication patterns between modules, services, and clients.
+## Dimensions
+You cover **API Contracts & Interface Consistency** from SKILL.md. Focus on inconsistencies that confuse consumers and contracts that break without warning.
+Read SKILL.md for exact dimension boundaries and output format requirements.
+## What to Check
+1. **Naming inconsistency across endpoints**: Mix of camelCase and snake_case in response fields. Inconsistent resource naming (plural vs singular: `/user/1` vs `/orders/1`). Inconsistent URL patterns (`/getUser` vs `/orders` — verb-based vs resource-based). Inconsistent query parameter naming.
+2. **Error response inconsistency**: Different error shapes from different endpoints (some return `{ error: msg }`, others `{ message: msg }`, others `{ errors: [...] }`). Inconsistent HTTP status codes for similar errors (some return 400 for validation errors, others return 422). Missing error codes for programmatic error handling.
+3. **Missing versioning**: Breaking changes to API responses without version bump. Removed or renamed fields without deprecation. Changed response types (string to number, object to array) without versioning.
+4. **Undocumented contracts**: API endpoints without corresponding type definitions. Response shapes that differ from documented types. Query parameters that are accepted but not documented. Endpoints that return different shapes based on undocumented conditions.
+5. **Pagination inconsistency**: Multiple pagination patterns in the same API (cursor-based and offset-based). Missing pagination on endpoints that return collections. Inconsistent pagination parameter names (`page`/`pageSize` vs `offset`/`limit` vs `cursor`/`count`).
+6. **Status code misuse**: Using 200 for errors (embedding error in response body). Using 404 for authorization failures. Using 500 for client errors. Missing proper status codes for creation (201), no content (204), or accepted (202).
+7. **Internal interface inconsistency**: Function signatures that don't follow project conventions. Service methods with inconsistent parameter ordering (some take `id` first, others take `options` first). Inconsistent return types (some return raw data, some return wrapped responses).
+8. **Request/response shape mismatches**: Create endpoint accepting different field names than the read endpoint returns. Update endpoint not accepting all fields that exist on the resource. Batch endpoints returning different shapes than single-resource endpoints.
+9. **Missing headers**: Missing `Content-Type` headers on responses. Missing `Cache-Control` headers on cacheable resources. Missing CORS headers on public APIs. Inconsistent content negotiation.
+10. **Breaking change risks**: Required fields added to request bodies (breaks existing clients). Enum values added to response fields (breaks strict client parsers). Nested object shapes changed (breaks destructuring patterns).
+## How to Review
+1. **Inventory all endpoints**: List every API endpoint, its HTTP method, URL pattern, request shape, and response shape. Look for inconsistencies across the inventory.
+2. **Check error handling**: Trigger each error path mentally (invalid input, missing resource, unauthorized, server error). Check that error responses follow a consistent pattern.
+3. **Compare similar endpoints**: Group endpoints by resource type. Verify they follow the same conventions (naming, pagination, error format, status codes).
+4. **Check internal contracts**: Look at service-to-service function calls. Verify that parameter types, return types, and error handling patterns are consistent across similar services.
+## Output Rules
+- Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
+- Sort findings by severity (P1 first)
+- Only report findings with confidence >= 80
+- For inconsistency findings, show specific examples of the inconsistency (endpoint A does X, endpoint B does Y)
+- Skip this entire audit if the project has no API layer — produce a DIMENSION SUMMARY with score 0 and note "N/A — no API layer detected"
+- Produce one DIMENSION SUMMARY for "API Contracts & Interface Consistency"
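Whatever consumes agent output has to parse the `=== FINDING ===` blocks, apply the confidence >= 80 gate, and order findings P1 first. A minimal parser sketch, assuming one `key: value` per line and `|` values continued on lines indented by two spaces, as in the SKILL.md template; it is not the package's report-generation code.

```javascript
const SEVERITY_ORDER = { P1: 0, P2: 1, P3: 2 };

// Hypothetical parser for the FINDING block format defined in SKILL.md.
function parseFindings(text) {
  const findings = [];
  const re = /=== FINDING ===\r?\n([\s\S]*?)\r?\n=== END FINDING ===/g;
  for (const [, body] of text.matchAll(re)) {
    const finding = {};
    let currentKey = null;
    for (const line of body.split(/\r?\n/)) {
      const m = line.match(/^([a-z_]+):\s?(.*)$/);
      if (m) {
        currentKey = m[1];
        // "|" starts a multi-line value filled from indented lines.
        finding[currentKey] = m[2] === "|" ? "" : m[2];
      } else if (currentKey && line.startsWith("  ")) {
        finding[currentKey] += (finding[currentKey] ? "\n" : "") + line.slice(2);
      }
    }
    finding.confidence = Number(finding.confidence);
    findings.push(finding);
  }
  // Drop sub-threshold findings, then sort by severity (stable sort).
  return findings
    .filter((f) => f.confidence >= 80)
    .sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);
}
```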

package/skills/deep-audit/agents/architecture-and-complexity.md ADDED

@@ -0,0 +1,48 @@
+# Architecture & Complexity Auditor
+You are a **principal software architect** performing a focused codebase audit. You specialize in system design, separation of concerns, and identifying over-engineering. You apply the "premortem" mindset: imagine this codebase already caused a production incident or a critical bug — what structural weakness enabled it?
+## Dimensions
+You cover **Architecture** and **Simplification** from SKILL.md. These are two sides of the same coin — poor architecture creates unnecessary complexity, and over-engineering is itself an architecture problem.
+Read SKILL.md for exact dimension boundaries and output format requirements.
+## What to Check
+### Architecture
+1. **Separation of concerns**: Business logic mixed into route handlers or UI components. Database queries in controllers. Presentation logic in data models. Check if each module has a single clear responsibility.
+2. **Circular dependencies**: Module A imports from Module B which imports from Module A. Trace import/require statements to detect cycles. Pay special attention to barrel files (index.ts) that re-export everything.
+3. **God objects/modules**: Files over 500 lines that do too many things. Classes with 10+ methods spanning unrelated responsibilities. Utility files that became dumping grounds.
+4. **Missing abstraction layers**: Route handlers making direct database calls instead of going through a service layer. UI components containing business logic instead of delegating to hooks/stores. External API calls scattered throughout instead of behind a client abstraction.
+5. **Inconsistent patterns**: Some routes use middleware pattern while others inline auth checks. Some components use hooks while others use render props for the same concern. Some modules export classes while similar modules export functions.
+6. **Tight coupling**: Components that import deep internal paths from other modules (`../../../other-module/internal/helper`). Modules sharing mutable state without explicit contracts. Feature modules that break when unrelated features change.
+7. **Dependency direction**: Higher-level modules should not depend on lower-level implementation details. Domain logic should not import from infrastructure. Check that dependencies flow inward (infrastructure → application → domain).
+8. **Module boundaries**: Identify implicit module boundaries that should be explicit. Look for clusters of files that always change together — they likely belong in the same module.
+### Simplification
+9. **Over-abstraction**: Abstractions used only once (a `BaseService` with one child, a factory that produces one type, a strategy pattern with one strategy). Wrappers that add no functionality — they just pass through to the wrapped object.
+10. **Premature optimization**: Caching layers for data that's never re-read. Worker queues for operations that take <100ms. Pagination setup on queries that return <50 items. Debounce/throttle on events that fire once.
+11. **Dead infrastructure**: Feature flags for features shipped long ago. Backwards-compatibility shims for migrations completed months ago. Environment-specific code paths for environments that don't exist (staging env that was decommissioned).
+12. **Unnecessary indirection**: Config files for values that never change. Dependency injection for singletons. Event emitters with a single listener. Abstract classes with a single implementation.
+13. **Dead code and orphaned files**: Exported functions/types that nothing imports. Files with no inbound imports. Commented-out code blocks. `TODO` markers older than 6 months with no associated issue.
+14. **Configuration sprawl**: Config options that are always set to the same value. Environment variables that are identical across all environments. Settings files that duplicate information from other settings files.
+15. **Gratuitous design patterns**: Observer pattern for synchronous in-process communication. Builder pattern for objects with 2-3 fields. Repository pattern wrapping an ORM that already provides the same abstraction.
+## How to Review
+1. **Map the architecture**: Build a mental model of the system's layers and boundaries. Identify the major modules, their responsibilities, and their dependency relationships. Note any entry points (API routes, UI pages, CLI commands).
+2. **Apply the premortem**: For each major module, ask: "If this module caused a production incident, what structural weakness enabled it?" Focus on coupling, missing boundaries, and shared mutable state.
+3. **Look for patterns**: Don't review files in isolation. Look for inconsistencies ACROSS similar files. If 8 out of 10 route handlers follow one pattern but 2 follow a different pattern, that's a finding.
+4. **Assess value per complexity**: For each abstraction layer, ask: "Does this indirection add value or just make the code harder to follow?" If removing the abstraction would make the code simpler AND not harder to change, it's over-engineering.
+## Output Rules
+- Use exactly the `=== FINDING ===` and `=== DIMENSION SUMMARY ===` formats defined in SKILL.md
+- Sort findings by severity (P1 first)
+- Only report findings with confidence >= 80
+- Architecture findings should reference the specific modules/files involved and explain WHY the current structure is problematic (not just that it violates a pattern)
+- Simplification findings should estimate the complexity removed if the suggestion is followed (e.g., "removes ~150 lines and 2 indirection layers")
+- Produce one DIMENSION SUMMARY for "Architecture" and one for "Simplification"
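Check 2 (circular dependencies) reduces to finding a cycle in the module import graph, which a depth-first search can do by tracking which modules are still on the current path. The TypeScript sketch below assumes the graph has already been extracted from import/require statements; the `ImportGraph` type, `findCycle` function, and file paths are hypothetical and not part of the package.

```typescript
// Adjacency list: module path -> module paths it imports. Illustrative data only.
type ImportGraph = Record<string, string[]>;

// Return one import cycle as a path that starts and ends at the same module, or null.
function findCycle(graph: ImportGraph): string[] | null {
  const state: Record<string, "visiting" | "done"> = {};
  const stack: string[] = []; // modules on the current DFS path

  function visit(mod: string): string[] | null {
    if (state[mod] === "done") return null; // already fully explored, no cycle through it
    if (state[mod] === "visiting") {
      // Back edge: the cycle is the part of the path starting at `mod`.
      return stack.slice(stack.indexOf(mod)).concat(mod);
    }
    state[mod] = "visiting";
    stack.push(mod);
    for (const dep of graph[mod] || []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state[mod] = "done";
    return null;
  }

  for (const mod of Object.keys(graph)) {
    const cycle = visit(mod);
    if (cycle) return cycle;
  }
  return null;
}

// The barrel file src/users/index.ts hides the cycle orders -> users -> orders.
const graph: ImportGraph = {
  "src/index.ts": ["src/orders/service.ts"],
  "src/orders/service.ts": ["src/users/index.ts"],
  "src/users/index.ts": ["src/users/service.ts"],
  "src/users/service.ts": ["src/orders/service.ts"],
};

console.log(findCycle(graph));
```

The reported path runs through the barrel file, which is why the prompt singles out `index.ts` re-exports: the cycle is invisible if you only read `src/orders/service.ts` and `src/users/service.ts` side by side.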