npm - agentscamp - Versions diffs - 0.1.0 → 0.2.1 - Mend

agentscamp 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25) hide show

package/README.md +3 -3
package/content/commands/create-skill.md +84 -0
package/content/commands/create-slash-command.md +80 -0
package/content/commands/create-subagent.md +101 -0
package/content/commands/estimate-effort.md +96 -0
package/content/commands/find-n-plus-one.md +67 -0
package/content/commands/flaky-test-hunt.md +112 -0
package/content/commands/git-bisect.md +126 -0
package/content/commands/rename-symbol.md +131 -0
package/content/commands/resolve-conflict.md +127 -0
package/content/commands/scaffold-rag-pipeline.md +76 -0
package/content/commands/trace-data-flow.md +102 -0
package/content/manifest.json +320 -3
package/content/skills/agent-memory-designer.md +42 -0
package/content/skills/auth-flow-reviewer.md +73 -0
package/content/skills/extract-module.md +50 -0
package/content/skills/migration-writer.md +41 -0
package/content/skills/prompt-regression-tester.md +89 -0
package/content/skills/rate-limiter-designer.md +56 -0
package/content/skills/react-render-profiler.md +37 -0
package/content/skills/semver-advisor.md +81 -0
package/content/skills/type-coverage-improver.md +81 -0
package/content/skills/version-bumper.md +91 -0
package/content/skills/webhook-handler-scaffolder.md +37 -0
package/package.json +1 -1

package/README.md CHANGED Viewed

@@ -1,6 +1,6 @@
 # agentscamp
-> 116 ready-to-use Claude Code agents, skills, and slash commands — installable in one command.
+> 138 ready-to-use Claude Code agents, skills, and slash commands — installable in one command.
 [AgentsCamp](https://agentscamp.com) is a curated, format-validated directory of AI coding artifacts. This CLI bundles the full catalog and installs items straight into your `.claude/` directory.
@@ -43,8 +43,8 @@ These are Claude Code's standard locations — agents get delegated to automatic
 ## What's inside
 - **49 agents** — specialized subagents for development, data/AI, infra, security, and more → [browse agents](https://agentscamp.com/agents)
-- **38 skills** — on-demand capabilities for testing, databases, refactoring, releases → [browse skills](https://agentscamp.com/skills)
-- **29 commands** — reusable slash commands for planning, review, git, scaffolding → [browse commands](https://agentscamp.com/commands)
+- **49 skills** — on-demand capabilities for testing, databases, refactoring, releases → [browse skills](https://agentscamp.com/skills)
+- **40 commands** — reusable slash commands for planning, review, git, scaffolding → [browse commands](https://agentscamp.com/commands)
 Every item has a full page with docs, examples, and related picks at [agentscamp.com](https://agentscamp.com).

package/content/commands/create-skill.md ADDED Viewed

@@ -0,0 +1,84 @@
+---
+description: "Scaffold a new Claude Code skill into .claude/skills/<name>/SKILL.md — a model-invoked capability with a trigger-rich description, scoped tools, and a lean body that pushes detail into resource files."
+argument-hint: "<what the skill should do>"
+allowed-tools: "Read, Write, Glob, Grep"
+---
+## Scope
+Treat `$ARGUMENTS` as the skill's purpose — the capability you want Claude to reach for automatically. Restate it in one sentence, then derive a **kebab-case skill name** (verb-led, e.g. `migrate-sql-schema`, `summarize-pr`) that will be both the folder name and the `name` field.
+If `$ARGUMENTS` is empty, ask one focused question: *"What capability should this skill add, and roughly when should Claude reach for it?"* Do not invent a skill.
+Default to **project scope**: `.claude/skills/<name>/SKILL.md`. Mention `~/.claude/skills/<name>/` as the alternative for skills the user wants across every project, and let them pick.
+> [!WARNING]
+> A skill is **model-invoked**, not user-invoked. Claude decides to load it by matching the conversation against the `description`. If the description is vague, the skill never fires — so the description is the most important thing you write, not the body.
+## Step 1 — Understand the capability and check for collisions
+Pin down what triggers the skill, what it does, and what it produces. Then `Glob .claude/skills/*/SKILL.md` (and `~/.claude/skills/*/SKILL.md`) and `Grep` the existing `name:` / `description:` lines. If a near-duplicate exists, say so and offer to extend it instead of creating an overlapping skill — two skills with similar descriptions make activation ambiguous.
+## Step 2 — Skill vs. slash command sanity check
+Confirm a skill is the right artifact before writing one:
+- **Skill** — Claude should invoke it *on its own* whenever a matching situation arises (a recurring task it should recognize, e.g. "writing a migration", "reviewing Terraform"). Activation is the `description`.
+- **Slash command** — the user wants to *trigger it explicitly by name* (`/create-skill`). If that's what they described, point them at `create-slash-command` instead and stop.
+## Step 3 — Write the description (lead with what, then "Use when")
+This is the load-bearing field. Format: one clause stating **what the skill does**, then a sentence starting **"Use when ..."** listing concrete triggers — the phrasings, file types, or intents that should activate it.
+```
+description: "Convert REST endpoint handlers into typed OpenAPI 3.1 specs. Use when the user adds or edits an Express/Fastify route, asks for API docs, or mentions an openapi.yaml / swagger file."
+```
+Name the real cues (file extensions, library names, task verbs). Avoid first-person and avoid "helps with" — write triggers Claude can pattern-match against.
+## Step 4 — Write the SKILL.md (lean body, progressive disclosure)
+Create `.claude/skills/<name>/SKILL.md` with `Write`. Frontmatter, then a body that fits comfortably on one screen:
+```markdown
+---
+name: <kebab-name>            # MUST equal the folder name
+description: "<from Step 3>"
+allowed-tools: Read, Grep, Glob   # least privilege; omit to inherit the session's tools
+user-invocable: true           # also lets the user call it like /<name>
+---
+# <Title Case Name>
+## When to use
+Restate the triggers as a short checklist so Claude self-confirms before acting.
+## Instructions
+1. First concrete step, referencing the inputs the skill works on.
+2. ...
+3. ...
+## Output
+Exactly what the skill produces and where (file path, message, diff).
+```
+Rules that keep the skill reliable:
+- `name` must be identical to the folder name — a mismatch makes the skill fail to register.
+- Scope `allowed-tools` to the minimum the procedure needs. A skill that only inspects code gets `Read, Grep, Glob`; add `Write`/`Edit`/`Bash(...)` only if it must change things.
+- Keep `SKILL.md` focused on the *procedure*. The body is loaded into context every time the skill fires, so every extra line is a recurring token cost.
+## Step 5 — Decide whether it needs resource files
+Single-file is the default. Add sibling files only when warranted, and reference them by **relative path** so Claude loads them on demand (progressive disclosure), not eagerly:
+- Long reference material (lookup tables, style rules, schemas) → `reference.md`, linked from SKILL.md as *"see ./reference.md for the full mapping"*.
+- Runnable helpers → `scripts/<tool>.py` (or `.sh`), invoked from the Instructions; this requires a `Bash(...)` entry in `allowed-tools`.
+- Reusable boilerplate the skill emits → `templates/<name>.tmpl`.
+> [!NOTE]
+> Progressive disclosure is the whole point of the multi-file layout: a one-paragraph pointer in SKILL.md costs a few tokens, while the linked file is read only when the task actually needs it. Don't paste a 300-line table into SKILL.md "to be safe."
+## Report
+Confirm the absolute path of the created `SKILL.md` (and any resource files), echo the `name` and `description`, and state the chosen scope (project vs. user). Tell the user the skill activates automatically when the conversation matches its triggers — and, if `user-invocable: true`, that they can also call it as `/<name>`. End with the single most useful next step: open a fresh session and try a prompt that should trip the trigger, to confirm the skill loads.

package/content/commands/create-slash-command.md ADDED Viewed

@@ -0,0 +1,80 @@
+---
+description: "Scaffold a new Claude Code slash command into .claude/commands/ — a valid Markdown file with frontmatter, a least-privilege allowed-tools allowlist, and a $ARGUMENTS-driven body of numbered steps ending in a Report."
+argument-hint: "<what the command should do>"
+allowed-tools: "Read, Write, Glob, Grep"
+---
+## Scope
+Treat `$ARGUMENTS` as the **purpose of the command you are about to create** — for example, "review a PR for security issues" or "summarize today's git log". Restate it in one sentence, then commit to a slug and a tool allowlist before writing anything.
+If `$ARGUMENTS` is empty, ask one focused question: *"What should the new command do?"* Do not scaffold a placeholder command from a guessed purpose — an empty stub is worse than no file.
+Your deliverable is one valid file at `.claude/commands/<slug>.md` plus instructions for running it. You write exactly one new file; you do not modify existing commands.
+## Step 1 — Derive the slug and check for collisions
+Turn the purpose into a short, kebab-case verb-phrase slug (`review-pr`, `summarize-log`, `audit-deps`) — that filename *is* the command name. Then `Glob` `.claude/commands/**/*.md` and `~/.claude/commands/**/*.md`.
+- If `<slug>.md` already exists, stop and report it. Propose a distinct slug or ask whether to overwrite — never silently clobber a command the team relies on.
+- To group related commands, use a subfolder: `.claude/commands/git/review.md` is invoked as `/git:review` (the folder becomes a `:` namespace).
+> [!WARNING]
+> Default to the **project** scope `.claude/commands/`, which is committed and shared. Only write to `~/.claude/commands/` if the user explicitly asks for a personal, all-repos command.
+## Step 2 — Choose the minimum allowed-tools
+Pick the smallest tool set the command's job actually needs — this is the command's permission boundary, and Claude cannot exceed it at runtime.
+- Read-only analysis (review, summarize, explain): `Read, Grep, Glob`.
+- Generates a file: add `Write`. Edits in place: add `Edit`.
+- Needs to run commands (tests, git, build): add `Bash` — the heaviest grant; justify it.
+Omit `allowed-tools` only if the command should inherit the session's full tool access; for a shareable command, prefer an explicit allowlist.
+## Step 3 — Write the frontmatter
+Emit only the keys Claude Code recognizes. A command frontmatter block uses:
+```yaml
+---
+description: <one sentence — shown in the /help list>
+argument-hint: <e.g. "<pr number>" — omit if the command takes no args>
+allowed-tools: <comma list from Step 2 — omit to inherit>
+model: <haiku|sonnet|opus — omit to inherit; set opus only for heavy reasoning>
+---
+```
+> [!NOTE]
+> The command name comes from the **filename**, not a `name:` key. There is no `name` field in command frontmatter — adding one is just dead text.
+## Step 4 — Write the body
+Structure the body the same way this command is structured. It must:
+1. Open with a **Scope** section that restates `$ARGUMENTS` as the command's real input and defines the deliverable.
+2. **Handle empty `$ARGUMENTS` explicitly** — ask one specific question instead of guessing.
+3. Lay out the work as **numbered Steps** (`## Step 1 — …`), each concrete and referencing only the tools declared in `allowed-tools`.
+4. Reference `$ARGUMENTS` wherever the user's input drives behavior.
+5. End with a **## Report** section stating exactly what the command outputs (a written answer, a created path, a diff summary).
+Keep instructions decision-dense: prefer "Grep for `TODO(` and list each with file:line" over "look for issues".
+## Step 5 — Write the file and self-check
+`Write` the assembled frontmatter + body to the chosen path. Before reporting, verify:
+- YAML frontmatter is fenced by `---` lines and parses.
+- Every tool the body tells Claude to use appears in `allowed-tools`.
+- `$ARGUMENTS` appears in the body and the empty case is handled.
+- There is a final Report section.
+## Report
+Report:
+1. The **absolute path** of the file you created (or the collision you stopped on).
+2. The exact **invocation**, including namespace if any: `/<slug> <args>` (e.g. `/review-pr 482` or `/git:review HEAD~1`).
+3. The `allowed-tools` you granted and the one-line reason.
+End with a reminder to commit the file (project scope) so the team picks it up, and a suggested first test invocation.

package/content/commands/create-subagent.md ADDED Viewed

@@ -0,0 +1,101 @@
+---
+description: "Scaffold a new Claude Code subagent definition file into .claude/agents/ with a routing-ready description, scoped tools, and a system prompt."
+argument-hint: "<what the subagent should do>"
+allowed-tools: "Read, Write, Glob, Grep"
+---
+## Scope
+Treat `$ARGUMENTS` as the **purpose** of a new subagent — what specialized job it should own (`review database migrations for lock risk`, `write conventional-commit messages from a diff`, `triage incoming Sentry errors`). A subagent is a single-purpose persona the main agent can delegate to, with its own context window, tool allowlist, and system prompt. Restate the purpose in one sentence and narrow it to **one job** before writing anything — broad agents ("does backend stuff") never get routed to reliably.
+If `$ARGUMENTS` is empty or vague, ask **one** focused round of clarifying questions, then proceed:
+1. **Domain & job** — what is the one task this agent should be the expert at?
+2. **Tools** — does it need to write/edit files and run commands, or is it read-only (review, analysis, planning)?
+3. **Model tier** — `haiku` (fast, mechanical), `sonnet` (default, most work), or `opus` (deep reasoning / architecture)?
+Do not ask more than once. If the user is terse, pick sane defaults (read-only, `sonnet`), state them, and write the file.
+> [!WARNING]
+> You only create the agent definition. Do not invoke the new subagent, run its workflow, or modify other files. Your single output is `.claude/agents/<slug>.md`.
+## Step 1 — Derive the slug and check for collisions
+Convert the purpose into a kebab-case `name` that reads like a role, not a sentence: `migration-reviewer`, `commit-message-writer`, `sentry-triager`. The filename **is** the slug, so `name` must match `<slug>.md`.
+Use `Glob` on `.claude/agents/*.md` and `~/.claude/agents/*.md` to check the slug is not already taken. If it collides, `Read` the existing file: if it covers the same job, tell the user it already exists and stop; otherwise pick a more specific slug rather than overwriting.
+## Step 2 — Decide the tool allowlist (least privilege)
+The `tools` field restricts what the subagent can do; **omit it to inherit all tools**, or list the minimum it needs. Match tools to the job from Step 1:
+- **Read-only roles** (reviewers, analyzers, planners): `Read, Grep, Glob`. No `Write`, `Edit`, or `Bash`.
+- **Editing roles** (fixers, refactorers, scaffolders): add `Write, Edit` and only the `Bash` commands they need (e.g. the test runner).
+- **Investigators** that run commands but never edit: `Read, Grep, Glob, Bash`.
+> [!NOTE]
+> A subagent runs in its **own** context window — it does not see your conversation, only the prompt the orchestrator hands it plus what its tools surface. That is why the system prompt below must be self-contained: state the workflow and conventions explicitly; the agent cannot rely on chat history.
+## Step 3 — Write a description that triggers delegation
+This is the most important field. The main agent scans every subagent's `description` to decide what to route where, so it must read as a routing rule, not a label. Pack it with **concrete trigger phrases**:
+- Lead with the capability, then add explicit `Use this agent when...` / `Use proactively when...` cues tied to real situations and keywords a user would say.
+- Name the inputs it expects (a diff, a file path, an error payload) so the orchestrator delegates with the right context.
+A weak description (`"Reviews code."`) gets ignored. A strong one — `"Reviews database migration files for locking and downtime risk. Use this agent when the user adds or edits files under db/migrations, mentions ALTER TABLE, adding columns/indexes, or asks 'is this migration safe to run in prod?'"` — gets routed reliably.
+## Step 4 — Write the agent file
+`Write` `.claude/agents/<slug>.md` (create `.claude/agents/` if missing) in exactly this shape. Fill every placeholder with content specific to the purpose — no generic filler.
+```markdown
+---
+name: <slug>
+description: <capability + concrete "Use this agent when..." triggers from Step 3>
+tools: <minimum allowlist from Step 2, or omit the line to inherit all>
+model: <haiku | sonnet | opus>
+color: <a distinct color, e.g. cyan, green, orange>
+---
+You are <role> — a focused specialist in <domain>. <One sentence on the
+standard you hold and the value you add.>
+## When to use me
+- <concrete situation 1>
+- <concrete situation 2>
+## When NOT to use me
+- <adjacent job that belongs to a different agent / the main agent>
+- <scope you must refuse and hand back instead of guessing>
+## Workflow
+1. <first concrete step, naming the tools you'll use>
+2. <gather context: which files/patterns to Read/Grep before acting>
+3. <do the core work>
+4. <self-check against the standard in your role statement>
+## Output contract
+- <exact format you return: a review with severity-tagged findings, a
+  unified diff, a checklist, a single recommendation — be specific>
+- <what you must NOT do: e.g. never edit beyond the named files; flag,
+  don't fix, anything outside scope>
+```
+Field rationale to honor while filling it in:
+- **`name`** must equal the filename slug — that is how the agent is addressed.
+- **`description`** is the delegation trigger (Step 3); spend your effort here.
+- **`model`** controls cost vs. depth — don't default everything to `opus`.
+- **`color`** is just the UI label color; pick one not already used by a sibling agent.
+- **`tools`** is the security boundary — narrower is safer and keeps the agent on task.
+## Report
+Tell the user, in your message:
+1. The exact path written — `.claude/agents/<slug>.md` (and whether it's project- or user-scoped).
+2. How to invoke it — it triggers automatically when a request matches its `description`, or explicitly via *"use the `<slug>` subagent to ..."*.
+3. The one knob most likely to need tuning: if the agent isn't getting picked up, sharpen the `description` triggers (Step 3) — that is almost always the cause.
+Note that moving the file to `~/.claude/agents/` makes the subagent available across all projects.

package/content/commands/estimate-effort.md ADDED Viewed

@@ -0,0 +1,96 @@
+---
+description: "Produce a grounded effort and complexity estimate for a task by exploring the codebase read-only."
+argument-hint: "<task or feature to estimate>"
+allowed-tools: "Read, Grep, Glob"
+---
+Estimate the effort to do the task in `$ARGUMENTS` by grounding it in the real codebase, not in vibes. This is an analysis pass only — read to understand, then deliver an estimate. Do not edit, run, or create anything.
+## Scope
+Treat `$ARGUMENTS` as the task to size — a feature, a refactor, a migration, a bug ("add SSO via SAML", "split the monolith's billing module into a service", "fix the flaky checkout test"). If it names files, symbols, or routes, those are where your investigation starts.
+If `$ARGUMENTS` is empty, do not invent a task. Ask one question — *"What task should I estimate?"* — and stop until they answer.
+> [!WARNING]
+> Read-only mode. Your only output is the written estimate. An estimate is a range under stated assumptions, never a commitment or a deadline — say so explicitly in the report so nobody quotes the midpoint as a promise.
+## Step 1 — Pin down scope and non-scope
+Restate the task in one sentence. Then list what is explicitly **out of scope** — the adjacent work a reader might assume is included but isn't (migrations of old data, the admin UI, rollback tooling, docs). Unbounded scope is the single biggest source of estimate error; naming the boundary is half the job.
+If the task hides a fork that changes the size by an order of magnitude (rewrite vs. patch? backward-compatible vs. breaking? one provider vs. a pluggable abstraction?), surface it as an open question — do not silently pick the cheap reading.
+## Step 2 — Explore the code to ground the estimate
+Read enough of the repo to size against reality. Do not guess the structure — find it.
+```bash
+# Find the symbols, routes, or modules the task touches
+grep -rn "checkoutHandler" src/
+# Map the surrounding structure and blast radius
+find src -path '*billing*' -name '*.ts'
+# Gauge how much code already exists vs. needs writing
+grep -rln "PaymentProvider" src/
+```
+Open the entry points, trace one level of callers and callees, and note the seams the change crosses: shared types, config, public APIs, tests, and anything with many inbound references. A change behind a clean interface is small; one that ripples through dozens of call sites is not.
+> [!NOTE]
+> Existing tests are an effort multiplier in both directions. Good coverage on the touched area shrinks the estimate (you can refactor safely); zero coverage on a critical path inflates it (you must write characterization tests before you dare touch it). Check before you size.
+## Step 3 — Decompose into independently-shippable subtasks
+Cut the work into subtasks that each land on their own and deliver a checkable result — schema, then store, then the wiring, then tests, then docs. Estimating the whole as one lump is where guesses hide; estimating parts forces you to confront each one. Anything you can't decompose is a sign you don't understand it yet — flag it as a spike.
+## Step 4 — Size each subtask and sum
+Give each subtask a T-shirt size with a rough hands-on range (these are illustrative — calibrate to the team and stack):
+| Size | Rough range | Looks like |
+| ---- | ----------- | ---------- |
+| S | < ~0.5 day | localized edit, pattern already exists in the repo |
+| M | ~0.5–2 days | new code across a couple of files, clear approach |
+| L | ~2–5 days | new seam or interface, touches several modules |
+| XL | > ~1 week | unknown approach, cross-cutting, or needs a spike first — decompose further |
+An XL is a smell: split it until the pieces are L or smaller, or call it a spike whose only deliverable is a smaller estimate. Sum the subtask ranges into a **total range** (low to high), not a single number.
+> [!WARNING]
+> Do not just add the optimistic ends. Integration, review, and the inevitable "while I was in there" overhead are real — fold them in as their own line or a percentage, and let the high end of the range carry the unknowns rather than burying them.
+## Step 5 — Surface risks, dependencies, and assumptions
+The estimate is only as good as what could blow it up. List, specifically:
+- **Risks / unknowns** that would inflate the number — undocumented behavior, a flaky area, a third-party API you haven't read the docs for, a refactor that might cascade. For each, note roughly how much it could add.
+- **Dependencies / sequencing** — what must land first, what's blocked on another team, what can run in parallel.
+- **Assumptions** — every "I'm assuming X" you relied on to size it (env exists, no data migration, the happy path only). If an assumption is wrong, the estimate changes — that's the whole point of writing them down.
+## Report
+Deliver the estimate as your message — it is the whole deliverable.
+```markdown
+## Effort estimate — <task>
+**Scope:** <one sentence>
+**Out of scope:** <the boundary>
+| # | Subtask | Size | Range |
+| - | ------- | ---- | ----- |
+| 1 | Define provider config + types | S | < 0.5d |
+| 2 | Add SAML assertion parser | L | 2–4d |
+| 3 | Wire into the auth middleware | M | 1–2d |
+| 4 | Characterization tests for the login path | M | 1–2d |
+**Total range:** ~4.5–8.5 days (incl. review + integration)
+**Top risks:** <each with how much it could add — e.g. "no tests on auth path: +1–2d if parsing breaks something">
+**Dependencies / sequencing:** <what blocks what>
+**Assumptions:** <the list — if any is wrong, the estimate moves>
+```
+End with the **recommended first slice**: the smallest subtask that retires the biggest unknown (usually a spike or the riskiest interface). Shipping it first tightens the whole range — call out which assumptions it will confirm or kill. Remind the reader the total is a range under these assumptions, not a date.

package/content/commands/find-n-plus-one.md ADDED Viewed

@@ -0,0 +1,67 @@
+---
+description: "Scan code read-only for N+1 query patterns — loops that query per iteration and handlers that fan out per-row — and report each with a location, why it is N+1, and the concrete eager-load/batch/set-based fix."
+argument-hint: "<path or area to scan (optional)>"
+allowed-tools: "Read, Grep, Glob"
+---
+## Scope
+Treat `$ARGUMENTS` as the path or area to scan — a directory, a file, a feature ("the orders list endpoint"), or a layer ("the serializers"). Restate the scope in one sentence before scanning.
+If `$ARGUMENTS` is empty, scan the data-access surface of the repo: ORM models/repositories, serializers/resolvers, and request handlers. Say which paths you chose so the user can narrow it.
+> [!WARNING]
+> Read-only mode. Do not edit, run migrations, or execute queries. Your only output is the prioritized findings report. The `Bash`/`Edit` tools are deliberately not granted — if a fix needs verifying, tell the user the exact command to run, don't run it.
+## Step 1 — Identify the data-access vocabulary
+Before grepping for loops, learn how *this* codebase talks to the database, because the lazy-load that triggers the extra query is often invisible. Grep for the ORM/query primitives in use:
+- **ActiveRecord (Rails):** `.where`, `.find`, `.find_by`, association accessors inside `.each`/`.map`; missing `includes`/`preload`/`eager_load`.
+- **Django:** `.objects.`, `.filter`, related-field access in a loop without `select_related`/`prefetch_related`.
+- **SQLAlchemy:** `session.query`/`select`, relationship access with default `lazy="select"`; missing `selectinload`/`joinedload`.
+- **Prisma/TypeORM/Sequelize:** `findMany`/`findOne`/`findByPk` inside `map`/`for`; missing `include`/`relations`/`eager`.
+- **Raw SQL / micro-ORMs:** a `SELECT … WHERE id = ?` helper called inside a loop.
+Note which one is in play; the recommended fix differs per ORM.
+## Step 2 — Find queries issued per iteration
+Grep for loop constructs (`for`, `forEach`, `.map`, `.each`, list/dict comprehensions, `Promise.all([...].map(...))`) and inspect the body of each for a data-access call from Step 1. Flag any case where the query depends on the loop variable (`fetch(item.id)`, `item.author.name`) — that's the per-row query.
+> [!NOTE]
+> The most dangerous N+1s are the invisible ones: a property access like `order.customer.email` that *looks* free but silently fires a lazy SELECT each time. Don't only grep for `.query()` — flag relationship/foreign-key attribute access inside any loop.
+## Step 3 — Find handlers that fan out per row
+Trace request handlers / GraphQL resolvers / serializers that return a collection. A field resolver or a serializer method that loads a related record runs once **per item in the response** even when no explicit loop is visible in the handler. Flag list endpoints whose per-item shape includes a related lookup, and GraphQL resolvers without a batching layer.
+## Step 4 — Rank by blast radius
+Order findings worst-first. Severity is roughly *(how large N gets) x (how hot the path is)*:
+- **Critical:** unbounded/paginated collections on a hot path (list endpoints, dashboards, exports) — N grows with data.
+- **High:** loops over user-controlled or large fixed sets.
+- **Low:** loops over a small bounded set (a 3-item enum) — note it, but don't alarm.
+## Step 5 — For each finding, prescribe the concrete fix
+Per finding, give: the **file:line**, a one-line **why it's N+1**, and the **fix matched to the ORM** — not a generic "optimize this":
+- **Eager load / preload** when you need the related rows: `includes`/`preload` (Rails), `select_related` (1:1/FK) or `prefetch_related` (1:many) (Django), `selectinload`/`joinedload` (SQLAlchemy), `include`/`relations` (Prisma/TypeORM).
+- **Single set-based query** when you only need an aggregate or a filtered subset: replace the loop with one `WHERE id IN (...)` / `GROUP BY` / `JOIN` instead of looping.
+- **Batch / DataLoader** for GraphQL resolvers or service boundaries where you can't restructure the caller — collect the keys, resolve them in one batched call per tick.
+- **Map in memory** when the related set is small and reused: load once, index by key, look up in the loop.
+Include a compact **before/after sketch** (4-8 lines) so the fix is unambiguous.
+## Step 6 — Tell them how to confirm
+Close each finding with the verification step the user runs themselves (read-only command guidance only):
+- Enable query logging for the path (Rails: watch the dev log / `ActiveRecord::Base.logger`; Django: `django.db` logger or `django-debug-toolbar`; SQLAlchemy: `echo=True`; Prisma: `log: ['query']`) and confirm the count drops from ~N to 1-2.
+- Or `EXPLAIN`/`EXPLAIN ANALYZE` the new set-based query to confirm it's a single plan, not a loop.
+## Report
+Deliver a prioritized findings list (worst offenders first) as your message — it is the whole deliverable. For each: **severity · file:line · why it's N+1 · the fix · before/after sketch · how to confirm**. If you found nothing, say so plainly and name the paths you scanned. End with the single highest-leverage finding to fix first.

package/content/commands/flaky-test-hunt.md ADDED Viewed

@@ -0,0 +1,112 @@
+---
+description: "Reproduce a flaky test, find the real source of nondeterminism, and fix the cause."
+argument-hint: "<suspected test or area (optional)>"
+allowed-tools: "Bash, Read, Edit"
+---
+Find why a test passes sometimes and fails other times, then remove the **nondeterminism** — don't make it fail less often. The whole job is turning an intermittent failure into either a reliable pass or a reliable fail you can debug. Follow the steps in order; the judgment call is Step 3, classifying *which* source of flakiness you're looking at.
+## Scope
+`$ARGUMENTS` optionally names the suspect: a test file, a single test name/pattern, or just an area ("the checkout integration tests"). Use it to scope the reproduction loop so each run stays fast and one failure is readable.
+If `$ARGUMENTS` is empty, find the suspect first. Look for the usual tells: tests skipped/quarantined in config, `retry`/`flaky`/`@flaky` annotations, CI history if available, and tests that touch time, randomness, ordering, the network, or the filesystem. Pick the most-cited or most-recently-failed one and confirm with the user if several are equally likely.
+> [!WARNING]
+> Reproduce flakiness *before* you change anything. A single green run proves nothing — flaky tests pass most of the time by definition. If you can't make it fail, you can't prove you fixed it.
+## Step 1 — Reproduce the intermittency
+Detect the runner from the manifest, then **loop the suspect** until you observe both a pass and a failure. One run is worthless here; volume is the tool.
+```bash
+# Loop a single test N times, stop on first failure (shell, runner-agnostic)
+for i in $(seq 1 50); do
+  npx vitest run -t "applies the discount" || { echo "FAILED on run $i"; break; }
+done
+# Native repeat flags where they exist
+npx jest path/to/file.test.ts --runInBand            # jest: run serially to expose order coupling
+pytest tests/test_cart.py -k discount --count=50      # pytest-repeat
+go test ./cart -run TestDiscount -count=50            # go: -count disables the test cache
+cargo test discount -- --test-threads=1               # rust: serialize to surface shared state
+```
+Then attack the two biggest flake sources directly — **order and seed** — because a test that's stable in isolation but flaky in the suite is order-dependent:
+```bash
+npx vitest run --sequence.shuffle                      # randomize test order
+npx jest --shuffle
+pytest -p randomly                                      # pytest-randomly randomizes order + seed
+go test ./... -shuffle=on
+```
+Record the failure rate (e.g. "3/50 with shuffle on, 0/50 in isolation"). That ratio is your signal that the fix worked later.
+> [!NOTE]
+> If it fails only with randomized order but never alone, the bug is almost certainly **shared mutable state** leaking between tests — jump straight to that category in Step 3. If it fails in isolation too, the test owns its own nondeterminism (time, RNG, async).
+## Step 2 — Capture the failing run's details
+When you catch a red run, read the *whole* failure, not the summary line — and note what differs from the green runs.
+- The assertion and its **expected vs. actual**, plus how actual varies between failures (off by milliseconds? different element order? a `null` that's sometimes set?).
+- The first project frame in the stack — not framework internals.
+- Whether the failure correlates with order (which test ran before it), wall-clock time (midnight, month boundary, DST), or machine load (only fails under `--test-threads`/parallel CI).
+If the failure is rare, increase the loop count and add diagnostics inside the test temporarily (log the value, the timestamp, the seed) rather than guessing.
+## Step 3 — Classify the source of nondeterminism
+This is the decision that determines the fix. **State the category explicitly** with the evidence from Steps 1–2. Flakiness almost always falls into one of these:
+- **Test-order / shared mutable state** — a module-level variable, singleton, cache, DB row, env var, or temp file mutated by one test and read by another. *Tell:* fails only with shuffle on, or only after a specific sibling.
+- **Real time / `Date`** — assertions on `Date.now()`, `new Date()`, durations, `setTimeout`, or "expires in 1h" math that crosses a tick/second/day boundary. *Tell:* fails near boundaries, or expected/actual differ by a small time delta.
+- **Unseeded randomness** — `Math.random()`, `uuid()`, `faker` without a fixed seed, `Set`/`Map` insertion order, hash ordering. *Tell:* actual value changes every run with no other input change.
+- **Async races / missing awaits** — an un-awaited promise, a fire-and-forget effect, polling without synchronization, assertions running before state settles. *Tell:* fails under load/parallelism, passes when serialized or with an added (bad) sleep.
+- **Network / external deps** — real HTTP, a live DB, a clock-skewed API, DNS. *Tell:* fails on timeout, on CI but not locally, or when offline.
+- **Resource leaks** — unclosed servers/sockets/file handles, leaked timers, growing in-memory state across tests, port collisions. *Tell:* fails late in the run, or only after the Nth iteration.
+- **Locale / timezone** — date formatting, number/currency parsing, string collation that assumes the dev's `LANG`/`TZ`. *Tell:* fails in CI with a different `TZ`/`LC_ALL`. Reproduce with `TZ=UTC LC_ALL=C npx vitest run ...`.
+If two categories are plausible, fix the one your repro evidence most directly supports, then re-loop before assuming you're done.
+## Step 4 — Fix the cause, not the symptom
+Apply the smallest change that removes the nondeterminism. Match the fix to the category:
+- **Shared state:** isolate it — reset the singleton/cache in `beforeEach`/`afterEach`, use a fresh fixture/transaction per test, stop relying on cross-test ordering. Make each test set up everything it needs.
+- **Real time:** inject the clock. Use fake timers (`vi.useFakeTimers()` / `jest.useFakeTimers()`, `freezegun` in Python) or pass a `now()` function the test controls. Never assert against the live clock.
+- **Unseeded RNG:** seed it or inject the random source (`faker.seed(1)`, a stubbed `Math.random`, a fixed UUID factory). For collection ordering, sort before asserting or assert set-membership, not index.
+- **Async races:** `await` the actual promise; wait on the real condition (`waitFor`/`findBy`, a returned promise, an event) instead of a timer. Remove un-awaited side effects.
+- **Network / external:** mock at the boundary (`msw`, `nock`, `responses`, a fake) so the test is hermetic. If it's a genuine integration test, gate it behind a tag and keep it out of the unit loop.
+- **Leaks:** close what you open in teardown — servers, handles, timers; bind to an ephemeral port (`:0`) instead of a fixed one.
+- **Locale/TZ:** pin `TZ` and locale in the test setup, or pass an explicit locale to the formatter under test.
+> [!WARNING]
+> Do **not** "fix" flakiness by wrapping the test in a retry loop, adding `sleep`/`setTimeout` to "give it time", widening a time tolerance, or quarantining/`.skip`-ing it. Those hide nondeterminism — the test still fails for real and the underlying race ships to production. A `sleep` that makes it pass is proof you found an async race; replace the sleep with a real await on the condition.
+## Step 5 — Prove it's stable
+Re-run the **same loop and randomization** that exposed the flake. The bar is no failures across a high iteration count with order and seed randomized:
+```bash
+for i in $(seq 1 100); do
+  npx vitest run -t "applies the discount" --sequence.shuffle || { echo "STILL FLAKY on run $i"; break; }
+done
+```
+Then run the full suite (also shuffled) to confirm your isolation/teardown change didn't break a test that was silently relying on the old shared state. Reproduce the original failing condition specifically (e.g. `TZ=UTC`, parallel threads) if that's what triggered it.
+> [!NOTE]
+> If it now passes 100/100 in the loop but CI still flakes, the source is environmental (parallelism level, CI `TZ`, slower disk) — reproduce under those exact conditions locally before declaring victory.
+## Report
+Summarize for the user, concisely:
+- **Category:** which source of nondeterminism it was, and the one-line evidence (e.g. "shared module cache — failed 4/50 only with shuffle on").
+- **Offending code:** the file and the exact line/pattern that was nondeterministic.
+- **Fix:** what you changed and why it removes the cause (not masks it).
+- **Proof:** the before/after loop result (e.g. "was 4/50 failing → now 100/100 green with shuffle + `TZ=UTC`").
+Do not commit or push — leave the change staged for the user to review unless they explicitly ask you to commit.