agentscamp 0.3.0 → 0.4.0

@@ -0,0 +1,78 @@
+ ---
+ description: "Explore the codebase and write a decision-oriented design doc / RFC for a feature or system change."
+ argument-hint: "<feature or system to design>"
+ allowed-tools: "Read, Grep, Glob"
+ ---
+
+ ## Scope
+
+ Treat `$ARGUMENTS` as the thing being designed — a feature, a subsystem, or a structural change (`move sessions from cookies to Redis`, `add multi-tenant billing`, `replace the polling sync with webhooks`). Restate it in one sentence to confirm scope before designing.
+
+ If `$ARGUMENTS` is empty, ask one focused question: *"What feature or system change should I design?"* Do not invent a problem to solve.
+
+ > [!WARNING]
+ > Read-only mode. Do not modify the repository, run migrations, install packages, or scaffold code. The written design doc is your only output. Designing without reading the current code produces a doc that won't survive contact with the repo — it proposes structure that already exists differently, or ignores constraints the code already enforces.
+
+ > [!NOTE]
+ > A design doc without honest alternatives and trade-offs is just a plan in disguise. If you cannot name an approach you rejected and *why*, you haven't done the design work yet — go back to Step 2.
+
+ ## Step 1 — Frame the problem
+
+ Before any solution, pin down what you're solving and why now.
+
+ - What is broken, missing, or about to break? Why is this worth doing *now* rather than later?
+ - Who is affected — end users, a specific team, on-call, future maintainers? What do they feel today?
+ - What does "done" look like as observable behavior, and what is explicitly **not** in scope for this change?
+
+ ## Step 2 — Ground the design in the real code
+
+ A design that ignores the existing structure invents a system that doesn't match the one you're changing. Use `Read`, `Grep`, and `Glob` (no shell) to map reality first:
+
+ - **Orient:** `Read` `README.md`, `package.json`, and `CLAUDE.md` for stack, conventions, and the patterns the team already commits to. `Glob` (e.g. `src/**/*.ts`, `**/migrations/**`, `**/*.config.*`) to see how the tree and its boundaries are laid out.
+ - **Find the blast radius:** `Grep` for terms from `$ARGUMENTS` to locate every module, route, model, and config the change touches. Trace the data flow and the layers it crosses — a design that only names the happy-path file underestimates the work.
+ - **Find the pattern to extend or break from:** `Read` the closest existing subsystem end to end. Decide deliberately whether your design *follows* that pattern (cite it) or *departs* from it (justify the departure in Trade-offs). Note real constraints you discover: a schema you must migrate, an interface other code depends on, a queue/cache/auth boundary you can't move freely.
+
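The "blast radius" search in Step 2 can be approximated outside the agent with a short script. This is only a sketch of the idea, not how the command itself runs (the command uses the `Read`, `Grep`, and `Glob` tools): `find_blast_radius` is a hypothetical helper, the default glob patterns are examples, and grouping by top-level directory is an assumed stand-in for "the layers it crosses".

```python
import re
from pathlib import Path


def find_blast_radius(root, terms, patterns=("src/**/*.ts", "**/*.config.*")):
    """Group files that mention any search term by their top-level directory."""
    if not terms:
        return {}
    matcher = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    root = Path(root)
    seen = set()
    layers = {}
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            rel = path.relative_to(root)
            # Skip directories, vendored code, and files already matched by an earlier pattern.
            if rel in seen or not path.is_file() or "node_modules" in rel.parts:
                continue
            seen.add(rel)
            if matcher.search(path.read_text(errors="ignore")):
                layer = rel.parts[0] if len(rel.parts) > 1 else "."
                layers.setdefault(layer, []).append(rel.as_posix())
    return {layer: sorted(files) for layer, files in layers.items()}
```

A result that spans several top-level directories is a hint that the design has to cover more than the happy-path file.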
+ ## Step 3 — Write the design doc
+
+ Output the doc in this structure. Keep it skimmable and decision-oriented — cite real file paths and symbols from Step 2, not placeholders. Cut anything that isn't a decision or a constraint.
+
+ ```markdown
+ ## Design — <one-line summary of the change>
+
+ ### Context & problem
+ <Why this, why now, who's affected. The state today, with real references
+ (`path/to/module.ts`, the current flow). 2-4 tight paragraphs, no preamble.>
+
+ ### Goals
+ - <observable outcome this change must achieve>
+
+ ### Non-goals
+ - <explicitly out of scope — the boundaries that keep this shippable>
+
+ ### Proposed design
+ <The approach. Data model / flow changes, key interfaces and signatures,
+ and exactly how it fits (or deliberately departs from) existing patterns
+ in `path/to/...`. Diagrams-in-prose are fine; be concrete about what code
+ lives where.>
+
+ ### Alternatives considered
+ - **<Alternative A>** — <how it would work> — **Rejected because** <reason>.
+ - **<Alternative B>** — <how it would work> — **Rejected because** <reason>.
+
+ ### Trade-offs & risks
+ - <what this design costs: complexity, perf, coupling, ops burden>
+ - <what could break, and the failure mode if it does>
+
+ ### Rollout & migration
+ - <how it ships: flag, phased rollout, backfill/migration order, rollback path>
+
+ ### Observability
+ - <metrics, logs, alerts that prove it works in prod and catch regressions>
+
+ ### Open questions
+ - <each unresolved decision that needs an owner / a call>
+ ```
+
+ ## Report
+
+ Deliver the design doc as your message — it is the whole deliverable. Verify it has real alternatives with reasons, honest trade-offs, and a rollout plan that names a rollback path; if any of those is hand-waved, it isn't done. End with the **Open questions** — the specific decisions that need a human call before implementation can start. No files were changed; this is a doc to align on, not the change itself.
@@ -1,10 +1,10 @@
  {
  "schemaVersion": 1,
- "generatedAt": "2026-06-18T01:57:52.358Z",
+ "generatedAt": "2026-06-18T02:21:30.142Z",
  "counts": {
  "agents": 58,
- "skills": 52,
- "commands": 43
+ "skills": 60,
+ "commands": 50
  },
  "items": [
  {
@@ -886,6 +886,20 @@
  "installAs": "agents/workflow-orchestrator.md",
  "url": "https://agentscamp.com/agents/meta-orchestration/workflow-orchestrator"
  },
+ {
+ "id": "commands/add-caching",
+ "type": "command",
+ "slug": "add-caching",
+ "category": "perf",
+ "title": "Add Caching",
+ "description": "Add a caching layer to one expensive function or endpoint correctly — confirm it's cacheable, design the cache key/TTL/layer/invalidation, handle stampedes, wrap the call in one place, and report the design.",
+ "topics": [
+ "devops-infra"
+ ],
+ "file": "commands/add-caching.md",
+ "installAs": "commands/add-caching.md",
+ "url": "https://agentscamp.com/commands/perf/add-caching"
+ },
  {
  "id": "commands/add-docstrings",
  "type": "command",
@@ -946,6 +960,20 @@
  "installAs": "commands/add-streaming-endpoint.md",
  "url": "https://agentscamp.com/commands/scaffold/add-streaming-endpoint"
  },
+ {
+ "id": "commands/audit-accessibility",
+ "type": "command",
+ "slug": "audit-accessibility",
+ "category": "analyze",
+ "title": "Audit Accessibility",
+ "description": "Audit a component or page for accessibility against WCAG — semantics, names, keyboard, ARIA, contrast, forms, motion.",
+ "topics": [
+ "review-qa"
+ ],
+ "file": "commands/audit-accessibility.md",
+ "installAs": "commands/audit-accessibility.md",
+ "url": "https://agentscamp.com/commands/analyze/audit-accessibility"
+ },
  {
  "id": "commands/benchmark-rerankers",
  "type": "command",
@@ -975,6 +1003,20 @@
  "installAs": "commands/breakdown-task.md",
  "url": "https://agentscamp.com/commands/plan/breakdown-task"
  },
+ {
+ "id": "commands/clean-branches",
+ "type": "command",
+ "slug": "clean-branches",
+ "category": "git",
+ "title": "Clean Branches",
+ "description": "Safely prune merged and stale Git branches: drop dead remote-tracking refs, list merged candidates for review, then delete with the safe -d variant.",
+ "topics": [
+ "review-qa"
+ ],
+ "file": "commands/clean-branches.md",
+ "installAs": "commands/clean-branches.md",
+ "url": "https://agentscamp.com/commands/git/clean-branches"
+ },
  {
  "id": "commands/commit",
  "type": "command",
@@ -1317,6 +1359,20 @@
  "installAs": "commands/review-pr.md",
  "url": "https://agentscamp.com/commands/review/review-pr"
  },
+ {
+ "id": "commands/review-tests",
+ "type": "command",
+ "slug": "review-tests",
+ "category": "review",
+ "title": "Review Tests",
+ "description": "Review the quality of a test suite, not just whether it passes — find weak assertions, missing edge cases, and tests coupled to implementation.",
+ "topics": [
+ "review-qa"
+ ],
+ "file": "commands/review-tests.md",
+ "installAs": "commands/review-tests.md",
+ "url": "https://agentscamp.com/commands/review/review-tests"
+ },
  {
  "id": "commands/run-evals",
  "type": "command",
@@ -1346,6 +1402,20 @@
  "installAs": "commands/scaffold-dockerfile.md",
  "url": "https://agentscamp.com/commands/scaffold/scaffold-dockerfile"
  },
+ {
+ "id": "commands/scaffold-github-action",
+ "type": "command",
+ "slug": "scaffold-github-action",
+ "category": "scaffold",
+ "title": "Scaffold GitHub Action",
+ "description": "Scaffold a hardened GitHub Actions workflow for a stated goal, wired to the project's real test/lint/build commands.",
+ "topics": [
+ "devops-infra"
+ ],
+ "file": "commands/scaffold-github-action.md",
+ "installAs": "commands/scaffold-github-action.md",
+ "url": "https://agentscamp.com/commands/scaffold/scaffold-github-action"
+ },
  {
  "id": "commands/scaffold-pgvector-schema",
  "type": "command",
@@ -1450,6 +1520,20 @@
  "installAs": "commands/setup-claude-ci.md",
  "url": "https://agentscamp.com/commands/workflow/setup-claude-ci"
  },
+ {
+ "id": "commands/setup-precommit-hooks",
+ "type": "command",
+ "slug": "setup-precommit-hooks",
+ "category": "workflow",
+ "title": "Setup Pre-commit Hooks",
+ "description": "Set up fast pre-commit hooks that catch problems before they land — detect the repo's existing stack and hook mechanism, run lint/format/typecheck plus a secret scan on staged files only, keep the slow test suite in CI, and make the setup reproducible for the whole team.",
+ "topics": [
+ "devops-infra"
+ ],
+ "file": "commands/setup-precommit-hooks.md",
+ "installAs": "commands/setup-precommit-hooks.md",
+ "url": "https://agentscamp.com/commands/workflow/setup-precommit-hooks"
+ },
  {
  "id": "commands/sync-branch",
  "type": "command",
@@ -1493,6 +1577,21 @@
  "installAs": "commands/update-readme.md",
  "url": "https://agentscamp.com/commands/docs/update-readme"
  },
+ {
+ "id": "commands/write-design-doc",
+ "type": "command",
+ "slug": "write-design-doc",
+ "category": "plan",
+ "title": "Write Design Doc",
+ "description": "Explore the codebase and write a decision-oriented design doc / RFC for a feature or system change.",
+ "topics": [
+ "architecture",
+ "workflow-prompting"
+ ],
+ "file": "commands/write-design-doc.md",
+ "installAs": "commands/write-design-doc.md",
+ "url": "https://agentscamp.com/commands/plan/write-design-doc"
+ },
  {
  "id": "commands/write-tests",
  "type": "command",
@@ -1634,6 +1733,20 @@
  "installAs": "skills/claude-settings-auditor/SKILL.md",
  "url": "https://agentscamp.com/skills/workflow/claude-settings-auditor"
  },
+ {
+ "id": "skills/connection-pool-tuner",
+ "type": "skill",
+ "slug": "connection-pool-tuner",
+ "category": "database",
+ "title": "Connection Pool Tuner",
+ "description": "Size and tune a database connection pool from the real constraint — the database's shared max_connections and its core count — so total connections (per-instance pool × instance count) stay safely under the cap and a too-large pool stops adding latency. Use when the app throws 'too many connections' or pool-acquire timeouts, when the DB is saturated by connection count, or when deploying to serverless.",
+ "topics": [
+ "devops-infra"
+ ],
+ "file": "skills/connection-pool-tuner.md",
+ "installAs": "skills/connection-pool-tuner/SKILL.md",
+ "url": "https://agentscamp.com/skills/database/connection-pool-tuner"
+ },
  {
  "id": "skills/conventional-commits",
  "type": "skill",
@@ -1690,6 +1803,20 @@
  "installAs": "skills/dependency-audit/SKILL.md",
  "url": "https://agentscamp.com/skills/security/dependency-audit"
  },
+ {
+ "id": "skills/dependency-upgrade-planner",
+ "type": "skill",
+ "slug": "dependency-upgrade-planner",
+ "category": "refactor",
+ "title": "Dependency Upgrade Planner",
+ "description": "Plan and de-risk a major dependency, framework, or runtime upgrade — map the full version path, read every intermediate migration guide, and pin the breaking changes to your actual call sites instead of bumping the number and hoping. Use when a key dependency is several majors behind, when a security advisory forces an upgrade, or before a framework migration.",
+ "topics": [
+ "devops-infra"
+ ],
+ "file": "skills/dependency-upgrade-planner.md",
+ "installAs": "skills/dependency-upgrade-planner/SKILL.md",
+ "url": "https://agentscamp.com/skills/refactor/dependency-upgrade-planner"
+ },
  {
  "id": "skills/embedding-index-tuner",
  "type": "skill",
@@ -1889,6 +2016,20 @@
  "installAs": "skills/mcp-server-scaffolder/SKILL.md",
  "url": "https://agentscamp.com/skills/api/mcp-server-scaffolder"
  },
+ {
+ "id": "skills/memory-leak-hunter",
+ "type": "skill",
+ "slug": "memory-leak-hunter",
+ "category": "performance",
+ "title": "Memory Leak Hunter",
+ "description": "Find and fix a memory leak in a running app: confirm it's a real leak under steady load, diff two heap snapshots to name the growing object and its retention path, cut the root reference that blocks collection, and re-run to confirm memory plateaus. Use when RSS climbs until OOM/restart, heap grows unbounded across a steady workload, or GC pauses worsen the longer the process runs.",
+ "topics": [
+ "review-qa"
+ ],
+ "file": "skills/memory-leak-hunter.md",
+ "installAs": "skills/memory-leak-hunter/SKILL.md",
+ "url": "https://agentscamp.com/skills/performance/memory-leak-hunter"
+ },
  {
  "id": "skills/migration-writer",
  "type": "skill",
@@ -1946,6 +2087,20 @@
  "installAs": "skills/openapi-doc-writer/SKILL.md",
  "url": "https://agentscamp.com/skills/docs/openapi-doc-writer"
  },
+ {
+ "id": "skills/pagination-designer",
+ "type": "skill",
+ "slug": "pagination-designer",
+ "category": "api",
+ "title": "Pagination Designer",
+ "description": "Design correct, scalable pagination (plus the filtering and sorting that ride with it) for a list endpoint — pick cursor (keyset) vs offset and justify it, define an opaque cursor with a unique tiebreaker so no row is skipped or repeated, return a consistent envelope, bound page size, and name the indexes the sort actually needs. Use when adding a list endpoint, when OFFSET pagination crawls on a large table, or when clients see duplicate or missing rows while paging.",
+ "topics": [
+ "architecture"
+ ],
+ "file": "skills/pagination-designer.md",
+ "installAs": "skills/pagination-designer/SKILL.md",
+ "url": "https://agentscamp.com/skills/api/pagination-designer"
+ },
  {
  "id": "skills/plugin-scaffolder",
  "type": "skill",
@@ -2045,6 +2200,20 @@
  "installAs": "skills/prompt-regression-tester/SKILL.md",
  "url": "https://agentscamp.com/skills/data/prompt-regression-tester"
  },
+ {
+ "id": "skills/property-test-designer",
+ "type": "skill",
+ "slug": "property-test-designer",
+ "category": "testing",
+ "title": "Property Test Designer",
+ "description": "Design property-based tests — generate hundreds of random inputs and assert invariants that must hold for ALL of them — to surface the edge cases hand-picked examples never reach. Use when code has a large input space (parsers, serializers, encoders, math, data transforms), when a bug keeps slipping through despite green example tests, or when you can't enumerate every case worth checking.",
+ "topics": [
+ "review-qa"
+ ],
+ "file": "skills/property-test-designer.md",
+ "installAs": "skills/property-test-designer/SKILL.md",
+ "url": "https://agentscamp.com/skills/testing/property-test-designer"
+ },
  {
  "id": "skills/provider-fallback-wrapper",
  "type": "skill",
@@ -2131,6 +2300,20 @@
  "installAs": "skills/secret-scanner/SKILL.md",
  "url": "https://agentscamp.com/skills/security/secret-scanner"
  },
+ {
+ "id": "skills/security-headers-hardener",
+ "type": "skill",
+ "slug": "security-headers-hardener",
+ "category": "security",
+ "title": "Security Headers Hardener",
+ "description": "Audit and harden a web app's or API's HTTP security headers — Content-Security-Policy, HSTS, X-Content-Type-Options, frame-ancestors, Referrer-Policy, Permissions-Policy, and CORS — and produce a staged rollout that won't break the site. Use before a launch, during a security pass, or when a scanner (Mozilla Observatory, securityheaders.com, a pentest) flags missing or weak headers. Audits and edits header config; rolls CSP out Report-Only first.",
+ "topics": [
+ "review-qa"
+ ],
+ "file": "skills/security-headers-hardener.md",
+ "installAs": "skills/security-headers-hardener/SKILL.md",
+ "url": "https://agentscamp.com/skills/security/security-headers-hardener"
+ },
  {
  "id": "skills/semver-advisor",
  "type": "skill",
@@ -2145,6 +2328,20 @@
  "installAs": "skills/semver-advisor/SKILL.md",
  "url": "https://agentscamp.com/skills/release/semver-advisor"
  },
+ {
+ "id": "skills/slo-definer",
+ "type": "skill",
+ "slug": "slo-definer",
+ "category": "observability",
+ "title": "SLO Definer",
+ "description": "Turn a vague reliability goal into concrete SLIs, SLOs, an error budget, and burn-rate alerts — service-level indicators measured at the user-facing boundary, targets over a rolling window, and a written policy for what happens when the budget runs out. Use when a service has no defined reliability target, when on-call is noisy and alert-fatigued, or before you commit to an SLA you can't measure.",
+ "topics": [
+ "devops-infra"
+ ],
+ "file": "skills/slo-definer.md",
+ "installAs": "skills/slo-definer/SKILL.md",
+ "url": "https://agentscamp.com/skills/observability/slo-definer"
+ },
  {
  "id": "skills/sql-optimizer",
  "type": "skill",
@@ -2159,6 +2356,20 @@
  "installAs": "skills/sql-optimizer/SKILL.md",
  "url": "https://agentscamp.com/skills/data/sql-optimizer"
  },
+ {
+ "id": "skills/structured-logging-designer",
+ "type": "skill",
+ "slug": "structured-logging-designer",
+ "category": "observability",
+ "title": "Structured Logging Designer",
+ "description": "Design a structured (JSON) logging strategy with a stable field schema, correlation-ID propagation, and a disciplined level policy — then migrate ad-hoc string logs toward it. Use when logs are unsearchable plain text, when debugging a request across services means grepping multiple log streams by hand, or when standing up logging for a new service.",
+ "topics": [
+ "devops-infra"
+ ],
+ "file": "skills/structured-logging-designer.md",
+ "installAs": "skills/structured-logging-designer/SKILL.md",
+ "url": "https://agentscamp.com/skills/observability/structured-logging-designer"
+ },
  {
  "id": "skills/test-scaffolder",
  "type": "skill",
@@ -0,0 +1,46 @@
+ ---
+ name: "connection-pool-tuner"
+ description: "Size and tune a database connection pool from the real constraint — the database's shared max_connections and its core count — so total connections (per-instance pool × instance count) stay safely under the cap and a too-large pool stops adding latency. Use when the app throws 'too many connections' or pool-acquire timeouts, when the DB is saturated by connection count, or when deploying to serverless."
+ allowed-tools: "Read, Grep, Glob"
+ version: 1.0.0
+ ---
+
+ An oversized connection pool fails in two distinct ways, and a "nice round number" like 100 walks into both. Across the fleet, every app instance's pool sums past the database's shared `max_connections`, so the next deploy or traffic spike exhausts the server and *every* instance starts throwing. Inside the database, a pool bigger than the DB has cores to serve adds no parallelism: the extra connections queue and add latency. This skill sizes the **per-instance** pool from concurrency need and core count, does the `instances × pool ≤ max_connections` arithmetic with real headroom, sets the timeouts that recycle dead connections, and sends serverless through a pooler instead of multiplying pools.
+
+ ## When to use this skill
+
+ - The app logs `FATAL: sorry, too many clients already` or `remaining connection slots are reserved` (Postgres), `Too many connections` (MySQL), or pool-acquire timeouts ("timed out fetching a connection from the pool").
+ - The database is saturated by connection *count* (many rows in `pg_stat_activity`, memory pressure from per-connection backends) rather than by slow queries.
+ - You scaled out app instances or autoscaling kicked in, and the DB started erroring even though per-instance load looks fine.
+ - You're deploying to serverless / many short-lived instances (Lambda, Vercel functions, Cloud Run) and need a connection strategy.
+ - Standing up a new service and picking a pool size before it hits production.
+
+ ## Instructions
+
+ 1. **Find the real ceiling first.** Read the database's `max_connections` (Postgres `SHOW max_connections`, MySQL `SHOW VARIABLES LIKE 'max_connections'`) — this is shared across *everything*: every app instance, background workers, migrations, replicas, admin/`psql` sessions, and the monitoring agent. Postgres also reserves `superuser_reserved_connections`. Treat the usable budget as roughly `max_connections − reserved − headroom`, not the raw number.
+ 2. **Count every connection source, not just the web app.** Total connections = (per-instance pool × app instance count) + worker/cron pools + replicas + migration tooling + a margin for admin sessions and a deploy overlap (old and new instances live simultaneously during rolling deploys — pools effectively double for that window). Enumerate each source by grepping for pool config (`max`, `pool_size`, `maximumPoolSize`, `DATABASE_URL`, `?connection_limit=`).
+ 3. **Size the per-instance pool from concurrency, capped by cores — not by a big round number.** A connection only does work when the DB has a free core to run its query. The starting heuristic for a CPU-bound OLTP workload is near the DB's core count *for the whole fleet*, so per-instance pool ≈ `(useful_DB_concurrency) / instance_count`, often a small single-digit number. Going higher doesn't buy parallelism — it buys a queue. For I/O-bound queries (lots of waiting) you can go somewhat above core count, but measure rather than assume.
+ 4. **Do the exhaustion arithmetic explicitly and leave headroom.** Compute `instances × pool + other_sources` and confirm it stays under the usable budget *at max autoscale*, not at average instance count. Size against the ceiling the autoscaler can reach, then keep ~20–30% of `max_connections` free for migrations, admin, replication, and deploy overlap. If the math doesn't fit, shrink the pool before raising `max_connections` (each Postgres backend costs real memory).
+ 5. **Set the three timeouts and the idle floor deliberately — defaults leak or stall.**
+ - **Acquire / pool timeout** — how long a request waits for a free connection before failing fast (e.g. a few seconds). Without it, a saturated pool turns into unbounded queueing and looks like a hang.
+ - **Idle timeout** — close idle connections so the pool shrinks under low load and you're not holding slots the DB could give elsewhere.
+ - **Max lifetime** — recycle each connection after a bounded age (e.g. 30 min) so a load balancer / DNS failover / DB restart doesn't leave stale half-dead connections in the pool.
+ - **Min / idle floor** — keep a small warm minimum to avoid connect latency on the first request, but not so high that idle instances hoard the budget.
+ 6. **Handle serverless and many-instances specially — route through a pooler.** When instance count is large or unbounded (one pool per function invocation), per-instance pools multiply faster than any safe per-instance number can absorb. Don't fix it by shrinking the per-function pool to 1 alone — put a pooler between the app and the DB: PgBouncer in **transaction** mode, RDS Proxy, Supabase's pooler, or a provider serverless/HTTP driver. The pooler multiplexes hundreds of client connections onto a small set of real DB connections; keep the per-function pool at 1–2 behind it.
+
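The arithmetic in steps 1 to 4 can be sketched as a small calculation. This is a minimal illustration, not part of the skill: `PoolBudget`, `per_instance_pool`, and `total_connections` are hypothetical names, and the sample numbers (100 connections, 3 reserved, 20% headroom, 10 connections for workers and tooling, a useful DB concurrency of 16, 5 instances at max autoscale) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PoolBudget:
    max_connections: int   # the database's shared ceiling
    reserved: int          # e.g. Postgres superuser_reserved_connections
    headroom_percent: int  # share of max_connections kept free (the text suggests ~20-30)
    other_sources: int     # workers, cron, migrations, monitoring, admin sessions

    @property
    def usable(self) -> int:
        # max_connections - reserved - headroom, with headroom rounded up.
        headroom = (self.max_connections * self.headroom_percent + 99) // 100
        return self.max_connections - self.reserved - headroom


def per_instance_pool(budget: PoolBudget, max_instances: int, useful_db_concurrency: int) -> int:
    """Largest pool per instance that fits at max autoscale, capped by DB concurrency.

    Returns 0 when even one connection per instance does not fit the budget.
    """
    by_budget = (budget.usable - budget.other_sources) // max_instances
    by_cores = max(1, useful_db_concurrency // max_instances)
    return max(0, min(by_budget, by_cores))


def total_connections(pool: int, instances: int, other_sources: int) -> int:
    """instances x pool + other_sources, the figure to compare against the usable budget."""
    return pool * instances + other_sources


budget = PoolBudget(max_connections=100, reserved=3, headroom_percent=20, other_sources=10)
pool = per_instance_pool(budget, max_instances=5, useful_db_concurrency=16)
```

With these numbers the budget alone would allow 13 connections per instance, but the concurrency cap brings the pool down to 3, so the fleet uses 25 of the 77 usable connections at max autoscale. A result of 0 means the fleet does not fit at all, which is the case step 6 sends through a pooler.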
+ > [!WARNING]
+ > Scaling out app instances silently multiplies total connections. A pool of 20 that's fine on 3 instances (60) exhausts a 100-connection DB the moment the autoscaler reaches 5 instances (100, before any reserved or admin sessions) — and it fails *everywhere at once*, not gracefully. Always size against **max autoscale × pool**, plus the deploy-overlap doubling, never average instance count.
+
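The multiplication in this warning is easy to check by hand or in a few lines. The sketch below uses the warning's own figures (pool of 20, 100-connection database); the function names are hypothetical, the usable budget of 77 is an assumed example, and treating a rolling deploy as a full doubling of live instances is a worst-case assumption.

```python
def connections_at(instances: int, pool: int, deploy_overlap: bool = False) -> int:
    """Connections the app fleet can open; a rolling deploy briefly runs old and new instances together."""
    live = instances * 2 if deploy_overlap else instances
    return live * pool


def max_safe_instances(pool: int, usable_budget: int, deploy_overlap: bool = True) -> int:
    """Highest instance count whose pools still fit inside the usable budget."""
    per_instance = pool * (2 if deploy_overlap else 1)
    return usable_budget // per_instance


steady = [connections_at(n, pool=20) for n in (3, 5)]        # [60, 100]
during_deploy = connections_at(5, pool=20, deploy_overlap=True)  # 200
```

Against an assumed usable budget of 77, a pool of 20 supports only 3 instances in steady state and a single instance once deploy overlap is counted, which is why the pool size, not the instance count, is usually the number to shrink.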
+ > [!WARNING]
+ > A bigger pool is frequently *slower*, not faster. Past the DB's effective core count, added connections don't run in parallel — they queue inside the database and add context-switching overhead, raising p99 latency while throughput stays flat. If the pool is large and the DB is CPU-bound, the fix for latency is usually to *shrink* the pool.
+
+ > [!NOTE]
+ > Transaction-mode poolers (PgBouncer) break features that hold state across statements on one connection: session-level `SET`, advisory locks, `LISTEN/NOTIFY`, and some prepared-statement modes. Use session mode (or a dedicated direct connection) for those paths, and run migrations against the DB directly, not through the transaction pooler.
+
+ ## Output
+
+ A pool-sizing recommendation, concretely:
+ - **The math** — usable budget (`max_connections − reserved − headroom`), and `instances_at_max_autoscale × per_instance_pool + other_sources` shown to land under it with the headroom stated.
+ - **Recommended per-instance pool size** with the rationale (concurrency need vs. DB core count, and which workload type it is), plus separate sizes for worker/cron pools.
+ - **Timeout/lifetime settings** — acquire timeout, idle timeout, max lifetime, and min/idle floor, with the value and why each is set.
+ - **Serverless recommendation if applicable** — the specific pooler (PgBouncer transaction mode / RDS Proxy / serverless driver), the per-function pool size behind it, and any session-mode caveats for stateful paths.
@@ -0,0 +1,42 @@
1
+ ---
2
+ name: "dependency-upgrade-planner"
3
+ description: "Plan and de-risk a major dependency, framework, or runtime upgrade — map the full version path, read every intermediate migration guide, and pin the breaking changes to your actual call sites instead of bumping the number and hoping. Use when a key dependency is several majors behind, when a security advisory forces an upgrade, or before a framework migration."
4
+ allowed-tools: "Read, Grep, Glob, Bash"
5
+ version: 1.0.0
6
+ ---
7
+
8
+ Turn "bump the version and hope" into a sequenced, evidence-backed upgrade plan. The skill establishes the exact current → target version gap, reads the CHANGELOG and migration guide for **every** major in between, then greps the codebase for the dependency's imported symbols so the breaking-change list is narrowed to the call sites that actually exist here. It checks the target's peer-dependency and runtime requirements, orders the work (codemods first, one major at a time for big jumps, behind tests), and writes down a rollback before anything is touched.
9
+
10
+ ## When to use this skill
11
+
12
+ - A key dependency, framework, or runtime is several majors behind and you need a path forward, not a single `npm install pkg@latest`.
13
+ - A security advisory (CVE, `npm audit`, Dependabot) forces an upgrade and you need to know the blast radius before merging.
14
+ - You are scoping a framework or runtime migration (React, Next.js, Django, Rails, Node, Python) and want to know what breaks before committing the sprint.
15
+
16
+ > [!WARNING]
17
+ > Jumping several majors in one `install` hides which version broke what. Breaking changes compound: v3's removal of an API plus v4's renamed option plus v5's changed default land as one undebuggable wall of errors. For a gap of two or more majors, upgrade **one major at a time**, landing each behind a green build/test run, so every failure maps to exactly one version's changes.
18
+
19
+ ## Instructions
20
+
21
+ 1. **Pin the exact current and target versions.** Read the lockfile (`package-lock.json`/`pnpm-lock.yaml`/`yarn.lock`, `poetry.lock`, `go.sum`, `Cargo.lock`) for the version actually installed — not the loose range in the manifest, which lies about what resolved. Confirm the target: `npm view <pkg> versions --json`, `pip index versions <pkg>`, `go list -m -versions <mod>`, or the registry page. Record the full hop list, e.g. `4.2.1 → 5.x → 6.x → 7.0.3`.
22
+ 2. **Read the migration guide for every major in between — don't skip the intermediate notes.** A jump from v4 to v7 means reading the v5, v6, **and** v7 breaking-change sections, not just v7's. Pull the CHANGELOG / UPGRADING / migration doc (`gh release view`, the repo's `CHANGELOG.md`, the docs site) and extract every entry under "Breaking", "Removed", "Renamed", "Default changed", and "Deprecated → removed".
23
+ 3. **Inventory your actual usage so you only care about breaks that hit you.** Grep the codebase for the dependency's imported symbols and entry points — `grep -rIn "from 'pkg'" `, `grep -rIn "require('pkg')"`, `import pkg`, the specific class/function/option names called out in the breaking-change list. A breaking change to an API you never call is noise; a one-line default change to a function on 40 call sites is the real work. Map each relevant breaking change to its call sites.
24
+ 4. **Check transitive/peer-dep and runtime requirements of the target.** The target may demand a newer peer (`react@>=19`, a `@types/*` bump) or a higher minimum runtime (Node, Python, Go, the language edition). Run `npm info <pkg>@<target> peerDependencies engines` (or read `requires-python` / `go.mod` `go` directive / `rust-version`). Cross-check against your other dependencies' peer ranges and your CI/Dockerfile/`.nvmrc`/`engines` runtime — a conflict here blocks the install before any code change.
25
+ 5. **Sequence the work: codemods → one major at a time → behind tests.** Run the official codemod first if one exists (`npx <pkg>-codemod`, `npx @next/codemod`, framework migration CLIs) — they do the mechanical renames so you review semantics, not churn. For multi-major gaps, do one major per commit/PR; for each step, apply the codemod, hand-fix the mapped call sites, then run the **real** build and test commands as a checkpoint before the next hop.
26
+ 6. **Write the rollback before touching anything.** Commit the current lockfile, branch the work, and record the revert: restore the pinned versions in the manifest **and** the lockfile (a manifest-only revert re-resolves to something new), then reinstall from the lockfile (`npm ci`, `pnpm install --frozen-lockfile`, `poetry install`). For a forced security upgrade with no safe target yet, note the interim mitigation (override/resolution pin, patch backport) as the fallback.
27
+
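The hop list from step 1 can be derived mechanically once the published versions are known. The following is a minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings with no pre-release tags; `parse_version` and `hop_list` are hypothetical helper names, and the version list would come from a command such as `npm view <pkg> versions --json`.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH'. Pre-release/build tags are not handled (assumption)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def hop_list(current: str, target: str, published: list[str]) -> list[str]:
    """Return current -> latest release of each intermediate major -> target."""
    cur_major = parse_version(current)[0]
    tgt_major = parse_version(target)[0]
    latest_per_major: dict[int, str] = {}
    for version in published:
        parsed = parse_version(version)
        major = parsed[0]
        # Only majors strictly between the current and target majors are hops.
        if not (cur_major < major < tgt_major):
            continue
        best = latest_per_major.get(major)
        if best is None or parsed > parse_version(best):
            latest_per_major[major] = version
    middle = [latest_per_major[m] for m in sorted(latest_per_major)]
    return [current, *middle, target]


if __name__ == "__main__":
    published = ["4.2.1", "5.0.0", "5.18.0", "6.1.0", "6.4.2", "7.0.0", "7.0.3"]
    print(" → ".join(hop_list("4.2.1", "7.0.3", published)))
```

Stopping at the latest release of each intermediate major is a design choice, not a rule from any registry: it usually means the deprecation warnings for the next major are present before the next hop.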
28
+ > [!WARNING]
29
+ > Peer-dependency conflicts and a bumped minimum runtime are the upgrades that silently break the build, not the API renames you can see in a diff. Depending on the package manager and its settings, a peer conflict may surface only as a warning or may abort the install (npm 7+ fails with `ERESOLVE` by default; pnpm fails when `strict-peer-dependencies` is enabled), and a target that requires Node 22 will install fine locally then explode in CI on Node 20. Verify both **before** writing code, in step 4.
30
+
31
+ > [!NOTE]
32
+ > Land the upgrade on its own branch with one commit per major hop and the codemod output as a separate commit from your hand-fixes. If a regression only shows up in CI or staging, granular history makes `git revert` of a single version trivial instead of unpicking a tangled bump.
33
+
34
+ ## Output
35
+
36
+ A concrete upgrade plan, reproducible from the evidence gathered:
37
+
38
+ - **Version path** — the exact hop list from the lockfile to the target (`4.2.1 → 5.18.0 → 6.4.2 → 7.0.3`), one line per major.
39
+ - **Breaking changes that affect THIS codebase** — a table of `change → version → call sites`, with the file:line locations grep found; changes that touch no call site are explicitly listed as not-applicable so the reader trusts the filter.
40
+ - **Peer-dep & runtime gate** — required peer ranges and minimum runtime of the target vs. what the repo and CI currently pin, with conflicts flagged as blockers.
41
+ - **Steps in order** — codemod commands first, then per-major manual fixes, each with its test/build checkpoint command.
42
+ - **Rollback plan** — the exact manifest + lockfile revert and reinstall command, plus any interim mitigation for a forced upgrade.
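The `change → version → call sites` table can be seeded with a small scan that maps each symbol from the breaking-change list to `file:line` locations. This is a rough sketch, assuming a whole-word text match is good enough for a first pass; `find_call_sites` and the default suffix list are hypothetical, and a symbol with an empty list is a candidate for the not-applicable column.

```python
import re
from pathlib import Path


def find_call_sites(root, symbols, suffixes=(".js", ".ts")):
    """Map each symbol to the 'relative/path:line' locations where it appears."""
    hits = {symbol: [] for symbol in symbols}
    if not symbols:
        return hits
    # One whole-word pattern covering every symbol from the breaking-change list.
    pattern = re.compile(r"\b(" + "|".join(re.escape(s) for s in symbols) + r")\b")
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                location = f"{path.relative_to(root_path).as_posix()}:{lineno}"
                hits[match.group(1)].append(location)
    return hits
```

A text match over-reports (comments, unrelated identifiers with the same name), so each hit still needs a human read before it goes into the plan.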
@@ -0,0 +1,35 @@
1
+ ---
2
+ name: "memory-leak-hunter"
3
+ description: "Find and fix a memory leak in a running app: confirm it's a real leak under steady load, diff two heap snapshots to name the growing object and its retention path, cut the root reference that blocks collection, and re-run to confirm memory plateaus. Use when RSS climbs until OOM/restart, heap grows unbounded across a steady workload, or GC pauses worsen the longer the process runs."
4
+ allowed-tools: "Read, Grep, Glob, Bash"
5
+ version: 1.0.0
6
+ ---
7
+
8
+ A process whose memory only goes up will eventually OOM, get killed, or grind to a halt in GC — but "memory went up" is not the same as "there is a leak." A warming cache, a JIT, a connection pool filling, and a steadily growing legitimate working set all climb too. This skill refuses to guess: it first *confirms* the leak against a steady workload, then *locates* it with a heap diff rather than a single snapshot, traces the *retention path* to the one reference that blocks collection, fixes that root, and re-runs to prove the curve flattens.
9
+
10
+ ## When to use this skill
11
+
12
+ - RSS climbs monotonically until the process OOMs, gets OOM-killed, or hits a scheduled restart that "fixes" it for a while.
13
+ - Heap usage trends up across a steady, repeating workload and never returns to baseline after a GC.
14
+ - GC pauses (or full-GC frequency) get worse the longer the process stays up — a classic sign the live set is growing.
15
+ - A load test or soak test shows memory that doesn't plateau even after the request rate is constant.
16
+ - After a deploy, memory behavior changed and you need to know whether it's a real leak or a bigger-but-bounded cache.
17
+
18
+ ## Instructions
19
+
20
+ 1. **Confirm it's a leak before hunting one.** Drive a *steady, repeating* workload (constant request rate or a fixed loop) and record memory over time — RSS and heap-used at, say, 30s intervals. Force a GC between samples where you can (`global.gc()` with `--expose-gc` in Node, `System.gc()`/`jcmd <pid> GC.run` on the JVM, `gc.collect()` in Python). A leak is memory that trends **up** under constant load and **does not recover** after GC. Memory that rises during warmup and then *plateaus*, or that drops back after GC, is not a leak — stop here and look at cache sizing or normal working set instead.
21
+ 2. **Capture two heap snapshots under load, spaced apart.** Take snapshot A once warmup has settled, keep the same workload running, then take snapshot B after memory has visibly grown (Node: `--inspect` + DevTools/`heapdump`/`v8.writeHeapSnapshot()`; JVM: `jmap -dump:live,format=b,file=… <pid>` or a JFR `OldObjectSample`; Python: `tracemalloc.take_snapshot()` ×2, or `objgraph`/`guppy`). One snapshot tells you what's big *now*, which is useless — you need both ends of the growth.
22
+ 3. **Diff the two snapshots: read what GREW, not what's biggest.** Use the comparison view (DevTools "Comparison" between A and B, `snapshot_b.compare_to(snapshot_a, 'lineno')` on `tracemalloc` snapshots, MAT's dominator/histogram delta). Sort by *delta in retained size and object count*. The leak is the object type whose instance count and retained size climb monotonically across the diff and never get freed, not necessarily the single largest object, which is often a legitimately big-but-stable buffer.
23
+ 4. **Trace the retention path to the root that blocks collection.** For the growing object, follow the *retainers / paths-to-GC-root* (DevTools "Retainers", MAT "Path to GC Roots: exclude weak/soft"). The fix lives at the *root* end of that chain — the live reference that keeps the whole subtree alive. Match it to the usual suspects: an unbounded cache/`Map`/dict keyed by something ever-growing (request id, user id); an event listener / observable / pub-sub subscription added but never removed; a closure captured by a long-lived callback that drags a large scope with it; a `setInterval`/timer/scheduled task never cleared; a module-level array/list that's only ever appended to; or — in native or manual-memory code — an allocation with no matching free (check with `valgrind --leak-check=full` / ASan / a heap profiler).
24
+ 5. **Fix by bounding the lifetime at the root.** Don't trim symptoms — cut the retaining reference: put a size cap and eviction (LRU) or TTL on the cache; `removeEventListener` / `unsubscribe` / `dispose` in the matching teardown; `clearInterval`/`clearTimeout` and cancel scheduled work on shutdown/unmount; replace a cache keyed by short-lived objects with a `WeakMap`/`WeakRef` so entries are collectible; bound or drain the module-level collection; add the missing `free`/`delete`/`close`. Prefer the change that makes the lifetime *correct* over one that just makes the leak slower.
25
+ 6. **Re-run the same workload and confirm a plateau.** Repeat step 1's steady workload with the fix in place and capture the same memory-over-time trace. The fix is verified only when memory rises during warmup and then *flattens* (and recovers after GC) across a window long enough to have leaked before. If it still trends up, the diff pointed at one of several retainers — go back to step 3 and trace the next-largest grower.
26
+
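For the most common root in step 4, an unbounded cache, the step 5 fix of "size cap and eviction (LRU)" can be sketched as below. This is an illustrative minimal version, not a drop-in replacement for any particular cache; `BoundedCache` is a hypothetical name, and it has no TTL and no thread safety.

```python
from collections import OrderedDict


class BoundedCache:
    """A dict-like cache with a hard entry cap and least-recently-used eviction."""

    def __init__(self, max_entries: int):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict from the least-recently-used end until the cap holds again.
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)
```

The cap bounds the lifetime at the root: however many distinct keys the workload produces, the cache can retain at most `max_entries` values.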
27
+ > [!WARNING]
28
+ > A single heap snapshot proves nothing about a leak — every running process holds a lot of live memory legitimately. Only the **diff of two snapshots under sustained load** distinguishes "growing and never freed" from "big but stable." Never conclude a leak (or a fix) from one snapshot or one memory number.
29
+
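In Python, the two-snapshot diff can be sketched with the standard-library `tracemalloc` module. The example assumes a deliberately leaky, hypothetical `handle_request` that appends to a module-level list; `top_growers` is also a made-up helper name, and the warmup and iteration counts are arbitrary.

```python
import gc
import tracemalloc

_registry = []  # hypothetical module-level list that is only ever appended to


def handle_request(i):
    """Stand-in workload: each call retains 1 KiB that is never released."""
    _registry.append(bytearray(1024))


def top_growers(workload, warmup=200, iterations=2000, limit=3):
    """Run a steady workload and return the allocation sites that grew most."""
    tracemalloc.start()
    try:
        for i in range(warmup):
            workload(i)
        gc.collect()
        snapshot_a = tracemalloc.take_snapshot()  # after warmup has settled
        for i in range(iterations):
            workload(i)
        gc.collect()
        snapshot_b = tracemalloc.take_snapshot()  # after memory has grown
    finally:
        tracemalloc.stop()
    # compare_to returns StatisticDiff entries, largest absolute change first.
    return snapshot_b.compare_to(snapshot_a, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growers(handle_request):
        print(stat.traceback, stat.size_diff, stat.count_diff)
```

`tracemalloc` reports where the growing memory was *allocated*, not what retains it, so it covers step 3 only; the retention path from step 4 still needs a tool such as `objgraph`.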
30
+ > [!NOTE]
31
+ > "Memory went up" during warmup, JIT, or cache fill is expected, not a leak — a leak is unbounded growth that never plateaus under *constant* load. Before touching code, confirm the curve never flattens and never recovers after a forced GC; otherwise you'll "fix" a cache that was working as designed and make the app slower.
32
+
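The "never flattens under constant load" check can be made less subjective with a simple trend test over post-GC samples. The sketch below is a heuristic under stated assumptions: the first 30% of samples are treated as warmup, and fitted growth above 5% of the steady-state mean counts as a leak signal. Both thresholds and the name `looks_like_leak` are illustrative, not taken from the skill.

```python
def looks_like_leak(samples, warmup_fraction=0.3, tolerance=0.05):
    """Return True if post-warmup memory samples keep trending upward.

    samples: memory readings (e.g. heap-used after a forced GC) taken at
    equal intervals under a constant workload.
    """
    steady = samples[int(len(samples) * warmup_fraction):]
    n = len(steady)
    if n < 3:
        raise ValueError("need at least 3 post-warmup samples")
    mean_x = (n - 1) / 2
    mean_y = sum(steady) / n
    # Ordinary least-squares slope of memory against sample index.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(steady))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    fitted_growth = slope * (n - 1)  # growth across the whole steady window
    return fitted_growth > tolerance * mean_y
```

A `False` result only means no growth was visible in this window; a slow leak needs a longer soak before the slope clears the tolerance.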
33
+ ## Output
34
+
35
+ A short report with four parts: (1) the **confirmation evidence** — the memory-over-time trace under steady load showing growth that doesn't recover after GC; (2) the **leaking object and retention path** from the heap diff (type, delta count/retained size, and the path-to-GC-root naming the retaining root); (3) the **root-cause fix** as a concrete diff at that root (eviction/TTL, unsubscribe, cleared timer, weak reference, or missing free); and (4) the **post-fix plateau** — the same workload's memory trace now flattening — or a note that another retainer remains and which one to chase next.
@@ -0,0 +1,51 @@
1
+ ---
2
+ name: "pagination-designer"
3
+ description: "Design correct, scalable pagination (plus the filtering and sorting that ride with it) for a list endpoint — pick cursor (keyset) vs offset and justify it, define an opaque cursor with a unique tiebreaker so no row is skipped or repeated, return a consistent envelope, bound page size, and name the indexes the sort actually needs. Use when adding a list endpoint, when OFFSET pagination crawls on a large table, or when clients see duplicate or missing rows while paging."
4
+ allowed-tools: "Read, Grep, Glob"
5
+ version: 1.0.0
6
+ ---
7
+
8
+ Pagination looks trivial until the table grows or the data moves under the reader. `OFFSET 100000` doesn't skip to row 100,000 — the database scans and throws away the first 100,000 matching rows on every request, so latency climbs linearly with depth. And sorting by a non-unique column (`created_at`, `name`, `score`) without a tiebreaker gives a *partial* order: rows that tie can reorder between requests, so paging skips some and shows others twice. This skill makes the pagination scheme an explicit decision — keyset vs offset, the cursor encoding, the tiebreaker, the page-size bounds, and the indexes — and defines how filtering and sorting compose with it.
9
+
10
+ ## When to use this skill
11
+
12
+ - You're adding a list/collection endpoint and need to decide how clients page through it.
13
+ - An existing `OFFSET`/`LIMIT` endpoint is fast on page 1 and slow on page 500, or it times out on deep pages.
14
+ - Clients report seeing the same row twice or missing rows entirely while scrolling — the classic symptom of an unstable sort under concurrent inserts/deletes.
15
+ - The list is large, append-heavy, or actively changing (feeds, logs, events, search results) and you need stable paging that doesn't drift as rows are added.
16
+
17
+ ## Instructions
18
+
19
+ 1. **Choose cursor (keyset) vs offset from the dataset, and justify it.**
20
+ - **Cursor / keyset** — the default for large or actively-changing data. Instead of `OFFSET`, the next page *seeks* on the sort key: `WHERE (created_at, id) < (:last_created_at, :last_id) ORDER BY created_at DESC, id DESC LIMIT :n`. It's stable under inserts/deletes (each page is anchored to a real row, not a positional count) and stays fast at any depth because it uses an index range scan instead of scanning prior rows. Cost: no random page jumps, no total page count.
21
+ - **Offset / limit** — acceptable only for **small, stable, human-paginated** lists where users click numbered pages (an admin table of a few thousand rows). It allows arbitrary jumps and easy "page 7 of 20" UIs. Never use it for infinite scroll, large tables, or feeds.
22
+ State which you chose and the property (depth performance + stability vs random access) that drove it.
23
+
24
+ 2. **Always include a unique tiebreaker so the sort order is total.** A cursor seeking on a non-unique column alone (`created_at`) can't disambiguate ties: two rows with the same timestamp have no defined relative order, so one can land on both sides of a page boundary. Encode the user-facing sort key **plus a unique tiebreaker** (typically the primary key), so the cursor compares on the tuple `(created_at, id)`. The tiebreaker only has to be unique and immutable; it does not need to be monotonic. This makes the order total: every row has exactly one position, so no row is skipped or repeated. A sort by `id` alone is already total because `id` is unique, but every other user-chosen sort needs the explicit `, id` tiebreaker appended.
25
+
26
+ 3. **Make the cursor opaque.** Encode the tuple `(sort_key_value, tiebreaker_value)` (and, if filters/sort are part of the page identity, a version or the sort direction) into a single base64url token — `next_cursor: "eyJjcmVhdGVkX2F0IjoiMjAyNi0wNi0xN1QwOTozMDowMFoiLCJpZCI6IjQ4ODEyIn0"`. Opaque means clients treat it as a blob and pass it back verbatim; you keep freedom to change the internal encoding without breaking them. Do **not** expose raw `(timestamp, id)` as query params — clients will hand-craft them, couple to your schema, and break on the next change.
27
+
28
+ 4. **Return one consistent envelope.** Every list endpoint returns the same shape:
29
+ ```json
30
+ { "data": [ ... ], "next_cursor": "…", "has_more": true }
31
+ ```
32
+ `next_cursor` is `null` when there are no more rows. Derive `has_more` reliably by fetching `LIMIT n + 1`: if you get `n + 1` rows, there's another page — drop the extra row and set `next_cursor` from the last *kept* row. This avoids a separate `COUNT` and is correct even when the last page is exactly full. Do not return a total count for keyset pagination; computing it scans the whole filtered set and defeats the point.
33
+
34
+ 5. **Bound page size with a sane default and a hard max.** Read the page size from `limit` (or `page_size`), clamp it: default 20–50, hard max 100–200 — never unbounded. An unbounded `limit` lets one client request a million rows and OOM the server or exhaust the DB. Clamp silently (return `min(requested, max)`) and document the cap.
35
+
36
+ 6. **Name the indexes the sort actually needs; this is non-negotiable for keyset.** The `ORDER BY sort_key, tiebreaker` and the `WHERE (sort_key, tiebreaker) < (...)` seek are only fast if a **composite index on those exact columns in that exact order** exists. Most engines can scan a B-tree index backward, so an index on `(created_at, id)` serves both `created_at ASC, id ASC` and `created_at DESC, id DESC`; per-column direction only has to be declared in the index when the sort mixes directions (`created_at DESC, id ASC`). A plain index on `created_at` alone can force a sort and undo the win. If filters narrow the set, the index should lead with the equality-filter columns, then the sort columns: `(tenant_id, created_at, id)` for a query filtered by tenant and sorted by time. Verify the index exists or flag it as required.
37
+
38
+ 7. **Define how filtering and sorting compose with the cursor.** The cursor is only valid *for the filter and sort it was issued under*: a cursor minted for `?status=active&sort=created_at` is meaningless if the next request changes `status` or `sort`. Specify the contract: which fields are filterable, which are sortable (whitelist them, and never interpolate a client-supplied column name into `ORDER BY`), and that **changing any filter or sort param invalidates the cursor and resets to the first page**. For multi-column sorts, the tiebreaker is appended after *all* user sort columns, and the seek predicate must compare the full tuple: prefer the row-value comparison `(a, b, c) < (:a, :b, :c)` over the expanded `a < :a OR (a = :a AND b < :b) OR …` unless your engine lacks tuple comparison. Row-value comparison orders every column the same way, so a sort that mixes `ASC` and `DESC` columns needs the expanded form regardless.
39
+
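Steps 3 and 5 can be sketched together: an opaque base64url cursor over the `(created_at, id)` tuple, and a clamp on the requested page size. This is a minimal illustration, assuming a JSON payload with the two keys from the example token above; `encode_cursor`, `decode_cursor`, `clamp_limit`, and the `DEFAULT_LIMIT`/`MAX_LIMIT` values are hypothetical choices within the ranges the steps suggest.

```python
import base64
import json

DEFAULT_LIMIT = 50  # assumed default, within the 20-50 range
MAX_LIMIT = 200     # assumed hard max, within the 100-200 range


def encode_cursor(created_at: str, row_id: str) -> str:
    """Pack the (sort key, tiebreaker) tuple into an unpadded base64url token."""
    payload = json.dumps({"created_at": created_at, "id": row_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).rstrip(b"=").decode("ascii")


def decode_cursor(token: str) -> tuple[str, str]:
    """Unpack a token; any malformed input is reported as one ValueError."""
    padded = token + "=" * (-len(token) % 4)  # restore the stripped padding
    try:
        payload = json.loads(base64.urlsafe_b64decode(padded))
        return payload["created_at"], payload["id"]
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("invalid cursor") from exc


def clamp_limit(requested) -> int:
    """Apply the default when no limit is given, then clamp to [1, MAX_LIMIT]."""
    if requested is None:
        return DEFAULT_LIMIT
    return max(1, min(requested, MAX_LIMIT))
```

Base64 is an encoding, not a secret: a client can still decode the token. If tampering matters, the payload would additionally need a signature or encryption, which this sketch leaves out.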
40
+ > [!WARNING]
41
+ > Deep `OFFSET` is O(n), not O(1). `OFFSET 100000 LIMIT 20` makes the database read and discard 100,000 matching rows before returning 20 — every request, getting worse as users page deeper, holding locks and burning IO the whole time. Page 1 being fast tells you nothing about page 5,000. If the table can grow large or users can reach deep pages, use keyset.
42
+
43
+ > [!WARNING]
44
+ > A non-unique sort key without a tiebreaker silently corrupts paging. With `ORDER BY created_at` and several rows sharing a timestamp, the engine may return those tied rows in a different order on the next request — so a row sitting on the page boundary gets skipped on one page and the previous boundary row reappears on the next. There is no error, just missing and duplicated data. Always append a unique tiebreaker (`, id`) to every sort.
45
+
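To make the tiebreaker and the `LIMIT n + 1` rule concrete, here is a small self-contained sketch using the standard-library `sqlite3` module. It assumes a hypothetical `events` table and an SQLite build with row-value support (3.15 or later); for brevity the cursor is returned as a raw tuple, where a real endpoint would wrap it in an opaque token.

```python
import sqlite3


def fetch_page(conn, limit, after=None):
    """Return one keyset page ordered by (created_at DESC, id DESC)."""
    where, params = "", []
    if after is not None:
        # Seek strictly past the last row of the previous page.
        where = "WHERE (created_at, id) < (?, ?)"
        params = list(after)
    rows = conn.execute(
        f"SELECT created_at, id FROM events {where} "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (*params, limit + 1),  # one extra row to detect another page
    ).fetchall()
    has_more = len(rows) > limit
    kept = rows[:limit]
    next_cursor = kept[-1] if has_more else None
    return {"data": kept, "next_cursor": next_cursor, "has_more": has_more}


def page_through_everything(limit=2):
    """Walk all pages of a table where four rows share one timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
    conn.execute("CREATE INDEX events_created_id ON events (created_at, id)")
    tied = "2026-01-01T00:00:00Z"
    conn.executemany(
        "INSERT INTO events (id, created_at) VALUES (?, ?)",
        [(1, tied), (2, tied), (3, tied), (4, tied), (5, "2026-01-02T00:00:00Z")],
    )
    seen, after = [], None
    while True:
        page = fetch_page(conn, limit, after)
        seen.extend(row_id for _, row_id in page["data"])
        if not page["has_more"]:
            return seen
        after = page["next_cursor"]
```

Because the seek compares the full `(created_at, id)` tuple, the four rows that share a timestamp are each visited exactly once even though page boundaries fall between them.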
46
+ > [!NOTE]
47
+ > Offset and keyset can coexist behind one envelope: serve numbered offset pages for a small admin UI and keyset for the public feed, both returning `{ data, next_cursor, has_more }`. Offset endpoints additionally accept a `page` parameter and leave `next_cursor` as `null`. Pick per endpoint from its access pattern, not one rule for the whole API.
48
+
49
+ ## Output
50
+
51
+ A pagination spec stating: the chosen **scheme** (cursor vs offset) + rationale; the **response envelope** (`data` / `next_cursor` / `has_more`, with the `null`-when-done and `LIMIT n+1` rules); the **cursor encoding** — the exact tuple `(sort key, unique tiebreaker)` and that it's base64url-opaque; the **page-size** default and hard max; the **required indexes** (exact columns, order, and direction, leading with equality-filter columns); and the **filter/sort contract** — the filterable/sortable field whitelist, the tuple seek predicate, and that changing any filter or sort param invalidates the cursor.
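One possible way to enforce the filter/sort contract is sketched below: sort fields go through a whitelist before reaching `ORDER BY`, and a short fingerprint of the filters and sort is stored alongside the cursor so a mismatched request can be rejected. The field names, the fingerprint scheme, and every function name here are assumptions made for illustration; the spec only requires that a changed filter or sort invalidates the cursor.

```python
import hashlib
import json

SORTABLE = {"created_at": "created_at", "name": "name"}  # hypothetical whitelist
FILTERABLE = {"status", "tenant_id"}                     # hypothetical whitelist


def order_by_clause(sort_field: str, descending: bool = True) -> str:
    """Build ORDER BY from whitelisted columns only, with the id tiebreaker."""
    if sort_field not in SORTABLE:
        raise ValueError(f"unsupported sort field: {sort_field!r}")
    direction = "DESC" if descending else "ASC"
    return f"ORDER BY {SORTABLE[sort_field]} {direction}, id {direction}"


def query_fingerprint(filters: dict, sort_field: str, descending: bool) -> str:
    """Stable digest of the filter + sort a cursor was issued under."""
    unknown = set(filters) - FILTERABLE
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    canonical = json.dumps(
        {"filters": filters, "sort": sort_field, "desc": descending}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cursor_matches_query(cursor_fingerprint, filters, sort_field, descending) -> bool:
    """True only if the cursor was minted for exactly this filter and sort."""
    return cursor_fingerprint == query_fingerprint(filters, sort_field, descending)
```

When `cursor_matches_query` returns `False`, the endpoint can either reject the request or silently restart from the first page; the spec should state which.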