npm - agentscamp - Versions diffs - 0.1.0 → 0.2.0 - Mend

agentscamp 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24) hide show

package/content/commands/create-skill.md +84 -0
package/content/commands/create-slash-command.md +80 -0
package/content/commands/create-subagent.md +101 -0
package/content/commands/estimate-effort.md +96 -0
package/content/commands/find-n-plus-one.md +67 -0
package/content/commands/flaky-test-hunt.md +112 -0
package/content/commands/git-bisect.md +126 -0
package/content/commands/rename-symbol.md +131 -0
package/content/commands/resolve-conflict.md +127 -0
package/content/commands/scaffold-rag-pipeline.md +76 -0
package/content/commands/trace-data-flow.md +102 -0
package/content/manifest.json +320 -3
package/content/skills/agent-memory-designer.md +42 -0
package/content/skills/auth-flow-reviewer.md +73 -0
package/content/skills/extract-module.md +50 -0
package/content/skills/migration-writer.md +41 -0
package/content/skills/prompt-regression-tester.md +89 -0
package/content/skills/rate-limiter-designer.md +56 -0
package/content/skills/react-render-profiler.md +37 -0
package/content/skills/semver-advisor.md +81 -0
package/content/skills/type-coverage-improver.md +81 -0
package/content/skills/version-bumper.md +91 -0
package/content/skills/webhook-handler-scaffolder.md +37 -0
package/package.json +1 -1

package/content/skills/migration-writer.md ADDED Viewed

@@ -0,0 +1,41 @@
+---
+name: "migration-writer"
+description: "Write a safe, reversible, zero-downtime database migration using expand-contract — add the new shape, backfill in batches, switch reads/writes, then drop the old — so every deploy stays compatible with the running app version. Use when adding or changing schema on a live system, renaming/dropping a column, adding NOT NULL or a foreign key on a large table, or when a migration risks locks, table rewrites, or an unrevertable step."
+allowed-tools: "Read, Grep, Glob, Edit"
+version: 1.0.0
+---
+Most schema migrations break production not because the SQL is wrong but because they assume the database and the app flip over in one atomic instant. During a rolling deploy, old and new code run **at the same time** against **one schema** — so a migration that the new code needs will crash the old code, and a rollback that the old code needs is gone the moment you `DROP`. This skill writes migrations the **expand-contract** way: each step is independently deployable against the version before and after it, every change has a real `down`, and no step takes a lock that blocks writes on a hot table.
+## When to use this skill
+- Adding, renaming, dropping, or retyping a column on a table that a live app reads/writes.
+- Adding `NOT NULL`, a `CHECK`, a foreign key, or a unique constraint to a table with existing rows.
+- Creating an index on a large/busy table, or backfilling a new column across millions of rows.
+- Splitting/merging tables, moving a column, or any change where old and new app code must coexist during the deploy.
+## Instructions
+1. **Decide the expand-contract phases first, before writing SQL.** A column rename `a → b` is not one migration; it is: (1) add `b` nullable, (2) dual-write `a` and `b` in app code, (3) backfill `b` from `a`, (4) switch reads to `b`, (5) stop writing `a`, (6) drop `a`. Each phase ships and is safe to roll back to the phase before it. Name the phases explicitly in the output, mapped to app deploys.
+2. **Make additive changes nullable / without a default rewrite.** `ADD COLUMN ... NULL` is instant. Adding a column with a non-constant default (or, on old engines, any default) rewrites the table under a lock — split it into add-nullable, then backfill, then set default for future rows.
+3. **Add `NOT NULL` and `CHECK` without a blocking scan.** On Postgres: `ADD CONSTRAINT ... CHECK (...) NOT VALID`, then `VALIDATE CONSTRAINT` (takes only a `SHARE UPDATE EXCLUSIVE` lock, doesn't block writes). For `NOT NULL`, add the validated `CHECK (col IS NOT NULL)` first, then promote — never `SET NOT NULL` cold on a big table, which full-scans under an `ACCESS EXCLUSIVE` lock.
+4. **Build indexes and FKs concurrently / unvalidated.** `CREATE INDEX CONCURRENTLY` (and `DROP INDEX CONCURRENTLY`) so writes keep flowing; add foreign keys as `NOT VALID` then `VALIDATE CONSTRAINT` in a second step. Concurrent index builds run outside a transaction — keep them in their own migration with no other statements.
+5. **Backfill in bounded batches, never one transaction.** Update in chunks (e.g. `WHERE id BETWEEN ...` or `LIMIT n` loops) committing each batch, with a short sleep between batches to spare replication and locks. Keep the backfill in a **separate migration/job** from the schema DDL so a slow backfill can't hold a DDL lock and a failed batch doesn't roll back the whole table.
+6. **Write a real `down` for every `up`.** The down must actually reverse the change (drop the added column/index/constraint), or, where reversal loses data (a dropped column, a narrowed type), say so loudly and add an export/backup step to the up rather than pretending it's reversible.
+7. **State the deploy ordering contract.** For each migration, note which app version it requires and which it must remain compatible with: backward-compatible (expand) migrations run **before** the code that needs them; destructive (contract) migrations run **after** all code that used the old shape is fully rolled out and confirmed.
+> [!WARNING]
+> A single-transaction backfill (`UPDATE big_table SET ...` with no batching) holds row locks on every touched row until commit, bloats WAL, can deadlock with live traffic, and on failure rolls back hours of work. Always batch and commit; treat any unbounded `UPDATE`/`DELETE` on a large table as a production incident waiting to happen.
+> [!WARNING]
+> Type changes that rewrite the table (`ALTER COLUMN ... TYPE` between incompatible types, e.g. `int → bigint` on older Postgres) take an `ACCESS EXCLUSIVE` lock and block all reads and writes for the duration. Prefer expand-contract: add a new column of the target type, backfill, switch over, drop the old — never an in-place rewrite on a hot table.
+> [!NOTE]
+> Don't take `ACCESS EXCLUSIVE` DDL with `lock_timeout = 0`. Set a short `lock_timeout` (e.g. `5s`) so a migration that can't grab its lock fails fast and retries, instead of queueing behind a long query and stalling every write that piles up behind it.
+## Output
+For the requested change, produce:
+- **The `up` and `down` migration** — split into separate files per expand-contract phase, with `CONCURRENTLY` / `NOT VALID` / `VALIDATE` used where they avoid blocking locks.
+- **The backfill + rollout sequence** — the ordered phases (add → dual-write → backfill → switch reads → stop old writes → drop), each tagged with the app deploy it pairs with, and the batched backfill loop as a separate step.
+- **Locking & risk notes** — for each statement: the lock it takes, whether it blocks reads/writes, whether it rewrites the table, and whether the `down` is lossless — with destructive/irreversible steps called out explicitly.

package/content/skills/prompt-regression-tester.md ADDED Viewed

@@ -0,0 +1,89 @@
+---
+name: "prompt-regression-tester"
+description: "Build a regression test harness for an LLM prompt so a prompt edit or model upgrade can't silently degrade quality — a fixed eval set, checkable assertions, and a diff against a committed baseline. Use when changing a production prompt, migrating model versions, or any time 'I tweaked the prompt' needs to be backed by evidence instead of eyeballing two outputs."
+allowed-tools: "Read, Grep, Glob, Bash, Write"
+version: 1.0.0
+---
+A prompt edit that "looks fine on a couple of examples" is the single most common way teams ship a quality regression. The fix is not heroic — it's a fixed eval set, assertions a machine can check, and a baseline you diff against. This skill builds that harness so any prompt change or model migration produces a pass/fail + regression report instead of a vibe.
+## When to use this skill
+- You're about to edit a production prompt and want proof the edit doesn't regress existing behavior.
+- You're migrating models (e.g. `claude-opus-4-7` → `claude-opus-4-8`, or onto a new provider) and need to confirm output quality holds.
+- A prompt regressed in the past and you want a committed test that would have caught it.
+- You're standing up a new prompt and want defensible coverage before it goes live.
+- Someone "improved" a prompt and you need to confirm it actually improved rather than traded one failure for another.
+## Instructions
+1. **Find the prompt and its call site first.** `Grep` for the system/user prompt text, the model ID (`claude-`, `gpt-`, `model=`, `model:`), and the SDK call (`messages.create`, `chat.completions`, etc.). Pin down exactly what's under test: the prompt string, the model ID, and every generation parameter (`effort`/`temperature`/`max_tokens`/`tools`). The harness must reproduce that call exactly — a regression test that uses different params than production tests the wrong thing.
+2. **Assemble a FIXED eval set — and freeze it.** Collect 15–40 representative inputs into a version-controlled file (`evals/cases.jsonl`, one JSON object per line: `{id, input, ...}`). Include, deliberately: the happy-path cases, the boundary cases, and — most importantly — every input that has *previously failed* (past bug reports, support tickets, the example that motivated the last prompt edit). The set is an asset; it only grows by deliberate commit, never shrinks or mutates per run.
+   > [!WARNING]
+   > An eval set that changes between runs is not a regression test — it's a fresh experiment each time, and the diff against baseline is meaningless. Do not generate inputs on the fly, sample randomly, or pull "the latest N production logs" at runtime. Commit the inputs.
+3. **Write checkable assertions per case — prefer deterministic over judged.** For each input, specify what "correct" means as machine-checkable expectations, picking the *tightest* check that fits:
+   - `exact` — output equals a string (classification labels, enum values).
+   - `contains` / `not_contains` — a required substring is present / a forbidden phrase (a leaked secret, a banned disclaimer, "As an AI") is absent.
+   - `regex` — a format pattern (an order ID shape, a date, a currency string).
+   - `json_schema` — output parses as JSON and validates against a schema (the highest-value check for structured-output prompts; catches missing fields, wrong types, extra keys).
+   - `structural` — list length, required keys present, sorted order, no duplicates.
+   Store assertions alongside each case. A single case may carry several. **Reserve an LLM-as-judge only for qualities no code can decide** — tone, helpfulness, faithfulness to a source, "did it refuse appropriately" — and even then express the judge as a rubric with a discrete verdict (`pass`/`fail` or a 1–5 score with a threshold), not a freeform "rate this."
+4. **Default to the latest, most capable model for both the system-under-test and the judge.** Use `claude-opus-4-8` (current most-capable Opus; 1M context, adaptive thinking) unless the prompt's production config pins a different model — in which case test *that* model and add `claude-opus-4-8` as a comparison column. For the judge, also default to `claude-opus-4-8`: a weak judge is a noisy judge. Use adaptive thinking (`thinking: {type: "adaptive"}`) for the judge; do not send `temperature`/`top_p`/`budget_tokens` (removed on Opus 4.7/4.8 — they 400). Keep the judge model **separate and named** in the report so a judge swap is never confused with a quality change.
+5. **Run the set across each variant under test and score every case.** A "variant" is a (prompt, model, params) tuple — typically `baseline` (current production) vs `candidate` (your edit). Iterate the frozen cases, call the real SDK with the exact production config, run each assertion, and record a per-case result: `pass`/`fail`, which assertion failed, and the raw output. Run candidate and baseline in the same invocation so they see identical inputs. Cache or store raw outputs so a re-score (e.g. after fixing an assertion) doesn't require re-generating.
+6. **Diff against the committed baseline and flag regressions.** Snapshot the baseline results to `evals/baseline.json` (committed). On each run, compare candidate-vs-baseline per case and classify:
+   - **Regression** — passed in baseline, fails now. (The line that blocks the PR.)
+   - **Fix** — failed in baseline, passes now. (The intended win — confirm it's real, not luck.)
+   - **Unchanged pass / unchanged fail** — note but don't alarm.
+   > [!WARNING]
+   > Don't treat "candidate pass-rate ≥ baseline pass-rate" as green. Aggregate rate hides swaps — a candidate can fix two cases and break two others for the same headline number. The per-case regression list is the signal; the aggregate is a summary, not the gate.
+7. **Make the baseline a deliberate, reviewed artifact.** Update `evals/baseline.json` only via an explicit "accept" step (a `--update-baseline` flag or a separate command), and commit it in the same PR as the prompt change so a reviewer sees both the new prompt and exactly which case results moved. Never let the harness silently rewrite the baseline on every run — that's how a regression gets absorbed into "the new normal" unnoticed.
+   > [!NOTE]
+   > LLM outputs vary run-to-run even at fixed settings, so a single judged case flipping isn't necessarily a real regression. For judge-scored cases, run N=3 and require a majority verdict before flagging; for deterministic assertions, one failure is one failure — no repetition needed.
+## Output
+The skill produces a committed, runnable harness:
+- **Layout** —
+  ```
+  evals/
+    cases.jsonl        # frozen inputs + per-case assertions (version-controlled)
+    baseline.json      # accepted baseline results (version-controlled)
+    run.(py|ts)        # loads cases, runs each variant, scores, diffs
+    schema/*.json      # json_schema assertions, if any
+  ```
+- **The assertion set** — each case lists its checks, e.g.
+  ```json
+  {"id": "refund-001", "input": "...", "assert": [
+    {"type": "json_schema", "schema": "schema/refund.json"},
+    {"type": "not_contains", "value": "As an AI"},
+    {"type": "judge", "rubric": "Output politely declines without admitting fault.", "threshold": "pass"}
+  ]}
+  ```
+- **A pass/fail + regression report** — printed and written to `evals/report.md`:
+  ```
+  Variant: candidate (prompt@HEAD, claude-opus-4-8)  vs  baseline (prompt@main, claude-opus-4-8)
+  Judge: claude-opus-4-8
+  42 cases | 39 pass / 3 fail   (baseline: 40 pass / 2 fail)
+  REGRESSIONS (2)  ← blocks merge
+    refund-001   json_schema: missing field "reason"
+    tone-014     judge(2/3): newly apologetic where baseline was neutral
+  FIXES (1)
+    parse-007    regex: now matches order-id format
+  Exit code: 1 (regressions present)
+  ```
+A non-zero exit on any regression makes the harness drop straight into CI. Report file paths back as absolute paths so the user can wire it up.

package/content/skills/rate-limiter-designer.md ADDED Viewed

@@ -0,0 +1,56 @@
+---
+name: "rate-limiter-designer"
+description: "Design and implement API rate limiting that actually holds under load — pick the algorithm (token bucket vs sliding-window-counter vs fixed window) and justify it, choose the limiting key and per-tier limits, use cross-instance atomic storage, and return standard 429 signals. Use when protecting an API from abuse or scrapers, enforcing per-tier quotas, or replacing an in-memory limiter that breaks behind multiple replicas."
+allowed-tools: "Read, Grep, Glob, Edit"
+version: 1.0.0
+---
+A rate limiter is only as correct as its storage and its atomicity. An in-memory counter behind three replicas enforces 3x the limit; a `GET` then `SET` without an atomic increment lets a burst of concurrent requests all read the same pre-increment value and pass. This skill makes the decisions explicit — algorithm, key, limits, storage, failure mode — and produces an implementation sketch that survives horizontal scaling and concurrency.
+## When to use this skill
+- You're protecting an API (public, partner, or internal) from abuse, scrapers, credential stuffing, or runaway clients.
+- You need per-tier quotas (free vs pro vs enterprise) or per-endpoint limits (cheap reads vs expensive writes/exports).
+- You have an existing in-memory limiter and the service now runs more than one instance, so the effective limit drifts with replica count.
+- A downstream dependency (a paid API, a database, an LLM provider) needs protecting from your own traffic spikes.
+## Instructions
+1. **Pick the algorithm from the traffic shape, and justify it.** Three viable choices:
+   - **Token bucket** — refills at a steady rate, allows configurable bursts up to bucket capacity. Use for interactive/bursty clients (a user clicking fast, batch jobs) where occasional bursts are legitimate. Default choice for most APIs.
+   - **Sliding-window counter** — approximates a true sliding window by weighting the previous and current fixed windows. Use when you need *smooth* enforcement without burst spikes (protecting a fragile downstream). Cheap: two counters per key.
+   - **Fixed window** — one counter per key per interval. Use *only* when simplicity outweighs correctness; it permits up to **2x the limit** across a window boundary (full quota at the end of window N plus full quota at the start of N+1). Never use it to protect something that genuinely caps at N.
+   State which you chose and the burst/smoothness tradeoff that drove it.
+2. **Choose the limiting key — and prefer a composite.** Options and their failure modes:
+   - **IP** — defeated by NAT (one office shares an IP → collateral throttling) and by rotating proxies. Use only for unauthenticated traffic.
+   - **API key / authenticated user** — the right granularity for quotas; ties the limit to identity, not network. Requires the limiter to run *after* auth.
+   - **Composite** (e.g. `user + endpoint`, or `apiKey + route-class`) — lets expensive endpoints have tighter limits than cheap ones under the same identity.
+   Pick the key per route class. Unauthenticated routes fall back to IP; authenticated routes key on identity.
+3. **Set limits per tier, written down as a table.** Define explicit numbers: e.g. free = 60 req/min, pro = 600 req/min, enterprise = custom; expensive endpoints (export, search, LLM-backed) get their own lower limit. Don't invent one global number — the whole point is differentiation.
+4. **Use storage that is shared and atomic.** The counter must live in a store all instances reach — **Redis** (or equivalent) — and the increment-and-check must be **atomic**. With Redis, use `INCR` + `EXPIRE` on the same key (or a single Lua script for token bucket, so read-refill-decrement is one atomic operation). A `GET` then `SET` from application code is a race: concurrent requests read the same value and all pass. In-memory (`Map`, an LRU) is correct only for a single-process service and is otherwise a silent bug — each replica keeps its own private quota.
+5. **Return standard signals.** On limit exceeded, respond **`429 Too Many Requests`** with:
+   - `Retry-After: <seconds>` — when the client may retry.
+   - `RateLimit-Limit`, `RateLimit-Remaining`, `RateLimit-Reset` — emit these on *every* response (not just 429s) so well-behaved clients self-throttle before hitting the wall. Reset is seconds-until-reset (or a Unix timestamp — be consistent and document which).
+6. **Decide fail-open vs fail-closed when the store is down.** This is a deliberate choice, not a default:
+   - **Fail-open** (allow when Redis is unreachable) — preserves availability; correct for limiters that protect against *abuse* where a brief gap is acceptable.
+   - **Fail-closed** (reject) — correct when the limit guards a hard resource cap (a paid downstream, a quota you're contractually bound to). Wrap the store call in a short timeout so a slow store doesn't hang every request; on timeout, apply the chosen policy.
+7. **Handle clock skew and bursts.** Compute windows from the **store's clock** (e.g. Redis `TIME`) or a single source, not each instance's wall clock — skewed instances otherwise disagree on window boundaries. For token bucket, set capacity = the largest legitimate burst and refill rate = the sustained limit; document both.
+> [!WARNING]
+> Per-instance in-memory limiting in a horizontally-scaled deploy is the most common rate-limiter bug: with N replicas and a round-robin load balancer, the effective limit is roughly N x the configured value, and it changes silently when you autoscale. If the service has more than one replica, the limiter state MUST be in shared storage.
+> [!WARNING]
+> Read-then-write without atomicity defeats the limiter under exactly the load it exists to stop. Concurrent requests all read the pre-increment count and all pass. Use an atomic `INCR` (fixed/sliding window) or a single Lua script (token bucket) — never `GET` then conditional `SET` from app code.
+> [!NOTE]
+> Don't rate-limit at the app when an upstream layer does it better. A CDN/WAF or API gateway (Vercel Firewall, Cloudflare, Kong) can enforce coarse IP limits at the edge before traffic reaches your origin; reserve app-level limiting for identity- and tier-aware quotas that need request context.
+## Output
+A short design block stating: the chosen **algorithm** + rationale, the **key** per route class, a **per-tier limits table**, the **storage** mechanism (and why it's atomic + cross-instance), and the **fail-open/closed** policy with timeout. Followed by a concrete middleware/handler sketch that performs the atomic increment-and-check against the store, sets `RateLimit-*` headers on every response, returns `429` + `Retry-After` on breach, and applies the chosen failure policy when the store is unreachable.

package/content/skills/react-render-profiler.md ADDED Viewed

@@ -0,0 +1,37 @@
+---
+name: "react-render-profiler"
+description: "Find and fix wasteful React re-renders by classifying the cause — unstable prop/callback/object identities, context value churn, state lifted too high, expensive work in render, or unvirtualized lists — confirming it with a measurement, then applying the one targeted fix and re-measuring. Use when a React UI is janky, slow to type in, or re-renders far more than the data actually changed."
+allowed-tools: "Read, Grep, Glob, Edit, Bash"
+version: 1.0.0
+---
+A janky React UI is almost always re-rendering more than the data changed — and the reflex fix, wrapping everything in `useMemo`/`memo`, usually adds cost and complexity without helping, because it doesn't address *why* the component re-rendered. This skill makes the work diagnostic: name the cause class, prove it with a measurement, apply exactly one matching fix, and re-measure. No blind memoization.
+## When to use this skill
+- Typing in an input is laggy, or interacting with one widget visibly re-renders unrelated parts of the page.
+- The React DevTools Profiler shows a component (or a whole subtree) committing on interactions that shouldn't touch it.
+- A list or table with hundreds of rows stutters on scroll, filter, or keystroke.
+- A `useEffect`/`useMemo` runs every render even though its inputs "look" the same.
+- You're tempted to sprinkle `memo`/`useCallback` and want to confirm where they actually pay off first.
+## Instructions
+1. **Measure before you touch code.** Open React DevTools → Profiler, record the slow interaction, and read the flamegraph: which components committed, how many times, and why (enable "Record why each component rendered"). For a sharper signal on a specific component, wire up `@welldone-software/why-did-you-render` in dev and check the console for which prop/state changed identity. Do not edit anything until you have a named culprit and a render count.
+2. **Classify the cause — pick exactly one per culprit.** (a) *Unstable identity*: an object/array/function literal created in the parent's render and passed as a prop, so a `memo`'d child or an effect dep changes every render. (b) *Context churn*: a context Provider whose `value={{...}}` is a fresh object each render, re-rendering every consumer. (c) *State too high*: state lives in an ancestor, so a localized change re-renders a large subtree. (d) *Expensive render work*: heavy compute (sorting/formatting/parsing) runs inline in render. (e) *Unvirtualized long list*: hundreds/thousands of DOM rows all committing.
+3. **Fix (c) by moving state, not memoizing.** If a keystroke or toggle re-renders a big subtree, *colocate* the state into the smallest component that uses it, or *lift it down* into a child. Moving state is the cheapest, most durable fix and often deletes the need for any `memo` at all — try this before reaching for memoization.
+4. **Fix (a) by stabilizing identity at the source.** Wrap callbacks passed to memoized children in `useCallback`, and derived objects/arrays in `useMemo`, with honest dependency arrays. This only helps if the *child is memoized* (`React.memo`) or the value is an *effect/memo dependency* — stabilizing a prop to an unmemoized child does nothing.
+5. **Fix (b) by splitting or memoizing context.** Memoize the Provider `value` with `useMemo`, and split a single fat context into separate contexts (e.g. state vs. dispatch, or per-concern) so a consumer only re-renders when the slice it reads changes.
+6. **Fix (d) by memoizing the computation or moving it out.** Wrap the expensive calculation in `useMemo` keyed on its real inputs, or hoist it out of render (precompute, server-side, or `useDeferredValue` for low-priority work). Memoize the *work*, not the component.
+7. **Fix (e) by virtualizing.** Render only visible rows with `@tanstack/react-virtual` (or `react-window`); `memo` on the row component matters here because virtualization recycles rows.
+8. **Re-measure and report the delta.** Re-record the same interaction in the Profiler and capture the new render count per culprit. If the count didn't drop, you classified the cause wrong — revert the change (don't leave a `memo` that bought nothing) and go back to step 2.
+> [!WARNING]
+> Blanket memoization is a regression, not a fix. `memo`/`useMemo`/`useCallback` each cost a comparison and retained memory every render, add dependency-array bugs, and break the moment one prop's identity still churns. Never add them without a Profiler reading showing they remove a real render — and when the true cause is class (c), *moving state deletes the problem* while memoization only masks it.
+> [!NOTE]
+> `React.memo` compares props shallowly, so it is *defeated* by a single unstable prop (an inline `style={{...}}`, `onClick={() => ...}`, or `data={[...]}`). A `memo`'d child that still re-renders on every parent commit is the signature of an unstable-identity prop (cause a) — not a reason to remove the `memo`.
+## Output
+Per culprit: the component name, the **measured** cause class with the evidence (Profiler "why it rendered" reason or why-did-you-render line), the single targeted fix as an `Edit` diff, and **before/after render counts** for the same recorded interaction. End with a one-line verdict per fix (kept / reverted-no-effect) so no no-op memoization is left behind.

package/content/skills/semver-advisor.md ADDED Viewed

@@ -0,0 +1,81 @@
+---
+name: "semver-advisor"
+description: "Decide the correct semantic-version bump — major, minor, or patch — by diffing a release range, mapping the changes onto the public API surface, and classifying each as breaking, additive, or a fix. Use before cutting a release when you are unsure whether changes are breaking, when a teammate proposes a bump you want to sanity-check, or when a behavior change has no signature change and you need to know if it is still breaking."
+allowed-tools: "Read, Grep, Glob, Bash"
+version: 1.0.0
+---
+The wrong call here is silent until a consumer's build breaks. "It compiled, ship a minor" is how breaking changes escape — a tightened validation rule, a changed default, or a removed export looks small in the diff but breaks every downstream caller. This skill makes the bump a defensible decision: it pins down what your *public API surface* actually is, diffs the release range against it, classifies each change, and applies the SemVer rules — including the pre-1.0 exception people forget.
+## When to use this skill
+- You are about to tag a release and are unsure whether the changes are breaking.
+- Someone proposed `minor` or `patch` and you want to verify it against the real diff.
+- You changed behavior without changing a signature and need to know if that is still breaking (it often is).
+- You maintain a `0.x` library and keep forgetting that SemVer treats pre-1.0 differently.
+- A CI/release gate failed on version mismatch and you need the correct bump with a rationale.
+## Instructions
+1. **Define the public API surface first — it is narrower or wider than you think.** Enumerate every contract a consumer can depend on, not just the language exports:
+   - **Code exports**: the package entry points (`exports`/`main` in `package.json`, `__all__`, `pub`/`public` symbols). Anything reachable from the documented entry point is public; deep imports into internal paths usually are not (unless your docs/`exports` map expose them).
+   - **CLI**: command names, flags, positional args, env vars they read, and exit codes.
+   - **Config**: accepted keys, their types, defaults, and required-ness in config files / schema.
+   - **Wire contracts**: HTTP routes, request/response shapes, status codes, GraphQL schema, event/message payloads.
+   - **File formats**: on-disk formats you read or write, serialization versions, migration outputs.
+   ```bash
+   # entry points and public surface clues
+   git show HEAD:package.json | grep -E '"(main|module|types|exports|bin)"' -A3
+   grep -rEn '__all__|^export (default |const |function |class |\{)' src | head -50
+   ```
+2. **Diff the release range, scoped to the surface.** Use the last released tag as the lower bound; review only files that touch the surface from step 1.
+   ```bash
+   LAST_TAG=$(git describe --tags --abbrev=0)
+   git diff "$LAST_TAG"..HEAD --stat
+   git diff "$LAST_TAG"..HEAD -- <surface paths: src/index.*, cli/, openapi.*, *.schema.json>
+   ```
+3. **Classify each surface change into exactly one bucket.**
+   - **Breaking** (forces major): removed or renamed export/flag/route/config key; changed function signature, required arg added, or narrowed/changed return type; changed *default behavior* a consumer relied on; stricter validation that rejects previously-valid input; changed error type/exit code/status code; removed config default; changed file-format output that old readers can't parse.
+   - **Additive** (minor): new export, flag, optional config key with a safe default, new route, new optional response field — all 100% backward compatible.
+   - **Fix** (patch): bug fix that restores documented behavior with no API change, internal refactor, perf, docs, deps that don't change the public contract.
+4. **Apply the rule, then handle the pre-1.0 caveat.** Take the highest-severity bucket present: any breaking → **major**; else any additive → **minor**; else **patch**. Then check the current version:
+   - **`>= 1.0.0`**: apply the rule directly.
+   - **`0.y.z` (pre-1.0)**: SemVer special-cases this. A breaking change bumps the **minor** (`0.y` → `0.(y+1)`), and additive/fix changes bump the **patch**. State explicitly that you are using pre-1.0 semantics.
+5. **Re-check every "no signature change" item before finalizing.** A change with an identical signature can still be breaking — search the diff for default-value changes, validation tightening, altered side effects, and changed return *values* (not just types). These are the ones that get mislabeled as patches.
+6. **Output the recommendation with receipts.** Give the bump, the resulting version number, the one-line rule that decided it, and the itemized changes per bucket — with each breaking change named explicitly so a reviewer can challenge it.
+> [!WARNING]
+> A behavior change with an unchanged signature is still breaking. Tightening input validation, flipping a default (e.g. `cache: false` → `true`), changing rounding/sort order, or returning a different value for the same input all break consumers even though the API "didn't change." Grep the diff for changed literals and default arguments, not just modified declarations.
+> [!CAUTION]
+> Pre-1.0 SemVer is not "anything goes" but it is not the 1.0 rule either: breaking changes go in the **minor** slot (`0.4.x` → `0.5.0`), not the major. If you mechanically bump major for a `0.x` package you will jump to `1.0.0` and signal stability you didn't intend. Confirm the current version before recommending.
+## Output
+A bump recommendation, reproducible from the diff:
+```markdown
+## SemVer recommendation: MAJOR  (1.4.2 → 2.0.0)
+Rule applied: contains ≥1 breaking change → major (current version ≥ 1.0.0).
+### Breaking (forces major)
+- Removed export `parseLegacy()` from package entry — consumers importing it will fail to resolve.
+- `loadConfig()` now throws on unknown keys (was: ignored) — stricter validation rejects previously-valid config.
+- Default of `--timeout` changed 0 (infinite) → 30000ms — changes runtime behavior for callers relying on the old default.
+### Additive (would be minor on its own)
+- New optional flag `--format json`.
+### Fix (would be patch on its own)
+- Fixed off-by-one in `splitRange()` matching documented behavior.
+Note: if this were a 0.x package, the same set would be a MINOR bump (0.y → 0.(y+1)).
+```

package/content/skills/type-coverage-improver.md ADDED Viewed

@@ -0,0 +1,81 @@
+---
+name: "type-coverage-improver"
+description: "Raise TypeScript type strictness incrementally — measure the any/implicit-any baseline, enable one strict sub-flag at a time, and fix the fallout per flag instead of all at once, keeping the typecheck green at every step. Use when a codebase is loosely typed, when you want strict mode on without a big-bang break, or when `any` keeps hiding bugs that surface in production."
+allowed-tools: "Read, Grep, Glob, Edit, Bash"
+version: 1.0.0
+---
+Turn on TypeScript strictness without a big-bang break. The skill measures where you stand (explicit `any`, implicit `any`, which strict flags are already on), then enables the strict family **one sub-flag at a time** — `noImplicitAny`, `strictNullChecks`, and the rest — fixing the fallout from each flag before touching the next. Every step ends with `tsc --noEmit` passing, so you ratchet coverage up monotonically instead of staring at 600 errors and rage-casting them away.
+## When to use this skill
+- The codebase runs with `strict: false` (or a partial strict config) and is littered with `any`, implicit `any` parameters, and unchecked nullables.
+- You want to reach `strict: true` but a single flip produces an unfixable wall of errors and a stalled PR.
+- `any` is masking real defects — `undefined is not a function`, missing-property crashes — that strict typing would have caught at compile time.
+> [!WARNING]
+> Do not "fix" strict errors with `any`-casts, `as` assertions, `@ts-ignore`, or `@ts-expect-error`. Those silence the exact diagnostic strict mode exists to surface — you ship the bug *and* the suppression. The only acceptable fixes are: a precise type, a real null/undefined narrow, or (rarely) a documented `// @ts-expect-error` with a linked issue when the fix is genuinely a separate change. A PR whose net effect is "more suppressions" is negative progress.
+## Instructions
+1. **Measure the baseline before changing anything.** Read `tsconfig.json` and record which strict sub-flags are already set (`strict` implies `noImplicitAny`, `strictNullChecks`, `strictFunctionTypes`, `strictBindCallApply`, `strictPropertyInitialization`, `noImplicitThis`, `useUnknownInCatchVariables`, `alwaysStrict`). Then quantify the `any` surface:
+   ```bash
+   # explicit `any` annotations
+   grep -rIn -E ':\s*any\b|\bas any\b|<any>|Array<any>|any\[\]' --include='*.ts' --include='*.tsx' src | wc -l
+   # existing suppressions (these are debt you must not add to)
+   grep -rIn -E '@ts-ignore|@ts-expect-error' --include='*.ts' --include='*.tsx' src | wc -l
+   # implicit any + the full error count under the strictest config (dry run, no edits)
+   npx tsc --noEmit --strict --noErrorTruncation 2>&1 | grep -c 'error TS'
+   ```
+   If `type-coverage` is available, `npx type-coverage --detail` gives a single percentage and a per-identifier list — capture the starting number; it is your headline metric.
+2. **Order the work by risk and traffic, not alphabetically.** Use `git log --format= --name-only --since='6 months ago' | sort | uniq -c | sort -rn` to find churned files, and grep for the modules with the most `any` and the most importers (entry points, shared utils, API/DB boundaries). Fix these first: a precise type on a widely-imported util propagates correctness everywhere; an `any` at a data boundary (HTTP response, DB row, JSON parse) is where wrong-shape bugs originate.
+3. **Enable exactly one sub-flag at a time.** Add a single flag to `tsconfig.json` (`"noImplicitAny": true`), run `npx tsc --noEmit`, and fix only the errors that flag produces. Recommended order, easiest-to-hardest:
+   - `noImplicitAny` — annotate parameters/returns the compiler couldn't infer.
+   - `strictNullChecks` — the big one; surfaces every place `null`/`undefined` was silently allowed.
+   - `strictFunctionTypes`, `strictBindCallApply`, `noImplicitThis` — usually small fallout.
+   - `strictPropertyInitialization` — class fields; often the last and noisiest.
+   Once each flag is green individually, the final flip to `"strict": true` is a no-op verification.
+4. **Replace `any` with the real type, narrow at the boundary.** For explicit `any`: infer the actual shape from how the value is used and from the producer, and write the `interface`/`type`. For external/untyped data (`JSON.parse`, `fetch().json()`, env vars, dynamic imports), type the boundary as `unknown` and narrow with a type guard or a schema parse (e.g. `zod`'s `.parse()`) — `unknown` forces a check; `any` skips it. Add explicit return types to exported functions so inference errors surface at the definition, not three call sites away.
+5. **Keep the typecheck green at every commit.** After each flag's fallout is fixed, run the project's real check (`npm run typecheck` / `tsc --noEmit`) and the test suite, then commit that flag as its own commit. Never enable the next flag on a red tree — you lose the ability to attribute a new error to a specific flag, and the diff becomes unreviewable.
+6. **Re-measure and report the delta.** Re-run the baseline commands from step 1. Report the before/after `any` count, the `type-coverage` percentage delta, which flags are now on, and any honest residue: spots that genuinely need `unknown` + a follow-up, third-party `@types` gaps, or generated code excluded via `tsconfig` `exclude` rather than suppressed inline.
+> [!NOTE]
+> Don't refactor logic while fixing types. A type-only PR should change annotations, guards, and config — not behavior. If a strict error reveals a real bug (a nullable that was actually being dereferenced), fix it in a **separate** commit with a test, so reviewers can tell "added a type" apart from "changed runtime behavior."
+## Output
+1. **Baseline metrics** — current `tsconfig` strict flags, explicit-`any` count, suppression count, total error count under `--strict`, and `type-coverage` percentage if available.
+2. **An ordered flag-by-flag plan** — the sub-flags to enable in sequence, each with its estimated fallout count and the highest-priority files to fix first, e.g.:
+   | Step | Flag | Errors introduced | Fix-first files |
+   |------|------|-------------------|-----------------|
+   | 1 | `noImplicitAny` | 38 | `src/lib/api/client.ts`, `src/utils/parse.ts` |
+   | 2 | `strictNullChecks` | 142 | `src/db/repository.ts`, `src/lib/session.ts` |
+   | 3 | `strictPropertyInitialization` | 21 | `src/services/*.ts` |
+3. **Concrete type changes for the first file** — the actual diff: `any` → named types, added return annotations, and `unknown`-at-the-boundary guards, with `tsc --noEmit` shown passing afterward. For example:
+   ```diff
+   - export function parseUser(raw: any) {
+   -   return { id: raw.id, name: raw.name };
+   - }
+   + interface User { id: string; name: string }
+   + export function parseUser(raw: unknown): User {
+   +   if (typeof raw !== "object" || raw === null) throw new Error("invalid user");
+   +   const r = raw as Record<string, unknown>;
+   +   if (typeof r.id !== "string" || typeof r.name !== "string") throw new Error("invalid user");
+   +   return { id: r.id, name: r.name };
+   + }
+   ```
+   ```bash
+   $ npx tsc --noEmit
+   $   # exit 0 — clean
+   ```

package/content/skills/version-bumper.md ADDED Viewed

@@ -0,0 +1,91 @@
+---
+name: "version-bumper"
+description: "Bump the project version everywhere it lives in one consistent pass — package.json, lockfile, nested/CLI package manifests, version constants, README badges, docs — then roll the changelog's Unreleased section under the new version and stage an annotated git tag. Use when you've already decided the new version (X.Y.Z or a pre-release like -rc.1) and need every artifact updated to the same value without drift, or before cutting a release."
+allowed-tools: "Read, Edit, Bash"
+version: 1.0.0
+---
+Bumping a version is rarely one line. The number hides in `package.json`, a lockfile, a nested CLI or submodule manifest, a `__version__` constant, a README badge, and a docs install snippet — and any one you miss ships as drift. This skill finds every occurrence, sets them all to a single agreed value, rolls the changelog, and stages the tag. It never picks the version for you and never publishes without your say-so.
+## When to use this skill
+- You've decided the new version (e.g. `2.4.0`, or a pre-release `2.4.0-rc.1`) and need every artifact updated to match in one pass.
+- You're cutting a release and want the bump commit to be clean, atomic, and correctly tagged.
+- A previous bump left drift — `package.json` says one version, the lockfile or a badge says another — and you want them reconciled.
+- You maintain a monorepo or a repo with a bundled CLI sub-package whose versions and dependency ranges must move together.
+> [!NOTE]
+> This skill applies a version you've already chosen. If you haven't decided whether the change is major/minor/patch, run a semver analysis first (see `semver-advisor`) — getting the number right is out of scope here.
+## Instructions
+1. **Confirm the exact target version before touching anything.** Read the current version from the root `package.json` (or `pyproject.toml`, `Cargo.toml`, etc.). State the old → new transition explicitly and stop if the new value isn't strictly greater, or if it's malformed. A pre-release identifier (`-rc.1`, `-beta.2`, `-next.0`) is valid and must be carried verbatim into every artifact — do not silently drop it.
+2. **Find every place the version lives.** Don't assume — grep. The number leaks into more files than you expect:
+   ```bash
+   OLD="1.2.3"   # current version, escaped if it contains dots
+   grep -rnF "$OLD" \
+     --include='*.json' --include='*.toml' --include='*.md' \
+     --include='*.ts' --include='*.js' --include='*.py' --include='*.yml' --include='*.yaml' \
+     --exclude-dir=node_modules --exclude-dir=.git --exclude-dir=dist .
+   ```
+   Then triage each hit. Update version *declarations*; never blanket-replace — a `1.2.3` in changelog history or a test fixture must stay put.
+3. **Update the canonical manifest(s).** Edit the `version` field in the root `package.json`. For nested packages (a `cli/package.json`, workspace packages, a submodule manifest), update each one to the same value unless they version independently — confirm which model the repo uses before assuming lockstep.
+4. **Update the lockfile so it doesn't drift.** Editing `package.json` alone leaves `package-lock.json` (and the `packages[""].version` entry inside it) stale. Regenerate it deterministically rather than hand-editing:
+   ```bash
+   npm install --package-lock-only --ignore-scripts
+   ```
+   For `pnpm` use `pnpm install --lockfile-only`; for `yarn` run `yarn install --mode update-lockfile`.
+5. **Update version constants and human-facing references.** Catch the non-manifest spots the grep surfaced: a `VERSION` / `__version__` constant in source, a README shields.io badge (`version-1.2.3-` → `version-2.4.0-`), install snippets pinning `pkg@1.2.3`, and any `docs/` page that names the current version. Skip historical mentions (changelog entries, migration notes about old releases).
+6. **Roll the changelog.** Move everything under the `## [Unreleased]` heading into a new `## [X.Y.Z] — YYYY-MM-DD` section dated today, then leave `## [Unreleased]` empty above it. Update the link-reference footer if the changelog uses compare-URL refs (`[X.Y.Z]: …/compare/vOLD...vX.Y.Z` and a fresh `[Unreleased]: …/compare/vX.Y.Z...HEAD`).
+7. **For monorepos, keep interdependent versions and ranges consistent.** When package A depends on package B and both bump, update A's dependency range on B (e.g. `"@scope/b": "^2.4.0"`) so a consumer doesn't resolve a mismatched pair. Verify no `workspace:*` range was accidentally pinned to a literal.
+8. **Stage — do not run — the release commit and annotated tag.** Print the exact commands and wait for the user. The bump commit must land *before* the tag points at it; tag from the wrong commit and you've published a tag that doesn't match its tree.
+   ```bash
+   git add -A
+   git commit -m "chore(release): vX.Y.Z"
+   git tag -a vX.Y.Z -m "vX.Y.Z"
+   # push only when asked:  git push origin HEAD vX.Y.Z
+   ```
+> [!WARNING]
+> Never run `git tag` before the bump commit is committed — an annotated tag captures the commit it points to, so a premature tag will reference the *previous* state, and moving a published tag breaks anyone who already fetched it. Commit first, verify `git show HEAD --stat` contains the version edits, then tag.
+> [!WARNING]
+> A lockfile left at the old version is the single most common bump bug: CI installs, sees `package-lock.json` disagrees with `package.json`, and either fails or silently resolves the old version. Always regenerate the lockfile in the same commit as the manifest bump.
+## Output
+Two artifacts, both reviewable before anything is committed:
+1. **A change table** of every file touched, old → new:
+   | File | Old | New |
+   | --- | --- | --- |
+   | `package.json` | `1.2.3` | `2.4.0` |
+   | `package-lock.json` | `1.2.3` | `2.4.0` |
+   | `cli/package.json` | `1.2.3` | `2.4.0` |
+   | `src/version.ts` | `1.2.3` | `2.4.0` |
+   | `README.md` (badge) | `1.2.3` | `2.4.0` |
+   | `CHANGELOG.md` | Unreleased | `## [2.4.0] — 2026-06-17` |
+2. **The exact release commands**, ready to paste and run only on request:
+   ```bash
+   git add -A
+   git commit -m "chore(release): v2.4.0"
+   git tag -a v2.4.0 -m "v2.4.0"
+   # git push origin HEAD v2.4.0
+   ```
+   Plus a one-line note of anything skipped on purpose (historical version mentions left untouched) or anything that needs a human decision (a sub-package that may version independently).

package/content/skills/webhook-handler-scaffolder.md ADDED Viewed

@@ -0,0 +1,37 @@
+---
+name: "webhook-handler-scaffolder"
+description: "Scaffold a robust inbound webhook handler that verifies the signature on the raw body first, dedupes on the provider's event id, acknowledges fast, and processes asynchronously — the four things naive handlers get wrong. Use when wiring up events from a third party (Stripe, GitHub, Shopify, Slack, Twilio), when a provider keeps retrying because your endpoint times out or 500s, or when duplicate events are double-charging or double-creating records."
+allowed-tools: "Read, Grep, Glob, Edit, Write"
+version: 1.0.0
+---
+A webhook endpoint is untrusted input that arrives over the public internet, claims to be from your payment provider, and may be a duplicate, a replay, or out of order. Handlers written for the happy path — parse the JSON, do the work, return 200 — fail in exactly the ways that cost money: a forged request gets processed, a retried event double-charges a customer, or a slow database write makes the provider time out and retry, compounding the duplicate. This skill scaffolds the handler in the one shape that survives real delivery semantics: **verify → dedupe → persist → ack → process async**.
+## When to use this skill
+- Wiring up an inbound webhook from a third party (Stripe, GitHub, Shopify, Slack, Twilio, Square, GitLab) and you want it correct on the first commit.
+- A provider's dashboard shows repeated retries or your endpoint is flagged as failing because it times out or returns 5xx.
+- Duplicate events are causing double-charges, duplicate emails, or duplicate records, and you need idempotency retrofitted.
+- A security review flagged that the endpoint trusts the payload without verifying its origin.
+## Instructions
+1. **Identify the provider's contract before writing code.** Find, for the specific provider: the signature scheme (HMAC-SHA256 is typical), which header carries the signature (`Stripe-Signature`, `X-Hub-Signature-256`, `X-Shopify-Hmac-Sha256`), what string is actually signed (often `timestamp + "." + raw_body`, not the body alone), the stable event id field (`id`, `delivery` GUID), and the event-type field. Grep the codebase for an existing handler to match its framework and error conventions rather than introducing a new pattern.
+2. **Capture the raw body, then verify before anything else.** Read the request body as raw bytes/string and compute the HMAC over those exact bytes with the signing secret from config (never hardcoded). Compare using a **constant-time** comparison (`crypto.timingSafeEqual`, `hmac.compare_digest`) — never `==`, which leaks the secret via timing. If verification fails, return `401` and stop. Only after a valid signature do you `JSON.parse` the body.
+3. **Reject stale requests to block replay.** If the provider signs a timestamp, parse it and reject (`400`) when it is outside a tolerance window (Stripe uses 5 minutes). This stops a captured-and-replayed valid request from being reprocessed indefinitely.
+4. **Dedupe on the provider's event id.** Before doing any work, insert the event id into a store with a **unique constraint** (a `webhook_events` table, or Redis `SET key NX EX`). If the insert conflicts, the event was already received — return `200` immediately and do nothing else. Treat the provider's id as the idempotency key; never derive your own from payload contents (two distinct events can have identical bodies).
+5. **Persist the raw event, then acknowledge fast.** Store the raw body, headers, event id, and type with a `received_at` timestamp and `processed = false`. Return `2xx` as soon as the event is durably recorded — do not run business logic inline. Providers enforce short timeouts (Stripe ~10s, GitHub ~10s); slow synchronous work guarantees a timeout and a redundant retry.
+6. **Process asynchronously and idempotently.** Enqueue the stored event to a worker/queue (or a background task) that performs the real side effects, then marks `processed = true`. The worker must itself be safe to re-run, because the queue is also at-least-once: scope writes by the event id, use upserts, and never assume the event arrives in causal order (a `subscription.updated` can land before `subscription.created`) — reconcile from the payload's own state rather than from event sequence.
+7. **Define the failure path explicitly.** Decide what a `5xx` means (provider will retry) versus a `2xx` on a malformed-but-authentic event (you accept and dead-letter it instead of looping forever). Add structured logging keyed by event id and a dead-letter destination for events the worker can't process after N attempts.
+> [!WARNING]
+> Parsing the body before verifying the signature is the single most common webhook vulnerability — and the most common cause of "signature mismatch" bugs. Framework middleware that auto-parses JSON (Express `express.json()`, Next.js default body parsing) consumes the raw stream, so the bytes you sign over no longer match the bytes the provider sent. Reserve the raw body for the webhook route (e.g. `express.raw({ type: 'application/json' })`, `bodyParser: false`) before doing anything else.
+> [!NOTE]
+> Never treat delivery as exactly-once or ordered. Every major provider documents at-least-once delivery with retries, which means duplicates and reordering are normal operation, not edge cases. The unique-constraint dedupe in step 4 and the order-independent reconciliation in step 6 are what make that safe — without them the handler is correct only by luck.
+## Output
+- A complete webhook handler scaffold in the project's framework, structured as **verify (constant-time HMAC on raw body) → replay check → dedupe on event id → persist raw event → return 2xx → enqueue for async processing**, with the signature check, secret loading, and raw-body wiring filled in for the detected provider and the business logic left as a clearly marked TODO in the worker.
+- The idempotency and storage design: the `webhook_events` schema (event id with a unique constraint, raw payload, type, `received_at`, `processed`, attempt count), the dedupe-on-insert flow, and the dead-letter/retry policy.
+- A short note listing the provider-specific details that must be confirmed against its docs: signing secret location, signature header name, the exact signed string, event-id field, and the timeout/retry behavior the handler is built to satisfy.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "agentscamp",
-  "version": "0.1.0",
+  "version": "0.2.0",
   "description": "Install AgentsCamp agents, skills, and slash commands into Claude Code from your terminal.",
   "license": "MIT",
   "type": "module",