npm - agentscamp - Versions diffs - 0.3.0 → 0.5.0 - Mend

agentscamp 0.3.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

package/README.md +3 -3
package/content/commands/add-caching.md +79 -0
package/content/commands/audit-accessibility.md +101 -0
package/content/commands/clean-branches.md +113 -0
package/content/commands/review-tests.md +98 -0
package/content/commands/scaffold-github-action.md +94 -0
package/content/commands/setup-precommit-hooks.md +72 -0
package/content/commands/write-design-doc.md +78 -0
package/content/manifest.json +425 -3
package/content/skills/agent-trajectory-evaluator.md +59 -0
package/content/skills/alerting-rules-tuner.md +49 -0
package/content/skills/canary-release-planner.md +35 -0
package/content/skills/cold-start-optimizer.md +83 -0
package/content/skills/connection-pool-tuner.md +46 -0
package/content/skills/contract-test-designer.md +70 -0
package/content/skills/dependency-upgrade-planner.md +42 -0
package/content/skills/devcontainer-designer.md +40 -0
package/content/skills/distributed-tracing-instrumenter.md +42 -0
package/content/skills/idempotency-designer.md +47 -0
package/content/skills/memory-leak-hunter.md +35 -0
package/content/skills/mutation-test-runner.md +64 -0
package/content/skills/pagination-designer.md +51 -0
package/content/skills/property-test-designer.md +63 -0
package/content/skills/query-plan-analyzer.md +49 -0
package/content/skills/runbook-writer.md +83 -0
package/content/skills/security-headers-hardener.md +79 -0
package/content/skills/semantic-cache-designer.md +40 -0
package/content/skills/slo-definer.md +38 -0
package/content/skills/strangler-fig-migrator.md +47 -0
package/content/skills/structured-logging-designer.md +42 -0
package/content/skills/threat-model-builder.md +46 -0
package/content/skills/token-usage-profiler.md +39 -0
package/package.json +1 -1

package/content/manifest.json CHANGED

@@ -1,10 +1,10 @@
 {
   "schemaVersion": 1,
-  "generatedAt": "2026-06-18T01:57:52.358Z",
+  "generatedAt": "2026-06-18T02:36:19.351Z",
   "counts": {
     "agents": 58,
-    "skills": 52,
-    "commands": 43
+    "skills": 75,
+    "commands": 50
   },
   "items": [
     {
@@ -886,6 +886,20 @@
       "installAs": "agents/workflow-orchestrator.md",
       "url": "https://agentscamp.com/agents/meta-orchestration/workflow-orchestrator"
     },
+    {
+      "id": "commands/add-caching",
+      "type": "command",
+      "slug": "add-caching",
+      "category": "perf",
+      "title": "Add Caching",
+      "description": "Add a caching layer to one expensive function or endpoint correctly — confirm it's cacheable, design the cache key/TTL/layer/invalidation, handle stampedes, wrap the call in one place, and report the design.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "commands/add-caching.md",
+      "installAs": "commands/add-caching.md",
+      "url": "https://agentscamp.com/commands/perf/add-caching"
+    },
     {
       "id": "commands/add-docstrings",
       "type": "command",
@@ -946,6 +960,20 @@
       "installAs": "commands/add-streaming-endpoint.md",
       "url": "https://agentscamp.com/commands/scaffold/add-streaming-endpoint"
     },
+    {
+      "id": "commands/audit-accessibility",
+      "type": "command",
+      "slug": "audit-accessibility",
+      "category": "analyze",
+      "title": "Audit Accessibility",
+      "description": "Audit a component or page for accessibility against WCAG — semantics, names, keyboard, ARIA, contrast, forms, motion.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "commands/audit-accessibility.md",
+      "installAs": "commands/audit-accessibility.md",
+      "url": "https://agentscamp.com/commands/analyze/audit-accessibility"
+    },
     {
       "id": "commands/benchmark-rerankers",
       "type": "command",
@@ -975,6 +1003,20 @@
       "installAs": "commands/breakdown-task.md",
       "url": "https://agentscamp.com/commands/plan/breakdown-task"
     },
+    {
+      "id": "commands/clean-branches",
+      "type": "command",
+      "slug": "clean-branches",
+      "category": "git",
+      "title": "Clean Branches",
+      "description": "Safely prune merged and stale Git branches: drop dead remote-tracking refs, list merged candidates for review, then delete with the safe -d variant.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "commands/clean-branches.md",
+      "installAs": "commands/clean-branches.md",
+      "url": "https://agentscamp.com/commands/git/clean-branches"
+    },
     {
       "id": "commands/commit",
       "type": "command",
@@ -1317,6 +1359,20 @@
       "installAs": "commands/review-pr.md",
       "url": "https://agentscamp.com/commands/review/review-pr"
     },
+    {
+      "id": "commands/review-tests",
+      "type": "command",
+      "slug": "review-tests",
+      "category": "review",
+      "title": "Review Tests",
+      "description": "Review the quality of a test suite, not just whether it passes — find weak assertions, missing edge cases, and tests coupled to implementation.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "commands/review-tests.md",
+      "installAs": "commands/review-tests.md",
+      "url": "https://agentscamp.com/commands/review/review-tests"
+    },
     {
       "id": "commands/run-evals",
       "type": "command",
@@ -1346,6 +1402,20 @@
       "installAs": "commands/scaffold-dockerfile.md",
       "url": "https://agentscamp.com/commands/scaffold/scaffold-dockerfile"
     },
+    {
+      "id": "commands/scaffold-github-action",
+      "type": "command",
+      "slug": "scaffold-github-action",
+      "category": "scaffold",
+      "title": "Scaffold GitHub Action",
+      "description": "Scaffold a hardened GitHub Actions workflow for a stated goal, wired to the project's real test/lint/build commands.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "commands/scaffold-github-action.md",
+      "installAs": "commands/scaffold-github-action.md",
+      "url": "https://agentscamp.com/commands/scaffold/scaffold-github-action"
+    },
     {
       "id": "commands/scaffold-pgvector-schema",
       "type": "command",
@@ -1450,6 +1520,20 @@
       "installAs": "commands/setup-claude-ci.md",
       "url": "https://agentscamp.com/commands/workflow/setup-claude-ci"
     },
+    {
+      "id": "commands/setup-precommit-hooks",
+      "type": "command",
+      "slug": "setup-precommit-hooks",
+      "category": "workflow",
+      "title": "Setup Pre-commit Hooks",
+      "description": "Set up fast pre-commit hooks that catch problems before they land — detect the repo's existing stack and hook mechanism, run lint/format/typecheck plus a secret scan on staged files only, keep the slow test suite in CI, and make the setup reproducible for the whole team.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "commands/setup-precommit-hooks.md",
+      "installAs": "commands/setup-precommit-hooks.md",
+      "url": "https://agentscamp.com/commands/workflow/setup-precommit-hooks"
+    },
     {
       "id": "commands/sync-branch",
       "type": "command",
@@ -1493,6 +1577,21 @@
       "installAs": "commands/update-readme.md",
       "url": "https://agentscamp.com/commands/docs/update-readme"
     },
+    {
+      "id": "commands/write-design-doc",
+      "type": "command",
+      "slug": "write-design-doc",
+      "category": "plan",
+      "title": "Write Design Doc",
+      "description": "Explore the codebase and write a decision-oriented design doc / RFC for a feature or system change.",
+      "topics": [
+        "architecture",
+        "workflow-prompting"
+      ],
+      "file": "commands/write-design-doc.md",
+      "installAs": "commands/write-design-doc.md",
+      "url": "https://agentscamp.com/commands/plan/write-design-doc"
+    },
     {
       "id": "commands/write-tests",
       "type": "command",
@@ -1536,6 +1635,35 @@
       "installAs": "skills/agent-memory-designer/SKILL.md",
       "url": "https://agentscamp.com/skills/workflow/agent-memory-designer"
     },
+    {
+      "id": "skills/agent-trajectory-evaluator",
+      "type": "skill",
+      "slug": "agent-trajectory-evaluator",
+      "category": "data",
+      "title": "Agent Trajectory Evaluator",
+      "description": "Evaluate a multi-step AI agent's whole run — tool calls, intermediate steps, and final result — not just final-answer correctness, so you can pinpoint WHERE it went wrong. Use when building or debugging a tool-using or multi-step agent, when final-answer-only evals can't explain failures, or when a prompt/model change quietly makes the agent less efficient or more error-prone even though the answer still looks right.",
+      "topics": [
+        "llm-evals",
+        "ai-agents-systems"
+      ],
+      "file": "skills/agent-trajectory-evaluator.md",
+      "installAs": "skills/agent-trajectory-evaluator/SKILL.md",
+      "url": "https://agentscamp.com/skills/data/agent-trajectory-evaluator"
+    },
+    {
+      "id": "skills/alerting-rules-tuner",
+      "type": "skill",
+      "slug": "alerting-rules-tuner",
+      "category": "observability",
+      "title": "Alerting Rules Tuner",
+      "description": "Cut alert noise and make every page mean something — rewrite alerting rules to fire on user-felt symptoms (error rate, latency SLO burn, failed requests) instead of causes (high CPU, full disk), with duration windows and severity routing so only urgent, actionable conditions reach a human. Use when on-call is fatigued by low-value pages, when real incidents get missed in the noise, or when alerts fire on causes rather than impact.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/alerting-rules-tuner.md",
+      "installAs": "skills/alerting-rules-tuner/SKILL.md",
+      "url": "https://agentscamp.com/skills/observability/alerting-rules-tuner"
+    },
     {
       "id": "skills/architecture-diagram-generator",
       "type": "skill",
@@ -1592,6 +1720,20 @@
       "installAs": "skills/bundle-analyzer/SKILL.md",
       "url": "https://agentscamp.com/skills/performance/bundle-analyzer"
     },
+    {
+      "id": "skills/canary-release-planner",
+      "type": "skill",
+      "slug": "canary-release-planner",
+      "category": "release",
+      "title": "Canary Release Planner",
+      "description": "Design a canary / progressive rollout so a bad release reaches 1% of users instead of 100% — staged traffic with bake times, gating metrics compared against the concurrently-running stable baseline, and automated promote-or-rollback. Use when shipping a risky change, when you want automatic rollback on regression, or when moving off all-at-once deploys.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/canary-release-planner.md",
+      "installAs": "skills/canary-release-planner/SKILL.md",
+      "url": "https://agentscamp.com/skills/release/canary-release-planner"
+    },
     {
       "id": "skills/changelog-from-prs",
       "type": "skill",
@@ -1634,6 +1776,48 @@
       "installAs": "skills/claude-settings-auditor/SKILL.md",
       "url": "https://agentscamp.com/skills/workflow/claude-settings-auditor"
     },
+    {
+      "id": "skills/cold-start-optimizer",
+      "type": "skill",
+      "slug": "cold-start-optimizer",
+      "category": "performance",
+      "title": "Cold Start Optimizer",
+      "description": "Cut cold-start latency for serverless functions and slow-booting apps by measuring the init breakdown, then attacking the dominant phase — artifact size, eager imports, eager connections, or under-provisioned memory — instead of reflexively buying provisioned concurrency. Use when serverless p99 spikes on the first request, when a function times out during init, or when scale-to-zero is hurting user-facing latency.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/cold-start-optimizer.md",
+      "installAs": "skills/cold-start-optimizer/SKILL.md",
+      "url": "https://agentscamp.com/skills/performance/cold-start-optimizer"
+    },
+    {
+      "id": "skills/connection-pool-tuner",
+      "type": "skill",
+      "slug": "connection-pool-tuner",
+      "category": "database",
+      "title": "Connection Pool Tuner",
+      "description": "Size and tune a database connection pool from the real constraint — the database's shared max_connections and its core count — so total connections (per-instance pool × instance count) stay safely under the cap and a too-large pool stops adding latency. Use when the app throws 'too many connections' or pool-acquire timeouts, when the DB is saturated by connection count, or when deploying to serverless.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/connection-pool-tuner.md",
+      "installAs": "skills/connection-pool-tuner/SKILL.md",
+      "url": "https://agentscamp.com/skills/database/connection-pool-tuner"
+    },
+    {
+      "id": "skills/contract-test-designer",
+      "type": "skill",
+      "slug": "contract-test-designer",
+      "category": "testing",
+      "title": "Contract Test Designer",
+      "description": "Design consumer-driven contract tests between services so an API provider can't break its consumers unnoticed — without slow, flaky full end-to-end environments. Use when independent services or teams integrate over an API, when integration bugs only surface in staging or prod, or when E2E suites are too slow and brittle to catch breaking API changes.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "skills/contract-test-designer.md",
+      "installAs": "skills/contract-test-designer/SKILL.md",
+      "url": "https://agentscamp.com/skills/testing/contract-test-designer"
+    },
     {
       "id": "skills/conventional-commits",
       "type": "skill",
@@ -1690,6 +1874,48 @@
       "installAs": "skills/dependency-audit/SKILL.md",
       "url": "https://agentscamp.com/skills/security/dependency-audit"
     },
+    {
+      "id": "skills/dependency-upgrade-planner",
+      "type": "skill",
+      "slug": "dependency-upgrade-planner",
+      "category": "refactor",
+      "title": "Dependency Upgrade Planner",
+      "description": "Plan and de-risk a major dependency, framework, or runtime upgrade — map the full version path, read every intermediate migration guide, and pin the breaking changes to your actual call sites instead of bumping the number and hoping. Use when a key dependency is several majors behind, when a security advisory forces an upgrade, or before a framework migration.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/dependency-upgrade-planner.md",
+      "installAs": "skills/dependency-upgrade-planner/SKILL.md",
+      "url": "https://agentscamp.com/skills/refactor/dependency-upgrade-planner"
+    },
+    {
+      "id": "skills/devcontainer-designer",
+      "type": "skill",
+      "slug": "devcontainer-designer",
+      "category": "workflow",
+      "title": "Dev Container Designer",
+      "description": "Design a reproducible dev environment (Dev Container / Docker) so onboarding is one command and 'works on my machine' dies — by detecting the project's real stack and versions, authoring a devcontainer.json (+ Dockerfile/compose) that pins the runtime to what the repo targets, wires dependent services, caches dependencies, and injects secrets instead of baking them. Use when new contributors struggle to set up the project, when environment drift causes inconsistent behavior, or when standardizing tooling across a team.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/devcontainer-designer.md",
+      "installAs": "skills/devcontainer-designer/SKILL.md",
+      "url": "https://agentscamp.com/skills/workflow/devcontainer-designer"
+    },
+    {
+      "id": "skills/distributed-tracing-instrumenter",
+      "type": "skill",
+      "slug": "distributed-tracing-instrumenter",
+      "category": "observability",
+      "title": "Distributed Tracing Instrumenter",
+      "description": "Instrument a service (or a chain of services) with OpenTelemetry so a single request can be followed end-to-end — context propagated across every hop including async/queue boundaries, spans at the boundaries that matter, deliberate trace-wide sampling, and trace_id stamped on log lines. Use when latency or failures span multiple services, when you have logs but can't reconstruct a request's full path, or when adopting OpenTelemetry.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/distributed-tracing-instrumenter.md",
+      "installAs": "skills/distributed-tracing-instrumenter/SKILL.md",
+      "url": "https://agentscamp.com/skills/observability/distributed-tracing-instrumenter"
+    },
     {
       "id": "skills/embedding-index-tuner",
       "type": "skill",
@@ -1804,6 +2030,20 @@
       "installAs": "skills/human-in-the-loop-gate/SKILL.md",
       "url": "https://agentscamp.com/skills/workflow/human-in-the-loop-gate"
     },
+    {
+      "id": "skills/idempotency-designer",
+      "type": "skill",
+      "slug": "idempotency-designer",
+      "category": "api",
+      "title": "Idempotency Designer",
+      "description": "Make unsafe, retryable API operations idempotent so a client retry or a network hiccup can't double-charge, double-create, or double-send — design a client-supplied idempotency key, an atomic store-and-check (unique constraint or conditional write), in-flight conflict handling, and a retention policy. Use when a POST/mutation can be retried (payments, order creation, sends, webhooks), or when duplicate side effects have already shown up in production.",
+      "topics": [
+        "architecture"
+      ],
+      "file": "skills/idempotency-designer.md",
+      "installAs": "skills/idempotency-designer/SKILL.md",
+      "url": "https://agentscamp.com/skills/api/idempotency-designer"
+    },
     {
       "id": "skills/llm-as-judge-scorer",
       "type": "skill",
@@ -1889,6 +2129,20 @@
       "installAs": "skills/mcp-server-scaffolder/SKILL.md",
       "url": "https://agentscamp.com/skills/api/mcp-server-scaffolder"
     },
+    {
+      "id": "skills/memory-leak-hunter",
+      "type": "skill",
+      "slug": "memory-leak-hunter",
+      "category": "performance",
+      "title": "Memory Leak Hunter",
+      "description": "Find and fix a memory leak in a running app: confirm it's a real leak under steady load, diff two heap snapshots to name the growing object and its retention path, cut the root reference that blocks collection, and re-run to confirm memory plateaus. Use when RSS climbs until OOM/restart, heap grows unbounded across a steady workload, or GC pauses worsen the longer the process runs.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "skills/memory-leak-hunter.md",
+      "installAs": "skills/memory-leak-hunter/SKILL.md",
+      "url": "https://agentscamp.com/skills/performance/memory-leak-hunter"
+    },
     {
       "id": "skills/migration-writer",
       "type": "skill",
@@ -1932,6 +2186,20 @@
       "installAs": "skills/multimodal-document-extractor/SKILL.md",
       "url": "https://agentscamp.com/skills/data/multimodal-document-extractor"
     },
+    {
+      "id": "skills/mutation-test-runner",
+      "type": "skill",
+      "slug": "mutation-test-runner",
+      "category": "testing",
+      "title": "Mutation Test Runner",
+      "description": "Measure whether a test suite actually catches bugs by running mutation testing — introduce small faults into the code and check which ones a test kills versus which slip through silently. Use when line coverage is high but bugs still ship, when you suspect tests assert weakly, or to find the exact assertions a suite is missing.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "skills/mutation-test-runner.md",
+      "installAs": "skills/mutation-test-runner/SKILL.md",
+      "url": "https://agentscamp.com/skills/testing/mutation-test-runner"
+    },
     {
       "id": "skills/openapi-doc-writer",
       "type": "skill",
@@ -1946,6 +2214,20 @@
       "installAs": "skills/openapi-doc-writer/SKILL.md",
       "url": "https://agentscamp.com/skills/docs/openapi-doc-writer"
     },
+    {
+      "id": "skills/pagination-designer",
+      "type": "skill",
+      "slug": "pagination-designer",
+      "category": "api",
+      "title": "Pagination Designer",
+      "description": "Design correct, scalable pagination (plus the filtering and sorting that ride with it) for a list endpoint — pick cursor (keyset) vs offset and justify it, define an opaque cursor with a unique tiebreaker so no row is skipped or repeated, return a consistent envelope, bound page size, and name the indexes the sort actually needs. Use when adding a list endpoint, when OFFSET pagination crawls on a large table, or when clients see duplicate or missing rows while paging.",
+      "topics": [
+        "architecture"
+      ],
+      "file": "skills/pagination-designer.md",
+      "installAs": "skills/pagination-designer/SKILL.md",
+      "url": "https://agentscamp.com/skills/api/pagination-designer"
+    },
     {
       "id": "skills/plugin-scaffolder",
       "type": "skill",
@@ -2045,6 +2327,20 @@
       "installAs": "skills/prompt-regression-tester/SKILL.md",
       "url": "https://agentscamp.com/skills/data/prompt-regression-tester"
     },
+    {
+      "id": "skills/property-test-designer",
+      "type": "skill",
+      "slug": "property-test-designer",
+      "category": "testing",
+      "title": "Property Test Designer",
+      "description": "Design property-based tests — generate hundreds of random inputs and assert invariants that must hold for ALL of them — to surface the edge cases hand-picked examples never reach. Use when code has a large input space (parsers, serializers, encoders, math, data transforms), when a bug keeps slipping through despite green example tests, or when you can't enumerate every case worth checking.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "skills/property-test-designer.md",
+      "installAs": "skills/property-test-designer/SKILL.md",
+      "url": "https://agentscamp.com/skills/testing/property-test-designer"
+    },
     {
       "id": "skills/provider-fallback-wrapper",
       "type": "skill",
@@ -2073,6 +2369,20 @@
       "installAs": "skills/qlora-finetune-runner/SKILL.md",
       "url": "https://agentscamp.com/skills/data/qlora-finetune-runner"
     },
+    {
+      "id": "skills/query-plan-analyzer",
+      "type": "skill",
+      "slug": "query-plan-analyzer",
+      "category": "database",
+      "title": "Query Plan Analyzer",
+      "description": "Read a slow query's execution plan and turn it into a concrete fix — the exact index to add, the rewrite, or the ANALYZE to run — by getting the REAL plan with EXPLAIN ANALYZE (actual rows + timing, not estimates), finding the offending node, and confirming the fix removes it. Use when one specific query is slow and you need to know WHY, not just that it is.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/query-plan-analyzer.md",
+      "installAs": "skills/query-plan-analyzer/SKILL.md",
+      "url": "https://agentscamp.com/skills/database/query-plan-analyzer"
+    },
     {
       "id": "skills/rate-limiter-designer",
       "type": "skill",
@@ -2117,6 +2427,20 @@
       "installAs": "skills/readme-generator/SKILL.md",
       "url": "https://agentscamp.com/skills/docs/readme-generator"
     },
+    {
+      "id": "skills/runbook-writer",
+      "type": "skill",
+      "slug": "runbook-writer",
+      "category": "docs",
+      "title": "Runbook Writer",
+      "description": "Write an operational runbook a half-asleep on-call engineer can execute at 3am — scoped to ONE alert, leading with how to confirm the problem, the copy-pasteable mitigation that stops user pain, then diagnosis, escalation, and verification. Use when an alert has no documented response, after an incident exposed a missing procedure, or when standing up on-call for a service.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/runbook-writer.md",
+      "installAs": "skills/runbook-writer/SKILL.md",
+      "url": "https://agentscamp.com/skills/docs/runbook-writer"
+    },
     {
       "id": "skills/secret-scanner",
       "type": "skill",
@@ -2131,6 +2455,34 @@
       "installAs": "skills/secret-scanner/SKILL.md",
       "url": "https://agentscamp.com/skills/security/secret-scanner"
     },
+    {
+      "id": "skills/security-headers-hardener",
+      "type": "skill",
+      "slug": "security-headers-hardener",
+      "category": "security",
+      "title": "Security Headers Hardener",
+      "description": "Audit and harden a web app's or API's HTTP security headers — Content-Security-Policy, HSTS, X-Content-Type-Options, frame-ancestors, Referrer-Policy, Permissions-Policy, and CORS — and produce a staged rollout that won't break the site. Use before a launch, during a security pass, or when a scanner (Mozilla Observatory, securityheaders.com, a pentest) flags missing or weak headers. Audits and edits header config; rolls CSP out Report-Only first.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "skills/security-headers-hardener.md",
+      "installAs": "skills/security-headers-hardener/SKILL.md",
+      "url": "https://agentscamp.com/skills/security/security-headers-hardener"
+    },
+    {
+      "id": "skills/semantic-cache-designer",
+      "type": "skill",
+      "slug": "semantic-cache-designer",
+      "category": "data",
+      "title": "Semantic Cache Designer",
+      "description": "Design a semantic cache for LLM responses — serve a cached answer when a new query is similar enough to a past one — to cut cost and latency on repetitive traffic, with the similarity threshold calibrated on real query pairs and a cache key that prevents cross-user/model leaks. Use when an LLM app sees many near-duplicate prompts (FAQs, support, search), when token spend on repetitive queries is high, or when latency on common questions matters.",
+      "topics": [
+        "llm-app-dev"
+      ],
+      "file": "skills/semantic-cache-designer.md",
+      "installAs": "skills/semantic-cache-designer/SKILL.md",
+      "url": "https://agentscamp.com/skills/data/semantic-cache-designer"
+    },
     {
       "id": "skills/semver-advisor",
       "type": "skill",
@@ -2145,6 +2497,20 @@
       "installAs": "skills/semver-advisor/SKILL.md",
       "url": "https://agentscamp.com/skills/release/semver-advisor"
     },
+    {
+      "id": "skills/slo-definer",
+      "type": "skill",
+      "slug": "slo-definer",
+      "category": "observability",
+      "title": "SLO Definer",
+      "description": "Turn a vague reliability goal into concrete SLIs, SLOs, an error budget, and burn-rate alerts — service-level indicators measured at the user-facing boundary, targets over a rolling window, and a written policy for what happens when the budget runs out. Use when a service has no defined reliability target, when on-call is noisy and alert-fatigued, or before you commit to an SLA you can't measure.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/slo-definer.md",
+      "installAs": "skills/slo-definer/SKILL.md",
+      "url": "https://agentscamp.com/skills/observability/slo-definer"
+    },
     {
       "id": "skills/sql-optimizer",
       "type": "skill",
@@ -2159,6 +2525,34 @@
       "installAs": "skills/sql-optimizer/SKILL.md",
       "url": "https://agentscamp.com/skills/data/sql-optimizer"
     },
+    {
+      "id": "skills/strangler-fig-migrator",
+      "type": "skill",
+      "slug": "strangler-fig-migrator",
+      "category": "refactor",
+      "title": "Strangler Fig Migrator",
+      "description": "Plan the incremental replacement of a legacy module or service using the strangler-fig pattern — grow new code around the old behind an interception seam until the old is dead, instead of a big-bang rewrite. Use when a legacy system is too risky to rewrite at once, or when migrating off a deprecated framework/dependency gradually while staying shippable and rollback-able at every step.",
+      "topics": [
+        "architecture"
+      ],
+      "file": "skills/strangler-fig-migrator.md",
+      "installAs": "skills/strangler-fig-migrator/SKILL.md",
+      "url": "https://agentscamp.com/skills/refactor/strangler-fig-migrator"
+    },
+    {
+      "id": "skills/structured-logging-designer",
+      "type": "skill",
+      "slug": "structured-logging-designer",
+      "category": "observability",
+      "title": "Structured Logging Designer",
+      "description": "Design a structured (JSON) logging strategy with a stable field schema, correlation-ID propagation, and a disciplined level policy — then migrate ad-hoc string logs toward it. Use when logs are unsearchable plain text, when debugging a request across services means grepping multiple log streams by hand, or when standing up logging for a new service.",
+      "topics": [
+        "devops-infra"
+      ],
+      "file": "skills/structured-logging-designer.md",
+      "installAs": "skills/structured-logging-designer/SKILL.md",
+      "url": "https://agentscamp.com/skills/observability/structured-logging-designer"
+    },
     {
       "id": "skills/test-scaffolder",
       "type": "skill",
@@ -2173,6 +2567,34 @@
       "installAs": "skills/test-scaffolder/SKILL.md",
       "url": "https://agentscamp.com/skills/testing/test-scaffolder"
     },
+    {
+      "id": "skills/threat-model-builder",
+      "type": "skill",
+      "slug": "threat-model-builder",
+      "category": "security",
+      "title": "Threat Model Builder",
+      "description": "Build a practical threat model for a feature or system using STRIDE — diagram the data flow, mark trust boundaries, enumerate concrete threats where data crosses them, and prioritize by likelihood × impact so security is reasoned about before shipping instead of bolted on after. Use when designing a feature that touches auth, money, or sensitive data, running a security design review, or hardening before a launch.",
+      "topics": [
+        "review-qa"
+      ],
+      "file": "skills/threat-model-builder.md",
+      "installAs": "skills/threat-model-builder/SKILL.md",
+      "url": "https://agentscamp.com/skills/security/threat-model-builder"
+    },
+    {
+      "id": "skills/token-usage-profiler",
+      "type": "skill",
+      "slug": "token-usage-profiler",
+      "category": "data",
+      "title": "Token Usage Profiler",
+      "description": "Measure and attribute LLM token usage and cost across an app — input vs output tokens by feature, route, model, and tenant — then rank the waste and the specific lever to cut it. Use when LLM spend is high or climbing with no clear cause, before scaling a feature that calls a model, or when you need per-feature or per-tenant cost attribution for billing or budgets.",
+      "topics": [
+        "llm-app-dev"
+      ],
+      "file": "skills/token-usage-profiler.md",
+      "installAs": "skills/token-usage-profiler/SKILL.md",
+      "url": "https://agentscamp.com/skills/data/token-usage-profiler"
+    },
     {
       "id": "skills/tool-definition-generator",
       "type": "skill",

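The manifest hunks above follow a visible pattern: `counts` grows in step with the added `items` entries, each `id` and `file` is derived from the type and slug, and skills install to `<slug>/SKILL.md` while agents and commands install as flat files. A minimal TypeScript sketch of a consistency check over those fields could look like the following. The type names, `PLURAL`, `expectedInstallAs`, and `checkManifest` are hypothetical helpers written for illustration; they are not part of the agentscamp package, and the path convention is inferred from the entries shown in this diff rather than from any documented rule.

```typescript
// Hypothetical types covering only the fields visible in the diff above.
type ItemType = "agent" | "skill" | "command";

interface ManifestItem {
  id: string;
  type: ItemType;
  slug: string;
  file: string;
  installAs: string;
}

interface ManifestCounts {
  agents: number;
  skills: number;
  commands: number;
}

interface Manifest {
  schemaVersion: number;
  counts: ManifestCounts;
  items: ManifestItem[];
}

// Maps an item type to its plural directory / counts key.
const PLURAL: Record<ItemType, keyof ManifestCounts> = {
  agent: "agents",
  skill: "skills",
  command: "commands",
};

// Assumption inferred from the diff: skills install into a per-skill
// folder as SKILL.md; agents and commands install as flat .md files.
function expectedInstallAs(item: ManifestItem): string {
  const dir = PLURAL[item.type];
  return item.type === "skill"
    ? `${dir}/${item.slug}/SKILL.md`
    : `${dir}/${item.slug}.md`;
}

// Returns a list of human-readable problems; an empty list means consistent.
function checkManifest(manifest: Manifest): string[] {
  const problems: string[] = [];
  const tally: ManifestCounts = { agents: 0, skills: 0, commands: 0 };

  for (const item of manifest.items) {
    const dir = PLURAL[item.type];
    tally[dir] += 1;
    if (item.id !== `${dir}/${item.slug}`) {
      problems.push(`${item.id}: id does not match type and slug`);
    }
    if (item.file !== `${dir}/${item.slug}.md`) {
      problems.push(`${item.id}: unexpected file path ${item.file}`);
    }
    if (item.installAs !== expectedInstallAs(item)) {
      problems.push(`${item.id}: unexpected installAs ${item.installAs}`);
    }
  }

  for (const key of Object.keys(tally) as (keyof ManifestCounts)[]) {
    if (tally[key] !== manifest.counts[key]) {
      problems.push(
        `counts.${key} is ${manifest.counts[key]} but items contain ${tally[key]}`
      );
    }
  }
  return problems;
}

// Tiny sample built from two entries that appear in the diff.
const sample: Manifest = {
  schemaVersion: 1,
  counts: { agents: 0, skills: 1, commands: 1 },
  items: [
    {
      id: "commands/add-caching",
      type: "command",
      slug: "add-caching",
      file: "commands/add-caching.md",
      installAs: "commands/add-caching.md",
    },
    {
      id: "skills/slo-definer",
      type: "skill",
      slug: "slo-definer",
      file: "skills/slo-definer.md",
      installAs: "skills/slo-definer/SKILL.md",
    },
  ],
};

console.log(checkManifest(sample)); // prints []
```

Run against the real `manifest.json`, a check like this would flag a release where `counts` was bumped without the matching `items` entries, or where a skill was given a flat install path.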
package/content/skills/agent-trajectory-evaluator.md ADDED

@@ -0,0 +1,59 @@
+---
+name: "agent-trajectory-evaluator"
+description: "Evaluate a multi-step AI agent's whole run — tool calls, intermediate steps, and final result — not just final-answer correctness, so you can pinpoint WHERE it went wrong. Use when building or debugging a tool-using or multi-step agent, when final-answer-only evals can't explain failures, or when a prompt/model change quietly makes the agent less efficient or more error-prone even though the answer still looks right."
+allowed-tools: "Read, Grep, Glob, Bash"
+version: 1.0.0
+---
+Final-answer evals tell you the agent failed; they don't tell you *where*. An agent that returns the right number might have called the wrong tool first, looped on a flaky API, or stumbled into the answer through a path that collapses on the next input. This skill makes the agent's **process** inspectable: capture the full trajectory — every decision, tool call, argument, and result — then score it on the axes that actually predict failure, asserting what's checkable and judging only what isn't.
+## When to use this skill
+- You're building or debugging a tool-using / multi-step agent and a final-answer eval says "wrong" without saying why.
+- A prompt or model change kept the answers correct, but you suspect the agent got slower, loops more, or recovers worse, and you need to prove it.
+- You're adding a new tool and want to confirm the agent selects it correctly instead of brute-forcing with the old one.
+- Failures are intermittent and you can't tell whether the agent is fragile (lucky path) or robust (sound path).
+## Instructions
+1. **Capture the full trajectory as a structured, replayable log — one record per step.** Final-answer-only logging is the root cause of un-diagnosable failures. Each step records: the model's decision (the assistant turn, including thinking-block summaries if present), the tool called and its exact arguments, the raw tool result (success/error), and any externalized state (files written, working dir, retry count). Use a stable schema so two runs diff cleanly:
+   ```json
+   {"run_id": "...", "task_id": "...", "step": 3,
+    "decision": "call search_orders to find the open order",
+    "tool": "search_orders", "args": {"customer_id": "C-118", "status": "open"},
+    "result": {"ok": true, "rows": 2}, "is_error": false,
+    "latency_ms": 410, "state": {"retries": 0}}
+   ```
+   Pull this from your agent loop's tool-call records (or the Managed Agents event stream: `agent.tool_use` / `agent.tool_result` / `agent.custom_tool_use` events carry tool name, input, and result). Persist trajectories to disk so a baseline run is a diffable artifact, not a console scroll-by.
+2. **Build a fixed, version-controlled eval set of representative tasks, and deliberately include trap tasks.** A good set has three buckets: (a) routine tasks the agent should handle cleanly, (b) tasks that *require* tool use (the answer isn't in the prompt, so the agent must select and call the right tool), and (c) tasks engineered to trip a known failure mode: a tool that returns an error on the first call (does it recover?), an ambiguous request (does it loop?), a distractor tool that looks relevant but is wrong (does it mis-select?). Pin the set; an eval set that drifts can't catch regressions. Each task carries its own expected-trajectory assertions, which steps 3 and 4 define.
+3. **Score every trajectory on five axes, not one.** Final-answer correctness is necessary but insufficient. For each task, evaluate:
+   - **Tool selection** — did it call the right tool for each sub-goal? (mis-selection often produces a right answer via a wrong, slow path)
+   - **Argument correctness** — were the tool arguments right? (a `status: "open"` typo'd to `status: "all"` can still return the target row by luck)
+   - **Step efficiency** — did it stay within a step budget, or did it repeat calls, loop, or take a needless detour? Measure against a per-task budget, not a global one.
+   - **Error recovery** — when a tool returned an error, did the agent recover sensibly (retry once, switch approach) or thrash / give up?
+   - **Goal completion** — did it actually finish the task, distinct from "the final text looks plausible"?
+4. **Split scoring into programmatic assertions and a narrow LLM-judge — assert everything you can.** An LLM-judge over a whole trajectory is noisy and expensive, and it will rationalize a broken path. So check the deterministic axes with code: exact tool-name assertions, argument equality (or schema match), and step-count budgets are all plain comparisons against the trajectory you captured.
+   ```python
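+   # `trajectory` is the list of per-step records from step 1; `task` is the eval-set entry.
+   tools = [s["tool"] for s in trajectory]
+   # Tool selection: the first call must be the expected tool.
+   assert tools[0] == "search_orders", f"wrong first tool: {tools[0]}"
+   # Argument correctness: exact match on the argument that matters.
+   assert trajectory[0]["args"]["status"] == "open"
+   # Step efficiency: stay within the per-task budget.
+   assert len(trajectory) <= task["step_budget"], f"{len(trajectory)} steps > budget"
+   # Error recovery: neither of the last two steps may be an error.
+   assert not any(s["is_error"] for s in trajectory[-2:]), "ended on an error"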
+   ```
+   Reserve the LLM-judge for the genuinely subjective steps only — "was this reasoning step sound given the prior result?", "was this summary faithful to the tool output?" — and judge **one step at a time** with the step's inputs in context, not the entire run. Default both the agent-under-test and the judge to the latest, most capable Claude model (`claude-opus-4-8`); use a *different* sample or framing for the judge so it isn't grading its own twin, and keep the judge's rubric to one criterion per call.
+5. **Diff every candidate trajectory against a stored baseline and report the regressions.** This is what catches the silent ones. After a prompt or model change, re-run the fixed eval set and compare trajectory-for-trajectory against the baseline: tools added/removed/reordered, argument changes, step-count delta, new error-recovery loops, latency delta. A change that keeps the final answer correct but adds two steps, introduces a retry loop, or swaps a precise tool for a brute-force one is a **regression** — surface it even though the answer still passes. Promote a candidate to the new baseline only when the diff is empty or every change is reviewed and intended.
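The baseline comparison described in step 5 could be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: it assumes each step record carries the `tool`, `args`, `is_error`, and `latency_ms` fields from the schema in step 1, and the names `diff_trajectories`, `needs_review`, and the tool `list_all_orders` are hypothetical, introduced only for this example. Latency is reported but deliberately excluded from the review trigger, on the assumption that it varies from run to run.

```python
from typing import Any

Step = dict[str, Any]

# Fields whose change alters the agent's process and so needs a human look.
STRUCTURAL_KEYS = (
    "tools_added", "tools_removed", "tool_sequence_changed",
    "arg_changes", "step_delta", "error_delta",
)


def diff_trajectories(baseline: list[Step], candidate: list[Step]) -> dict[str, Any]:
    """Compare two runs of the same task and report process-level changes."""
    base_tools = [s["tool"] for s in baseline]
    cand_tools = [s["tool"] for s in candidate]

    # Arguments are only comparable where both runs called the same tool
    # at the same position in the trajectory.
    arg_changes = [
        {"step": i, "tool": b["tool"], "baseline": b["args"], "candidate": c["args"]}
        for i, (b, c) in enumerate(zip(baseline, candidate))
        if b["tool"] == c["tool"] and b["args"] != c["args"]
    ]

    return {
        "tools_added": sorted(set(cand_tools) - set(base_tools)),
        "tools_removed": sorted(set(base_tools) - set(cand_tools)),
        "tool_sequence_changed": base_tools != cand_tools,
        "arg_changes": arg_changes,
        "step_delta": len(candidate) - len(baseline),
        "error_delta": sum(s["is_error"] for s in candidate)
        - sum(s["is_error"] for s in baseline),
        "latency_delta_ms": sum(s["latency_ms"] for s in candidate)
        - sum(s["latency_ms"] for s in baseline),
    }


def needs_review(diff: dict[str, Any]) -> bool:
    """True when any structural field changed; latency alone is treated as noise."""
    return any(diff[key] for key in STRUCTURAL_KEYS)


# Example: the candidate adds a brute-force call before the precise one.
# `list_all_orders` is a made-up tool name used only for illustration.
step = {"tool": "search_orders", "args": {"customer_id": "C-118", "status": "open"},
        "is_error": False, "latency_ms": 410}
baseline = [step]
candidate = [
    {"tool": "list_all_orders", "args": {}, "is_error": False, "latency_ms": 900},
    step,
]
report = diff_trajectories(baseline, candidate)
print(report["tools_added"], report["step_delta"], needs_review(report))
# prints: ['list_all_orders'] 1 True
```

In this sketch the final answer could be identical in both runs, yet the report still flags the extra tool and the extra step, which is the kind of silent regression the step is meant to surface.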
+> [!WARNING]
+> Grading only the final answer hides process failures. An agent can reach the right answer through a path that is broken, expensive, or lucky — wrong tool, redundant loop, a crash it recovered from by chance — and that path will break on the very next input. The final answer being correct is *not* evidence the agent worked correctly.
+> [!WARNING]
+> An LLM-judge over a whole trajectory is noisy and tends to rationalize whatever path it sees. Assert the checkable steps — tool names, argument values, step counts — with code, and give the judge exactly one subjective step and one criterion at a time. A judge asked "was this whole run good?" will hand-wave; a judge asked "was *this* summary faithful to *this* tool output?" gives a usable signal.
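One way to keep the judge narrow, as the warning above recommends, is to build its prompt from a single step record and a single criterion. The sketch below is an assumption-laden illustration: `build_judge_prompt` and `judge_step` are hypothetical helpers, the PASS/FAIL reply format is a convention chosen for this example, and `ask_model` stands in for whatever function sends a prompt to your judge model and returns its text reply (no real client API is shown).

```python
import json
from typing import Any, Callable


def build_judge_prompt(step: dict[str, Any], criterion: str) -> str:
    """Render exactly one step and one criterion into a judge prompt."""
    return (
        "You are grading a single step of an agent run.\n"
        f"Criterion: {criterion}\n"
        f"Agent decision: {step['decision']}\n"
        f"Tool called: {step['tool']} with args "
        f"{json.dumps(step['args'], sort_keys=True)}\n"
        f"Raw tool result: {json.dumps(step['result'], sort_keys=True)}\n"
        "Answer with exactly PASS or FAIL on the first line, "
        "then one sentence of justification."
    )


def judge_step(step: dict[str, Any], criterion: str,
               ask_model: Callable[[str], str]) -> bool:
    """Ask the judge about one step; refuse to guess on an unparseable reply."""
    reply = ask_model(build_judge_prompt(step, criterion))
    lines = reply.strip().splitlines()
    verdict = lines[0].strip().upper() if lines else ""
    if verdict not in {"PASS", "FAIL"}:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return verdict == "PASS"


# Example with a stub in place of a real model call (assumption: your own
# wrapper around the judge model has the same str -> str shape).
example_step = {
    "decision": "call search_orders to find the open order",
    "tool": "search_orders",
    "args": {"customer_id": "C-118", "status": "open"},
    "result": {"ok": True, "rows": 2},
}


def stub_judge(prompt: str) -> str:
    return "PASS\nThe decision matches the tool that was called."


verdict = judge_step(
    example_step,
    "Is the stated decision consistent with the tool call that followed?",
    stub_judge,
)
```

Raising on an unparseable reply, rather than defaulting to pass or fail, keeps a vague judge answer from being silently counted as a signal.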
+## Output
+- **Trajectory schema** — the per-step record (decision, tool, args, result, is_error, latency, state) and where each field comes from in your agent loop or event stream.
+- **Per-axis rubric** — the five axes (tool selection, argument correctness, step efficiency, error recovery, goal completion) with the concrete check for each task.
+- **Assertion-vs-judge split** — the deterministic assertions written as code, and the short list of subjective steps routed to a single-criterion LLM-judge (agent and judge both on `claude-opus-4-8`).
+- **Baseline-diff regression report** — a per-task diff of the candidate run against the stored baseline (tools reordered/added/removed, arg changes, step-count and latency deltas, new recovery loops), flagging every regression even where the final answer still passes, plus a verdict on whether to promote the candidate to baseline.
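As a rough illustration of the per-axis rubric listed above, the five axes can each be reduced to a deterministic pass/fail check when the task pins enough expectations. This is a sketch under stated assumptions, not the skill's prescribed scorer: the task fields `expected_tools`, `expected_args`, `max_consecutive_errors`, and `goal_check` are hypothetical names invented here (only `step_budget` appears in the original assertions), and the error-recovery rule is one possible reading of "recover sensibly".

```python
from typing import Any

Step = dict[str, Any]


def score_axes(trajectory: list[Step], task: dict[str, Any]) -> dict[str, bool]:
    """Return a pass/fail verdict per axis using only deterministic checks."""
    tools = [s["tool"] for s in trajectory]

    # Tool selection: the expected tools must appear in order. Extra calls
    # are tolerated here and penalized by the step budget instead.
    remaining = iter(tools)
    tool_selection = all(name in remaining for name in task["expected_tools"])

    # Argument correctness: every call to a tool with pinned args must match.
    pinned = task["expected_args"]
    argument_correctness = all(
        s["args"] == pinned[s["tool"]] for s in trajectory if s["tool"] in pinned
    )

    # Step efficiency: measured against the per-task budget.
    step_efficiency = len(trajectory) <= task["step_budget"]

    # Error recovery: no long run of back-to-back errors (thrashing), and the
    # run must not end on an error (giving up).
    longest = current = 0
    for s in trajectory:
        current = current + 1 if s["is_error"] else 0
        longest = max(longest, current)
    error_recovery = (
        longest <= task.get("max_consecutive_errors", 1)
        and not (trajectory and trajectory[-1]["is_error"])
    )

    # Goal completion: a task-specific check on the captured run, not on
    # whether the final text looks plausible.
    goal_completion = bool(task["goal_check"](trajectory))

    return {
        "tool_selection": tool_selection,
        "argument_correctness": argument_correctness,
        "step_efficiency": step_efficiency,
        "error_recovery": error_recovery,
        "goal_completion": goal_completion,
    }


# Example: the first call errors, the agent retries once and succeeds.
args = {"customer_id": "C-118", "status": "open"}
run = [
    {"tool": "search_orders", "args": args, "result": {"ok": False}, "is_error": True},
    {"tool": "search_orders", "args": args, "result": {"ok": True, "rows": 2},
     "is_error": False},
]
task = {
    "expected_tools": ["search_orders"],
    "expected_args": {"search_orders": args},
    "step_budget": 3,
    "goal_check": lambda t: t[-1]["result"].get("rows") == 2,
}
scores = score_axes(run, task)
```

Keeping the result as one boolean per axis means a failing task reports which axis failed, which is what makes the rubric useful for locating where a run went wrong.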