agentscamp 0.2.1 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +4 -4
- package/content/agents/ci-cd-engineer.md +95 -0
- package/content/agents/cli-tooling-engineer.md +47 -0
- package/content/agents/context-engineer.md +68 -0
- package/content/agents/csharp-pro.md +73 -0
- package/content/agents/database-architect.md +90 -0
- package/content/agents/eval-driven-developer.md +47 -0
- package/content/agents/incident-responder.md +77 -0
- package/content/agents/java-pro.md +73 -0
- package/content/agents/qa-automation-engineer.md +92 -0
- package/content/commands/add-caching.md +79 -0
- package/content/commands/audit-accessibility.md +101 -0
- package/content/commands/clean-branches.md +113 -0
- package/content/commands/generate-e2e-test.md +98 -0
- package/content/commands/review-tests.md +98 -0
- package/content/commands/scaffold-dockerfile.md +111 -0
- package/content/commands/scaffold-github-action.md +94 -0
- package/content/commands/seed-data.md +63 -0
- package/content/commands/setup-precommit-hooks.md +72 -0
- package/content/commands/write-design-doc.md +78 -0
- package/content/manifest.json +436 -4
- package/content/skills/architecture-diagram-generator.md +78 -0
- package/content/skills/connection-pool-tuner.md +46 -0
- package/content/skills/dependency-upgrade-planner.md +42 -0
- package/content/skills/github-actions-optimizer.md +45 -0
- package/content/skills/load-test-designer.md +87 -0
- package/content/skills/memory-leak-hunter.md +35 -0
- package/content/skills/pagination-designer.md +51 -0
- package/content/skills/property-test-designer.md +63 -0
- package/content/skills/security-headers-hardener.md +79 -0
- package/content/skills/slo-definer.md +38 -0
- package/content/skills/structured-logging-designer.md +42 -0
- package/package.json +1 -1
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: "slo-definer"
|
|
3
|
+
description: "Turn a vague reliability goal into concrete SLIs, SLOs, an error budget, and burn-rate alerts — service-level indicators measured at the user-facing boundary, targets over a rolling window, and a written policy for what happens when the budget runs out. Use when a service has no defined reliability target, when on-call is noisy and alert-fatigued, or before you commit to an SLA you can't measure."
|
|
4
|
+
allowed-tools: "Read, Grep, Glob"
|
|
5
|
+
version: 1.0.0
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
"Make it reliable" can't be measured, can't be alerted on, and can't tell you when to stop shipping. This skill converts a reliability intention into four artifacts that can: **SLIs** that measure what users actually experience, **SLOs** that set a target over a window, an **error budget** with a written policy for spending and exhausting it, and **burn-rate alerts** that page when the budget is genuinely at risk. The output is a spec, not a dashboard — a contract the team and on-call can both point at.
|
|
9
|
+
|
|
10
|
+
## When to use this skill
|
|
11
|
+
|
|
12
|
+
- A service is "important" but has no defined reliability target, so nobody can say whether last week was good or bad.
|
|
13
|
+
- On-call is drowning in pages that don't correspond to user pain — alert fatigue from threshold blips on CPU, memory, or a single 5xx.
|
|
14
|
+
- You're about to sign an SLA and need an internal SLO (tighter, measurable) to back it before you promise anything externally.
|
|
15
|
+
- You have dashboards full of metrics but can't answer "are users having a good time right now, and how much room do we have left to break things?"
|
|
16
|
+
|
|
17
|
+
## Instructions
|
|
18
|
+
|
|
19
|
+
1. **Identify the user and the boundary first.** An SLI measures the experience of a consumer (end user, calling service) at a specific boundary — the load balancer, the API gateway, the client SDK. Measure as close to the user as you can: a 200 at the app server while the CDN returns 502s is a lie. Name the boundary explicitly before picking metrics.
|
|
20
|
+
2. **Pick the few SLIs that reflect that experience.** Choose from the request/response SLI families: **availability** (good-event ratio: non-5xx, non-timeout responses ÷ total valid requests), **latency** (fraction of requests served under a threshold at a percentile), and for data systems **freshness** (fraction of reads no older than N seconds) or **correctness/coverage**. Two or three SLIs per service is plenty — more dilutes the signal.
|
|
21
|
+
3. **Write each SLI as an explicit good-event criterion.** Spell out what counts as a good event, what's in the denominator, and what's excluded. Example: `latency SLI = (requests with TTFB < 300ms) / (all non-400 requests at the gateway)`. Exclude client errors (4xx) and load-test traffic from the denominator — they aren't the service failing — but say so in writing.
|
|
22
|
+
4. **Set the SLO as a target over a rolling window grounded in user need.** Format: "X% of [good events] over [rolling window]" — e.g. `99.9% of requests succeed over 28 days`. Use a **rolling** window (28 days is common) rather than calendar months so the number can't be gamed by a quiet week. Pick the lowest target users genuinely won't notice; if you can't justify the extra nine from user impact, don't pay for it.
|
|
23
|
+
5. **Derive the error budget and write its spend policy.** The budget is `1 − SLO` over the window: a 99.9% SLO allows 0.1% bad events — for 28 days that's ~40 minutes of total unavailability, or 0.1% of requests. State who may spend it (experiments, risky migrations, planned maintenance all draw down the same budget) and the **exhaustion rule in writing**: when the budget is gone, risky changes freeze and reliability work takes priority until the window recovers. A budget with no consequence is just a number.
|
|
24
|
+
6. **Tie alerts to burn rate, not to thresholds.** Alert on how fast the budget is being consumed relative to the window. Run two: a **fast-burn** alert (e.g. 14.4× burn over 1 hour = ~2% of a 28-day budget gone in an hour → page now) and a **slow-burn** alert (e.g. ~3× burn over 6 hours → ticket, not a page). This makes a page mean "the budget is at risk," with high precision and low noise, instead of "5xx crossed 5 for 30 seconds."
|
|
25
|
+
7. **Sanity-check against history before committing.** Read recent latency/error data (logs, metrics exports) and confirm the proposed SLO is currently *achievable* and *meaningful* — not already breached every week (unattainable, so it'll be ignored) and not trivially met with 100× headroom (no signal). Adjust the target to the real distribution.
|
|
26
|
+
|
|
27
|
+
> [!WARNING]
|
|
28
|
+
> A 100% SLO is a trap: it leaves zero error budget, so every deploy is a potential breach and the only "safe" move is to never change the system. The gap below 100% is precisely the room you have to ship, experiment, and do maintenance — design it in deliberately.
|
|
29
|
+
|
|
30
|
+
> [!WARNING]
|
|
31
|
+
> Averages hide the tail. A 200ms *average* latency is consistent with 5% of users waiting 4 seconds — and the tail is where users churn. Always state latency SLIs as a percentile (p95/p99 served under a threshold), never as a mean.
|
|
32
|
+
|
|
33
|
+
> [!NOTE]
|
|
34
|
+
> System metrics are not SLIs. CPU, memory, disk, and queue depth are *causes*, useful for debugging, but a user never files a ticket about your CPU. SLIs live at the user-facing boundary; keep host metrics on the diagnosis dashboard, out of the SLO spec.
|
|
35
|
+
|
|
36
|
+
## Output
|
|
37
|
+
|
|
38
|
+
A reliability spec containing: (1) **SLI definitions** — for each, what's measured, the boundary it's measured at, and the exact good-event criterion (numerator/denominator + exclusions); (2) **SLO targets** — the percentage and rolling window per SLI, with the user-impact rationale; (3) the **error budget** — `1 − SLO` translated into concrete allowance (minutes and/or request count over the window) plus the written spend-and-exhaustion policy; and (4) the **burn-rate alert thresholds** — fast-burn (page) and slow-burn (ticket) multipliers and look-back windows. Reproducible: the same spec can be re-derived and re-checked against fresh data each quarter.
|
|
@@ -0,0 +1,42 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: "structured-logging-designer"
|
|
3
|
+
description: "Design a structured (JSON) logging strategy with a stable field schema, correlation-ID propagation, and a disciplined level policy — then migrate ad-hoc string logs toward it. Use when logs are unsearchable plain text, when debugging a request across services means grepping multiple log streams by hand, or when standing up logging for a new service."
|
|
4
|
+
allowed-tools: "Read, Grep, Glob, Edit"
|
|
5
|
+
version: 1.0.0
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
A log line like `"user 42 failed to checkout"` answers nothing you can query: you can't filter by user, can't join it to the request that produced it, can't alert on it. Structured logging makes every line a queryable record — fields, not prose — so "show me every ERROR for tenant X in the last hour, with the request ID" is a query instead of a grep across five files. This skill designs that schema, threads a correlation ID through a request so a single flow is reconstructable across services, sets a level policy you can actually act on, and redacts secrets at the boundary — then rewrites representative statements so the team has a concrete pattern to copy.
|
|
9
|
+
|
|
10
|
+
## When to use this skill
|
|
11
|
+
|
|
12
|
+
- Logs are plain text and unsearchable — you grep for substrings instead of filtering on fields, and you can't build a dashboard or alert from them.
|
|
13
|
+
- Debugging one request means manually correlating timestamps across multiple services or log streams because nothing ties the lines together.
|
|
14
|
+
- Standing up logging for a new service and you want a defensible schema and level policy instead of scattered `print`/`console.log` calls.
|
|
15
|
+
- Log levels are meaningless (everything is INFO, or ERROR is used for expected conditions) so on-call alerts are noise and real failures hide.
|
|
16
|
+
|
|
17
|
+
## Instructions
|
|
18
|
+
|
|
19
|
+
1. **Emit one structured record per line with a stable schema.** Every log line is a JSON object with the same required fields: `timestamp` (ISO-8601 / RFC-3339, UTC), `level`, `message` (a short, *constant* string — the variable parts go in fields, not interpolated into the message), `service`, and `correlation_id`. A constant message is what lets you group and count: `{"message": "checkout failed", "user_id": 42, "reason": "card_declined"}` is countable; `"user 42 failed: card declined"` is not.
|
|
20
|
+
2. **Thread a correlation ID through every line of a request.** At the request entry point (HTTP middleware, queue consumer, RPC handler), read an incoming `X-Request-Id` / trace header or generate one, store it in a context-local (Go `context`, Node `AsyncLocalStorage`, Python `contextvars`, MDC in JVM), and have the logger attach it automatically to *every* line in that request — never pass it by hand. Propagate the same ID on outbound calls (set the header) so downstream services log it too. Reconstructing a flow then becomes `correlation_id = "abc123"` across all services.
|
|
21
|
+
3. **Define a level policy and enforce what each level means.** ERROR = something failed and a human needs to act or be alerted (unhandled exception, failed write, breached invariant) — never use it for expected conditions like a 404 or a validation rejection. WARN = suspicious but handled (retry succeeded, fell back, approaching a limit). INFO = key business events worth keeping in production (request completed, order placed, job finished). DEBUG = developer detail (intermediate values, branch taken), off in production. Write the policy down with one concrete example per level so reviewers can reject a misused level.
|
|
22
|
+
4. **Make the level runtime-configurable.** Read the threshold from an env var or config (`LOG_LEVEL=debug`) so you can raise verbosity for an incident without a redeploy, and run production at INFO. Where the logger supports it, allow per-module overrides (e.g. DEBUG for one noisy package) so you can zoom in without drowning in unrelated DEBUG output.
|
|
23
|
+
5. **Attach context as fields, never by string concatenation.** User, tenant, resource, and operation IDs are structured fields (`user_id`, `tenant_id`, `order_id`, `operation`), not substrings of `message`. Bind request-scoped context once (a child/bound logger carrying `tenant_id` and `correlation_id`) so every line in that scope inherits it without repeating it. This is what makes `tenant_id = "acme" AND level = "ERROR"` a one-line query.
|
|
24
|
+
6. **Redact secrets and PII at the logging boundary.** Maintain a deny-list of field names (`password`, `token`, `authorization`, `secret`, `api_key`, `ssn`, `card`, `cookie`, `set-cookie`) and a redaction hook in the logger that masks them *before serialization*, regardless of which call site logs them — do not rely on every developer remembering. Never log full request/response bodies or raw headers; log a content length, a hash, or an explicit allow-list of safe fields instead.
|
|
25
|
+
7. **Rewrite representative statements as before/after.** Pick the highest-traffic and highest-value sites — a request handler, an error path, an external-call wrapper — and rewrite each from string log to structured log so the team copies a real pattern, not a doc.
|
|
26
|
+
|
|
27
|
+
> [!WARNING]
|
|
28
|
+
> Logging a secret, token, or PII field is a breach the moment it lands in your log store — logs are widely replicated, retained, and read by people who'd never get database access. Redact at the boundary (step 6); do not trust call sites to remember.
|
|
29
|
+
|
|
30
|
+
> [!WARNING]
|
|
31
|
+
> Unbounded high-cardinality fields (raw URLs with query strings, full user-agent strings, per-request UUIDs as *indexed* fields) explode log-store cost and index size. Keep correlation IDs as plain fields, bucket or template high-cardinality values (`route_template = "/users/:id"`, not the literal path), and never put unbounded free text in a field your backend indexes.
|
|
32
|
+
|
|
33
|
+
> [!WARNING]
|
|
34
|
+
> A log call in a hot loop or per-row path can dominate latency — serialization, redaction, and I/O are not free. Guard DEBUG with the level check so it's skipped (not just discarded) in production, log aggregates instead of per-iteration lines, and sample very-high-frequency events rather than logging every one.
|
|
35
|
+
|
|
36
|
+
## Output
|
|
37
|
+
|
|
38
|
+
- **Log schema** — the required fields (`timestamp`, `level`, `message`, `service`, `correlation_id`) and the standard contextual fields (`user_id`, `tenant_id`, request/resource IDs) with types and an example record.
|
|
39
|
+
- **Correlation-ID propagation** — where the ID is created/read, how it's stored (context-local), how it's auto-attached to every line, and how it's propagated on outbound calls.
|
|
40
|
+
- **Level policy** — the meaning of ERROR/WARN/INFO/DEBUG with one concrete example each, plus the runtime config knob (`LOG_LEVEL`) and any per-module override.
|
|
41
|
+
- **Redaction rules** — the field deny-list, the boundary hook that applies it, and the body/header policy.
|
|
42
|
+
- **Before/after diffs** — representative log statements rewritten from string to structured, ready to copy across the codebase.
|