npm - agentscamp - Versions diffs - 0.3.0 → 0.4.0 - Mend

agentscamp 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/README.md +3 -3
package/content/commands/add-caching.md +79 -0
package/content/commands/audit-accessibility.md +101 -0
package/content/commands/clean-branches.md +113 -0
package/content/commands/review-tests.md +98 -0
package/content/commands/scaffold-github-action.md +94 -0
package/content/commands/setup-precommit-hooks.md +72 -0
package/content/commands/write-design-doc.md +78 -0
package/content/manifest.json +214 -3
package/content/skills/connection-pool-tuner.md +46 -0
package/content/skills/dependency-upgrade-planner.md +42 -0
package/content/skills/memory-leak-hunter.md +35 -0
package/content/skills/pagination-designer.md +51 -0
package/content/skills/property-test-designer.md +63 -0
package/content/skills/security-headers-hardener.md +79 -0
package/content/skills/slo-definer.md +38 -0
package/content/skills/structured-logging-designer.md +42 -0
package/package.json +1 -1

package/content/skills/property-test-designer.md ADDED

@@ -0,0 +1,63 @@
+---
+name: "property-test-designer"
+description: "Design property-based tests — generate hundreds of random inputs and assert invariants that must hold for ALL of them — to surface the edge cases hand-picked examples never reach. Use when code has a large input space (parsers, serializers, encoders, math, data transforms), when a bug keeps slipping through despite green example tests, or when you can't enumerate every case worth checking."
+allowed-tools: "Read, Grep, Glob, Edit"
+version: 1.0.0
+---
+Example-based tests only check the inputs you thought to write down. This skill designs property-based tests instead: it identifies the invariants that must hold for *every* valid input, defines generators that produce hundreds of them — including the corners you'd never type by hand — and lets the framework shrink any failure to its minimal reproducing input. The deliverable is the chosen properties, the generators, a runnable test in your language's framework, and a plan to pin every counterexample as a fixed regression case.
+## When to use this skill
+- The input space is large or recursive — parsers, serializers, encoders/decoders, numeric code, date/time logic, data transforms, state machines — and enumerating cases by hand is hopeless.
+- A bug keeps escaping a green example suite because it lives in a corner nobody wrote a test for (empty input, unicode, overflow, a specific interleaving).
+- You have a clear correctness relation — a round-trip, an inverse, a slower reference implementation — but no single "expected output" to assert against.
+- You're hardening a critical pure function and want adversarial coverage, not three happy-path examples.
+## Instructions
+1. **Pick properties that hold for ALL valid inputs — not examples.** Stop choosing inputs; choose relations. The classics, in rough order of power:
+   - **Round-trip / inverse:** `decode(encode(x)) == x`, `parse(render(x)) == x`, `decompress(compress(x)) == x`. The highest-value property for any serializer or codec.
+   - **Invariant:** a property of the output regardless of input — `sort(xs)` is ordered *and* a permutation of `xs`; a balanced-tree insert keeps the balance condition; a parser never returns a node spanning past EOF.
+   - **Idempotence:** `f(f(x)) == f(x)` — for normalizers, dedupers, sanitizers, `canonicalize`.
+   - **Oracle / model:** the function must agree with a simpler, slower, or trusted reference (a brute-force version, the previous release, the stdlib) on every input.
+   - **Metamorphic:** when there's no oracle, relate two runs — `sort(xs) == sort(shuffle(xs))`; `search(q)` ⊆ `search(broaden(q))`; `len(filter(p, xs)) <= len(xs)`.
+2. **Define generators that cover the real domain.** A property is only as good as its inputs. For each property, build a generator that reaches the nasty regions on purpose: empty/single-element collections, `0`/`-0`/negatives/`MAX_INT`/`MIN_INT`, NaN and infinities, empty strings, unicode and surrogate pairs, embedded delimiters and escape chars, huge inputs, deeply nested structures, and duplicates. Compose existing generators (`lists(integers())`, `dictionaries(...)`) rather than rolling raw randomness.
+3. **Constrain generators to valid inputs.** If the property only holds for, say, sorted lists or well-formed dates, *generate them in that shape* — `map`/`build` from raw primitives — instead of generating garbage and filtering it. Filtering (`assume`/`.filter`) discards rejected inputs and silently shrinks your effective sample size.
+4. **Pick the framework for the language.** Python → **Hypothesis** (`@given`, `st.*` strategies). JS/TS → **fast-check** (`fc.assert(fc.property(...))`). Haskell → **QuickCheck**; Scala → **ScalaCheck**; JVM/Java → **jqwik**; Rust → **proptest**/`quickcheck`; Go → built-in `testing/quick` or `rapid`. Match what's already in the project before adding a dep.
+5. **Lean on shrinking and pin the counterexample.** When a property fails, the framework shrinks the random input to a *minimal* failing case (e.g. `[0, 0]`, not a 400-element list). Read that minimal input — it usually names the bug. Then add it as an explicit example so it's checked every run regardless of the random seed: Hypothesis `@example(...)`, fast-check `fc.assert(prop, { examples: [[...]] })`, or just a plain unit test asserting the fixed input.
+6. **Budget run counts for CI.** Defaults (Hypothesis 100 examples, fast-check 100 runs) are fine locally; for cheap pure functions raise to 1000+ in a nightly job, but keep PR runs bounded so the suite stays fast. Record the seed of every failing CI run (or fix one explicitly, e.g. Hypothesis `@seed(...)` or fast-check's `seed` parameter) so a flake is reproducible, and disable Hypothesis's `deadline` (`deadline=None`) for inputs whose runtime legitimately scales with size.
+> [!WARNING]
+> A property that reimplements the function under test proves nothing. If your "oracle" shares the buggy logic (or you assert `encode(x) == encode(x)`), the test is green and worthless. The relation must be *independent* of the implementation — an inverse, a brute-force model, or a structural invariant the code never computes directly.
+> [!NOTE]
+> An unconstrained generator wastes the run budget rejecting invalid inputs and can starve the interesting region. If a heavy `assume()`/`.filter()` throws away most candidates, the framework will warn (Hypothesis raises `FailedHealthCheck`) — rebuild the generator to *construct* valid inputs instead of filtering for them.
+## Output
+For each property, the skill produces:
+- **The property and the relation it encodes** (round-trip / invariant / idempotence / oracle / metamorphic), stated as a one-line claim about all valid inputs.
+- **The generator(s)**, written in the project's framework, with the edge regions they deliberately reach.
+- **A runnable test** in that framework.
+- **The regression plan** — where each shrunk counterexample gets pinned as a fixed example so it's checked deterministically forever.
+Example — a round-trip property for a CSV codec, in Hypothesis:
+```python
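+from hypothesis import given, strategies as st, example
+# parse_csv / write_csv are the project's own codec functions under test; import them from your module.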
+# Generate well-formed rows directly (no filtering): each cell is arbitrary
+# text incl. commas, quotes, newlines, unicode — exactly the chars that break parsers.
+rows = st.lists(st.lists(st.text(), min_size=1), min_size=1)
+@given(rows)
+@example([["a,b", '"q"', "line\nbreak"]])  # pinned: a past failure, checked every run
+def test_csv_roundtrip(data):
+    # Property: parsing what we wrote back yields the original (inverse).
+    # parse_csv is INDEPENDENT of write_csv — not a reimplementation of it.
+    assert parse_csv(write_csv(data)) == data
+```
+A failure here shrinks to the minimal breaking cell — typically `[["\n"]]` or `[['"']]` — which you read, fix, and then pin via a second `@example(...)`. Hand the proposed properties to `test-scaffolder` to flesh out, and use `coverage-gap-finder` to confirm the generated inputs now reach the previously-cold branches.
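
The oracle, invariant, and metamorphic relations listed in the instructions above can be tried without any framework. What follows is a minimal, dependency-free sketch, not part of the package: `merge_sorted`, `sorted_ints`, and `check_merge_properties` are hypothetical names, and a seeded `random.Random` stands in for a real generator library only so the snippet runs anywhere. It has no shrinking, which is the main thing Hypothesis or fast-check would add.

```python
import random
from collections import Counter


def merge_sorted(a, b):
    """Hypothetical function under test: merge two ascending lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def sorted_ints(rng, max_len=8):
    """Generator: build a valid (sorted) input by construction, never by filtering."""
    edge_values = [0, -1, 1, 2**31 - 1, -(2**31)]  # regions reached on purpose
    n = rng.randint(0, max_len)  # includes the empty list
    xs = [
        rng.choice(edge_values) if rng.random() < 0.3 else rng.randint(-50, 50)
        for _ in range(n)
    ]
    return sorted(xs)


def check_merge_properties(runs=500, seed=0):
    """Check three independent relations on `runs` generated input pairs."""
    rng = random.Random(seed)  # explicit seed keeps any failure reproducible
    for _ in range(runs):
        a, b = sorted_ints(rng), sorted_ints(rng)
        got = merge_sorted(a, b)
        # Oracle: agree with a slower, trusted reference.
        assert got == sorted(a + b), (a, b)
        # Invariant: output is ordered and is a permutation of the inputs.
        assert all(x <= y for x, y in zip(got, got[1:])), (a, b)
        assert Counter(got) == Counter(a + b), (a, b)
        # Metamorphic: swapping the arguments must not change the result.
        assert merge_sorted(b, a) == got, (a, b)
    return runs


if __name__ == "__main__":
    print(check_merge_properties(), "generated cases passed")
```

Each assertion carries the failing pair `(a, b)` as its message, so a counterexample can be copied straight into a fixed regression test.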

package/content/skills/security-headers-hardener.md ADDED

@@ -0,0 +1,79 @@
+---
+name: "security-headers-hardener"
+description: "Audit and harden a web app's or API's HTTP security headers — Content-Security-Policy, HSTS, X-Content-Type-Options, frame-ancestors, Referrer-Policy, Permissions-Policy, and CORS — and produce a staged rollout that won't break the site. Use before a launch, during a security pass, or when a scanner (Mozilla Observatory, securityheaders.com, a pentest) flags missing or weak headers. Audits and edits header config; rolls CSP out Report-Only first."
+allowed-tools: "Read, Grep, Glob, Edit"
+version: 1.0.0
+---
+Audit the HTTP security headers a web app or API actually sends, then harden them without taking the site down. The single highest-value header is a real **Content-Security-Policy** — it is the strongest in-band mitigation for XSS — but it is also the one most likely to break your site if shipped carelessly, so this skill always stages CSP through **Report-Only** first. Around it: enforce HTTPS with HSTS (carefully, because `preload` is effectively one-way), stop MIME sniffing, block framing, tighten `Referrer-Policy` and `Permissions-Policy`, scope CORS so it can't be turned into a credential-leaking open door, and strip headers that advertise your stack and version. Output is a per-header `current → recommended` audit, the exact values to paste, and a rollout plan that goes Report-Only before enforce.
+## When to use this skill
+- Before a public launch or a major release that changes the frontend, third-party scripts, or the CDN/proxy in front of the app.
+- When a scanner (securityheaders.com, Mozilla Observatory, Lighthouse, a pentest report) flags missing or weak headers.
+- When standing up a new service, edge config, or reverse proxy and you want headers right from day one.
+- After adding a third-party embed, analytics, payment iframe, or auth widget — anything that changes what origins the page must trust.
+> [!WARNING]
+> Never ship an enforcing `Content-Security-Policy` you have not first run as `Content-Security-Policy-Report-Only` against real traffic. A directive like `script-src 'self'` will silently kill every inline `<script>`, injected analytics snippet, and third-party widget the moment it enforces — that's a white-screened production site, not a hardened one.
+## Instructions
+1. **Find where headers are actually set, then observe what ships.** Glob and grep the layers that can emit headers: app middleware (`helmet`, `setHeader`, `res.headers`, `add_header`), framework config (`next.config`, `vercel.json`, `netlify.toml`, `**/middleware*`), and edge config (`nginx.conf`, `**/.htaccess`, Cloudflare/CDN rules, `**/*.conf`). Multiple layers may set the same header; the proxy can override the app, or duplicate it. Establish the *effective* response (e.g. `curl -sI https://host` against a deployed instance, or read the proxy config) before changing anything: you can't harden what you can't see, and a header set twice with different values is its own bug.
+2. **Set a real Content-Security-Policy — the core control.** Start from a default-deny base: `default-src 'self'`. Then open *only* what the app needs: `script-src` and `style-src` for trusted origins, `img-src`, `connect-src` for your APIs/websockets, `font-src`, `frame-src` for embeds. Avoid `'unsafe-inline'` and `'unsafe-eval'` in `script-src` — they neuter the whole policy against XSS. For unavoidable inline scripts, use a per-response **nonce** (`script-src 'nonce-<random>'`, regenerated each request) or a **SHA-256 hash** of the script body, not a blanket allow. Always add `object-src 'none'` (kills Flash/plugin vectors) and `base-uri 'self'` (stops `<base>`-tag injection that reroutes relative script URLs). Add a `report-uri`/`report-to` endpoint so violations are collected.
+3. **Roll CSP out Report-Only before enforcing.** Deploy the policy as `Content-Security-Policy-Report-Only` first — same directives, but violations are reported to your collector instead of blocked. Watch the violation stream across representative traffic (all major pages, logged-in and out, the third-party flows) until it goes quiet or shows only known-benign noise (browser extensions inject inline styles — scope by `document-uri`/`blocked-uri`, don't widen the policy for them). Only then flip the header name to `Content-Security-Policy`. Keep `report-to` on after enforcing to catch regressions.
+4. **Enforce HTTPS with HSTS — and be deliberate about preload.** Set `Strict-Transport-Security: max-age=31536000; includeSubDomains`. Add `; preload` **only** once every subdomain serves valid HTTPS, because preload submission bakes HTTPS-only into shipped browsers and is slow and painful to undo. When first introducing HSTS, consider starting with a shorter `max-age` (e.g. a day) to confirm nothing breaks, then raise it. HSTS only takes effect on a response served over HTTPS, so also ensure a plain-HTTP→HTTPS redirect exists.
+5. **Stop MIME sniffing and clickjacking.** Set `X-Content-Type-Options: nosniff` (stops the browser from re-interpreting a response's type, a classic way to execute an uploaded "image" as script). Block framing with `Content-Security-Policy: frame-ancestors 'self'` (or an explicit allowlist of origins permitted to frame you), which supersedes the legacy `X-Frame-Options: DENY/SAMEORIGIN`. Set both for older-browser coverage, but make them agree: `frame-ancestors 'none'` pairs with `DENY`, and `frame-ancestors 'self'` pairs with `SAMEORIGIN`.
+6. **Tighten Referrer-Policy and Permissions-Policy.** Set `Referrer-Policy: strict-origin-when-cross-origin` (sends the full URL same-origin, only the origin cross-origin over HTTPS, nothing on downgrade) — this stops tokens or PII in query strings from leaking via the `Referer` header to third parties. Set `Permissions-Policy` to disable powerful features the app doesn't use, e.g. `camera=(), microphone=(), geolocation=(), payment=()` — an empty allowlist `()` means "no origin, not even self." Only grant features the app actually calls.
+7. **Scope CORS tightly — never the wildcard-plus-credentials trap.** If the API serves cross-origin requests, reflect or allowlist **specific** trusted origins for `Access-Control-Allow-Origin`; never reflect an arbitrary `Origin` header back unchecked (that's "allow everyone" with a disguise). The exploitable misconfiguration to hunt for: `Access-Control-Allow-Origin: *` together with `Access-Control-Allow-Credentials: true` — browsers forbid the literal combination, so a server that *needs* credentials will instead reflect the caller's Origin, and if that reflection is unchecked, any site can make authenticated cross-origin requests and read the response. Pin `Allow-Methods`/`Allow-Headers` to what's used, and set `Vary: Origin` when reflecting so caches don't serve one origin's CORS response to another.
+8. **Remove headers that leak the stack.** Strip or blank `Server` version detail, `X-Powered-By`, `X-AspNet-Version`, `X-Generator`, and framework banners — they hand attackers a version to match against known CVEs and cost nothing to remove. (`X-XSS-Protection` is deprecated and best set to `0` or omitted; do not rely on it — CSP replaces it.)
+9. **Apply the changes, keeping each layer's edit minimal and consistent.** Use Edit to set the recommended values in the right layer (prefer the single source of truth — usually the proxy/edge or one central middleware — over scattering headers across the app). Don't introduce a header in two places with conflicting values. Leave CSP as Report-Only in the committed config if the violation-watch window hasn't completed; note clearly in the rollout plan when to flip it.
+> [!NOTE]
+> Test against a real response, not the config file. A header in `helmet()` or `next.config` can be silently overridden, dropped, or duplicated by a CDN, load balancer, or framework default. Confirm the effective `curl -sI` output before and after — the wire is the source of truth.
+## Output
+A per-header audit table (`current → recommended` for every header in scope), the exact header/config values to apply in the identified layer, and a staged rollout plan that puts CSP through Report-Only before enforce. Edits are applied to the header config; CSP stays Report-Only until the violation window is clear.
+```text
+Security headers — scope: next.config.ts, middleware.ts, effective response for https://app.example.com
+Header                       Current                          Recommended
+---------------------------------------------------------------------------------------------------
+Content-Security-Policy      (none)                           default-src 'self'; script-src 'self'
+                                                              'nonce-{n}'; style-src 'self'; img-src
+                                                              'self' data:; connect-src 'self'
+                                                              https://api.example.com; object-src
+                                                              'none'; base-uri 'self'; frame-ancestors
+                                                              'self'; report-to csp
+                                                              → ship as -Report-Only first
+Strict-Transport-Security    (none)                           max-age=31536000; includeSubDomains
+                                                              (add ;preload only after subdomain audit)
+X-Content-Type-Options       (none)                           nosniff
+X-Frame-Options              (none)                           SAMEORIGIN  (CSP frame-ancestors is primary)
+Referrer-Policy              unsafe-url                       strict-origin-when-cross-origin
+Permissions-Policy           (none)                           camera=(), microphone=(), geolocation=(),
+                                                              payment=()
+Access-Control-Allow-Origin  * (reflected, with credentials)  https://app.example.com (allowlist) + Vary: Origin
+X-Powered-By                 Next.js                          (removed)
+Server                       nginx/1.25.3                     nginx (version suppressed)
+Rollout plan
+1. Deploy all headers above; CSP as Content-Security-Policy-Report-Only with report-to=csp.
+2. Watch violation reports across all pages + third-party flows for one full traffic cycle.
+3. Resolve real violations (add the specific origin/nonce); ignore extension noise.
+4. When the stream is quiet, rename the header to Content-Security-Policy (enforce). Keep report-to on.
+5. After every subdomain is verified HTTPS-only, add ;preload to HSTS and submit (one-way).
+Fixed now: CORS wildcard+credentials misconfiguration removed; X-Powered-By/Server stripped;
+nosniff, frame-ancestors, Referrer-Policy, Permissions-Policy, HSTS applied. CSP pending enforce.
+```
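
Two of the steps in security-headers-hardener.md above are easy to get subtly wrong in code: the CORS allowlist (step 7) and the Report-Only/enforce switch for CSP (steps 2 and 3). The sketch below is one framework-agnostic way to express them, not part of the package; `ALLOWED_ORIGINS`, `cors_headers`, `csp_header`, and `new_nonce` are hypothetical names, and the origin is the example value from the audit table.

```python
import secrets

# Hypothetical allowlist; in practice this comes from configuration.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def cors_headers(origin, allowed=ALLOWED_ORIGINS):
    """Return CORS headers for a request's Origin, using exact-match allowlisting.

    The origin is echoed back only when it is on the allowlist; an unknown or
    missing origin gets no Access-Control-Allow-* headers at all.
    """
    headers = {"Vary": "Origin"}  # response differs per origin, so caches must key on it
    if origin in allowed:
        headers["Access-Control-Allow-Origin"] = origin
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers


def new_nonce():
    """Generate a fresh, unguessable nonce; call once per response."""
    return secrets.token_urlsafe(16)


def csp_header(nonce, enforce=False):
    """Build the CSP header as (name, value); Report-Only unless enforce=True."""
    name = (
        "Content-Security-Policy"
        if enforce
        else "Content-Security-Policy-Report-Only"
    )
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{nonce}'",
        "object-src 'none'",
        "base-uri 'self'",
        "frame-ancestors 'self'",
        "report-to csp",
    ]
    return name, "; ".join(directives)


if __name__ == "__main__":
    print(cors_headers("https://app.example.com"))
    print(cors_headers("https://evil.example"))
    print(csp_header(new_nonce()))
```

Keeping `enforce` as an explicit flag means the flip from Report-Only to enforcing is a one-line, reviewable change, while the directive list stays identical across both phases.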

package/content/skills/slo-definer.md ADDED

@@ -0,0 +1,38 @@
+---
+name: "slo-definer"
+description: "Turn a vague reliability goal into concrete SLIs, SLOs, an error budget, and burn-rate alerts — service-level indicators measured at the user-facing boundary, targets over a rolling window, and a written policy for what happens when the budget runs out. Use when a service has no defined reliability target, when on-call is noisy and alert-fatigued, or before you commit to an SLA you can't measure."
+allowed-tools: "Read, Grep, Glob"
+version: 1.0.0
+---
+"Make it reliable" can't be measured, can't be alerted on, and can't tell you when to stop shipping. This skill converts a reliability intention into four artifacts that can: **SLIs** that measure what users actually experience, **SLOs** that set a target over a window, an **error budget** with a written policy for spending and exhausting it, and **burn-rate alerts** that page when the budget is genuinely at risk. The output is a spec, not a dashboard — a contract the team and on-call can both point at.
+## When to use this skill
+- A service is "important" but has no defined reliability target, so nobody can say whether last week was good or bad.
+- On-call is drowning in pages that don't correspond to user pain — alert fatigue from threshold blips on CPU, memory, or a single 5xx.
+- You're about to sign an SLA and need an internal SLO (tighter, measurable) to back it before you promise anything externally.
+- You have dashboards full of metrics but can't answer "are users having a good time right now, and how much room do we have left to break things?"
+## Instructions
+1. **Identify the user and the boundary first.** An SLI measures the experience of a consumer (end user, calling service) at a specific boundary — the load balancer, the API gateway, the client SDK. Measure as close to the user as you can: a 200 at the app server while the CDN returns 502s is a lie. Name the boundary explicitly before picking metrics.
+2. **Pick the few SLIs that reflect that experience.** Choose from the request/response SLI families: **availability** (good-event ratio: non-5xx, non-timeout responses ÷ total valid requests), **latency** (fraction of requests served faster than a threshold; the SLO target then fixes the percentile), and for data systems **freshness** (fraction of reads no older than N seconds) or **correctness/coverage**. Two or three SLIs per service is plenty; more dilutes the signal.
+3. **Write each SLI as an explicit good-event criterion.** Spell out what counts as a good event, what's in the denominator, and what's excluded. Example: `latency SLI = (requests with TTFB < 300ms) / (all non-4xx requests at the gateway)`. Exclude client errors (4xx) and load-test traffic from the denominator, since they aren't the service failing, but say so in writing.
+4. **Set the SLO as a target over a rolling window grounded in user need.** Format: "X% of [good events] over [rolling window]", e.g. `99.9% of requests succeed over 28 days`. Use a **rolling** window (28 days is common, since it always spans four full weeks) rather than calendar months, so the budget never resets at a month boundary and a bad day keeps counting until it ages out of the window. Pick the lowest target users genuinely won't notice; if you can't justify the extra nine from user impact, don't pay for it.
+5. **Derive the error budget and write its spend policy.** The budget is `1 − SLO` over the window: a 99.9% SLO allows 0.1% bad events — for 28 days that's ~40 minutes of total unavailability, or 0.1% of requests. State who may spend it (experiments, risky migrations, planned maintenance all draw down the same budget) and the **exhaustion rule in writing**: when the budget is gone, risky changes freeze and reliability work takes priority until the window recovers. A budget with no consequence is just a number.
+6. **Tie alerts to burn rate, not to thresholds.** Alert on how fast the budget is being consumed relative to the window. Run two: a **fast-burn** alert (e.g. 14.4× burn over 1 hour = ~2% of a 28-day budget gone in an hour → page now) and a **slow-burn** alert (e.g. ~3× burn over 6 hours → ticket, not a page). This makes a page mean "the budget is at risk," with high precision and low noise, instead of "5xx crossed 5 for 30 seconds."
+7. **Sanity-check against history before committing.** Read recent latency/error data (logs, metrics exports) and confirm the proposed SLO is currently *achievable* and *meaningful* — not already breached every week (unattainable, so it'll be ignored) and not trivially met with 100× headroom (no signal). Adjust the target to the real distribution.
+> [!WARNING]
+> A 100% SLO is a trap: it leaves zero error budget, so every deploy is a potential breach and the only "safe" move is to never change the system. The gap below 100% is precisely the room you have to ship, experiment, and do maintenance — design it in deliberately.
+> [!WARNING]
+> Averages hide the tail. A 200ms *average* latency is consistent with 2% of users waiting 4 seconds while everyone else sees about 120ms, and the tail is where users churn. Always state latency SLIs as a percentile (p95/p99 served under a threshold), never as a mean.
+> [!NOTE]
+> System metrics are not SLIs. CPU, memory, disk, and queue depth are *causes*, useful for debugging, but a user never files a ticket about your CPU. SLIs live at the user-facing boundary; keep host metrics on the diagnosis dashboard, out of the SLO spec.
+## Output
+A reliability spec containing: (1) **SLI definitions** — for each, what's measured, the boundary it's measured at, and the exact good-event criterion (numerator/denominator + exclusions); (2) **SLO targets** — the percentage and rolling window per SLI, with the user-impact rationale; (3) the **error budget** — `1 − SLO` translated into concrete allowance (minutes and/or request count over the window) plus the written spend-and-exhaustion policy; and (4) the **burn-rate alert thresholds** — fast-burn (page) and slow-burn (ticket) multipliers and look-back windows. Reproducible: the same spec can be re-derived and re-checked against fresh data each quarter.
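
The budget and burn-rate arithmetic in steps 5 and 6 of slo-definer.md above is worth checking by hand once. This is a minimal sketch, not part of the package: the function names are hypothetical, and the 14.4× and 3× thresholds are simply the example values quoted in step 6, not recommendations.

```python
def error_budget_minutes(slo, window_days):
    """Total allowed 'bad' minutes in the window: (1 - SLO) * window length."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(bad_events, total_events, slo):
    """Observed error ratio divided by the error ratio the SLO allows.

    A burn rate of 1.0 spends exactly the whole budget over the full window.
    """
    if total_events == 0:
        return 0.0
    return (bad_events / total_events) / (1 - slo)


def budget_consumed(burn, lookback_hours, window_days):
    """Fraction of the window's budget spent by `burn` sustained for the look-back."""
    return burn * lookback_hours / (window_days * 24)


def alert_action(fast_burn, slow_burn, fast_threshold=14.4, slow_threshold=3.0):
    """Map the two burn rates (1h and 6h look-backs in the text) to an action."""
    if fast_burn >= fast_threshold:
        return "page"
    if slow_burn >= slow_threshold:
        return "ticket"
    return "none"


if __name__ == "__main__":
    print(f"budget: {error_budget_minutes(0.999, 28):.2f} min over 28 days")
    rate = burn_rate(bad_events=144, total_events=10_000, slo=0.999)
    print(f"burn rate: {rate:.1f}x")
    print(f"budget spent in 1h: {budget_consumed(rate, 1, 28):.1%}")
    print("action:", alert_action(fast_burn=rate, slow_burn=rate))
```

With a 99.9% SLO over 28 days the budget comes to 40.32 minutes, and a 14.4× burn sustained for one hour consumes 14.4 / 672, roughly 2.1% of it, which matches the "~2%" figure in step 6.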

package/content/skills/structured-logging-designer.md ADDED

@@ -0,0 +1,42 @@
+---
+name: "structured-logging-designer"
+description: "Design a structured (JSON) logging strategy with a stable field schema, correlation-ID propagation, and a disciplined level policy — then migrate ad-hoc string logs toward it. Use when logs are unsearchable plain text, when debugging a request across services means grepping multiple log streams by hand, or when standing up logging for a new service."
+allowed-tools: "Read, Grep, Glob, Edit"
+version: 1.0.0
+---
+A log line like `"user 42 failed to checkout"` answers nothing you can query: you can't filter by user, can't join it to the request that produced it, can't alert on it. Structured logging makes every line a queryable record — fields, not prose — so "show me every ERROR for tenant X in the last hour, with the request ID" is a query instead of a grep across five files. This skill designs that schema, threads a correlation ID through a request so a single flow is reconstructable across services, sets a level policy you can actually act on, and redacts secrets at the boundary — then rewrites representative statements so the team has a concrete pattern to copy.
+## When to use this skill
+- Logs are plain text and unsearchable — you grep for substrings instead of filtering on fields, and you can't build a dashboard or alert from them.
+- Debugging one request means manually correlating timestamps across multiple services or log streams because nothing ties the lines together.
+- Standing up logging for a new service and you want a defensible schema and level policy instead of scattered `print`/`console.log` calls.
+- Log levels are meaningless (everything is INFO, or ERROR is used for expected conditions) so on-call alerts are noise and real failures hide.
+## Instructions
+1. **Emit one structured record per line with a stable schema.** Every log line is a JSON object with the same required fields: `timestamp` (ISO-8601 / RFC-3339, UTC), `level`, `message` (a short, *constant* string — the variable parts go in fields, not interpolated into the message), `service`, and `correlation_id`. A constant message is what lets you group and count: `{"message": "checkout failed", "user_id": 42, "reason": "card_declined"}` is countable; `"user 42 failed: card declined"` is not.
+2. **Thread a correlation ID through every line of a request.** At the request entry point (HTTP middleware, queue consumer, RPC handler), read an incoming `X-Request-Id` / trace header or generate one, store it in a context-local (Go `context`, Node `AsyncLocalStorage`, Python `contextvars`, MDC in JVM), and have the logger attach it automatically to *every* line in that request — never pass it by hand. Propagate the same ID on outbound calls (set the header) so downstream services log it too. Reconstructing a flow then becomes `correlation_id = "abc123"` across all services.
+3. **Define a level policy and enforce what each level means.** ERROR = something failed and a human needs to act or be alerted (unhandled exception, failed write, breached invariant) — never use it for expected conditions like a 404 or a validation rejection. WARN = suspicious but handled (retry succeeded, fell back, approaching a limit). INFO = key business events worth keeping in production (request completed, order placed, job finished). DEBUG = developer detail (intermediate values, branch taken), off in production. Write the policy down with one concrete example per level so reviewers can reject a misused level.
+4. **Make the level runtime-configurable.** Read the threshold from an env var or config (`LOG_LEVEL=debug`) so you can raise verbosity for an incident without a redeploy, and run production at INFO. Where the logger supports it, allow per-module overrides (e.g. DEBUG for one noisy package) so you can zoom in without drowning in unrelated DEBUG output.
+5. **Attach context as fields, never by string concatenation.** User, tenant, resource, and operation IDs are structured fields (`user_id`, `tenant_id`, `order_id`, `operation`), not substrings of `message`. Bind request-scoped context once (a child/bound logger carrying `tenant_id` and `correlation_id`) so every line in that scope inherits it without repeating it. This is what makes `tenant_id = "acme" AND level = "ERROR"` a one-line query.
+6. **Redact secrets and PII at the logging boundary.** Maintain a deny-list of field names (`password`, `token`, `authorization`, `secret`, `api_key`, `ssn`, `card`, `cookie`, `set-cookie`) and a redaction hook in the logger that masks them *before serialization*, regardless of which call site logs them — do not rely on every developer remembering. Never log full request/response bodies or raw headers; log a content length, a hash, or an explicit allow-list of safe fields instead.
+7. **Rewrite representative statements as before/after.** Pick the highest-traffic and highest-value sites — a request handler, an error path, an external-call wrapper — and rewrite each from string log to structured log so the team copies a real pattern, not a doc.
+> [!WARNING]
+> Logging a secret, token, or PII field is a breach the moment it lands in your log store — logs are widely replicated, retained, and read by people who'd never get database access. Redact at the boundary (step 6); do not trust call sites to remember.
+> [!WARNING]
+> Unbounded high-cardinality fields (raw URLs with query strings, full user-agent strings, per-request UUIDs as *indexed* fields) explode log-store cost and index size. Keep correlation IDs as plain fields, bucket or template high-cardinality values (`route_template = "/users/:id"`, not the literal path), and never put unbounded free text in a field your backend indexes.
+> [!WARNING]
+> A log call in a hot loop or per-row path can dominate latency — serialization, redaction, and I/O are not free. Guard DEBUG with the level check so it's skipped (not just discarded) in production, log aggregates instead of per-iteration lines, and sample very-high-frequency events rather than logging every one.
+## Output
+- **Log schema** — the required fields (`timestamp`, `level`, `message`, `service`, `correlation_id`) and the standard contextual fields (`user_id`, `tenant_id`, request/resource IDs) with types and an example record.
+- **Correlation-ID propagation** — where the ID is created/read, how it's stored (context-local), how it's auto-attached to every line, and how it's propagated on outbound calls.
+- **Level policy** — the meaning of ERROR/WARN/INFO/DEBUG with one concrete example each, plus the runtime config knob (`LOG_LEVEL`) and any per-module override.
+- **Redaction rules** — the field deny-list, the boundary hook that applies it, and the body/header policy.
+- **Before/after diffs** — representative log statements rewritten from string to structured, ready to copy across the codebase.
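
The schema, correlation-ID, runtime-level, and redaction steps of structured-logging-designer.md above fit in a short standard-library sketch. It is one possible shape, not the package's implementation: `JsonFormatter`, `make_logger`, the `fields` key passed through `extra`, and the `[REDACTED]` placeholder are all assumptions made for illustration, and the deny-list is copied from step 6.

```python
import contextvars
import json
import logging
import os
from datetime import datetime, timezone

# Context-local correlation ID, set once at the request entry point.
correlation_id = contextvars.ContextVar("correlation_id", default=None)

# Deny-list from the text; matching is on field name, case-insensitively.
REDACTED_FIELDS = frozenset({
    "password", "token", "authorization", "secret", "api_key",
    "ssn", "card", "cookie", "set-cookie",
})


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a stable set of required fields."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),  # keep this a constant string
            "service": self.service,
            "correlation_id": correlation_id.get(),
        }
        # Contextual fields arrive via extra={"fields": {...}} (an assumed convention).
        fields = getattr(record, "fields", None) or {}
        for key, value in fields.items():
            safe = "[REDACTED]" if key.lower() in REDACTED_FIELDS else value
            entry.setdefault(key, safe)  # required fields cannot be overwritten
        return json.dumps(entry, default=str)


def make_logger(service, stream=None):
    """Build a logger whose threshold comes from LOG_LEVEL (INFO if unset)."""
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(service)
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    correlation_id.set("abc123")  # normally done by request middleware
    log.error(
        "checkout failed",
        extra={"fields": {"user_id": 42, "reason": "card_declined", "token": "s3cr3t"}},
    )
```

Because redaction happens inside the formatter, a call site that accidentally logs a `token` field still emits `[REDACTED]`; the sketch only inspects top-level field names, so nested payloads would need their own handling.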

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agentscamp",
-  "version": "0.3.0",
+  "version": "0.4.0",
   "description": "Install AgentsCamp agents, skills, and slash commands into Claude Code from your terminal.",
   "license": "MIT",
   "type": "module",