sanook-cli 0.4.0 → 0.5.1
- package/.env.example +19 -0
- package/CHANGELOG.md +173 -0
- package/README.md +153 -20
- package/README.th.md +136 -0
- package/dist/agentContext.js +4 -0
- package/dist/approval.js +6 -0
- package/dist/bin.js +405 -57
- package/dist/brain.js +92 -59
- package/dist/brand.js +47 -0
- package/dist/checkpoint.js +37 -0
- package/dist/commands.js +86 -6
- package/dist/compaction.js +76 -5
- package/dist/config.js +100 -12
- package/dist/cost.js +60 -3
- package/dist/doctor.js +92 -0
- package/dist/gateway/auth.js +2 -2
- package/dist/gateway/ledger.js +2 -2
- package/dist/gateway/scheduler.js +1 -0
- package/dist/gateway/serve.js +6 -4
- package/dist/gateway/server.js +10 -2
- package/dist/git.js +11 -2
- package/dist/hooks.js +43 -17
- package/dist/knowledge.js +48 -49
- package/dist/loop.js +182 -66
- package/dist/lsp/client.js +173 -0
- package/dist/lsp/framing.js +56 -0
- package/dist/lsp/index.js +138 -0
- package/dist/lsp/servers.js +82 -0
- package/dist/mcp-server.js +244 -0
- package/dist/mcp.js +184 -29
- package/dist/memory-store.js +559 -0
- package/dist/memory.js +143 -29
- package/dist/orchestrate.js +150 -0
- package/dist/providers/codex.js +21 -7
- package/dist/providers/keys.js +3 -2
- package/dist/providers/models.js +22 -6
- package/dist/providers/registry.js +155 -1
- package/dist/repomap.js +93 -0
- package/dist/search/chunk.js +158 -0
- package/dist/search/embed-store.js +187 -0
- package/dist/search/engine.js +203 -0
- package/dist/search/fuse.js +35 -0
- package/dist/search/index-core.js +187 -0
- package/dist/search/indexer.js +241 -0
- package/dist/search/store.js +77 -0
- package/dist/session.js +42 -8
- package/dist/skill-install.js +10 -10
- package/dist/skills.js +12 -9
- package/dist/summarize.js +31 -0
- package/dist/tools/bash.js +21 -2
- package/dist/tools/diagnostics.js +41 -0
- package/dist/tools/edit.js +29 -7
- package/dist/tools/index.js +8 -1
- package/dist/tools/list.js +7 -2
- package/dist/tools/permission.js +90 -9
- package/dist/tools/read.js +23 -4
- package/dist/tools/remember.js +1 -1
- package/dist/tools/sandbox.js +61 -0
- package/dist/tools/search.js +105 -4
- package/dist/tools/task.js +195 -29
- package/dist/tools/timeout.js +35 -0
- package/dist/tools/util.js +10 -0
- package/dist/tools/write.js +6 -4
- package/dist/trust.js +89 -0
- package/dist/ui/app.js +228 -31
- package/dist/ui/banner.js +4 -9
- package/dist/ui/brain-wizard.js +2 -2
- package/dist/ui/history.js +30 -0
- package/dist/ui/mentions.js +44 -0
- package/dist/ui/render.js +55 -15
- package/dist/ui/setup.js +97 -12
- package/dist/ui/useEditor.js +83 -0
- package/dist/update.js +114 -0
- package/dist/worktree.js +173 -0
- package/package.json +11 -5
- package/scripts/postinstall.mjs +33 -0
- package/second-brain/.agents/_Index.md +30 -0
- package/second-brain/.agents/skills/_Index.md +30 -0
- package/second-brain/.agents/workflows/_Index.md +30 -0
- package/second-brain/AGENTS.md +4 -4
- package/second-brain/Acceptance/_Index.md +30 -0
- package/second-brain/Acceptance/golden-case-template.md +39 -0
- package/second-brain/Areas/_Index.md +30 -0
- package/second-brain/Bugs/System-OS/_Index.md +30 -0
- package/second-brain/Bugs/_Index.md +30 -0
- package/second-brain/CLAUDE.md +4 -1
- package/second-brain/Checklists/_Index.md +30 -0
- package/second-brain/Checklists/preflight-postflight-template.md +29 -0
- package/second-brain/Distillations/_Index.md +30 -0
- package/second-brain/Entities/_Index.md +30 -0
- package/second-brain/Entities/entity-template.md +33 -0
- package/second-brain/Evals/_Index.md +30 -0
- package/second-brain/Evals/correction-pairs.md +24 -0
- package/second-brain/Evals/failure-taxonomy.md +24 -0
- package/second-brain/Evals/golden-set.md +25 -0
- package/second-brain/Evals/quality-ledger.md +23 -0
- package/second-brain/Evals/self-eval-rubric.md +23 -0
- package/second-brain/GEMINI.md +4 -4
- package/second-brain/Goals/_Index.md +30 -0
- package/second-brain/Handoffs/_Index.md +30 -0
- package/second-brain/Home.md +7 -0
- package/second-brain/Intake/Raw Sources/_Index.md +30 -0
- package/second-brain/Intake/_Index.md +30 -0
- package/second-brain/Intake/_Quarantine/_Index.md +30 -0
- package/second-brain/Learning/_Index.md +30 -0
- package/second-brain/Playbooks/_Index.md +30 -0
- package/second-brain/Playbooks/playbook-template.md +23 -0
- package/second-brain/Projects/_Index.md +30 -0
- package/second-brain/Prompts/_Index.md +30 -0
- package/second-brain/README.md +2 -1
- package/second-brain/Research/_Index.md +30 -0
- package/second-brain/Retrospectives/_Index.md +30 -0
- package/second-brain/Reviews/_Index.md +30 -0
- package/second-brain/Runbooks/_Index.md +30 -0
- package/second-brain/Runbooks/eval-loop.md +24 -0
- package/second-brain/Sessions/_Index.md +30 -0
- package/second-brain/Shared/AI-Context-Index.md +20 -0
- package/second-brain/Shared/AI-Threads/_Index.md +30 -0
- package/second-brain/Shared/Archive/_Index.md +30 -0
- package/second-brain/Shared/Assets/_Index.md +30 -0
- package/second-brain/Shared/Context-Packs/_Index.md +30 -0
- package/second-brain/Shared/Context7-Docs/_Index.md +30 -0
- package/second-brain/Shared/Coordination/NOW.md +28 -0
- package/second-brain/Shared/Coordination/_Index.md +30 -0
- package/second-brain/Shared/Coordination/agent-registry.md +24 -0
- package/second-brain/Shared/Coordination/task-board/_Index.md +30 -0
- package/second-brain/Shared/Coordination/task-board/task-template.md +43 -0
- package/second-brain/Shared/Coordination/task-board.md +32 -0
- package/second-brain/Shared/Core-Facts/_Index.md +30 -0
- package/second-brain/Shared/Decision-Memory/_Index.md +30 -0
- package/second-brain/Shared/Glossary/_Index.md +30 -0
- package/second-brain/Shared/Memory-Inbox/_Index.md +30 -0
- package/second-brain/Shared/Operating-State/_Index.md +30 -0
- package/second-brain/Shared/Prompting/_Index.md +30 -0
- package/second-brain/Shared/Provenance/_Index.md +30 -0
- package/second-brain/Shared/Rules/_Index.md +30 -0
- package/second-brain/Shared/Rules/contextual-note-rule.md +30 -0
- package/second-brain/Shared/Rules/frontmatter-standard.md +10 -0
- package/second-brain/Shared/Rules/memory-write-protocol.md +28 -0
- package/second-brain/Shared/Rules/procedural-runbook-header.md +40 -0
- package/second-brain/Shared/Rules/review-and-staleness-policy.md +22 -0
- package/second-brain/Shared/Rules/rules-formatting.md +34 -0
- package/second-brain/Shared/Scripts/_Index.md +30 -0
- package/second-brain/Shared/Scripts-Archive/_Index.md +30 -0
- package/second-brain/Shared/Tech-Standards/_Index.md +30 -0
- package/second-brain/Shared/Tech-Standards/verification-standard.md +40 -0
- package/second-brain/Shared/User-Memory/_Index.md +30 -0
- package/second-brain/Shared/User-Persona/_Index.md +30 -0
- package/second-brain/Shared/User-Persona/owner-profile.md +25 -0
- package/second-brain/Shared/Working-Memory/_Index.md +30 -0
- package/second-brain/Shared/_Index.md +30 -0
- package/second-brain/Shared/mcp-servers/_Index.md +30 -0
- package/second-brain/Skills/_Index.md +30 -0
- package/second-brain/Templates/_Index.md +30 -0
- package/second-brain/Templates/bug.md +2 -0
- package/second-brain/Templates/handoff.md +2 -0
- package/second-brain/Templates/session.md +2 -0
- package/second-brain/Tools/_Index.md +30 -0
- package/second-brain/Traces/_Index.md +30 -0
- package/second-brain/Vault Structure Map.md +33 -1
- package/second-brain/copilot/_Index.md +30 -0
- package/skills/audit-license-compliance/SKILL.md +117 -0
- package/skills/author-codemod/SKILL.md +110 -0
- package/skills/build-audit-logging/SKILL.md +112 -0
- package/skills/build-cdc-streaming-pipeline/SKILL.md +123 -0
- package/skills/build-cli-tool/SKILL.md +108 -0
- package/skills/build-data-table/SKILL.md +141 -0
- package/skills/build-native-mobile-ui/SKILL.md +154 -0
- package/skills/build-offline-first-sync/SKILL.md +118 -0
- package/skills/build-realtime-channel/SKILL.md +122 -0
- package/skills/build-vector-search/SKILL.md +131 -0
- package/skills/compose-local-dev-stack/SKILL.md +149 -0
- package/skills/configure-bundler-build/SKILL.md +166 -0
- package/skills/configure-dns-tls/SKILL.md +142 -0
- package/skills/configure-reverse-proxy-lb/SKILL.md +129 -0
- package/skills/configure-security-headers-csp/SKILL.md +122 -0
- package/skills/contract-testing/SKILL.md +140 -0
- package/skills/datetime-timezone-correctness/SKILL.md +125 -0
- package/skills/debug-ci-pipeline-failure/SKILL.md +134 -0
- package/skills/debug-flaky-tests/SKILL.md +128 -0
- package/skills/defend-llm-prompt-injection/SKILL.md +110 -0
- package/skills/deliver-webhooks/SKILL.md +116 -0
- package/skills/design-api-pagination/SKILL.md +144 -0
- package/skills/design-authorization-model/SKILL.md +119 -0
- package/skills/design-backup-dr-recovery/SKILL.md +113 -0
- package/skills/design-event-sourcing-cqrs/SKILL.md +143 -0
- package/skills/design-multi-tenancy/SKILL.md +100 -0
- package/skills/design-protobuf-grpc-service/SKILL.md +146 -0
- package/skills/design-relational-schema/SKILL.md +129 -0
- package/skills/design-search-index-infra/SKILL.md +151 -0
- package/skills/design-state-machine/SKILL.md +108 -0
- package/skills/design-token-system/SKILL.md +109 -0
- package/skills/distributed-locks-leases/SKILL.md +120 -0
- package/skills/encrypt-sensitive-data/SKILL.md +148 -0
- package/skills/feature-flags-rollout/SKILL.md +130 -0
- package/skills/file-upload-object-storage/SKILL.md +107 -0
- package/skills/fuzz-dynamic-security-test/SKILL.md +111 -0
- package/skills/harden-llm-app-reliability/SKILL.md +126 -0
- package/skills/i18n-localization-setup/SKILL.md +113 -0
- package/skills/idempotency-keys/SKILL.md +107 -0
- package/skills/implement-push-notifications/SKILL.md +142 -0
- package/skills/ingest-webhook-secure/SKILL.md +120 -0
- package/skills/integrate-oauth-oidc/SKILL.md +126 -0
- package/skills/load-stress-test/SKILL.md +129 -0
- package/skills/map-privacy-data-gdpr/SKILL.md +146 -0
- package/skills/model-nosql-data/SKILL.md +118 -0
- package/skills/money-decimal-arithmetic/SKILL.md +123 -0
- package/skills/monitor-ml-drift/SKILL.md +109 -0
- package/skills/numeric-precision-units/SKILL.md +144 -0
- package/skills/optimize-llm-cost-latency/SKILL.md +103 -0
- package/skills/optimize-react-rerenders/SKILL.md +124 -0
- package/skills/orchestrate-agent-workflow/SKILL.md +100 -0
- package/skills/payments-billing-integration/SKILL.md +114 -0
- package/skills/pin-toolchain-versions/SKILL.md +116 -0
- package/skills/plan-strangler-migration/SKILL.md +95 -0
- package/skills/property-based-testing/SKILL.md +108 -0
- package/skills/publish-package-registry/SKILL.md +130 -0
- package/skills/recover-git-state/SKILL.md +119 -0
- package/skills/remediate-web-vulnerabilities/SKILL.md +125 -0
- package/skills/resilience-timeouts-retries/SKILL.md +104 -0
- package/skills/resolve-merge-rebase-conflict/SKILL.md +97 -0
- package/skills/rewrite-git-history/SKILL.md +109 -0
- package/skills/scaffold-cross-platform-app/SKILL.md +137 -0
- package/skills/schema-evolution-compatibility/SKILL.md +121 -0
- package/skills/send-transactional-email/SKILL.md +126 -0
- package/skills/serve-deploy-ml-model/SKILL.md +107 -0
- package/skills/setup-cdn-edge-waf/SKILL.md +107 -0
- package/skills/setup-devcontainer-env/SKILL.md +131 -0
- package/skills/setup-lint-format-precommit/SKILL.md +140 -0
- package/skills/setup-monorepo-tooling/SKILL.md +125 -0
- package/skills/ship-mobile-app-store-release/SKILL.md +137 -0
- package/skills/structured-output-llm/SKILL.md +86 -0
- package/skills/supply-chain-sbom-provenance/SKILL.md +120 -0
- package/skills/test-data-factories/SKILL.md +158 -0
- package/skills/threat-model-stride/SKILL.md +123 -0
- package/skills/train-evaluate-ml-model/SKILL.md +109 -0
- package/skills/unicode-text-correctness/SKILL.md +109 -0
- package/skills/visual-regression-testing/SKILL.md +120 -0
@@ -0,0 +1,120 @@
---
name: ingest-webhook-secure
description: Builds secure inbound webhook receivers that verify HMAC/asymmetric signatures over the raw body, reject replays via signed-timestamp windows and seen-id stores, dedup idempotently on provider event id, and fast-ack within timeout before processing async. Use when receiving callbacks from an external service that must be authentic, non-replayed, and handled exactly once.
when_to_use: When standing up or debugging an inbound webhook/callback endpoint that must reject spoofed, replayed, or duplicate events and survive retry storms. Distinct from auth-jwt-session (verifies your own users' identity, not a provider's request signature), message-queue-jobs (the async worker you hand off to), and rate-limiting (caps request rate, not authenticity).
---

## When to Use

Reach for this skill when an **external service POSTs to you** and you must trust, deduplicate, and reliably process those events:

- "Stripe/GitHub/Slack/Twilio/Shopify webhook — verify the signature before acting"
- "We're getting duplicate webhook deliveries / charged twice / sent the email twice"
- "Provider says our endpoint timed out and they're hammering us with retries"
- "Someone is POSTing fake events to our `/webhooks` URL"
- "Signature verification fails intermittently" (almost always raw-body mangling)
- Designing one intake endpoint for several providers with different header/encoding quirks

NOT this skill:
- Verifying *your own* logged-in user (session/JWT/cookie) → auth-jwt-session
- The background worker/queue that does the slow processing → message-queue-jobs
- Capping how many requests a caller may send → rate-limiting
- Where the signing secret is stored/rotated at rest → secrets-management
- Metrics/traces/dashboards for the endpoint → observability-instrument

## Steps

1. **Verify BEFORE parsing — over the RAW bytes, not re-serialized JSON.** Capture the exact body as received (`bytes`/`Buffer`) and sign *that*. Any JSON round-trip (`json.loads`→`json.dumps`, framework body-parser, pretty-printer, key reorder, trailing-newline strip) changes the bytes and breaks HMAC. Disable the framework's auto JSON parse for this route and read the raw stream first.

| Provider style | Signature scheme | What is signed |
|---|---|---|
| Stripe | HMAC-SHA256, header `Stripe-Signature: t=…,v1=…` | `"{t}.{rawbody}"` |
| GitHub | HMAC-SHA256, header `X-Hub-Signature-256: sha256=…` | raw body |
| Slack | HMAC-SHA256, header `X-Slack-Signature: v0=…` | `"v0:{ts}:{rawbody}"` |
| Shopify | HMAC-SHA256, **base64**, header `X-Shopify-Hmac-Sha256` | raw body |
| Svix/Standard Webhooks | HMAC-SHA256 base64, `webhook-signature` | `"{id}.{ts}.{rawbody}"` |
| Discord interactions / some payment rails | **asymmetric** Ed25519 or RSA-SHA256, public key | raw body, sometimes with a timestamp prefix (Discord: `"{ts}{rawbody}"`); you hold only the public key |
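
To see why Step 1 insists on raw bytes, this sketch signs a body the way a GitHub-style provider would (hex HMAC-SHA256 over the raw body), then checks the signature against both the received bytes and a `json.loads`→`json.dumps` round-trip of them. The secret and payload are made-up illustration values.

```python
import hashlib
import hmac
import json

SECRET = b"whsec_example"  # hypothetical shared secret, for illustration only

def sign(body: bytes) -> str:
    """GitHub-style signature: hex HMAC-SHA256 over the body bytes."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def matches(body: bytes, sig: str) -> bool:
    return hmac.compare_digest(sign(body).encode(), sig.encode())

# Bytes exactly as the provider sent them: odd spacing and a trailing newline.
raw = b'{"id": "evt_1",  "type":"charge.succeeded"}\n'
header_sig = sign(raw)

# What a body-parser plus re-serializer hands you instead of the raw bytes.
reencoded = json.dumps(json.loads(raw)).encode()

print(matches(raw, header_sig))        # True: verified over the received bytes
print(matches(reencoded, header_sig))  # False: whitespace and newline changed
```

Nothing about the payload's meaning changed, yet the re-encoded bytes fail verification, which is the "works in Postman, fails in prod" failure listed under Common Errors.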

2. **Constant-time compare, support multiple/rotating secrets.** Never `==` on signatures — that leaks timing. Compute the digest and use a constant-time check. Iterate over *all* currently-valid secrets so rotation has zero-downtime overlap (old + new accepted during the window).

```python
import hashlib
import hmac
import time

# header_sig MUST already be the parsed hex digest, NOT the raw header:
# GitHub "sha256=<hex>" -> strip "sha256="; Stripe "t=..,v1=<hex>" -> the v1 value.
def verify(raw: bytes, header_sig: str, ts: str, secrets: list[bytes], tol: int = 300) -> bool:
    if not header_sig:  # missing/empty signature -> reject
        return False
    try:  # malformed/missing ts -> reject, never 500
        skew = abs(time.time() - int(ts))
    except (TypeError, ValueError):
        return False
    if skew > tol:  # replay window FIRST (cheap reject)
        return False
    signed = f"{ts}.".encode() + raw  # STRIPE-SHAPED ("{ts}.{rawbody}"); swap per Step 1:
    # GitHub -> signed = raw
    # Slack  -> signed = b"v0:" + ts.encode() + b":" + raw
    # Svix   -> signed = event_id.encode() + b"." + ts.encode() + b"." + raw
    received = header_sig.encode()  # compare bytes: a non-ASCII str makes compare_digest raise
    for secret in secrets:  # accept any active secret (rotation overlap)
        expected = hmac.new(secret, signed, hashlib.sha256).hexdigest().encode()
        if hmac.compare_digest(expected, received):
            return True
    return False
```
For asymmetric schemes, replace the HMAC loop with `nacl.signing.VerifyKey(pub).verify(...)` (Ed25519) or the `cryptography` package's `public_key.verify(...)` (RSA-PSS/PKCS1v15); you never hold a shared secret. For base64 providers (Shopify, Svix), base64-encode the raw digest and compare that against the header value, not a hex digest.
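
For base64 providers the only change to the HMAC step is how the digest is encoded before the comparison. A minimal Shopify-shaped sketch, assuming the header value is the base64 HMAC-SHA256 of the raw body and that `secrets` holds the currently active secrets; `verify_base64_hmac` is an illustrative name, not a provider SDK function.

```python
import base64
import hashlib
import hmac

def verify_base64_hmac(raw: bytes, header_b64: str, secrets: list[bytes]) -> bool:
    """Shopify-style check: base64(HMAC-SHA256(secret, raw body)) == header value."""
    if not header_b64:
        return False  # missing header -> 401, never fall through to processing
    received = header_b64.strip().encode()
    for secret in secrets:  # rotation overlap: any active secret may match
        digest = hmac.new(secret, raw, hashlib.sha256).digest()
        expected = base64.b64encode(digest)  # base64 vs base64, not hex
        if hmac.compare_digest(expected, received):
            return True
    return False
```

A hex digest of the same HMAC will never match here, which is exactly the "compare base64 not hex" pitfall noted for Shopify in the Step 7 table.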

3. **Reject replays — two layers.** (a) Tolerance window on the **signed** timestamp (default **±300 s**); a captured-but-stale request fails the window even with a valid signature. (b) Store the provider event id with a TTL ≥ the window and reject a second sighting. The timestamp must be one the signature covers (part of the signed preimage), not a client header you didn't authenticate.

4. **Idempotency — dedup on the provider's event id, atomically.** Use `SET webhook:{provider}:{event_id} 1 NX EX 604800` (plain `SETNX` takes no TTL argument), or a UNIQUE column + `INSERT … ON CONFLICT DO NOTHING`. First writer proceeds; a nil reply (Redis) or a conflict with zero rows inserted (SQL) means already-seen → return `200` immediately (acknowledge, do nothing). Size the TTL/retention to at least the provider's max retry horizon (Stripe retries for up to ~3 days, others for up to weeks; check each provider's docs); the 7-day TTL above is only an example.
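
Steps 3(b) and 4 both come down to one atomic "first sighting wins" write keyed on the provider event id. A minimal sketch using the standard-library `sqlite3` module as a stand-in for your real store; the table and function names are hypothetical, `ON CONFLICT DO NOTHING` needs SQLite 3.24 or newer, and because SQLite has no TTL the retention is a separate purge job.

```python
import sqlite3
import time

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit: each statement commits alone
    conn.execute(
        """CREATE TABLE IF NOT EXISTS webhook_events (
               provider TEXT NOT NULL,
               event_id TEXT NOT NULL,
               seen_at  REAL NOT NULL,
               PRIMARY KEY (provider, event_id)
           )"""
    )
    return conn

def claim_event(conn: sqlite3.Connection, provider: str, event_id: str) -> bool:
    """True on the first sighting of (provider, event_id), False on any repeat.

    The primary-key constraint turns check-and-insert into one atomic step,
    so two concurrent retries cannot both get True.
    """
    cur = conn.execute(
        "INSERT INTO webhook_events (provider, event_id, seen_at) VALUES (?, ?, ?) "
        "ON CONFLICT (provider, event_id) DO NOTHING",
        (provider, event_id, time.time()),
    )
    return cur.rowcount == 1  # 0 rows inserted -> already seen

def purge_older_than(conn: sqlite3.Connection, retention_s: float) -> None:
    """Retention job: run with retention_s >= the provider's max retry horizon."""
    conn.execute("DELETE FROM webhook_events WHERE seen_at < ?", (time.time() - retention_s,))
```

A `False` from `claim_event` maps to "return `200`, do nothing"; in a real handler the claim would sit in the same transaction as persisting the raw event so the two cannot diverge.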

5. **Respond 2xx fast, then process async — never do slow work inline.** The handler's only inline job: verify → persist the verified raw event → enqueue → return `200`. Hand the actual processing (DB writes, emails, downstream calls) to a worker/queue (→ message-queue-jobs). Most providers time out after roughly 5–10 s and then retry, so slow inline work triggers a retry storm that multiplies load. Return `200`/`202` within ~2 s.

| Outcome | Status | Why |
|---|---|---|
| Verified + enqueued (or duplicate) | `200`/`202` | Ack; stops retries |
| Bad/missing signature, failed asymmetric verify | `401` | Not authentic — do **not** 200 |
| Replay outside window / malformed timestamp | `400` | Authentic-looking but stale/garbage |
| Body too large / not the expected content-type | `413` / `415` | Reject before reading fully |
| Your DB/queue down (verified but can't persist) | `500`/`503` | Let the provider retry — do NOT 200 and drop |
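
The status table reads naturally as a decision order. A framework-free sketch of that order, assuming a JSON-only provider, lower-cased header keys, the 1 MB cap from Step 7, and three hypothetical injected callables: `read_body(n)` reads at most `n` bytes, `verify` raises `StaleOrMalformedTimestamp` for window/format failures, and `record_event` atomically persists the verified raw event keyed on its provider id in a table the worker polls (so a successful insert is also the enqueue), returning `False` for a duplicate.

```python
from typing import Callable, Mapping

MAX_BODY = 1_000_000  # 1 MB, the example cap from Step 7

class StaleOrMalformedTimestamp(Exception):
    """Raised by the verifier for an out-of-window or unparsable signed timestamp."""

def handle_webhook(
    headers: Mapping[str, str],
    read_body: Callable[[int], bytes],
    verify: Callable[[bytes, Mapping[str, str]], bool],
    record_event: Callable[[bytes], bool],
) -> int:
    """Return the HTTP status for one delivery, following the status table."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if media_type != "application/json":
        return 415
    try:
        if int(headers.get("content-length", "0")) > MAX_BODY:
            return 413  # declared size too big: reject before reading
    except ValueError:
        return 400
    raw = read_body(MAX_BODY + 1)  # never buffer more than cap + 1 bytes
    if len(raw) > MAX_BODY:
        return 413  # missing or lying Content-Length caught here
    try:
        if not verify(raw, headers):
            return 401  # not authentic: never 200
    except StaleOrMalformedTimestamp:
        return 400
    try:
        first_time = record_event(raw)
    except Exception:
        return 503  # store down: let the provider retry instead of dropping
    return 202 if first_time else 200  # duplicates are acked but not reprocessed
```

Wiring this into a specific framework mostly means disabling its JSON body parser for the route and passing the raw stream's bounded `read`.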

6. **Handle out-of-order delivery by resource version, not arrival order.** Retries and parallel deliveries mean `updated` can land before `created`. Reconcile on a monotonic field the provider gives (`sequence`, resource `version`, `updated_at`, Stripe event `created`): apply an event only if its version > the version you've stored; otherwise drop it as stale. When in doubt, treat the webhook as a *signal to re-fetch* the resource from the provider's API and use that as truth.
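
"Apply only if newer" is safest as a single conditional write, so a concurrent older delivery cannot slip in between a read and a write. A sketch with SQLite's upsert syntax, where `resources(id, version, state)` is a hypothetical table and `version` stands for whichever monotonic field your provider supplies:

```python
import sqlite3

def open_resources() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE resources (id TEXT PRIMARY KEY, version INTEGER NOT NULL, state TEXT NOT NULL)"
    )
    return conn

def apply_if_newer(conn: sqlite3.Connection, rid: str, version: int, state: str) -> bool:
    """Insert or update the resource only when `version` beats the stored one.

    Returns False when the event was stale (older or equal version) and ignored.
    """
    cur = conn.execute(
        "INSERT INTO resources (id, version, state) VALUES (?, ?, ?) "
        "ON CONFLICT (id) DO UPDATE SET version = excluded.version, state = excluded.state "
        "WHERE excluded.version > resources.version",
        (rid, version, state),
    )
    return cur.rowcount == 1  # 0 rows changed -> stale event dropped
```

Delivering `version=2` and then `version=1` leaves v2 stored, which is what check 7 in the Verify section asserts.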

7. **Lock down the surface + ship a safe replay tool.** Cap body size (`413` past e.g. 1 MB) before reading the whole stream. Reject unsigned/missing-header requests with `401` — never fall through to processing. Optionally pin source IPs to the provider's published CIDR allowlist (defense in depth, not a substitute for the signature). Some providers require a one-time **handshake/challenge** (Slack `url_verification` echo, Meta's GET with a `hub.challenge` to echo back, Twitch EventSub `webhook_callback_verification`) — answer it verbatim or you'll never receive events. Store the verified raw payload so you can re-drive processing later; the replay tool must re-run the *worker*, never re-accept an unverified HTTP request.

| Provider | Signature header | Encoding | Handshake | Notes |
|---|---|---|---|---|
| Stripe | `Stripe-Signature` | hex, `t=`/`v1=` | none | tolerance 300 s; secret per-endpoint (`whsec_…`) |
| GitHub | `X-Hub-Signature-256` | hex | ping event | also legacy SHA-1 header — ignore it, use 256 |
| Slack | `X-Slack-Signature` + `X-Slack-Request-Timestamp` | hex, `v0=` | `url_verification` echo | reject ts older than 5 min |
| Shopify | `X-Shopify-Hmac-Sha256` | **base64** | none | sign raw body, compare base64 not hex |
| Twilio | `X-Twilio-Signature` | base64 HMAC-SHA1 over **URL + sorted POST params** | none | not raw-body — concat full URL + params |
| Svix/Standard Webhooks | `webhook-id`/`webhook-timestamp`/`webhook-signature` | base64, `v1,` | none | id+ts+body signed; multiple space-sep sigs |
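
Twilio is the odd one out in this table: the preimage is the full request URL followed by each POST parameter name and value, sorted by name and appended with no separators, and the digest is base64 HMAC-SHA1 keyed with the account auth token. A sketch for form-encoded POSTs, assuming `url` is exactly the public URL Twilio called and that each parameter name appears once; the function names are hypothetical, and Twilio's own helper libraries provide a `RequestValidator` you would normally use instead.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qsl

def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """base64(HMAC-SHA1(token, url + name1 + value1 + name2 + value2 ...))."""
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def verify_twilio(auth_token: str, url: str, raw_form: bytes, header_sig: str) -> bool:
    # Decode the form (the signature covers decoded values, not the raw bytes).
    params = dict(parse_qsl(raw_form.decode(), keep_blank_values=True))
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), (header_sig or "").encode())
```

Because the URL is part of the preimage, a proxy that rewrites scheme, host or port before your app sees the request will break verification even though the body is intact.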

## Common Errors

- **Signing re-serialized JSON instead of raw bytes.** The #1 "works in Postman, fails in prod" bug. Read and sign the exact received bytes; never let a body-parser touch the route before verification.
- **Plain `==` / string compare on signatures.** Timing oracle. Use `hmac.compare_digest` (Python) or `crypto.timingSafeEqual` (Node); the latter throws when the two buffers differ in length, so length-check first.
- **Comparing against the raw header instead of the parsed digest.** `X-Hub-Signature-256` is `sha256=<hex>`; `Stripe-Signature` is `t=…,v1=<hex>`. Extract the digest field first, then constant-time compare — comparing the whole header always fails.
- **Reconstructing the signed string wrong (right secret, still rejects).** Each provider signs a different preimage (raw body vs `"{ts}.{body}"` vs `"v0:{ts}:{body}"`). Build it byte-for-byte from the Step 1 table; a generic `"{ts}.{body}"` silently works only for Stripe-shaped schemes.
- **Crashing on a malformed/missing timestamp.** `int(ts)` on a non-numeric or absent header throws → `500` (provider retries forever). Catch and treat a bad timestamp as a hard reject (`400`/`401`), not an exception.
- **Parsing the JSON before verifying.** Hands attacker-controlled bytes to your parser and downstream logic pre-trust. Verify first, parse second.
- **Trusting an unsigned timestamp/IP header for replay defense.** Use the timestamp *inside the signed payload*; anyone can set a raw header. IP allowlists are spoofable behind misconfigured proxies — keep them as defense in depth only.
- **No idempotency, or dedup that isn't atomic.** "Check then insert" in two steps lets two concurrent retries both pass → double processing. Use `SET … NX` / `INSERT … ON CONFLICT` as one atomic op on the event id.
- **Doing the work inline, returning 200 after.** Causes timeouts → provider retries → storm. Persist + enqueue + 200 fast; process in a worker.
- **Returning 200 when persistence/enqueue failed.** Swallows the event forever — the provider thinks it's delivered and stops retrying. On internal failure return `5xx` so the retry redelivers.
- **Applying events in arrival order.** Out-of-order retries overwrite newer state with older. Gate on resource version/sequence, or re-fetch the resource.
- **One global secret, no rotation path.** Rotating means downtime or dropped events. Accept a *list* of active secrets; remove the old one after the overlap window.
- **Ignoring the handshake/challenge.** Endpoint silently never activates; you debug "missing events" that were never sent. Implement the provider's verification echo.
- **No body-size cap.** A multi-GB POST OOMs the process before you ever check the signature. Enforce a max length and `413` early.
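
The "parsed digest" point trips people up most often. A sketch of extracting the digest from the two header shapes named above; the function names are illustrative, and because Stripe can send more than one `v1=` entry (for example while an endpoint secret is being rolled), the Stripe parser returns all of them for the caller to try.

```python
from typing import List, Optional, Tuple

def parse_github_sig(header: Optional[str]) -> Optional[str]:
    """'sha256=<hex>' -> '<hex>'; anything else (e.g. the legacy sha1= form) -> None."""
    if not header or not header.startswith("sha256="):
        return None
    return header[len("sha256="):]

def parse_stripe_sig(header: Optional[str]) -> Tuple[Optional[str], List[str]]:
    """'t=<ts>,v1=<hex>,v1=<hex>,v0=...' -> (ts, [v1 digests]); other schemes ignored."""
    ts: Optional[str] = None
    v1: List[str] = []
    for part in (header or "").split(","):
        key, sep, value = part.strip().partition("=")
        if not sep:
            continue  # empty or malformed element
        if key == "t":
            ts = value
        elif key == "v1":
            v1.append(value)
    return ts, v1
```

Each candidate digest is then checked with the Step 2 `verify` loop, and a `None` or empty result is treated as a missing signature (`401`).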

## Verify

1. **Happy path:** Replay a captured real delivery with its original headers and raw body → `200`, event persisted once, worker ran exactly once.
2. **Tampered body:** Flip one byte of the body, keep the signature → `401`, nothing persisted, worker never invoked.
3. **Tampered/forged signature:** Random or empty signature header → `401`. Missing header entirely → `401` (not a 500, not a 200).
4. **Raw-body integrity:** Send a payload whose `json.dumps` re-serialization differs from the bytes (extra whitespace, reordered keys) → still `200`. Proves you verify the raw bytes, not a re-encode.
5. **Replay window:** Valid signature with a timestamp older than tolerance (e.g. now − 600 s) → `400`/`401`; same request within tolerance → `200`.
6. **Duplicate delivery:** POST the identical valid event twice (and concurrently, in parallel) → both return `200` but the worker side-effect happens **exactly once**. This catches non-atomic dedup.
7. **Out-of-order:** Deliver `version=2` then `version=1` for the same resource → final stored state reflects v2; the v1 arrival is dropped/ignored.
8. **Fast-ack:** Make downstream processing sleep; the HTTP response still returns 2xx within the provider timeout (assert response latency, not just status).
9. **Persistence failure:** Force the store/queue to fail on a verified event → endpoint returns `5xx` (so the provider retries), not `200`.
10. **Oversized / wrong type:** POST > size cap → `413`; wrong `Content-Type` → `415`; both reject before full read.
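
Checks 2 to 5 can be automated without a live provider by signing fixtures yourself. A self-contained sketch that re-declares a Stripe-shaped verifier equivalent to the Step 2 one (so the file runs on its own) and asserts the expected accept/reject outcomes at the function level; the secret and payload are invented test values, and HTTP status mapping would be tested separately against the handler.

```python
import hashlib
import hmac
import json
import time

SECRET = b"test-secret"  # hypothetical fixture secret, not a real provider value

def verify(raw: bytes, header_sig: str, ts: str, secrets: list[bytes], tol: int = 300) -> bool:
    """Stripe-shaped check, repeated from Step 2 so this file runs alone."""
    if not header_sig:
        return False
    try:
        if abs(time.time() - int(ts)) > tol:
            return False
    except (TypeError, ValueError):
        return False
    signed = f"{ts}.".encode() + raw
    return any(
        hmac.compare_digest(hmac.new(s, signed, hashlib.sha256).hexdigest().encode(), header_sig.encode())
        for s in secrets
    )

def sign(raw: bytes, ts: int) -> str:
    """What the provider would put in v1= for this body and timestamp."""
    return hmac.new(SECRET, f"{ts}.".encode() + raw, hashlib.sha256).hexdigest()

now = int(time.time())
raw = b'{ "type": "invoice.paid",   "id": "evt_9" }'  # spacing json.dumps will not reproduce
sig = sign(raw, now)

assert verify(raw, sig, str(now), [SECRET])                                # happy path
assert not verify(raw.replace(b"paid", b"void"), sig, str(now), [SECRET])  # 2. tampered body
assert not verify(raw, "", str(now), [SECRET])                             # 3. empty signature
assert json.dumps(json.loads(raw)).encode() != raw                         # 4. a re-encode differs,
assert verify(raw, sig, str(now), [SECRET])                                #    yet the raw bytes verify
stale = now - 600
assert not verify(raw, sign(raw, stale), str(stale), [SECRET])             # 5. outside the window
assert not verify(raw, sig, "not-a-number", [SECRET])                      # malformed ts: False, no crash
```

Checks 6 to 10 need the real endpoint and store (concurrency, latency, failure injection), so they belong in an integration test rather than this unit-level file.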

Done = a tampered or unsigned request gets `401`, a stale one `400`, a duplicate (including concurrent) is accepted but processed exactly once, a valid one is acked 2xx within timeout and processed via the worker, and raw-body verification survives a JSON re-serialization that would have broken a naïve implementation.
@@ -0,0 +1,126 @@
---
name: integrate-oauth-oidc
description: Integrates a THIRD-PARTY identity provider via OpenID Connect — "Log in with Google/GitHub/Microsoft/Apple" or acting as an OAuth client to a third-party API. Uses the Authorization Code flow with PKCE (S256) everywhere (SPA, native, server); mandatory state (CSRF) + nonce (replay); exact-match redirect_uri; server-side code→token exchange (no client_secret in public clients); strict ID-token validation against JWKS; safe email_verified account linking; refresh rotation with reuse detection; system-browser-only native flows.
when_to_use: Adding "Sign in with <provider>", consuming a third-party OAuth API, validating an ID token, linking accounts across providers, or fixing a broken OAuth callback/redirect. Distinct from auth-jwt-session (that ISSUES and validates YOUR app's own session/JWT after this handshake completes — this skill is the third-party handshake itself) and design-authorization-model (what a user may DO — permissions — not who they ARE).
---

## When to Use

Reach for this skill when you are talking to an identity provider you do not own:

- "Add Log in with Google / GitHub / Microsoft / Apple"
- "Call the Stripe/Slack/Notion API on a user's behalf" (you are the OAuth client)
- "Validate this ID token / id_token / JWT from Google" — check signature + claims
- "A user signed up with Google but already has a password account — merge them"
- "My OAuth callback redirects but the token exchange / state check fails"
- "Refresh the access token / our Google refresh stopped working"
- "The native app login opens a webview and Google blocks it with disallowed_useragent"

NOT this skill:
- Issuing, signing, or verifying YOUR app's OWN session cookie / JWT AFTER login succeeds, refresh rotation of YOUR token, RP-initiated logout clearing YOUR session → **auth-jwt-session** (this skill ends when you have a validated set of claims; minting your session from them is that skill)
- "Which users can edit vs view", roles, multi-tenant isolation, per-resource rules → **design-authorization-model** (authZ — what they may do — not authN — who they are)
- Where to STORE the `client_secret` (Vault/Secrets Manager, OIDC-to-cloud, rotation, leak remediation) → **secrets-management**
- Auditing an existing diff for vulns by severity → **security-review**

## Steps

**1. Pick the flow — Authorization Code + PKCE, for every client type.**
- The implicit flow (`response_type=token`) is dead (omitted from OAuth 2.1 and deprecated by the OAuth 2.0 Security BCP) — never use it. So is ROPC (password grant). Use `response_type=code` always.
- Use PKCE (`code_challenge` + `code_verifier`) for ALL clients, including confidential server apps, not just SPA/native: the Security BCP requires it for public clients and recommends it for confidential ones, and OAuth 2.1 makes it the default for everyone.

| Client | client_secret? | PKCE | Token exchange runs |
|---|---|---|---|
| Server / web app (BFF) | yes (server-only) | yes | server |
| SPA (React/Vue) | **no** | yes | **server (BFF)** — never the browser |
| Native / mobile | **no** | yes | server, or native via AppAuth |
| CLI | no | yes | local loopback or device code |

**2. Build the authorize request with state + nonce + PKCE.**
- `code_verifier` = 43–128 random chars from the unreserved set `[A-Za-z0-9-._~]`; `code_challenge = BASE64URL(SHA256(verifier))` with the trailing `=` padding removed; send `code_challenge_method=S256` (never `plain`).
- `state` = random, server-stored, tied to the user's session → verify on callback. This is your **CSRF** defense; a missing/unchecked state lets an attacker inject their own auth code.
- `nonce` (OIDC) = random, stored, sent on authorize → **must equal** the `nonce` claim in the returned ID token. This is your **ID-token replay** defense.
- `redirect_uri` must **exactly** match a value pre-registered with the provider (scheme, host, port, path, trailing slash — byte-for-byte). No wildcards; "almost matches" = error or open redirect.
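
A minimal standard-library sketch of the authorize step. `secrets.token_urlsafe(32)` yields a 43-character verifier drawn from the unreserved alphabet, and the challenge is base64url(SHA-256) with the padding stripped, as RFC 7636 specifies. The endpoint, client id and redirect URI are placeholders, the scope is an example, and `session` stands in for whatever server-side session store you use.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), '=' padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def build_authorize_url(authorize_endpoint: str, client_id: str, redirect_uri: str,
                        session: dict) -> str:
    """Create verifier/state/nonce, keep them server-side, return the redirect URL."""
    verifier = secrets.token_urlsafe(32)  # 32 random bytes -> 43 unreserved chars
    state = secrets.token_urlsafe(32)     # CSRF token, checked on the callback
    nonce = secrets.token_urlsafe(32)     # must come back as the ID token's nonce claim
    session["oauth"] = {"state": state, "nonce": nonce, "code_verifier": verifier}
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,     # byte-identical to the registered value
        "scope": "openid email profile",
        "state": state,
        "nonce": nonce,
        "code_challenge": s256_challenge(verifier),
        "code_challenge_method": "S256",
    })
    return f"{authorize_endpoint}?{query}"
```

As a sanity check, `s256_challenge("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk")` should return `"E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM"`, the worked example in RFC 7636 Appendix B.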

**3. Do the code→token exchange SERVER-SIDE. Never ship a secret to a public client.**
- POST `code` + `code_verifier` (+ `client_secret` only if confidential) to the token endpoint from your backend.
- A `client_secret` in SPA JS, mobile binary, or a public repo IS published — anyone can extract it. SPA/mobile use **PKCE without a secret** (PKCE binds the code to the client instance that started the flow, so an intercepted code is useless without the verifier) behind a Backend-for-Frontend (BFF) that holds any secret and sets an httpOnly session cookie.
- Store the `client_secret` per **secrets-management** (env/Vault), never in source.
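
On the callback, the server checks `state` against the session before touching `code`, then builds the token request from the stored verifier. This sketch only constructs a `urllib.request.Request` rather than sending it, so it runs offline; the session layout matches a server-side dict keyed `"oauth"` (an assumption), and a confidential client would additionally authenticate with its `client_secret` loaded at runtime.

```python
import hmac
from urllib.parse import urlencode
from urllib.request import Request

class OAuthCallbackError(Exception):
    """Any callback that must not proceed to the token exchange."""

def build_token_request(token_endpoint: str, client_id: str, redirect_uri: str,
                        query: dict, session: dict) -> tuple[Request, str]:
    """Check the callback, then prepare (not send) the code -> token POST.

    Returns the request plus the stored nonce, needed for ID-token validation.
    """
    pending = session.pop("oauth", None)  # single use: a replayed callback finds nothing
    if pending is None:
        raise OAuthCallbackError("no login in progress for this session")
    state = query.get("state") or ""
    if not hmac.compare_digest(state.encode(), pending["state"].encode()):
        raise OAuthCallbackError("state mismatch (CSRF or injected code)")
    if "error" in query or not query.get("code"):
        raise OAuthCallbackError(f"no code returned: {query.get('error', 'missing code')}")
    body = urlencode({
        "grant_type": "authorization_code",
        "code": query["code"],
        "redirect_uri": redirect_uri,      # same value as on the authorize request
        "client_id": client_id,
        "code_verifier": pending["code_verifier"],
    }).encode()
    request = Request(token_endpoint, data=body, method="POST",
                      headers={"Content-Type": "application/x-www-form-urlencoded"})
    return request, pending["nonce"]
```

Popping the pending entry before any check means a second callback with the same `state` is rejected even if the first one failed, which keeps each authorize attempt single-use.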

**4. VALIDATE the ID token — this is where most integrations are silently broken.**
- Fetch the provider's **JWKS** (`jwks_uri` from `/.well-known/openid-configuration`), select the key by the token's `kid`, **verify the signature**. Cache JWKS; refresh on unknown `kid`.
- **alg allowlist:** accept only what you expect (`["RS256"]` / `["ES256"]`). **Reject `alg:none`** and reject `HS256` when expecting RS — the RS→HS confusion attack signs with the public key as an HMAC secret. Never let the token's own `alg` header choose the verification algorithm.
- Check claims: `iss` == provider's exact issuer; `aud` == **your** `client_id` (reject tokens minted for another app; if `aud` is an array it must contain your `client_id`, and `azp` should then equal it); `exp` not past, `iat` not absurdly future (small clock skew ok); `nonce` == the one you sent.
- Only AFTER the token validates may you trust its claims or call `userinfo`. The `userinfo` response is normally not signed — trust comes from the validated ID token / the access token used to fetch it — and its `sub` must match the ID token's `sub`.
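
Signature verification itself needs a JOSE library (for example PyJWT's `jwt.decode(token, key, algorithms=["RS256"], audience=..., issuer=...)`, which checks `exp`, `aud` and `iss` but not `nonce`), while the policy around it fits in a few lines of standard library. A sketch, assuming the caller verifies the signature with the JWKS key chosen by `kid` between the two functions; `ALLOWED_ALGS`, the 60 s skew and both function names are illustrative.

```python
import base64
import hmac
import json
import time
from typing import Optional

ALLOWED_ALGS = ("RS256",)  # what THIS provider is configured to use
CLOCK_SKEW_S = 60          # illustrative leeway

def _b64url_json(segment: str) -> dict:
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def check_header(id_token: str) -> dict:
    """Enforce the alg allowlist before any key lookup; returns the JOSE header."""
    header = _b64url_json(id_token.split(".")[0])
    if header.get("alg") not in ALLOWED_ALGS:  # rejects "none" and "HS256"
        raise ValueError(f"alg {header.get('alg')!r} not allowed")
    return header  # header["kid"] selects the JWKS key for signature verification

def check_claims(claims: dict, issuer: str, client_id: str, expected_nonce: str,
                 now: Optional[float] = None) -> dict:
    """Apply the claim rules to a payload whose signature is ALREADY verified."""
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise ValueError("wrong issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        raise ValueError("token was minted for another client")
    if len(audiences) > 1 and claims.get("azp") != client_id:
        raise ValueError("multi-audience token without a matching azp")
    exp, iat = claims.get("exp"), claims.get("iat")
    if not isinstance(exp, (int, float)) or exp < now - CLOCK_SKEW_S:
        raise ValueError("expired or missing exp")
    if not isinstance(iat, (int, float)) or iat > now + CLOCK_SKEW_S:
        raise ValueError("iat missing or in the future")
    nonce = claims.get("nonce")
    if not isinstance(nonce, str) or not hmac.compare_digest(nonce.encode(), expected_nonce.encode()):
        raise ValueError("nonce mismatch")
    return claims
```

Keeping `ALLOWED_ALGS` in your config rather than reading it from the token is the whole point of the allowlist bullet above.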

**5. Read verified claims, then hand off to YOUR app.**
- Standard OIDC claims: `sub` (the provider's STABLE user id — your join key, not email), `email`, `email_verified`, `name`, `picture`.
- Match users on `sub` (unique only within its issuer, so key on provider + `sub`), never on email alone (email is reassignable and provider-controlled). Now mint your own session/JWT — that is **auth-jwt-session**'s job; this skill is done at "validated claims".

**6. ACCOUNT LINKING — get this wrong and you enable account takeover.**
- Link an OAuth identity to an existing local account by email **only if `email_verified == true`** AND the provider is one you trust to verify email. If you auto-link on an unverified email, an attacker registers `victim@example.com` at a sloppy IdP and takes over the victim's account.
- Safer default: if an account with that email exists, require the user to **log in with the existing method first**, THEN link (first-party confirmation), instead of silently merging.
- Model identities as a separate table: one user → many `(provider, sub)` rows. A user with Google + GitHub + password is normal. Unique-constrain `(provider, sub)`.

**7. Refresh tokens — rotation, reuse detection, secure storage.**
- Request `offline_access` / `access_type=offline` only if you actually need long-lived access. Google returns a refresh token **only on the first consent** (or with `prompt=consent`) — capture and store it then.
- Rotate: each refresh use issues a new refresh token and invalidates the old. If an already-used (rotated) refresh token reappears → it was stolen → revoke the whole token family. (Mechanics overlap **auth-jwt-session**.)
- Storage: server-side or httpOnly `Secure` cookie; native → **Keychain (iOS) / Keystore (Android)**. **Never `localStorage`** (XSS reads it).

**8. Logout.**
- RP-initiated logout: redirect to the provider's `end_session_endpoint` with `id_token_hint` + `post_logout_redirect_uri` to end the provider session, and **revoke** the refresh token at the provider's revocation endpoint.
- Clearing YOUR app's own session/cookie is **auth-jwt-session**. Logging out of your app does NOT log the user out of the IdP; only the provider's `end_session_endpoint` does that, and not every provider advertises one (check its discovery document).

**9. Scopes & incremental consent.**
- Request the **minimum** scopes at login (`openid profile email`). Ask for sensitive/extra scopes later, at the moment you need them (incremental consent) — broad upfront scopes scare users and over-privilege your token.

**10. NATIVE / mobile — system browser only, never a webview.**
- Use **ASWebAuthenticationSession** (iOS) / **Custom Tabs** (Android) via **AppAuth**. These share the system cookie jar (SSO) and isolate credentials from your app.
- **Never an embedded `WKWebView`/`WebView`**: Google (and others) block it (`disallowed_useragent`), it defeats SSO, and an embedded webview CAN read the user's IdP credentials — that is the whole point of avoiding it.
- PKCE is mandatory; redirect via a custom scheme or App Link/Universal Link that exact-matches registration.

**11. Apple Sign In quirks (and other provider gotchas).**
- Apple returns the user's **name only on the FIRST authorization** — persist it then or it's gone forever. Email may be a **private relay** (`@privaterelay.appleid.com`) the user can disable later — handle bounces.
- Provider table:

| Provider | Watch out for |
|---|---|
| Apple | name first-auth only; relay email; `client_secret` is a short-lived **JWT you sign** (ES256), not a static string — must regenerate |
| GitHub | OAuth, **not full OIDC** — no id_token; call `/user` + `/user/emails` with the access token; pick the `primary`+`verified` email |
| Microsoft (Entra) | `iss` varies per tenant; validate against `https://login.microsoftonline.com/{tid}/v2.0`; `v1.0` vs `v2.0` endpoints differ |
| Google | refresh token only on first consent / `prompt=consent`; `email_verified` reliable |

**12. Use a vetted library — do not hand-roll JWT validation or the flow.**

| Stack | Library |
|---|---|
| Node | `openid-client` |
| Python | `Authlib` |
| Java/Spring | Spring Security OAuth2 Client |
| Next.js / full-stack JS | NextAuth / Auth.js |
| iOS / Android | AppAuth |

These handle discovery, JWKS caching, PKCE, state/nonce, and clock skew correctly. A hand-rolled ID-token verifier is a classic source of `alg:none` and audience-confusion bugs.

## Common Errors

- **No PKCE / `code_challenge_method=plain`** — auth code interceptable. Always S256.
- **Skipping or not comparing `state`** — CSRF / code injection. Store server-side, compare on callback.
- **Trusting the ID token without checking `aud`** — a token minted for a DIFFERENT app of the same provider passes signature but is not for you. Require `aud == your client_id`.
- **`alg:none` / RS→HS confusion accepted** — verifier reads `alg` from the token. Hardcode an allowlist; reject `none` and unexpected algs.
- **`client_secret` shipped in SPA/mobile/repo** — it's public. PKCE replaces it; secret lives only server-side.
- **Auto-linking on unverified email** — account takeover. Link only when `email_verified` AND trusted IdP, or require existing-login confirmation.
- **Refresh token in `localStorage`** — XSS-readable. httpOnly cookie / Keychain.
- **Embedded webview for native login** — provider blocks it and it can steal IdP creds. System browser (ASWebAuthenticationSession / Custom Tabs).
- **redirect_uri "close enough"** — provider rejects, or a loose registration becomes an open redirect. Exact match, pre-registered.
- **Lost Apple name / dropped Google refresh token** — both arrive once. Persist on first response.

## Verify

1. Tamper one byte of the ID-token signature → validation rejects. Craft `alg:none` → rejected. Swap to `HS256` signed with the public key → rejected.
2. Token with wrong `aud` (another client_id) → rejected; expired `exp` → rejected; mismatched `nonce` → rejected.
3. Callback with a wrong/missing `state` → rejected. Token exchange with a wrong `code_verifier` → fails.
4. `grep` the SPA/mobile bundle for the `client_secret` → not present.
5. Account-link test: register `victim@x.com` at a provider that does NOT verify email → your app refuses to auto-link to the existing local account.
6. Refresh rotation: use a refresh token, replay the old one → family revoked, refresh fails.
7. Native: confirm login opens the system browser (ASWebAuthenticationSession / Custom Tabs), not an in-app webview.
8. Logout: after RP-initiated logout, the refresh token no longer mints access tokens at the provider.

---
name: load-stress-test
description: Designs and runs load, stress, soak, and spike tests against an HTTP/gRPC service using an open arrival-rate model — driving a realistic endpoint mix with think-time past the saturation knee and reporting latency percentiles, throughput ceiling, and breaking point against machine-checkable SLO thresholds.
when_to_use: Before a launch/scale event, for capacity planning, or to validate an SLO — when the question is sustained req/s, where p99 degrades, or whether the service survives a soak/spike. Distinct from performance-profiling (explains why one already-measured request is slow) and optimize-sql-query (tunes one query's plan); this skill finds the limit, those explain it.
---

## When to Use

Reach for this skill when the question is **"how much can it take, and where does it break"** — a capacity/SLO question, not a code question:

- "How many req/s can this hold before p99 blows past 500ms?"
- "Will checkout survive Black Friday / the launch spike?"
- "Find the breaking point — ramp until error rate crosses 1%."
- "Does it leak / degrade over an 8-hour soak at steady load?"
- "Validate the SLO: p95 < 300ms, p99 < 800ms, errors < 0.5% at 2k RPS."
- "Gate CI so a PR can't regress p95 by >10%."

NOT this skill:
- *Why* one endpoint is slow when you already know it is (flame graph, allocations) → performance-profiling
- A specific slow SQL query's plan/indexes → optimize-sql-query
- A prod incident already happening (this is a planned test, not a live outage) → incident-response-sre, or debug-root-cause for a reproducible failure
- Adding the metrics/traces you watch during the run → observability-instrument (a prerequisite, not this)
- Wiring the gate into the pipeline mechanics → cicd-pipeline-author (this defines the threshold; that plumbs it in)

## Steps

1. **Write the goal as numbers before touching a tool.** A test with no pass/fail line is just a graph. Fix four things:
   - **Objective + scenario** (drives the load shape):

     | Scenario | Question it answers | Shape | Duration |
     |---|---|---|---|
     | **Smoke** | Does the script even work? | 1–5 VUs | 1 min |
     | **Load** | Holds at *expected peak*? | ramp to target RPS, hold | 10–30 min |
     | **Stress** | Where's the knee / breaking point? | ramp **past** target until SLO breaks | until failure |
     | **Soak** | Leak/degradation over time? | steady moderate load | 2–8 hr |
     | **Spike** | Survives a sudden surge + recovers? | flat → instant 5–20×, then drop | 1–5 min spike |

   - **SLO thresholds** as concrete inequalities: e.g. `p95 < 300ms`, `p99 < 800ms`, `error_rate < 0.5%`, `throughput ≥ 2000 req/s`. These become the exit code.
   - **Target intensity** in **arrival rate (RPS)**, not just VUs — VU count without think-time is meaningless (see Common Errors). Derive VUs from Little's Law: `VUs ≈ target_RPS × (avg_response_time + think_time)`.
   - **Environment**: a prod-like staging box (same instance class, DB size, cache warm, autoscaling either off or explicitly in-scope). Never load-test prod blind.
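
Little's Law from the Target intensity bullet, as arithmetic you can run before sizing `preAllocatedVUs`/`maxVUs`. The function names are hypothetical, and the response-time and think-time figures in the usage note are assumed for illustration.

```python
import math


def vus_needed(target_rps: float, avg_response_s: float, think_time_s: float) -> int:
    """Little's Law: concurrency = arrival rate x time each iteration occupies a VU."""
    return math.ceil(target_rps * (avg_response_s + think_time_s))


def max_sustainable_rps(max_vus: int, avg_response_s: float, think_time_s: float) -> float:
    """The inverse: the arrival rate a VU pool of a given size can actually sustain."""
    return max_vus / (avg_response_s + think_time_s)
```

For example, with 1–3 s think-time (mean 2 s) and ~0.5 s responses, each VU completes one iteration every ~2.5 s, so `max_sustainable_rps(2000, 0.5, 2.0)` is 800: a 2000-VU pool cannot reach a 2000-RPS target, which needs `vus_needed(2000, 0.5, 2.0)` = 5000 VUs.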

2. **Model a realistic workload, not a hammer on one URL.** A single hot endpoint at 100% gives a fantasy number.
   - **Endpoint mix** weighted to real traffic (read from access logs / APM): e.g. 70% `GET /feed`, 20% `GET /item/:id`, 8% `POST /cart`, 2% `POST /checkout`.
   - **Think-time** between steps (`sleep(rand 1..3)`) so each VU models a user, not a tight loop.
   - **Parameterized + correlated data**: unique users/items per iteration from a CSV/SharedArray (no caching by accident); capture a token/ID from response N and feed request N+1 (login → use `access_token`; create order → reuse `order_id`).
   - **Auth**: log in once per VU and reuse the token; don't re-auth every iteration unless that's the scenario under test.
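
A sketch for deriving the endpoint mix from access logs, assuming Common/Combined Log Format request lines (`"GET /item/42 HTTP/1.1"`) and treating numeric path segments as IDs to collapse into `:id`. `endpoint_mix` is a hypothetical helper; real routers usually need their own normalization rules, so the regexes here are illustrative.

```python
import re
from collections import Counter

# Assumed log shape: the quoted request line of Common/Combined Log Format.
REQUEST = re.compile(r'"(GET|POST|PUT|PATCH|DELETE) ([^ ?"]+)')
ID_SEGMENT = re.compile(r"/\d+(?=/|$)")  # numeric path segments become /:id


def endpoint_mix(lines) -> dict[str, float]:
    """Percent of traffic per normalized route, to use as scenario weights."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m:  # query strings are dropped by the regex, so /feed?page=2 counts as /feed
            counts[f"{m.group(1)} {ID_SEGMENT.sub('/:id', m.group(2))}"] += 1
    total = sum(counts.values())
    return {route: round(100 * n / total, 1) for route, n in counts.most_common()} if total else {}
```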

3. **Pick the tool by team + need, and encode thresholds as exit-code gates.** Default to **k6** for code-first, CI-friendly tests — it has RPS-precise arrival-rate executors and native threshold gates, so it covers most cases. Reach for the others only for the listed reason:

   | Tool | Script lang | Reach for it when | Native threshold gate |
   |---|---|---|---|
   | **k6** (default) | JS | CI, scripted, RPS-precise (`constant-arrival-rate`) | `thresholds` → exit 99 on breach |
   | Locust | Python | dynamic per-user logic, Python shop | `--exit-code-on-error` + custom |
   | Gatling | Scala/Java DSL | JVM teams, rich HTML report | `assertions` → non-zero exit |
   | Artillery | YAML/JS | quick YAML scenarios, serverless | `ensure` plugin |
   | JMeter | XML/GUI | legacy/enterprise, protocol breadth | clunky; prefer above for CI |

   k6 with an **open model** (arrival rate: the correct way to fix RPS and dodge coordinated omission), a warm-up scenario kept out of the gate, VU pools sized by Little's Law, and SLOs as code:

   ```js
   import http from 'k6/http';
   import { check, sleep } from 'k6';
   import { SharedArray } from 'k6/data';

   // Parsed once in the init context and shared read-only by all VUs.
   const users = new SharedArray('users', () => JSON.parse(open('./users.json')));

   export const options = {
     scenarios: {
       warmup: {
         executor: 'ramping-arrival-rate', // open model: k6 starts iterations at a fixed rate
         startRate: 100, timeUnit: '1s',
         preAllocatedVUs: 300, maxVUs: 1500, // Little's Law: 500 RPS × (~0.5s resp + ~2s think) ≈ 1250
         stages: [{ target: 500, duration: '2m' }], // cold JIT/caches/pools; excluded from the gate below
       },
       ramp_to_knee: {
         executor: 'ramping-arrival-rate',
         startTime: '2m', // begins when the warm-up ends
         startRate: 500, timeUnit: '1s',
         preAllocatedVUs: 5000, maxVUs: 12000, // 4000 RPS × ~2.5s ≈ 10k; too few VUs shows up as dropped_iterations
         stages: [
           { target: 2000, duration: '1m' }, // ramp to target peak
           { target: 2000, duration: '5m' }, // hold at target peak
           { target: 4000, duration: '5m' }, // push PAST to find the knee (drop this stage for a pure CI gate)
         ],
       },
     },
     thresholds: { // judged on the measured scenario only; breach → exit code 99 → CI fails
       'http_req_duration{scenario:ramp_to_knee}': ['p(95)<300', 'p(99)<800'],
       'http_req_failed{scenario:ramp_to_knee}': ['rate<0.005'],
       http_reqs: ['rate>1800'], // coarse throughput floor, averaged over the whole run
     },
   };

   export default function () {
     const u = users[Math.floor(Math.random() * users.length)]; // different user per iteration
     const r = http.get(`https://staging.internal/feed?u=${u.id}`);
     check(r, { 'status 200': (res) => res.status === 200 });
     sleep(Math.random() * 2 + 1); // think-time 1–3s; it holds the VU, hence the VU sizing above
   }
   ```

   Run: `k6 run --summary-trend-stats="avg,med,p(95),p(99),max" test.js`.

4. **Run staged, and watch the server while the client pushes.** Escalate; don't jump to max:
   1. **Smoke** (1–5 VUs) — fix the script/correlation before scaling.
   2. **Baseline** at low steady load — record reference percentiles.
   3. **Ramp to target** — confirm SLO holds at expected peak.
   4. **Push past** — keep ramping until a threshold breaks; the load just below that is the **breaking point / knee**.

   The client number alone is half the picture. Capture **server-side** metrics over the same window (Grafana/Prometheus/APM): CPU%, memory (RSS trend for soak), **DB/connection-pool saturation**, thread/worker queue depth, GC pauses, downstream latency. The first resource to hit ~100% (CPU, pool exhaustion, disk I/O, a downstream rate limit) **is the bottleneck**, and naming it is the finding. Always confirm the **client isn't the bottleneck** (load-gen box CPU/network not saturated, file descriptors raised) before trusting a ceiling.
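
Step 4's rule (the load just below the first threshold breach is the knee) as a small function over per-stage summaries. The `Stage` shape and `find_knee` name are hypothetical: you would fill the stages from your tool's per-stage output, for example one k6 run per stage or per-stage tagged submetrics.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    rps: float          # offered arrival rate for this stage
    p99_ms: float
    error_rate: float   # 0.01 == 1%


def find_knee(stages: list[Stage], p99_slo_ms: float, max_error_rate: float):
    """Walk stages by ascending load; return (last passing stage, first failing stage)."""
    last_ok = None
    for s in sorted(stages, key=lambda st: st.rps):
        if s.p99_ms > p99_slo_ms or s.error_rate > max_error_rate:
            return last_ok, s        # the breaking point lies between these two loads
        last_ok = s
    return last_ok, None             # never broke: keep ramping in the next run
```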

5. **Report four things, then gate.** A useful report states: **(a)** latency percentiles (p50/p95/p99/max) at target load, **(b)** the sustained throughput ceiling (max RPS where the SLO still holds), **(c)** the breaking point (the load where it broke, and how: errors, timeouts, or a latency cliff), and **(d)** the resource that was saturated at that point. For soak, add the RSS/latency trend over time (flat = healthy; rising = leak). For CI: store the baseline summary and fail the build when p95/p99/error rate regress beyond an allowed delta.
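
A sketch of the CI regression gate from step 5. It assumes each run has already been reduced to a small dict (`p95_ms`, `p99_ms`, `error_rate`) and that the baseline is committed; `regressions` is a hypothetical helper. The 10% latency allowance mirrors the "can't regress p95 by >10%" example above, while the 0.1-percentage-point error-rate allowance is an arbitrary placeholder. A CI step would fail the build whenever the returned list is non-empty.

```python
def regressions(baseline: dict, current: dict,
                max_latency_increase: float = 0.10,
                max_error_increase: float = 0.001) -> list[str]:
    """Compare two run summaries; return human-readable reasons to fail the build."""
    problems = []
    for key in ("p95_ms", "p99_ms"):
        limit = baseline[key] * (1 + max_latency_increase)  # relative allowance
        if current[key] > limit:
            problems.append(f"{key} {current[key]:.0f} > {limit:.0f} "
                            f"(baseline {baseline[key]:.0f} +{max_latency_increase:.0%})")
    if current["error_rate"] > baseline["error_rate"] + max_error_increase:  # absolute allowance
        problems.append(f"error_rate {current['error_rate']:.2%} > baseline "
                        f"{baseline['error_rate']:.2%} + {max_error_increase:.2%}")
    return problems
```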

## Common Errors

- **Coordinated omission.** A closed-model loop that waits on each slow response stops *issuing* new requests during a stall, so the slowest requests are undercounted and p99 looks great. Fix: use an **open/arrival-rate model** (k6 `*-arrival-rate`, Gatling `constantUsersPerSec`, wrk2) that schedules requests on a fixed clock regardless of in-flight latency.
- **No warm-up.** First requests hit cold JIT, empty caches, unconnected pools, and cold autoscalers — folding them in poisons percentiles. Run a warm-up stage and **exclude it** from the SLO judgment window.
- **VUs as the target, no think-time.** "500 VUs" in a tight loop produces an unrealistic, uncontrolled arrival rate. Specify **RPS**; add think-time so a VU models a user. Convert via Little's Law.
- **Single-VU extrapolation.** "1 user got 50ms, so 1000 users = 50ms each" ignores contention, queueing, and pool limits, which are the entire point of the test. Latency is non-linear past the knee; you must actually ramp.
- **Client is the bottleneck.** A maxed-out load-gen box (CPU, NIC, ephemeral ports, `ulimit -n`) caps *your* throughput, not the server's. Raise FD limits, distribute across machines (k6 cloud / multiple agents), and verify the generator is under ~70% before believing any ceiling.
- **Testing a non-prod-like env.** Tiny DB, no cache, debug logging, a shared box: numbers don't transfer. Match instance class, data volume, and config; disable verbose logging.
- **One endpoint at 100%.** Over-caches and misses cross-endpoint contention (shared pool, locks). Use a weighted mix from real traffic.
- **Reusing the same record every iteration.** One user ID hits a hot cache row and reports impossibly low latency. Parameterize from a dataset of unique keys.
- **Reporting only the average.** A 40ms mean can hide a 4s p99. Averages lie under load — always report **p95/p99/max**.
- **Load-testing production unannounced.** Real users, real bills, real pages. Use staging; if prod is mandatory, schedule it, cap blast radius, and tell the on-call.
- **Ignoring server metrics.** Client-only results tell you *that* it broke, never *why*. Without CPU/mem/pool/DB you can't name the bottleneck or fix it.
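
To make coordinated omission concrete, here is a toy simulation with invented numbers: one client intends to send a request every 10 ms to a server that answers in 1 ms but freezes for one second. The closed loop records only the requests it managed to send; the open model records every request, measured from its scheduled send time. The functions are illustrative, not part of any load tool.

```python
import math


def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


def simulate(open_model: bool, duration=10.0, interval=0.01, service=0.001, stall=(5.0, 6.0)):
    """Latencies one client records against a server that freezes during `stall`."""
    def finish(sent: float) -> float:
        # Requests arriving during the stall wait until it ends, then take `service`.
        return (stall[1] if stall[0] <= sent < stall[1] else sent) + service

    latencies, busy_until = [], 0.0
    for i in range(round(duration / interval)):
        scheduled = i * interval
        if open_model:
            # Open model: fire on the schedule regardless of in-flight requests.
            latencies.append(finish(scheduled) - scheduled)
        elif scheduled >= busy_until:
            # Closed model: the next request goes out only after the previous response.
            done = finish(scheduled)
            latencies.append(done - scheduled)
            busy_until = done
        # else: the closed loop is blocked, so this request is never issued or measured.
    return latencies
```

In this setup the closed loop issues 900 of 1000 requests and reports a p99 of about 1 ms, while the open model's p99 is about 0.9 s: the stall is identical, only the measurement differs.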

## Verify

1. **Threshold gate is real:** intentionally set an impossible threshold (`p(95)<1`) → the run **exits non-zero**. Proves the SLO is machine-checked, not eyeballed.
2. **Open model confirmed:** the actual issued RPS tracks the configured arrival rate even as latency rises (not throttled by in-flight count) — no coordinated omission.
3. **Warm-up excluded:** reported percentiles come from the steady window, and the first-stage cold numbers are visibly separated, not blended in.
4. **Breaking point is named with a cause:** report states "broke at ~N RPS — `http_req_failed` crossed 1% / p99 hit the cliff" **and** the saturated resource (e.g. "DB pool at 100%, CPU 95%"), not just a latency graph.
5. **Client wasn't the limiter:** load-gen CPU/network stayed below ~70% and FD limits weren't hit at the reported ceiling — otherwise the number is the generator's, not the service's.
6. **Realism holds:** endpoint mix ≈ production weights, data was parameterized (cache-hit ratio sane, not artificially 100%), think-time present.
7. **Soak (if run):** memory RSS and p95 are **flat** across the full duration — a rising slope is a leak/degradation finding, not a pass.
8. **Reproducible:** the script, dataset, env spec, and exact command are committed so the run can be replayed and CI-gated.

Done = the scenario ran on a prod-like env with an open arrival-rate model and excluded warm-up, every SLO threshold is enforced by a non-zero exit code, and the report states latency percentiles, the sustained throughput ceiling, the breaking point with its cause, and the saturated server-side resource — with the load generator proven not to be the bottleneck.

---
name: map-privacy-data-gdpr
description: Implements privacy/data-protection engineering — personal-data inventory/mapping (RoPA), lawful-basis and versioned consent capture, DSAR machine-readable export and right-to-erasure cascades across derived data/logs/backups, TTL/scheduled retention purge, and PII minimization/pseudonymization — for GDPR/CCPA-style compliance.
when_to_use: A product stores personal data and needs consent capture, data export, deletion/erasure, or retention controls. Distinct from auth-jwt-session and design-authorization-model (who may access), build-audit-logging (tamper-evident action trail), and security-review (vulnerability audit).
---

## When to Use

Reach for this skill when the requirement is **what happens to a person's data**, not who may touch it:

- "A user requested their data; build a DSAR export endpoint"
- "Implement account deletion / right to be forgotten / erasure that actually removes them everywhere"
- "Record consent for marketing/analytics — granular, withdrawable, with proof"
- "We keep PII forever — add retention limits / a purge job"
- "Map where personal data lives for our DPIA / Article 30 record of processing"
- "Stop collecting/storing PII we don't need; pseudonymize the rest"

NOT this skill:
- Logging in users, sessions, OAuth, refresh rotation → **auth-jwt-session**
- Deciding which role/tenant may read a record (RBAC/ABAC/row scoping) → **design-authorization-model**
- The tamper-evident trail of *who did what* (including who ran an erasure) → **build-audit-logging**
- Hunting injection/SSRF/access-control bugs in changed code → **security-review**
- Design-level threat enumeration over a system handling PII → **threat-model-stride**
- Backup/PITR/retention *of the datastore itself* (RPO/RTO, WAL archiving) → **design-backup-dr-recovery**

## Steps

1. **Build the data inventory first — you cannot delete or export what you haven't mapped.** Produce a machine-readable record of processing (RoPA, GDPR Art. 30). One row per (data element × store). Drive everything downstream — export, erasure, retention — off this file, not off tribal knowledge.

   ```yaml
   # privacy/inventory.yaml — source of truth for DSAR + purge + DPIA
   - element: email
     category: contact            # contact | identifier | special-category | behavioral | financial
     store: postgres.users.email
     purpose: account login, transactional mail
     lawful_basis: contract       # see step 2 table
     retention: account_lifetime_plus_30d
     subject_key: users.id
     export: true
     erase: anonymize             # delete | anonymize | retain-with-basis
   - element: ip_address
     category: identifier
     store: postgres.request_logs.ip
     purpose: fraud/abuse detection
     lawful_basis: legitimate_interest
     retention: 90d
     subject_key: request_logs.user_id
     export: true
     erase: delete
   - element: clickstream
     category: behavioral
     store: bigquery.analytics.events
     purpose: product analytics
     lawful_basis: consent
     retention: 14m               # 14 months
     subject_key: events.user_pseudo_id   # NOT the raw user id — see step 6
     export: true
     erase: delete
   ```

   Enumerate **every** store: primary DB, replicas, search index (Elasticsearch), caches (Redis), object storage (S3 uploads), data warehouse, application logs, third-party processors (Stripe, Segment, Intercom, Sentry), and **backups**. A store missing from this file is a store your erasure silently skips, which is one of the most common compliance gaps.

2. **Pin a lawful basis to every element before you collect it.** No basis = you may not process it. Pick the *narrowest* basis that fits and record it in the inventory; consent is the weakest because it's revocable and must be audited.

   | Lawful basis (GDPR Art. 6) | Use for | On erasure request |
   |---|---|---|
   | **Consent** | marketing, non-essential analytics, optional cookies | must delete; consent withdrawable any time |
   | **Contract** | login email, order/shipping data, billing | retain while contract active, then purge |
   | **Legal obligation** | tax invoices, AML/KYC records | **keep** for statutory period; refuse erasure with reason |
   | **Legitimate interest** | fraud/abuse, basic security logs | keep if LIA balancing holds; honor objection |
   | **Vital / public interest** | rare; safety-of-life | document case-by-case |

   Default new fields to **no collection** until a basis is assigned. CCPA framing differs (opt-out of "sale/share", not opt-in consent); model both as flags on the same consent record.
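
A lint sketch for the inventory, suitable for a CI check, assuming the YAML has been loaded into a list of dicts (for example with PyYAML's `yaml.safe_load`). `lint_inventory` is hypothetical; `legal_obligation`, `vital_interest` and `public_task` are assumed spellings for the table's other bases, and the two cross-field rules encode the table's "On erasure request" column.

```python
LAWFUL_BASES = {"consent", "contract", "legal_obligation", "legitimate_interest",
                "vital_interest", "public_task"}
ERASE_MODES = {"delete", "anonymize", "retain-with-basis"}
REQUIRED = ("element", "store", "purpose", "lawful_basis", "retention",
            "subject_key", "export", "erase")


def lint_inventory(rows: list[dict]) -> list[str]:
    """Return one message per problem; an empty list means the inventory passes."""
    errors = []
    for i, row in enumerate(rows):
        name = row.get("element", f"row {i}")
        missing = [k for k in REQUIRED if k not in row]
        if missing:
            errors.append(f"{name}: missing {', '.join(missing)}")
            continue
        basis, erase = row["lawful_basis"], row["erase"]
        if basis not in LAWFUL_BASES:
            errors.append(f"{name}: unknown lawful_basis {basis!r}")
        if erase not in ERASE_MODES:
            errors.append(f"{name}: unknown erase mode {erase!r}")
        if basis == "legal_obligation" and erase != "retain-with-basis":
            errors.append(f"{name}: legal_obligation data must be retain-with-basis")
        if basis == "consent" and erase == "retain-with-basis":
            errors.append(f"{name}: consent-based data must be erasable")
    return errors
```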

3. **Capture consent as granular, versioned, withdrawable, time-stamped records — never a single boolean.** A `marketing_opt_in = true` column proves nothing and can't show *which* policy version they agreed to. Append-only consent ledger:

   ```sql
   CREATE TABLE consent_records (
     id             bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
     subject_id     uuid NOT NULL,
     purpose        text NOT NULL,        -- 'marketing_email','analytics','third_party_share'
     granted        boolean NOT NULL,     -- false row = explicit withdrawal
     policy_version text NOT NULL,        -- 'privacy-policy@2026-03-01'
     source         text NOT NULL,        -- 'signup_form','preference_center','cookie_banner'
     evidence       jsonb NOT NULL,       -- {ip, user_agent, banner_choice_ids}
     created_at     timestamptz NOT NULL DEFAULT now()
   );
   CREATE INDEX ON consent_records (subject_id, purpose, created_at DESC);
   -- current state = latest row per (subject_id, purpose); NEVER UPDATE/DELETE rows
   ```

   Withdrawal is a new `granted=false` row, not a mutation — you must be able to prove the full history. Check consent at point of use (`SELECT … WHERE subject_id=$1 AND purpose=$2 ORDER BY created_at DESC LIMIT 1`), and gate consent-based processing on it. Cookie banner: no non-essential cookies/tags fire before a `granted=true` row exists.
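
The ledger pattern from step 3, redone in SQLite so it runs standalone (the Postgres DDL above remains the reference schema). Triggers reject `UPDATE` and `DELETE`, so withdrawal can only be a new row, and `has_consent` reads the latest row per `(subject_id, purpose)`. Ordering by the autoincrement `id` instead of `created_at` is a choice made for this sketch, because SQLite's `CURRENT_TIMESTAMP` has one-second resolution; the function names are hypothetical.

```python
import json
import sqlite3

DDL = """
CREATE TABLE consent_records (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    subject_id     TEXT NOT NULL,
    purpose        TEXT NOT NULL,
    granted        INTEGER NOT NULL,          -- 0 = explicit withdrawal
    policy_version TEXT NOT NULL,
    source         TEXT NOT NULL,
    evidence       TEXT NOT NULL,             -- JSON
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER consent_no_update BEFORE UPDATE ON consent_records
BEGIN SELECT RAISE(ABORT, 'consent_records is append-only'); END;
CREATE TRIGGER consent_no_delete BEFORE DELETE ON consent_records
BEGIN SELECT RAISE(ABORT, 'consent_records is append-only'); END;
"""


def record_consent(db, subject_id, purpose, granted, policy_version, source, evidence):
    """Append a grant (granted=True) or a withdrawal (granted=False)."""
    db.execute(
        "INSERT INTO consent_records (subject_id, purpose, granted, policy_version, source, evidence)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (subject_id, purpose, int(granted), policy_version, source, json.dumps(evidence)),
    )


def has_consent(db, subject_id, purpose) -> bool:
    """Latest row wins; no row at all means no consent."""
    row = db.execute(
        "SELECT granted FROM consent_records WHERE subject_id = ? AND purpose = ?"
        " ORDER BY id DESC LIMIT 1",
        (subject_id, purpose),
    ).fetchone()
    return bool(row and row[0])
```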

4. **DSAR export: assemble a complete, machine-readable package keyed off the inventory.** Iterate every row with `export: true`, pull the subject's data by `subject_key`, and emit structured JSON (GDPR Art. 20 portability requires a machine-readable format). Include data held by processors, fetched via their APIs. Authenticate the requester *hard* (re-auth + verification): handing one user's data to another is itself a breach. Respond within the statutory window (GDPR: one month, extendable by two further months for complex requests; CCPA: 45 days, extendable once by another 45). Don't include other people's PII caught in the subject's rows (e.g. recipient emails); redact third parties.
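
A sketch of the export loop driven by the inventory. `build_dsar_export` is hypothetical, as are the `fetch` and `redact_third_parties` hooks you would implement per store and per processor; requester authentication is assumed to have happened before this function is called.

```python
import json
from datetime import datetime, timezone


def build_dsar_export(subject_id: str, inventory: list[dict], fetch, redact_third_parties) -> str:
    """One machine-readable JSON package covering every inventory row with export: true.

    fetch(row, subject_id) -> list of that store's records for the subject;
    redact_third_parties(records, subject_id) -> records with other people's PII removed.
    """
    package = {
        "subject_id": subject_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "data": [],
    }
    for row in inventory:
        if not row.get("export"):
            continue
        records = fetch(row, subject_id)  # primary stores AND processor APIs
        package["data"].append({
            "element": row["element"],
            "store": row["store"],
            "purpose": row["purpose"],
            "lawful_basis": row["lawful_basis"],
            "records": redact_third_parties(records, subject_id),
        })
    return json.dumps(package, indent=2, default=str)  # default=str covers dates/UUIDs
```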

5. **Right-to-erasure must cascade to every store, including derived data, caches, logs, and backups — or document the backup-expiry path.** A `DELETE FROM users WHERE id=…` that leaves the subject in the search index, Redis cache, analytics warehouse, and last night's snapshot is **not** erasure. Drive a cascade from the inventory's `erase:` column:

   ```python
   def erase_subject(subject_id, inventory, stores):
       """Erase one subject everywhere the inventory says their data lives.

       `inventory` is the parsed privacy/inventory.yaml and `stores` maps each row's
       `store` to an adapter with delete()/anonymize(); the module-level helpers
       called here (add_to_suppression_list, log_retained, ...) are your own code.
       """
       add_to_suppression_list(subject_id)      # tombstone first, so a concurrent ingest can't re-create them
       for row in inventory:                    # one source of truth, no hardcoded store list
           store = stores[row.store]
           match row.erase:
               case "delete":
                   store.delete(row, subject_id)
               case "anonymize":
                   store.anonymize(row, subject_id)                      # see step 6
               case "retain-with-basis":
                   log_retained(subject_id, row, reason=row.lawful_basis)  # legal hold
               case other:
                   raise ValueError(f"unknown erase mode {other!r} for {row.store}")  # never skip silently
       invalidate_cache(subject_id)             # Redis keys, CDN signed URLs
       delete_from_search_index(subject_id)     # Elasticsearch/OpenSearch
       enqueue_processor_deletions(subject_id)  # Stripe, Segment, Intercom, Sentry APIs
       record_erasure(subject_id)               # write to audit log (build-audit-logging)
   ```

   **Backups can't be selectively edited**, so the defensible approach is: (a) restrict restored-backup access, (b) re-apply the erasure to any data restored from backup, and (c) let backups age out under a bounded retention (e.g. 35 days), documented in your privacy policy. Maintain a **suppression/tombstone list** so a restore or a late-arriving event for an erased subject is re-deleted, not resurrected. Erasure under a legal-hold basis (tax/AML) is refused with a recorded reason, not silently ignored.
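
A sketch of the suppression/tombstone check on the two resurrection paths named above: restoring rows from a backup and ingesting a late event. The `SuppressionList` class is hypothetical and in-memory; in practice the list has to live in durable storage outside the backups it guards.

```python
class SuppressionList:
    """Tombstones for erased subjects, consulted on every restore and ingest path."""

    def __init__(self):
        self._erased: set[str] = set()

    def add(self, subject_id: str) -> None:
        self._erased.add(subject_id)

    def filter_restore(self, rows: list[dict], key: str = "subject_id") -> list[dict]:
        """Drop erased subjects' rows from data restored out of a backup."""
        return [r for r in rows if r[key] not in self._erased]

    def accept_event(self, event: dict, key: str = "subject_id") -> bool:
        """Reject late-arriving events for erased subjects instead of re-creating them."""
        return event.get(key) not in self._erased
```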

6. **Minimize and pseudonymize: the cheapest data to protect is data you don't hold.** Per element, ask: do we *need* it? Don't collect optional PII "just in case". Replace direct identifiers with a per-subject pseudonym key in analytics/derived stores so that erasing the key map breaks linkage (`erase: anonymize` then becomes deleting the mapping, not rewriting a warehouse). Pseudonymized data is still personal data; only true anonymization (irreversible, with no re-identification via combination) takes data **out of GDPR scope**, so prefer it for analytics/ML training sets. Tokenize, or keyed-hash (HMAC with a secret key), and delete the mapping; **never** call a plain unsalted hash "anonymized". Drop high-cardinality quasi-identifiers (full IP → /24, exact DOB → birth year) where the purpose allows.
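
Sketches of three minimization moves from step 6, using only the standard library: a per-subject random-token map (deleting a subject's entry is what breaks linkage), IPv4 truncation to /24, and DOB reduced to birth year. The class and function names are hypothetical, and the IPv6 /48 prefix is an assumption; the text above only specifies /24 for IPv4.

```python
import ipaddress
import secrets


class PseudonymMap:
    """Per-subject random tokens. The map lives only in the primary system, never in the warehouse."""

    def __init__(self):
        self._by_user: dict[str, str] = {}

    def token_for(self, user_id: str) -> str:
        if user_id not in self._by_user:
            self._by_user[user_id] = secrets.token_hex(16)  # random, so not derivable from the id
        return self._by_user[user_id]

    def forget(self, user_id: str) -> None:
        """Erasure: drop the mapping so warehouse rows keyed by the token can't be re-linked."""
        self._by_user.pop(user_id, None)


def truncate_ip(ip: str) -> str:
    """Generalize IPv4 to its /24 network (IPv6 to /48, an assumed choice)."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)


def birth_year(dob: str) -> int:
    """Exact date of birth (YYYY-MM-DD) reduced to the year."""
    return int(dob[:4])
```

Note that forgetting the mapping only yields anonymous data if the remaining columns can't single the person out on their own.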

7. **Enforce retention with TTL or scheduled purge jobs — retention written in a policy but not enforced in code is a fiction.** Translate each `retention:` value into a real mechanism:
   - Native TTL where the store has one: MongoDB TTL index, DynamoDB TTL attribute, Redis `EXPIRE`, BigQuery partition expiration, S3 lifecycle rules, Elasticsearch ILM.
   - A scheduled job (cron / Airflow / pg_cron) that `DELETE`s rows past `created_at + retention` for stores without TTL — run daily, log counts purged, alert on zero-purged-when-expected.

   Logs and analytics are the usual offenders (PII-laden, retained forever). Cap them explicitly.
|
|
119
|
+
|
|
120
|
+
8. **Document cross-border transfers and processor agreements.** Any element flowing to a processor outside the data's region needs a transfer mechanism (SCCs / adequacy decision) and a signed DPA. List sub-processors. Record this alongside the inventory — auditors ask for it, and a new third-party integration that isn't in the list is an unmapped data egress.
|
|
121
|
+
|
|
122
|
+
## Common Errors
|
|
123
|
+
|
|
124
|
+
- **Erasure that hits the primary DB only.** The subject survives in the search index, cache, warehouse, logs, and backups. Drive deletion from the inventory across *every* store; assert absence afterward.
|
|
125
|
+
- **Consent as one boolean column.** Can't prove which policy version, when, or how; an `UPDATE` erases the prior state. Use an append-only versioned ledger; withdrawal is a new row.
|
|
126
|
+
- **No suppression list.** A backup restore or a delayed event re-creates an erased subject ("data resurrection"). Keep a tombstone list and re-apply erasure on restore/ingest.
|
|
127
|
+
- **Reversible "anonymization".** A plain SHA-256 of an email is re-identifiable by dictionary attack — still personal data, still in scope. Keyed-hash/tokenize and delete the mapping, or aggregate so individuals can't be singled out.
|
|
128
|
+
- **Treating backups as out of scope entirely.** Ignoring them fails erasure; trying to surgically edit them corrupts them. Use bounded retention + restore-time re-erasure, and document it.
|
|
129
|
+
- **Weak DSAR identity check.** Emailing an export to whoever asks lets an attacker harvest a victim's data. Re-authenticate and verify before export or erasure.
- **Retention policy with no enforcement job.** "We keep logs 90 days" while the table grows unbounded. Wire a TTL or a scheduled purge and verify it actually runs.
- **Logging full PII.** Request and error logs that capture emails, tokens, or full request bodies become an uncontrolled PII store with infinite retention. Redact at the logger and set a log retention period.
- **Forgetting processors.** Deleting locally but leaving the subject in Stripe/Segment/Intercom/Sentry. Call each processor's deletion API as part of the cascade.
- **Hardcoding the store list in deletion code instead of reading it from the inventory.** A new table added without touching the deletion code is silently skipped forever. Keep the inventory as the single source of truth and generate the cascade from it.
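The "reversible anonymization" error above can be illustrated with Python's standard `hashlib`, `hmac`, and `secrets` modules. The helper names and the trim/lowercase normalization are assumptions for this sketch. It shows why a plain SHA-256 falls to a dictionary attack while an HMAC under a secret key does not.

The keyed value is still pseudonymous personal data while the key exists. Only destroying the key (or the token mapping) severs the link.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def normalize(email: str) -> bytes:
    # Assumed normalization; any scheme works if applied consistently.
    return email.strip().lower().encode("utf-8")


def plain_sha256(email: str) -> str:
    # Anti-pattern: unkeyed and deterministic, so anyone can recompute it.
    return hashlib.sha256(normalize(email)).hexdigest()


def keyed_pseudonym(email: str, key: bytes) -> str:
    # HMAC-SHA256 under a secret key: without the key, candidate emails cannot
    # be hashed to the same value. While the key exists this is pseudonymization
    # (still personal data); destroying the key is what severs the link.
    return hmac.new(key, normalize(email), hashlib.sha256).hexdigest()


def dictionary_attack(digest: str, candidates: list[str]) -> str | None:
    # What an attacker holding a leaked email list does against a plain hash.
    for candidate in candidates:
        if plain_sha256(candidate) == digest:
            return candidate
    return None


if __name__ == "__main__":
    leaked = ["bob@example.com", "alice@example.com"]
    key = secrets.token_bytes(32)  # keep in a secrets manager, not beside the data
    print(dictionary_attack(plain_sha256("Alice@example.com"), leaked))  # alice@example.com
    print(dictionary_attack(keyed_pseudonym("alice@example.com", key), leaked))  # None
```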

## Verify

- **Erasure completeness:** run `erase_subject(id)`, then query **every** store in the inventory for that subject's `subject_key` → zero rows (or only anonymized/legal-hold-retained rows with a recorded reason). This is the test that catches the forgotten store; automate it per store.
- **Resurrection resistance:** restore a backup taken before an erasure (or replay a late event) → the subject is re-suppressed, not present. The suppression list is consulted on every restore and ingest.
- **Export completeness:** for a seeded subject with data in N stores, the DSAR package contains all N, is valid JSON/machine-readable, and contains **no other** subject's PII.
- **Consent gate:** withdrawing consent (new `granted=false` row) stops the gated processing on the next check; granted/withdrawn history is fully reconstructable; no non-essential tag fires before consent.
- **Retention enforcement:** move a record past its retention window (wait it out, or seed one whose `created_at` is already past the window) → the TTL/purge job removes it on the next scheduled run; the purge job logs counts and alerts on anomalies.
- **Lawful basis coverage:** every element in the inventory has a basis; legal-obligation elements correctly *survive* an erasure request with a recorded reason.
- **Minimization:** no element is collected/stored without a row in the inventory; derived/analytics stores key on a pseudonym, not the raw identifier.
- **Transfers:** every cross-border element maps to a transfer mechanism + DPA; processor list matches actual integrations.
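The erasure-completeness and resurrection checks can be sketched together, with SQLite tables standing in for the stores. `erase_subject` borrows the name used above, but its signature is hypothetical, as are the `INVENTORY` list, the `suppression` table, `stores_still_holding`, and `ingest`. The sketch:

- drives both the deletion cascade and the verification query from one inventory
- routes restores and replayed events through a single write path that consults the suppression list
- deliberately omits legal-hold exceptions, processor deletion APIs, and backups

```python
import sqlite3

# Hypothetical inventory: each store holding personal data and the column that
# carries subject_key. Deletion, verification, and ingest all read this list.
INVENTORY = [("users", "subject_key"), ("orders", "subject_key"), ("search_docs", "subject_key")]


def create_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    for table, col in INVENTORY:
        conn.execute(f"CREATE TABLE {table} ({col} TEXT, payload TEXT)")
    conn.execute("CREATE TABLE suppression (subject_key TEXT PRIMARY KEY)")
    return conn


def erase_subject(conn: sqlite3.Connection, subject_key: str) -> None:
    # Cascade generated from the inventory, not a hand-maintained store list.
    for table, col in INVENTORY:
        conn.execute(f"DELETE FROM {table} WHERE {col} = ?", (subject_key,))
    # Tombstone so backup restores and late events can be re-suppressed.
    conn.execute("INSERT OR IGNORE INTO suppression (subject_key) VALUES (?)", (subject_key,))
    conn.commit()


def stores_still_holding(conn: sqlite3.Connection, subject_key: str) -> list:
    """Verification: every inventoried store must return zero rows."""
    return [
        table
        for table, col in INVENTORY
        if conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {col} = ?",
                        (subject_key,)).fetchone()[0] > 0
    ]


def ingest(conn: sqlite3.Connection, table: str, subject_key: str, payload: str) -> bool:
    """Single write path for restores and replayed events; drops suppressed subjects."""
    if conn.execute("SELECT 1 FROM suppression WHERE subject_key = ?", (subject_key,)).fetchone():
        return False
    col = dict(INVENTORY)[table]
    conn.execute(f"INSERT INTO {table} ({col}, payload) VALUES (?, ?)", (subject_key, payload))
    conn.commit()
    return True


if __name__ == "__main__":
    conn = create_demo_db()
    for table, _ in INVENTORY:
        ingest(conn, table, "subj-42", "data")
    erase_subject(conn, "subj-42")
    print(stores_still_holding(conn, "subj-42"))  # []
    print(ingest(conn, "orders", "subj-42", "late event"))  # False
```

Verification iterates the same inventory as deletion, so adding a store to the inventory extends both automatically. A store that exists but is missing from the inventory stays invisible to both, which is why the Minimization check above matters.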

Done = an erasure request provably removes or anonymizes the subject across every inventoried store (with backups handled by bounded retention + restore-time re-erasure and a suppression list), the DSAR export is complete and machine-readable with no third-party PII leakage, consent is versioned/withdrawable with reconstructable history, and every retention window is enforced by a TTL or scheduled purge that demonstrably runs.
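The consent requirement in this definition of done can be sketched as an append-only SQLite ledger. The table layout, the `record_consent`/`has_consent`/`history` helpers, and the trigger that rejects `UPDATE` are assumptions, not a prescribed schema. Each grant or withdrawal is a new row recording the policy version, capture method, and timestamp. The current state is the latest row, and the full history stays reconstructable.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema. The trigger rejects UPDATE so earlier state can never be
# overwritten; a withdrawal is recorded as a new row with granted = 0.
SCHEMA = """
CREATE TABLE IF NOT EXISTS consent_ledger (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    subject_key    TEXT    NOT NULL,
    purpose        TEXT    NOT NULL,
    policy_version TEXT    NOT NULL,
    granted        INTEGER NOT NULL,  -- 1 = granted, 0 = withdrawn
    method         TEXT    NOT NULL,  -- how consent was captured
    recorded_at    TEXT    NOT NULL
);
CREATE TRIGGER IF NOT EXISTS consent_ledger_no_update
BEFORE UPDATE ON consent_ledger
BEGIN
    SELECT RAISE(ABORT, 'consent_ledger is append-only');
END;
"""


def record_consent(conn, subject_key, purpose, policy_version, granted, method, at=None):
    """Append one grant or withdrawal; never UPDATE an existing row."""
    at = at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO consent_ledger "
        "(subject_key, purpose, policy_version, granted, method, recorded_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (subject_key, purpose, policy_version, int(granted), method, at),
    )
    conn.commit()


def has_consent(conn, subject_key, purpose) -> bool:
    """Current state = latest row for (subject, purpose); no row = no consent."""
    row = conn.execute(
        "SELECT granted FROM consent_ledger WHERE subject_key = ? AND purpose = ? "
        "ORDER BY id DESC LIMIT 1",
        (subject_key, purpose),
    ).fetchone()
    return bool(row and row[0])


def history(conn, subject_key, purpose):
    """Full, ordered grant/withdraw history for audits and DSAR exports."""
    return conn.execute(
        "SELECT policy_version, granted, method, recorded_at FROM consent_ledger "
        "WHERE subject_key = ? AND purpose = ? ORDER BY id",
        (subject_key, purpose),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    record_consent(conn, "subj-42", "marketing", "v3", True, "signup_form")
    record_consent(conn, "subj-42", "marketing", "v3", False, "settings_page")
    print(has_consent(conn, "subj-42", "marketing"))  # False
    print(len(history(conn, "subj-42", "marketing")))  # 2
```

Ordering by the autoincrement `id` rather than by `recorded_at` avoids depending on client clocks. The trigger blocks only `UPDATE`, so deletion remains available to the erasure cascade if the inventory puts consent records in scope.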