npm - sanook-cli - Versions diffs - 0.4.0 → 0.5.1 - Mend

sanook-cli 0.4.0 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (238)

package/.env.example +19 -0
package/CHANGELOG.md +173 -0
package/README.md +153 -20
package/README.th.md +136 -0
package/dist/agentContext.js +4 -0
package/dist/approval.js +6 -0
package/dist/bin.js +405 -57
package/dist/brain.js +92 -59
package/dist/brand.js +47 -0
package/dist/checkpoint.js +37 -0
package/dist/commands.js +86 -6
package/dist/compaction.js +76 -5
package/dist/config.js +100 -12
package/dist/cost.js +60 -3
package/dist/doctor.js +92 -0
package/dist/gateway/auth.js +2 -2
package/dist/gateway/ledger.js +2 -2
package/dist/gateway/scheduler.js +1 -0
package/dist/gateway/serve.js +6 -4
package/dist/gateway/server.js +10 -2
package/dist/git.js +11 -2
package/dist/hooks.js +43 -17
package/dist/knowledge.js +48 -49
package/dist/loop.js +182 -66
package/dist/lsp/client.js +173 -0
package/dist/lsp/framing.js +56 -0
package/dist/lsp/index.js +138 -0
package/dist/lsp/servers.js +82 -0
package/dist/mcp-server.js +244 -0
package/dist/mcp.js +184 -29
package/dist/memory-store.js +559 -0
package/dist/memory.js +143 -29
package/dist/orchestrate.js +150 -0
package/dist/providers/codex.js +21 -7
package/dist/providers/keys.js +3 -2
package/dist/providers/models.js +22 -6
package/dist/providers/registry.js +155 -1
package/dist/repomap.js +93 -0
package/dist/search/chunk.js +158 -0
package/dist/search/embed-store.js +187 -0
package/dist/search/engine.js +203 -0
package/dist/search/fuse.js +35 -0
package/dist/search/index-core.js +187 -0
package/dist/search/indexer.js +241 -0
package/dist/search/store.js +77 -0
package/dist/session.js +42 -8
package/dist/skill-install.js +10 -10
package/dist/skills.js +12 -9
package/dist/summarize.js +31 -0
package/dist/tools/bash.js +21 -2
package/dist/tools/diagnostics.js +41 -0
package/dist/tools/edit.js +29 -7
package/dist/tools/index.js +8 -1
package/dist/tools/list.js +7 -2
package/dist/tools/permission.js +90 -9
package/dist/tools/read.js +23 -4
package/dist/tools/remember.js +1 -1
package/dist/tools/sandbox.js +61 -0
package/dist/tools/search.js +105 -4
package/dist/tools/task.js +195 -29
package/dist/tools/timeout.js +35 -0
package/dist/tools/util.js +10 -0
package/dist/tools/write.js +6 -4
package/dist/trust.js +89 -0
package/dist/ui/app.js +228 -31
package/dist/ui/banner.js +4 -9
package/dist/ui/brain-wizard.js +2 -2
package/dist/ui/history.js +30 -0
package/dist/ui/mentions.js +44 -0
package/dist/ui/render.js +55 -15
package/dist/ui/setup.js +97 -12
package/dist/ui/useEditor.js +83 -0
package/dist/update.js +114 -0
package/dist/worktree.js +173 -0
package/package.json +11 -5
package/scripts/postinstall.mjs +33 -0
package/second-brain/.agents/_Index.md +30 -0
package/second-brain/.agents/skills/_Index.md +30 -0
package/second-brain/.agents/workflows/_Index.md +30 -0
package/second-brain/AGENTS.md +4 -4
package/second-brain/Acceptance/_Index.md +30 -0
package/second-brain/Acceptance/golden-case-template.md +39 -0
package/second-brain/Areas/_Index.md +30 -0
package/second-brain/Bugs/System-OS/_Index.md +30 -0
package/second-brain/Bugs/_Index.md +30 -0
package/second-brain/CLAUDE.md +4 -1
package/second-brain/Checklists/_Index.md +30 -0
package/second-brain/Checklists/preflight-postflight-template.md +29 -0
package/second-brain/Distillations/_Index.md +30 -0
package/second-brain/Entities/_Index.md +30 -0
package/second-brain/Entities/entity-template.md +33 -0
package/second-brain/Evals/_Index.md +30 -0
package/second-brain/Evals/correction-pairs.md +24 -0
package/second-brain/Evals/failure-taxonomy.md +24 -0
package/second-brain/Evals/golden-set.md +25 -0
package/second-brain/Evals/quality-ledger.md +23 -0
package/second-brain/Evals/self-eval-rubric.md +23 -0
package/second-brain/GEMINI.md +4 -4
package/second-brain/Goals/_Index.md +30 -0
package/second-brain/Handoffs/_Index.md +30 -0
package/second-brain/Home.md +7 -0
package/second-brain/Intake/Raw Sources/_Index.md +30 -0
package/second-brain/Intake/_Index.md +30 -0
package/second-brain/Intake/_Quarantine/_Index.md +30 -0
package/second-brain/Learning/_Index.md +30 -0
package/second-brain/Playbooks/_Index.md +30 -0
package/second-brain/Playbooks/playbook-template.md +23 -0
package/second-brain/Projects/_Index.md +30 -0
package/second-brain/Prompts/_Index.md +30 -0
package/second-brain/README.md +2 -1
package/second-brain/Research/_Index.md +30 -0
package/second-brain/Retrospectives/_Index.md +30 -0
package/second-brain/Reviews/_Index.md +30 -0
package/second-brain/Runbooks/_Index.md +30 -0
package/second-brain/Runbooks/eval-loop.md +24 -0
package/second-brain/Sessions/_Index.md +30 -0
package/second-brain/Shared/AI-Context-Index.md +20 -0
package/second-brain/Shared/AI-Threads/_Index.md +30 -0
package/second-brain/Shared/Archive/_Index.md +30 -0
package/second-brain/Shared/Assets/_Index.md +30 -0
package/second-brain/Shared/Context-Packs/_Index.md +30 -0
package/second-brain/Shared/Context7-Docs/_Index.md +30 -0
package/second-brain/Shared/Coordination/NOW.md +28 -0
package/second-brain/Shared/Coordination/_Index.md +30 -0
package/second-brain/Shared/Coordination/agent-registry.md +24 -0
package/second-brain/Shared/Coordination/task-board/_Index.md +30 -0
package/second-brain/Shared/Coordination/task-board/task-template.md +43 -0
package/second-brain/Shared/Coordination/task-board.md +32 -0
package/second-brain/Shared/Core-Facts/_Index.md +30 -0
package/second-brain/Shared/Decision-Memory/_Index.md +30 -0
package/second-brain/Shared/Glossary/_Index.md +30 -0
package/second-brain/Shared/Memory-Inbox/_Index.md +30 -0
package/second-brain/Shared/Operating-State/_Index.md +30 -0
package/second-brain/Shared/Prompting/_Index.md +30 -0
package/second-brain/Shared/Provenance/_Index.md +30 -0
package/second-brain/Shared/Rules/_Index.md +30 -0
package/second-brain/Shared/Rules/contextual-note-rule.md +30 -0
package/second-brain/Shared/Rules/frontmatter-standard.md +10 -0
package/second-brain/Shared/Rules/memory-write-protocol.md +28 -0
package/second-brain/Shared/Rules/procedural-runbook-header.md +40 -0
package/second-brain/Shared/Rules/review-and-staleness-policy.md +22 -0
package/second-brain/Shared/Rules/rules-formatting.md +34 -0
package/second-brain/Shared/Scripts/_Index.md +30 -0
package/second-brain/Shared/Scripts-Archive/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/verification-standard.md +40 -0
package/second-brain/Shared/User-Memory/_Index.md +30 -0
package/second-brain/Shared/User-Persona/_Index.md +30 -0
package/second-brain/Shared/User-Persona/owner-profile.md +25 -0
package/second-brain/Shared/Working-Memory/_Index.md +30 -0
package/second-brain/Shared/_Index.md +30 -0
package/second-brain/Shared/mcp-servers/_Index.md +30 -0
package/second-brain/Skills/_Index.md +30 -0
package/second-brain/Templates/_Index.md +30 -0
package/second-brain/Templates/bug.md +2 -0
package/second-brain/Templates/handoff.md +2 -0
package/second-brain/Templates/session.md +2 -0
package/second-brain/Tools/_Index.md +30 -0
package/second-brain/Traces/_Index.md +30 -0
package/second-brain/Vault Structure Map.md +33 -1
package/second-brain/copilot/_Index.md +30 -0
package/skills/audit-license-compliance/SKILL.md +117 -0
package/skills/author-codemod/SKILL.md +110 -0
package/skills/build-audit-logging/SKILL.md +112 -0
package/skills/build-cdc-streaming-pipeline/SKILL.md +123 -0
package/skills/build-cli-tool/SKILL.md +108 -0
package/skills/build-data-table/SKILL.md +141 -0
package/skills/build-native-mobile-ui/SKILL.md +154 -0
package/skills/build-offline-first-sync/SKILL.md +118 -0
package/skills/build-realtime-channel/SKILL.md +122 -0
package/skills/build-vector-search/SKILL.md +131 -0
package/skills/compose-local-dev-stack/SKILL.md +149 -0
package/skills/configure-bundler-build/SKILL.md +166 -0
package/skills/configure-dns-tls/SKILL.md +142 -0
package/skills/configure-reverse-proxy-lb/SKILL.md +129 -0
package/skills/configure-security-headers-csp/SKILL.md +122 -0
package/skills/contract-testing/SKILL.md +140 -0
package/skills/datetime-timezone-correctness/SKILL.md +125 -0
package/skills/debug-ci-pipeline-failure/SKILL.md +134 -0
package/skills/debug-flaky-tests/SKILL.md +128 -0
package/skills/defend-llm-prompt-injection/SKILL.md +110 -0
package/skills/deliver-webhooks/SKILL.md +116 -0
package/skills/design-api-pagination/SKILL.md +144 -0
package/skills/design-authorization-model/SKILL.md +119 -0
package/skills/design-backup-dr-recovery/SKILL.md +113 -0
package/skills/design-event-sourcing-cqrs/SKILL.md +143 -0
package/skills/design-multi-tenancy/SKILL.md +100 -0
package/skills/design-protobuf-grpc-service/SKILL.md +146 -0
package/skills/design-relational-schema/SKILL.md +129 -0
package/skills/design-search-index-infra/SKILL.md +151 -0
package/skills/design-state-machine/SKILL.md +108 -0
package/skills/design-token-system/SKILL.md +109 -0
package/skills/distributed-locks-leases/SKILL.md +120 -0
package/skills/encrypt-sensitive-data/SKILL.md +148 -0
package/skills/feature-flags-rollout/SKILL.md +130 -0
package/skills/file-upload-object-storage/SKILL.md +107 -0
package/skills/fuzz-dynamic-security-test/SKILL.md +111 -0
package/skills/harden-llm-app-reliability/SKILL.md +126 -0
package/skills/i18n-localization-setup/SKILL.md +113 -0
package/skills/idempotency-keys/SKILL.md +107 -0
package/skills/implement-push-notifications/SKILL.md +142 -0
package/skills/ingest-webhook-secure/SKILL.md +120 -0
package/skills/integrate-oauth-oidc/SKILL.md +126 -0
package/skills/load-stress-test/SKILL.md +129 -0
package/skills/map-privacy-data-gdpr/SKILL.md +146 -0
package/skills/model-nosql-data/SKILL.md +118 -0
package/skills/money-decimal-arithmetic/SKILL.md +123 -0
package/skills/monitor-ml-drift/SKILL.md +109 -0
package/skills/numeric-precision-units/SKILL.md +144 -0
package/skills/optimize-llm-cost-latency/SKILL.md +103 -0
package/skills/optimize-react-rerenders/SKILL.md +124 -0
package/skills/orchestrate-agent-workflow/SKILL.md +100 -0
package/skills/payments-billing-integration/SKILL.md +114 -0
package/skills/pin-toolchain-versions/SKILL.md +116 -0
package/skills/plan-strangler-migration/SKILL.md +95 -0
package/skills/property-based-testing/SKILL.md +108 -0
package/skills/publish-package-registry/SKILL.md +130 -0
package/skills/recover-git-state/SKILL.md +119 -0
package/skills/remediate-web-vulnerabilities/SKILL.md +125 -0
package/skills/resilience-timeouts-retries/SKILL.md +104 -0
package/skills/resolve-merge-rebase-conflict/SKILL.md +97 -0
package/skills/rewrite-git-history/SKILL.md +109 -0
package/skills/scaffold-cross-platform-app/SKILL.md +137 -0
package/skills/schema-evolution-compatibility/SKILL.md +121 -0
package/skills/send-transactional-email/SKILL.md +126 -0
package/skills/serve-deploy-ml-model/SKILL.md +107 -0
package/skills/setup-cdn-edge-waf/SKILL.md +107 -0
package/skills/setup-devcontainer-env/SKILL.md +131 -0
package/skills/setup-lint-format-precommit/SKILL.md +140 -0
package/skills/setup-monorepo-tooling/SKILL.md +125 -0
package/skills/ship-mobile-app-store-release/SKILL.md +137 -0
package/skills/structured-output-llm/SKILL.md +86 -0
package/skills/supply-chain-sbom-provenance/SKILL.md +120 -0
package/skills/test-data-factories/SKILL.md +158 -0
package/skills/threat-model-stride/SKILL.md +123 -0
package/skills/train-evaluate-ml-model/SKILL.md +109 -0
package/skills/unicode-text-correctness/SKILL.md +109 -0
package/skills/visual-regression-testing/SKILL.md +120 -0

package/skills/configure-dns-tls/SKILL.md ADDED

@@ -0,0 +1,142 @@
+---
+name: configure-dns-tls
+description: Configures DNS records and TLS for a service (A/AAAA/CNAME/ALIAS/MX/TXT/CAA records, zero-downtime cutovers via pre-lowered TTL, automated ACME/Let's Encrypt/cert-manager issuance and auto-renewal, and TLS 1.2/1.3-only settings with HSTS, OCSP stapling, and 80→443 redirect) to eliminate expired-cert and bad-cutover outages.
+when_to_use: Pointing a domain at a service, enabling HTTPS, automating/rotating certificates (ACME/cert-manager), or migrating DNS. Distinct from configure-reverse-proxy-lb (the proxy/LB that terminates the TLS this issues) and setup-cdn-edge-waf (the CDN/WAF edge in front).
+---
+## When to Use
+Reach for this skill when the task is **names and certificates** — getting a domain to resolve to your service and serving valid HTTPS that renews itself:
+- "Point `app.example.com` at this load balancer / IP without downtime"
+- "Enable HTTPS / fix the expired-cert outage / stop the cert from ever expiring again"
+- "Automate certs with Let's Encrypt / cert-manager; issue a wildcard"
+- "Migrate DNS to a new provider / cut over to a new origin"
+- "Lock down SPF/DKIM/DMARC, or CAA so only my CA can issue"
+- "Why does SSL Labs give us a B? Harden the TLS config"
+NOT this skill:
+- Configuring the proxy/LB/Ingress that **terminates** TLS, virtual hosts, upstream pools, timeouts → configure-reverse-proxy-lb
+- The CDN/edge, WAF rules, edge caching, or DDoS layer in front of origin → setup-cdn-edge-waf
+- Application-layer auth/authz, token scopes, RBAC → design-authorization-model
+- Tamper-evident security event logs (incl. cert-rotation events) → build-audit-logging
+This skill owns the **record values, the cutover choreography, certificate lifecycle, and the TLS handshake policy**. It hands the terminated connection to the proxy.
+## Steps
+1. **Pick the record type by what you're pointing at — do not CNAME the apex.**
+   | Need | Record | Notes |
+   |---|---|---|
+   | Name → IPv4 | `A` | Bare IP only |
+   | Name → IPv6 | `AAAA` | Add alongside A; serve dual-stack |
+   | Subdomain → another hostname | `CNAME` | e.g. `www → app.example.com`; cannot coexist with other records on that name |
+   | **Apex** (`example.com`) → hostname | `ALIAS`/`ANAME`/flattened-CNAME | Apex can't be a real CNAME (breaks SOA/NS/MX). Use the provider's ALIAS (Route 53 alias, Cloudflare CNAME-flattening, etc.) |
+   | Mail | `MX` | Priority + target hostname; the target must have its own A/AAAA records and must never be a CNAME |
+   | SPF/DKIM/DMARC/verification | `TXT` | One SPF per domain; DMARC at `_dmarc`; DKIM at `<sel>._domainkey` |
+   | Who may issue certs | `CAA` | `0 issue "letsencrypt.org"` + `0 issuewild "letsencrypt.org"` |
+   With no CAA records at all, any CA may issue. Once you publish CAA, it must name your ACME CA **before** the next issuance or renewal, or issuance fails with `CAA record prevents issuance`. Example:
+   ```
+   example.com.  CAA  0 issue "letsencrypt.org"
+   example.com.  CAA  0 issuewild "letsencrypt.org"
+   example.com.  CAA  0 iodef "mailto:security@example.com"
+   ```
+2. **Zero-downtime cutover: lower the TTL BEFORE the change. This is the whole trick.** Resolvers cache the old answer for up to its TTL; if you cut over while the TTL is 3600, clients can keep hitting the old origin for up to an hour.
+   1. Drop the record's TTL to `60` (or `30`). **Wait out the *old* TTL** (e.g. wait the full prior 3600s) so every cache holds the short TTL.
+   2. Run both origins in parallel (old + new healthy) during the switch — never tear down old first.
+   3. Change the record value to the new target.
+   4. Verify the new answer is served (Verify, steps 1 and 2) and the new origin takes real traffic.
+   5. Only after traffic has fully drained from the old origin (watch its access logs go quiet for > one TTL), decommission it and **raise TTL back** to 3600+ to cut query volume/cost.
+3. **Automate certificates — manual renewal is a guaranteed future outage.** Use ACME (Let's Encrypt / ZeroSSL). Never click-issue a 1-year cert you have to remember to renew; LE is 90-day by design to *force* automation.
+   - **VM / bare proxy:** `certbot` with a renewal timer, or the proxy's built-in ACME (Caddy auto-HTTPS, Traefik resolver, nginx + `acme.sh`).
+   - **Kubernetes:** **cert-manager** — a `ClusterIssuer` + `Certificate` (or Ingress annotation) reconciles renewal automatically; renews at ~⅔ of lifetime.
+   ```yaml
+   # cert-manager: DNS-01 wildcard via Cloudflare
+   apiVersion: cert-manager.io/v1
+   kind: ClusterIssuer
+   metadata: { name: letsencrypt-prod }
+   spec:
+     acme:
+       server: https://acme-v02.api.letsencrypt.org/directory
+       email: ops@example.com
+       privateKeySecretRef: { name: letsencrypt-prod-key }
+       solvers:
+       - dns01:
+           cloudflare:
+             apiTokenSecretRef: { name: cloudflare-token, key: api-token }  # ClusterIssuer: this Secret must live in the cert-manager namespace
+   ---
+   apiVersion: cert-manager.io/v1
+   kind: Certificate
+   metadata: { name: example-tls, namespace: web }
+   spec:
+     secretName: example-tls          # Ingress references this
+     issuerRef: { name: letsencrypt-prod, kind: ClusterIssuer }
+     dnsNames: ["example.com", "*.example.com"]
+   ```
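+   Iterate against the **staging** ACME server first to dodge LE prod rate limits (50 certs / registered domain / week) while debugging. Create a second `ClusterIssuer` (e.g. `letsencrypt-staging`) whose `spec.acme.server` is `https://acme-staging-v02.api.letsencrypt.org/directory` and whose `privateKeySecretRef` names its own Secret, then point the `Certificate`'s `issuerRef` at it (on a VM, use `certbot --test-cert`). Switch `issuerRef` back to `letsencrypt-prod` once issuance succeeds.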
+4. **Choose the ACME challenge and cert shape deliberately.**
+   | Option | Pick when | Notes |
+   |---|---|---|
+   | **HTTP-01** | single host, port 80 reachable from internet | simplest; needs `/.well-known/acme-challenge/` served; **cannot** do wildcards |
+   | **DNS-01** | wildcards, internal hosts, no inbound 80, or many SANs | proves control via a `_acme-challenge` TXT; needs DNS-provider API creds; works behind a firewall |
+   | **Wildcard** `*.example.com` | many dynamic subdomains | DNS-01 only; one cert, but a single shared private key (bigger blast radius) |
+   | **SAN / multi-domain** | a known fixed set of names | every name is listed explicitly, so the cert covers nothing you didn't intend; adding or dropping a name means re-issuing; preferred when the list is stable |
+   Default: **SAN cert via DNS-01** for anything non-trivial; wildcard only when subdomains are unbounded/dynamic.
+5. **Set a modern TLS policy at the terminator — TLS 1.2+ only, redirect, HSTS, stapling.** Configure on whatever terminates (see configure-reverse-proxy-lb), but the *policy* is owned here:
+   - Protocols: **TLS 1.3 + TLS 1.2 only**. Disable TLS 1.0/1.1 and SSLv3 entirely.
+   - Ciphers: TLS 1.3 defaults; for 1.2 use forward-secret AEAD suites (ECDHE + AES-GCM/CHACHA20), no CBC/RC4/3DES.
+   - **Redirect 80→443** with `301`, then serve everything over HTTPS.
+   - **HSTS** on HTTPS responses: `Strict-Transport-Security: max-age=63072000; includeSubDomains; preload` — but only add `preload`/`includeSubDomains` once *every* subdomain is HTTPS (it's hard to undo). Roll out short → long → preload.
+   - **OCSP stapling** on (`ssl_stapling on;` in nginx) so clients don't round-trip the CA.
+   - Serve the **full chain** (leaf + intermediates), not just the leaf. A leaf-only chain is a common cause of "works in my browser, fails in `curl`/old Android".
+   ```nginx
+   server {
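+     listen 443 ssl http2;                 # nginx >= 1.25.1: prefer "listen 443 ssl;" + "http2 on;"
+     server_name example.com;
+     ssl_protocols TLSv1.2 TLSv1.3;
+     ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;  # TLS 1.2 only; 1.3 suites use OpenSSL defaults
+     ssl_prefer_server_ciphers off;
+     ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;  # full chain
+     ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
+     ssl_stapling on; ssl_stapling_verify on;
+     resolver 1.1.1.1 8.8.8.8 valid=300s;  # stapling needs a resolver to reach the CA's OCSP responder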
+     add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
+   }
+   server { listen 80; server_name example.com; return 301 https://$host$request_uri; }
+   ```
+6. **Prove auto-renew works before you trust it.** A cert that issues fine but never renews is a 90-day time bomb. Force a dry-run/staging renewal now (Verify, step 6) so you discover broken DNS creds or a missing port today, not at 2am on day 89.
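The cutover choreography in step 2 comes down to two timing checks. The minimal Python sketch below assumes you record when the TTL was lowered, when the record value was switched, and the timestamp of the last request the old origin logged. `safe_to_switch` and `safe_to_decommission` are hypothetical helper names, not part of any DNS tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers modelling the step-2 cutover timing; not part of any DNS tool.


def safe_to_switch(ttl_lowered_at: datetime, old_ttl_s: int, now: datetime) -> bool:
    """True once every cache that fetched the record under the OLD TTL has expired it."""
    # A resolver that cached the answer just before the TTL was lowered may keep it
    # for the full old TTL, so the switch must wait at least that long.
    return now >= ttl_lowered_at + timedelta(seconds=old_ttl_s)


def safe_to_decommission(
    switched_at: datetime,
    last_old_origin_request: datetime,
    short_ttl_s: int,
    now: datetime,
) -> bool:
    """True once caches have had one short TTL to pick up the new value AND the
    old origin's access log has been quiet for more than one short TTL."""
    short_ttl = timedelta(seconds=short_ttl_s)
    caches_refreshed = now >= switched_at + short_ttl
    logs_quiet = now - last_old_origin_request > short_ttl
    return caches_refreshed and logs_quiet


if __name__ == "__main__":
    lowered = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # TTL dropped from 3600 to 60
    print(safe_to_switch(lowered, 3600, lowered + timedelta(minutes=30)))  # False
    print(safe_to_switch(lowered, 3600, lowered + timedelta(hours=1)))  # True
```

The decommission check still requires quiet logs after the short TTL has passed. Some clients cache answers longer than the TTL they were given, and the logs are the only evidence that they have moved.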
+## Common Errors
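+- **CNAME on the apex.** A CNAME can't coexist with the SOA/NS records every apex must carry (or with MX), so most DNS providers refuse to create it, and where one slips through, lookups break unpredictably. Use ALIAS/ANAME/CNAME-flattening for `example.com`.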
+- **Cutover without pre-lowering TTL.** You switch the record but caches serve the dead origin for the full old TTL (often an hour). Lower TTL and wait out the *old* TTL first.
+- **Raising TTL or killing the old origin too early.** Do it only after old-origin logs go quiet for > one TTL; otherwise clients still holding the old answer hit a dead origin and fail.
+- **Missing or over-restrictive CAA.** No CAA = any CA may issue (security gap); a CAA that omits your CA = ACME fails with `CAA record prevents issuance`. Add the issuing CA explicitly, including `issuewild` for wildcards.
+- **HTTP-01 for a wildcard.** Impossible — wildcards require DNS-01. Switch the solver.
+- **Manual cert renewal "we'll remember."** You won't. The outage is scheduled for expiry day. Automate or it will lapse.
+- **Serving only the leaf cert.** Browsers cache intermediates and "work"; `curl`, Java, old Android, and API clients fail chain validation. Always deploy `fullchain.pem`.
+- **Burning LE rate limits while debugging.** Iterate against `acme-staging-v02` (or `certbot --test-cert`); only hit prod once issuance succeeds in staging.
+- **`includeSubDomains`/`preload` HSTS before all subdomains are HTTPS.** Any plain-HTTP subdomain becomes unreachable, and `preload` is baked into browsers for months. Roll HSTS out short → long → preload.
+- **DNS-01 with under-scoped API creds.** The token can't write `_acme-challenge` TXT, so renewal silently fails. Scope the token to DNS-edit on that zone and test it.
+- **Mixed content after enabling HTTPS.** Page loads over HTTPS but pulls `http://` assets → browser blocks them. Rewrite asset URLs to `https://` or protocol-relative; verify console is clean.
+- **Clock skew on the TLS host.** A wrong system clock makes a valid cert read as not-yet-valid/expired. Run NTP.
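The CAA pitfall above follows from how a CA evaluates the record set. The Python sketch below is a simplified model of the RFC 8659 rules. It assumes the relevant CAA RRset has already been located, so the climb to parent domains is not modelled, and it treats only `issue`, `issuewild`, and `iodef` as understood tags. `CAA` and `ca_may_issue` are hypothetical names.

```python
from typing import List, NamedTuple


class CAA(NamedTuple):
    """One CAA record (hypothetical type for this sketch)."""

    flags: int  # 128 = issuer-critical
    tag: str    # "issue", "issuewild", "iodef", ...
    value: str  # e.g. 'letsencrypt.org', or ';' meaning "no CA"


# Assumption: the tags this simplified model understands.
UNDERSTOOD_TAGS = {"issue", "issuewild", "iodef"}


def ca_may_issue(records: List[CAA], ca_domain: str, wildcard: bool) -> bool:
    """Hypothetical, simplified RFC 8659 check against an already-located CAA RRset."""
    # An issuer-critical record with a tag the CA doesn't understand blocks issuance.
    if any(r.flags & 128 and r.tag.lower() not in UNDERSTOOD_TAGS for r in records):
        return False
    issue = [r for r in records if r.tag.lower() == "issue"]
    issuewild = [r for r in records if r.tag.lower() == "issuewild"]
    # Wildcard names use issuewild when present, otherwise fall back to issue.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        # No CAA at all, or only non-restricting tags (e.g. iodef): any CA may issue.
        return True
    for r in relevant:
        issuer = r.value.split(";", 1)[0].strip().lower()  # drop params like accounturi=
        if issuer == ca_domain.lower():
            return True
    # Restricting records exist, but none names this CA (';' names nobody).
    return False
```

The model shows both halves of the pitfall. An empty record set allows every CA. A record set that names only another CA (or contains `;`) blocks yours.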
+## Verify
+1. **Records resolve correctly:** `dig +short A app.example.com` (and `AAAA`) returns the new target; `dig CAA example.com` shows your CA; `dig TXT _dmarc.example.com` shows the DMARC policy. Query an external resolver (`dig @1.1.1.1 …`) too, not just the local cache.
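+2. **TTL was actually lowered before cutover:** query an authoritative nameserver (`dig +short NS example.com`, then `dig @<that-ns> app.example.com | grep -E '^app'`) and confirm it shows the short TTL *before* you change the value; a caching resolver shows only the remaining, counting-down TTL. After the change, confirm the answer flips and has propagated (`dig @8.8.8.8` and `@1.1.1.1` agree).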
+3. **Full chain + protocol scan:** `echo | openssl s_client -connect example.com:443 -servername example.com -showcerts` shows leaf **and** intermediate(s), `Verify return code: 0 (ok)`. `testssl.sh example.com` (or SSL Labs) reports TLS 1.2/1.3 only, no TLS 1.0/1.1, HSTS present, OCSP stapled — target grade **A/A+**.
+4. **Redirect + HSTS:** `curl -sI http://example.com` → `301` to `https://`; `curl -sI https://example.com | grep -i strict-transport` shows the HSTS header.
+5. **No mixed content:** load the page, browser console shows zero "Mixed Content" / blocked-asset warnings; all subresources are `https://`.
+6. **Expiry & auto-renew proven:** `echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -enddate` shows a future date. Then force a **staging** renewal and confirm a fresh cert issues without manual steps. On a VM, run `certbot renew --dry-run`. On k8s, use a throwaway test `Certificate` whose issuer points at `acme-staging-v02`, so the production Secret is never replaced by an untrusted staging cert. Run `cmctl renew <test-cert> -n web` and watch `cmctl status certificate <test-cert> -n web` go Ready.
+7. **Mail auth (if MX set):** SPF/DKIM/DMARC TXT records validate (e.g. an external mail-tester) — no `softfail`/missing-DKIM.
+Done = every name resolves to the new target on external resolvers, HTTPS serves the **full chain** over **TLS 1.2/1.3 only** with HSTS + stapling + 80→443 redirect and no mixed content (SSL Labs/testssl ≥ A), CAA locks issuance to your CA, and a staging force-renew has **proven** auto-renewal works before any cert nears expiry.
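The expiry half of Verify step 6 can also run as a scheduled check, so a stalled renewal pages someone before users notice. The sketch below uses only Python's standard `ssl` and `socket` modules. `days_until_expiry` and `fetch_not_after` are hypothetical names, and the alert threshold is your choice. ~90-day certs are normally renewed with roughly 30 days left, so a much lower number usually means renewal is failing.

```python
import socket
import ssl
import time
from typing import Optional

# Hypothetical helpers for a scheduled expiry check; names are illustrative.


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left on a cert, given its notAfter in OpenSSL's text format, e.g. the
    value after 'notAfter=' printed by `openssl x509 -noout -enddate`."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # e.g. "Jan  5 09:34:43 2018 GMT"
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Handshake with SNI plus full chain and hostname verification, then return notAfter.
    A server that sends only the leaf usually fails here with SSLCertVerificationError,
    much as it does in curl."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("example.com"))` returns the days remaining on the certificate the server currently presents. This is the same date the `openssl ... -enddate` pipeline prints.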

package/skills/configure-reverse-proxy-lb/SKILL.md ADDED

@@ -0,0 +1,129 @@
+---
+name: configure-reverse-proxy-lb
+description: Configures a reverse proxy / load balancer (nginx, Envoy, Caddy, HAProxy) in front of services — upstream pools, active/passive health checks, per-hop connect/read/send timeouts, TLS termination vs passthrough, idempotent-only retries with circuit breaking, sticky sessions, and zero-drop graceful reloads.
+when_to_use: Putting a proxy/LB in front of services, fixing 502/504s, balancing across instances, or routing by host/path. Distinct from configure-dns-tls (DNS records + cert issuance), setup-cdn-edge-waf (the CDN/WAF edge), rate-limiting (app-level request caps), and k8s-manifest-review (in-cluster Service/Ingress objects).
+---
+## When to Use
+Reach for this skill when the request is about **the proxy/LB layer between clients and your services**:
+- "Put nginx/Envoy/Caddy/HAProxy in front of these app instances"
+- "We're getting random 502/504s — fix the timeouts"
+- "Balance traffic across N backends and drop a dead one automatically"
+- "Route by `Host:` / path prefix to different upstreams"
+- "Terminate TLS at the proxy" / "pass TLS straight through to the backend"
+- "Config reload kills in-flight requests — make it zero-drop"
+NOT this skill:
+- Creating DNS records or issuing/renewing the cert itself → configure-dns-tls
+- The CDN/edge tier, bot rules, or WAF rulesets → setup-cdn-edge-waf
+- Per-user/per-key request caps and 429s at the app → rate-limiting
+- Kubernetes `Service`/`Ingress`/`Gateway` objects in-cluster → k8s-manifest-review
+## Steps
+1. **Pick the proxy by requirement — default to nginx.**
+   | Proxy | Pick when | Watch out |
+   |---|---|---|
+   | **nginx** | General L7 in front of HTTP/HTTPS apps — the default | Active health checks need nginx **Plus**; OSS only does passive `max_fails` |
+   | **Envoy** | Dynamic config via xDS, gRPC/HTTP2, fine-grained circuit breaking, outlier detection | Steep config; run with a control plane (Istio/Contour/Gloo) for anything large |
+   | **Caddy** | You want automatic TLS (ACME) with near-zero config | Less knob-level control over upstreams/retries |
+   | **HAProxy** | Heavy L4 (TCP) LB, max throughput, advanced balancing/observability | L7 ergonomics weaker than nginx for content routing |
+   For a typical web service: **nginx terminating TLS, round-robin or least-conn upstream, passive health checks**. Reach for Envoy only when you genuinely need dynamic upstreams or per-endpoint outlier ejection.
+2. **Define the upstream pool + algorithm — least-conn is the safer default for mixed latency.**
+   ```nginx
+   upstream app {
+       least_conn;                         # round-robin is fine for uniform requests; least_conn for variable latency
+       server 10.0.1.11:8080 max_fails=3 fail_timeout=10s;
+       server 10.0.1.12:8080 max_fails=3 fail_timeout=10s;
+       server 10.0.1.13:8080 max_fails=3 fail_timeout=10s backup;  # only when primaries are down
+       keepalive 64;                       # REUSE upstream conns — without this every request does a fresh TCP+TLS handshake
+   }
+   ```
+   - **round-robin** (default): uniform, cheap requests.
+   - **least-conn**: requests with variable duration — avoids piling onto a slow node.
+   - **consistent-hash** (`hash $arg_key consistent;` / Envoy ring-hash): only when a key must stick to a backend (cache affinity, sharding). Plain `ip_hash` rebalances badly when a node leaves; use `consistent` so a single ejection doesn't reshuffle every key.
+3. **Set timeouts at EVERY hop. A proxy read timeout shorter than the app's slowest response is a leading cause of 504s.** A 502 = backend refused/reset the connection; a 504 = backend accepted but didn't answer before `proxy_read_timeout`. The proxy's read timeout must be **longer** than the slowest legitimate backend response, and the backend's own keepalive must be **longer** than the proxy's so the proxy never reuses a socket the backend just closed (classic race → sporadic 502).
+   ```nginx
+   location / {
+       proxy_pass http://app;
+       proxy_http_version 1.1;
+       proxy_set_header Connection "";      # required so keepalive to upstream actually works
+       proxy_connect_timeout 2s;            # TCP connect to backend — short; a backend that won't accept is dead
+       proxy_send_timeout   30s;            # writing the request body to backend
+       proxy_read_timeout   60s;            # waiting for the backend's response — MUST exceed slowest real response
+   }
+   # And: keep the backend's idle keepalive timeout (e.g. 75s) longer than nginx's upstream keepalive_timeout (default 60s) to avoid the reuse-after-close 502.
+   ```
+   Envoy: set `connect_timeout` on the cluster and `route.timeout` per route; the default route timeout is 15s and cuts off longer requests with a 504, so set it deliberately.
+4. **Add health checks — passive at minimum, active if your proxy supports it.** Passive ejection (`max_fails`/`fail_timeout`, Envoy outlier detection) reacts only to *real* request failures, so a freshly-booted-but-broken node still gets traffic until it fails N live requests. Active checks (nginx Plus `health_check`, HAProxy `option httpchk`, Envoy `health_checks`) probe a `/healthz` endpoint and eject before user traffic hits it.
+   - Health endpoint must check **dependencies** (DB, cache reachable), not just "process is up" — otherwise you keep a node that 500s on every real request.
+   - Set an explicit `unhealthy`→`healthy` hysteresis (e.g. eject after 3 fails, re-add after 2 passes) so a flapping node doesn't oscillate in and out of rotation.
+5. **TLS: terminate at the proxy unless the backend itself must terminate TLS.** Terminate (decrypt at the proxy, then plaintext or re-encrypt to the backend) for HTTP routing, header inspection, and central cert management; this is the common case. **Passthrough** (L4 `stream`/SNI routing, proxy never decrypts) only for end-to-end encryption mandates or non-HTTP TLS. When terminating, forward the original scheme/IP so the app builds correct URLs and logs the real client:
+   ```nginx
+   proxy_set_header Host              $host;
+   proxy_set_header X-Real-IP         $remote_addr;
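+   proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;  # appends to the inbound header; at the outermost (edge) proxy use $remote_addr so clients can't forge it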
+   proxy_set_header X-Forwarded-Proto $scheme;   # app uses this to know the request was HTTPS
+   ```
+   Pin `ssl_protocols TLSv1.2 TLSv1.3;` and a modern cipher suite; redirect `:80` → `:443`.
+6. **Retry idempotent requests ONLY, with circuit breaking.** Auto-retrying a `POST`/`PATCH` that timed out can double-charge a card or double-write. Restrict retries to safe methods + connect/early failures, cap attempts, and stop retrying once the backend is clearly down.
+   ```nginx
+   proxy_next_upstream error timeout http_502 http_503;   # NOT non_idempotent — never blindly retry POST
+   proxy_next_upstream_tries 2;
+   proxy_next_upstream_timeout 10s;
+   ```
+   Envoy: `retry_policy` with `retry_on: connect-failure,refused-stream,unavailable`, `num_retries: 2`, plus `retry_back_off`. Add **circuit breaking** (Envoy `circuit_breakers` max connections/pending/retries, or outlier detection ejecting a 5xx-storming host) so retries don't amplify load against a struggling backend into a full meltdown.
+7. **Sticky sessions only when state truly demands it.** Cookie/affinity routing (nginx Plus `sticky cookie`, Envoy hash policy) pins a client to one backend. That is necessary for in-memory session state but fatal for even load balancing and graceful drain (a drained node's clients all break). **First fix the state**: move sessions to Redis/JWT so any backend serves any user, then drop stickiness. Only keep it for unavoidable backend-local state, and pair it with consistent hashing so losing one node reshuffles minimally.
+8. **Make reloads zero-drop (graceful drain).** A naive restart cuts in-flight connections → user-visible 5xx during every deploy.
+   - **nginx:** `nginx -t && nginx -s reload` — the master spins up new workers on the new config and lets old workers finish in-flight requests before exiting. Never `kill -9` / hard restart for a config change.
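+   - **HAProxy:** start the new process with `-sf` followed by the old PIDs (e.g. `-sf $(cat /run/haproxy.pid)` when HAProxy writes its PID file there) so the old processes finish in-flight work before exiting, or use the master-worker socket reload.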
+   - **Envoy:** hot restart / xDS push drains the old listener.
+   - For removing a **backend**: first mark it `down`/drain in the pool and reload so the proxy stops sending *new* requests, wait for in-flight to finish, then stop the backend. Tie the backend's shutdown to its readiness probe (fail `/healthz` → proxy ejects → then SIGTERM) so the LB drains it before it dies.
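Step 6's retry rule can also be written as a plain decision function, which may help when reviewing proxy config or building a client-side retrier with the same semantics. This is a minimal JavaScript sketch, not nginx's or Envoy's internal logic: the failure labels, the `mayRetry` name, and the `breakerOpen` flag (standing in for a tripped circuit breaker) are hypothetical, and the method set follows HTTP's definition of idempotent methods.
```javascript
// Sketch: may a failed upstream attempt be retried on another backend?
// Hypothetical inputs: `failure` is one of 'connect', 'refused', 'timeout', 'http_502',
// 'http_503' (anything else is never retried); `breakerOpen` means the circuit breaker tripped.
const IDEMPOTENT = new Set(['GET', 'HEAD', 'OPTIONS', 'TRACE', 'PUT', 'DELETE']);
const NEVER_SENT = new Set(['connect', 'refused']);   // request never reached a backend
const RETRYABLE = new Set(['connect', 'refused', 'timeout', 'http_502', 'http_503']);
const MAX_TRIES = 2;                                    // like proxy_next_upstream_tries 2 (total attempts)

function mayRetry(method, failure, attemptsSoFar, breakerOpen) {
  if (breakerOpen) return false;                        // backend clearly down: don't amplify load
  if (attemptsSoFar >= MAX_TRIES) return false;         // cap total attempts
  if (!RETRYABLE.has(failure)) return false;            // e.g. a 500 from app code is not retried
  if (IDEMPOTENT.has(method.toUpperCase())) return true;
  return NEVER_SENT.has(failure);                       // POST/PATCH only if nothing was delivered
}
```
A `POST` whose connection was refused is still retryable here because the request never reached a backend, which mirrors nginx passing non-idempotent requests to the next server only when nothing was sent upstream.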
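For step 7's advice to pair affinity with consistent hashing, a small hash ring shows why removing one node only reshuffles that node's clients. A minimal JavaScript sketch under stated assumptions: MD5 truncated to 32 bits and 100 virtual nodes per backend are illustrative choices, and `buildRing`/`pick` are hypothetical names, not nginx's `hash ... consistent` implementation.
```javascript
const crypto = require('node:crypto');

// 32-bit position on the ring: first 4 bytes of an MD5 digest (illustrative choice).
function hash32(s) {
  return crypto.createHash('md5').update(s).digest().readUInt32BE(0);
}

// Each backend gets `vnodes` points so load spreads evenly and a removal is absorbed by many peers.
function buildRing(backends, vnodes = 100) {
  const ring = [];
  for (const b of backends) {
    for (let i = 0; i < vnodes; i++) ring.push({ point: hash32(`${b}#${i}`), backend: b });
  }
  return ring.sort((x, y) => x.point - y.point);
}

// A key belongs to the first point at or after its hash, wrapping past the end of the ring.
function pick(ring, key) {
  const h = hash32(key);
  let lo = 0;
  let hi = ring.length;
  while (lo < hi) {                       // binary search for the first point >= h
    const mid = (lo + hi) >>> 1;
    if (ring[mid].point < h) lo = mid + 1; else hi = mid;
  }
  return ring[lo % ring.length].backend;
}
```
After going from `buildRing(['app1', 'app2', 'app3'])` to `buildRing(['app1', 'app2'])`, every key that `pick` sent to `app1` or `app2` stays put; only `app3`'s keys move. A plain `hash(key) % N` scheme would remap most keys instead.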
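Step 8's backend-side ordering (fail readiness, give the proxy time to eject the node, let in-flight work finish, then exit) can be modeled as a small state holder. This JavaScript sketch is hypothetical: `DrainController`, the grace period, and how you wire `ready()` to `/healthz` and `beginDrain()` to your shutdown trigger are assumptions about your server, not part of any proxy's API.
```javascript
// Sketch: backend-side drain ordering for a graceful stop.
// Hypothetical wiring: GET /healthz returns 200 while ready() is true and 503 after;
// call start()/finish() around each request; poll canExit() before process.exit().
class DrainController {
  constructor(graceMs = 10_000) {
    this.graceMs = graceMs;           // assumed time for the proxy to notice failing health checks
    this.drainStartedAt = null;
    this.inFlight = 0;
  }
  ready() { return this.drainStartedAt === null; }
  start() { this.inFlight += 1; }     // keep serving: the proxy may route here until it ejects us
  finish() { this.inFlight -= 1; }
  beginDrain(now = Date.now()) {
    if (this.drainStartedAt === null) this.drainStartedAt = now;
  }
  canExit(now = Date.now()) {
    return this.drainStartedAt !== null &&
      now - this.drainStartedAt >= this.graceMs &&   // proxy has had time to eject this node
      this.inFlight === 0;                           // every in-flight request has completed
  }
}
```
Requests that arrive during the grace window are still served rather than rejected, since rejecting them would produce exactly the 5xx burst the drain is meant to avoid.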
+## Common Errors
+- **`proxy_read_timeout` shorter than the slowest real response.** Long uploads/reports hit a **504** even though the backend is healthy. Set the read timeout above the legitimate p99, and only then chase a slow endpoint separately.
+- **Backend keepalive shorter than the proxy's upstream idle window.** Backend closes an idle socket the proxy then reuses → sporadic **502** under no real load. Make backend `keepalive_timeout` longer than the proxy's, and set `proxy_http_version 1.1` + `Connection ""`.
+- **No `keepalive` in the upstream block.** Every request does a fresh TCP (and TLS) handshake to the backend — latency and CPU explode under load. Add `keepalive N` and clear the `Connection` header.
+- **Retrying non-idempotent requests.** `proxy_next_upstream` including `non_idempotent` (or an Envoy `retry_on` that catches POSTs) silently double-executes writes on a timeout → duplicate charges/orders. Retry safe methods + connect failures only.
+- **Health check that only pings the port / returns 200 unconditionally.** A node with a dead DB stays in rotation and 500s every request. Probe real dependencies in `/healthz`.
+- **`ip_hash` / non-consistent hashing for affinity.** Removing or adding one node reshuffles *every* client to a new backend, blowing caches and sessions. Use `consistent` hashing.
+- **Trusting client-supplied `X-Forwarded-For`/`X-Forwarded-Proto`.** The app sees spoofed client IPs or thinks plaintext is HTTPS. Reset these headers at the trust boundary (`proxy_set_header ... $remote_addr`/`$scheme`); never pass the raw inbound value through.
+- **Hard restart on config change.** `systemctl restart nginx` / `kill -9` drops in-flight connections every deploy. Use `reload` / `-sf` graceful paths.
+- **Stopping a backend before draining it.** Killing an instance while the LB still routes to it = a burst of 5xx for its in-flight requests. Drain (fail readiness → eject) first, then SIGTERM.
+- **Default Envoy 15s route timeout left implicit.** Long-running requests get cut at 15s with no obvious cause. Set `route.timeout` explicitly per route.
+- **Single proxy = single point of failure.** One LB box and the whole service is down when it dies or reloads badly. Run ≥2 behind a VIP/anycast/keepalived or a managed LB.
+## Verify
+1. **Config is valid before reload:** `nginx -t` (or `haproxy -c -f`, `envoy --mode validate`, `caddy validate`) returns OK. Never reload an unvalidated config.
+2. **Balancing works:** fire `N` requests (`hey`, `vegeta`, `for i in $(seq 100); do curl -s .../whoami; done`) and confirm responses spread across all backends per the chosen algorithm (e.g. roughly even for round-robin).
+3. **Dead-backend reroute, zero 5xx:** kill one backend mid-load. Traffic must reroute to healthy nodes and the client must see **no 5xx** (passive: a brief blip until `max_fails`; active: none). The killed node returns to rotation after it's healthy again.
+4. **Timeouts behave:** point at a backend that sleeps longer than `proxy_read_timeout` → you get **504** at the configured time, not earlier/later. A backend refusing connections → **502** (not a retry storm).
+5. **Retries are idempotent-only:** a timed-out `GET` retries to a second backend (one served response); a timed-out `POST` does **not** double-execute (assert the write happened exactly once at the backend).
+6. **Zero-drop reload:** run sustained load (`vegeta attack -rate=200 -duration=60s`), trigger a config `reload` mid-run, and confirm **0 connection errors / 0 non-2xx** attributable to the reload in the report.
+7. **TLS + forwarded headers:** `curl -v https://host` negotiates TLS1.2/1.3; the backend logs the real client IP (`X-Real-IP`) and sees `X-Forwarded-Proto: https`; `:80` 301-redirects to `:443`.
+8. **Drain before stop:** mark a backend down, confirm new requests stop hitting it while in-flight ones complete, *then* stop it — no 5xx in the transition.
+Done = killing a backend reroutes with **zero 5xx**, timeouts produce the right code at the right time, idempotent-only retries never double-write, and a config reload under sustained load drops **zero** in-flight connections — all with a validated config and ≥2 proxies (no single point of failure).

package/skills/configure-security-headers-csp/SKILL.md ADDED

@@ -0,0 +1,122 @@
+---
+name: configure-security-headers-csp
+description: Configures HTTP response security headers and a strict, nonce/hash-based Content-Security-Policy — script-src with a per-request nonce or sha256 hash plus 'strict-dynamic' (so you can drop host allowlists and 'unsafe-inline'), object-src 'none', base-uri 'none', frame-ancestors to control framing, a Report-Only rollout via report-to/report-uri before enforcing, plus HSTS with includeSubDomains+preload, X-Content-Type-Options: nosniff, Referrer-Policy, a deny-by-default Permissions-Policy, correct CORS (echo a single allowed origin, never wildcard '*' together with Access-Control-Allow-Credentials), and cookie flags Secure+HttpOnly+SameSite. Eliminates inline-script XSS sinks, clickjacking, MIME-sniffing, mixed content, and credentialed-CORS leaks by policy rather than per-bug patching.
+when_to_use: Hardening a web app's HTTP responses — adding or tightening CSP, fixing a console "Refused to execute inline script" after enabling CSP, rolling out HSTS/preload, setting frame-ancestors/Referrer-Policy/Permissions-Policy, or getting CORS and cookie flags right. Distinct from remediate-web-vulnerabilities (finds and fixes a specific bug like a reflected XSS or open redirect; this skill sets the defense-in-depth headers that contain whole bug classes) and setup-cdn-edge-waf (the CDN/WAF edge layer that can inject or override these headers; this skill defines the header values that layer should serve).
+---
+## When to Use
+Reach for this skill when the task is **setting HTTP response headers and CSP as defense-in-depth policy**, not chasing one specific vulnerability:
+- "Add a Content-Security-Policy" / "our CSP uses 'unsafe-inline' — make it strict"
+- "After turning on CSP the page broke: Refused to execute inline script / Refused to load the stylesheet"
+- "Enable HSTS / submit the domain to the preload list"
+- "Stop the site from being framed / set frame-ancestors / X-Frame-Options"
+- "Set Referrer-Policy and lock down Permissions-Policy (camera, geolocation, FLoC)"
+- "Our CORS sends `Access-Control-Allow-Origin: *` with credentials — is that safe?" (no)
+- "Cookies missing Secure/HttpOnly/SameSite" / harden the Set-Cookie flags
+NOT this skill:
+- Finding/fixing a concrete bug — reflected/stored XSS sink, open redirect, SSRF, SQLi — and sanitizing the offending code path → remediate-web-vulnerabilities (this skill is the header *containment* layer that limits the blast radius of such bugs)
+- Configuring the CDN/WAF/edge that injects, caches, or overrides these headers, or rate-limits at the edge → setup-cdn-edge-waf (this skill defines the header *values* it should emit)
+- TLS certs, cipher suites, OCSP, ACME issuance, the TLS handshake behind HSTS → configure-dns-tls (HSTS only asserts TLS is mandatory; it doesn't provision it)
+- Reverse-proxy/load-balancer routing where you might *also* add these headers (nginx/Envoy/Traefik) → configure-reverse-proxy-lb (this skill says *which* headers; that one places them in the proxy)
+- The OAuth/OIDC redirect, token, and session-cookie *protocol* → integrate-oauth-oidc / auth-jwt-session (this skill only hardens the cookie *flags* and CORS around them)
+- Structured threat enumeration (STRIDE) or a full audit pass → threat-model-stride / security-review
+- Active fuzzing/DAST to prove a bypass → fuzz-dynamic-security-test
+## Steps
+1. **Default to a strict, nonce- or hash-based CSP — host allowlists are obsolete and bypassable.** Allowlist CSPs (`script-src 'self' cdn.example.com`) are trivially defeated via JSONP endpoints, open redirects, or AngularJS on a whitelisted host (Google's own research found ~94% of allowlist CSPs bypassable). The strict pattern:
+   ```
+   Content-Security-Policy:
+     script-src 'nonce-{RANDOM}' 'strict-dynamic' https: 'unsafe-inline';
+     object-src 'none';
+     base-uri 'none';
+     require-trusted-types-for 'script';
+     report-uri /csp-report; report-to csp
+   ```
+   - **`'strict-dynamic'`** lets a nonced/hashed script load further scripts it creates, so you don't enumerate every CDN. When present, browsers that understand it **ignore** `https:` and `'unsafe-inline'` — those are *fallbacks for old browsers only*, not a real relaxation.
+   - **`object-src 'none'`** kills Flash/`<object>` injection; **`base-uri 'none'`** stops `<base href>` from rewriting relative script URLs.
+   - You usually don't need `default-src` micromanaged once `script-src` is strict; the dangerous directive is script execution.
+2. **Generate a fresh 128-bit nonce per response and stamp it on every inline `<script>`.** The nonce must be cryptographically random and **unique per HTTP response** (never reuse, never hardcode) — a static nonce is equivalent to `'unsafe-inline'`.
+   | Stack | Generate | Apply |
+   |---|---|---|
+   | Express | `res.locals.nonce = crypto.randomBytes(16).toString('base64')` | helmet `contentSecurityPolicy` with ``scriptSrc: [(req, res) => `'nonce-${res.locals.nonce}'`]``; `<script nonce="<%= nonce %>">` |
+   | Next.js | nonce in `middleware.ts`, pass via header | Next injects nonce into its own scripts when CSP header has a nonce |
+   | Django | `django-csp` `CSPMiddleware` (nonce created when `request.csp_nonce` is first read) | `<script nonce="{{ request.csp_nonce }}">` |
+   | Rails | `config.content_security_policy_nonce_generator` | `javascript_tag nonce: true` / `nonce: true` in tags |
+   | Go/Caddy/nginx | per-request var (sub_filter or middleware) | template the nonce into markup |
+   For **static/cached HTML** where you can't inject a per-response nonce, use **`'sha256-...'` hashes** of each inline script's exact bytes instead (compute at build time). Nonces require dynamic rendering; hashes work on a CDN.
+3. **Roll out in Report-Only first — never flip enforcing CSP straight to prod.** Ship `Content-Security-Policy-Report-Only` (same policy) alongside any existing enforced policy, collect violations for days/weeks, fix legitimate breakage, then promote to the enforcing header. Wire reporting with the modern `report-to` (a `Reporting-Endpoints` header naming a collector) **and** keep deprecated `report-uri` for older browsers:
+   ```
+   Reporting-Endpoints: csp="https://example.com/csp-report"
+   Content-Security-Policy-Report-Only: script-src 'nonce-...' 'strict-dynamic'; report-to csp; report-uri /csp-report
+   ```
+   Expect noise from browser extensions injecting inline scripts — triage by `blocked-uri`/`source-file`; don't widen the policy to silence extension reports.
+4. **Set `frame-ancestors` to control framing; it supersedes X-Frame-Options.** `frame-ancestors 'none'` (no framing) or `frame-ancestors 'self' https://trusted.example.com` (allow specific embedders). Browsers honor `frame-ancestors` over the legacy `X-Frame-Options: DENY|SAMEORIGIN` when both exist; keep `X-Frame-Options: DENY` only as a fallback for ancient clients. XFO cannot express an allowlist of embedding origins (its `ALLOW-FROM` form is obsolete and ignored by modern browsers), so `frame-ancestors` is the real control.
+5. **Pin the rest of the header set — each closes a specific class.**
+   | Header | Value (strong default) | Closes |
+   |---|---|---|
+   | `Strict-Transport-Security` | `max-age=63072000; includeSubDomains; preload` | SSL-strip / downgrade; mandates HTTPS for 2y |
+   | `X-Content-Type-Options` | `nosniff` | MIME-sniffing a JSON/text response into executable HTML/JS |
+   | `Referrer-Policy` | `strict-origin-when-cross-origin` (or `no-referrer`) | leaking full URL + query (tokens) in `Referer` |
+   | `Permissions-Policy` | `camera=(), microphone=(), geolocation=(), interest-cohort=()` | abuse of powerful features; deny-by-default `()` = nobody |
+   | `Cross-Origin-Opener-Policy` | `same-origin` | cross-window attacks; required (with COEP) to re-enable `SharedArrayBuffer` |
+   | `Cross-Origin-Resource-Policy` | `same-origin` (or `same-site`) | side-channel/Spectre cross-origin reads (data leak) |
+   | `X-Frame-Options` | `DENY` (legacy fallback only) | clickjacking on old browsers (else use `frame-ancestors`) |
+   HSTS rules: only send over HTTPS; `includeSubDomains` covers every subdomain (verify they're all HTTPS first); **`preload` is a near-irreversible commitment**: once the domain is on the browser preload list, removal takes months, so don't add it until you're certain all subdomains are HTTPS-only. Submit at hstspreload.org. `Permissions-Policy` replaces the old `Feature-Policy`; `interest-cohort=()` opts out of FLoC (since abandoned), and `browsing-topics=()` opts out of the Topics API that replaced it.
+6. **CORS: echo exactly one validated origin — NEVER wildcard with credentials.** The single most common CORS vuln is `Access-Control-Allow-Origin: *` (or reflecting `Origin` blindly) **together with** `Access-Control-Allow-Credentials: true`, which lets any site read authenticated responses.
+   - The spec **forbids** `*` + credentials — but reflecting the `Origin` header unchecked is the same hole. **Validate the incoming `Origin` against an allowlist**, and only then echo it back: `Access-Control-Allow-Origin: <that exact origin>` + `Vary: Origin`.
+   - Never trust substring/regex matches like `endsWith('example.com')` (matches `https://evilexample.com`) or `startsWith('https://example.com')` (matches `https://example.com.evil.com`). Match the full origin against an explicit set.
+   - If you don't need credentials, prefer `Access-Control-Allow-Origin: *` **without** credentials — that's safe and simpler. Don't reflect `null` (sandboxed iframes/`file://` send `Origin: null` — allowlisting `null` is exploitable).
+   - Set `Vary: Origin` whenever the ACAO value depends on the request, or a cache will serve one origin's allowed-response to another.
+7. **Harden cookies: `Secure; HttpOnly; SameSite` — and `__Host-` for session cookies.** Every session/auth cookie:
+   ```
+   Set-Cookie: __Host-session=...; Secure; HttpOnly; SameSite=Lax; Path=/
+   ```
+   - **`Secure`** — only sent over HTTPS. **`HttpOnly`** — invisible to `document.cookie`, so an XSS can't exfiltrate it. **`SameSite=Lax`** (default-safe; blocks cross-site POST CSRF) or **`Strict`** for the most sensitive; use `SameSite=None; Secure` only for genuine cross-site cookies (and then you need CSRF defense).
+   - The **`__Host-` prefix** forces `Secure`, `Path=/`, and no `Domain` — the browser rejects the cookie if those aren't met, preventing subdomain cookie-fixation. Use it for session cookies. `__Secure-` is the weaker variant (just requires `Secure`).
+8. **Set headers once, at the right layer, and don't let it get clobbered.** Prefer a single source of truth: app middleware (helmet / `secure_headers` / `django-csp`) **or** the edge/proxy, not both fighting. If a CDN/WAF (setup-cdn-edge-waf) or reverse proxy (configure-reverse-proxy-lb) also injects headers, confirm which wins (proxies often *append*, producing duplicate/conflicting CSP; the browser then enforces the **intersection**, which can silently break the page). Apply headers to **all** responses including errors, redirects, and API/JSON. Use **helmet** (Express), **`secure_headers`** gem (Rails), **`django-csp` + `SecurityMiddleware`** (Django), or **`unrolled/secure`** middleware (Go) rather than hand-rolling.
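Steps 1 and 2 in code: a fresh 128-bit nonce per response, the strict policy built around it, and a build-time `sha256` source for static inline scripts. A minimal JavaScript (Node) sketch; the function names are illustrative, and in Express you would call `newNonce()` in middleware, store it on `res.locals`, and send `strictCsp(nonce)` as the header of that same response.
```javascript
const crypto = require('node:crypto');

// 16 random bytes from the CSPRNG = a 128-bit nonce; generate one per HTTP response.
function newNonce() {
  return crypto.randomBytes(16).toString('base64');
}

// The strict policy from step 1; https: and 'unsafe-inline' are old-browser fallbacks only.
function strictCsp(nonce, reportEndpoint = 'csp') {
  return [
    `script-src 'nonce-${nonce}' 'strict-dynamic' https: 'unsafe-inline'`,
    "object-src 'none'",
    "base-uri 'none'",
    "require-trusted-types-for 'script'",
    'report-uri /csp-report',
    `report-to ${reportEndpoint}`,      // must name an entry in Reporting-Endpoints
  ].join('; ');
}

// For cached/static HTML: hash the exact bytes between <script> and </script> at build time.
function scriptHashSource(inlineScript) {
  const digest = crypto.createHash('sha256').update(inlineScript, 'utf8').digest('base64');
  return `'sha256-${digest}'`;
}
```
The hash covers the script text exactly, whitespace included, so any reformatting of an inline script requires recomputing its source expression.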
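For step 3's triage, a collector may receive two formats: the legacy `report-uri` JSON body wrapped in `csp-report`, and Reporting API batches of `csp-violation` reports sent to the `Reporting-Endpoints` URL. This JavaScript sketch normalizes both and drops reports that point at extension URLs; the extension scheme list and the normalized field names (`directive`, `blocked`, `source`) are assumptions.
```javascript
// Sketch: normalise CSP violation reports from both transports, then drop extension noise.
const EXTENSION_SCHEMES = ['chrome-extension:', 'moz-extension:', 'safari-web-extension:'];

function normalizeReports(payload) {
  if (payload && payload['csp-report']) {                    // legacy report-uri format
    const r = payload['csp-report'];
    return [{ directive: r['effective-directive'] || r['violated-directive'],
              blocked: r['blocked-uri'] || '', source: r['source-file'] || '' }];
  }
  const list = Array.isArray(payload) ? payload : [];        // Reporting API sends an array
  return list
    .filter((rep) => rep.type === 'csp-violation' && rep.body)
    .map((rep) => ({ directive: rep.body.effectiveDirective,
                     blocked: rep.body.blockedURL || '', source: rep.body.sourceFile || '' }));
}

function isExtensionNoise(v) {
  return EXTENSION_SCHEMES.some((s) => v.blocked.startsWith(s) || v.source.startsWith(s));
}

// Reports worth a human look: real violations that may need a policy or code fix.
function triage(payload) {
  return normalizeReports(payload).filter((v) => !isExtensionNoise(v));
}
```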
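Step 6's allowlist check reduces to exact set membership on the full origin string. A minimal JavaScript sketch; `ALLOWED_ORIGINS` and `corsHeaders` are placeholders for your own config and framework hook.
```javascript
// Sketch: credentialed CORS with an exact-match origin allowlist (placeholder origins).
const ALLOWED_ORIGINS = new Set(['https://app.example.com', 'https://admin.example.com']);

function corsHeaders(requestOrigin) {
  const headers = { Vary: 'Origin' };                        // caches must key on Origin
  if (!requestOrigin || requestOrigin === 'null') return headers;   // never allowlist null
  if (!ALLOWED_ORIGINS.has(requestOrigin)) return headers;          // full-string match only
  headers['Access-Control-Allow-Origin'] = requestOrigin;           // echo the one matched origin
  headers['Access-Control-Allow-Credentials'] = 'true';
  return headers;
}
```
Lookalikes such as `https://app.example.com.evil.com` or `https://evilapp.example.com` fall through to a response without `Access-Control-Allow-Origin`, so the browser refuses to expose it to the calling page.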
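Step 7's cookie line can be generated rather than hand-typed, so the `__Host-` requirements (`Secure`, `Path=/`, no `Domain`) are always met. A JavaScript sketch with a hypothetical `sessionCookie` helper.
```javascript
// Sketch: build a hardened session cookie. The __Host- prefix makes browsers reject the
// cookie unless it is Secure, has Path=/, and has no Domain, so this helper always emits
// the first two and never the third.
const SAME_SITE = new Set(['Strict', 'Lax', 'None']);

function sessionCookie(name, value, { sameSite = 'Lax', maxAgeSeconds } = {}) {
  if (!SAME_SITE.has(sameSite)) throw new Error(`invalid SameSite value: ${sameSite}`);
  const parts = [
    `__Host-${name}=${encodeURIComponent(value)}`,
    'Secure',                 // HTTPS only (also mandatory for SameSite=None)
    'HttpOnly',               // not readable from document.cookie
    `SameSite=${sameSite}`,   // Lax blocks cross-site POST CSRF; None needs separate CSRF defense
    'Path=/',
  ];
  if (maxAgeSeconds !== undefined) parts.push(`Max-Age=${maxAgeSeconds}`);
  return parts.join('; ');
}
```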
+## Common Errors
+- **Allowlist CSP with `'unsafe-inline'`.** `script-src 'self' 'unsafe-inline'` provides essentially zero XSS protection — inline injected scripts run. Fix: nonce/hash + `'strict-dynamic'`, drop `'unsafe-inline'` (keep it only as the old-browser fallback that strict-dynamic neutralizes).
+- **Reusing or hardcoding the nonce.** A static/cached nonce = `'unsafe-inline'`; the attacker just reads it from the page and reuses it. Fix: fresh CSPRNG nonce per response; for cacheable HTML use hashes instead.
+- **Flipping enforcing CSP straight to prod.** You blank-screen real users on day one. Fix: `-Report-Only` first, collect via `report-to`/`report-uri`, fix breakage, then enforce.
+- **`'unsafe-eval'` left in to satisfy a library.** Re-opens `eval`/`Function` injection. Fix: move to a CSP-compatible build (no runtime eval); add Trusted Types (`require-trusted-types-for 'script'`) instead of loosening.
+- **Script-only CSP missing `object-src`/`base-uri`.** `<base>` hijack or `<object>` injection bypasses a policy that only restricts scripts. Fix: always add `object-src 'none'; base-uri 'none'`.
+- **`Access-Control-Allow-Origin: *` (or reflected Origin) with `Allow-Credentials: true`.** Any website reads the victim's authenticated data. Fix: allowlist + echo the single matched origin + `Vary: Origin`; or drop credentials and use `*`.
+- **Substring origin matching.** `origin.endsWith('example.com')` allows `https://notexample.com`, and `origin.startsWith('https://example.com')` allows `https://example.com.evil.com`. Fix: exact full-origin set membership.
+- **HSTS `preload` added prematurely / without `includeSubDomains`.** A non-HTTPS subdomain becomes unreachable, and preload removal takes months. Fix: confirm every subdomain is HTTPS-only before `includeSubDomains; preload`; ramp `max-age` up gradually.
+- **Setting HSTS over plain HTTP.** Ignored by browsers and a sign of misconfig. Fix: emit HSTS only on HTTPS responses; redirect HTTP→HTTPS first.
+- **Cookies without `HttpOnly`/`Secure`/`SameSite`.** XSS steals the session; CSRF rides it; it leaks over HTTP. Fix: `__Host-name=...; Secure; HttpOnly; SameSite=Lax`.
+- **Duplicate CSP headers from app + proxy.** The browser enforces every CSP header it receives, so the effective policy is their *intersection*: stricter than either owner intended, and it can silently break the page. Fix: one owner of the header; verify the response has a single CSP.
+- **Missing `nosniff`.** An API that returns user content with a missing or wrong `Content-Type` can be sniffed as HTML and executed. Fix: `X-Content-Type-Options: nosniff` on every response, plus a correct `Content-Type`.
+## Verify
+1. **Scan the live headers:** run the response through `securityheaders.com` / Mozilla Observatory, or `curl -sI https://site`. Confirm exactly one `Content-Security-Policy` (with a `frame-ancestors` directive inside it), plus `Strict-Transport-Security`, `X-Content-Type-Options: nosniff`, `Referrer-Policy`, and `Permissions-Policy`, with no duplicates.
+2. **CSP is strict:** the policy contains a `'nonce-...'` or `'sha256-...'` in `script-src` with `'strict-dynamic'` and **no** standalone `'unsafe-inline'`/`'unsafe-eval'` that a modern browser honors; `object-src 'none'` and `base-uri 'none'` present. Validate with Google's CSP Evaluator.
+3. **Nonce is per-response:** fetch the page twice — the nonce value differs each time and matches the inline `<script nonce=...>` tags.
+4. **Report-Only worked:** the violation collector received reports and they were triaged before enforcing; the enforced policy doesn't blank the app (load the real pages, check the console for `Refused to...`).
+5. **CORS is safe:** `curl -H 'Origin: https://evil.com' -I` to a credentialed endpoint returns **no** `Access-Control-Allow-Origin` for `evil.com` (or omits credentials); an allowlisted origin gets that exact origin echoed plus `Vary: Origin`. No `*`+credentials anywhere.
+6. **Cookies hardened:** `Set-Cookie` on the session cookie shows `Secure; HttpOnly; SameSite=...` (and `__Host-` prefix for session); inspect in DevTools → Application → Cookies.
+7. **HSTS sane:** `Strict-Transport-Security` only on HTTPS, `max-age` ≥ 1 year, `includeSubDomains` only if every subdomain is HTTPS; `preload` only when committed (verify at hstspreload.org).
+8. **Clickjacking blocked:** attempt to frame the site from another origin → blocked by `frame-ancestors`; `X-Content-Type-Options: nosniff` confirmed so a `text/plain` API body isn't sniffed to HTML.
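Part of checks 1 and 2 can be automated against a captured response. This JavaScript sketch takes headers as a flat `[name, value, ...]` array (the shape of Node's `rawHeaders`, which keeps duplicate headers visible); `auditHeaders` is a hypothetical name, and its string checks are a simplification that complements Google's CSP Evaluator rather than replacing it.
```javascript
// Sketch: audit captured response headers for the Verify checks above.
const REQUIRED = ['content-security-policy', 'strict-transport-security',
  'x-content-type-options', 'referrer-policy', 'permissions-policy'];

function auditHeaders(rawHeaders) {
  const seen = new Map();                                   // lower-cased name -> all values
  for (let i = 0; i + 1 < rawHeaders.length; i += 2) {
    const name = rawHeaders[i].toLowerCase();
    seen.set(name, [...(seen.get(name) || []), rawHeaders[i + 1]]);
  }
  const problems = [];
  for (const h of REQUIRED) if (!seen.has(h)) problems.push(`missing ${h}`);
  const csp = seen.get('content-security-policy') || [];
  if (csp.length > 1) problems.push('duplicate content-security-policy');
  const directives = csp.join(';').split(';').map((d) => d.trim());
  const scriptSrc = directives.find((d) => d.startsWith('script-src')) || '';
  if (!/'(nonce|sha256)-/.test(scriptSrc)) problems.push('script-src lacks nonce/hash');
  if (!scriptSrc.includes("'strict-dynamic'")) problems.push("script-src lacks 'strict-dynamic'");
  if (scriptSrc.includes("'unsafe-eval'")) problems.push("script-src allows 'unsafe-eval'");
  for (const d of ["object-src 'none'", "base-uri 'none'"]) {
    if (!directives.includes(d)) problems.push(`missing ${d}`);
  }
  if (!directives.some((d) => d.startsWith('frame-ancestors'))) problems.push('missing frame-ancestors');
  const nosniff = seen.get('x-content-type-options');
  if (nosniff && nosniff[0].toLowerCase() !== 'nosniff') problems.push('x-content-type-options is not nosniff');
  return problems;                                          // empty array = all checks passed
}
```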
+Done = a strict nonce/hash CSP with `'strict-dynamic'` and no honored `'unsafe-inline'`, rolled out via Report-Only then enforced; HSTS (preload only when safe), nosniff, frame-ancestors, Referrer-Policy and a deny-by-default Permissions-Policy all present exactly once; CORS validates origin against an allowlist and never pairs `*`/reflected-origin with credentials; and session cookies carry Secure+HttpOnly+SameSite (`__Host-` prefixed) — all proven by the header scan, CSP evaluator, and CORS/cookie checks above.

package/skills/contract-testing/SKILL.md ADDED

@@ -0,0 +1,140 @@
+---
+name: contract-testing
+description: Implements consumer-driven contract testing so services deploy independently without a full integration environment — the consumer's unit tests record concrete request/response expectations against a stub (Pact `pact-jvm`/`pact-js`/`pact-python`, or Spring Cloud Contract DSL), the resulting contract (pact file / Spring stub jar) is published to a broker (Pact Broker / PactFlow) tagged by consumer version + branch + environment, the provider replays every expectation against its real app in CI with provider states (`@State` / `Given`) seeding data, and `pact-broker can-i-deploy --pacticipant X --version <git-sha> --to-environment production` gates the pipeline — plus webhook-triggered provider verification on contract change, bi-directional contracts (verify a provider's OpenAPI against consumer pacts without running the provider), pending/WIP pacts so a new consumer expectation never breaks the provider build, and version pinning via the consumer's git SHA with `record-deployment`/`record-release`.
+when_to_use: You have ≥2 services that talk over HTTP/messages and want to catch integration breakage in fast unit-speed CI instead of a brittle shared E2E env — adding Pact or Spring Cloud Contract, wiring a Pact broker, gating deploys with can-i-deploy, or deciding consumer-driven vs bi-directional contracts. Distinct from rest-graphql-contract (defines the API spec/schema itself — OpenAPI/GraphQL SDL/JSON Schema; this skill tests that two specific deployed versions actually agree) and schema-evolution-compatibility (the back/forward-compat rules a change must obey; this skill is the CI mechanism that proves a given consumer↔provider pair still satisfies them).
+---
+## When to Use
+Reach for this skill when two or more independently deployed services integrate and you want integration confidence at unit-test speed, not via a fragile end-to-end stack:
+- "Provider changed a field and a consumer broke in prod — catch it in CI before merge"
+- "Our shared staging/E2E env is flaky and slow; we want to test integration without it"
+- "Add Pact / Spring Cloud Contract between our frontend/BFF and the API"
+- "Gate the deploy: don't ship the provider until every consumer's contract still passes"
+- "We already have an OpenAPI spec — verify the provider matches it AND the consumers (bi-directional)"
+- "A new consumer's expectation shouldn't be able to red the provider's build (pending pacts)"
+- "Mobile app v3 is still live; how do we know the provider didn't drop a field v3 needs?"
+NOT this skill:
+- Authoring the API spec/schema (OpenAPI, GraphQL SDL, JSON Schema, field types, pagination shape) → rest-graphql-contract (defines *what* the API is; this skill proves two running versions *agree* on it)
+- The back/forward-compatibility *rules* (additive-only, never-remove-required, default-on-new-optional) → schema-evolution-compatibility (the policy; this skill is the per-pair CI enforcement of it)
+- gRPC/protobuf service definition and codegen → design-protobuf-grpc-service (you can still Pact-test gRPC via message pacts, but the `.proto` itself lives there)
+- General API design / breaking-change review of a diff → api-design-review
+- Browser/UI end-to-end flows across the whole app → write-playwright-e2e (this skill *replaces* most cross-service E2E with isolated pair contracts)
+- Structuring the unit-test suite itself / assertions / fixtures → write-tests, test-data-factories (this skill specifies the contract interactions; those build the surrounding suite/data)
+- Wiring the CI stages / runners / caching → cicd-pipeline-author; the deploy gate's release flow → deploy-release (this skill supplies the can-i-deploy check those stages run)
+## Steps
+1. **Pick consumer-driven (Pact) when consumers know what they need; bi-directional/spec-driven when the provider already owns an OpenAPI/GraphQL spec.** They are not interchangeable:
+   | Approach | How it works | Use when | Limitation |
+   |---|---|---|---|
+   | **Consumer-driven (Pact)** | consumer's tests *generate* expectations; provider *replays* them against the real app | consumers drive the API; you want to know exactly which fields are used | provider must run verification against real code; needs provider states |
+   | **Bi-directional (PactFlow)** | provider's OpenAPI is verified as a "provider contract"; consumer pacts compared statically against it — provider need not run | provider already has a trustworthy spec; can't run full provider verification | only as good as the spec; a spec that lies passes |
+   | **Spring Cloud Contract** | contracts in Groovy/YAML DSL live with the *provider*; generate provider tests + a stub jar consumers run against | JVM-heavy estate, provider-owned contracts, message + HTTP | JVM-centric; less natural for polyglot consumers |
+   Default to **consumer-driven Pact** for polyglot HTTP/message estates; **Spring Cloud Contract** for an all-JVM shop; add **bi-directional** when a provider can't feasibly run verification but has a real OpenAPI.
+2. **Write the consumer test against a Pact mock — assert on the request you send and matchers (not literals) for the response.** The consumer test spins up Pact's local mock server, you exercise your real client code against it, and Pact records the interaction. Use **matchers** so the contract pins *structure/type*, not brittle example values:
+   ```js
+   // pact-js PactV3 API (Pact Specification V3), run inside a Jest test file
+   const { PactV3, MatchersV3: M } = require('@pact-foundation/pact');
+   const { OrdersClient } = require('../src/orders-client');   // your real client (path is illustrative)
+   const provider = new PactV3({ consumer: 'web-bff', provider: 'orders-api' });
+   it('fetches order 42', () => {
+     provider
+       .given('order 42 exists')                       // provider state: seeds data later
+       .uponReceiving('a request for order 42')
+       .withRequest({ method: 'GET', path: '/orders/42',
+                      headers: { Accept: 'application/json' } })
+       .willRespondWith({ status: 200,
+         headers: { 'Content-Type': M.regex('application/json.*', 'application/json') },
+         body: { id: M.integer(42), total: M.decimal(19.99),
+                 status: M.regex('PAID|PENDING', 'PAID'),
+                 items: M.eachLike({ sku: M.string('ABC'), qty: M.integer(1) }) } });
+     // executeTest starts the mock server, runs the client against it, then writes the pact file
+     return provider.executeTest(async (mock) => {
+       const order = await new OrdersClient(mock.url).getOrder(42);
+       expect(order.id).toBe(42);
+     });
+   });
+   ```
+   Rules: assert only on **fields the consumer actually reads** (Pact verifies the provider returns *at least* these — extra provider fields are fine; that's how providers stay free to add). Use `integer/decimal/string/regex/eachLike/like`, never hardcoded values, or any data change reds the provider. One `given(...)` per distinct precondition; the string must match a provider state handler exactly.
+3. **Run the consumer test in normal unit CI; it emits a pact JSON file as a side effect — there is no provider involved here.** `npm test` / `mvn test` / `pytest` produces `pacts/web-bff-orders-api.json`. This runs at unit speed, no network, no provider deployed. The pact file is the deliverable.
+4. **Publish the pact to a broker, tagged with the consumer's git SHA + branch + (later) environments.** The broker is the exchange point; never email pact files around.
+   ```bash
+   pact-broker publish ./pacts \
+     --consumer-app-version $(git rev-parse --short HEAD) \
+     --branch $GIT_BRANCH \
+     --broker-base-url $PACT_BROKER_URL --broker-token $PACT_BROKER_TOKEN
+   ```
+   **Version MUST be the git SHA (or `<semver>+<sha>`), not a timestamp or "latest"** — can-i-deploy reasons about specific versions, and a non-unique version corrupts the matrix. `--branch` enables WIP/pending-pact selection. Self-host the OSS **Pact Broker** (Docker, Postgres-backed) or use hosted **PactFlow** (adds bi-directional + WIP UI).
+5. **Provider verification: replay every consumer's pact against the real running provider, seeding data via provider-state handlers.** The provider pulls pacts from the broker by **consumer version selectors** (not "all pacts ever") and runs them against a real instance:
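+   ```java
+   // pact-jvm JUnit 5 provider verification
+   import au.com.dius.pact.provider.junit5.*;
+   import au.com.dius.pact.provider.junitsupport.*;
+   import au.com.dius.pact.provider.junitsupport.loader.*;
+   import org.junit.jupiter.api.*;
+   import org.junit.jupiter.api.extension.ExtendWith;
+   @Provider("orders-api")
+   @PactBroker(url = "${PACT_BROKER_URL}")
+   class OrdersApiPactTest {
+     @PactBrokerConsumerVersionSelectors
+     public static SelectorBuilder consumerVersionSelectors() {
+       return new SelectorBuilder().deployedOrReleased().mainBranch(); // pacts live in any env + main
+     }
+     @BeforeEach
+     void setTarget(PactVerificationContext ctx) {
+       ctx.setTarget(new HttpTestTarget("localhost", 8080));  // the real running provider
+     }
+     @State("order 42 exists")                           // matches given(...) string EXACTLY
+     void seedOrder42() { db.insertOrder(42, "PAID"); }  // arrange real data (db: your test-DB helper)
+     @TestTemplate @ExtendWith(PactVerificationInvocationContextProvider.class)
+     void verify(PactVerificationContext ctx) { ctx.verifyInteraction(); }
+   }
+   ```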
+   Verify against the **real app + a test DB**, not mocks — the point is to prove the actual provider satisfies the expectation. **`@State` handlers are mandatory and must be idempotent**; they set up exactly the data the interaction needs and clean up after. A missing/misnamed state handler fails verification with "state not found".
+6. **Publish verification results back to the broker so the matrix is complete on both sides.** Set `pact.verifier.publishResults=true` (pact-jvm) / `publishVerificationResult: true` (pact-js) **only in CI, keyed to the provider's git SHA**. This is what lets can-i-deploy answer "has provider@sha verified consumer@sha?" Without it the matrix has holes: can-i-deploy reports the pair as unknown and the gate blocks (or waits out `--retry-while-unknown`), which tempts teams to bypass it.
+7. **Gate every deploy with `can-i-deploy` against the target environment — this is the whole payoff.** Before shipping either side, ask the broker whether this version is compatible with everything currently in the target env:
+   ```bash
+   pact-broker can-i-deploy \
+     --pacticipant orders-api --version $(git rev-parse --short HEAD) \
+     --to-environment production --retry-while-unknown 30 --retry-interval 10
+   # exit 0 = safe to deploy; non-zero = a consumer in prod would break → fail the stage
+   ```
+   `--retry-while-unknown` waits for in-flight verifications instead of failing on a race. After a successful deploy, record it so the matrix tracks what's live:
+   ```bash
+   pact-broker record-deployment --pacticipant orders-api \
+     --version $(git rev-parse --short HEAD) --environment production
+   ```
+   Use `record-deployment` for environments you replace-in-place (one version live), `record-release`/`record-support-ended` for things like mobile apps where **multiple versions are live at once** — that's how you stop the provider dropping a field old app builds still need.
+8. **Trigger provider re-verification automatically on contract change via broker webhooks.** Configure a broker **webhook** on `contract_content_changed` / `contract_requiring_verification_published` to POST to the provider's CI (GitHub Actions `repository_dispatch`, GitLab pipeline trigger). New consumer expectation published → provider pipeline runs verification → result published → consumer's can-i-deploy unblocks. Without this the loop is manual and contracts rot.
+9. **Use pending pacts + WIP pacts so a new/changed consumer expectation can't red the provider's main build.** Enable `enablePending: true` and `includeWipPactsSince: <date>` in the provider's selectors. A brand-new consumer expectation is verified but reported as **pending** — failures are visible but **non-blocking** for the provider — until it verifies green once, at which point it becomes blocking. This decouples teams: a consumer can publish a forward-looking contract without breaking the provider's release, and the provider opts in when ready. Pair with branch-based selectors so you verify against `main` + `deployedOrReleased`, not every stale feature-branch pact.
+10. **For async/messaging, use message pacts; for the provider's own spec, optionally add a bi-directional contract.** **Message pacts**: the consumer asserts on a *message body* it can handle (no HTTP mock). The provider verifies that its producer function emits a matching message, using the same broker and the same can-i-deploy gate. **Bi-directional** (a PactFlow feature): publish the provider's OpenAPI as a provider contract (`pactflow publish-provider-contract openapi.yaml`). PactFlow statically cross-validates consumer pacts against it, so the provider needn't run Pact verification. The tradeoff: if the spec drifts from the real provider, incompatible consumers still pass. Mitigate by testing the provider against its own spec in the provider's own test suite.
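The gate in step 7 boils down to a lookup in the broker's matrix. The TypeScript below is a deliberately simplified, hypothetical model of that decision. It is not the Pact Broker's algorithm or API, and `MatrixRow`, `canIDeploy` and `exitCode` are illustrative names. It assumes a provider version may deploy only if every consumer version live in the target environment has a successful verification against it. With `record-release` that can mean several versions of one consumer. A missing result counts as `unknown`, which is what `--retry-while-unknown` polls on.

```typescript
// Hypothetical, simplified model of the can-i-deploy decision; NOT the Pact Broker's implementation.
type Verdict = "yes" | "no" | "unknown";

interface MatrixRow {
  consumerVersion: string; // git SHA the consumer published its pact under
  providerVersion: string; // git SHA the provider verified with
  success: boolean;        // published verification result
}

// liveConsumerVersions: consumer versions recorded in the target environment.
// With record-release there can be several per consumer (e.g. old app builds).
function canIDeploy(
  providerVersion: string,
  liveConsumerVersions: string[],
  matrix: MatrixRow[],
): Verdict {
  let sawUnknown = false;
  for (const consumerVersion of liveConsumerVersions) {
    const rows = matrix.filter(
      (r) => r.consumerVersion === consumerVersion && r.providerVersion === providerVersion,
    );
    if (rows.length === 0) {
      sawUnknown = true; // hole in the matrix: nothing verified for this pair yet
      continue;
    }
    // Rows are assumed to be in publish order, so the last one is the latest result.
    if (!rows[rows.length - 1].success) {
      return "no"; // a live consumer version would break; "no" wins over "unknown"
    }
  }
  return sawUnknown ? "unknown" : "yes";
}

// Only "yes" maps to a zero exit code; both "no" and "unknown" fail the stage.
function exitCode(verdict: Verdict): number {
  return verdict === "yes" ? 0 : 1;
}
```

The model shows why an unpublished verification blocks the deploy just as surely as a failed one. In both cases the verdict is not "yes".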
+## Common Errors
+- **Asserting on literal example values instead of matchers.** Hardcoding `total: 19.99` means any data change reds provider verification. Fix: `M.decimal()`, `M.integer()`, `M.regex()`, `M.eachLike()`; pin type and structure, not the example.
+- **Consumer over-specifies fields it doesn't use.** Asserting on every response field couples you to the provider's full shape and blocks its additive changes. Fix: assert only the fields the consumer reads. Pact ignores extra fields in the provider's response, so additive changes keep passing.
+- **Provider state string ≠ `@State`/`Given` handler.** `given('order exists')` vs `@State("order 42 exists")` → "no state handler" verification failure. Fix: keep the strings byte-identical; treat them as a shared contract.
+- **Verifying the provider against mocks/in-memory stubs.** Defeats the purpose — you prove the mock matches, not the real app. Fix: run verification against the real provider + test DB seeded by state handlers.
+- **Versioning pacts with `latest`/timestamps instead of the git SHA.** can-i-deploy's matrix needs unique, reproducible versions; "latest" makes the gate meaningless. Fix: `--consumer-app-version <git-sha>`, branch via `--branch`.
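+- **Not publishing verification results (or publishing them from local dev).** Missing results leave holes in the matrix, so can-i-deploy can only answer "unknown". That fails the gate (after any `--retry-while-unknown` wait) until someone is tempted to bypass it. Results from a dirty local checkout get attributed to a SHA that doesn't match what CI built. Fix: publish results only from CI, keyed to the provider SHA.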
+- **Skipping can-i-deploy and just deploying.** Contracts that aren't gated provide false safety. Fix: make can-i-deploy a required pipeline stage that fails the deploy on non-zero exit; add `record-deployment` after.
+- **No pending pacts → new consumer expectation reds the provider main build.** Teams get blocked on each other and disable Pact in frustration. Fix: `enablePending` + WIP pacts; new expectations are non-blocking until first green.
+- **Treating Pact as full-coverage E2E.** Pact verifies the request/response *shape* per interaction, not business correctness or multi-hop flows. Fix: keep a thin layer of true E2E for critical journeys; Pact replaces the broad, flaky middle.
+- **Forgetting multi-version providers (mobile).** `record-deployment` assumes one live version; old app builds still in the wild get dropped. Fix: `record-release`/`record-support-ended` so can-i-deploy keeps every supported app version in the matrix.
+- **Webhook not configured → manual verification loop.** Contracts published but provider never re-verifies, so the broker shows stale green. Fix: `contract_requiring_verification_published` webhook → provider CI dispatch.
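The first two errors come down to what a matcher pins. As an illustration only, here is a tiny, hypothetical matcher engine. The names `integer`, `decimal`, `regex`, `eachLike` and `like` echo Pact's matchers, but this is not pact-js and omits nearly everything it does. It checks type and structure rather than example values, and it ignores response fields the consumer never declared.

```typescript
// Hypothetical mini matcher engine that mimics the *idea* of Pact matchers; not pact-js.
type Matcher =
  | { kind: "integer" }
  | { kind: "decimal" }
  | { kind: "regex"; pattern: RegExp }
  | { kind: "eachLike"; item: Matcher }
  | { kind: "like"; fields: Record<string, Matcher> };

const integer = (): Matcher => ({ kind: "integer" });
const decimal = (): Matcher => ({ kind: "decimal" }); // simplified: any finite number
const regex = (pattern: RegExp): Matcher => ({ kind: "regex", pattern }); // avoid the stateful /g flag
const eachLike = (item: Matcher): Matcher => ({ kind: "eachLike", item });
const like = (fields: Record<string, Matcher>): Matcher => ({ kind: "like", fields });

// True if `actual` has the type/structure the matcher describes; example values never matter.
function matches(m: Matcher, actual: unknown): boolean {
  switch (m.kind) {
    case "integer":
      return typeof actual === "number" && Number.isInteger(actual);
    case "decimal":
      return typeof actual === "number" && Number.isFinite(actual);
    case "regex":
      return typeof actual === "string" && m.pattern.test(actual);
    case "eachLike": {
      const item = m.item;
      // Like Pact's default, require at least one element, each matching the template.
      return Array.isArray(actual) && actual.length > 0 && actual.every((x) => matches(item, x));
    }
    case "like": {
      if (typeof actual !== "object" || actual === null || Array.isArray(actual)) return false;
      const obj = actual as Record<string, unknown>;
      const fields = m.fields;
      // Only fields the consumer declared are checked; extra provider fields pass.
      return Object.keys(fields).every((key) => key in obj && matches(fields[key], obj[key]));
    }
  }
}

// Consumer expectation for a hypothetical GET /orders/:id: only the fields the consumer reads.
const orderExpectation = like({
  id: integer(),
  total: decimal(),
  status: regex(/^(PENDING|SHIPPED)$/),
  items: eachLike(like({ sku: regex(/^[A-Z0-9-]+$/) })),
});
```

Different `id`/`total` values and an added `currency` field still satisfy `orderExpectation`. Dropping `total` or sending it as a string does not, which is the behaviour the fixes above aim for.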
+## Verify
+1. **Consumer test produces a pact at unit speed:** running the consumer suite emits `pacts/<consumer>-<provider>.json` with matchers (not literals), no provider or network involved.
+2. **Provider verification replays real interactions:** the provider's verification task pulls pacts from the broker, runs against the real app + seeded DB via every `@State`/`Given` handler, and all interactions pass (or are explicitly pending).
+3. **Matrix is complete both ways:** the broker shows the consumer pact *and* a published verification result for the provider's version — no "unverified" holes.
+4. **Gate actually blocks:** introduce a breaking provider change (drop/rename a consumed field), run `can-i-deploy --to-environment production` → it exits non-zero and the deploy stage fails; revert → exit 0.
+5. **Additive change is safe:** add a new optional field on the provider → consumer pact still verifies green and can-i-deploy passes (proves extra fields don't break consumers).
+6. **Pending pacts don't red main:** publish a new consumer expectation the provider doesn't yet satisfy → provider build reports it pending/non-blocking, not failed; once provider implements it and verifies, it becomes blocking.
+7. **Versions are git SHAs:** every publish/verify/record uses `git rev-parse` versions; grep CI for `latest`/timestamp versions and remove them.
+8. **Webhook closes the loop:** publishing a changed contract auto-triggers the provider's verification pipeline; the broker reflects the fresh result without manual intervention.
+9. **Multi-version handled (if applicable):** `record-release` keeps every supported mobile/app version in the matrix; can-i-deploy refuses a provider change that breaks any still-supported version.
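Check 6 hinges on how pending results roll up into the provider build. Below is a minimal sketch under stated assumptions. `Outcome` and `PendingTracker` are hypothetical, and the real rules live in the broker and the verifier behind `enablePending`. A failing pact that has never verified green is reported but does not fail the build. Once it has verified green in an earlier build, it becomes blocking.

```typescript
// Hypothetical model of pending-pact semantics; the broker and verifier implement the real rules.
interface Outcome {
  pactId: string;   // illustrative identifier for one consumer expectation
  success: boolean; // did provider verification pass for it in this build?
}

class PendingTracker {
  // Pacts that have verified green at least once for this provider branch.
  private readonly greenOnce = new Set<string>();

  isPending(pactId: string): boolean {
    return !this.greenOnce.has(pactId);
  }

  // Runs one provider build and returns true if the build passes.
  runBuild(outcomes: Outcome[]): boolean {
    let passes = true;
    for (const { pactId, success } of outcomes) {
      if (!success && !this.isPending(pactId)) {
        passes = false; // blocking failure: this expectation was green before
      }
    }
    // Record successes only after evaluating, so a pact turns blocking from the next build on.
    for (const { pactId, success } of outcomes) {
      if (success) this.greenOnce.add(pactId);
    }
    return passes;
  }
}
```

Pending status is decided from history *before* the build runs. That is why a brand-new expectation cannot red the provider's main build, but can once it has gone green.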
+Done = consumers generate matcher-based pacts at unit speed, the provider replays them against the real app with idempotent state handlers, verification results and deployments are recorded to the broker keyed by git SHA, every deploy is gated by can-i-deploy against the target environment, new expectations land as non-blocking pending pacts, and contract changes auto-trigger provider re-verification via webhook — proven by the breaking-change-blocks / additive-change-passes / pending-doesn't-red tests in checks 4–6.
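The webhook behind check 8 is ultimately a JSON document registered with the broker. The sketch below builds one, under the following assumptions: the broker accepts a webhook resource with `events`, `provider` and `request` fields, and it targets GitHub's `repository_dispatch` endpoint. The repo, the `event_type` value, the `client_payload` keys and the `githubDispatchWebhook` helper are hypothetical. The `${pactbroker.*}` placeholders are left as plain strings for the broker to fill in at send time. Check the field names against your broker's webhook documentation before relying on it.

```typescript
// Hypothetical builder for a Pact Broker webhook that triggers GitHub's repository_dispatch.
// Field names follow the broker's webhook resource as we understand it; verify before use.
interface WebhookSpec {
  description: string;
  provider: { name: string };
  events: { name: string }[];
  request: {
    method: "POST";
    url: string;
    headers: Record<string, string>;
    body: { event_type: string; client_payload: Record<string, string> };
  };
}

function githubDispatchWebhook(provider: string, repo: string, githubToken: string): WebhookSpec {
  return {
    description: `Re-verify ${provider} when a contract needs verification`,
    provider: { name: provider },
    events: [{ name: "contract_requiring_verification_published" }],
    request: {
      method: "POST",
      url: `https://api.github.com/repos/${repo}/dispatches`, // repo is "owner/name"
      headers: {
        "Content-Type": "application/json",
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${githubToken}`, // how the token is stored is up to your setup
      },
      body: {
        // Hypothetical name; must match the workflow's `on.repository_dispatch.types`.
        event_type: "pact_changed",
        client_payload: {
          // Double-quoted on purpose: these are broker placeholders, not TS interpolation.
          pact_url: "${pactbroker.pactUrl}",
          sha: "${pactbroker.providerVersionNumber}",
          branch: "${pactbroker.providerVersionBranch}",
        },
      },
    },
  };
}
```

The provider workflow would then check out `client_payload.sha` and verify the pact at `client_payload.pact_url`. The pact_broker-client CLI also offers `pact-broker create-webhook` as an alternative to posting this JSON by hand.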