@rafter-security/cli 0.7.0 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (56)
  1. package/README.md +20 -1
  2. package/dist/commands/agent/audit-skill.js +2 -1
  3. package/dist/commands/agent/audit.js +27 -0
  4. package/dist/commands/agent/components.js +800 -0
  5. package/dist/commands/agent/disable.js +47 -0
  6. package/dist/commands/agent/enable.js +50 -0
  7. package/dist/commands/agent/index.js +6 -0
  8. package/dist/commands/agent/init.js +162 -164
  9. package/dist/commands/agent/list.js +72 -0
  10. package/dist/commands/brief.js +20 -0
  11. package/dist/commands/docs/index.js +18 -0
  12. package/dist/commands/docs/list.js +37 -0
  13. package/dist/commands/docs/show.js +64 -0
  14. package/dist/commands/mcp/server.js +84 -0
  15. package/dist/commands/skill/index.js +14 -0
  16. package/dist/commands/skill/install.js +89 -0
  17. package/dist/commands/skill/list.js +79 -0
  18. package/dist/commands/skill/registry.js +273 -0
  19. package/dist/commands/skill/remote.js +333 -0
  20. package/dist/commands/skill/review.js +975 -0
  21. package/dist/commands/skill/uninstall.js +65 -0
  22. package/dist/core/audit-logger.js +262 -21
  23. package/dist/core/config-manager.js +3 -0
  24. package/dist/core/docs-loader.js +148 -0
  25. package/dist/core/policy-loader.js +72 -1
  26. package/dist/index.js +6 -0
  27. package/package.json +1 -1
  28. package/resources/skills/rafter/SKILL.md +76 -96
  29. package/resources/skills/rafter/docs/backend.md +106 -0
  30. package/resources/skills/rafter/docs/cli-reference.md +199 -0
  31. package/resources/skills/rafter/docs/finding-triage.md +79 -0
  32. package/resources/skills/rafter/docs/guardrails.md +91 -0
  33. package/resources/skills/rafter/docs/shift-left.md +64 -0
  34. package/resources/skills/rafter-code-review/SKILL.md +91 -0
  35. package/resources/skills/rafter-code-review/docs/api.md +90 -0
  36. package/resources/skills/rafter-code-review/docs/asvs.md +120 -0
  37. package/resources/skills/rafter-code-review/docs/cwe-top25.md +78 -0
  38. package/resources/skills/rafter-code-review/docs/investigation-playbook.md +101 -0
  39. package/resources/skills/rafter-code-review/docs/llm.md +87 -0
  40. package/resources/skills/rafter-code-review/docs/web-app.md +84 -0
  41. package/resources/skills/rafter-secure-design/SKILL.md +103 -0
  42. package/resources/skills/rafter-secure-design/docs/api-design.md +97 -0
  43. package/resources/skills/rafter-secure-design/docs/auth.md +67 -0
  44. package/resources/skills/rafter-secure-design/docs/data-storage.md +90 -0
  45. package/resources/skills/rafter-secure-design/docs/dependencies.md +101 -0
  46. package/resources/skills/rafter-secure-design/docs/deployment.md +104 -0
  47. package/resources/skills/rafter-secure-design/docs/ingestion.md +98 -0
  48. package/resources/skills/rafter-secure-design/docs/standards-pointers.md +102 -0
  49. package/resources/skills/rafter-secure-design/docs/threat-modeling.md +128 -0
  50. package/resources/skills/rafter-skill-review/SKILL.md +106 -0
  51. package/resources/skills/rafter-skill-review/docs/authorship-provenance.md +82 -0
  52. package/resources/skills/rafter-skill-review/docs/changelog-review.md +99 -0
  53. package/resources/skills/rafter-skill-review/docs/data-practices.md +88 -0
  54. package/resources/skills/rafter-skill-review/docs/malware-indicators.md +79 -0
  55. package/resources/skills/rafter-skill-review/docs/prompt-injection.md +85 -0
  56. package/resources/skills/rafter-skill-review/docs/telemetry.md +78 -0
@@ -0,0 +1,101 @@
+ # Dependencies & Supply Chain — Design Questions
+
+ Every dependency is a trust transfer: their bugs become yours, and you become dependent on their maintainers' goodwill. The question at design time is "is this worth the transfer?"
+
+ ## Pick vs. write — which one
+
+ - Cryptography, authN / authZ primitives, parsers for complex formats, protocol implementations: **pick, don't write.** The library has years of eyes and fuzz time.
+ - Glue code, config loaders, small utility functions: **write, don't pick.** A 5-line helper beats a dependency with a huge transitive tree.
+ - The middle (rate limiters, retry logic, caches): depends on how mature your language's standard library is. Go stdlib + a small helper often beats pulling in a 300-line middleware framework.
+
+ ## Maintenance signal — before you adopt
+
+ Read the repo before adopting. You can answer all of these in one sitting:
+
+ - When was the last commit, release, CVE response? Dormant ≠ dead, but "last release 2019" for a security-adjacent lib is a risk.
+ - How many maintainers? Solo-maintainer packages are a bus-factor and takeover risk (npm `event-stream`, PyPI `ctx`).
+ - Does the project publish a security policy (SECURITY.md, GHSA history)? Projects that have handled past CVEs well tend to handle the next one well.
+ - Download count and reverse-dependency count: high-popularity packages get eyes on them; low-popularity packages carry a higher chance of unnoticed badness.
+ - Typosquat / slopsquat check: is this the real package name? LLM-generated install instructions now routinely hallucinate package names that bad actors then register. Verify from the project's own README / GitHub.
+
+ ## Install-time execution
+
+ - `postinstall` / `preinstall` / `prepare` hooks in npm, arbitrary `setup.py` code in Python, Gradle init scripts, Cargo build scripts — all run with your developer's or CI's permissions.
+ - Does your package manager have a way to disable these? npm `--ignore-scripts`, `pnpm install --ignore-scripts` + allowlist via `packageExtensions`. Pip's `--only-binary :all:` avoids sdists (and their `setup.py`), but is less granular.
+ - CI should install with the strictest flags. Developers can run with scripts enabled *after* review.
+
+ ## Pinning & lockfiles
+
+ - Lockfile (`package-lock.json`, `pnpm-lock.yaml`, `yarn.lock`, `poetry.lock`, `Cargo.lock`, `go.sum`) committed. No exceptions for "libraries" — downstream lockfiles are the user's responsibility, but your CI needs reproducibility.
+ - Range pinning in the manifest (`^1.2.3`) is fine for libraries; applications benefit from exact pins + a lockfile for reproducibility.
+ - Lockfile verification in CI (`npm ci`, `pnpm install --frozen-lockfile`, `yarn install --immutable`, `poetry check --lock`). Without verification, a drifted lockfile ships unknown code.
+
+ ## Vendoring vs. registry
+
+ - Registry (npm, PyPI, Go proxy, crates.io): convenient, but the registry is a trust root. Compromise of a maintainer account has shipped malware repeatedly.
+ - Registry mirror / proxy (Artifactory, Cloudsmith, Google Artifact Registry): lets you cache + scan + pin. Best-of-both for teams with infra.
+ - Vendoring: committing dependency code into your repo. Highest control, highest cost. Justified for (a) critical dependencies you need to patch locally, (b) airgapped builds, (c) compliance requirements.
+
+ ## SCA — hook it in, don't treat it as a quarterly task
+
+ - SCA on every PR and on main: Dependabot, Renovate, Snyk, Trivy, Grype, `rafter run` (which aggregates SCA).
+ - Auto-PRs for dependency updates: merge them, with tests as the gate. Batching 3 months of updates is worse than a weekly drip.
+ - Critical CVEs (known-exploited, CVSS ≥ 9): page on detection, not "log and review later".
+ - Noise management: not every CVE applies to how you use the library. Triage policy is part of the design — who decides what's accepted, and how is the decision logged?
+
+ ## Supply chain attacks to design against
+
+ - **Typosquat / slopsquat**: package name misspellings, especially names an LLM might generate. Install only names verified against the upstream README.
+ - **Dependency confusion**: your private package name registered publicly. Publish placeholders for your internal package names, or use scoped packages with registry routing.
+ - **Maintainer takeover**: a compromised maintainer account publishes malware. Defenses: pin by digest (where supported), monitor for unexpected releases.
+ - **Protestware / hacktivism**: a maintainer deliberately ships malware or destructive code (e.g., `node-ipc`). Pinning catches it; SCA post-mortem confirms.
+ - **Compromised CI**: build-time tampering that injects malware into your artifact. Defenses: reproducible builds, signed provenance (SLSA), isolated build environments.
+
+ ## Transitive depth
+
+ - How deep is the dep tree? `npm ls` / `cargo tree` / `pipdeptree`. Dozens of transitive deps per direct dep = huge attack surface.
+ - Does each direct dep pull in its own HTTP client, its own JSON parser, its own date library? Consolidate at the application level where possible.
+ - Transitive version conflicts: which wins? In npm / pnpm, hoisting rules decide. In Python, last-wins. Explicit `overrides` / `resolutions` let you force a patched version.
+
+ ## Container images as dependencies
+
+ - Base images are dependencies — same maintenance questions apply. Distroless (Google-maintained) and Chainguard (security-first) are credibly maintained; random Docker Hub images are not.
+ - Pin by digest. `image:tag` is mutable.
+ - Multi-stage builds: the builder image can be heavy; the final image should be minimal. Don't ship your build toolchain to prod.
+ - Image scanning in CI: `trivy image`, `grype`, cloud-native scanners. Block deploys on critical findings for production.
+
+ ## SaaS dependencies
+
+ - Adopting a SaaS is also a dep: your data, their availability and security posture.
+ - Do they publish a SOC 2 / ISO 27001 / security whitepaper? Not gospel, but absence is a signal.
+ - Where does the data live (region, sub-processors)? For PII, this is a compliance question.
+ - Offboarding: if they vanish or you churn, how do you migrate? Vendor lock-in is a security issue too (you can't rotate away from a breach).
+
+ ## LLM / AI libraries — the new supply chain
+
+ - Model weights are dependencies. Which model, which version, hosted where?
+ - Inference SDKs (openai, anthropic, litellm) are dependencies with the standard risks *plus* credential surface (API keys per provider).
+ - Vector DB clients (pinecone, qdrant, chroma) are dependencies that also hold your embeddings — classify accordingly.
+ - `prompt-injection-guard` style libraries are pattern-based and will never catch novel attacks — adopt, but don't trust absolutely.
+
+ ## Refuse-list
+
+ - Pulling a dependency from a raw git URL or GitHub tarball without pinning a commit SHA.
+ - Adopting a package because an LLM suggested the name, without verifying it exists upstream (slopsquat bait).
+ - `:latest` tags on base images or dependency versions.
+ - CI that installs with `postinstall` enabled on every run, without script review.
+ - Solo-maintained packages in your critical path (auth, crypto, payments) without a forking / vendoring plan.
+ - Adopting a SaaS for a compliance-scoped workload without reviewing their posture.
+ - Skipping the lockfile because "we're a library".
+ - SCA as a quarterly scan rather than a PR-level gate.
+
+ ---
+
+ ## Exit criteria
+
+ - Every new direct dependency has a one-line justification (pick vs. write, maintenance signal reviewed).
+ - Install-time execution policy is specified for CI.
+ - Lockfile + verification in CI is confirmed.
+ - SCA tool is wired to PRs, with a triage policy for findings.
+ - Base images are pinned by digest, with a rebuild cadence.
+ - If the design uses a SaaS or LLM provider, the data flow and credential scope are drawn.
@@ -0,0 +1,104 @@
+ # Deployment — Design Questions
+
+ Deployment is where "the app is secure" meets reality. Network boundaries, runtime posture, secret distribution, build provenance — each decision made here survives every refactor of the code.
+
+ ## Network topology — zones, not flat
+
+ - Sketch zones: public edge (LB / CDN / WAF), app tier, data tier, admin tier, third-party egress. Each is a distinct security zone.
+ - What traffic is allowed **between** zones, and what's denied by default? Default-deny is the only sane starting point. If the default is allow and you block selectively, you're one misconfiguration from exposure.
+ - Public edge: what terminates TLS? WAF in front or not? A WAF is good as a cheap filter; it is not a substitute for app-side validation.
+ - Admin access (SSH, kube-exec, DB console): over the public internet? Over a VPN / zero-trust proxy (Tailscale, Cloudflare Access, Teleport)? A bastion on the public internet is a 2005 pattern.
+
+ ## Egress — the forgotten boundary
+
+ - Can your app reach arbitrary internet destinations? The default should be an allowlist of known egress targets (external APIs you integrate with, OS package mirrors, telemetry).
+ - Egress control is the best SSRF defense *and* the best data-exfiltration defense. If a compromised app can only reach `api.stripe.com`, the blast radius is Stripe calls.
+ - Metadata services (169.254.169.254): block at the network layer, not just the app. IMDSv2 on AWS (required session token + hop limit of 1) blocks the request-forwarding variants.
+
+ ## Identity & IAM
+
+ - Every compute workload has a workload identity (AWS IAM role, GCP service account, Kubernetes ServiceAccount + bound tokens, SPIFFE ID). **Not shared credentials, not long-lived keys.**
+ - Least privilege per workload. "The web service has DB read + DB write + admin on this one table" is better than "the web service has AdminAccess".
+ - Break-glass access: there's an auditable path for a human to gain emergency privileges. Not a shared `root` password.
+ - IAM changes go through code review (Terraform PR, Pulumi PR). Click-ops IAM is how wide-open permissions persist.
+
+ ## Secret distribution
+
+ - Where does each service get its secrets? A secret manager (Vault, AWS SM, GCP SM, Kubernetes Secrets with sealed / external-secrets), *not* Terraform-plan output, *not* env vars set by a deploy script that logs them.
+ - Secrets rotate. Short-lived DB credentials (Vault dynamic secrets, IAM database auth) > long-lived passwords. If your design says "quarterly rotation of a static password", name who does it and how.
+ - Secrets are scoped per service. The web tier doesn't have the admin DB credential.
+ - Encryption-at-rest for the secret manager itself: on by default for cloud-managed offerings; verify it for self-hosted.
+ - Secrets in CI: scoped per job, never printed to logs, masked in output. PR workflows triggered from forks don't see secrets.
+
+ ## Container / runtime posture
+
+ - Run as non-root. If you see `USER 0` or `runAsUser: 0`, flag it.
+ - Read-only root filesystem where possible. Writable mounts are explicit (`/tmp`, named volumes).
+ - Capabilities: drop all, add back only what's needed. `CAP_NET_BIND_SERVICE` is the usual one.
+ - Seccomp / AppArmor / SELinux profile: a real profile, not `Unconfined`.
+ - Resource limits: CPU and memory limits per container. No limit = one compromised pod can starve the node.
+
+ ## Base images
+
+ - Distroless / Alpine / minimal / scratch > full Ubuntu. Fewer packages = fewer CVEs, smaller attack surface.
+ - Pin by digest (`image@sha256:...`), not tag. `:latest` and even `:v1.2.3` can be overwritten; digests are immutable.
+ - SCA on base images in CI. Re-pull / rebuild cadence (weekly) to pick up upstream patches.
+ - Who maintains the base image? First-party (your team) > team-adjacent > "some Docker Hub account". Unmaintained bases rot.
+
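The digest rule above is easy to lint mechanically. A small sketch (the regex and function name are ours, not any real tool's) that accepts only `image@sha256:...` references and rejects tags, including "pinned-looking" version tags:

```python
import re

# An immutable reference: repository path, "@sha256:", 64 hex chars.
DIGEST_RE = re.compile(r"^[\w.\-/:]+@sha256:[0-9a-f]{64}$")

def is_digest_pinned(image_ref: str) -> bool:
    """True only for digest-pinned references; tags (even :v1.2.3) fail."""
    return bool(DIGEST_RE.match(image_ref))
```

Wire a check like this into CI over your Kubernetes manifests or Dockerfiles and fail the build on any unpinned reference.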
+ ## Build provenance & supply chain
+
+ - Is the build reproducible? Given the same inputs, does a rebuild produce the same artifact? Not always achievable, but worth asking.
+ - SLSA level: aim for SLSA 3 (hosted builder, signed provenance) for anything shipping to production. SLSA 1 (provenance exists) is the minimum.
+ - Artifact signing: Sigstore / Cosign / Notary. Signatures verified at deploy, not just at build.
+ - Dependency pinning: lockfile committed, lockfile verified in CI.
+ - `postinstall` / `prepare` scripts from dependencies: ban or audit. These execute arbitrary code on install — it's the npm supply-chain attack class.
+ - SBOM generation at build time. Store it with the artifact.
+
+ ## CI/CD posture
+
+ - Who can deploy to prod? Production deploys gated on approval, signed tags, or protected-branch merges.
+ - CI runners: ephemeral (fresh VM / container per job), not long-running hosts with persistent state.
+ - Workflow permissions: least-privilege GITHUB_TOKEN / equivalent. Write-all is the default that is one click from compromise.
+ - Self-hosted runners + public repo = RCE. Either make the repo private, use GitHub-hosted runners for public workflows, or lock runners to specific workflows.
+ - Branch protection: required reviews, required status checks, no force-push to main. Linear history if you need audit simplicity.
+
+ ## Production-vs-staging parity
+
+ - Same architecture in staging as in prod, with masked / synthetic data. Staging that uses prod data = a second prod blast radius with half the controls.
+ - Config differences are explicit and minimal. "We disable auth in staging" is how auth gets disabled in prod one day by accident.
+ - Feature flags that default off in prod and on in staging: tested in both states.
+
+ ## Multi-region / DR
+
+ - If the design spans regions: is the active/passive or active/active model clear? What's replicated, what's per-region?
+ - Encryption keys per region, or a global key? (Global is simpler but expands the blast radius.)
+ - Failover runbook exists and was tested in the last 12 months. Not-yet-tested = doesn't work.
+
+ ## Logging & monitoring posture
+
+ - Structured logs, shipped to a separate system (not the same DB the app writes to). A compromise of app storage shouldn't delete the audit trail.
+ - Authentication to the log system: workload identity, not a shared token.
+ - What paging signals exist? Login-anomaly rates, authZ denials, 5xx surges, unusual egress — without these, the breach is found by the customer.
+ - Retention: logs often outlive production data. Classify log contents and apply retention accordingly.
+
+ ## Refuse-list
+
+ - Long-lived static cloud credentials baked into container images or env vars.
+ - Privileged containers (`privileged: true`, `runAsUser: 0` without justification).
+ - `:latest` tags or unpinned base images in production manifests.
+ - CI workflows with write-all GITHUB_TOKEN scope by default.
+ - "We'll add network policy later" — network default-allow is not a plan.
+ - Secrets set via Terraform variables with plan output visible in logs.
+ - Shared SSH keys, shared `root` password, shared admin console.
+ - Metadata service reachable from a public-facing container (IMDSv1, or IMDSv2 with an unlimited hop count).
+
+ ---
+
+ ## Exit criteria
+
+ - Zone diagram exists; cross-zone traffic is allowlisted, not denylisted.
+ - Each workload has a named identity and a scoped IAM role.
+ - Secret distribution names the secret manager and the rotation model.
+ - Container runtime posture is specified: user, filesystem, capabilities, resource limits.
+ - Build pipeline specifies provenance (SLSA), signing, and dependency pinning.
+ - Log shipping + retention is set, independent of application storage.
@@ -0,0 +1,98 @@
+ # Ingestion — Design Questions
+
+ Every byte crossing your trust boundary is a question: "who says this is safe, and how?" Most of the OWASP Top 10 lives at ingestion — parsers, decoders, fetchers, uploaders.
+
+ ## Trust boundaries — name them
+
+ - Draw the boundary: external (internet, partner API, user upload) → your edge → your internal services → your storage.
+ - Each boundary crossing is a *validation point*. Validation means: shape check (schema), size check (bytes / fields), semantic check (does this make sense here?).
+ - Validation at the edge is necessary but not sufficient — internal services that re-read the data need to re-validate if the trust delta matters (e.g., a cached input re-used later as a filename).
+ - Parsers *are* the boundary for complex formats. A "validated JSON blob" that contains an eval-able code path is still a hole.
+
+ ## Input schemas — declare, don't hand-parse
+
+ - Have a typed schema for every external input: JSON Schema, Zod, Pydantic, protobuf, OpenAPI-generated types. Reject unknown fields (`additionalProperties: false`).
+ - Accepting unknown fields is how mass-assignment bugs get in — the attacker ships `is_admin: true` and the schema silently accepts it.
+ - Length / size / range bounds on every field. Strings have max lengths, numbers have ranges, arrays have max sizes, nesting has max depth. Unbounded = DoS shape.
+ - Regex validation: anchor with `^` and `$`. Watch for catastrophic backtracking — test with a regex-safety linter, or prefer RE2-backed engines.
+
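A minimal sketch of the anchored, bounded style in Python. `re.fullmatch` anchors both ends without the `$`-before-newline subtlety; the character class and length bounds are illustrative:

```python
import re

# Bounded and anchored: the explicit {3,32} quantifier caps the work
# even on a backtracking engine, and the class admits no surprises.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,32}")

def valid_username(value: object) -> bool:
    # fullmatch == implicit ^...$ — a match mid-string is not enough.
    return isinstance(value, str) and bool(USERNAME_RE.fullmatch(value))
```

The type check matters too: schema validators see `value` before it reaches the regex, but hand-rolled checks often forget that JSON can deliver a number or a list where a string was expected.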
+ ## Size limits — everywhere, early
+
+ - Request body size cap at the edge (reverse proxy / API gateway). Don't rely on the framework to cap — it parses first, rejects second.
+ - Per-field limits inside the body.
+ - Upload size limits, file-count limits, total-request-size limits.
+ - Decoder limits: JSON depth, XML entity count, zip expansion ratio (zip bomb). The default parser often has no cap — configure it explicitly.
+
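The zip-expansion check can run against the archive's central-directory metadata before a single byte is extracted. A hedged sketch with illustrative caps (a hostile archive can lie in its metadata, so also enforce limits during actual extraction):

```python
import zipfile

MAX_RATIO = 100           # compressed -> uncompressed expansion cap
MAX_TOTAL = 100 * 2**20   # 100 MiB total uncompressed

def check_zip(path: str) -> None:
    """Reject likely zip bombs using declared sizes, before extraction."""
    with zipfile.ZipFile(path) as zf:
        total = 0
        for info in zf.infolist():
            total += info.file_size
            if total > MAX_TOTAL:
                raise ValueError("archive expands past total size cap")
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"suspicious expansion ratio in {info.filename}")
```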
+ ## Parser selection — safe default, not fast default
+
+ - JSON: language-standard parser with strict mode. Reject duplicate keys (behavior varies across parsers — pick one that matches what your schema validator sees).
+ - YAML: `yaml.safe_load` in Python, `js-yaml` (since v4, `load` is the safe loader), `serde_yaml` with explicit types. **Never `yaml.load` without `SafeLoader`.**
+ - XML: disable external entity resolution (XXE). `defusedxml` in Python, or libraries with XXE off by default. If your design needs XML, flag this explicitly and pick the right library.
+ - CSV: beware formula injection (`=CMD(...)` in a field opened by Excel). Prefix fields starting with `= + - @ \t \r` when exporting.
+ - Protobuf / Thrift / MessagePack: safe-by-construction for schema violations, but size limits are still needed.
+ - Regex-heavy parsers: ReDoS risk. Prefer PEG / EBNF grammars for untrusted input where possible.
+ - HTML / Markdown: never assign raw input to `innerHTML`; always sanitize (DOMPurify, bleach). Markdown renderers have inline-HTML modes — disable them for untrusted content.
+
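The CSV export-side prefixing rule, sketched. The leading apostrophe forces spreadsheet applications to treat the cell as text; the function name is ours:

```python
# Characters a spreadsheet may interpret as the start of a formula.
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")

def neutralize_cell(value: str) -> str:
    """Prefix cells a spreadsheet would evaluate, so =CMD(...) exports as text."""
    if value and value.startswith(FORMULA_TRIGGERS):
        return "'" + value
    return value
```

Apply it to every field when writing the export, not just fields you expect to be user-controlled.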
+ ## Deserialization — the silent RCE
+
+ - Any of `pickle.loads`, `yaml.load` (default), Java `ObjectInputStream`, PHP `unserialize`, .NET `BinaryFormatter`, `Marshal.load` — on untrusted bytes — is RCE-shaped.
+ - If you *need* cross-language serialization: JSON, Protobuf, MessagePack, Avro. If you *need* native: sign the payload (HMAC) so only your own emitters are accepted, and still validate after deserialization.
+ - Node `JSON.parse` + object assignment: prototype pollution via `__proto__` / `constructor` / `prototype` keys. Use `Object.create(null)` for dictionaries, or a library that filters them.
+
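The sign-then-verify pattern, sketched with HMAC over a JSON payload (the wire format and key handling are illustrative; a real system would version the key and the format):

```python
import hashlib
import hmac
import json

def sign(payload: bytes, key: bytes) -> bytes:
    """Tag + payload, separated by a dot (hex digests never contain one)."""
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return tag + b"." + payload

def verify_and_load(blob: bytes, key: bytes):
    """Accept only payloads our own emitters signed; then still schema-check."""
    tag, _, payload = blob.partition(b".")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    return json.loads(payload)  # JSON, not pickle: parsing executes no code
```

Note the constant-time `compare_digest`, and that verification happens *before* the parser ever sees the bytes.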
+ ## File uploads
+
+ - What file types are accepted? Allowlist by **content sniff + declared MIME + extension**, not any one of them alone.
+ - Storage: write under a random name (UUID) — never preserve the client-supplied filename in the path. Preserving it enables path traversal and overwrite attacks.
+ - Scanning: for user-to-user content, run an AV / malware scan. For images, re-encode to strip EXIF + polyglot tricks.
+ - Serving: serve from a different origin / subdomain than your app (so a rendered SVG or HTML can't steal same-origin cookies). Set `Content-Disposition: attachment` for anything that isn't trusted media.
+ - Size: per-file and per-user/per-day quotas. Unbounded upload = cheap DoS + storage bomb.
+
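A sketch of the random-name storage rule with a bare-bones magic-byte sniff. The two-format allowlist is illustrative; a real service would use a full sniffing library and also check the declared MIME and extension, per the first bullet above:

```python
import uuid
from pathlib import Path

# Magic bytes -> server-chosen extension, for an image-only allowlist.
MAGIC = {b"\x89PNG\r\n\x1a\n": ".png", b"\xff\xd8\xff": ".jpg"}

def store_upload(data: bytes, upload_dir: str) -> Path:
    """Sniff content, then store under a server-generated random name."""
    ext = next((e for magic, e in MAGIC.items() if data.startswith(magic)), None)
    if ext is None:
        raise ValueError("content is not an allowlisted image type")
    # The client-supplied filename never touches the path.
    dest = Path(upload_dir) / f"{uuid.uuid4().hex}{ext}"
    dest.write_bytes(data)
    return dest
```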
+ ## Server-side fetchers — SSRF-shaped
+
+ If any part of the design does "take a URL from the user, fetch it":
+
+ - Is there a concrete business reason? Image proxy, webhook configurer, PDF-from-URL, OAuth metadata fetch — each is a known SSRF vector.
+ - Allowlist the destination **after** DNS resolution. `https://attacker.com` that DNS-resolves to `127.0.0.1` is the rebinding attack — resolve first, then decide.
+ - Deny: RFC1918 (10/8, 172.16/12, 192.168/16), link-local (169.254/16), loopback (127/8, ::1), cloud metadata (169.254.169.254, metadata.google.internal, fd00:ec2::254), IPv6 equivalents, and any internal CIDR you own.
+ - Redirects are fresh SSRF checks per hop. Disable redirects or re-validate each one.
+ - Timeouts + max response size: unbounded fetches = DoS.
+ - Response parsing: the fetched content is *still untrusted*. Don't eval it, don't template it, don't copy it to storage unsanitized.
+
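The resolve-first rule, sketched with the stdlib. The denied categories follow the list above; production code would additionally pin the connection to the vetted IP (not re-resolve the hostname) and repeat the check on every redirect hop:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def resolve_and_check(url: str) -> str:
    """Resolve first, then decide: return a vetted IP to connect to."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("unsupported URL")
    infos = socket.getaddrinfo(parsed.hostname, None)
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # Covers loopback, RFC1918, link-local (incl. 169.254.169.254),
        # multicast, and reserved space in one pass.
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved):
            raise ValueError(f"blocked destination {ip}")
    return infos[0][4][0]  # connect to this IP, not to a re-resolved name
```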
+ ## Content rendering — templates, markdown, rich text
+
+ - Which template engine? Autoescape on by default for HTML (`{{ user }}` escapes). The unsafe marker is `|safe` (Jinja), `{!! !!}` (Blade), `dangerouslySetInnerHTML` (React), `v-html` (Vue). Every use of the unsafe marker is a review point.
+ - Markdown: does the renderer allow inline HTML? For untrusted authors, disable it or sanitize post-render with a DOMPurify equivalent.
+ - Rich text (TinyMCE, Quill, Slate): sanitize the HTML output *server-side* before storing. Client-side sanitization is advisory, not authoritative.
+ - SVG: SVGs can embed scripts. Re-render to PNG server-side, or sanitize with a tool that strips `<script>`, event handlers, and external references.
+
+ ## Search inputs
+
+ - Full-text search: user input goes into a query parser (Lucene syntax, etc.). Is there an injection risk (`field:*` to bypass scoping)? Sanitize, or use a parameterized search API.
+ - Sort / filter parameters: if user-controlled, allowlist the column names. `ORDER BY {user_input}` is SQL injection even if the rest of the query is parameterized.
+
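The column-allowlist rule, sketched (the table's column names and the function name are illustrative). User input selects *from* the allowlist; it never reaches the SQL text itself:

```python
SORTABLE = {"created_at", "name", "price"}  # server-owned column allowlist

def order_clause(column: str, descending: bool = False) -> str:
    """Build an ORDER BY from allowlisted identifiers only."""
    if column not in SORTABLE:
        raise ValueError(f"unsortable column: {column!r}")
    return f"ORDER BY {column} {'DESC' if descending else 'ASC'}"
```

The values interpolated into the string are all server-defined constants, which is why this is safe where `ORDER BY {user_input}` is not.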
+ ## Imports (batch data)
+
+ - CSV / XLS / JSON imports are trust-boundary crossings at scale. Same rules — schema, size, field limits — applied per row.
+ - Streaming vs. load-all: streaming is kinder to memory and enables early rejection. Load-all with a 1 GB file = OOM.
+ - Partial-failure semantics: if row 500 is bad, does the import roll back rows 1-499? Either answer can be right, but it must be *decided*, not accidental.
+
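A streaming-import sketch that makes the partial-failure decision explicit rather than accidental (the column names, row cap, and `atomic` flag are illustrative):

```python
import csv
import io

MAX_ROWS = 100_000  # per-import row cap, enforced while streaming

def import_rows(stream: io.TextIOBase, *, atomic: bool = True):
    """Validate row by row; atomic=True rejects the whole file on the
    first bad row, atomic=False skips bad rows and reports them."""
    good, errors = [], []
    for i, row in enumerate(csv.DictReader(stream), start=1):
        if i > MAX_ROWS:
            raise ValueError("row cap exceeded")
        try:
            good.append({"sku": row["sku"].strip(), "qty": int(row["qty"])})
        except (KeyError, ValueError, AttributeError) as exc:
            if atomic:
                raise ValueError(f"row {i}: {exc}") from exc
            errors.append(i)
    return good, errors
```

Because the reader streams, a bad or oversized file is rejected early instead of after a full load into memory.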
+ ## Refuse-list
+
+ - `yaml.load` / `pickle.loads` / `Marshal.load` on any externally-sourced bytes.
+ - XML parsers with external entity resolution enabled.
+ - Uploads stored under client-supplied filenames.
+ - Server-side URL fetchers without an allowlist + post-DNS IP denylist.
+ - Schemas that accept unknown fields (`additionalProperties: true` by default).
+ - Unbounded sizes: no request body cap, no per-field length, no decoder depth limit.
+ - Markdown / HTML rendering of untrusted content without server-side sanitization.
+ - Regex patterns without anchors, or on backtracking engines with untrusted input.
+
+ ---
+
+ ## Exit criteria
+
+ - Every external input has a named schema and a size/shape limit.
+ - Parser choices are listed with the safe variant selected.
+ - If any fetcher is in the design, its allowlist + IP denylist + redirect policy is specified.
+ - File upload flow names the content-sniff library, the storage-naming scheme, and the serving origin.
+ - The design identifies every "untrusted bytes → executable context" path and closes it.
@@ -0,0 +1,102 @@
+ # Standards & Frameworks — Pointers
+
+ This skill won't re-derive a compliance checklist. Pick the right baseline for your context, read the small number of sections that actually apply, and point your design doc at them. The goal is *known-adequate*, not *comprehensive*.
+
+ ## How to choose a baseline
+
+ Answer these three before picking a framework:
+
+ 1. **Regulatory scope**: GDPR / CCPA (personal data)? HIPAA (health)? PCI-DSS (payment cards)? SOX (financial reporting)? Each forces specific controls; skipping the wrong one is non-negotiable risk.
+ 2. **Maturity goal**: "we need something defensible in review" (ASVS L1), "we handle meaningful PII" (ASVS L2 + SAMM intermediate), "we're a high-value target or regulated" (ASVS L3 + NIST SSDF + SOC 2).
+ 3. **Audit horizon**: will anyone external look at this? If yes, align with their expected framework early; retrofitting evidence is expensive.
+
+ ## App security baseline — OWASP ASVS
+
+ [OWASP Application Security Verification Standard](https://owasp.org/www-project-application-security-verification-standard/).
+
+ - **L1 — opportunistic**: external-facing apps without sensitive data. Covers basic auth, input validation, encoding, config. This is the floor; below L1 is "indefensible."
+ - **L2 — standard**: apps handling PII, business-critical data, or B2B integrations. Adds cryptography requirements, session depth, access-control rigor.
+ - **L3 — advanced**: high-value targets, regulated industries, critical infrastructure. Adds deep crypto requirements, defense-in-depth, hostile-environment assumptions.
+
+ **Design-time use**: pick your level, scan the chapter for your domain (auth → V2, session → V3, access control → V4, validation → V5, crypto → V6, etc.), and lift the requirements that match your design. Don't copy all 280+ requirements — that's audit prep, not design.
+
+ `rafter-code-review/docs/asvs.md` has a deeper walk for review time.
+
+ ## Secure development lifecycle — NIST SSDF / SP 800-218
+
+ [NIST Secure Software Development Framework](https://csrc.nist.gov/projects/ssdf).
+
+ Four practice areas — *PO* (prepare the org), *PS* (protect the software), *PW* (produce well-secured software), *RV* (respond to vulnerabilities). Most relevant at design time:
+
+ - **PW.1**: design software to meet security requirements and mitigate risks — the "why are we doing this skill" requirement.
+ - **PW.4**: reuse existing well-secured software — dependency selection (see `docs/dependencies.md`).
+ - **PW.6**: configure compilation, build, and runtime for security — deployment posture.
+ - **RV.1**: identify vulnerabilities on an ongoing basis — SCA + scanning.
+
+ Use SSDF as the program-level framework; ASVS fills in the per-app details.
+
+ ## Cloud & infra — CSA CCM / CIS Benchmarks / AWS Well-Architected
+
+ - **[CSA Cloud Controls Matrix](https://cloudsecurityalliance.org/research/cloud-controls-matrix/)**: cloud-native control framework, maps to ISO 27001 / SOC 2 / NIST / etc. Good for answering "are we doing the cloud thing right?"
+ - **[CIS Benchmarks](https://www.cisecurity.org/cis-benchmarks)** per cloud / OS / Kubernetes: concrete configuration checklists. Use for hardening specific components.
+ - **[AWS Well-Architected — Security Pillar](https://docs.aws.amazon.com/wellarchitected/latest/security-pillar/welcome.html)** (or GCP/Azure equivalents): opinionated cloud-architecture guidance. Start here before CCM for most AWS-native designs.
+
+ Pick one per component. Don't adopt all three to "maximize coverage" — they overlap, and you'll drown.
+
+ ## Threat modeling reference
+
+ - **[Microsoft STRIDE](https://learn.microsoft.com/en-us/security/securing-devops/threat-modeling)** — the classic, used in `docs/threat-modeling.md`.
+ - **[MITRE ATT&CK](https://attack.mitre.org/)** — tactics/techniques for *real-world* attacker behavior. Use it to sanity-check "what would an attacker actually do?"
+ - **[OWASP Threat Modeling](https://owasp.org/www-project-threat-modeling/)** — OWASP-flavored threat modeling methodology.
+ - **[LINDDUN](https://linddun.org/)** — privacy-focused threat modeling (a complement to STRIDE for PII-heavy designs).
+
53
+ ## Privacy & compliance
+
+ - **GDPR**: data minimization, purpose limitation, user rights (access, deletion, portability), data transfer restrictions. Design-time decisions: what you collect, why, how long, where it lives.
+ - **CCPA / CPRA**: similar shape, California-specific. Right to know, right to delete, opt-out of sale.
+ - **HIPAA** (US health): PHI definition, covered entity / BA relationships. Requires BAAs with sub-processors.
+ - **PCI-DSS** (cards): scope reduction is the name of the game — tokenize early, keep cardholder data out of your systems.
+ - **SOC 2**: not a regulation, a report. Trust Services Criteria (Security, Availability, Confidentiality, Processing Integrity, Privacy). Design maps to controls; audit reads evidence.
+ - **ISO 27001**: information security management system. Certification is program-level, not design-level, but many controls live in design decisions.
+
+ At design time, the answer is usually "we're in scope for X, Y; Z doesn't apply; here's the mapping." Not a full compliance plan — just a pointer.
+
+ ## AI / LLM-specific
+
+ - **[OWASP LLM Top 10](https://owasp.org/www-project-top-10-for-large-language-model-applications/)** (2025): prompt injection, data disclosure, supply chain, poisoning, improper output handling, excessive agency, prompt leakage, vector/embedding weaknesses, misinformation, unbounded consumption.
+ - **[MITRE ATLAS](https://atlas.mitre.org/)**: ATT&CK for AI/ML systems.
+ - **[NIST AI RMF](https://www.nist.gov/itl/ai-risk-management-framework)**: risk management framework for AI (govern, map, measure, manage).
+ - **EU AI Act** (if you ship in EU): risk categories + obligations. High-risk systems have heavy requirements; general-purpose AI has lighter ones.
+
+ `rafter-code-review/docs/llm.md` walks the LLM Top 10 for review time.
+
+ ## Cheap-and-fast subset to start with
+
+ If you have 30 minutes and need a defensible baseline for a new feature:
+
+ 1. Pick ASVS L1 or L2 (L2 if any PII).
+ 2. Read the chapter that matches your design's top risk (auth / input / access control / crypto — whichever is most novel).
+ 3. Check the CIS Benchmark for your cloud + one container-level CIS (if containerized).
+ 4. Write the applicable compliance scope (GDPR? HIPAA? none?).
+ 5. Document the threat-model pass result (from `docs/threat-modeling.md`).
+
+ This is less than a full compliance program and more than "we winged it." Most features don't need more at kickoff.
+
+ ## When to hire the specialist
+
+ The pointers above get you to "informed designer." They do not replace:
+
+ - A pentester on a high-risk launch.
+ - Compliance counsel for novel regulatory scope (PCI-DSS 4.0, GDPR cross-border, HIPAA BAs).
+ - A third-party audit for SOC 2 / ISO 27001 / FedRAMP.
+
+ Budget for these where the risk warrants. Skipping them is a risk transfer to the future, usually at a higher cost.
+
+ ---
+
+ ## Exit criteria
+
+ - The applicable regulatory scope is named (or "none, B2B internal only, accepted").
+ - The baseline framework is picked (ASVS L?, SSDF yes/no).
+ - The specific sections of the framework that match this design's novel risks are cited in the design doc.
+ - Compliance obligations (if any) are routed to a human owner, not left as "TBD".
@@ -0,0 +1,128 @@
+ # Threat Modeling — STRIDE on the Specific Design
+
+ This is the capstone. Walk it after the individual decisions (auth, data, API, ingestion, deployment) are drafted. The goal: stress-test the design by asking "how would an attacker break *this specific thing*?"
+
+ ## Setup — the diagram you actually need
+
+ Before STRIDE, draw two things. Prose is fine; ASCII is fine; polished drawings tend to get handwaved, not drawn.
+
+ 1. **Data-flow diagram**: boxes for processes, cylinders for stores, arrows for flows. Label each arrow with what crosses it (request type, data fields).
+ 2. **Trust boundaries**: dotted lines *across* the arrows — every arrow that crosses a boundary is a security control point.
+
+ Minimum sketch:
+ ```
+ [Browser] ┆→ [CDN/WAF] ┆→ [API Gateway] → [App Service] ┆→ [DB]
+                                                ┆↓
+                                        [Third-Party API]
+ ```
+ Boundaries: browser↔edge, edge↔app, app↔DB, app↔third-party.
+
+ Each boundary is where STRIDE is most productive.
+
+ ## STRIDE — one per category, per boundary
+
+ The trick is not to apply STRIDE globally; apply it to each trust-boundary crossing and each data store.
+
+ ### S — Spoofing (identity)
+
+ Applied per boundary: can the entity on the other side be impersonated?
+
+ - Browser → edge: can an attacker present a valid-looking session cookie / token they didn't earn? (Authn strength, token theft, XSS → cookie steal.)
+ - App → DB: is the DB credential stealable? Replayable? Scoped to the app's workload identity, or shared?
+ - App → third-party: does the third-party authenticate the calling app? (Mutual TLS? Signed request?) If not, anyone on their egress path can spoof.
+ - Human → admin console: how is admin access authenticated, and is that *separate* from user authN?
+
+ ### T — Tampering (data integrity)
+
+ Per boundary + per store:
+
+ - Data in transit: TLS version, cert validation, downgrade defenses. "We assume the internal network is safe" is where tampering happens.
+ - Data at rest: can a DB compromise *modify* records undetectably? Append-only audit stores + signed rows are the high-assurance pattern.
+ - Data in cache / queue: is message integrity validated? (HMAC on queue payloads, especially if they cross services with different trust levels.)
+ - Build artifacts: tampering between build and deploy. Signed provenance catches it.
+
+ ### R — Repudiation
+
+ - Is there an audit log that names the actor, the action, the resource, the time, and a request id?
+ - Are the actor's identity and the action tamper-evident in the log? A log the app writes to a DB the app can also update is repudiable.
+ - For high-value actions (payments, data exports, admin changes), is the log shipped to an append-only store? Separately from app storage?
+ - Agents acting on behalf of users: does the log name both? "User X, via agent Y, did Z at T."
+
+ ### I — Information disclosure
+
+ Per boundary + per store:
+
+ - Errors: what do error responses reveal? (See `docs/api-design.md` error taxonomy.)
+ - Side-channels: timing of login responses (does valid vs invalid username take different time?), response size, cache-hit timing.
+ - Logs: what fields are logged? Do they contain credentials / PII / secrets?
+ - Backups: who can read them? Are they encrypted separately from live data?
+ - Debug endpoints: `/debug`, `/metrics`, `/health` — what do they expose? `/metrics` with unauthenticated Prometheus is fine for latency, not for business counters that hint at usage.
+ - URL leakage: does the URL contain sensitive data (tokens, email in query string)? URLs end up in logs, browser history, and Referer headers.
+ - Third-party telemetry: does Datadog / Sentry / LogRocket see data it shouldn't? (Session replay tools are notorious for capturing PII.)
+
+ ### D — Denial of service
+
+ - Rate limits exist per endpoint, per user, per IP (see `docs/api-design.md`).
+ - Resource exhaustion: big uploads, deep JSON, big arrays, catastrophic regex, zip bombs (see `docs/ingestion.md`).
+ - Downstream dep failures: what happens if the third-party API is down? Timeout, circuit-break, fallback? Synchronous calls with no timeout = cascading outage.
+ - Queue / cache exhaustion: can a user enqueue infinite work? Background jobs that fan out per user need per-user caps.
+ - Expensive operations (LLM calls, ML inference, PDF rendering): per-user and per-tenant quotas. Cost DoS is real.
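
The no-timeout failure mode is worth a sketch. A generic wrapper (assuming nothing about the codebase) that bounds any downstream promise:

```javascript
// Bound a downstream call so a hung dependency fails fast instead of
// pinning request handlers until the whole service backs up.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Pair it with a circuit breaker so repeated timeouts stop hitting the dependency at all.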
+
+ ### E — Elevation of privilege
+
+ - AuthZ gaps: user role → admin role escalation. Mass assignment of `role` / `is_admin`. Server-side role check on every sensitive endpoint.
+ - Tenant escalation: cross-tenant data access. Row-level isolation enforced by policy engine, not by convention.
+ - Horizontal privilege (same role, other user's data): the IDOR / BOLA surface. Resource-scoped authZ.
+ - Agent / service escalation: a compromised less-privileged service calling a more-privileged one. Per-caller authZ at the callee.
+ - Infra-level: a compromised container breaking out to the host, or to other containers. Non-root, read-only FS, seccomp, network policy.
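
The mass-assignment hole in the first bullet has a one-function fix: copy only allowlisted fields from the request body. A sketch with illustrative field names:

```javascript
// Fields a user may set on their own profile. `role` / `is_admin` are not
// listed, so they can never arrive from the client, however the body is shaped.
const PROFILE_FIELDS = ["display_name", "email", "avatar_url"];

function pickAllowed(body, allowed = PROFILE_FIELDS) {
  const out = {};
  for (const key of allowed) {
    if (Object.prototype.hasOwnProperty.call(body, key)) out[key] = body[key];
  }
  return out;
}
```

The inverse (blocklisting `role`) fails open the day a new privileged field is added; allowlists fail closed.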
+
+ ## Negative-space questions
+
+ STRIDE catches the known categories. These catch what STRIDE misses:
+
+ - **What did we assume is safe?** List the implicit trust assumptions. "We trust the CDN", "we trust that service X has done authN", "we trust the user to provide their own tenant_id". Each is a fragile assumption to revisit.
+ - **What's the worst-case single compromise?** Pick one component — the web server, the DB, the build runner, a maintainer's laptop. How far does compromise spread? Is that acceptable, or does the design need more segmentation?
+ - **What's the attacker's goal?** Data theft (who pays for it?), financial fraud (how does it monetize?), denial (who benefits from us being offline?), reputational (activist / extortion). The feasible attacks depend on who'd try.
+ - **What changes in an incident?** Under compromise, can you freeze sessions, rotate secrets, disable endpoints? If the runbook starts with "we'll figure it out", design in the controls now.
+
+ ## Abuse cases — the flipside of use cases
+
+ For each primary use case, write the abuse twin:
+
+ - "User invites a friend" → "Attacker invites 10,000 friends to spam; legitimate invitee sees their address used as spam source."
+ - "User uploads a profile picture" → "Attacker uploads a polyglot SVG to execute script in another user's browser."
+ - "User requests a password reset" → "Attacker bulk-enumerates emails or sends reset-spam."
+ - "User exports their data" → "Attacker exfiltrates via unthrottled export endpoint."
+
+ One abuse twin per use case is enough at kickoff. Each surfaces a control that *should* be in the design but often isn't.
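
The invite twin, for example, implies a per-user cap the happy-path design won't mention. An in-memory sketch (production wants shared state like Redis; the names and limits are illustrative):

```javascript
const buckets = new Map();

// Allow at most `limit` invites per user per window; counters reset when
// the window rolls over. `now` is injectable for testing.
function allowInvite(userId, limit = 20, windowMs = 86_400_000, now = Date.now()) {
  const b = buckets.get(userId) ?? { count: 0, resetAt: now + windowMs };
  if (now >= b.resetAt) {
    b.count = 0;
    b.resetAt = now + windowMs;
  }
  buckets.set(userId, b);
  if (b.count >= limit) return false; // deny: over cap, alert-worthy if chronic
  b.count += 1;
  return true;
}
```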
+
+ ## Agentic / LLM-specific threats (if in scope)
+
+ If the design includes LLM or agent components, add these to the walk:
+
+ - Prompt injection: untrusted content reaches the model. Can it alter behavior of subsequent tool calls?
+ - Excessive agency: what tools does the agent have access to? Tools that write (email, file, DB, shell) are the blast-radius questions. Read-only tools are low-stakes.
+ - Data poisoning: RAG indexes over user content — can a user plant content that affects another user's retrieval?
+ - Model theft / extraction: API designs that let attackers reconstruct model behavior.
+ - Cross-tenant context bleed: if the model sees data from tenant A during a tenant B session, even as a system prompt leak, it's a disclosure bug.
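
The excessive-agency bullet turns into a concrete gate: classify each tool by blast radius and refuse write-capable tools without explicit confirmation. A sketch; the tool registry and names are hypothetical:

```javascript
// Hypothetical tool registry, classified by effect.
const TOOLS = {
  search_docs: { effect: "read" },
  read_ticket: { effect: "read" },
  send_email: { effect: "write" },
  run_shell: { effect: "write" },
};

// Deny unknown tools outright; gate write tools behind human confirmation.
function authorizeToolCall(tool, humanConfirmed = false) {
  const spec = TOOLS[tool];
  if (!spec) return { allowed: false, reason: "unknown tool" };
  if (spec.effect === "write" && !humanConfirmed) {
    return { allowed: false, reason: "write-capable tool requires confirmation" };
  }
  return { allowed: true };
}
```

A gate like this caps what prompt injection can reach: injected text can steer the model, but the write tools still wait on a human.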
+
+ ## Output
+
+ A threat-modeling pass should produce:
+
+ - The DFD / trust-boundary sketch (prose or image).
+ - For each boundary / store: the STRIDE findings and the proposed mitigations.
+ - The refuse-list items that surfaced (if any).
+ - A short list of residual risks the team is knowingly accepting, with a reason.
+ - Follow-up items to file as issues (new controls, instrumentation, tests).
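
One way to keep those outputs filed rather than remembered is a fixed finding shape per boundary. A sketch; every field name here is illustrative:

```javascript
const STRIDE = ["S", "T", "R", "I", "D", "E"];

// A finding is boundary + category + threat, plus either a proposed mitigation
// or an explicitly accepted residual risk — and always a named human owner.
function makeFinding({ boundary, category, threat, mitigation = null, residualRisk = null, owner }) {
  if (!STRIDE.includes(category)) throw new Error(`category must be one of ${STRIDE.join("")}`);
  if (!owner) throw new Error("findings need a human owner, not TBD");
  return {
    boundary,
    category,
    threat,
    mitigation,
    residualRisk, // null, or { accepted: true, reason } recorded in writing
    owner,
    status: mitigation ? "mitigation-proposed" : "open",
  };
}
```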
+
+ ---
+
+ ## Exit criteria
+
+ - DFD + boundaries are drawn.
+ - STRIDE is applied per boundary (not globally).
+ - Negative-space questions are answered.
+ - At least one abuse twin is written per primary use case.
+ - Residual risks are explicit and accepted in writing, not implicit.
+ - Design is ready for implementation — `rafter-code-review` will walk the PR when it lands.